
[llvm] Preliminary fat-lto-objects support
Needs Review · Public

Authored by paulkirth on Mar 23 2023, 5:54 PM.

Details

Summary

Fat LTO objects contain both LTO-compatible IR and generated object code.
This allows users to defer the choice of whether or not to use LTO until
link time. The feature has been available in GCC for some time; this patch
makes the existing -ffat-lto-objects flag functional in the same way as
GCC's.

Within LLVM, we add a new EmbedBitcodePass that serializes the module into
the object file, and expose a new pass pipeline for compiling fat
objects. The new pipeline first clones the module and runs the
selected (Thin)LTO pre-link pipeline on the clone, after which it serializes
the clone into a .llvm.lto section of the ELF file. When compiling for
(Thin)LTO, this is normally the point at which the compiler would emit an
object file containing the bitcode and metadata.

After that point we compile the original module using the
PerModuleDefaultPipeline used for non-LTO compilation. We generate
standard object files at the end of this pipeline, which contain machine
code and the new .llvm.lto section containing bitcode.

Since the two pipelines operate on different copies of the module, we
can be sure that the bitcode in the .llvm.lto section and object code
in .text are congruent with the existing output produced by the
default and LTO pipelines.
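
For readers following along, here is a minimal, self-contained sketch of the flow described above, written against existing PassBuilder and BitcodeWriter APIs rather than the new FatLTO entry points this patch adds. The function name, the choice of ThinLTO pre-link at O2, and the buffer identifier are illustrative assumptions, not the patch itself.

  #include "llvm/ADT/SmallString.h"
  #include "llvm/Analysis/CGSCCPassManager.h"
  #include "llvm/Analysis/LoopAnalysisManager.h"
  #include "llvm/Bitcode/BitcodeWriter.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Passes/PassBuilder.h"
  #include "llvm/Support/MemoryBufferRef.h"
  #include "llvm/Support/raw_ostream.h"
  #include "llvm/Transforms/Utils/Cloning.h"

  using namespace llvm;

  // Sketch of a fat-LTO compile: optimize a clone with the (Thin)LTO pre-link
  // pipeline, embed the clone's bitcode into the original module, then run the
  // normal non-LTO pipeline on the original module ahead of codegen.
  static void compileFatLTOModule(Module &M) {
    LoopAnalysisManager LAM;
    FunctionAnalysisManager FAM;
    CGSCCAnalysisManager CGAM;
    ModuleAnalysisManager MAM;
    PassBuilder PB;
    PB.registerModuleAnalyses(MAM);
    PB.registerCGSCCAnalyses(CGAM);
    PB.registerFunctionAnalyses(FAM);
    PB.registerLoopAnalyses(LAM);
    PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

    // 1. Run the ThinLTO pre-link pipeline on a clone; in an LTO build this is
    //    where the compiler would normally stop and emit bitcode.
    std::unique_ptr<Module> Cloned = CloneModule(M);
    ModulePassManager PreLink =
        PB.buildThinLTOPreLinkDefaultPipeline(OptimizationLevel::O2);
    PreLink.run(*Cloned, MAM);

    // 2. Serialize the optimized clone and embed its bitcode into the original
    //    module, destined for the .llvm.lto section of the object file.
    SmallString<0> Bitcode;
    raw_svector_ostream OS(Bitcode);
    WriteBitcodeToFile(*Cloned, OS);
    embedBufferInModule(M, MemoryBufferRef(Bitcode, "fat-lto"), ".llvm.lto");

    // 3. Run the regular per-module pipeline on the original module; codegen of
    //    this module produces the machine code that sits next to .llvm.lto.
    ModulePassManager Default =
        PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
    Default.run(M, MAM);
  }

The patch's pipeline also exposes choices such as thin vs. full LTO pre-link and whether to emit a summary (see the later embed-bitcode<emit-summary> discussion); those knobs are omitted here for brevity.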

Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977


Event Timeline

paulkirth added inline comments.Apr 12 2023, 6:09 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
1443

Noted. I'll take a look at refactoring this accordingly.

1481

Running the full default pipeline after the pre-link pipeline doesn't make a lot of sense. You should only run the module optimization pipeline here. For ThinLTO, the pre-link pipeline + module optimization will give you pretty much exactly the default pipeline.

This is a very helpful detail, so thanks for pointing that out.

For FatLTO, it will run module optimization twice, at least until D148010 lands.

Just for clarification, when you say FatLTO are you referring to full/normal LTO? In my experience FatLTO has typically referred to the feature being implemented in this patch, since it uses a fat object file format for LTO purposes, and has been common in GCC for a long time. The only place I've seen FatLTO used otherwise is in the Rust compiler, so I want to be sure I'm understanding you correctly.

This will still have some problems though, such as running the optimization pipeline extension points twice.

Do you have any recommendations about how we might address that, or at least suggestions on where to focus the investigation?

nikic added inline comments.Apr 13 2023, 12:25 AM
llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
54

Does this get assigned the correct section flags? I'd expect it to need SHF_EXCLUDE so it doesn't end up in the final binary when linked without LTO.

llvm/lib/Passes/PassBuilderPipelines.cpp
1481

Just for clarification, when you say FatLTO are you referring to full/normal LTO? In my experience FatLTO has typically referred to the feature being implemented in this patch, since it uses a fat object file format for LTO purposes, and has been common in GCC for a long time. The only place I've seen FatLTO used otherwise is in the Rust compiler, so I want to be sure I'm understanding you correctly.

Yes, I was using fat as in non-thin here :) It's unfortunate that the term "fat LTO" can now refer to both "non-thin LTO" and "fat-object LTO", including the peculiar combination of "fat thin LTO"...

Do you have any recommendations about how we might address that, or at least suggestions on where to focus the investigation?

The way to observe this would be to try -ffat-lto-objects together with something like -fsanitize=address. You should see address sanitizer getting run twice.

As to addressing it, I'm not sure there's a super clean way to do that short of adding an extra parameter to buildThinLTOPreLinkDefaultPipeline to suppress running the relevant extension points.
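
For context on the terminology: the "extension points" are the callbacks users register on the PassBuilder, which fire each time a pipeline is built, so an instrumentation pass added that way would be scheduled once by the pre-link pipeline and again by the default pipeline. A minimal sketch, with a hypothetical pass standing in for something like AddressSanitizer and the two-argument callback signature in use at the time of this review:

  #include "llvm/IR/PassManager.h"
  #include "llvm/Passes/PassBuilder.h"

  using namespace llvm;

  // Hypothetical stand-in for an instrumentation pass such as AddressSanitizer.
  struct MyInstrumentationPass : PassInfoMixin<MyInstrumentationPass> {
    PreservedAnalyses run(Module &, ModuleAnalysisManager &) {
      return PreservedAnalyses::all(); // no-op placeholder
    }
  };

  static void registerCallbacks(PassBuilder &PB) {
    // Fires at the OptimizerLast extension point of every pipeline this
    // PassBuilder constructs; if a fat-LTO compile builds both a pre-link and
    // a default pipeline for the same module, the pass is added to both.
    PB.registerOptimizerLastEPCallback(
        [](ModulePassManager &MPM, OptimizationLevel Level) {
          MPM.addPass(MyInstrumentationPass());
        });
  }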

paulkirth updated this revision to Diff 513765.Apr 14 2023, 3:15 PM

Rebase and address some comments

  • Fix some wrong code introduced when splitting the patch up and rebasing.
  • Fix typo in comment
  • Avoid splitting function

I still need to investigate the SHF_EXCLUDE flag, avoiding the perModuleDefaultPipeline, and extension points.

paulkirth added inline comments.Apr 14 2023, 3:41 PM
llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
54

It seems to work fine w/ the current implementation, and the .llvm.lto section doesn't end up in the final binary. We actually have several tests to that effect added later in the stack.

The caveat there is that this stack of patches uses some custom logic for the .llvm.lto section in LLD. This was originally something our intern worked on, which was based on similar patches in D130777. I'm now trying to get that work into a landable state, so if we can achieve something similar w/ the existing section flags, I'd prefer to use that mechanism over some bespoke logic.

It looks like we usually handle this type of thing with named metadata, and that the section flags are set in TargetLoweringObjectFileImpl.cpp.

I'll take a pass at adopting that approach instead, since it seems like a more idiomatic and less brittle solution.

llvm/lib/Passes/PassBuilderPipelines.cpp
1481

Yes, I was using fat as in non-thin here :) It's unfortunate that the term "fat LTO" can now refer to both "non-thin LTO" and "fat-object LTO", including the peculiar combination of "fat thin LTO"...

Even more concerning: it could also be ThinFatLTO ;)

so we're really in it now. Good thing English is so unambiguous and never leads to miscommunication.

Do you have any recommendations about how we might address that, or at least suggestions on where to focus the investigation?

The way to observe this would be to try -ffat-lto-objects together with something like -fsanitize=address. You should see address sanitizer getting run twice.

As to addressing it, I'm not sure there's a super clean way to do that short of adding an extra parameter to buildThinLTOPreLinkDefaultPipeline to suppress running the relevant extension points.

At first glance that doesn't look too bad, so that is probably what I'll try.

Also, thanks for the pointer on ASAN; that will be useful as I try to clean this up. :)

Adjust FatLTO pipeline based on comments.

  • Generated code seems to be very close to what we want. How we define the pipeline is rather subjective, so it isn't an issue if the FatLTO pipeline isn't strictly identical to the default -O3 pipeline when generating the object code.
  • Remove leftover function declaration missed in the last update.
nikic requested changes to this revision.Apr 17 2023, 12:15 PM

Some nits, but this generally looks okay as a starting point.

llvm/include/llvm/Bitcode/EmbedBitcodePass.h
32

USe -> Use here and above

34

Stray semicolon at the end.

llvm/include/llvm/Passes/PassBuilder.h
240

size -> side

246

You can drop this paragraph, it should support O0 fine.

248

ThinLTOPreLink -> ThinLTO as this is always a pre-link pipeline?

llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
31

You can pass true to suppress the crash, as this is user error, not a compiler error.

44

TBH I'm not sure we need the use list option here at all, as it's debugging functionality. I don't think anyone needs use lists embedded in fat object files. Especially if this only does something with one LTO type (ThinLTOBitcodeWriter hardcodes this to false).

50

Drop llvm:: prefix.

This revision now requires changes to proceed.Apr 17 2023, 12:15 PM
paulkirth updated this revision to Diff 514428.Apr 17 2023, 2:58 PM
paulkirth marked 9 inline comments as done.

Address comments

  • Fix typos and nits
  • Clean up comments
  • Remove use of EmitLLVMUseLists
  • Use embedBufferInModule since it does exactly what we want: create a named metadata and section, and mark them as exclude

TODO: deal w/ extension points

it'd be good to highlight somewhere that the thinlto-pre-link pipeline + module optimization pipeline is not exactly equivalent to the default pipeline (all the Phase params everywhere), so with this, if you end up not going down the LTO route in the end, you'll get a different binary than a typical non-fat build, especially if people ever want to adjust the thinlto pipeline (e.g. less aggressive inlining in the pre-link pipeline). the potential fiddling with the thinlto pre-link pipeline in the future is what has me most worried about this patch.

(the commit description is out of date referencing PerModuleDefaultPipeline)

I'd definitely like @tejohnson's feedback on this if there are any potential pitfalls we're missing

it'd be good to highlight somewhere that the thinlto-pre-link pipeline + module optimization pipeline is not exactly equivalent to the default pipeline (all the Phase params everywhere), so with this, if you end up not going down the LTO route in the end, you'll get a different binary than a typical non-fat build, especially if people ever want to adjust the thinlto pipeline (e.g. less aggressive inlining in the pre-link pipeline). the potential fiddling with the thinlto pre-link pipeline in the future is what has me most worried about this patch.

Noted. I'll look at adding some patch notes to go along with this that discuss the tradeoffs. There should be some kind of documentation in the rst files too, which is currently missing.

For what it's worth, I don't know if it's a problem that they aren't identical. Ideally, they would be, but the only way to do that would be to compile twice, once for LTO and once for normal object code. @phosek and I discussed how support for that may look w/ @tejohnson and came to the conclusion that we probably don't want to add that type of complexity to clang. Plus, I don't think it's necessarily wrong to generate the IR as if we're doing LTO/ThinLTO and then complete the LTO/ThinLTO pipeline for the module. Ideally, that would be extremely close to running the perModuleDefaultPipeline, but I think it's probably fine if they skew somewhat, since I don't imagine them straying too far. That said, I'm often wrong, so I'm happy to defer to others on this point. :)

Is your worry here that we may move some important transforms from prelink to post-link (or vice versa)? Changing instrumentation or inlining are the ones that come to mind, but I'm sure there are probably lots of cases where altering the pipeline could be a problem. I'm just not sure they're any more important for this case than for other bitcode artifacts.

(the commit description is out of date referencing PerModuleDefaultPipeline)

Good catch, thank you.

I'd definitely like @tejohnson's feedback on this if there are any potential pitfalls we're missing

+100 to that

paulkirth edited the summary of this revision.

Update summary and add documentation

TODO: handle extension points.

paulkirth updated this revision to Diff 515909.Apr 21 2023, 2:05 PM

Check if we're compiling for FatLTO before running callbacks in the module optimization pipeline that would also have run in the pre-link pipeline.

  • Adds a boolean param to buildModuleOptimizationPipeline() that defaults to false
  • Guards the early and late callbacks that could run in the pre-link pipeline.

@tejohnson Do you have any thoughts here regarding the pipeline or other shortcomings this approach may have?

@tejohnson Do you have any thoughts here regarding the pipeline or other shortcomings this approach may have?

Sorry for missing the earlier ping. I have some concerns about invoking just the ThinLTO prelink pipeline + the module optimization pipeline. Specifically, afaict this only invokes the module simplification pipeline once, with Phase == ThinLTOPreLink. If you search through buildModuleSimplificationPipeline for Phase, you will see there are certain things we only do if we are in the post link phase, or if we are not in the pre-link phase. We will miss out on some optimizations if we never invoke this again with a post link phase. For example, with SamplePGO, we only do indirect call promotion in the post link phase. We will miss out on ICP completely by never setting up the module simplification pipeline again with a post link phase. There are other examples, and additional ones if you look down into the inliner pipeline setups invoked from here and passed the phase.
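
To make the phase-gating concrete, here is a small illustrative helper in the spirit of buildModuleSimplificationPipeline (not copied from it; the helper name is invented), using the SamplePGO indirect call promotion example from the comment above:

  #include "llvm/IR/PassManager.h"
  #include "llvm/Pass.h"
  #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"

  using namespace llvm;

  // Illustrative only: SamplePGO indirect-call promotion is skipped in the
  // (Thin)LTO pre-link phase (it runs post-link, or in a plain non-LTO build),
  // so a FatLTO flow that only ever builds the pipeline with
  // Phase == ThinLTOPreLink never schedules it at all.
  static void addSamplePGOICP(ModulePassManager &MPM, ThinOrFullLTOPhase Phase) {
    bool IsLTOPreLink = Phase == ThinOrFullLTOPhase::ThinLTOPreLink ||
                        Phase == ThinOrFullLTOPhase::FullLTOPreLink;
    if (!IsLTOPreLink)
      MPM.addPass(
          PGOIndirectCallPromotion(/*IsInLTO=*/true, /*SamplePGO=*/true));
  }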

paulkirth updated this revision to Diff 517415.Apr 26 2023, 6:08 PM

Rebase and try to more closely match the perModuleDefaultPipeline

Since the pre-link pipelines seem to run module simplification + module
optimization differently than for the non-lto default pipeline, we try to
approximate the default pipeline more closely by running those two pipelines
as they would be run for non-lto compilation.

The patch also updates the documentation to match the new approach.

@tejohnson Do you have any thoughts here regarding the pipeline or other shortcomings this approach may have?

Sorry for missing the earlier ping. I have some concerns about invoking just the ThinLTO prelink pipeline + the module optimization pipeline. Specifically, afaict this only invokes the module simplification pipeline once, with Phase == ThinLTOPreLink. If you search through buildModuleSimplificationPipeline for Phase, you will see there are certain things we only do if we are in the post link phase, or if we are not in the pre-link phase. We will miss out on some optimizations if we never invoke this again with a post link phase. For example, with SamplePGO, we only do indirect call promotion in the post link phase. We will miss out on ICP completely by never setting up the module simplification pipeline again with a post link phase. There are other examples, and additional ones if you look down into the inliner pipeline setups invoked from here and passed the phase.

Thanks for the feedback. After some trial and error, I think that running module simplification + module optimization should be a good first approximation of the default pipeline. Since the goal for FatLTO is to generate a section w/ LTO compatible bitcode + a text section roughly approximating the per-TU -O<n> output, I think that would get us there without introducing any obvious gotchas. It isn't ideal to run so many passes again, but from what I could tell it didn't seem like they would do anything wrong.

That said, I missed the differences in module simplification, so please let me know if that isn't accurate.

for your use case, is it ok for performance to not be 100% if you go down the non-LTO route? presumably if you're not using LTO you're not caring about squeezing out 100% performance. for example in Chrome-land we've talked about this sort of thing for tests where we don't really care if performance is optimal, so an approximation of -O2 (or whatever) is good enough. if we solidify this as a tradeoff and don't promise 100% compatibility with the default optimization pipeline, I'm much more comfortable with that as long as people don't complain if FatLTO non-LTO performance ever slightly degrades

Right, I think we basically have two options on how to do this:

  1. Make fat LTO exactly equivalent to LTO bitcode + default-optimized object code. This would require cloning the input module, running the LTO pipeline on one module, embedding the bitcode in the other and then running the default pipeline in that one. This will robustly match the LTO/default behavior exactly, at the cost of compile-time.
  2. Make fat LTO approximately equivalent to LTO bitcode + default-optimized object code, with the usual cost-benefit tradeoffs. In this case, I don't think that rerunning module simplification after embedding bitcode makes much sense, because the differences between the pre-link ThinLTO pipeline and the default pipeline are too minor.

I believe the differences are:

  • Various bits related to SamplePGO.
  • The (currently disabled) MemProfiler pass. This looks like a plain bug to me: What is an instrumentation pass doing in the simplification pipeline? I'm pretty sure this should be in the optimization pipeline instead. Even if not, we could easily explicitly run this one, because it's at the very end of the simplification pipeline.
  • A single GlobalDCE pass at the end of the pipeline. I'm pretty sure this is another bug, because the comment says "Remove dead code, except in the ThinLTO pre-link pipeline where we may want to keep available_externally functions", even though available_externally functions only become relevant post-link, not pre-link.
  • Loop rotation will not duplicate calls in headers pre-link.

Ignoring the bugs, the only substantial differences seem to be related to SamplePGO, which I'm not familiar with. I'm not clear on how SamplePGO relates to fat LTO usage scenarios.

for your use case, is it ok for performance to not be 100% if you go down the non-LTO route? presumably if you're not using LTO you're not caring about squeezing out 100% performance. for example in Chrome-land we've talked about this sort of thing for tests where we don't really care if performance is optimal, so an approximation of -O2 (or whatever) is good enough. if we solidify this as a tradeoff and don't promise 100% compatibility with the default optimization pipeline, I'm much more comfortable with that as long as people don't complain if FatLTO non-LTO performance ever slightly degrades

I agree ensuring full performance with FatLTO's non-LTO native object isn't terribly important. But my concern is whether this results in some unexpected behavior with certain passes not being run at all - e.g. ICP won't run at all if you are doing SamplePGO in this configuration.

Right, I think we basically have two options on how to do this:

  1. Make fat LTO exactly equivalent to LTO bitcode + default-optimized object code. This would require cloning the input module, running the LTO pipeline on one module, embedding the bitcode in the other and then running the default pipeline in that one. This will robustly match the LTO/default behavior exactly, at the cost of compile-time.
  2. Make fat LTO approximately equivalent to LTO bitcode + default-optimized object code, with the usual cost-benefit tradeoffs. In this case, I don't think that rerunning module simplification after embedding bitcode makes much sense, because the differences between the pre-link ThinLTO pipeline and the default pipeline are too minor.

I believe the differences are:

  • Various bits related to SamplePGO.
  • The (currently disabled) MemProfiler pass. This looks like a plain bug to me: What is an instrumentation pass doing in the simplification pipeline? I'm pretty sure this should be in the optimization pipeline instead. Even if not, we could easily explicitly run this one, because it's at the very end of the simplification pipeline.

I added this. As with other profile instrumentation passes it is off by default. I think I added it here because this is also where we add the PGO instrumentation passes, although I suppose there are specific reasons for doing those here (i.e. inlining is also invoked here). I can move it out so this becomes a non-issue.

  • A single GlobalDCE pass at the end of the pipeline. I'm pretty sure this is another bug, because the comment says "Remove dead code, except in the ThinLTO pre-link pipeline where we may want to keep available_externally functions", even though available_externally functions only become relevant post-link, not pre-link.

ThinLTO importing is not the only producer of available_externally functions. There can be available_externally functions coming out of clang, and we want them to survive until after importing and the post LTO link inlining is done.

  • Loop rotation will not duplicate calls in headers pre-link.

This seems minor I agree.

Ignoring the bugs, the only substantial differences seem to be related to SamplePGO, which I'm not familiar with. I'm not clear on how SamplePGO relates to fat LTO usage scenarios.

As mentioned earlier, I think the question is whether it will be a surprise when using SamplePGO that certain things simply don't get run when using the non-LTO native objects. I don't know enough about when these will get used to know if this is a configuration anyone will care about. If it is, I suppose one approach would be to invoke the longer pipelines only under SamplePGO. Either way, this probably needs some clear documentation.

llvm/lib/Passes/PassBuilderPipelines.cpp
1485

Why pass in phase None for these calls to module simplification and optimization passes, instead of the corresponding post-LTO link phase? I would think we would want the complement of the phase provided earlier. I'm pretty sure there are some passes that are configured to run in exactly one of these phases, and we might end up duplicating them if running a pre-LTO link phase + a default (non-LTO) phase optimization pipeline.

  • The (currently disabled) MemProfiler pass. This looks like a plain bug to me: What is an instrumentation pass doing in the simplification pipeline? I'm pretty sure this should be in the optimization pipeline instead. Even if not, we could easily explicitly run this one, because it's at the very end of the simplification pipeline.

I added this. As with other profile instrumentation passes it is off by default. I think I added it here because this is also where we add the PGO instrumentation passes, although I suppose there are specific reasons for doing those here (i.e. inlining is also invoked here). I can move it out so this becomes a non-issue.

Right. I'd expect the pipeline position here to be based on whether the instrumentation needs to be pre-inline or post-inline. The pre-inline (non-CS) PGO instrumentation runs during simplification, the post-inline CS PGO instrumentation is part of optimization.

  • A single GlobalDCE pass at the end of the pipeline. I'm pretty sure this is another bug, because the comment says "Remove dead code, except in the ThinLTO pre-link pipeline where we may want to keep available_externally functions", even though available_externally functions only become relevant post-link, not pre-link.

ThinLTO importing is not the only producer of available_externally functions. There can be available_externally functions coming out of clang, and we want them to survive until after importing and the post LTO link inlining is done.

Good point. The justification still seems somewhat dubious to me, because GlobalDCE will only remove available_externally functions if they aren't referenced. Referenced available_externally functions are only dropped by EliminateAvailableExternally, which we don't run pre-link. It's not obvious to me why we would want to keep around unreferenced available_externally functions pre-link even in a ThinLTO scenario.

paulkirth updated this revision to Diff 517705.Apr 27 2023, 1:51 PM

Use separate modules and pipelines to optimize the bitcode and assembly sections.

  • The EmbedBitcodePass (which maybe should be renamed now...) was updated to clone the current module and run a separate pipeline on the clone (see the sketch below this list).
  • The cloned module is then used to generate the .llvm.lto bitcode section and discarded.
  • The EmbedBitcodePass is initialized in the FatLTO pipeline with the correct pre-link LTO pipeline.
  • The FatLTO pipeline now only runs the EmbedBitcodePass and then the PerModuleDefaultPipeline.
  • The MPM used to optimize the cloned module is generated from the same pass builder as the default pipeline, so there shouldn't be any difference in configuration.
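
A rough sketch of that shape, under the assumption (not taken verbatim from the patch) that the pass owns the pre-link pipeline it was constructed with; the class name, buffer identifier, and the conservative PreservedAnalyses choice are illustrative:

  #include "llvm/ADT/SmallString.h"
  #include "llvm/Bitcode/BitcodeWriter.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Support/MemoryBufferRef.h"
  #include "llvm/Support/raw_ostream.h"
  #include "llvm/Transforms/Utils/Cloning.h"

  using namespace llvm;

  // Sketch: clone the module, run the pre-link LTO pipeline on the clone, embed
  // the clone's bitcode into the original module, and discard the clone. The
  // original module then continues through the normal per-module pipeline.
  class EmbedBitcodeSketchPass : public PassInfoMixin<EmbedBitcodeSketchPass> {
    ModulePassManager PreLinkMPM;

  public:
    explicit EmbedBitcodeSketchPass(ModulePassManager &&MPM)
        : PreLinkMPM(std::move(MPM)) {}

    PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM) {
      std::unique_ptr<Module> Cloned = CloneModule(M);
      PreLinkMPM.run(*Cloned, MAM);

      SmallString<0> Bitcode;
      raw_svector_ostream OS(Bitcode);
      WriteBitcodeToFile(*Cloned, OS);
      embedBufferInModule(M, MemoryBufferRef(Bitcode, "fat-lto"), ".llvm.lto");

      // Adding the embedded global mutates the module, so be conservative.
      return PreservedAnalyses::none();
    }
  };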
paulkirth added inline comments.Apr 27 2023, 2:07 PM
llvm/lib/Passes/PassBuilderPipelines.cpp
1485

The intent there was to closely mimic the non-lto pipeline. For now I've attempted to just split everything, but I'm not sure I love the approach. It's certainly simple, but I didn't like giving a pass its own passmanager... plus it does seem to be a bit wasteful to run two complete pipelines on the module.

@tejohnson @nikic Are there any unresolved issues w/ this approach, now that it uses two completely disjoint pipelines?

tejohnson added inline comments.Thu, May 25, 11:37 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
1485

I don't think the prior code mimic'ed the non-LTO pipeline, however. My understanding is that the prior version did the following for the non embedded code on which it would generate a native object:

(Thin)LTOPreLink + ModuleSimplification(ThinOrFullLTOPhase::None) + ModuleOptimization(ThinOrFullLTOPhase::None)

So you would have been essentially duplicating ModuleSimplification (since it is invoked in both the *LTO pre-link and again with LTO Phase = None). That can cause some issues because there are passes that are only meant to be invoked once that you could duplicate. A good example is PGO instrumentation/annotation (see the conditions under which addPGOInstrPasses is called from buildModuleSimplificationPipeline).

With your new version, my understanding is that you are calling:

  1. buildPerModuleDefaultPipeline on original Module -> generate native object
  2. (Thin)LTOPreLink on Module clone -> embed IR

And buildPerModuleDefaultPipeline is largely just ModuleSimplification + ModuleOptimization. So I'm not sure about why it would be more wasteful (other than cloning the Module) - you are not running more pipelines total than in the prior code. Unless I am misunderstanding the former or current code, which is possible!

paulkirth marked 5 inline comments as done.Thu, May 25, 2:17 PM
paulkirth added inline comments.
llvm/lib/Passes/PassBuilderPipelines.cpp
1485

I don't think the prior code mimic'ed the non-LTO pipeline, however. My understanding is that the prior version did the following for the non embedded code on which it would generate a native object:

(Thin)LTOPreLink + ModuleSimplification(ThinOrFullLTOPhase::None) + ModuleOptimization(ThinOrFullLTOPhase::None)

So you would have been essentially duplicating ModuleSimplification (since it is invoked in both the *LTO pre-link and again with LTO Phase = None). That can cause some issues because there are passes that are only meant to be invoked once that you could duplicate. A good example is PGO instrumentation/annotation (see the conditions under which addPGOInstrPasses is called from buildModuleSimplificationPipeline).

With your new version, my understanding is that you are calling:

  1. buildPerModuleDefaultPipeline on original Module -> generate native object
  2. (Thin)LTOPreLink on Module clone -> embed IR

And buildPerModuleDefaultPipeline is largely just ModuleSimplification + ModuleOptimization. So I'm not sure about why it would be more wasteful (other than cloning the Module) - you are not running more pipelines total than in the prior code. Unless I am misunderstanding the former or current code, which is possible!

I think that's an accurate summary. While my intent originally was to closely mimic the default pipeline, I don't think it was very close in reality. Also, as you pointed out the new approach isn't nearly as wasteful as I felt initially.

I agree ensuring full performance with FatLTO's non-LTO native object isn't terribly important. But my concern is whether this results in some unexpected behavior with certain passes not being run at all - e.g. ICP won't run at all if you are doing SamplePGO in this configuration.

...

As mentioned earlier, I think the question is whether it will be a surprise when using SamplePGO that certain things simply don't get run when using the non-LTO native objects. I don't know enough about when these will get used to know if this is a configuration anyone will care about. If it is, I suppose one approach would be to invoke the longer pipelines only under SamplePGO. Either way, this probably needs some clear documentation.

These are all performance concerns for the non-LTO native objects, not correctness, right? I think if we clearly document that performance might be very meh for fat LTO non-LTO native objects this would be fine.

llvm/include/llvm/Bitcode/EmbedBitcodePass.h
33

is this default param necessary?

llvm/lib/Passes/PassBuilderPipelines.cpp
1265

do we still need IsFatLTO with the latest approach?

1485

IIUC the previous version was ThinLTOPreLink + ModuleOptimization(ThinOrFullLTOPhase::None), where we'd write out the module after pre-link and before module optimization to some global. So we'd avoid running the simplification pipeline twice with that. (the problem is that the simplification pipeline is slightly different between a normal build and a ThinLTO build)

the current version will run the simplification pipeline twice now, once for ThinLTO and once for the default pipeline

llvm/lib/Passes/PassRegistry.def
61

these should be parameterizable, see MODULE_PASS_WITH_PARAMS
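
For reference, parameterized entries in PassRegistry.def follow the pattern below; the option type and parser name here are hypothetical, shown only to illustrate what an embed-bitcode<...> registration could look like (it is not the patch's actual entry):

  // Hypothetical PassRegistry.def entry; EmbedBitcodeOptions and
  // parseEmbedBitcodePassOptions are illustrative names.
  MODULE_PASS_WITH_PARAMS(
      "embed-bitcode", "EmbedBitcodePass",
      [](EmbedBitcodeOptions Opts) { return EmbedBitcodePass(Opts); },
      parseEmbedBitcodePassOptions, "thinlto;emit-summary")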

paulkirth marked an inline comment as done.Thu, May 25, 2:37 PM

These are all performance concerns for the non-LTO native objects, not correctness, right? I think if we clearly document that performance might be very meh for fat LTO non-LTO native objects this would be fine.

In the prior version, I think it's too hard to say if running pipelines, like ModuleSimplification, multiple times is safe/correct without some fairly detailed analysis. I can easily imagine that a transform that is only expected to run once may introduce a subtle bug if run again on the same module. Nothing obvious here jumps out at me, but I'm also not confident enough to say that it can't happen.

That said, since FatLTO no longer uses a modified/custom pipeline, but uses the (Thin)LTO and Module pipelines on independent modules, there should no longer be any correctness or performance issues over a normal compilation for the non-LTO object code.

llvm/include/llvm/Bitcode/EmbedBitcodePass.h
33

IIRC it was required for some reason, but let me double check that.

llvm/lib/Passes/PassBuilderPipelines.cpp
1265

That's a good point. I don't think so, so I can simplify this further. Thanks for pointing this out.

llvm/lib/Passes/PassRegistry.def
61

That is a much better way to handle this. Thank you.

paulkirth updated this revision to Diff 525901.Thu, May 25, 6:29 PM
paulkirth edited the summary of this revision.

Update summary, documentation, and address comments.

aeubanks added inline comments.Tue, May 30, 10:59 AM
llvm/include/llvm/Bitcode/EmbedBitcodePass.h
27

extraneous ;

29

this is never used

44

if you really wanted to hook this up to the pipeline parsing (so you could specify a sub-passmanager) you'd have to thread through all the parsing code. probably not worth it

llvm/include/llvm/Passes/PassBuilder.h
242

comment out of date

llvm/lib/Passes/PassBuilderPipelines.cpp
1265

still not removed?

1471

this comment is obsolete now

paulkirth updated this revision to Diff 527074.Wed, May 31, 9:03 AM
paulkirth marked 7 inline comments as done.

Address remaining comments

  • Fix Doc string
  • Remove obsolete comment
  • Remove uses of IsThinkLTO
  • Remove some dead code
  • Fix some typos
  • Fix test looking for wrong pass name in the error string
  • git clang-format
llvm/lib/Passes/PassBuilderPipelines.cpp
1471

I was thinking that once that patch lands, we can take the ThinOrFullLTOPhase directly and avoid the branch, but maybe it still makes sense to leave this as is.

this basically lgtm

llvm/lib/Bitcode/Writer/EmbedBitcodePass.cpp
31–38

these should be unnecessary with the newly added pass params, e.g. -passes=embed-bitcode<emit-summary>

paulkirth updated this revision to Diff 529076.Tue, Jun 6, 4:07 PM

Rebase

  • Fix typo in docs
  • Remove unneeded cl::opts
paulkirth marked an inline comment as done.Tue, Jun 6, 4:08 PM

@tejohnson @nikic any lingering concerns?