Page MenuHomePhabricator

[LTO] Enable module summary emission by default for regular LTO
ClosedPublic

Authored by tobiasvk on Jun 13 2017, 11:36 AM.

Details

Summary

With D33921, we gained the ability to have module summaries in regular
LTO modules without triggering ThinLTO compilation. Module summaries in
regular LTO allow garbage collection (dead stripping) before LTO
compilation and thus open up additional optimization opportunities.

This patch enables summary emission in regular LTO for all targets
except ld64-based ones (which use the legacy LTO API).

Event Timeline

tobiasvk created this revision.Jun 13 2017, 11:36 AM
pcc edited edge metadata.Jun 13 2017, 12:12 PM

Have you considered writing the regular LTO summaries unconditionally if -flto was specified? That was how I imagined that the interface would look.

Also, how were you planning to expose the reference graph to the linker? I gather from your message that you are teaching your linker to read the module summary index directly from bitcode files. I wonder whether it would be worth trying to avoid needing to read summaries multiple times. The approach that I had in mind was to somehow teach the linker to add regular object files to the combined summary index by creating a "global value summary" for each section, with a reference for each relocation. (This would be similar to how we add regular LTO inputs to the combined summary in D33922.) Then LTO would run as usual. Any regular object sections would then naturally participate in the summary-based dead stripping that LTO already does.

In D34156#779270, @pcc wrote:

Have you considered writing the regular LTO summaries unconditionally if -flto was specified? That was how I imagined that the interface would look.

Absolutely, if people are OK with that. I would have enabled it by default in our tree anyway. Let me know if you prefer this (and other people if you have objections).

Also, how were you planning to expose the reference graph to the linker? I gather from your message that you are teaching your linker to read the module summary index directly from bitcode files.

Yep, pretty much. I have a layer between the linker and LLVM that implements a getRefsBySymbol API which lets the linker iterate the reference graph. We've had this out of tree for some time, it's just that your recent patch made it straightforward to implement upstream.

I wonder whether it would be worth trying to avoid needing to read summaries multiple times. The approach that I had in mind was to somehow teach the linker to add regular object files to the combined summary index by creating a "global value summary" for each section, with a reference for each relocation. (This would be similar to how we add regular LTO inputs to the combined summary in D33922.) Then LTO would run as usual. Any regular object sections would then naturally participate in the summary-based dead stripping that LTO already does.

It could be done but I'm skeptical about this from a cost/benefit perspective. We'd only save one additional read of the summaries, which is pretty cheap anyway. The GC logic in the linker is non-trivial and doesn't map particularly well to the combined summary (e.g. we'd have to deal with the liveness of groups of symbols rather than individual ones when symbols share an input section).

Thanks,
Tobias

pcc added a comment.Jun 13 2017, 3:45 PM
In D34156#779270, @pcc wrote:

Have you considered writing the regular LTO summaries unconditionally if -flto was specified? That was how I imagined that the interface would look.

Absolutely, if people are OK with that. I would have enabled it by default in our tree anyway. Let me know if you prefer this (and other people if you have objections).

I'd be fine with it. We may want to make it conditional on the target (e.g. the summary wouldn't be much use on Darwin targets right now), but let's see what others think.

I wonder whether it would be worth trying to avoid needing to read summaries multiple times. The approach that I had in mind was to somehow teach the linker to add regular object files to the combined summary index by creating a "global value summary" for each section, with a reference for each relocation. (This would be similar to how we add regular LTO inputs to the combined summary in D33922.) Then LTO would run as usual. Any regular object sections would then naturally participate in the summary-based dead stripping that LTO already does.

It could be done but I'm skeptical about this from a cost/benefit perspective. We'd only save one additional read of the summaries, which is pretty cheap anyway. The GC logic in the linker is non-trivial and doesn't map particularly well to the combined summary (e.g. we'd have to deal with the liveness of groups of symbols rather than individual ones when symbols share an input section).

That's a fair point, and now that I think about it it does seem better for GC to be entirely driven by the linker, as you have done. Longer term we may consider replacing summary-based dead stripping entirely with linker-driven dead stripping.

tobiasvk updated this revision to Diff 102585.Jun 14 2017, 11:45 AM
  • Change patch to always emit a module summary for regular LTO

I don't see any real downside of having a summary given the marginal time and
space overhead it takes to construct and save it.

mehdi_amini edited edge metadata.Jun 14 2017, 2:30 PM
  • Change patch to always emit a module summary for regular LTO

    I don't see any real downside of having a summary given the marginal time and space overhead it takes to construct and save it.

I think some code/logic needs to be changed in LLVM first:

bool LTOModule::isThinLTO() {
  // Right now the detection is only based on the summary presence. We may want
  // to add a dedicated flag at some point.
  • Change patch to always emit a module summary for regular LTO

    I don't see any real downside of having a summary given the marginal time and space overhead it takes to construct and save it.

I think some code/logic needs to be changed in LLVM first:

bool LTOModule::isThinLTO() {
  // Right now the detection is only based on the summary presence. We may want
  // to add a dedicated flag at some point.

No, that check still works. hasGlobalValueSummary returns false for regular LTO summaries (since they a have different block ID in the bitcode). I believe @pcc is refactoring these APIs in a separate patch, since the name hasGlobalValueSummary doesn't quite reflect its purpose anymore.

pcc added a comment.Jun 14 2017, 2:38 PM
  • Change patch to always emit a module summary for regular LTO

    I don't see any real downside of having a summary given the marginal time and space overhead it takes to construct and save it.

I think some code/logic needs to be changed in LLVM first:

bool LTOModule::isThinLTO() {
  // Right now the detection is only based on the summary presence. We may want
  // to add a dedicated flag at some point.

Thanks for pointing that out. Right now that function will ignore full LTO summaries because they are represented using a different record number, but once D33922 lands the hasGlobalValueSummary function will return true if the module contains a full LTO summary. I will update that function as part of D33922 to check for presence of a ThinLTO summary.

OK! Thanks both :)

tobiasvk retitled this revision from [LTO] Add -femit-summary-index flag to emit summary for regular LTO to [LTO] Enable module summary emission by default for regular LTO.Jun 15 2017, 9:01 AM
tobiasvk edited the summary of this revision. (Show Details)
pcc accepted this revision.Jun 15 2017, 10:38 AM

Please confirm that we can still self host with full LTO now that D33922 has landed.

LGTM otherwise. Thanks!

This revision is now accepted and ready to land.Jun 15 2017, 10:38 AM
In D34156#781415, @pcc wrote:

Please confirm that we can still self host with full LTO now that D33922 has landed.

Good point. And the answer seems to be no :/

ld.lld: .../llvm/lib/Linker/IRMover.cpp:242: llvm::Type *(anonymous namespace)::TypeMapTy::get(llvm::Type *, SmallPtrSet<llvm::StructType *, 8> &): Assertion `!(Pair.first != Ty && Pair.second == Ty) && "mapping to a source type"' failed.

This is with cmake -DCLANG_ENABLE_BOOTSTRAP=1-DBOOTSTRAP_LLVM_ENABLE_LLD=1-DBOOTSTRAP_LLVM_ENABLE_LTO=1 (...) && ninja stage2.

Hi Tobias, I tracked down the failure self-hosting LLVM with LTO with this revision to https://bugs.llvm.org/show_bug.cgi?id=37684#c2 and have a fix under review in D47898. This revision needs to be updated to include the following trivial EmitSummaryIndex->PrepareForThinLTO renames to build:

--- a/lib/CodeGen/BackendUtil.cpp
+++ b/lib/CodeGen/BackendUtil.cpp
@@ -944,7 +944,7 @@ void EmitAssemblyHelper::EmitAssemblyWithNewPassManager(
   ModulePassManager MPM(CodeGenOpts.DebugPassManager);
 
   if (!CodeGenOpts.DisableLLVMPasses) {
-    bool IsThinLTO = CodeGenOpts.EmitSummaryIndex;
+    bool IsThinLTO = CodeGenOpts.PrepareForThinLTO;
     bool IsLTO = CodeGenOpts.PrepareForLTO;
 
     if (CodeGenOpts.OptimizationLevel == 0) {
diff --git a/lib/CodeGen/CGDebugInfo.cpp b/lib/CodeGen/CGDebugInfo.cpp
index db6a82b415..91f80e5739 100644
--- a/lib/CodeGen/CGDebugInfo.cpp
+++ b/lib/CodeGen/CGDebugInfo.cpp
@@ -578,7 +578,7 @@ void CGDebugInfo::CreateCompileUnit() {
                           CSInfo,
                           getSource(SM, SM.getMainFileID())),
       CGOpts.EmitVersionIdentMetadata ? Producer : "",
-      LO.Optimize || CGOpts.PrepareForLTO || CGOpts.EmitSummaryIndex,
+      LO.Optimize || CGOpts.PrepareForLTO || CGOpts.PrepareForThinLTO,
       CGOpts.DwarfDebugFlags, RuntimeVers,
       CGOpts.EnableSplitDwarf ? "" : CGOpts.SplitDwarfFile, EmissionKind,
       0 /* DWOid */, CGOpts.SplitDwarfInlining, CGOpts.DebugInfoForProfiling,

Are you still interested in landing this?

Hi Tobias, I tracked down the failure self-hosting LLVM with LTO with this revision to https://bugs.llvm.org/show_bug.cgi?id=37684#c2 and have a fix under review in D47898.

Fantastic! That sounds exactly like the problem I was seeing back then. Thanks for tracking this down!

Are you still interested in landing this?

Sure, I'll push an update to the patch.

Tobias

tobiasvk updated this revision to Diff 150965.Jun 12 2018, 9:21 AM
  • Rebase for current tree
  • Fix -flto -save-temps which the previous patch broke

@pcc, @tejohnson: Are you OK with this going in now that D47898 has merged?

tejohnson accepted this revision.Jun 20 2018, 10:21 AM

LGTM with following suggestions.

clang/lib/CodeGen/BackendUtil.cpp
776

Perhaps sink all this code into the use site of EmitLTOSummary below, especially because you don't need to check !CodeGenOpts.PrepareForThinLTO.

998

Ditto here.

This revision was automatically updated to reflect the committed changes.