This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Compute synthetic function entry count
ClosedPublic

Authored by eraman on Feb 20 2018, 11:53 AM.

Details

Summary

This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.

Diff Detail

Event Timeline

eraman created this revision.Feb 20 2018, 11:53 AM

Sorry for the lateness of the review! Comments/questions below.

include/llvm/IR/ModuleSummaryIndex.h
525

Need to document that this is only populated by the Thin Link (and unused by per-module summaries).

1212

For consistency it would be nice to have the above two lines match the same code in child_begin (same with child*_end). I.e.

FunctionSummary *F =
    cast<FunctionSummary>(N.getSummaryList().front()->getBaseObject());
lib/Bitcode/Reader/BitcodeReader.cpp
5252

Update message (maybe change to a range rather than a list, since we now have quite a few valid versions).

5360–5361

document constant 0 param (here and in other constructor calls). I.e. "/*Entry Count=*/0"

lib/Bitcode/Writer/BitcodeWriter.cpp
3877

We read this value out of the 5th index unconditionally (or at least when the version is high enough) in BitcodeReader.cpp. Won't it cause an issue on the reader side if we only emit this (and adjust where the ref count is emitted) conditionally?

Regardless, suggest asserting in the writer here and in writePerModuleFunctionSummaryRecord when you expect the FS->entryCount() to be 0.

lib/Transforms/IPO/FunctionImport.cpp
121 ↗(On Diff #135117)

why off by default?

672 ↗(On Diff #135117)

Some comments about what the algorithm does would be nice.

680 ↗(On Diff #135117)

Can this be moved down into the if statement body? It isn't used outside of it.

690 ↗(On Diff #135117)

ditto.

694 ↗(On Diff #135117)

can this overflow?
Also, why only set the entry count on the first copy? Shouldn't we propagate into all copies?

901 ↗(On Diff #135117)

Probably shouldn't do this during importing - setting on the functions already in the dest module isn't really related to importing anyway. Also, if done from the caller (i.e. see thinBackend(), we have a ready map of GUID->Summary, DefinedGlobals). Using that is much more efficient that looking up in the full Index hash map.

994 ↗(On Diff #135117)

Remove?

eraman updated this revision to Diff 141773.Apr 9 2018, 5:56 PM
eraman marked 8 inline comments as done.

Address Teresa's comments and rebase.

lib/Bitcode/Writer/BitcodeWriter.cpp
3877

Indeed this is a bug. I have changed it to write unconditionally. I don't quite follow the second part of your comment regarding the assert.

lib/Transforms/IPO/FunctionImport.cpp
121 ↗(On Diff #135117)

I would like to enable this in the inliner, do some tuning and then turn this on by default. If you strongly prefer this to be default now, I will do so.

672 ↗(On Diff #135117)

I haven't explained the algorithm here because it is done in SyntheticCountUtils that works on any callgraph. I have tried to add some comments. Let me know if you want any specific details.

694 ↗(On Diff #135117)

Yes, this could overflow. I have changed it to a SaturatingAdd.

This should have been set to all copies. I have modified this as well as initialization counts.

901 ↗(On Diff #135117)

thinBackend has the DefinedGlobalsMap, but not the other callers to this function. Could you suggest what is the best way to do this on all callers?

Sorry for the delay.

include/llvm/IR/ModuleSummaryIndex.h
503–504

Document '0' parameter like you did elsewhere.

lib/Analysis/ModuleSummaryAnalysis.cpp
519–520

Document '0' parameter like you did elsewhere.

lib/Bitcode/Writer/BitcodeWriter.cpp
3877

Nevermind the comment about the assert, looking again, I don't think it makes much sense.

lib/Transforms/IPO/FunctionImport.cpp
121 ↗(On Diff #135117)

Ok

901 ↗(On Diff #135117)

You only need to invoke this functionality from two places:

  1. The new LTO API (thinBackend) - tested via llvm-lto2
  2. The old C LTO API (ThinLTOCodeGenerator.cpp) - tested via llvm-lto

Both of these already have the DefinedGlobals map, because they invoke other Thin Link optimizations that require it. I.e. search for callers of thinLTOInternalizeModule, which is passed this map. And just add a call to this functionality, split out into a similar type of function, that takes the map. You'll note that for testing purposes, the old C API has the ability to call the thin link optimizations independent of the other optimizations, i.e. see ThinLTOCodeGenerator::internalize, and you may want to do something similar too (for testing via llvm-lto).

127 ↗(On Diff #141773)

Nit - does this need to be split into 2 lines? It looks like it would fit on one 80 col line.

686 ↗(On Diff #141773)

Nit: reads better without the "of" (i.e. just "All successors are the ...").

729 ↗(On Diff #141773)

Nit: comment needs a fix, i.e. should "takes of" be "takes on"?

975 ↗(On Diff #141773)

Similarly, this handling can be moved into renameModuleForThinLTO, called further below here, which already does some index-based updates for each imported global. See the start of processGlobalForThinLTO, which does:
ValueInfo VI = ImportIndex.getValueInfo(GV.getGUID());

and the ValueInfo has a pointer to the summary entry. You can have it get the ValueInfo once and share it with your update, which can be done in that method.

You may want to add an interface to the ValueInfo to get the summary entry given a ModulePath (Name), since it only has an interface to return the list of summary entries currently.

eraman marked 5 inline comments as done.Nov 15 2018, 5:55 PM

Sorry for this very long delay in following up.

lib/Transforms/IPO/FunctionImport.cpp
901 ↗(On Diff #135117)

The test I wrote (function_entry_count.ll) runs opt to do function importing. The entry point is from FunctionImportLegacyPass::runOnModule. Is this something that I don't need to support (and rewrite the test appropriately)?

729 ↗(On Diff #141773)

I think I intended "takes care of"

eraman marked 3 inline comments as done.Nov 27 2018, 5:21 PM
eraman added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
901 ↗(On Diff #135117)

Based on our offline chat, I changed the test not to use opt (and use llvm-lto2 instead) and I am able to move this code as you've suggested.

975 ↗(On Diff #141773)

I am not sure what is the good way to guard the change with enable-import-entry-count flag. I don't like the idea of referencing the flag from a utils file, but the alternative is to pass the value through the callchain starting from renameModuleForThinLTO, which has other callers. Do you have any suggestions?

tejohnson added inline comments.Nov 28 2018, 7:00 AM
lib/Transforms/IPO/FunctionImport.cpp
975 ↗(On Diff #141773)

Perhaps do the updates on both the imported functions and the defined functions (what you were doing above) in renameModuleForThinLTO (it's called on the original module and again below for the imported symbols). Then you cover both my comments (since it's called outside importing for the orig module's defined symbols). In that case the EnableImportEntryCount flag can be completely moved into the utils source file.

eraman marked 2 inline comments as done.Nov 30 2018, 11:18 AM
eraman added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
975 ↗(On Diff #141773)

I have moved the updates to processGlobalForThinLTO. I am not sure if I am doing the right thing comparing ModulePath() with M.getModuleIdentifier().

eraman updated this revision to Diff 176166.Nov 30 2018, 11:22 AM
eraman marked an inline comment as done.

Rebase and address Teresa's comments.

tejohnson added inline comments.Dec 6 2018, 1:19 PM
lib/Analysis/ModuleSummaryAnalysis.cpp
512

nit: s/InsCount/InstCount/
Thanks for adding the documentation for these existing parameters too!

lib/LTO/LTO.cpp
1174

Add brief comment about what this does.

lib/LTO/ThinLTOCodeGenerator.cpp
203

I don't see this new parameter getting used anywhere. I think it might be leftover from your earlier approach (along with the other changes to this file)?

893

Add a call to your new analysis above here so it is supported with the old LTO API as well. Then you can augment your new test with an llvm-lto invocation as well.

lib/Transforms/IPO/FunctionImport.cpp
847 ↗(On Diff #176166)

I assume that eventually we would want to gate this based on whether we have FDO or not - how will that work?

851 ↗(On Diff #176166)

What is a fake/invalid edge in the index?

eraman marked 8 inline comments as done.Dec 11 2018, 5:36 PM

I have to bump up the index version to 6 after rebasing and fix a few failing test cases.

lib/LTO/ThinLTOCodeGenerator.cpp
203

Yes, this is leftover from earlier iterations

lib/Transforms/IPO/FunctionImport.cpp
847 ↗(On Diff #176166)

I think eventually we want this to work even when profile data is available by filling in the synthesized count for only those functions which do not have an entry count (a hybrid mode). What I have done now is to gate this on -enable-import-entry-count. To avoid declaring this as extern, I have moved the computeSyntheticCount to FunctionImportUtils.cpp (where the flag is defined) and bail out if the flag is false.

851 ↗(On Diff #176166)

These are the edges from the fake root node to all nodes without parent (see calculateCallGraphRoot). But there is no need to handle them separately (and the FIXME should be removed). The Edge.second.RelBlockFreq will be 0 and hence this lambda will return 0 for the fake edges which is the right thing to do.

eraman updated this revision to Diff 177804.Dec 11 2018, 5:37 PM
eraman marked 2 inline comments as done.

Address Teresa's comments, rebase and fix test failures after rebasing.

tejohnson added inline comments.Dec 12 2018, 8:27 AM
lib/Transforms/IPO/FunctionImport.cpp
847 ↗(On Diff #176166)

It doesn't belong in FunctionImportUtils.cpp, which is operating on IR in the backends. It belongs somewhere in lib/LTO since this is done during the LTO/thin link portion of the compile. Historically we have put some of these facilities in FunctionImport.cpp even when they aren't directly related to importing (e.g. computeDeadSymbols), but it probably makes sense for them to move to a separate file under lib/LTO/ that can be shared by both the new and old LTO APIs. Maybe best to just create that new file now for your new code (in which case we can move computeDeadSymbols there as a follow on). Not sure the best name, maybe something like SummaryBasedOptimizations.cpp ? Open to better names...

Regarding avoiding that option being extern, 2 possibilities come to mind:

  • Does an entry count of 0 in the summary mean that synthetic counts weren't created? If so, can you gate on that.

Or probably more robust:

  • Consider adding a flag to the index (see withGlobalValueDeadStripping).
851 ↗(On Diff #176166)

Ok sounds good - I noticed that the FIXME is still there though.

eraman marked 2 inline comments as done.Dec 12 2018, 2:28 PM
eraman added inline comments.
lib/Transforms/IPO/FunctionImport.cpp
847 ↗(On Diff #176166)

I have create a new SummaryBasedOptimizations.cpp and moved the code there. I have also added a flag to Index to indicate whether synthetic counts are computed.

eraman updated this revision to Diff 177926.Dec 12 2018, 2:28 PM

Address review comments

tejohnson accepted this revision.Dec 13 2018, 8:39 AM

LGTM with 1 minor comment nit below, and also can you please add a test like test/Bitcode/thinlto-deadstrip-flag.ll to test for the new flag directly?

include/llvm/IR/ModuleSummaryIndex.h
818

s/propagatin/propagation/

This revision is now accepted and ready to land.Dec 13 2018, 8:39 AM
eraman updated this revision to Diff 178106.Dec 13 2018, 11:44 AM

Add a test case to check for synthetic entry counts and rebase.

eraman marked an inline comment as done.Dec 13 2018, 11:45 AM

Add a test case to check for HasSyntheticEntryCounts flag.

This revision was automatically updated to reflect the committed changes.