This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Sorry for the lateness of the review! Comments/questions below.
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
457 ↗ | (On Diff #135117) | Need to document that this is only populated by the Thin Link (and unused by per-module summaries). |
1094 ↗ | (On Diff #135117) | For consistency it would be nice to have the above two lines match the same code in child_begin (same with child*_end). I.e. FunctionSummary *F = cast<FunctionSummary>(N.getSummaryList().front()->getBaseObject()); |
lib/Bitcode/Reader/BitcodeReader.cpp | ||
5145 ↗ | (On Diff #135117) | Update message (maybe change to a range rather than a list, since we now have quite a few valid versions). |
5248 ↗ | (On Diff #135117) | document constant 0 param (here and in other constructor calls). I.e. "/*Entry Count=*/0" |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
3747 ↗ | (On Diff #135117) | We read this value out of the 5th index unconditionally (or at least when the version is high enough) in BitcodeReader.cpp. Won't it cause an issue on the reader side if we only emit this (and adjust where the ref count is emitted) conditionally? Regardless, suggest asserting in the writer here and in writePerModuleFunctionSummaryRecord when you expect the FS->entryCount() to be 0. |
lib/Transforms/IPO/FunctionImport.cpp | ||
121 ↗ | (On Diff #135117) | why off by default? |
672 ↗ | (On Diff #135117) | Some comments about what the algorithm does would be nice. |
680 ↗ | (On Diff #135117) | Can this be moved down into the if statement body? It isn't used outside of it. |
690 ↗ | (On Diff #135117) | ditto. |
694 ↗ | (On Diff #135117) | can this overflow? |
901 ↗ | (On Diff #135117) | Probably shouldn't do this during importing - setting on the functions already in the dest module isn't really related to importing anyway. Also, if done from the caller (i.e. see thinBackend(), we have a ready map of GUID->Summary, DefinedGlobals). Using that is much more efficient that looking up in the full Index hash map. |
994 ↗ | (On Diff #135117) | Remove? |
Address Teresa's comments and rebase.
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
---|---|---|
3747 ↗ | (On Diff #135117) | Indeed this is a bug. I have changed it to write unconditionally. I don't quite follow the second part of your comment regarding the assert. |
lib/Transforms/IPO/FunctionImport.cpp | ||
121 ↗ | (On Diff #135117) | I would like to enable this in the inliner, do some tuning and then turn this on by default. If you strongly prefer this to be default now, I will do so. |
672 ↗ | (On Diff #135117) | I haven't explained the algorithm here because it is done in SyntheticCountUtils that works on any callgraph. I have tried to add some comments. Let me know if you want any specific details. |
694 ↗ | (On Diff #135117) | Yes, this could overflow. I have changed it to a SaturatingAdd. This should have been set to all copies. I have modified this as well as initialization counts. |
901 ↗ | (On Diff #135117) | thinBackend has the DefinedGlobalsMap, but not the other callers to this function. Could you suggest what is the best way to do this on all callers? |
Sorry for the delay.
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
448 ↗ | (On Diff #141773) | Document '0' parameter like you did elsewhere. |
lib/Analysis/ModuleSummaryAnalysis.cpp | ||
469 ↗ | (On Diff #141773) | Document '0' parameter like you did elsewhere. |
lib/Bitcode/Writer/BitcodeWriter.cpp | ||
3747 ↗ | (On Diff #135117) | Nevermind the comment about the assert, looking again, I don't think it makes much sense. |
lib/Transforms/IPO/FunctionImport.cpp | ||
121 ↗ | (On Diff #135117) | Ok |
901 ↗ | (On Diff #135117) | You only need to invoke this functionality from two places:
Both of these already have the DefinedGlobals map, because they invoke other Thin Link optimizations that require it. I.e. search for callers of thinLTOInternalizeModule, which is passed this map. And just add a call to this functionality, split out into a similar type of function, that takes the map. You'll note that for testing purposes, the old C API has the ability to call the thin link optimizations independent of the other optimizations, i.e. see ThinLTOCodeGenerator::internalize, and you may want to do something similar too (for testing via llvm-lto). |
127 ↗ | (On Diff #141773) | Nit - does this need to be split into 2 lines? It looks like it would fit on one 80 col line. |
686 ↗ | (On Diff #141773) | Nit: reads better without the "of" (i.e. just "All successors are the ..."). |
729 ↗ | (On Diff #141773) | Nit: comment needs a fix, i.e. should "takes of" be "takes on"? |
975 ↗ | (On Diff #141773) | Similarly, this handling can be moved into renameModuleForThinLTO, called further below here, which already does some index-based updates for each imported global. See the start of processGlobalForThinLTO, which does: and the ValueInfo has a pointer to the summary entry. You can have it get the ValueInfo once and share it with your update, which can be done in that method. You may want to add an interface to the ValueInfo to get the summary entry given a ModulePath (Name), since it only has an interface to return the list of summary entries currently. |
Sorry for this very long delay in following up.
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
901 ↗ | (On Diff #135117) | The test I wrote (function_entry_count.ll) runs opt to do function importing. The entry point is from FunctionImportLegacyPass::runOnModule. Is this something that I don't need to support (and rewrite the test appropriately)? |
729 ↗ | (On Diff #141773) | I think I intended "takes care of" |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
901 ↗ | (On Diff #135117) | Based on our offline chat, I changed the test not to use opt (and use llvm-lto2 instead) and I am able to move this code as you've suggested. |
975 ↗ | (On Diff #141773) | I am not sure what is the good way to guard the change with enable-import-entry-count flag. I don't like the idea of referencing the flag from a utils file, but the alternative is to pass the value through the callchain starting from renameModuleForThinLTO, which has other callers. Do you have any suggestions? |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
975 ↗ | (On Diff #141773) | Perhaps do the updates on both the imported functions and the defined functions (what you were doing above) in renameModuleForThinLTO (it's called on the original module and again below for the imported symbols). Then you cover both my comments (since it's called outside importing for the orig module's defined symbols). In that case the EnableImportEntryCount flag can be completely moved into the utils source file. |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
975 ↗ | (On Diff #141773) | I have moved the updates to processGlobalForThinLTO. I am not sure if I am doing the right thing comparing ModulePath() with M.getModuleIdentifier(). |
lib/Analysis/ModuleSummaryAnalysis.cpp | ||
---|---|---|
479 ↗ | (On Diff #176166) | nit: s/InsCount/InstCount/ |
lib/LTO/LTO.cpp | ||
1160 ↗ | (On Diff #176166) | Add brief comment about what this does. |
lib/LTO/ThinLTOCodeGenerator.cpp | ||
203 ↗ | (On Diff #176166) | I don't see this new parameter getting used anywhere. I think it might be leftover from your earlier approach (along with the other changes to this file)? |
941 ↗ | (On Diff #176166) | Add a call to your new analysis above here so it is supported with the old LTO API as well. Then you can augment your new test with an llvm-lto invocation as well. |
lib/Transforms/IPO/FunctionImport.cpp | ||
847 ↗ | (On Diff #176166) | I assume that eventually we would want to gate this based on whether we have FDO or not - how will that work? |
851 ↗ | (On Diff #176166) | What is a fake/invalid edge in the index? |
I have to bump up the index version to 6 after rebasing and fix a few failing test cases.
lib/LTO/ThinLTOCodeGenerator.cpp | ||
---|---|---|
203 ↗ | (On Diff #176166) | Yes, this is leftover from earlier iterations |
lib/Transforms/IPO/FunctionImport.cpp | ||
847 ↗ | (On Diff #176166) | I think eventually we want this to work even when profile data is available by filling in the synthesized count for only those functions which do not have an entry count (a hybrid mode). What I have done now is to gate this on -enable-import-entry-count. To avoid declaring this as extern, I have moved the computeSyntheticCount to FunctionImportUtils.cpp (where the flag is defined) and bail out if the flag is false. |
851 ↗ | (On Diff #176166) | These are the edges from the fake root node to all nodes without parent (see calculateCallGraphRoot). But there is no need to handle them separately (and the FIXME should be removed). The Edge.second.RelBlockFreq will be 0 and hence this lambda will return 0 for the fake edges which is the right thing to do. |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
847 ↗ | (On Diff #176166) | It doesn't belong in FunctionImportUtils.cpp, which is operating on IR in the backends. It belongs somewhere in lib/LTO since this is done during the LTO/thin link portion of the compile. Historically we have put some of these facilities in FunctionImport.cpp even when they aren't directly related to importing (e.g. computeDeadSymbols), but it probably makes sense for them to move to a separate file under lib/LTO/ that can be shared by both the new and old LTO APIs. Maybe best to just create that new file now for your new code (in which case we can move computeDeadSymbols there as a follow on). Not sure the best name, maybe something like SummaryBasedOptimizations.cpp ? Open to better names... Regarding avoiding that option being extern, 2 possibilities come to mind:
Or probably more robust:
|
851 ↗ | (On Diff #176166) | Ok sounds good - I noticed that the FIXME is still there though. |
lib/Transforms/IPO/FunctionImport.cpp | ||
---|---|---|
847 ↗ | (On Diff #176166) | I have create a new SummaryBasedOptimizations.cpp and moved the code there. I have also added a flag to Index to indicate whether synthetic counts are computed. |
LGTM with 1 minor comment nit below, and also can you please add a test like test/Bitcode/thinlto-deadstrip-flag.ll to test for the new flag directly?
include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
818 ↗ | (On Diff #177926) | s/propagatin/propagation/ |