This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Add call edges' relative block frequency to per-module summary.
ClosedPublic

Authored by eraman on Jan 17 2018, 4:26 PM.

Details

Summary

This allows relative block frequency of call edges to be passed to the
thinlink stage where it will be used to compute synthetic entry counts
of functions.

Diff Detail

Repository
rL LLVM

Event Timeline

eraman created this revision.Jan 17 2018, 4:26 PM
tejohnson added inline comments.Jan 18 2018, 4:21 PM
include/llvm/Bitcode/LLVMBitCodes.h
259 ↗(On Diff #130310)

Could use a description of when this is used.

include/llvm/IR/ModuleSummaryIndex.h
58 ↗(On Diff #130310)

Do we need 64 bits in practice? Especially since these are scaled down by the entry freq. Concerned about the size increase for large indices that have huge numbers of edges. Also, since we'll never have both the Hotness and the RelBlockFreq, it would be great if we could union them, although that would require a flag on the index and I suppose we might want to support mixing modules built with and without profile data? Perhaps some other way to combine them. I.e. use a single field and reserve the lowest 3 bits for the hotness, shifting the relative frequency into the upper n-3 bits...

lib/Analysis/ModuleSummaryAnalysis.cpp
63 ↗(On Diff #130310)

Suggest moving this to the bitcode writer and always adding RelBlockFreq into the in-memory index here.

Is there any reason to default off? Why not just default on from the get-go?

290 ↗(On Diff #130310)

Any chance getEntryFreq will be 0?

292 ↗(On Diff #130310)

Suggest saving the ValueInfo already created earlier on line 280.

eraman added inline comments.Jan 19 2018, 2:56 PM
include/llvm/IR/ModuleSummaryIndex.h
58 ↗(On Diff #130310)

From our offline discussions I thought the use of 64 bits doesn't matter as they get compressed. Is the concern about in-memory size? I can reduce the size to, say, 29 bits (uint32_t with 3 bits reserved for the enum). Another thing to note is that only the per-module summary has this information. Is there a way to make use of this?

lib/Analysis/ModuleSummaryAnalysis.cpp
63 ↗(On Diff #130310)

Until this information actually gets used (first propagate the counts and then use the counts in the inliner), this just poses extra compile time and increased summary size if not turned off. Alternatively, I can drop this patch and then send it as part of a larger patch that builds the callgraph and propagates the counts.

tejohnson added inline comments.Jan 19 2018, 4:07 PM
include/llvm/IR/ModuleSummaryIndex.h
58 ↗(On Diff #130310)

Right - concerned about in-memory size. Why does only the per-module summary contain this? Isn't that just temporary - I thought the idea was to propagate this into the backends?

lib/Analysis/ModuleSummaryAnalysis.cpp
63 ↗(On Diff #130310)

How long do you expect until the other patches to be sent? If not too long I don't know that it matters too much since we are going to incur the extra space/memory/time in the near term anyway. Better to have it tested.

eraman marked 2 inline comments as done.Jan 22 2018, 4:43 PM
eraman added inline comments.
include/llvm/IR/ModuleSummaryIndex.h
58 ↗(On Diff #130310)

Made this into a 29 bit bitfield.

lib/Analysis/ModuleSummaryAnalysis.cpp
63 ↗(On Diff #130310)

Most of the patches are adding necessary infrastructure but the final component is tuning the inliner which could take a long time. I have now changed the code to compute relative block frequency unconditionally but have left the default value of the flag to false so it won't be written.

290 ↗(On Diff #130310)

When they are computed they can't be 0, but there is an interface to set block frequency (this is used for incremental update) although in that case BFI should be recalculated after the pass. In any case, since the possibility exists I have guarded the division.

eraman updated this revision to Diff 130987.Jan 22 2018, 4:43 PM

Address Teresa's comments

tejohnson added inline comments.Jan 23 2018, 10:36 AM
include/llvm/IR/ModuleSummaryIndex.h
63 ↗(On Diff #130987)

This shouldn't be here - it is adding overhead to every single call edge. I confirmed that with this here, the new sizeof(CalleeInfo) is 16 bytes (4 bytes for Hotness+RelBlockFreq, 4 bytes for padding to start uint64_t, 8 bytes for uint64_t). Without this defined here the sizeof(CalleeInfo) goes down to 4 bytes as expected.

lib/Analysis/ModuleSummaryAnalysis.cpp
52 ↗(On Diff #130987)

Not needed now?

eraman updated this revision to Diff 131117.Jan 23 2018, 11:29 AM
eraman marked an inline comment as done.

Address Teresa's comments.

eraman added inline comments.Jan 23 2018, 11:29 AM
include/llvm/IR/ModuleSummaryIndex.h
63 ↗(On Diff #130987)

Good catch. I have made this to a static constexpr and avoided the definition. Let me know if this looks okay.

This revision is now accepted and ready to land.Jan 23 2018, 11:47 AM
This revision was automatically updated to reflect the committed changes.
eraman reopened this revision.Jan 24 2018, 3:15 PM

The patch caused two issues

  • I used CI (of type CallInst*) to get the parent BB, but CI could be null. The fix is to use &BB directly.
  • Use of HotnessType Hotness : 3 gave warnings. The fix is to make it uint32_t Hotness : 3 and use static_cast to convert back and forth from enum.

I will update the patch. PTAL.

This revision is now accepted and ready to land.Jan 24 2018, 3:15 PM
eraman updated this revision to Diff 131366.Jan 24 2018, 3:16 PM

Fix buildbot errors and warnings.

george.karpenkov resigned from this revision.Jun 21 2018, 9:58 AM
This revision was automatically updated to reflect the committed changes.