This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Track and use context-sensitive post-optimization function size to drive global pre-inliner in llvm-profgen
ClosedPublic

Authored by wenlei on Aug 16 2021, 6:46 PM.

Details

Summary

This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions.

To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can now query context byte size under switch context-cost-for-preinliner. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context.

The new size cost is off by default. There're a few TODOs that needs to addressed: 1) avoid dangling string from Offset2LocStackMap, which will be addressed in split context work; 2) use inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed.

Diff Detail

Event Timeline

wenlei created this revision.Aug 16 2021, 6:46 PM
wenlei requested review of this revision.Aug 16 2021, 6:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 6:46 PM
wenlei edited the summary of this revision. (Show Details)Aug 16 2021, 6:48 PM
wmi added inline comments.Aug 17 2021, 9:38 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
55

The function size computed for a certain context doesn't include the size of any function inlined into it. This matches well with top-down inline in sample loader, but may not match well with cgscc inliner. I forget how much inlining is done by the inliner in sample loader and how much is done by cgscc inliner for CSSPGO. Is it a problem right now?

wenlei added inline comments.Aug 17 2021, 10:07 AM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
55

It's difficult for sample loader to capture all inlining since cgscc inliner uses difference heuristic, but most of the hot inlining should be done in sample loader. The size here is used for global top-down pre-inlining in llvm-profgen which would guide sample loader inlining. cgscc inliner will still be able to use context-less profile and current heuristic to estimate cost/benefit without being affected by what we're doing here.

wmi accepted this revision.Aug 17 2021, 12:09 PM

LGTM.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
55

I see. Merging the non-inlined context profile from other modules back to the base profile in preinliner will be helpful for cgscc inliner.

This revision is now accepted and ready to land.Aug 17 2021, 12:09 PM
hoy added inline comments.Aug 18 2021, 2:51 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
35

nit: name the switch "use-context-cost-for-preinliner"?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
406

nit: this is gone with D108002

llvm/tools/llvm-profgen/ProfiledBinary.cpp
90

getFunctionSize can be zero for function optimized away in the future? Wondering if Optional worker better here.

97

When Size is invalid, can it mean Context is shorter than an inline path in previous build? For example, given a leaf function with a previous inline path A @ B @ leaf, and the current context B @ leaf, if we don't find a valid size on the reverse tri A @ B @ leaf , should we fall back to use the size stored in node A, instead of finding a sibling context like A @ C @ leaf?

311

should use UsePseudoProbes instead of ShowPseudoProbe?

llvm/tools/llvm-profgen/ProfiledBinary.h
109

typo: instructins

wenlei added inline comments.Aug 18 2021, 3:29 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
35

changed

llvm/tools/llvm-profgen/ProfileGenerator.cpp
406

rebased

llvm/tools/llvm-profgen/ProfiledBinary.cpp
90

Yeah, currently we don't track zero size context. This is one of the todos mentioned above. I plan to deal with that later when we properly track zero size - currently zero means unknown.

97

if we don't find a valid size on the reverse tri A @ B @ leaf

You meant to say B @ leaf?

That's a good point though currently in this case, we won't find sibling A @ C @ leaf either, because the node for B @ leaf exists. We would get zero from B @ leaf.

Assuming zero means unknown before we start to track zero size, we need to tweak it slightly to retrieve A @ B @ leaf in this case.

if (!Size) {
  if  (!CurrNode)
      CurrNode = PrevNode;
  while (!Size && CurrNode) {
    CurrNode = &CurrNode->getAllChildContext().begin()->second;
    Size = CurrNode->getFunctionSize();
  }
}
311

This was intentional as it though we might still want to have a way to dump line callsite even for binary with probes. I don't have a strong opinion though.

llvm/tools/llvm-profgen/ProfiledBinary.h
109

fixed

wenlei updated this revision to Diff 367342.Aug 18 2021, 3:29 PM

Address comments.

wenlei updated this revision to Diff 367343.Aug 18 2021, 3:36 PM

Shrink perfscript used in test.

hoy accepted this revision.Aug 18 2021, 4:24 PM
hoy added inline comments.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
97

Looks good. The code works for the A @ B @ leaf case.

311

ShowPseudoProbe makes sense. I actually misunderstood this code as on the non-printing path.

This revision was landed with ongoing or failed builds.Aug 18 2021, 11:01 PM
This revision was automatically updated to reflect the committed changes.
MTC added a subscriber: MTC.Aug 18 2021, 11:56 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp