This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][Preinliner] Trim cold call edges of the profiled call graph for a more stable profile generation.
ClosedPublic

Authored by hoy on Mar 27 2023, 4:32 PM.

Details

Summary

I've noticed that for some services CSSPGO profile is less stable than non-CS AutoFDO profile from profiling to profiling without source changes. This is manifested by comparing profile similarities. For example in my experiments, AutoFDO profiles are always 99+% similar over same binary but different inputs (very close dynamic traffics) while CSSPGO profile similarity is around 90%.

The main source of the profile stability is the top-down order computed on the profiled call graph in the llvm-profgen CS preinliner. The top-down order is used to guide the CS preinliner to pre-compute an inline decision that is later on fulfilled by the compiler. A subtle change in the top-down order from run to run could cause a different inline decision computed. A deeper look in the diversion of the top-down order revealed that:

  • The topological sorting inside one SCC isn't quite right. This is fixed by D130717: [SCCIterator] Fix an issue in scc_member_iterator sorting.
  • The profiled call graphs of the two sides of the A/B run isn't 100% the same. The call edges in the two runs do not subsume each other, and edges appear in both graphs may not have exactly the same weight. This is due to the nature that the graphs are dynamic. However, I saw that the graphs can be made more close by removing the cold edges from them and this bumped up the CSSPGO profile stableness to the same level of the AutoFDO profile.

Removing cold call edges from the dynamic call graph may have an impact on cold inlining, but so far I haven't seen any performance issues since the CS preinliner mainly targets hot callsites, and cold inlining can always be done by the compiler CGSCC inliner.

Also fixing an issue where the largest weight instead of the accumulated weight for a call edge is used in the profiled call graph.

Diff Detail

Event Timeline

hoy created this revision.Mar 27 2023, 4:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 4:32 PM
hoy requested review of this revision.Mar 27 2023, 4:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 4:32 PM
hoy edited the summary of this revision. (Show Details)Mar 27 2023, 4:34 PM
hoy added reviewers: wenlei, wlei.
wenlei added inline comments.Mar 27 2023, 4:51 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
157

good catch

llvm/tools/llvm-profgen/CSPreInliner.cpp
84

I understand that you want to do this only for preinliner, not for sample loader in compiler. But the current way of doing this - exposing internals of profiled calll graph only for users to tweak the actual graph, is less than idea as it breaks the abstraction.

Can we have a separate parameter (i.e. hotCallonly) for ctor of ProfiledCallGraph, so this is all self contained? And perhaps slight more efficient as it avoids add and then removal.

hoy added inline comments.Mar 27 2023, 4:57 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
84

I can pass in the threshold and do the trimming in place inside ProfiledCallGraph, wdyt?

Perhaps we have to add the edges first and then remove the cold ones after the weight accumulation is done.

wenlei added inline comments.Mar 27 2023, 5:35 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
84

Sounds good.

(I'm still leaning towards something like hotEdgesOnly as extra argument to keep the interface simple. But I can see that we will need to pass in ProfileSummary anyways, so passing a threshold value directly sounds good.)

hoy updated this revision to Diff 508862.Mar 27 2023, 5:54 PM

Addressing comments.

wenlei added inline comments.Mar 27 2023, 6:04 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
67

nit: IgnoreColdCallThreshold

llvm/tools/llvm-profgen/CSPreInliner.cpp
84

why is +1 needed? if cold threshold is 0, nothing is cold, and we filter out nothing -- it just works?

wlei added inline comments.Mar 27 2023, 9:27 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
82

I kinda remember we discussed offline that you tested for trimming hot edges vs trimming cold edges, how about other threshold, like trimming non-hot edges, or it's just a balance, then I'm wondering if other value can perform better, in that case we can tune a new threshold?

hoy added inline comments.Mar 28 2023, 9:06 AM
llvm/tools/llvm-profgen/CSPreInliner.cpp
82

Good question. With the cold threshold, CSSPGO profile goes up to 99% similar run to run, so using hot threshold isn't that helpful and it also regresses performance a little bit.

84

There are still zero-weight edges coming from callstack unwinding and they do not show up in the profile. I think. we want to trim them.

hoy updated this revision to Diff 509042.Mar 28 2023, 9:08 AM

Addressing feedbacks.

wenlei added inline comments.Mar 28 2023, 10:11 AM
llvm/tools/llvm-profgen/CSPreInliner.cpp
84

what I meant is if cold threshold is already zero anyways, that would mean nothing is cold. so nothing should be trimmed. Usually cold threshold is never zero, in which case all those zero-weight edges will be trimmed without +1.

hoy added inline comments.Mar 28 2023, 10:30 AM
llvm/tools/llvm-profgen/CSPreInliner.cpp
84

The cold threshold is mainly computed from LBR samples and it shouldn't be zero. But we are trimming edges with weight below the threshold (line 199 in ProfiledCallGraph.h), so the +1 here is to make sure all cold edges go away.

The +1 is not needed here if we use a less-equal check on the profiled call graph check. But there isn't a way to not trim any edges unless we make the zero threshold special in the interface. (well, I have the check in line 192 in ProfiledCallGraph.h to make is special right now, but it's unnecessary as interface-wise zero is not considered special).

wenlei added inline comments.Mar 28 2023, 3:09 PM
llvm/tools/llvm-profgen/CSPreInliner.cpp
84

I understand what you said. But let me try one more time. :) So you want to differentiate 0 as special value to disable trimming vs 0 as a real threshold for trimming. What I meant we don't need to differentiate, because 0 as a real threshold also means no need for trimming at all.

If we consider anything "< threshold" as cold, threshold equals 0 means nothing is cold, hence nothing needs to be trimmed. This is not a big deal anyways, but I just feel that things fits together nicely already without needing +1.

hoy updated this revision to Diff 509155.Mar 28 2023, 3:22 PM

Addressing feedbacks.

llvm/tools/llvm-profgen/CSPreInliner.cpp
84

Talked off. Would like to go with inclusive for the threshold.

wenlei accepted this revision.Mar 28 2023, 3:23 PM

lgtm, thanks.

This revision is now accepted and ready to land.Mar 28 2023, 3:23 PM
This revision was landed with ongoing or failed builds.Mar 28 2023, 4:25 PM
This revision was automatically updated to reflect the committed changes.