This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][PriorityInliner] Do not use block weight to drive callsite inlining.
ClosedPublic

Authored by hoy on Mar 1 2022, 4:07 PM.

Details

Summary

The priority-based inliner currenlty uses block count combined with callee entry count to drive callsite inlining. This doesn't work well with LTO where postlink inlining is driven by prelink-annotated block count which could be based on the merge of all context profiles. I'm fixing it by using callee profile entry count only which should be context-sensitive.

I'm seeing 0.2% perf improvment for one of our internal large benchmarks with probe-based non-CS profile.

Diff Detail

Event Timeline

hoy created this revision.Mar 1 2022, 4:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 4:07 PM
hoy requested review of this revision.Mar 1 2022, 4:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 4:07 PM
wenlei added inline comments.Mar 1 2022, 4:21 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1309

When we don't have callee samples, should we fall back to call site block counts?

In reality we would also need to tolerate some source change, i.e. the call site didn't exist in pass1 build.

hoy added inline comments.Mar 1 2022, 4:31 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1309

If the callsite doesn't exist in pass1, the caller profile will probably be discarded due to checksum mismatch. So here when callee sample is missing, it's likely that the callsite is cold in this particular context. Using block count might end up treating it as hot.

hoy added inline comments.Mar 1 2022, 4:36 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1309

Interesting point on tolerating source changes. That would require some changes to probe numbering. So far we number callsite probes sequentially which is easily broken with a new callsite introduced. We might somehow need to keep this numbering stable across builds.

wenlei added inline comments.Mar 1 2022, 4:41 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1309

If the callsite doesn't exist in pass1, the caller profile will probably be discarded due to checksum mismatch.

Is it? If that's the behavior today, it defeats the purpose of CFG based profile - we should be able to match as long as CFG does not change. we can have a change that adds a call site without changing CFG.

hoy added inline comments.Mar 1 2022, 5:48 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1309

Yes, callsites are currently considered when computing the function checksum:

FunctionHash = (uint64_t)CallProbeIds.size() << 48 |
(uint64_t)Indexes.size() << 32 | JC.getCRC();

We do this to prevent callsite samples mismatch. E.g, indirect call target samples can be applied to an irrelevant callsite when the number of callsite changes.

wenlei accepted this revision.Mar 1 2022, 6:25 PM

lgtm, thanks.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1309

This is something we need to address eventually. The design of CSSPGO is made to tolerate changes not altering CFG, but looks like the current implementation does not satisfy that yet.. For now this change looks fine.

This revision is now accepted and ready to land.Mar 1 2022, 6:25 PM
This revision was landed with ongoing or failed builds.Mar 1 2022, 6:43 PM
This revision was automatically updated to reflect the committed changes.