This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner
ClosedPublic

Authored by wlei on Apr 1 2021, 10:04 PM.

Details

Summary

We see a regression related to low probe factor(0.01) which prevents some callsites being promoted in ICPPass and later cause the missing inline in CGSCC inliner. The root cause is due to redundant(the second) multiplication of the probe factor and this change try to fix it.

Sum does multiply a factor right after findCallSamples but later when using as the parameter in setProbeDistributionFactor, it multiplies one again.

This change could get ~2% perf back on mcf benchmark. In mcf, previously the corresponding factor is 1 and it's the recent feature introducing the <1 factor then trigger this bug.

Diff Detail

Event Timeline

wlei created this revision.Apr 1 2021, 10:04 PM
wlei requested review of this revision.Apr 1 2021, 10:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2021, 10:04 PM
wlei edited the summary of this revision. (Show Details)Apr 1 2021, 10:54 PM
wlei added reviewers: wmi, hoy, wenlei, davidxl.
hoy added inline comments.Apr 1 2021, 11:28 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
875

Thanks for the fix. This seems losing the original distribution factor pre

875

nvm, ignore this comment.

1327

Thanks for the fix! I'm wondering if the fix should be made here by scaling down SumOrigin as well. SumOrigin is used to guide ICP and we are checking the scaled count against the original count below (line 1342). This could cause a promotable indirect callsite not promoted after the callsite is duplicated. E.g, an indirect callsite with only one target is duplicated in prelink with even distribution. And in postlink, each of the callsite has a target that counts 50% of the original counts. If the original callsite is supposed to be promoted, once duplicated, it may not get promoted in postlink because of the lower weight (50% vs original 100%) in either place.

wenlei added inline comments.Apr 1 2021, 11:41 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1327

It's arguable as to what is more desirable, if a call site is duplicated. If we scale down SumOrigin, the ICP logic would be more consistent between when there's duplication vs not. But on the other hand, we might indeed want to penalize duplicated call sites because ICP (and inline) for all of them has larger cost that is not taken into account. I think that's more of matter of tuning, but the current fix in the patch is addressing something clearly wrong. We would need some perf testing if we're making the change of scaling down SumOrigin.

hoy added inline comments.Apr 1 2021, 11:51 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1327

Agreed. Numbers are needed to validate a change scaling down SumOrigin. Current change looks good to me.

@wlei Can you please add a test?

wlei added inline comments.Apr 2 2021, 10:22 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1327

Thanks for the explanation!

Can you please add a test?

Good point, will try to add a test.

wlei updated this revision to Diff 335295.Apr 5 2021, 10:50 AM
wlei edited the summary of this revision. (Show Details)

ad test case: manually make a < 1 factor of the promotable call site, after ICP check the discriminator

hoy added inline comments.Apr 5 2021, 11:05 AM
llvm/test/Transforms/SampleProfile/pseudo-probe-icp-factor.ll
248 ↗(On Diff #335295)

Thanks for adding the test.

Can you comment here how to interpret the discriminator? Can you also add a check for which call instruction this metadata is attached to? An example could be test/Transforms/SampleProfile/pseudo-probe-profile.ll

wlei updated this revision to Diff 335316.Apr 5 2021, 1:47 PM

Addressing Hongtao's feedback

hoy accepted this revision.Apr 5 2021, 2:28 PM

lgtm, thanks!

llvm/lib/Transforms/IPO/SampleProfile.cpp
875

Nit: may just do a type casting here instead of using the multiplication. The compiler may not optimize it away in different FP models.

This revision is now accepted and ready to land.Apr 5 2021, 2:28 PM
wenlei added inline comments.Apr 5 2021, 3:20 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
875

+1 for explicit casting

wlei updated this revision to Diff 335344.Apr 5 2021, 3:23 PM

switch to explicit cast

wenlei accepted this revision.Apr 5 2021, 4:03 PM

lgtm, thanks.

wenlei added a comment.Apr 7 2021, 8:48 AM

Going to pushed this for Lei (who's on leave for the week), so it's available in private fork too.