This is an archive of the discontinued LLVM Phabricator instance.

[AutoFDO] Inline replay for cold/small callees from sample profile loader
ClosedPublic

Authored by wenlei on Nov 26 2019, 4:05 PM.

Details

Summary

Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout.

This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold. Last, -sample-profile-reinline-all is added to replay all inlining from previous compilation - this is for tuning only and the purpose is to avoid loss of context-sensitivity due to inlining churn at the cost of not using context-sensitive profile to drive inlining.

Diff Detail

Event Timeline

wenlei created this revision.Nov 26 2019, 4:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2019, 4:05 PM
wenlei updated this revision to Diff 231147.Nov 26 2019, 4:20 PM

update test case

wmi added a comment.Dec 5 2019, 9:10 AM

Did performance test and I saw 0.4% improvement in an internal benchmark. That is a good improvement, thanks for the change!

llvm/lib/Transforms/IPO/SampleProfile.cpp
976–979

Can we inline all by setting sample-profile-cold-inline-threshold to a very large number so ProfileReInlineAll is not needed?

wenlei updated this revision to Diff 232444.Dec 5 2019, 1:27 PM
wenlei marked an inline comment as done.

address feedback

wenlei added a comment.Dec 5 2019, 2:37 PM

Did performance test and I saw 0.4% improvement in an internal benchmark. That is a good improvement, thanks for the change!

Thanks for doing the measurement. I'm really glad that this change and other recent ones in this area helped your workload too. These are like "prototype" for the stuff we talked about a while ago. These numbers are important signals so we know what we're trying to do is at least a step in the right direction.

llvm/lib/Transforms/IPO/SampleProfile.cpp
976–979

good suggestion. changed.

wmi accepted this revision.Dec 6 2019, 9:05 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Dec 6 2019, 9:05 AM
This revision was automatically updated to reflect the committed changes.