This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO] Enable sample-profile-top-down-load by default.
ClosedPublic

Authored by wmi on Jun 30 2020, 3:32 PM.

Details

Summary

sample-profile-top-down-load is an internal option which can enable top-down order of inlining and profile annotation in sample profile load pass. It was found to be beneficial for better profile annotation.

Recently we found it could also solve some build time issue. Suppose function A has many callsites in function B. In the last release binary where sample profile was collected, the outline copy of A is large because there are many other functions inlined into A. However although all the callsites calling A in B are inlined, but every inlined body is small (A was inlined into B before other functions are inlined into A), there is no build time issue in last release.

In an optimized build using the sample profile collected from last release, without top-down inlining, we saw a case that A got very large because of inlining, and then multiple callsites of A got inlined into B, and that led to a huge B which caused significant build time issue besides profile annotation issue.

To solve that problem, the patch proposes to enable the flag sample-profile-top-down-load by default.

I reevaluated the performance again in two server benchmarks. Run one benchmark 6 times, it had no performance change in 4 runs and had 0.2% improvement in 2 runs. Run another benchmark 6 times and it had no performance change.

Diff Detail

Event Timeline

wmi created this revision.Jun 30 2020, 3:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2020, 3:32 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Great. We use this switch together with -sample-profile-merge-inlinee, so the profile to drive top-down inlining is more accurate and that often gives best performance. May want to turn both on together?

wmi added a comment.Jun 30 2020, 4:53 PM

Sounds good to me. I will redo the performance test for it.

wmi updated this revision to Diff 275836.Jul 6 2020, 2:45 PM

Enable -sample-profile-merge-inlinee by default together with -sample-profile-top-down-load.

wmi added a comment.Jul 6 2020, 2:54 PM

I tested the performance with sample-profile-top-down-load and sample-profile-merge-inlinee both enabled. In different compiler versions I got different result. In one version about three weeks older, I got 0.4% improvement for one benchmark steadily in multiple runs and neutral for another. In the head llvm version, I saw neutral result for both benchmarks.

wmi updated this revision to Diff 275867.Jul 6 2020, 5:36 PM

Disable sample-profile-merge-inlinee when sample-profile-top-down-load is not effective (Currently sample-profile-top-down-load is only effective for new pass manager).

wenlei accepted this revision.Jul 6 2020, 8:03 PM

Thanks for measurement. LGTM.

This revision is now accepted and ready to land.Jul 6 2020, 8:03 PM
davidxl accepted this revision.Jul 8 2020, 8:30 AM

lgtm

This revision was automatically updated to reflect the committed changes.
ychen added a subscriber: ychen.Nov 30 2020, 3:35 PM
ychen added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
1791

I don't fully understand the patch TBH. ProfileMergeInlinee is also set to false when CG == nullptr which holds true when the legacy pass manager is used. Is this intended?

hoy added a subscriber: hoy.Nov 30 2020, 6:47 PM
hoy added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
1791

ProfileMergeInlinee is set to false when a top-down inlining is not available. This happens when the top-down inlining is explicitly disabled, or when a call graph is not available which means a top-down order cannot be computed.

ychen added inline comments.Nov 30 2020, 7:54 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1791

@hoy, thank you. I just realized that I should've asked this in D70655 where the call graph is not computed hence not available for the legacy pass manager.