This is an archive of the discontinued LLVM Phabricator instance.

[ModuleInliner] Add a cost-benefit-based priority
ClosedPublic

Authored by kazu on Sep 21 2022, 10:50 AM.

Details

Summary

This patch teaches the module inliner a traversal order designed for
the instrumentation FDO (+ThinLTO) scenario.

The new traversal order prioritizes call sites in the following order:

  1. Those call sites that are expected to reduce the caller size
  1. Those call sites that have gone through the cost-benefit analaysis
  1. The remaining call sites

With this fairly simple traversal order, a large internel benchmark
yields performance comparable to the bottom-up inliner -- both in
terms of the execution performance and .text* sizes.

Big thanks goes to Liqiang Tao for the module inliner infrastructure.

I still have hacks outside this patch to prevent excessively long
compilation or .text* size explosion. I'm trying to come up with
acceptable solutions in near future.

Diff Detail

Event Timeline

kazu created this revision.Sep 21 2022, 10:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 10:50 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
kazu requested review of this revision.Sep 21 2022, 10:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 10:50 AM
kazu updated this revision to Diff 462822.Sep 25 2022, 11:53 PM

Use getStaticBonusApplied.

kazu retitled this revision from [ModuleInliner] Add a cost-benefit-based priority (WORK-IN-PROGRESS) to [ModuleInliner] Add a cost-benefit-based priority.Sep 25 2022, 11:54 PM
kazu edited the summary of this revision. (Show Details)
kazu added a reviewer: taolq.Sep 25 2022, 11:54 PM
kazu added a comment.Sep 27 2022, 9:16 AM

Please take a look. Thanks!

a large internel benchmark yields performance comparable to the bottom-up inliner -- both in terms of the execution performance and .text* sizes.

This looks promising. The comparison is between enable-module-inliner on vs off, right? Do you plan to tune and open up module inliner for sample PGO and non-PGO cases where cost-benefit analysis isn't available yet?

llvm/lib/Analysis/InlineOrder.cpp
34

perhaps just name it "cost-benefit"? "ratio" can be confusing.

kazu added a comment.Sep 27 2022, 12:52 PM

a large internel benchmark yields performance comparable to the bottom-up inliner -- both in terms of the execution performance and .text* sizes.

This looks promising. The comparison is between enable-module-inliner on vs off, right?

Thanks. Yes, -mllvm -enable-module-inliner is the only difference. Both the baseline and the experiment use FDO, ThinLTO, and -fsplit-machine-functions.

One thing I might point out here is that I am not doing any cleanup beyond whatever basic cleanups InlineFunction performs. With the bottom-up inliner, we diligently clean up after each SCC (see PassBuilder::addPGOInstrPasses), but that doesn't seem to matter with the module inliner. My hypothesis is that once we inline those call sites that reduce the caller size, followed by ones with high benefit-to-cost ratios, then we've captured the vast majority of the benefit from inlining. That is, we don't really need the exact "tightest" instruction count after DCE, CSE, etc. Even if the module inliner didn't give us additional performance, it could still simplify our life -- no CGSCC maintenance or successive cleanups.

Do you plan to tune and open up module inliner for sample PGO and non-PGO cases where cost-benefit analysis isn't available yet?

Yes, I'd like to do something for the sample PGO case. I'd like to enable the cost-benefit analysis for the sample PGO, but that's been my hardest project. IIUC, the sample profile loader keeps inlining functions top down until the weight of the inlining subtree goes below the threshold. As a result, by the time we get to the profile-driven inliner (Inliner.cpp), we don't have a lot of interesting decisions left to make. For the sample PGO case, I think I have to depart from the top-down inlining and start inlining callees from where it matters according to some combinations of context sensitivity, profile counts, and the usual metrics from trial inlining (InlineCost.cpp), etc.

The non-PGO case isn't our primary interest. That said, if I come up with reasonable heuristics, I might contribute that. There may be a long-term benefit in steering the community toward the module inliner as the sole inliner as opposed to directing them to the two different inliners (Inliner.cpp and ModuleInliner.cpp) depending on whether they are using instrumentation FDO or not.

taolq accepted this revision.Sep 29 2022, 12:59 AM

LGTM.
Thanks for this work. It's glad to see this infrastructure works well.
Looking forward to discovering reasonable heuristics :-D
Don't forget to resolve the comment.

This revision is now accepted and ready to land.Sep 29 2022, 12:59 AM
kazu updated this revision to Diff 463925.Sep 29 2022, 8:58 AM

Renamed OptRatio to CostBenefit.

Renamed optratio to cost-benefit.

kazu marked an inline comment as done.Sep 29 2022, 8:59 AM
This revision was landed with ongoing or failed builds.Sep 29 2022, 9:00 AM
This revision was automatically updated to reflect the committed changes.
kazu added a comment.Sep 29 2022, 9:08 AM

LGTM.
Thanks for this work. It's glad to see this infrastructure works well.
Looking forward to discovering reasonable heuristics :-D

Thank you for the review! We just started the module inliner journey. I'm also looking forward to improving both the priority and threshold functions.