This is an archive of the discontinued LLVM Phabricator instance.

[InlineCost] Implement cost-benefit-based inliner
ClosedPublic

Authored by kazu on Dec 7 2020, 11:32 AM.

Details

Summary

This patch adds an alternative cost metric for the inliner to take
into account both the cost (i.e. size) and cycle count savings into
account.

Without this patch, we decide to inline a given call site if the size
of inlining the call site is below the threshold that is computed
according to the hotness of the call site.

This patch adds a new cost metric, turned off by default, to take over
the handling of hot call sites. Specifically, with the new cost
metric, we decide to inline a given call site if the ratio of cycle
savings to size exceeds a threshold. The cycle savings are computed
from call site costs, parameter propagation, folded conditional
branches, etc, all weighted by their respective profile counts. The
size is primarily the callee size, but we subtract call site costs and
the size of basic blocks that are never executed.

The new cost metric implicitly takes advantage of the machine function
splitter recently introduced by Snehasish Kumar, which dramatically
reduces the cost of duplicating (e.g. inlining) cold basic blocks by
placing cold basic blocks of hot functions in the .text.unlikely
section.

We evaluated the new cost metric on clang bootstrap and SPECInt 2017.

For clang bootstrap, we observe 0.69% runtime improvement.

For SPECInt we report the change in IntRate the C/C++ benchmarks. All
benchmarks apart from perlbench and omnetpp improve, on average by
0.21% with the max for mcf at 1.96%.

Benchmark % Change
500.perlbench_r -0.45
502.gcc_r 0.13
505.mcf_r 1.96
520.omnetpp_r -0.28
523.xalancbmk_r 0.49
525.x264_r 0.00
531.deepsjeng_r 0.00
541.leela_r 0.35
557.xz_r 0.21

Diff Detail

Event Timeline

kazu created this revision.Dec 7 2020, 11:32 AM
kazu requested review of this revision.Dec 7 2020, 11:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2020, 11:32 AM
mtrofin added inline comments.Dec 7 2020, 2:00 PM
llvm/lib/Analysis/InlineCost.cpp
484

can size be negative? Should it be int64_t?

Also, is it a size, or it's a cost - judging from its usage, probably the latter. Should it be ColdSizeCost?

680

it's auto*, right?

731

for really large profile counts (e.g. worker thread that blocks, waiting for work) this may overflow. Do we care?

davidxl added inline comments.Dec 8 2020, 9:08 AM
llvm/lib/Analysis/InlineCost.cpp
81

Make the documentation string more detailed.

481

To differentiate existing (static) cost which models size, perhaps name it 'weightedCostAtBBStart'? or 'DynamicCostAtBBStart'?

484

The naming needs to be consistent. Is it computing static size or weighted cost?

630

If function entry has no count, early skip -- to avoid compile time impact to nonPGO build.

676

Check function entry count first for both caller and callee.

676

Is 'F' callee? if so, name BFI CalleeBFI .

687

if caller entry has count, this should assert to be true.

690

No need for this loop if the entry count is checked at beginning.

kazu updated this revision to Diff 311103.Dec 10 2020, 8:14 PM

Incorporated feedback.

kazu marked 10 inline comments as done.Dec 10 2020, 8:22 PM

PTAL. Thanks!

llvm/lib/Analysis/InlineCost.cpp
481

This is part of static size computation. I've expanded the comments.

484

I'm computing the static size here. CostAtBBStart is just a copy of Cost at the beginning of the basic block being processed so that we can compute the size of the the basic block in case it's cold.

Since I'm just copying Cost, I'd like to use the same type as Cost.

676

I've renamed BFI to CalleeBFI.

I've moved up the entry count for Callee.

For the caller, do I need to check its entry count? All I need is the profile count at the call site. In any event, I've moved up the code to obtain the profile count at the call site.

731

I've switched to 128-bit APInt. I've put justification for the bit size as a comment for CycleSavings.

davidxl added inline comments.Dec 11 2020, 11:13 AM
llvm/lib/Analysis/InlineCost.cpp
481

In comment, it should be Cost - CostAtBBStart

632

Are these guards needed? It is already checked in costBenefitAnalysis. Perhaps move the checking in a common place and set it to boolean variable: costBenefitAnalysisEnabled and the value of the variable can be checked here.

634

should assert that it has value

692

checker CallerEntry count earlier than this check can be removed.

kazu updated this revision to Diff 311337.Dec 11 2020, 4:54 PM
kazu marked 4 inline comments as done.

Incorporated the feedback.

kazu marked 5 inline comments as done.Dec 11 2020, 4:57 PM

PTAL. Thanks!

llvm/lib/Analysis/InlineCost.cpp
632

I've created a function called isCostBenefitAnalysisEnabled.

kazu marked an inline comment as done.Dec 16 2020, 8:55 AM

Friendly ping. Thanks!

mtrofin accepted this revision.Dec 16 2020, 9:03 AM

lgtm, but see if @davidxl has any additional comments.

This revision is now accepted and ready to land.Dec 16 2020, 9:03 AM

I suggest also dumping some opt remarks under --pass-remarks-analysis. This also allows you to write test cases to cover various types of savings.

llvm/lib/Analysis/InlineCost.cpp
696

Use interface Function::getEntryCount or Function::hasProfileData instead.

748

how about load & store instructions that can be SROAed?

This revision was landed with ongoing or failed builds.Dec 18 2020, 12:38 AM
This revision was automatically updated to reflect the committed changes.
aemerson added a subscriber: aemerson.

FYI I left a comment on https://reviews.llvm.org/D98213 about why I think this feature should not have been enabled.

Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2023, 2:22 PM
Herald added a subscriber: ChuanqiXu. · View Herald Transcript