Page MenuHomePhabricator

[InlineCost] Implement cost-benefit-based inliner

Authored by kazu on Dec 7 2020, 11:32 AM.



This patch adds an alternative cost metric for the inliner to take
into account both the cost (i.e. size) and cycle count savings into

Without this patch, we decide to inline a given call site if the size
of inlining the call site is below the threshold that is computed
according to the hotness of the call site.

This patch adds a new cost metric, turned off by default, to take over
the handling of hot call sites. Specifically, with the new cost
metric, we decide to inline a given call site if the ratio of cycle
savings to size exceeds a threshold. The cycle savings are computed
from call site costs, parameter propagation, folded conditional
branches, etc, all weighted by their respective profile counts. The
size is primarily the callee size, but we subtract call site costs and
the size of basic blocks that are never executed.

The new cost metric implicitly takes advantage of the machine function
splitter recently introduced by Snehasish Kumar, which dramatically
reduces the cost of duplicating (e.g. inlining) cold basic blocks by
placing cold basic blocks of hot functions in the .text.unlikely

We evaluated the new cost metric on clang bootstrap and SPECInt 2017.

For clang bootstrap, we observe 0.69% runtime improvement.

For SPECInt we report the change in IntRate the C/C++ benchmarks. All
benchmarks apart from perlbench and omnetpp improve, on average by
0.21% with the max for mcf at 1.96%.

Benchmark % Change
500.perlbench_r -0.45
502.gcc_r 0.13
505.mcf_r 1.96
520.omnetpp_r -0.28
523.xalancbmk_r 0.49
525.x264_r 0.00
531.deepsjeng_r 0.00
541.leela_r 0.35
557.xz_r 0.21

Diff Detail

Event Timeline

kazu created this revision.Dec 7 2020, 11:32 AM
kazu requested review of this revision.Dec 7 2020, 11:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2020, 11:32 AM
mtrofin added inline comments.Dec 7 2020, 2:00 PM

can size be negative? Should it be int64_t?

Also, is it a size, or it's a cost - judging from its usage, probably the latter. Should it be ColdSizeCost?


it's auto*, right?


for really large profile counts (e.g. worker thread that blocks, waiting for work) this may overflow. Do we care?

davidxl added inline comments.Dec 8 2020, 9:08 AM

Make the documentation string more detailed.


To differentiate existing (static) cost which models size, perhaps name it 'weightedCostAtBBStart'? or 'DynamicCostAtBBStart'?


The naming needs to be consistent. Is it computing static size or weighted cost?


If function entry has no count, early skip -- to avoid compile time impact to nonPGO build.


Check function entry count first for both caller and callee.


Is 'F' callee? if so, name BFI CalleeBFI .


if caller entry has count, this should assert to be true.


No need for this loop if the entry count is checked at beginning.

kazu updated this revision to Diff 311103.Dec 10 2020, 8:14 PM

Incorporated feedback.

kazu marked 10 inline comments as done.Dec 10 2020, 8:22 PM

PTAL. Thanks!


This is part of static size computation. I've expanded the comments.


I'm computing the static size here. CostAtBBStart is just a copy of Cost at the beginning of the basic block being processed so that we can compute the size of the the basic block in case it's cold.

Since I'm just copying Cost, I'd like to use the same type as Cost.


I've renamed BFI to CalleeBFI.

I've moved up the entry count for Callee.

For the caller, do I need to check its entry count? All I need is the profile count at the call site. In any event, I've moved up the code to obtain the profile count at the call site.


I've switched to 128-bit APInt. I've put justification for the bit size as a comment for CycleSavings.

davidxl added inline comments.Dec 11 2020, 11:13 AM

In comment, it should be Cost - CostAtBBStart


Are these guards needed? It is already checked in costBenefitAnalysis. Perhaps move the checking in a common place and set it to boolean variable: costBenefitAnalysisEnabled and the value of the variable can be checked here.


should assert that it has value


checker CallerEntry count earlier than this check can be removed.

kazu updated this revision to Diff 311337.Dec 11 2020, 4:54 PM
kazu marked 4 inline comments as done.

Incorporated the feedback.

kazu marked 5 inline comments as done.Dec 11 2020, 4:57 PM

PTAL. Thanks!


I've created a function called isCostBenefitAnalysisEnabled.

kazu marked an inline comment as done.Dec 16 2020, 8:55 AM

Friendly ping. Thanks!

mtrofin accepted this revision.Dec 16 2020, 9:03 AM

lgtm, but see if @davidxl has any additional comments.

This revision is now accepted and ready to land.Dec 16 2020, 9:03 AM

I suggest also dumping some opt remarks under --pass-remarks-analysis. This also allows you to write test cases to cover various types of savings.


Use interface Function::getEntryCount or Function::hasProfileData instead.


how about load & store instructions that can be SROAed?

This revision was landed with ongoing or failed builds.Dec 18 2020, 12:38 AM
This revision was automatically updated to reflect the committed changes.