This is an archive of the discontinued LLVM Phabricator instance.

[PartialInlining] Add frequency based cost analysis
ClosedPublic

Authored by davidxl on May 2 2017, 10:47 PM.

Details

Summary

In this patch, the runtime and size cost of making calls to outlined function is computed and compared with potential savings of partial inlining.

Diff Detail

Event Timeline

davidxl created this revision.May 2 2017, 10:47 PM
davidxl updated this revision to Diff 97549.May 2 2017, 11:08 PM

Update test case -- added a cold region test.

eraman edited edge metadata.May 3 2017, 6:00 PM

I am not fully sure if this cost model - first check if the function is a suitable candidate for outlining and then use inliner's cost analysis to inline the original function to its callsites - is the right one. Specifically, I'm thinking if both should be done on a per-callsite basis.

lib/Transforms/IPO/PartialInlining.cpp
396

This should ideally be in InlineCost. Not all instructions have the same cost (eg. switch). At the least it should be documented here.

413

nit: This is also in multiples of InstrCost like SizeCost above. Using a consistent naming will make this more readable.

443

I wonder if it is better to first clone the function and then call this isPartialInliningBeneficial. We might end up cloning it unnecessarily, but could reuse all these analyses which are computed on the duplicate function later.

453

std::max?

458

NonWeightedSavings above only includes the savings due to elimination of call and arg setup. Since inlining also can eliminate instructions in a given context, we are likely undercounting the savings here.

lib/Transforms/Utils/CodeExtractor.cpp
701

This should be multiplied by InstrCost.

705

This is different from how it is used in InlineCost. In inline cost analysis, there is no separate static and runtime costs and even CallPenalty is meant to model static cost (saving and restoring registers etc). It is clear that they need to be modeled separately, but for now I prefer replicating what is in InlineCost.cpp and may be assign both the same value.

davidxl updated this revision to Diff 97865.May 4 2017, 12:22 PM
davidxl marked an inline comment as done.

Addressed review feedbacks by eraman

davidxl marked 3 inline comments as done.May 4 2017, 12:25 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
396

Added comment.

458

possibly. Since the inlined region is just a set of guarding blocks, the addition benefit is probably not that high.

lib/Transforms/Utils/CodeExtractor.cpp
701

removed the function. Use the actual extracted function to compute instead.

eraman added inline comments.May 4 2017, 5:46 PM
lib/Transforms/IPO/PartialInlining.cpp
103–138

Add an empty line between functions (her and in other places)

376

It wasn't obvious to me why callee entry freq >= outlining call's frequency. I think I understand it now - we ensure the non-return block has a single predecessor and hence can't be in a loop, but a comment above would help.

395

This is not in multiples of InstrCost (and also inliner's switch cost computation is more sophisticated now, but you've the TODO above)

427

may be add an assert that OutlinedFunctionCost >= OutlinedRegionCost to potentially catch any bugs.

589

nullptr instead of 0

lib/Transforms/Utils/CodeExtractor.cpp
23

Why do you need this?

754

Why not grab the (only)user to this function in unswitchFunction and get the basic block from there. It adds some overhead, but IMO is cleaner than this.

davidxl updated this revision to Diff 97911.May 4 2017, 10:06 PM
  1. Addressed review feedbacks
  2. More refactorings and naming cleanups (for consistency)
  3. Added handling of function entry count update and a new test case for it.
davidxl marked 5 inline comments as done.May 4 2017, 10:08 PM
davidxl added inline comments.
lib/Transforms/Utils/CodeExtractor.cpp
23

for getCallsiteCost and getInlineCost interface.

754

can you clarify on this?

eraman added inline comments.May 5 2017, 11:07 AM
lib/Transforms/Utils/CodeExtractor.cpp
23

I don't see any use of those interfaces in CodeExtractor.cpp. Am I missing something?

754

Something like cast<CallInst>(*F->user_begin())->getParent() in unswitchFunction should work right?

davidxl updated this revision to Diff 98043.May 5 2017, 5:16 PM

Address feedback -- revert change in CodeExtractor.h and .cc file

Also let the cost analysis only trust PGO and user annotation data. Without PGO, add an option to control the aggressiveness.

davidxl updated this revision to Diff 98580.May 10 2017, 9:18 PM

Further tuned cost/benefit computation with SPEC 2k/06 tuning. Get speedups in a few benchmarks and fixed the regression in perlbmk (many cases with early return branch which is never taken).

Drive-by-comment.

lib/Transforms/IPO/PartialInlining.cpp
624–625

nit: No need for {}'s in single-line bodies.

davidxl updated this revision to Diff 98790.May 12 2017, 9:36 AM

Remove { }

In terms of correctness and heuristics this looks good. I'v a few comments that are mostly stylistic and documentation issues.

lib/Transforms/IPO/PartialInlining.cpp
49

make this cl::ReallyHidden since it is only used in testing

65

The description string and comment does not make it clear whether this is the upper limit (I think it is) or not.

103–138

an empty line below

119

typo: successfully -> successful

123

Empty line?

137

Not sure if this has been clang-formatted because it looks like the next two lines can be merged into one (I may be wrong)

140

Inline -> InlineCost. usedto->used to

352

I prefer auto here and below since the get methods clearly convey the type.

361

This certainly need some more comments. Essentially you don't want to over-estimate the savings and hence you assume that outlinined region is at least OutlineRegionFreqPercent likely even if BFI tells otherwise.

362

Flip the branch condition and so an early return?

363

std::min

372

auto here and in the next line.

455

move the BFI computation to a lambda maybe?

602

Instead of using two variables with similar names, why not initialize the uint64_t to 0 above the previous if and inside the if do F->getEntryCount()->getValue() ?

davidxl marked 10 inline comments as done.May 12 2017, 1:36 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
65

Updated the comment.

137

The ; will be @column 80 which clang-format does not like.

602

Both vars are used later. I changed the second var name to make it look friendiler.

davidxl updated this revision to Diff 98834.May 12 2017, 1:37 PM

Address review feedbacks.

eraman accepted this revision.May 12 2017, 1:43 PM

LGTM

lib/Transforms/IPO/PartialInlining.cpp
372

I think you can use auto for NormWeightedRCost also.

This revision is now accepted and ready to land.May 12 2017, 1:43 PM
This revision was automatically updated to reflect the committed changes.