Page MenuHomePhabricator

[PartialInlining] Add frequency based cost analysis

Authored by davidxl on May 2 2017, 10:47 PM.



In this patch, the runtime and size cost of making calls to outlined function is computed and compared with potential savings of partial inlining.

Diff Detail


Event Timeline

davidxl created this revision.May 2 2017, 10:47 PM
davidxl updated this revision to Diff 97549.May 2 2017, 11:08 PM

Update test case -- added a cold region test.

eraman edited edge metadata.May 3 2017, 6:00 PM

I am not fully sure if this cost model - first check if the function is a suitable candidate for outlining and then use inliner's cost analysis to inline the original function to its callsites - is the right one. Specifically, I'm thinking if both should be done on a per-callsite basis.

396 ↗(On Diff #97549)

This should ideally be in InlineCost. Not all instructions have the same cost (eg. switch). At the least it should be documented here.

413 ↗(On Diff #97549)

nit: This is also in multiples of InstrCost like SizeCost above. Using a consistent naming will make this more readable.

443 ↗(On Diff #97549)

I wonder if it is better to first clone the function and then call this isPartialInliningBeneficial. We might end up cloning it unnecessarily, but could reuse all these analyses which are computed on the duplicate function later.

453 ↗(On Diff #97549)


458 ↗(On Diff #97549)

NonWeightedSavings above only includes the savings due to elimination of call and arg setup. Since inlining also can eliminate instructions in a given context, we are likely undercounting the savings here.

701 ↗(On Diff #97549)

This should be multiplied by InstrCost.

705 ↗(On Diff #97549)

This is different from how it is used in InlineCost. In inline cost analysis, there is no separate static and runtime costs and even CallPenalty is meant to model static cost (saving and restoring registers etc). It is clear that they need to be modeled separately, but for now I prefer replicating what is in InlineCost.cpp and may be assign both the same value.

davidxl updated this revision to Diff 97865.May 4 2017, 12:22 PM
davidxl marked an inline comment as done.

Addressed review feedbacks by eraman

davidxl marked 3 inline comments as done.May 4 2017, 12:25 PM
davidxl added inline comments.
396 ↗(On Diff #97549)

Added comment.

458 ↗(On Diff #97549)

possibly. Since the inlined region is just a set of guarding blocks, the addition benefit is probably not that high.

701 ↗(On Diff #97549)

removed the function. Use the actual extracted function to compute instead.

eraman added inline comments.May 4 2017, 5:46 PM
107 ↗(On Diff #97865)

Add an empty line between functions (her and in other places)

388 ↗(On Diff #97865)

It wasn't obvious to me why callee entry freq >= outlining call's frequency. I think I understand it now - we ensure the non-return block has a single predecessor and hence can't be in a loop, but a comment above would help.

435 ↗(On Diff #97865)

This is not in multiples of InstrCost (and also inliner's switch cost computation is more sophisticated now, but you've the TODO above)

467 ↗(On Diff #97865)

may be add an assert that OutlinedFunctionCost >= OutlinedRegionCost to potentially catch any bugs.

589 ↗(On Diff #97865)

nullptr instead of 0

23 ↗(On Diff #97865)

Why do you need this?

754 ↗(On Diff #97865)

Why not grab the (only)user to this function in unswitchFunction and get the basic block from there. It adds some overhead, but IMO is cleaner than this.

davidxl updated this revision to Diff 97911.May 4 2017, 10:06 PM
  1. Addressed review feedbacks
  2. More refactorings and naming cleanups (for consistency)
  3. Added handling of function entry count update and a new test case for it.
davidxl marked 5 inline comments as done.May 4 2017, 10:08 PM
davidxl added inline comments.
23 ↗(On Diff #97865)

for getCallsiteCost and getInlineCost interface.

754 ↗(On Diff #97865)

can you clarify on this?

eraman added inline comments.May 5 2017, 11:07 AM
23 ↗(On Diff #97865)

I don't see any use of those interfaces in CodeExtractor.cpp. Am I missing something?

754 ↗(On Diff #97865)

Something like cast<CallInst>(*F->user_begin())->getParent() in unswitchFunction should work right?

davidxl updated this revision to Diff 98043.May 5 2017, 5:16 PM

Address feedback -- revert change in CodeExtractor.h and .cc file

Also let the cost analysis only trust PGO and user annotation data. Without PGO, add an option to control the aggressiveness.

davidxl updated this revision to Diff 98580.May 10 2017, 9:18 PM

Further tuned cost/benefit computation with SPEC 2k/06 tuning. Get speedups in a few benchmarks and fixed the regression in perlbmk (many cases with early return branch which is never taken).


795–797 ↗(On Diff #98580)

nit: No need for {}'s in single-line bodies.

davidxl updated this revision to Diff 98790.May 12 2017, 9:36 AM

Remove { }

In terms of correctness and heuristics this looks good. I'v a few comments that are mostly stylistic and documentation issues.

49 ↗(On Diff #98043)

make this cl::ReallyHidden since it is only used in testing

66 ↗(On Diff #98043)

The description string and comment does not make it clear whether this is the upper limit (I think it is) or not.

127 ↗(On Diff #98043)

typo: successfully -> successful

141 ↗(On Diff #98043)

an empty line below

175 ↗(On Diff #98043)

Not sure if this has been clang-formatted because it looks like the next two lines can be merged into one (I may be wrong)

178 ↗(On Diff #98043)

Inline -> InlineCost. usedto->used to

161 ↗(On Diff #98790)

Empty line?

397 ↗(On Diff #98790)

I prefer auto here and below since the get methods clearly convey the type.

406 ↗(On Diff #98790)

This certainly need some more comments. Essentially you don't want to over-estimate the savings and hence you assume that outlinined region is at least OutlineRegionFreqPercent likely even if BFI tells otherwise.

407 ↗(On Diff #98790)

Flip the branch condition and so an early return?

408 ↗(On Diff #98790)


458 ↗(On Diff #98790)

auto here and in the next line.

564 ↗(On Diff #98790)

move the BFI computation to a lambda maybe?

761 ↗(On Diff #98790)

Instead of using two variables with similar names, why not initialize the uint64_t to 0 above the previous if and inside the if do F->getEntryCount()->getValue() ?

davidxl marked 10 inline comments as done.May 12 2017, 1:36 PM
davidxl added inline comments.
66 ↗(On Diff #98043)

Updated the comment.

175 ↗(On Diff #98043)

The ; will be @column 80 which clang-format does not like.

761 ↗(On Diff #98790)

Both vars are used later. I changed the second var name to make it look friendiler.

davidxl updated this revision to Diff 98834.May 12 2017, 1:37 PM

Address review feedbacks.

eraman accepted this revision.May 12 2017, 1:43 PM


458 ↗(On Diff #98790)

I think you can use auto for NormWeightedRCost also.

This revision is now accepted and ready to land.May 12 2017, 1:43 PM
This revision was automatically updated to reflect the committed changes.