This is an archive of the discontinued LLVM Phabricator instance.

[PartialInlining] Add frequency based cost analysis
ClosedPublic

Authored by davidxl on May 2 2017, 10:47 PM.

Details

Summary

In this patch, the runtime and size cost of making calls to outlined function is computed and compared with potential savings of partial inlining.

Diff Detail

Event Timeline

davidxl created this revision.May 2 2017, 10:47 PM
davidxl updated this revision to Diff 97549.May 2 2017, 11:08 PM

Update test case -- added a cold region test.

eraman edited edge metadata.May 3 2017, 6:00 PM

I am not fully sure if this cost model - first check if the function is a suitable candidate for outlining and then use inliner's cost analysis to inline the original function to its callsites - is the right one. Specifically, I'm thinking if both should be done on a per-callsite basis.

lib/Transforms/IPO/PartialInlining.cpp
468

This should ideally be in InlineCost. Not all instructions have the same cost (eg. switch). At the least it should be documented here.

485

nit: This is also in multiples of InstrCost like SizeCost above. Using a consistent naming will make this more readable.

515

I wonder if it is better to first clone the function and then call this isPartialInliningBeneficial. We might end up cloning it unnecessarily, but could reuse all these analyses which are computed on the duplicate function later.

525

std::max?

530

NonWeightedSavings above only includes the savings due to elimination of call and arg setup. Since inlining also can eliminate instructions in a given context, we are likely undercounting the savings here.

lib/Transforms/Utils/CodeExtractor.cpp
701

This should be multiplied by InstrCost.

705

This is different from how it is used in InlineCost. In inline cost analysis, there is no separate static and runtime costs and even CallPenalty is meant to model static cost (saving and restoring registers etc). It is clear that they need to be modeled separately, but for now I prefer replicating what is in InlineCost.cpp and may be assign both the same value.

davidxl updated this revision to Diff 97865.May 4 2017, 12:22 PM
davidxl marked an inline comment as done.

Addressed review feedbacks by eraman

davidxl marked 3 inline comments as done.May 4 2017, 12:25 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
468

Added comment.

530

possibly. Since the inlined region is just a set of guarding blocks, the addition benefit is probably not that high.

lib/Transforms/Utils/CodeExtractor.cpp
701

removed the function. Use the actual extracted function to compute instead.

eraman added inline comments.May 4 2017, 5:46 PM
lib/Transforms/IPO/PartialInlining.cpp
132–174

Add an empty line between functions (her and in other places)

448

It wasn't obvious to me why callee entry freq >= outlining call's frequency. I think I understand it now - we ensure the non-return block has a single predecessor and hence can't be in a loop, but a comment above would help.

467

This is not in multiples of InstrCost (and also inliner's switch cost computation is more sophisticated now, but you've the TODO above)

499

may be add an assert that OutlinedFunctionCost >= OutlinedRegionCost to potentially catch any bugs.

659

nullptr instead of 0

lib/Transforms/Utils/CodeExtractor.cpp
23

Why do you need this?

754

Why not grab the (only)user to this function in unswitchFunction and get the basic block from there. It adds some overhead, but IMO is cleaner than this.

davidxl updated this revision to Diff 97911.May 4 2017, 10:06 PM
  1. Addressed review feedbacks
  2. More refactorings and naming cleanups (for consistency)
  3. Added handling of function entry count update and a new test case for it.
davidxl marked 5 inline comments as done.May 4 2017, 10:08 PM
davidxl added inline comments.
lib/Transforms/Utils/CodeExtractor.cpp
23

for getCallsiteCost and getInlineCost interface.

754

can you clarify on this?

eraman added inline comments.May 5 2017, 11:07 AM
lib/Transforms/Utils/CodeExtractor.cpp
23

I don't see any use of those interfaces in CodeExtractor.cpp. Am I missing something?

754

Something like cast<CallInst>(*F->user_begin())->getParent() in unswitchFunction should work right?

davidxl updated this revision to Diff 98043.May 5 2017, 5:16 PM

Address feedback -- revert change in CodeExtractor.h and .cc file

Also let the cost analysis only trust PGO and user annotation data. Without PGO, add an option to control the aggressiveness.

davidxl updated this revision to Diff 98580.May 10 2017, 9:18 PM

Further tuned cost/benefit computation with SPEC 2k/06 tuning. Get speedups in a few benchmarks and fixed the regression in perlbmk (many cases with early return branch which is never taken).

Drive-by-comment.

lib/Transforms/IPO/PartialInlining.cpp
755–758

nit: No need for {}'s in single-line bodies.

davidxl updated this revision to Diff 98790.May 12 2017, 9:36 AM

Remove { }

In terms of correctness and heuristics this looks good. I'v a few comments that are mostly stylistic and documentation issues.

lib/Transforms/IPO/PartialInlining.cpp
49

make this cl::ReallyHidden since it is only used in testing

66

The description string and comment does not make it clear whether this is the upper limit (I think it is) or not.

120

typo: successfully -> successful

132–174

an empty line below

152

Empty line?

166

Not sure if this has been clang-formatted because it looks like the next two lines can be merged into one (I may be wrong)

169

Inline -> InlineCost. usedto->used to

401

I prefer auto here and below since the get methods clearly convey the type.

410

This certainly need some more comments. Essentially you don't want to over-estimate the savings and hence you assume that outlinined region is at least OutlineRegionFreqPercent likely even if BFI tells otherwise.

411

Flip the branch condition and so an early return?

412

std::min

421

auto here and in the next line.

527

move the BFI computation to a lambda maybe?

721

Instead of using two variables with similar names, why not initialize the uint64_t to 0 above the previous if and inside the if do F->getEntryCount()->getValue() ?

davidxl marked 10 inline comments as done.May 12 2017, 1:36 PM
davidxl added inline comments.
lib/Transforms/IPO/PartialInlining.cpp
66

Updated the comment.

166

The ; will be @column 80 which clang-format does not like.

721

Both vars are used later. I changed the second var name to make it look friendiler.

davidxl updated this revision to Diff 98834.May 12 2017, 1:37 PM

Address review feedbacks.

eraman accepted this revision.May 12 2017, 1:43 PM

LGTM

lib/Transforms/IPO/PartialInlining.cpp
421

I think you can use auto for NormWeightedRCost also.

This revision is now accepted and ready to land.May 12 2017, 1:43 PM
This revision was automatically updated to reflect the committed changes.