In this patch, the runtime and size cost of making calls to outlined function is computed and compared with potential savings of partial inlining.
Details
Diff Detail
Event Timeline
I am not fully sure if this cost model - first check if the function is a suitable candidate for outlining and then use inliner's cost analysis to inline the original function to its callsites - is the right one. Specifically, I'm thinking if both should be done on a per-callsite basis.
lib/Transforms/IPO/PartialInlining.cpp | ||
---|---|---|
468 | This should ideally be in InlineCost. Not all instructions have the same cost (eg. switch). At the least it should be documented here. | |
485 | nit: This is also in multiples of InstrCost like SizeCost above. Using a consistent naming will make this more readable. | |
515 | I wonder if it is better to first clone the function and then call this isPartialInliningBeneficial. We might end up cloning it unnecessarily, but could reuse all these analyses which are computed on the duplicate function later. | |
525 | std::max? | |
530 | NonWeightedSavings above only includes the savings due to elimination of call and arg setup. Since inlining also can eliminate instructions in a given context, we are likely undercounting the savings here. | |
lib/Transforms/Utils/CodeExtractor.cpp | ||
701 | This should be multiplied by InstrCost. | |
705 | This is different from how it is used in InlineCost. In inline cost analysis, there is no separate static and runtime costs and even CallPenalty is meant to model static cost (saving and restoring registers etc). It is clear that they need to be modeled separately, but for now I prefer replicating what is in InlineCost.cpp and may be assign both the same value. |
lib/Transforms/IPO/PartialInlining.cpp | ||
---|---|---|
132–174 | Add an empty line between functions (her and in other places) | |
448 | It wasn't obvious to me why callee entry freq >= outlining call's frequency. I think I understand it now - we ensure the non-return block has a single predecessor and hence can't be in a loop, but a comment above would help. | |
467 | This is not in multiples of InstrCost (and also inliner's switch cost computation is more sophisticated now, but you've the TODO above) | |
499 | may be add an assert that OutlinedFunctionCost >= OutlinedRegionCost to potentially catch any bugs. | |
659 | nullptr instead of 0 | |
lib/Transforms/Utils/CodeExtractor.cpp | ||
23 | Why do you need this? | |
754 | Why not grab the (only)user to this function in unswitchFunction and get the basic block from there. It adds some overhead, but IMO is cleaner than this. |
- Addressed review feedbacks
- More refactorings and naming cleanups (for consistency)
- Added handling of function entry count update and a new test case for it.
Address feedback -- revert change in CodeExtractor.h and .cc file
Also let the cost analysis only trust PGO and user annotation data. Without PGO, add an option to control the aggressiveness.
Further tuned cost/benefit computation with SPEC 2k/06 tuning. Get speedups in a few benchmarks and fixed the regression in perlbmk (many cases with early return branch which is never taken).
Drive-by-comment.
lib/Transforms/IPO/PartialInlining.cpp | ||
---|---|---|
755–758 | nit: No need for {}'s in single-line bodies. |
In terms of correctness and heuristics this looks good. I'v a few comments that are mostly stylistic and documentation issues.
lib/Transforms/IPO/PartialInlining.cpp | ||
---|---|---|
49 | make this cl::ReallyHidden since it is only used in testing | |
66 | The description string and comment does not make it clear whether this is the upper limit (I think it is) or not. | |
120 | typo: successfully -> successful | |
132–174 | an empty line below | |
152 | Empty line? | |
166 | Not sure if this has been clang-formatted because it looks like the next two lines can be merged into one (I may be wrong) | |
169 | Inline -> InlineCost. usedto->used to | |
401 | I prefer auto here and below since the get methods clearly convey the type. | |
410 | This certainly need some more comments. Essentially you don't want to over-estimate the savings and hence you assume that outlinined region is at least OutlineRegionFreqPercent likely even if BFI tells otherwise. | |
411 | Flip the branch condition and so an early return? | |
412 | std::min | |
421 | auto here and in the next line. | |
527 | move the BFI computation to a lambda maybe? | |
721 | Instead of using two variables with similar names, why not initialize the uint64_t to 0 above the previous if and inside the if do F->getEntryCount()->getValue() ? |
LGTM
lib/Transforms/IPO/PartialInlining.cpp | ||
---|---|---|
421 | I think you can use auto for NormWeightedRCost also. |
make this cl::ReallyHidden since it is only used in testing