This patch seeks to:
- Enable Partial Inlining by default.
- Disable Partial Inlining during thinLTO prepare/prelink stage.
- Add an option to force Partial Inlining during the thinLTO prepare/prelink stage (enable-lto-prelink-partial-inlining or enable-npm-lto-prelink-partial-inlining).
The regular LTO pipeline was not modified, as it currently has a completely different (i.e. customized) pre-link pipeline from thinLTO.
Details from RFC (http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html):
We've seen small gains on SPEC2006/2017 runtimes as well as lnt
compile-times with a 2nd stage bootstrap of LLVM. We also saw positive
gains on our internal workloads.
Brief description of Partial Inlining
A pass in opt that runs after the normal inlining pass. Looks for branches
to a return block in the entry and immediate successor blocks of a
function. If found, it outlines the rest of the function using the
CodeExtractor. It then attempts to inline the leftover entry block (and
possibly one or more of its successors) to all its callers. This
effectively peels the early return block(s) into the caller, which could be
executed without incurring the call overhead of the function just to return
immediately. Inlining and call overhead cost, as well as branch
probabilities of the return block(s) are taken into account before inlining
is done. If inlining is not successful, then the changes are discarded.
e.g.

    void foo() {
      bar();
      // rest of the code in foo
    }

    void bar() {
      if (X)
        return;
      // rest of code (to be outlined)
    }

After Partial Inlining:

    void foo() {
      if (!X)
        bar.outlined();
      // rest of the code in foo
    }

    void bar.outlined() {
      // rest of the code in bar
    }

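For readers who want to try the transformation by hand, here is a minimal compilable sketch of the same idea. It is only an illustration, not output from the pass: the names foo_before, foo_after, bar_outlined and check_equivalence are hypothetical, the early-return condition (x <= 0) and the loop body are made up, and the outlined function is spelled bar_outlined because a dot is not valid in a C++ identifier.

```cpp
#include <cassert>

// Original callee: a cheap early-return check followed by a larger body.
int bar(int x) {
  if (x <= 0)
    return 0;                  // early return block
  int sum = 0;                 // "rest of the code" (candidate for outlining)
  for (int i = 1; i <= x; ++i)
    sum += i * i;
  return sum;
}

// Hand-written stand-in for the outlined remainder of bar (hypothetical name).
// It assumes the caller has already ruled out the early-return case.
int bar_outlined(int x) {
  int sum = 0;
  for (int i = 1; i <= x; ++i)
    sum += i * i;
  return sum;
}

// Caller before the transformation: always pays for the call to bar.
int foo_before(int x) {
  return bar(x) + 1;
}

// Caller after the transformation, written by hand: the early-return check
// is peeled into the caller, and only the slow path performs a call.
int foo_after(int x) {
  int r = 0;                   // value produced by the early return block
  if (!(x <= 0))
    r = bar_outlined(x);
  return r + 1;
}

// Checks that both callers agree on a small range of inputs.
void check_equivalence() {
  for (int x = -5; x <= 20; ++x)
    assert(foo_before(x) == foo_after(x));
}
```

Calling check_equivalence() exercises both the early-return path (x <= 0) and the outlined path. In foo_after, the early-return inputs never reach a call, which is the overhead the pass is trying to remove.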
Here are the numbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode:

Runtime performance (speed)
Workload        | Improvement         |
SPEC2006(C/C++) | 0.06% (geomean)     |
SPEC2017(C/C++) | 0.10% (geomean)     |

Compile time performance for Bootstrapped LLVM
Workload        | Improvement         |
SPEC2006(C/C++) | 0.41% (cumulative)  |
SPEC2017(C/C++) | -0.16% (cumulative) |
lnt             | 0.61% (geomean)     |

Compile time performance
Workload        | Increase            |
SPEC2006(C/C++) | 1.31% (cumulative)  |
SPEC2017(C/C++) | 0.25% (cumulative)  |

Code size
Workload        | Increase            |
SPEC2006(C/C++) | 3.90% (geomean)     |
SPEC2017(C/C++) | 1.05% (geomean)     |

NOTE1: Code size increase in SPEC2006 was mainly attributed to benchmark
"astar", which increased by 86%. Removing this outlier, we get a more
reasonable increase of 0.58%.
Minor nit: this line exceeds 80 columns, but the format still looks better than the result from clang-format. I am not sure whether we should strictly abide by the 80-column rule here (perhaps other, more experienced reviewers can comment). If we always want to abide by the 80-column rule, maybe we could format it like the following:
FYI: this is the result from clang-format: