This is an archive of the discontinued LLVM Phabricator instance.

Enable Partial Inlining by default
Needs ReviewPublic

Authored by sfertile on Nov 26 2017, 9:54 PM.

Details

Summary

This patch seeks to:

  • Enable Partial Inlining by default.
  • Disable Partial Inlining during thinLTO prepare/prelink stage.
  • Add option to force Partial Inlining during thinLTO prepare/prelink (enable-lto-prelink-partial-inlining or enable-npm-lto-prelink-partial-inlining)

Regular LTO pass was not modified as it currently has a completely different (ie. customized) pre-link pass than thinLTO.

Details from RFC (http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html):

We've seen small gains on SPEC2006/2017 runtimes as well as lnt
compile-times with a 2nd stage bootstrap of LLVM. We also saw positive
gains on our internal workloads.


Brief description of Partial Inlining

A pass in opt that runs after the normal inlining pass. Looks for branches
to a return block in the entry and immediate successor blocks of a
function. If found, it outlines the rest of the function using the
CodeExtractor. It then attempts to inline the leftover entry block (and
possibly one or more of its successors) to all its callers. This
effectively peels the early return block(s) into the caller, which could be
executed without incurring the call overhead of the function just to return
immediately. Inlining and call overhead cost, as well as branch
probabilities of the return block(s) are taken into account before inlining
is done. If inlining is not successful, then the changes are discarded.

eg.

void foo() {
  bar();
  // rest of the code in foo
}

void bar() {
  if (X)
    return;
  // rest of code (to be outlined)
}

After Partial Inlining:

void foo() {
  if (!X)
    bar.outlined();
  // rest of the code in foo
}

void bar.outlined() {
  // rest of the code in bar
}

Here are the numbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode

Runtime performance (speed)

WorkloadImprovement
SPEC2006(C/C++)0.06% (geomean)
SPEC2017(C/C++)0.10% (geomean)

Compile time performance for Bootstrapped LLVM

WorkloadImprovement
SPEC2006(C/C++)0.41% (cumulative)
SPEC2017(C/C++)-0.16% (cumulative)
lnt0.61% (geomean)

Compile time performance

WorkloadIncrease
SPEC2006(C/C++)1.31% (cumulative)
SPEC2017(C/C++)0.25% (cumulative)

Code size

WorkloadIncrease
SPEC2006(C/C++)3.90% (geomean)
SPEC2017(C/C++)1.05% (geomean)

NOTE1: Code size increase in SPEC2006 was mainly attributed to benchmark
"astar", which increased by 86%. Removing this outlier, we get a more
reasonable increase of 0.58%.

Diff Detail

Repository
rL LLVM

Event Timeline

gyiu created this revision.Nov 26 2017, 9:54 PM
jtony added inline comments.Nov 27 2017, 11:53 AM
lib/Passes/PassBuilder.cpp
160

Minor nit: this line exceeds 80 columns but the format still looks better than the result from clang-format. I am not sure whether we should strictly abide by that 80-columns rule here or not (perhaps other more experience reviewers can comment). If we always want to abide by the 80-columns rule, maybe we could format like the following:

static cl::opt<bool>                                          
    RunPartialInliningThinLTOPreLink("enable-npm-lto-prelink-partial-inlining",  
        cl::init(true), cl::Hidden, cl::ZeroOrMore,                              
        cl::desc("Run Partial inlining pass during thinLTO prelink"));

FYI: this is the result from clang-format:

static cl::opt<bool> RunPartialInliningThinLTOPreLink(                           
    "enable-npm-lto-prelink-partial-inlining", cl::init(true), cl::Hidden,       
    cl::ZeroOrMore,                                                              
    cl::desc("Run Partial inlining pass during thinLTO prelink"));
lib/Transforms/IPO/PassManagerBuilder.cpp
47

Same nit format issue here with above.

nemanjai edited edge metadata.Nov 29 2017, 3:18 AM

Doesn't this patch do more than turn partial inlining on by default - i.e. takes away the ability to turn it off on the command line? I think we should have a separate option to turn it on (on by default) and to also run it on thin LTO pre-link (off by default).

gyiu marked 2 inline comments as done.Nov 29 2017, 12:25 PM

@nemanjai Good eye, but actually there's already an option disable partial inlining somewhere else (options to enable and disable are in different files, go figure). Defined in 'lib/Transforms/IPO/PartialInlining.cpp', called '-disable-partial-inlining'.

In D40477#939528, @gyiu wrote:

@nemanjai Good eye, but actually there's already an option disable partial inlining somewhere else (options to enable and disable are in different files, go figure). Defined in 'lib/Transforms/IPO/PartialInlining.cpp', called '-disable-partial-inlining'.

Ah OK. So we're adding it to the pass pipeline unconditionally, but can use -disable-partial-inlining to make the pass not do anything.

FWIW that's typically how we do things.

This definitely needs numbers - across multiple platforms. Both performance and size of resultant binaries.

FWIW that's typically how we do things.

Yeah, which is why this patch seemed a bit surprising. It appears that we were previously conditionally adding it to the pipeline and had an option to turn it on/off in the pass itself. Now it's a bit of a hybrid approach and I think that warrants a comment in the pass builder:

  • It is unconditionally added to the non-LTO pipeline
  • It is conditionally added to the pre-link step in the LTO pipeline
  • The pass itself has a command line option to disable it (everywhere)

FWIW that's typically how we do things.

Yeah, which is why this patch seemed a bit surprising. It appears that we were previously conditionally adding it to the pipeline and had an option to turn it on/off in the pass itself. Now it's a bit of a hybrid approach and I think that warrants a comment in the pass builder:

  • It is unconditionally added to the non-LTO pipeline
  • It is conditionally added to the pre-link step in the LTO pipeline
  • The pass itself has a command line option to disable it (everywhere)

Agreed.

gyiu added a comment.Nov 29 2017, 1:19 PM

This definitely needs numbers - across multiple platforms. Both performance and size of resultant binaries.

There's actually an RFC for this and I've collected numbers on PPC. Someone else collected numbers for ARM. Didn't get any replies on X86.

http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html

In D40477#939609, @gyiu wrote:

This definitely needs numbers - across multiple platforms. Both performance and size of resultant binaries.

There's actually an RFC for this and I've collected numbers on PPC. Someone else collected numbers for ARM. Didn't get any replies on X86.

http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html

You should update with the information from the email in the patch description to avoid having to look into multiple places.

gyiu edited the summary of this revision. (Show Details)Nov 29 2017, 1:30 PM
gyiu edited the summary of this revision. (Show Details)Nov 29 2017, 3:47 PM
gyiu edited the summary of this revision. (Show Details)Nov 29 2017, 3:58 PM
fhahn added a subscriber: fhahn.Dec 9 2017, 4:42 AM
sfertile commandeered this revision.Jan 29 2018, 11:31 AM
sfertile edited reviewers, added: gyiu; removed: sfertile.
vsk added a subscriber: vsk.Nov 16 2018, 11:41 PM
vsk added a comment.Nov 18 2018, 9:19 PM

Imho CodeExtractor needs a little more work before it's ready to be on-by-default. There are two main blockers: missing debug info support, and a missing whitelist of extract-able intrinsics. The latter poses a high risk for miscompiles (see llvm.org/PR39545, llvm.org/PR39671).

In D40477#1302461, @vsk wrote:

Imho CodeExtractor needs a little more work before it's ready to be on-by-default. There are two main blockers: missing debug info support, and a missing whitelist of extract-able intrinsics. The latter poses a high risk for miscompiles (see llvm.org/PR39545, llvm.org/PR39671).

What does the missing debug info support mean? If it is mean to cloning function would miss debug information, it looks like two recent patches had resolved this problem (see D86185 and D96531).

vsk added a comment.EditedApr 1 2021, 9:47 AM

I was referring to CodeExtractor. The concerns I had about using it were addressed in D53267, D72801, D72795, and some other follow ups.

In D40477#2664384, @vsk wrote:

I was referring to CodeExtractor. The concerns I had about using it were addressed in D53267, D72801, D72795, and some other follow ups.

It makes sense to turn it off if there is so many bugs. Maybe we should consider it later.

wenlei added a subscriber: wenlei.Jun 30 2021, 5:53 PM