This is an archive of the discontinued LLVM Phabricator instance.

Differential D40477

Enable Partial Inlining by default
Needs Review · Public

Authored by sfertile on Nov 26 2017, 9:54 PM.

Details

Reviewers

kbarton
nemanjai
lei
syzaara
jtony
stefanp
hfinkel
davidxl
gyiu

Summary

This patch seeks to:

Enable Partial Inlining by default.
Disable Partial Inlining during thinLTO prepare/prelink stage.
Add an option to force Partial Inlining during the thinLTO prepare/prelink stage (enable-lto-prelink-partial-inlining, or enable-npm-lto-prelink-partial-inlining for the new pass manager).

The regular LTO pass was not modified because it currently uses a completely different (i.e. customized) pre-link pass from thinLTO.

Details from RFC (http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html):

We've seen small gains on SPEC2006/2017 runtimes as well as lnt
compile-times with a 2nd stage bootstrap of LLVM. We also saw positive
gains on our internal workloads.

Brief description of Partial Inlining

Partial inlining is a pass in opt that runs after the normal inlining pass.
It looks for branches to a return block in the entry and immediate successor
blocks of a function. If it finds any, it outlines the rest of the function
using the CodeExtractor. It then attempts to inline the leftover entry block
(and possibly one or more of its successors) into all of the function's
callers. This effectively peels the early return block(s) into the caller, so
they can be executed without incurring the call overhead of the function just
to return immediately. Inlining and call overhead cost, as well as the branch
probabilities of the return block(s), are taken into account before inlining
is done. If inlining is not successful, the changes are discarded.

For example:

void foo() {
  bar();
  // rest of the code in foo
}

void bar() {
  if (X)
    return;
  // rest of code (to be outlined)
}

After Partial Inlining:

void foo() {
  if (!X)
    bar.outlined();
  // rest of the code in foo
}

void bar.outlined() {
  // rest of the code in bar
}
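
The pseudocode above can also be written out by hand as compilable C++. The following is only a sketch of the idea, not output from the pass: the names barOutlined, fooBefore, fooAfter, slowPathCalls and checkEquivalence are hypothetical, the functions return values (unlike the void functions in the summary) so that the two versions can be compared, and the real transformation operates on LLVM IR rather than on source code.

```cpp
#include <cassert>

// Hypothetical counter: how many times the expensive (outlined) body ran.
static int slowPathCalls = 0;

// Original callee: a cheap early return guarding a larger body.
int bar(bool x, int v) {
  if (x)
    return v;        // early return block
  ++slowPathCalls;   // stands in for "rest of code (to be outlined)"
  return v * 2 + 1;
}

// What remains after outlining: only the body behind the guard.
int barOutlined(int v) {
  ++slowPathCalls;
  return v * 2 + 1;
}

// Caller before partial inlining: always pays for the call to bar().
int fooBefore(bool x, int v) { return bar(x, v) + 10; }

// Caller after partial inlining: the guard has been peeled into the caller,
// so the call only happens when the early return is not taken.
int fooAfter(bool x, int v) {
  int r = x ? v : barOutlined(v);
  return r + 10;
}

// Both callers must compute the same result for every input.
void checkEquivalence() {
  for (int v = -3; v <= 3; ++v) {
    assert(fooBefore(true, v) == fooAfter(true, v));
    assert(fooBefore(false, v) == fooAfter(false, v));
  }
}
```

Calling fooAfter with x set to true never reaches barOutlined, which is the saved call overhead the description refers to.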

Here are the numbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode:

Runtime performance (speed)

Workload           Improvement
SPEC2006(C/C++)    0.06% (geomean)
SPEC2017(C/C++)    0.10% (geomean)

Compile time performance for Bootstrapped LLVM

Workload           Improvement
SPEC2006(C/C++)    0.41% (cumulative)
SPEC2017(C/C++)    -0.16% (cumulative)
lnt                0.61% (geomean)

Compile time performance

Workload           Increase
SPEC2006(C/C++)    1.31% (cumulative)
SPEC2017(C/C++)    0.25% (cumulative)

Code size

Workload           Increase
SPEC2006(C/C++)    3.90% (geomean)
SPEC2017(C/C++)    1.05% (geomean)

NOTE1: Code size increase in SPEC2006 was mainly attributed to benchmark
"astar", which increased by 86%. Removing this outlier, we get a more
reasonable increase of 0.58%.
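
To see how a single benchmark can dominate a geometric mean, here is a rough sketch. The per-benchmark size ratios are not given in the summary, so the inputs are assumptions made purely for illustration: a suite of 19 benchmarks, 18 of which grow by a uniform 0.58%, plus one outlier that grows by 86%. The helpers geomean and makeRatios are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of code-size ratios (new size / old size).
double geomean(const std::vector<double> &ratios) {
  assert(!ratios.empty());
  double logSum = 0.0;
  for (double r : ratios)
    logSum += std::log(r);
  return std::exp(logSum / static_cast<double>(ratios.size()));
}

// Illustrative data only: (count - 1) benchmarks at a typical ratio plus one
// outlier. These are NOT the measured per-benchmark numbers.
std::vector<double> makeRatios(std::size_t count, double typical,
                               double outlier) {
  assert(count >= 1);
  std::vector<double> ratios(count - 1, typical);
  ratios.push_back(outlier);
  return ratios;
}
```

Under these assumed inputs, geomean(makeRatios(19, 1.0058, 1.86)) comes out a little under 1.04, while the 18 typical entries alone give 1.0058, so one 86% outlier is enough to move the suite-level figure by several percentage points.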

Diff Detail

Repository: rL LLVM

Event Timeline

gyiu created this revision. Nov 26 2017, 9:54 PM

Herald added a subscriber: mehdi_amini. Nov 26 2017, 9:54 PM

haicheng added a subscriber: haicheng. Nov 27 2017, 10:50 AM

jtony added inline comments. Nov 27 2017, 11:53 AM

lib/Passes/PassBuilder.cpp, line 160:

Minor nit: this line exceeds 80 columns, but the format still looks better than the result from clang-format. I am not sure whether we should strictly abide by the 80-column rule here (perhaps other, more experienced reviewers can comment). If we always want to abide by the 80-column rule, maybe we could format it like the following:

static cl::opt<bool> RunPartialInliningThinLTOPreLink("enable-npm-lto-prelink-partial-inlining", cl::init(true), cl::Hidden, cl::ZeroOrMore, cl::desc("Run Partial inlining pass during thinLTO prelink"));

FYI, this is the result from clang-format:

static cl::opt<bool> RunPartialInliningThinLTOPreLink( "enable-npm-lto-prelink-partial-inlining", cl::init(true), cl::Hidden, cl::ZeroOrMore, cl::desc("Run Partial inlining pass during thinLTO prelink"));

lib/Transforms/IPO/PassManagerBuilder.cpp, line 47:

Same formatting nit here as above.

Doesn't this patch do more than turn partial inlining on by default, i.e. take away the ability to turn it off on the command line? I think we should have a separate option to turn it on (on by default) and another to also run it on thin LTO pre-link (off by default).

@nemanjai Good eye, but actually there's already an option to disable partial inlining somewhere else (the options to enable and disable are in different files, go figure). It is defined in 'lib/Transforms/IPO/PartialInlining.cpp' and called '-disable-partial-inlining'.

In D40477#939528, @gyiu wrote:

@nemanjai Good eye, but actually there's already an option to disable partial inlining somewhere else (the options to enable and disable are in different files, go figure). It is defined in 'lib/Transforms/IPO/PartialInlining.cpp' and called '-disable-partial-inlining'.

Ah OK. So we're adding it to the pass pipeline unconditionally, but can use -disable-partial-inlining to make the pass not do anything.

FWIW that's typically how we do things.

This definitely needs numbers - across multiple platforms. Both performance and size of resultant binaries.

In D40477#939585, @echristo wrote:

FWIW that's typically how we do things.

Yeah, which is why this patch seemed a bit surprising. It appears that we were previously conditionally adding it to the pipeline and had an option to turn it on/off in the pass itself. Now it's a bit of a hybrid approach and I think that warrants a comment in the pass builder:

It is unconditionally added to the non-LTO pipeline
It is conditionally added to the pre-link step in the LTO pipeline
The pass itself has a command line option to disable it (everywhere)
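
The three points above can be modelled with plain booleans. This is only a sketch under stated assumptions: PipelineConfig and the function names are hypothetical and not LLVM API, and the fields merely stand in for PrepareForThinLTO, the pre-link option added by this patch, and the existing -disable-partial-inlining option. It also shows that the condition as written in PassManagerBuilder.cpp is logically equivalent to a shorter form.

```cpp
#include <cassert>

// Hypothetical model of the switches discussed in the review.
struct PipelineConfig {
  bool PrepareForThinLTO = false;       // building the thinLTO pre-link pipeline
  bool PreLinkPartialInlining = false;  // models enable-lto-prelink-partial-inlining
  bool DisablePartialInlining = false;  // models -disable-partial-inlining
};

// Pipeline-level decision, spelled the way the patch spells it.
bool addsPassAsWritten(const PipelineConfig &C) {
  return !C.PrepareForThinLTO ||
         (C.PrepareForThinLTO && C.PreLinkPartialInlining);
}

// Logically equivalent shorter form: !A || (A && B) is the same as !A || B.
bool addsPassSimplified(const PipelineConfig &C) {
  return !C.PrepareForThinLTO || C.PreLinkPartialInlining;
}

// The pass only does work if it was added AND the pass-level option
// has not disabled it.
bool passDoesWork(const PipelineConfig &C) {
  return addsPassSimplified(C) && !C.DisablePartialInlining;
}

// Exhaustively compare both spellings over all four input combinations.
bool formsAgree() {
  for (int mask = 0; mask < 4; ++mask) {
    PipelineConfig C;
    C.PrepareForThinLTO = (mask & 1) != 0;
    C.PreLinkPartialInlining = (mask & 2) != 0;
    if (addsPassAsWritten(C) != addsPassSimplified(C))
      return false;
  }
  return true;
}
```

In this model a default-constructed PipelineConfig adds the pass and lets it run, setting only PrepareForThinLTO leaves it out of the pipeline, and setting DisablePartialInlining makes it a no-op wherever it was added.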

In D40477#939596, @nemanjai wrote:

In D40477#939585, @echristo wrote:

FWIW that's typically how we do things.

Yeah, which is why this patch seemed a bit surprising. It appears that we were previously conditionally adding it to the pipeline and had an option to turn it on/off in the pass itself. Now it's a bit of a hybrid approach and I think that warrants a comment in the pass builder:

It is unconditionally added to the non-LTO pipeline

It is conditionally added to the pre-link step in the LTO pipeline

The pass itself has a command line option to disable it (everywhere)

Agreed.

In D40477#939587, @echristo wrote:

This definitely needs numbers - across multiple platforms. Both performance and size of resultant binaries.

There's actually an RFC for this and I've collected numbers on PPC. Someone else collected numbers for ARM. Didn't get any replies on X86.

http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html

In D40477#939609, @gyiu wrote:

In D40477#939587, @echristo wrote:

This definitely needs numbers - across multiple platforms. Both performance and size of resultant binaries.

There's actually an RFC for this and I've collected numbers on PPC. Someone else collected numbers for ARM. Didn't get any replies on X86.

http://lists.llvm.org/pipermail/llvm-dev/2017-November/118752.html

You should update the patch description with the information from the email, so that readers do not have to look in multiple places.

gyiu edited the summary of this revision. Nov 29 2017, 1:30 PM

gyiu edited the summary of this revision. Nov 29 2017, 3:47 PM

gyiu edited the summary of this revision. Nov 29 2017, 3:58 PM

fhahn added a subscriber: fhahn. Dec 9 2017, 4:42 AM

fhahn mentioned this in D38585: Enable interprocedural optimization in libquantum - LLVM-part [WIP]. Dec 9 2017, 4:51 AM

sfertile commandeered this revision. Jan 29 2018, 11:31 AM

sfertile edited reviewers, added: gyiu; removed: sfertile.

Herald added a subscriber: llvm-commits. Jan 29 2018, 11:31 AM

junbuml added a subscriber: junbuml. Jan 29 2018, 11:35 AM

vsk added a subscriber: vsk. Nov 16 2018, 11:41 PM

Herald added a subscriber: dexonsmith. Nov 16 2018, 11:41 PM

Imho CodeExtractor needs a little more work before it's ready to be on-by-default. There are two main blockers: missing debug info support, and a missing whitelist of extract-able intrinsics. The latter poses a high risk for miscompiles (see llvm.org/PR39545, llvm.org/PR39671).

dtzWill added a subscriber: dtzWill. Nov 19 2018, 6:45 AM

In D40477#1302461, @vsk wrote:

Imho CodeExtractor needs a little more work before it's ready to be on-by-default. There are two main blockers: missing debug info support, and a missing whitelist of extract-able intrinsics. The latter poses a high risk for miscompiles (see llvm.org/PR39545, llvm.org/PR39671).

What does the missing debug info support mean? If it means that cloning a function would lose debug information, it looks like two recent patches have resolved this problem (see D86185 and D96531).

I was referring to CodeExtractor. The concerns I had about using it were addressed in D53267, D72801, D72795, and some other follow ups.

In D40477#2664384, @vsk wrote:

I was referring to CodeExtractor. The concerns I had about using it were addressed in D53267, D72801, D72795, and some other follow ups.

It makes sense to keep it turned off if there are so many bugs. Maybe we should reconsider it later.

wenlei added a subscriber: wenlei. Jun 30 2021, 5:53 PM

Herald added a subscriber: ormris. Jun 30 2021, 5:53 PM

Revision Contents

Path                                         Size
lib/Passes/PassBuilder.cpp                   10 lines
lib/Transforms/IPO/PassManagerBuilder.cpp    9 lines

Diff 124322

lib/Passes/PassBuilder.cpp

@@ ... @@
 #include "llvm/Transforms/Vectorize/SLPVectorizer.h"

 #include <type_traits>

 using namespace llvm;

 static cl::opt<unsigned> MaxDevirtIterations("pm-max-devirt-iterations",
                                              cl::ReallyHidden, cl::init(4));

 static cl::opt<bool>
-    RunPartialInlining("enable-npm-partial-inlining", cl::init(false),
+    RunPartialInliningThinLTOPreLink("enable-npm-lto-prelink-partial-inlining", cl::init(true),
                        cl::Hidden, cl::ZeroOrMore,
-                       cl::desc("Run Partial inlinining pass"));
+                       cl::desc("Run Partial inlining pass during thinLTO prelink"));

 static cl::opt<bool>
     RunNewGVN("enable-npm-newgvn", cl::init(false),
               cl::Hidden, cl::ZeroOrMore,
               cl::desc("Run NewGVN instead of GVN"));

 static cl::opt<bool> EnableEarlyCSEMemSSA(
     "enable-npm-earlycse-memssa", cl::init(true), cl::Hidden,
@@ ... @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   ModulePassManager MPM(DebugLogging);

   // Optimize globals now that the module is fully simplified.
   MPM.addPass(GlobalOptPass());
   MPM.addPass(GlobalDCEPass());

   // Run partial inlining pass to partially inline functions that have
   // large bodies.
-  if (RunPartialInlining)
-    MPM.addPass(PartialInlinerPass());
+  MPM.addPass(PartialInlinerPass());

   // Remove avail extern fns and globals definitions since we aren't compiling
   // an object file for later LTO. For LTO we want to preserve these so they
   // are eligible for inlining at link-time. Note if they are unreferenced they
   // will be removed by GlobalDCE later, so this only impacts referenced
   // available externally globals. Eventually they will be suppressed during
   // codegen, but eliminating here enables more opportunity for GlobalDCE as it
   // may make globals referenced by available external functions dead and saves
@@ ... @@ PassBuilder::buildThinLTOPreLinkDefaultPipeline(OptimizationLevel Level,

   // Run partial inlining pass to partially inline functions that have
   // large bodies.
   // FIXME: It isn't clear whether this is really the right place to run this
   // in ThinLTO. Because there is another canonicalization and simplification
   // phase that will run after the thin link, running this here ends up with
   // less information than will be available later and it may grow functions in
   // ways that aren't beneficial.
-  if (RunPartialInlining)
+  if (RunPartialInliningThinLTOPreLink)
     MPM.addPass(PartialInlinerPass());

   // Reduce the size of the IR as much as possible.
   MPM.addPass(GlobalOptPass());

   return MPM;
 }

@@ ... @@

lib/Transforms/IPO/PassManagerBuilder.cpp

@@ ... @@
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
 #include "llvm/Transforms/Vectorize.h"

 using namespace llvm;

 static cl::opt<bool>
-RunPartialInlining("enable-partial-inlining", cl::init(false), cl::Hidden,
-                   cl::ZeroOrMore, cl::desc("Run Partial inlinining pass"));
+RunPartialInliningThinLTOPreLink("enable-lto-prelink-partial-inlining", cl::init(false),
+                   cl::Hidden, cl::ZeroOrMore,
+                   cl::desc("Run Partial inlining pass during thinLTO prelink"));

 static cl::opt<bool>
 RunLoopVectorization("vectorize-loops", cl::Hidden,
                      cl::desc("Run the Loop vectorization passes"));

 static cl::opt<bool>
 RunSLPVectorization("vectorize-slp", cl::Hidden,
                     cl::desc("Run the SLP vectorization passes"));
@@ ... @@ void PassManagerBuilder::populateModulePassManager(
   addExtensionsToPM(EP_CGSCCOptimizerLate, MPM);
   addFunctionSimplificationPasses(MPM);

   // FIXME: This is a HACK! The inliner pass above implicitly creates a CGSCC
   // pass manager that we are specifically trying to avoid. To prevent this
   // we must insert a no-op module pass to reset the pass manager.
   MPM.add(createBarrierNoopPass());

-  if (RunPartialInlining)
+  // Only run partial inlining when we're either i) not preparing for thinLTO,
+  // or ii) preparing for thinLTO AND we've turned on the CL option.
+  if (!PrepareForThinLTO || (PrepareForThinLTO && RunPartialInliningThinLTOPreLink))
     MPM.add(createPartialInliningPass());

   if (OptLevel > 1 && !PrepareForLTO && !PrepareForThinLTO)
     // Remove avail extern fns and globals definitions if we aren't
     // compiling an object file for later LTO. For LTO we want to preserve
     // these so they are eligible for inlining at link-time. Note if they
     // are unreferenced they will be removed by GlobalDCE later, so
     // this only impacts referenced available externally globals.
@@ ... @@