This is an archive of the discontinued LLVM Phabricator instance.

[Inliner] Add a flag to disable manual alloca merging in the Inliner.
Closed (Public)

Authored by chandlerc on Aug 2 2016, 12:56 AM.

Commits

rGf702d8ecb66f: [Inliner] Add a flag to disable manual alloca merging in the Inliner.
rL278892: [Inliner] Add a flag to disable manual alloca merging in the Inliner.

Summary

The flag is off by default for now (merging stays enabled) while testing takes
place to make sure that we in fact do sufficient stack coloring to fully
obviate the manual array alloca merging.
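To make the stack coloring argument concrete, here is a minimal C++ sketch of lifetime-based slot sharing. It is not LLVM's StackColoring pass: it assumes each alloca has already been reduced to a size and a half-open live range [Start, End) taken from its lifetime markers, and for simplicity it only lets same-sized allocas share a slot. `AllocaRange` and `colorStackSlots` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one alloca: its size in bytes and the half-open
// interval [Start, End) between its lifetime.start and lifetime.end markers.
struct AllocaRange {
  std::size_t Size;
  int Start;
  int End;
};

// Greedy interval coloring: returns, for each alloca, the index of the stack
// slot it is assigned to. Two allocas share a slot only if they have the same
// size (a simplification) and their live ranges do not overlap.
std::vector<int> colorStackSlots(const std::vector<AllocaRange> &Allocas) {
  // Visit allocas in order of increasing start point.
  std::vector<std::size_t> Order(Allocas.size());
  for (std::size_t I = 0; I < Order.size(); ++I)
    Order[I] = I;
  std::stable_sort(Order.begin(), Order.end(),
                   [&](std::size_t L, std::size_t R) {
                     return Allocas[L].Start < Allocas[R].Start;
                   });

  struct Slot {
    std::size_t Size;
    int BusyUntil; // End of the last live range placed in this slot.
  };
  std::vector<Slot> Slots;
  std::vector<int> Assignment(Allocas.size(), -1);

  for (std::size_t Idx : Order) {
    const AllocaRange &A = Allocas[Idx];
    assert(A.Start <= A.End && "malformed live range");
    int Chosen = -1;
    for (std::size_t S = 0; S < Slots.size(); ++S) {
      // Reuse a same-sized slot whose previous occupant is already dead.
      if (Slots[S].Size == A.Size && Slots[S].BusyUntil <= A.Start) {
        Chosen = static_cast<int>(S);
        break;
      }
    }
    if (Chosen < 0) {
      Slots.push_back({A.Size, A.End}); // No free slot: allocate a new one.
      Chosen = static_cast<int>(Slots.size()) - 1;
    } else {
      Slots[Chosen].BusyUntil = A.End;
    }
    Assignment[Idx] = Chosen;
  }
  return Assignment;
}
```

For example, three 400-byte arrays live over [0, 10), [10, 20) and [5, 15) need only two slots: the first two have disjoint lifetimes and share slot 0, while the third overlaps both and gets slot 1. That is the same kind of sharing the inliner's manual merging aims for, but decided in the backend, after the IR optimizers have had the benefit of distinct allocas.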

Some context on why we should be using stack coloring rather than
merging allocas in this way:

LLVM relies very heavily on analyzing pointers as coming from different
allocas in order to make aliasing decisions. These are some of the most
powerful aliasing signals available in LLVM. So merging allocas is an
extremely destructive operation on the LLVM IR: it takes away highly
valuable and hard-to-reconstruct information.
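A toy model of the aliasing rule in question: accesses whose underlying objects are two different allocas cannot alias, however they are indexed. This is a sketch, not LLVM's AliasAnalysis interface; `Alloca`, `Pointer`, and `alias` are hypothetical, and the model ignores everything except the underlying-object check and constant indices.

```cpp
#include <cassert>

enum class AliasResult { NoAlias, MayAlias };

// Hypothetical stand-in for a stack allocation.
struct Alloca {
  int Id;
};

// A pointer is modeled as an underlying alloca plus an element index that may
// not be known at compile time (for example, a variable array subscript).
struct Pointer {
  const Alloca *Base;
  bool IndexIsConstant;
  long Index; // Meaningful only when IndexIsConstant is true.
};

AliasResult alias(const Pointer &A, const Pointer &B) {
  assert(A.Base && B.Base && "every pointer needs an underlying alloca");
  // Distinct allocas are distinct objects: no access through A can touch B.
  if (A.Base != B.Base)
    return AliasResult::NoAlias;
  // Same object: two different known indices still name different elements.
  if (A.IndexIsConstant && B.IndexIsConstant && A.Index != B.Index)
    return AliasResult::NoAlias;
  // Anything else is treated conservatively.
  return AliasResult::MayAlias;
}
```

With separate arrays `X` and `Y`, `X[i]` and `Y[j]` come back `NoAlias` even when `i` and `j` are unknown. After merging `Y` into `X`, the same accesses become `X[i]` and `X[j]` and the answer degrades to `MayAlias`, which is exactly the situation for the variable-indexed arrays the inliner chooses to merge.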

As a consequence, inlined functions that happen to have array allocas
matched by this pattern cannot be properly interleaved unless SROA
manages to promote everything to SSA values. Instead, the inliner will
have added an unnecessary dependence requiring one inlined function to
execute after the other, because both will have been rewritten to refer
to the same memory.
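The false dependence can be demonstrated in plain C++. In this hypothetical example, two "inlined bodies" each fill a scratch array and then sum it, with their phases interleaved the way an optimizer might reorder them once it knows the arrays are distinct. With separate arrays the interleaving is harmless; with one shared array, standing in for the merged alloca, body 2's stores clobber body 1's data, so the bodies would have to run one after the other. `runInterleaved` and `checkInterleaving` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <utility>

constexpr int N = 4;
using Scratch = std::array<int, N>;

// Two "inlined bodies": body 1 fills Buf1 with 1s and sums it, body 2 fills
// Buf2 with 10s and sums it. Their phases are interleaved, which is only
// valid if Buf1 and Buf2 are different memory.
std::pair<int, int> runInterleaved(Scratch &Buf1, Scratch &Buf2) {
  for (int I = 0; I < N; ++I) {
    Buf1[I] = 1;  // Body 1: fill phase.
    Buf2[I] = 10; // Body 2: fill phase.
  }
  int Sum1 = 0, Sum2 = 0;
  for (int I = 0; I < N; ++I) {
    Sum1 += Buf1[I]; // Body 1: sum phase.
    Sum2 += Buf2[I]; // Body 2: sum phase.
  }
  return {Sum1, Sum2};
}

void checkInterleaving() {
  // Separate storage: each body sees only its own data.
  Scratch A{}, B{};
  assert(runInterleaved(A, B) == std::make_pair(4, 40));

  // Shared ("merged") storage: body 2's stores overwrite body 1's, so body 1
  // sums 10s instead of 1s. The bodies must run one after the other instead.
  Scratch Shared{};
  assert(runInterleaved(Shared, Shared) == std::make_pair(40, 40));
}
```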

All that said, folks will reasonably want some time to experiment here
and make sure there are no significant regressions. A flag should give
us an easy knob to test.

For more context, see the thread here:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/103277.html
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103285.html

Diff Detail

Repository: rL LLVM

Event Timeline

chandlerc updated this revision to Diff 66435. Aug 2 2016, 12:56 AM

chandlerc retitled this revision to "[Inliner] Add a flag to disable manual alloca merging in the Inliner."

chandlerc updated this object.

chandlerc added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. Aug 2 2016, 12:56 AM

Pulling it out into a separate function seems like a nice cleanup anyway. That seems fine regardless of whether we ultimately remove the logic.

And the flag is off by default and easy enough to remove. So LGTM.

Any initial results from disabling the alloca merging?
Also, do you know if we have any PRs related to the information loss caused by this alloca merging?

This revision is now accepted and ready to land. Aug 2 2016, 1:14 AM

mcrosier edited edge metadata. Aug 8 2016, 12:47 PM

mcrosier added a subscriber: bmakam.

@bmakam: Would you mind downloading and testing this patch? Please do full correctness and SPEC200X performance (our head-to-head methodology with train input would be fine).

@bmakam replied:

Tested on Kryo. Only these SPEC200X benchmarks had non-noise performance gains (runtime) with this change:

spec2006/soplex:train	[-2.347%, +3.157%]
spec2006/sphinx3:train	[+0.469%, +2.124%]
spec2006/xalancbmk:train	[+6.566%, +8.743%]

There were no regressions.

@mcrosier replied:

Thanks, Balaram. No correctness issues, correct?

@bmakam replied:

Oh, I forgot to run full correctness. There were no correctness issues in the SPEC200X benchmarks. I will run the full correctness tests.

@bmakam followed up:

Just finished running full correctness tests. There were no correctness issues found in my tests.
Thanks,
Balaram

Closed by commit rL278892: [Inliner] Add a flag to disable manual alloca merging in the Inliner. (authored by chandlerc). Aug 16 2016, 7:48 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                        Size
llvm/trunk/lib/Transforms/IPO/Inliner.cpp   115 lines

Diff 68299

llvm/trunk/lib/Transforms/IPO/Inliner.cpp

@@ 42 unchanged lines hidden @@
 STATISTIC(NumDeleted, "Number of functions deleted because all callers found");
 STATISTIC(NumMergedAllocas, "Number of allocas merged together");
 
 // This weirdly named statistic tracks the number of times that, when attempting
 // to inline a function A into B, we analyze the callers of B in order to see
 // if those would be more profitable and blocked inline steps.
 STATISTIC(NumCallerCallersAnalyzed, "Number of caller-callers analyzed");
 
+/// Flag to disable manual alloca merging.
+///
+/// Merging of allocas was originally done as a stack-size saving technique
+/// prior to LLVM's code generator having support for stack coloring based on
+/// lifetime markers. It is now in the process of being removed. To experiment
+/// with disabling it and relying fully on lifetime marker based stack
+/// coloring, you can pass this flag to LLVM.
+static cl::opt<bool>
+    DisableInlinedAllocaMerging("disable-inlined-alloca-merging",
+                                cl::init(false), cl::Hidden);
+
 namespace {
 enum class InlinerFunctionImportStatsOpts {
   No = 0,
   Basic = 1,
   Verbose = 2,
 };
 
 cl::opt<InlinerFunctionImportStatsOpts> InlinerFunctionImportStats(
@@ 20 unchanged lines hidden @@ void Inliner::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired<ProfileSummaryInfoWrapperPass>();
   AU.addRequired<TargetLibraryInfoWrapperPass>();
   getAAResultsAnalysisUsage(AU);
   CallGraphSCCPass::getAnalysisUsage(AU);
 }
 
 typedef DenseMap<ArrayType *, std::vector<AllocaInst *>> InlinedArrayAllocasTy;
 
-/// If it is possible to inline the specified call site,
-/// do so and update the CallGraph for this operation.
-///
-/// This function also does some basic book-keeping to update the IR. The
-/// InlinedArrayAllocas map keeps track of any allocas that are already
-/// available from other functions inlined into the caller. If we are able to
-/// inline this call site we attempt to reuse already available allocas or add
-/// any new allocas to the set if not possible.
-static bool InlineCallIfPossible(
-    CallSite CS, InlineFunctionInfo &IFI,
-    InlinedArrayAllocasTy &InlinedArrayAllocas, int InlineHistory,
-    bool InsertLifetime, function_ref<AAResults &(Function &)> AARGetter,
-    ImportedFunctionsInliningStatistics &ImportedFunctionsStats) {
-  Function *Callee = CS.getCalledFunction();
-  Function *Caller = CS.getCaller();
-
-  AAResults &AAR = AARGetter(*Callee);
-
-  // Try to inline the function. Get the list of static allocas that were
-  // inlined.
-  if (!InlineFunction(CS, IFI, &AAR, InsertLifetime))
-    return false;
-
-  if (InlinerFunctionImportStats != InlinerFunctionImportStatsOpts::No)
-    ImportedFunctionsStats.recordInline(Caller, Callee);
-
-  AttributeFuncs::mergeAttributesForInlining(Caller, Callee);
-
-  // Look at all of the allocas that we inlined through this call site. If we
-  // have already inlined other allocas through other calls into this function,
-  // then we know that they have disjoint lifetimes and that we can merge them.
-  //
-  // There are many heuristics possible for merging these allocas, and the
-  // different options have different tradeoffs. One thing that we really
-  // don't want to hurt is SRoA: once inlining happens, often allocas are no
-  // longer address taken and so they can be promoted.
-  //
-  // Our "solution" for that is to only merge allocas whose outermost type is an
-  // array type. These are usually not promoted because someone is using a
-  // variable index into them. These are also often the most important ones to
-  // merge.
-  //
-  // A better solution would be to have real memory lifetime markers in the IR
-  // and not have the inliner do any merging of allocas at all. This would
-  // allow the backend to do proper stack slot coloring of all allocas that
-  // actually make it to the backend, which is really what we want.
-  //
-  // Because we don't have this information, we do this simple and useful hack.
-  //
+/// Look at all of the allocas that we inlined through this call site. If we
+/// have already inlined other allocas through other calls into this function,
+/// then we know that they have disjoint lifetimes and that we can merge them.
+///
+/// There are many heuristics possible for merging these allocas, and the
+/// different options have different tradeoffs. One thing that we really
+/// don't want to hurt is SRoA: once inlining happens, often allocas are no
+/// longer address taken and so they can be promoted.
+///
+/// Our "solution" for that is to only merge allocas whose outermost type is an
+/// array type. These are usually not promoted because someone is using a
+/// variable index into them. These are also often the most important ones to
+/// merge.
+///
+/// A better solution would be to have real memory lifetime markers in the IR
+/// and not have the inliner do any merging of allocas at all. This would
+/// allow the backend to do proper stack slot coloring of all allocas that
+/// actually make it to the backend, which is really what we want.
+///
+/// Because we don't have this information, we do this simple and useful hack.
+static void mergeInlinedArrayAllocas(
+    Function *Caller, InlineFunctionInfo &IFI,
+    InlinedArrayAllocasTy &InlinedArrayAllocas, int InlineHistory) {
   SmallPtrSet<AllocaInst *, 16> UsedAllocas;
 
   // When processing our SCC, check to see if CS was inlined from some other
   // call site. For example, if we're processing "A" in this code:
   // A() { B() }
   // B() { x = alloca ... C() }
   // C() { y = alloca ... }
   // Assume that C was not inlined into B initially, and so we're processing A
   // and decide to inline B into A. Doing this makes an alloca available for
   // reuse and makes a callsite (C) available for inlining. When we process
   // the C call site we don't want to do any alloca merging between X and Y
   // because their scopes are not disjoint. We could make this smarter by
   // keeping track of the inline history for each alloca in the
   // InlinedArrayAllocas but this isn't likely to be a significant win.
   if (InlineHistory != -1) // Only do merging for top-level call sites in SCC.
-    return true;
+    return;
 
   // Loop over all the allocas we have so far and see if they can be merged with
   // a previously inlined alloca. If not, remember that we had it.
   for (unsigned AllocaNo = 0, e = IFI.StaticAllocas.size(); AllocaNo != e;
        ++AllocaNo) {
     AllocaInst *AI = IFI.StaticAllocas[AllocaNo];
 
     // Don't bother trying to merge array allocations (they will usually be
@@ 69 unchanged lines hidden @@ for (unsigned AllocaNo = 0, e = IFI.StaticAllocas.size(); AllocaNo != e;
     // If we were unable to merge away the alloca either because there are no
     // allocas of the right type available or because we reused them all
     // already, remember that this alloca came from an inlined function and mark
     // it used so we don't reuse it for other allocas from this inline
     // operation.
     AllocasForType.push_back(AI);
     UsedAllocas.insert(AI);
   }
+}
+
+/// If it is possible to inline the specified call site,
+/// do so and update the CallGraph for this operation.
+///
+/// This function also does some basic book-keeping to update the IR. The
+/// InlinedArrayAllocas map keeps track of any allocas that are already
+/// available from other functions inlined into the caller. If we are able to
+/// inline this call site we attempt to reuse already available allocas or add
+/// any new allocas to the set if not possible.
+static bool InlineCallIfPossible(
+    CallSite CS, InlineFunctionInfo &IFI,
+    InlinedArrayAllocasTy &InlinedArrayAllocas, int InlineHistory,
+    bool InsertLifetime, function_ref<AAResults &(Function &)> &AARGetter,
+    ImportedFunctionsInliningStatistics &ImportedFunctionsStats) {
+  Function *Callee = CS.getCalledFunction();
+  Function *Caller = CS.getCaller();
+
+  AAResults &AAR = AARGetter(*Callee);
+
+  // Try to inline the function. Get the list of static allocas that were
+  // inlined.
+  if (!InlineFunction(CS, IFI, &AAR, InsertLifetime))
+    return false;
+
+  if (InlinerFunctionImportStats != InlinerFunctionImportStatsOpts::No)
+    ImportedFunctionsStats.recordInline(Caller, Callee);
+
+  AttributeFuncs::mergeAttributesForInlining(Caller, Callee);
+
+  if (!DisableInlinedAllocaMerging)
+    mergeInlinedArrayAllocas(Caller, IFI, InlinedArrayAllocas, InlineHistory);
 
   return true;
 }
 
 static void emitAnalysis(CallSite CS, OptimizationRemarkEmitter &ORE,
                          const Twine &Msg) {
   ORE.emitOptimizationRemarkAnalysis(DEBUG_TYPE, CS.getInstruction(), Msg);
 }
@@ 498 unchanged lines hidden @@
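For readers who want to see the control flow of the change without an LLVM build, here is a simplified, standalone C++ model of `mergeInlinedArrayAllocas` and of the new `DisableInlinedAllocaMerging` guard in `InlineCallIfPossible`. It is a sketch under stated assumptions, not the LLVM code: `FakeAlloca`, `ArrayAllocaMap`, `mergeArrayAllocas`, and `afterSuccessfulInline` are hypothetical names, a plain `bool` stands in for the `cl::opt`, types are compared by name, and "merging" is only recorded in `MergedInto` rather than performed by rewriting uses. Because 69 lines of the real loop are hidden in this diff, the model covers only the visible behavior: the top-level call site check, the array-type filter, reuse of an available alloca not yet claimed by this inline step, and remembering allocas that could not be merged.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the cl::opt<bool> added in the diff.
static bool DisableInlinedAllocaMerging = false;

// Hypothetical stand-in for an AllocaInst.
struct FakeAlloca {
  std::string TypeName;   // Allocated type, e.g. "[100 x i32]".
  bool IsArrayType;       // Is the outermost allocated type an array?
  FakeAlloca *MergedInto; // Set when this alloca is folded into another.
};

// Plays the role of InlinedArrayAllocasTy: array type -> allocas already
// available in the caller from earlier inline steps.
using ArrayAllocaMap = std::map<std::string, std::vector<FakeAlloca *>>;

// Simplified model of mergeInlinedArrayAllocas. Returns how many of the
// newly inlined allocas were folded into previously available ones.
unsigned mergeArrayAllocas(const std::vector<FakeAlloca *> &StaticAllocas,
                           ArrayAllocaMap &InlinedArrayAllocas,
                           int InlineHistory) {
  // Only do merging for top-level call sites in the SCC.
  if (InlineHistory != -1)
    return 0;

  std::set<FakeAlloca *> UsedAllocas; // Allocas claimed by this inline step.
  unsigned NumMerged = 0;
  for (FakeAlloca *AI : StaticAllocas) {
    assert(AI && "null entry in StaticAllocas");
    if (!AI->IsArrayType)
      continue; // Leave non-array allocas alone so SROA can promote them.

    std::vector<FakeAlloca *> &AllocasForType =
        InlinedArrayAllocas[AI->TypeName];
    FakeAlloca *Reused = nullptr;
    for (FakeAlloca *Available : AllocasForType) {
      if (UsedAllocas.insert(Available).second) { // Not yet used this step.
        Reused = Available;
        break;
      }
    }
    if (Reused) {
      AI->MergedInto = Reused; // Stands in for rewriting uses and erasing AI.
      ++NumMerged;
      continue;
    }
    // Nothing to reuse: remember this alloca for later inline steps.
    AllocasForType.push_back(AI);
    UsedAllocas.insert(AI);
  }
  return NumMerged;
}

// Mirrors the new guard at the end of InlineCallIfPossible.
unsigned afterSuccessfulInline(const std::vector<FakeAlloca *> &StaticAllocas,
                               ArrayAllocaMap &InlinedArrayAllocas,
                               int InlineHistory) {
  if (!DisableInlinedAllocaMerging)
    return mergeArrayAllocas(StaticAllocas, InlinedArrayAllocas, InlineHistory);
  return 0;
}
```

For example, if one top-level inline step brings in a single `[100 x i32]` array and a later step brings in two more plus a scalar, the model folds the first new array into the existing one, records the second as newly available, and ignores the scalar. With `DisableInlinedAllocaMerging` set, nothing is folded and every array keeps its own alloca, leaving any stack sharing to lifetime-based stack coloring.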