This is an archive of the discontinued LLVM Phabricator instance.

[Inliner] Teach the inliner to propagate attributes that have specific effects on inlining thresholds when we happen to inline into the entry (extended) basic block.
Needs Review · Public

Authored by chandlerc on Aug 14 2017, 6:04 PM.

Details

Summary

The fundamental idea here is to address a fragility in the bottom-up
inliner. When inlining a call edge it can *erase* information which
would otherwise very dramatically impact the inlining heuristics. We've
seen this before with noalias and alignment parameter attributes and
addressed them with scoped aliasing information and assumptions
respectively.

However, other information may be lost as well: the inlinehint and
cold attributes, which can have a significant impact on inlining
thresholds. Specifically, there is an asymmetry in how these end up
being handled depending on whether we happen to successfully defer
inlining during the bottom-up walk or not.

If we defer inlining during the bottom-up walk, we will *also* end up
"propagating" the inlinehint and cold attributes up one level of the
call graph by inlining the wrapper function first and then re-considering
the new call edge, which still has these attributes.

If we *don't* defer inlining, then we will inline into the function and
lose these attributes. Later on, we may simplify away everything else
and end up forming an *exact copy* of the original function body. But
because the inlining step lost these attributes, this copy of the
original function body will inline substantially differently (either too
little in the case of inlinehint or too much in the case of cold).
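To make the failure mode concrete, here is a hypothetical C++ sketch
(the function names are invented for illustration) of a trivial wrapper
around a cold function:

```cpp
// Hypothetical example. The GCC/Clang 'cold' attribute lowers to the
// 'cold' IR function attribute.
__attribute__((cold)) void report_error(const char *Msg);

// Trivial wrapper: if the call below is inlined (rather than deferred),
// the 'cold' attribute on report_error is erased at this callsite. If
// handle_error later simplifies down to an exact copy of report_error's
// body, that copy is no longer seen as cold and will be considered for
// inlining with ordinary (too aggressive) thresholds.
void handle_error(const char *Msg) { report_error(Msg); }
```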

With this patch, we try to detect the obviously safe cases where we are
inlining into a callsite that could easily simplify into a trivial
wrapper by looking for callsites in the (extended) entry block. This
also makes it easy to ensure that the callsite is actually reached,
which matters for the cold attribute since coldness may be a dynamic
property.
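For illustration only, a minimal sketch of this kind of propagation,
assuming LLVM's current C++ API. It checks only the plain entry block
rather than the extended one, and it is not the patch's actual code;
the helper name is invented:

```cpp
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Hypothetical helper, called before the callsite is erased by inlining:
// copy threshold-affecting attributes from Callee up to Caller when the
// callsite sits in the caller's entry block. (The real patch considers
// the *extended* entry block; this sketch keeps it simple.)
static void propagateEntryBlockAttrs(Function &Caller, const Function &Callee,
                                     const Instruction &CallSiteInst) {
  // Being in the entry block guarantees the call executes whenever the
  // caller does, which matters for a dynamic property like 'cold'.
  if (CallSiteInst.getParent() != &Caller.getEntryBlock())
    return;

  // A trivial wrapper around an inlinehint callee should inherit the
  // hint so later inlining decisions match a direct call to the callee.
  if (Callee.hasFnAttribute(Attribute::InlineHint))
    Caller.addFnAttr(Attribute::InlineHint);

  // If an unconditionally reached callee is cold, the caller is too.
  if (Callee.hasFnAttribute(Attribute::Cold))
    Caller.addFnAttr(Attribute::Cold);
}
```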

The goal here is mostly to make inlining decisions a bit more
predictable. We should still work on improving the logic around
deferring inlining during the bottom-up walk to find better inlining
candidates; this patch is just a useful backstop to ensure that, in
addition to missing the deferral opportunity, we don't *also* erase
valuable inlining attributes.

An example of a case today that is highly surprising (and motivates this
patch) would be a trivial lambda function passed to STL algorithms. The
operator()s on this lambda object may not be marked with inlinehint,
but they may be trivial wrappers around some other function which is
marked with inlinehint. It is then surprising when the inliner
"erases" this information and makes different inlining decisions (in
either direction) through the wrapper than it would through a direct
call to the member function. This pattern has actually shown up in a few
benchmarks.
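A hypothetical version of that pattern (names invented for illustration;
Clang lowers the 'inline' keyword to the inlinehint IR attribute):

```cpp
#include <algorithm>
#include <vector>

// transform_one carries an inline hint, but the lambda's operator()
// wrapping it below does not.
inline int transform_one(int X) { return X * 2 + 1; }

void transform_all(std::vector<int> &V) {
  // The lambda is a trivial wrapper around transform_one; once the call
  // inside operator() is inlined, the hint is erased, and the resulting
  // body may be inlined differently than a direct call to transform_one.
  std::transform(V.begin(), V.end(), V.begin(),
                 [](int X) { return transform_one(X); });
}
```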

So far, initial benchmarking has shown no substantial performance swings
in either direction, but I'm still collecting more precise numbers. I
just wanted to get this out for review.

Depends on D36722.

Event Timeline

chandlerc created this revision.Aug 14 2017, 6:04 PM
davidxl edited edge metadata.Aug 14 2017, 6:20 PM

I will provide more comments on propagating the inline hint later. However, I do think it is wrong to propagate the cold attribute in the inliner -- there should already be an inter-procedural attribute propagation pass that does this.

> I will provide more comments on propagating the inline hint later. However, I do think it is wrong to propagate the cold attribute in the inliner -- there should already be an inter-procedural attribute propagation pass that does this.

While I'd love to teach our IPO attribute propagation to do that, we might still need to do it here. The inliner might make the opportunity for this propagation visible and then delete the call, removing the chance to do it within a single pass run, so the interprocedural pass never gets a chance to see the intermediate state.

sanjoy resigned from this revision.Jan 29 2022, 5:33 PM