This is an archive of the discontinued LLVM Phabricator instance.

Differential D8267

Cleanup early-exit from analyzeCall
Needs Review (Public)

Authored by eraman on Mar 11 2015, 2:19 PM.


This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

This makes the following changes to the early-bailout code:

Speculatively apply all possible bonuses to Threshold up front. Whenever we know a certain bonus cannot apply, subtract it from Threshold. Note that we still do not process the entire function; the early bailout may just happen later than before.
The current code calculates the 10% and 50% vector bonuses after the single-basic-block bonus has already been added to the original threshold. I think expressing all the bonuses in terms of the original threshold is less confusing.
Set the threshold to 0 instead of 1 when an unreachable instruction is seen after the call. This has no practical effect, but keeps the code in line with the comment that precedes it.
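To illustrate the second point, here is a small standalone C++ check. It is a sketch, not code from the patch, and the names `VectorBonuses`, `oldBonuses`, `newBonuses` and `bonusesAgreeUpTo` are hypothetical. `oldBonuses` mimics the pre-patch ordering, where the single-basic-block bonus `T / 2` is added to the threshold before both vector bonuses are derived from it; `newBonuses` uses the expressions from this diff, `3 * T / 2` and `3 * T / 4`. A case analysis on `T mod 4` shows the two agree for every non-negative `T` under C++ integer division; the loop simply confirms that numerically.

```cpp
#include <cassert>

// Hypothetical pair of vector bonuses derived from an original
// (pre-bonus) inline threshold T.
struct VectorBonuses {
  int FiftyPercent;
  int Ten;
};

// Pre-patch ordering: the single-BB bonus (T / 2) is added to the threshold
// first, and both vector bonuses are then derived from the raised value.
VectorBonuses oldBonuses(int T) {
  assert(T >= 0 && "thresholds are assumed to be non-negative");
  int Threshold = T;
  int SingleBBBonus = Threshold / 2;
  Threshold += SingleBBBonus;
  return {Threshold, Threshold / 2};
}

// Post-patch: both vector bonuses are written directly in terms of T.
VectorBonuses newBonuses(int T) {
  assert(T >= 0 && "thresholds are assumed to be non-negative");
  return {3 * T / 2, 3 * T / 4};
}

// True if both formulations give identical bonuses for every T in [0, Limit].
bool bonusesAgreeUpTo(int Limit) {
  for (int T = 0; T <= Limit; ++T) {
    VectorBonuses A = oldBonuses(T);
    VectorBonuses B = newBonuses(T);
    if (A.FiftyPercent != B.FiftyPercent || A.Ten != B.Ten)
      return false;
  }
  return true;
}
```

For example, with a threshold of 225 both formulations yield a 50% bonus of 337 and a 10% bonus of 168, so the rewrite changes how the bonuses are expressed, not their values.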

Diff Detail

Repository: rL LLVM

Event Timeline

eraman updated this revision to Diff 21758 (Mar 11 2015, 2:19 PM).

eraman retitled this revision to Cleanup early-exit from analyzeCall.

eraman updated this object.

eraman edited the test plan for this revision.

eraman set the repository for this revision to rL LLVM.

eraman added a subscriber: Unknown Object (MLST).

I think it would be even clearer to reverse everything.

Add all of the possible bonuses to the threshold up front, and subtract each one when it ceases to apply, including, at the very end, any part of the vector bonus that turned out not to apply.

That makes it clearer that this doesn't meaningfully change the bounds on how much of the function we look at, and that there aren't two thresholds at all. We just need the threshold to always include the maximum of all the bonuses that could still apply.

(It would also be good to update the commit message to make clear that this doesn't cause us to look at the entire function; it just assumes the maximum bonuses will apply until proven wrong.)
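A minimal sketch of the scheme described above, written as a standalone function rather than as `CallAnalyzer` code. The assumptions are mine, not the patch's: each callee block is reduced to a hypothetical `Block` record holding its cost and successor count, block costs are non-negative so `Cost` never decreases during the walk, the single-BB bonus is given back at the first block with more than one successor, and the vector bonus that actually applies is passed in as `ActualVectorBonus` instead of being derived from instruction counts. Because `Threshold` starts at its maximum and only shrinks, bailing out as soon as `Cost > Threshold` never rejects a call that the final comparison would have accepted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of one callee basic block.
struct Block {
  int Cost;               // cost charged for the block's instructions
  unsigned NumSuccessors; // number of successors of the block's terminator
};

// Returns true if the call would be inlined under this simplified model.
bool shouldInline(int BaseThreshold, const std::vector<Block> &Blocks,
                  int ActualVectorBonus) {
  const int SingleBBBonus = BaseThreshold / 2;
  const int FiftyPercentVectorBonus = 3 * BaseThreshold / 2;
  assert(ActualVectorBonus >= 0 &&
         ActualVectorBonus <= FiftyPercentVectorBonus);

  // Speculatively apply every bonus that could still apply.
  int Threshold = BaseThreshold + SingleBBBonus + FiftyPercentVectorBonus;
  bool SingleBB = true;
  int Cost = 0;

  for (const Block &B : Blocks) {
    // Threshold only shrinks and Cost only grows, so this exit is safe.
    if (Cost > Threshold)
      return false;
    assert(B.Cost >= 0 && "the early exit relies on Cost never decreasing");
    Cost += B.Cost;
    // Once the callee has more than one block, the single-BB bonus is gone.
    if (SingleBB && B.NumSuccessors > 1) {
      Threshold -= SingleBBBonus;
      SingleBB = false;
    }
  }

  // Give back the part of the maximal vector bonus that did not apply.
  Threshold -= FiftyPercentVectorBonus - ActualVectorBonus;
  return Cost < Threshold;
}
```

With a base threshold of 35 the walk starts at 104 (35 + 17 + 52); a two-block callee costing 45 is rejected without any vector bonus (final threshold 35) but accepted with the full one (final threshold 87).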

Based on Chandler's comments, removed MaxThreshold, speculatively bumped up Threshold to include the maximum possible vector bonus, and readjusted the threshold once the actual vector bonus could be determined.

eraman added a reviewer: chandlerc (Mar 13 2015, 1:54 PM).

(sorry for delays)

lib/Analysis/IPA/InlineCost.cpp
1229–1240	How about just moving the comment up above NumVectorInstructions checks, and do two different subtracts from Threshold based on the specific instruction ratios?
1237	Where do you subtract off the single-bb bonus?
test/Transforms/Inline/vector-bonus.ll
2	Did you try running this test without your change? I think you will find that it doesn't fail -- you don't pipe the output of opt to the FileCheck tool, and so we never check anything. I would suggest "testing your test" by ensuring your test fails with the old version of opt first.
36–38	I would use CHECK-LABEL and put the checks inside the function body of @bar so it's clear that there are checks associated with that function.

Sorry about the bad testcase. I verified that inlining didn't happen before and happened after the change by printing debug messages, but never tested the test case as written. I will update it.

lib/Analysis/IPA/InlineCost.cpp
1229–1240	If you mean something like

    if (NumVectorInstructions <= NumInstructions / 10)
      Threshold -= FiftyPercentVectorBonus;
    else if (NumVectorInstructions <= NumInstructions / 2)
      Threshold -= (FiftyPercentVectorBonus - TenPercentVectorBonus);

then a side effect is that we won't set VectorBonus and won't print its value as part of DEBUG_PRINT_STAT. Not a big deal, but I've found the stats useful. (Or do you want to move the Threshold += VectorBonus to where VectorBonus is set?)
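Both options in this exchange produce the same final threshold; the only difference is whether `VectorBonus` is still recorded for `DEBUG_PRINT_STAT`. The sketch below checks that equivalence by brute force over small instruction counts. It uses hypothetical free functions, not the patch itself: `subtractDirectly` is the two-subtract form quoted above, and `subtractViaVectorBonus` is one assumed way of keeping `VectorBonus` around, by selecting it with the pre-patch ratio tests and then giving back `FiftyPercentVectorBonus - VectorBonus`.

```cpp
#include <cassert>

// Two-subtract form: give back the unused part of the maximal vector bonus
// directly from the speculatively raised threshold.
int subtractDirectly(int Threshold, int Fifty, int Ten, unsigned NumVec,
                     unsigned NumInst) {
  if (NumVec <= NumInst / 10)
    Threshold -= Fifty;
  else if (NumVec <= NumInst / 2)
    Threshold -= (Fifty - Ten);
  return Threshold;
}

// Variant that also reports VectorBonus (e.g. for debug statistics): pick
// the bonus with the pre-patch ratio tests, then subtract the excess.
int subtractViaVectorBonus(int Threshold, int Fifty, int Ten, unsigned NumVec,
                           unsigned NumInst, int &VectorBonus) {
  assert(NumVec <= NumInst && "vector instructions are a subset");
  if (NumVec > NumInst / 2)
    VectorBonus = Fifty;
  else if (NumVec > NumInst / 10)
    VectorBonus = Ten;
  else
    VectorBonus = 0;
  return Threshold - (Fifty - VectorBonus);
}

// Brute-force comparison over all 0 <= NumVec <= NumInst <= Limit.
bool variantsAgree(int Threshold, int Fifty, int Ten, unsigned Limit) {
  for (unsigned NumInst = 0; NumInst <= Limit; ++NumInst)
    for (unsigned NumVec = 0; NumVec <= NumInst; ++NumVec) {
      int VectorBonus = 0;
      int Direct = subtractDirectly(Threshold, Fifty, Ten, NumVec, NumInst);
      int Via = subtractViaVectorBonus(Threshold, Fifty, Ten, NumVec, NumInst,
                                       VectorBonus);
      if (Direct != Via)
        return false;
    }
  return true;
}
```

The sample values 674, 337 and 168 correspond to a base threshold of 225 with every bonus speculatively applied (225 + 112 + 337).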

Fixed the test case and changed the code that computed the revised threshold based on review comments.

Ping.

chandlerc removed a reviewer: chandlerc (Apr 6 2016, 10:44 PM).

Revision Contents

Path	Size
lib/Analysis/IPA/InlineCost.cpp	51 lines
test/Transforms/Inline/vector-bonus.ll	37 lines

Diff 22032

lib/Analysis/IPA/InlineCost.cpp

[945 unchanged lines not shown; enclosing code: for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {]

     // If the caller is a recursive function then we don't want to inline
     // functions which allocate a lot of stack space because it would increase
     // the caller stack usage dramatically.
     if (IsCallerRecursive &&
         AllocatedSize > InlineConstants::TotalAllocaSizeRecursiveCaller)
       return false;

-    if (NumVectorInstructions > NumInstructions/2)
-      VectorBonus = FiftyPercentVectorBonus;
-    else if (NumVectorInstructions > NumInstructions/10)
-      VectorBonus = TenPercentVectorBonus;
-    else
-      VectorBonus = 0;
-
-    // Check if we've past the threshold so we don't spin in huge basic
-    // blocks that will never inline.
-    if (Cost > (Threshold + VectorBonus))
+    // Check if we've past the maximum possible threshold so we don't spin in
+    // huge basic blocks that will never inline.
+    if (Cost > Threshold)
       return false;
   }

   return true;
 }

 /// \brief Compute the base pointer and cumulative constant offsets for V.
 ///
[39 unchanged lines not shown]
 /// Returns true if inlining this call is viable, and false if it is not
 /// viable. It computes the cost and adjusts the threshold based on numerous
 /// factors and heuristics. If this method returns false but the computed cost
 /// is below the computed threshold, then inlining was forcibly disabled by
 /// some artifact of the routine.
 bool CallAnalyzer::analyzeCall(CallSite CS) {
   ++NumCallsAnalyzed;

-  // Track whether the post-inlining function would have more than one basic
-  // block. A single basic block is often intended for inlining. Balloon the
-  // threshold by 50% until we pass the single-BB phase.
-  bool SingleBB = true;
-  int SingleBBBonus = Threshold / 2;
-  Threshold += SingleBBBonus;
-
   // Perform some tweaks to the cost and threshold based on the direct
   // callsite information.

   // We want to more aggressively inline vector-dense kernels, so up the
   // threshold, and we'll lower it if the % of vector instructions gets too
   // low.
   assert(NumInstructions == 0);
   assert(NumVectorInstructions == 0);
-  FiftyPercentVectorBonus = Threshold;
-  TenPercentVectorBonus = Threshold / 2;
+  FiftyPercentVectorBonus = 3 * Threshold / 2;
+  TenPercentVectorBonus = 3 * Threshold / 4;
   const DataLayout &DL = F.getParent()->getDataLayout();

+  // Track whether the post-inlining function would have more than one basic
+  // block. A single basic block is often intended for inlining. Balloon the
+  // threshold by 50% until we pass the single-BB phase.
+  bool SingleBB = true;
+  int SingleBBBonus = Threshold / 2;
+
+  // Speculatively apply all possible bonuses to Threshold. If cost exceeds
+  // this Threshold any time, and cost cannot decrease, we can stop processing
+  // the rest of the function body.
+  Threshold += (SingleBBBonus + FiftyPercentVectorBonus);
+
   // Give out bonuses per argument, as the instructions setting them up will
   // be gone after inlining.
   for (unsigned I = 0, E = CS.arg_size(); I != E; ++I) {
     if (CS.isByValArgument(I)) {
       // We approximate the number of loads and stores needed by dividing the
       // size of the byval type by the target's pointer size.
       PointerType *PTy = cast<PointerType>(CS.getArgument(I)->getType());
       unsigned TypeSize = DL.getTypeSizeInBits(PTy->getElementType());
[26 unchanged lines not shown; enclosing code: bool CallAnalyzer::analyzeCall(CallSite CS) {]

   // If the instruction after the call, or if the normal destination of the
   // invoke is an unreachable instruction, the function is noreturn. As such,
   // there is little point in inlining this unless there is literally zero
   // cost.
   Instruction *Instr = CS.getInstruction();
   if (InvokeInst *II = dyn_cast<InvokeInst>(Instr)) {
     if (isa<UnreachableInst>(II->getNormalDest()->begin()))
-      Threshold = 1;
+      Threshold = 0;
   } else if (isa<UnreachableInst>(++BasicBlock::iterator(Instr)))
-    Threshold = 1;
+    Threshold = 0;

   // If this function uses the coldcc calling convention, prefer not to inline
   // it.
   if (F.getCallingConv() == CallingConv::Cold)
     Cost += InlineConstants::ColdccPenalty;

   // Check if we're done. This can happen due to bonuses and penalties.
   if (Cost > Threshold)
[55 unchanged lines not shown; enclosing code: bool CallAnalyzer::analyzeCall(CallSite CS) {]
   typedef SetVector<BasicBlock *, SmallVector<BasicBlock *, 16>,
                     SmallPtrSet<BasicBlock *, 16> > BBSetVector;
   BBSetVector BBWorklist;
   BBWorklist.insert(&F.getEntryBlock());
   // Note that we must not cache the size, this loop grows the worklist.
   for (unsigned Idx = 0; Idx != BBWorklist.size(); ++Idx) {
     // Bail out the moment we cross the threshold. This means we'll under-count
     // the cost, but only when undercounting doesn't matter.
-    if (Cost > (Threshold + VectorBonus))
+    if (Cost > Threshold)
       break;

     BasicBlock *BB = BBWorklist[Idx];
     if (BB->empty())
       continue;

     // Disallow inlining a blockaddress. A blockaddress only has defined
     // behavior for an indirect branch in the same function, and we do not
[61 unchanged lines not shown; enclosing code: bool CallAnalyzer::analyzeCall(CallSite CS) {]
   }

   // If this is a noduplicate call, we can still inline as long as
   // inlining this would cause the removal of the caller (so the instruction
   // is not actually duplicated, just moved).
   if (!OnlyOneCallAndLocalLinkage && ContainsNoDuplicateCall)
     return false;

-  Threshold += VectorBonus;
+  // We applied the maximum possible vector bonus at the beginning. Now,
+  // subtract the excess bonus, if any, from the Threshold before
+  // comparing against Cost.
+  if (NumVectorInstructions <= NumInstructions / 10)
+    Threshold -= FiftyPercentVectorBonus;
+  else if (NumVectorInstructions <= NumInstructions / 2)
+    Threshold -= (FiftyPercentVectorBonus - TenPercentVectorBonus);

   return Cost < Threshold;
 }

 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 /// \brief Dump stats about this call's analysis.
 void CallAnalyzer::dump() {
 #define DEBUG_PRINT_STAT(x) dbgs() << " " #x ": " << x << "\n"
   DEBUG_PRINT_STAT(NumConstantArgs);
   DEBUG_PRINT_STAT(NumConstantOffsetPtrArgs);
   DEBUG_PRINT_STAT(NumAllocaArgs);
   DEBUG_PRINT_STAT(NumConstantPtrCmps);
   DEBUG_PRINT_STAT(NumConstantPtrDiffs);
   DEBUG_PRINT_STAT(NumInstructionsSimplified);
+  DEBUG_PRINT_STAT(NumInstructions);
   DEBUG_PRINT_STAT(SROACostSavings);
   DEBUG_PRINT_STAT(SROACostSavingsLost);
   DEBUG_PRINT_STAT(ContainsNoDuplicateCall);
   DEBUG_PRINT_STAT(Cost);
   DEBUG_PRINT_STAT(Threshold);
-  DEBUG_PRINT_STAT(VectorBonus);
 #undef DEBUG_PRINT_STAT
 }
 #endif

 INITIALIZE_PASS_BEGIN(InlineCostAnalysis, "inline-cost", "Inline Cost Analysis",
                       true, true)
 INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
[117 unchanged lines not shown]

test/Transforms/Inline/vector-bonus.ll

This file was added.

; RUN: opt < %s -inline -inline-threshold=35 -S | FileCheck %s

define i32 @bar(<4 x i32> %v, i32 %i) #0 {
entry:
  %cmp = icmp sgt i32 %i, 4
  br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %entry
  %mul1 = mul nsw i32 %i, %i
  br label %return

if.else: ; preds = %entry
  %add1 = add nsw i32 %i, %i
  %add2 = add nsw i32 %i, %i
  %add3 = add nsw i32 %i, %i
  %add4 = add nsw i32 %i, %i
  %add5 = add nsw i32 %i, %i
  %add6 = add nsw i32 %i, %i
  %vecext = extractelement <4 x i32> %v, i32 0
  %vecext7 = extractelement <4 x i32> %v, i32 1
  %add7 = add nsw i32 %vecext, %vecext7
  br label %return

return: ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %mul1, %if.then ], [ %add7, %if.else ]
  ret i32 %retval.0
}

define i32 @foo(<4 x i32> %v, i32 %a) #1 {
; CHECK-LABEL: @foo(
; CHECK-NOT: call i32 @bar
; CHECK: ret
entry:
  %call = call i32 @bar(<4 x i32> %v, i32 %a)
  ret i32 %call
}