This is an archive of the discontinued LLVM Phabricator instance.

Differential D24338

[InlineCost] Remove CallPenalty and change MinSizeThreshold to 5
AcceptedPublic

Authored by jmolloy on Sep 8 2016, 4:59 AM.

Details

Reviewers

chandlerc
reames
mehdi_amini

Summary

Call instructions had a penalty of 25 applied during inline cost analysis. This seems nonsensical - calls surely don't cost any more than any other instruction when inlined. I presume that the logic was that a call dominates in terms of time so as to diminish the benefits of inlining other instructions. In reality this penalty was applied in some cases and not others, causing skewed behaviour and some pretty glaringly obvious errors. For example:

void h();
void g() {
  h();
}
void f() {
  g();
}

With an inline threshold set to zero (which *should* mean no code bloat), we won't inline g into f. This is because there's a call penalty of 25 applied for the call to h(), but there *isn't an equivalent bonus applied for removing the call to g()*! This has caused the inlining threshold for minsize to be set to a default of... 25. Which means that for functions that *don't* have a call in them, we'll accept around 5 instructions of code bloat in minimum size mode!
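The asymmetry can be made concrete with a small model. This is a simplified sketch, not the logic in InlineCost.cpp: the function names are hypothetical, CallPenalty = 25 is the value quoted above, and InstrCost = 5 is an assumption chosen to match the "around 5 instructions" figure.

```cpp
// Simplified, hypothetical model of the accounting described above.
// It is not the real InlineCost.cpp logic.
constexpr int InstrCost = 5;    // assumed cost of one ordinary instruction
constexpr int CallPenalty = 25; // penalty quoted in the summary

// Cost charged for a call that remains in the callee body after inlining:
// the instruction itself, one instruction per argument, plus the penalty.
constexpr int costOfCallInBody(int NumArgs) {
  return InstrCost + NumArgs * InstrCost + CallPenalty;
}

// Credit for the call site that inlining removes. The skewed scheme credits
// only argument setup; the symmetric scheme also credits the call itself.
constexpr int creditSkewed(int NumArgs) { return NumArgs * InstrCost; }
constexpr int creditSymmetric(int NumArgs) {
  return NumArgs * InstrCost + InstrCost + CallPenalty;
}

// Net cost of inlining a wrapper whose body is a single call.
constexpr int netSkewed(int BodyArgs, int SiteArgs) {
  return costOfCallInBody(BodyArgs) - creditSkewed(SiteArgs);
}
constexpr int netSymmetric(int BodyArgs, int SiteArgs) {
  return costOfCallInBody(BodyArgs) - creditSymmetric(SiteArgs);
}

// g() { h(); } inlined into f() { g(); }: no arguments on either call.
static_assert(netSkewed(0, 0) == 30, "skewed scheme reports growth");
static_assert(netSymmetric(0, 0) == 0, "symmetric scheme reports none");
```

Under these assumptions the skewed scheme charges 30 for replacing one call with another, so a threshold of zero rejects a substitution that should be size-neutral.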

This call penalty has logic behind it but it's skewing decisions and causing us to try and counterbalance it in strange places. Nuke it.

This improves code size by a percentage point or so over the test-suite in minsize mode, but most importantly it removes some glaring inlining mistakes and is a saner default. The effect on -Os and -O3 is minimal, because their thresholds are 125 and 225 respectively, which are quite large. However, for functions with many calls inside, this may change the inlining decision.

Zero would be the obvious choice for minsize, however I've noticed code size regressions when picking 0 - most of these are fixed with a threshold of 5. The reason for this is the cost model isn't accurate - being a little more aggressive allows us to take some risks that sometimes (or often) pay off. For example:

void h(int);
void g() {
  h(0);
}
void f() {
  g();
}

With a threshold of zero we will not inline g into f, because the call to h requires one more argument to set up than the call to g it replaces. In reality this extra parameter will cause negligible code size impact, and we have the possibility of removing g entirely if nothing else references it (happens often when using C++ containers).
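As a rough illustration of why a small non-zero threshold helps here, the sketch below assumes that the fixed cost of the call cancels between the removed and the exposed call site, so only the argument-count difference remains. The names are hypothetical, InstrCost = 5 is an assumption, and the rejection rule mirrors only the early `Cost > Threshold` check visible in the diff; the real analysis applies further checks.

```cpp
// Hypothetical sketch; not taken from InlineCost.cpp.
constexpr int InstrCost = 5; // assumed cost of one argument-setup instruction

// Net cost of inlining a wrapper whose body is one call, when the fixed
// per-call cost cancels and only argument setup differs.
constexpr int netWrapperCost(int BodyCallArgs, int RemovedCallArgs) {
  return (BodyCallArgs - RemovedCallArgs) * InstrCost;
}

// Early rejection rule: give up once the cost exceeds the threshold.
constexpr bool rejectedAt(int Cost, int Threshold) { return Cost > Threshold; }

// g() { h(0); } inlined into f() { g(); }: one argument in, none out.
static_assert(netWrapperCost(1, 0) == 5, "one extra argument setup");
static_assert(rejectedAt(netWrapperCost(1, 0), 0), "threshold 0 rejects");
static_assert(!rejectedAt(netWrapperCost(1, 0), 5), "threshold 5 passes");
```

The single extra argument is the whole modelled cost, which is why a threshold of zero blocks an inlining that would likely shrink the program once g is deleted.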

I've tested this on the test-suite and it gives decent results. Excluding TSVC (see later):

242 total test executables
68 have a code size improvement of between 0.01% and 4.6% (most of these are tiny, in the 0.01% - 1.00% range)
34 have a code size regression of between 0.01% and 5.3% (most of these are tiny, in the 0.01% - 1.00% range)
0.07% geomean improvement in code size.

The TSVC benchmarks were excluded because they *improve* in codesize by 40% - their size almost halves. There are 36 of these, with code size improvements ranging from 32% to 45%.

7.55% geomean improvement in code size, with TSVC included in the sample.

These tests were done on Armv7, thumb mode, -Oz. I have confirmed that -O3 looks vaguely similar to how it did before (not surprising - the threshold there is 225 so it is less likely to be affected by this change).

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 70678.Sep 8 2016, 4:59 AM

jmolloy retitled this revision to [InlineCost] Remove CallPenalty and change MinSizeThreshold to 5.

jmolloy updated this object.

jmolloy added reviewers: chandlerc, reames.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

Herald added subscribers: eraman, mzolotukhin, aemerson.Sep 8 2016, 4:59 AM

jmolloy mentioned this in D22261: [InlineCost] Set minsize inline threshold to 0.Sep 8 2016, 4:59 AM

Keep in mind that the inliner threshold is (AFAIK) a balance between code size and performance. From that point of view, it "makes sense" to have a different cost for calls: they can trigger spilling around the call site and they may need to push args on the stack. Of course the same cost should be applied to the call that is removed in the caller.
I'm not saying that 25 is the best approximation; indeed I'd expect the number of args to the call to play a role.

Hi,

I agree completely, and this is actually already the case. The argument setup is accounted for; currently *on top of* the call penalty.

I see the inline cost metric for performance being highly correlated to IR code size, as the big gains from inlining come from code simplification and constant propagation, which is reflected in number of resulting IR instructions.

Calls on Power are actually slower than a normal instruction, so inlining is even more important for performance. This probably needs some benchmarking and perhaps a way to conditionalize cost. A few more inline comments as well.

Thanks!

-eric

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1064 ↗	(On Diff #70678)	Whitespace :)
test/Transforms/Inline/ephemeral.ll
22 ↗	(On Diff #70678)	Since you're having to change the function here it seems reasonable to document it a bit more with how the structure of the function is affecting the inlining?
test/Transforms/Inline/inline_minisize.ll
25 ↗	(On Diff #70678)	What happened here?
200 ↗	(On Diff #70678)	not minsize?
test/Transforms/Inline/nonnull.ll
1 ↗	(On Diff #70678)	Not sure what to do about the tests that need a threshold change here, it seems slightly off to add a threshold to them. Thoughts on alternatives?

I agree that having a call penalty in addition to argument set up cost skews inline cost computation. Some high level comments:

This patch has two changes: removing the call penalty and reducing MinSizeThreshold. It is preferable to separate them.
The call penalty change doesn't affect just the code size, so it is important to measure the performance impact of this change.
Could you generate the patch with a large -U value to get full context?

lib/Analysis/InlineCost.cpp
1112	This needs more justification. The instruction gets its cost in the visitor. In addition you're adding InstrCost. Is this to simulate the effect of call with one argument or something else? Comments will be helpful.

eraman added a subscriber: davidxl.Nov 1 2016, 3:26 PM

The change makes sense overall. Easwaran mentioned this exact issue a short while ago.

include/llvm/Analysis/InlineCost.h
44 ↗	(On Diff #70678)	We should not remove this parameter. Instead it should be used in the opposite way. Currently CallAnalyzer::analyzeCall only considers the cost of parameter passing (subtracted from the overall cost); it should in fact also subtract CallPenalty (Cost -= CallPenalty;) to model the cost of the function call (call/ret, prologue/epilogue). Note that the cost here means runtime cost. The current inline cost is highly tuned toward size cost.
lib/Analysis/InlineCost.cpp
1254	From the point of view of modelling runtime cost, this should really be Cost -= CallPenalty. The penalty should actually be defined by target.

echristo added inline comments.Nov 1 2016, 4:30 PM

include/llvm/Analysis/InlineCost.h
44 ↗	(On Diff #70678)	+1
lib/Analysis/InlineCost.cpp
1254	Agreed.

In D24338#585295, @eraman wrote:

I agree that having a call penalty in addition to argument set up cost skews inline cost computation.

What about register spills for instance? An IR call can really be more costly when lowered to machine code.

(Adding the answer to David inline, so that the relevant context is there)

lib/Analysis/InlineCost.cpp
938	In D24338, @davidxl wrote: Do you mean parameter passing? That should be counted independently by the instruction visitor.
No, I don't mean the parameters; the visitor does indeed account for one instruction per parameter just before this. However, a call has other extra cost on top of a regular instruction, even without any argument: it splits live ranges and may cause spills around it to preserve registers that aren't callee-saved. In this sense, it makes sense that a call can "cost" more than a normal instruction.
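A minimal illustration of the live-range point, with hypothetical names: the values `Scaled` and `A` are needed after the call, so they have to survive it. Whether that actually produces a spill depends on the target's calling convention and the register allocator; the sketch only shows the shape of code the comment is describing.

```cpp
// Hypothetical example; `external` stands in for an opaque callee.
static int CallCount = 0;
void external() { ++CallCount; }

int liveAcrossCall(int A, int B) {
  int Scaled = A * B; // defined before the call
  external();         // Scaled and A are both live across this call
  return Scaled + A;  // both are used after the call
}
```

If the values cannot be kept in callee-saved registers, the caller has to save and restore them around the call, which is extra code that a plain instruction in the same position would not need.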

Hi all,

Thanks for the rush of comments! This has been on my back burner for a while, you've motivated me to try and push it forward :)

Having stared at this code for a long time now, my overall feeling is that, long term, we should try and move a lot of the inline cost logic around calls into TargetTransformInfo. TTI::getCallCost already does a similar cost calculation, just with fewer special cases. Doing this would allow the broad brush heuristics "CallPenalty" and "InliningMultiplier" to be removed entirely, as targets that cared about them could simply override getCallCost.

That's a large change however, and is likely to change cost calculations. The updated diff is essentially what you all have suggested, and I think gets us one stage of the way towards the longer term goal above.

Most uses of CallPenalty are moved into TargetTransformInfo, with a default value the same as InlineConstants::CallPenalty. InlineConstants::CallPenalty stays, because it is also used as a default cost for a soft-float library call (this could probably be renamed, thinking about it).

I've removed the skew by subtracting CallPenalty when setting up the initial threshold.

CallPenalty is calculated from TargetTransformInfo, but is overridable from the command line. This is very useful for inlining tests - while the penalty has a use in practice, when devising small testcases it does get in the way. This allows the testcases to stay small, and makes sense in most cases because the tests are already overriding the inlining cost.
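The override described here amounts to "use the command-line value if the flag was given, otherwise the target's value". A minimal sketch of that selection follows; `PenaltyFlag` and `resolveCallPenalty` are hypothetical names invented for illustration and are not part of LLVM or of this patch.

```cpp
// Hypothetical stand-in for a parsed command-line option.
struct PenaltyFlag {
  int NumOccurrences; // how many times the flag appeared on the command line
  int Value;          // parsed value, meaningful only if NumOccurrences > 0
};

// Prefer an explicit command-line value; otherwise use the target default.
constexpr int resolveCallPenalty(PenaltyFlag Flag, int TargetDefault) {
  return Flag.NumOccurrences > 0 ? Flag.Value : TargetDefault;
}

static_assert(resolveCallPenalty(PenaltyFlag{0, 0}, 25) == 25,
              "flag absent: target default wins");
static_assert(resolveCallPenalty(PenaltyFlag{1, 0}, 25) == 0,
              "flag present: explicit zero wins");
```

Keying on whether the flag occurred, rather than on a sentinel value, lets a test pass an explicit zero and still have it honoured.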

This is the first step, and shouldn't cause much impact at all in performance. The overall effect is actually the same as my previous patchset, and I benchmarked that one very heavily and found no glaring performance issues.

The next step after this would be to sort out -Oz, which is affected by this change the most. It makes sense to reduce the call penalty to zero when optimizing purely for size (-Oz) and to change the -Oz inline threshold to zero to compensate. We'll then have sensibly modelled inlining behaviour when focussing on size.

jmolloy updated this revision to Diff 76702.Nov 2 2016, 7:26 AM

Thanks, that looks good as a first step! See one inline comment.

lib/Analysis/InlineCost.cpp
1272	I think this should be a separate patch, all the rest of this revision would be NFC

Hi Mehdi,

Thanks! Would you prefer I strip that out here and submit another phab review, or is it OK to split them apart upon committing?

Cheers,

James

davidxl added inline comments.Nov 3 2016, 7:55 PM

include/llvm/Analysis/TargetTransformInfo.h
188 ↗	(On Diff #76702)	I think it is a mistake to conflate two different things here. One is the penalty of 'not' inlining a callsite (aka the benefit of inlining it). This penalty models the branch (call)/return, function prologue and epilogue, etc. It can also model the size impact of the call instruction. The other is the overhead of a newly exposed callsite from the inline instance. The latter models the potential caller-save register spills etc. The former is the one that needs to be defined in TTI. The latter, I think, should stay in the Inliner proper. In the future, it can be replaced by target-dependent register pressure analysis. In other words, this interface should be getCallPenalty().
include/llvm/Analysis/TargetTransformInfoImpl.h
132 ↗	(On Diff #76702)	Why does it need to be tied with TCC_Basic? Why not just defining an independent default?
lib/Analysis/InlineCost.cpp
110	This is a good parameter to be added to InlineParams struct
943	This one should use a different parameter from CallPenalty (that models overhead of exposed callsites -- which does not need to live in TTI)
1206	A more common way is to check getNumOccurrences.
1207	The computation here should be pushed to getInlineParams
1209	Why not directly using the TTI returned value?
lib/Transforms/IPO/Inliner.cpp
288 ↗	(On Diff #76702)	should -1 be removed too?

In D24338#586539, @jmolloy wrote:

Hi Mehdi,

Thanks! Would you prefer I strip that out here and submit another phab review, or is it OK to split them apart upon committing?

I'd go ahead and strip them out so the review can continue without conflating them? (Mostly because the review seems to continue...)

I'm even fine if you want to land that part of this first, and then clean up the layering with TTI, but should check with David.

chandlerc added inline comments.Nov 4 2016, 1:14 AM

include/llvm/Analysis/TargetTransformInfo.h
188 ↗	(On Diff #76702)	Sorry I'm a bit late to the thread, but I'm confused by this and some of the other comments. There are definitely two concepts here: the cost of having a call in the instruction stream, and the expected "benefit" of inlining. But I don't think `CallPenalty` is really being used to model the latter concept these days. The threshold is what models the latter concept.
So originally, in early 2010, we started using `CallPenalty` scaled by the number of arguments to track the "caller cost of having a call". Then later in 2010, we replaced this usage of `CallPenalty` with a new and completely unrelated usage to model the fact that "big" functions are "slow", even though it was applied to a threshold that is primarily size based. Then a few years later we completely changed how we even think about inline cost. Since then, the idea of functions being either "big" or "slow" and modeling that with `CallPenalty`, IMO, no longer makes sense in the inline cost analysis at all.
So I suspect that the current usage of `CallPenalty` is just wrong. If it is doing anything, it is augmenting the basic call cost to fix its failure to account for the caller (size) cost of having a call instruction. My only real concern with adding an API is understanding why we can't just make the existing `getCallCost` do exactly this. In fact, it already does most of this. Is there just some way we can adjust it to allow us to transition from `CallPenalty` to `getCallCost` to model the caller-side cost? If the name is too confusing, we could rename it `getCallerCallCost` or some such...
However, I would not merge `getInliningThresholdMultiplier` into this. That is the target's tool to adjust the threshold which models the other thing you are talking about, David. I agree they should remain separate. If we need more granular control of the threshold in a target, we should just add an API that has that granularity (with an understanding of why it is needed).
And I still think it is reasonable to factor the caller cost into the threshold (after it is scaled), as it does model one aspect of the "benefit" of inlining. The threshold itself models the rest, and the multiplier can scale it appropriately.
lib/Analysis/InlineCost.cpp
110	See above, but as a consequence I disagree. I think this is just a cost modeling issue, and not a threshold issue. The InlineParams should be concerned with setting up the threshold to model the "benefit".
1209	Agreed. I think this will become much more straight-forward when it is expressed as a cost and thus can use normal cost values to configure itself.

davidxl added inline comments.Nov 4 2016, 2:16 PM

include/llvm/Analysis/TargetTransformInfo.h
188 ↗	(On Diff #76702)	I agree it is confusing. My suggestion of handling runtime cost modeling in this patch actually makes it worse, so I take it back -- the runtime cost modeling part can be handled later in a different patch. So let's go back and focus on the size modeling intended by James's original patch.
The fundamental problem James noticed is that the cost (size) modeling of the original callsite (the inline candidate) and the cost of the newly introduced callsite are not handled consistently. Example:
caller () {
  callee (); // to be inlined
}
callee () {
  new_cs1();
  ..
}
When computing the cost of inlining callee(), the inline cost analysis first subtracts the callsite cost -- but only partially: it subtracts the cost associated with parameter passing, but does not subtract the other size penalty associated with the eliminated callsite (presumably higher than the base instruction cost). However, when considering the cost of new_cs1(), it adds not only the parameter passing cost but also the additional penalty. Theoretically, if a simple wrapper function is inlined, there should be zero size impact, but in the current implementation the wrapper function is blocked from being inlined.
In short, I think the simplest fix to the problem is to make the cost adjustment for the original and new callsites consistent -- it is very similar to the first version of the patch, with the difference that CallPenalty is retained.
Regarding 'getCallCost' -- it computes the cost of both parameter passing and the call itself. Ideally it should be used in the inline cost adjustment, but unfortunately it is implemented slightly differently -- e.g., the parameter passing cost computation is less sophisticated than what the inliner uses.

Hi David and Chandler,

Thanks for all these comments. I'm glad the two main reviewers agree with each other at this point :)

I've stripped this patch down to the basics - removing the skew. This one should be a no-brainer. Following up to this, I'd like to, in separate patches:

Change the minsize threshold to 5, as I've determined through testing that setting it to zero leaves codesize and performance on the table due to the inaccuracy of the IR cost model.
Improve TTI::getCallCost(), teaching it the tricks the inliner knows.
Hopefully eventually switch to using getCallCost() in the inliner.

Does this sound acceptable?

Cheers,

James

LGTM
(but you should wait a little to give an opportunity to chandlerc / davidxl to approve as well).

This revision is now accepted and ready to land.Nov 7 2016, 8:41 AM

This one looks good.

Easwaran, can you help measure the performance impact of this change with internal benchmarks?

In D24338#588190, @davidxl wrote:

This one looks good.

Easwaran, can you help measure the performance impact of this change with internal benchmarks?

This is hurting one of our internal benchmarks by > 5%. I'm looking into the root cause now.

In D24338#589426, @eraman wrote:

In D24338#588190, @davidxl wrote:

This one looks good.

Easwaran, can you help measure the performance impact of this change with internal benchmarks?

This is hurting one of our internal benchmarks by > 5%. I'm looking into the root cause now.

It turns out this is due to a bug in a local inliner patch we have that got exposed by this (and that prevented one crucial function from being inlined). This patch itself LGTM.

Revision Contents

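Path                                                          Size
lib/Analysis/InlineCost.cpp                                   4 lines
test/Transforms/Inline/alloca-bonus.ll                        6 lines
test/Transforms/Inline/inline-cold-callee.ll                  6 lines
test/Transforms/Inline/inline-cold.ll                         4 lines
test/Transforms/Inline/inline-hot-callee.ll                   6 lines
test/Transforms/Inline/inline-hot-callsite.ll                 6 lines
test/Transforms/Inline/inline-optsize.ll                      3 lines
test/Transforms/Inline/inline_unreachable-2.ll                3 lines
test/Transforms/Inline/optimization-remarks-passed-yaml.ll    4 lines
test/Transforms/Inline/ptr-diff.ll                            11 lines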

Diff 77027

lib/Analysis/InlineCost.cpp

	Show First 20 Lines • Show All 671 Lines • ▼ Show 20 Lines
     if (TTI.isLoweredToCall(F)) {
       // We account for the average 1 instruction per call argument setup
       // here.
       Cost += CS.arg_size() * InlineConstants::InstrCost;

       // Everything other than inline ASM will also have a significant cost
       // merely from making the call.
       if (!isa<InlineAsm>(CS.getCalledValue()))
         Cost += InlineConstants::CallPenalty;
     }

     return Base::visitCallSite(CS);
   }

   // Otherwise we're in a very special case -- an indirect function call. See
   // if we can be particularly clever about this.
   Value *Callee = CS.getCalledValue();

   // First, pay the price of the argument setup. We account for the average
   // 1 instruction per call argument setup here.
   Cost += CS.arg_size() * InlineConstants::InstrCost;

	▲ Show 20 Lines • Show All 152 Lines • ▼ Show 20 Lines
       bool hasSoftFloatAttr = false;

       // If the function has the "use-soft-float" attribute, mark it as
       // expensive.
       if (F.hasFnAttribute("use-soft-float")) {
         Attribute Attr = F.getFnAttribute("use-soft-float");
         StringRef Val = Attr.getValueAsString();
         if (Val == "true")
           hasSoftFloatAttr = true;
       }

       if (TTI.getFPOpCost(I->getType()) == TargetTransformInfo::TCC_Expensive ||
           hasSoftFloatAttr)
         Cost += InlineConstants::CallPenalty;
     }

     // If the instruction simplified to a constant, there is no cost to this
	▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines
   ++NumCallsAnalyzed;

   // Perform some tweaks to the cost and threshold based on the direct
   // callsite information.

   // We want to more aggressively inline vector-dense kernels, so up the
   // threshold, and we'll lower it if the % of vector instructions gets too
   // low. Note that these bonuses are some what arbitrary and evolved over time
   // by accident as much as because they are principled bonuses.
   //
   // FIXME: It would be nice to remove all such bonuses. At least it would be
   // nice to base the bonus values on something more scientific.
   assert(NumInstructions == 0);
   assert(NumVectorInstructions == 0);

   // Update the threshold based on callsite properties
   updateThreshold(CS, F);

   FiftyPercentVectorBonus = 3 * Threshold / 2;
   TenPercentVectorBonus = 3 * Threshold / 4;
	Show All 28 Lines
       // FIXME: The maxStoresPerMemcpy setting from the target should be used
       // here instead of a magic number of 8, but it's not available via
       // DataLayout.
       NumStores = std::min(NumStores, 8U);

       Cost -= 2 * NumStores * InlineConstants::InstrCost;
     } else {
       // For non-byval arguments subtract off one instruction per call
       // argument.
       Cost -= InlineConstants::InstrCost;
     }
   }
+  // The call instruction also disappears after inlining.
+  Cost -= InlineConstants::InstrCost + InlineConstants::CallPenalty;

   // If there is only one call of the function, and it has internal linkage,
   // the cost of inlining it drops dramatically.
   bool OnlyOneCallAndLocalLinkage =
       F.hasLocalLinkage() && F.hasOneUse() && &F == CS.getCalledFunction();
   if (OnlyOneCallAndLocalLinkage)
     Cost -= InlineConstants::LastCallToStaticBonus;

   // If this function uses the coldcc calling convention, prefer not to inline
   // it.
   if (F.getCallingConv() == CallingConv::Cold)
     Cost += InlineConstants::ColdccPenalty;

   // Check if we're done. This can happen due to bonuses and penalties.
   if (Cost > Threshold)
     return false;

   if (F.empty())
     return true;

   Function *Caller = CS.getInstruction()->getParent()->getParent();
	▲ Show 20 Lines • Show All 329 Lines • Show Last 20 Lines

test/Transforms/Inline/alloca-bonus.ll

Show All 16 Lines
 define void @inner1(i32 *%ptr) {
   %A = load i32, i32* %ptr
   store i32 0, i32* %ptr
   %C = getelementptr inbounds i32, i32* %ptr, i32 0
   %D = getelementptr inbounds i32, i32* %ptr, i32 1
   %E = bitcast i32* %ptr to i8*
   %F = select i1 false, i32* %ptr, i32* @glbl
   call void @llvm.lifetime.start(i64 0, i8* %E)
+  call void @extern()
   ret void
 }

 define void @outer2() {
 ; CHECK-LABEL: @outer2(
 ; CHECK: call void @inner2
   %ptr = alloca i32
   call void @inner2(i32* %ptr)
   ret void
 }

 ; %D poisons this call, scalar-repl can't handle that instruction.
 define void @inner2(i32 *%ptr) {
   %A = load i32, i32* %ptr
   store i32 0, i32* %ptr
   %C = getelementptr inbounds i32, i32* %ptr, i32 0
   %D = getelementptr inbounds i32, i32* %ptr, i32 %A
   %E = bitcast i32* %ptr to i8*
   %F = select i1 false, i32* %ptr, i32* @glbl
   call void @llvm.lifetime.start(i64 0, i8* %E)
+  call void @extern()
   ret void
 }

 define void @outer3() {
 ; CHECK-LABEL: @outer3(
 ; CHECK-NOT: call void @inner3
   %ptr = alloca i32
   call void @inner3(i32* %ptr, i1 undef)
   ret void
 }

 define void @inner3(i32 *%ptr, i1 %x) {
   %A = icmp eq i32* %ptr, null
   %B = and i1 %x, %A
+  call void @extern()
   br i1 %A, label %bb.true, label %bb.false
 bb.true:
   ; This block musn't be counted in the inline cost.
   %t1 = load i32, i32* %ptr
   %t2 = add i32 %t1, 1
   %t3 = add i32 %t2, 1
   %t4 = add i32 %t3, 1
   %t5 = add i32 %t4, 1
Show All 25 Lines	; CHECK-NOT: call void @inner4
ret void		ret void
}		}

; %B poisons this call, scalar-repl can't handle that instruction. However, we		; %B poisons this call, scalar-repl can't handle that instruction. However, we
; still want to detect that the icmp and branch can be handled.		; still want to detect that the icmp and branch can be handled.
define void @inner4(i32 *%ptr, i32 %A) {		define void @inner4(i32 *%ptr, i32 %A) {
%B = getelementptr inbounds i32, i32* %ptr, i32 %A		%B = getelementptr inbounds i32, i32* %ptr, i32 %A
%C = icmp eq i32* %ptr, null		%C = icmp eq i32* %ptr, null
		call void @extern()
br i1 %C, label %bb.true, label %bb.false		br i1 %C, label %bb.true, label %bb.false
bb.true:		bb.true:
; This block mustn't be counted in the inline cost.		; This block mustn't be counted in the inline cost.
%t1 = load i32, i32* %ptr		%t1 = load i32, i32* %ptr
%t2 = add i32 %t1, 1		%t2 = add i32 %t1, 1
%t3 = add i32 %t2, 1		%t3 = add i32 %t2, 1
%t4 = add i32 %t3, 1		%t4 = add i32 %t3, 1
%t5 = add i32 %t4, 1		%t5 = add i32 %t4, 1
(26 lines not shown)
}		}

; %D poisons this call, scalar-repl can't handle that instruction. However, if		; %D poisons this call, scalar-repl can't handle that instruction. However, if
; the flag is set appropriately, the poisoning instruction is inside of dead		; the flag is set appropriately, the poisoning instruction is inside of dead
; code, and so shouldn't be counted.		; code, and so shouldn't be counted.
define void @inner5(i1 %flag, i32 *%ptr) {		define void @inner5(i1 %flag, i32 *%ptr) {
%A = load i32, i32* %ptr		%A = load i32, i32* %ptr
store i32 0, i32* %ptr		store i32 0, i32* %ptr
		call void @extern()
%C = getelementptr inbounds i32, i32* %ptr, i32 0		%C = getelementptr inbounds i32, i32* %ptr, i32 0
br i1 %flag, label %if.then, label %exit		br i1 %flag, label %if.then, label %exit

if.then:		if.then:
%D = getelementptr inbounds i32, i32* %ptr, i32 %A		%D = getelementptr inbounds i32, i32* %ptr, i32 %A
%E = bitcast i32* %ptr to i8*		%E = bitcast i32* %ptr to i8*
%F = select i1 false, i32* %ptr, i32* @glbl		%F = select i1 false, i32* %ptr, i32* @glbl
call void @llvm.lifetime.start(i64 0, i8* %E)		call void @llvm.lifetime.start(i64 0, i8* %E)
ret void		ret void

exit:		exit:
ret void		ret void
}		}

		declare void @extern()

test/Transforms/Inline/inline-cold-callee.ll

	; RUN: opt < %s -inline -inlinecold-threshold=0 -S | FileCheck %s			; RUN: opt < %s -inline -inlinecold-threshold=0 -S | FileCheck %s

	; This tests that a cold callee gets the (lower) inlinecold-threshold even without			; This tests that a cold callee gets the (lower) inlinecold-threshold even without
	; Cold hint and does not get inlined because the cost exceeds the inlinecold-threshold.			; Cold hint and does not get inlined because the cost exceeds the inlinecold-threshold.
	; A callee with identical body does get inlined because cost fits within the			; A callee with identical body does get inlined because cost fits within the
	; inline-threshold			; inline-threshold

	define i32 @callee1(i32 %x) !prof !21 {			define i32 @callee1(i32 %x) !prof !21 {
	%x1 = add i32 %x, 1			%x1 = add i32 %x, 1
	%x2 = add i32 %x1, 1			%x2 = add i32 %x1, 1
	%x3 = add i32 %x2, 1			%x3 = add i32 %x2, 1
				call void @extern()
	ret i32 %x3			ret i32 %x3
	}			}

	define i32 @callee2(i32 %x) !prof !22 {			define i32 @callee2(i32 %x) !prof !22 {
	; CHECK-LABEL: @callee2(			; CHECK-LABEL: @callee2(
	%x1 = add i32 %x, 1			%x1 = add i32 %x, 1
	%x2 = add i32 %x1, 1			%x2 = add i32 %x1, 1
	%x3 = add i32 %x2, 1			%x3 = add i32 %x2, 1
				call void @extern()
	ret i32 %x3			ret i32 %x3
	}			}

	define i32 @caller2(i32 %y1) !prof !22 {			define i32 @caller2(i32 %y1) !prof !22 {
	; CHECK-LABEL: @caller2(			; CHECK-LABEL: @caller2(
	; CHECK: call i32 @callee2			; CHECK: call i32 @callee2
	; CHECK-NOT: call i32 @callee1			; CHECK-NOT: call i32 @callee1
	; CHECK: ret i32 %x3.i			; CHECK: ret i32 %x3.i
	%y2 = call i32 @callee2(i32 %y1)			%y2 = call i32 @callee2(i32 %y1)
	%y3 = call i32 @callee1(i32 %y2)			%y3 = call i32 @callee1(i32 %y2)
	ret i32 %y3			ret i32 %y3
	}			}

				declare void @extern()

	!llvm.module.flags = !{!1}			!llvm.module.flags = !{!1}
	!21 = !{!"function_entry_count", i64 100}			!21 = !{!"function_entry_count", i64 100}
	!22 = !{!"function_entry_count", i64 1}			!22 = !{!"function_entry_count", i64 1}

	!1 = !{i32 1, !"ProfileSummary", !2}			!1 = !{i32 1, !"ProfileSummary", !2}
	!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}			!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
	!3 = !{!"ProfileFormat", !"InstrProf"}			!3 = !{!"ProfileFormat", !"InstrProf"}
	!4 = !{!"TotalCount", i64 10000}			!4 = !{!"TotalCount", i64 10000}
	(10 lines not shown)

test/Transforms/Inline/inline-cold.ll

(11 lines not shown)

@a = global i32 4		@a = global i32 4

; This function should be larger than the cold threshold (75), but smaller		; This function should be larger than the cold threshold (75), but smaller
; than the regular threshold.		; than the regular threshold.
; Function Attrs: nounwind readnone uwtable		; Function Attrs: nounwind readnone uwtable
define i32 @simpleFunction(i32 %a) #0 {		define i32 @simpleFunction(i32 %a) #0 {
entry:		entry:
		call void @extern()
%a1 = load volatile i32, i32* @a		%a1 = load volatile i32, i32* @a
%x1 = add i32 %a1, %a1		%x1 = add i32 %a1, %a1
%a2 = load volatile i32, i32* @a		%a2 = load volatile i32, i32* @a
%x2 = add i32 %x1, %a2		%x2 = add i32 %x1, %a2
%a3 = load volatile i32, i32* @a		%a3 = load volatile i32, i32* @a
%x3 = add i32 %x2, %a3		%x3 = add i32 %x2, %a3
%a4 = load volatile i32, i32* @a		%a4 = load volatile i32, i32* @a
%x4 = add i32 %x3, %a4		%x4 = add i32 %x3, %a4
(21 lines not shown)
define i32 @ColdFunction(i32 %a) #1 {		define i32 @ColdFunction(i32 %a) #1 {
; CHECK-LABEL: @ColdFunction		; CHECK-LABEL: @ColdFunction
; CHECK: ret		; CHECK: ret
; OVERRIDE-LABEL: @ColdFunction		; OVERRIDE-LABEL: @ColdFunction
; OVERRIDE: ret		; OVERRIDE: ret
; DEFAULT-LABEL: @ColdFunction		; DEFAULT-LABEL: @ColdFunction
; DEFAULT: ret		; DEFAULT: ret
entry:		entry:
		call void @extern()
%a1 = load volatile i32, i32* @a		%a1 = load volatile i32, i32* @a
%x1 = add i32 %a1, %a1		%x1 = add i32 %a1, %a1
%a2 = load volatile i32, i32* @a		%a2 = load volatile i32, i32* @a
%x2 = add i32 %x1, %a2		%x2 = add i32 %x1, %a2
%a3 = load volatile i32, i32* @a		%a3 = load volatile i32, i32* @a
%x3 = add i32 %x2, %a3		%x3 = add i32 %x2, %a3
%a4 = load volatile i32, i32* @a		%a4 = load volatile i32, i32* @a
%x4 = add i32 %x3, %a4		%x4 = add i32 %x3, %a4
(21 lines not shown)
define i32 @ColdFunction2(i32 %a) #1 {		define i32 @ColdFunction2(i32 %a) #1 {
; CHECK-LABEL: @ColdFunction2		; CHECK-LABEL: @ColdFunction2
; CHECK: ret		; CHECK: ret
; OVERRIDE-LABEL: @ColdFunction2		; OVERRIDE-LABEL: @ColdFunction2
; OVERRIDE: ret		; OVERRIDE: ret
; DEFAULT-LABEL: @ColdFunction2		; DEFAULT-LABEL: @ColdFunction2
; DEFAULT: ret		; DEFAULT: ret
entry:		entry:
		call void @extern()
%a1 = load volatile i32, i32* @a		%a1 = load volatile i32, i32* @a
%x1 = add i32 %a1, %a1		%x1 = add i32 %a1, %a1
%a2 = load volatile i32, i32* @a		%a2 = load volatile i32, i32* @a
%x2 = add i32 %x1, %a2		%x2 = add i32 %x1, %a2
%a3 = load volatile i32, i32* @a		%a3 = load volatile i32, i32* @a
%x3 = add i32 %x2, %a3		%x3 = add i32 %x2, %a3
%a4 = load volatile i32, i32* @a		%a4 = load volatile i32, i32* @a
%x4 = add i32 %x3, %a4		%x4 = add i32 %x3, %a4
(89 lines not shown)	entry:
%0 = tail call i32 @ColdFunction(i32 5)		%0 = tail call i32 @ColdFunction(i32 5)
%1 = tail call i32 @simpleFunction(i32 6)		%1 = tail call i32 @simpleFunction(i32 6)
%2 = tail call i32 @ColdFunction2(i32 5)		%2 = tail call i32 @ColdFunction2(i32 5)
%3 = add i32 %0, %1		%3 = add i32 %0, %1
%add = add i32 %2, %3		%add = add i32 %2, %3
ret i32 %add		ret i32 %add
}		}

		declare void @extern()
attributes #0 = { nounwind readnone uwtable }		attributes #0 = { nounwind readnone uwtable }
attributes #1 = { nounwind cold readnone uwtable }		attributes #1 = { nounwind cold readnone uwtable }

test/Transforms/Inline/inline-hot-callee.ll

	; RUN: opt < %s -inline -inline-threshold=0 -inlinehint-threshold=100 -S | FileCheck %s			; RUN: opt < %s -inline -inline-threshold=0 -inlinehint-threshold=100 -S | FileCheck %s

	; This tests that a hot callee gets the (higher) inlinehint-threshold even without			; This tests that a hot callee gets the (higher) inlinehint-threshold even without
	; inline hints and gets inlined because the cost is less than inlinehint-threshold.			; inline hints and gets inlined because the cost is less than inlinehint-threshold.
	; A cold callee with identical body does not get inlined because cost exceeds the			; A cold callee with identical body does not get inlined because cost exceeds the
	; inline-threshold			; inline-threshold

	define i32 @callee1(i32 %x) !prof !21 {			define i32 @callee1(i32 %x) !prof !21 {
	%x1 = add i32 %x, 1			%x1 = add i32 %x, 1
	%x2 = add i32 %x1, 1			%x2 = add i32 %x1, 1
	%x3 = add i32 %x2, 1			%x3 = add i32 %x2, 1
				call void @extern()
	ret i32 %x3			ret i32 %x3
	}			}

	define i32 @callee2(i32 %x) !prof !22 {			define i32 @callee2(i32 %x) !prof !22 {
	; CHECK-LABEL: @callee2(			; CHECK-LABEL: @callee2(
	%x1 = add i32 %x, 1			%x1 = add i32 %x, 1
	%x2 = add i32 %x1, 1			%x2 = add i32 %x1, 1
	%x3 = add i32 %x2, 1			%x3 = add i32 %x2, 1
				call void @extern()
	ret i32 %x3			ret i32 %x3
	}			}

	define i32 @caller2(i32 %y1) !prof !22 {			define i32 @caller2(i32 %y1) !prof !22 {
	; CHECK-LABEL: @caller2(			; CHECK-LABEL: @caller2(
	; CHECK: call i32 @callee2			; CHECK: call i32 @callee2
	; CHECK-NOT: call i32 @callee1			; CHECK-NOT: call i32 @callee1
	; CHECK: ret i32 %x3.i			; CHECK: ret i32 %x3.i
	%y2 = call i32 @callee2(i32 %y1)			%y2 = call i32 @callee2(i32 %y1)
	%y3 = call i32 @callee1(i32 %y2)			%y3 = call i32 @callee1(i32 %y2)
	ret i32 %y3			ret i32 %y3
	}			}

				declare void @extern()

	!llvm.module.flags = !{!1}			!llvm.module.flags = !{!1}
	!21 = !{!"function_entry_count", i64 300}			!21 = !{!"function_entry_count", i64 300}
	!22 = !{!"function_entry_count", i64 1}			!22 = !{!"function_entry_count", i64 1}

	!1 = !{i32 1, !"ProfileSummary", !2}			!1 = !{i32 1, !"ProfileSummary", !2}
	!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}			!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
	!3 = !{!"ProfileFormat", !"InstrProf"}			!3 = !{!"ProfileFormat", !"InstrProf"}
	!4 = !{!"TotalCount", i64 10000}			!4 = !{!"TotalCount", i64 10000}
	(10 lines not shown)

test/Transforms/Inline/inline-hot-callsite.ll

	; RUN: opt < %s -inline -inline-threshold=0 -hot-callsite-threshold=100 -S | FileCheck %s			; RUN: opt < %s -inline -inline-threshold=0 -hot-callsite-threshold=100 -S | FileCheck %s

	; This tests that a hot callsite gets the (higher) inlinehint-threshold even without			; This tests that a hot callsite gets the (higher) inlinehint-threshold even without
	; inline hints and gets inlined because the cost is less than			; inline hints and gets inlined because the cost is less than
	; inlinehint-threshold. A cold callee with identical body does not get inlined because			; inlinehint-threshold. A cold callee with identical body does not get inlined because
	; cost exceeds the inline-threshold			; cost exceeds the inline-threshold

	define i32 @callee1(i32 %x) {			define i32 @callee1(i32 %x) {
	%x1 = add i32 %x, 1			%x1 = add i32 %x, 1
	%x2 = add i32 %x1, 1			%x2 = add i32 %x1, 1
	%x3 = add i32 %x2, 1			%x3 = add i32 %x2, 1
				call void @extern()
	ret i32 %x3			ret i32 %x3
	}			}

	define i32 @callee2(i32 %x) {			define i32 @callee2(i32 %x) {
	; CHECK-LABEL: @callee2(			; CHECK-LABEL: @callee2(
	%x1 = add i32 %x, 1			%x1 = add i32 %x, 1
	%x2 = add i32 %x1, 1			%x2 = add i32 %x1, 1
	%x3 = add i32 %x2, 1			%x3 = add i32 %x2, 1
				call void @extern()
	ret i32 %x3			ret i32 %x3
	}			}

	define i32 @caller2(i32 %y1) {			define i32 @caller2(i32 %y1) {
	; CHECK-LABEL: @caller2(			; CHECK-LABEL: @caller2(
	; CHECK: call i32 @callee2			; CHECK: call i32 @callee2
	; CHECK-NOT: call i32 @callee1			; CHECK-NOT: call i32 @callee1
	; CHECK: ret i32 %x3.i			; CHECK: ret i32 %x3.i
	%y2 = call i32 @callee2(i32 %y1), !prof !22			%y2 = call i32 @callee2(i32 %y1), !prof !22
	%y3 = call i32 @callee1(i32 %y2), !prof !21			%y3 = call i32 @callee1(i32 %y2), !prof !21
	ret i32 %y3			ret i32 %y3
	}			}

				declare void @extern()

	!llvm.module.flags = !{!1}			!llvm.module.flags = !{!1}
	!21 = !{!"branch_weights", i64 300}			!21 = !{!"branch_weights", i64 300}
	!22 = !{!"branch_weights", i64 1}			!22 = !{!"branch_weights", i64 1}

	!1 = !{i32 1, !"ProfileSummary", !2}			!1 = !{i32 1, !"ProfileSummary", !2}
	!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}			!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
	!3 = !{!"ProfileFormat", !"InstrProf"}			!3 = !{!"ProfileFormat", !"InstrProf"}
	!4 = !{!"TotalCount", i64 10000}			!4 = !{!"TotalCount", i64 10000}
	(10 lines not shown)

test/Transforms/Inline/inline-optsize.ll

	; RUN: opt -S -Oz < %s | FileCheck %s -check-prefix=OZ			; RUN: opt -S -Oz < %s | FileCheck %s -check-prefix=OZ
	; RUN: opt -S -O2 < %s | FileCheck %s -check-prefix=O2			; RUN: opt -S -O2 < %s | FileCheck %s -check-prefix=O2
	; RUN: opt -S -Os < %s | FileCheck %s -check-prefix=OS			; RUN: opt -S -Os < %s | FileCheck %s -check-prefix=OS

	; The inline threshold for a function with the optsize attribute is currently			; The inline threshold for a function with the optsize attribute is currently
	; the same as the global inline threshold for -Os. Check that the optsize			; the same as the global inline threshold for -Os. Check that the optsize
	; function attribute doesn't alter the function-specific inline threshold if the			; function attribute doesn't alter the function-specific inline threshold if the
	; global inline threshold is lower (as for -Oz).			; global inline threshold is lower (as for -Oz).

	@a = global i32 4			@a = global i32 4

	; This function should be larger than the inline threshold for -Oz (25), but			; This function should be larger than the inline threshold for -Oz (25), but
	; smaller than the inline threshold for optsize (75).			; smaller than the inline threshold for optsize (75).
	define i32 @inner() {			define i32 @inner() {
				call void @extern()
	%a1 = load volatile i32, i32* @a			%a1 = load volatile i32, i32* @a
	%x1 = add i32 %a1, %a1			%x1 = add i32 %a1, %a1
	%a2 = load volatile i32, i32* @a			%a2 = load volatile i32, i32* @a
	%x2 = add i32 %x1, %a2			%x2 = add i32 %x1, %a2
	%a3 = load volatile i32, i32* @a			%a3 = load volatile i32, i32* @a
	%x3 = add i32 %x2, %a3			%x3 = add i32 %x2, %a3
	%a4 = load volatile i32, i32* @a			%a4 = load volatile i32, i32* @a
	%x4 = add i32 %x3, %a4			%x4 = add i32 %x3, %a4
	(14 lines not shown)
	; @inner() should not be inlined for -O2, -Os and -Oz.			; @inner() should not be inlined for -O2, -Os and -Oz.
	; OZ: call			; OZ: call
	; O2: call			; O2: call
	; OS: call			; OS: call
	define i32 @outer2() minsize {			define i32 @outer2() minsize {
	%r = call i32 @inner()			%r = call i32 @inner()
	ret i32 %r			ret i32 %r
	}			}

				declare void @extern()
				No newline at end of file

test/Transforms/Inline/inline_unreachable-2.ll

	; RUN: opt < %s -inline -S | FileCheck %s			; RUN: opt < %s -inline -S | FileCheck %s

	; CHECK-LABEL: caller			; CHECK-LABEL: caller
	; CHECK: call void @callee			; CHECK: call void @callee
	define void @caller(i32 %a, i1 %b) #0 {			define void @caller(i32 %a, i1 %b) #0 {
	call void @callee(i32 %a, i1 %b)			call void @callee(i32 %a, i1 %b)
	unreachable			unreachable
	}			}

	define void @callee(i32 %a, i1 %b) {			define void @callee(i32 %a, i1 %b) {
				call void @extern()
	call void asm sideeffect "", ""()			call void asm sideeffect "", ""()
	br i1 %b, label %bb1, label %bb2			br i1 %b, label %bb1, label %bb2
	bb1:			bb1:
	call void asm sideeffect "", ""()			call void asm sideeffect "", ""()
	ret void			ret void
	bb2:			bb2:
	call void asm sideeffect "", ""()			call void asm sideeffect "", ""()
	ret void			ret void
	}			}

				declare void @extern()
				No newline at end of file

test/Transforms/Inline/optimization-remarks-passed-yaml.ll

	; RUN: opt < %s -S -inline -pass-remarks-output=%t -pass-remarks=inline \			; RUN: opt < %s -S -inline -pass-remarks-output=%t -pass-remarks=inline \
	; RUN: -pass-remarks-missed=inline -pass-remarks-analysis=inline \			; RUN: -pass-remarks-missed=inline -pass-remarks-analysis=inline \
	; RUN: -pass-remarks-with-hotness 2>&1 | FileCheck %s			; RUN: -pass-remarks-with-hotness 2>&1 | FileCheck %s
	; RUN: cat %t | FileCheck -check-prefix=YAML %s			; RUN: cat %t | FileCheck -check-prefix=YAML %s

	; Check the YAML file for inliner-generated passed and analysis remarks. This			; Check the YAML file for inliner-generated passed and analysis remarks. This
	; is the input:			; is the input:

	; 1 int foo() { return 1; }			; 1 int foo() { return 1; }
	; 2			; 2
	; 3 int bar() {			; 3 int bar() {
	; 4 return foo();			; 4 return foo();
	; 5 }			; 5 }

	; CHECK: remark: /tmp/s.c:4:10: foo can be inlined into bar with cost={{[0-9]+}} (threshold={{[0-9]+}}) (hotness: 30)			; CHECK: remark: /tmp/s.c:4:10: foo can be inlined into bar with cost={{[0-9\-]+}} (threshold={{[0-9]+}}) (hotness: 30)
	; CHECK-NEXT: remark: /tmp/s.c:4:10: foo inlined into bar (hotness: 30)			; CHECK-NEXT: remark: /tmp/s.c:4:10: foo inlined into bar (hotness: 30)

	; YAML: --- !Analysis			; YAML: --- !Analysis
	; YAML-NEXT: Pass: inline			; YAML-NEXT: Pass: inline
	; YAML-NEXT: Name: CanBeInlined			; YAML-NEXT: Name: CanBeInlined
	; YAML-NEXT: DebugLoc: { File: /tmp/s.c, Line: 4, Column: 10 }			; YAML-NEXT: DebugLoc: { File: /tmp/s.c, Line: 4, Column: 10 }
	; YAML-NEXT: Function: bar			; YAML-NEXT: Function: bar
	; YAML-NEXT: Hotness: 30			; YAML-NEXT: Hotness: 30
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML-NEXT: - Callee: foo			; YAML-NEXT: - Callee: foo
	; YAML-NEXT: - String: ' can be inlined into '			; YAML-NEXT: - String: ' can be inlined into '
	; YAML-NEXT: - Caller: bar			; YAML-NEXT: - Caller: bar
	; YAML-NEXT: - String: ' with cost='			; YAML-NEXT: - String: ' with cost='
	; YAML-NEXT: - Cost: '{{[0-9]+}}'			; YAML-NEXT: - Cost: '{{[0-9\-]+}}'
	; YAML-NEXT: - String: ' (threshold='			; YAML-NEXT: - String: ' (threshold='
	; YAML-NEXT: - Threshold: '{{[0-9]+}}'			; YAML-NEXT: - Threshold: '{{[0-9]+}}'
	; YAML-NEXT: - String: ')'			; YAML-NEXT: - String: ')'
	; YAML-NEXT: ...			; YAML-NEXT: ...
	; YAML-NEXT: --- !Passed			; YAML-NEXT: --- !Passed
	; YAML-NEXT: Pass: inline			; YAML-NEXT: Pass: inline
	; YAML-NEXT: Name: Inlined			; YAML-NEXT: Name: Inlined
	; YAML-NEXT: DebugLoc: { File: /tmp/s.c, Line: 4, Column: 10 }			; YAML-NEXT: DebugLoc: { File: /tmp/s.c, Line: 4, Column: 10 }
	(46 lines not shown)

test/Transforms/Inline/ptr-diff.ll

; RUN: opt -inline < %s -S -o - -inline-threshold=10 | FileCheck %s		; RUN: opt -inline < %s -S -o - -inline-threshold=10 | FileCheck %s

target datalayout = "p:32:32-p1:64:64-p2:16:16-n16:32:64"		target datalayout = "p:32:32-p1:64:64-p2:16:16-n16:32:64"

define i32 @outer1() {		define i32 @outer1() {
; CHECK-LABEL: @outer1(		; CHECK-LABEL: @outer1(
; CHECK-NOT: call		; CHECK-NOT: call i32
; CHECK: ret i32		; CHECK: ret i32

%ptr = alloca i32		%ptr = alloca i32
%ptr1 = getelementptr inbounds i32, i32* %ptr, i32 0		%ptr1 = getelementptr inbounds i32, i32* %ptr, i32 0
%ptr2 = getelementptr inbounds i32, i32* %ptr, i32 42		%ptr2 = getelementptr inbounds i32, i32* %ptr, i32 42
%result = call i32 @inner1(i32* %ptr1, i32* %ptr2)		%result = call i32 @inner1(i32* %ptr1, i32* %ptr2)
ret i32 %result		ret i32 %result
}		}

define i32 @inner1(i32* %begin, i32* %end) {		define i32 @inner1(i32* %begin, i32* %end) {
		call void @extern()
%begin.i = ptrtoint i32* %begin to i32		%begin.i = ptrtoint i32* %begin to i32
%end.i = ptrtoint i32* %end to i32		%end.i = ptrtoint i32* %end to i32
%distance = sub i32 %end.i, %begin.i		%distance = sub i32 %end.i, %begin.i
%icmp = icmp sle i32 %distance, 42		%icmp = icmp sle i32 %distance, 42
br i1 %icmp, label %then, label %else		br i1 %icmp, label %then, label %else

then:		then:
ret i32 3		ret i32 3
(12 lines not shown)	; CHECK: ret i32

%ptr1 = getelementptr i32, i32* %ptr, i32 0		%ptr1 = getelementptr i32, i32* %ptr, i32 0
%ptr2 = getelementptr i32, i32* %ptr, i32 42		%ptr2 = getelementptr i32, i32* %ptr, i32 42
%result = call i32 @inner2(i32* %ptr1, i32* %ptr2)		%result = call i32 @inner2(i32* %ptr1, i32* %ptr2)
ret i32 %result		ret i32 %result
}		}

define i32 @inner2(i32* %begin, i32* %end) {		define i32 @inner2(i32* %begin, i32* %end) {
		call void @extern()
%begin.i = ptrtoint i32* %begin to i32		%begin.i = ptrtoint i32* %begin to i32
%end.i = ptrtoint i32* %end to i32		%end.i = ptrtoint i32* %end to i32
%distance = sub i32 %end.i, %begin.i		%distance = sub i32 %end.i, %begin.i
%icmp = icmp sle i32 %distance, 42		%icmp = icmp sle i32 %distance, 42
br i1 %icmp, label %then, label %else		br i1 %icmp, label %then, label %else

then:		then:
ret i32 3		ret i32 3

else:		else:
%t = load i32, i32* %begin		%t = load i32, i32* %begin
ret i32 %t		ret i32 %t
}		}

; The inttoptrs are free since it is a smaller integer to a larger		; The inttoptrs are free since it is a smaller integer to a larger
; pointer size		; pointer size
define i32 @inttoptr_free_cost(i32 %a, i32 %b, i32 %c) {		define i32 @inttoptr_free_cost(i32 %a, i32 %b, i32 %c) {
		call void @extern()
%p1 = inttoptr i32 %a to i32 addrspace(1)*		%p1 = inttoptr i32 %a to i32 addrspace(1)*
%p2 = inttoptr i32 %b to i32 addrspace(1)*		%p2 = inttoptr i32 %b to i32 addrspace(1)*
%p3 = inttoptr i32 %c to i32 addrspace(1)*		%p3 = inttoptr i32 %c to i32 addrspace(1)*
%t1 = load i32, i32 addrspace(1)* %p1		%t1 = load i32, i32 addrspace(1)* %p1
%t2 = load i32, i32 addrspace(1)* %p2		%t2 = load i32, i32 addrspace(1)* %p2
%t3 = load i32, i32 addrspace(1)* %p3		%t3 = load i32, i32 addrspace(1)* %p3
%s = add i32 %t1, %t2		%s = add i32 %t1, %t2
%s1 = add i32 %s, %t3		%s1 = add i32 %s, %t3
ret i32 %s1		ret i32 %s1
}		}

define i32 @inttoptr_free_cost_user(i32 %begin, i32 %end) {		define i32 @inttoptr_free_cost_user(i32 %begin, i32 %end) {
; CHECK-LABEL: @inttoptr_free_cost_user(		; CHECK-LABEL: @inttoptr_free_cost_user(
; CHECK-NOT: call		; CHECK-NOT: call i32
%x = call i32 @inttoptr_free_cost(i32 %begin, i32 %end, i32 9)		%x = call i32 @inttoptr_free_cost(i32 %begin, i32 %end, i32 9)
ret i32 %x		ret i32 %x
}		}

; The inttoptrs have a cost since it is a larger integer to a smaller		; The inttoptrs have a cost since it is a larger integer to a smaller
; pointer size		; pointer size
define i32 @inttoptr_cost_smaller_ptr(i32 %a, i32 %b, i32 %c) {		define i32 @inttoptr_cost_smaller_ptr(i32 %a, i32 %b, i32 %c) {
		call void @extern()
%p1 = inttoptr i32 %a to i32 addrspace(2)*		%p1 = inttoptr i32 %a to i32 addrspace(2)*
%p2 = inttoptr i32 %b to i32 addrspace(2)*		%p2 = inttoptr i32 %b to i32 addrspace(2)*
%p3 = inttoptr i32 %c to i32 addrspace(2)*		%p3 = inttoptr i32 %c to i32 addrspace(2)*
%t1 = load i32, i32 addrspace(2)* %p1		%t1 = load i32, i32 addrspace(2)* %p1
%t2 = load i32, i32 addrspace(2)* %p2		%t2 = load i32, i32 addrspace(2)* %p2
%t3 = load i32, i32 addrspace(2)* %p3		%t3 = load i32, i32 addrspace(2)* %p3
%s = add i32 %t1, %t2		%s = add i32 %t1, %t2
%s1 = add i32 %s, %t3		%s1 = add i32 %s, %t3
ret i32 %s1		ret i32 %s1
}		}

define i32 @inttoptr_cost_smaller_ptr_user(i32 %begin, i32 %end) {		define i32 @inttoptr_cost_smaller_ptr_user(i32 %begin, i32 %end) {
; CHECK-LABEL: @inttoptr_cost_smaller_ptr_user(		; CHECK-LABEL: @inttoptr_cost_smaller_ptr_user(
; CHECK: call		; CHECK: call i32
%x = call i32 @inttoptr_cost_smaller_ptr(i32 %begin, i32 %end, i32 9)		%x = call i32 @inttoptr_cost_smaller_ptr(i32 %begin, i32 %end, i32 9)
ret i32 %x		ret i32 %x
}		}

		declare void @extern()
		No newline at end of file

Diff 77027