This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Add getInliningThresholdMultiplier.
ClosedPublic

Authored by jlebar on Mar 29 2016, 9:33 AM.

Download Raw Diff

Details

Reviewers

chandlerc
hfinkel

Commits

rG8650a4da9334: [TTI] Add getInliningThresholdMultiplier.
rL266405: [TTI] Add getInliningThresholdMultiplier.

Summary

InlineCost's threshold is multiplied by this value. This lets us adjust
the inlining threshold up or down on a per-target basis. For example,
we might want to increase the threshold on targets where calls are
unusually expensive.

Diff Detail

Repository: rL LLVM

Event Timeline

jlebar updated this revision to Diff 51937.Mar 29 2016, 9:33 AM

jlebar retitled this revision from to [TTI] Add getInliningThresholdMultiplier..

jlebar updated this object.

jlebar added a reviewer: chandlerc.

jlebar added a subscriber: llvm-commits.

arsenm added a subscriber: arsenm.Mar 29 2016, 9:39 AM

arsenm added inline comments.

lib/Analysis/TargetTransformInfo.cpp
71–73 ↗	(On Diff #51937)	I thought using host FP was to be avoided, so I'm surprised this isn't an integer factor. Couldn't you have one assert on isnormal()?

jlebar added inline comments.Mar 29 2016, 9:43 AM

lib/Analysis/TargetTransformInfo.cpp
71–73 ↗	(On Diff #51937)	I thought using host FP was to be avoided No idea. It seems substantially less flexible to use an integer factor (you can't decrease the inlining threshold except to 0), but it works fine for my case, so I have no problem changing it if that's the convention. Couldn't you have one assert on isnormal()? isnormal(0) seems to be false, but giving a multiplier of 0 doesn't seem outrageous to me?

arsenm added inline comments.Mar 29 2016, 9:47 AM

lib/Analysis/TargetTransformInfo.cpp
71–73 ↗	(On Diff #51937)	You could return a rational numerator + denominator pair

Specify the inlining threshold multiplier as a fraction.

Wouldn't it be better to use an additive bonus to the threshold instead of multiplicative factor to account for expensive calls?

In D18560#387488, @eraman wrote:

Wouldn't it be better to use an additive bonus to the threshold instead of multiplicative factor to account for expensive calls?

Not necessarily? It seems not unreasonable that if we e.g. went from an un-multiplied inlining threshold of 10 to 100 that the adjusted threshold should also increase by 10x, rather than increase by +90. Like, it depends on what is the intent behind the 10 --> 100 change.

Given how blunt this is, and given that it's used by exactly one target at the moment, I'm inclined not to bikeshed too hard on the exact form it takes. Like, it would be better to rearrange all of the constants inside InlineCost to better reflect costs on the target machine. But the discussion of exactly what form this should take would probably be much more productive at the point that what we have doesn't work for some new target.

In D18560#387502, @jlebar wrote:

In D18560#387488, @eraman wrote:

Wouldn't it be better to use an additive bonus to the threshold instead of multiplicative factor to account for expensive calls?

Not necessarily? It seems not unreasonable that if we e.g. went from an un-multiplied inlining threshold of 10 to 100 that the adjusted threshold should also increase by 10x, rather than increase by +90. Like, it depends on what is the intent behind the 10 --> 100 change.

Given how blunt this is, and given that it's used by exactly one target at the moment, I'm inclined not to bikeshed too hard on the exact form it takes. Like, it would be better to rearrange all of the constants inside InlineCost to better reflect costs on the target machine. But the discussion of exactly what form this should take would probably be much more productive at the point that what we have doesn't work for some new target.

If you'd like a target-specific cost adjustment, which seems perfectly reasonable, it should be additive, and should be a function of the call site (i.e. both the caller and the callee, not just the caller). Also, hopefully the rationale for these numbers can be clearly articulated in comments in the TTI implementation, so we know how to adjust them if we change how TTI.getUserCost works.

If you'd like a target-specific cost adjustment, which seems perfectly reasonable, it should be additive,

One of the other reasons I think a multiplicative setting makes sense is, there's no way to change the fudge factor. All you can change is the base inlining threshold. So the question is, as a user changing the inlining threshold, what do I expect to happen on a platform that has an inlining threshold adjustment?

Suppose I set the (global, un-fudged) inlining threshold (again, the only thing I can change) to something very small. It seems to me that should imply that relatively little inlining occurs. But of course an additive fudge doesn't accomplish that. And in fact there's no good way for me to set the post-fudge inlining threshold to a small value -- I'd have to set it to a negative number, and I'd have to calculate that specific value by looking at the TTI.

In contrast, if the fudge is multiplicative, then we have the property that a small pre-fudge inlining threshold results in a small post-fudge threshold.

If a multipliciative fudge is off the table, I think I'd feel more comfortable with the TTI giving an absolute inlining threshold that can be overridden on the command line. Is that any more appealing to you, Hal?

should be a function of the call site (i.e. both the caller and the callee, not just the caller).

Sure, I can make that change once we agree on the rest here.

Also, hopefully the rationale for these numbers can be clearly articulated in comments in the TTI implementation, so we know how to adjust them if we change how TTI.getUserCost works.

Haha, just like the current cross-arch threshold is so clearly motivated? :-p I don't have a good rationale other than, this number worked well for many internal benchmarks at Google. It's absolutely not scientifically arrived at; it's just a starting place.

Clearly we want more inlining on nvptx than on other archs, and, anecdotally, I tried a few benchmarks that showed 10% gains from this change. But ptxas does its own inlining, and it's a black box, so this is all very mushy. And more importantly, speed is only half the story -- inlining big functions is a size/speed tradeoff, and that's harder to quantify. Like, GPUs tend to have relatively little code to begin with, so maybe we have more wiggle room on the size end? But how much -- 1.5x, 5x, 100x? Who knows.

In D18560#389927, @jlebar wrote:

If you'd like a target-specific cost adjustment, which seems perfectly reasonable, it should be additive,

One of the other reasons I think a multiplicative setting makes sense is, there's no way to change the fudge factor. All you can change is the base inlining threshold. So the question is, as a user changing the inlining threshold, what do I expect to happen on a platform that has an inlining threshold adjustment?

Suppose I set the (global, un-fudged) inlining threshold (again, the only thing I can change) to something very small. It seems to me that should imply that relatively little inlining occurs. But of course an additive fudge doesn't accomplish that. And in fact there's no good way for me to set the post-fudge inlining threshold to a small value -- I'd have to set it to a negative number, and I'd have to calculate that specific value by looking at the TTI.

In contrast, if the fudge is multiplicative, then we have the property that a small pre-fudge inlining threshold results in a small post-fudge threshold.

If a multipliciative fudge is off the table, I think I'd feel more comfortable with the TTI giving an absolute inlining threshold that can be overridden on the command line. Is that any more appealing to you, Hal?

I ended up chatting with Hal about this and he made a really great point about this. I had been thinking that it is really brittle to have the target provided inlining threshold be an absolute number instead of a multiplier / ratio as you have it.

However, Hal pointed out that this creates a coupling that could also be problematic. Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic because small changes would would still disturb the target-specific tunings. His suggestion was to use an absolute threshold from the target in the absense of an explicitly specified command line flag.

Thinking about this more, I think it still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. It might be useful to be able to adjust the threshold when making the change to keep most inlining decisions neutral across targets.

I'm not completely sure that either of these approaches is really resistant to undue coupling...

One possible alternative is to not have the size-based inlining be target configurable, and to make this exclusively handled by the proposed runtime cost estimation based inlining when that is available. Perhaps this could be documented as a temporary hook that will be replaced with the runtime estimation? I'm curious what others think here.

Also, hopefully the rationale for these numbers can be clearly articulated in comments in the TTI implementation, so we know how to adjust them if we change how TTI.getUserCost works.

Haha, just like the current cross-arch threshold is so clearly motivated?

I'm going to write up a document describing the current system since this keeps coming up and causing confusion. I'll also try to sketch how other models would fit into this.

I don't have a good rationale other than, this number worked well for many internal benchmarks at Google. It's absolutely not scientifically arrived at; it's just a starting place.

Clearly we want more inlining on nvptx than on other archs, and, anecdotally, I tried a few benchmarks that showed 10% gains from this change. But ptxas does its own inlining, and it's a black box, so this is all very mushy. And more importantly, speed is only half the story -- inlining big functions is a size/speed tradeoff, and that's harder to quantify. Like, GPUs tend to have relatively little code to begin with, so maybe we have more wiggle room on the size end? But how much -- 1.5x, 5x, 100x? Who knows.

GPUs tend to not have significant icache-style bottlenecks because their kernels are typically small and easily predicted, etc. Still, there remains a fundamental limit on instruction working set size where inlining is profitable.

I think you'll want to do some "science" on this value eventually and document it. Typically we walk the value up and down and try to get an idea of the shape of the aggregate performance curve across a wide range of benchmarks. Then we look for what is usually a large flat mesa in the curve, and go with the small end of that.

-Chandler

chandlerc mentioned this in D18561: [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5..Apr 11 2016, 9:38 AM

Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic because small changes would would still disturb the target-specific tunings.

My thought was, the inlining multiplier is such a coarse knob -- set atop rather coarse heuristics -- that tuning it particularly carefully for a target might not make much sense.

I don't know if that's true or not, but it's at least an argument one could make.

His suggestion was to use an absolute threshold from the target in the absense of an explicitly specified command line flag. Thinking about this more, I think it still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. It might be useful to be able to adjust the threshold when making the change to keep most inlining decisions neutral across targets.

I guess this (and the previous one, to an extent) are a question of API stability guarantees. To the extent that we don't promise to have a stable API for out-of-tree targets, we could change the inliner and at the same time break compilation for any out-of-tree targets that set the threshold, right. They'll have to fix the error and recompute their inlining threshold.

That doesn't seem so unreasonable to me -- it's not like we'd be doing this once a month, or even every release. And as an out-of-tree target, you'd only be broken if you explicitly opted in to this tuning mechanism, which we could make clear comes with this possibility of future intentional breakage.

I think you'll want to do some "science" on this value eventually and document it.

Yes, absolutely. That change doesn't even appear in this patch, though; let's worry about it separately, if that's OK?

In D18560#397232, @jlebar wrote:

Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic because small changes would would still disturb the target-specific tunings.

My thought was, the inlining multiplier is such a coarse knob -- set atop rather coarse heuristics -- that tuning it particularly carefully for a target might not make much sense.

I don't know if that's true or not, but it's at least an argument one could make.

His suggestion was to use an absolute threshold from the target in the absense of an explicitly specified command line flag. Thinking about this more, I think it still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. It might be useful to be able to adjust the threshold when making the change to keep most inlining decisions neutral across targets.

I guess this (and the previous one, to an extent) are a question of API stability guarantees. To the extent that we don't promise to have a stable API for out-of-tree targets, we could change the inliner and at the same time break compilation for any out-of-tree targets that set the threshold, right. They'll have to fix the error and recompute their inlining threshold.

That doesn't seem so unreasonable to me -- it's not like we'd be doing this once a month, or even every release. And as an out-of-tree target, you'd only be broken if you explicitly opted in to this tuning mechanism, which we could make clear comes with this possibility of future intentional breakage.

This is a concern, and you're right that any of the tuning knobs need to be returned in response to upstream changes. Fundamentally, I'm most concerned with sending the right message here, and not unnecessarily constraining our future actions (because while we're certainly free to change whatever we want in the future, regardless of backend tuning, in practice we try to be nice and accommodating because these things take time).

The message we want to send is:

This is a temporary hook, and while some mechanism to achieve this affect will be provided in the future, it might look nothing like this.
This is a coarse heuristic mechanism, not really backed by any self-consistent model.

After chatting with Chandler about this, my understanding of the current mechanism is this: Our current heuristic is size based (with some exceptions), and aims to answer the following question: Will inlining this function make the caller smaller?

Doing this involves two pieces. The first is constructing an estimate of the size of the inlined function (which is truncated at some point to keep the cost of the analysis from becoming quadratic), accounting for constant propagation via function arguments and resulting simplifications. Then we construct an estimate of the caller's size associated with the call site that can be eliminated by inlining (the call itself, allocas that can be eliminated, etc.).

The second piece is more empirical. How many instructions, on average, will be eliminated in the caller by resulting simplification by inlining a function? This is where the current threshold comes in, it is a (rough) estimate of this. Unfortunately, as currently used, it is also not purely an estimate of this, but also accounts for performance effects (which is why, for instance, we have a bonus for callees with vector instructions, etc.).

Longer term, we really need to separate the performance-based metric from the size-based metric. They're estimating different things, although we're unfortunately conflating them currently (by hiding our performance-based metric inside our estimate of size decrease from resulting caller simplification, and to a lesser extent, things like the call penalty).

For your use case, GPUs, you want to represent the fact that calls are really expensive, and you don't care nearly as much about code size, although you do care about compile time. As a result, your limit is more to limit the compile time from later stages of the compiler (and perhaps the inliner itself) than anything else. That having been said, NVIDIA GPUs do have an icache that holds some kilobytes of compiled code, and spilling out of that will cost you (although often not a huge amount).

In the long term, I imagine we want a more top-down heuristic for this kind of inlining, because we want to keep a function, or group of functions, below some absolute size threshold (unless there is a huge compensating performance speedup). The size based heuristic is also good for your use case, but only if it happens in a more pure sense.

So my conclusion is that all of this is wrong ;) -- but in the name of making progress, I'm fine with a way to bump the threshold for some targets. I think it should be just a number, because that's the simplest possible thing we can do. A number if difficult, however, because cannot directly account for -Os, etc. factors that go into the current threshold. Providing a fraction to be multiplied by the current threshold seems problematic because it sends the message that this is something that should be finely tuned, and it shouldn't be. Could this just be an integer multipler for now? That will prevent fine tuning, send the message that this is a temporary coarse knob, and then we can work on a more principled solution. If that would work for you, I think that is the best solution for now.

I think you'll want to do some "science" on this value eventually and document it.

Yes, absolutely. That change doesn't even appear in this patch, though; let's worry about it separately, if that's OK?

Could this just be an integer multipler for now? That will prevent fine tuning, send the message that this is a temporary coarse knob, and then we can work on a more principled solution.

That sounds great to me, I will spin a patch. If anyone else disagrees, speak now or forever hold your peace...

Update patch so that getInliningThresholdMultiplier is just a fixed number.

Hal, I hope I didn't misunderstand: I took your comment to suggest that the multiplier should be fixed for a particular target -- it shouldn't depend on the caller/callee. I can add that back if that's not what you had in mind.

In D18560#401680, @jlebar wrote:

Hal, I hope I didn't misunderstand: I took your comment to suggest that the multiplier should be fixed for a particular target -- it shouldn't depend on the caller/callee. I can add that back if that's not what you had in mind.

This is fine. We can add those things back if we come up with a use case. As discussed, we really need to rework a bunch of this logic. LGTM.

This revision is now accepted and ready to land.Apr 14 2016, 6:42 PM

Closed by commit rL266405: [TTI] Add getInliningThresholdMultiplier. (authored by jlebar). · Explain WhyApr 14 2016, 6:44 PM

This revision was automatically updated to reflect the committed changes.

Thank you very much, everyone!

Revision Contents

Path

Size

llvm/

trunk/

include/

llvm/

Analysis/

TargetTransformInfo.h

12 lines

TargetTransformInfoImpl.h

2 lines

CodeGen/

BasicTTIImpl.h

2 lines

lib/

Analysis/

InlineCost.cpp

4 lines

TargetTransformInfo.cpp

4 lines

Diff 53829

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 160 Lines • ▼ Show 20 Lines	public:
/// being called in the event it is a library call with special lowering.		/// being called in the event it is a library call with special lowering.
int getCallCost(const Function *F, int NumArgs = -1) const;		int getCallCost(const Function *F, int NumArgs = -1) const;

/// \brief Estimate the cost of calling a specific function when lowered.		/// \brief Estimate the cost of calling a specific function when lowered.
///		///
/// This overload allows specifying a set of candidate argument values.		/// This overload allows specifying a set of candidate argument values.
int getCallCost(const Function F, ArrayRef<const Value > Arguments) const;		int getCallCost(const Function F, ArrayRef<const Value > Arguments) const;

		/// \returns A value by which our inlining threshold should be multiplied.
		/// This is primarily used to bump up the inlining threshold wholesale on
		/// targets where calls are unusually expensive.
		///
		/// TODO: This is a rather blunt instrument. Perhaps altering the costs of
		/// individual classes of instructions would be better.
		unsigned getInliningThresholdMultiplier() const;

/// \brief Estimate the cost of an intrinsic when lowered.		/// \brief Estimate the cost of an intrinsic when lowered.
///		///
/// Mirrors the \c getCallCost method but uses an intrinsic identifier.		/// Mirrors the \c getCallCost method but uses an intrinsic identifier.
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys) const;		ArrayRef<Type *> ParamTys) const;

/// \brief Estimate the cost of an intrinsic when lowered.		/// \brief Estimate the cost of an intrinsic when lowered.
///		///
▲ Show 20 Lines • Show All 418 Lines • ▼ Show 20 Lines	public:
virtual const DataLayout &getDataLayout() const = 0;		virtual const DataLayout &getDataLayout() const = 0;
virtual int getOperationCost(unsigned Opcode, Type Ty, Type OpTy) = 0;		virtual int getOperationCost(unsigned Opcode, Type Ty, Type OpTy) = 0;
virtual int getGEPCost(Type PointeeType, const Value Ptr,		virtual int getGEPCost(Type PointeeType, const Value Ptr,
ArrayRef<const Value *> Operands) = 0;		ArrayRef<const Value *> Operands) = 0;
virtual int getCallCost(FunctionType *FTy, int NumArgs) = 0;		virtual int getCallCost(FunctionType *FTy, int NumArgs) = 0;
virtual int getCallCost(const Function *F, int NumArgs) = 0;		virtual int getCallCost(const Function *F, int NumArgs) = 0;
virtual int getCallCost(const Function *F,		virtual int getCallCost(const Function *F,
ArrayRef<const Value *> Arguments) = 0;		ArrayRef<const Value *> Arguments) = 0;
		virtual unsigned getInliningThresholdMultiplier() = 0;
virtual int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		virtual int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys) = 0;		ArrayRef<Type *> ParamTys) = 0;
virtual int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		virtual int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments) = 0;		ArrayRef<const Value *> Arguments) = 0;
virtual int getUserCost(const User *U) = 0;		virtual int getUserCost(const User *U) = 0;
virtual bool hasBranchDivergence() = 0;		virtual bool hasBranchDivergence() = 0;
virtual bool isSourceOfDivergence(const Value *V) = 0;		virtual bool isSourceOfDivergence(const Value *V) = 0;
virtual bool isLoweredToCall(const Function *F) = 0;		virtual bool isLoweredToCall(const Function *F) = 0;
▲ Show 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	public:
}		}
int getCallCost(const Function *F, int NumArgs) override {		int getCallCost(const Function *F, int NumArgs) override {
return Impl.getCallCost(F, NumArgs);		return Impl.getCallCost(F, NumArgs);
}		}
int getCallCost(const Function *F,		int getCallCost(const Function *F,
ArrayRef<const Value *> Arguments) override {		ArrayRef<const Value *> Arguments) override {
return Impl.getCallCost(F, Arguments);		return Impl.getCallCost(F, Arguments);
}		}
		unsigned getInliningThresholdMultiplier() override {
		return Impl.getInliningThresholdMultiplier();
		}
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys) override {		ArrayRef<Type *> ParamTys) override {
return Impl.getIntrinsicCost(IID, RetTy, ParamTys);		return Impl.getIntrinsicCost(IID, RetTy, ParamTys);
}		}
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments) override {		ArrayRef<const Value *> Arguments) override {
return Impl.getIntrinsicCost(IID, RetTy, Arguments);		return Impl.getIntrinsicCost(IID, RetTy, Arguments);
}		}
▲ Show 20 Lines • Show All 283 Lines • Show Last 20 Lines

llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	unsigned getCallCost(FunctionType *FTy, int NumArgs) {
if (NumArgs < 0)		if (NumArgs < 0)
// Set the argument number to the number of explicit arguments in the		// Set the argument number to the number of explicit arguments in the
// function.		// function.
NumArgs = FTy->getNumParams();		NumArgs = FTy->getNumParams();

return TTI::TCC_Basic * (NumArgs + 1);		return TTI::TCC_Basic * (NumArgs + 1);
}		}

		unsigned getInliningThresholdMultiplier() { return 1; }

unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys) {		ArrayRef<Type *> ParamTys) {
switch (IID) {		switch (IID) {
default:		default:
// Intrinsics rarely (if ever) have normal argument setup constraints.		// Intrinsics rarely (if ever) have normal argument setup constraints.
// Model them as having a basic instruction cost.		// Model them as having a basic instruction cost.
// FIXME: This is wrong for libc intrinsics.		// FIXME: This is wrong for libc intrinsics.
return TTI::TCC_Basic;		return TTI::TCC_Basic;
▲ Show 20 Lines • Show All 384 Lines • Show Last 20 Lines

llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h

Show First 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	case Instruction::ZExt: {
return TargetTransformInfo::TCC_Free;		return TargetTransformInfo::TCC_Free;
return TargetTransformInfo::TCC_Basic;		return TargetTransformInfo::TCC_Basic;
}		}
}		}

return BaseT::getOperationCost(Opcode, Ty, OpTy);		return BaseT::getOperationCost(Opcode, Ty, OpTy);
}		}

		unsigned getInliningThresholdMultiplier() { return 1; }

void getUnrollingPreferences(Loop *L, TTI::UnrollingPreferences &UP) {		void getUnrollingPreferences(Loop *L, TTI::UnrollingPreferences &UP) {
// This unrolling functionality is target independent, but to provide some		// This unrolling functionality is target independent, but to provide some
// motivation for its intended use, for x86:		// motivation for its intended use, for x86:

// According to the Intel 64 and IA-32 Architectures Optimization Reference		// According to the Intel 64 and IA-32 Architectures Optimization Reference
// Manual, Intel Core models and later have a loop stream detector (and		// Manual, Intel Core models and later have a loop stream detector (and
// associated uop queue) that can benefit from partial unrolling.		// associated uop queue) that can benefit from partial unrolling.
// The relevant requirements are:		// The relevant requirements are:
▲ Show 20 Lines • Show All 653 Lines • Show Last 20 Lines

llvm/trunk/lib/Analysis/InlineCost.cpp

Show First 20 Lines • Show All 649 Lines • ▼ Show 20 Lines	bool ColdCallee =
FunctionCount <= (uint64_t)(0.01 * (double)MaxFunctionCount));		FunctionCount <= (uint64_t)(0.01 * (double)MaxFunctionCount));
// Command line argument for DefaultInlineThreshold will override the default		// Command line argument for DefaultInlineThreshold will override the default
// ColdThreshold. If we have -inline-threshold but no -inlinecold-threshold,		// ColdThreshold. If we have -inline-threshold but no -inlinecold-threshold,
// do not use the default cold threshold even if it is smaller.		// do not use the default cold threshold even if it is smaller.
if ((DefaultInlineThreshold.getNumOccurrences() == 0 \|\|		if ((DefaultInlineThreshold.getNumOccurrences() == 0 \|\|
ColdThreshold.getNumOccurrences() > 0) &&		ColdThreshold.getNumOccurrences() > 0) &&
ColdCallee && ColdThreshold < Threshold)		ColdCallee && ColdThreshold < Threshold)
Threshold = ColdThreshold;		Threshold = ColdThreshold;

		// Finally, take the target-specific inlining threshold multiplier into
		// account.
		Threshold *= TTI.getInliningThresholdMultiplier();
}		}

bool CallAnalyzer::visitCmpInst(CmpInst &I) {		bool CallAnalyzer::visitCmpInst(CmpInst &I) {
Value LHS = I.getOperand(0), RHS = I.getOperand(1);		Value LHS = I.getOperand(0), RHS = I.getOperand(1);
// First try to handle simplified comparisons.		// First try to handle simplified comparisons.
if (!isa<Constant>(LHS))		if (!isa<Constant>(LHS))
if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))		if (Constant *SimpleLHS = SimplifiedValues.lookup(LHS))
LHS = SimpleLHS;		LHS = SimpleLHS;
▲ Show 20 Lines • Show All 874 Lines • Show Last 20 Lines

llvm/trunk/lib/Analysis/TargetTransformInfo.cpp

	Show First 20 Lines • Show All 60 Lines • ▼ Show 20 Lines

	int TargetTransformInfo::getCallCost(const Function *F,			int TargetTransformInfo::getCallCost(const Function *F,
	ArrayRef<const Value *> Arguments) const {			ArrayRef<const Value *> Arguments) const {
	int Cost = TTIImpl->getCallCost(F, Arguments);			int Cost = TTIImpl->getCallCost(F, Arguments);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

				unsigned TargetTransformInfo::getInliningThresholdMultiplier() const {
				return TTIImpl->getInliningThresholdMultiplier();
				}

	int TargetTransformInfo::getIntrinsicCost(			int TargetTransformInfo::getIntrinsicCost(
	Intrinsic::ID IID, Type RetTy, ArrayRef<const Value > Arguments) const {			Intrinsic::ID IID, Type RetTy, ArrayRef<const Value > Arguments) const {
	int Cost = TTIImpl->getIntrinsicCost(IID, RetTy, Arguments);			int Cost = TTIImpl->getIntrinsicCost(IID, RetTy, Arguments);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

	int TargetTransformInfo::getUserCost(const User *U) const {			int TargetTransformInfo::getUserCost(const User *U) const {
	▲ Show 20 Lines • Show All 352 Lines • Show Last 20 Lines