This is an archive of the discontinued LLVM Phabricator instance.

Differential D43769

[TTI] rename getArithmeticInstructionCost() to getUnitThroughput(); NFC
ClosedPublic

Authored by spatel on Feb 26 2018, 10:18 AM.

Details

Reviewers

ABataev
RKSimon
hfinkel
fhahn
anemet
craig.topper
eastig

Summary

Let's explicitly state the meaning of this API in its name and the code comments to make the intent clear.

This assumes that the x86 overrides of the base cost model are doing the right thing already and D43733 is improving on that. The definition and usage of this cost were not obvious in D43079 / D42981.

If clients are not using this cost as intended, then the name change should make it apparent. Out-of-tree targets will have to update for the cosmetic change, but that's an opportunity to verify that they're either ok with the base model costs or have a reasonable override for those costs.

Event Timeline

spatel created this revision. Feb 26 2018, 10:18 AM

Herald added subscribers: kbarton, javed.absar, nhaehnle and 5 others. Feb 26 2018, 10:18 AM

ABataev added inline comments. Feb 26 2018, 10:22 AM

include/llvm/Analysis/TargetTransformInfo.h
723	Seems to me very general, the name does not show that this is the throughput for arithmetic instructions only. I think it is going to be enough just to clarify the comment.

spatel added inline comments. Feb 26 2018, 10:50 AM

include/llvm/Analysis/TargetTransformInfo.h
723	getXXXCost is the most ambiguous. getALUUnitThroughput is better? I'd like to make it so we don't have to read the comments to distinguish this from getUserCost and getOperationCost. So if we don't change this name, then change those? I suppose we should change all of the throughput APIs in one shot if we're going to do this. But if there's no support, then I'll just update the code comment.

ABataev added inline comments. Feb 26 2018, 11:29 AM

include/llvm/Analysis/TargetTransformInfo.h
723	Still not sure. We're estimating the cost of the arithmetic LLVM instructions here in terms of throughput, not the ALU Unit throughput.

spatel added inline comments. Feb 26 2018, 12:45 PM

include/llvm/Analysis/TargetTransformInfo.h
723	Maybe I'm still confused. As I understand D43733, we're not estimating the LLVM instruction throughput for the target as a whole. If we were, then fmul/fadd/fsub should have a higher cost than integer add/sub because all recent x86 have greater overall throughput for scalar integer ops than FP ops because there are more scalar ALUs. IMO, the more important part is that we change Cost -> Throughput for everything (I think) that the vectorizers are using, so we're at least different than 'user cost' or 'operation cost'. I guess people will have to read the comment anyway to distinguish the subtlety between 'getArithmeticThroughput' / 'getALUThroughput'.

ABataev added inline comments. Feb 26 2018, 12:57 PM

include/llvm/Analysis/TargetTransformInfo.h
723	As I understand, we estimate throughput for the target as a whole. And because of that we assume that the cost of the integer scalar instructions is just 1 (though it may be lower). This just needs to be fixed somehow, e.g by introducing some multiplier.

RKSimon added subscribers: courbet, andreadb. Feb 26 2018, 2:21 PM

RKSimon added inline comments.

include/llvm/Analysis/TargetTransformInfo.h
723	If we did have access to the target's scheduling model, maybe something like: int Cost = (int) std::max(Inst->getThroughput() / Model->IssueWidth, 1.0f) ? Although I don't think we're even close to being able to use scheduling models here yet - x86 at least has very few CPUs correctly modelled, although there does seem to be some interest from @andreadb @courbet et al. to use them more aggressively.
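
The expression in the comment above can be tried out in isolation with a minimal sketch. `InstrSketch`, `SchedModelSketch`, and `estimateCost` are hypothetical stand-ins for illustration, not LLVM types or APIs, and the sketch assumes that `getThroughput()` in the comment means a reciprocal throughput in cycles.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for per-instruction scheduling data (not an LLVM type).
struct InstrSketch {
  float RecipThroughput; // assumed meaning: cycles per instruction on one unit
};

// Hypothetical stand-in for a target scheduling model (not an LLVM type).
struct SchedModelSketch {
  unsigned IssueWidth; // maximum instructions issued per cycle
};

// Divide the per-unit cost by the issue width, then clamp so the integer
// cost never drops below 1, mirroring the expression in the comment.
inline int estimateCost(const InstrSketch &Inst, const SchedModelSketch &Model) {
  assert(Model.IssueWidth > 0 && "issue width must be positive");
  float Scaled = Inst.RecipThroughput / static_cast<float>(Model.IssueWidth);
  return static_cast<int>(std::max(Scaled, 1.0f));
}
```

Under these assumptions an op with a reciprocal throughput of 1 cycle and an op with 4 cycles both map to a cost of 1 on a 4-wide model, because of the clamp to 1.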

fhahn added inline comments. Feb 26 2018, 2:41 PM

include/llvm/Analysis/TargetTransformInfo.h
723	Maybe AArch64 would be a good target to test using the scheduling model info? The available and required CPU units should be modelled quite well there.

This just needs to be fixed somehow, e.g by introducing some multiplier.

Now I'm even more confused.

The cost model is approximating the throughput of the entire target, not a single pipeline/execution unit.
But D43733 did the opposite of that. We said that FP ops after Pentium4 have the lowest possible reciprocal throughput value which means they have the same throughput as an integer add.
Did we purposely introduce wrong costs because we acknowledge that the cost-based calculations in the vectorizers are broken? And this is how we are working around that?

In D43769#1020943, @spatel wrote:

This just needs to be fixed somehow, e.g by introducing some multiplier.

Now I'm even more confused.

The cost model is approximating the throughput of the entire target, not a single pipeline/execution unit.
But D43733 did the opposite of that. We said that FP ops after Pentium4 have the lowest possible reciprocal throughput value which means they have the same throughput as an integer add.
Did we purposely introduce wrong costs because we acknowledge that the cost-based calculations in the vectorizers are broken? And this is how we are working around that?

The throughput of the integer ops is somewhere around 0.25-0.5, but we can't represent these numbers with an integer type, so we round these values up to 1. That's why I say that we need some kind of multiplier to fix this problem.
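
One way to picture the multiplier idea is a fixed-point cost, sketched below. The scale factor `CostScale` and the helper `toScaledCost` are hypothetical and chosen only for illustration; nothing in the patch or in TTI defines them.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scale factor: costs are stored in units of 1/4 cycle.
constexpr int CostScale = 4;

// Convert a reciprocal throughput in cycles to a scaled integer cost,
// rounding to the nearest unit and never returning less than 1.
inline int toScaledCost(double RecipThroughputCycles) {
  assert(RecipThroughputCycles > 0.0 && "cost must be positive");
  int Scaled = static_cast<int>(std::lround(RecipThroughputCycles * CostScale));
  return Scaled < 1 ? 1 : Scaled;
}
```

With this scale, 0.25, 0.5, and 1.0 cycles stay distinguishable as integer costs instead of all rounding to 1.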

In D43769#1020967, @ABataev wrote:

The throughput of the integer ops is somewhere around 0.25-0.5, but we can't represent these numbers with an integer type, so we round these values up to 1. That's why I say that we need some kind of multiplier to fix this problem.

Ok, but do you agree that changing the cost of FP ops to be the same as int ops moved us further away from x86 reality than before?

In D43769#1020993, @spatel wrote:

Ok, but do you agree that changing the cost of FP ops to be the same as int ops moved us further away from x86 reality than before?

Yes, I agree. But I think this is the less important problem because this cost model is intended for comparing the cost of the instructions with the same base types, like for comparing scalar float type with vector float type or scalar integer type with vector integer type.

Sanjay,
Maybe it's worth sending the API change to llvm-dev as an RFC?

eastig added inline comments.

include/llvm/Analysis/TargetTransformInfo.h
723	Still not sure. We're estimating the cost of the arithmetic LLVM instructions here in terms of throughput, not the ALU Unit throughput.

Agree with Alexey: getUnitThroughput looks confusing whilst the information is requested for an operation. My understanding of 'throughput', when it is used for operations, is:

Throughput is the number of cycles after issue that another instruction can begin execution.

In the function comments I see it is the reciprocal throughput, which means:

The reciprocal throughput is the maximum number of instructions of the same kind that can be executed per clock cycle when the operands of each instruction are independent of the preceding instructions. The reciprocal throughput is also called issue latency.

I don't know how many people are familiar with the difference. As a result there will be incorrect uses. Another point is that reciprocal throughput can be less than 1. I think such cases can be represented by a pair of integers: number of operations, number of clocks.

Is it important for the function to be ALU specific? Can the function be used for memory operations?
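
The pair-of-integers representation suggested in this comment could look roughly like the sketch below. `OpsPerClocks` and `isCheaper` are hypothetical names introduced here for illustration; the patch does not add such a type.

```cpp
#include <cassert>

// Hypothetical representation: Ops operations complete every Clocks cycles,
// so a reciprocal throughput below 1 needs no fractional value.
struct OpsPerClocks {
  unsigned Ops;
  unsigned Clocks;
};

// Returns true if A takes fewer cycles per operation than B.
// A.Clocks / A.Ops < B.Clocks / B.Ops is compared by cross-multiplication
// so no division or floating point is needed.
inline bool isCheaper(OpsPerClocks A, OpsPerClocks B) {
  assert(A.Ops > 0 && B.Ops > 0 && "operation counts must be nonzero");
  return static_cast<unsigned long long>(A.Clocks) * B.Ops <
         static_cast<unsigned long long>(B.Clocks) * A.Ops;
}
```

For example, four adds per clock would be `{4, 1}` and one division every five clocks `{1, 5}`; equal ratios such as `{2, 1}` and `{4, 2}` compare as neither cheaper than the other.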

nhaehnle added inline comments. Feb 27 2018, 7:06 PM

include/llvm/Analysis/TargetTransformInfo.h
723	The name of the function should really be `getUnitReciprocalThroughput`. Throughput means higher numbers = better, whereas here higher numbers = worse.

At this point, I'll settle for just having an accurate function comment. :)
Let me update with just that much and see if we can reach consensus.

Can the function be used for memory operations?

I don't think so - we have "getMemoryOpCost" for that purpose IIUC. So again, a renaming exercise would be best if we make the whole API uniform.

Patch updated:
Just try to make it clear that this is a reciprocal throughput and explain what that means in the function comment.

This makes sense to me, and is in line with how we use it in AMDGPU (where we don't have multiple execution units).

Ping.

This revision is now accepted and ready to land. Mar 7 2018, 10:59 AM

rL326956

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	38 lines
include/llvm/Analysis/TargetTransformInfoImpl.h	12 lines
include/llvm/CodeGen/BasicTTIImpl.h	12 lines
lib/Analysis/TargetTransformInfo.cpp	13 lines
lib/CodeGen/CodeGenPrepare.cpp	8 lines
lib/Target/AArch64/AArch64TargetTransformInfo.h	2 lines
lib/Target/AArch64/AArch64TargetTransformInfo.cpp	30 lines
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	2 lines
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	10 lines
lib/Target/ARM/ARMTargetTransformInfo.h	2 lines
lib/Target/ARM/ARMTargetTransformInfo.cpp	6 lines
lib/Target/Lanai/LanaiTargetTransformInfo.h	10 lines
lib/Target/NVPTX/NVPTXTargetTransformInfo.h	2 lines
lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp	10 lines
lib/Target/PowerPC/PPCTargetTransformInfo.h	2 lines
lib/Target/PowerPC/PPCTargetTransformInfo.cpp	6 lines
lib/Target/SystemZ/SystemZTargetTransformInfo.h	2 lines
lib/Target/SystemZ/SystemZTargetTransformInfo.cpp	10 lines
lib/Target/X86/X86TargetTransformInfo.h	2 lines
lib/Target/X86/X86TargetTransformInfo.cpp	18 lines
lib/Transforms/Scalar/IndVarSimplify.cpp	6 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	8 lines
lib/Transforms/Vectorize/SLPVectorizer.cpp	40 lines

Diff 135919

include/llvm/Analysis/TargetTransformInfo.h

[704 lines not shown] public:
   /// performed.
   unsigned getMaxPrefetchIterationsAhead() const;

   /// \return The maximum interleave factor that any transform should try to
   /// perform for this target. This number depends on the level of parallelism
   /// and the number of execution units in the CPU.
   unsigned getMaxInterleaveFactor(unsigned VF) const;

-  /// \return The expected cost of arithmetic ops, such as mul, xor, fsub, etc.
+  /// This is the expected reciprocal throughput in cycles of a math/logic op on
+  /// a particular pipeline on the target. For example, if an independent
+  /// integer add can execute every cycle on an ALU for this target, then the
+  /// cost should be 1. If a division prevents any other division from
+  /// executing on its unit for 5 cycles, then the cost should be 5. This cost
+  /// is not affected if the target has N independent pipelines that can execute
+  /// this kind of operation for greater parallelism.
   /// \p Args is an optional argument which holds the instruction operands
-  /// values so the TTI can analyize those values searching for special
-  /// cases\optimizations based on those values.
-  int getArithmeticInstrCost(
+  /// values so the TTI can analyze those values searching for special
+  /// cases or optimizations based on those values.
+  int getUnitThroughput(
       unsigned Opcode, Type *Ty, OperandValueKind Opd1Info = OK_AnyValue,
       OperandValueKind Opd2Info = OK_AnyValue,
       OperandValueProperties Opd1PropInfo = OP_None,
       OperandValueProperties Opd2PropInfo = OP_None,
       ArrayRef<const Value *> Args = ArrayRef<const Value *>()) const;

   /// \return The cost of a shuffle instruction of kind Kind and of type Tp.
   /// The index and subtype parameters are used by the subvector insertion and
[311 lines not shown] public:
   virtual unsigned getCacheLineSize() = 0;
   virtual llvm::Optional<unsigned> getCacheSize(CacheLevel Level) = 0;
   virtual llvm::Optional<unsigned> getCacheAssociativity(CacheLevel Level) = 0;
   virtual unsigned getPrefetchDistance() = 0;
   virtual unsigned getMinPrefetchStride() = 0;
   virtual unsigned getMaxPrefetchIterationsAhead() = 0;
   virtual unsigned getMaxInterleaveFactor(unsigned VF) = 0;
   virtual unsigned
-  getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
+  getUnitThroughput(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
       OperandValueKind Opd2Info,
       OperandValueProperties Opd1PropInfo,
       OperandValueProperties Opd2PropInfo,
       ArrayRef<const Value *> Args) = 0;
   virtual int getShuffleCost(ShuffleKind Kind, Type *Tp, int Index,
       Type *SubTp) = 0;
   virtual int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
       const Instruction *I) = 0;
   virtual int getExtractWithExtendCost(unsigned Opcode, Type *Dst,
       VectorType *VecTy, unsigned Index) = 0;
   virtual int getCFInstrCost(unsigned Opcode) = 0;
   virtual int getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
[288 lines not shown] public:
   unsigned getMaxInterleaveFactor(unsigned VF) override {
     return Impl.getMaxInterleaveFactor(VF);
   }
   unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,
       unsigned &JTSize) override {
     return Impl.getEstimatedNumberOfCaseClusters(SI, JTSize);
   }
   unsigned
-  getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
+  getUnitThroughput(unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
       OperandValueKind Opd2Info,
       OperandValueProperties Opd1PropInfo,
       OperandValueProperties Opd2PropInfo,
       ArrayRef<const Value *> Args) override {
-    return Impl.getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,
+    return Impl.getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
         Opd1PropInfo, Opd2PropInfo, Args);
   }
   int getShuffleCost(ShuffleKind Kind, Type *Tp, int Index,
       Type *SubTp) override {
     return Impl.getShuffleCost(Kind, Tp, Index, SubTp);
   }
   int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
       const Instruction *I) override {
     return Impl.getCastInstrCost(Opcode, Dst, Src, I);
[237 lines not shown]
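
The revised doc comment for this function says the cost is per pipeline and does not change with the number of pipelines. A small standalone sketch may help contrast that with a whole-target view. `PipelineSketch` and both helpers are hypothetical names, and the whole-target estimate assumes each op occupies its unit for the full busy time (no pipelining within a unit).

```cpp
#include <cassert>

// Hypothetical description of one kind of operation on a target.
struct PipelineSketch {
  unsigned BusyCycles;   // cycles one op keeps its unit occupied
  unsigned NumPipelines; // independent units that can execute this op
};

// Per-unit reciprocal throughput as the revised comment describes it:
// the number of pipelines does not enter the result.
inline unsigned unitReciprocalThroughput(const PipelineSketch &P) {
  return P.BusyCycles;
}

// For contrast: cycles needed to finish NumOps independent ops when all
// pipelines are used, i.e. ceil(NumOps / NumPipelines) batches.
inline unsigned wholeTargetCycles(const PipelineSketch &P, unsigned NumOps) {
  assert(P.NumPipelines > 0 && "need at least one pipeline");
  unsigned Batches = (NumOps + P.NumPipelines - 1) / P.NumPipelines;
  return Batches * P.BusyCycles;
}
```

With an add described as `{1, 4}` and a division as `{5, 1}`, the per-unit costs are 1 and 5, matching the two examples in the comment, while only the whole-target figure changes if the pipeline count changes.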

include/llvm/Analysis/TargetTransformInfoImpl.h

[381 lines not shown] public:
   unsigned getPrefetchDistance() { return 0; }

   unsigned getMinPrefetchStride() { return 1; }

   unsigned getMaxPrefetchIterationsAhead() { return UINT_MAX; }

   unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }

-  unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,
+  unsigned getUnitThroughput(unsigned Opcode, Type *Ty,
       TTI::OperandValueKind Opd1Info,
       TTI::OperandValueKind Opd2Info,
       TTI::OperandValueProperties Opd1PropInfo,
       TTI::OperandValueProperties Opd2PropInfo,
       ArrayRef<const Value *> Args) {
     return 1;
   }

   unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Ty, int Index,
       Type *SubTp) {
     return 1;
   }

[431 lines not shown]

include/llvm/CodeGen/BasicTTIImpl.h

[468 lines not shown] else
       // associated with one argument as a heuristic.
       Cost += getScalarizationOverhead(VecTy, false, true);

     return Cost;
   }

   unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }

-  unsigned getArithmeticInstrCost(
+  unsigned getUnitThroughput(
       unsigned Opcode, Type *Ty,
       TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
       TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
       TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
       TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
       ArrayRef<const Value *> Args = ArrayRef<const Value *>()) {
     // Check if any of the operands are vector operands.
     const TargetLoweringBase *TLI = getTLI();
[20 lines not shown] unsigned getUnitThroughput(
     }

     // Else, assume that we need to scalarize this op.
     // TODO: If one of the types get legalized by splitting, handle this
     // similarly to what getCastInstrCost() does.
     if (Ty->isVectorTy()) {
       unsigned Num = Ty->getVectorNumElements();
       unsigned Cost = static_cast<T *>(this)
-          ->getArithmeticInstrCost(Opcode, Ty->getScalarType());
+          ->getUnitThroughput(Opcode, Ty->getScalarType());
       // Return the cost of multiple scalar invocation plus the cost of
       // inserting and extracting the values.
       return getScalarizationOverhead(Ty, Args) + Num * Cost;
     }

     // We don't know anything about this scalar instruction.
     return OpCost;
   }
[558 lines not shown] unsigned getIntrinsicInstrCost(
     auto MinCustomCostI = std::min_element(CustomCost.begin(), CustomCost.end());
     if (MinCustomCostI != CustomCost.end())
       return *MinCustomCostI;

     // If we can't lower fmuladd into an FMA estimate the cost as a floating
     // point mul followed by an add.
     if (IID == Intrinsic::fmuladd)
       return static_cast<T *>(this)
-                 ->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) +
+                 ->getUnitThroughput(BinaryOperator::FMul, RetTy) +
              static_cast<T *>(this)
-                 ->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);
+                 ->getUnitThroughput(BinaryOperator::FAdd, RetTy);

     // Else, assume that we need to scalarize this intrinsic. For math builtins
     // this will emit a costly libcall, adding call overhead and spills. Make it
     // very expensive.
     if (RetTy->isVectorTy()) {
       unsigned ScalarizationCost =
           ((ScalarizationCostPassed != std::numeric_limits<unsigned>::max())
                ? ScalarizationCostPassed
[98 lines not shown] unsigned getArithmeticReductionCost(unsigned Opcode, Type *Ty,
     unsigned MVTLen =
         LT.second.isVector() ? LT.second.getVectorNumElements() : 1;
     while (NumVecElts > MVTLen) {
       NumVecElts /= 2;
       // Assume the pairwise shuffles add a cost.
       ShuffleCost += (IsPairwise + 1) *
                      ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
                                                  NumVecElts, Ty);
-      ArithCost += ConcreteTTI->getArithmeticInstrCost(Opcode, Ty);
+      ArithCost += ConcreteTTI->getUnitThroughput(Opcode, Ty);
       Ty = VectorType::get(ScalarTy, NumVecElts);
       ++LongVectorCount;
     }
     // The minimal length of the vector is limited by the real length of vector
     // operations performed on the current platform. That's why several final
     // reduction operations are performed on the vectors with the same
     // architecture-dependent length.
     ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *
                    ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
                                                NumVecElts, Ty);
     ArithCost += (NumReduxLevels - LongVectorCount) *
-                 ConcreteTTI->getArithmeticInstrCost(Opcode, Ty);
+                 ConcreteTTI->getUnitThroughput(Opcode, Ty);
     return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true);
   }

   /// Try to calculate op costs for min/max reduction operations.
   /// \param CondTy Conditional type for the Select instruction.
   unsigned getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwise,
                                   bool) {
     assert(Ty->isVectorTy() && "Expect a vector type");
[80 lines not shown]
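
The scalarization fallback in this file returns the scalarization overhead plus the element count times the scalar op cost. The arithmetic can be sketched on its own as below; `scalarizedCost` is a hypothetical free function, and the overhead passed in is a made-up input rather than a value computed by `getScalarizationOverhead`.

```cpp
#include <cassert>

// Hypothetical helper mirroring the shape of the fallback formula:
//   overhead of inserting/extracting elements + NumElts * scalar op cost.
inline unsigned scalarizedCost(unsigned NumElts, unsigned ScalarOpCost,
                               unsigned ScalarizationOverhead) {
  assert(NumElts > 0 && "vector must have at least one element");
  return ScalarizationOverhead + NumElts * ScalarOpCost;
}
```

For instance, a 4-element vector op with scalar cost 1 and an assumed overhead of 8 would be costed at 12 under this formula.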

lib/Analysis/TargetTransformInfo.cpp

[364 lines not shown]
 unsigned TargetTransformInfo::getMaxPrefetchIterationsAhead() const {
   return TTIImpl->getMaxPrefetchIterationsAhead();
 }

 unsigned TargetTransformInfo::getMaxInterleaveFactor(unsigned VF) const {
   return TTIImpl->getMaxInterleaveFactor(VF);
 }

-int TargetTransformInfo::getArithmeticInstrCost(
+int TargetTransformInfo::getUnitThroughput(
     unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
     OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
     OperandValueProperties Opd2PropInfo,
     ArrayRef<const Value *> Args) const {
-  int Cost = TTIImpl->getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,
+  int Cost = TTIImpl->getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
       Opd1PropInfo, Opd2PropInfo, Args);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }

 int TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Ty, int Index,
     Type *SubTp) const {
   int Cost = TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
[614 lines not shown] int TargetTransformInfo::getInstructionThroughput(const Instruction *I) const {
   case Instruction::And:
   case Instruction::Or:
   case Instruction::Xor: {
     TargetTransformInfo::OperandValueKind Op1VK =
         getOperandInfo(I->getOperand(0));
     TargetTransformInfo::OperandValueKind Op2VK =
         getOperandInfo(I->getOperand(1));
     SmallVector<const Value*, 2> Operands(I->operand_values());
-    return getArithmeticInstrCost(I->getOpcode(), I->getType(), Op1VK,
+    return getUnitThroughput(I->getOpcode(), I->getType(), Op1VK, Op2VK,
Op2VK, TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
Operands);		TargetTransformInfo::OP_None, Operands);
}		}
case Instruction::Select: {		case Instruction::Select: {
const SelectInst *SI = cast<SelectInst>(I);		const SelectInst *SI = cast<SelectInst>(I);
Type *CondTy = SI->getCondition()->getType();		Type *CondTy = SI->getCondition()->getType();
return getCmpSelInstrCost(I->getOpcode(), I->getType(), CondTy, I);		return getCmpSelInstrCost(I->getOpcode(), I->getType(), CondTy, I);
}		}
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {

lib/CodeGen/CodeGenPrepare.cpp

Show First 20 Lines • Show All 5,644 Lines • ▼ Show 20 Lines	for (const auto &Inst : InstsToBePromoted) {
bool IsArg0Constant = isa<UndefValue>(Arg0) || isa<ConstantInt>(Arg0) ||		bool IsArg0Constant = isa<UndefValue>(Arg0) || isa<ConstantInt>(Arg0) ||
isa<ConstantFP>(Arg0);		isa<ConstantFP>(Arg0);
TargetTransformInfo::OperandValueKind Arg0OVK =		TargetTransformInfo::OperandValueKind Arg0OVK =
IsArg0Constant ? TargetTransformInfo::OK_UniformConstantValue		IsArg0Constant ? TargetTransformInfo::OK_UniformConstantValue
: TargetTransformInfo::OK_AnyValue;		: TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Arg1OVK =		TargetTransformInfo::OperandValueKind Arg1OVK =
!IsArg0Constant ? TargetTransformInfo::OK_UniformConstantValue		!IsArg0Constant ? TargetTransformInfo::OK_UniformConstantValue
: TargetTransformInfo::OK_AnyValue;		: TargetTransformInfo::OK_AnyValue;
ScalarCost += TTI.getArithmeticInstrCost(		ScalarCost += TTI.getUnitThroughput(Inst->getOpcode(), Inst->getType(),
Inst->getOpcode(), Inst->getType(), Arg0OVK, Arg1OVK);		Arg0OVK, Arg1OVK);
VectorCost += TTI.getArithmeticInstrCost(Inst->getOpcode(), PromotedType,		VectorCost += TTI.getUnitThroughput(Inst->getOpcode(), PromotedType,
Arg0OVK, Arg1OVK);		Arg0OVK, Arg1OVK);
}		}
DEBUG(dbgs() << "Estimated cost of computation to be promoted:\nScalar: "		DEBUG(dbgs() << "Estimated cost of computation to be promoted:\nScalar: "
<< ScalarCost << "\nVector: " << VectorCost << '\n');		<< ScalarCost << "\nVector: " << VectorCost << '\n');
return ScalarCost > VectorCost;		return ScalarCost > VectorCost;
}		}
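
The profitability check above reduces to summing per-instruction costs for the scalar and the promoted (vector) form and comparing the totals. A minimal sketch of that comparison outside of LLVM follows; `Op`, `unitCost`, and the cost numbers are hypothetical stand-ins for the TTI query, not the CodeGenPrepare implementation, and any additional costs the real function folds into the totals are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes and cost table; none of this is LLVM API.
enum class Op { Add, Mul, Div };

// Stand-in for a per-instruction cost query. IsVector selects the promoted
// (vector) form of the operation. The values are illustrative only.
inline int unitCost(Op O, bool IsVector) {
  switch (O) {
  case Op::Add: return 1;
  case Op::Mul: return IsVector ? 1 : 2;
  case Op::Div: return IsVector ? 20 : 4;
  }
  return 1;
}

// Promotion is considered profitable only when the summed scalar cost is
// strictly greater than the summed vector cost.
inline bool isProfitableToPromote(const std::vector<Op> &Chain) {
  int ScalarCost = 0, VectorCost = 0;
  for (Op O : Chain) {
    ScalarCost += unitCost(O, /*IsVector=*/false);
    VectorCost += unitCost(O, /*IsVector=*/true);
  }
  return ScalarCost > VectorCost;
}
```

Because the comparison is strict, a chain whose scalar and vector totals tie is not promoted.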

/// \brief Generate a constant vector with \p Val with the same		/// \brief Generate a constant vector with \p Val with the same
/// number of elements as the transition.		/// number of elements as the transition.

lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	public:
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index);		unsigned Index);

int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());

int getAddressComputationCost(Type *Ty, ScalarEvolution *SE, const SCEV *Ptr);		int getAddressComputationCost(Type *Ty, ScalarEvolution *SE, const SCEV *Ptr);

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show First 20 Lines • Show All 468 Lines • ▼ Show 20 Lines	if (Index != -1U) {
if (Index == 0)		if (Index == 0)
return 0;		return 0;
}		}

// All other insert/extracts cost this much.		// All other insert/extracts cost this much.
return ST->getVectorInsertExtractBaseCost();		return ST->getVectorInsertExtractBaseCost();
}		}

int AArch64TTIImpl::getArithmeticInstrCost(		int AArch64TTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,		unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,
TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {		TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);

// If the instruction is a widening instruction (e.g., uaddl, saddw, etc.),		// If the instruction is a widening instruction (e.g., uaddl, saddw, etc.),
// add in the widening overhead specified by the sub-target. Since the		// add in the widening overhead specified by the sub-target. Since the
Show All 9 Lines	int AArch64TTIImpl::getUnitThroughput(

if (ISD == ISD::SDIV &&		if (ISD == ISD::SDIV &&
Opd2Info == TargetTransformInfo::OK_UniformConstantValue &&		Opd2Info == TargetTransformInfo::OK_UniformConstantValue &&
Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {		Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
// On AArch64, scalar signed division by a power-of-two constant is		// On AArch64, scalar signed division by a power-of-two constant is
// normally expanded to the sequence ADD + CMP + SELECT + SRA.		// normally expanded to the sequence ADD + CMP + SELECT + SRA.
// The OperandValue properties may not be the same as those of the previous		// The OperandValue properties may not be the same as those of the previous
// operation; conservatively assume OP_None.		// operation; conservatively assume OP_None.
Cost += getArithmeticInstrCost(Instruction::Add, Ty, Opd1Info, Opd2Info,		Cost += getUnitThroughput(Instruction::Add, Ty, Opd1Info, Opd2Info,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::Sub, Ty, Opd1Info, Opd2Info,		Cost += getUnitThroughput(Instruction::Sub, Ty, Opd1Info, Opd2Info,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::Select, Ty, Opd1Info, Opd2Info,		Cost += getUnitThroughput(Instruction::Select, Ty, Opd1Info, Opd2Info,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::AShr, Ty, Opd1Info, Opd2Info,		Cost += getUnitThroughput(Instruction::AShr, Ty, Opd1Info, Opd2Info,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);
return Cost;		return Cost;
}		}

switch (ISD) {		switch (ISD) {
default:		default:
return Cost + BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return Cost + BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
case ISD::ADD:		case ISD::ADD:
case ISD::MUL:		case ISD::MUL:
case ISD::XOR:		case ISD::XOR:
case ISD::OR:		case ISD::OR:
case ISD::AND:		case ISD::AND:
// These nodes are marked as 'custom' for combining purposes only.		// These nodes are marked as 'custom' for combining purposes only.
// We know that they are legal. See LowerAdd in ISelLowering.		// We know that they are legal. See LowerAdd in ISelLowering.
return (Cost + 1) * LT.first;		return (Cost + 1) * LT.first;

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

Show First 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	public:
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
unsigned Alignment,		unsigned Alignment,
unsigned AddrSpace) const;		unsigned AddrSpace) const;

unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);

bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) const;		bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) const;

int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());

unsigned getCFInstrCost(unsigned Opcode);		unsigned getCFInstrCost(unsigned Opcode);

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

Show First 20 Lines • Show All 312 Lines • ▼ Show 20 Lines	case Intrinsic::amdgcn_ds_fmax: {
Info.IsVolatile = !Volatile->isNullValue();		Info.IsVolatile = !Volatile->isNullValue();
return true;		return true;
}		}
default:		default:
return false;		return false;
}		}
}		}

int AMDGPUTTIImpl::getArithmeticInstrCost(		int AMDGPUTTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,		unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,
TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args ) {		TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args ) {
EVT OrigTy = TLI->getValueType(DL, Ty);		EVT OrigTy = TLI->getValueType(DL, Ty);
if (!OrigTy.isSimple()) {		if (!OrigTy.isSimple()) {
return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
}		}

// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);

// Because we don't have any legal vector operations, but the legal types, we		// Because we don't have any legal vector operations, but the legal types, we
// need to account for split vectors.		// need to account for split vectors.
▲ Show 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	if (SLT == MVT::f32 || SLT == MVT::f16) {

return LT.first * NElts * Cost;		return LT.first * NElts * Cost;
}		}
break;		break;
default:		default:
break;		break;
}		}

return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo,
Opd1PropInfo, Opd2PropInfo);		Opd2PropInfo);
}		}

unsigned AMDGPUTTIImpl::getCFInstrCost(unsigned Opcode) {		unsigned AMDGPUTTIImpl::getCFInstrCost(unsigned Opcode) {
// XXX - For some reason this isn't called for switch.		// XXX - For some reason this isn't called for switch.
switch (Opcode) {		switch (Opcode) {
case Instruction::Br:		case Instruction::Br:
case Instruction::Ret:		case Instruction::Ret:
return 10;		return 10;

lib/Target/ARM/ARMTargetTransformInfo.h

Show First 20 Lines • Show All 150 Lines • ▼ Show 20 Lines	public:
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

int getAddressComputationCost(Type *Val, ScalarEvolution *SE,		int getAddressComputationCost(Type *Val, ScalarEvolution *SE,
const SCEV *Ptr);		const SCEV *Ptr);

int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Op1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Op1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Op2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Op2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());

int getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,		int getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,

lib/Target/ARM/ARMTargetTransformInfo.cpp

Show First 20 Lines • Show All 447 Lines • ▼ Show 20 Lines	if (Kind == TTI::SK_Alternate) {
if (const auto *Entry = CostTableLookup(NEONAltShuffleTbl,		if (const auto *Entry = CostTableLookup(NEONAltShuffleTbl,
ISD::VECTOR_SHUFFLE, LT.second))		ISD::VECTOR_SHUFFLE, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);		return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
}		}
return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);		return BaseT::getShuffleCost(Kind, Tp, Index, SubTp);
}		}

int ARMTTIImpl::getArithmeticInstrCost(		int ARMTTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,		unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,
TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo,		TTI::OperandValueProperties Opd2PropInfo,
ArrayRef<const Value *> Args) {		ArrayRef<const Value *> Args) {
int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);		int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);

const unsigned FunctionCallDivCost = 20;		const unsigned FunctionCallDivCost = 20;
Show All 38 Lines	static const CostTblEntry CostTbl[] = {
{ ISD::UREM, MVT::v16i8, 16 * FunctionCallDivCost},		{ ISD::UREM, MVT::v16i8, 16 * FunctionCallDivCost},
// Multiplication.		// Multiplication.
};		};

if (ST->hasNEON())		if (ST->hasNEON())
if (const auto *Entry = CostTableLookup(CostTbl, ISDOpcode, LT.second))		if (const auto *Entry = CostTableLookup(CostTbl, ISDOpcode, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

int Cost = BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info,		int Cost = BaseT::getUnitThroughput(Opcode, Ty, Op1Info, Op2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);

// This is somewhat of a hack. The problem that we are facing is that SROA		// This is somewhat of a hack. The problem that we are facing is that SROA
// creates a sequence of shift, and, or instructions to construct values.		// creates a sequence of shift, and, or instructions to construct values.
// These sequences are recognized by the ISel and have zero-cost. Not so for		// These sequences are recognized by the ISel and have zero-cost. Not so for
// the vectorized code. Because we have support for v2i64 but not i64 those		// the vectorized code. Because we have support for v2i64 but not i64 those
// sequences look particularly beneficial to vectorize.		// sequences look particularly beneficial to vectorize.
// To work around this we increase the cost of v2i64 operations to make them		// To work around this we increase the cost of v2i64 operations to make them
// seem less beneficial.		// seem less beneficial.

lib/Target/Lanai/LanaiTargetTransformInfo.h

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm, Type *Ty) {
return getIntImmCost(Imm, Ty);		return getIntImmCost(Imm, Ty);
}		}

int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,		int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
Type *Ty) {		Type *Ty) {
return getIntImmCost(Imm, Ty);		return getIntImmCost(Imm, Ty);
}		}

unsigned getArithmeticInstrCost(		unsigned getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>()) {		ArrayRef<const Value *> Args = ArrayRef<const Value *>()) {
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);

switch (ISD) {		switch (ISD) {
default:		default:
return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
case ISD::MUL:		case ISD::MUL:
case ISD::SDIV:		case ISD::SDIV:
case ISD::UDIV:		case ISD::UDIV:
case ISD::UREM:		case ISD::UREM:
// This increases the cost associated with multiplication and division		// This increases the cost associated with multiplication and division
// to 64 times what the baseline arithmetic cost is. The arithmetic		// to 64 times what the baseline arithmetic cost is. The arithmetic
// instruction cost was arbitrarily chosen to reduce the desirability		// instruction cost was arbitrarily chosen to reduce the desirability
// of emitting arithmetic instructions that are emulated in software.		// of emitting arithmetic instructions that are emulated in software.
// TODO: Investigate the performance impact given specialized lowerings.		// TODO: Investigate the performance impact given specialized lowerings.
return 64 * BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return 64 * BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
}		}
}		}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_LANAI_LANAITARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_LANAI_LANAITARGETTRANSFORMINFO_H
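
The Lanai override above follows a common pattern: defer to the base model for most opcodes and scale the base cost for operations that are expensive on the target. A minimal sketch of that pattern, assuming a hypothetical `BaseModel` that charges one unit per operation (these types are illustrative and are not LLVM's TTI classes):

```cpp
#include <cassert>

// Hypothetical opcode set and base model; not LLVM's real classes.
enum class Op { Add, Mul, SDiv, UDiv, URem };

struct BaseModel {
  // Assumed baseline: every arithmetic operation costs one unit.
  int getUnitThroughput(Op) const { return 1; }
};

struct SoftMulDivModel : BaseModel {
  // Scale operations that would be emulated in software by a fixed factor
  // and defer everything else to the base model.
  int getUnitThroughput(Op O) const {
    switch (O) {
    case Op::Mul:
    case Op::SDiv:
    case Op::UDiv:
    case Op::URem:
      return 64 * BaseModel::getUnitThroughput(O);
    default:
      return BaseModel::getUnitThroughput(O);
    }
  }
};
```

Expressing the penalty as a multiple of the base cost, rather than as a constant, keeps the override proportional if the base model's costs change.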

lib/Target/NVPTX/NVPTXTargetTransformInfo.h

Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	public:
unsigned getFlatAddressSpace() const {		unsigned getFlatAddressSpace() const {
return AddressSpace::ADDRESS_SPACE_GENERIC;		return AddressSpace::ADDRESS_SPACE_GENERIC;
}		}

// Increase the inlining cost threshold by a factor of 5, reflecting that		// Increase the inlining cost threshold by a factor of 5, reflecting that
// calls are particularly expensive in NVPTX.		// calls are particularly expensive in NVPTX.
unsigned getInliningThresholdMultiplier() { return 5; }		unsigned getInliningThresholdMultiplier() { return 5; }

int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());

void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,

lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp

Show First 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	if (const Instruction *I = dyn_cast<Instruction>(V)) {
// inter-procedural analysis.		// inter-procedural analysis.
if (isa<CallInst>(I))		if (isa<CallInst>(I))
return true;		return true;
}		}

return false;		return false;
}		}

int NVPTXTTIImpl::getArithmeticInstrCost(		int NVPTXTTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,		unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,
TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {		TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);

int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);

switch (ISD) {		switch (ISD) {
default:		default:
return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
case ISD::ADD:		case ISD::ADD:
case ISD::MUL:		case ISD::MUL:
case ISD::XOR:		case ISD::XOR:
case ISD::OR:		case ISD::OR:
case ISD::AND:		case ISD::AND:
// The machine code (SASS) simulates an i64 with two i32. Therefore, we		// The machine code (SASS) simulates an i64 with two i32. Therefore, we
// estimate that arithmetic operations on i64 are twice as expensive as		// estimate that arithmetic operations on i64 are twice as expensive as
// those on types that can fit into one machine register.		// those on types that can fit into one machine register.
if (LT.second.SimpleTy == MVT::i64)		if (LT.second.SimpleTy == MVT::i64)
return 2 * LT.first;		return 2 * LT.first;
// Delegate other cases to the basic TTI.		// Delegate other cases to the basic TTI.
return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
}		}
}		}

void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void NVPTXTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP) {		TTI::UnrollingPreferences &UP) {
BaseT::getUnrollingPreferences(L, SE, UP);		BaseT::getUnrollingPreferences(L, SE, UP);

// Enable partial unrolling and runtime unrolling, but reduce the		// Enable partial unrolling and runtime unrolling, but reduce the
// threshold. This partially unrolls small loops which are often		// threshold. This partially unrolls small loops which are often
// unrolled by the PTX to SASS compiler and unrolling earlier can be		// unrolled by the PTX to SASS compiler and unrolling earlier can be
// beneficial.		// beneficial.
UP.Partial = UP.Runtime = true;		UP.Partial = UP.Runtime = true;
UP.PartialThreshold = UP.Threshold / 4;		UP.PartialThreshold = UP.Threshold / 4;
}		}
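
The i64 rule in the NVPTX arithmetic cost above can be read as "two units per legalized 64-bit part, otherwise the base cost". A minimal sketch under that reading; `Legalized` and `baseCost` are hypothetical stand-ins for the type-legalization result and the base model, with the base cost assumed to be one unit per legal part:

```cpp
#include <cassert>

// Hypothetical legalization result: how many legal parts the type splits
// into and the bit width of each part. Not LLVM's std::pair<int, MVT>.
struct Legalized {
  int Parts;
  int PartBits;
};

// Assumed base cost: one unit per legal part.
inline int baseCost(const Legalized &LT) { return LT.Parts; }

// Simple integer ops on 64-bit parts are charged twice per part, on the
// assumption that the hardware processes each as two 32-bit halves.
inline int simpleIntOpCost(const Legalized &LT) {
  if (LT.PartBits == 64)
    return 2 * LT.Parts;
  return baseCost(LT);
}
```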

lib/Target/PowerPC/PPCTargetTransformInfo.h

Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	public:
const TTI::MemCmpExpansionOptions *enableMemCmpExpansion(		const TTI::MemCmpExpansionOptions *enableMemCmpExpansion(
bool IsZeroCmp) const;		bool IsZeroCmp) const;
bool enableInterleavedAccessVectorization();		bool enableInterleavedAccessVectorization();
unsigned getNumberOfRegisters(bool Vector);		unsigned getNumberOfRegisters(bool Vector);
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getCacheLineSize();		unsigned getCacheLineSize();
unsigned getPrefetchDistance();		unsigned getPrefetchDistance();
unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);
int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());
int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);		int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,

lib/Target/PowerPC/PPCTargetTransformInfo.cpp

Show First 20 Lines • Show All 318 Lines • ▼ Show 20 Lines	if (Directive == PPC::DIR_PWR7 || Directive == PPC::DIR_PWR8 ||
Directive == PPC::DIR_PWR9)		Directive == PPC::DIR_PWR9)
return 12;		return 12;

// For most things, modern systems have two execution units (and		// For most things, modern systems have two execution units (and
// out-of-order execution).		// out-of-order execution).
return 2;		return 2;
}		}

int PPCTTIImpl::getArithmeticInstrCost(		int PPCTTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,		unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,
TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {		TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {
assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");		assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");

// Fallback to the default implementation.		// Fallback to the default implementation.
return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Op1Info, Op2Info, Opd1PropInfo,
Opd1PropInfo, Opd2PropInfo);		Opd2PropInfo);
}		}

int PPCTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		int PPCTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);

// PPC, for both Altivec/VSX and QPX, support cheap arbitrary permutations		// PPC, for both Altivec/VSX and QPX, support cheap arbitrary permutations

lib/Target/SystemZ/SystemZTargetTransformInfo.h

Show First 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	public:
unsigned getMinPrefetchStride() { return 2048; }		unsigned getMinPrefetchStride() { return 2048; }

bool hasDivRemOp(Type *DataType, bool IsSigned);		bool hasDivRemOp(Type *DataType, bool IsSigned);
bool prefersVectorizedAddressing() { return false; }		bool prefersVectorizedAddressing() { return false; }
bool LSRWithInstrQueries() { return true; }		bool LSRWithInstrQueries() { return true; }
bool supportsEfficientVectorElementLoadStore() { return true; }		bool supportsEfficientVectorElementLoadStore() { return true; }
bool enableInterleavedAccessVectorization() { return true; }		bool enableInterleavedAccessVectorization() { return true; }

int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());
int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);		int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
unsigned getVectorTruncCost(Type *SrcTy, Type *DstTy);		unsigned getVectorTruncCost(Type *SrcTy, Type *DstTy);

lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

Show First 20 Lines • Show All 322 Lines • ▼ Show 20 Lines	unsigned SystemZTTIImpl::getRegisterBitWidth(bool Vector) const {
return 0;		return 0;
}		}

bool SystemZTTIImpl::hasDivRemOp(Type *DataType, bool IsSigned) {		bool SystemZTTIImpl::hasDivRemOp(Type *DataType, bool IsSigned) {
EVT VT = TLI->getValueType(DL, DataType);		EVT VT = TLI->getValueType(DL, DataType);
return (VT.isScalarInteger() && TLI->isTypeLegal(VT));		return (VT.isScalarInteger() && TLI->isTypeLegal(VT));
}		}

int SystemZTTIImpl::getArithmeticInstrCost(		int SystemZTTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Op1Info, TTI::OperandValueKind Op2Info,		TTI::OperandValueKind Op1Info, TTI::OperandValueKind Op2Info,
TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo,		TTI::OperandValueProperties Opd2PropInfo,
ArrayRef<const Value *> Args) {		ArrayRef<const Value *> Args) {

// TODO: return a good value for BB-VECTORIZER that includes the		// TODO: return a good value for BB-VECTORIZER that includes the
// immediate loads, which we do not want to count for the loop		// immediate loads, which we do not want to count for the loop
Show All 23 Lines	if (CI != nullptr &&
if (Opcode == Instruction::SDiv)		if (Opcode == Instruction::SDiv)
SDivPow2 = true;		SDivPow2 = true;
else		else
UDivPow2 = true;		UDivPow2 = true;
}		}
}		}

if (Ty->isVectorTy()) {		if (Ty->isVectorTy()) {
assert (ST->hasVector() && "getArithmeticInstrCost() called with vector type.");		assert (ST->hasVector() && "getUnitThroughput() called with vector type.");
unsigned VF = Ty->getVectorNumElements();		unsigned VF = Ty->getVectorNumElements();
unsigned NumVectors = getNumberOfParts(Ty);		unsigned NumVectors = getNumberOfParts(Ty);

// These vector operations are custom handled, but are still supported		// These vector operations are custom handled, but are still supported
// with one instruction per vector, regardless of element size.		// with one instruction per vector, regardless of element size.
if (Opcode == Instruction::Shl || Opcode == Instruction::LShr ||		if (Opcode == Instruction::Shl || Opcode == Instruction::LShr ||
Opcode == Instruction::AShr || UDivPow2) {		Opcode == Instruction::AShr || UDivPow2) {
return NumVectors;		return NumVectors;
Show All 10 Lines	if (Opcode == Instruction::FAdd || Opcode == Instruction::FSub ||
Opcode == Instruction::FMul || Opcode == Instruction::FDiv) {		Opcode == Instruction::FMul || Opcode == Instruction::FDiv) {
switch (ScalarBits) {		switch (ScalarBits) {
case 32: {		case 32: {
// The vector enhancements facility 1 provides v4f32 instructions.		// The vector enhancements facility 1 provides v4f32 instructions.
if (ST->hasVectorEnhancements1())		if (ST->hasVectorEnhancements1())
return NumVectors;		return NumVectors;
// Return the cost of multiple scalar invocation plus the cost of		// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values.		// inserting and extracting the values.
unsigned ScalarCost = getArithmeticInstrCost(Opcode, Ty->getScalarType());		unsigned ScalarCost = getUnitThroughput(Opcode, Ty->getScalarType());
unsigned Cost = (VF * ScalarCost) + getScalarizationOverhead(Ty, Args);		unsigned Cost = (VF * ScalarCost) + getScalarizationOverhead(Ty, Args);
// FIXME: VF 2 for these FP operations are currently just as		// FIXME: VF 2 for these FP operations are currently just as
// expensive as for VF 4.		// expensive as for VF 4.
if (VF == 2)		if (VF == 2)
Cost *= 2;		Cost *= 2;
return Cost;		return Cost;
}		}
case 64:		case 64:
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	if ((Opcode == Instruction::SDiv || Opcode == Instruction::SRem))
return (ScalarBits < 32 ? 4 : (ScalarBits == 32 ? 2 : 1));		return (ScalarBits < 32 ? 4 : (ScalarBits == 32 ? 2 : 1));

if (Opcode == Instruction::UDiv || Opcode == Instruction::URem)		if (Opcode == Instruction::UDiv || Opcode == Instruction::URem)
// Clearing of low 64 bit reg + sext of op(s) for narrow types + dl[g]r		// Clearing of low 64 bit reg + sext of op(s) for narrow types + dl[g]r
return (ScalarBits < 32 ? 4 : 2);		return (ScalarBits < 32 ? 4 : 2);
}		}

// Fallback to the default implementation.		// Fallback to the default implementation.
return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info,		return BaseT::getUnitThroughput(Opcode, Ty, Op1Info, Op2Info,
Opd1PropInfo, Opd2PropInfo, Args);		Opd1PropInfo, Opd2PropInfo, Args);
}		}


int SystemZTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		int SystemZTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
assert (Tp->isVectorTy());		assert (Tp->isVectorTy());
assert (ST->hasVector() && "getShuffleCost() called.");		assert (ST->hasVector() && "getShuffleCost() called.");
unsigned NumVectors = getNumberOfParts(Tp);		unsigned NumVectors = getNumberOfParts(Tp);
▲ Show 20 Lines • Show All 437 Lines • Show Last 20 Lines
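
The 32-bit floating-point branch above falls back to scalarization when the vector enhancements facility 1 is unavailable. A minimal standalone sketch of that arithmetic follows; `scalarizedFPVectorCost` and its plain integer parameters are hypothetical stand-ins for the values the patch obtains from `getUnitThroughput()` and `getScalarizationOverhead()`, not part of the change itself.

```cpp
#include <cassert>

// Hypothetical helper mirroring the "case 32" fallback in the SystemZ hunk.
// ScalarCost stands in for getUnitThroughput(Opcode, Ty->getScalarType()),
// ScalarizationOverhead for getScalarizationOverhead(Ty, Args).
unsigned scalarizedFPVectorCost(unsigned VF, unsigned ScalarCost,
                                unsigned ScalarizationOverhead) {
  assert(VF > 0 && "vectorization factor must be positive");
  // VF scalar operations plus the insert/extract traffic around them.
  unsigned Cost = VF * ScalarCost + ScalarizationOverhead;
  // Mirrors the FIXME in the hunk: VF 2 is treated as being as expensive
  // as VF 4, so the cost is doubled.
  if (VF == 2)
    Cost *= 2;
  return Cost;
}
```

With made-up inputs, a VF of 4, a scalar cost of 1 and an overhead of 8 give 12, and a VF of 2 with an overhead of 4 gives (2 + 4) * 2 = 12 as well.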

lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	public:

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector);		unsigned getNumberOfRegisters(bool Vector);
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;		unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);
int getArithmeticInstrCost(		int getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());
int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);		int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
▲ Show 20 Lines • Show All 73 Lines • Show Last 20 Lines

lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	unsigned X86TTIImpl::getMaxInterleaveFactor(unsigned VF) {
// Sandybridge and Haswell have multiple execution ports and pipelined		// Sandybridge and Haswell have multiple execution ports and pipelined
// vector units.		// vector units.
if (ST->hasAVX())		if (ST->hasAVX())
return 4;		return 4;

return 2;		return 2;
}		}

int X86TTIImpl::getArithmeticInstrCost(		int X86TTIImpl::getUnitThroughput(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Op1Info, TTI::OperandValueKind Op2Info,		TTI::OperandValueKind Op1Info, TTI::OperandValueKind Op2Info,
TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo,		TTI::OperandValueProperties Opd2PropInfo,
ArrayRef<const Value *> Args) {		ArrayRef<const Value *> Args) {
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);

▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	int X86TTIImpl::getUnitThroughput(

if (ISD == ISD::SDIV &&		if (ISD == ISD::SDIV &&
Op2Info == TargetTransformInfo::OK_UniformConstantValue &&		Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {		Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
// On X86, vector signed division by constants power-of-two are		// On X86, vector signed division by constants power-of-two are
// normally expanded to the sequence SRA + SRL + ADD + SRA.		// normally expanded to the sequence SRA + SRL + ADD + SRA.
// The OperandValue properties many not be same as that of previous		// The OperandValue properties many not be same as that of previous
// operation;conservatively assume OP_None.		// operation;conservatively assume OP_None.
int Cost = 2 * getArithmeticInstrCost(Instruction::AShr, Ty, Op1Info,		int Cost = 2 * getUnitThroughput(Instruction::AShr, Ty, Op1Info,
Op2Info, TargetTransformInfo::OP_None,		Op2Info, TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::LShr, Ty, Op1Info, Op2Info,		Cost += getUnitThroughput(Instruction::LShr, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);
Cost += getArithmeticInstrCost(Instruction::Add, Ty, Op1Info, Op2Info,		Cost += getUnitThroughput(Instruction::Add, Ty, Op1Info, Op2Info,
TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None,
TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None);

return Cost;		return Cost;
}		}

static const CostTblEntry AVX512BWUniformConstCostTable[] = {		static const CostTblEntry AVX512BWUniformConstCostTable[] = {
{ ISD::SHL, MVT::v64i8, 2 }, // psllw + pand.		{ ISD::SHL, MVT::v64i8, 2 }, // psllw + pand.
{ ISD::SRL, MVT::v64i8, 2 }, // psrlw + pand.		{ ISD::SRL, MVT::v64i8, 2 }, // psrlw + pand.
{ ISD::SRA, MVT::v64i8, 4 }, // psrlw, pand, pxor, psubb.		{ ISD::SRA, MVT::v64i8, 4 }, // psrlw, pand, pxor, psubb.
▲ Show 20 Lines • Show All 474 Lines • ▼ Show 20 Lines	static const CostTblEntry SSE1CostTable[] = {
{ ISD::FDIV, MVT::v4f32, 34 }, // Pentium III from http://www.agner.org/		{ ISD::FDIV, MVT::v4f32, 34 }, // Pentium III from http://www.agner.org/
};		};

if (ST->hasSSE1())		if (ST->hasSSE1())
if (const auto *Entry = CostTableLookup(SSE1CostTable, ISD, LT.second))		if (const auto *Entry = CostTableLookup(SSE1CostTable, ISD, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

// Fallback to the default implementation.		// Fallback to the default implementation.
return BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info);		return BaseT::getUnitThroughput(Opcode, Ty, Op1Info, Op2Info);
}		}

int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
// 64-bit packed float vectors (v2f32) are widened to type v4f32.		// 64-bit packed float vectors (v2f32) are widened to type v4f32.
// 64-bit packed integer vectors (v2i32) are promoted to type v2i64.		// 64-bit packed integer vectors (v2i32) are promoted to type v2i64.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);

▲ Show 20 Lines • Show All 2,120 Lines • Show Last 20 Lines
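
The X86 hunk above prices a vector signed division by a uniform power-of-two constant as the expansion SRA + SRL + ADD + SRA, querying the renamed hook once per opcode. The sketch below restates that sum outside of LLVM; the `Op` enum, `sdivPow2Cost`, and the callback are hypothetical names introduced only for illustration.

```cpp
#include <functional>

// Hypothetical opcode tags for the three distinct operations in the expansion.
enum class Op { AShr, LShr, Add };

// UnitCost stands in for getUnitThroughput() with OP_None operand properties.
int sdivPow2Cost(const std::function<int(Op)> &UnitCost) {
  // Two arithmetic shifts, one logical shift, one add.
  int Cost = 2 * UnitCost(Op::AShr);
  Cost += UnitCost(Op::LShr);
  Cost += UnitCost(Op::Add);
  return Cost;
}
```

If every operation has unit cost 1 the result is 4; if arithmetic shifts alone cost 2, it becomes 6.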

lib/Transforms/Scalar/IndVarSimplify.cpp

Show First 20 Lines • Show All 875 Lines • ▼ Show 20 Lines	static void visitIVCast(CastInst *Cast, WideIVInfo &WI, ScalarEvolution *SE,

// Cast is either an sext or zext up to this point.		// Cast is either an sext or zext up to this point.
// We should not widen an indvar if arithmetics on the wider indvar are more		// We should not widen an indvar if arithmetics on the wider indvar are more
// expensive than those on the narrower indvar. We check only the cost of ADD		// expensive than those on the narrower indvar. We check only the cost of ADD
// because at least an ADD is required to increment the induction variable. We		// because at least an ADD is required to increment the induction variable. We
// could compute more comprehensively the cost of all instructions on the		// could compute more comprehensively the cost of all instructions on the
// induction variable when necessary.		// induction variable when necessary.
if (TTI &&		if (TTI &&
TTI->getArithmeticInstrCost(Instruction::Add, Ty) >		TTI->getUnitThroughput(Instruction::Add, Ty) >
TTI->getArithmeticInstrCost(Instruction::Add,		TTI->getUnitThroughput(Instruction::Add,
Cast->getOperand(0)->getType())) {		Cast->getOperand(0)->getType())) {
return;		return;
}		}

if (!WI.WidestNativeType) {		if (!WI.WidestNativeType) {
WI.WidestNativeType = SE->getEffectiveSCEVType(Ty);		WI.WidestNativeType = SE->getEffectiveSCEVType(Ty);
WI.IsSigned = IsSigned;		WI.IsSigned = IsSigned;
return;		return;
}		}
▲ Show 20 Lines • Show All 1,703 Lines • Show Last 20 Lines
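
The check in `visitIVCast` skips widening when an ADD on the wider type costs more than an ADD on the narrower one. A rough standalone illustration follows, under the assumption of a toy cost model where an add costs one unit per native-width chunk; `addCost`, `wideningIsTooExpensive`, and `NativeBits` are hypothetical and do not correspond to any real target's numbers.

```cpp
#include <cassert>

// Toy cost model (assumption): one unit per native-width chunk of the type.
int addCost(unsigned Bits, unsigned NativeBits) {
  assert(NativeBits > 0 && "native width must be nonzero");
  return static_cast<int>((Bits + NativeBits - 1) / NativeBits);
}

// Mirrors the comparison in the hunk: bail out of widening when the wider
// ADD is strictly more expensive than the narrower one.
bool wideningIsTooExpensive(unsigned NarrowBits, unsigned WideBits,
                            unsigned NativeBits) {
  return addCost(WideBits, NativeBits) > addCost(NarrowBits, NativeBits);
}
```

Under this toy model, widening i32 to i64 is rejected on a 32-bit machine (2 > 1) and accepted on a 64-bit one (1 is not greater than 1).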

lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 7,205 Lines • ▼ Show 20 Lines	if (VF > 1 && isScalarWithPredication(I)) {

// These instructions have a non-void type, so account for the phi nodes		// These instructions have a non-void type, so account for the phi nodes
// that we will create. This cost is likely to be zero. The phi node		// that we will create. This cost is likely to be zero. The phi node
// cost, if any, should be scaled by the block probability because it		// cost, if any, should be scaled by the block probability because it
// models a copy at the end of each predicated block.		// models a copy at the end of each predicated block.
Cost += VF * TTI.getCFInstrCost(Instruction::PHI);		Cost += VF * TTI.getCFInstrCost(Instruction::PHI);

// The cost of the non-predicated instruction.		// The cost of the non-predicated instruction.
Cost += VF * TTI.getArithmeticInstrCost(I->getOpcode(), RetTy);		Cost += VF * TTI.getUnitThroughput(I->getOpcode(), RetTy);

// The cost of insertelement and extractelement instructions needed for		// The cost of insertelement and extractelement instructions needed for
// scalarization.		// scalarization.
Cost += getScalarizationOverhead(I, VF, TTI);		Cost += getScalarizationOverhead(I, VF, TTI);

// Scale the cost by the probability of executing the predicated blocks.		// Scale the cost by the probability of executing the predicated blocks.
// This assumes the predicated block for each vector lane is equally		// This assumes the predicated block for each vector lane is equally
// likely.		// likely.
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	if (isa<ConstantInt>(Op2)) {
Op2VP = TargetTransformInfo::OP_PowerOf2;		Op2VP = TargetTransformInfo::OP_PowerOf2;
Op2VK = TargetTransformInfo::OK_UniformConstantValue;		Op2VK = TargetTransformInfo::OK_UniformConstantValue;
}		}
} else if (Legal->isUniform(Op2)) {		} else if (Legal->isUniform(Op2)) {
Op2VK = TargetTransformInfo::OK_UniformValue;		Op2VK = TargetTransformInfo::OK_UniformValue;
}		}
SmallVector<const Value *, 4> Operands(I->operand_values());		SmallVector<const Value *, 4> Operands(I->operand_values());
unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1;		unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1;
return N * TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK,		return N * TTI.getUnitThroughput(I->getOpcode(), VectorTy, Op1VK, Op2VK,
Op2VK, Op1VP, Op2VP, Operands);		Op1VP, Op2VP, Operands);
}		}
case Instruction::Select: {		case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);		SelectInst *SI = cast<SelectInst>(I);
const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());		const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));		bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));
Type *CondTy = SI->getCondition()->getType();		Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)		if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);		CondTy = VectorType::get(CondTy, VF);
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	case Instruction::Call: {
unsigned CallCost = getVectorCallCost(CI, VF, TTI, TLI, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, TTI, TLI, NeedToScalarize);
if (getVectorIntrinsicIDForCall(CI, TLI))		if (getVectorIntrinsicIDForCall(CI, TLI))
return std::min(CallCost, getVectorIntrinsicCost(CI, VF, TTI, TLI));		return std::min(CallCost, getVectorIntrinsicCost(CI, VF, TTI, TLI));
return CallCost;		return CallCost;
}		}
default:		default:
// The cost of executing VF copies of the scalar instruction. This opcode		// The cost of executing VF copies of the scalar instruction. This opcode
// is unknown. Assume that it is the same as 'mul'.		// is unknown. Assume that it is the same as 'mul'.
return VF * TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy) +		return VF * TTI.getUnitThroughput(Instruction::Mul, VectorTy) +
getScalarizationOverhead(I, VF, TTI);		getScalarizationOverhead(I, VF, TTI);
} // end of switch.		} // end of switch.
}		}

char LoopVectorize::ID = 0;		char LoopVectorize::ID = 0;

static const char lv_name[] = "Loop Vectorization";		static const char lv_name[] = "Loop Vectorization";

▲ Show 20 Lines • Show All 1,313 Lines • Show Last 20 Lines
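
The predicated-scalarization hunk in this file accumulates three terms (PHI cost, the per-lane instruction cost, and the insert/extract overhead) and then scales by the block probability. The scaling code itself is elided in this view, so the divisor in the sketch below is an assumption, as are the names `predicatedScalarCost` and `ReciprocalBlockProb`.

```cpp
#include <cassert>

// Hypothetical restatement of the accumulation in the predicated branch.
// PhiCost stands in for getCFInstrCost(Instruction::PHI), UnitCost for
// getUnitThroughput(I->getOpcode(), RetTy). ReciprocalBlockProb is an
// assumed divisor modelling "probability of executing the predicated block".
unsigned predicatedScalarCost(unsigned VF, unsigned PhiCost, unsigned UnitCost,
                              unsigned ScalarizationOverhead,
                              unsigned ReciprocalBlockProb) {
  assert(ReciprocalBlockProb > 0 && "probability divisor must be nonzero");
  unsigned Cost = 0;
  Cost += VF * PhiCost;           // phi nodes created per lane
  Cost += VF * UnitCost;          // the non-predicated instruction, per lane
  Cost += ScalarizationOverhead;  // insertelement/extractelement traffic
  return Cost / ReciprocalBlockProb;
}
```

For example, VF 4 with free PHIs, unit cost 1, and overhead 8 yields 12 before scaling and 6 if each predicated block is assumed to run half the time.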

lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 2,198 Lines • ▼ Show 20 Lines	case Instruction::Xor: {
if (Op2VK == TargetTransformInfo::OK_UniformConstantValue && CInt &&		if (Op2VK == TargetTransformInfo::OK_UniformConstantValue && CInt &&
CInt->getValue().isPowerOf2())		CInt->getValue().isPowerOf2())
Op2VP = TargetTransformInfo::OP_PowerOf2;		Op2VP = TargetTransformInfo::OP_PowerOf2;

SmallVector<const Value *, 4> Operands(VL0->operand_values());		SmallVector<const Value *, 4> Operands(VL0->operand_values());
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -=		ReuseShuffleCost -=
(ReuseShuffleNumbers - VL.size()) *		(ReuseShuffleNumbers - VL.size()) *
TTI->getArithmeticInstrCost(S.Opcode, ScalarTy, Op1VK, Op2VK, Op1VP,		TTI->getUnitThroughput(S.Opcode, ScalarTy, Op1VK, Op2VK, Op1VP,
Op2VP, Operands);		Op2VP, Operands);
}		}
int ScalarCost =		int ScalarCost =
VecTy->getNumElements() *		VecTy->getNumElements() *
TTI->getArithmeticInstrCost(S.Opcode, ScalarTy, Op1VK, Op2VK, Op1VP,		TTI->getUnitThroughput(S.Opcode, ScalarTy, Op1VK, Op2VK, Op1VP,
Op2VP, Operands);		Op2VP, Operands);
int VecCost = TTI->getArithmeticInstrCost(S.Opcode, VecTy, Op1VK, Op2VK,		int VecCost = TTI->getUnitThroughput(S.Opcode, VecTy, Op1VK, Op2VK,
Op1VP, Op2VP, Operands);		Op1VP, Op2VP, Operands);
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;		TargetTransformInfo::OK_UniformConstantValue;

if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *
TTI->getArithmeticInstrCost(Instruction::Add,		TTI->getUnitThroughput(Instruction::Add, ScalarTy,
ScalarTy, Op1VK, Op2VK);		Op1VK, Op2VK);
}		}
int ScalarCost =		int ScalarCost =
VecTy->getNumElements() *		VecTy->getNumElements() *
TTI->getArithmeticInstrCost(Instruction::Add, ScalarTy, Op1VK, Op2VK);		TTI->getUnitThroughput(Instruction::Add, ScalarTy, Op1VK, Op2VK);
int VecCost =		int VecCost =
TTI->getArithmeticInstrCost(Instruction::Add, VecTy, Op1VK, Op2VK);		TTI->getUnitThroughput(Instruction::Add, VecTy, Op1VK, Op2VK);

return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Cost of wide load - cost of scalar loads.		// Cost of wide load - cost of scalar loads.
unsigned alignment = dyn_cast<LoadInst>(VL0)->getAlignment();		unsigned alignment = dyn_cast<LoadInst>(VL0)->getAlignment();
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *		ReuseShuffleCost -= (ReuseShuffleNumbers - VL.size()) *
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	case Instruction::ShuffleVector: {
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
int ScalarCost = 0;		int ScalarCost = 0;
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
for (unsigned Idx : E->ReuseShuffleIndices) {		for (unsigned Idx : E->ReuseShuffleIndices) {
Instruction *I = cast<Instruction>(VL[Idx]);		Instruction *I = cast<Instruction>(VL[Idx]);
if (!I)		if (!I)
continue;		continue;
ReuseShuffleCost -= TTI->getArithmeticInstrCost(		ReuseShuffleCost -= TTI->getUnitThroughput(I->getOpcode(), ScalarTy,
I->getOpcode(), ScalarTy, Op1VK, Op2VK);		Op1VK, Op2VK);
}		}
for (Value *V : VL) {		for (Value *V : VL) {
Instruction *I = cast<Instruction>(V);		Instruction *I = cast<Instruction>(V);
if (!I)		if (!I)
continue;		continue;
ReuseShuffleCost += TTI->getArithmeticInstrCost(		ReuseShuffleCost += TTI->getUnitThroughput(I->getOpcode(), ScalarTy,
I->getOpcode(), ScalarTy, Op1VK, Op2VK);		Op1VK, Op2VK);
}		}
}		}
int VecCost = 0;		int VecCost = 0;
for (Value *i : VL) {		for (Value *i : VL) {
Instruction *I = cast<Instruction>(i);		Instruction *I = cast<Instruction>(i);
if (!I)		if (!I)
break;		break;
ScalarCost +=		ScalarCost += TTI->getUnitThroughput(I->getOpcode(), ScalarTy, Op1VK,
TTI->getArithmeticInstrCost(I->getOpcode(), ScalarTy, Op1VK, Op2VK);		Op2VK);
}		}
// VecCost is equal to sum of the cost of creating 2 vectors		// VecCost is equal to sum of the cost of creating 2 vectors
// and the cost of creating shuffle.		// and the cost of creating shuffle.
Instruction *I0 = cast<Instruction>(VL[0]);		Instruction *I0 = cast<Instruction>(VL[0]);
VecCost =		VecCost = TTI->getUnitThroughput(I0->getOpcode(), VecTy, Op1VK, Op2VK);
TTI->getArithmeticInstrCost(I0->getOpcode(), VecTy, Op1VK, Op2VK);
Instruction *I1 = cast<Instruction>(VL[1]);		Instruction *I1 = cast<Instruction>(VL[1]);
VecCost +=		VecCost += TTI->getUnitThroughput(I1->getOpcode(), VecTy, Op1VK, Op2VK);
TTI->getArithmeticInstrCost(I1->getOpcode(), VecTy, Op1VK, Op2VK);
VecCost +=		VecCost +=
TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);		TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);
return ReuseShuffleCost + VecCost - ScalarCost;		return ReuseShuffleCost + VecCost - ScalarCost;
}		}
default:		default:
llvm_unreachable("Unknown instruction");		llvm_unreachable("Unknown instruction");
}		}
}		}
▲ Show 20 Lines • Show All 3,341 Lines • ▼ Show 20 Lines	int getReductionCost(TargetTransformInfo *TTI, Value *FirstReducedVal,

IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;		IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;
int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost : SplittingRdxCost;		int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost : SplittingRdxCost;

int ScalarReduxCost;		int ScalarReduxCost;
switch (ReductionData.getKind()) {		switch (ReductionData.getKind()) {
case RK_Arithmetic:		case RK_Arithmetic:
ScalarReduxCost =		ScalarReduxCost =
TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);		TTI->getUnitThroughput(ReductionData.getOpcode(), ScalarTy);
break;		break;
case RK_Min:		case RK_Min:
case RK_Max:		case RK_Max:
case RK_UMin:		case RK_UMin:
case RK_UMax:		case RK_UMax:
ScalarReduxCost =		ScalarReduxCost =
TTI->getCmpSelInstrCost(ReductionData.getOpcode(), ScalarTy) +		TTI->getCmpSelInstrCost(ReductionData.getOpcode(), ScalarTy) +
TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,		TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
▲ Show 20 Lines • Show All 582 Lines • Show Last 20 Lines
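
Each arithmetic case in the SLP hunks above returns `ReuseShuffleCost + VecCost - ScalarCost`, where the scalar side is the per-element cost multiplied by the number of lanes. A small sketch of that difference, with the hypothetical name `slpCostDelta` and plain integers in place of the `getUnitThroughput()` queries:

```cpp
// Hypothetical restatement of the cost difference computed per tree entry.
// ScalarUnitCost stands in for getUnitThroughput(Opcode, ScalarTy, ...),
// VectorUnitCost for getUnitThroughput(Opcode, VecTy, ...).
int slpCostDelta(int NumElements, int ScalarUnitCost, int VectorUnitCost,
                 int ReuseShuffleCost) {
  int ScalarCost = NumElements * ScalarUnitCost;  // one scalar op per lane
  int VecCost = VectorUnitCost;                   // one vector op in total
  return ReuseShuffleCost + VecCost - ScalarCost;
}
```

A negative value means the vector form is estimated to be cheaper than the scalar form: four lanes at unit cost 1 against a single vector op at cost 1 give a delta of -3.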

Diff 135919