This is an archive of the discontinued LLVM Phabricator instance.

[CostModel] Unify Intrinsic Costs.
ClosedPublic

Authored by samparker on May 15 2020, 7:51 AM.

Details

Reviewers

SjoerdMeijer
RKSimon
dmgreen
dfukalov
rampitec

Commits

rGbd9dce8f9acd: [CostModel] getUserCost for intrinsic throughput
rG871556a49455: [CostModel] Unify Intrinsic Costs.
rG1f72d5880e33: [CostModel] Check for free intrinsics in BasicTTI
rGb263fee4d2c9: [CostModel] Sink intrinsic costs to base TTI.
rGde71def3f59d: [CostModel] Unify Intrinsic Costs.

Summary

With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size oriented getIntrinsicCost. This involved sinking the cost logic of the TTIImpl into the base implementation, as it performs no target checks. The opcodes remaining were memcpy, cttz and ctlz, which now have special handling in the BasicTTI implementation. getInstructionThroughput can now directly return the result of getUserCost.
This has required a change in the AMDGPU backend for fabs, as the tests suggest that it should always be 'free'. I've also changed the X86 backend to return '1' for any intrinsic when the CostKind isn't RecipThroughput.
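
The layering described in the summary can be sketched roughly as follows. This is a minimal standalone model, not the LLVM code: `IntrinsicID`, `CostKind`, `baseIntrinsicCost`, `throughputCost` and `intrinsicCost` are hypothetical names, the throughput value for `Sqrt` is an arbitrary placeholder, and the memcpy/cttz/ctlz special cases are left out.

```cpp
#include <cassert>

// Hypothetical stand-ins for Intrinsic::ID and TTI::TargetCostKind.
enum class IntrinsicID { Assume, DbgValue, LifetimeStart, Sqrt };
enum class CostKind { RecipThroughput, CodeSize, SizeAndLatency };

const int CostFree = 0;  // plays the role of TTI::TCC_Free
const int CostBasic = 1; // plays the role of TTI::TCC_Basic

// Bottom layer: performs no target checks. Intrinsics that do not represent
// code after lowering are free; everything else gets a basic cost.
inline int baseIntrinsicCost(IntrinsicID ID) {
  switch (ID) {
  case IntrinsicID::Assume:
  case IntrinsicID::DbgValue:
  case IntrinsicID::LifetimeStart:
    return CostFree;
  default:
    return CostBasic;
  }
}

// Placeholder for the detailed throughput model (assumption: the value 4
// is arbitrary and only makes the two paths distinguishable).
inline int throughputCost(IntrinsicID ID) {
  return ID == IntrinsicID::Sqrt ? 4 : baseIntrinsicCost(ID);
}

// Single entry point for every cost kind: size/latency style queries use
// the simple base answer, throughput queries use the detailed model.
inline int intrinsicCost(IntrinsicID ID, CostKind Kind) {
  if (Kind != CostKind::RecipThroughput)
    return baseIntrinsicCost(ID);
  return throughputCost(ID);
}
```

With this shape, a caller that previously chose between two cost functions only has to pass the cost kind it cares about.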

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

samparker created this revision. May 15 2020, 7:51 AM

Herald added a project: Restricted Project. May 15 2020, 7:51 AM

Herald added subscribers: kerbowa, hiraditya, tpr and 3 others.

samparker added a parent revision: D79941: [NFCI][CostModel] Refactor getIntrinsicInstrCost. May 15 2020, 7:51 AM

samparker mentioned this in D79483: [CostModel] Replace getUserCost with getInstructionCost..

Having fabs free on AMDGPU LGTM.

arsenm added inline comments. May 15 2020, 9:51 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563	Previously this would have been reported from TLI.isFAbsFree, but I don't see that check getting dropped here?

samparker marked an inline comment as done. May 18 2020, 12:27 AM

samparker added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563	I don't recall seeing a check like that... but it makes sense. Having the base implementation call it should work.

samparker marked an inline comment as done. May 18 2020, 4:56 AM

samparker added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563	Okay, so now I see it and it would seem that the logic has changed because of the split and merge with this and D79941. AMDGPUISelLowering doesn't report that fabs vectors are free, so which is true?

dfukalov accepted this revision. May 18 2020, 11:45 AM

dfukalov added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563	This estimate is good on average. I'm going to add tests and improve this code after your commit. LGTM

This revision is now accepted and ready to land. May 18 2020, 11:45 AM

arsenm added inline comments. May 18 2020, 1:25 PM

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
563	Fabs is always free. Eventually vectors break down into scalars that have free fabs uses
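
A target marking fabs as free on top of a shared base can be sketched as below. This is a simplified model under stated assumptions: `ToyTTIBase`, `ToyGenericTTI` and `ToyGpuTTI` are hypothetical classes, and only the `static_cast<T *>(this)` dispatch style is taken from the diff in this revision.

```cpp
#include <cassert>

// Hypothetical stand-in for Intrinsic::ID.
enum class IntrinsicID { Fabs, Sqrt, Assume };

// Base layer in CRTP style: generic entry points dispatch through
// static_cast<T *>(this), so a target override is reached without
// virtual calls.
template <typename T> class ToyTTIBase {
public:
  int getIntrinsicInstrCost(IntrinsicID ID) {
    // Only 'assume' is free in this toy base; everything else costs 1.
    return ID == IntrinsicID::Assume ? 0 : 1;
  }

  int getUserCost(IntrinsicID ID) {
    return static_cast<T *>(this)->getIntrinsicInstrCost(ID);
  }
};

// A target with no special knowledge inherits the base answer.
class ToyGenericTTI : public ToyTTIBase<ToyGenericTTI> {};

// A target on which fabs folds into its users reports it as free and
// defers to the base for everything else.
class ToyGpuTTI : public ToyTTIBase<ToyGpuTTI> {
public:
  int getIntrinsicInstrCost(IntrinsicID ID) {
    if (ID == IntrinsicID::Fabs)
      return 0;
    return ToyTTIBase<ToyGpuTTI>::getIntrinsicInstrCost(ID);
  }
};
```

Because `getUserCost` lives in the base but calls through the derived type, the fabs override applies to every cost query routed through it.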

Closed by commit rGde71def3f59d: [CostModel] Unify Intrinsic Costs. (authored by samparker). May 20 2020, 11:59 PM

This revision was automatically updated to reflect the committed changes.

This change has caused some large text size changes: http://llvm-compile-time-tracker.com/compare.php?from=7606a54363d3d90802977c9f5fb9046d4d780835&to=de71def3f59dc9f12f67141b5040d8e15c84d08a&stat=size-text There's a 5% increase on tramp3d-v4 and some large decreases (up to 8%) on debuginfo builds.

As the commit message tagged this as no functional change intended and there are no test changes, I'm assuming this impact wasn't intended?

Er, wow, some of those are huge. Are you able to characterise those benchmarks and what optimisations they are affected by? I generally come across inlining, vectorization and unrolling changes and those all sound plausible candidates! Is a reproducer possible?

Any runtime improvements of these benchmarks?

I suspect that the debug changes may be caused by the base implementation treating the debug intrinsics as free. I'll revert this and break it down into three separate patches:

1. Sink all the trivially free intrinsics into the bottom-most implementation.
2. Combine getIntrinsicCost and getIntrinsicInstrCost.
3. Have getInstructionThroughput use getUserCost.
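
As an illustration of the suspicion about debug intrinsics (not a diagnosis of the measured results), the toy model below shows how charging or not charging them can flip a size-budget decision. All names, the sample block and the budget are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction kinds for a toy basic block.
enum class Kind { Add, Load, DbgValue };

// Sums per-instruction costs. When DbgIsFree is false, debug intrinsics
// are charged like any other instruction.
inline int blockCost(const std::vector<Kind> &Block, bool DbgIsFree) {
  int Total = 0;
  for (Kind K : Block)
    Total += (K == Kind::DbgValue && DbgIsFree) ? 0 : 1;
  return Total;
}

// Threshold check of the kind a size-driven heuristic might make.
inline bool fitsBudget(const std::vector<Kind> &Block, int Budget,
                       bool DbgIsFree) {
  return blockCost(Block, DbgIsFree) <= Budget;
}

// Sample block: two real instructions interleaved with two debug intrinsics.
inline std::vector<Kind> sampleBlock() {
  return {Kind::Add, Kind::DbgValue, Kind::Load, Kind::DbgValue};
}
```

In this model the same block fits a budget of 3 only when debug intrinsics are free, so a debuginfo build and a non-debug build could make different transformation choices if the two are costed differently.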

First part has gone in as: rGb263fee4d2c9: [CostModel] Sink intrinsic costs to base TTI.

fhahn mentioned this in D89479: [SimplifyCFG] Be more conservative when speculating in loops. (WIP). Oct 16 2020, 11:39 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	40 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	117 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	68 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	33 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	3 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	6 lines
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp	8 lines

Diff 265439

llvm/include/llvm/Analysis/TargetTransformInfo.h

[32 lines hidden]
namespace Intrinsic {		namespace Intrinsic {
typedef unsigned ID;		typedef unsigned ID;
}		}

class AssumptionCache;		class AssumptionCache;
class BlockFrequencyInfo;		class BlockFrequencyInfo;
class DominatorTree;		class DominatorTree;
class BranchInst;		class BranchInst;
		class CallBase;
class Function;		class Function;
class GlobalValue;		class GlobalValue;
class IntrinsicInst;		class IntrinsicInst;
class LoadInst;		class LoadInst;
class LoopAccessInfo;		class LoopAccessInfo;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class ProfileSummaryInfo;		class ProfileSummaryInfo;
[66 lines hidden]	class IntrinsicCostAttributes {
unsigned VF = 1;		unsigned VF = 1;
// If ScalarizationCost is UINT_MAX, the cost of scalarizing the		// If ScalarizationCost is UINT_MAX, the cost of scalarizing the
// arguments and the return value will be computed based on types.		// arguments and the return value will be computed based on types.
unsigned ScalarizationCost = std::numeric_limits<unsigned>::max();		unsigned ScalarizationCost = std::numeric_limits<unsigned>::max();

public:		public:
IntrinsicCostAttributes(const IntrinsicInst &I);		IntrinsicCostAttributes(const IntrinsicInst &I);

IntrinsicCostAttributes(Intrinsic::ID Id, CallInst &CI,		IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI);

		IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI,
unsigned Factor);		unsigned Factor);

IntrinsicCostAttributes(Intrinsic::ID Id, CallInst &CI,		IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI,
unsigned Factor, unsigned ScalarCost);		unsigned Factor, unsigned ScalarCost);

IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,		IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<Type *> Tys, FastMathFlags Flags);		ArrayRef<Type *> Tys, FastMathFlags Flags);

IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,		IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<Type *> Tys, FastMathFlags Flags,		ArrayRef<Type *> Tys, FastMathFlags Flags,
unsigned ScalarCost);		unsigned ScalarCost);

IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,		IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<Type *> Tys, FastMathFlags Flags,		ArrayRef<Type *> Tys, FastMathFlags Flags,
unsigned ScalarCost,		unsigned ScalarCost,
const IntrinsicInst *I);		const IntrinsicInst *I);

IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,		IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<Type *> Tys);		ArrayRef<Type *> Tys);

IntrinsicCostAttributes(Intrinsic::ID Id, Type *Ty,		IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
ArrayRef<Value *> Args);		ArrayRef<Value *> Args);

Intrinsic::ID getID() const { return IID; }		Intrinsic::ID getID() const { return IID; }
const IntrinsicInst *getInst() const { return II; }		const IntrinsicInst *getInst() const { return II; }
Type *getReturnType() const { return RetTy; }		Type *getReturnType() const { return RetTy; }
unsigned getVectorFactor() const { return VF; }		unsigned getVectorFactor() const { return VF; }
FastMathFlags getFlags() const { return FMF; }		FastMathFlags getFlags() const { return FMF; }
unsigned getScalarizationCost() const { return ScalarizationCost; }		unsigned getScalarizationCost() const { return ScalarizationCost; }
[130 lines hidden]	public:
/// bonus is applied if the vector instructions exceed 50% and half that		/// bonus is applied if the vector instructions exceed 50% and half that
/// amount is applied if it exceeds 10%. Note that these bonuses are some what		/// amount is applied if it exceeds 10%. Note that these bonuses are some what
/// arbitrary and evolved over time by accident as much as because they are		/// arbitrary and evolved over time by accident as much as because they are
/// principled bonuses.		/// principled bonuses.
/// FIXME: It would be nice to base the bonus values on something more		/// FIXME: It would be nice to base the bonus values on something more
/// scientific. A target may has no bonus on vector instructions.		/// scientific. A target may has no bonus on vector instructions.
int getInlinerVectorBonusPercent() const;		int getInlinerVectorBonusPercent() const;

/// Estimate the cost of an intrinsic when lowered.
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys,
const User *U = nullptr,
TTI::TargetCostKind CostKind = TCK_SizeAndLatency) const;

/// Estimate the cost of an intrinsic when lowered.
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments,
const User *U = nullptr,
TTI::TargetCostKind CostKind = TCK_SizeAndLatency) const;

/// \return the expected cost of a memcpy, which could e.g. depend on the		/// \return the expected cost of a memcpy, which could e.g. depend on the
/// source/destination type and alignment and the number of bytes copied.		/// source/destination type and alignment and the number of bytes copied.
int getMemcpyCost(const Instruction *I) const;		int getMemcpyCost(const Instruction *I) const;

/// \return The estimated number of case clusters when lowering \p 'SI'.		/// \return The estimated number of case clusters when lowering \p 'SI'.
/// \p JTSize Set a jump table size only when \p SI is suitable for a jump		/// \p JTSize Set a jump table size only when \p SI is suitable for a jump
/// table.		/// table.
unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,		unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,
[915 lines hidden]
public:		public:
virtual ~Concept() = 0;		virtual ~Concept() = 0;
virtual const DataLayout &getDataLayout() const = 0;		virtual const DataLayout &getDataLayout() const = 0;
virtual int getGEPCost(Type *PointeeType, const Value *Ptr,		virtual int getGEPCost(Type *PointeeType, const Value *Ptr,
ArrayRef<const Value *> Operands,		ArrayRef<const Value *> Operands,
TTI::TargetCostKind CostKind) = 0;		TTI::TargetCostKind CostKind) = 0;
virtual unsigned getInliningThresholdMultiplier() = 0;		virtual unsigned getInliningThresholdMultiplier() = 0;
virtual int getInlinerVectorBonusPercent() = 0;		virtual int getInlinerVectorBonusPercent() = 0;
virtual int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys, const User *U,
enum TargetCostKind CostKind) = 0;
virtual int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments,
const User *U,
enum TargetCostKind CostKind) = 0;
virtual int getMemcpyCost(const Instruction *I) = 0;		virtual int getMemcpyCost(const Instruction *I) = 0;
virtual unsigned		virtual unsigned
getEstimatedNumberOfCaseClusters(const SwitchInst &SI, unsigned &JTSize,		getEstimatedNumberOfCaseClusters(const SwitchInst &SI, unsigned &JTSize,
ProfileSummaryInfo *PSI,		ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI) = 0;		BlockFrequencyInfo *BFI) = 0;
virtual int getUserCost(const User *U, ArrayRef<const Value *> Operands,		virtual int getUserCost(const User *U, ArrayRef<const Value *> Operands,
TargetCostKind CostKind) = 0;		TargetCostKind CostKind) = 0;
virtual bool hasBranchDivergence() = 0;		virtual bool hasBranchDivergence() = 0;
[241 lines hidden]	int getGEPCost(Type *PointeeType, const Value *Ptr,
return Impl.getGEPCost(PointeeType, Ptr, Operands);		return Impl.getGEPCost(PointeeType, Ptr, Operands);
}		}
unsigned getInliningThresholdMultiplier() override {		unsigned getInliningThresholdMultiplier() override {
return Impl.getInliningThresholdMultiplier();		return Impl.getInliningThresholdMultiplier();
}		}
int getInlinerVectorBonusPercent() override {		int getInlinerVectorBonusPercent() override {
return Impl.getInlinerVectorBonusPercent();		return Impl.getInlinerVectorBonusPercent();
}		}
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys,
const User *U = nullptr,
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency) override {
return Impl.getIntrinsicCost(IID, RetTy, ParamTys, U, CostKind);
}
int getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments,
const User *U = nullptr,
TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency) override {
return Impl.getIntrinsicCost(IID, RetTy, Arguments, U, CostKind);
}
int getMemcpyCost(const Instruction *I) override {		int getMemcpyCost(const Instruction *I) override {
return Impl.getMemcpyCost(I);		return Impl.getMemcpyCost(I);
}		}
int getUserCost(const User *U, ArrayRef<const Value *> Operands,		int getUserCost(const User *U, ArrayRef<const Value *> Operands,
TargetCostKind CostKind) override {		TargetCostKind CostKind) override {
return Impl.getUserCost(U, Operands, CostKind);		return Impl.getUserCost(U, Operands, CostKind);
}		}
bool hasBranchDivergence() override { return Impl.hasBranchDivergence(); }		bool hasBranchDivergence() override { return Impl.hasBranchDivergence(); }
[551 lines hidden]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[456 lines hidden]	unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
bool UseMaskForCond,		bool UseMaskForCond,
bool UseMaskForGaps) {		bool UseMaskForGaps) {
return 1;		return 1;
}		}

unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
		switch (ICA.getID()) {
		default:
		break;
		case Intrinsic::annotation:
		case Intrinsic::assume:
		case Intrinsic::sideeffect:
		case Intrinsic::dbg_declare:
		case Intrinsic::dbg_value:
		case Intrinsic::dbg_label:
		case Intrinsic::invariant_start:
		case Intrinsic::invariant_end:
		case Intrinsic::launder_invariant_group:
		case Intrinsic::strip_invariant_group:
		case Intrinsic::is_constant:
		case Intrinsic::lifetime_start:
		case Intrinsic::lifetime_end:
		case Intrinsic::objectsize:
		case Intrinsic::ptr_annotation:
		case Intrinsic::var_annotation:
		case Intrinsic::experimental_gc_result:
		case Intrinsic::experimental_gc_relocate:
		case Intrinsic::coro_alloc:
		case Intrinsic::coro_begin:
		case Intrinsic::coro_free:
		case Intrinsic::coro_end:
		case Intrinsic::coro_frame:
		case Intrinsic::coro_size:
		case Intrinsic::coro_suspend:
		case Intrinsic::coro_param:
		case Intrinsic::coro_subfn_addr:
		// These intrinsics don't actually represent code after lowering.
		return 0;
		}
return 1;		return 1;
}		}

unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,		unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
return 1;		return 1;
}		}

[261 lines hidden]	int getGEPCost(Type *PointeeType, const Value *Ptr,
if (static_cast<T *>(this)->isLegalAddressingMode(		if (static_cast<T *>(this)->isLegalAddressingMode(
TargetType, const_cast<GlobalValue *>(BaseGV),		TargetType, const_cast<GlobalValue *>(BaseGV),
BaseOffset.sextOrTrunc(64).getSExtValue(), HasBaseReg, Scale,		BaseOffset.sextOrTrunc(64).getSExtValue(), HasBaseReg, Scale,
Ptr->getType()->getPointerAddressSpace()))		Ptr->getType()->getPointerAddressSpace()))
return TTI::TCC_Free;		return TTI::TCC_Free;
return TTI::TCC_Basic;		return TTI::TCC_Basic;
}		}

unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,		int getUserCost(const User *U, ArrayRef<const Value *> Operands,
ArrayRef<Type *> ParamTys, const User *U,		TTI::TargetCostKind CostKind) {
TTI::TargetCostKind TCK_SizeAndLatency) {		auto TargetTTI = static_cast<T *>(this);
switch (IID) {
default:
// Intrinsics rarely (if ever) have normal argument setup constraints.
// Model them as having a basic instruction cost.
return TTI::TCC_Basic;

// TODO: other libc intrinsics.
case Intrinsic::memcpy:
return static_cast<T *>(this)->getMemcpyCost(dyn_cast<Instruction>(U));

case Intrinsic::annotation:		if (const auto *CB = dyn_cast<CallBase>(U)) {
case Intrinsic::assume:		// Special-case throughput here because it can make wild differences
case Intrinsic::sideeffect:		// whether the arguments are passed around or just the arg types. The
case Intrinsic::dbg_declare:		// IntrinsicCostAttribute constructor used here will save all the
case Intrinsic::dbg_value:		// available information.
case Intrinsic::dbg_label:		// FIXME: More information isn't always useful and we shouldn't have to
case Intrinsic::invariant_start:		// make a special case like this.
case Intrinsic::invariant_end:		if (CostKind == TTI::TCK_RecipThroughput) {
case Intrinsic::launder_invariant_group:		if (auto *II = dyn_cast<IntrinsicInst>(CB)) {
case Intrinsic::strip_invariant_group:		IntrinsicCostAttributes Attrs(*II);
case Intrinsic::is_constant:		return TargetTTI->getIntrinsicInstrCost(Attrs, CostKind);
case Intrinsic::lifetime_start:
case Intrinsic::lifetime_end:
case Intrinsic::objectsize:
case Intrinsic::ptr_annotation:
case Intrinsic::var_annotation:
case Intrinsic::experimental_gc_result:
case Intrinsic::experimental_gc_relocate:
case Intrinsic::coro_alloc:
case Intrinsic::coro_begin:
case Intrinsic::coro_free:
case Intrinsic::coro_end:
case Intrinsic::coro_frame:
case Intrinsic::coro_size:
case Intrinsic::coro_suspend:
case Intrinsic::coro_param:
case Intrinsic::coro_subfn_addr:
// These intrinsics don't actually represent code after lowering.
return TTI::TCC_Free;
}
}		}
		return -1; // We know nothing about this call.
unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments, const User *U,
TTI::TargetCostKind CostKind) {
// Delegate to the generic intrinsic handling code. This mostly provides an
// opportunity for targets to (for example) special case the cost of
// certain intrinsics based on constants used as arguments.
SmallVector<Type *, 8> ParamTys;
ParamTys.reserve(Arguments.size());
for (unsigned Idx = 0, Size = Arguments.size(); Idx != Size; ++Idx)
ParamTys.push_back(Arguments[Idx]->getType());
return static_cast<T *>(this)->getIntrinsicCost(IID, RetTy, ParamTys, U,
CostKind);
}		}

unsigned getUserCost(const User *U, ArrayRef<const Value *> Operands,
TTI::TargetCostKind CostKind) {
auto TargetTTI = static_cast<T *>(this);

// FIXME: Unlikely to be true for anything but CodeSize.		// FIXME: Unlikely to be true for anything but CodeSize.
if (const auto *CB = dyn_cast<CallBase>(U)) {
const Function *F = CB->getCalledFunction();		const Function *F = CB->getCalledFunction();
if (F) {		if (F) {
FunctionType *FTy = F->getFunctionType();		FunctionType *FTy = F->getFunctionType();
if (Intrinsic::ID IID = F->getIntrinsicID()) {		if (Intrinsic::ID IID = F->getIntrinsicID()) {
SmallVector<Type *, 8> ParamTys(FTy->param_begin(), FTy->param_end());		IntrinsicCostAttributes Attrs(IID, *CB);
return TargetTTI->getIntrinsicCost(IID, FTy->getReturnType(),		return TargetTTI->getIntrinsicInstrCost(Attrs, CostKind);
ParamTys, U, CostKind);
}		}

if (!TargetTTI->isLoweredToCall(F))		if (!TargetTTI->isLoweredToCall(F))
return TTI::TCC_Basic; // Give a basic cost if it will be lowered		return TTI::TCC_Basic; // Give a basic cost if it will be lowered

return TTI::TCC_Basic * (FTy->getNumParams() + 1);		return TTI::TCC_Basic * (FTy->getNumParams() + 1);
}		}
return TTI::TCC_Basic * (CB->arg_size() + 1);		return TTI::TCC_Basic * (CB->arg_size() + 1);
[86 lines hidden]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[290 lines hidden]	bool isTypeLegal(Type *Ty) {
return getTLI()->isTypeLegal(VT);		return getTLI()->isTypeLegal(VT);
}		}

int getGEPCost(Type *PointeeType, const Value *Ptr,		int getGEPCost(Type *PointeeType, const Value *Ptr,
ArrayRef<const Value *> Operands) {		ArrayRef<const Value *> Operands) {
return BaseT::getGEPCost(PointeeType, Ptr, Operands);		return BaseT::getGEPCost(PointeeType, Ptr, Operands);
}		}

unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments, const User *U,
TTI::TargetCostKind CostKind) {
return BaseT::getIntrinsicCost(IID, RetTy, Arguments, U, CostKind);
}

unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> ParamTys, const User *U,
TTI::TargetCostKind CostKind) {
if (IID == Intrinsic::cttz) {
if (getTLI()->isCheapToSpeculateCttz())
return TargetTransformInfo::TCC_Basic;
return TargetTransformInfo::TCC_Expensive;
}

if (IID == Intrinsic::ctlz) {
if (getTLI()->isCheapToSpeculateCtlz())
return TargetTransformInfo::TCC_Basic;
return TargetTransformInfo::TCC_Expensive;
}

return BaseT::getIntrinsicCost(IID, RetTy, ParamTys, U, CostKind);
}

unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,		unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,
unsigned &JumpTableSize,		unsigned &JumpTableSize,
ProfileSummaryInfo *PSI,		ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI) {		BlockFrequencyInfo *BFI) {
/// Try to find the estimated number of clusters. Note that the number of		/// Try to find the estimated number of clusters. Note that the number of
/// clusters identified in this function could be different from the actual		/// clusters identified in this function could be different from the actual
/// numbers found in lowering. This function ignore switches that are		/// numbers found in lowering. This function ignore switches that are
/// lowered with a mix of jump table / bit test / BTree. This function was		/// lowered with a mix of jump table / bit test / BTree. This function was
[755 lines hidden]	unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,

return Cost;		return Cost;
}		}

/// Get intrinsic cost based on arguments.		/// Get intrinsic cost based on arguments.
unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {

		Intrinsic::ID IID = ICA.getID();
		auto ConcreteTTI = static_cast<T *>(this);

		// Special case some scalar intrinsics.
		if (CostKind != TTI::TCK_RecipThroughput) {
		switch (IID) {
		default:
		break;
		case Intrinsic::cttz:
		if (getTLI()->isCheapToSpeculateCttz())
		return TargetTransformInfo::TCC_Basic;
		break;
		case Intrinsic::ctlz:
		if (getTLI()->isCheapToSpeculateCtlz())
		return TargetTransformInfo::TCC_Basic;
		break;
		case Intrinsic::memcpy:
		return ConcreteTTI->getMemcpyCost(ICA.getInst());
		}
		return BaseT::getIntrinsicInstrCost(ICA, CostKind);
		}

// TODO: Combine these two logic paths.		// TODO: Combine these two logic paths.
if (ICA.isTypeBasedOnly())		if (ICA.isTypeBasedOnly())
return getTypeBasedIntrinsicInstrCost(ICA, CostKind);		return getTypeBasedIntrinsicInstrCost(ICA, CostKind);

Intrinsic::ID IID = ICA.getID();
const IntrinsicInst *I = ICA.getInst();		const IntrinsicInst *I = ICA.getInst();
Type *RetTy = ICA.getReturnType();
const SmallVectorImpl<Value *> &Args = ICA.getArgs();		const SmallVectorImpl<Value *> &Args = ICA.getArgs();
unsigned VF = ICA.getVectorFactor();
FastMathFlags FMF = ICA.getFlags();		FastMathFlags FMF = ICA.getFlags();
		Type *RetTy = ICA.getReturnType();
		unsigned VF = ICA.getVectorFactor();
unsigned RetVF =		unsigned RetVF =
(RetTy->isVectorTy() ? cast<VectorType>(RetTy)->getNumElements() : 1);		(RetTy->isVectorTy() ? cast<VectorType>(RetTy)->getNumElements() : 1);
assert((RetVF == 1 || VF == 1) && "VF > 1 and RetVF is a vector type");		assert((RetVF == 1 || VF == 1) && "VF > 1 and RetVF is a vector type");
auto ConcreteTTI = static_cast<T *>(this);

switch (IID) {		switch (IID) {
default: {		default: {
// Assume that we need to scalarize this intrinsic.		// Assume that we need to scalarize this intrinsic.
SmallVector<Type *, 4> Types;		SmallVector<Type *, 4> Types;
for (Value *Op : Args) {		for (Value *Op : Args) {
Type *OpTy = Op->getType();		Type *OpTy = Op->getType();
assert(VF == 1 || !OpTy->isVectorTy());		assert(VF == 1 || !OpTy->isVectorTy());
[470 lines hidden]	unsigned getTypeBasedIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,

// If we can't lower fmuladd into an FMA estimate the cost as a floating		// If we can't lower fmuladd into an FMA estimate the cost as a floating
// point mul followed by an add.		// point mul followed by an add.
if (IID == Intrinsic::fmuladd)		if (IID == Intrinsic::fmuladd)
return ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FMul, RetTy,		return ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FMul, RetTy,
CostKind) +		CostKind) +
ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy,		ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy,
CostKind);		CostKind);
if (IID == Intrinsic::experimental_constrained_fmuladd)		if (IID == Intrinsic::experimental_constrained_fmuladd) {
return ConcreteTTI->getIntrinsicCost(		IntrinsicCostAttributes FMulAttrs(
Intrinsic::experimental_constrained_fmul, RetTy, Tys, nullptr,		Intrinsic::experimental_constrained_fmul, RetTy, Tys);
CostKind) +		IntrinsicCostAttributes FAddAttrs(
ConcreteTTI->getIntrinsicCost(		Intrinsic::experimental_constrained_fadd, RetTy, Tys);
Intrinsic::experimental_constrained_fadd, RetTy, Tys, nullptr,		return ConcreteTTI->getIntrinsicInstrCost(FMulAttrs, CostKind) +
CostKind);		ConcreteTTI->getIntrinsicInstrCost(FAddAttrs, CostKind);
		}

// Else, assume that we need to scalarize this intrinsic. For math builtins		// Else, assume that we need to scalarize this intrinsic. For math builtins
// this will emit a costly libcall, adding call overhead and spills. Make it		// this will emit a costly libcall, adding call overhead and spills. Make it
// very expensive.		// very expensive.
if (auto *RetVTy = dyn_cast<VectorType>(RetTy)) {		if (auto *RetVTy = dyn_cast<VectorType>(RetTy)) {
unsigned ScalarizationCost = SkipScalarizationCost ?		unsigned ScalarizationCost = SkipScalarizationCost ?
ScalarizationCostPassed : getScalarizationOverhead(RetVTy, true, false);		ScalarizationCostPassed : getScalarizationOverhead(RetVTy, true, false);

[227 lines hidden]

llvm/lib/Analysis/TargetTransformInfo.cpp

[57 lines hidden]

FunctionType *FTy = I.getCalledFunction()->getFunctionType();		FunctionType *FTy = I.getCalledFunction()->getFunctionType();
ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());		ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
Arguments.insert(Arguments.begin(), I.arg_begin(), I.arg_end());		Arguments.insert(Arguments.begin(), I.arg_begin(), I.arg_end());
if (auto *FPMO = dyn_cast<FPMathOperator>(&I))		if (auto *FPMO = dyn_cast<FPMathOperator>(&I))
FMF = FPMO->getFastMathFlags();		FMF = FPMO->getFastMathFlags();
}		}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, CallInst &CI,		IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
		const CallBase &CI) :
		II(dyn_cast<IntrinsicInst>(&CI)), RetTy(CI.getType()), IID(Id) {

		if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))
		FMF = FPMO->getFastMathFlags();

		FunctionType *FTy =
		CI.getCalledFunction()->getFunctionType();
		ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
		}

		IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
		const CallBase &CI,
unsigned Factor) :		unsigned Factor) :
RetTy(CI.getType()), IID(Id), VF(Factor) {		RetTy(CI.getType()), IID(Id), VF(Factor) {

if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))		if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))
FMF = FPMO->getFastMathFlags();		FMF = FPMO->getFastMathFlags();

Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());		Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());
FunctionType *FTy =		FunctionType *FTy =
CI.getCalledFunction()->getFunctionType();		CI.getCalledFunction()->getFunctionType();
ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());		ParamTys.insert(ParamTys.begin(), FTy->param_begin(), FTy->param_end());
}		}

IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id, CallInst &CI,		IntrinsicCostAttributes::IntrinsicCostAttributes(Intrinsic::ID Id,
		const CallBase &CI,
unsigned Factor,		unsigned Factor,
unsigned ScalarCost) :		unsigned ScalarCost) :
RetTy(CI.getType()), IID(Id), VF(Factor), ScalarizationCost(ScalarCost) {		RetTy(CI.getType()), IID(Id), VF(Factor), ScalarizationCost(ScalarCost) {

if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))		if (auto *FPMO = dyn_cast<FPMathOperator>(&CI))
FMF = FPMO->getFastMathFlags();		FMF = FPMO->getFastMathFlags();

Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());		Arguments.insert(Arguments.begin(), CI.arg_begin(), CI.arg_end());
[143 lines hidden]
}		}

int TargetTransformInfo::getGEPCost(Type *PointeeType, const Value *Ptr,		int TargetTransformInfo::getGEPCost(Type *PointeeType, const Value *Ptr,
ArrayRef<const Value *> Operands,		ArrayRef<const Value *> Operands,
TTI::TargetCostKind CostKind) const {		TTI::TargetCostKind CostKind) const {
return TTIImpl->getGEPCost(PointeeType, Ptr, Operands, CostKind);		return TTIImpl->getGEPCost(PointeeType, Ptr, Operands, CostKind);
}		}

int TargetTransformInfo::getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<const Value *> Arguments,
const User *U,
TTI::TargetCostKind CostKind) const {
int Cost = TTIImpl->getIntrinsicCost(IID, RetTy, Arguments, U, CostKind);
assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;
}

unsigned TargetTransformInfo::getEstimatedNumberOfCaseClusters(		unsigned TargetTransformInfo::getEstimatedNumberOfCaseClusters(
const SwitchInst &SI, unsigned &JTSize, ProfileSummaryInfo *PSI,		const SwitchInst &SI, unsigned &JTSize, ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI) const {		BlockFrequencyInfo *BFI) const {
return TTIImpl->getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);		return TTIImpl->getEstimatedNumberOfCaseClusters(SI, JTSize, PSI, BFI);
}		}

int TargetTransformInfo::getUserCost(const User *U,		int TargetTransformInfo::getUserCost(const User *U,
ArrayRef<const Value *> Operands,		ArrayRef<const Value *> Operands,
[1,155 lines hidden]	if (Shuffle->isZeroEltSplat())
return TTIImpl->getShuffleCost(SK_Broadcast, Ty, 0, nullptr);		return TTIImpl->getShuffleCost(SK_Broadcast, Ty, 0, nullptr);

if (Shuffle->isSingleSource())		if (Shuffle->isSingleSource())
return TTIImpl->getShuffleCost(SK_PermuteSingleSrc, Ty, 0, nullptr);		return TTIImpl->getShuffleCost(SK_PermuteSingleSrc, Ty, 0, nullptr);

return TTIImpl->getShuffleCost(SK_PermuteTwoSrc, Ty, 0, nullptr);		return TTIImpl->getShuffleCost(SK_PermuteTwoSrc, Ty, 0, nullptr);
}		}
case Instruction::Call:		case Instruction::Call:
if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		return getUserCost(I, CostKind);
IntrinsicCostAttributes CostAttrs(*II);
return getIntrinsicInstrCost(CostAttrs, CostKind);
}
return -1;
default:		default:
// We don't have any information on this instruction.		// We don't have any information on this instruction.
return -1;		return -1;
}		}
}		}

TargetTransformInfo::Concept::~Concept() {}		TargetTransformInfo::Concept::~Concept() {}


llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

Show First 20 Lines • Show All 554 Lines • ▼ Show 20 Lines	case Intrinsic::round:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

int GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		int GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
		if (ICA.getID() == Intrinsic::fabs)
		arsenm: Previously this would have been reported from TLI.isFAbsFree, but I don't see that check getting dropped here?
		samparker (author): I don't recall seeing a check like that... but it makes sense. Having the base implementation call it should work.
		samparker (author): Okay, so now I see it and it would seem that the logic has changed because of the split and merge with this and D79941. AMDGPUISelLowering doesn't report that fabs vectors are free, so which is true?
		dfukalov: This estimation is good on average. I'm going to add tests and improve this place after your commit. LGTM
		arsenm: Fabs is always free. Eventually vectors break down into scalars that have free fabs uses.
		return 0;

if (!intrinsicHasPackedVectorBenefit(ICA.getID()))		if (!intrinsicHasPackedVectorBenefit(ICA.getID()))
return BaseT::getIntrinsicInstrCost(ICA, CostKind);		return BaseT::getIntrinsicInstrCost(ICA, CostKind);

Type *RetTy = ICA.getReturnType();		Type *RetTy = ICA.getReturnType();
EVT OrigTy = TLI->getValueType(DL, RetTy);		EVT OrigTy = TLI->getValueType(DL, RetTy);
if (!OrigTy.isSimple()) {		if (!OrigTy.isSimple()) {
return BaseT::getIntrinsicInstrCost(ICA, CostKind);		return BaseT::getIntrinsicInstrCost(ICA, CostKind);
}		}
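
For readers skimming the thread, the shape of the fabs change in this hunk (a target override returns 0 for one intrinsic and otherwise defers to the base implementation) can be sketched outside LLVM. The types and names below (`IntrinsicID`, `BaseCostModel`, `ToyTargetCostModel`) are hypothetical stand-ins, not the LLVM classes, and the flat base cost of 1 is an assumption made only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for Intrinsic::ID; not the LLVM enum.
enum class IntrinsicID { Fabs, Sqrt, Fma };

// Toy base cost model (assumption: every intrinsic costs one unit).
struct BaseCostModel {
  virtual ~BaseCostModel() = default;
  virtual int getIntrinsicInstrCost(IntrinsicID /*ID*/) const { return 1; }
};

// Toy target model: fabs is reported as free before any other
// target-specific logic runs; everything else defers to the base.
struct ToyTargetCostModel : BaseCostModel {
  int getIntrinsicInstrCost(IntrinsicID ID) const override {
    if (ID == IntrinsicID::Fabs)
      return 0; // early return, mirroring the check added in this hunk
    return BaseCostModel::getIntrinsicInstrCost(ID);
  }
};
```

In this sketch a query for `IntrinsicID::Fabs` on `ToyTargetCostModel` yields 0 while any other ID falls through to the base cost, which is the behaviour the reviewers agree on above when they say fabs should always be free.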

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 2,693 Lines • ▼ Show 20 Lines	if (const auto *Entry = CostTableLookup(X86CostTbl, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
}		}

return BaseT::getIntrinsicInstrCost(ICA, CostKind);		return BaseT::getIntrinsicInstrCost(ICA, CostKind);
}		}

int X86TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		int X86TTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
		if (CostKind != TTI::TCK_RecipThroughput)
		return 1;

if (ICA.isTypeBasedOnly())		if (ICA.isTypeBasedOnly())
return getTypeBasedIntrinsicInstrCost(ICA, CostKind);		return getTypeBasedIntrinsicInstrCost(ICA, CostKind);

static const CostTblEntry AVX512CostTbl[] = {		static const CostTblEntry AVX512CostTbl[] = {
{ ISD::ROTL, MVT::v8i64, 1 },		{ ISD::ROTL, MVT::v8i64, 1 },
{ ISD::ROTL, MVT::v4i64, 1 },		{ ISD::ROTL, MVT::v4i64, 1 },
{ ISD::ROTL, MVT::v2i64, 1 },		{ ISD::ROTL, MVT::v2i64, 1 },
{ ISD::ROTL, MVT::v16i32, 1 },		{ ISD::ROTL, MVT::v16i32, 1 },
▲ Show 20 Lines • Show All 1,217 Lines • ▼ Show 20 Lines
}		}

/// Calculate the cost of Gather / Scatter operation		/// Calculate the cost of Gather / Scatter operation
int X86TTIImpl::getGatherScatterOpCost(		int X86TTIImpl::getGatherScatterOpCost(
unsigned Opcode, Type *SrcVTy, Value *Ptr, bool VariableMask,		unsigned Opcode, Type *SrcVTy, Value *Ptr, bool VariableMask,
unsigned Alignment, TTI::TargetCostKind CostKind,		unsigned Alignment, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr) {		const Instruction *I = nullptr) {

		if (CostKind != TTI::TCK_RecipThroughput)
		return 1;

assert(SrcVTy->isVectorTy() && "Unexpected data type for Gather/Scatter");		assert(SrcVTy->isVectorTy() && "Unexpected data type for Gather/Scatter");
unsigned VF = cast<VectorType>(SrcVTy)->getNumElements();		unsigned VF = cast<VectorType>(SrcVTy)->getNumElements();
PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());		PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());
if (!PtrTy && Ptr->getType()->isVectorTy())		if (!PtrTy && Ptr->getType()->isVectorTy())
PtrTy = dyn_cast<PointerType>(		PtrTy = dyn_cast<PointerType>(
cast<VectorType>(Ptr->getType())->getElementType());		cast<VectorType>(Ptr->getType())->getElementType());
assert(PtrTy && "Unexpected type for Ptr argument");		assert(PtrTy && "Unexpected type for Ptr argument");
unsigned AddressSpace = PtrTy->getAddressSpace();		unsigned AddressSpace = PtrTy->getAddressSpace();

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

Show First 20 Lines • Show All 1,529 Lines • ▼ Show 20 Lines	bool LoopIdiomRecognize::recognizeAndInsertFFS() {
// the loop has only 6 instructions:		// the loop has only 6 instructions:
// %n.addr.0 = phi [ %n, %entry ], [ %shr, %while.cond ]		// %n.addr.0 = phi [ %n, %entry ], [ %shr, %while.cond ]
// %i.0 = phi [ %i0, %entry ], [ %inc, %while.cond ]		// %i.0 = phi [ %i0, %entry ], [ %inc, %while.cond ]
// %shr = ashr %n.addr.0, 1		// %shr = ashr %n.addr.0, 1
// %tobool = icmp eq %shr, 0		// %tobool = icmp eq %shr, 0
// %inc = add nsw %i.0, 1		// %inc = add nsw %i.0, 1
// br i1 %tobool		// br i1 %tobool

const Value *Args[] =		Value *Args[] =
{InitX, ZeroCheck ? ConstantInt::getTrue(InitX->getContext())		{InitX, ZeroCheck ? ConstantInt::getTrue(InitX->getContext())
: ConstantInt::getFalse(InitX->getContext())};		: ConstantInt::getFalse(InitX->getContext())};

// @llvm.dbg doesn't count as they have no semantic effect.		// @llvm.dbg doesn't count as they have no semantic effect.
auto InstWithoutDebugIt = CurLoop->getHeader()->instructionsWithoutDebug();		auto InstWithoutDebugIt = CurLoop->getHeader()->instructionsWithoutDebug();
uint32_t HeaderSize =		uint32_t HeaderSize =
std::distance(InstWithoutDebugIt.begin(), InstWithoutDebugIt.end());		std::distance(InstWithoutDebugIt.begin(), InstWithoutDebugIt.end());

		IntrinsicCostAttributes Attrs(IntrinID, InitX->getType(), Args);
		int Cost =
		TTI->getIntrinsicInstrCost(Attrs, TargetTransformInfo::TCK_SizeAndLatency);
if (HeaderSize != IdiomCanonicalSize &&		if (HeaderSize != IdiomCanonicalSize &&
TTI->getIntrinsicCost(IntrinID, InitX->getType(), Args) >		Cost > TargetTransformInfo::TCC_Basic)
TargetTransformInfo::TCC_Basic)
return false;		return false;

transformLoopToCountable(IntrinID, PH, CntInst, CntPhi, InitX, DefX,		transformLoopToCountable(IntrinID, PH, CntInst, CntPhi, InitX, DefX,
DefX->getDebugLoc(), ZeroCheck,		DefX->getDebugLoc(), ZeroCheck,
IsCntPhiUsedOutsideLoop);		IsCntPhiUsedOutsideLoop);
return true;		return true;
}		}
