This is an archive of the discontinued LLVM Phabricator instance.

Differential D153547

[CostModel] Use min/max intrinsics for vecreduce.min/max costs
ClosedPublic

Authored by dmgreen on Jun 22 2023, 7:02 AM.

Details

Reviewers

RKSimon
anna
spatel
nikic

Commits

rG12025cef3ec8: [CostModel] Use min/max intrinsics for vecreduce.min/max costs

Summary

This changes the cost modelling of the vecreduce.min/max nodes to use the costs of the relevant min/max intrinsics instead of expanding them to compares and selects. The getMinMaxReductionCost functions now take an Opcode for the relevant intrinsic and drop the IsUnsigned and CondTy parameters, which are no longer needed.

A follow-up patch will add some basic fminimum/fmaximum cost modelling.
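
As a rough illustration of the change in modelling, the following self-contained sketch (not LLVM code) compares a log2-depth tree reduction costed as compare plus select against one costed as a single min/max intrinsic per level. The `UnitCosts` struct, the function names, and the unit costs are all hypothetical, and type legalization is ignored.

```cpp
#include <cassert>

// Hypothetical per-operation costs. In LLVM these would come from TTI hooks;
// the fields and values here are illustrative assumptions only.
struct UnitCosts {
  unsigned Cmp;     // one vector compare
  unsigned Select;  // one vector select
  unsigned MinMax;  // one min/max intrinsic such as smax or maxnum
  unsigned Shuffle; // one shuffle per reduction level
  unsigned Extract; // final extractelement of lane 0
};

// Number of halving steps needed to reduce NumElts lanes to one.
unsigned reductionLevels(unsigned NumElts) {
  assert(NumElts != 0 && (NumElts & (NumElts - 1)) == 0 &&
         "sketch assumes a power-of-two lane count");
  unsigned Levels = 0;
  for (; NumElts > 1; NumElts /= 2)
    ++Levels;
  return Levels;
}

// Old modelling: every level is a shuffle plus a compare and a select.
unsigned costViaCmpSelect(unsigned NumElts, const UnitCosts &C) {
  return reductionLevels(NumElts) * (C.Shuffle + C.Cmp + C.Select) + C.Extract;
}

// New modelling: every level is a shuffle plus one min/max intrinsic.
unsigned costViaIntrinsic(unsigned NumElts, const UnitCosts &C) {
  return reductionLevels(NumElts) * (C.Shuffle + C.MinMax) + C.Extract;
}
```

With every unit cost set to 1, an 8-lane reduction costs 10 under the compare/select scheme and 7 under the intrinsic scheme in this toy model. Real targets report different per-operation costs, so the actual deltas are the ones shown in the updated tests.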

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Jun 22 2023, 7:02 AM

Herald added a project: Restricted Project. Jun 22 2023, 7:02 AM

Herald added subscribers: luke, foad, StephenFan and 27 others.

dmgreen requested review of this revision. Jun 22 2023, 7:02 AM

Herald added a project: Restricted Project. Jun 22 2023, 7:02 AM

Herald added subscribers: wangpc, MaskRay.

dmgreen added a child revision: D153548: [TTI][AArch64] Add basic vector_reduce_fmaximum/vector_reduce_fminimum costmodelling. Jun 22 2023, 7:03 AM

nikic added inline comments. Jun 22 2023, 7:08 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
1412	Opcode -> IntrinsicID? We don't usually use the term opcode for intrinsics. Could also use `Intrinsic::ID` type here (unless it's not available).
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
13807–13809	Use this value instead of your switch?

Harbormaster completed remote builds in B240496: Diff 533573. Jun 22 2023, 8:10 AM

Not to hijack this patch, but I was experimenting with a similar patch for the Loop vectorizer for when the trip count of the loop is low (when we vectorize it, we get regressions since we do not consider the out-of-loop reduction cost in LoopVectorizer). The plan is to add a TTI hook; I'll put the change up for review.

Use Intrinsic::ID

In D153547#4442566, @anna wrote:

Not to hijack this patch, but I was experimenting with a similar patch for the Loop vectorizer for when the trip count of the loop is low (when we vectorize it, we get regressions since we do not consider the out-of-loop reduction cost in LoopVectorizer). The plan is to add a TTI hook; I'll put the change up for review.

Sounds good. Low trip count loops are often unrolled prior to vectorization (or they are large so the overheads are quite small in comparison). The loop vectorizer hasn't in the past modelled many of the loop invariant overheads such as reductions and setup splats or constants. In here and D153548 my main aim was just to get some basic AArch64 cost modelling for the intrinsics.
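
The trade-off discussed here can be sketched with simple arithmetic. The snippet below is only an illustration of why a one-time, out-of-loop reduction cost matters more at low trip counts; it is not how the LoopVectorizer computes costs, and the function names and numbers are assumptions.

```cpp
#include <cassert>

// Cost of running the scalar loop: one body execution per iteration.
unsigned scalarLoopCost(unsigned TripCount, unsigned ScalarBodyCost) {
  return TripCount * ScalarBodyCost;
}

// Cost of the vectorized loop: TripCount / VF vector iterations plus a
// one-time, out-of-loop overhead such as the final min/max reduction.
unsigned vectorLoopCost(unsigned TripCount, unsigned VF, unsigned VecBodyCost,
                        unsigned OutOfLoopCost) {
  assert(VF != 0 && TripCount % VF == 0 && "sketch assumes no remainder loop");
  return (TripCount / VF) * VecBodyCost + OutOfLoopCost;
}
```

For a trip count of 8, VF 4, and a vector body cost of 2, the vector loop costs 4 when the reduction is ignored (cheaper than a scalar cost of 8) but 11 once a reduction cost of 7 is added. At a trip count of 1024 the same overhead is negligible: 519 against 1024.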

Harbormaster completed remote builds in B240694: Diff 533860. Jun 22 2023, 11:33 PM

LGTM

This revision is now accepted and ready to land. Jun 23 2023, 12:21 AM

Closed by commit rG12025cef3ec8: [CostModel] Use min/max intrinsics for vecreduce.min/max costs (authored by dmgreen). Jul 4 2023, 7:02 AM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG12025cef3ec8: [CostModel] Use min/max intrinsics for vecreduce.min/max costs.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	12 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	52 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	16 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	6 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	4 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	8 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h	8 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	37 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	10 lines
llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll	16 lines
llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll	8 lines

Diff 537089

llvm/include/llvm/Analysis/TargetTransformInfo.h

[...]
/// This is only the case for FP operations and when reassociation is not		/// This is only the case for FP operations and when reassociation is not
/// allowed.		/// allowed.
///		///
InstructionCost getArithmeticReductionCost(		InstructionCost getArithmeticReductionCost(
unsigned Opcode, VectorType *Ty, std::optional<FastMathFlags> FMF,		unsigned Opcode, VectorType *Ty, std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;

InstructionCost getMinMaxReductionCost(		InstructionCost getMinMaxReductionCost(
VectorType *Ty, VectorType *CondTy, bool IsUnsigned,		Intrinsic::ID IID, VectorType *Ty, FastMathFlags FMF = FastMathFlags(),
FastMathFlags FMF = FastMathFlags(),
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const;

/// Calculate the cost of an extended reduction pattern, similar to		/// Calculate the cost of an extended reduction pattern, similar to
/// getArithmeticReductionCost of an Add reduction with multiply and optional		/// getArithmeticReductionCost of an Add reduction with multiply and optional
/// extensions. This is the cost of as:		/// extensions. This is the cost of as:
/// ResTy vecreduce.add(mul (A, B)).		/// ResTy vecreduce.add(mul (A, B)).
/// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)).		/// ResTy vecreduce.add(mul(ext(Ty A), ext(Ty B)).
InstructionCost getMulAccReductionCost(		InstructionCost getMulAccReductionCost(
[...] virtual InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
bool UseMaskForCond = false, bool UseMaskForGaps = false) = 0;		bool UseMaskForCond = false, bool UseMaskForGaps = false) = 0;
virtual InstructionCost		virtual InstructionCost
getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
std::optional<FastMathFlags> FMF,		std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind) = 0;		TTI::TargetCostKind CostKind) = 0;
virtual InstructionCost		virtual InstructionCost
getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy, bool IsUnsigned,		getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty, FastMathFlags FMF,
FastMathFlags FMF, TTI::TargetCostKind CostKind) = 0;		TTI::TargetCostKind CostKind) = 0;
virtual InstructionCost getExtendedReductionCost(		virtual InstructionCost getExtendedReductionCost(
unsigned Opcode, bool IsUnsigned, Type *ResTy, VectorType *Ty,		unsigned Opcode, bool IsUnsigned, Type *ResTy, VectorType *Ty,
FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) = 0;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) = 0;
virtual InstructionCost getMulAccReductionCost(		virtual InstructionCost getMulAccReductionCost(
bool IsUnsigned, Type *ResTy, VectorType *Ty,		bool IsUnsigned, Type *ResTy, VectorType *Ty,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) = 0;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) = 0;
virtual InstructionCost		virtual InstructionCost
[...] public:
}		}
InstructionCost		InstructionCost
getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
std::optional<FastMathFlags> FMF,		std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind) override {		TTI::TargetCostKind CostKind) override {
return Impl.getArithmeticReductionCost(Opcode, Ty, FMF, CostKind);		return Impl.getArithmeticReductionCost(Opcode, Ty, FMF, CostKind);
}		}
InstructionCost		InstructionCost
getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy, bool IsUnsigned,		getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty, FastMathFlags FMF,
FastMathFlags FMF,
TTI::TargetCostKind CostKind) override {		TTI::TargetCostKind CostKind) override {
return Impl.getMinMaxReductionCost(Ty, CondTy, IsUnsigned, FMF, CostKind);		return Impl.getMinMaxReductionCost(IID, Ty, FMF, CostKind);
}		}
InstructionCost		InstructionCost
getExtendedReductionCost(unsigned Opcode, bool IsUnsigned, Type *ResTy,		getExtendedReductionCost(unsigned Opcode, bool IsUnsigned, Type *ResTy,
VectorType *Ty, FastMathFlags FMF,		VectorType *Ty, FastMathFlags FMF,
TTI::TargetCostKind CostKind) override {		TTI::TargetCostKind CostKind) override {
return Impl.getExtendedReductionCost(Opcode, IsUnsigned, ResTy, Ty, FMF,		return Impl.getExtendedReductionCost(Opcode, IsUnsigned, ResTy, Ty, FMF,
CostKind);		CostKind);
}		}

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[...] public:
}		}

InstructionCost getArithmeticReductionCost(unsigned, VectorType *,		InstructionCost getArithmeticReductionCost(unsigned, VectorType *,
std::optional<FastMathFlags> FMF,		std::optional<FastMathFlags> FMF,
TTI::TargetCostKind) const {		TTI::TargetCostKind) const {
return 1;		return 1;
}		}

InstructionCost getMinMaxReductionCost(VectorType *, VectorType *, bool,		InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *,
FastMathFlags,		FastMathFlags,
TTI::TargetCostKind) const {		TTI::TargetCostKind) const {
return 1;		return 1;
}		}

InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,		InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,
Type *ResTy, VectorType *Ty,		Type *ResTy, VectorType *Ty,
FastMathFlags FMF,		FastMathFlags FMF,

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[...] case Intrinsic::vector_reduce_xor:
std::nullopt, CostKind);		std::nullopt, CostKind);
case Intrinsic::vector_reduce_fadd:		case Intrinsic::vector_reduce_fadd:
return thisT()->getArithmeticReductionCost(Instruction::FAdd, VecOpTy,		return thisT()->getArithmeticReductionCost(Instruction::FAdd, VecOpTy,
FMF, CostKind);		FMF, CostKind);
case Intrinsic::vector_reduce_fmul:		case Intrinsic::vector_reduce_fmul:
return thisT()->getArithmeticReductionCost(Instruction::FMul, VecOpTy,		return thisT()->getArithmeticReductionCost(Instruction::FMul, VecOpTy,
FMF, CostKind);		FMF, CostKind);
case Intrinsic::vector_reduce_smax:		case Intrinsic::vector_reduce_smax:
		return thisT()->getMinMaxReductionCost(Intrinsic::smax, VecOpTy,
		ICA.getFlags(), CostKind);
case Intrinsic::vector_reduce_smin:		case Intrinsic::vector_reduce_smin:
case Intrinsic::vector_reduce_fmax:		return thisT()->getMinMaxReductionCost(Intrinsic::smin, VecOpTy,
case Intrinsic::vector_reduce_fmin:		ICA.getFlags(), CostKind);
return thisT()->getMinMaxReductionCost(
VecOpTy, cast<VectorType>(CmpInst::makeCmpResultType(VecOpTy)),
/*IsUnsigned=*/false, ICA.getFlags(), CostKind);
case Intrinsic::vector_reduce_umax:		case Intrinsic::vector_reduce_umax:
		return thisT()->getMinMaxReductionCost(Intrinsic::umax, VecOpTy,
		ICA.getFlags(), CostKind);
case Intrinsic::vector_reduce_umin:		case Intrinsic::vector_reduce_umin:
return thisT()->getMinMaxReductionCost(		return thisT()->getMinMaxReductionCost(Intrinsic::umin, VecOpTy,
VecOpTy, cast<VectorType>(CmpInst::makeCmpResultType(VecOpTy)),		ICA.getFlags(), CostKind);
/*IsUnsigned=*/true, ICA.getFlags(), CostKind);		case Intrinsic::vector_reduce_fmax:
		return thisT()->getMinMaxReductionCost(Intrinsic::maxnum, VecOpTy,
		ICA.getFlags(), CostKind);
		case Intrinsic::vector_reduce_fmin:
		return thisT()->getMinMaxReductionCost(Intrinsic::minnum, VecOpTy,
		ICA.getFlags(), CostKind);
case Intrinsic::abs: {		case Intrinsic::abs: {
// abs(X) = select(icmp(X,0),X,sub(0,X))		// abs(X) = select(icmp(X,0),X,sub(0,X))
Type *CondTy = RetTy->getWithNewBitWidth(1);		Type *CondTy = RetTy->getWithNewBitWidth(1);
CmpInst::Predicate Pred = CmpInst::ICMP_SGT;		CmpInst::Predicate Pred = CmpInst::ICMP_SGT;
InstructionCost Cost = 0;		InstructionCost Cost = 0;
Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,		Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,
Pred, CostKind);		Pred, CostKind);
Cost += thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy, CondTy,		Cost += thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy, CondTy,
[...] InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
assert(Ty && "Unknown reduction vector type");		assert(Ty && "Unknown reduction vector type");
if (TTI::requiresOrderedReduction(FMF))		if (TTI::requiresOrderedReduction(FMF))
return getOrderedReductionCost(Opcode, Ty, CostKind);		return getOrderedReductionCost(Opcode, Ty, CostKind);
return getTreeReductionCost(Opcode, Ty, CostKind);		return getTreeReductionCost(Opcode, Ty, CostKind);
}		}

/// Try to calculate op costs for min/max reduction operations.		/// Try to calculate op costs for min/max reduction operations.
/// \param CondTy Conditional type for the Select instruction.		/// \param CondTy Conditional type for the Select instruction.
InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
// Targets must implement a default value for the scalable case, since		// Targets must implement a default value for the scalable case, since
// we don't know how many lanes the vector has.		// we don't know how many lanes the vector has.
if (isa<ScalableVectorType>(Ty))		if (isa<ScalableVectorType>(Ty))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();

Type *ScalarTy = Ty->getElementType();		Type *ScalarTy = Ty->getElementType();
Type *ScalarCondTy = CondTy->getElementType();
unsigned NumVecElts = cast<FixedVectorType>(Ty)->getNumElements();		unsigned NumVecElts = cast<FixedVectorType>(Ty)->getNumElements();
unsigned NumReduxLevels = Log2_32(NumVecElts);		unsigned NumReduxLevels = Log2_32(NumVecElts);
unsigned CmpOpcode;
if (Ty->isFPOrFPVectorTy()) {
CmpOpcode = Instruction::FCmp;
} else {
assert(Ty->isIntOrIntVectorTy() &&
"expecting floating point or integer type for min/max reduction");
CmpOpcode = Instruction::ICmp;
}
InstructionCost MinMaxCost = 0;		InstructionCost MinMaxCost = 0;
InstructionCost ShuffleCost = 0;		InstructionCost ShuffleCost = 0;
std::pair<InstructionCost, MVT> LT = thisT()->getTypeLegalizationCost(Ty);		std::pair<InstructionCost, MVT> LT = thisT()->getTypeLegalizationCost(Ty);
unsigned LongVectorCount = 0;		unsigned LongVectorCount = 0;
unsigned MVTLen =		unsigned MVTLen =
LT.second.isVector() ? LT.second.getVectorNumElements() : 1;		LT.second.isVector() ? LT.second.getVectorNumElements() : 1;
while (NumVecElts > MVTLen) {		while (NumVecElts > MVTLen) {
NumVecElts /= 2;		NumVecElts /= 2;
auto *SubTy = FixedVectorType::get(ScalarTy, NumVecElts);		auto *SubTy = FixedVectorType::get(ScalarTy, NumVecElts);
CondTy = FixedVectorType::get(ScalarCondTy, NumVecElts);

ShuffleCost +=		ShuffleCost +=
thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, std::nullopt,		thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, std::nullopt,
CostKind, NumVecElts, SubTy);		CostKind, NumVecElts, SubTy);
MinMaxCost +=
thisT()->getCmpSelInstrCost(CmpOpcode, SubTy, CondTy,		IntrinsicCostAttributes Attrs(IID, SubTy, {SubTy, SubTy}, FMF);
CmpInst::BAD_ICMP_PREDICATE, CostKind) +		MinMaxCost += getIntrinsicInstrCost(Attrs, CostKind);
thisT()->getCmpSelInstrCost(Instruction::Select, SubTy, CondTy,
CmpInst::BAD_ICMP_PREDICATE, CostKind);
Ty = SubTy;		Ty = SubTy;
++LongVectorCount;		++LongVectorCount;
}		}

NumReduxLevels -= LongVectorCount;		NumReduxLevels -= LongVectorCount;

// The minimal length of the vector is limited by the real length of vector		// The minimal length of the vector is limited by the real length of vector
// operations performed on the current platform. That's why several final		// operations performed on the current platform. That's why several final
// reduction opertions are perfomed on the vectors with the same		// reduction opertions are perfomed on the vectors with the same
// architecture-dependent length.		// architecture-dependent length.
ShuffleCost +=		ShuffleCost +=
NumReduxLevels * thisT()->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,		NumReduxLevels * thisT()->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
std::nullopt, CostKind, 0, Ty);		std::nullopt, CostKind, 0, Ty);
MinMaxCost +=		IntrinsicCostAttributes Attrs(IID, Ty, {Ty, Ty}, FMF);
NumReduxLevels *		MinMaxCost += NumReduxLevels * getIntrinsicInstrCost(Attrs, CostKind);
(thisT()->getCmpSelInstrCost(CmpOpcode, Ty, CondTy,
CmpInst::BAD_ICMP_PREDICATE, CostKind) +
thisT()->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
CmpInst::BAD_ICMP_PREDICATE, CostKind));
// The last min/max should be in vector registers and we counted it above.		// The last min/max should be in vector registers and we counted it above.
// So just need a single extractelement.		// So just need a single extractelement.
return ShuffleCost + MinMaxCost +		return ShuffleCost + MinMaxCost +
thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty,		thisT()->getVectorInstrCost(Instruction::ExtractElement, Ty,
CostKind, 0, nullptr, nullptr);		CostKind, 0, nullptr, nullptr);
}		}

InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,		InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,
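
The switch in the BasicTTIImpl.h hunk above pairs each reduction intrinsic with the element-wise min/max intrinsic whose cost is now queried. A minimal standalone sketch of that pairing follows; it uses plain strings instead of LLVM's `Intrinsic::ID` so that it compiles on its own, and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Maps a reduction intrinsic name to the element-wise min/max intrinsic whose
// cost the new code queries. Returns an empty string for anything else.
std::string elementwiseMinMaxFor(const std::string &Reduction) {
  assert(!Reduction.empty() && "expected an intrinsic name");
  if (Reduction == "vector_reduce_smax") return "smax";
  if (Reduction == "vector_reduce_smin") return "smin";
  if (Reduction == "vector_reduce_umax") return "umax";
  if (Reduction == "vector_reduce_umin") return "umin";
  if (Reduction == "vector_reduce_fmax") return "maxnum";
  if (Reduction == "vector_reduce_fmin") return "minnum";
  return std::string();
}
```

Note that the old code costed the floating-point reductions through the same path as the signed integer ones (with IsUnsigned set to false), whereas each reduction now carries its own intrinsic.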

llvm/lib/Analysis/TargetTransformInfo.cpp

[...] InstructionCost TargetTransformInfo::getArithmeticReductionCost(
TTI::TargetCostKind CostKind) const {		TTI::TargetCostKind CostKind) const {
InstructionCost Cost =		InstructionCost Cost =
TTIImpl->getArithmeticReductionCost(Opcode, Ty, FMF, CostKind);		TTIImpl->getArithmeticReductionCost(Opcode, Ty, FMF, CostKind);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

InstructionCost TargetTransformInfo::getMinMaxReductionCost(		InstructionCost TargetTransformInfo::getMinMaxReductionCost(
VectorType *Ty, VectorType *CondTy, bool IsUnsigned, FastMathFlags FMF,		Intrinsic::ID IID, VectorType *Ty, FastMathFlags FMF,
TTI::TargetCostKind CostKind) const {		TTI::TargetCostKind CostKind) const {
InstructionCost Cost =		InstructionCost Cost =
TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsUnsigned, FMF, CostKind);		TTIImpl->getMinMaxReductionCost(IID, Ty, FMF, CostKind);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

InstructionCost TargetTransformInfo::getExtendedReductionCost(		InstructionCost TargetTransformInfo::getExtendedReductionCost(
unsigned Opcode, bool IsUnsigned, Type *ResTy, VectorType *Ty,		unsigned Opcode, bool IsUnsigned, Type *ResTy, VectorType *Ty,
FastMathFlags FMF, TTI::TargetCostKind CostKind) const {		FastMathFlags FMF, TTI::TargetCostKind CostKind) const {
return TTIImpl->getExtendedReductionCost(Opcode, IsUnsigned, ResTy, Ty, FMF,		return TTIImpl->getExtendedReductionCost(Opcode, IsUnsigned, ResTy, Ty, FMF,

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

[...] public:

InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
unsigned Index, Value *Op0, Value *Op1);		unsigned Index, Value *Op0, Value *Op1);
InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,		InstructionCost getVectorInstrCost(const Instruction &I, Type *Val,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
unsigned Index);		unsigned Index);

InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getArithmeticReductionCostSVE(unsigned Opcode,		InstructionCost getArithmeticReductionCostSVE(unsigned Opcode,
VectorType *ValTy,		VectorType *ValTy,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getSpliceCost(VectorType *Tp, int Index);		InstructionCost getSpliceCost(VectorType *Tp, int Index);


llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[...] bool AArch64TTIImpl::isLegalToVectorizeReduction(
case RecurKind::FMulAdd:		case RecurKind::FMulAdd:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

InstructionCost		InstructionCost
AArch64TTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		AArch64TTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);

if (LT.second.getScalarType() == MVT::f16 && !ST->hasFullFP16())		if (LT.second.getScalarType() == MVT::f16 && !ST->hasFullFP16())
return BaseT::getMinMaxReductionCost(Ty, CondTy, IsUnsigned, FMF, CostKind);		return BaseT::getMinMaxReductionCost(IID, Ty, FMF, CostKind);

assert((isa<ScalableVectorType>(Ty) == isa<ScalableVectorType>(CondTy)) &&
"Both vector needs to be equally scalable");

InstructionCost LegalizationCost = 0;		InstructionCost LegalizationCost = 0;
if (LT.first > 1) {		if (LT.first > 1) {
Type *LegalVTy = EVT(LT.second).getTypeForEVT(Ty->getContext());		Type *LegalVTy = EVT(LT.second).getTypeForEVT(Ty->getContext());
Intrinsic::ID MinMaxOpcode =		IntrinsicCostAttributes Attrs(IID, LegalVTy, {LegalVTy, LegalVTy}, FMF);
Ty->isFPOrFPVectorTy()
? Intrinsic::maxnum
: (IsUnsigned ? Intrinsic::umin : Intrinsic::smin);
IntrinsicCostAttributes Attrs(MinMaxOpcode, LegalVTy, {LegalVTy, LegalVTy},
FMF);
LegalizationCost = getIntrinsicInstrCost(Attrs, CostKind) * (LT.first - 1);		LegalizationCost = getIntrinsicInstrCost(Attrs, CostKind) * (LT.first - 1);
}		}

return LegalizationCost + /*Cost of horizontal reduction*/ 2;		return LegalizationCost + /*Cost of horizontal reduction*/ 2;
}		}

InstructionCost AArch64TTIImpl::getArithmeticReductionCostSVE(		InstructionCost AArch64TTIImpl::getArithmeticReductionCostSVE(
unsigned Opcode, VectorType *ValTy, TTI::TargetCostKind CostKind) {		unsigned Opcode, VectorType *ValTy, TTI::TargetCostKind CostKind) {
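
The arithmetic of the AArch64 hook above can be restated in isolation. In this hedged sketch, `NumParts` stands in for `LT.first` and `MinMaxCost` for the result of the intrinsic cost query; both parameter names and the function name are assumptions, and the f16 fallback to the base implementation is omitted.

```cpp
#include <cassert>

// NumParts plays the role of LT.first: how many legal vectors the type splits
// into. MinMaxCost is the assumed cost of one element-wise min/max on a legal
// vector.
unsigned aarch64StyleMinMaxReductionCost(unsigned NumParts,
                                         unsigned MinMaxCost) {
  assert(NumParts >= 1 && "a type legalizes to at least one register");
  unsigned LegalizationCost = 0;
  if (NumParts > 1)
    LegalizationCost = MinMaxCost * (NumParts - 1); // combine the split parts
  return LegalizationCost + 2; // 2 = cost of the horizontal reduction
}
```

A type that is already legal costs 2; a type split into four legal vectors with a unit min/max cost adds three combining operations for a total of 5.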

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

[...] public:
int getInlinerVectorBonusPercent() const { return InlinerVectorBonusPercent; }		int getInlinerVectorBonusPercent() const { return InlinerVectorBonusPercent; }

InstructionCost getArithmeticReductionCost(		InstructionCost getArithmeticReductionCost(
unsigned Opcode, VectorType *Ty, std::optional<FastMathFlags> FMF,		unsigned Opcode, VectorType *Ty, std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETTRANSFORMINFO_H

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

[...] GCNTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
if (!ST->hasVOP3PInsts() || OrigTy.getScalarSizeInBits() != 16)		if (!ST->hasVOP3PInsts() || OrigTy.getScalarSizeInBits() != 16)
return BaseT::getArithmeticReductionCost(Opcode, Ty, FMF, CostKind);		return BaseT::getArithmeticReductionCost(Opcode, Ty, FMF, CostKind);

std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
return LT.first * getFullRateInstrCost();		return LT.first * getFullRateInstrCost();
}		}

InstructionCost		InstructionCost
GCNTTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		GCNTTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
EVT OrigTy = TLI->getValueType(DL, Ty);		EVT OrigTy = TLI->getValueType(DL, Ty);

// Computes cost on targets that have packed math instructions(which support		// Computes cost on targets that have packed math instructions(which support
// 16-bit types only).		// 16-bit types only).
if (!ST->hasVOP3PInsts() || OrigTy.getScalarSizeInBits() != 16)		if (!ST->hasVOP3PInsts() || OrigTy.getScalarSizeInBits() != 16)
return BaseT::getMinMaxReductionCost(Ty, CondTy, IsUnsigned, FMF, CostKind);		return BaseT::getMinMaxReductionCost(IID, Ty, FMF, CostKind);

std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
return LT.first * getHalfRateInstrCost(CostKind);		return LT.first * getHalfRateInstrCost(CostKind);
}		}

InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,		InstructionCost GCNTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
unsigned Index, Value *Op0,		unsigned Index, Value *Op0,

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

[...] InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I);		const Instruction *I);

InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH,		TTI::CastContextHint CCH,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
std::optional<FastMathFlags> FMF,		std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,		InstructionCost getExtendedReductionCost(unsigned Opcode, bool IsUnsigned,
Type *ResTy, VectorType *ValTy,		Type *ResTy, VectorType *ValTy,

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

[...] if (isa<ScalableVectorType>(Ty)) {
const unsigned MinSize = DL.getTypeSizeInBits(Ty).getKnownMinValue();		const unsigned MinSize = DL.getTypeSizeInBits(Ty).getKnownMinValue();
const unsigned VectorBits = *getVScaleForTuning() * RISCV::RVVBitsPerBlock;		const unsigned VectorBits = *getVScaleForTuning() * RISCV::RVVBitsPerBlock;
return RISCVTargetLowering::computeVLMAX(VectorBits, EltSize, MinSize);		return RISCVTargetLowering::computeVLMAX(VectorBits, EltSize, MinSize);
}		}
return cast<FixedVectorType>(Ty)->getNumElements();		return cast<FixedVectorType>(Ty)->getNumElements();
}		}

InstructionCost		InstructionCost
RISCVTTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		RISCVTTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
if (isa<FixedVectorType>(Ty) && !ST->useRVVForFixedLengthVectors())		if (isa<FixedVectorType>(Ty) && !ST->useRVVForFixedLengthVectors())
return BaseT::getMinMaxReductionCost(Ty, CondTy, IsUnsigned, FMF, CostKind);		return BaseT::getMinMaxReductionCost(IID, Ty, FMF, CostKind);

// Skip if scalar size of Ty is bigger than ELEN.		// Skip if scalar size of Ty is bigger than ELEN.
if (Ty->getScalarSizeInBits() > ST->getELEN())		if (Ty->getScalarSizeInBits() > ST->getELEN())
return BaseT::getMinMaxReductionCost(Ty, CondTy, IsUnsigned, FMF, CostKind);		return BaseT::getMinMaxReductionCost(IID, Ty, FMF, CostKind);

std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
if (Ty->getElementType()->isIntegerTy(1))		if (Ty->getElementType()->isIntegerTy(1))
// vcpop sequences, see vreduction-mask.ll. umax, smin actually only		// vcpop sequences, see vreduction-mask.ll. umax, smin actually only
// cost 2, but we don't have enough info here so we slightly over cost.		// cost 2, but we don't have enough info here so we slightly over cost.
return (LT.first - 1) + 3;		return (LT.first - 1) + 3;

// IR Reduction is composed by two vmv and one rvv reduction instruction.		// IR Reduction is composed by two vmv and one rvv reduction instruction.

llvm/lib/Target/X86/X86TargetTransformInfo.h

[...] public:

InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
std::optional<FastMathFlags> FMF,		std::optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getMinMaxCost(Type *Ty, Type *CondTy,		InstructionCost getMinMaxCost(Intrinsic::ID IID, Type *Ty,
TTI::TargetCostKind CostKind, bool IsUnsigned,		TTI::TargetCostKind CostKind,
FastMathFlags FMF);		FastMathFlags FMF);

InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(Intrinsic::ID IID, VectorType *Ty,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getInterleavedMemoryOpCost(		InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);
InstructionCost getInterleavedMemoryOpCostAVX512(		InstructionCost getInterleavedMemoryOpCostAVX512(
unsigned Opcode, FixedVectorType *VecTy, unsigned Factor,		unsigned Opcode, FixedVectorType *VecTy, unsigned Factor,
▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines
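
The declarations above drop `IsUnsigned` and `CondTy` in favour of a single `Intrinsic::ID`. A minimal sketch of why that is enough, assuming stand-in enums (`MinMaxID` and `TableKey` are hypothetical and are not LLVM types): the signedness is implied by the ID, and each max variant can be folded onto the matching min key for a cost-table lookup, as the X86 implementation in this diff does with `ISD::UMIN`, `ISD::SMIN`, `ISD::FMINNUM` and `ISD::FMINIMUM`.

```cpp
#include <cassert>

// Hypothetical stand-ins for llvm::Intrinsic::ID values and ISD opcodes.
// They only illustrate the mapping; they are not the LLVM definitions.
enum class MinMaxID { SMin, SMax, UMin, UMax, MinNum, MaxNum, Minimum, Maximum };
enum class TableKey { SMIN, UMIN, FMINNUM, FMINIMUM };

// The signedness is carried by the intrinsic itself, so no separate
// IsUnsigned flag has to be threaded through the cost hooks.
inline bool isUnsignedMinMax(MinMaxID ID) {
  return ID == MinMaxID::UMin || ID == MinMaxID::UMax;
}

// Fold each max variant onto the corresponding min key for table lookup.
inline TableKey costTableKey(MinMaxID ID) {
  switch (ID) {
  case MinMaxID::UMin:
  case MinMaxID::UMax:
    return TableKey::UMIN;
  case MinMaxID::SMin:
  case MinMaxID::SMax:
    return TableKey::SMIN;
  case MinMaxID::MinNum:
  case MinMaxID::MaxNum:
    return TableKey::FMINNUM;
  case MinMaxID::Minimum:
  case MinMaxID::Maximum:
    return TableKey::FMINIMUM;
  }
  return TableKey::SMIN; // Not reached for valid IDs.
}
```

The sketch also covers the `fminimum`/`fmaximum` case, which a boolean `IsUnsigned` plus a type check could not distinguish from `minnum`/`maxnum`.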

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 5,250 Lines • ▼ Show 20 Lines	while (NumVecElts > 1) {
ReductionCost += getArithmeticInstrCost(Opcode, Ty, CostKind);		ReductionCost += getArithmeticInstrCost(Opcode, Ty, CostKind);
}		}

// Add the final extract element to the cost.		// Add the final extract element to the cost.
return ReductionCost + getVectorInstrCost(Instruction::ExtractElement, Ty,		return ReductionCost + getVectorInstrCost(Instruction::ExtractElement, Ty,
CostKind, 0, nullptr, nullptr);		CostKind, 0, nullptr, nullptr);
}		}

InstructionCost X86TTIImpl::getMinMaxCost(Type *Ty, Type *CondTy,		InstructionCost X86TTIImpl::getMinMaxCost(Intrinsic::ID IID, Type *Ty,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
bool IsUnsigned, FastMathFlags FMF) {		FastMathFlags FMF) {
Intrinsic::ID Id;		IntrinsicCostAttributes ICA(IID, Ty, {Ty, Ty}, FMF);
if (Ty->isIntOrIntVectorTy()) {
Id = IsUnsigned ? Intrinsic::umin : Intrinsic::smin;
} else {
assert(Ty->isFPOrFPVectorTy() &&
"Expected float point or integer vector type.");
Id = Intrinsic::minnum;
}

IntrinsicCostAttributes ICA(Id, Ty, {Ty, Ty}, FMF);
return getIntrinsicInstrCost(ICA, CostKind);		return getIntrinsicInstrCost(ICA, CostKind);
}		}

InstructionCost		InstructionCost
X86TTIImpl::getMinMaxReductionCost(VectorType *ValTy, VectorType *CondTy,		X86TTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *ValTy,
bool IsUnsigned, FastMathFlags FMF,		FastMathFlags FMF,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(ValTy);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(ValTy);

MVT MTy = LT.second;		MVT MTy = LT.second;

int ISD;		int ISD;
if (ValTy->isIntOrIntVectorTy()) {		if (ValTy->isIntOrIntVectorTy()) {
ISD = IsUnsigned ? ISD::UMIN : ISD::SMIN;		ISD = (IID == Intrinsic::umin || IID == Intrinsic::umax) ? ISD::UMIN
		: ISD::SMIN;
} else {		} else {
assert(ValTy->isFPOrFPVectorTy() &&		assert(ValTy->isFPOrFPVectorTy() &&
"Expected float point or integer vector type.");		"Expected float point or integer vector type.");
ISD = ISD::FMINNUM;		ISD = (IID == Intrinsic::minnum || IID == Intrinsic::maxnum)
		? ISD::FMINNUM
		: ISD::FMINIMUM;
}		}

// We use the Intel Architecture Code Analyzer(IACA) to measure the throughput		// We use the Intel Architecture Code Analyzer(IACA) to measure the throughput
// and make it as the cost.		// and make it as the cost.

static const CostTblEntry SSE2CostTbl[] = {		static const CostTblEntry SSE2CostTbl[] = {
{ISD::UMIN, MVT::v2i16, 5}, // need pxors to use pminsw/pmaxsw		{ISD::UMIN, MVT::v2i16, 5}, // need pxors to use pminsw/pmaxsw
{ISD::UMIN, MVT::v4i16, 7}, // need pxors to use pminsw/pmaxsw		{ISD::UMIN, MVT::v4i16, 7}, // need pxors to use pminsw/pmaxsw
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	X86TTIImpl::getMinMaxReductionCost(Intrinsic::ID IID, VectorType *ValTy,

auto *Ty = ValVTy;		auto *Ty = ValVTy;
InstructionCost MinMaxCost = 0;		InstructionCost MinMaxCost = 0;
if (LT.first != 1 && MTy.isVector() &&		if (LT.first != 1 && MTy.isVector() &&
MTy.getVectorNumElements() < ValVTy->getNumElements()) {		MTy.getVectorNumElements() < ValVTy->getNumElements()) {
// Type needs to be split. We need LT.first - 1 operations ops.		// Type needs to be split. We need LT.first - 1 operations ops.
Ty = FixedVectorType::get(ValVTy->getElementType(),		Ty = FixedVectorType::get(ValVTy->getElementType(),
MTy.getVectorNumElements());		MTy.getVectorNumElements());
auto *SubCondTy = FixedVectorType::get(CondTy->getElementType(),		MinMaxCost = getMinMaxCost(IID, Ty, CostKind, FMF);
MTy.getVectorNumElements());
MinMaxCost = getMinMaxCost(Ty, SubCondTy, CostKind, IsUnsigned, FMF);
MinMaxCost *= LT.first - 1;		MinMaxCost *= LT.first - 1;
NumVecElts = MTy.getVectorNumElements();		NumVecElts = MTy.getVectorNumElements();
}		}

if (ST->hasBWI())		if (ST->hasBWI())
if (const auto *Entry = CostTableLookup(AVX512BWCostTbl, ISD, MTy))		if (const auto *Entry = CostTableLookup(AVX512BWCostTbl, ISD, MTy))
return MinMaxCost + Entry->Cost;		return MinMaxCost + Entry->Cost;

Show All 10 Lines	if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))
return MinMaxCost + Entry->Cost;		return MinMaxCost + Entry->Cost;

unsigned ScalarSize = ValTy->getScalarSizeInBits();		unsigned ScalarSize = ValTy->getScalarSizeInBits();

// Special case power of 2 reductions where the scalar type isn't changed		// Special case power of 2 reductions where the scalar type isn't changed
// by type legalization.		// by type legalization.
if (!isPowerOf2_32(ValVTy->getNumElements()) ||		if (!isPowerOf2_32(ValVTy->getNumElements()) ||
ScalarSize != MTy.getScalarSizeInBits())		ScalarSize != MTy.getScalarSizeInBits())
return BaseT::getMinMaxReductionCost(ValTy, CondTy, IsUnsigned, FMF,		return BaseT::getMinMaxReductionCost(IID, ValTy, FMF, CostKind);
CostKind);

// Now handle reduction with the legal type, taking into account size changes		// Now handle reduction with the legal type, taking into account size changes
// at each level.		// at each level.
while (NumVecElts > 1) {		while (NumVecElts > 1) {
// Determine the size of the remaining vector we need to reduce.		// Determine the size of the remaining vector we need to reduce.
unsigned Size = NumVecElts * ScalarSize;		unsigned Size = NumVecElts * ScalarSize;
NumVecElts /= 2;		NumVecElts /= 2;
// If we're reducing from 256/512 bits, use an extract_subvector.		// If we're reducing from 256/512 bits, use an extract_subvector.
Show All 27 Lines	if (Size > 128) {
Type::getIntNTy(ValTy->getContext(), Size), 128 / Size);		Type::getIntNTy(ValTy->getContext(), Size), 128 / Size);
MinMaxCost += getArithmeticInstrCost(		MinMaxCost += getArithmeticInstrCost(
Instruction::LShr, ShiftTy, TTI::TCK_RecipThroughput,		Instruction::LShr, ShiftTy, TTI::TCK_RecipThroughput,
{TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},		{TargetTransformInfo::OK_AnyValue, TargetTransformInfo::OP_None},
{TargetTransformInfo::OK_UniformConstantValue, TargetTransformInfo::OP_None});		{TargetTransformInfo::OK_UniformConstantValue, TargetTransformInfo::OP_None});
}		}

// Add the arithmetic op for this level.		// Add the arithmetic op for this level.
auto *SubCondTy =		MinMaxCost += getMinMaxCost(IID, Ty, CostKind, FMF);
FixedVectorType::get(CondTy->getElementType(), Ty->getNumElements());
MinMaxCost += getMinMaxCost(Ty, SubCondTy, CostKind, IsUnsigned, FMF);
}		}

// Add the final extract element to the cost.		// Add the final extract element to the cost.
return MinMaxCost + getVectorInstrCost(Instruction::ExtractElement, Ty,		return MinMaxCost + getVectorInstrCost(Instruction::ExtractElement, Ty,
CostKind, 0, nullptr, nullptr);		CostKind, 0, nullptr, nullptr);
}		}

/// Calculate the cost of materializing a 64-bit value. This helper		/// Calculate the cost of materializing a 64-bit value. This helper
▲ Show 20 Lines • Show All 1,198 Lines • Show Last 20 Lines
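
The tail of `X86TTIImpl::getMinMaxReductionCost` shown above is a halving loop: any split of the type into legal registers costs `LT.first - 1` min/max operations, each level then halves the element count and adds one data-movement step plus one min/max operation, and a final extract is added at the end. A rough sketch of that accounting, assuming flat per-operation costs (the `UnitCosts` struct and `treeReductionCost` are hypothetical; the real code chooses between extract_subvector, shuffle and shift costs depending on the remaining size):

```cpp
#include <cassert>

// Hypothetical flat costs; the real implementation queries TTI per level.
struct UnitCosts {
  unsigned Shuffle; // Moving the upper half down at each level.
  unsigned MinMax;  // One vector min/max operation.
  unsigned Extract; // Final extractelement of lane 0.
};

inline bool isPowerOfTwo(unsigned N) { return N != 0 && (N & (N - 1)) == 0; }

// NumElts:  elements in one legal register (after any splitting).
// NumParts: number of legal registers the original type splits into.
inline unsigned treeReductionCost(unsigned NumElts, unsigned NumParts,
                                  UnitCosts C) {
  assert(isPowerOfTwo(NumElts) && "sketch only handles power-of-two widths");
  assert(NumParts >= 1 && "at least one legal register is required");
  // Combining NumParts registers into one needs NumParts - 1 operations.
  unsigned Cost = (NumParts - 1) * C.MinMax;
  // Each level halves the vector: one shuffle and one min/max per level.
  while (NumElts > 1) {
    NumElts /= 2;
    Cost += C.Shuffle + C.MinMax;
  }
  // The reduced value is read out of the remaining vector.
  return Cost + C.Extract;
}
```

With all unit costs set to 1, an 8-element legal vector takes three levels, so the sketch returns 3 * 2 + 1 = 7; a type split into two 4-element registers returns 1 + 2 * 2 + 1 = 6.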

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 13,798 Lines • ▼ Show 20 Lines	case RecurKind::FMul: {
break;		break;
}		}
case RecurKind::FMax:		case RecurKind::FMax:
case RecurKind::FMin:		case RecurKind::FMin:
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin: {		case RecurKind::UMin: {
if (!AllConsts) {
auto *VecCondTy =
cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));
bool IsUnsigned =
RdxKind == RecurKind::UMax || RdxKind == RecurKind::UMin;
VectorCost = TTI->getMinMaxReductionCost(VectorTy, VecCondTy,
IsUnsigned, FMF, CostKind);
}
Intrinsic::ID Id = getMinMaxReductionIntrinsicOp(RdxKind);		Intrinsic::ID Id = getMinMaxReductionIntrinsicOp(RdxKind);
		if (!AllConsts)
		VectorCost = TTI->getMinMaxReductionCost(Id, VectorTy, FMF, CostKind);
		nikic (inline comment): Use this value instead of your switch?
ScalarCost = EvaluateScalarCost([&]() {		ScalarCost = EvaluateScalarCost([&]() {
IntrinsicCostAttributes ICA(Id, ScalarTy, {ScalarTy, ScalarTy}, FMF);		IntrinsicCostAttributes ICA(Id, ScalarTy, {ScalarTy, ScalarTy}, FMF);
return TTI->getIntrinsicInstrCost(ICA, CostKind);		return TTI->getIntrinsicInstrCost(ICA, CostKind);
});		});
break;		break;
}		}
default:		default:
llvm_unreachable("Expected arithmetic or min/max reduction operation");		llvm_unreachable("Expected arithmetic or min/max reduction operation");
▲ Show 20 Lines • Show All 1,204 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll

Show First 20 Lines • Show All 156 Lines • ▼ Show 20 Lines	;
%V8i32 = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> undef)		%V8i32 = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> undef)
%V2i64 = call i64 @llvm.vector.reduce.smax.v2i64(<2 x i64> undef)		%V2i64 = call i64 @llvm.vector.reduce.smax.v2i64(<2 x i64> undef)
%V4i64 = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> undef)		%V4i64 = call i64 @llvm.vector.reduce.smax.v4i64(<4 x i64> undef)
ret void		ret void
}		}

define void @reduce_fmin16() {		define void @reduce_fmin16() {
; CHECK-NOF16-LABEL: 'reduce_fmin16'		; CHECK-NOF16-LABEL: 'reduce_fmin16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V4f16 = call half @llvm.vector.reduce.fmin.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4f16 = call half @llvm.vector.reduce.fmin.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 117 for instruction: %V8f16 = call half @llvm.vector.reduce.fmin.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 237 for instruction: %V8f16 = call half @llvm.vector.reduce.fmin.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 190 for instruction: %V16f16 = call half @llvm.vector.reduce.fmin.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 350 for instruction: %V16f16 = call half @llvm.vector.reduce.fmin.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16m = call half @llvm.vector.reduce.fminimum.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16m = call half @llvm.vector.reduce.fminimum.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'reduce_fmin16'		; CHECK-F16-LABEL: 'reduce_fmin16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmin.v2f16(<2 x half> undef)
Show All 14 Lines	;
%V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)		%V4f16m = call half @llvm.vector.reduce.fminimum.v4f16(<4 x half> undef)
%V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)		%V8f16m = call half @llvm.vector.reduce.fminimum.v8f16(<8 x half> undef)
%V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)		%V16f16m = call half @llvm.vector.reduce.fminimum.v16f16(<16 x half> undef)
ret void		ret void
}		}

define void @reduce_fmax16() {		define void @reduce_fmax16() {
; CHECK-NOF16-LABEL: 'reduce_fmax16'		; CHECK-NOF16-LABEL: 'reduce_fmax16'
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %V4f16 = call half @llvm.vector.reduce.fmax.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 68 for instruction: %V4f16 = call half @llvm.vector.reduce.fmax.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 117 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 237 for instruction: %V8f16 = call half @llvm.vector.reduce.fmax.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 190 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 350 for instruction: %V16f16 = call half @llvm.vector.reduce.fmax.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16m = call half @llvm.vector.reduce.fmaximum.v2f16(<2 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2f16m = call half @llvm.vector.reduce.fmaximum.v2f16(<2 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16m = call half @llvm.vector.reduce.fmaximum.v4f16(<4 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %V4f16m = call half @llvm.vector.reduce.fmaximum.v4f16(<4 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16m = call half @llvm.vector.reduce.fmaximum.v8f16(<8 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %V8f16m = call half @llvm.vector.reduce.fmaximum.v8f16(<8 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16m = call half @llvm.vector.reduce.fmaximum.v16f16(<16 x half> undef)		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 58 for instruction: %V16f16m = call half @llvm.vector.reduce.fmaximum.v16f16(<16 x half> undef)
; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void		; CHECK-NOF16-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;		;
; CHECK-F16-LABEL: 'reduce_fmax16'		; CHECK-F16-LABEL: 'reduce_fmax16'
; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)		; CHECK-F16-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %V2f16 = call half @llvm.vector.reduce.fmax.v2f16(<2 x half> undef)
▲ Show 20 Lines • Show All 181 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll

	Show First 20 Lines • Show All 288 Lines • ▼ Show 20 Lines
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	call void @llvm.masked.scatter.v16f32.v16p0(<16 x float> %va, <16 x ptr> %vb, i32 1, <16 x i1> %vc)			call void @llvm.masked.scatter.v16f32.v16p0(<16 x float> %va, <16 x ptr> %vb, i32 1, <16 x i1> %vc)
	ret void			ret void
	}			}

	define void @reduce_fmax(<16 x float> %va) {			define void @reduce_fmax(<16 x float> %va) {
	; THRU-LABEL: 'reduce_fmax'			; THRU-LABEL: 'reduce_fmax'
	; THRU-NEXT: Cost Model: Found an estimated cost of 133 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; THRU-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void			; THRU-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
	;			;
	; LATE-LABEL: 'reduce_fmax'			; LATE-LABEL: 'reduce_fmax'
	; LATE-NEXT: Cost Model: Found an estimated cost of 131 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; LATE-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE-LABEL: 'reduce_fmax'			; SIZE-LABEL: 'reduce_fmax'
	; SIZE-NEXT: Cost Model: Found an estimated cost of 122 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; SIZE-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE_LATE-LABEL: 'reduce_fmax'			; SIZE_LATE-LABEL: 'reduce_fmax'
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 131 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	%v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			%v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	ret void			ret void
	}			}

	define void @memcpy(ptr %a, ptr %b, i32 %c) {			define void @memcpy(ptr %a, ptr %b, i32 %c) {
	; THRU-LABEL: 'memcpy'			; THRU-LABEL: 'memcpy'
	Show All 18 Lines

Revision Contents

Diff 537089