This is an archive of the discontinued LLVM Phabricator instance.

Differential D93639

[AArch64][SVE]Add cost model for vector reduce for scalable vector
Closed · Public

Authored by CarolineConcatto on Dec 21 2020, 8:07 AM.

Details

Reviewers

efriedma
sdesmalen
david-arm
ctetreau

Commits

rG172f1f8952c9: [AArch64][SVE]Add cost model for vector reduce for scalable vector

Summary

It uses MaxVScale to compute the maximum size of the scalable vector,
and assumes that vector.reduce.<operand> uses a tree-wise reduction
algorithm when computing the cost.
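
As a rough illustration of the idea in the summary, the sketch below derives an upper bound on the element count from a maximum vscale and counts the levels of a tree-wise reduction. It is standalone C++ and not the patch's code: `maxElementCount`, `treeReductionLevels` and the sample values are hypothetical, and the element count is assumed to be a power of two.

```cpp
#include <cassert>

// Upper bound on the element count of <vscale x MinElts x T>, given the
// largest vscale the target can have (assumed to be known).
unsigned maxElementCount(unsigned MinElts, unsigned MaxVScale) {
  return MinElts * MaxVScale;
}

// Number of halving steps a tree-wise reduction needs to get from NumElts
// elements down to a single scalar. NumElts is assumed to be a power of two.
unsigned treeReductionLevels(unsigned NumElts) {
  assert(NumElts != 0 && (NumElts & (NumElts - 1)) == 0 &&
         "expected a power-of-two element count");
  unsigned Levels = 0;
  while (NumElts > 1) {
    NumElts /= 2;
    ++Levels;
  }
  return Levels;
}
```

For instance, a `<vscale x 4 x i32>` with an assumed maximum vscale of 16 has at most 64 elements, which a tree-wise reduction folds in 6 levels.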

Depends on D93030

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

CarolineConcatto created this revision. Dec 21 2020, 8:07 AM

Herald added a reviewer: efriedma. Dec 21 2020, 8:07 AM

Herald added subscribers: NickHung, psnobl, hiraditya and 2 others.

CarolineConcatto requested review of this revision. Dec 21 2020, 8:07 AM

Herald added a project: Restricted Project. Dec 21 2020, 8:07 AM

Herald added a subscriber: llvm-commits.

CarolineConcatto added reviewers: sdesmalen, david-arm, ctetreau. Dec 21 2020, 8:09 AM

Harbormaster completed remote builds in B83155: Diff 313111. Dec 21 2020, 8:52 AM

cameron.mcinally added a subscriber: cameron.mcinally. Dec 26 2020, 9:47 AM

sdesmalen added inline comments. Jan 5 2021, 9:58 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1107	My understanding is that the PairWise form is not something that can be expressed with the LLVM IR intrinsic and is therefore specific to fixed-width vectors which can use `shufflevector`. Perhaps you can just assert IsPairWise is false. @david-arm any thoughts?
1110	The scalable property should match for Ty and CondTy, so that can be an assert. With the if-condition changed to only check Ty.
1114	remove TODO.
1117	nit: `s/getMaxVScale().getValue()/*getMaxVScale()/`
1122–1129	nit: `unsigned CmpOpcode = Ty->isFPOrFPVectorTy() ? Instruction::FCmp : Instruction::ICmp;`
1135	I think this should be: `Log2(LT.first) * MinMaxVectorCost + NumReduxLevels * MinMaxScalarCost`, because the legalized vectors are first combined with min/max operations amongst the vectors (to end up with a single vector), before the min/max reduction is done on the elements within that vector. That also means distinguishing the cost for CmpSel on vectors and CmpSel on scalars.
1141	remove TODO.
1148	nit: You can just as well inline this into the expression below.
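
The formula suggested in the comment on line 1135 can be written out as a small standalone function. This is only a sketch of the arithmetic: `twoStageMinMaxCost`, `log2PowerOfTwo` and their parameters are hypothetical names, and the per-step costs are inputs rather than values taken from the AArch64 cost model.

```cpp
#include <cassert>

// Integer log2 of a power-of-two argument.
unsigned log2PowerOfTwo(unsigned N) {
  assert(N != 0 && (N & (N - 1)) == 0 && "expected a power of two");
  unsigned Log = 0;
  while (N > 1) {
    N >>= 1;
    ++Log;
  }
  return Log;
}

// Log2(NumParts) * VectorStepCost + NumReduxLevels * ScalarStepCost
//   NumParts:       number of legal vectors the type splits into (LT.first)
//   NumReduxLevels: reduction levels inside one legal vector
unsigned twoStageMinMaxCost(unsigned NumParts, unsigned NumReduxLevels,
                            unsigned VectorStepCost, unsigned ScalarStepCost) {
  return log2PowerOfTwo(NumParts) * VectorStepCost +
         NumReduxLevels * ScalarStepCost;
}
```

With 4 parts, 3 in-vector levels, a vector step cost of 2 and a scalar step cost of 1, this gives 2 * 2 + 3 * 1 = 7.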

Have a couple nits, but seems reasonable to me. Somebody else should review as well.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1118	please name the type
1139	ScalableVectorType
1141	why not just return invalid cost now and assert at the call site?
1169	NIT: `isa<ScalableVectorType>(ValTy)`

david-arm added inline comments. Jan 6 2021, 12:25 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1107	I think we can probably assert IsPairwiseForm=false for now.
1114	Remove TODO here as max vscale should always be a valid value for AArch64 + scalable types?
1117	I don't think we need to use max vscale at all here to be honest. I've discussed this with Carol privately and getTypeLegalizationCost below already returns the number of parts needed (LT.first) along with the MVT for the legal vector type (LT.second). Since we have instructions that do both min/max operations (SMIN, etc.) as well as the horizontal vector reductions for min/max (SMINV, etc.) we can just do something similar to what X86 did and have a very simple cost lookup table for the legal horizontal reduction (SMINV), then add on a legalisation cost, i.e. something like:

  unsigned LegalizationCost = 0;
  if (not legal) {
    // reduce to a single vector, i.e.
    //   res0 = SMIN part0, part1
    //   res1 = SMIN res0, part2
    //   ...
    LegalizationCost = getMinMaxCost(); // SMIN, SMAX, etc.
    LegalizationCost *= LT.first - 1;
  }
  // do cost lookup for SMINV, etc.
  return LegalizationCost + LegalReductionCost;
1121	As mentioned above, I think we can avoid using max vscale and calculating redux levels.
1135	Similar to my comment above about max vscale, I think we can do this in a much simpler way. Rather than try to calculate a complex cost in this way using selects and compares, etc., I think we can just do it the same way that X86 does. We can have a simple cost lookup table for legal reductions (SMIN/MAX,UMIN/MAX, etc.), then add on a legalisation factor.
1141	Remove TODO?
1163	Same comments as for the MinMax case. We can avoid all reference to max vscale and redux levels, by treating this as the summation of: LegalizationCost (sequence of FADD,ADD,etc. to reduce to single vector) + LegalReductionCost (single horizontal reduction instruction, e.g. FADDV). I think the only case we don't support here is mul reductions, but we can just put in a high cost estimate for now, right?
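
The split into a legalization cost plus a single legal horizontal reduction, as described in these comments, can be sketched in standalone C++ as follows. The function and parameter names are hypothetical and the costs are plain inputs; the real patch obtains them from the target's cost hooks.

```cpp
#include <cassert>

// NumParts:      how many legal vectors the type is split into (LT.first)
// CombineCost:   cost of one vector op (e.g. SMIN, ADD) merging two parts
// ReductionCost: cost of the final horizontal reduction (e.g. SMINV, FADDV)
int splitReductionCost(int NumParts, int CombineCost, int ReductionCost) {
  assert(NumParts >= 1 && "a type legalizes to at least one part");
  int LegalizationCost = 0;
  if (NumParts > 1) {
    // Merging N parts into a single vector takes N - 1 combining operations.
    LegalizationCost = CombineCost * (NumParts - 1);
  }
  return LegalizationCost + ReductionCost;
}
```

With four parts, a combine cost of 1 and a reduction cost of 2, the result is 3 * 1 + 2 = 5. That is consistent with the cost of 5 checked for `add.i64.nxv8i64` in the test file, assuming that type splits into four legal parts and a vector add costs 1.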

-Split the cost into legalization cost and horizontal vector reduction cost, as
suggested in the review

Harbormaster completed remote builds in B84347: Diff 315167. Jan 7 2021, 10:19 AM

sdesmalen added inline comments. Jan 8 2021, 3:21 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121–1123	I don't think this condition is necessary because LT.first > 1 already implies splitting is required.
1124	You can use `EVT::getTypeForEVT` instead.
1135	Sure, we can do that. I was more thinking that we'd then lose scaling of the cost if we know something about vscale. Reductions require successive inter-lane operations (as opposed to e.g. fadd, which can be done in parallel), so the number of lanes probably has some impact on the cost. If we make it dependent on MaxVScale, the cost model could return a lower cost if MaxVL=128 bits than if MaxVL is 1024 bits (so we can tweak the costs with -msve-vector-bits=<bits>). X86 doesn't have this problem, because the vector size is always fixed, so it can use a look-up table. I'm not saying that I'm completely against a simple fixed cost initially, just sharing my thoughts.
1147–1148	I don't know what default could be here other than `llvm_unreachable`. And if that is unreachable, then you can just as well return the final cost on line 1137.
1166–1168	same as above, I think this is implied by LT.first > 1.
1169–1170	Use `EVT::getTypeForEVT` for LT.second ?
1176	move this to the `default` in switch below?
1189	For Min/Max you use a fixed-cost of `1` and here you use `2` and `16`. Does that mean Min/Max should also be 2? And should 16 actually be Invalid when this function will use InstructionCost because it cannot be expanded for SVE? If so, can you add a TODO?

-use EVT instead of MVT to get the vector type
-use unreachable for the MinMax switch
-add TODO for Arithmetic cost when the ISD is not supported
-change MinMax horizontal reduction cost to be 2

-remove redundant code when computing Type for MinMax Cost

Thank you @sdesmalen for helping to refine the code.
I hope it is near to the end now.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1147–1148	I've added unreachable as the default case of the switch, as it is not expected to be reached at any time. Without the default case the compiler complains: "warning: control reaches end of non-void function", because I did not add a return after the switch.
1189	So, I have changed the cost of MinMax and Arithmetic to be 2. I imagine it is fine for them to be the same.
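
The exchange about lines 1147–1148 concerns a switch whose cases all return. A minimal standalone sketch of that pattern is shown below; `unreachableReduction` is a hypothetical stand-in for `llvm_unreachable`, and the opcode values are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical opcode values standing in for ISD::SMIN and friends.
enum MinMaxOpcode : int { OpSMin = 1, OpSMax, OpUMin, OpUMax };

// Stand-in for llvm_unreachable: reports the problem and never returns.
[[noreturn]] void unreachableReduction() {
  assert(false && "unhandled reduction opcode");
  std::abort();
}

int horizontalMinMaxCost(int Opcode) {
  switch (Opcode) {
  case OpSMin:
  case OpSMax:
  case OpUMin:
  case OpUMax:
    return 2;
  default:
    // Every path either returns or never returns, so there is no
    // fall-off-the-end path for the compiler to warn about.
    unreachableReduction();
  }
}
```

If every handled case returns the same value, the switch can be dropped in favour of a single return, which is what a later review comment suggests.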

Harbormaster completed remote builds in B84847: Diff 316122. Jan 12 2021, 10:06 AM

Harbormaster completed remote builds in B84870: Diff 316150. Jan 12 2021, 10:45 AM

sdesmalen added inline comments. Jan 13 2021, 7:10 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1119	We already know `Ty` is a `VectorType` (see line 1104), so this cast is redundant.
1120	Nit: can you call this `LegalVTy`? With the two comments above, that would become: `Type *LegalVTy = EVT(LT.second).getTypeForEVT(Ty->getContext());`
1131–1138	The switch statement below and the selection of the ISD seem unnecessary if the cost is the same; you can just write: `return LegalizationCost + /*Cost of reduction*/ 2;`
1159	We already know Ty is a VectorType (see line 1151), so this cast is redundant.
1169	nit: `s/getArithmeticReductionCostScalableVectorType/getArithmeticReductionCostSVE`

sdesmalen added inline comments. Jan 13 2021, 9:52 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1169	@CarolineConcatto pointed out to me that earlier on in this patch @ctetreau suggested the opposite. My argument for having 'SVE' in the name over 'ScalableVectorType' is that the reduction cost is different when SVE is available, regardless of whether the type passed is a scalable or fixed-width vector. For fixed-width vectors, the compiler may still choose to use one of the SVE reduction instructions if available. The code in that function is not specific to scalable vectors either.

-remove redundant code
-change getArithmeticReductionCostScalableVectorType to getArithmeticReductionCostSVE
-replace unsigned by int

CarolineConcatto marked 13 inline comments as done. Jan 13 2021, 10:22 AM

Harbormaster completed remote builds in B85045: Diff 316442. Jan 13 2021, 10:53 AM

Thanks for the changes so far, I think the patch is nearly there now.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1153–1155	SVE has no instructions for MUL/FMUL reductions, so these should fall under `default` (return Invalid).
llvm/test/Analysis/CostModel/AArch64/sve-getIntrinsicInstrCost-vector-reduce.ll
14	Can you make two tests for each reduction: One with a legal type (`<vscale x 2 x i64>` (or `<vscale x 2 x double>` for fp reductions)) One with an illegal type that needs splitting (`<vscale x 8 x i64>` (or `<vscale x 8 x double>` for fp reductions))
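
The point above that MUL/FMUL reductions should report an invalid cost, rather than a large number such as 16, can be sketched with a small result type. This is standalone C++ under stated assumptions: `ReductionCost`, `ReduceOp` and `arithmeticReductionCost` are hypothetical and are not LLVM's `InstructionCost` API.

```cpp
#include <cassert>

enum class ReduceOp { Add, And, Or, Xor, FAdd, Mul, FMul };

struct ReductionCost {
  bool Valid; // false means "cannot be costed / not supported"
  int Value;  // meaningful only when Valid is true
};

ReductionCost arithmeticReductionCost(ReduceOp Op, int LegalizationCost) {
  assert(LegalizationCost >= 0 && "costs are non-negative");
  switch (Op) {
  case ReduceOp::Add:
  case ReduceOp::And:
  case ReduceOp::Or:
  case ReduceOp::Xor:
  case ReduceOp::FAdd:
    // A single horizontal reduction instruction is assumed to exist.
    return {true, LegalizationCost + 2};
  case ReduceOp::Mul:
  case ReduceOp::FMul:
    // No horizontal reduction instruction: report "invalid", not a number.
    return {false, 0};
  }
  return {false, 0};
}
```

A caller then has to check `Valid` before using `Value`, so an unsupported reduction cannot be mistaken for a merely expensive one.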

remove cost for vector.reduce.{mul,fmul}
add regression tests for illegal (too wide) vector types

Harbormaster completed remote builds in B85529: Diff 317233. Jan 17 2021, 11:38 AM

-add missing test for fmax,fadd with integer

Thank you Sander,
I added more tests. We now have tests for legal and illegal types, as well as tests that pass integer vectors to the FP reductions.

Harbormaster completed remote builds in B85535: Diff 317240. Jan 17 2021, 12:57 PM

change vector size to 128 bits

Harbormaster completed remote builds in B85540: Diff 317245. Jan 17 2021, 3:17 PM

sdesmalen added inline comments. Jan 18 2021, 1:10 AM

llvm/test/Analysis/CostModel/AArch64/sve-getIntrinsicInstrCost-vector-reduce.ll

92–99

These tests can be removed because Floating Point reductions need an FP vector as their input, not an integer vector.

154

I was actually hoping for the test file to be organised in pairs of tests with a legal type, followed by an illegal (too wide) type, as follows:

define i64 @mul.i64.nxv2i64(<vscale x 2 x i64> %v) {
; CHECK-LABEL: 'mul.i64.nxv2i64'
; CHECK-NEXT: Cost Model: Found an estimated cost of .... for instruction:   %r = call i64 @llvm.vector.reduce.mul.nxv2i64(<vscale x 2 x i64> %v)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   ret i64 %r

  %r = call i64 @llvm.vector.reduce.mul.nxv2i64(<vscale x 2 x i64> %v)
  ret i64 %r
}

define i64 @mul.i64.nxv8i64(<vscale x 8 x i64> %v) {
; CHECK-LABEL: 'mul.i64.nxv8i64'
; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction:   %r = call i64 @llvm.vector.reduce.mul.nxv8i64(<vscale x 8 x i64> %v)
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction:   ret i64 %r

  %r = call i64 @llvm.vector.reduce.mul.nxv8i64(<vscale x 8 x i64> %v)
  ret i64 %r
}

where the only difference is the number of elements (not the element type). This then clearly tests the cost on a legal type followed by a test where the code for legalization cost is exercised.

-re-arrange the test
-remove f<operand> tests with integer

Harbormaster completed remote builds in B85571: Diff 317305. Jan 18 2021, 3:56 AM

LGTM

llvm/test/Analysis/CostModel/AArch64/sve-getIntrinsicInstrCost-vector-reduce.ll
20	This is different from what I suggested (increasing the number of elements), but the result of increasing the element-width is the same for the cost function, so I guess it's okay.
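
The remark that widening the elements and adding more elements have the same effect on the cost function can be illustrated by counting legal parts. The sketch assumes a legal scalable vector covers 128 bits per unit of vscale and ignores how narrower-than-register types are promoted; `numLegalParts` is a hypothetical helper, not LLVM's type legalizer.

```cpp
#include <cassert>

// Assumption: one legal scalable vector holds 128 bits per unit of vscale.
const unsigned MinRegisterBits = 128;

// Number of legal vectors needed for <vscale x MinElts x iEltBits>.
unsigned numLegalParts(unsigned MinElts, unsigned EltBits) {
  assert(MinElts > 0 && EltBits > 0 && "empty vector type");
  unsigned Bits = MinElts * EltBits;
  if (Bits <= MinRegisterBits)
    return 1;
  // Round up to whole registers.
  return (Bits + MinRegisterBits - 1) / MinRegisterBits;
}
```

Starting from `<vscale x 4 x i32>` (one part), doubling the element count to `<vscale x 8 x i32>` and doubling the element width to `<vscale x 4 x i64>` both yield two parts, so a cost function driven only by the part count sees the same input.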

This revision is now accepted and ready to land. Jan 18 2021, 4:13 AM

Closed by commit rG172f1f8952c9: [AArch64][SVE]Add cost model for vector reduce for scalable vector (authored by CarolineConcatto). Jan 19 2021, 3:54 AM

This revision was automatically updated to reflect the committed changes.

CarolineConcatto marked 2 inline comments as done.

CarolineConcatto added a commit: rG172f1f8952c9: [AArch64][SVE]Add cost model for vector reduce for scalable vector.

CarolineConcatto mentioned this in D95363: [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse. Feb 7 2021, 10:41 AM

liaolucy mentioned this in D126372: [RISCV]Add basic cost model for vector reduce for scalable vector. May 25 2022, 5:38 AM

Revision Contents

Path                                                                              Size
llvm/include/llvm/CodeGen/BasicTTIImpl.h                                          4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h                              8 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp                            62 lines
llvm/test/Analysis/CostModel/AArch64/sve-getIntrinsicInstrCost-vector-reduce.ll   127 lines

Diff 316442

llvm/include/llvm/CodeGen/BasicTTIImpl.h

Context: unsigned getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,

     case Intrinsic::vector_reduce_or:
     case Intrinsic::vector_reduce_xor:
     case Intrinsic::vector_reduce_smax:
     case Intrinsic::vector_reduce_smin:
     case Intrinsic::vector_reduce_fmax:
     case Intrinsic::vector_reduce_fmin:
     case Intrinsic::vector_reduce_umax:
     case Intrinsic::vector_reduce_umin: {
-      if (isa<ScalableVectorType>(RetTy))
-        return BaseT::getIntrinsicInstrCost(ICA, CostKind);
       IntrinsicCostAttributes Attrs(IID, RetTy, Args[0]->getType(), FMF, 1, I);
       return getTypeBasedIntrinsicInstrCost(Attrs, CostKind);
     }
     case Intrinsic::vector_reduce_fadd:
     case Intrinsic::vector_reduce_fmul: {
-      if (isa<ScalableVectorType>(RetTy))
-        return BaseT::getIntrinsicInstrCost(ICA, CostKind);
       IntrinsicCostAttributes Attrs(
           IID, RetTy, {Args[0]->getType(), Args[1]->getType()}, FMF, 1, I);
       return getTypeBasedIntrinsicInstrCost(Attrs, CostKind);
     }
     case Intrinsic::fshl:
     case Intrinsic::fshr: {
       if (isa<ScalableVectorType>(RetTy))
         return BaseT::getIntrinsicInstrCost(ICA, CostKind);

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Context: public:

   int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
                                unsigned Index);

   unsigned getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind);

   int getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

+  int getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
+                             bool IsPairwise, bool IsUnsigned,
+                             TTI::TargetCostKind CostKind);

+  int getArithmeticReductionCostSVE(unsigned Opcode, VectorType *ValTy,
+                                    bool IsPairwiseForm,
+                                    TTI::TargetCostKind CostKind);

   int getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
       TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
       TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
       TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
       TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
       ArrayRef<const Value *> Args = ArrayRef<const Value *>(),

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Context: bool AArch64TTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty,

   case Instruction::FCmp:
     return Flags.NoNaN;
   default:
     llvm_unreachable("Unhandled reduction opcode");
   }
   return false;
 }

+int AArch64TTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
+                                           bool IsPairwise, bool IsUnsigned,
+                                           TTI::TargetCostKind CostKind) {
+  if (!isa<ScalableVectorType>(Ty))
+    return BaseT::getMinMaxReductionCost(Ty, CondTy, IsPairwise, IsUnsigned,
+                                         CostKind);
+  assert((isa<ScalableVectorType>(Ty) && isa<ScalableVectorType>(CondTy)) &&
+         "Both vector needs to be scalable");

+  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
+  int LegalizationCost = 0;
+  if (LT.first > 1) {
+    Type *LegalVTy = EVT(LT.second).getTypeForEVT(Ty->getContext());
+    unsigned CmpOpcode =
+        Ty->isFPOrFPVectorTy() ? Instruction::FCmp : Instruction::ICmp;
+    LegalizationCost =
+        getCmpSelInstrCost(CmpOpcode, LegalVTy, LegalVTy,
+                           CmpInst::BAD_ICMP_PREDICATE, CostKind) +
+        getCmpSelInstrCost(Instruction::Select, LegalVTy, LegalVTy,
+                           CmpInst::BAD_ICMP_PREDICATE, CostKind);
+    LegalizationCost *= LT.first - 1;
+  }

+  return LegalizationCost + /*Cost of horizontal reduction*/ 2;
+}

+int AArch64TTIImpl::getArithmeticReductionCostSVE(
+    unsigned Opcode, VectorType *ValTy, bool IsPairwise,
+    TTI::TargetCostKind CostKind) {
+  assert(!IsPairwise && "Cannot be pair wise to continue");

+  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
+  int LegalizationCost = 0;
+  if (LT.first > 1) {
+    Type *LegalVTy = EVT(LT.second).getTypeForEVT(ValTy->getContext());
+    LegalizationCost = getArithmeticInstrCost(Opcode, LegalVTy, CostKind);
+    LegalizationCost *= LT.first - 1;
+  }

+  int ISD = TLI->InstructionOpcodeToISD(Opcode);
+  assert(ISD && "Invalid opcode");
+  // Add the final reduction cost for the legal horizontal reduction
+  switch (ISD) {
+  case ISD::ADD:
+  case ISD::AND:
+  case ISD::OR:
+  case ISD::XOR:
+  case ISD::FADD:
+    return LegalizationCost + 2;
+  case ISD::FMUL:
+  case ISD::MUL:
+    return LegalizationCost + 16;
+  default:
+    // TODO: Replace for invalid when InstructionCost is used
+    // cases not supported by SVE
+    return 16;
+  }
+}

int AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode,		int AArch64TTIImpl::getArithmeticReductionCost(unsigned Opcode,
		david-armUnsubmitted Not Done Reply Inline Actions Same comments as for the MinMax case. We can avoid all reference to max vscale and redux levels, by treating this as the summation of: LegalizationCost (sequence of FADD,ADD,etc. to reduce to single vector) + LegalReductionCost (single horizontal reduction instruction, e.g. FADDV). I think the only case we don't support here is mul reductions, but we can just put in a high cost estimate for now, right? david-arm: Same comments as for the MinMax case. We can avoid all reference to max vscale and redux levels…
VectorType *ValTy,		VectorType *ValTy,
bool IsPairwiseForm,		bool IsPairwiseForm,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {

		if (isa<ScalableVectorType>(ValTy))
		sdesmalenUnsubmitted Done Reply Inline Actions same as above, I think this is implied by LT.first > 1. sdesmalen: same as above, I think this is implied by LT.first > 1.
		return getArithmeticReductionCostSVE(Opcode, ValTy, IsPairwiseForm,
		ctetreauUnsubmitted Done Reply Inline Actions NIT: `isa<ScalableVectorType>(ValTy)` ctetreau: NIT: `isa<ScalableVectorType>(ValTy)`
		sdesmalenUnsubmitted Done Reply Inline Actions nit: `s/getArithmeticReductionCostScalableVectorType/getArithmeticReductionCostSVE` sdesmalen: nit: `s/getArithmeticReductionCostScalableVectorType/getArithmeticReductionCostSVE`
		sdesmalenUnsubmitted Not Done Reply Inline Actions @CarolineConcatto pointed out to me that earlier on in this patch @ctetreau suggested the opposite. My argument for having 'SVE' in the name over 'ScalableVectorType' is that the reduction cost is different when SVE is available, regardless of whether the type passed is a scalable or fixed-width vector. For fixed-width vectors, the compiler may still choose to use one of the SVE reduction instructions if available. The code in that function is not specific to scalable vectors either. sdesmalen: @CarolineConcatto pointed out to me that earlier on in this patch @ctetreau suggested the…
		CostKind);
		sdesmalenUnsubmitted Done Reply Inline Actions Use `EVT::getTypeForEVT` for LT.second ? sdesmalen: Use `EVT::getTypeForEVT` for LT.second ?
if (IsPairwiseForm)		if (IsPairwiseForm)
return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,		return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,
    CostKind);

  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
  MVT MTy = LT.second;
sdesmalen (inline comment, marked Done): Move this to the `default` in the switch below?
  int ISD = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISD && "Invalid opcode");

  // Horizontal adds can use the 'addv' instruction. We model the cost of these
  // instructions as normal vector adds. This is the only arithmetic vector
  // reduction operation for which we have an instruction.
  static const CostTblEntry CostTblNoPairwise[]{
      {ISD::ADD, MVT::v8i8, 1},
      {ISD::ADD, MVT::v16i8, 1},
      {ISD::ADD, MVT::v4i16, 1},
      {ISD::ADD, MVT::v8i16, 1},
      {ISD::ADD, MVT::v4i32, 1},
  };
sdesmalen (inline comment, marked Done): For Min/Max you use a fixed cost of `1`, and here you use `2` and `16`. Does that mean Min/Max should also be 2? And should 16 actually be Invalid once this function uses InstructionCost, because it cannot be expanded for SVE? If so, can you add a TODO?
CarolineConcatto (author, marked Done): So, I have changed the cost of MinMax and Arithmetic to be 2. I imagine it is fine for them to be the same.

  if (const auto *Entry = CostTableLookup(CostTblNoPairwise, ISD, MTy))
    return LT.first * Entry->Cost;

  return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwiseForm,
                                           CostKind);
}
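
The table lookup above can be illustrated outside LLVM with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `Op`, `VT`, `CostEntry`, `lookup` and `reductionCost` are illustrative names, not LLVM APIs, and `NumParts` plays the role of `LT.first` (the number of legal registers the value type is split into). It only shows the shape of the logic: find an (opcode, legal type) entry, scale its cost by the number of parts, and otherwise signal that the generic fallback should be used.

```cpp
// Hypothetical stand-ins for ISD opcodes and MVT simple value types.
enum class Op { Add, Mul };
enum class VT { v8i8, v16i8, v4i16, v8i16, v4i32, v2i64 };

struct CostEntry {
  Op Opcode;
  VT Type;
  int Cost;
};

// Same entries as CostTblNoPairwise in the diff: only ADD has a
// dedicated horizontal instruction, modelled at cost 1.
constexpr CostEntry NoPairwiseTable[] = {
    {Op::Add, VT::v8i8, 1},  {Op::Add, VT::v16i8, 1},
    {Op::Add, VT::v4i16, 1}, {Op::Add, VT::v8i16, 1},
    {Op::Add, VT::v4i32, 1},
};

// Sentinel meaning "no table entry; use the generic cost model instead".
constexpr int NoEntry = -1;

// Linear search over the table; returns nullptr when nothing matches.
inline const CostEntry *lookup(Op Opcode, VT Type) {
  for (const CostEntry &E : NoPairwiseTable)
    if (E.Opcode == Opcode && E.Type == Type)
      return &E;
  return nullptr;
}

// NumParts stands in for LT.first from the type legalization cost.
inline int reductionCost(Op Opcode, VT LegalType, int NumParts) {
  if (const CostEntry *E = lookup(Opcode, LegalType))
    return NumParts * E->Cost;
  return NoEntry;
}
```

In this sketch, an add reduction over a type that legalizes to two `v4i32` registers would be costed as `reductionCost(Op::Add, VT::v4i32, 2)`, while a multiply reduction has no entry and returns `NoEntry`, which corresponds to the fall-through to `BaseT::getArithmeticReductionCost` in the diff.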


llvm/test/Analysis/CostModel/AArch64/sve-getIntrinsicInstrCost-vector-reduce.ll

This file was added.

				; Check getIntrinsicInstrCost in BasicTTIImpl.h with SVE for masked gather

				; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s


				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				define i64 @add.i64.nxv8i64(<vscale x 8 x i64> %v) {
				; CHECK-LABEL: 'add.i64.nxv8i64'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r = call i64 @llvm.vector.reduce.add.nxv8i64(<vscale x 8 x i64> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r

				%r = call i64 @llvm.vector.reduce.add.nxv8i64(<vscale x 8 x i64> %v)
sdesmalen (inline comment, marked Done): Can you make two tests for each reduction:
* One with a legal type (`<vscale x 2 x i64>`, or `<vscale x 2 x double>` for fp reductions)
* One with an illegal type that needs splitting (`<vscale x 8 x i64>`, or `<vscale x 8 x double>` for fp reductions)
				ret i64 %r
				}

				define i8 @mul.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'mul.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r = call i8 @llvm.vector.reduce.mul.nxv8i8(<vscale x 8 x i8> %v)
sdesmalen (inline comment, marked Not Done): This is different from what I suggested (increasing the number of elements), but the result of increasing the element width is the same for the cost function, so I guess it's okay.
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.mul.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}

				define i8 @and.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'and.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.and.nxv8i8(<vscale x 8 x i8> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.and.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}

				define i8 @or.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'or.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.or.nxv8i8(<vscale x 8 x i8> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.or.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}

				define i8 @xor.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'xor.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.xor.nxv8i8(<vscale x 8 x i8> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.xor.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}

				define i8 @umin.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'umin.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.umin.nxv8i8(<vscale x 8 x i8> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.umin.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}

				define float @fmax.f32.nxv8f32(<vscale x 8 x float> %v) {
				; CHECK-LABEL: 'fmax.f32.nxv8f32'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = call float @llvm.vector.reduce.fmax.nxv8f32(<vscale x 8 x float> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r

				%r = call float @llvm.vector.reduce.fmax.nxv8f32(<vscale x 8 x float> %v)
				ret float %r
				}

				define float @fmin.f32.nxv8f32(<vscale x 8 x float> %v) {
				; CHECK-LABEL: 'fmin.f32.nxv8f32'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r = call float @llvm.vector.reduce.fmin.nxv8f32(<vscale x 8 x float> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %r

				%r = call float @llvm.vector.reduce.fmin.nxv8f32(<vscale x 8 x float> %v)
				ret float %r
				}

				define i8 @umax.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'umax.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.umax.nxv8i8(<vscale x 8 x i8> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.umax.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}
				define i8 @smin.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'smin.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.smin.nxv8i8(<vscale x 8 x i8> %v)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.smin.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}
				define i8 @smax.i8.nxv8i8(<vscale x 8 x i8> %v) {
				; CHECK-LABEL: 'smax.i8.nxv8i8'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r = call i8 @llvm.vector.reduce.smax.nxv8i8(<vscale x 8 x i8> %v)
sdesmalen (inline comment, marked Done): These tests can be removed because floating-point reductions need an FP vector as their input, not an integer vector.
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %r

				%r = call i8 @llvm.vector.reduce.smax.nxv8i8(<vscale x 8 x i8> %v)
				ret i8 %r
				}

				define float @fadda_nxv8f32(float %start, <vscale x 8 x float> %a) #0 {
				; CHECK-LABEL: 'fadda_nxv8f32'
				; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %res = call float @llvm.vector.reduce.fadd.nxv8f32(float %start, <vscale x 8 x float> %a)
				; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret float %res

				%res = call float @llvm.vector.reduce.fadd.nxv8f32(float %start, <vscale x 8 x float> %a)
				ret float %res
				}


				declare i64 @llvm.vector.reduce.add.nxv8i64(<vscale x 8 x i64>)
				declare i8 @llvm.vector.reduce.mul.nxv8i8(<vscale x 8 x i8>)
				declare i8 @llvm.vector.reduce.and.nxv8i8(<vscale x 8 x i8>)
				declare i8 @llvm.vector.reduce.or.nxv8i8(<vscale x 8 x i8>)
				declare i8 @llvm.vector.reduce.xor.nxv8i8(<vscale x 8 x i8>)
				declare float @llvm.vector.reduce.fmax.nxv8f32(<vscale x 8 x float>)
				declare float @llvm.vector.reduce.fmin.nxv8f32(<vscale x 8 x float>)
				declare i8 @llvm.vector.reduce.umin.nxv8i8(<vscale x 8 x i8>)
				declare i8 @llvm.vector.reduce.umax.nxv8i8(<vscale x 8 x i8>)
				declare i8 @llvm.vector.reduce.smin.nxv8i8(<vscale x 8 x i8>)
				declare i8 @llvm.vector.reduce.smax.nxv8i8(<vscale x 8 x i8>)
				declare float @llvm.vector.reduce.fadd.nxv8f32(float, <vscale x 8 x float>)
sdesmalen (inline comment, marked Done): I was actually hoping for the test file to be organised in pairs of tests with a legal type, followed by an illegal (too wide) type, as follows:

    define i64 @mul.i64.nxv2i64(<vscale x 2 x i64> %v) {
    ; CHECK-LABEL: 'mul.i64.nxv2i64'
    ; CHECK-NEXT: Cost Model: Found an estimated cost of .... for instruction: %r = call i64 @llvm.vector.reduce.mul.nxv2i64(<vscale x 2 x i64> %v)
    ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
      %r = call i64 @llvm.vector.reduce.mul.nxv2i64(<vscale x 2 x i64> %v)
      ret i64 %r
    }

    define i64 @mul.i64.nxv8i64(<vscale x 8 x i64> %v) {
    ; CHECK-LABEL: 'mul.i64.nxv8i64'
    ; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %r = call i64 @llvm.vector.reduce.mul.nxv8i64(<vscale x 8 x i64> %v)
    ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %r
      %r = call i64 @llvm.vector.reduce.mul.nxv8i64(<vscale x 8 x i64> %v)
      ret i64 %r
    }

where the only difference is the number of elements (not the element type). This then clearly tests the cost on a legal type, followed by a test where the code for legalization cost is exercised.
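
The distinction between a legal type and one that needs splitting can be made concrete with a small sketch. It rests on two assumptions that are not confirmed by the code shown on this page: that a legal SVE type occupies one 128-bit granule per unit of `vscale`, and that a split reduction is costed as one vector operation per extra part plus a fixed cost for the final horizontal reduction. The names `numLegalParts` and `splitReductionCost` are hypothetical.

```cpp
// Assumption: one legal SVE register holds 128 bits per unit of vscale.
constexpr unsigned GranuleBits = 128;

// Number of legal registers needed for <vscale x MinElts x iEltBits>.
// Types that fit in one granule (including narrower ones that would be
// promoted) count as a single part.
inline unsigned numLegalParts(unsigned MinElts, unsigned EltBits) {
  unsigned Bits = MinElts * EltBits;
  if (Bits <= GranuleBits)
    return 1;
  return (Bits + GranuleBits - 1) / GranuleBits; // round up
}

// Hypothetical cost formula: combine the parts with (Parts - 1) ordinary
// vector operations, then perform one horizontal reduction on the result.
inline unsigned splitReductionCost(unsigned Parts, unsigned VectorOpCost,
                                   unsigned FinalReductionCost) {
  return (Parts - 1) * VectorOpCost + FinalReductionCost;
}
```

Under these assumptions `<vscale x 2 x i64>` is a single part and `<vscale x 8 x i64>` splits into four. With a vector operation cost of 1 and a final reduction cost of 2, the formula gives 2 for the legal type and 5 for the split one; the latter coincides with the expected cost of the `add` reduction on `<vscale x 8 x i64>` in the test file, though that agreement alone does not establish that the patch computes the cost this way.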


Revision Contents

Diff 316442
