This is an archive of the discontinued LLVM Phabricator instance.

[ARM] remove cost-kind predicate for cmp/sel costs
ClosedPublic

Authored by spatel on Nov 4 2020, 11:30 AM.

Download Raw Diff

Details

Reviewers

dmgreen
samparker

Commits

rG264a6df353b7: [ARM] remove cost-kind predicate for cmp/sel costs

Summary

This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops).

We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision.Nov 4 2020, 11:30 AM

Herald added a project: Restricted Project. · View Herald TranscriptNov 4 2020, 11:30 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls, mcrosier. · View Herald Transcript

spatel requested review of this revision.Nov 4 2020, 11:30 AM

spatel mentioned this in D90554: [CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation.Nov 4 2020, 11:34 AM

samparker added inline comments.Nov 5 2020, 12:38 AM

llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll
220	I know this is a tiny change, but it's drawn my attention because it's so high the number is so high. Does this look right to you @dmgreen ? I would have thought we were able to break this up more efficiently with our native support, or is this because we'd have to copy the GPR registers into an FPRs to perform some final scalar maxs..?

spatel added inline comments.Nov 5 2020, 6:32 AM

llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll
220	Stepping through this, the cost is derived from the BasicTTIImpl calling back to the target for shuffle+cmp+sel: // Assume the pairwise shuffles add a cost. ShuffleCost += (IsPairwise + 1) * thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, NumVecElts, SubTy); MinMaxCost += thisT()->getCmpSelInstrCost(CmpOpcode, SubTy, CondTy, CmpInst::BAD_ICMP_PREDICATE, CostKind) + thisT()->getCmpSelInstrCost(Instruction::Select, SubTy, CondTy, CmpInst::BAD_ICMP_PREDICATE, CostKind); And this progresses for v16f32 as: 384 -> 480 -> 608 for the shuffles and 8 -> 12 -> 20 for the cmp/sel, so 608 + 20 = 628. The shuffle cost seems to be expanded as a series of insert/extract based on the number of elements in the vector, so it's exploding.

This looks fine to me, if Sam doesn't disagree. The costs look better to me, and the MVE costs should be our problem not yours :)

llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll
220	Our scalarization overhead is set very high, super high to protect us against regressions. I think it's actually O(n^2). It ends up meaning that wide vectors get very high costs, and was useful early on when we were getting some pretty ropy codegen. We have plans to set it to something more sensible but every time I try it still gives regressions somewhere. We only have 8 registers so sometimes these high costs are useful, especially in the vectorizer. It sounds like we could do with some better SK_ExtractSubvector costs for extracting simple legal vector, although it may be better to just have custom reduction costs in this specific case, as I don't think that code will model how we actually lower reductions. I will try and look into that at some point, but for the moment they are very deliberately very wrong :)

This revision is now accepted and ready to land.Nov 5 2020, 8:44 AM

and the MVE costs should be our problem not yours :)

Indeed! Thanks both for the explanations.

Closed by commit rG264a6df353b7: [ARM] remove cost-kind predicate for cmp/sel costs (authored by spatel). · Explain WhyNov 5 2020, 11:53 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG264a6df353b7: [ARM] remove cost-kind predicate for cmp/sel costs.

Revision Contents

Path

Size

llvm/

lib/

Target/

ARM/

ARMTargetTransformInfo.cpp

16 lines

test/

Analysis/

CostModel/

ARM/

intrinsic-cost-kinds.ll

4 lines

select.ll

14 lines

Diff 303215

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Show First 20 Lines • Show All 833 Lines • ▼ Show 20 Lines	if (CostKind == TTI::TCK_CodeSize && ISD == ISD::SELECT &&
// i1 values may need rematerialising by using mov immediates and/or		// i1 values may need rematerialising by using mov immediates and/or
// flag setting instructions.		// flag setting instructions.
if (ValTy->isIntegerTy(1))		if (ValTy->isIntegerTy(1))
++Cost;		++Cost;

return Cost;		return Cost;
}		}

if (CostKind != TTI::TCK_RecipThroughput)
return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind,
I);

// On NEON a vector select gets lowered to vbsl.		// On NEON a vector select gets lowered to vbsl.
if (ST->hasNEON() && ValTy->isVectorTy() && ISD == ISD::SELECT) {		if (ST->hasNEON() && ValTy->isVectorTy() && ISD == ISD::SELECT && CondTy) {
// Lowering of some vector selects is currently far from perfect.		// Lowering of some vector selects is currently far from perfect.
static const TypeConversionCostTblEntry NEONVectorSelectTbl[] = {		static const TypeConversionCostTblEntry NEONVectorSelectTbl[] = {
{ ISD::SELECT, MVT::v4i1, MVT::v4i64, 44 + 12 + 1 },		{ ISD::SELECT, MVT::v4i1, MVT::v4i64, 44 + 12 + 1 },
{ ISD::SELECT, MVT::v8i1, MVT::v8i64, 50 },		{ ISD::SELECT, MVT::v8i1, MVT::v8i64, 50 },
{ ISD::SELECT, MVT::v16i1, MVT::v16i64, 100 }		{ ISD::SELECT, MVT::v16i1, MVT::v16i64, 100 }
};		};

EVT SelCondTy = TLI->getValueType(DL, CondTy);		EVT SelCondTy = TLI->getValueType(DL, CondTy);
EVT SelValTy = TLI->getValueType(DL, ValTy);		EVT SelValTy = TLI->getValueType(DL, ValTy);
if (SelCondTy.isSimple() && SelValTy.isSimple()) {		if (SelCondTy.isSimple() && SelValTy.isSimple()) {
if (const auto *Entry = ConvertCostTableLookup(NEONVectorSelectTbl, ISD,		if (const auto *Entry = ConvertCostTableLookup(NEONVectorSelectTbl, ISD,
SelCondTy.getSimpleVT(),		SelCondTy.getSimpleVT(),
SelValTy.getSimpleVT()))		SelValTy.getSimpleVT()))
return Entry->Cost;		return Entry->Cost;
}		}

std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
return LT.first;		return LT.first;
}		}

int BaseCost = ST->hasMVEIntegerOps() && ValTy->isVectorTy()		// Default to cheap (throughput/size of 1 instruction) but adjust throughput
? ST->getMVEVectorCostFactor()		// for "multiple beats" potentially needed by MVE instructions.
: 1;		int BaseCost = 1;
		if (CostKind != TTI::TCK_CodeSize && ST->hasMVEIntegerOps() &&
		ValTy->isVectorTy())
		BaseCost = ST->getMVEVectorCostFactor();

return BaseCost *		return BaseCost *
BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);		BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);
}		}

int ARMTTIImpl::getAddressComputationCost(Type Ty, ScalarEvolution SE,		int ARMTTIImpl::getAddressComputationCost(Type Ty, ScalarEvolution SE,
const SCEV *Ptr) {		const SCEV *Ptr) {
// Address computations in vectorized code with non-consecutive addresses will		// Address computations in vectorized code with non-consecutive addresses will
// likely result in more instructions compared to scalar code where the		// likely result in more instructions compared to scalar code where the
▲ Show 20 Lines • Show All 1,141 Lines • Show Last 20 Lines

llvm/test/Analysis/CostModel/ARM/intrinsic-cost-kinds.ll

	Show First 20 Lines • Show All 147 Lines • ▼ Show 20 Lines
	;			;
	; SIZE-LABEL: 'fshl'			; SIZE-LABEL: 'fshl'
	; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %s = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)			; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %s = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 546 for instruction: %v = call <16 x i32> @llvm.fshl.v16i32(<16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc)			; SIZE-NEXT: Cost Model: Found an estimated cost of 546 for instruction: %v = call <16 x i32> @llvm.fshl.v16i32(<16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE_LATE-LABEL: 'fshl'			; SIZE_LATE-LABEL: 'fshl'
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %s = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %s = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 562 for instruction: %v = call <16 x i32> @llvm.fshl.v16i32(<16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 564 for instruction: %v = call <16 x i32> @llvm.fshl.v16i32(<16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc)
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	%s = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)			%s = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)
	%v = call <16 x i32> @llvm.fshl.v16i32(<16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc)			%v = call <16 x i32> @llvm.fshl.v16i32(<16 x i32> %va, <16 x i32> %vb, <16 x i32> %vc)
	ret void			ret void
	}			}

	define void @maskedgather(<16 x float*> %va, <16 x i1> %vb, <16 x float> %vc) {			define void @maskedgather(<16 x float*> %va, <16 x i1> %vb, <16 x float> %vc) {
	▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	; LATE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; LATE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE-LABEL: 'reduce_fmax'			; SIZE-LABEL: 'reduce_fmax'
	; SIZE-NEXT: Cost Model: Found an estimated cost of 620 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; SIZE-NEXT: Cost Model: Found an estimated cost of 620 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; SIZE_LATE-LABEL: 'reduce_fmax'			; SIZE_LATE-LABEL: 'reduce_fmax'
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 620 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 628 for instruction: %v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
				samparkerUnsubmitted Not Done Reply Inline Actions I know this is a tiny change, but it's drawn my attention because it's so high the number is so high. Does this look right to you @dmgreen ? I would have thought we were able to break this up more efficiently with our native support, or is this because we'd have to copy the GPR registers into an FPRs to perform some final scalar maxs..? samparker: I know this is a tiny change, but it's drawn my attention because it's so high the number is so…
				spatelAuthorUnsubmitted Done Reply Inline Actions Stepping through this, the cost is derived from the BasicTTIImpl calling back to the target for shuffle+cmp+sel: // Assume the pairwise shuffles add a cost. ShuffleCost += (IsPairwise + 1) * thisT()->getShuffleCost(TTI::SK_ExtractSubvector, Ty, NumVecElts, SubTy); MinMaxCost += thisT()->getCmpSelInstrCost(CmpOpcode, SubTy, CondTy, CmpInst::BAD_ICMP_PREDICATE, CostKind) + thisT()->getCmpSelInstrCost(Instruction::Select, SubTy, CondTy, CmpInst::BAD_ICMP_PREDICATE, CostKind); And this progresses for v16f32 as: 384 -> 480 -> 608 for the shuffles and 8 -> 12 -> 20 for the cmp/sel, so 608 + 20 = 628. The shuffle cost seems to be expanded as a series of insert/extract based on the number of elements in the vector, so it's exploding. spatel: Stepping through this, the cost is derived from the BasicTTIImpl calling back to the target for…
				dmgreenUnsubmitted Not Done Reply Inline Actions Our scalarization overhead is set very high, super high to protect us against regressions. I think it's actually O(n^2). It ends up meaning that wide vectors get very high costs, and was useful early on when we were getting some pretty ropy codegen. We have plans to set it to something more sensible but every time I try it still gives regressions somewhere. We only have 8 registers so sometimes these high costs are useful, especially in the vectorizer. It sounds like we could do with some better SK_ExtractSubvector costs for extracting simple legal vector, although it may be better to just have custom reduction costs in this specific case, as I don't think that code will model how we actually lower reductions. I will try and look into that at some point, but for the moment they are very deliberately very wrong :) dmgreen: Our scalarization overhead is set very high, super high to protect us against regressions. I…
	; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; SIZE_LATE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	%v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)			%v = call float @llvm.vector.reduce.fmax.v16f32(<16 x float> %va)
	ret void			ret void
	}			}

	define void @memcpy(i8* %a, i8* %b, i32 %c) {			define void @memcpy(i8* %a, i8* %b, i32 %c) {
	; THRU-LABEL: 'memcpy'			; THRU-LABEL: 'memcpy'
	Show All 18 Lines

llvm/test/Analysis/CostModel/ARM/select.ll

	Show First 20 Lines • Show All 179 Lines • ▼ Show 20 Lines
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v6 = select i1 undef, double undef, double undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v6 = select i1 undef, double undef, double undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = select <2 x i1> undef, <2 x i8> undef, <2 x i8> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v7 = select <2 x i1> undef, <2 x i8> undef, <2 x i8> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = select <4 x i1> undef, <4 x i8> undef, <4 x i8> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v8 = select <4 x i1> undef, <4 x i8> undef, <4 x i8> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = select <8 x i1> undef, <8 x i8> undef, <8 x i8> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v9 = select <8 x i1> undef, <8 x i8> undef, <8 x i8> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = select <16 x i1> undef, <16 x i8> undef, <16 x i8> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v10 = select <16 x i1> undef, <16 x i8> undef, <16 x i8> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = select <2 x i1> undef, <2 x i16> undef, <2 x i16> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v11 = select <2 x i1> undef, <2 x i16> undef, <2 x i16> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v12 = select <4 x i1> undef, <4 x i16> undef, <4 x i16> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v12 = select <4 x i1> undef, <4 x i16> undef, <4 x i16> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v13 = select <8 x i1> undef, <8 x i16> undef, <8 x i16> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v13 = select <8 x i1> undef, <8 x i16> undef, <8 x i16> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v13b = select <16 x i1> undef, <16 x i16> undef, <16 x i16> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v13b = select <16 x i1> undef, <16 x i16> undef, <16 x i16> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v14 = select <2 x i1> undef, <2 x i32> undef, <2 x i32> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v14 = select <2 x i1> undef, <2 x i32> undef, <2 x i32> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v15 = select <4 x i1> undef, <4 x i32> undef, <4 x i32> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v15 = select <4 x i1> undef, <4 x i32> undef, <4 x i32> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v15b = select <8 x i1> undef, <8 x i32> undef, <8 x i32> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v15b = select <8 x i1> undef, <8 x i32> undef, <8 x i32> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v15c = select <16 x i1> undef, <16 x i32> undef, <16 x i32> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v15c = select <16 x i1> undef, <16 x i32> undef, <16 x i32> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16 = select <2 x i1> undef, <2 x i64> undef, <2 x i64> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16 = select <2 x i1> undef, <2 x i64> undef, <2 x i64> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16a = select <4 x i1> undef, <4 x i64> undef, <4 x i64> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 19 for instruction: %v16a = select <4 x i1> undef, <4 x i64> undef, <4 x i64> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16b = select <8 x i1> undef, <8 x i64> undef, <8 x i64> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %v16b = select <8 x i1> undef, <8 x i64> undef, <8 x i64> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v16c = select <16 x i1> undef, <16 x i64> undef, <16 x i64> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 100 for instruction: %v16c = select <16 x i1> undef, <16 x i64> undef, <16 x i64> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v17 = select <2 x i1> undef, <2 x float> undef, <2 x float> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v17 = select <2 x i1> undef, <2 x float> undef, <2 x float> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v18 = select <4 x i1> undef, <4 x float> undef, <4 x float> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v18 = select <4 x i1> undef, <4 x float> undef, <4 x float> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v19 = select <2 x i1> undef, <2 x double> undef, <2 x double> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v19 = select <2 x i1> undef, <2 x double> undef, <2 x double> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v20 = select <1 x i1> undef, <1 x i32> undef, <1 x i32> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v20 = select <1 x i1> undef, <1 x i32> undef, <1 x i32> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21 = select <3 x i1> undef, <3 x float> undef, <3 x float> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v21 = select <3 x i1> undef, <3 x float> undef, <3 x float> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v22 = select <5 x i1> undef, <5 x double> undef, <5 x double> undef			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v22 = select <5 x i1> undef, <5 x double> undef, <5 x double> undef
	; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void			; CHECK-NEON-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
	;			;
	; CHECK-THUMB1-SIZE-LABEL: 'selects'			; CHECK-THUMB1-SIZE-LABEL: 'selects'
	; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v0 = select i1 undef, i1 undef, i1 undef			; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v0 = select i1 undef, i1 undef, i1 undef
	; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = select i1 undef, i8 undef, i8 undef			; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v1 = select i1 undef, i8 undef, i8 undef
	; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = select i1 undef, i16 undef, i16 undef			; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = select i1 undef, i16 undef, i16 undef
	; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3 = select i1 undef, i32 undef, i32 undef			; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3 = select i1 undef, i32 undef, i32 undef
	; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4 = select i1 undef, i64 undef, i64 undef			; CHECK-THUMB1-SIZE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v4 = select i1 undef, i64 undef, i64 undef
	▲ Show 20 Lines • Show All 99 Lines • Show Last 20 Lines