This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add support for scalable vectorization of loops with selects and cmps
ClosedPublic

Authored by david-arm on Jan 20 2021, 5:54 AM.

Details

Reviewers

sdesmalen
CarolineConcatto
kmclaughlin
c-rhodes
efriedma
dmgreen

Commits

rG2e080eb00ad7: [SVE] Add support for scalable vectorization of loops with selects and cmps

Summary

I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost
that prevented a cost being calculated for select instructions when using
scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost
to only do special cost calculations for fixed width vectors and fall
back to the base version for scalable vectors.
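
As a rough illustration of the dispatch described above (not the LLVM code itself), the following self-contained C++ sketch models the two decisions: only fixed-width vector selects are eligible for the AArch64-specific costing, and everything else, including scalable vectors, gets the base cost of 1 * legalization cost. `VecTy`, `legalizationParts`, `usesFixedWidthSelectPath` and `baseCmpSelCost` are hypothetical names, and the 128-bit data granule and 16-lane predicate limit are simplifying assumptions chosen to match the scalable types exercised in sve-cmpsel.ll.

```cpp
// Hypothetical stand-in for an LLVM vector type; this is NOT the real LLVM API.
struct VecTy {
  unsigned MinElts; // minimum element count (scaled by vscale when Scalable)
  unsigned EltBits; // element width in bits; 1 denotes a predicate (i1) vector
  bool Scalable;    // true for <vscale x N x T>, false for <N x T>
};

// Assumed legalization model: the number of SVE registers the type is split
// into. Data vectors are packed into 128-bit granules; predicate vectors are
// assumed to hold at most 16 lanes each.
constexpr unsigned legalizationParts(VecTy Ty) {
  return Ty.EltBits == 1 ? (Ty.MinElts + 15) / 16
                         : (Ty.MinElts * Ty.EltBits + 127) / 128;
}

// Models the new guard: only fixed-width vector selects are eligible for the
// target-specific costing; scalable vectors skip it entirely.
constexpr bool usesFixedWidthSelectPath(VecTy Ty, bool IsSelect) {
  return IsSelect && !Ty.Scalable;
}

// Models the fallback: a cost of 1 per legalized part.
constexpr unsigned baseCmpSelCost(VecTy Ty) {
  return 1 * legalizationParts(Ty);
}
```

Under these assumptions, `baseCmpSelCost(VecTy{4, 64, true})` evaluates to 2, which is the cost the test expects for a compare on `<vscale x 4 x i64>`, while `VecTy{4, 32, true}` gives 1.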

I have added a simple cost model test for cmps and selects:

test/Analysis/CostModel/sve-cmpsel.ll

and some simple tests that show we vectorize loops with cmp and select:

test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
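
Read back into source form, the two loops in that test correspond roughly to the C++ below. This is a reconstruction from the IR in the test file, not the source that was used to generate it: the function names are borrowed from the IR, and `__restrict` (a compiler extension in GCC and Clang) stands in for the `noalias` parameter attributes.

```cpp
#include <cstdint>

// Source-level reading of @cmpsel_i32: store 2 where b[i] is zero, else 10.
void cmpsel_i32(int32_t *__restrict a, const int32_t *__restrict b, int64_t n) {
  for (int64_t i = 0; i < n; ++i)
    a[i] = (b[i] == 0) ? 2 : 10;
}

// Source-level reading of @cmpsel_f32: store 10.0 where b[i] > 3.0, else 2.0.
// The comparison is false for NaN inputs, which matches "fcmp ogt".
void cmpsel_f32(float *__restrict a, const float *__restrict b, int64_t n) {
  for (int64_t i = 0; i < n; ++i)
    a[i] = (b[i] > 3.0f) ? 10.0f : 2.0f;
}
```

Each loop body is a load, a compare, a select on the compare result, and a store, so the condition is not loop-invariant and the vectorizer has to cost a vector compare and a vector select.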

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Jan 20 2021, 5:54 AM

Herald added a reviewer: efriedma. Jan 20 2021, 5:54 AM

Herald added subscribers: NickHung, psnobl, hiraditya and 2 others.

david-arm requested review of this revision. Jan 20 2021, 5:54 AM

Herald added a project: Restricted Project. Jan 20 2021, 5:54 AM

Herald added a subscriber: llvm-commits.

david-arm added inline comments. Jan 20 2021, 5:56 AM

llvm/test/Analysis/CostModel/sve-cmpsel.ll
10	Here I've deliberately tried to avoid an explosion of test cases, so for the legal and illegal variants I used different element types.

dmgreen added a subscriber: dmgreen. Jan 20 2021, 6:17 AM

dmgreen added inline comments.

llvm/test/Analysis/CostModel/sve-cmpsel.ll
10	Cost model checks don't really need valid inputs; they can just use undef, which makes adding a lot of them much simpler. See something like llvm/test/Analysis/CostModel/ARM/reduce-smax.ll or llvm/test/Analysis/CostModel/X86/arith-fma.ll. Even though they cover many (sub-)architectures, they are still manageable.

Harbormaster completed remote builds in B85876: Diff 317853. Jan 20 2021, 7:07 AM

david-arm added inline comments. Jan 20 2021, 8:53 AM

llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll
13	Apologies, just realised these names are terrible! I'll fix them in another patch. I also wonder if autogenerating the CHECK lines really makes sense here.

Added more cost model test cases and optimised existing ones.
Renamed vectorisation tests to something more useful. :) I also reduced the number of CHECK lines as it looked too messy and fragile.

david-arm marked an inline comment as done. Jan 21 2021, 4:07 AM

david-arm added inline comments.

llvm/test/Analysis/CostModel/sve-cmpsel.ll
10	Thanks for the suggestion @dmgreen!

The code changes look simple enough, LGTM.

Up to you if you want to try and reduce the test boilerplate too.

llvm/test/Analysis/CostModel/sve-cmpsel.ll
10	I would combine all these into fewer functions to reduce all this boilerplate. Maybe one function for the cmps and one for the sels? Or maybe split them out more into the legal/illegal types, like you have commented. They don't need to actually use the output of the instruction. It makes adding lots of tests much easier to check, but it's up to you.

This revision is now accepted and ready to land. Jan 21 2021, 6:40 AM

Closed by commit rG2e080eb00ad7: [SVE] Add support for scalable vectorization of loops with selects and cmps (authored by david-arm). Jan 22 2021, 1:48 AM

This revision was automatically updated to reflect the committed changes.

david-arm marked an inline comment as done.

david-arm added a commit: rG2e080eb00ad7: [SVE] Add support for scalable vectorization of loops with selects and cmps.

david-arm mentioned this in D95598: [AArch64][SVE]Add cost model for broadcast shuffle. Jan 28 2021, 5:41 AM

Revision Contents

Path                                                           Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp         4 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                4 lines
llvm/test/Analysis/CostModel/sve-cmpsel.ll                     146 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll    78 lines

Diff 318461

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[701 lines not shown] int AArch64TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
   // TODO: Handle other cost kinds.
   if (CostKind != TTI::TCK_RecipThroughput)
     return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind,
                                      I);

   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   // We don't lower some vector selects well that are wider than the register
   // width.
-  if (ValTy->isVectorTy() && ISD == ISD::SELECT) {
+  if (isa<FixedVectorType>(ValTy) && ISD == ISD::SELECT) {
     // We would need this many instructions to hide the scalarization happening.
     const int AmortizationCost = 20;

     // If VecPred is not set, check if we can get a predicate from the context
     // instruction, if its type matches the requested ValTy.
     if (VecPred == CmpInst::BAD_ICMP_PREDICATE && I && I->getType() == ValTy) {
       CmpInst::Predicate CurrentPred;
       if (match(I, m_Select(m_Cmp(CurrentPred, m_Value(), m_Value()), m_Value(),
[25 lines not shown]
     EVT SelValTy = TLI->getValueType(DL, ValTy);
     if (SelCondTy.isSimple() && SelValTy.isSimple()) {
       if (const auto *Entry = ConvertCostTableLookup(VectorSelectTbl, ISD,
                                                      SelCondTy.getSimpleVT(),
                                                      SelValTy.getSimpleVT()))
         return Entry->Cost;
     }
   }
+  // The base case handles scalable vectors fine for now, since it treats the
+  // cost as 1 * legalization cost.
   return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);
 }

 AArch64TTIImpl::TTI::MemCmpExpansionOptions
 AArch64TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
   TTI::MemCmpExpansionOptions Options;
   if (ST->requiresStrictAlign()) {
     // TODO: Add cost modeling for strict align. Misaligned loads expand to
[482 lines not shown]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[7,328 lines not shown] return N * TTI.getArithmeticInstrCost(
                    TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
                    I->getOperand(0), I);
   }
   case Instruction::Select: {
     SelectInst *SI = cast<SelectInst>(I);
     const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
     bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));
     Type *CondTy = SI->getCondition()->getType();
-    if (!ScalarCond) {
-      assert(!VF.isScalable() && "VF is assumed to be non scalable.");
+    if (!ScalarCond)
       CondTy = VectorType::get(CondTy, VF);
-    }
     return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,
                                   CmpInst::BAD_ICMP_PREDICATE, CostKind, I);
   }
   case Instruction::ICmp:
   case Instruction::FCmp: {
     Type *ValTy = I->getOperand(0)->getType();
     Instruction *Op0AsInstruction = dyn_cast<Instruction>(I->getOperand(0));
     if (canTruncateToMinimalBitwidth(Op0AsInstruction, VF))
[2,376 lines not shown]

llvm/test/Analysis/CostModel/sve-cmpsel.ll

This file was added.

				; RUN: opt -cost-model -analyze -mtriple=aarch64--linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s

				; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				; Check icmp for legal integer vectors.
				define void @cmp_legal_int() {
				; CHECK-LABEL: 'cmp_legal_int'
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = icmp ne <vscale x 2 x i64> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = icmp ne <vscale x 4 x i32> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = icmp ne <vscale x 8 x i16> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = icmp ne <vscale x 16 x i8> undef, undef
				%1 = icmp ne <vscale x 2 x i64> undef, undef
				%2 = icmp ne <vscale x 4 x i32> undef, undef
				%3 = icmp ne <vscale x 8 x i16> undef, undef
				%4 = icmp ne <vscale x 16 x i8> undef, undef
				ret void
				}

				; Check icmp for an illegal integer vector.
				define <vscale x 4 x i1> @cmp_nxv4i64() {
				; CHECK-LABEL: 'cmp_nxv4i64'
				; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = icmp ne <vscale x 4 x i64> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 4 x i1> %res
				%res = icmp ne <vscale x 4 x i64> undef, undef
				ret <vscale x 4 x i1> %res
				}

				; Check icmp for legal predicate vectors.
				define void @cmp_legal_pred() {
				; CHECK-LABEL: 'cmp_legal_pred'
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = icmp ne <vscale x 2 x i1> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = icmp ne <vscale x 4 x i1> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = icmp ne <vscale x 8 x i1> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = icmp ne <vscale x 16 x i1> undef, undef
				%1 = icmp ne <vscale x 2 x i1> undef, undef
				%2 = icmp ne <vscale x 4 x i1> undef, undef
				%3 = icmp ne <vscale x 8 x i1> undef, undef
				%4 = icmp ne <vscale x 16 x i1> undef, undef
				ret void
				}

				; Check icmp for an illegal predicate vector.
				define <vscale x 32 x i1> @cmp_nxv32i1() {
				; CHECK-LABEL: 'cmp_nxv32i1'
				; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = icmp ne <vscale x 32 x i1> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 32 x i1> %res
				%res = icmp ne <vscale x 32 x i1> undef, undef
				ret <vscale x 32 x i1> %res
				}

				; Check fcmp for legal FP vectors
				define void @cmp_legal_fp() #0 {
				; CHECK-LABEL: 'cmp_legal_fp'
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = fcmp oge <vscale x 2 x double> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = fcmp oge <vscale x 4 x float> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = fcmp oge <vscale x 8 x half> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
				%1 = fcmp oge <vscale x 2 x double> undef, undef
				%2 = fcmp oge <vscale x 4 x float> undef, undef
				%3 = fcmp oge <vscale x 8 x half> undef, undef
				%4 = fcmp oge <vscale x 8 x bfloat> undef, undef
				ret void
				}

				; Check fcmp for an illegal FP vector
				define <vscale x 16 x i1> @cmp_nxv16f16() {
				; CHECK-LABEL: 'cmp_nxv16f16'
				; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = fcmp oge <vscale x 16 x half> undef, undef
				; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 16 x i1> %res
				%res = fcmp oge <vscale x 16 x half> undef, undef
				ret <vscale x 16 x i1> %res
				}

				; Check select for legal integer vectors
				define void @sel_legal_int() {
				; CHECK-LABEL: 'sel_legal_int'
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = select <vscale x 2 x i1> undef, <vscale x 2 x i64> undef, <vscale x 2 x i64> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = select <vscale x 4 x i1> undef, <vscale x 4 x i32> undef, <vscale x 4 x i32> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = select <vscale x 8 x i1> undef, <vscale x 8 x i16> undef, <vscale x 8 x i16> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = select <vscale x 16 x i1> undef, <vscale x 16 x i8> undef, <vscale x 16 x i8> undef
				%1 = select <vscale x 2 x i1> undef, <vscale x 2 x i64> undef, <vscale x 2 x i64> undef
				%2 = select <vscale x 4 x i1> undef, <vscale x 4 x i32> undef, <vscale x 4 x i32> undef
				%3 = select <vscale x 8 x i1> undef, <vscale x 8 x i16> undef, <vscale x 8 x i16> undef
				%4 = select <vscale x 16 x i1> undef, <vscale x 16 x i8> undef, <vscale x 16 x i8> undef
				ret void
				}

				; Check select for an illegal integer vector
				define <vscale x 16 x i16> @sel_nxv16i16() {
				; CHECK-LABEL: 'sel_nxv16i16'
				; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = select <vscale x 16 x i1> undef, <vscale x 16 x i16> undef, <vscale x 16 x i16> undef
				; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 16 x i16> %res
				%res = select <vscale x 16 x i1> undef, <vscale x 16 x i16> undef, <vscale x 16 x i16> undef
				ret <vscale x 16 x i16> %res
				}

				; Check select for a legal FP vector
				define void @sel_legal_fp() #0 {
				; CHECK-LABEL: 'sel_legal_fp'
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = select <vscale x 2 x i1> undef, <vscale x 2 x double> undef, <vscale x 2 x double> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = select <vscale x 4 x i1> undef, <vscale x 4 x float> undef, <vscale x 4 x float> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = select <vscale x 8 x i1> undef, <vscale x 8 x half> undef, <vscale x 8 x half> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = select <vscale x 8 x i1> undef, <vscale x 8 x bfloat> undef, <vscale x 8 x bfloat> undef
				%1 = select <vscale x 2 x i1> undef, <vscale x 2 x double> undef, <vscale x 2 x double> undef
				%2 = select <vscale x 4 x i1> undef, <vscale x 4 x float> undef, <vscale x 4 x float> undef
				%3 = select <vscale x 8 x i1> undef, <vscale x 8 x half> undef, <vscale x 8 x half> undef
				%4 = select <vscale x 8 x i1> undef, <vscale x 8 x bfloat> undef, <vscale x 8 x bfloat> undef
				ret void
				}

				; Check select for an illegal FP vector
				define <vscale x 8 x float> @sel_nxv8f32() {
				; CHECK-LABEL: 'sel_nxv8f32'
				; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = select <vscale x 8 x i1> undef, <vscale x 8 x float> undef, <vscale x 8 x float> undef
				; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 8 x float> %res
				%res = select <vscale x 8 x i1> undef, <vscale x 8 x float> undef, <vscale x 8 x float> undef
				ret <vscale x 8 x float> %res
				}

				; Check select for a legal predicate vector
				define void @sel_legal_pred() {
				; CHECK-LABEL: 'sel_legal_pred'
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %1 = select <vscale x 2 x i1> undef, <vscale x 2 x i1> undef, <vscale x 2 x i1> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %2 = select <vscale x 4 x i1> undef, <vscale x 4 x i1> undef, <vscale x 4 x i1> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %3 = select <vscale x 8 x i1> undef, <vscale x 8 x i1> undef, <vscale x 8 x i1> undef
				; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %4 = select <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i1> undef
				%1 = select <vscale x 2 x i1> undef, <vscale x 2 x i1> undef, <vscale x 2 x i1> undef
				%2 = select <vscale x 4 x i1> undef, <vscale x 4 x i1> undef, <vscale x 4 x i1> undef
				%3 = select <vscale x 8 x i1> undef, <vscale x 8 x i1> undef, <vscale x 8 x i1> undef
				%4 = select <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i1> undef
				ret void
				}

				; Check select for an illegal predicate vector
				define <vscale x 32 x i1> @sel_nxv32i1() {
				; CHECK-LABEL: 'sel_nxv32i1'
				; CHECK: Cost Model: Found an estimated cost of 2 for instruction: %res = select <vscale x 32 x i1> undef, <vscale x 32 x i1> undef, <vscale x 32 x i1> undef
				; CHECK: Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 32 x i1> %res
				%res = select <vscale x 32 x i1> undef, <vscale x 32 x i1> undef, <vscale x 32 x i1> undef
				ret <vscale x 32 x i1> %res
				}

				attributes #0 = { "target-features"="+sve,+bf16" }

llvm/test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

This file was added.

				; RUN: opt -loop-vectorize -dce -instcombine -mtriple aarch64-linux-gnu -mattr=+sve < %s -S 2>%t | FileCheck %s

				; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-unknown-linux-gnu"

				define void @cmpsel_i32(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @cmpsel_i32(
				; CHECK-NEXT: entry:
				; CHECK: vector.body:
				; CHECK: [[WIDE_LOAD:%.*]] = load <vscale x 4 x i32>, <vscale x 4 x i32>* {{.*}}, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = icmp eq <vscale x 4 x i32> [[WIDE_LOAD]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 0, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP2:%.*]] = select <vscale x 4 x i1> [[TMP1]], <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 2, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 10, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK: store <vscale x 4 x i32> [[TMP2]], <vscale x 4 x i32>* {{.*}}, align 4
				;
				entry:
				%cmp7 = icmp sgt i64 %n, 0
				br i1 %cmp7, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%tobool.not = icmp eq i32 %0, 0
				%cond = select i1 %tobool.not, i32 2, i32 10
				%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				store i32 %cond, i32* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %for.end.loopexit, label %for.body, !llvm.loop !0

				for.end.loopexit: ; preds = %for.body
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				ret void
				}

				define void @cmpsel_f32(float* noalias nocapture %a, float* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @cmpsel_f32(
				; CHECK-NEXT: entry:
				; CHECK: vector.body:
				; CHECK: [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, <vscale x 4 x float>* {{.*}}, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = fcmp ogt <vscale x 4 x float> [[WIDE_LOAD]], shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 3.000000e+00, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK-NEXT: [[TMP2:%.*]] = select <vscale x 4 x i1> [[TMP1]], <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 1.000000e+01, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 2.000000e+00, i32 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer)
				; CHECK: store <vscale x 4 x float> [[TMP2]], <vscale x 4 x float>* {{.*}}, align 4

				entry:
				%cmp8 = icmp sgt i64 %n, 0
				br i1 %cmp8, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%cmp1 = fcmp ogt float %0, 3.000000e+00
				%conv = select i1 %cmp1, float 1.000000e+01, float 2.000000e+00
				%arrayidx3 = getelementptr inbounds float, float* %a, i64 %indvars.iv
				store float %conv, float* %arrayidx3, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !6

				for.end: ; preds = %for.body, %entry
				ret void
				}

				!0 = distinct !{!0, !1, !2, !3, !4, !5}
				!1 = !{!"llvm.loop.mustprogress"}
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!4 = !{!"llvm.loop.interleave.count", i32 1}
				!5 = !{!"llvm.loop.vectorize.enable", i1 true}
				!6 = distinct !{!6, !1, !2, !3, !4, !5}