This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Switch to using get.active.lane.mask when tail folding
Closed · Public

Authored by reames on Jul 6 2022, 12:34 PM.

Details

Reviewers

craig.topper
frasercrmck

Commits

rGb12930e1338b: [RISCV] Switch to using get.active.lane.mask when tail folding

Summary

The motivation here is to a) bring us closer into alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change.

The immediate impact is a significant reduction in the generated IR but only a minimal change in the generated assembly. Because the two forms are expanded differently, the new code produces a saturating add where the old code produced a non-saturating one, but that's about it. The difference comes down to how overflow is handled, and overflow doesn't seem to be possible here anyway, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.
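
For readers unfamiliar with the intrinsic, here is a minimal C++ sketch of the two mask computations being compared. It is an illustration, not code from this patch: the function names `activeLaneMask` and `oldCompareMask` are hypothetical, a fixed lane count `vf` stands in for the scalable vector length, and the saturating add models the expansion difference mentioned in the summary rather than any particular backend's output.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Model of llvm.get.active.lane.mask(base, n): lane i is active iff
// base + i < n, with the addition treated as if it cannot wrap.
// Here that is modeled with a saturating add (an assumption for illustration).
std::vector<bool> activeLaneMask(uint64_t base, uint64_t n, unsigned vf) {
  const uint64_t top = std::numeric_limits<uint64_t>::max();
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i) {
    uint64_t lane = base + i;
    if (lane < base) // the add wrapped, so clamp to the maximum value
      lane = top;
    mask[i] = lane < n;
  }
  return mask;
}

// Model of the old form: icmp ule (splat(base) + stepvector), splat(n - 1),
// using an ordinary wrapping add. Assumes n >= 1.
std::vector<bool> oldCompareMask(uint64_t base, uint64_t n, unsigned vf) {
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i)
    mask[i] = (base + i) <= (n - 1);
  return mask;
}
```

With base 0, trip count 5, and 8 lanes, both functions yield 11111000. They only disagree when base + lane would wrap past the top of the 64-bit range, which is the overflow case the summary considers unreachable here.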

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jul 6 2022, 12:34 PM

Herald added a project: Restricted Project. Jul 6 2022, 12:34 PM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 32 others.

reames requested review of this revision. Jul 6 2022, 12:34 PM

Herald added a project: Restricted Project. Jul 6 2022, 12:34 PM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

I don't see a vector IV being killed off in any of the changed tests. Am I missing something?

Harbormaster completed remote builds in B173966: Diff 442664. Jul 6 2022, 2:11 PM

In D129221#3633717, @craig.topper wrote:

I don't see a vector IV being killed off in any of the changed tests. Am I missing something?

No, you're completely right, I got myself confused. The vectorizer internally represents this with a test on a newly introduced vector IV, but then later generation handling (still in the vectorizer) converts that back into a use of the scalar one + a broadcast. So the actual code generated doesn't involve the vector IV at all.
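
The loop shape reames describes can be sketched in C++ as follows. This is a simplified model under stated assumptions, not vectorizer output: `tailFoldedAdd` is a hypothetical name, `vf` is a fixed lane count standing in for the scalable VF, and the inner loop plays the role of one masked vector operation. Each lane's element number is the scalar index plus that lane's step value, so no separate vector induction variable is carried across iterations.

```cpp
#include <cstdint>
#include <vector>

// Adds v to every element of a using a tail-folded loop with vf lanes.
// Returns the number of vector iterations executed. Requires vf > 0.
unsigned tailFoldedAdd(std::vector<int64_t> &a, int64_t v, unsigned vf) {
  const uint64_t n = a.size();
  // Round the trip count up to a multiple of vf, mirroring the
  // N_RND_UP / N_MOD_VF / N_VEC computation in the test checks.
  const uint64_t nRndUp = n + (vf - 1);
  const uint64_t nVec = nRndUp - (nRndUp % vf);

  unsigned iterations = 0;
  for (uint64_t index = 0; index != nVec; index += vf) {
    for (unsigned lane = 0; lane < vf; ++lane) {
      // Broadcast of the scalar index plus the lane's step value.
      const uint64_t elem = index + lane;
      // Masked load, add, and store: inactive lanes touch no memory.
      if (elem < n)
        a[elem] += v;
    }
    ++iterations;
  }
  return iterations;
}
```

For a trip count of 5 and 8 lanes, the loop runs once with the last three lanes masked off, which matches the low-trip-count test in this revision.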

In D129221#3636023, @reames wrote:

In D129221#3633717, @craig.topper wrote:

I don't see a vector IV being killed off in any of the changed tests. Am I missing something?

No, you're completely right, I got myself confused. The vectorizer internally represents this with a test on a newly introduced vector IV, but then later generation handling (still in the vectorizer) converts that back into a use of the scalar one + a broadcast. So the actual code generated doesn't involve the vector IV at all.

Thanks for clarifying. LGTM

This revision is now accepted and ready to land. Jul 7 2022, 1:51 PM

This revision was landed with ongoing or failed builds. Jul 8 2022, 10:25 AM

Closed by commit rGb12930e1338b: [RISCV] Switch to using get.active.lane.mask when tail folding (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGb12930e1338b: [RISCV] Switch to using get.active.lane.mask when tail folding.

reames mentioned this in D129501: Redefine get.active.lane.mask to allow a more scalar lowering. Jul 12 2022, 10:20 AM

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	1 line
llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll	45 lines
llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll	56 lines

Diff 443282

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

 public:
   InstructionCost getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
                                       const APInt &Imm, Type *Ty,
                                       TTI::TargetCostKind CostKind);

   TargetTransformInfo::PopcntSupportKind getPopcntSupport(unsigned TyWidth);

   bool shouldExpandReduction(const IntrinsicInst *II) const;
   bool supportsScalableVectors() const { return ST->hasVInstructions(); }
+  bool emitGetActiveLaneMask() const { return ST->hasVInstructions(); }
   Optional<unsigned> getMaxVScale() const;
   Optional<unsigned> getVScaleForTuning() const;

   TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const;

   unsigned getRegUsageForType(Type *Ty);

   InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,

llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll

 ; CHECK-NEXT: [[TMP7:%.*]] = sub i64 [[TMP6]], 1
 ; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 5, [[TMP7]]
 ; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP4]]
 ; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 8 x i64> poison, i64 [[INDEX]], i32 0
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 8 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP9:%.*]] = call <vscale x 8 x i64> @llvm.experimental.stepvector.nxv8i64()
-; CHECK-NEXT: [[TMP10:%.*]] = add <vscale x 8 x i64> zeroinitializer, [[TMP9]]
-; CHECK-NEXT: [[VEC_IV:%.*]] = add <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
-; CHECK-NEXT: [[TMP11:%.*]] = icmp ule <vscale x 8 x i64> [[VEC_IV]], shufflevector (<vscale x 8 x i64> insertelement (<vscale x 8 x i64> poison, i64 4, i32 0), <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer)
-; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP8]]
-; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i8, i8* [[TMP12]], i32 0
-; CHECK-NEXT: [[TMP14:%.*]] = bitcast i8* [[TMP13]] to <vscale x 8 x i8>*
-; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0nxv8i8(<vscale x 8 x i8>* [[TMP14]], i32 1, <vscale x 8 x i1> [[TMP11]], <vscale x 8 x i8> poison)
-; CHECK-NEXT: [[TMP15:%.*]] = shl <vscale x 8 x i8> [[WIDE_MASKED_LOAD]], shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 1, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer)
-; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, i8* [[DST:%.*]], i64 [[TMP8]]
-; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i8, i8* [[TMP16]], i32 0
-; CHECK-NEXT: [[TMP18:%.*]] = bitcast i8* [[TMP17]] to <vscale x 8 x i8>*
-; CHECK-NEXT: [[WIDE_MASKED_LOAD1:%.*]] = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0nxv8i8(<vscale x 8 x i8>* [[TMP18]], i32 1, <vscale x 8 x i1> [[TMP11]], <vscale x 8 x i8> poison)
-; CHECK-NEXT: [[TMP19:%.*]] = add <vscale x 8 x i8> [[TMP15]], [[WIDE_MASKED_LOAD1]]
-; CHECK-NEXT: [[TMP20:%.*]] = bitcast i8* [[TMP17]] to <vscale x 8 x i8>*
-; CHECK-NEXT: call void @llvm.masked.store.nxv8i8.p0nxv8i8(<vscale x 8 x i8> [[TMP19]], <vscale x 8 x i8>* [[TMP20]], i32 1, <vscale x 8 x i1> [[TMP11]])
-; CHECK-NEXT: [[TMP21:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP22:%.*]] = mul i64 [[TMP21]], 8
-; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP22]]
+; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[TMP8]], i64 5)
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, i8* [[SRC:%.*]], i64 [[TMP8]]
+; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, i8* [[TMP9]], i32 0
+; CHECK-NEXT: [[TMP11:%.*]] = bitcast i8* [[TMP10]] to <vscale x 8 x i8>*
+; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0nxv8i8(<vscale x 8 x i8>* [[TMP11]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]], <vscale x 8 x i8> poison)
+; CHECK-NEXT: [[TMP12:%.*]] = shl <vscale x 8 x i8> [[WIDE_MASKED_LOAD]], shufflevector (<vscale x 8 x i8> insertelement (<vscale x 8 x i8> poison, i8 1, i32 0), <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer)
+; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i8, i8* [[DST:%.*]], i64 [[TMP8]]
+; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, i8* [[TMP13]], i32 0
+; CHECK-NEXT: [[TMP15:%.*]] = bitcast i8* [[TMP14]] to <vscale x 8 x i8>*
+; CHECK-NEXT: [[WIDE_MASKED_LOAD1:%.*]] = call <vscale x 8 x i8> @llvm.masked.load.nxv8i8.p0nxv8i8(<vscale x 8 x i8>* [[TMP15]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]], <vscale x 8 x i8> poison)
+; CHECK-NEXT: [[TMP16:%.*]] = add <vscale x 8 x i8> [[TMP12]], [[WIDE_MASKED_LOAD1]]
+; CHECK-NEXT: [[TMP17:%.*]] = bitcast i8* [[TMP14]] to <vscale x 8 x i8>*
+; CHECK-NEXT: call void @llvm.masked.store.nxv8i8.p0nxv8i8(<vscale x 8 x i8> [[TMP16]], <vscale x 8 x i8>* [[TMP17]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP19:%.*]] = mul i64 [[TMP18]], 8
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP19]]
 ; CHECK-NEXT: br i1 true, label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[I_08:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[SRC]], i64 [[I_08]]
-; CHECK-NEXT: [[TMP23:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
-; CHECK-NEXT: [[MUL:%.*]] = shl i8 [[TMP23]], 1
+; CHECK-NEXT: [[TMP20:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
+; CHECK-NEXT: [[MUL:%.*]] = shl i8 [[TMP20]], 1
 ; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[DST]], i64 [[I_08]]
-; CHECK-NEXT: [[TMP24:%.*]] = load i8, i8* [[ARRAYIDX1]], align 1
-; CHECK-NEXT: [[ADD:%.*]] = add i8 [[MUL]], [[TMP24]]
+; CHECK-NEXT: [[TMP21:%.*]] = load i8, i8* [[ARRAYIDX1]], align 1
+; CHECK-NEXT: [[ADD:%.*]] = add i8 [[MUL]], [[TMP21]]
 ; CHECK-NEXT: store i8 [[ADD]], i8* [[ARRAYIDX1]], align 1
 ; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[I_08]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INC]], 5
 ; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
 ; CHECK: for.end:
 ; CHECK-NEXT: ret void
 ;
 entry:

llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll

 ; CHECK-NEXT: br i1 [[TMP1]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
 ; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]
 ; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP2]]
 ; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
-; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[INDEX]], i32 0
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 1 x i64> @llvm.experimental.stepvector.nxv1i64()
-; CHECK-NEXT: [[TMP7:%.*]] = add <vscale x 1 x i64> zeroinitializer, [[TMP6]]
-; CHECK-NEXT: [[VEC_IV:%.*]] = add <vscale x 1 x i64> [[BROADCAST_SPLAT]], [[TMP7]]
-; CHECK-NEXT: [[TMP8:%.*]] = icmp ule <vscale x 1 x i64> [[VEC_IV]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1023, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
-; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP9]], i32 0
-; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 1 x i64> @llvm.masked.load.nxv1i64.p0(ptr [[TMP10]], i32 8, <vscale x 1 x i1> [[TMP8]], <vscale x 1 x i64> poison)
-; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 1 x i64> [[WIDE_MASKED_LOAD]], [[BROADCAST_SPLAT2]]
-; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[TMP11]], ptr [[TMP10]], i32 8, <vscale x 1 x i1> [[TMP8]])
-; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP12]]
-; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
+; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = call <vscale x 1 x i64> @llvm.masked.load.nxv1i64.p0(ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]], <vscale x 1 x i64> poison)
+; CHECK-NEXT: [[TMP8:%.*]] = add <vscale x 1 x i64> [[WIDE_MASKED_LOAD]], [[BROADCAST_SPLAT]]
+; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[TMP8]], ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP9]]
+; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
[104 lines not shown]
 ; CHECK-NEXT: br i1 [[TMP1]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
 ; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]
 ; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP2]]
 ; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
-; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT1]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[INDEX]], i32 0
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 1 x i64> @llvm.experimental.stepvector.nxv1i64()
-; CHECK-NEXT: [[TMP7:%.*]] = add <vscale x 1 x i64> zeroinitializer, [[TMP6]]
-; CHECK-NEXT: [[VEC_IV:%.*]] = add <vscale x 1 x i64> [[BROADCAST_SPLAT]], [[TMP7]]
-; CHECK-NEXT: [[TMP8:%.*]] = icmp ule <vscale x 1 x i64> [[VEC_IV]], shufflevector (<vscale x 1 x i64> insertelement (<vscale x 1 x i64> poison, i64 1023, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer)
-; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
-; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[TMP9]], i32 0
-; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP10]], i32 8, <vscale x 1 x i1> [[TMP8]])
-; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP11]]
-; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
+; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP8]]
+; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK: middle.block:
 ; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
 ; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; CHECK-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK: for.end:
 ; CHECK-NEXT: ret void
 ;
 entry:
 br label %for.body

 for.body:
 %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]