This is an archive of the discontinued LLVM Phabricator instance.

[LV] Recognize store of invariant value to invariant address as uniform
ClosedPublic

Authored by reames on Jul 22 2022, 7:53 AM.


Details

Reviewers

david-arm
fhahn

Commits

rG0b47615fcf0c: [LV] Recognize store of invariant value to invariant address as uniform

Summary

This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result.
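The lane argument can be illustrated with a small stand-alone C++ model. This is only a sketch and not LLVM code: the helper names, the vector width of 4 and the lane values are assumptions made up for the example. It emulates one vector iteration whose lanes all store to the same invariant address, and checks which lane reproduces what the scalar loop leaves in memory.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int VF = 4; // assumed vector width, chosen only for illustration

// Scalar reference: every iteration stores to the same address, so the value
// written by the final iteration is what remains in memory.
int64_t scalarLoop(const std::array<int64_t, VF> &vals) {
  int64_t mem = 0;
  for (int i = 0; i < VF; ++i)
    mem = vals[i];
  return mem;
}

// Hypothetical helper: emulate scalarizing the store by emitting a single
// scalar store that takes its value from one chosen lane.
int64_t storeFromLane(const std::array<int64_t, VF> &vals, int lane) {
  return vals[lane];
}

// True if storing from `lane` leaves memory exactly as the scalar loop would.
bool laneMatchesScalar(const std::array<int64_t, VF> &vals, int lane) {
  return storeFromLane(vals, lane) == scalarLoop(vals);
}
```

With a varying value such as {10, 11, 12, 13}, only lane VF - 1 matches the scalar result; with an invariant value such as {7, 7, 7, 7}, every lane matches, which is the property this patch relies on.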

For context, the basic structure of the existing code and how the change fits in:

1. First, we select a widening strategy. (The result is irrelevant for this patch.)
2. Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.)
3. If it is, we overrule the widening strategy, and unconditionally scalarize.
4. VPReplicationRecipe - which is what actually does the scalarization - knows how to handle uniform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.)

An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions.
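The flow described in the summary can be sketched as a toy decision function. This is a simplified model under stated assumptions, not the cost model's real interface: the `Widening` enum only mirrors the CM_* names that appear in the diff, and `MemOp` and `finalStrategy` are hypothetical names introduced for the example. In particular, the handling of the predicated case is reduced to a single boolean here.

```cpp
#include <cassert>

// Toy stand-ins for the widening decisions named in the diff
// (CM_Widen, CM_Widen_Reverse, CM_Interleave, CM_Scalarize).
enum class Widening { Widen, WidenReverse, Interleave, Scalarize };

// Hypothetical summary of what is known about one memory operation.
struct MemOp {
  bool isUniformPerPart;  // all lanes of one VF-wide part do the same thing
  bool unconditional;     // safe to execute without predication
  Widening costModelPick; // step 1: whatever strategy was selected
};

// Steps 2-3: a uniform, unpredicated memory op overrules the selected
// strategy and is scalarized; anything else keeps the selected strategy.
Widening finalStrategy(const MemOp &op) {
  if (op.isUniformPerPart && op.unconditional)
    return Widening::Scalarize;
  return op.costModelPick;
}
```

For example, a uniform unconditional store whose selected strategy was `Widening::Widen` ends up as `Widening::Scalarize`, while a non-uniform access keeps whatever was selected.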

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jul 22 2022, 7:53 AM

Herald added a project: Restricted Project. Jul 22 2022, 7:53 AM

Herald added subscribers: frasercrmck, luismarques, apazos and 21 others.

reames requested review of this revision. Jul 22 2022, 7:53 AM

Herald added a project: Restricted Project. Jul 22 2022, 7:53 AM

Herald added subscribers: alextsao1999, pcwang-thead, MaskRay.

Harbormaster completed remote builds in B177010: Diff 446832. Jul 22 2022, 8:44 AM

ping

david-arm added inline comments. Jul 27 2022, 5:14 AM

llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll
96 (On Diff #446832): Hi @reames, something doesn't look right about this change because each store instruction is storing out a different value.

reames added inline comments. Jul 27 2022, 7:02 AM

llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll
96 (On Diff #446832): This is correct, but not directly related to the thrust of the patch. This is a side effect of the change in isScalarWithPredication. We'd previously been considering these stores to be predicated. They are unconditional in the original IR, so this should be correct. If you want, I can split the patch further to do a pre-change with just the change in isScalarWithPredication.

fhahn added inline comments. Jul 27 2022, 7:13 AM

llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll
96 (On Diff #446832): Yeah, it would probably be good to split off the change to 'isPredicatedInst', especially if it reduces the test changes per patch.

reames added inline comments. Jul 27 2022, 7:43 AM

llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll
96 (On Diff #446832): Split off as https://reviews.llvm.org/D130637. Will rebase this once that lands. This did turn out to be more test churn than I'd realized. Clearly should have split that from the start. Oh well.

reames added a parent revision: D130637: [LV] Don't predicate uniform mem op stores unneccessarily. Jul 27 2022, 7:43 AM

Rebase over split off and landed change.

reames edited the summary of this revision. Jul 28 2022, 11:37 AM

Harbormaster completed remote builds in B178116: Diff 448397. Jul 28 2022, 1:09 PM

LGTM!

This revision is now accepted and ready to land. Aug 2 2022, 6:22 AM

LGTM, thanks!

This revision was landed with ongoing or failed builds. Aug 2 2022, 8:10 AM

Closed by commit rG0b47615fcf0c: [LV] Recognize store of invariant value to invariant address as uniform (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG0b47615fcf0c: [LV] Recognize store of invariant value to invariant address as uniform.

Revision Contents

Path    Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp    23 lines
llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll    37 lines
llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll    111 lines
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll    12 lines
llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll    1 line

Diff 449296

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


(4,610 lines not shown; context: void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {)

   // Start with the conditional branch. If the branch condition is an
   // instruction contained in the loop that is only used by the branch, it is
   // uniform.
   auto *Cmp = dyn_cast<Instruction>(Latch->getTerminator()->getOperand(0));
   if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
     addToWorklistIfAllowed(Cmp);

+  // Return true if all lanes perform the same memory operation, and we can
+  // thus chose to execute only one.
+  auto isUniformMemOpUse = [&](Instruction *I) {
+    if (!Legal->isUniformMemOp(*I))
+      return false;
+    if (isa<LoadInst>(I))
+      // Loading the same address always produces the same result - at least
+      // assuming aliasing and ordering which have already been checked.
+      return true;
+    // Storing the same value on every iteration.
+    return TheLoop->isLoopInvariant(cast<StoreInst>(I)->getValueOperand());
+  };
+
   auto isUniformDecision = [&](Instruction *I, ElementCount VF) {
     InstWidening WideningDecision = getWideningDecision(I, VF);
     assert(WideningDecision != CM_Unknown &&
            "Widening decision should be ready at this moment");

-    // A uniform memory op is itself uniform. We exclude uniform stores
-    // here as they demand the last lane, not the first one.
-    if (isa<LoadInst>(I) && Legal->isUniformMemOp(*I)) {
-      assert(WideningDecision == CM_Scalarize);
+    if (isUniformMemOpUse(I))
       return true;
-    }

     return (WideningDecision == CM_Widen ||
             WideningDecision == CM_Widen_Reverse ||
             WideningDecision == CM_Interleave);
   };


   // Returns true if Ptr is the pointer operand of a memory access instruction

(37 lines not shown; context: for (auto &I : *BB) {)

         continue;
       }

       // If there's no pointer operand, there's nothing to do.
       auto *Ptr = getLoadStorePointerOperand(&I);
       if (!Ptr)
         continue;

-      // A uniform memory op is itself uniform. We exclude uniform stores
-      // here as they demand the last lane, not the first one.
-      if (isa<LoadInst>(I) && Legal->isUniformMemOp(I))
+      if (isUniformMemOpUse(&I))
         addToWorklistIfAllowed(&I);

       if (isUniformDecision(&I, VF)) {
         assert(isVectorizedMemAccessUse(&I, Ptr) && "consistency check");
         HasUniformUse.insert(Ptr);
       }
     }

(5,866 lines not shown)
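The new predicate can be modelled outside LLVM to make its three cases explicit. The sketch below is a stand-alone approximation, not the code that landed: `ToyMemInst` and its fields are hypothetical stand-ins for the queries `Legal->isUniformMemOp(*I)` and `TheLoop->isLoopInvariant(...)` shown in the diff, and the function merely reuses the lambda's name.

```cpp
#include <cassert>

enum class Kind { Load, Store };

// Hypothetical flattened view of one memory instruction.
struct ToyMemInst {
  Kind kind;
  bool uniformAddress;       // stands in for Legal->isUniformMemOp(*I)
  bool storedValueInvariant; // stands in for isLoopInvariant(value operand)
};

// Mirrors the structure of the lambda added by the patch: a memory op counts
// as a uniform use only if the address is uniform and, for stores, the stored
// value is loop invariant as well.
bool isUniformMemOpUse(const ToyMemInst &I) {
  if (!I.uniformAddress)
    return false;
  if (I.kind == Kind::Load)
    return true; // same address, same loaded value in every lane
  return I.storedValueInvariant; // same value stored by every lane
}
```

A store of a varying value to an invariant address is deliberately rejected, since that case still needs the last active lane rather than an arbitrary one.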

llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll

(196 lines not shown)

 for.end:
   ret void
 }

 define void @uniform_store(ptr noalias nocapture %a, ptr noalias nocapture %b, i64 %v, i64 %n) {
 ; CHECK-LABEL: @uniform_store(
 ; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 -1025, [[TMP0]]
+; CHECK-NEXT: br i1 [[TMP1]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
+; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP2]]
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
+; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
+; CHECK-NEXT: store i64 [[V]], ptr [[B:%.*]], align 8
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
+; CHECK-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP8]]
+; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK: middle.block:
+; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
-; CHECK-NEXT: store i64 [[V:%.*]], ptr [[B:%.*]], align 8
-; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: store i64 [[V]], ptr [[B]], align 8
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; CHECK-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
 ; CHECK: for.end:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %for.body

 for.body:
   %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]

(42 lines not shown)
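The trip-count arithmetic in the tail-folded CHECK lines above (N_RND_UP, N_MOD_VF, N_VEC) and the lane mask can be reproduced with a short stand-alone C++ sketch. The function and struct names are made up for illustration, and the sketch assumes the vector width equals vscale, as it does for the `<vscale x 1 x i64>` type in this test; it is a model of the checked IR, not vectorizer code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values named after the IR temporaries in the CHECK lines.
struct TripCounts {
  uint64_t nRndUp; // N_RND_UP = trip count + (VF - 1)
  uint64_t nModVF; // N_MOD_VF = N_RND_UP urem VF
  uint64_t nVec;   // N_VEC    = N_RND_UP - N_MOD_VF
};

// Round the trip count up to a multiple of the vector width, so that the
// masked vector loop covers every scalar iteration.
TripCounts tailFoldedTripCounts(uint64_t tripCount, uint64_t vf) {
  assert(vf > 0 && "vector width must be non-zero");
  uint64_t nRndUp = tripCount + (vf - 1);
  uint64_t nModVF = nRndUp % vf;
  return {nRndUp, nModVF, nRndUp - nModVF};
}

// Model of llvm.get.active.lane.mask(base, tripCount): lane i is active
// exactly when base + i is still below the trip count.
std::vector<bool> activeLaneMask(uint64_t base, uint64_t tripCount,
                                 uint64_t vf) {
  std::vector<bool> mask(vf);
  for (uint64_t i = 0; i < vf; ++i)
    mask[i] = (base + i) < tripCount;
  return mask;
}
```

With a trip count of 1024 and a width of 5, the rounded count is 1025, so the last vector iteration starts at index 1020 and runs with only its first four lanes active. The uniform store to `[[B]]` in the checked output is emitted as a plain scalar store outside this mask, while the consecutive store to `[[A]]` goes through `llvm.masked.store`.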

llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll

(643 lines not shown)

 for.end:
   ret void
 }

 define void @uniform_store(ptr noalias nocapture %a, ptr noalias nocapture %b, i64 %v, i64 %n) {
 ; SCALABLE-LABEL: @uniform_store(
 ; SCALABLE-NEXT: entry:
+; SCALABLE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP0]]
+; SCALABLE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; SCALABLE: vector.ph:
+; SCALABLE-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
+; SCALABLE-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
+; SCALABLE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
+; SCALABLE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; SCALABLE-NEXT: br label [[VECTOR_BODY:%.*]]
+; SCALABLE: vector.body:
+; SCALABLE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; SCALABLE-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
+; SCALABLE-NEXT: store i64 [[V]], ptr [[B:%.*]], align 8
+; SCALABLE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP2]]
+; SCALABLE-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
+; SCALABLE-NEXT: store <vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP4]], align 8
+; SCALABLE-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
+; SCALABLE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
+; SCALABLE-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; SCALABLE-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; SCALABLE: middle.block:
+; SCALABLE-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
+; SCALABLE-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; SCALABLE: scalar.ph:
+; SCALABLE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
 ; SCALABLE: for.body:
-; SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
-; SCALABLE-NEXT: store i64 [[V:%.*]], ptr [[B:%.*]], align 8
-; SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]
+; SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; SCALABLE-NEXT: store i64 [[V]], ptr [[B]], align 8
+; SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
 ; SCALABLE: for.end:
 ; SCALABLE-NEXT: ret void
 ;
 ; FIXEDLEN-LABEL: @uniform_store(
 ; FIXEDLEN-NEXT: entry:
 ; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; FIXEDLEN: vector.ph:
 ; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[V:%.*]], i32 0
 ; FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
 ; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[V]], i32 0
 ; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer
 ; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
 ; FIXEDLEN: vector.body:
 ; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
 ; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2
 ; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B:%.*]], align 8
 ; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 8
-; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 8
-; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 8
 ; FIXEDLEN-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
 ; FIXEDLEN-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
 ; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 0
 ; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP4]], align 8
 ; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 2
 ; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP5]], align 8
 ; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; FIXEDLEN-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024

(12 lines not shown)
 ; FIXEDLEN-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; FIXEDLEN-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
 ; FIXEDLEN-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
 ; FIXEDLEN: for.end:
 ; FIXEDLEN-NEXT: ret void
 ;
 ; TF-SCALABLE-LABEL: @uniform_store(
 ; TF-SCALABLE-NEXT: entry:
+; TF-SCALABLE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP1:%.*]] = icmp ult i64 -1025, [[TMP0]]
+; TF-SCALABLE-NEXT: br i1 [[TMP1]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; TF-SCALABLE: vector.ph:
+; TF-SCALABLE-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1
+; TF-SCALABLE-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]
+; TF-SCALABLE-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP2]]
+; TF-SCALABLE-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; TF-SCALABLE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.*]], i32 0
+; TF-SCALABLE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
+; TF-SCALABLE-NEXT: br label [[VECTOR_BODY:%.*]]
+; TF-SCALABLE: vector.body:
+; TF-SCALABLE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; TF-SCALABLE-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 0
+; TF-SCALABLE-NEXT: [[ACTIVE_LANE_MASK:%.*]] = call <vscale x 1 x i1> @llvm.get.active.lane.mask.nxv1i1.i64(i64 [[TMP5]], i64 1024)
+; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[B:%.*]], align 8
+; TF-SCALABLE-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP5]]
+; TF-SCALABLE-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
+; TF-SCALABLE-NEXT: call void @llvm.masked.store.nxv1i64.p0(<vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP7]], i32 8, <vscale x 1 x i1> [[ACTIVE_LANE_MASK]])
+; TF-SCALABLE-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
+; TF-SCALABLE-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP8]]
+; TF-SCALABLE-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; TF-SCALABLE-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; TF-SCALABLE: middle.block:
+; TF-SCALABLE-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; TF-SCALABLE: scalar.ph:
+; TF-SCALABLE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; TF-SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
 ; TF-SCALABLE: for.body:
-; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
-; TF-SCALABLE-NEXT: store i64 [[V:%.*]], ptr [[B:%.*]], align 8
-; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[IV]]
+; TF-SCALABLE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
+; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[B]], align 8
+; TF-SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
 ; TF-SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
 ; TF-SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; TF-SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; TF-SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; TF-SCALABLE: for.end:
 ; TF-SCALABLE-NEXT: ret void
 ;
 ; TF-FIXEDLEN-LABEL: @uniform_store(
 ; TF-FIXEDLEN-NEXT: entry:
 ; TF-FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; TF-FIXEDLEN: vector.ph:
 ; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[V:%.*]], i32 0
 ; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
 ; TF-FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
 ; TF-FIXEDLEN: vector.body:
 ; TF-FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; TF-FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
 ; TF-FIXEDLEN-NEXT: store i64 [[V]], ptr [[B:%.*]], align 8
-; TF-FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 8
 ; TF-FIXEDLEN-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
 ; TF-FIXEDLEN-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
 ; TF-FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP2]], align 8
 ; TF-FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
 ; TF-FIXEDLEN-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
 ; TF-FIXEDLEN-NEXT: br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
 ; TF-FIXEDLEN: middle.block:
 ; TF-FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024

(312 lines not shown)
	for.end:			for.end:
	ret void			ret void
	}			}


	define void @uniform_store_unaligned(ptr noalias nocapture %a, ptr noalias nocapture %b, i64 %v, i64 %n) {			define void @uniform_store_unaligned(ptr noalias nocapture %a, ptr noalias nocapture %b, i64 %v, i64 %n) {
	; SCALABLE-LABEL: @uniform_store_unaligned(			; SCALABLE-LABEL: @uniform_store_unaligned(
	; SCALABLE-NEXT: entry:			; SCALABLE-NEXT: entry:
				; SCALABLE-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; SCALABLE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP0]]
				; SCALABLE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
				; SCALABLE: vector.ph:
				; SCALABLE-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
				; SCALABLE-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP1]]
				; SCALABLE-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
				; SCALABLE-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <vscale x 1 x i64> poison, i64 [[V:%.]], i32 0
				; SCALABLE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 1 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
				; SCALABLE-NEXT: br label [[VECTOR_BODY:%.*]]
				; SCALABLE: vector.body:
				; SCALABLE-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
				; SCALABLE-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
				; SCALABLE-NEXT: store i64 [[V]], ptr [[B:%.*]], align 1
				; SCALABLE-NEXT: [[TMP3:%.]] = getelementptr inbounds i64, ptr [[A:%.]], i64 [[TMP2]]
				; SCALABLE-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP3]], i32 0
				; SCALABLE-NEXT: store <vscale x 1 x i64> [[BROADCAST_SPLAT]], ptr [[TMP4]], align 8
				; SCALABLE-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
				; SCALABLE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP5]]
				; SCALABLE-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; SCALABLE-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
				; SCALABLE: middle.block:
				; SCALABLE-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; SCALABLE-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
				; SCALABLE: scalar.ph:
				; SCALABLE-NEXT: [[BC_RESUME_VAL:%.]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.]] ]
	; SCALABLE-NEXT: br label [[FOR_BODY:%.*]]			; SCALABLE-NEXT: br label [[FOR_BODY:%.*]]
	; SCALABLE: for.body:			; SCALABLE: for.body:
	; SCALABLE-NEXT: [[IV:%.]] = phi i64 [ 0, [[ENTRY:%.]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; SCALABLE-NEXT: [[IV:%.]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.]], [[FOR_BODY]] ]
	; SCALABLE-NEXT: store i64 [[V:%.]], ptr [[B:%.]], align 1			; SCALABLE-NEXT: store i64 [[V]], ptr [[B]], align 1
	; SCALABLE-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i64, ptr [[A:%.]], i64 [[IV]]			; SCALABLE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[IV]]
	; SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8			; SCALABLE-NEXT: store i64 [[V]], ptr [[ARRAYIDX]], align 8
	; SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; SCALABLE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024			; SCALABLE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
	; SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; SCALABLE-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; SCALABLE: for.end:			; SCALABLE: for.end:
	; SCALABLE-NEXT: ret void			; SCALABLE-NEXT: ret void
	;			;
	; FIXEDLEN-LABEL: @uniform_store_unaligned(			; FIXEDLEN-LABEL: @uniform_store_unaligned(
	; FIXEDLEN-NEXT: entry:			; FIXEDLEN-NEXT: entry:
	; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]			; FIXEDLEN-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
	; FIXEDLEN: vector.ph:			; FIXEDLEN: vector.ph:
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <2 x i64> poison, i64 [[V:%.]], i32 0			; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.]] = insertelement <2 x i64> poison, i64 [[V:%.]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer			; FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[V]], i32 0			; FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> poison, i64 [[V]], i32 0
	; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer			; FIXEDLEN-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> poison, <2 x i32> zeroinitializer
	; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]			; FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
	; FIXEDLEN: vector.body:			; FIXEDLEN: vector.body:
	; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2			; FIXEDLEN-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 2
	; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B:%.*]], align 1			; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B:%.*]], align 1
	; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 1			; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 1
	; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 1
	; FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 1
	; FIXEDLEN-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]			; FIXEDLEN-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
	; FIXEDLEN-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]			; FIXEDLEN-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
	; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 0			; FIXEDLEN-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 0
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP4]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP4]], align 8
	; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 2			; FIXEDLEN-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 2
	; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP5]], align 8			; FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT2]], ptr [[TMP5]], align 8
	; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; FIXEDLEN-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; FIXEDLEN-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; TF-FIXEDLEN: vector.ph:			; TF-FIXEDLEN: vector.ph:
	; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[V:%.*]], i32 0			; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[V:%.*]], i32 0
	; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer			; TF-FIXEDLEN-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
	; TF-FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]			; TF-FIXEDLEN-NEXT: br label [[VECTOR_BODY:%.*]]
	; TF-FIXEDLEN: vector.body:			; TF-FIXEDLEN: vector.body:
	; TF-FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; TF-FIXEDLEN-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; TF-FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; TF-FIXEDLEN-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; TF-FIXEDLEN-NEXT: store i64 [[V]], ptr [[B:%.*]], align 1			; TF-FIXEDLEN-NEXT: store i64 [[V]], ptr [[B:%.*]], align 1
	; TF-FIXEDLEN-NEXT: store i64 [[V]], ptr [[B]], align 1
	; TF-FIXEDLEN-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]			; TF-FIXEDLEN-NEXT: [[TMP1:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[TMP0]]
	; TF-FIXEDLEN-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0			; TF-FIXEDLEN-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[TMP1]], i32 0
	; TF-FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP2]], align 8			; TF-FIXEDLEN-NEXT: store <2 x i64> [[BROADCAST_SPLAT]], ptr [[TMP2]], align 8
	; TF-FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; TF-FIXEDLEN-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; TF-FIXEDLEN-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; TF-FIXEDLEN-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; TF-FIXEDLEN-NEXT: br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; TF-FIXEDLEN-NEXT: br i1 [[TMP3]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; TF-FIXEDLEN: middle.block:			; TF-FIXEDLEN: middle.block:
	; TF-FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024			; TF-FIXEDLEN-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll

	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: store i32 0, i32* [[ADDR:%.*]], align 4			; CHECK-NEXT: store i32 0, i32* [[ADDR:%.*]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4			; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4			; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4			; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: store i32 0, i32* [[ADDR]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP0]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP0]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]

llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll

	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: store i32 0, i32* @f.e, align 1, !alias.scope !0, !noalias !3			; CHECK-NEXT: store i32 0, i32* @f.e, align 1, !alias.scope !0, !noalias !3
	; CHECK-NEXT: store i32 0, i32* @f.e, align 1, !alias.scope !0, !noalias !3			; CHECK-NEXT: store i32 0, i32* @f.e, align 1, !alias.scope !0, !noalias !3
	; CHECK-NEXT: store i8 10, i8* [[TMP0]], align 1			; CHECK-NEXT: store i8 10, i8* [[TMP0]], align 1
	; CHECK-NEXT: store i8 10, i8* [[TMP0]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 500			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[INDEX_NEXT]], 500
	; CHECK-NEXT: br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP2]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 500, 500			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 500, 500
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 500, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 500, [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]