This is an archive of the discontinued LLVM Phabricator instance.

[LAA] Don't require stride one for inbounds GEP
Needs ReviewPublic

Authored by nikic on May 5 2023, 1:28 AM.

Download Raw Diff

Details

Reviewers

reames
fhahn

Summary

I'm pretty sure I'm missing something here, but I'm not sure what it is... As far as I can tell, if we have an inbounds GEP, then we don't need the stride to be 1 or -1 specifically to assume it doesn't wrap in a nusw sense. If it wraps, then we will go from one half of the address space to the other, and no allocated object can be that large (or at least, using inbounds on such an object is not legal).

This is still WIP as some additional test updates are needed.

Diff Detail

Unit TestsFailed

	Time	Test
	3,880 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases/Linux::auto_memory_profile_test.cpp
	120 ms	x64 debian > LLVM.Transforms/LoopDistribute::scev-inserted-runtime-check.ll
	40 ms	x64 debian > LLVM.Transforms/LoopVectorize::interleaved-accesses-2.ll
	40 ms	x64 debian > LLVM.Transforms/LoopVectorize::interleaved-accesses-3.ll

Event Timeline

nikic created this revision.May 5 2023, 1:28 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 1:28 AM

Herald added subscribers: StephenFan, hiraditya, arichardson. · View Herald Transcript

nikic requested review of this revision.May 5 2023, 1:28 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 1:28 AM

Herald added subscribers: llvm-commits, • pcwang-thead. · View Herald Transcript

Harbormaster completed remote builds in B230181: Diff 519764.May 5 2023, 2:14 AM

I think this is correct, but for different reasoning.

"The successive addition of the current address, interpreted as an unsigned number, and an offset, interpreted as a signed number, does not wrap the unsigned address space and remains in bounds of the allocated object. As a corollary, if the added offset is non-negative, the addition does not wrap in an unsigned sense (nuw)."

A non-negative stride is clearly and explicitly okay from the corollary. The wording on the first part is slightly hard to follow, but I think this also disallows underflow *as integers* for negative offsets. However, the condition is the corollary maybe hints I'm missing something here? I *think* this is the distinction between arithmetic overflow (where a negative offset does NUW wrap), and address space overflow (where it doesn't) I *think* the later is the semantic we're actually proving here, but the definition of IncrementWrapFlags is a bit hard to follow for me.

@reames What LangRef is describing there is basically "nusw" in SCEV terms, which is I believe is also the property LAA is interested in (or at least, that's what it adds to PSE). What LangRef is trying to emphasize here is that inbounds is not nsw, it is nusw, which reduces to nuw for non-negative offsets.

In D149936#4327222, @nikic wrote:

@reames What LangRef is describing there is basically "nusw" in SCEV terms, which is I believe is also the property LAA is interested in (or at least, that's what it adds to PSE). What LangRef is trying to emphasize here is that inbounds is not nsw, it is nusw, which reduces to nuw for non-negative offsets.

I think I agree. If so, your patch here is definitely sound.

Might be good to update the SCEV docs or the inbounds docs to explicitly link the two definitions. Would make this type of discussion much easier going forward.

We should also maybe consider singing "nusw" into base SCEV wrap enum so we can be explicit about the difference. SCEV should conceptually be able to prove it on any inbounds gep which is known to reach a UB-if-poison use. (The use case here doesn't have to prove the existence of the use.) That may (or may not) actually be worth doing.

I've been thinking about this again and had a crazy thought: Do we need inbounds at all?

As far as I understand, everything here is working under the assumption that the pointer actually gets accessed on each iteration (otherwise a lot of the reasoning would not be valid anyway). The underlying object of an addrec only depends on the addrec start, so the underlying object must stay the same across all iterations. This means that, on each iteration, the pointer address must be part of that single underlying object -- otherwise we would not have provenance to perform the access, and behavior would be undefined. As no allocated object can cross the address space boundary, this implies that the pointer cannot wrap.

Herald added a subscriber: wangpc. · View Herald TranscriptJul 13 2023, 7:45 AM

Revision Contents

Path

Size

llvm/

lib/

Analysis/

LoopAccessAnalysis.cpp

11 lines

test/

Analysis/

LoopAccessAnalysis/

wrapping-pointer-versioning.ll

1 line

Transforms/

LoopVectorize/

AArch64/

sve-gather-scatter.ll

61 lines

X86/

pr54634.ll

98 lines

LoopVersioning/

wrapping-pointer-versioning.ll

27 lines

Diff 519764

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 1,433 Lines • ▼ Show 20 Lines	std::optional<int64_t> llvm::getPtrStride(PredicatedScalarEvolution &PSE,
if (!ShouldCheckWrap)		if (!ShouldCheckWrap)
return Stride;		return Stride;

// The address calculation must not wrap. Otherwise, a dependence could be		// The address calculation must not wrap. Otherwise, a dependence could be
// inverted.		// inverted.
if (isNoWrapAddRec(Ptr, AR, PSE, Lp))		if (isNoWrapAddRec(Ptr, AR, PSE, Lp))
return Stride;		return Stride;

// An inbounds getelementptr that is a AddRec with a unit stride		// An inbounds getelementptr that is a AddRec cannot wrap: If it did wrap, we
// cannot wrap per definition. If it did, the result would be poison		// would go from an address in one half of the address space to an address in
// and any memory access dependent on it would be immediate UB		// the other half. As allocated objects can take up at most half the address
// when executed.		// space, the result would be poison and any memory access dependent on it
		// would be immediate UB when executed.
if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);		if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
GEP && GEP->isInBounds() && (Stride == 1 \|\| Stride == -1))		GEP && GEP->isInBounds())
return Stride;		return Stride;

// If the null pointer is undefined, then a access sequence which would		// If the null pointer is undefined, then a access sequence which would
// otherwise access it can be assumed not to unsigned wrap. Note that this		// otherwise access it can be assumed not to unsigned wrap. Note that this
// assumes the object in memory is aligned to the natural alignment.		// assumes the object in memory is aligned to the natural alignment.
unsigned AddrSpace = Ty->getPointerAddressSpace();		unsigned AddrSpace = Ty->getPointerAddressSpace();
if (!NullPointerIsDefined(Lp->getHeader()->getParent(), AddrSpace) &&		if (!NullPointerIsDefined(Lp->getHeader()->getParent(), AddrSpace) &&
(Stride == 1 \|\| Stride == -1))		(Stride == 1 \|\| Stride == -1))
▲ Show 20 Lines • Show All 1,434 Lines • Show Last 20 Lines

llvm/test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll

	Show First 20 Lines • Show All 237 Lines • ▼ Show 20 Lines
	; a SCEV predicate to continue analysis.			; a SCEV predicate to continue analysis.
	;			;
	; We can still analyze this by adding the required no wrap SCEV predicates.			; We can still analyze this by adding the required no wrap SCEV predicates.

	; LAA-LABEL: f5			; LAA-LABEL: f5
	; LAA: Memory dependences are safe{{$}}			; LAA: Memory dependences are safe{{$}}
	; LAA: SCEV assumptions:			; LAA: SCEV assumptions:
	; LAA-NEXT: {(2 * (trunc i64 %N to i32)),+,-2}<%for.body> Added Flags: <nssw>			; LAA-NEXT: {(2 * (trunc i64 %N to i32)),+,-2}<%for.body> Added Flags: <nssw>
	; LAA-NEXT: {((2 * (sext i32 (2 * (trunc i64 %N to i32)) to i64))<nsw> + %a),+,-4}<%for.body> Added Flags: <nusw>

	; LAA: [PSE] %arrayidxA = getelementptr inbounds i16, ptr %a, i32 %mul:			; LAA: [PSE] %arrayidxA = getelementptr inbounds i16, ptr %a, i32 %mul:
	; LAA-NEXT: ((2 * (sext i32 {(2 * (trunc i64 %N to i32)),+,-2}<%for.body> to i64))<nsw> + %a)			; LAA-NEXT: ((2 * (sext i32 {(2 * (trunc i64 %N to i32)),+,-2}<%for.body> to i64))<nsw> + %a)
	; LAA-NEXT: --> {((2 * (sext i32 (2 * (trunc i64 %N to i32)) to i64))<nsw> + %a),+,-4}<%for.body>			; LAA-NEXT: --> {((2 * (sext i32 (2 * (trunc i64 %N to i32)) to i64))<nsw> + %a),+,-4}<%for.body>

	define void @f5(ptr noalias %a,			define void @f5(ptr noalias %a,
	ptr noalias %b, i64 %N) {			ptr noalias %b, i64 %N) {
	entry:			entry:
	Show All 28 Lines

llvm/test/Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll

	Show First 20 Lines • Show All 284 Lines • ▼ Show 20 Lines



	define void @gather_nxv4i32_ind64_stride2(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, i64 %n) #0 {			define void @gather_nxv4i32_ind64_stride2(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, i64 %n) #0 {
	; CHECK-LABEL: @gather_nxv4i32_ind64_stride2(			; CHECK-LABEL: @gather_nxv4i32_ind64_stride2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 3			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 3
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.]] = icmp ugt i64 [[TMP1]], [[N:%.]]			; CHECK-NEXT: [[MIN_ITERS_CHECK_NOT:%.]] = icmp ult i64 [[TMP1]], [[N:%.]]
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK_NOT]], label [[VECTOR_PH:%.]], label [[SCALAR_PH:%.]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 3			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 3
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i64 [[TMP3]], i64 [[N_MOD_VF]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 2			; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP6]], i64 0			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP8]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.]] = phi <vscale x 4 x i64> [ [[TMP4]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.]] = phi <vscale x 4 x i64> [ [[TMP6]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[STEP_ADD:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[STEP_ADD:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP7:%.*]] = shl <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP9:%.*]] = shl <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP8:%.*]] = shl <vscale x 4 x i64> [[STEP_ADD]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP10:%.*]] = shl <vscale x 4 x i64> [[STEP_ADD]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP9:%.]] = getelementptr inbounds float, ptr [[B:%.]], <vscale x 4 x i64> [[TMP7]]			; CHECK-NEXT: [[TMP11:%.]] = getelementptr inbounds float, ptr [[B:%.]], <vscale x 4 x i64> [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[B]], <vscale x 4 x i64> [[TMP8]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds float, ptr [[B]], <vscale x 4 x i64> [[TMP10]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0(<vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0(<vscale x 4 x ptr> [[TMP11]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)
	; CHECK-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0(<vscale x 4 x ptr> [[TMP10]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0(<vscale x 4 x ptr> [[TMP12]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> poison)
	; CHECK-NEXT: [[TMP11:%.]] = getelementptr inbounds float, ptr [[A:%.]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP13:%.]] = getelementptr inbounds float, ptr [[A:%.]], i64 [[INDEX]]
	; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER]], ptr [[TMP11]], align 4			; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER]], ptr [[TMP13]], align 4
	; CHECK-NEXT: [[TMP12:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw i64 [[TMP12]], 2			; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 2
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds float, ptr [[TMP11]], i64 [[TMP13]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds float, ptr [[TMP13]], i64 [[TMP15]]
	; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER2]], ptr [[TMP14]], align 4			; CHECK-NEXT: store <vscale x 4 x float> [[WIDE_MASKED_GATHER2]], ptr [[TMP16]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP17:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 3			; CHECK-NEXT: [[TMP18:%.*]] = shl nuw nsw i64 [[TMP17]], 3
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP18]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[STEP_ADD]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[STEP_ADD]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP11:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[INDVARS_IV_STRIDE2:%.*]] = shl i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_STRIDE2:%.*]] = shl i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV_STRIDE2]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[B]], i64 [[INDVARS_IV_STRIDE2]]
	; CHECK-NEXT: [[TMP18:%.*]] = load float, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP20:%.*]] = load float, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store float [[TMP18]], ptr [[ARRAYIDX2]], align 4			; CHECK-NEXT: store float [[TMP20]], ptr [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	Show All 21 Lines

llvm/test/Transforms/LoopVectorize/X86/pr54634.ll

	Show All 13 Lines
	; CHECK-NEXT: [[TMP5:%.]] = load ptr addrspace(10), ptr [[TMP0:%.]], align 8, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: [[TMP5:%.]] = load ptr addrspace(10), ptr [[TMP0:%.]], align 8, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[TMP6:%.*]] = addrspacecast ptr addrspace(10) [[TMP4]] to ptr addrspace(11)			; CHECK-NEXT: [[TMP6:%.*]] = addrspacecast ptr addrspace(10) [[TMP4]] to ptr addrspace(11)
	; CHECK-NEXT: [[TMP7:%.*]] = load ptr addrspace(13), ptr addrspace(11) [[TMP6]], align 8, !tbaa [[TBAA5:![0-9]+]]			; CHECK-NEXT: [[TMP7:%.*]] = load ptr addrspace(13), ptr addrspace(11) [[TMP6]], align 8, !tbaa [[TBAA5:![0-9]+]]
	; CHECK-NEXT: [[DOTELT:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(10) [[TMP5]], i64 0, i32 0			; CHECK-NEXT: [[DOTELT:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(10) [[TMP5]], i64 0, i32 0
	; CHECK-NEXT: [[DOTUNPACK:%.*]] = load ptr addrspace(10), ptr addrspace(10) [[DOTELT]], align 8, !tbaa [[TBAA8:![0-9]+]]			; CHECK-NEXT: [[DOTUNPACK:%.*]] = load ptr addrspace(10), ptr addrspace(10) [[DOTELT]], align 8, !tbaa [[TBAA8:![0-9]+]]
	; CHECK-NEXT: [[DOTELT1:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(10) [[TMP5]], i64 0, i32 1			; CHECK-NEXT: [[DOTELT1:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(10) [[TMP5]], i64 0, i32 1
	; CHECK-NEXT: [[DOTUNPACK2:%.*]] = load i64, ptr addrspace(10) [[DOTELT1]], align 8, !tbaa [[TBAA8]]			; CHECK-NEXT: [[DOTUNPACK2:%.*]] = load i64, ptr addrspace(10) [[DOTELT1]], align 8, !tbaa [[TBAA8]]
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw i64 [[TMP2]], 1			; CHECK-NEXT: [[TMP8:%.*]] = add nsw i64 [[TMP2]], 1
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP8]], 28			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP8]], 16
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_SCEVCHECK:%.]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
	; CHECK: vector.scevcheck:
	; CHECK-NEXT: [[MUL:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[TMP2]])
	; CHECK-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i64, i1 } [[MUL]], 0
	; CHECK-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i64, i1 } [[MUL]], 1
	; CHECK-NEXT: [[TMP9:%.*]] = sub i64 0, [[MUL_RESULT]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr i8, ptr addrspace(13) [[TMP7]], i64 [[MUL_RESULT]]
	; CHECK-NEXT: [[TMP11:%.*]] = icmp ult ptr addrspace(13) [[TMP10]], [[TMP7]]
	; CHECK-NEXT: [[TMP12:%.*]] = or i1 [[TMP11]], [[MUL_OVERFLOW]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(13) [[TMP7]], i64 8
	; CHECK-NEXT: [[MUL1:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 16, i64 [[TMP2]])
	; CHECK-NEXT: [[MUL_RESULT2:%.*]] = extractvalue { i64, i1 } [[MUL1]], 0
	; CHECK-NEXT: [[MUL_OVERFLOW3:%.*]] = extractvalue { i64, i1 } [[MUL1]], 1
	; CHECK-NEXT: [[TMP13:%.*]] = sub i64 0, [[MUL_RESULT2]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr i8, ptr addrspace(13) [[SCEVGEP]], i64 [[MUL_RESULT2]]
	; CHECK-NEXT: [[TMP15:%.*]] = icmp ult ptr addrspace(13) [[TMP14]], [[SCEVGEP]]
	; CHECK-NEXT: [[TMP16:%.*]] = or i1 [[TMP15]], [[MUL_OVERFLOW3]]
	; CHECK-NEXT: [[TMP17:%.*]] = or i1 [[TMP12]], [[TMP16]]
	; CHECK-NEXT: br i1 [[TMP17]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP8]], 16			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP8]], 16
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP8]], [[N_MOD_VF]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP8]], [[N_MOD_VF]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT7]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT4]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT9]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT6]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT11:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x ptr addrspace(10)> poison, ptr addrspace(10) [[DOTUNPACK]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT12:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT11]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x ptr addrspace(10)> [[BROADCAST_SPLATINSERT8]], <4 x ptr addrspace(10)> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT13:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT10:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT14:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT13]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT11:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT10]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT15:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT12:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT16:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT15]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT13:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT12]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT17:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT14:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT18:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT17]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT15:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT14]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT19:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT16:%.*]] = insertelement <4 x i64> poison, i64 [[DOTUNPACK2]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT20:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT19]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT17:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT16]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>			; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
	; CHECK-NEXT: [[STEP_ADD4:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>			; CHECK-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
	; CHECK-NEXT: [[STEP_ADD5:%.*]] = add <4 x i64> [[STEP_ADD4]], <i64 4, i64 4, i64 4, i64 4>			; CHECK-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[VEC_IND]], i32 0
	; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD4]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD1]], i32 0
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD5]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD2]], i32 0
	; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT]], <4 x ptr addrspace(13)> [[TMP18]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10:![0-9]+]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT]], <4 x ptr addrspace(13)> [[TMP9]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10:![0-9]+]]
	; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT8]], <4 x ptr addrspace(13)> [[TMP19]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT5]], <4 x ptr addrspace(13)> [[TMP10]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT10]], <4 x ptr addrspace(13)> [[TMP20]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT7]], <4 x ptr addrspace(13)> [[TMP11]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT12]], <4 x ptr addrspace(13)> [[TMP21]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4p10.v4p13(<4 x ptr addrspace(10)> [[BROADCAST_SPLAT9]], <4 x ptr addrspace(13)> [[TMP12]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[VEC_IND]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD]], i32 1
	; CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD4]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD1]], i32 1
	; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD5]], i32 1			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], <4 x i64> [[STEP_ADD2]], i32 1
	; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT14]], <4 x ptr addrspace(13)> [[TMP22]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT11]], <4 x ptr addrspace(13)> [[TMP13]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT16]], <4 x ptr addrspace(13)> [[TMP23]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT13]], <4 x ptr addrspace(13)> [[TMP14]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT18]], <4 x ptr addrspace(13)> [[TMP24]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT15]], <4 x ptr addrspace(13)> [[TMP15]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT20]], <4 x ptr addrspace(13)> [[TMP25]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]			; CHECK-NEXT: call void @llvm.masked.scatter.v4i64.v4p13(<4 x i64> [[BROADCAST_SPLAT17]], <4 x ptr addrspace(13)> [[TMP16]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>), !tbaa [[TBAA10]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD5]], <i64 4, i64 4, i64 4, i64 4>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
	; CHECK-NEXT: [[TMP26:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP26]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP8]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP8]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[L44:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[L44:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[TOP:%.]] ], [ 0, [[VECTOR_SCEVCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[TOP:%.]] ]
	; CHECK-NEXT: br label [[L26:%.*]]			; CHECK-NEXT: br label [[L26:%.*]]
	; CHECK: L26:			; CHECK: L26:
	; CHECK-NEXT: [[VALUE_PHI5:%.]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[TMP27:%.]], [[L26]] ]			; CHECK-NEXT: [[VALUE_PHI5:%.]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[TMP18:%.]], [[L26]] ]
	; CHECK-NEXT: [[DOTREPACK:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], i64 [[VALUE_PHI5]], i32 0			; CHECK-NEXT: [[DOTREPACK:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], i64 [[VALUE_PHI5]], i32 0
	; CHECK-NEXT: store ptr addrspace(10) [[DOTUNPACK]], ptr addrspace(13) [[DOTREPACK]], align 8, !tbaa [[TBAA10]]			; CHECK-NEXT: store ptr addrspace(10) [[DOTUNPACK]], ptr addrspace(13) [[DOTREPACK]], align 8, !tbaa [[TBAA10]]
	; CHECK-NEXT: [[DOTREPACK4:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], i64 [[VALUE_PHI5]], i32 1			; CHECK-NEXT: [[DOTREPACK4:%.*]] = getelementptr inbounds { ptr addrspace(10), i64 }, ptr addrspace(13) [[TMP7]], i64 [[VALUE_PHI5]], i32 1
	; CHECK-NEXT: store i64 [[DOTUNPACK2]], ptr addrspace(13) [[DOTREPACK4]], align 8, !tbaa [[TBAA10]]			; CHECK-NEXT: store i64 [[DOTUNPACK2]], ptr addrspace(13) [[DOTREPACK4]], align 8, !tbaa [[TBAA10]]
	; CHECK-NEXT: [[TMP27]] = add i64 [[VALUE_PHI5]], 1			; CHECK-NEXT: [[TMP18]] = add i64 [[VALUE_PHI5]], 1
	; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i64 [[VALUE_PHI5]], [[TMP2]]			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i64 [[VALUE_PHI5]], [[TMP2]]
	; CHECK-NEXT: br i1 [[DOTNOT]], label [[L44]], label [[L26]], !llvm.loop [[LOOP15:![0-9]+]]			; CHECK-NEXT: br i1 [[DOTNOT]], label [[L44]], label [[L26]], !llvm.loop [[LOOP15:![0-9]+]]
	; CHECK: L44:			; CHECK: L44:
	; CHECK-NEXT: ret ptr addrspace(10) null			; CHECK-NEXT: ret ptr addrspace(10) null
	;			;
	top:			top:
	%2 = sext i32 %1 to i64			%2 = sext i32 %1 to i64
	%3 = load atomic ptr addrspace(10) (ptr addrspace(10), i64), ptr addrspace(10) (ptr addrspace(10), i64)* bitcast (ptr @jlplt_ijl_alloc_array_1d_10294_got to ptr addrspace(10) (ptr addrspace(10), i64)**) unordered, align 8			%3 = load atomic ptr addrspace(10) (ptr addrspace(10), i64), ptr addrspace(10) (ptr addrspace(10), i64)* bitcast (ptr @jlplt_ijl_alloc_array_1d_10294_got to ptr addrspace(10) (ptr addrspace(10), i64)**) unordered, align 8
	Show All 40 Lines

llvm/test/Transforms/LoopVersioning/wrapping-pointer-versioning.ll

Show First 20 Lines • Show All 432 Lines • ▼ Show 20 Lines	for.body: ; preds = %for.body, %entry
%exitcond = icmp eq i64 %inc, %N		%exitcond = icmp eq i64 %inc, %N
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}

; The following function is similar to the one above, but has the GEP		; The following function is similar to the one above, but has the GEP
; to pointer %A inbounds. The index %mul doesn't have the nsw flag.		; to pointer %A inbounds. Even though the index %mul doesn't have the nsw flag,
; This means that the SCEV expression for %mul can wrap and we need		; we know that it cannot wrap, as this would lead to a memory acces on an
; a SCEV predicate to continue analysis.		; out of bounds GEP.
;
; We can still analyze this by adding the required no wrap SCEV predicates.

define void @f5(ptr noalias %a,		define void @f5(ptr noalias %a,
; LV-LABEL: @f5(		; LV-LABEL: @f5(
; LV-NEXT: for.body.lver.check:		; LV-NEXT: for.body.lver.check:
; LV-NEXT: [[TRUNCN:%.]] = trunc i64 [[N:%.]] to i32		; LV-NEXT: [[TRUNCN:%.]] = trunc i64 [[N:%.]] to i32
; LV-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1		; LV-NEXT: [[TMP0:%.*]] = add i64 [[N]], -1
; LV-NEXT: [[TMP1:%.*]] = shl i32 [[TRUNCN]], 1		; LV-NEXT: [[TMP1:%.*]] = shl i32 [[TRUNCN]], 1
; LV-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP0]] to i32		; LV-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP0]] to i32
; LV-NEXT: [[MUL1:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 2, i32 [[TMP2]])		; LV-NEXT: [[MUL1:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 2, i32 [[TMP2]])
; LV-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL1]], 0		; LV-NEXT: [[MUL_RESULT:%.*]] = extractvalue { i32, i1 } [[MUL1]], 0
; LV-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL1]], 1		; LV-NEXT: [[MUL_OVERFLOW:%.*]] = extractvalue { i32, i1 } [[MUL1]], 1
; LV-NEXT: [[TMP3:%.*]] = sub i32 [[TMP1]], [[MUL_RESULT]]		; LV-NEXT: [[TMP3:%.*]] = sub i32 [[TMP1]], [[MUL_RESULT]]
; LV-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP3]], [[TMP1]]		; LV-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP3]], [[TMP1]]
; LV-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[MUL_OVERFLOW]]		; LV-NEXT: [[TMP5:%.*]] = or i1 [[TMP4]], [[MUL_OVERFLOW]]
; LV-NEXT: [[TMP6:%.*]] = icmp ugt i64 [[TMP0]], 4294967295		; LV-NEXT: [[TMP6:%.*]] = icmp ugt i64 [[TMP0]], 4294967295
; LV-NEXT: [[TMP7:%.*]] = or i1 [[TMP5]], [[TMP6]]		; LV-NEXT: [[TMP7:%.*]] = or i1 [[TMP5]], [[TMP6]]
; LV-NEXT: [[TMP8:%.*]] = sext i32 [[TMP1]] to i64		; LV-NEXT: br i1 [[TMP7]], label [[FOR_BODY_PH_LVER_ORIG:%.]], label [[FOR_BODY_PH:%.]]
; LV-NEXT: [[TMP9:%.*]] = shl nsw i64 [[TMP8]], 1
; LV-NEXT: [[SCEVGEP:%.]] = getelementptr i8, ptr [[A:%.]], i64 [[TMP9]]
; LV-NEXT: [[MUL2:%.*]] = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 4, i64 [[TMP0]])
; LV-NEXT: [[MUL_RESULT3:%.*]] = extractvalue { i64, i1 } [[MUL2]], 0
; LV-NEXT: [[MUL_OVERFLOW4:%.*]] = extractvalue { i64, i1 } [[MUL2]], 1
; LV-NEXT: [[TMP10:%.*]] = sub i64 0, [[MUL_RESULT3]]
; LV-NEXT: [[TMP11:%.*]] = getelementptr i8, ptr [[SCEVGEP]], i64 [[TMP10]]
; LV-NEXT: [[TMP12:%.*]] = icmp ugt ptr [[TMP11]], [[SCEVGEP]]
; LV-NEXT: [[TMP13:%.*]] = or i1 [[TMP12]], [[MUL_OVERFLOW4]]
; LV-NEXT: [[TMP14:%.*]] = or i1 [[TMP7]], [[TMP13]]
; LV-NEXT: br i1 [[TMP14]], label [[FOR_BODY_PH_LVER_ORIG:%.]], label [[FOR_BODY_PH:%.]]
; LV: for.body.ph.lver.orig:		; LV: for.body.ph.lver.orig:
; LV-NEXT: br label [[FOR_BODY_LVER_ORIG:%.*]]		; LV-NEXT: br label [[FOR_BODY_LVER_ORIG:%.*]]
; LV: for.body.lver.orig:		; LV: for.body.lver.orig:
; LV-NEXT: [[IND_LVER_ORIG:%.]] = phi i64 [ 0, [[FOR_BODY_PH_LVER_ORIG]] ], [ [[INC_LVER_ORIG:%.]], [[FOR_BODY_LVER_ORIG]] ]		; LV-NEXT: [[IND_LVER_ORIG:%.]] = phi i64 [ 0, [[FOR_BODY_PH_LVER_ORIG]] ], [ [[INC_LVER_ORIG:%.]], [[FOR_BODY_LVER_ORIG]] ]
; LV-NEXT: [[IND1_LVER_ORIG:%.]] = phi i32 [ [[TRUNCN]], [[FOR_BODY_PH_LVER_ORIG]] ], [ [[DEC_LVER_ORIG:%.]], [[FOR_BODY_LVER_ORIG]] ]		; LV-NEXT: [[IND1_LVER_ORIG:%.]] = phi i32 [ [[TRUNCN]], [[FOR_BODY_PH_LVER_ORIG]] ], [ [[DEC_LVER_ORIG:%.]], [[FOR_BODY_LVER_ORIG]] ]
; LV-NEXT: [[MUL_LVER_ORIG:%.*]] = mul i32 [[IND1_LVER_ORIG]], 2		; LV-NEXT: [[MUL_LVER_ORIG:%.*]] = mul i32 [[IND1_LVER_ORIG]], 2
; LV-NEXT: [[ARRAYIDXA_LVER_ORIG:%.*]] = getelementptr inbounds i16, ptr [[A]], i32 [[MUL_LVER_ORIG]]		; LV-NEXT: [[ARRAYIDXA_LVER_ORIG:%.]] = getelementptr inbounds i16, ptr [[A:%.]], i32 [[MUL_LVER_ORIG]]
; LV-NEXT: [[LOADA_LVER_ORIG:%.*]] = load i16, ptr [[ARRAYIDXA_LVER_ORIG]], align 2		; LV-NEXT: [[LOADA_LVER_ORIG:%.*]] = load i16, ptr [[ARRAYIDXA_LVER_ORIG]], align 2
; LV-NEXT: [[ARRAYIDXB_LVER_ORIG:%.]] = getelementptr inbounds i16, ptr [[B:%.]], i64 [[IND_LVER_ORIG]]		; LV-NEXT: [[ARRAYIDXB_LVER_ORIG:%.]] = getelementptr inbounds i16, ptr [[B:%.]], i64 [[IND_LVER_ORIG]]
; LV-NEXT: [[LOADB_LVER_ORIG:%.*]] = load i16, ptr [[ARRAYIDXB_LVER_ORIG]], align 2		; LV-NEXT: [[LOADB_LVER_ORIG:%.*]] = load i16, ptr [[ARRAYIDXB_LVER_ORIG]], align 2
; LV-NEXT: [[ADD_LVER_ORIG:%.*]] = mul i16 [[LOADA_LVER_ORIG]], [[LOADB_LVER_ORIG]]		; LV-NEXT: [[ADD_LVER_ORIG:%.*]] = mul i16 [[LOADA_LVER_ORIG]], [[LOADB_LVER_ORIG]]
; LV-NEXT: store i16 [[ADD_LVER_ORIG]], ptr [[ARRAYIDXA_LVER_ORIG]], align 2		; LV-NEXT: store i16 [[ADD_LVER_ORIG]], ptr [[ARRAYIDXA_LVER_ORIG]], align 2
; LV-NEXT: [[INC_LVER_ORIG]] = add nuw nsw i64 [[IND_LVER_ORIG]], 1		; LV-NEXT: [[INC_LVER_ORIG]] = add nuw nsw i64 [[IND_LVER_ORIG]], 1
; LV-NEXT: [[DEC_LVER_ORIG]] = sub i32 [[IND1_LVER_ORIG]], 1		; LV-NEXT: [[DEC_LVER_ORIG]] = sub i32 [[IND1_LVER_ORIG]], 1
; LV-NEXT: [[EXITCOND_LVER_ORIG:%.*]] = icmp eq i64 [[INC_LVER_ORIG]], [[N]]		; LV-NEXT: [[EXITCOND_LVER_ORIG:%.*]] = icmp eq i64 [[INC_LVER_ORIG]], [[N]]
; LV-NEXT: br i1 [[EXITCOND_LVER_ORIG]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY_LVER_ORIG]]		; LV-NEXT: br i1 [[EXITCOND_LVER_ORIG]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY_LVER_ORIG]]
; LV: for.body.ph:		; LV: for.body.ph:
; LV-NEXT: br label [[FOR_BODY:%.*]]		; LV-NEXT: br label [[FOR_BODY:%.*]]
; LV: for.body:		; LV: for.body:
; LV-NEXT: [[IND:%.]] = phi i64 [ 0, [[FOR_BODY_PH]] ], [ [[INC:%.]], [[FOR_BODY]] ]		; LV-NEXT: [[IND:%.]] = phi i64 [ 0, [[FOR_BODY_PH]] ], [ [[INC:%.]], [[FOR_BODY]] ]
; LV-NEXT: [[IND1:%.]] = phi i32 [ [[TRUNCN]], [[FOR_BODY_PH]] ], [ [[DEC:%.]], [[FOR_BODY]] ]		; LV-NEXT: [[IND1:%.]] = phi i32 [ [[TRUNCN]], [[FOR_BODY_PH]] ], [ [[DEC:%.]], [[FOR_BODY]] ]
; LV-NEXT: [[MUL:%.*]] = mul i32 [[IND1]], 2		; LV-NEXT: [[MUL:%.*]] = mul i32 [[IND1]], 2
; LV-NEXT: [[ARRAYIDXA:%.*]] = getelementptr inbounds i16, ptr [[A]], i32 [[MUL]]		; LV-NEXT: [[ARRAYIDXA:%.*]] = getelementptr inbounds i16, ptr [[A]], i32 [[MUL]]
; LV-NEXT: [[LOADA:%.*]] = load i16, ptr [[ARRAYIDXA]], align 2		; LV-NEXT: [[LOADA:%.*]] = load i16, ptr [[ARRAYIDXA]], align 2
; LV-NEXT: [[ARRAYIDXB:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 [[IND]]		; LV-NEXT: [[ARRAYIDXB:%.*]] = getelementptr inbounds i16, ptr [[B]], i64 [[IND]]
; LV-NEXT: [[LOADB:%.*]] = load i16, ptr [[ARRAYIDXB]], align 2		; LV-NEXT: [[LOADB:%.*]] = load i16, ptr [[ARRAYIDXB]], align 2
; LV-NEXT: [[ADD:%.*]] = mul i16 [[LOADA]], [[LOADB]]		; LV-NEXT: [[ADD:%.*]] = mul i16 [[LOADA]], [[LOADB]]
; LV-NEXT: store i16 [[ADD]], ptr [[ARRAYIDXA]], align 2		; LV-NEXT: store i16 [[ADD]], ptr [[ARRAYIDXA]], align 2
; LV-NEXT: [[INC]] = add nuw nsw i64 [[IND]], 1		; LV-NEXT: [[INC]] = add nuw nsw i64 [[IND]], 1
; LV-NEXT: [[DEC]] = sub i32 [[IND1]], 1		; LV-NEXT: [[DEC]] = sub i32 [[IND1]], 1
; LV-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[N]]		; LV-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[N]]
; LV-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT5:%.*]], label [[FOR_BODY]]		; LV-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT2:%.*]], label [[FOR_BODY]]
; LV: for.end.loopexit:		; LV: for.end.loopexit:
; LV-NEXT: br label [[FOR_END:%.*]]		; LV-NEXT: br label [[FOR_END:%.*]]
; LV: for.end.loopexit5:		; LV: for.end.loopexit2:
; LV-NEXT: br label [[FOR_END]]		; LV-NEXT: br label [[FOR_END]]
; LV: for.end:		; LV: for.end:
; LV-NEXT: ret void		; LV-NEXT: ret void
;		;
ptr noalias %b, i64 %N) {		ptr noalias %b, i64 %N) {
entry:		entry:
%TruncN = trunc i64 %N to i32		%TruncN = trunc i64 %N to i32
br label %for.body		br label %for.body
Show All 26 Lines