This is an archive of the discontinued LLVM Phabricator instance.

Alternatively, it would also be possible to canonicalize gep(p, add(x, y)) to gep(gep(p, x), y) in general, in which case existing GEP reassociation support in LICM will take care of the rest. Arguably this is cleaner (as we should have a canonical form between these two possibilities), but it's more likely to cause fallout.
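Here is a source-level analogy for why this canonical form helps LICM. It is a hedged sketch: the function names are hypothetical, and C++ pointer arithmetic must stay within the array, which a GEP without `inbounds` does not require. Splitting `gep(p, add(inv, i))` into `gep(gep(p, inv), i)` leaves an inner address that no longer depends on the loop variable. That address can then be computed once, before the loop:

```cpp
#include <cassert>

// Analogue of gep(p, add(inv, i)): the index inv + i is formed on every iteration.
long long sumBefore(const int *p, long long inv, long long n) {
  long long s = 0;
  for (long long i = 0; i < n; ++i)
    s += p[inv + i];
  return s;
}

// Analogue of gep(gep(p, inv), i): the inner address is loop-invariant,
// so it is computed once outside the loop (what LICM would hoist).
long long sumAfter(const int *p, long long inv, long long n) {
  const int *base = p + inv;  // hoisted invariant part of the address
  long long s = 0;
  for (long long i = 0; i < n; ++i)
    s += base[i];
  return s;
}
```

Both functions read the same elements. The only difference is where the invariant part of the address is computed.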

Harbormaster completed remote builds in B246479: Diff 541923. Jul 19 2023, 6:11 AM

Relaxed checks

Herald added a project: Restricted Project. Sep 15 2023, 3:08 PM

Herald added subscribers: cfe-commits, wangpc, zzheng.

Harbormaster completed remote builds in B257295: Diff 556881. Sep 15 2023, 3:24 PM

updated

Harbormaster completed remote builds in B257298: Diff 556884. Sep 15 2023, 4:19 PM

unit test fixed

Harbormaster completed remote builds in B257304: Diff 556894. Sep 16 2023, 3:49 AM

unit tests

Harbormaster completed remote builds in B257306: Diff 556898. Sep 16 2023, 12:05 PM

@nikic Could you check out the updated code to make sure we're on the right track before I try to fix the rest of the unit tests?

unit tests

Harbormaster completed remote builds in B257396: Diff 557022. Sep 19 2023, 3:59 AM

Hexagon test updated

Harbormaster completed remote builds in B257445: Diff 557104. Sep 20 2023, 3:02 AM

d-smirnov added a reviewer: dmgreen. Sep 20 2023, 3:22 AM

paulwalker-arm added inline comments. Sep 20 2023, 3:32 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2318–2319	Perhaps move this block after the `We do not handle pointer-vector geps here` immediately below so this test can be removed.

nikic added inline comments. Sep 20 2023, 4:00 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2322	This needs a one-use check. The transform is not profitable if we have to keep both the add and the gep. Can also use `match(GEP.getOperand(1), m_Add(...))` here.
2325	This no longer checks for loop invariance, so we should remove any invariance-related terminology.
2337	This inbounds preservation is incorrect: https://alive2.llvm.org/ce/z/bJZvQG It's even incorrect if the add is also nsw.
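Here is a toy model of the first and last points above. It assumes a single allocated object of `size` bytes and treats one-past-the-end as in bounds. All names are hypothetical, not LLVM APIs. The original `inbounds` GEP only promises that `p + (x + y)` is in bounds. Marking the split GEPs `inbounds` would also claim that `p + x` is in bounds, which fails for `size = 16, x = 20, y = -8`. That add does not overflow, so an `nsw` flag on it would not help either. The updated CHECK lines for `microsoft-abi-dynamic-cast.cpp` in this revision show both new GEPs without `inbounds`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy, not LLVM code: offsets are byte offsets into one object
// of `size` bytes; one-past-the-end counts as in bounds.
bool isInBounds(std::int64_t offset, std::int64_t size) {
  return offset >= 0 && offset <= size;
}

// `gep inbounds (p, x + y)` only promises p + x + y is in bounds. Putting
// inbounds on the split GEPs also claims p + x is in bounds; this returns
// whether that extra claim holds (trivially true if the original was poison).
bool splitMayKeepInBounds(std::int64_t size, std::int64_t x, std::int64_t y) {
  bool original = isInBounds(x + y, size);
  bool intermediate = isInBounds(x, size);
  return !original || intermediate;
}

// One-use point: the add only dies if the GEP was its sole user; otherwise
// the rewrite leaves add + gep + gep where there used to be add + gep.
int instructionsAfterSplit(int addUses) { return addUses == 1 ? 2 : 3; }
bool splitIsProfitable(int addUses) { return instructionsAfterSplit(addUses) <= 2; }
```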

Amended

Reordered and removed extra check

updated

Harbormaster completed remote builds in B257470: Diff 557138. Sep 20 2023, 12:45 PM

d-smirnov retitled this revision from [PATCH] [llvm] [InstCombine] Reassociate loop invariant GEP index calculations. to [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP. Sep 21 2023, 9:25 AM

d-smirnov edited the summary of this revision.

comment updated

Harbormaster completed remote builds in B257503: Diff 557187. Sep 21 2023, 11:35 AM

@nikic Updated. Please review

nikic added inline comments. Oct 2 2023, 6:39 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2325–2327
2333
2336	No need for the NewGEP variable.

amended

@nikic Amended.

Harbormaster completed remote builds in B257724: Diff 557538. Oct 4 2023, 1:50 AM

LGTM

We should give this a try, but I think there is a fairly large chance that this will cause regressions somewhere and a more targeted change may be necessary (e.g. only do this for loop-invariants in LICM).

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
2334–2335	IRBuilder needs to be used for all but the last instruction.
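Here is the reason behind that rule, as a hedged toy model. InstCombine's `Builder` inserts at the instruction being visited and adds every instruction it creates to the worklist. An instruction returned from a visit method is inserted by the driver in place of the visited one. An intermediate GEP created with `GetElementPtrInst::Create(..., &GEP)` lands in the IR, but that call does not add it to the worklist. The structures below are hypothetical stand-ins, not LLVM classes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical toy, not LLVM code: a block is a list of instruction names,
// and the worklist holds the names the combiner will revisit.
struct ToyCombiner {
  std::vector<std::string> block;
  std::deque<std::string> worklist;
  std::size_t visiting = 0;  // index of the instruction being visited

  // Like Builder.Create*(): insert before the visited instruction and queue it.
  void builderCreate(const std::string &name) {
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(visiting), name);
    ++visiting;  // the visited instruction shifted down one slot
    worklist.push_back(name);
  }

  // Like Create(..., InsertBefore): placed in the block, but not queued.
  void rawCreate(const std::string &name) {
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(visiting), name);
    ++visiting;
  }

  // Like returning a new instruction from a visit method: the driver puts it
  // where the visited instruction was and queues it.
  void replaceVisited(const std::string &name) {
    block[visiting] = name;
    worklist.push_back(name);
  }
};

// The style requested in review: builder for all but the last instruction.
void splitViaBuilder(ToyCombiner &ic, std::size_t gepPos) {
  ic.visiting = gepPos;
  ic.builderCreate("newptr");
  ic.replaceVisited("newgep");
}

// The earlier style, for contrast: "newptr" never reaches the worklist.
void splitViaRawCreate(ToyCombiner &ic, std::size_t gepPos) {
  ic.visiting = gepPos;
  ic.rawCreate("newptr");
  ic.replaceVisited("newgep");
}
```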

This revision is now accepted and ready to land. Oct 4 2023, 6:34 AM

Updated

Harbormaster completed remote builds in B257767: Diff 557609. Oct 5 2023, 8:59 AM

Closed by commit rGe13bed4c5f35: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP (authored by d-smirnov, committed by MatsPetersson). Oct 6 2023, 4:38 AM

This revision was automatically updated to reflect the committed changes.

MatsPetersson added a commit: rGe13bed4c5f35: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP.

How does this patch work with visitGEPOfGEP that does a reverse transformation?

// Replace: gep (gep %P, long B), long A, ...
// With:    T = long A+B; gep %P, T, ...

In D155688#4653347, @fiigii wrote:
How does this patch work with visitGEPOfGEP that does a reverse transformation?
// Replace: gep (gep %P, long B), long A, ...
// With:    T = long A+B; gep %P, T, ...

The reverse transform is only done if A + B simplifies.

By the way, this change did cause some code size regressions: http://llvm-compile-time-tracker.com/compare.php?from=a16f6462d756804276d4b39267b3c19bcd6949fe&to=e13bed4c5f3544c076ce57e36d9a11eefa5a7815&stat=size-text

The one that stood out to me is that btGjkEpa2.cpp from bullet has become 13% larger.

The reverse transform is only done if A + B simplifies.

Looks like `simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops.

Additionally, this change could make longer GEP chains that could hurt other optimizations by exceeding AA or value-tracking thresholds.

We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%.
@nikic Should we revert the patch and try another location for it (in LICM pass, as you previously suggested)?
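For comparison, here is a hedged sketch of the more targeted variant nikic mentioned earlier (only doing this for loop invariants in LICM). It splits `gep(p, a + b)` only when exactly one addend is loop-invariant, and puts that addend in the inner GEP so the inner GEP can be hoisted. The `Addend` type and `planInvariantSplit` function are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical description of one operand of the add feeding the GEP.
struct Addend {
  std::string name;
  bool loopInvariant;
};

// Returns {inner, outer} for gep(gep(p, inner), outer), or nullopt when the
// split would expose nothing new to hoist (both invariant: the whole add can
// already be hoisted; neither invariant: nothing can be).
std::optional<std::pair<std::string, std::string>>
planInvariantSplit(const Addend &a, const Addend &b) {
  if (a.loopInvariant == b.loopInvariant)
    return std::nullopt;
  if (a.loopInvariant)
    return std::make_pair(a.name, b.name);
  return std::make_pair(b.name, a.name);
}
```

Unlike the landed InstCombine transform, this variant never fires outside loops. That would sidestep some of the code size changes reported above, at the cost of missing non-loop canonicalization.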

In D155688#4653520, @fiigii wrote:

The reverse transform is only done if A + B simplifies.

Looks like `simplifyAddInst` may give add expressions, so I guess this patch may make IC run into infinite loops.

simplifyAddInst can return an add instruction, but it will be an existing one. It will never introduce a new one. So I'm not sure how this would result in infinite loops?
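Here is a toy model of why the split and the merge in `visitGEPOfGEP` should not ping-pong. It assumes that `A + B` "simplifies" only when both indices are constants; the real `simplifyAddInst` can also return other existing values. Splitting `gep(p, x + 4)` and then merging with a following constant GEP reaches a fixed point in two rewrites. This is because `x + 4` never simplifies and so is never re-formed. All types and functions here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR: gep(gep(p, I0), I1)... stored as a list of indices.
struct Leaf {
  bool isConst;
  long long c;      // valid when isConst
  std::string var;  // valid when !isConst
};
struct Index {
  bool isAdd;  // one-use add of leaves a and b; otherwise only a is used
  Leaf a, b;
};
using Chain = std::vector<Index>;

// ADD+GEP -> GEP+GEP: split the first add index into two consecutive GEPs.
bool splitOnce(Chain &ch) {
  for (std::size_t i = 0; i < ch.size(); ++i) {
    if (ch[i].isAdd) {
      Index lo{false, ch[i].a, Leaf{}};
      Index hi{false, ch[i].b, Leaf{}};
      ch[i] = lo;
      ch.insert(ch.begin() + static_cast<std::ptrdiff_t>(i) + 1, hi);
      return true;
    }
  }
  return false;
}

// Stand-in for visitGEPOfGEP: merge adjacent GEPs only if A + B simplifies,
// approximated here as "both indices are constants".
bool mergeOnce(Chain &ch) {
  for (std::size_t i = 0; i + 1 < ch.size(); ++i) {
    const Index &x = ch[i], &y = ch[i + 1];
    if (!x.isAdd && !y.isAdd && x.a.isConst && y.a.isConst) {
      ch[i] = Index{false, Leaf{true, x.a.c + y.a.c, ""}, Leaf{}};
      ch.erase(ch.begin() + static_cast<std::ptrdiff_t>(i) + 1);
      return true;
    }
  }
  return false;
}

// Apply both rules until neither fires; returns the number of rewrites,
// or -1 if the (arbitrary) cap is hit, which would indicate ping-pong.
int runToFixedPoint(Chain &ch, int cap = 100) {
  for (int n = 0; n < cap; ++n)
    if (!splitOnce(ch) && !mergeOnce(ch))
      return n;
  return -1;
}
```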

In D155688#4653629, @d-smirnov wrote:

We have some improvements with the patch, most notable: 549.fotonik_3d improves about 6%.
@nikic Should we revert the patch and try another location for it (in LICM pass, as you previously suggested)?

I don't think we have cause to revert just yet, as we're not aware of any specific issues.

That would be fine. Thanks for explaining.

After this patch was recently pulled into my downstream, I'm seeing a lot of invariant.gep created by LICM. For example, in LBM_performStreamCollide in 470.lbm there are 65 of them. On RISC-V, these all get created in registers outside the loop and get spilled. Is ARM seeing anything like this or do you have more addressing modes that allow CodeGenPrepare to bring these back into the loop?

I hadn't realized this came from someone at Arm. The performance results I had were overall roughly flat, with some improvements and regressions. I think there were still some people working through some fixes for some of the knock-on effects but with those nothing large would stick out in what I saw.

I would expect Loop Strength Reduction (maybe with CGP) to be able to optimize the addressing modes back to something that is optimal for the loop if it can. It's not always super reliable though. Might there be something going wrong in that pass?

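Revision Contents

Path (size)
clang/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp (6 lines)
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (18 lines)
llvm/test/CodeGen/Hexagon/autohvx/vector-align-tbaa.ll (156 lines)
llvm/test/Transforms/InstCombine/align-addr.ll (7 lines)
llvm/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll (8 lines)
llvm/test/Transforms/InstCombine/memrchr-4.ll (4 lines)
llvm/test/Transforms/InstCombine/shift.ll (4 lines)
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (16 lines)
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (4 lines)
llvm/test/Transforms/LoopVectorize/induction.ll (84 lines)
llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (50 lines)
llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll (80 lines)
llvm/test/Transforms/LoopVectorize/runtime-check.ll (36 lines)
llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops.ll (40 lines)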

Diff 557628

clang/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp

	void* test9(B* x) { return dynamic_cast<void*>(x); }			void* test9(B* x) { return dynamic_cast<void*>(x); }
	// CHECK-LABEL: define dso_local noundef ptr @"?test9@@YAPAXPAUB@@@Z"(ptr noundef %x)			// CHECK-LABEL: define dso_local noundef ptr @"?test9@@YAPAXPAUB@@@Z"(ptr noundef %x)
	// CHECK: [[CHECK:%.*]] = icmp eq ptr %x, null			// CHECK: [[CHECK:%.*]] = icmp eq ptr %x, null
	// CHECK-NEXT: br i1 [[CHECK]]			// CHECK-NEXT: br i1 [[CHECK]]
	// CHECK: [[VBPTR:%.*]] = getelementptr inbounds i8, ptr %x, i32 4			// CHECK: [[VBPTR:%.*]] = getelementptr inbounds i8, ptr %x, i32 4
	// CHECK-NEXT: [[VBTBL:%.*]] = load ptr, ptr [[VBPTR]], align 4			// CHECK-NEXT: [[VBTBL:%.*]] = load ptr, ptr [[VBPTR]], align 4
	// CHECK-NEXT: [[VBOFFP:%.*]] = getelementptr inbounds i32, ptr [[VBTBL]], i32 1			// CHECK-NEXT: [[VBOFFP:%.*]] = getelementptr inbounds i32, ptr [[VBTBL]], i32 1
	// CHECK-NEXT: [[VBOFFS:%.*]] = load i32, ptr [[VBOFFP]], align 4			// CHECK-NEXT: [[VBOFFS:%.*]] = load i32, ptr [[VBOFFP]], align 4
	// CHECK-NEXT: [[DELTA:%.*]] = add nsw i32 [[VBOFFS]], 4			// CHECK-NEXT: [[BASE:%.*]] = getelementptr i8, ptr %x, i32 [[VBOFFS]]
	// CHECK-NEXT: [[ADJ:%.*]] = getelementptr inbounds i8, ptr %x, i32 [[DELTA]]			// CHECK-NEXT: [[ADJ:%.*]] = getelementptr i8, ptr [[BASE]], i32 4
	// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__RTCastToVoid(ptr nonnull [[ADJ]])			// CHECK-NEXT: [[CALL:%.*]] = tail call ptr @__RTCastToVoid(ptr [[ADJ]])
	// CHECK-NEXT: br label			// CHECK-NEXT: br label
	// CHECK: [[RET:%.*]] = phi ptr			// CHECK: [[RET:%.*]] = phi ptr
	// CHECK-NEXT: ret ptr [[RET]]			// CHECK-NEXT: ret ptr [[RET]]

	namespace PR25606 {			namespace PR25606 {
	struct Cleanup {			struct Cleanup {
	~Cleanup();			~Cleanup();
	};			};

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

if (GEP.getOperand(1)->getType()->getScalarSizeInBits() ==

// is not necessarily retained).
Value *Y;
Value *X = GEP.getOperand(0);
if (Matched &&
    match(V, m_Sub(m_PtrToInt(m_Value(Y)), m_PtrToInt(m_Specific(X)))) &&
    getUnderlyingObject(X) == getUnderlyingObject(Y))
  return CastInst::CreatePointerBitCastOrAddrSpaceCast(Y, GEPType);
}
// We do not handle pointer-vector geps here.

paulwalker-arm: Perhaps move this block after the `We do not handle pointer-vector geps here` immediately below so this test can be removed.

Done

if (GEPType->isVectorTy())
  return nullptr;

nikic: This needs a one-use check. The transform is not profitable if we have to keep *both* the add and the gep. Can also use `match(GEP.getOperand(1), m_Add(...))` here.

Done

if (GEP.getNumIndices() == 1) {

// Try to replace ADD + GEP with GEP + GEP.

Value *Idx1, *Idx2;

nikic: This no longer checks for loop invariance, so we should remove any invariance-related terminology.

Done

if (match(GEP.getOperand(1),

m_OneUse(m_Add(m_Value(Idx1), m_Value(Idx2))))) {

nikic (suggested edit):

// Try to replace ADD + GEP with GEP + GEP.
- if (BinaryOperator *Idx =
-         dyn_cast_or_null<BinaryOperator>(GEP.getOperand(1)))
-   if ((Idx->getOpcode() == Instruction::Add) && Idx->hasOneUse()) {
+ Value *Idx1, *Idx2;
+ if (match(GEP.getOperand(1), m_OneUse(m_Add(m_Value(Idx1), m_Value(Idx2))))) {
  // %idx = add i64 %idx1, %idx2

Done

// %idx = add i64 %idx1, %idx2

// %gep = getelementptr i32, i32* %ptr, i64 %idx

// as:

// %newptr = getelementptr i32, i32* %ptr, i64 %idx1

// %newgep = getelementptr i32, i32* %newptr, i64 %idx2

auto *NewPtr = Builder.CreateGEP(GEP.getResultElementType(),

nikic (suggested edit):

  // %newgep = getelementptr i32, i32* %newptr, i64 %idx2
- Value *Ptr = GEP.getOperand(0);
+ Value *Ptr = GEP.getPointerOperand();
  auto *NewPtr = GetElementPtrInst::Create(

Done

GEP.getPointerOperand(), Idx1);

return GetElementPtrInst::Create(GEP.getResultElementType(), NewPtr,

nikic (suggested edit):

  Value *Ptr = GEP.getPointerOperand();
- auto *NewPtr = GetElementPtrInst::Create(GEP.getResultElementType(), Ptr,
-                                         Idx1, "", &GEP);
+ auto *NewPtr = Builder.CreateGEP(GEP.getResultElementType(), Ptr, Idx1);
  return GetElementPtrInst::Create(GEP.getResultElementType(), NewPtr,

IRBuilder needs to be used for all but the last instruction.

Done

Idx2);

nikic: No need for the NewGEP variable.

Done

}

nikic: This inbounds preservation is incorrect: https://alive2.llvm.org/ce/z/bJZvQG It's even incorrect if the add is also nsw.

Done

}

if (!GEP.isInBounds()) {
  unsigned IdxWidth =
      DL.getIndexSizeInBits(PtrOp->getType()->getPointerAddressSpace());
  APInt BasePtrOffset(IdxWidth, 0);
  Value *UnderlyingPtrOp =
      PtrOp->stripAndAccumulateInBoundsConstantOffsets(DL,
                                                       BasePtrOffset);
  bool CanBeNull, CanBeFreed;


llvm/test/CodeGen/Hexagon/autohvx/vector-align-tbaa.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -mtriple=hexagon -S -hexagon-vc -instcombine -hvc-va-full-stores < %s | FileCheck %s			; RUN: opt -mtriple=hexagon -S -hexagon-vc -instcombine -hvc-va-full-stores < %s | FileCheck %s

	; Check that Hexagon Vector Combine propagates (TBAA) metadata to the			; Check that Hexagon Vector Combine propagates (TBAA) metadata to the
	; generated output. (Use instcombine to clean the output up a bit.)			; generated output. (Use instcombine to clean the output up a bit.)

	target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"			target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
	target triple = "hexagon"			target triple = "hexagon"

	; Two unaligned loads, both with the same TBAA tag.			; Two unaligned loads, both with the same TBAA tag.
	;			;
	define <64 x i16> @f0(ptr %a0, i32 %a1) #0 {			define <64 x i16> @f0(ptr %a0, i32 %a1) #0 {
	; CHECK-LABEL: @f0(			; CHECK-LABEL: @f0(
	; CHECK-NEXT: b0:			; CHECK-NEXT: b0:
	; CHECK-NEXT: [[V0:%.*]] = add i32 [[A1:%.*]], 64			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[A1:%.*]]
	; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[V0]]			; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[TMP0]], i32 64
	; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = and i32 [[PTI]], -128			; CHECK-NEXT: [[AND:%.*]] = and i32 [[PTI]], -128
	; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[ADD]] to ptr			; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[AND]] to ptr
	; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ALD14:%.*]] = load <32 x i32>, ptr [[ITP]], align 128, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: [[ALD15:%.*]] = load <32 x i32>, ptr [[ITP]], align 128, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128
	; CHECK-NEXT: [[ALD2:%.*]] = load <128 x i8>, ptr [[GEP]], align 128, !tbaa [[TBAA0]]			; CHECK-NEXT: [[ALD2:%.*]] = load <128 x i8>, ptr [[GEP]], align 128, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr i8, ptr [[ITP]], i32 256			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr i8, ptr [[ITP]], i32 256
	; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[PTI1]], 127			; CHECK-NEXT: [[AND4:%.*]] = and i32 [[PTI1]], 127
	; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[TMP0]], 0			; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[AND4]], 0
	; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP3]], i32 0), !tbaa [[TBAA0]]			; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP3]], i32 0), !tbaa [[TBAA0]]
	; CHECK-NEXT: [[CST4:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>			; CHECK-NEXT: [[CST5:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>
	; CHECK-NEXT: [[CUP6:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CST4]], <32 x i32> [[ALD14]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP7:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CST5]], <32 x i32> [[ALD15]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP6]] to <64 x i16>			; CHECK-NEXT: [[CST13:%.*]] = bitcast <32 x i32> [[CUP7]] to <64 x i16>
	; CHECK-NEXT: [[CST9:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>			; CHECK-NEXT: [[CST10:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>
	; CHECK-NEXT: [[CUP10:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CUP]], <32 x i32> [[CST9]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CUP]], <32 x i32> [[CST10]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST13:%.*]] = bitcast <32 x i32> [[CUP10]] to <64 x i16>			; CHECK-NEXT: [[CST14:%.*]] = bitcast <32 x i32> [[CUP11]] to <64 x i16>
	; CHECK-NEXT: [[V8:%.*]] = add <64 x i16> [[CST12]], [[CST13]]			; CHECK-NEXT: [[V8:%.*]] = add <64 x i16> [[CST13]], [[CST14]]
	; CHECK-NEXT: ret <64 x i16> [[V8]]			; CHECK-NEXT: ret <64 x i16> [[V8]]
	;			;
	b0:			b0:
	%v0 = add i32 %a1, 64			%v0 = add i32 %a1, 64
	%v1 = getelementptr i16, ptr %a0, i32 %v0			%v1 = getelementptr i16, ptr %a0, i32 %v0
	%v3 = load <64 x i16>, ptr %v1, align 2, !tbaa !0			%v3 = load <64 x i16>, ptr %v1, align 2, !tbaa !0
	%v4 = add i32 %a1, 128			%v4 = add i32 %a1, 128
	%v5 = getelementptr i16, ptr %a0, i32 %v4			%v5 = getelementptr i16, ptr %a0, i32 %v4
	%v7 = load <64 x i16>, ptr %v5, align 2, !tbaa !0			%v7 = load <64 x i16>, ptr %v5, align 2, !tbaa !0
	%v8 = add <64 x i16> %v3, %v7			%v8 = add <64 x i16> %v3, %v7
	ret <64 x i16> %v8			ret <64 x i16> %v8
	}			}

	; Two unaligned loads, only one with a TBAA tag.			; Two unaligned loads, only one with a TBAA tag.
	;			;
	define <64 x i16> @f1(ptr %a0, i32 %a1) #0 {			define <64 x i16> @f1(ptr %a0, i32 %a1) #0 {
	; CHECK-LABEL: @f1(			; CHECK-LABEL: @f1(
	; CHECK-NEXT: b0:			; CHECK-NEXT: b0:
	; CHECK-NEXT: [[V0:%.*]] = add i32 [[A1:%.*]], 64			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[A1:%.*]]
	; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[V0]]			; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[TMP0]], i32 64
	; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = and i32 [[PTI]], -128			; CHECK-NEXT: [[AND:%.*]] = and i32 [[PTI]], -128
	; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[ADD]] to ptr			; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[AND]] to ptr
	; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ALD14:%.*]] = load <32 x i32>, ptr [[ITP]], align 128, !tbaa [[TBAA0]]			; CHECK-NEXT: [[ALD15:%.*]] = load <32 x i32>, ptr [[ITP]], align 128, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128
	; CHECK-NEXT: [[ALD2:%.*]] = load <128 x i8>, ptr [[GEP]], align 128			; CHECK-NEXT: [[ALD2:%.*]] = load <128 x i8>, ptr [[GEP]], align 128
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr i8, ptr [[ITP]], i32 256			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr i8, ptr [[ITP]], i32 256
	; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[PTI1]], 127			; CHECK-NEXT: [[AND4:%.*]] = and i32 [[PTI1]], 127
	; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[TMP0]], 0			; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[AND4]], 0
	; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP3]], i32 0)			; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP3]], i32 0)
	; CHECK-NEXT: [[CST4:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>			; CHECK-NEXT: [[CST5:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>
	; CHECK-NEXT: [[CUP6:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CST4]], <32 x i32> [[ALD14]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP7:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CST5]], <32 x i32> [[ALD15]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP6]] to <64 x i16>			; CHECK-NEXT: [[CST13:%.*]] = bitcast <32 x i32> [[CUP7]] to <64 x i16>
	; CHECK-NEXT: [[CST9:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>			; CHECK-NEXT: [[CST10:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>
	; CHECK-NEXT: [[CUP10:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CUP]], <32 x i32> [[CST9]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CUP]], <32 x i32> [[CST10]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST13:%.*]] = bitcast <32 x i32> [[CUP10]] to <64 x i16>			; CHECK-NEXT: [[CST14:%.*]] = bitcast <32 x i32> [[CUP11]] to <64 x i16>
	; CHECK-NEXT: [[V8:%.*]] = add <64 x i16> [[CST12]], [[CST13]]			; CHECK-NEXT: [[V8:%.*]] = add <64 x i16> [[CST13]], [[CST14]]
	; CHECK-NEXT: ret <64 x i16> [[V8]]			; CHECK-NEXT: ret <64 x i16> [[V8]]
	;			;
	b0:			b0:
	%v0 = add i32 %a1, 64			%v0 = add i32 %a1, 64
	%v1 = getelementptr i16, ptr %a0, i32 %v0			%v1 = getelementptr i16, ptr %a0, i32 %v0
	%v3 = load <64 x i16>, ptr %v1, align 2, !tbaa !0			%v3 = load <64 x i16>, ptr %v1, align 2, !tbaa !0
	%v4 = add i32 %a1, 128			%v4 = add i32 %a1, 128
	%v5 = getelementptr i16, ptr %a0, i32 %v4			%v5 = getelementptr i16, ptr %a0, i32 %v4
	%v7 = load <64 x i16>, ptr %v5, align 2			%v7 = load <64 x i16>, ptr %v5, align 2
	%v8 = add <64 x i16> %v3, %v7			%v8 = add <64 x i16> %v3, %v7
	ret <64 x i16> %v8			ret <64 x i16> %v8
	}			}

	; Two unaligned loads, with different TBAA tags.			; Two unaligned loads, with different TBAA tags.
	;			;
	define <64 x i16> @f2(ptr %a0, i32 %a1) #0 {			define <64 x i16> @f2(ptr %a0, i32 %a1) #0 {
	; CHECK-LABEL: @f2(			; CHECK-LABEL: @f2(
	; CHECK-NEXT: b0:			; CHECK-NEXT: b0:
	; CHECK-NEXT: [[V0:%.*]] = add i32 [[A1:%.*]], 64			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[A1:%.*]]
	; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[V0]]			; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[TMP0]], i32 64
	; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = and i32 [[PTI]], -128			; CHECK-NEXT: [[AND:%.*]] = and i32 [[PTI]], -128
	; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[ADD]] to ptr			; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[AND]] to ptr
	; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ALD14:%.*]] = load <32 x i32>, ptr [[ITP]], align 128, !tbaa [[TBAA0]]			; CHECK-NEXT: [[ALD15:%.*]] = load <32 x i32>, ptr [[ITP]], align 128, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128
	; CHECK-NEXT: [[ALD2:%.*]] = load <128 x i8>, ptr [[GEP]], align 128			; CHECK-NEXT: [[ALD2:%.*]] = load <128 x i8>, ptr [[GEP]], align 128
	; CHECK-NEXT: [[GEP3:%.*]] = getelementptr i8, ptr [[ITP]], i32 256			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr i8, ptr [[ITP]], i32 256
	; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[PTI1]], 127			; CHECK-NEXT: [[AND4:%.*]] = and i32 [[PTI1]], 127
	; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[TMP0]], 0			; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[AND4]], 0
	; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP3]], i32 0), !tbaa [[TBAA3:![0-9]+]]			; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP3]], i32 0), !tbaa [[TBAA3:![0-9]+]]
	; CHECK-NEXT: [[CST4:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>			; CHECK-NEXT: [[CST5:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>
	; CHECK-NEXT: [[CUP6:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CST4]], <32 x i32> [[ALD14]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP7:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CST5]], <32 x i32> [[ALD15]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP6]] to <64 x i16>			; CHECK-NEXT: [[CST13:%.*]] = bitcast <32 x i32> [[CUP7]] to <64 x i16>
	; CHECK-NEXT: [[CST9:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>			; CHECK-NEXT: [[CST10:%.*]] = bitcast <128 x i8> [[ALD2]] to <32 x i32>
	; CHECK-NEXT: [[CUP10:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CUP]], <32 x i32> [[CST9]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.valignb.128B(<32 x i32> [[CUP]], <32 x i32> [[CST10]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST13:%.*]] = bitcast <32 x i32> [[CUP10]] to <64 x i16>			; CHECK-NEXT: [[CST14:%.*]] = bitcast <32 x i32> [[CUP11]] to <64 x i16>
	; CHECK-NEXT: [[V8:%.*]] = add <64 x i16> [[CST12]], [[CST13]]			; CHECK-NEXT: [[V8:%.*]] = add <64 x i16> [[CST13]], [[CST14]]
	; CHECK-NEXT: ret <64 x i16> [[V8]]			; CHECK-NEXT: ret <64 x i16> [[V8]]
	;			;
	b0:			b0:
	%v0 = add i32 %a1, 64			%v0 = add i32 %a1, 64
	%v1 = getelementptr i16, ptr %a0, i32 %v0			%v1 = getelementptr i16, ptr %a0, i32 %v0
	%v3 = load <64 x i16>, ptr %v1, align 2, !tbaa !0			%v3 = load <64 x i16>, ptr %v1, align 2, !tbaa !0
	%v4 = add i32 %a1, 128			%v4 = add i32 %a1, 128
	%v5 = getelementptr i16, ptr %a0, i32 %v4			%v5 = getelementptr i16, ptr %a0, i32 %v4
	%v7 = load <64 x i16>, ptr %v5, align 2, !tbaa !3			%v7 = load <64 x i16>, ptr %v5, align 2, !tbaa !3
	%v8 = add <64 x i16> %v3, %v7			%v8 = add <64 x i16> %v3, %v7
	ret <64 x i16> %v8			ret <64 x i16> %v8
	}			}

	; Two unaligned stores, both with the same TBAA tag.			; Two unaligned stores, both with the same TBAA tag.
	;			;
	define void @f3(ptr %a0, i32 %a1, <64 x i16> %a2, <64 x i16> %a3) #0 {			define void @f3(ptr %a0, i32 %a1, <64 x i16> %a2, <64 x i16> %a3) #0 {
	; CHECK-LABEL: @f3(			; CHECK-LABEL: @f3(
	; CHECK-NEXT: b0:			; CHECK-NEXT: b0:
	; CHECK-NEXT: [[V0:%.*]] = add i32 [[A1:%.*]], 64			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[A1:%.*]]
	; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[V0]]			; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[TMP0]], i32 64
	; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = and i32 [[PTI]], -128			; CHECK-NEXT: [[AND:%.*]] = and i32 [[PTI]], -128
	; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[ADD]] to ptr			; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[AND]] to ptr
	; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[CST3:%.*]] = bitcast <64 x i16> [[A2:%.*]] to <32 x i32>			; CHECK-NEXT: [[CST3:%.*]] = bitcast <64 x i16> [[A2:%.*]] to <32 x i32>
	; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST3]], <32 x i32> undef, i32 [[PTI1]])			; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST3]], <32 x i32> undef, i32 [[PTI1]])
	; CHECK-NEXT: [[CST4:%.*]] = bitcast <32 x i32> [[CUP]] to <128 x i8>			; CHECK-NEXT: [[CST4:%.*]] = bitcast <32 x i32> [[CUP]] to <128 x i8>
	; CHECK-NEXT: [[CUP5:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> zeroinitializer, i32 [[PTI1]])			; CHECK-NEXT: [[CUP5:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> zeroinitializer, i32 [[PTI1]])
	; CHECK-NEXT: [[CST6:%.*]] = bitcast <32 x i32> [[CUP5]] to <128 x i8>			; CHECK-NEXT: [[CST6:%.*]] = bitcast <32 x i32> [[CUP5]] to <128 x i8>
	; CHECK-NEXT: [[CST7:%.*]] = bitcast <64 x i16> [[A3:%.*]] to <32 x i32>			; CHECK-NEXT: [[CST7:%.*]] = bitcast <64 x i16> [[A3:%.*]] to <32 x i32>
	; CHECK-NEXT: [[CST8:%.*]] = bitcast <64 x i16> [[A2]] to <32 x i32>			; CHECK-NEXT: [[CST8:%.*]] = bitcast <64 x i16> [[A2]] to <32 x i32>
	; CHECK-NEXT: [[CUP9:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST7]], <32 x i32> [[CST8]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP9:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST7]], <32 x i32> [[CST8]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST10:%.*]] = bitcast <32 x i32> [[CUP9]] to <128 x i8>			; CHECK-NEXT: [[CST10:%.*]] = bitcast <32 x i32> [[CUP9]] to <128 x i8>
	; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])			; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])
	; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP11]] to <128 x i8>			; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP11]] to <128 x i8>
	; CHECK-NEXT: [[CST13:%.*]] = bitcast <64 x i16> [[A3]] to <32 x i32>			; CHECK-NEXT: [[CST13:%.*]] = bitcast <64 x i16> [[A3]] to <32 x i32>
	; CHECK-NEXT: [[CUP14:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> undef, <32 x i32> [[CST13]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP14:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> undef, <32 x i32> [[CST13]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST15:%.*]] = bitcast <32 x i32> [[CUP14]] to <128 x i8>			; CHECK-NEXT: [[CST15:%.*]] = bitcast <32 x i32> [[CUP14]] to <128 x i8>
	; CHECK-NEXT: [[CUP16:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])			; CHECK-NEXT: [[CUP16:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])
	; CHECK-NEXT: [[CST17:%.*]] = bitcast <32 x i32> [[CUP16]] to <128 x i8>			; CHECK-NEXT: [[CST17:%.*]] = bitcast <32 x i32> [[CUP16]] to <128 x i8>
	; CHECK-NEXT: [[TRN:%.*]] = trunc <128 x i8> [[CST6]] to <128 x i1>			; CHECK-NEXT: [[TRN:%.*]] = trunc <128 x i8> [[CST6]] to <128 x i1>
	; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST4]], ptr [[ITP]], i32 128, <128 x i1> [[TRN]]), !tbaa [[TBAA5:![0-9]+]]			; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST4]], ptr [[ITP]], i32 128, <128 x i1> [[TRN]]), !tbaa [[TBAA5:![0-9]+]]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128
	; CHECK-NEXT: [[TRN18:%.*]] = trunc <128 x i8> [[CST12]] to <128 x i1>			; CHECK-NEXT: [[TRN18:%.*]] = trunc <128 x i8> [[CST12]] to <128 x i1>
	; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST10]], ptr [[GEP]], i32 128, <128 x i1> [[TRN18]]), !tbaa [[TBAA5]]			; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST10]], ptr [[GEP]], i32 128, <128 x i1> [[TRN18]]), !tbaa [[TBAA5]]
	; CHECK-NEXT: [[GEP19:%.*]] = getelementptr i8, ptr [[ITP]], i32 256			; CHECK-NEXT: [[GEP19:%.*]] = getelementptr i8, ptr [[ITP]], i32 256
	; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[PTI1]], 127			; CHECK-NEXT: [[AND20:%.*]] = and i32 [[PTI1]], 127
	; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[TMP0]], 0			; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[AND20]], 0
	; CHECK-NEXT: [[TRN20:%.*]] = trunc <128 x i8> [[CST17]] to <128 x i1>			; CHECK-NEXT: [[TRN21:%.*]] = trunc <128 x i8> [[CST17]] to <128 x i1>
	; CHECK-NEXT: [[CUP21:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0), !tbaa [[TBAA5]]			; CHECK-NEXT: [[CUP22:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0), !tbaa [[TBAA5]]
	; CHECK-NEXT: [[CST22:%.*]] = bitcast <32 x i32> [[CUP21]] to <128 x i8>			; CHECK-NEXT: [[CST23:%.*]] = bitcast <32 x i32> [[CUP22]] to <128 x i8>
	; CHECK-NEXT: [[TMP1:%.*]] = select <128 x i1> [[TRN20]], <128 x i8> [[CST15]], <128 x i8> [[CST22]]			; CHECK-NEXT: [[TMP1:%.*]] = select <128 x i1> [[TRN21]], <128 x i8> [[CST15]], <128 x i8> [[CST23]]
	; CHECK-NEXT: [[CST23:%.*]] = bitcast <128 x i8> [[TMP1]] to <32 x i32>			; CHECK-NEXT: [[CST24:%.*]] = bitcast <128 x i8> [[TMP1]] to <32 x i32>
	; CHECK-NEXT: call void @llvm.hexagon.V6.vS32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0, <32 x i32> [[CST23]]), !tbaa [[TBAA5]]			; CHECK-NEXT: call void @llvm.hexagon.V6.vS32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0, <32 x i32> [[CST24]]), !tbaa [[TBAA5]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	b0:			b0:
	%v0 = add i32 %a1, 64			%v0 = add i32 %a1, 64
	%v1 = getelementptr i16, ptr %a0, i32 %v0			%v1 = getelementptr i16, ptr %a0, i32 %v0
	store <64 x i16> %a2, ptr %v1, align 2, !tbaa !5			store <64 x i16> %a2, ptr %v1, align 2, !tbaa !5
	%v3 = add i32 %a1, 128			%v3 = add i32 %a1, 128
	%v4 = getelementptr i16, ptr %a0, i32 %v3			%v4 = getelementptr i16, ptr %a0, i32 %v3
	store <64 x i16> %a3, ptr %v4, align 2, !tbaa !5			store <64 x i16> %a3, ptr %v4, align 2, !tbaa !5
	ret void			ret void
	}			}

	; Two unaligned stores, only one with a TBAA tag.			; Two unaligned stores, only one with a TBAA tag.
	;			;
	define void @f4(ptr %a0, i32 %a1, <64 x i16> %a2, <64 x i16> %a3) #0 {			define void @f4(ptr %a0, i32 %a1, <64 x i16> %a2, <64 x i16> %a3) #0 {
	; CHECK-LABEL: @f4(			; CHECK-LABEL: @f4(
	; CHECK-NEXT: b0:			; CHECK-NEXT: b0:
	; CHECK-NEXT: [[V0:%.*]] = add i32 [[A1:%.*]], 64			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[A1:%.*]]
	; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[V0]]			; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[TMP0]], i32 64
	; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = and i32 [[PTI]], -128			; CHECK-NEXT: [[AND:%.*]] = and i32 [[PTI]], -128
	; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[ADD]] to ptr			; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[AND]] to ptr
	; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[CST3:%.*]] = bitcast <64 x i16> [[A2:%.*]] to <32 x i32>			; CHECK-NEXT: [[CST3:%.*]] = bitcast <64 x i16> [[A2:%.*]] to <32 x i32>
	; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST3]], <32 x i32> undef, i32 [[PTI1]])			; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST3]], <32 x i32> undef, i32 [[PTI1]])
	; CHECK-NEXT: [[CST4:%.*]] = bitcast <32 x i32> [[CUP]] to <128 x i8>			; CHECK-NEXT: [[CST4:%.*]] = bitcast <32 x i32> [[CUP]] to <128 x i8>
	; CHECK-NEXT: [[CUP5:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> zeroinitializer, i32 [[PTI1]])			; CHECK-NEXT: [[CUP5:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> zeroinitializer, i32 [[PTI1]])
	; CHECK-NEXT: [[CST6:%.*]] = bitcast <32 x i32> [[CUP5]] to <128 x i8>			; CHECK-NEXT: [[CST6:%.*]] = bitcast <32 x i32> [[CUP5]] to <128 x i8>
	; CHECK-NEXT: [[CST7:%.*]] = bitcast <64 x i16> [[A3:%.*]] to <32 x i32>			; CHECK-NEXT: [[CST7:%.*]] = bitcast <64 x i16> [[A3:%.*]] to <32 x i32>
	; CHECK-NEXT: [[CST8:%.*]] = bitcast <64 x i16> [[A2]] to <32 x i32>			; CHECK-NEXT: [[CST8:%.*]] = bitcast <64 x i16> [[A2]] to <32 x i32>
	; CHECK-NEXT: [[CUP9:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST7]], <32 x i32> [[CST8]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP9:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST7]], <32 x i32> [[CST8]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST10:%.*]] = bitcast <32 x i32> [[CUP9]] to <128 x i8>			; CHECK-NEXT: [[CST10:%.*]] = bitcast <32 x i32> [[CUP9]] to <128 x i8>
	; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])			; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])
	; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP11]] to <128 x i8>			; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP11]] to <128 x i8>
	; CHECK-NEXT: [[CST13:%.*]] = bitcast <64 x i16> [[A3]] to <32 x i32>			; CHECK-NEXT: [[CST13:%.*]] = bitcast <64 x i16> [[A3]] to <32 x i32>
	; CHECK-NEXT: [[CUP14:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> undef, <32 x i32> [[CST13]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP14:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> undef, <32 x i32> [[CST13]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST15:%.*]] = bitcast <32 x i32> [[CUP14]] to <128 x i8>			; CHECK-NEXT: [[CST15:%.*]] = bitcast <32 x i32> [[CUP14]] to <128 x i8>
	; CHECK-NEXT: [[CUP16:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])			; CHECK-NEXT: [[CUP16:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])
	; CHECK-NEXT: [[CST17:%.*]] = bitcast <32 x i32> [[CUP16]] to <128 x i8>			; CHECK-NEXT: [[CST17:%.*]] = bitcast <32 x i32> [[CUP16]] to <128 x i8>
	; CHECK-NEXT: [[TRN:%.*]] = trunc <128 x i8> [[CST6]] to <128 x i1>			; CHECK-NEXT: [[TRN:%.*]] = trunc <128 x i8> [[CST6]] to <128 x i1>
	; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST4]], ptr [[ITP]], i32 128, <128 x i1> [[TRN]])			; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST4]], ptr [[ITP]], i32 128, <128 x i1> [[TRN]])
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128
	; CHECK-NEXT: [[TRN18:%.*]] = trunc <128 x i8> [[CST12]] to <128 x i1>			; CHECK-NEXT: [[TRN18:%.*]] = trunc <128 x i8> [[CST12]] to <128 x i1>
	; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST10]], ptr [[GEP]], i32 128, <128 x i1> [[TRN18]])			; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST10]], ptr [[GEP]], i32 128, <128 x i1> [[TRN18]])
	; CHECK-NEXT: [[GEP19:%.*]] = getelementptr i8, ptr [[ITP]], i32 256			; CHECK-NEXT: [[GEP19:%.*]] = getelementptr i8, ptr [[ITP]], i32 256
	; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[PTI1]], 127			; CHECK-NEXT: [[AND20:%.*]] = and i32 [[PTI1]], 127
	; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[TMP0]], 0			; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[AND20]], 0
	; CHECK-NEXT: [[TRN20:%.*]] = trunc <128 x i8> [[CST17]] to <128 x i1>			; CHECK-NEXT: [[TRN21:%.*]] = trunc <128 x i8> [[CST17]] to <128 x i1>
	; CHECK-NEXT: [[CUP21:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0), !tbaa [[TBAA5]]			; CHECK-NEXT: [[CUP22:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0), !tbaa [[TBAA5]]
	; CHECK-NEXT: [[CST22:%.*]] = bitcast <32 x i32> [[CUP21]] to <128 x i8>			; CHECK-NEXT: [[CST23:%.*]] = bitcast <32 x i32> [[CUP22]] to <128 x i8>
	; CHECK-NEXT: [[TMP1:%.*]] = select <128 x i1> [[TRN20]], <128 x i8> [[CST15]], <128 x i8> [[CST22]]			; CHECK-NEXT: [[TMP1:%.*]] = select <128 x i1> [[TRN21]], <128 x i8> [[CST15]], <128 x i8> [[CST23]]
	; CHECK-NEXT: [[CST23:%.*]] = bitcast <128 x i8> [[TMP1]] to <32 x i32>			; CHECK-NEXT: [[CST24:%.*]] = bitcast <128 x i8> [[TMP1]] to <32 x i32>
	; CHECK-NEXT: call void @llvm.hexagon.V6.vS32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0, <32 x i32> [[CST23]]), !tbaa [[TBAA5]]			; CHECK-NEXT: call void @llvm.hexagon.V6.vS32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0, <32 x i32> [[CST24]]), !tbaa [[TBAA5]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	b0:			b0:
	%v0 = add i32 %a1, 64			%v0 = add i32 %a1, 64
	%v1 = getelementptr i16, ptr %a0, i32 %v0			%v1 = getelementptr i16, ptr %a0, i32 %v0
	store <64 x i16> %a2, ptr %v1, align 2			store <64 x i16> %a2, ptr %v1, align 2
	%v3 = add i32 %a1, 128			%v3 = add i32 %a1, 128
	%v4 = getelementptr i16, ptr %a0, i32 %v3			%v4 = getelementptr i16, ptr %a0, i32 %v3
	store <64 x i16> %a3, ptr %v4, align 2, !tbaa !5			store <64 x i16> %a3, ptr %v4, align 2, !tbaa !5
	ret void			ret void
	}			}

	; Two unaligned stores, with different TBAA tags.			; Two unaligned stores, with different TBAA tags.
	;			;
	define void @f5(ptr %a0, i32 %a1, <64 x i16> %a2, <64 x i16> %a3) #0 {			define void @f5(ptr %a0, i32 %a1, <64 x i16> %a2, <64 x i16> %a3) #0 {
	; CHECK-LABEL: @f5(			; CHECK-LABEL: @f5(
	; CHECK-NEXT: b0:			; CHECK-NEXT: b0:
	; CHECK-NEXT: [[V0:%.*]] = add i32 [[A1:%.*]], 64			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[A1:%.*]]
	; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[A0:%.*]], i32 [[V0]]			; CHECK-NEXT: [[V1:%.*]] = getelementptr i16, ptr [[TMP0]], i32 64
	; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = and i32 [[PTI]], -128			; CHECK-NEXT: [[AND:%.*]] = and i32 [[PTI]], -128
	; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[ADD]] to ptr			; CHECK-NEXT: [[ITP:%.*]] = inttoptr i32 [[AND]] to ptr
	; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32			; CHECK-NEXT: [[PTI1:%.*]] = ptrtoint ptr [[V1]] to i32
	; CHECK-NEXT: [[CST3:%.*]] = bitcast <64 x i16> [[A2:%.*]] to <32 x i32>			; CHECK-NEXT: [[CST3:%.*]] = bitcast <64 x i16> [[A2:%.*]] to <32 x i32>
	; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST3]], <32 x i32> undef, i32 [[PTI1]])			; CHECK-NEXT: [[CUP:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST3]], <32 x i32> undef, i32 [[PTI1]])
	; CHECK-NEXT: [[CST4:%.*]] = bitcast <32 x i32> [[CUP]] to <128 x i8>			; CHECK-NEXT: [[CST4:%.*]] = bitcast <32 x i32> [[CUP]] to <128 x i8>
	; CHECK-NEXT: [[CUP5:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> zeroinitializer, i32 [[PTI1]])			; CHECK-NEXT: [[CUP5:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> zeroinitializer, i32 [[PTI1]])
	; CHECK-NEXT: [[CST6:%.*]] = bitcast <32 x i32> [[CUP5]] to <128 x i8>			; CHECK-NEXT: [[CST6:%.*]] = bitcast <32 x i32> [[CUP5]] to <128 x i8>
	; CHECK-NEXT: [[CST7:%.*]] = bitcast <64 x i16> [[A3:%.*]] to <32 x i32>			; CHECK-NEXT: [[CST7:%.*]] = bitcast <64 x i16> [[A3:%.*]] to <32 x i32>
	; CHECK-NEXT: [[CST8:%.*]] = bitcast <64 x i16> [[A2]] to <32 x i32>			; CHECK-NEXT: [[CST8:%.*]] = bitcast <64 x i16> [[A2]] to <32 x i32>
	; CHECK-NEXT: [[CUP9:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST7]], <32 x i32> [[CST8]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP9:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> [[CST7]], <32 x i32> [[CST8]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST10:%.*]] = bitcast <32 x i32> [[CUP9]] to <128 x i8>			; CHECK-NEXT: [[CST10:%.*]] = bitcast <32 x i32> [[CUP9]] to <128 x i8>
	; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])			; CHECK-NEXT: [[CUP11:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])
	; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP11]] to <128 x i8>			; CHECK-NEXT: [[CST12:%.*]] = bitcast <32 x i32> [[CUP11]] to <128 x i8>
	; CHECK-NEXT: [[CST13:%.*]] = bitcast <64 x i16> [[A3]] to <32 x i32>			; CHECK-NEXT: [[CST13:%.*]] = bitcast <64 x i16> [[A3]] to <32 x i32>
	; CHECK-NEXT: [[CUP14:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> undef, <32 x i32> [[CST13]], i32 [[PTI1]])			; CHECK-NEXT: [[CUP14:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> undef, <32 x i32> [[CST13]], i32 [[PTI1]])
	; CHECK-NEXT: [[CST15:%.*]] = bitcast <32 x i32> [[CUP14]] to <128 x i8>			; CHECK-NEXT: [[CST15:%.*]] = bitcast <32 x i32> [[CUP14]] to <128 x i8>
	; CHECK-NEXT: [[CUP16:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])			; CHECK-NEXT: [[CUP16:%.*]] = call <32 x i32> @llvm.hexagon.V6.vlalignb.128B(<32 x i32> zeroinitializer, <32 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, i32 [[PTI1]])
	; CHECK-NEXT: [[CST17:%.*]] = bitcast <32 x i32> [[CUP16]] to <128 x i8>			; CHECK-NEXT: [[CST17:%.*]] = bitcast <32 x i32> [[CUP16]] to <128 x i8>
	; CHECK-NEXT: [[TRN:%.*]] = trunc <128 x i8> [[CST6]] to <128 x i1>			; CHECK-NEXT: [[TRN:%.*]] = trunc <128 x i8> [[CST6]] to <128 x i1>
	; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST4]], ptr [[ITP]], i32 128, <128 x i1> [[TRN]]), !tbaa [[TBAA5]]			; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST4]], ptr [[ITP]], i32 128, <128 x i1> [[TRN]]), !tbaa [[TBAA5]]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ITP]], i32 128
	; CHECK-NEXT: [[TRN18:%.*]] = trunc <128 x i8> [[CST12]] to <128 x i1>			; CHECK-NEXT: [[TRN18:%.*]] = trunc <128 x i8> [[CST12]] to <128 x i1>
	; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST10]], ptr [[GEP]], i32 128, <128 x i1> [[TRN18]])			; CHECK-NEXT: call void @llvm.masked.store.v128i8.p0(<128 x i8> [[CST10]], ptr [[GEP]], i32 128, <128 x i1> [[TRN18]])
	; CHECK-NEXT: [[GEP19:%.*]] = getelementptr i8, ptr [[ITP]], i32 256			; CHECK-NEXT: [[GEP19:%.*]] = getelementptr i8, ptr [[ITP]], i32 256
	; CHECK-NEXT: [[TMP0:%.*]] = and i32 [[PTI1]], 127			; CHECK-NEXT: [[AND20:%.*]] = and i32 [[PTI1]], 127
	; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[TMP0]], 0			; CHECK-NEXT: [[ISZ:%.*]] = icmp ne i32 [[AND20]], 0
	; CHECK-NEXT: [[TRN20:%.*]] = trunc <128 x i8> [[CST17]] to <128 x i1>			; CHECK-NEXT: [[TRN21:%.*]] = trunc <128 x i8> [[CST17]] to <128 x i1>
	; CHECK-NEXT: [[CUP21:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0), !tbaa [[TBAA7:![0-9]+]]			; CHECK-NEXT: [[CUP22:%.*]] = call <32 x i32> @llvm.hexagon.V6.vL32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0), !tbaa [[TBAA7:![0-9]+]]
	; CHECK-NEXT: [[CST22:%.*]] = bitcast <32 x i32> [[CUP21]] to <128 x i8>			; CHECK-NEXT: [[CST23:%.*]] = bitcast <32 x i32> [[CUP22]] to <128 x i8>
	; CHECK-NEXT: [[TMP1:%.*]] = select <128 x i1> [[TRN20]], <128 x i8> [[CST15]], <128 x i8> [[CST22]]			; CHECK-NEXT: [[TMP1:%.*]] = select <128 x i1> [[TRN21]], <128 x i8> [[CST15]], <128 x i8> [[CST23]]
	; CHECK-NEXT: [[CST23:%.*]] = bitcast <128 x i8> [[TMP1]] to <32 x i32>			; CHECK-NEXT: [[CST24:%.*]] = bitcast <128 x i8> [[TMP1]] to <32 x i32>
	; CHECK-NEXT: call void @llvm.hexagon.V6.vS32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0, <32 x i32> [[CST23]]), !tbaa [[TBAA7]]			; CHECK-NEXT: call void @llvm.hexagon.V6.vS32b.pred.ai.128B(i1 [[ISZ]], ptr [[GEP19]], i32 0, <32 x i32> [[CST24]]), !tbaa [[TBAA7]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	b0:			b0:
	%v0 = add i32 %a1, 64			%v0 = add i32 %a1, 64
	%v1 = getelementptr i16, ptr %a0, i32 %v0			%v1 = getelementptr i16, ptr %a0, i32 %v0
	store <64 x i16> %a2, ptr %v1, align 2, !tbaa !5			store <64 x i16> %a2, ptr %v1, align 2, !tbaa !5
	%v3 = add i32 %a1, 128			%v3 = add i32 %a1, 128
	%v4 = getelementptr i16, ptr %a0, i32 %v3			%v4 = getelementptr i16, ptr %a0, i32 %v3

llvm/test/Transforms/InstCombine/align-addr.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=instcombine -S | FileCheck %s			; RUN: opt < %s -passes=instcombine -S | FileCheck %s
	target datalayout = "E-p:64:64:64-p1:32:32:32-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"			target datalayout = "E-p:64:64:64-p1:32:32:32-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

				; Instcombine should be able to prove vector alignment in the
				; presence of a few mild address computation tricks.

	define void @test0(ptr %b, i64 %n, i64 %u, i64 %y) nounwind {			define void @test0(ptr %b, i64 %n, i64 %u, i64 %y) nounwind {
	; CHECK-LABEL: @test0(			; CHECK-LABEL: @test0(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[C:%.*]] = ptrtoint ptr [[B:%.*]] to i64			; CHECK-NEXT: [[C:%.*]] = ptrtoint ptr [[B:%.*]] to i64
	; CHECK-NEXT: [[D:%.*]] = and i64 [[C]], -16			; CHECK-NEXT: [[D:%.*]] = and i64 [[C]], -16
	; CHECK-NEXT: [[E:%.*]] = inttoptr i64 [[D]] to ptr			; CHECK-NEXT: [[E:%.*]] = inttoptr i64 [[D]] to ptr
	; CHECK-NEXT: [[V:%.*]] = shl i64 [[U:%.*]], 1			; CHECK-NEXT: [[V:%.*]] = shl i64 [[U:%.*]], 1
	; CHECK-NEXT: [[Z:%.*]] = and i64 [[Y:%.*]], -2			; CHECK-NEXT: [[Z:%.*]] = and i64 [[Y:%.*]], -2
	; CHECK-NEXT: [[T1421:%.*]] = icmp eq i64 [[N:%.*]], 0			; CHECK-NEXT: [[T1421:%.*]] = icmp eq i64 [[N:%.*]], 0
	; CHECK-NEXT: br i1 [[T1421]], label [[RETURN:%.*]], label [[BB:%.*]]			; CHECK-NEXT: br i1 [[T1421]], label [[RETURN:%.*]], label [[BB:%.*]]
	; CHECK: bb:			; CHECK: bb:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INDVAR_NEXT:%.*]], [[BB]] ], [ 20, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[INDVAR_NEXT:%.*]], [[BB]] ], [ 20, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[J:%.*]] = mul i64 [[I]], [[V]]			; CHECK-NEXT: [[J:%.*]] = mul i64 [[I]], [[V]]
	; CHECK-NEXT: [[H:%.*]] = add i64 [[J]], [[Z]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr double, ptr [[E]], i64 [[J]]
	; CHECK-NEXT: [[T8:%.*]] = getelementptr double, ptr [[E]], i64 [[H]]			; CHECK-NEXT: [[T8:%.*]] = getelementptr double, ptr [[TMP0]], i64 [[Z]]
	; CHECK-NEXT: store <2 x double> zeroinitializer, ptr [[T8]], align 8			; CHECK-NEXT: store <2 x double> zeroinitializer, ptr [[T8]], align 8
	; CHECK-NEXT: [[INDVAR_NEXT]] = add i64 [[I]], 1			; CHECK-NEXT: [[INDVAR_NEXT]] = add i64 [[I]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVAR_NEXT]], [[N]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVAR_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN]], label [[BB]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN]], label [[BB]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:

llvm/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll

	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I_0:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[ADD2:%.*]], [[FOR_INC:%.*]] ]			; CHECK-NEXT: [[I_0:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[ADD2:%.*]], [[FOR_INC:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[I_0]], [[SIZE:%.*]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[I_0]], [[SIZE:%.*]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[OUT:%.*]], i64 [[I_0]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, ptr [[OUT:%.*]], i64 [[I_0]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[I_0]], [[SIZE]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[OUT]], i64 [[I_0]]
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, ptr [[OUT]], i64 [[ADD]]			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr i8, ptr [[TMP0]], i64 [[SIZE]]
	; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr [[ARRAYIDX1]], align 1, !llvm.access.group [[ACC_GRP0:![0-9]+]]			; CHECK-NEXT: [[TMP1:%.*]] = load i16, ptr [[ARRAYIDX1]], align 1, !llvm.access.group [[ACC_GRP0:![0-9]+]]
	; CHECK-NEXT: store i16 [[TMP0]], ptr [[ARRAYIDX]], align 1, !llvm.access.group [[ACC_GRP0]]			; CHECK-NEXT: store i16 [[TMP1]], ptr [[ARRAYIDX]], align 1, !llvm.access.group [[ACC_GRP0]]
	; CHECK-NEXT: br label [[FOR_INC]]			; CHECK-NEXT: br label [[FOR_INC]]
	; CHECK: for.inc:			; CHECK: for.inc:
	; CHECK-NEXT: [[ADD2]] = add nuw nsw i64 [[I_0]], 2			; CHECK-NEXT: [[ADD2]] = add nuw nsw i64 [[I_0]], 2
	; CHECK-NEXT: br label [[FOR_COND]], !llvm.loop [[LOOP1:![0-9]+]]			; CHECK-NEXT: br label [[FOR_COND]], !llvm.loop [[LOOP1:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:

llvm/test/Transforms/InstCombine/memrchr-4.ll

	; on the assumption that N is in bounds.			; on the assumption that N is in bounds.

	define ptr @fold_memrchr_a11111_c_n(i32 %C, i64 %N) {			define ptr @fold_memrchr_a11111_c_n(i32 %C, i64 %N) {
	; CHECK-LABEL: @fold_memrchr_a11111_c_n(			; CHECK-LABEL: @fold_memrchr_a11111_c_n(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[N:%.*]], 0			; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[N:%.*]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = trunc i32 [[C:%.*]] to i8			; CHECK-NEXT: [[TMP2:%.*]] = trunc i32 [[C:%.*]] to i8
	; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i8 [[TMP2]], 1			; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i8 [[TMP2]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP1]], i1 [[TMP3]], i1 false			; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP1]], i1 [[TMP3]], i1 false
	; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[N]], -1			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr @a11111, i64 [[N]]
	; CHECK-NEXT: [[MEMRCHR_PTR_PLUS:%.*]] = getelementptr inbounds i8, ptr @a11111, i64 [[TMP5]]			; CHECK-NEXT: [[MEMRCHR_PTR_PLUS:%.*]] = getelementptr i8, ptr [[TMP5]], i64 -1
	; CHECK-NEXT: [[MEMRCHR_SEL:%.*]] = select i1 [[TMP4]], ptr [[MEMRCHR_PTR_PLUS]], ptr null			; CHECK-NEXT: [[MEMRCHR_SEL:%.*]] = select i1 [[TMP4]], ptr [[MEMRCHR_PTR_PLUS]], ptr null
	; CHECK-NEXT: ret ptr [[MEMRCHR_SEL]]			; CHECK-NEXT: ret ptr [[MEMRCHR_SEL]]
	;			;

	%ret = call ptr @memrchr(ptr @a11111, i32 %C, i64 %N)			%ret = call ptr @memrchr(ptr @a11111, i32 %C, i64 %N)
	ret ptr %ret			ret ptr %ret
	}			}
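Across these updated tests the change takes one shape: `getelementptr T, ptr %p, i64 (add %x, %y)` becomes two chained GEPs, `getelementptr T, ptr (getelementptr T, ptr %p, i64 %x), i64 %y`, and in the visible hunks the rewritten GEPs no longer carry `inbounds`. The Python sketch below is a hypothetical model, not LLVM code: it treats pointers as plain 64-bit integers, ignores provenance, and assumes wrapping (non-`inbounds`) GEP arithmetic. All function names in it are illustrative. It checks that both forms compute the same address, and shows why `inbounds` cannot simply be copied over: the intermediate pointer `gep(p, x)` may leave the object even when the final pointer does not.

```python
MASK64 = (1 << 64) - 1  # model 64-bit pointers and indices with wraparound


def gep(base: int, index: int, elem_size: int) -> int:
    """Hypothetical model of a non-inbounds GEP: base + index * elem_size, mod 2**64."""
    return (base + index * elem_size) & MASK64


def add_i64(x: int, y: int) -> int:
    """Wrapping 64-bit add (no nsw/nuw flags)."""
    return (x + y) & MASK64


def gep_of_add(base: int, x: int, y: int, elem_size: int) -> int:
    """Original form: gep(p, add(x, y))."""
    return gep(base, add_i64(x, y), elem_size)


def gep_of_gep(base: int, x: int, y: int, elem_size: int) -> int:
    """Reassociated form: gep(gep(p, x), y)."""
    return gep(gep(base, x, elem_size), y, elem_size)


def in_bounds(obj_base: int, obj_size: int, ptr: int) -> bool:
    """Simplified inbounds check: ptr lies within the object or one past its end."""
    return obj_base <= ptr <= obj_base + obj_size


if __name__ == "__main__":
    # memrchr-style case: i8 elements, index N + (-1) with N = 5.
    assert gep_of_add(0x1000, 5, -1, 1) == gep_of_gep(0x1000, 5, -1, 1)
    # Wraparound: the two forms still agree modulo 2**64.
    assert gep_of_add(0x10, 2**63, 2**63, 2) == gep_of_gep(0x10, 2**63, 2**63, 2)

    # A 4 x i32 object at 0x2000 (16 bytes); x = 6, y = -3 ends at element 3.
    base, size = 0x2000, 16
    final = gep_of_gep(base, 6, -3, 4)
    intermediate = gep(base, 6, 4)
    print(in_bounds(base, size, final), in_bounds(base, size, intermediate))
```

Running it prints `True False`: the final address is inside the object, but the intermediate one is past its end, so an `inbounds` flag on the inner GEP would make it poison even though the original single GEP was valid.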


llvm/test/Transforms/InstCombine/shift.ll

	; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=26135			; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=26135
	define void @ashr_out_of_range_1(ptr %A) {			define void @ashr_out_of_range_1(ptr %A) {
	; CHECK-LABEL: @ashr_out_of_range_1(			; CHECK-LABEL: @ashr_out_of_range_1(
	; CHECK-NEXT: [[L:%.*]] = load i177, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[L:%.*]] = load i177, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[L_FROZEN:%.*]] = freeze i177 [[L]]			; CHECK-NEXT: [[L_FROZEN:%.*]] = freeze i177 [[L]]
	; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i177 [[L_FROZEN]], -1			; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i177 [[L_FROZEN]], -1
	; CHECK-NEXT: [[B:%.*]] = select i1 [[TMP1]], i177 0, i177 [[L_FROZEN]]			; CHECK-NEXT: [[B:%.*]] = select i1 [[TMP1]], i177 0, i177 [[L_FROZEN]]
	; CHECK-NEXT: [[TMP2:%.*]] = trunc i177 [[B]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = trunc i177 [[B]] to i64
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], -1			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP2]]
	; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP3]]			; CHECK-NEXT: [[G11:%.*]] = getelementptr i177, ptr [[TMP3]], i64 -1
	; CHECK-NEXT: [[C17:%.*]] = icmp sgt i177 [[B]], [[L_FROZEN]]			; CHECK-NEXT: [[C17:%.*]] = icmp sgt i177 [[B]], [[L_FROZEN]]
	; CHECK-NEXT: [[TMP4:%.*]] = sext i1 [[C17]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = sext i1 [[C17]] to i64
	; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP4]]			; CHECK-NEXT: [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i177 [[L_FROZEN]], -1			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i177 [[L_FROZEN]], -1
	; CHECK-NEXT: [[B28:%.*]] = select i1 [[TMP5]], i177 0, i177 [[L_FROZEN]]			; CHECK-NEXT: [[B28:%.*]] = select i1 [[TMP5]], i177 0, i177 [[L_FROZEN]]
	; CHECK-NEXT: store i177 [[B28]], ptr [[G62]], align 4			; CHECK-NEXT: store i177 [[B28]], ptr [[G62]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;

llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll

	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 3, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 3, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[I_MINUS_1:%.*]] = add i64 [[I]], -1
	; CHECK-NEXT: [[I_MINUS_3:%.*]] = add i64 [[I]], -3
	; CHECK-NEXT: [[A_I:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I]]			; CHECK-NEXT: [[A_I:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I]]
	; CHECK-NEXT: [[A_I_MINUS_1:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_MINUS_1]]			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr i32, ptr [[A]], i64 [[I]]
	; CHECK-NEXT: [[A_I_MINUS_3:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_MINUS_3]]			; CHECK-NEXT: [[A_I_MINUS_1:%.*]] = getelementptr i32, ptr [[TMP19]], i64 -1
				; CHECK-NEXT: [[TMP20:%.*]] = getelementptr i32, ptr [[A]], i64 [[I]]
				; CHECK-NEXT: [[A_I_MINUS_3:%.*]] = getelementptr i32, ptr [[TMP20]], i64 -3
	; CHECK-NEXT: store i32 [[X]], ptr [[A_I_MINUS_1]], align 4			; CHECK-NEXT: store i32 [[X]], ptr [[A_I_MINUS_1]], align 4
	; CHECK-NEXT: store i32 [[Y]], ptr [[A_I_MINUS_3]], align 4			; CHECK-NEXT: store i32 [[Y]], ptr [[A_I_MINUS_3]], align 4
	; CHECK-NEXT: store i32 [[Z]], ptr [[A_I]], align 4			; CHECK-NEXT: store i32 [[Z]], ptr [[A_I]], align 4
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP33:![0-9]+]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP33:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[TMP0]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[TMP0]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 2			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 2
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[N]], 1			; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[N]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], -4			; CHECK-NEXT: [[TMP5:%.*]] = and i64 [[TMP4]], -4
	; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[TMP5]], 4			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[TMP5]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[TMP6]]			; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[TMP6]], i64 4
	; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 2			; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 2
	; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[TMP5]], 6			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]
	; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP7]]			; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[TMP7]], i64 6
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt ptr [[SCEVGEP2]], [[B]]			; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt ptr [[SCEVGEP2]], [[B]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP1]], [[SCEVGEP]]			; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP1]], [[SCEVGEP]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i64 [[TMP8]], -4			; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i64 [[TMP8]], -4
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP1]], [[DOTNEG]]			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP1]], [[DOTNEG]]
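Both hunks above show the same rewrite: a GEP whose index is an `add` (for example `[[A]]` indexed by `i + -1` in `for.body`, or `[[B]]` indexed by `TMP5 + 4` in `vector.memcheck`) becomes a GEP of a GEP, and where the original GEP was `inbounds`, the new GEPs are not. The Python sketch below is a simplified model, not LLVM code: it assumes a 64-bit target, treats a non-`inbounds` GEP as wrapping byte arithmetic, and uses hypothetical helper names (`gep`, `gep_of_add`, `split_gep`, `in_bounds`). It checks that both forms compute the same address and illustrates one plausible reason `inbounds` cannot simply be copied over: the intermediate pointer can leave the object even when the final one does not.

```python
# Simplified model of LLVM GEP address arithmetic on a 64-bit target.
# All helper names are hypothetical; this is not LLVM's implementation.
MASK64 = (1 << 64) - 1


def gep(base: int, index: int, elem_size: int) -> int:
    """Address of a non-inbounds GEP: base + index * elem_size, wrapping mod 2**64."""
    return (base + index * elem_size) & MASK64


def gep_of_add(base: int, x: int, y: int, elem_size: int) -> int:
    """Original form: getelementptr T, ptr base, i64 (add x, y); the add wraps too."""
    return gep(base, (x + y) & MASK64, elem_size)


def split_gep(base: int, x: int, y: int, elem_size: int) -> int:
    """Rewritten form: getelementptr T, ptr (getelementptr T, ptr base, i64 x), i64 y."""
    return gep(gep(base, x, elem_size), y, elem_size)


def in_bounds(addr: int, obj_start: int, obj_size: int) -> bool:
    """inbounds permits any address inside the object or one past its end."""
    return obj_start <= addr <= obj_start + obj_size


# A 4 x i32 object (16 bytes) at a made-up address.
A = 0x1000
x, y = 5, -3
final = gep_of_add(A, x, y, 4)
assert final == split_gep(A, x, y, 4)      # same address either way (A + 8)
assert in_bounds(final, A, 16)             # index 2: the original GEP is in bounds
assert not in_bounds(gep(A, x, 4), A, 16)  # index 5: the intermediate GEP is not
```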

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[INDEX]], 3
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[C]], i64 [[TMP4]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[C]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 5			; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP5]], 5
	; CHECK-NEXT: [[TMP7:%.*]] = shl i64 [[INDEX]], 3			; CHECK-NEXT: [[TMP7:%.*]] = shl i64 [[INDEX]], 3
	; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i8, ptr [[C]], i64 [[TMP6]]
	; CHECK-NEXT: [[NEXT_GEP2:%.*]] = getelementptr i8, ptr [[C]], i64 [[TMP8]]			; CHECK-NEXT: [[NEXT_GEP2:%.*]] = getelementptr i8, ptr [[TMP8]], i64 [[TMP7]]
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[NEXT_GEP]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[NEXT_GEP]], align 4
	; CHECK-NEXT: [[WIDE_VEC3:%.*]] = load <vscale x 8 x i32>, ptr [[NEXT_GEP2]], align 4			; CHECK-NEXT: [[WIDE_VEC3:%.*]] = load <vscale x 8 x i32>, ptr [[NEXT_GEP2]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP9:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP9:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP10:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1			; CHECK-NEXT: [[TMP10:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC3]])			; CHECK-NEXT: [[STRIDED_VEC4:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC3]])
	; CHECK-NEXT: [[TMP11:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC4]], 0			; CHECK-NEXT: [[TMP11:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC4]], 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC4]], 1			; CHECK-NEXT: [[TMP12:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC4]], 1

llvm/test/Transforms/LoopVectorize/induction.ll

	; IND-LABEL: @scalar_use(			; IND-LABEL: @scalar_use(
	; IND-NEXT: entry:			; IND-NEXT: entry:
	; IND-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 2			; IND-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 2
	; IND-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; IND-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; IND: vector.memcheck:			; IND: vector.memcheck:
	; IND-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2			; IND-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2
	; IND-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]			; IND-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]
	; IND-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2			; IND-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2
	; IND-NEXT: [[TMP2:%.*]] = add i64 [[TMP1]], [[TMP0]]			; IND-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; IND-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]			; IND-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP2]], i64 [[TMP0]]
	; IND-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2			; IND-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2
	; IND-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]			; IND-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]
	; IND-NEXT: [[TMP4:%.*]] = add i64 [[TMP1]], [[TMP3]]			; IND-NEXT: [[TMP4:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; IND-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP4]]			; IND-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[TMP4]], i64 [[TMP3]]
	; IND-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]			; IND-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
	; IND-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]			; IND-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
	; IND-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; IND-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; IND-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; IND-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; IND: vector.ph:			; IND: vector.ph:
	; IND-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -2			; IND-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -2
	; IND-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x float> poison, float [[B:%.*]], i64 0			; IND-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x float> poison, float [[B:%.*]], i64 0
	; IND-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT]], <2 x float> poison, <2 x i32> zeroinitializer			; IND-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT]], <2 x float> poison, <2 x i32> zeroinitializer
	; IND-NEXT: br label [[VECTOR_BODY:%.*]]			; IND-NEXT: br label [[VECTOR_BODY:%.*]]
	; IND: vector.body:			; IND: vector.body:
	; IND-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; IND-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; IND-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], [[OFFSET]]			; IND-NEXT: [[TMP5:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; IND-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP5]]			; IND-NEXT: [[TMP6:%.*]] = getelementptr float, ptr [[TMP5]], i64 [[OFFSET]]
	; IND-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7			; IND-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7
	; IND-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], [[OFFSET2]]			; IND-NEXT: [[TMP7:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; IND-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP7]]			; IND-NEXT: [[TMP8:%.*]] = getelementptr float, ptr [[TMP7]], i64 [[OFFSET2]]
	; IND-NEXT: [[WIDE_LOAD4:%.*]] = load <2 x float>, ptr [[TMP8]], align 4, !alias.scope !7			; IND-NEXT: [[WIDE_LOAD4:%.*]] = load <2 x float>, ptr [[TMP8]], align 4, !alias.scope !7
	; IND-NEXT: [[TMP9:%.*]] = fmul fast <2 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD4]]			; IND-NEXT: [[TMP9:%.*]] = fmul fast <2 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD4]]
	; IND-NEXT: [[TMP10:%.*]] = fadd fast <2 x float> [[WIDE_LOAD]], [[TMP9]]			; IND-NEXT: [[TMP10:%.*]] = fadd fast <2 x float> [[WIDE_LOAD]], [[TMP9]]
	; IND-NEXT: store <2 x float> [[TMP10]], ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7			; IND-NEXT: store <2 x float> [[TMP10]], ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7
	; IND-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; IND-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; IND-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; IND-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; IND-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]			; IND-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
	; IND: middle.block:			; IND: middle.block:
	; IND-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; IND-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; IND-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; IND-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; IND: scalar.ph:			; IND: scalar.ph:
	; IND-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; IND-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; IND-NEXT: br label [[FOR_BODY:%.*]]			; IND-NEXT: br label [[FOR_BODY:%.*]]
	; IND: for.body:			; IND: for.body:
	; IND-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; IND-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; IND-NEXT: [[IND_SUM:%.*]] = add i64 [[IV]], [[OFFSET]]			; IND-NEXT: [[TMP12:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; IND-NEXT: [[ARR_IDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM]]			; IND-NEXT: [[ARR_IDX:%.*]] = getelementptr float, ptr [[TMP12]], i64 [[OFFSET]]
	; IND-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4			; IND-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4
	; IND-NEXT: [[IND_SUM2:%.*]] = add i64 [[IV]], [[OFFSET2]]			; IND-NEXT: [[TMP13:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; IND-NEXT: [[ARR_IDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM2]]			; IND-NEXT: [[ARR_IDX2:%.*]] = getelementptr float, ptr [[TMP13]], i64 [[OFFSET2]]
	; IND-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4			; IND-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4
	; IND-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]			; IND-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]
	; IND-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]			; IND-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]
	; IND-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4			; IND-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4
	; IND-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; IND-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; IND-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]			; IND-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; IND-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; IND-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; IND: loopexit:			; IND: loopexit:
	; IND-NEXT: ret void			; IND-NEXT: ret void
	;			;
	; UNROLL-LABEL: @scalar_use(			; UNROLL-LABEL: @scalar_use(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL: vector.memcheck:			; UNROLL: vector.memcheck:
	; UNROLL-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2			; UNROLL-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2
	; UNROLL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]			; UNROLL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]
	; UNROLL-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2			; UNROLL-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2
	; UNROLL-NEXT: [[TMP2:%.*]] = add i64 [[TMP1]], [[TMP0]]			; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; UNROLL-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]			; UNROLL-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP2]], i64 [[TMP0]]
	; UNROLL-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2			; UNROLL-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2
	; UNROLL-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]			; UNROLL-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]
	; UNROLL-NEXT: [[TMP4:%.*]] = add i64 [[TMP1]], [[TMP3]]			; UNROLL-NEXT: [[TMP4:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; UNROLL-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP4]]			; UNROLL-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[TMP4]], i64 [[TMP3]]
	; UNROLL-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]			; UNROLL-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
	; UNROLL-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]			; UNROLL-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
	; UNROLL-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; UNROLL-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; UNROLL-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL: vector.ph:			; UNROLL: vector.ph:
	; UNROLL-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4			; UNROLL-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4
	; UNROLL-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x float> poison, float [[B:%.*]], i64 0			; UNROLL-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x float> poison, float [[B:%.*]], i64 0
	; UNROLL-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT]], <2 x float> poison, <2 x i32> zeroinitializer			; UNROLL-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT]], <2 x float> poison, <2 x i32> zeroinitializer
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; UNROLL-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], [[OFFSET]]			; UNROLL-NEXT: [[TMP5:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP5]]			; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr float, ptr [[TMP5]], i64 [[OFFSET]]
	; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7			; UNROLL-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7
	; UNROLL-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[TMP6]], i64 2			; UNROLL-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[TMP6]], i64 2
	; UNROLL-NEXT: [[WIDE_LOAD4:%.*]] = load <2 x float>, ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7			; UNROLL-NEXT: [[WIDE_LOAD4:%.*]] = load <2 x float>, ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7
	; UNROLL-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], [[OFFSET2]]			; UNROLL-NEXT: [[TMP8:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; UNROLL-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP8]]			; UNROLL-NEXT: [[TMP9:%.*]] = getelementptr float, ptr [[TMP8]], i64 [[OFFSET2]]
	; UNROLL-NEXT: [[WIDE_LOAD5:%.*]] = load <2 x float>, ptr [[TMP9]], align 4, !alias.scope !7			; UNROLL-NEXT: [[WIDE_LOAD5:%.*]] = load <2 x float>, ptr [[TMP9]], align 4, !alias.scope !7
	; UNROLL-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i64 2			; UNROLL-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i64 2
	; UNROLL-NEXT: [[WIDE_LOAD6:%.*]] = load <2 x float>, ptr [[TMP10]], align 4, !alias.scope !7			; UNROLL-NEXT: [[WIDE_LOAD6:%.*]] = load <2 x float>, ptr [[TMP10]], align 4, !alias.scope !7
	; UNROLL-NEXT: [[TMP11:%.*]] = fmul fast <2 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD5]]			; UNROLL-NEXT: [[TMP11:%.*]] = fmul fast <2 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD5]]
	; UNROLL-NEXT: [[TMP12:%.*]] = fmul fast <2 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD6]]			; UNROLL-NEXT: [[TMP12:%.*]] = fmul fast <2 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD6]]
	; UNROLL-NEXT: [[TMP13:%.*]] = fadd fast <2 x float> [[WIDE_LOAD]], [[TMP11]]			; UNROLL-NEXT: [[TMP13:%.*]] = fadd fast <2 x float> [[WIDE_LOAD]], [[TMP11]]
	; UNROLL-NEXT: [[TMP14:%.*]] = fadd fast <2 x float> [[WIDE_LOAD4]], [[TMP12]]			; UNROLL-NEXT: [[TMP14:%.*]] = fadd fast <2 x float> [[WIDE_LOAD4]], [[TMP12]]
	; UNROLL-NEXT: store <2 x float> [[TMP13]], ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7			; UNROLL-NEXT: store <2 x float> [[TMP13]], ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7
	; UNROLL-NEXT: store <2 x float> [[TMP14]], ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7			; UNROLL-NEXT: store <2 x float> [[TMP14]], ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; UNROLL-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; UNROLL: scalar.ph:			; UNROLL: scalar.ph:
	; UNROLL-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; UNROLL-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; UNROLL-NEXT: br label [[FOR_BODY:%.*]]			; UNROLL-NEXT: br label [[FOR_BODY:%.*]]
	; UNROLL: for.body:			; UNROLL: for.body:
	; UNROLL-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; UNROLL-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; UNROLL-NEXT: [[IND_SUM:%.*]] = add i64 [[IV]], [[OFFSET]]			; UNROLL-NEXT: [[TMP16:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; UNROLL-NEXT: [[ARR_IDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM]]			; UNROLL-NEXT: [[ARR_IDX:%.*]] = getelementptr float, ptr [[TMP16]], i64 [[OFFSET]]
	; UNROLL-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4			; UNROLL-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4
	; UNROLL-NEXT: [[IND_SUM2:%.*]] = add i64 [[IV]], [[OFFSET2]]			; UNROLL-NEXT: [[TMP17:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; UNROLL-NEXT: [[ARR_IDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM2]]			; UNROLL-NEXT: [[ARR_IDX2:%.*]] = getelementptr float, ptr [[TMP17]], i64 [[OFFSET2]]
	; UNROLL-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4			; UNROLL-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4
	; UNROLL-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]			; UNROLL-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]
	; UNROLL-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]			; UNROLL-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]
	; UNROLL-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4			; UNROLL-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4
	; UNROLL-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; UNROLL-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]			; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; UNROLL-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; UNROLL-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; UNROLL: loopexit:			; UNROLL: loopexit:
	; INTERLEAVE-LABEL: @scalar_use(			; INTERLEAVE-LABEL: @scalar_use(
	; INTERLEAVE-NEXT: entry:			; INTERLEAVE-NEXT: entry:
	; INTERLEAVE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 8			; INTERLEAVE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 8
	; INTERLEAVE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; INTERLEAVE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; INTERLEAVE: vector.memcheck:			; INTERLEAVE: vector.memcheck:
	; INTERLEAVE-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2			; INTERLEAVE-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2
	; INTERLEAVE-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]			; INTERLEAVE-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]
	; INTERLEAVE-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2			; INTERLEAVE-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2
	; INTERLEAVE-NEXT: [[TMP2:%.*]] = add i64 [[TMP1]], [[TMP0]]			; INTERLEAVE-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; INTERLEAVE-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]			; INTERLEAVE-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP2]], i64 [[TMP0]]
	; INTERLEAVE-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2			; INTERLEAVE-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2
	; INTERLEAVE-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]			; INTERLEAVE-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]
	; INTERLEAVE-NEXT: [[TMP4:%.*]] = add i64 [[TMP1]], [[TMP3]]			; INTERLEAVE-NEXT: [[TMP4:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; INTERLEAVE-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP4]]			; INTERLEAVE-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[TMP4]], i64 [[TMP3]]
	; INTERLEAVE-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]			; INTERLEAVE-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
	; INTERLEAVE-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]			; INTERLEAVE-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
	; INTERLEAVE-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; INTERLEAVE-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; INTERLEAVE-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; INTERLEAVE-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; INTERLEAVE: vector.ph:			; INTERLEAVE: vector.ph:
	; INTERLEAVE-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -8			; INTERLEAVE-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -8
	; INTERLEAVE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0			; INTERLEAVE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
	; INTERLEAVE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer			; INTERLEAVE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
	; INTERLEAVE-NEXT: br label [[VECTOR_BODY:%.*]]			; INTERLEAVE-NEXT: br label [[VECTOR_BODY:%.*]]
	; INTERLEAVE: vector.body:			; INTERLEAVE: vector.body:
	; INTERLEAVE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; INTERLEAVE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; INTERLEAVE-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], [[OFFSET]]			; INTERLEAVE-NEXT: [[TMP5:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; INTERLEAVE-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP5]]			; INTERLEAVE-NEXT: [[TMP6:%.*]] = getelementptr float, ptr [[TMP5]], i64 [[OFFSET]]
	; INTERLEAVE-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7			; INTERLEAVE-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7
	; INTERLEAVE-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[TMP6]], i64 4			; INTERLEAVE-NEXT: [[TMP7:%.*]] = getelementptr inbounds float, ptr [[TMP6]], i64 4
	; INTERLEAVE-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x float>, ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7			; INTERLEAVE-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x float>, ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7
	; INTERLEAVE-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], [[OFFSET2]]			; INTERLEAVE-NEXT: [[TMP8:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; INTERLEAVE-NEXT: [[TMP9:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP8]]			; INTERLEAVE-NEXT: [[TMP9:%.*]] = getelementptr float, ptr [[TMP8]], i64 [[OFFSET2]]
	; INTERLEAVE-NEXT: [[WIDE_LOAD5:%.*]] = load <4 x float>, ptr [[TMP9]], align 4, !alias.scope !7			; INTERLEAVE-NEXT: [[WIDE_LOAD5:%.*]] = load <4 x float>, ptr [[TMP9]], align 4, !alias.scope !7
	; INTERLEAVE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i64 4			; INTERLEAVE-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, ptr [[TMP9]], i64 4
	; INTERLEAVE-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x float>, ptr [[TMP10]], align 4, !alias.scope !7			; INTERLEAVE-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x float>, ptr [[TMP10]], align 4, !alias.scope !7
	; INTERLEAVE-NEXT: [[TMP11:%.*]] = fmul fast <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD5]]			; INTERLEAVE-NEXT: [[TMP11:%.*]] = fmul fast <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD5]]
	; INTERLEAVE-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD6]]			; INTERLEAVE-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD6]]
	; INTERLEAVE-NEXT: [[TMP13:%.*]] = fadd fast <4 x float> [[WIDE_LOAD]], [[TMP11]]			; INTERLEAVE-NEXT: [[TMP13:%.*]] = fadd fast <4 x float> [[WIDE_LOAD]], [[TMP11]]
	; INTERLEAVE-NEXT: [[TMP14:%.*]] = fadd fast <4 x float> [[WIDE_LOAD4]], [[TMP12]]			; INTERLEAVE-NEXT: [[TMP14:%.*]] = fadd fast <4 x float> [[WIDE_LOAD4]], [[TMP12]]
	; INTERLEAVE-NEXT: store <4 x float> [[TMP13]], ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7			; INTERLEAVE-NEXT: store <4 x float> [[TMP13]], ptr [[TMP6]], align 4, !alias.scope !4, !noalias !7
	; INTERLEAVE-NEXT: store <4 x float> [[TMP14]], ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7			; INTERLEAVE-NEXT: store <4 x float> [[TMP14]], ptr [[TMP7]], align 4, !alias.scope !4, !noalias !7
	; INTERLEAVE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8			; INTERLEAVE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
	; INTERLEAVE-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; INTERLEAVE-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; INTERLEAVE-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]			; INTERLEAVE-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
	; INTERLEAVE: middle.block:			; INTERLEAVE: middle.block:
	; INTERLEAVE-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; INTERLEAVE-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; INTERLEAVE-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; INTERLEAVE-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; INTERLEAVE: scalar.ph:			; INTERLEAVE: scalar.ph:
	; INTERLEAVE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; INTERLEAVE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; INTERLEAVE-NEXT: br label [[FOR_BODY:%.*]]			; INTERLEAVE-NEXT: br label [[FOR_BODY:%.*]]
	; INTERLEAVE: for.body:			; INTERLEAVE: for.body:
	; INTERLEAVE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; INTERLEAVE-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; INTERLEAVE-NEXT: [[IND_SUM:%.*]] = add i64 [[IV]], [[OFFSET]]			; INTERLEAVE-NEXT: [[TMP16:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; INTERLEAVE-NEXT: [[ARR_IDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM]]			; INTERLEAVE-NEXT: [[ARR_IDX:%.*]] = getelementptr float, ptr [[TMP16]], i64 [[OFFSET]]
	; INTERLEAVE-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4			; INTERLEAVE-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4
	; INTERLEAVE-NEXT: [[IND_SUM2:%.*]] = add i64 [[IV]], [[OFFSET2]]			; INTERLEAVE-NEXT: [[TMP17:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; INTERLEAVE-NEXT: [[ARR_IDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM2]]			; INTERLEAVE-NEXT: [[ARR_IDX2:%.*]] = getelementptr float, ptr [[TMP17]], i64 [[OFFSET2]]
	; INTERLEAVE-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4			; INTERLEAVE-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4
	; INTERLEAVE-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]			; INTERLEAVE-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]
	; INTERLEAVE-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]			; INTERLEAVE-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]
	; INTERLEAVE-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4			; INTERLEAVE-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4
	; INTERLEAVE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; INTERLEAVE-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; INTERLEAVE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]			; INTERLEAVE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; INTERLEAVE-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; INTERLEAVE-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; INTERLEAVE: loopexit:			; INTERLEAVE: loopexit:
	; IND-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; IND-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; IND-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp eq i32 [[TMP0]], 0			; IND-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp eq i32 [[TMP0]], 0
	; IND-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; IND-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; IND: vector.memcheck:			; IND: vector.memcheck:
	; IND-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 4			; IND-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 4
	; IND-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; IND-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1
	; IND-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; IND-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
	; IND-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3			; IND-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; IND-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 8			; IND-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[P]], i64 [[TMP5]]
	; IND-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[P]], i64 [[TMP6]]			; IND-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP6]], i64 8
	; IND-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP4]], 4			; IND-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP4]], 4
	; IND-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 4			; IND-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 4
	; IND-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP8]]			; IND-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP8]]
	; IND-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]			; IND-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]
	; IND-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[A]]			; IND-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[A]]
	; IND-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; IND-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; IND-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; IND-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; IND: vector.ph:			; IND: vector.ph:
	; UNROLL-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; UNROLL-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 3			; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 3
	; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; UNROLL: vector.memcheck:			; UNROLL: vector.memcheck:
	; UNROLL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 4			; UNROLL-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 4
	; UNROLL-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; UNROLL-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1
	; UNROLL-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; UNROLL-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
	; UNROLL-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3			; UNROLL-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; UNROLL-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 8			; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[P]], i64 [[TMP5]]
	; UNROLL-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[P]], i64 [[TMP6]]			; UNROLL-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP6]], i64 8
	; UNROLL-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP4]], 4			; UNROLL-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP4]], 4
	; UNROLL-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 4			; UNROLL-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 4
	; UNROLL-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP8]]			; UNROLL-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP8]]
	; UNROLL-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]			; UNROLL-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]
	; UNROLL-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[A]]			; UNROLL-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[A]]
	; UNROLL-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; UNROLL-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; UNROLL-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; UNROLL-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; UNROLL: vector.ph:			; UNROLL: vector.ph:
	; INTERLEAVE-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1			; INTERLEAVE-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
	; INTERLEAVE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 8			; INTERLEAVE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 8
	; INTERLEAVE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; INTERLEAVE-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; INTERLEAVE: vector.memcheck:			; INTERLEAVE: vector.memcheck:
	; INTERLEAVE-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 4			; INTERLEAVE-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P:%.*]], i64 4
	; INTERLEAVE-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1			; INTERLEAVE-NEXT: [[TMP3:%.*]] = add i32 [[N]], -1
	; INTERLEAVE-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64			; INTERLEAVE-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
	; INTERLEAVE-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3			; INTERLEAVE-NEXT: [[TMP5:%.*]] = shl nuw nsw i64 [[TMP4]], 3
	; INTERLEAVE-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 8			; INTERLEAVE-NEXT: [[TMP6:%.*]] = getelementptr i8, ptr [[P]], i64 [[TMP5]]
	; INTERLEAVE-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[P]], i64 [[TMP6]]			; INTERLEAVE-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP6]], i64 8
	; INTERLEAVE-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP4]], 4			; INTERLEAVE-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP4]], 4
	; INTERLEAVE-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 4			; INTERLEAVE-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], 4
	; INTERLEAVE-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP8]]			; INTERLEAVE-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP8]]
	; INTERLEAVE-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]			; INTERLEAVE-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP2]]
	; INTERLEAVE-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[A]]			; INTERLEAVE-NEXT: [[BOUND1:%.*]] = icmp ugt ptr [[SCEVGEP1]], [[A]]
	; INTERLEAVE-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; INTERLEAVE-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; INTERLEAVE-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; INTERLEAVE-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; INTERLEAVE: vector.ph:			; INTERLEAVE: vector.ph:
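The updated CHECK lines follow a single pattern. A one-use `add` that feeds a GEP index is replaced by two chained GEPs, so `gep(p, add(x, y))` becomes `gep(gep(p, x), y)`, and the split GEPs no longer carry `inbounds`. The sketch below is a hypothetical Python model, not LLVM code, and all of its function names are made up for illustration. It treats a non-`inbounds` GEP as plain 64-bit wrapping address arithmetic (`base + index * elem_size`) and checks that the split form computes the same address. It deliberately ignores `inbounds`/poison semantics. Those semantics are presumably why the flag is dropped: `inbounds` on the original GEP constrains only the final address, not the intermediate `gep(p, x)`.

```python
# Hypothetical model: pointers and i64 indices as integers modulo 2**64.
# Not LLVM's API; inbounds/poison semantics are intentionally not modelled.
MASK64 = (1 << 64) - 1


def gep(base: int, index: int, elem_size: int) -> int:
    """Address of a non-inbounds GEP: base + index * elem_size, wrapping at 2**64."""
    return (base + index * elem_size) & MASK64


def add_i64(x: int, y: int) -> int:
    """A plain i64 add (no nuw/nsw), which also wraps at 2**64."""
    return (x + y) & MASK64


def gep_of_add(p: int, x: int, y: int, elem_size: int) -> int:
    """Old form:  %s = add i64 %x, %y ; getelementptr T, ptr %p, i64 %s"""
    return gep(p, add_i64(x, y), elem_size)


def split_gep(p: int, x: int, y: int, elem_size: int) -> int:
    """New form:  %t = getelementptr T, ptr %p, i64 %x ; getelementptr T, ptr %t, i64 %y"""
    return gep(gep(p, x, elem_size), y, elem_size)


if __name__ == "__main__":
    # float elements (4 bytes), mirroring a[iv + offset] in the INTERLEAVE hunk.
    p, iv, offset = 0x1000, 7, -3
    print(hex(gep_of_add(p, iv, offset, 4)), hex(split_gep(p, iv, offset, 4)))
```

With `elem_size=1` the same model covers the `i8` GEPs in the `vector.memcheck` hunks. There, `add nuw nsw i64 %x, 8` feeding a GEP became a GEP by `%x` followed by a GEP by `8`.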

llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 3, i64 5, i64 7, i64 9>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 3, i64 5, i64 7, i64 9>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP4:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[TMP4]], 3			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = or i64 [[TMP4]], 3
	; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP4]], 5			; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[TMP4]], 5
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP4]], 7			; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP4]], 7
	; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[TMP4]], 9			; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i64> [[VEC_IND]], <i64 -1, i64 -1, i64 -1, i64 -1>
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i64> [[VEC_IND]], <i64 -1, i64 -1, i64 -1, i64 -1>			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i64> [[VEC_IND]], <i64 -3, i64 -3, i64 -3, i64 -3>
	; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i64> [[VEC_IND]], <i64 -3, i64 -3, i64 -3, i64 -3>			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, ptr [[A]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP7]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, ptr [[TMP12]], i64 9
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i64> [[TMP8]], i64 0			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i64> [[TMP7]], i64 0
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP14]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i64> [[TMP8]], i64 1			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i64> [[TMP7]], i64 1
	; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP16]]			; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP16]]
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i64> [[TMP8]], i64 2			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x i64> [[TMP7]], i64 2
	; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP18]]			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP18]]
	; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x i64> [[TMP8]], i64 3			; CHECK-NEXT: [[TMP20:%.*]] = extractelement <4 x i64> [[TMP7]], i64 3
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP20]]			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP20]]
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i64> [[TMP9]], i64 0			; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i64> [[TMP8]], i64 0
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP22]]			; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP22]]
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i64> [[TMP9]], i64 1			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i64> [[TMP8]], i64 1
	; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP24]]			; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP24]]
	; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i64> [[TMP9]], i64 2			; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i64> [[TMP8]], i64 2
	; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP26]]			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP26]]
	; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i64> [[TMP9]], i64 3			; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i64> [[TMP8]], i64 3
	; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP28]]			; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP28]]
	; CHECK-NEXT: store i32 [[X:%.*]], ptr [[TMP15]], align 4			; CHECK-NEXT: store i32 [[X:%.*]], ptr [[TMP15]], align 4
	; CHECK-NEXT: store i32 [[X]], ptr [[TMP17]], align 4			; CHECK-NEXT: store i32 [[X]], ptr [[TMP17]], align 4
	; CHECK-NEXT: store i32 [[X]], ptr [[TMP19]], align 4			; CHECK-NEXT: store i32 [[X]], ptr [[TMP19]], align 4
	; CHECK-NEXT: store i32 [[X]], ptr [[TMP21]], align 4			; CHECK-NEXT: store i32 [[X]], ptr [[TMP21]], align 4
	; CHECK-NEXT: store i32 [[Y:%.*]], ptr [[TMP23]], align 4			; CHECK-NEXT: store i32 [[Y:%.*]], ptr [[TMP23]], align 4
	; CHECK-NEXT: store i32 [[Y]], ptr [[TMP25]], align 4			; CHECK-NEXT: store i32 [[Y]], ptr [[TMP25]], align 4
	; CHECK-NEXT: store i32 [[Y]], ptr [[TMP27]], align 4			; CHECK-NEXT: store i32 [[Y]], ptr [[TMP27]], align 4
	; CHECK-NEXT: store i32 [[Y]], ptr [[TMP29]], align 4			; CHECK-NEXT: store i32 [[Y]], ptr [[TMP29]], align 4
	; CHECK-NEXT: store i32 [[Z:%.*]], ptr [[TMP10]], align 4			; CHECK-NEXT: store i32 [[Z:%.*]], ptr [[TMP9]], align 4
				; CHECK-NEXT: store i32 [[Z]], ptr [[TMP10]], align 4
	; CHECK-NEXT: store i32 [[Z]], ptr [[TMP11]], align 4			; CHECK-NEXT: store i32 [[Z]], ptr [[TMP11]], align 4
	; CHECK-NEXT: store i32 [[Z]], ptr [[TMP12]], align 4
	; CHECK-NEXT: store i32 [[Z]], ptr [[TMP13]], align 4			; CHECK-NEXT: store i32 [[Z]], ptr [[TMP13]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8>
	; CHECK-NEXT: [[TMP30:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP30:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP30]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP34:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP30]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP34:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 3, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 3, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[I_MINUS_1:%.*]] = add i64 [[I]], -1
	; CHECK-NEXT: [[I_MINUS_3:%.*]] = add i64 [[I]], -3
	; CHECK-NEXT: [[A_I:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I]]			; CHECK-NEXT: [[A_I:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I]]
	; CHECK-NEXT: [[A_I_MINUS_1:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_MINUS_1]]			; CHECK-NEXT: [[TMP31:%.*]] = getelementptr i32, ptr [[A]], i64 [[I]]
	; CHECK-NEXT: [[A_I_MINUS_3:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[I_MINUS_3]]			; CHECK-NEXT: [[A_I_MINUS_1:%.*]] = getelementptr i32, ptr [[TMP31]], i64 -1
				; CHECK-NEXT: [[TMP32:%.*]] = getelementptr i32, ptr [[A]], i64 [[I]]
				; CHECK-NEXT: [[A_I_MINUS_3:%.*]] = getelementptr i32, ptr [[TMP32]], i64 -3
	; CHECK-NEXT: store i32 [[X]], ptr [[A_I_MINUS_1]], align 4			; CHECK-NEXT: store i32 [[X]], ptr [[A_I_MINUS_1]], align 4
	; CHECK-NEXT: store i32 [[Y]], ptr [[A_I_MINUS_3]], align 4			; CHECK-NEXT: store i32 [[Y]], ptr [[A_I_MINUS_3]], align 4
	; CHECK-NEXT: store i32 [[Z]], ptr [[A_I]], align 4			; CHECK-NEXT: store i32 [[Z]], ptr [[A_I]], align 4
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP35:![0-9]+]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP35:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2			; CHECK-NEXT: [[DOTPRE:%.*]] = load i16, ptr [[A:%.*]], align 2
	; CHECK-NEXT: [[TMP0:%.*]] = lshr i64 [[N:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = lshr i64 [[N:%.*]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[TMP0]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[TMP0]], 1
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 6			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 6
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[N]], 1			; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[N]], 1
	; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP2]], -4			; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[TMP2]], -4
	; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP3]], 4			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[B:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[TMP4]], i64 4
	; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 2			; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 2
	; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[TMP3]], 6			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]
	; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP5]]			; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[TMP5]], i64 6
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt ptr [[SCEVGEP2]], [[B]]			; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt ptr [[SCEVGEP2]], [[B]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP1]], [[SCEVGEP]]			; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP1]], [[SCEVGEP]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP1]], -4			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP1]], -4
	; CHECK-NEXT: [[IND_END:%.*]] = shl i64 [[N_VEC]], 1			; CHECK-NEXT: [[IND_END:%.*]] = shl i64 [[N_VEC]], 1
	; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3			; CHECK-NEXT: [[VECTOR_RECUR_INIT:%.*]] = insertelement <4 x i16> poison, i16 [[DOTPRE]], i64 3

llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll


	define i32 @multiple_uniform_stores(ptr nocapture %var1, ptr nocapture readonly %var2, i32 %itr) #0 {			define i32 @multiple_uniform_stores(ptr nocapture %var1, ptr nocapture readonly %var2, i32 %itr) #0 {
	; CHECK-LABEL: @multiple_uniform_stores(			; CHECK-LABEL: @multiple_uniform_stores(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP20:%.*]] = icmp eq i32 [[ITR:%.*]], 0			; CHECK-NEXT: [[CMP20:%.*]] = icmp eq i32 [[ITR:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP20]], label [[FOR_END10:%.*]], label [[FOR_COND1_PREHEADER_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[CMP20]], label [[FOR_END10:%.*]], label [[FOR_COND1_PREHEADER_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader.preheader:			; CHECK: for.cond1.preheader.preheader:
	; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[VAR2:%.*]], i64 4			; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[VAR2:%.*]], i64 4
				; CHECK-NEXT: [[INVARIANT_GEP5:%.*]] = getelementptr i8, ptr [[VAR1:%.*]], i64 4
	; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]			; CHECK-NEXT: br label [[FOR_COND1_PREHEADER:%.*]]
	; CHECK: for.cond1.preheader:			; CHECK: for.cond1.preheader:
	; CHECK-NEXT: [[INDVARS_IV23:%.*]] = phi i64 [ [[INDVARS_IV_NEXT24:%.*]], [[FOR_INC8:%.*]] ], [ 0, [[FOR_COND1_PREHEADER_PREHEADER]] ]			; CHECK-NEXT: [[INDVARS_IV23:%.*]] = phi i64 [ [[INDVARS_IV_NEXT24:%.*]], [[FOR_INC8:%.*]] ], [ 0, [[FOR_COND1_PREHEADER_PREHEADER]] ]
	; CHECK-NEXT: [[J_022:%.*]] = phi i32 [ [[J_1_LCSSA:%.*]], [[FOR_INC8]] ], [ 0, [[FOR_COND1_PREHEADER_PREHEADER]] ]			; CHECK-NEXT: [[J_022:%.*]] = phi i32 [ [[J_1_LCSSA:%.*]], [[FOR_INC8]] ], [ 0, [[FOR_COND1_PREHEADER_PREHEADER]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[INDVARS_IV23]], 2			; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[INDVARS_IV23]], 2
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[VAR1:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[VAR1]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[TMP0]], 4			; CHECK-NEXT: [[GEP6:%.*]] = getelementptr i8, ptr [[INVARIANT_GEP5]], i64 [[TMP0]]
	; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[VAR1]], i64 [[TMP1]]
	; CHECK-NEXT: [[CMP218:%.*]] = icmp ult i32 [[J_022]], [[ITR]]			; CHECK-NEXT: [[CMP218:%.*]] = icmp ult i32 [[J_022]], [[ITR]]
	; CHECK-NEXT: br i1 [[CMP218]], label [[FOR_BODY3_LR_PH:%.*]], label [[FOR_INC8]]			; CHECK-NEXT: br i1 [[CMP218]], label [[FOR_BODY3_LR_PH:%.*]], label [[FOR_INC8]]
	; CHECK: for.body3.lr.ph:			; CHECK: for.body3.lr.ph:
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[VAR1]], i64 [[INDVARS_IV23]]			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds i32, ptr [[VAR1]], i64 [[INDVARS_IV23]]
	; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[J_022]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[J_022]] to i64
	; CHECK-NEXT: [[ARRAYIDX5_PROMOTED:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4			; CHECK-NEXT: [[ARRAYIDX5_PROMOTED:%.*]] = load i32, ptr [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = xor i32 [[J_022]], -1			; CHECK-NEXT: [[TMP2:%.*]] = xor i32 [[J_022]], -1
	; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], [[ITR]]			; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[TMP2]], [[ITR]]
	; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
	; CHECK-NEXT: [[TMP6:%.*]] = add nuw nsw i64 [[TMP5]], 1			; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP4]], 3			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP3]], 3
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP2]], 2			; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw i64 [[TMP1]], 2
	; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[VAR2]], i64 [[TMP7]]			; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[VAR2]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = xor i32 [[J_022]], -1			; CHECK-NEXT: [[TMP7:%.*]] = xor i32 [[J_022]], -1
	; CHECK-NEXT: [[TMP9:%.*]] = add i32 [[TMP8]], [[ITR]]			; CHECK-NEXT: [[TMP8:%.*]] = add i32 [[TMP7]], [[ITR]]
	; CHECK-NEXT: [[TMP10:%.*]] = zext i32 [[TMP9]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = zext i32 [[TMP8]] to i64
	; CHECK-NEXT: [[TMP11:%.*]] = add nuw nsw i64 [[TMP2]], [[TMP10]]			; CHECK-NEXT: [[TMP10:%.*]] = add nuw nsw i64 [[TMP1]], [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 2			; CHECK-NEXT: [[TMP11:%.*]] = shl nuw nsw i64 [[TMP10]], 2
	; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i8, ptr [[SCEVGEP3]], i64 [[TMP12]]			; CHECK-NEXT: [[SCEVGEP4:%.*]] = getelementptr i8, ptr [[SCEVGEP3]], i64 [[TMP11]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP4]]			; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP4]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]			; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[GEP6]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP6]], 8589934588			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP5]], 8589934588
	; CHECK-NEXT: [[IND_END:%.*]] = add nuw nsw i64 [[N_VEC]], [[TMP2]]			; CHECK-NEXT: [[IND_END:%.*]] = add nuw nsw i64 [[N_VEC]], [[TMP1]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 0, i32 0>, i32 [[ARRAYIDX5_PROMOTED]], i64 0			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 0, i32 0>, i32 [[ARRAYIDX5_PROMOTED]], i64 0
				; CHECK-NEXT: [[INVARIANT_GEP:%.*]] = getelementptr i32, ptr [[VAR2]], i64 [[TMP1]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ [[TMP13]], [[VECTOR_PH]] ], [ [[TMP16:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ [[TMP12]], [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i64 [[INDEX]], [[TMP2]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, ptr [[INVARIANT_GEP]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[VAR2]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[GEP]], align 4, !alias.scope !23
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP14]], align 4, !alias.scope !23			; CHECK-NEXT: [[TMP13:%.*]] = add <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP15:%.*]] = add <4 x i32> [[VEC_PHI]], [[WIDE_LOAD]]			; CHECK-NEXT: [[TMP14]] = add <4 x i32> [[TMP13]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP16]] = add <4 x i32> [[TMP15]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi <4 x i32> [ [[TMP16]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi <4 x i32> [ [[TMP14]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[DOTLCSSA]])			; CHECK-NEXT: [[TMP16:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[DOTLCSSA]])
	; CHECK-NEXT: store i32 [[TMP18]], ptr [[ARRAYIDX5]], align 4			; CHECK-NEXT: store i32 [[TMP16]], ptr [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP6]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP5]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_INC8_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_INC8_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[TMP2]], [[FOR_BODY3_LR_PH]] ], [ [[TMP2]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[TMP1]], [[FOR_BODY3_LR_PH]] ], [ [[TMP1]], [[VECTOR_MEMCHECK]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP18]], [[MIDDLE_BLOCK]] ], [ [[ARRAYIDX5_PROMOTED]], [[FOR_BODY3_LR_PH]] ], [ [[ARRAYIDX5_PROMOTED]], [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP16]], [[MIDDLE_BLOCK]] ], [ [[ARRAYIDX5_PROMOTED]], [[FOR_BODY3_LR_PH]] ], [ [[ARRAYIDX5_PROMOTED]], [[VECTOR_MEMCHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY3:%.*]]			; CHECK-NEXT: br label [[FOR_BODY3:%.*]]
	; CHECK: for.body3:			; CHECK: for.body3:
	; CHECK-NEXT: [[TMP19:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP21:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[TMP17:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[TMP19:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY3]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[VAR2]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[VAR2]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP19]], [[TMP20]]			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP17]], [[TMP18]]
	; CHECK-NEXT: [[TMP21]] = add nsw i32 [[ADD]], 1			; CHECK-NEXT: [[TMP19]] = add nsw i32 [[ADD]], 1
	; CHECK-NEXT: store i32 [[TMP21]], ptr [[ARRAYIDX5]], align 4			; CHECK-NEXT: store i32 [[TMP19]], ptr [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[ITR]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[ITR]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_INC8_LOOPEXIT_LOOPEXIT:%.*]], label [[FOR_BODY3]], !llvm.loop [[LOOP27:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_INC8_LOOPEXIT_LOOPEXIT:%.*]], label [[FOR_BODY3]], !llvm.loop [[LOOP27:![0-9]+]]
	; CHECK: for.inc8.loopexit.loopexit:			; CHECK: for.inc8.loopexit.loopexit:
	; CHECK-NEXT: br label [[FOR_INC8_LOOPEXIT]]			; CHECK-NEXT: br label [[FOR_INC8_LOOPEXIT]]
	; CHECK: for.inc8.loopexit:			; CHECK: for.inc8.loopexit:
	; CHECK-NEXT: br label [[FOR_INC8]]			; CHECK-NEXT: br label [[FOR_INC8]]

llvm/test/Transforms/LoopVectorize/runtime-check.ll

	; CHECK-LABEL: @test_runtime_check(			; CHECK-LABEL: @test_runtime_check(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N:%.*]], 4
	; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]			; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
	; CHECK: vector.memcheck:			; CHECK: vector.memcheck:
	; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2			; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[OFFSET:%.*]], 2
	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[N]], 2
	; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP2]]			; CHECK-NEXT: [[SCEVGEP1:%.*]] = getelementptr i8, ptr [[TMP2]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2			; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[OFFSET2:%.*]], 2
	; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]			; CHECK-NEXT: [[SCEVGEP2:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP1]]
	; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP4]]			; CHECK-NEXT: [[SCEVGEP3:%.*]] = getelementptr i8, ptr [[TMP4]], i64 [[TMP3]]
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]			; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[SCEVGEP]], [[SCEVGEP3]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]			; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[SCEVGEP2]], [[SCEVGEP1]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], -4
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], [[OFFSET]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr float, ptr [[TMP5]], i64 [[OFFSET]]
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP6]], align 4, !alias.scope !15, !noalias !18			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP6]], align 4, !alias.scope !15, !noalias !18
	; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], [[OFFSET2]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr float, ptr [[A]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr float, ptr [[TMP7]], i64 [[OFFSET2]]
	; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x float>, ptr [[TMP8]], align 4, !alias.scope !18			; CHECK-NEXT: [[WIDE_LOAD4:%.*]] = load <4 x float>, ptr [[TMP8]], align 4, !alias.scope !18
	; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD4]]			; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD4]]
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <4 x float> [[WIDE_LOAD]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <4 x float> [[WIDE_LOAD]], [[TMP9]]
	; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[TMP6]], align 4, !alias.scope !15, !noalias !18			; CHECK-NEXT: store <4 x float> [[TMP10]], ptr [[TMP6]], align 4, !alias.scope !15, !noalias !18
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[N]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[IND_SUM:%.*]] = add i64 [[IV]], [[OFFSET]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ARR_IDX:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM]]			; CHECK-NEXT: [[ARR_IDX:%.*]] = getelementptr float, ptr [[TMP12]], i64 [[OFFSET]]
	; CHECK-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4			; CHECK-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4
	; CHECK-NEXT: [[IND_SUM2:%.*]] = add i64 [[IV]], [[OFFSET2]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ARR_IDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM2]]			; CHECK-NEXT: [[ARR_IDX2:%.*]] = getelementptr float, ptr [[TMP13]], i64 [[OFFSET2]]
	; CHECK-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4			; CHECK-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4
	; CHECK-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]			; CHECK-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B]]
	; CHECK-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]			; CHECK-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]
	; CHECK-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4			; CHECK-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
	; CHECK: loopexit:			; CHECK: loopexit:
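In the updated expectations above, each `add i64` feeding a single GEP becomes two chained GEPs (for example `[[TMP5]]`/`[[TMP6]]` in `vector.body` and `[[TMP12]]`/`[[ARR_IDX]]` in `for.body`). A minimal Python sketch of why the computed address is unchanged, assuming 64-bit pointers, a single i64 index scaled by the element size, wrapping arithmetic, and no `inbounds`; `gep`, `gep_of_add` and `gep_of_gep` are hypothetical helpers, not LLVM APIs:

```python
import random

MASK64 = (1 << 64) - 1  # assumption: 64-bit pointers and i64 indices


def gep(base: int, index: int, elem_size: int) -> int:
    """Model of a single-index GEP without inbounds: base + index * elem_size, wrapping mod 2**64."""
    return (base + index * elem_size) & MASK64


def gep_of_add(base: int, x: int, y: int, elem_size: int) -> int:
    """Old form: %idx = add i64 %x, %y ; getelementptr T, ptr %base, i64 %idx"""
    idx = (x + y) & MASK64  # add i64 wraps as well
    return gep(base, idx, elem_size)


def gep_of_gep(base: int, x: int, y: int, elem_size: int) -> int:
    """New form: %p = getelementptr T, ptr %base, i64 %x ; getelementptr T, ptr %p, i64 %y"""
    return gep(gep(base, x, elem_size), y, elem_size)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(10_000):
        base = rng.getrandbits(64)
        x = rng.randint(-2**63, 2**63 - 1)
        y = rng.randint(-2**63, 2**63 - 1)
        size = rng.choice([4, 8])  # float, and double/i64, as in the tests
        assert gep_of_add(base, x, y, size) == gep_of_gep(base, x, y, size)
    print("ok")
```

Both forms reduce to base + (x + y) * size modulo 2^64, so the random check cannot fail; the model deliberately ignores `inbounds`, which the split form no longer carries.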
	; }			; }

	define void @test_runtime_check2(ptr %a, float %b, i64 %offset, i64 %offset2, i64 %n, ptr %c) {			define void @test_runtime_check2(ptr %a, float %b, i64 %offset, i64 %offset2, i64 %n, ptr %c) {
	; CHECK-LABEL: @test_runtime_check2(			; CHECK-LABEL: @test_runtime_check2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[IND_SUM:%.*]] = add i64 [[IV]], [[OFFSET:%.*]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr float, ptr [[A:%.*]], i64 [[IV]]
	; CHECK-NEXT: [[ARR_IDX:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[IND_SUM]]			; CHECK-NEXT: [[ARR_IDX:%.*]] = getelementptr float, ptr [[TMP0]], i64 [[OFFSET:%.*]]
	; CHECK-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4			; CHECK-NEXT: [[L1:%.*]] = load float, ptr [[ARR_IDX]], align 4
	; CHECK-NEXT: [[IND_SUM2:%.*]] = add i64 [[IV]], [[OFFSET2:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr float, ptr [[A]], i64 [[IV]]
	; CHECK-NEXT: [[ARR_IDX2:%.*]] = getelementptr inbounds float, ptr [[A]], i64 [[IND_SUM2]]			; CHECK-NEXT: [[ARR_IDX2:%.*]] = getelementptr float, ptr [[TMP1]], i64 [[OFFSET2:%.*]]
	; CHECK-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4			; CHECK-NEXT: [[L2:%.*]] = load float, ptr [[ARR_IDX2]], align 4
	; CHECK-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B:%.*]]			; CHECK-NEXT: [[M:%.*]] = fmul fast float [[L2]], [[B:%.*]]
	; CHECK-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]			; CHECK-NEXT: [[AD:%.*]] = fadd fast float [[L1]], [[M]]
	; CHECK-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4			; CHECK-NEXT: store float [[AD]], ptr [[ARR_IDX]], align 4
	; CHECK-NEXT: [[C_IND:%.*]] = add nsw i64 [[IV]], -1			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, ptr [[C:%.*]], i64 [[IV]]
	; CHECK-NEXT: [[C_IDX:%.*]] = getelementptr inbounds float, ptr [[C:%.*]], i64 [[C_IND]]			; CHECK-NEXT: [[C_IDX:%.*]] = getelementptr float, ptr [[TMP2]], i64 -1
	; CHECK-NEXT: [[LC:%.*]] = load float, ptr [[C_IDX]], align 4			; CHECK-NEXT: [[LC:%.*]] = load float, ptr [[C_IDX]], align 4
	; CHECK-NEXT: [[VC:%.*]] = fadd float [[LC]], 1.000000e+00			; CHECK-NEXT: [[VC:%.*]] = fadd float [[LC]], 1.000000e+00
	; CHECK-NEXT: [[C_IDX2:%.*]] = getelementptr inbounds float, ptr [[C]], i64 [[IV]]			; CHECK-NEXT: [[C_IDX2:%.*]] = getelementptr inbounds float, ptr [[C]], i64 [[IV]]
	; CHECK-NEXT: store float [[VC]], ptr [[C_IDX2]], align 4			; CHECK-NEXT: store float [[VC]], ptr [[C_IDX2]], align 4
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N:%.*]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], [[N:%.*]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT:%.*]], label [[FOR_BODY]]
	; CHECK: loopexit:			; CHECK: loopexit:
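The split GEPs in `@test_runtime_check` and `@test_runtime_check2` are also emitted without `inbounds` (compare `[[C_IDX]]` before and after). One plausible reason, sketched under a simplified model of the LangRef rule (each `inbounds` GEP's base and result must lie inside the object or one past its end; the real rules impose further conditions): after the split, the intermediate pointer `gep(p, x)` must satisfy that rule too, which the original single GEP never required. The helper names and the example object below are hypothetical.

```python
# Simplified model (assumption): an inbounds GEP is well defined only if its base
# and its result both lie inside the object or one past its end. The real LangRef
# rules contain further conditions (e.g. on offset overflow) not modelled here.

MASK64 = (1 << 64) - 1  # assumption: 64-bit address space


def gep(base: int, index: int, elem_size: int) -> int:
    """Address computed by a single-index GEP (wrapping modulo 2**64)."""
    return (base + index * elem_size) & MASK64


def in_bounds(addr: int, obj_start: int, obj_bytes: int) -> bool:
    """True if addr is inside [obj_start, obj_start + obj_bytes] (one-past-end allowed)."""
    return obj_start <= addr <= obj_start + obj_bytes


def single_gep_inbounds(obj_start: int, obj_bytes: int, x: int, y: int, elem_size: int) -> bool:
    """Old form: gep inbounds T, ptr %obj, i64 (x + y). Only the result needs checking."""
    return in_bounds(gep(obj_start, x + y, elem_size), obj_start, obj_bytes)


def split_gep_inbounds(obj_start: int, obj_bytes: int, x: int, y: int, elem_size: int) -> bool:
    """Split form with inbounds on both GEPs: the intermediate pointer is checked too."""
    mid = gep(obj_start, x, elem_size)
    end = gep(mid, y, elem_size)
    return in_bounds(mid, obj_start, obj_bytes) and in_bounds(end, obj_start, obj_bytes)


if __name__ == "__main__":
    # 16 floats (64 bytes) at a made-up address: x + y = 5 stays inside the array,
    # but the intermediate index x = 100 points far past its end.
    obj, size = 0x1000, 16 * 4
    print(single_gep_inbounds(obj, size, 100, -95, 4))  # True
    print(split_gep_inbounds(obj, size, 100, -95, 4))   # False
```

Under this model the original `getelementptr inbounds` is well defined while `inbounds` on the first half of the split would yield poison, so the split form cannot simply inherit the flag without further proof.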

llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops.ll

	; CHECK-NEXT: br label [[INNER_HEADER:%.*]]			; CHECK-NEXT: br label [[INNER_HEADER:%.*]]
	; CHECK: inner.header:			; CHECK: inner.header:
	; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[ROWS_BODY]] ], [ [[INNER_STEP:%.*]], [[INNER_LATCH:%.*]] ]			; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[ROWS_BODY]] ], [ [[INNER_STEP:%.*]], [[INNER_LATCH:%.*]] ]
	; CHECK-NEXT: [[RESULT_VEC_0:%.*]] = phi <2 x double> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP7:%.*]], [[INNER_LATCH]] ]			; CHECK-NEXT: [[RESULT_VEC_0:%.*]] = phi <2 x double> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP7:%.*]], [[INNER_LATCH]] ]
	; CHECK-NEXT: [[RESULT_VEC_1:%.*]] = phi <2 x double> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP9:%.*]], [[INNER_LATCH]] ]			; CHECK-NEXT: [[RESULT_VEC_1:%.*]] = phi <2 x double> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP9:%.*]], [[INNER_LATCH]] ]
	; CHECK-NEXT: br label [[INNER_BODY:%.*]]			; CHECK-NEXT: br label [[INNER_BODY:%.*]]
	; CHECK: inner.body:			; CHECK: inner.body:
	; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INNER_IV]], 2			; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INNER_IV]], 2
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr double, ptr [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr double, ptr [[A:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr double, ptr [[TMP1]], i64 [[ROWS_IV]]
	; CHECK-NEXT: [[COL_LOAD:%.*]] = load <2 x double>, ptr [[TMP2]], align 8			; CHECK-NEXT: [[COL_LOAD:%.*]] = load <2 x double>, ptr [[TMP2]], align 8
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, ptr [[TMP2]], i64 4			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr double, ptr [[TMP2]], i64 4
	; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <2 x double>, ptr [[VEC_GEP]], align 8			; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <2 x double>, ptr [[VEC_GEP]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[COLS_IV]], 2			; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP3]], [[INNER_IV]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr double, ptr [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr double, ptr [[B:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr double, ptr [[TMP4]], i64 [[INNER_IV]]
	; CHECK-NEXT: [[COL_LOAD2:%.*]] = load <2 x double>, ptr [[TMP5]], align 8			; CHECK-NEXT: [[COL_LOAD2:%.*]] = load <2 x double>, ptr [[TMP5]], align 8
	; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr double, ptr [[TMP5]], i64 4			; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr double, ptr [[TMP5]], i64 4
	; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <2 x double>, ptr [[VEC_GEP3]], align 8			; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <2 x double>, ptr [[VEC_GEP3]], align 8
	; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <2 x double> [[COL_LOAD2]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <2 x double> [[COL_LOAD2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD]], <2 x double> [[SPLAT_SPLAT]], <2 x double> [[RESULT_VEC_0]])			; CHECK-NEXT: [[TMP6:%.*]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD]], <2 x double> [[SPLAT_SPLAT]], <2 x double> [[RESULT_VEC_0]])
	; CHECK-NEXT: [[SPLAT_SPLAT8:%.*]] = shufflevector <2 x double> [[COL_LOAD2]], <2 x double> undef, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[SPLAT_SPLAT8:%.*]] = shufflevector <2 x double> [[COL_LOAD2]], <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP7]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD1]], <2 x double> [[SPLAT_SPLAT8]], <2 x double> [[TMP6]])			; CHECK-NEXT: [[TMP7]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD1]], <2 x double> [[SPLAT_SPLAT8]], <2 x double> [[TMP6]])
	; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <2 x double> [[COL_LOAD4]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <2 x double> [[COL_LOAD4]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD]], <2 x double> [[SPLAT_SPLAT12]], <2 x double> [[RESULT_VEC_1]])			; CHECK-NEXT: [[TMP8:%.*]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD]], <2 x double> [[SPLAT_SPLAT12]], <2 x double> [[RESULT_VEC_1]])
	; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <2 x double> [[COL_LOAD4]], <2 x double> undef, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <2 x double> [[COL_LOAD4]], <2 x double> undef, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP9]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD1]], <2 x double> [[SPLAT_SPLAT15]], <2 x double> [[TMP8]])			; CHECK-NEXT: [[TMP9]] = call contract <2 x double> @llvm.fmuladd.v2f64(<2 x double> [[COL_LOAD1]], <2 x double> [[SPLAT_SPLAT15]], <2 x double> [[TMP8]])
	; CHECK-NEXT: br label [[INNER_LATCH]]			; CHECK-NEXT: br label [[INNER_LATCH]]
	; CHECK: inner.latch:			; CHECK: inner.latch:
	; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2			; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2
	; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_STEP]], 4			; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_STEP]], 4
	; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: rows.latch:			; CHECK: rows.latch:
	; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2			; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2
	; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_STEP]], 4			; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_STEP]], 4
	; CHECK-NEXT: [[TMP10:%.*]] = shl i64 [[COLS_IV]], 2			; CHECK-NEXT: [[TMP10:%.*]] = shl i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[TMP11:%.*]] = add i64 [[TMP10]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr double, ptr [[C:%.*]], i64 [[TMP10]]
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr double, ptr [[C:%.*]], i64 [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr double, ptr [[TMP11]], i64 [[ROWS_IV]]
	; CHECK-NEXT: store <2 x double> [[TMP7]], ptr [[TMP12]], align 8			; CHECK-NEXT: store <2 x double> [[TMP7]], ptr [[TMP12]], align 8
	; CHECK-NEXT: [[VEC_GEP16:%.*]] = getelementptr double, ptr [[TMP12]], i64 4			; CHECK-NEXT: [[VEC_GEP16:%.*]] = getelementptr double, ptr [[TMP12]], i64 4
	; CHECK-NEXT: store <2 x double> [[TMP9]], ptr [[VEC_GEP16]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], ptr [[VEC_GEP16]], align 8
	; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]			; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]
	; CHECK: cols.latch:			; CHECK: cols.latch:
	; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2			; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_STEP]], 4			; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_STEP]], 4
	; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]			; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]
	; CHECK-NEXT: br label [[INNER_HEADER:%.*]]			; CHECK-NEXT: br label [[INNER_HEADER:%.*]]
	; CHECK: inner.header:			; CHECK: inner.header:
	; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[ROWS_BODY]] ], [ [[INNER_STEP:%.*]], [[INNER_LATCH:%.*]] ]			; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[ROWS_BODY]] ], [ [[INNER_STEP:%.*]], [[INNER_LATCH:%.*]] ]
	; CHECK-NEXT: [[RESULT_VEC_0:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP9:%.*]], [[INNER_LATCH]] ]			; CHECK-NEXT: [[RESULT_VEC_0:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP9:%.*]], [[INNER_LATCH]] ]
	; CHECK-NEXT: [[RESULT_VEC_1:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP13:%.*]], [[INNER_LATCH]] ]			; CHECK-NEXT: [[RESULT_VEC_1:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP13:%.*]], [[INNER_LATCH]] ]
	; CHECK-NEXT: br label [[INNER_BODY:%.*]]			; CHECK-NEXT: br label [[INNER_BODY:%.*]]
	; CHECK: inner.body:			; CHECK: inner.body:
	; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INNER_IV]], 1			; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INNER_IV]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[A:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i64 [[ROWS_IV]]
	; CHECK-NEXT: [[COL_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 8			; CHECK-NEXT: [[COL_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 8
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr i64, ptr [[TMP2]], i64 2			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr i64, ptr [[TMP2]], i64 2
	; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <2 x i64>, ptr [[VEC_GEP]], align 8			; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <2 x i64>, ptr [[VEC_GEP]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[COLS_IV]], 2			; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP3]], [[INNER_IV]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP4]], i64 [[INNER_IV]]
	; CHECK-NEXT: [[COL_LOAD2:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8			; CHECK-NEXT: [[COL_LOAD2:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8
	; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr i64, ptr [[TMP5]], i64 4			; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr i64, ptr [[TMP5]], i64 4
	; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <2 x i64>, ptr [[VEC_GEP3]], align 8			; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <2 x i64>, ptr [[VEC_GEP3]], align 8
	; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT]]			; CHECK-NEXT: [[TMP6:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT]]
	; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[RESULT_VEC_0]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[RESULT_VEC_0]], [[TMP6]]
	; CHECK-NEXT: [[SPLAT_SPLAT8:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[SPLAT_SPLAT8:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT8]]			; CHECK-NEXT: [[TMP8:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT8]]
	; CHECK-NEXT: [[TMP9]] = add <2 x i64> [[TMP7]], [[TMP8]]			; CHECK-NEXT: [[TMP9]] = add <2 x i64> [[TMP7]], [[TMP8]]
	; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT12]]			; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT12]]
	; CHECK-NEXT: [[TMP11:%.*]] = add <2 x i64> [[RESULT_VEC_1]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = add <2 x i64> [[RESULT_VEC_1]], [[TMP10]]
	; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP12:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT15]]			; CHECK-NEXT: [[TMP12:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT15]]
	; CHECK-NEXT: [[TMP13]] = add <2 x i64> [[TMP11]], [[TMP12]]			; CHECK-NEXT: [[TMP13]] = add <2 x i64> [[TMP11]], [[TMP12]]
	; CHECK-NEXT: br label [[INNER_LATCH]]			; CHECK-NEXT: br label [[INNER_LATCH]]
	; CHECK: inner.latch:			; CHECK: inner.latch:
	; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2			; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2
	; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_STEP]], 4			; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_STEP]], 4
	; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP2:![0-9]+]]			; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP2:![0-9]+]]
	; CHECK: rows.latch:			; CHECK: rows.latch:
	; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2			; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2
	; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_IV]], 0			; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_IV]], 0
	; CHECK-NEXT: [[TMP14:%.*]] = shl i64 [[COLS_IV]], 1			; CHECK-NEXT: [[TMP14:%.*]] = shl i64 [[COLS_IV]], 1
	; CHECK-NEXT: [[TMP15:%.*]] = add i64 [[TMP14]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr i64, ptr [[C:%.*]], i64 [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr i64, ptr [[C:%.*]], i64 [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr i64, ptr [[TMP15]], i64 [[ROWS_IV]]
	; CHECK-NEXT: store <2 x i64> [[TMP9]], ptr [[TMP16]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP9]], ptr [[TMP16]], align 8
	; CHECK-NEXT: [[VEC_GEP16:%.*]] = getelementptr i64, ptr [[TMP16]], i64 2			; CHECK-NEXT: [[VEC_GEP16:%.*]] = getelementptr i64, ptr [[TMP16]], i64 2
	; CHECK-NEXT: store <2 x i64> [[TMP13]], ptr [[VEC_GEP16]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP13]], ptr [[VEC_GEP16]], align 8
	; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]			; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]
	; CHECK: cols.latch:			; CHECK: cols.latch:
	; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2			; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_IV]], 0			; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_IV]], 0
	; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]			; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]
	; CHECK-NEXT: br label [[INNER_HEADER:%.*]]			; CHECK-NEXT: br label [[INNER_HEADER:%.*]]
	; CHECK: inner.header:			; CHECK: inner.header:
	; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[ROWS_BODY]] ], [ [[INNER_STEP:%.*]], [[INNER_LATCH:%.*]] ]			; CHECK-NEXT: [[INNER_IV:%.*]] = phi i64 [ 0, [[ROWS_BODY]] ], [ [[INNER_STEP:%.*]], [[INNER_LATCH:%.*]] ]
	; CHECK-NEXT: [[RESULT_VEC_0:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP9:%.*]], [[INNER_LATCH]] ]			; CHECK-NEXT: [[RESULT_VEC_0:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP9:%.*]], [[INNER_LATCH]] ]
	; CHECK-NEXT: [[RESULT_VEC_1:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP13:%.*]], [[INNER_LATCH]] ]			; CHECK-NEXT: [[RESULT_VEC_1:%.*]] = phi <2 x i64> [ zeroinitializer, [[ROWS_BODY]] ], [ [[TMP13:%.*]], [[INNER_LATCH]] ]
	; CHECK-NEXT: br label [[INNER_BODY:%.*]]			; CHECK-NEXT: br label [[INNER_BODY:%.*]]
	; CHECK: inner.body:			; CHECK: inner.body:
	; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INNER_IV]], 2			; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INNER_IV]], 2
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[TMP0]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i64, ptr [[A:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[A:%.*]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, ptr [[TMP1]], i64 [[ROWS_IV]]
	; CHECK-NEXT: [[COL_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 8			; CHECK-NEXT: [[COL_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 8
	; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr i64, ptr [[TMP2]], i64 4			; CHECK-NEXT: [[VEC_GEP:%.*]] = getelementptr i64, ptr [[TMP2]], i64 4
	; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <2 x i64>, ptr [[VEC_GEP]], align 8			; CHECK-NEXT: [[COL_LOAD1:%.*]] = load <2 x i64>, ptr [[VEC_GEP]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[COLS_IV]], 1			; CHECK-NEXT: [[TMP3:%.*]] = shl i64 [[COLS_IV]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[TMP3]], [[INNER_IV]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[B:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i64, ptr [[TMP4]], i64 [[INNER_IV]]
	; CHECK-NEXT: [[COL_LOAD2:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8			; CHECK-NEXT: [[COL_LOAD2:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8
	; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr i64, ptr [[TMP5]], i64 2			; CHECK-NEXT: [[VEC_GEP3:%.*]] = getelementptr i64, ptr [[TMP5]], i64 2
	; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <2 x i64>, ptr [[VEC_GEP3]], align 8			; CHECK-NEXT: [[COL_LOAD4:%.*]] = load <2 x i64>, ptr [[VEC_GEP3]], align 8
	; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SPLAT_SPLAT:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT]]			; CHECK-NEXT: [[TMP6:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT]]
	; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[RESULT_VEC_0]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[RESULT_VEC_0]], [[TMP6]]
	; CHECK-NEXT: [[SPLAT_SPLAT8:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[SPLAT_SPLAT8:%.*]] = shufflevector <2 x i64> [[COL_LOAD2]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT8]]			; CHECK-NEXT: [[TMP8:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT8]]
	; CHECK-NEXT: [[TMP9]] = add <2 x i64> [[TMP7]], [[TMP8]]			; CHECK-NEXT: [[TMP9]] = add <2 x i64> [[TMP7]], [[TMP8]]
	; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[SPLAT_SPLAT12:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT12]]			; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i64> [[COL_LOAD]], [[SPLAT_SPLAT12]]
	; CHECK-NEXT: [[TMP11:%.*]] = add <2 x i64> [[RESULT_VEC_1]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = add <2 x i64> [[RESULT_VEC_1]], [[TMP10]]
	; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>			; CHECK-NEXT: [[SPLAT_SPLAT15:%.*]] = shufflevector <2 x i64> [[COL_LOAD4]], <2 x i64> undef, <2 x i32> <i32 1, i32 1>
	; CHECK-NEXT: [[TMP12:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT15]]			; CHECK-NEXT: [[TMP12:%.*]] = mul <2 x i64> [[COL_LOAD1]], [[SPLAT_SPLAT15]]
	; CHECK-NEXT: [[TMP13]] = add <2 x i64> [[TMP11]], [[TMP12]]			; CHECK-NEXT: [[TMP13]] = add <2 x i64> [[TMP11]], [[TMP12]]
	; CHECK-NEXT: br label [[INNER_LATCH]]			; CHECK-NEXT: br label [[INNER_LATCH]]
	; CHECK: inner.latch:			; CHECK: inner.latch:
	; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2			; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2
	; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_IV]], 0			; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_IV]], 0
	; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP3:![0-9]+]]			; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK: rows.latch:			; CHECK: rows.latch:
	; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2			; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2
	; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_STEP]], 4			; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_STEP]], 4
	; CHECK-NEXT: [[TMP14:%.*]] = shl i64 [[COLS_IV]], 2			; CHECK-NEXT: [[TMP14:%.*]] = shl i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[TMP15:%.*]] = add i64 [[TMP14]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP15:%.*]] = getelementptr i64, ptr [[C:%.*]], i64 [[TMP14]]
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr i64, ptr [[C:%.*]], i64 [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr i64, ptr [[TMP15]], i64 [[ROWS_IV]]
	; CHECK-NEXT: store <2 x i64> [[TMP9]], ptr [[TMP16]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP9]], ptr [[TMP16]], align 8
	; CHECK-NEXT: [[VEC_GEP16:%.*]] = getelementptr i64, ptr [[TMP16]], i64 4			; CHECK-NEXT: [[VEC_GEP16:%.*]] = getelementptr i64, ptr [[TMP16]], i64 4
	; CHECK-NEXT: store <2 x i64> [[TMP13]], ptr [[VEC_GEP16]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP13]], ptr [[VEC_GEP16]], align 8
	; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]			; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]
	; CHECK: cols.latch:			; CHECK: cols.latch:
	; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2			; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_STEP]], 8			; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_STEP]], 8
	; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]			; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]
	; CHECK: inner.latch:			; CHECK: inner.latch:
	; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2			; CHECK-NEXT: [[INNER_STEP]] = add i64 [[INNER_IV]], 2
	; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_IV]], 0			; CHECK-NEXT: [[INNER_COND_NOT:%.*]] = icmp eq i64 [[INNER_IV]], 0
	; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]			; CHECK-NEXT: br i1 [[INNER_COND_NOT]], label [[ROWS_LATCH]], label [[INNER_HEADER]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: rows.latch:			; CHECK: rows.latch:
	; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2			; CHECK-NEXT: [[ROWS_STEP]] = add i64 [[ROWS_IV]], 2
	; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_IV]], 0			; CHECK-NEXT: [[ROWS_COND_NOT:%.*]] = icmp eq i64 [[ROWS_IV]], 0
	; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[COLS_IV]], 1			; CHECK-NEXT: [[TMP18:%.*]] = shl i64 [[COLS_IV]], 1
	; CHECK-NEXT: [[TMP19:%.*]] = add i64 [[TMP18]], [[ROWS_IV]]			; CHECK-NEXT: [[TMP19:%.*]] = getelementptr float, ptr [[C]], i64 [[TMP18]]
	; CHECK-NEXT: [[TMP20:%.*]] = getelementptr float, ptr [[C]], i64 [[TMP19]]			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr float, ptr [[TMP19]], i64 [[ROWS_IV]]
	; CHECK-NEXT: store <2 x float> [[TMP15]], ptr [[TMP20]], align 8			; CHECK-NEXT: store <2 x float> [[TMP15]], ptr [[TMP20]], align 8
	; CHECK-NEXT: [[VEC_GEP23:%.*]] = getelementptr float, ptr [[TMP20]], i64 2			; CHECK-NEXT: [[VEC_GEP23:%.*]] = getelementptr float, ptr [[TMP20]], i64 2
	; CHECK-NEXT: store <2 x float> [[TMP17]], ptr [[VEC_GEP23]], align 8			; CHECK-NEXT: store <2 x float> [[TMP17]], ptr [[VEC_GEP23]], align 8
	; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]			; CHECK-NEXT: br i1 [[ROWS_COND_NOT]], label [[COLS_LATCH]], label [[ROWS_HEADER]]
	; CHECK: cols.latch:			; CHECK: cols.latch:
	; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2			; CHECK-NEXT: [[COLS_STEP]] = add i64 [[COLS_IV]], 2
	; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_IV]], 0			; CHECK-NEXT: [[COLS_COND_NOT:%.*]] = icmp eq i64 [[COLS_IV]], 0
	; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]			; CHECK-NEXT: br i1 [[COLS_COND_NOT]], label [[CONTINUE:%.*]], label [[COLS_HEADER]]
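In both hunks the only functional change is in the address computation before the stores: the old `add` feeding a single `getelementptr` becomes two chained `getelementptr`s, i.e. gep(p, add(x, y)) is rewritten to gep(gep(p, x), y). As a minimal sketch, assuming a plain (non-`inbounds`) GEP can be modeled as `p + i * sizeof(ty)` in wrapping 64-bit arithmetic, and ignoring provenance and poison semantics, the Python below checks that both forms produce the same address for the i64 hunk (shift by 2, 8-byte elements) and the float hunk (shift by 1, 4-byte elements). The helpers `gep`, `before` and `after` are hypothetical names introduced only for illustration.

```python
# Hypothetical model: a plain (non-inbounds) `getelementptr <ty>, ptr %p, i64 %i`
# computes p + i * sizeof(ty) with wrapping 64-bit arithmetic. Provenance,
# inbounds/poison rules and address spaces are deliberately ignored.
MASK = (1 << 64) - 1


def gep(base: int, index: int, elem_size: int) -> int:
    """Address produced by a non-inbounds GEP, wrapped to 64 bits."""
    return (base + index * elem_size) & MASK


def before(c: int, cols_iv: int, rows_iv: int, shift: int, elem_size: int) -> int:
    """Old CHECK lines: shl + add, then a single GEP on C."""
    tmp_shl = (cols_iv << shift) & MASK   # shl i64 COLS_IV, shift
    tmp_add = (tmp_shl + rows_iv) & MASK  # add i64 TMP_SHL, ROWS_IV
    return gep(c, tmp_add, elem_size)     # getelementptr ty, ptr C, i64 TMP_ADD


def after(c: int, cols_iv: int, rows_iv: int, shift: int, elem_size: int) -> int:
    """New CHECK lines: the add is split into two chained GEPs."""
    tmp_shl = (cols_iv << shift) & MASK   # shl i64 COLS_IV, shift
    base = gep(c, tmp_shl, elem_size)     # depends only on C and COLS_IV
    return gep(base, rows_iv, elem_size)  # getelementptr ty, ptr BASE, i64 ROWS_IV


# i64 hunk: shl by 2, 8-byte elements; float hunk: shl by 1, 4-byte elements.
for shift, size in ((2, 8), (1, 4)):
    for c in (0x1000, MASK - 7):          # second base exercises 64-bit wraparound
        for cols_iv in (0, 2, 6):
            for rows_iv in (0, 2, -1):
                assert before(c, cols_iv, rows_iv, shift, size) == \
                    after(c, cols_iv, rows_iv, shift, size)
```

In the new form the first GEP depends only on `C` and `COLS_IV`, so it no longer varies with the rows loop and becomes a candidate for hoisting by LICM.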
[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
Closed · Public

Revision Contents

Diff 557628

clang/test/CodeGenCXX/microsoft-abi-dynamic-cast.cpp

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

llvm/test/CodeGen/Hexagon/autohvx/vector-align-tbaa.ll

llvm/test/Transforms/InstCombine/align-addr.ll

llvm/test/Transforms/InstCombine/mem-par-metadata-memcpy.ll

llvm/test/Transforms/InstCombine/memrchr-4.ll

llvm/test/Transforms/InstCombine/shift.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

llvm/test/Transforms/LoopVectorize/induction.ll

llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

llvm/test/Transforms/LoopVectorize/invariant-store-vectorization.ll

llvm/test/Transforms/LoopVectorize/runtime-check.ll

llvm/test/Transforms/LowerMatrixIntrinsics/multiply-fused-loops.ll