This is an archive of the discontinued LLVM Phabricator instance.

Differential D92056

[LoopVec] Support global addresses as argument to uniform mem ops
ClosedPublic

Authored by reames on Nov 24 2020, 12:34 PM.

Details

Reviewers

fhahn
anna

Commits

rG0c866a3d6aa4: [LoopVec] Support non-instructions as argument to uniform mem ops

Summary

The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit the use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2), where N is the number of instructions in the loop, and it had restricted operand Value types to reduce the size of use lists.

This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice: once on first scan, and once in the use list of *its* operand. Only instructions within the loop have their uses scanned.)
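
The two-pass shape described in the summary can be illustrated with a small standalone sketch. This is not the LLVM code: `ToyValue`, `DemandsLane0Only`, `initAccess`, `collectUniformPointers` and `runExample` are hypothetical names invented for illustration, and the boolean flag stands in for the real widening-decision query. Pass 1 scans each loop instruction once and records pointer operands that have at least one lane-0-only use. Pass 2 walks use lists only for values defined inside the loop.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR value. These are NOT the LLVM classes.
struct ToyValue {
  std::string Name;
  bool InLoop = false;            // defined by an instruction inside the loop?
  bool IsMemAccess = false;       // load or store?
  bool DemandsLane0Only = false;  // hypothetical stand-in for the widening query
  ToyValue *PtrOperand = nullptr; // address operand of a memory access
  std::vector<ToyValue *> Users;  // use list
};

// True if U addresses memory through Ptr and only needs lane 0 of Ptr.
bool isVectorizedMemAccessUse(const ToyValue *U, const ToyValue *Ptr) {
  return U->IsMemAccess && U->PtrOperand == Ptr && U->DemandsLane0Only;
}

// Returns the in-loop pointers whose users all demand only lane 0.
std::vector<ToyValue *>
collectUniformPointers(const std::vector<ToyValue *> &LoopInsts) {
  // Pass 1: a single scan of the loop body. Any kind of pointer operand can
  // be recorded here (even a global), because no use list is walked.
  std::vector<ToyValue *> HasUniformUse;
  std::unordered_set<ToyValue *> Seen;
  for (ToyValue *I : LoopInsts)
    if (I->IsMemAccess && I->PtrOperand && I->DemandsLane0Only &&
        Seen.insert(I->PtrOperand).second)
      HasUniformUse.push_back(I->PtrOperand);

  // Pass 2: walk use lists, but only for values defined inside the loop.
  std::vector<ToyValue *> Worklist;
  for (ToyValue *V : HasUniformUse) {
    if (!V->InLoop)
      continue; // e.g. a global, whose use list may span other functions
    bool AllUsersUniform =
        std::all_of(V->Users.begin(), V->Users.end(),
                    [&](const ToyValue *U) {
                      return isVectorizedMemAccessUse(U, V);
                    });
    if (AllUsersUniform)
      Worklist.push_back(V);
  }
  return Worklist;
}

// Marks Acc as an in-loop memory access through Ptr and links the use list.
void initAccess(ToyValue &Acc, const std::string &Name, ToyValue &Ptr,
                bool Lane0Only) {
  Acc.Name = Name;
  Acc.InLoop = true;
  Acc.IsMemAccess = true;
  Acc.DemandsLane0Only = Lane0Only;
  Acc.PtrOperand = &Ptr;
  Ptr.Users.push_back(&Acc);
}

// Builds a tiny loop and returns the names of the pointers found uniform.
std::vector<std::string> runExample() {
  ToyValue Global, GepA, GepB, LoadG, LoadA, LoadB, StoreB;
  Global.Name = "@G"; // defined outside the loop
  GepA.Name = "%gep.a";
  GepA.InLoop = true;
  GepB.Name = "%gep.b";
  GepB.InLoop = true;
  initAccess(LoadG, "%ld.g", Global, true);
  initAccess(LoadA, "%ld.a", GepA, true);
  initAccess(LoadB, "%ld.b", GepB, true);
  initAccess(StoreB, "store.b", GepB, false); // scalarized store
  std::vector<ToyValue *> Loop = {&GepA, &GepB, &LoadG,
                                  &LoadA, &LoadB, &StoreB};
  std::vector<std::string> Names;
  for (ToyValue *V : collectUniformPointers(Loop))
    Names.push_back(V->Name);
  return Names;
}
```

In this toy loop only `%gep.a` is reported: `%gep.b` also feeds a scalarized store, and `@G` is skipped in pass 2 because it is not defined in the loop. In the patch itself, a uniform load is added to the worklist directly during the scan, which is how a load from a global can be treated as uniform even though the global's use list is never walked.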

Diff Detail

Event Timeline

reames created this revision. Nov 24 2020, 12:34 PM

Herald added a project: Restricted Project. Nov 24 2020, 12:34 PM

Herald added subscribers: dantrushin, bollu, hiraditya, mcrosier.

reames requested review of this revision. Nov 24 2020, 12:34 PM

Harbormaster completed remote builds in B80006: Diff 307435. Nov 24 2020, 1:33 PM

ping

Nice catch! LGTM from me. One comment inline. Pls see if @fhahn has any comments as well?

llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll
52	Any idea why these extra no-ops get added? That seems like an unfortunate side effect (especially given the algorithm works to reduce compile time).

LGTM, thanks! It looks like the patch needs re-formatting before submitting.

llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll
52	IIUC this is related to the fact that we are now considering `%1 = load i8, i8* undef, align 1,` as uniform and probably chose a higher interleave count because of that? If the pointer argument would be an instruction or argument, we should get the same result even without the patch. It might be worth changing the load to use a 'real' pointer.
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
600	With the patch, constant expressions should also be supported, right? Could you add a test case for that as well?

This revision is now accepted and ready to land. Dec 3 2020, 1:52 PM

This revision was landed with ongoing or failed builds. Dec 3 2020, 2:56 PM

Closed by commit rG0c866a3d6aa4: [LoopVec] Support non-instructions as argument to uniform mem ops (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG0c866a3d6aa4: [LoopVec] Support non-instructions as argument to uniform mem ops.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	86 lines
llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll	77 lines
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll	77 lines
llvm/test/Transforms/LoopVectorize/pr44488-predication.ll	55 lines

Diff 307435

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {

// Start with the conditional branch. If the branch condition is an		// Start with the conditional branch. If the branch condition is an
// instruction contained in the loop that is only used by the branch, it is		// instruction contained in the loop that is only used by the branch, it is
// uniform.		// uniform.
auto *Cmp = dyn_cast<Instruction>(Latch->getTerminator()->getOperand(0));		auto *Cmp = dyn_cast<Instruction>(Latch->getTerminator()->getOperand(0));
if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())		if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
addToWorklistIfAllowed(Cmp);		addToWorklistIfAllowed(Cmp);

// Holds consecutive and consecutive-like pointers. Consecutive-like pointers
// are pointers that are treated like consecutive pointers during
// vectorization. The pointer operands of interleaved accesses are an
// example.
SmallSetVector<Value *, 8> ConsecutiveLikePtrs;

// Holds pointer operands of instructions that are possibly non-uniform.
SmallPtrSet<Value *, 8> PossibleNonUniformPtrs;

auto isUniformDecision = [&](Instruction *I, ElementCount VF) {		auto isUniformDecision = [&](Instruction *I, ElementCount VF) {
InstWidening WideningDecision = getWideningDecision(I, VF);		InstWidening WideningDecision = getWideningDecision(I, VF);
assert(WideningDecision != CM_Unknown &&		assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");		"Widening decision should be ready at this moment");

// The address of a uniform mem op is itself uniform. We exclude stores		// A uniform memory op is itself uniform. We exclude uniform stores
// here as there's an assumption in the current code that all uses of		// here as they demand the last lane, not the first one.
// uniform instructions are uniform and, as noted below, uniform stores are
// still handled via replication (i.e. aren't uniform after vectorization).
if (isa<LoadInst>(I) && Legal->isUniformMemOp(*I)) {		if (isa<LoadInst>(I) && Legal->isUniformMemOp(*I)) {
assert(WideningDecision == CM_Scalarize);		assert(WideningDecision == CM_Scalarize);
return true;		return true;
}		}

return (WideningDecision == CM_Widen ||		return (WideningDecision == CM_Widen ||
WideningDecision == CM_Widen_Reverse ||		WideningDecision == CM_Widen_Reverse ||
WideningDecision == CM_Interleave);		WideningDecision == CM_Interleave);
};		};


// Returns true if Ptr is the pointer operand of a memory access instruction		// Returns true if Ptr is the pointer operand of a memory access instruction
// I, and I is known to not require scalarization.		// I, and I is known to not require scalarization.
auto isVectorizedMemAccessUse = [&](Instruction *I, Value *Ptr) -> bool {		auto isVectorizedMemAccessUse = [&](Instruction *I, Value *Ptr) -> bool {
return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF);		return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF);
};		};

// Iterate over the instructions in the loop, and collect all		// Holds a list of values which are known to have at least one uniform use.
// consecutive-like pointer operands in ConsecutiveLikePtrs. If it's possible		// Note that there may be other uses which aren't uniform. A "uniform use"
// that a consecutive-like pointer operand will be scalarized, we collect it		// here is something which only demands lane 0 of the unrolled iterations;
// in PossibleNonUniformPtrs instead. We use two sets here because a single		// it does not imply that all lanes produce the same value (e.g. this is not
// getelementptr instruction can be used by both vectorized and scalarized		// the usual meaning of uniform)
// memory instructions. For example, if a loop loads and stores from the same		SmallPtrSet<Value *, 8> HasUniformUse;
// location, but the store is conditional, the store will be scalarized, and
// the getelementptr won't remain uniform.		// Scan the loop for instructions which are either a) known to have only
		// lane 0 demanded or b) are uses which demand only lane 0 of their operand.
for (auto *BB : TheLoop->blocks())		for (auto *BB : TheLoop->blocks())
for (auto &I : *BB) {		for (auto &I : *BB) {
// If there's no pointer operand, there's nothing to do.		// If there's no pointer operand, there's nothing to do.
auto *Ptr = getLoadStorePointerOperand(&I);		auto *Ptr = getLoadStorePointerOperand(&I);
if (!Ptr)		if (!Ptr)
continue;		continue;

// For now, avoid walking use lists in other functions.		// A uniform memory op is itself uniform. We exclude uniform stores
// TODO: Rewrite this algorithm from uses up.		// here as they demand the last lane, not the first one.
if (!isa<Instruction>(Ptr) && !isa<Argument>(Ptr))
continue;

// A uniform memory op is itself uniform. We exclude stores here as we
// haven't yet added dedicated logic in the CLONE path and rely on
// REPLICATE + DSE for correctness.
if (isa<LoadInst>(I) && Legal->isUniformMemOp(I))		if (isa<LoadInst>(I) && Legal->isUniformMemOp(I))
addToWorklistIfAllowed(&I);		addToWorklistIfAllowed(&I);

// True if all users of Ptr are memory accesses that have Ptr as their		if (isUniformDecision(&I, VF)) {
// pointer operand. Since loops are assumed to be in LCSSA form, this		assert(isVectorizedMemAccessUse(&I, Ptr) && "consistency check");
		HasUniformUse.insert(Ptr);
		}
		}

		// Add to the worklist any operands which have only uniform (e.g. lane 0
		// demanding) users. Since loops are assumed to be in LCSSA form, this
// disallows uses outside the loop as well.		// disallows uses outside the loop as well.
		for (auto *V : HasUniformUse) {
		if (isOutOfScope(V))
		continue;
		auto *I = cast<Instruction>(V);
auto UsersAreMemAccesses =		auto UsersAreMemAccesses =
llvm::all_of(Ptr->users(), [&](User *U) -> bool {		llvm::all_of(I->users(), [&](User *U) -> bool {
return getLoadStorePointerOperand(U) == Ptr;		return isVectorizedMemAccessUse(cast<Instruction>(U), V);
});		});
		if (UsersAreMemAccesses)
// Ensure the memory instruction will not be scalarized or used by
// gather/scatter, making its pointer operand non-uniform. If the pointer
// operand is used by any instruction other than a memory access, we
// conservatively assume the pointer operand may be non-uniform.
if (!UsersAreMemAccesses || !isUniformDecision(&I, VF))
PossibleNonUniformPtrs.insert(Ptr);

// If the memory instruction will be vectorized and its pointer operand
// is consecutive-like, or interleaving - the pointer operand should
// remain uniform.
else
ConsecutiveLikePtrs.insert(Ptr);
}

// Add to the Worklist all consecutive and consecutive-like pointers that
// aren't also identified as possibly non-uniform.
for (auto *V : ConsecutiveLikePtrs)
if (!PossibleNonUniformPtrs.count(V))
if (auto *I = dyn_cast<Instruction>(V))
addToWorklistIfAllowed(I);		addToWorklistIfAllowed(I);
		}

// Expand Worklist in topological order: whenever a new instruction		// Expand Worklist in topological order: whenever a new instruction
// is added , its users should be already inside Worklist. It ensures		// is added , its users should be already inside Worklist. It ensures
// a uniform instruction will only be used by uniform instructions.		// a uniform instruction will only be used by uniform instructions.
unsigned idx = 0;		unsigned idx = 0;
while (idx != Worklist.size()) {		while (idx != Worklist.size()) {
Instruction *I = Worklist[idx++];		Instruction *I = Worklist[idx++];

llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll

	; CHECK-LABEL: @cff_index_load_offsets(			; CHECK-LABEL: @cff_index_load_offsets(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[EXIT:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> undef, i8 [[X:%.*]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> undef, i8 [[X:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> undef, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <4 x i8> undef, i8 [[X]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT2]], <4 x i8> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4			; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* null, i64 [[TMP1]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* null, i64 [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT]] to <4 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw <4 x i32> [[TMP2]], <i32 24, i32 24, i32 24, i32 24>			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[P:%.*]], align 1, [[TBAA1:!tbaa !.*]]			; CHECK-NEXT: [[NEXT_GEP1:%.*]] = getelementptr i8, i8* null, i64 [[TMP3]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i8> undef, i8 [[TMP4]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT]] to <4 x i32>
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT1]], <4 x i8> undef, <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT3]] to <4 x i32>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT2]] to <4 x i32>			; CHECK-NEXT: [[TMP6:%.*]] = shl nuw <4 x i32> [[TMP4]], <i32 24, i32 24, i32 24, i32 24>
	; CHECK-NEXT: [[TMP6:%.*]] = shl nuw nsw <4 x i32> [[TMP5]], <i32 16, i32 16, i32 16, i32 16>			; CHECK-NEXT: [[TMP7:%.*]] = shl nuw <4 x i32> [[TMP5]], <i32 24, i32 24, i32 24, i32 24>
	; CHECK-NEXT: [[TMP7:%.*]] = or <4 x i32> [[TMP6]], [[TMP3]]			; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[P:%.*]], align 1, [[TBAA1:!tbaa !.*]]
	; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]			; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <4 x i8> undef, i8 [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]			; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT4]], <4 x i8> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]			; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[P]], align 1, [[TBAA1]]
	; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]			; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i8> undef, i8 [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = or <4 x i32> [[TMP7]], zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT6]], <4 x i8> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP13:%.*]] = or <4 x i32> [[TMP12]], zeroinitializer			; CHECK-NEXT: [[TMP10:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT5]] to <4 x i32>
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP13]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT7]] to <4 x i32>
	; CHECK-NEXT: store i32 [[TMP14]], i32* undef, align 4, [[TBAA4:!tbaa !.*]]			; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw <4 x i32> [[TMP10]], <i32 16, i32 16, i32 16, i32 16>
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP13]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw <4 x i32> [[TMP11]], <i32 16, i32 16, i32 16, i32 16>
	; CHECK-NEXT: store i32 [[TMP15]], i32* undef, align 4, [[TBAA4]]			; CHECK-NEXT: [[TMP14:%.*]] = or <4 x i32> [[TMP12]], [[TMP6]]
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[TMP13]], i32 2			; CHECK-NEXT: [[TMP15:%.*]] = or <4 x i32> [[TMP13]], [[TMP7]]
	; CHECK-NEXT: store i32 [[TMP16]], i32* undef, align 4, [[TBAA4]]			; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[TMP13]], i32 3			; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]
	; CHECK-NEXT: store i32 [[TMP17]], i32* undef, align 4, [[TBAA4]]			; CHECK-NEXT: [[TMP18:%.*]] = or <4 x i32> [[TMP14]], zeroinitializer
				anna: Any idea why these extra no-ops get added? That seems like an unfortunate side effect (especially given the algorithm works to reduce compile time).
				fhahn: IIUC this is related to the fact that we are now considering `%1 = load i8, i8* undef, align 1,` as uniform and probably chose a higher interleave count because of that? If the pointer argument would be an instruction or argument, we should get the same result even without the patch. It might be worth changing the load to use a 'real' pointer.
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[TMP19:%.*]] = or <4 x i32> [[TMP15]], zeroinitializer
	; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0			; CHECK-NEXT: [[TMP20:%.*]] = or <4 x i32> [[TMP18]], zeroinitializer
	; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]			; CHECK-NEXT: [[TMP21:%.*]] = or <4 x i32> [[TMP19]], zeroinitializer
				; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i32> [[TMP20]], i32 0
				; CHECK-NEXT: store i32 [[TMP22]], i32* undef, align 4, [[TBAA4:!tbaa !.*]]
				; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP20]], i32 1
				; CHECK-NEXT: store i32 [[TMP23]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP20]], i32 2
				; CHECK-NEXT: store i32 [[TMP24]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP20]], i32 3
				; CHECK-NEXT: store i32 [[TMP25]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i32> [[TMP21]], i32 0
				; CHECK-NEXT: store i32 [[TMP26]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i32> [[TMP21]], i32 1
				; CHECK-NEXT: store i32 [[TMP27]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x i32> [[TMP21]], i32 2
				; CHECK-NEXT: store i32 [[TMP28]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i32> [[TMP21]], i32 3
				; CHECK-NEXT: store i32 [[TMP29]], i32* undef, align 4, [[TBAA4]]
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 8
				; CHECK-NEXT: [[TMP30:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
				; CHECK-NEXT: br i1 [[TMP30]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1, 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1, 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[SW_EPILOG:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[SW_EPILOG:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ null, [[MIDDLE_BLOCK]] ], [ null, [[IF_THEN]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ null, [[MIDDLE_BLOCK]] ], [ null, [[IF_THEN]] ]
	; CHECK-NEXT: br label [[FOR_BODY68:%.*]]			; CHECK-NEXT: br label [[FOR_BODY68:%.*]]
	; CHECK: for.body68:			; CHECK: for.body68:
	; CHECK-NEXT: [[P_359:%.*]] = phi i8* [ [[ADD_PTR86:%.*]], [[FOR_BODY68]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[P_359:%.*]] = phi i8* [ [[ADD_PTR86:%.*]], [[FOR_BODY68]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[CONV70:%.*]] = zext i8 [[X]] to i32			; CHECK-NEXT: [[CONV70:%.*]] = zext i8 [[X]] to i32
	; CHECK-NEXT: [[SHL71:%.*]] = shl nuw i32 [[CONV70]], 24			; CHECK-NEXT: [[SHL71:%.*]] = shl nuw i32 [[CONV70]], 24
	; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[P]], align 1, [[TBAA1]]			; CHECK-NEXT: [[TMP31:%.*]] = load i8, i8* [[P]], align 1, [[TBAA1]]
	; CHECK-NEXT: [[CONV73:%.*]] = zext i8 [[TMP19]] to i32			; CHECK-NEXT: [[CONV73:%.*]] = zext i8 [[TMP31]] to i32
	; CHECK-NEXT: [[SHL74:%.*]] = shl nuw nsw i32 [[CONV73]], 16			; CHECK-NEXT: [[SHL74:%.*]] = shl nuw nsw i32 [[CONV73]], 16
	; CHECK-NEXT: [[OR75:%.*]] = or i32 [[SHL74]], [[SHL71]]			; CHECK-NEXT: [[OR75:%.*]] = or i32 [[SHL74]], [[SHL71]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]			; CHECK-NEXT: [[TMP32:%.*]] = load i8, i8* undef, align 1, [[TBAA1]]
	; CHECK-NEXT: [[SHL78:%.*]] = shl nuw nsw i32 undef, 8			; CHECK-NEXT: [[SHL78:%.*]] = shl nuw nsw i32 undef, 8
	; CHECK-NEXT: [[OR79:%.*]] = or i32 [[OR75]], [[SHL78]]			; CHECK-NEXT: [[OR79:%.*]] = or i32 [[OR75]], [[SHL78]]
	; CHECK-NEXT: [[CONV81:%.*]] = zext i8 undef to i32			; CHECK-NEXT: [[CONV81:%.*]] = zext i8 undef to i32
	; CHECK-NEXT: [[OR83:%.*]] = or i32 [[OR79]], [[CONV81]]			; CHECK-NEXT: [[OR83:%.*]] = or i32 [[OR79]], [[CONV81]]
	; CHECK-NEXT: store i32 [[OR83]], i32* undef, align 4, [[TBAA4]]			; CHECK-NEXT: store i32 [[OR83]], i32* undef, align 4, [[TBAA4]]
	; CHECK-NEXT: [[ADD_PTR86]] = getelementptr inbounds i8, i8* [[P_359]], i64 4			; CHECK-NEXT: [[ADD_PTR86]] = getelementptr inbounds i8, i8* [[P_359]], i64 4
	; CHECK-NEXT: [[CMP66:%.*]] = icmp ult i8* [[ADD_PTR86]], undef			; CHECK-NEXT: [[CMP66:%.*]] = icmp ult i8* [[ADD_PTR86]], undef
	; CHECK-NEXT: br i1 [[CMP66]], label [[FOR_BODY68]], label [[SW_EPILOG]], [[LOOP8:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP66]], label [[FOR_BODY68]], label [[SW_EPILOG]], [[LOOP8:!llvm.loop !.*]]
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll

loop:
%val = zext i8 %test to i32		%val = zext i8 %test to i32
%accum.next = add i32 %accum, %val		%accum.next = add i32 %accum, %val
%exit = icmp ugt i64 %iv, 4094		%exit = icmp ugt i64 %iv, 4094
br i1 %exit, label %loop_exit, label %loop		br i1 %exit, label %loop_exit, label %loop

loop_exit:		loop_exit:
ret i32 %accum.next		ret i32 %accum.next
}		}

		;; Same as uniform_load, but show that the uniformity analysis can handle
		;; pointer operands which are not local to the function.
		@GAddr = external global i32 align 4
		define i32 @uniform_load_global() {
		fhahn: With the patch, constant expressions should also be supported, right? Could you add a test case for that as well?
		; CHECK-LABEL: @uniform_load_global(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP8:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP9:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP10:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_PHI3:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP11:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
		; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
		; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
		; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
		; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* @GAddr, align 4
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* @GAddr, align 4
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP5]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT5:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT4]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* @GAddr, align 4
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT6:%.*]] = insertelement <4 x i32> undef, i32 [[TMP6]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT7:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT6]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* @GAddr, align 4
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> undef, i32 [[TMP7]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> undef, <4 x i32> zeroinitializer
		; CHECK-NEXT: [[TMP8]] = add <4 x i32> [[VEC_PHI]], [[BROADCAST_SPLAT]]
		; CHECK-NEXT: [[TMP9]] = add <4 x i32> [[VEC_PHI1]], [[BROADCAST_SPLAT5]]
		; CHECK-NEXT: [[TMP10]] = add <4 x i32> [[VEC_PHI2]], [[BROADCAST_SPLAT7]]
		; CHECK-NEXT: [[TMP11]] = add <4 x i32> [[VEC_PHI3]], [[BROADCAST_SPLAT9]]
		; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
		; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
		; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP21:!llvm.loop !.*]]
		; CHECK: middle.block:
		; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP9]], [[TMP8]]
		; CHECK-NEXT: [[BIN_RDX10:%.*]] = add <4 x i32> [[TMP10]], [[BIN_RDX]]
		; CHECK-NEXT: [[BIN_RDX11:%.*]] = add <4 x i32> [[TMP11]], [[BIN_RDX10]]
		; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX11]])
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4097, 4096
		; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
		; CHECK: for.body:
		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
		; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[ACCUM_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
		; CHECK-NEXT: [[LOAD:%.*]] = load i32, i32* @GAddr, align 4
		; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[LOAD]]
		; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV]], 4096
		; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOPEXIT]], label [[FOR_BODY]], [[LOOP22:!llvm.loop !.*]]
		; CHECK: loopexit:
		; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[FOR_BODY]] ], [ [[TMP13]], [[MIDDLE_BLOCK]] ]
		; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
		;
		entry:
		br label %for.body

		for.body:
		%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
		%accum = phi i32 [%accum.next, %for.body], [0, %entry]
		%load = load i32, i32* @GAddr
		%accum.next = add i32 %accum, %load
		%iv.next = add nuw nsw i64 %iv, 1
		%exitcond = icmp eq i64 %iv, 4096
		br i1 %exitcond, label %loopexit, label %for.body

		loopexit:
		ret i32 %accum.next
		}

llvm/test/Transforms/LoopVectorize/pr44488-predication.ll

	define i16 @test_true_and_false_branch_equal() {			define i16 @test_true_and_false_branch_equal() {
	; CHECK-LABEL: @test_true_and_false_branch_equal(			; CHECK-LABEL: @test_true_and_false_branch_equal(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_SREM_CONTINUE2:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_SREM_CONTINUE4:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[INDEX]] to i16			; CHECK-NEXT: [[TMP0:%.*]] = trunc i32 [[INDEX]] to i16
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i16 99, [[TMP0]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = add i16 99, [[TMP0]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> undef, i16 [[OFFSET_IDX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> undef, i16 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> undef, <2 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> undef, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 1>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 1>
	; CHECK-NEXT: [[TMP1:%.*]] = add i16 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP1:%.*]] = add i16 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = load i16, i16* @v_38, align 1			; CHECK-NEXT: [[TMP2:%.*]] = load i16, i16* @v_38, align 1
	; CHECK-NEXT: [[TMP3:%.*]] = load i16, i16* @v_38, align 1			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i16> undef, i16 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i16> undef, i16 [[TMP2]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT1]], <2 x i16> undef, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i16> [[TMP4]], i16 [[TMP3]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = icmp eq <2 x i16> [[BROADCAST_SPLAT2]], <i16 32767, i16 32767>
	; CHECK-NEXT: [[TMP6:%.*]] = icmp eq <2 x i16> [[TMP5]], <i16 32767, i16 32767>			; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <2 x i16> [[BROADCAST_SPLAT2]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = icmp eq <2 x i16> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i1> [[TMP4]], <i1 true, i1 true>
	; CHECK-NEXT: [[TMP8:%.*]] = xor <2 x i1> [[TMP7]], <i1 true, i1 true>			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i1> [[TMP8]], i32 0			; CHECK-NEXT: br i1 [[TMP6]], label [[PRED_SREM_IF:%.*]], label [[PRED_SREM_CONTINUE:%.*]]
	; CHECK-NEXT: br i1 [[TMP9]], label [[PRED_SREM_IF:%.*]], label [[PRED_SREM_CONTINUE:%.*]]
	; CHECK: pred.srem.if:			; CHECK: pred.srem.if:
	; CHECK-NEXT: [[TMP10:%.*]] = srem i16 5786, [[TMP2]]			; CHECK-NEXT: [[TMP7:%.*]] = srem i16 5786, [[TMP2]]
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i16> undef, i16 [[TMP10]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i16> undef, i16 [[TMP7]], i32 0
	; CHECK-NEXT: br label [[PRED_SREM_CONTINUE]]			; CHECK-NEXT: br label [[PRED_SREM_CONTINUE]]
	; CHECK: pred.srem.continue:			; CHECK: pred.srem.continue:
	; CHECK-NEXT: [[TMP12:%.*]] = phi <2 x i16> [ undef, [[VECTOR_BODY]] ], [ [[TMP11]], [[PRED_SREM_IF]] ]			; CHECK-NEXT: [[TMP9:%.*]] = phi <2 x i16> [ undef, [[VECTOR_BODY]] ], [ [[TMP8]], [[PRED_SREM_IF]] ]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP5]], i32 1
	; CHECK-NEXT: br i1 [[TMP13]], label [[PRED_SREM_IF1:%.*]], label [[PRED_SREM_CONTINUE2]]			; CHECK-NEXT: br i1 [[TMP10]], label [[PRED_SREM_IF3:%.*]], label [[PRED_SREM_CONTINUE4]]
	; CHECK: pred.srem.if1:			; CHECK: pred.srem.if3:
	; CHECK-NEXT: [[TMP14:%.*]] = srem i16 5786, [[TMP3]]			; CHECK-NEXT: [[TMP11:%.*]] = srem i16 5786, [[TMP2]]
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i16> [[TMP12]], i16 [[TMP14]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i16> [[TMP9]], i16 [[TMP11]], i32 1
	; CHECK-NEXT: br label [[PRED_SREM_CONTINUE2]]			; CHECK-NEXT: br label [[PRED_SREM_CONTINUE4]]
	; CHECK: pred.srem.continue2:			; CHECK: pred.srem.continue4:
	; CHECK-NEXT: [[TMP16:%.*]] = phi <2 x i16> [ [[TMP12]], [[PRED_SREM_CONTINUE]] ], [ [[TMP15]], [[PRED_SREM_IF1]] ]			; CHECK-NEXT: [[TMP13:%.*]] = phi <2 x i16> [ [[TMP9]], [[PRED_SREM_CONTINUE]] ], [ [[TMP12]], [[PRED_SREM_IF3]] ]
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP7]], <2 x i16> <i16 5786, i16 5786>, <2 x i16> [[TMP16]]			; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP4]], <2 x i16> <i16 5786, i16 5786>, <2 x i16> [[TMP13]]
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 0
	; CHECK-NEXT: store i16 [[TMP17]], i16* @v_39, align 1			; CHECK-NEXT: store i16 [[TMP14]], i16* @v_39, align 1
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x i16> [[PREDPHI]], i32 1
	; CHECK-NEXT: store i16 [[TMP18]], i16* @v_39, align 1			; CHECK-NEXT: store i16 [[TMP15]], i16* @v_39, align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i32 [[INDEX_NEXT]], 12			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i32 [[INDEX_NEXT]], 12
	; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 12, 12			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 12, 12
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 111, [[MIDDLE_BLOCK]] ], [ 99, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 111, [[MIDDLE_BLOCK]] ], [ 99, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_07:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC7:%.*]], [[FOR_LATCH:%.*]] ]			; CHECK-NEXT: [[I_07:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC7:%.*]], [[FOR_LATCH:%.*]] ]
	; CHECK-NEXT: [[LV:%.*]] = load i16, i16* @v_38, align 1			; CHECK-NEXT: [[LV:%.*]] = load i16, i16* @v_38, align 1
	; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i16 [[LV]], 32767			; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i16 [[LV]], 32767
	; CHECK-NEXT: br i1 [[CMP1]], label [[COND_END:%.*]], label [[COND_END]]			; CHECK-NEXT: br i1 [[CMP1]], label [[COND_END:%.*]], label [[COND_END]]
	; CHECK: cond.end:			; CHECK: cond.end:
	; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i16 [[LV]], 0			; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i16 [[LV]], 0
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_LATCH]], label [[COND_FALSE4:%.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_LATCH]], label [[COND_FALSE4:%.*]]
	; CHECK: cond.false4:			; CHECK: cond.false4:
	; CHECK-NEXT: [[REM:%.*]] = srem i16 5786, [[LV]]			; CHECK-NEXT: [[REM:%.*]] = srem i16 5786, [[LV]]
	; CHECK-NEXT: br label [[FOR_LATCH]]			; CHECK-NEXT: br label [[FOR_LATCH]]
	; CHECK: for.latch:			; CHECK: for.latch:
	; CHECK-NEXT: [[COND6:%.*]] = phi i16 [ [[REM]], [[COND_FALSE4]] ], [ 5786, [[COND_END]] ]			; CHECK-NEXT: [[COND6:%.*]] = phi i16 [ [[REM]], [[COND_FALSE4]] ], [ 5786, [[COND_END]] ]
	; CHECK-NEXT: store i16 [[COND6]], i16* @v_39, align 1			; CHECK-NEXT: store i16 [[COND6]], i16* @v_39, align 1
	; CHECK-NEXT: [[INC7]] = add nsw i16 [[I_07]], 1			; CHECK-NEXT: [[INC7]] = add nsw i16 [[I_07]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[INC7]], 111			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[INC7]], 111
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[EXIT]], !llvm.loop !2			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[EXIT]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[RV:%.*]] = load i16, i16* @v_39, align 1			; CHECK-NEXT: [[RV:%.*]] = load i16, i16* @v_39, align 1
	; CHECK-NEXT: ret i16 [[RV]]			; CHECK-NEXT: ret i16 [[RV]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.latch			for.body: ; preds = %entry, %for.latch