This is an archive of the discontinued LLVM Phabricator instance.

[LV] Mark instructions with loop invariant arguments as uniform. (WIP)
AbandonedPublic

Authored by fhahn on Oct 10 2019, 1:41 PM.

Details

Reviewers

hsaito
rengolin
dcaballe
Ayal

Summary

As suggested by Ayal in D59995, we can mark instructions with
loop invariant arguments as uniform. They will always produce
the same result.

Now that we can have more uniform instructions, there were some
assertions that needed relaxing a bit.

Also, there still seems to be an issue with constant folding in LV not
being able to simplify some uniform values compared to their replicated
equivalents. I still have to look into that, but I wanted to make sure
the overall approach aligns well.

The overall impact of the change is probably quite low, but at least in
the test-suite, there are around 4 benchmarks where we ended up
vectorizing a few more loops.

Currently we still miss some uniform instructions that only have
uniform operands, but that can be addressed as a follow-up.
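
To make the summary concrete, here is a minimal C++ sketch (not LoopVectorize code) of why an instruction whose operands are all loop invariant can be treated as uniform. The `Pair` type, the `VF` constant, and both function names are hypothetical. The shape loosely mirrors the extractvalue test in this revision, where each extract is emitted once and splatted instead of being emitted once per lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical vectorization factor, chosen only for illustration.
constexpr std::size_t VF = 4;

// Stand-in for a loop-invariant aggregate such as { i64, i64 } %sv.
struct Pair {
  std::int64_t first;
  std::int64_t second;
};

// Replicated form: both extracts are evaluated again for every lane,
// although their operands never change inside the loop.
std::array<std::int64_t, VF> sumReplicated(const Pair &sv, int &evals) {
  std::array<std::int64_t, VF> out{};
  for (std::size_t lane = 0; lane < VF; ++lane) {
    std::int64_t a = sv.first;
    ++evals;
    std::int64_t b = sv.second;
    ++evals;
    out[lane] = a + b;
  }
  return out;
}

// Uniform form: evaluate each extract once and broadcast the result.
std::array<std::int64_t, VF> sumUniform(const Pair &sv, int &evals) {
  std::int64_t a = sv.first;
  ++evals;
  std::int64_t b = sv.second;
  ++evals;
  std::array<std::int64_t, VF> out{};
  out.fill(a + b);
  return out;
}
```

Both functions produce the same lanes; only the number of scalar evaluations differs, which is the saving the cost model changes in the test files are meant to reflect.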

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 39359
Build 39375: arc lint + arc unit

Event Timeline

fhahn created this revision. Oct 10 2019, 1:41 PM

Herald added a project: Restricted Project. Oct 10 2019, 1:41 PM

Herald added subscribers: rkruppe, hiraditya.

Harbormaster completed remote builds in B39359: Diff 224463. Oct 10 2019, 1:46 PM

Ayal mentioned this in D69067: [LV] Record GEP widening decisions in recipe (NFCI). Oct 31 2019, 7:57 AM

Thanks for coming back to look into this @fhahn!

Overall, wonder what the current forward-propagating invariance analysis is missing, and if it's better to keep it separate from the backward-propagating "DemandedLanes" analysis aka uniform-after-vectorization. More specific comments inline.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2065	There's a distinction between (1) the truly "uniform values" of Legal->isUniform(), and (2) those of Cost->isUniformAfterVectorization() that "become" uniform. In both cases generating the single scalar value of lane zero suffices; the values of all other lanes need not be generated, because (1) they are all equal to that of lane zero, or (2) they are all dead. I.e., DemandedLanes={0} in case (2). Instructions of case (1) should have ideally been LICM'd out, except for conditional instructions that have side-effects. Case (2) is what this assert is trying to verify - a user requesting the value of some lane>0 contradicts the liveness assumption; feeding it with the value of lane 0 may be feeding it a different, wrong value. Note that this change to getOrCreateScalarValue() alone "fixes" PR40816, but there the users requesting the values of lanes>0 are essentially dead. Would be good to devise a test involving a predicated uniform-after-vectorization instruction, that is essentially live. Would be good to come up with a better name than UniformAfterVectorization, which looks up a set named "Uniforms". Suggestions?
4703	Such instructions should be identified as Legal->isUniform(), right? Note that in general loop invariance is stronger than uniformity, though Legal->isUniform() currently does return isLoopInvariant().
4713	While we're here, _or_null should be dropped, &I cannot possibly be null.
5485	Predicated instructions must not be uniform-after-vectorization currently, and this should better be checked earlier to avoid inserting them into Uniforms, or bailout from vectorizing, to prevent crashing as in PR40816. This assert doesn't trigger there because it's masked under UseEmulatedMaskedMemRefHack(). When predicated instructions are allowed to be uniform-after-vectorization, it would be good for VPlan to reflect it, and have the VPRegionBlock built for it generate only a single replica (per part?) - that of lane zero.
5533	Comments and code above should be updated?
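
The distinction drawn in the inline comment at line 2065 can be illustrated with a small C++ model. This is only a sketch under stated assumptions: `Lanes`, `VF`, and all function names are hypothetical and do not correspond to LoopVectorize APIs. It shows that substituting lane 0 is harmless for a truly uniform value (case 1), but returns a different value when the lanes differ and only lane 0 was assumed to be demanded (case 2).

```cpp
#include <array>
#include <cstddef>

// Hypothetical vectorization factor for the model.
constexpr std::size_t VF = 4;
using Lanes = std::array<long, VF>;

// Case (1): operands are loop invariant, so every lane holds the same value.
Lanes invariantProduct(long x, long y) {
  Lanes lanes{};
  lanes.fill(x * y);
  return lanes;
}

// Case (2): the index of a consecutive access. Lanes differ, but a wide
// memory access only needs lane 0 as its base, so the other lanes are dead.
Lanes consecutiveIndex(long base) {
  Lanes lanes{};
  for (std::size_t k = 0; k < VF; ++k)
    lanes[k] = base + static_cast<long>(k);
  return lanes;
}

// True only for case (1): lane 0 can stand in for every other lane.
bool allLanesEqual(const Lanes &lanes) {
  for (long v : lanes)
    if (v != lanes[0])
      return false;
  return true;
}

// Models the patched lookup: a value marked uniform always yields lane 0,
// regardless of which lane the user asked for.
long readLane(const Lanes &lanes, std::size_t lane, bool markedUniform) {
  return lanes[markedUniform ? 0 : lane];
}
```

In this model, a live user asking for lane 2 of `consecutiveIndex(8)` would receive 8 instead of 10 if the value were marked uniform, which is the situation the original assert was guarding against.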

Thanks for taking a look Ayal! I'll update the patch once D70298 has landed, to avoid causing unnecessary rebasing.

Ayal mentioned this in D71071: [LV] Pick correct BB as insert point when fixing PHI for FORs. Dec 6 2019, 1:45 AM

fhahn mentioned this in D91398: [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE). Nov 14 2020, 12:19 PM

In D68831#1731425, @Ayal wrote:

Overall, wonder what the current forward-propagating invariance analysis is missing, and if it's better to keep it separate from the backward-propagating "DemandedLanes" analysis aka uniform-after-vectorization. More specific comments inline.

Just to note, the terminology "demanded lane" is much clearer than the existing uniform-after-vectorization. It took me quite a while to figure out what that code was doing the first time I glanced at it, particularly since it didn't consider values which were Legal->isUniform as being uniform after vectorization!

rkruppe removed a subscriber: rkruppe. Nov 23 2020, 6:22 AM

This is no longer needed

Herald added a project: Restricted Project. Sep 13 2022, 2:15 AM

Herald added a subscriber: pcwang-thead.

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Vectorize/

LoopVectorize.cpp

40 lines

test/

Transforms/

LoopVectorize/

AArch64/

extractvalue-no-scalarization-required.ll

60 lines

X86/

12 lines

38 lines

40 lines

40 lines

invariant-load-gather.ll

18 lines

invariant-store-vectorization.ll

57 lines

load-deref-pred.ll

282 lines

first-order-recurrence.ll

5 lines

no_outside_user.ll

17 lines

pr32859.ll

25 lines

vector-intrinsic-call-cost.ll

39 lines

Diff 224463

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Value *		Value *
InnerLoopVectorizer::getOrCreateScalarValue(Value *V,		InnerLoopVectorizer::getOrCreateScalarValue(Value *V,
const VPIteration &Instance) {		const VPIteration &Instance) {
// If the value is not an instruction contained in the loop, it should		// If the value is not an instruction contained in the loop, it should
// already be scalar.		// already be scalar.
if (OrigLoop->isLoopInvariant(V))		if (OrigLoop->isLoopInvariant(V))
return V;		return V;

assert(Instance.Lane > 0		// Always use lane 0 for uniform values.
? !Cost->isUniformAfterVectorization(cast<Instruction>(V), VF)		unsigned Lane = Instance.Lane;
: true && "Uniform values only have lane zero");		if (Cost->isUniformAfterVectorization(cast<Instruction>(V), VF))
		Lane = 0;
// If the value from the original loop has not been vectorized, it is		// If the value from the original loop has not been vectorized, it is
// represented by UF x VF scalar values in the new loop. Return the requested		// represented by UF x VF scalar values in the new loop. Return the requested
// scalar value.		// scalar value.
if (VectorLoopValueMap.hasScalarValue(V, Instance))		if (VectorLoopValueMap.hasScalarValue(V, {Instance.Part, Lane}))
return VectorLoopValueMap.getScalarValue(V, Instance);		return VectorLoopValueMap.getScalarValue(V, {Instance.Part, Lane});

// If the value has not been scalarized, get its entry in VectorLoopValueMap		// If the value has not been scalarized, get its entry in VectorLoopValueMap
// for the given unroll part. If this entry is not a vector type (i.e., the		// for the given unroll part. If this entry is not a vector type (i.e., the
// vectorization factor is one), there is no need to generate an		// vectorization factor is one), there is no need to generate an
// extractelement instruction.		// extractelement instruction.
auto *U = getOrCreateVectorValue(V, Instance.Part);		auto *U = getOrCreateVectorValue(V, Instance.Part);
if (!U->getType()->isVectorTy()) {		if (!U->getType()->isVectorTy()) {
assert(VF == 1 && "Value not scalarized has non-vector type");		assert(VF == 1 && "Value not scalarized has non-vector type");
[...]	void LoopVectorizationCostModel::collectLoopUniforms(unsigned VF) {
// instruction contained in the loop that is only used by the branch, it is		// instruction contained in the loop that is only used by the branch, it is
// uniform.		// uniform.
auto *Cmp = dyn_cast<Instruction>(Latch->getTerminator()->getOperand(0));		auto *Cmp = dyn_cast<Instruction>(Latch->getTerminator()->getOperand(0));
if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse()) {		if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse()) {
Worklist.insert(Cmp);		Worklist.insert(Cmp);
LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *Cmp << "\n");		LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *Cmp << "\n");
}		}

// Holds consecutive and consecutive-like pointers. Consecutive-like pointers		// Holds consecutive and consecutive-like pointers, as well as trivially loop
// are pointers that are treated like consecutive pointers during		// invariant instructions. Consecutive-like pointers are pointers that are
// vectorization. The pointer operands of interleaved accesses are an		// treated like consecutive pointers during vectorization. The pointer
// example.		// operands of interleaved accesses are an example.
SmallSetVector<Instruction *, 8> ConsecutiveLikePtrs;		SmallSetVector<Instruction *, 8> PotentialUniformRoots;

// Holds pointer operands of instructions that are possibly non-uniform.		// Holds pointer operands of instructions that are possibly non-uniform.
SmallPtrSet<Instruction *, 8> PossibleNonUniformPtrs;		SmallPtrSet<Instruction *, 8> PossibleNonUniformPtrs;

auto isUniformDecision = [&](Instruction *I, unsigned VF) {		auto isUniformDecision = [&](Instruction *I, unsigned VF) {
InstWidening WideningDecision = getWideningDecision(I, VF);		InstWidening WideningDecision = getWideningDecision(I, VF);
assert(WideningDecision != CM_Unknown &&		assert(WideningDecision != CM_Unknown &&
"Widening decision should be ready at this moment");		"Widening decision should be ready at this moment");

return (WideningDecision == CM_Widen ||		return (WideningDecision == CM_Widen ||
WideningDecision == CM_Widen_Reverse ||		WideningDecision == CM_Widen_Reverse ||
WideningDecision == CM_Interleave);		WideningDecision == CM_Interleave);
};		};
// Iterate over the instructions in the loop, and collect all		// Iterate over the instructions in the loop, and collect all
// consecutive-like pointer operands in ConsecutiveLikePtrs. If it's possible		// consecutive-like pointer operands in ConsecutiveLikePtrs. If it's possible
// that a consecutive-like pointer operand will be scalarized, we collect it		// that a consecutive-like pointer operand will be scalarized, we collect it
// in PossibleNonUniformPtrs instead. We use two sets here because a single		// in PossibleNonUniformPtrs instead. We use two sets here because a single
// getelementptr instruction can be used by both vectorized and scalarized		// getelementptr instruction can be used by both vectorized and scalarized
// memory instructions. For example, if a loop loads and stores from the same		// memory instructions. For example, if a loop loads and stores from the same
// location, but the store is conditional, the store will be scalarized, and		// location, but the store is conditional, the store will be scalarized, and
// the getelementptr won't remain uniform.		// the getelementptr won't remain uniform.
for (auto *BB : TheLoop->blocks())		for (auto *BB : TheLoop->blocks())
for (auto &I : *BB) {		for (auto &I : *BB) {
		// Instructions with loop invariant operands are uniform, as long as
		// they do not read or write memory, are PHI nodes or terminators.
		if (&I != BB->getTerminator() && !I.mayReadOrWriteMemory() &&
		!isa<PHINode>(&I) && all_of(I.operands(), [this](Use &U) {
		return this->TheLoop->isLoopInvariant(U);
		})) {
		PotentialUniformRoots.insert(&I);
		continue;
		}

// If there's no pointer operand, there's nothing to do.		// If there's no pointer operand, there's nothing to do.
auto *Ptr = dyn_cast_or_null<Instruction>(getLoadStorePointerOperand(&I));		auto *Ptr = dyn_cast_or_null<Instruction>(getLoadStorePointerOperand(&I));
if (!Ptr)		if (!Ptr)
continue;		continue;

// True if all users of Ptr are memory accesses that have Ptr as their		// True if all users of Ptr are memory accesses that have Ptr as their
// pointer operand.		// pointer operand.
auto UsersAreMemAccesses =		auto UsersAreMemAccesses =
llvm::all_of(Ptr->users(), [&](User *U) -> bool {		llvm::all_of(Ptr->users(), [&](User *U) -> bool {
return getLoadStorePointerOperand(U) == Ptr;		return getLoadStorePointerOperand(U) == Ptr;
});		});

// Ensure the memory instruction will not be scalarized or used by		// Ensure the memory instruction will not be scalarized or used by
// gather/scatter, making its pointer operand non-uniform. If the pointer		// gather/scatter, making its pointer operand non-uniform. If the pointer
// operand is used by any instruction other than a memory access, we		// operand is used by any instruction other than a memory access, we
// conservatively assume the pointer operand may be non-uniform.		// conservatively assume the pointer operand may be non-uniform.
if (!UsersAreMemAccesses || !isUniformDecision(&I, VF))		if (!UsersAreMemAccesses || !isUniformDecision(&I, VF))
PossibleNonUniformPtrs.insert(Ptr);		PossibleNonUniformPtrs.insert(Ptr);

// If the memory instruction will be vectorized and its pointer operand		// If the memory instruction will be vectorized and its pointer operand
// is consecutive-like, or interleaving - the pointer operand should		// is consecutive-like, or interleaving - the pointer operand should
// remain uniform.		// remain uniform.
else		else
ConsecutiveLikePtrs.insert(Ptr);		PotentialUniformRoots.insert(Ptr);
}		}

// Add to the Worklist all consecutive and consecutive-like pointers that		// Add to the Worklist all consecutive and consecutive-like pointers that
// aren't also identified as possibly non-uniform.		// aren't also identified as possibly non-uniform.
for (auto *V : ConsecutiveLikePtrs)		for (auto *V : PotentialUniformRoots)
if (PossibleNonUniformPtrs.find(V) == PossibleNonUniformPtrs.end()) {		if (PossibleNonUniformPtrs.find(V) == PossibleNonUniformPtrs.end()) {
LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *V << "\n");		LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *V << "\n");
Worklist.insert(V);		Worklist.insert(V);
}		}

// Expand Worklist in topological order: whenever a new instruction		// Expand Worklist in topological order: whenever a new instruction
// is added , its users should be already inside Worklist. It ensures		// is added , its users should be already inside Worklist. It ensures
// a uniform instruction will only be used by uniform instructions.		// a uniform instruction will only be used by uniform instructions.
[...]	for (auto OV : I->operand_values()) {
auto *OP = dyn_cast<PHINode>(OV);		auto *OP = dyn_cast<PHINode>(OV);
if (OP && Legal->isFirstOrderRecurrence(OP))		if (OP && Legal->isFirstOrderRecurrence(OP))
continue;		continue;
// If all the users of the operand are uniform, then add the		// If all the users of the operand are uniform, then add the
// operand into the uniform worklist.		// operand into the uniform worklist.
auto *OI = cast<Instruction>(OV);		auto *OI = cast<Instruction>(OV);
if (llvm::all_of(OI->users(), [&](User *U) -> bool {		if (llvm::all_of(OI->users(), [&](User *U) -> bool {
auto *J = cast<Instruction>(U);		auto *J = cast<Instruction>(U);
return Worklist.count(J) ||		return Worklist.count(J) || TheLoop->isLoopInvariant(U) ||
(OI == getLoadStorePointerOperand(J) &&		(OI == getLoadStorePointerOperand(J) &&
isUniformDecision(J, VF));		isUniformDecision(J, VF));
})) {		})) {
Worklist.insert(OI);		Worklist.insert(OI);
LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *OI << "\n");		LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *OI << "\n");
}		}
}		}
}		}
[...]	for (Instruction &I : *BB)
PredicatedBBsAfterVectorization.insert(BB);		PredicatedBBsAfterVectorization.insert(BB);
}		}
}		}
}		}

int LoopVectorizationCostModel::computePredInstDiscount(		int LoopVectorizationCostModel::computePredInstDiscount(
Instruction *PredInst, DenseMap<Instruction *, unsigned> &ScalarCosts,		Instruction *PredInst, DenseMap<Instruction *, unsigned> &ScalarCosts,
unsigned VF) {		unsigned VF) {
assert(!isUniformAfterVectorization(PredInst, VF) &&
"Instruction marked uniform-after-vectorization will be predicated");

// Initialize the discount to zero, meaning that the scalar version and the		// Initialize the discount to zero, meaning that the scalar version and the
// vector version cost the same.		// vector version cost the same.
int Discount = 0;		int Discount = 0;

// Holds instructions to analyze. The instructions we visit are mapped in		// Holds instructions to analyze. The instructions we visit are mapped in
// ScalarCosts. Those instructions are the ones that would be scalarized if		// ScalarCosts. Those instructions are the ones that would be scalarized if
// we find that the scalar version costs less.		// we find that the scalar version costs less.
[...]	auto canBeScalarized = [&](Instruction *I) -> bool {
// marked uniform after vectorization, rather than VF identical values.		// marked uniform after vectorization, rather than VF identical values.
// Thus, if we scalarize an instruction that uses a uniform, we would		// Thus, if we scalarize an instruction that uses a uniform, we would
// create uses of values corresponding to the lanes we aren't emitting code		// create uses of values corresponding to the lanes we aren't emitting code
// for. This behavior can be changed by allowing getScalarValue to clone		// for. This behavior can be changed by allowing getScalarValue to clone
// the lane zero values for uniforms rather than asserting.		// the lane zero values for uniforms rather than asserting.
for (Use &U : I->operands())		for (Use &U : I->operands())
if (auto *J = dyn_cast<Instruction>(U.get()))		if (auto *J = dyn_cast<Instruction>(U.get()))
if (isUniformAfterVectorization(J, VF))		if (isUniformAfterVectorization(J, VF))
return false;		return false;

// Otherwise, we can scalarize the instruction.		// Otherwise, we can scalarize the instruction.
return true;		return true;
};		};

// Compute the expected cost discount from scalarizing the entire expression		// Compute the expected cost discount from scalarizing the entire expression
// feeding the predicated instruction. We currently only consider expressions		// feeding the predicated instruction. We currently only consider expressions
// that are single-use instruction chains.		// that are single-use instruction chains.
[...]
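
The worklist expansion in collectLoopUniforms() above propagates backward: an operand joins the uniform set only once all of its users are already in it. Below is a simplified C++ sketch of that idea on a toy graph. `Node` and `expandUniforms` are hypothetical names, and the model deliberately omits the special cases in the real code (first-order recurrence PHIs, and users that are memory accesses with a uniform widening decision).

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy instruction: ids of the in-loop values it uses, and of its users.
struct Node {
  std::vector<int> operands;
  std::vector<int> users;
};

// Starting from the given roots, add an operand to the uniform set only
// when every one of its users is already known to be uniform.
std::set<int> expandUniforms(const std::vector<Node> &graph,
                             const std::vector<int> &roots) {
  std::vector<int> worklist(roots.begin(), roots.end());
  std::set<int> uniforms(roots.begin(), roots.end());
  for (std::size_t idx = 0; idx < worklist.size(); ++idx) {
    const int id = worklist[idx]; // copy: worklist may grow below
    for (int op : graph[id].operands) {
      if (uniforms.count(op))
        continue;
      bool allUsersUniform = true;
      for (int user : graph[op].users) {
        if (!uniforms.count(user)) {
          allUsersUniform = false;
          break;
        }
      }
      if (allUsersUniform) {
        uniforms.insert(op);
        worklist.push_back(op);
      }
    }
  }
  return uniforms;
}
```

For example, with a root that uses two operands, where the second operand also feeds a non-uniform instruction, only the first operand is added. This is what keeps a uniform instruction from being used by a non-uniform one.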

llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll

; REQUIRES: asserts		; REQUIRES: asserts

; RUN: opt -loop-vectorize -mtriple=arm64-apple-ios %s -S -debug -disable-output 2>&1 | FileCheck --check-prefix=CM %s		; RUN: opt -loop-vectorize -mtriple=arm64-apple-ios %s -S -debug -disable-output 2>&1 | FileCheck --check-prefix=CM %s
; RUN: opt -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 %s -S | FileCheck --check-prefix=FORCED %s		; RUN: opt -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 %s -S | FileCheck --check-prefix=FORCED %s

; Test case from PR41294.		; Test case from PR41294.

; Check scalar cost for extractvalue. The constant and loop invariant operands are free,		; Check scalar cost for extractvalue. The constant and loop invariant operands are free,
; leaving cost 3 for scalarizing the result + 2 for executing the op with VF 2.		; leaving cost 3 for scalarizing the result + 2 for executing the op with VF 2.

; CM: LV: Scalar loop costs: 7.		; CM: LV: Scalar loop costs: 7.
; CM: LV: Found an estimated cost of 5 for VF 2 For instruction: %a = extractvalue { i64, i64 } %sv, 0		; CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %a = extractvalue { i64, i64 } %sv, 0
; CM-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction: %b = extractvalue { i64, i64 } %sv, 1		; CM-NEXT: LV: Found an estimated cost of 1 for VF 2 For instruction: %b = extractvalue { i64, i64 } %sv, 1

; Check that the extractvalue operands are actually free in vector code.		; Check that the extractvalue operands are actually free in vector code.

; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph		; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph
; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; FORCED-NEXT: %broadcast.splatinsert = insertelement <2 x i32> undef, i32 %index, i32 0		; FORCED-NEXT: %broadcast.splatinsert = insertelement <2 x i32> undef, i32 %index, i32 0
; FORCED-NEXT: %broadcast.splat = shufflevector <2 x i32> %broadcast.splatinsert, <2 x i32> undef, <2 x i32> zeroinitializer		; FORCED-NEXT: %broadcast.splat = shufflevector <2 x i32> %broadcast.splatinsert, <2 x i32> undef, <2 x i32> zeroinitializer
; FORCED-NEXT: %induction = add <2 x i32> %broadcast.splat, <i32 0, i32 1>		; FORCED-NEXT: %induction = add <2 x i32> %broadcast.splat, <i32 0, i32 1>
; FORCED-NEXT: %0 = add i32 %index, 0		; FORCED-NEXT: %0 = add i32 %index, 0
; FORCED-NEXT: %1 = extractvalue { i64, i64 } %sv, 0		; FORCED-NEXT: %1 = extractvalue { i64, i64 } %sv, 0
; FORCED-NEXT: %2 = extractvalue { i64, i64 } %sv, 0		; FORCED-NEXT: %broadcast.splatinsert1 = insertelement <2 x i64> undef, i64 %1, i32 0
; FORCED-NEXT: %3 = insertelement <2 x i64> undef, i64 %1, i32 0		; FORCED-NEXT: %broadcast.splat2 = shufflevector <2 x i64> %broadcast.splatinsert1, <2 x i64> undef, <2 x i32> zeroinitializer
; FORCED-NEXT: %4 = insertelement <2 x i64> %3, i64 %2, i32 1		; FORCED-NEXT: %2 = extractvalue { i64, i64 } %sv, 1
; FORCED-NEXT: %5 = extractvalue { i64, i64 } %sv, 1		; FORCED-NEXT: %broadcast.splatinsert3 = insertelement <2 x i64> undef, i64 %2, i32 0
; FORCED-NEXT: %6 = extractvalue { i64, i64 } %sv, 1		; FORCED-NEXT: %broadcast.splat4 = shufflevector <2 x i64> %broadcast.splatinsert3, <2 x i64> undef, <2 x i32> zeroinitializer
; FORCED-NEXT: %7 = insertelement <2 x i64> undef, i64 %5, i32 0		; FORCED-NEXT: %3 = getelementptr i64, i64* %dst, i32 %0
; FORCED-NEXT: %8 = insertelement <2 x i64> %7, i64 %6, i32 1		; FORCED-NEXT: %4 = add <2 x i64> %broadcast.splat2, %broadcast.splat4
; FORCED-NEXT: %9 = getelementptr i64, i64* %dst, i32 %0		; FORCED-NEXT: %5 = getelementptr i64, i64* %3, i32 0
; FORCED-NEXT: %10 = add <2 x i64> %4, %8		; FORCED-NEXT: %6 = bitcast i64* %5 to <2 x i64>*
; FORCED-NEXT: %11 = getelementptr i64, i64* %9, i32 0		; FORCED-NEXT: store <2 x i64> %4, <2 x i64>* %6, align 4
; FORCED-NEXT: %12 = bitcast i64* %11 to <2 x i64>*
; FORCED-NEXT: store <2 x i64> %10, <2 x i64>* %12, align 4
; FORCED-NEXT: %index.next = add i32 %index, 2		; FORCED-NEXT: %index.next = add i32 %index, 2
; FORCED-NEXT: %13 = icmp eq i32 %index.next, 0		; FORCED-NEXT: %7 = icmp eq i32 %index.next, 0
; FORCED-NEXT: br i1 %13, label %middle.block, label %vector.body, !llvm.loop !0		; FORCED-NEXT: br i1 %7, label %middle.block, label %vector.body, !llvm.loop !0

define void @test1(i64* %dst, {i64, i64} %sv) {		define void @test1(i64* %dst, {i64, i64} %sv) {
entry:		entry:
br label %loop.body		br label %loop.body

loop.body:		loop.body:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.body ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.body ]
%a = extractvalue { i64, i64 } %sv, 0		%a = extractvalue { i64, i64 } %sv, 0
[...]	exit:
ret void		ret void
}		}


; Similar to the test case above, but checks getVectorCallCost as well.		; Similar to the test case above, but checks getVectorCallCost as well.
declare float @pow(float, float) readnone nounwind		declare float @pow(float, float) readnone nounwind

; CM: LV: Scalar loop costs: 16.		; CM: LV: Scalar loop costs: 16.
; CM: LV: Found an estimated cost of 5 for VF 2 For instruction: %a = extractvalue { float, float } %sv, 0		; CM: LV: Found an estimated cost of 1 for VF 2 For instruction: %a = extractvalue { float, float } %sv, 0
; CM-NEXT: LV: Found an estimated cost of 5 for VF 2 For instruction: %b = extractvalue { float, float } %sv, 1		; CM-NEXT: LV: Found an estimated cost of 1 for VF 2 For instruction: %b = extractvalue { float, float } %sv, 1

; FORCED-LABEL: define void @test_getVectorCallCost		; FORCED-LABEL: define void @test_getVectorCallCost

; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph		; FORCED-LABEL: vector.body: ; preds = %vector.body, %vector.ph
; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]		; FORCED-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; FORCED-NEXT: %broadcast.splatinsert = insertelement <2 x i32> undef, i32 %index, i32 0		; FORCED-NEXT: %broadcast.splatinsert = insertelement <2 x i32> undef, i32 %index, i32 0
; FORCED-NEXT: %broadcast.splat = shufflevector <2 x i32> %broadcast.splatinsert, <2 x i32> undef, <2 x i32> zeroinitializer		; FORCED-NEXT: %broadcast.splat = shufflevector <2 x i32> %broadcast.splatinsert, <2 x i32> undef, <2 x i32> zeroinitializer
; FORCED-NEXT: %induction = add <2 x i32> %broadcast.splat, <i32 0, i32 1>		; FORCED-NEXT: %induction = add <2 x i32> %broadcast.splat, <i32 0, i32 1>
; FORCED-NEXT: %0 = add i32 %index, 0		; FORCED-NEXT: %0 = add i32 %index, 0
; FORCED-NEXT: %1 = extractvalue { float, float } %sv, 0		; FORCED-NEXT: %1 = extractvalue { float, float } %sv, 0
; FORCED-NEXT: %2 = extractvalue { float, float } %sv, 0		; FORCED-NEXT: %broadcast.splatinsert1 = insertelement <2 x float> undef, float %1, i32 0
; FORCED-NEXT: %3 = insertelement <2 x float> undef, float %1, i32 0		; FORCED-NEXT: %broadcast.splat2 = shufflevector <2 x float> %broadcast.splatinsert1, <2 x float> undef, <2 x i32> zeroinitializer
; FORCED-NEXT: %4 = insertelement <2 x float> %3, float %2, i32 1		; FORCED-NEXT: %2 = extractvalue { float, float } %sv, 1
; FORCED-NEXT: %5 = extractvalue { float, float } %sv, 1		; FORCED-NEXT: %broadcast.splatinsert3 = insertelement <2 x float> undef, float %2, i32 0
; FORCED-NEXT: %6 = extractvalue { float, float } %sv, 1		; FORCED-NEXT: %broadcast.splat4 = shufflevector <2 x float> %broadcast.splatinsert3, <2 x float> undef, <2 x i32> zeroinitializer
; FORCED-NEXT: %7 = insertelement <2 x float> undef, float %5, i32 0		; FORCED-NEXT: %3 = getelementptr float, float* %dst, i32 %0
; FORCED-NEXT: %8 = insertelement <2 x float> %7, float %6, i32 1		; FORCED-NEXT: %4 = call <2 x float> @llvm.pow.v2f32(<2 x float> %broadcast.splat2, <2 x float> %broadcast.splat4)
; FORCED-NEXT: %9 = getelementptr float, float* %dst, i32 %0		; FORCED-NEXT: %5 = getelementptr float, float* %3, i32 0
; FORCED-NEXT: %10 = call <2 x float> @llvm.pow.v2f32(<2 x float> %4, <2 x float> %8)		; FORCED-NEXT: %6 = bitcast float* %5 to <2 x float>*
; FORCED-NEXT: %11 = getelementptr float, float* %9, i32 0		; FORCED-NEXT: store <2 x float> %4, <2 x float>* %6, align 4
; FORCED-NEXT: %12 = bitcast float* %11 to <2 x float>*
; FORCED-NEXT: store <2 x float> %10, <2 x float>* %12, align 4
; FORCED-NEXT: %index.next = add i32 %index, 2		; FORCED-NEXT: %index.next = add i32 %index, 2
; FORCED-NEXT: %13 = icmp eq i32 %index.next, 0		; FORCED-NEXT: %7 = icmp eq i32 %index.next, 0
; FORCED-NEXT: br i1 %13, label %middle.block, label %vector.body, !llvm.loop !4		; FORCED-NEXT: br i1 %7, label %middle.block, label %vector.body, !llvm.loop !4

define void @test_getVectorCallCost(float* %dst, {float, float} %sv) {		define void @test_getVectorCallCost(float* %dst, {float, float} %sv) {
entry:		entry:
br label %loop.body		br label %loop.body

loop.body:		loop.body:
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.body ]		%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.body ]
%a = extractvalue { float, float } %sv, 0		%a = extractvalue { float, float } %sv, 0
[...]

llvm/test/Transforms/LoopVectorize/X86/assume.ll

[...]	entry:
br label %for.body		br label %for.body

; CHECK-LABEL: @test2		; CHECK-LABEL: @test2
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: @llvm.assume		; CHECK: @llvm.assume
; CHECK: @llvm.assume		; CHECK: @llvm.assume
; CHECK: @llvm.assume		; CHECK: @llvm.assume
; CHECK: @llvm.assume		; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: @llvm.assume
; CHECK: for.body:		; CHECK: for.body:
; CHECK: ret void		; CHECK: ret void

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
tail call void @llvm.assume(i1 %maskcond)		tail call void @llvm.assume(i1 %maskcond)
%arrayidx = getelementptr inbounds float, float* %0, i64 %indvars.iv		%arrayidx = getelementptr inbounds float, float* %0, i64 %indvars.iv
%2 = load float, float* %arrayidx, align 4		%2 = load float, float* %arrayidx, align 4

llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll

	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i16			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = trunc i32 [[INDEX]] to i16
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> undef, i16 [[OFFSET_IDX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i16> undef, i16 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> undef, <2 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i16> [[BROADCAST_SPLATINSERT]], <2 x i16> undef, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 1>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <2 x i16> [[BROADCAST_SPLAT]], <i16 0, i16 1>
	; CHECK-NEXT: [[TMP0:%.*]] = add i16 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i16 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = sext i16 [[TMP0]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i16 0 to i64
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [2 x i16*], [2 x i16*]* @b, i16 0, i64 [[TMP1]]			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <2 x i64> undef, i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i16*, i16** [[TMP2]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT1]], <2 x i64> undef, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i16** [[TMP3]] to <2 x i16*>*			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr [1 x %rec8], [1 x %rec8]* @a, i16 0, <2 x i64> [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: store <2 x i16*> <i16* getelementptr inbounds (%rec8, %rec8* extractelement (<2 x %rec8*> getelementptr ([1 x %rec8], [1 x %rec8]* @a, <2 x i16> zeroinitializer, <2 x i64> zeroinitializer), i32 0), i32 0, i32 0), i16* getelementptr inbounds (%rec8, %rec8* extractelement (<2 x %rec8*> getelementptr ([1 x %rec8], [1 x %rec8]* @a, <2 x i16> zeroinitializer, <2 x i64> zeroinitializer), i32 1), i32 0, i32 0)>, <2 x i16*>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <2 x %rec8*> [[TMP2]] to <2 x i16*>
				; CHECK-NEXT: [[TMP4:%.*]] = sext i16 [[TMP0]] to i64
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr [2 x i16*], [2 x i16*]* @b, i16 0, i64 [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i16*, i16** [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast i16** [[TMP6]] to <2 x i16*>*
				; CHECK-NEXT: store <2 x i16*> [[TMP3]], <2 x i16*>* [[TMP7]], align 8
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 2			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], 2
	; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
	; CHECK: middle.block:			; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 2, 2
				; CHECK-NEXT: br i1 [[CMP_N]], label [[BB3:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i16 [ 2, [[MIDDLE_BLOCK]] ], [ 0, [[BB1:%.*]] ]
				; CHECK-NEXT: br label [[BB2:%.*]]
				; CHECK: bb2:
				; CHECK-NEXT: [[C_1_0:%.*]] = phi i16 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[_TMP9:%.*]], [[BB2]] ]
				; CHECK-NEXT: [[_TMP1:%.*]] = zext i16 0 to i64
				; CHECK-NEXT: [[_TMP2:%.*]] = getelementptr [1 x %rec8], [1 x %rec8]* @a, i16 0, i64 [[_TMP1]]
				; CHECK-NEXT: [[_TMP4:%.*]] = bitcast %rec8* [[_TMP2]] to i16*
				; CHECK-NEXT: [[_TMP6:%.*]] = sext i16 [[C_1_0]] to i64
				; CHECK-NEXT: [[_TMP7:%.*]] = getelementptr [2 x i16*], [2 x i16*]* @b, i16 0, i64 [[_TMP6]]
				; CHECK-NEXT: store i16* [[_TMP4]], i16** [[_TMP7]]
				; CHECK-NEXT: [[_TMP9]] = add nsw i16 [[C_1_0]], 1
				; CHECK-NEXT: [[_TMP11:%.*]] = icmp slt i16 [[_TMP9]], 2
				; CHECK-NEXT: br i1 [[_TMP11]], label [[BB2]], label [[BB3]], !llvm.loop !2
				; CHECK: bb3:
				; CHECK-NEXT: ret void
				;

	bb1:			bb1:
	br label %bb2			br label %bb2

	bb2:			bb2:
	%c.1.0 = phi i16 [ 0, %bb1 ], [ %_tmp9, %bb2 ]			%c.1.0 = phi i16 [ 0, %bb1 ], [ %_tmp9, %bb2 ]
	%_tmp1 = zext i16 0 to i64			%_tmp1 = zext i16 0 to i64
	%_tmp2 = getelementptr [1 x %rec8], [1 x %rec8]* @a, i16 0, i64 %_tmp1			%_tmp2 = getelementptr [1 x %rec8], [1 x %rec8]* @a, i16 0, i64 %_tmp1

llvm/test/Transforms/LoopVectorize/X86/cost-model-assert.ll


	define void @cff_index_load_offsets(i1 %cond, i8 %x, i8* %p) #0 {			define void @cff_index_load_offsets(i1 %cond, i8 %x, i8* %p) #0 {
	; CHECK-LABEL: @cff_index_load_offsets(			; CHECK-LABEL: @cff_index_load_offsets(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[EXIT:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[EXIT:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> undef, i8 [[X:%.*]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4			; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
	; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* null, i64 [[TMP1]]			; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, i8* null, i64 [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = zext <4 x i8> [[BROADCAST_SPLAT]] to <4 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[X:%.*]] to i32
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw <4 x i32> [[TMP2]], <i32 24, i32 24, i32 24, i32 24>			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> undef, i32 [[TMP2]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP3:%.*]] = shl nuw <4 x i32> [[BROADCAST_SPLAT]], <i32 24, i32 24, i32 24, i32 24>
	; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[P:%.*]], align 1, !tbaa !1			; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[P:%.*]], align 1, !tbaa !1
	; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1			; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1
	; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1			; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1
	; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1			; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i8> undef, i8 [[TMP4]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i8> undef, i8 [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x i8> [[TMP8]], i8 [[TMP5]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x i8> [[TMP8]], i8 [[TMP5]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i8> [[TMP9]], i8 [[TMP6]], i32 2			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i8> [[TMP9]], i8 [[TMP6]], i32 2
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> [[TMP10]], i8 [[TMP7]], i32 3			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i8> [[TMP10]], i8 [[TMP7]], i32 3
	; CHECK-NEXT: [[TMP12:%.*]] = zext <4 x i8> [[TMP11]] to <4 x i32>			; CHECK-NEXT: [[TMP12:%.*]] = zext <4 x i8> [[TMP11]] to <4 x i32>
	; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw <4 x i32> [[TMP12]], <i32 16, i32 16, i32 16, i32 16>			; CHECK-NEXT: [[TMP13:%.*]] = shl nuw nsw <4 x i32> [[TMP12]], <i32 16, i32 16, i32 16, i32 16>
	; CHECK-NEXT: [[TMP14:%.*]] = or <4 x i32> [[TMP13]], [[TMP3]]			; CHECK-NEXT: [[TMP14:%.*]] = or <4 x i32> [[TMP13]], [[TMP3]]
	; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* undef, align 1, !tbaa !1			; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* undef, align 1, !tbaa !1
	; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* undef, align 1, !tbaa !1			; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* undef, align 1, !tbaa !1
	; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* undef, align 1, !tbaa !1			; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* undef, align 1, !tbaa !1
	; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* undef, align 1, !tbaa !1			; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* undef, align 1, !tbaa !1
	; CHECK-NEXT: [[TMP19:%.*]] = or <4 x i32> [[TMP14]], zeroinitializer			; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i32 undef, 8
	; CHECK-NEXT: [[TMP20:%.*]] = or <4 x i32> [[TMP19]], zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i32> undef, i32 [[TMP19]], i32 0
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i32> [[TMP20]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT1]], <4 x i32> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: store i32 [[TMP21]], i32* undef, align 4, !tbaa !4			; CHECK-NEXT: [[TMP20:%.*]] = or <4 x i32> [[TMP14]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i32> [[TMP20]], i32 1			; CHECK-NEXT: [[TMP21:%.*]] = zext i8 undef to i32
	; CHECK-NEXT: store i32 [[TMP22]], i32* undef, align 4, !tbaa !4			; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i32> undef, i32 [[TMP21]], i32 0
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP20]], i32 2			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT3]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP22:%.*]] = or <4 x i32> [[TMP20]], [[BROADCAST_SPLAT4]]
				; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i32> [[TMP22]], i32 0
	; CHECK-NEXT: store i32 [[TMP23]], i32* undef, align 4, !tbaa !4			; CHECK-NEXT: store i32 [[TMP23]], i32* undef, align 4, !tbaa !4
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP20]], i32 3			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x i32> [[TMP22]], i32 1
	; CHECK-NEXT: store i32 [[TMP24]], i32* undef, align 4, !tbaa !4			; CHECK-NEXT: store i32 [[TMP24]], i32* undef, align 4, !tbaa !4
				; CHECK-NEXT: [[TMP25:%.*]] = extractelement <4 x i32> [[TMP22]], i32 2
				; CHECK-NEXT: store i32 [[TMP25]], i32* undef, align 4, !tbaa !4
				; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x i32> [[TMP22]], i32 3
				; CHECK-NEXT: store i32 [[TMP26]], i32* undef, align 4, !tbaa !4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0			; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], 0
	; CHECK-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6			; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !6
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1, 0			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1, 0
	; CHECK-NEXT: br i1 [[CMP_N]], label [[SW_EPILOG:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[SW_EPILOG:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ null, [[MIDDLE_BLOCK]] ], [ null, [[IF_THEN]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i8* [ null, [[MIDDLE_BLOCK]] ], [ null, [[IF_THEN]] ]
	; CHECK-NEXT: br label [[FOR_BODY68:%.*]]			; CHECK-NEXT: br label [[FOR_BODY68:%.*]]
	; CHECK: for.body68:			; CHECK: for.body68:
	; CHECK-NEXT: [[P_359:%.*]] = phi i8* [ [[ADD_PTR86:%.*]], [[FOR_BODY68]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[P_359:%.*]] = phi i8* [ [[ADD_PTR86:%.*]], [[FOR_BODY68]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[CONV70:%.*]] = zext i8 [[X]] to i32			; CHECK-NEXT: [[CONV70:%.*]] = zext i8 [[X]] to i32
	; CHECK-NEXT: [[SHL71:%.*]] = shl nuw i32 [[CONV70]], 24			; CHECK-NEXT: [[SHL71:%.*]] = shl nuw i32 [[CONV70]], 24
	; CHECK-NEXT: [[TMP26:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1			; CHECK-NEXT: [[TMP28:%.*]] = load i8, i8* [[P]], align 1, !tbaa !1
	; CHECK-NEXT: [[CONV73:%.*]] = zext i8 [[TMP26]] to i32			; CHECK-NEXT: [[CONV73:%.*]] = zext i8 [[TMP28]] to i32
	; CHECK-NEXT: [[SHL74:%.*]] = shl nuw nsw i32 [[CONV73]], 16			; CHECK-NEXT: [[SHL74:%.*]] = shl nuw nsw i32 [[CONV73]], 16
	; CHECK-NEXT: [[OR75:%.*]] = or i32 [[SHL74]], [[SHL71]]			; CHECK-NEXT: [[OR75:%.*]] = or i32 [[SHL74]], [[SHL71]]
	; CHECK-NEXT: [[TMP27:%.*]] = load i8, i8* undef, align 1, !tbaa !1			; CHECK-NEXT: [[TMP29:%.*]] = load i8, i8* undef, align 1, !tbaa !1
	; CHECK-NEXT: [[SHL78:%.*]] = shl nuw nsw i32 undef, 8			; CHECK-NEXT: [[SHL78:%.*]] = shl nuw nsw i32 undef, 8
	; CHECK-NEXT: [[OR79:%.*]] = or i32 [[OR75]], [[SHL78]]			; CHECK-NEXT: [[OR79:%.*]] = or i32 [[OR75]], [[SHL78]]
	; CHECK-NEXT: [[CONV81:%.*]] = zext i8 undef to i32			; CHECK-NEXT: [[CONV81:%.*]] = zext i8 undef to i32
	; CHECK-NEXT: [[OR83:%.*]] = or i32 [[OR79]], [[CONV81]]			; CHECK-NEXT: [[OR83:%.*]] = or i32 [[OR79]], [[CONV81]]
	; CHECK-NEXT: store i32 [[OR83]], i32* undef, align 4, !tbaa !4			; CHECK-NEXT: store i32 [[OR83]], i32* undef, align 4, !tbaa !4
	; CHECK-NEXT: [[ADD_PTR86]] = getelementptr inbounds i8, i8* [[P_359]], i64 4			; CHECK-NEXT: [[ADD_PTR86]] = getelementptr inbounds i8, i8* [[P_359]], i64 4
	; CHECK-NEXT: [[CMP66:%.*]] = icmp ult i8* [[ADD_PTR86]], undef			; CHECK-NEXT: [[CMP66:%.*]] = icmp ult i8* [[ADD_PTR86]], undef
	; CHECK-NEXT: br i1 [[CMP66]], label [[FOR_BODY68]], label [[SW_EPILOG]], !llvm.loop !8			; CHECK-NEXT: br i1 [[CMP66]], label [[FOR_BODY68]], label [[SW_EPILOG]], !llvm.loop !8

llvm/test/Transforms/LoopVectorize/X86/funclet.ll

; RUN: opt -S -loop-vectorize < %s | FileCheck %s		; RUN: opt -S -loop-vectorize < %s | FileCheck %s
target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"		target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
target triple = "i686-pc-windows-msvc18.0.0"		target triple = "i686-pc-windows-msvc18.0.0"

		; Loop invariant call to @floor is uniform, which means we do not end up
		; with any vector instructions in the loop.
define void @test1() #0 personality i32 (...)* @__CxxFrameHandler3 {		define void @test1() #0 personality i32 (...)* @__CxxFrameHandler3 {
entry:		entry:
invoke void @_CxxThrowException(i8* null, i8* null)		invoke void @_CxxThrowException(i8* null, i8* null)
to label %unreachable unwind label %catch.dispatch		to label %unreachable unwind label %catch.dispatch

catch.dispatch: ; preds = %entry		catch.dispatch: ; preds = %entry
%0 = catchswitch within none [label %catch] unwind to caller		%0 = catchswitch within none [label %catch] unwind to caller

try.cont: ; preds = %for.cond.cleanup
ret void		ret void

unreachable: ; preds = %entry		unreachable: ; preds = %entry
unreachable		unreachable
}		}

; CHECK-LABEL: define void @test1(		; CHECK-LABEL: define void @test1(
; CHECK: %[[cpad:.*]] = catchpad within {{.*}} [i8* null, i32 64, i8* null]		; CHECK: %[[cpad:.*]] = catchpad within {{.*}} [i8* null, i32 64, i8* null]
; CHECK: call <16 x double> @llvm.floor.v16f64(<16 x double> {{.*}}) [ "funclet"(token %[[cpad]]) ]		; CHECK: call double @floor(double 1.000000e+00) #1 [ "funclet"(token %1) ]

		define void @test2(double* %A) #0 personality i32 (...)* @__CxxFrameHandler3 {
		entry:
		invoke void @_CxxThrowException(i8* null, i8* null)
		to label %unreachable unwind label %catch.dispatch

		catch.dispatch: ; preds = %entry
		%0 = catchswitch within none [label %catch] unwind to caller

		catch: ; preds = %catch.dispatch
		%1 = catchpad within %0 [i8* null, i32 64, i8* null]
		br label %for.body

		for.cond.cleanup: ; preds = %for.body
		catchret from %1 to label %try.cont

		for.body: ; preds = %for.body, %catch
		%i.07 = phi i32 [ 0, %catch ], [ %inc, %for.body ]
		%A.ptr = getelementptr double, double* %A, i32 %i.07
		%A.val = load double, double* %A.ptr
		%call = call double @floor(double %A.val) #1 [ "funclet"(token %1) ]
		%inc = add nuw nsw i32 %i.07, 1
		%exitcond = icmp eq i32 %inc, 1024
		br i1 %exitcond, label %for.cond.cleanup, label %for.body

		try.cont: ; preds = %for.cond.cleanup
		ret void

		unreachable: ; preds = %entry
		unreachable
		}

		; CHECK-LABEL: define void @test2(
		; CHECK: %[[cpad:.*]] = catchpad within {{.*}} [i8* null, i32 64, i8* null]
		; CHECK: call <2 x double> @llvm.floor.v2f64(<2 x double> %wide.load) [ "funclet"(token %1) ]


declare x86_stdcallcc void @_CxxThrowException(i8*, i8*)		declare x86_stdcallcc void @_CxxThrowException(i8*, i8*)

declare i32 @__CxxFrameHandler3(...)		declare i32 @__CxxFrameHandler3(...)

declare double @floor(double) #1		declare double @floor(double) #1

attributes #0 = { "target-features"="+sse2" }		attributes #0 = { "target-features"="+sse2" }
attributes #1 = { nounwind readnone }		attributes #1 = { nounwind readnone }

llvm/test/Transforms/LoopVectorize/X86/invariant-load-gather.ll

	; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[SMAX2]]			; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i32, i32* [[B]], i64 [[SMAX2]]
	; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[A4]], i64 1			; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, i8* [[A4]], i64 1
	; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i8* [[UGLYGEP]], [[B1]]			; CHECK-NEXT: [[BOUND0:%.*]] = icmp ugt i8* [[UGLYGEP]], [[B1]]
	; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[A]]			; CHECK-NEXT: [[BOUND1:%.*]] = icmp ugt i32* [[SCEVGEP]], [[A]]
	; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[SMAX]], 9223372036854775792			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[SMAX]], 9223372036854775792
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <16 x i32*> undef, i32* [[A]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <16 x i32> undef, i32 [[NTRUNC]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <16 x i32*> [[BROADCAST_SPLATINSERT5]], <16 x i32*> undef, <16 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT5]], <16 x i32> undef, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <16 x i32> undef, i32 [[NTRUNC]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <16 x i32*> undef, i32* [[A]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <16 x i32> [[BROADCAST_SPLATINSERT7]], <16 x i32> undef, <16 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <16 x i32*> [[BROADCAST_SPLATINSERT9]], <16 x i32*> undef, <16 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDEX]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <16 x i32*> [[BROADCAST_SPLAT6]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i32* [[A]], null
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <16 x i1> undef, i1 [[TMP3]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <16 x i1> [[BROADCAST_SPLATINSERT7]], <16 x i1> undef, <16 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP2]] to <16 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP2]] to <16 x i32>*
	; CHECK-NEXT: store <16 x i32> [[BROADCAST_SPLAT8]], <16 x i32>* [[TMP4]], align 4, !alias.scope !0, !noalias !3			; CHECK-NEXT: store <16 x i32> [[BROADCAST_SPLAT6]], <16 x i32>* [[TMP4]], align 4, !alias.scope !0, !noalias !3
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !5
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x i32> @llvm.masked.gather.v16i32.v16p0i32(<16 x i32*> [[BROADCAST_SPLAT6]], i32 4, <16 x i1> [[TMP3]], <16 x i32> undef), !alias.scope !3			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <16 x i32> @llvm.masked.gather.v16i32.v16p0i32(<16 x i32*> [[BROADCAST_SPLAT10]], i32 4, <16 x i1> [[BROADCAST_SPLAT8]], <16 x i32> undef), !alias.scope !3
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <16 x i1> [[TMP3]], <16 x i32> [[WIDE_MASKED_GATHER]], <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 1>			; CHECK-NEXT: [[PREDPHI:%.*]] = select <16 x i1> [[BROADCAST_SPLAT8]], <16 x i32> [[WIDE_MASKED_GATHER]], <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 1>
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[SMAX]], [[N_VEC]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[SMAX]], [[N_VEC]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <16 x i32> [[PREDPHI]], i32 15			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <16 x i32> [[PREDPHI]], i32 15
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[LATCH:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[LATCH:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]

llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-vectorize -S -mattr=avx512f -instcombine < %s | FileCheck %s			; RUN: opt -loop-vectorize -S -mattr=avx512f -instcombine < %s | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; first test checks that loop with a reduction and a uniform store gets			; first test checks that loop with a reduction and a uniform store gets
	; vectorized.			; vectorized.
	; CHECK-LABEL: inv_val_store_to_inv_address_with_reduction
	; CHECK-LABEL: vector.memcheck:
	; CHECK: found.conflict

	; CHECK-LABEL: vector.body:
	; CHECK: %vec.phi = phi <16 x i32> [ zeroinitializer, %vector.ph ], [ [[ADD:%[a-zA-Z0-9.]+]], %vector.body ]
	; CHECK: %wide.load = load <16 x i32>
	; CHECK: [[ADD]] = add <16 x i32> %vec.phi, %wide.load
	; CHECK: store i32 %ntrunc, i32* %a
	; CHECK-NOT: store i32 %ntrunc, i32* %a
	; CHECK: %index.next = add i64 %index, 64

	; CHECK-LABEL: middle.block:
	; CHECK: %rdx.shuf = shufflevector <16 x i32>
	define i32 @inv_val_store_to_inv_address_with_reduction(i32* %a, i64 %n, i32* %b) {			define i32 @inv_val_store_to_inv_address_with_reduction(i32* %a, i64 %n, i32* %b) {
				; CHECK-LABEL: @inv_val_store_to_inv_address_with_reduction(
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
				; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <16 x i32> [ zeroinitializer, %vector.ph ], [ [[TMP10:%.*]], %vector.body ]
				; CHECK-NEXT: [[VEC_PHI8:%.*]] = phi <16 x i32> [ zeroinitializer, %vector.ph ], [ [[TMP11:%.*]], %vector.body ]
				; CHECK-NEXT: [[VEC_PHI9:%.*]] = phi <16 x i32> [ zeroinitializer, %vector.ph ], [ [[TMP12:%.*]], %vector.body ]
				; CHECK-NEXT: [[VEC_PHI10:%.*]] = phi <16 x i32> [ zeroinitializer, %vector.ph ], [ [[TMP13:%.*]], %vector.body ]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* %b, i64 [[INDEX]]
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[TMP2]] to <16 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <16 x i32>, <16 x i32>* [[TMP3]], align 8, !alias.scope !0
				; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i64 16
				; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <16 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD11:%.*]] = load <16 x i32>, <16 x i32>* [[TMP5]], align 8, !alias.scope !0
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i64 32
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <16 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD12:%.*]] = load <16 x i32>, <16 x i32>* [[TMP7]], align 8, !alias.scope !0
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP2]], i64 48
				; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[TMP8]] to <16 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD13:%.*]] = load <16 x i32>, <16 x i32>* [[TMP9]], align 8, !alias.scope !0
				; CHECK-NEXT: [[TMP10]] = add <16 x i32> [[VEC_PHI]], [[WIDE_LOAD]]
				; CHECK-NEXT: [[TMP11]] = add <16 x i32> [[VEC_PHI8]], [[WIDE_LOAD11]]
				; CHECK-NEXT: [[TMP12]] = add <16 x i32> [[VEC_PHI9]], [[WIDE_LOAD12]]
				; CHECK-NEXT: [[TMP13]] = add <16 x i32> [[VEC_PHI10]], [[WIDE_LOAD13]]
				; CHECK-NEXT: store i32 %ntrunc, i32* %a, align 4, !alias.scope !3, !noalias !0
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 64
				; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], %n.vec
				; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label %vector.body, !llvm.loop !5
				; CHECK: middle.block:
				; CHECK-NEXT: [[BIN_RDX:%.*]] = add <16 x i32> [[TMP11]], [[TMP10]]
				; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <16 x i32> [[TMP12]], [[BIN_RDX]]
				; CHECK-NEXT: [[BIN_RDX15:%.*]] = add <16 x i32> [[TMP13]], [[BIN_RDX14]]
				; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[BIN_RDX15]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX16:%.*]] = add <16 x i32> [[BIN_RDX15]], [[RDX_SHUF]]
				; CHECK-NEXT: [[RDX_SHUF17:%.*]] = shufflevector <16 x i32> [[BIN_RDX16]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX18:%.*]] = add <16 x i32> [[BIN_RDX16]], [[RDX_SHUF17]]
				; CHECK-NEXT: [[RDX_SHUF19:%.*]] = shufflevector <16 x i32> [[BIN_RDX18]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX20:%.*]] = add <16 x i32> [[BIN_RDX18]], [[RDX_SHUF19]]
				; CHECK-NEXT: [[RDX_SHUF21:%.*]] = shufflevector <16 x i32> [[BIN_RDX20]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK-NEXT: [[BIN_RDX22:%.*]] = add <16 x i32> [[BIN_RDX20]], [[RDX_SHUF21]]
				; CHECK-NEXT: [[TMP15:%.*]] = extractelement <16 x i32> [[BIN_RDX22]], i32 0
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 %smax, %n.vec
				; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label %scalar.ph

	entry:			entry:
	%ntrunc = trunc i64 %n to i32			%ntrunc = trunc i64 %n to i32
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	%tmp0 = phi i32 [ %tmp3, %for.body ], [ 0, %entry ]			%tmp0 = phi i32 [ %tmp3, %for.body ], [ 0, %entry ]
	%tmp1 = getelementptr inbounds i32, i32* %b, i64 %i			%tmp1 = getelementptr inbounds i32, i32* %b, i64 %i

llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll

	; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [4096 x i32]			; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [4096 x i32]
	; CHECK-NEXT: [[BASE:%.*]] = bitcast [4096 x i32]* [[ALLOCA]] to i32*			; CHECK-NEXT: [[BASE:%.*]] = bitcast [4096 x i32]* [[ALLOCA]] to i32*
	; CHECK-NEXT: call void @init(i32* [[BASE]])			; CHECK-NEXT: call void @init(i32* [[BASE]])
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE36:%.*]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_LOAD_CONTINUE36:%.*]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP180:%.*]], [[PRED_LOAD_CONTINUE36]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP168:%.*]], [[PRED_LOAD_CONTINUE36]] ]
	; CHECK-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP181:%.*]], [[PRED_LOAD_CONTINUE36]] ]			; CHECK-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP169:%.*]], [[PRED_LOAD_CONTINUE36]] ]
	; CHECK-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP182:%.*]], [[PRED_LOAD_CONTINUE36]] ]			; CHECK-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP170:%.*]], [[PRED_LOAD_CONTINUE36]] ]
	; CHECK-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP183:%.*]], [[PRED_LOAD_CONTINUE36]] ]			; CHECK-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP171:%.*]], [[PRED_LOAD_CONTINUE36]] ]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> undef, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> undef, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> undef, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 0, i64 1, i64 2, i64 3>
	; CHECK-NEXT: [[INDUCTION1:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 4, i64 5, i64 6, i64 7>			; CHECK-NEXT: [[INDUCTION1:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 4, i64 5, i64 6, i64 7>
	; CHECK-NEXT: [[INDUCTION2:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 8, i64 9, i64 10, i64 11>			; CHECK-NEXT: [[INDUCTION2:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 8, i64 9, i64 10, i64 11>
	; CHECK-NEXT: [[INDUCTION3:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 12, i64 13, i64 14, i64 15>			; CHECK-NEXT: [[INDUCTION3:%.*]] = add <4 x i64> [[BROADCAST_SPLAT]], <i64 12, i64 13, i64 14, i64 15>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP56:%.*]] = load i1, i1* [[TMP28]]			; CHECK-NEXT: [[TMP56:%.*]] = load i1, i1* [[TMP28]]
	; CHECK-NEXT: [[TMP57:%.*]] = load i1, i1* [[TMP29]]			; CHECK-NEXT: [[TMP57:%.*]] = load i1, i1* [[TMP29]]
	; CHECK-NEXT: [[TMP58:%.*]] = load i1, i1* [[TMP30]]			; CHECK-NEXT: [[TMP58:%.*]] = load i1, i1* [[TMP30]]
	; CHECK-NEXT: [[TMP59:%.*]] = load i1, i1* [[TMP31]]			; CHECK-NEXT: [[TMP59:%.*]] = load i1, i1* [[TMP31]]
	; CHECK-NEXT: [[TMP60:%.*]] = insertelement <4 x i1> undef, i1 [[TMP56]], i32 0			; CHECK-NEXT: [[TMP60:%.*]] = insertelement <4 x i1> undef, i1 [[TMP56]], i32 0
	; CHECK-NEXT: [[TMP61:%.*]] = insertelement <4 x i1> [[TMP60]], i1 [[TMP57]], i32 1			; CHECK-NEXT: [[TMP61:%.*]] = insertelement <4 x i1> [[TMP60]], i1 [[TMP57]], i32 1
	; CHECK-NEXT: [[TMP62:%.*]] = insertelement <4 x i1> [[TMP61]], i1 [[TMP58]], i32 2			; CHECK-NEXT: [[TMP62:%.*]] = insertelement <4 x i1> [[TMP61]], i1 [[TMP58]], i32 2
	; CHECK-NEXT: [[TMP63:%.*]] = insertelement <4 x i1> [[TMP62]], i1 [[TMP59]], i32 3			; CHECK-NEXT: [[TMP63:%.*]] = insertelement <4 x i1> [[TMP62]], i1 [[TMP59]], i32 3
	; CHECK-NEXT: [[TMP64:%.*]] = extractelement <4 x i1> [[TMP39]], i32 0			; CHECK-NEXT: [[TMP64:%.*]] = bitcast i32* [[BASE]] to i16*
	; CHECK-NEXT: br i1 [[TMP64]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
	; CHECK: pred.load.if:
	; CHECK-NEXT: [[TMP65:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP65:%.*]] = bitcast i32* [[BASE]] to i16*
	; CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds i16, i16* [[TMP65]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP66:%.*]] = bitcast i32* [[BASE]] to i16*
	; CHECK-NEXT: [[TMP67:%.*]] = bitcast i16* [[TMP66]] to i32*			; CHECK-NEXT: [[TMP67:%.*]] = bitcast i32* [[BASE]] to i16*
	; CHECK-NEXT: [[TMP68:%.*]] = load i32, i32* [[TMP67]]			; CHECK-NEXT: [[TMP68:%.*]] = extractelement <4 x i1> [[TMP39]], i32 0
	; CHECK-NEXT: [[TMP69:%.*]] = insertelement <4 x i32> undef, i32 [[TMP68]], i32 0			; CHECK-NEXT: br i1 [[TMP68]], label [[PRED_LOAD_IF:%.*]], label [[PRED_LOAD_CONTINUE:%.*]]
				; CHECK: pred.load.if:
				; CHECK-NEXT: [[TMP69:%.*]] = getelementptr inbounds i16, i16* [[TMP64]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP70:%.*]] = bitcast i16* [[TMP69]] to i32*
				; CHECK-NEXT: [[TMP71:%.*]] = load i32, i32* [[TMP70]]
				; CHECK-NEXT: [[TMP72:%.*]] = insertelement <4 x i32> undef, i32 [[TMP71]], i32 0
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE]]
	; CHECK: pred.load.continue:			; CHECK: pred.load.continue:
	; CHECK-NEXT: [[TMP70:%.*]] = phi <4 x i32> [ undef, [[VECTOR_BODY]] ], [ [[TMP69]], [[PRED_LOAD_IF]] ]			; CHECK-NEXT: [[TMP73:%.*]] = phi <4 x i32> [ undef, [[VECTOR_BODY]] ], [ [[TMP72]], [[PRED_LOAD_IF]] ]
	; CHECK-NEXT: [[TMP71:%.*]] = extractelement <4 x i1> [[TMP39]], i32 1			; CHECK-NEXT: [[TMP74:%.*]] = extractelement <4 x i1> [[TMP39]], i32 1
	; CHECK-NEXT: br i1 [[TMP71]], label [[PRED_LOAD_IF7:%.*]], label [[PRED_LOAD_CONTINUE8:%.*]]			; CHECK-NEXT: br i1 [[TMP74]], label [[PRED_LOAD_IF7:%.*]], label [[PRED_LOAD_CONTINUE8:%.*]]
	; CHECK: pred.load.if7:			; CHECK: pred.load.if7:
	; CHECK-NEXT: [[TMP72:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP75:%.*]] = getelementptr inbounds i16, i16* [[TMP64]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP73:%.*]] = getelementptr inbounds i16, i16* [[TMP72]], i64 [[TMP1]]			; CHECK-NEXT: [[TMP76:%.*]] = bitcast i16* [[TMP75]] to i32*
	; CHECK-NEXT: [[TMP74:%.*]] = bitcast i16* [[TMP73]] to i32*			; CHECK-NEXT: [[TMP77:%.*]] = load i32, i32* [[TMP76]]
	; CHECK-NEXT: [[TMP75:%.*]] = load i32, i32* [[TMP74]]			; CHECK-NEXT: [[TMP78:%.*]] = insertelement <4 x i32> [[TMP73]], i32 [[TMP77]], i32 1
	; CHECK-NEXT: [[TMP76:%.*]] = insertelement <4 x i32> [[TMP70]], i32 [[TMP75]], i32 1
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE8]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE8]]
	; CHECK: pred.load.continue8:			; CHECK: pred.load.continue8:
	; CHECK-NEXT: [[TMP77:%.*]] = phi <4 x i32> [ [[TMP70]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP76]], [[PRED_LOAD_IF7]] ]			; CHECK-NEXT: [[TMP79:%.*]] = phi <4 x i32> [ [[TMP73]], [[PRED_LOAD_CONTINUE]] ], [ [[TMP78]], [[PRED_LOAD_IF7]] ]
	; CHECK-NEXT: [[TMP78:%.*]] = extractelement <4 x i1> [[TMP39]], i32 2			; CHECK-NEXT: [[TMP80:%.*]] = extractelement <4 x i1> [[TMP39]], i32 2
	; CHECK-NEXT: br i1 [[TMP78]], label [[PRED_LOAD_IF9:%.*]], label [[PRED_LOAD_CONTINUE10:%.*]]			; CHECK-NEXT: br i1 [[TMP80]], label [[PRED_LOAD_IF9:%.*]], label [[PRED_LOAD_CONTINUE10:%.*]]
	; CHECK: pred.load.if9:			; CHECK: pred.load.if9:
	; CHECK-NEXT: [[TMP79:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP81:%.*]] = getelementptr inbounds i16, i16* [[TMP64]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP80:%.*]] = getelementptr inbounds i16, i16* [[TMP79]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP82:%.*]] = bitcast i16* [[TMP81]] to i32*
	; CHECK-NEXT: [[TMP81:%.*]] = bitcast i16* [[TMP80]] to i32*			; CHECK-NEXT: [[TMP83:%.*]] = load i32, i32* [[TMP82]]
	; CHECK-NEXT: [[TMP82:%.*]] = load i32, i32* [[TMP81]]			; CHECK-NEXT: [[TMP84:%.*]] = insertelement <4 x i32> [[TMP79]], i32 [[TMP83]], i32 2
	; CHECK-NEXT: [[TMP83:%.*]] = insertelement <4 x i32> [[TMP77]], i32 [[TMP82]], i32 2
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE10]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE10]]
	; CHECK: pred.load.continue10:			; CHECK: pred.load.continue10:
	; CHECK-NEXT: [[TMP84:%.*]] = phi <4 x i32> [ [[TMP77]], [[PRED_LOAD_CONTINUE8]] ], [ [[TMP83]], [[PRED_LOAD_IF9]] ]			; CHECK-NEXT: [[TMP85:%.*]] = phi <4 x i32> [ [[TMP79]], [[PRED_LOAD_CONTINUE8]] ], [ [[TMP84]], [[PRED_LOAD_IF9]] ]
	; CHECK-NEXT: [[TMP85:%.*]] = extractelement <4 x i1> [[TMP39]], i32 3			; CHECK-NEXT: [[TMP86:%.*]] = extractelement <4 x i1> [[TMP39]], i32 3
	; CHECK-NEXT: br i1 [[TMP85]], label [[PRED_LOAD_IF11:%.*]], label [[PRED_LOAD_CONTINUE12:%.*]]			; CHECK-NEXT: br i1 [[TMP86]], label [[PRED_LOAD_IF11:%.*]], label [[PRED_LOAD_CONTINUE12:%.*]]
	; CHECK: pred.load.if11:			; CHECK: pred.load.if11:
	; CHECK-NEXT: [[TMP86:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP87:%.*]] = getelementptr inbounds i16, i16* [[TMP64]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP87:%.*]] = getelementptr inbounds i16, i16* [[TMP86]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP88:%.*]] = bitcast i16* [[TMP87]] to i32*			; CHECK-NEXT: [[TMP88:%.*]] = bitcast i16* [[TMP87]] to i32*
	; CHECK-NEXT: [[TMP89:%.*]] = load i32, i32* [[TMP88]]			; CHECK-NEXT: [[TMP89:%.*]] = load i32, i32* [[TMP88]]
	; CHECK-NEXT: [[TMP90:%.*]] = insertelement <4 x i32> [[TMP84]], i32 [[TMP89]], i32 3			; CHECK-NEXT: [[TMP90:%.*]] = insertelement <4 x i32> [[TMP85]], i32 [[TMP89]], i32 3
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE12]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE12]]
	; CHECK: pred.load.continue12:			; CHECK: pred.load.continue12:
	; CHECK-NEXT: [[TMP91:%.*]] = phi <4 x i32> [ [[TMP84]], [[PRED_LOAD_CONTINUE10]] ], [ [[TMP90]], [[PRED_LOAD_IF11]] ]			; CHECK-NEXT: [[TMP91:%.*]] = phi <4 x i32> [ [[TMP85]], [[PRED_LOAD_CONTINUE10]] ], [ [[TMP90]], [[PRED_LOAD_IF11]] ]
	; CHECK-NEXT: [[TMP92:%.*]] = extractelement <4 x i1> [[TMP47]], i32 0			; CHECK-NEXT: [[TMP92:%.*]] = extractelement <4 x i1> [[TMP47]], i32 0
	; CHECK-NEXT: br i1 [[TMP92]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]]			; CHECK-NEXT: br i1 [[TMP92]], label [[PRED_LOAD_IF13:%.*]], label [[PRED_LOAD_CONTINUE14:%.*]]
	; CHECK: pred.load.if13:			; CHECK: pred.load.if13:
	; CHECK-NEXT: [[TMP93:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP93:%.*]] = getelementptr inbounds i16, i16* [[TMP65]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds i16, i16* [[TMP93]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP94:%.*]] = bitcast i16* [[TMP93]] to i32*
	; CHECK-NEXT: [[TMP95:%.*]] = bitcast i16* [[TMP94]] to i32*			; CHECK-NEXT: [[TMP95:%.*]] = load i32, i32* [[TMP94]]
	; CHECK-NEXT: [[TMP96:%.*]] = load i32, i32* [[TMP95]]			; CHECK-NEXT: [[TMP96:%.*]] = insertelement <4 x i32> undef, i32 [[TMP95]], i32 0
	; CHECK-NEXT: [[TMP97:%.*]] = insertelement <4 x i32> undef, i32 [[TMP96]], i32 0
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE14]]
	; CHECK: pred.load.continue14:			; CHECK: pred.load.continue14:
	; CHECK-NEXT: [[TMP98:%.*]] = phi <4 x i32> [ undef, [[PRED_LOAD_CONTINUE12]] ], [ [[TMP97]], [[PRED_LOAD_IF13]] ]			; CHECK-NEXT: [[TMP97:%.*]] = phi <4 x i32> [ undef, [[PRED_LOAD_CONTINUE12]] ], [ [[TMP96]], [[PRED_LOAD_IF13]] ]
	; CHECK-NEXT: [[TMP99:%.*]] = extractelement <4 x i1> [[TMP47]], i32 1			; CHECK-NEXT: [[TMP98:%.*]] = extractelement <4 x i1> [[TMP47]], i32 1
	; CHECK-NEXT: br i1 [[TMP99]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]]			; CHECK-NEXT: br i1 [[TMP98]], label [[PRED_LOAD_IF15:%.*]], label [[PRED_LOAD_CONTINUE16:%.*]]
	; CHECK: pred.load.if15:			; CHECK: pred.load.if15:
	; CHECK-NEXT: [[TMP100:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP99:%.*]] = getelementptr inbounds i16, i16* [[TMP65]], i64 [[TMP5]]
	; CHECK-NEXT: [[TMP101:%.*]] = getelementptr inbounds i16, i16* [[TMP100]], i64 [[TMP5]]			; CHECK-NEXT: [[TMP100:%.*]] = bitcast i16* [[TMP99]] to i32*
	; CHECK-NEXT: [[TMP102:%.*]] = bitcast i16* [[TMP101]] to i32*			; CHECK-NEXT: [[TMP101:%.*]] = load i32, i32* [[TMP100]]
	; CHECK-NEXT: [[TMP103:%.*]] = load i32, i32* [[TMP102]]			; CHECK-NEXT: [[TMP102:%.*]] = insertelement <4 x i32> [[TMP97]], i32 [[TMP101]], i32 1
	; CHECK-NEXT: [[TMP104:%.*]] = insertelement <4 x i32> [[TMP98]], i32 [[TMP103]], i32 1
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE16]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE16]]
	; CHECK: pred.load.continue16:			; CHECK: pred.load.continue16:
	; CHECK-NEXT: [[TMP105:%.*]] = phi <4 x i32> [ [[TMP98]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP104]], [[PRED_LOAD_IF15]] ]			; CHECK-NEXT: [[TMP103:%.*]] = phi <4 x i32> [ [[TMP97]], [[PRED_LOAD_CONTINUE14]] ], [ [[TMP102]], [[PRED_LOAD_IF15]] ]
	; CHECK-NEXT: [[TMP106:%.*]] = extractelement <4 x i1> [[TMP47]], i32 2			; CHECK-NEXT: [[TMP104:%.*]] = extractelement <4 x i1> [[TMP47]], i32 2
	; CHECK-NEXT: br i1 [[TMP106]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18:%.*]]			; CHECK-NEXT: br i1 [[TMP104]], label [[PRED_LOAD_IF17:%.*]], label [[PRED_LOAD_CONTINUE18:%.*]]
	; CHECK: pred.load.if17:			; CHECK: pred.load.if17:
	; CHECK-NEXT: [[TMP107:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP105:%.*]] = getelementptr inbounds i16, i16* [[TMP65]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP108:%.*]] = getelementptr inbounds i16, i16* [[TMP107]], i64 [[TMP6]]			; CHECK-NEXT: [[TMP106:%.*]] = bitcast i16* [[TMP105]] to i32*
	; CHECK-NEXT: [[TMP109:%.*]] = bitcast i16* [[TMP108]] to i32*			; CHECK-NEXT: [[TMP107:%.*]] = load i32, i32* [[TMP106]]
	; CHECK-NEXT: [[TMP110:%.*]] = load i32, i32* [[TMP109]]			; CHECK-NEXT: [[TMP108:%.*]] = insertelement <4 x i32> [[TMP103]], i32 [[TMP107]], i32 2
	; CHECK-NEXT: [[TMP111:%.*]] = insertelement <4 x i32> [[TMP105]], i32 [[TMP110]], i32 2
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE18]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE18]]
	; CHECK: pred.load.continue18:			; CHECK: pred.load.continue18:
	; CHECK-NEXT: [[TMP112:%.*]] = phi <4 x i32> [ [[TMP105]], [[PRED_LOAD_CONTINUE16]] ], [ [[TMP111]], [[PRED_LOAD_IF17]] ]			; CHECK-NEXT: [[TMP109:%.*]] = phi <4 x i32> [ [[TMP103]], [[PRED_LOAD_CONTINUE16]] ], [ [[TMP108]], [[PRED_LOAD_IF17]] ]
	; CHECK-NEXT: [[TMP113:%.*]] = extractelement <4 x i1> [[TMP47]], i32 3			; CHECK-NEXT: [[TMP110:%.*]] = extractelement <4 x i1> [[TMP47]], i32 3
	; CHECK-NEXT: br i1 [[TMP113]], label [[PRED_LOAD_IF19:%.*]], label [[PRED_LOAD_CONTINUE20:%.*]]			; CHECK-NEXT: br i1 [[TMP110]], label [[PRED_LOAD_IF19:%.*]], label [[PRED_LOAD_CONTINUE20:%.*]]
	; CHECK: pred.load.if19:			; CHECK: pred.load.if19:
	; CHECK-NEXT: [[TMP114:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP111:%.*]] = getelementptr inbounds i16, i16* [[TMP65]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP115:%.*]] = getelementptr inbounds i16, i16* [[TMP114]], i64 [[TMP7]]			; CHECK-NEXT: [[TMP112:%.*]] = bitcast i16* [[TMP111]] to i32*
	; CHECK-NEXT: [[TMP116:%.*]] = bitcast i16* [[TMP115]] to i32*			; CHECK-NEXT: [[TMP113:%.*]] = load i32, i32* [[TMP112]]
	; CHECK-NEXT: [[TMP117:%.*]] = load i32, i32* [[TMP116]]			; CHECK-NEXT: [[TMP114:%.*]] = insertelement <4 x i32> [[TMP109]], i32 [[TMP113]], i32 3
	; CHECK-NEXT: [[TMP118:%.*]] = insertelement <4 x i32> [[TMP112]], i32 [[TMP117]], i32 3
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE20]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE20]]
	; CHECK: pred.load.continue20:			; CHECK: pred.load.continue20:
	; CHECK-NEXT: [[TMP119:%.*]] = phi <4 x i32> [ [[TMP112]], [[PRED_LOAD_CONTINUE18]] ], [ [[TMP118]], [[PRED_LOAD_IF19]] ]			; CHECK-NEXT: [[TMP115:%.*]] = phi <4 x i32> [ [[TMP109]], [[PRED_LOAD_CONTINUE18]] ], [ [[TMP114]], [[PRED_LOAD_IF19]] ]
	; CHECK-NEXT: [[TMP120:%.*]] = extractelement <4 x i1> [[TMP55]], i32 0			; CHECK-NEXT: [[TMP116:%.*]] = extractelement <4 x i1> [[TMP55]], i32 0
	; CHECK-NEXT: br i1 [[TMP120]], label [[PRED_LOAD_IF21:%.*]], label [[PRED_LOAD_CONTINUE22:%.*]]			; CHECK-NEXT: br i1 [[TMP116]], label [[PRED_LOAD_IF21:%.*]], label [[PRED_LOAD_CONTINUE22:%.*]]
	; CHECK: pred.load.if21:			; CHECK: pred.load.if21:
	; CHECK-NEXT: [[TMP121:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP117:%.*]] = getelementptr inbounds i16, i16* [[TMP66]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP122:%.*]] = getelementptr inbounds i16, i16* [[TMP121]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP118:%.*]] = bitcast i16* [[TMP117]] to i32*
	; CHECK-NEXT: [[TMP123:%.*]] = bitcast i16* [[TMP122]] to i32*			; CHECK-NEXT: [[TMP119:%.*]] = load i32, i32* [[TMP118]]
	; CHECK-NEXT: [[TMP124:%.*]] = load i32, i32* [[TMP123]]			; CHECK-NEXT: [[TMP120:%.*]] = insertelement <4 x i32> undef, i32 [[TMP119]], i32 0
	; CHECK-NEXT: [[TMP125:%.*]] = insertelement <4 x i32> undef, i32 [[TMP124]], i32 0
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE22]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE22]]
	; CHECK: pred.load.continue22:			; CHECK: pred.load.continue22:
	; CHECK-NEXT: [[TMP126:%.*]] = phi <4 x i32> [ undef, [[PRED_LOAD_CONTINUE20]] ], [ [[TMP125]], [[PRED_LOAD_IF21]] ]			; CHECK-NEXT: [[TMP121:%.*]] = phi <4 x i32> [ undef, [[PRED_LOAD_CONTINUE20]] ], [ [[TMP120]], [[PRED_LOAD_IF21]] ]
	; CHECK-NEXT: [[TMP127:%.*]] = extractelement <4 x i1> [[TMP55]], i32 1			; CHECK-NEXT: [[TMP122:%.*]] = extractelement <4 x i1> [[TMP55]], i32 1
	; CHECK-NEXT: br i1 [[TMP127]], label [[PRED_LOAD_IF23:%.*]], label [[PRED_LOAD_CONTINUE24:%.*]]			; CHECK-NEXT: br i1 [[TMP122]], label [[PRED_LOAD_IF23:%.*]], label [[PRED_LOAD_CONTINUE24:%.*]]
	; CHECK: pred.load.if23:			; CHECK: pred.load.if23:
	; CHECK-NEXT: [[TMP128:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP123:%.*]] = getelementptr inbounds i16, i16* [[TMP66]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP129:%.*]] = getelementptr inbounds i16, i16* [[TMP128]], i64 [[TMP9]]			; CHECK-NEXT: [[TMP124:%.*]] = bitcast i16* [[TMP123]] to i32*
	; CHECK-NEXT: [[TMP130:%.*]] = bitcast i16* [[TMP129]] to i32*			; CHECK-NEXT: [[TMP125:%.*]] = load i32, i32* [[TMP124]]
	; CHECK-NEXT: [[TMP131:%.*]] = load i32, i32* [[TMP130]]			; CHECK-NEXT: [[TMP126:%.*]] = insertelement <4 x i32> [[TMP121]], i32 [[TMP125]], i32 1
	; CHECK-NEXT: [[TMP132:%.*]] = insertelement <4 x i32> [[TMP126]], i32 [[TMP131]], i32 1
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE24]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE24]]
	; CHECK: pred.load.continue24:			; CHECK: pred.load.continue24:
	; CHECK-NEXT: [[TMP133:%.*]] = phi <4 x i32> [ [[TMP126]], [[PRED_LOAD_CONTINUE22]] ], [ [[TMP132]], [[PRED_LOAD_IF23]] ]			; CHECK-NEXT: [[TMP127:%.*]] = phi <4 x i32> [ [[TMP121]], [[PRED_LOAD_CONTINUE22]] ], [ [[TMP126]], [[PRED_LOAD_IF23]] ]
	; CHECK-NEXT: [[TMP134:%.*]] = extractelement <4 x i1> [[TMP55]], i32 2			; CHECK-NEXT: [[TMP128:%.*]] = extractelement <4 x i1> [[TMP55]], i32 2
	; CHECK-NEXT: br i1 [[TMP134]], label [[PRED_LOAD_IF25:%.*]], label [[PRED_LOAD_CONTINUE26:%.*]]			; CHECK-NEXT: br i1 [[TMP128]], label [[PRED_LOAD_IF25:%.*]], label [[PRED_LOAD_CONTINUE26:%.*]]
	; CHECK: pred.load.if25:			; CHECK: pred.load.if25:
	; CHECK-NEXT: [[TMP135:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP129:%.*]] = getelementptr inbounds i16, i16* [[TMP66]], i64 [[TMP10]]
	; CHECK-NEXT: [[TMP136:%.*]] = getelementptr inbounds i16, i16* [[TMP135]], i64 [[TMP10]]			; CHECK-NEXT: [[TMP130:%.*]] = bitcast i16* [[TMP129]] to i32*
	; CHECK-NEXT: [[TMP137:%.*]] = bitcast i16* [[TMP136]] to i32*			; CHECK-NEXT: [[TMP131:%.*]] = load i32, i32* [[TMP130]]
	; CHECK-NEXT: [[TMP138:%.*]] = load i32, i32* [[TMP137]]			; CHECK-NEXT: [[TMP132:%.*]] = insertelement <4 x i32> [[TMP127]], i32 [[TMP131]], i32 2
	; CHECK-NEXT: [[TMP139:%.*]] = insertelement <4 x i32> [[TMP133]], i32 [[TMP138]], i32 2
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE26]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE26]]
	; CHECK: pred.load.continue26:			; CHECK: pred.load.continue26:
	; CHECK-NEXT: [[TMP140:%.*]] = phi <4 x i32> [ [[TMP133]], [[PRED_LOAD_CONTINUE24]] ], [ [[TMP139]], [[PRED_LOAD_IF25]] ]			; CHECK-NEXT: [[TMP133:%.*]] = phi <4 x i32> [ [[TMP127]], [[PRED_LOAD_CONTINUE24]] ], [ [[TMP132]], [[PRED_LOAD_IF25]] ]
	; CHECK-NEXT: [[TMP141:%.*]] = extractelement <4 x i1> [[TMP55]], i32 3			; CHECK-NEXT: [[TMP134:%.*]] = extractelement <4 x i1> [[TMP55]], i32 3
	; CHECK-NEXT: br i1 [[TMP141]], label [[PRED_LOAD_IF27:%.*]], label [[PRED_LOAD_CONTINUE28:%.*]]			; CHECK-NEXT: br i1 [[TMP134]], label [[PRED_LOAD_IF27:%.*]], label [[PRED_LOAD_CONTINUE28:%.*]]
	; CHECK: pred.load.if27:			; CHECK: pred.load.if27:
	; CHECK-NEXT: [[TMP142:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP135:%.*]] = getelementptr inbounds i16, i16* [[TMP66]], i64 [[TMP11]]
	; CHECK-NEXT: [[TMP143:%.*]] = getelementptr inbounds i16, i16* [[TMP142]], i64 [[TMP11]]			; CHECK-NEXT: [[TMP136:%.*]] = bitcast i16* [[TMP135]] to i32*
	; CHECK-NEXT: [[TMP144:%.*]] = bitcast i16* [[TMP143]] to i32*			; CHECK-NEXT: [[TMP137:%.*]] = load i32, i32* [[TMP136]]
	; CHECK-NEXT: [[TMP145:%.*]] = load i32, i32* [[TMP144]]			; CHECK-NEXT: [[TMP138:%.*]] = insertelement <4 x i32> [[TMP133]], i32 [[TMP137]], i32 3
	; CHECK-NEXT: [[TMP146:%.*]] = insertelement <4 x i32> [[TMP140]], i32 [[TMP145]], i32 3
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE28]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE28]]
	; CHECK: pred.load.continue28:			; CHECK: pred.load.continue28:
	; CHECK-NEXT: [[TMP147:%.*]] = phi <4 x i32> [ [[TMP140]], [[PRED_LOAD_CONTINUE26]] ], [ [[TMP146]], [[PRED_LOAD_IF27]] ]			; CHECK-NEXT: [[TMP139:%.*]] = phi <4 x i32> [ [[TMP133]], [[PRED_LOAD_CONTINUE26]] ], [ [[TMP138]], [[PRED_LOAD_IF27]] ]
	; CHECK-NEXT: [[TMP148:%.*]] = extractelement <4 x i1> [[TMP63]], i32 0			; CHECK-NEXT: [[TMP140:%.*]] = extractelement <4 x i1> [[TMP63]], i32 0
	; CHECK-NEXT: br i1 [[TMP148]], label [[PRED_LOAD_IF29:%.*]], label [[PRED_LOAD_CONTINUE30:%.*]]			; CHECK-NEXT: br i1 [[TMP140]], label [[PRED_LOAD_IF29:%.*]], label [[PRED_LOAD_CONTINUE30:%.*]]
	; CHECK: pred.load.if29:			; CHECK: pred.load.if29:
	; CHECK-NEXT: [[TMP149:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP141:%.*]] = getelementptr inbounds i16, i16* [[TMP67]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP150:%.*]] = getelementptr inbounds i16, i16* [[TMP149]], i64 [[TMP12]]			; CHECK-NEXT: [[TMP142:%.*]] = bitcast i16* [[TMP141]] to i32*
	; CHECK-NEXT: [[TMP151:%.*]] = bitcast i16* [[TMP150]] to i32*			; CHECK-NEXT: [[TMP143:%.*]] = load i32, i32* [[TMP142]]
	; CHECK-NEXT: [[TMP152:%.*]] = load i32, i32* [[TMP151]]			; CHECK-NEXT: [[TMP144:%.*]] = insertelement <4 x i32> undef, i32 [[TMP143]], i32 0
	; CHECK-NEXT: [[TMP153:%.*]] = insertelement <4 x i32> undef, i32 [[TMP152]], i32 0
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE30]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE30]]
	; CHECK: pred.load.continue30:			; CHECK: pred.load.continue30:
	; CHECK-NEXT: [[TMP154:%.*]] = phi <4 x i32> [ undef, [[PRED_LOAD_CONTINUE28]] ], [ [[TMP153]], [[PRED_LOAD_IF29]] ]			; CHECK-NEXT: [[TMP145:%.*]] = phi <4 x i32> [ undef, [[PRED_LOAD_CONTINUE28]] ], [ [[TMP144]], [[PRED_LOAD_IF29]] ]
	; CHECK-NEXT: [[TMP155:%.*]] = extractelement <4 x i1> [[TMP63]], i32 1			; CHECK-NEXT: [[TMP146:%.*]] = extractelement <4 x i1> [[TMP63]], i32 1
	; CHECK-NEXT: br i1 [[TMP155]], label [[PRED_LOAD_IF31:%.*]], label [[PRED_LOAD_CONTINUE32:%.*]]			; CHECK-NEXT: br i1 [[TMP146]], label [[PRED_LOAD_IF31:%.*]], label [[PRED_LOAD_CONTINUE32:%.*]]
	; CHECK: pred.load.if31:			; CHECK: pred.load.if31:
	; CHECK-NEXT: [[TMP156:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP147:%.*]] = getelementptr inbounds i16, i16* [[TMP67]], i64 [[TMP13]]
	; CHECK-NEXT: [[TMP157:%.*]] = getelementptr inbounds i16, i16* [[TMP156]], i64 [[TMP13]]			; CHECK-NEXT: [[TMP148:%.*]] = bitcast i16* [[TMP147]] to i32*
	; CHECK-NEXT: [[TMP158:%.*]] = bitcast i16* [[TMP157]] to i32*			; CHECK-NEXT: [[TMP149:%.*]] = load i32, i32* [[TMP148]]
	; CHECK-NEXT: [[TMP159:%.*]] = load i32, i32* [[TMP158]]			; CHECK-NEXT: [[TMP150:%.*]] = insertelement <4 x i32> [[TMP145]], i32 [[TMP149]], i32 1
	; CHECK-NEXT: [[TMP160:%.*]] = insertelement <4 x i32> [[TMP154]], i32 [[TMP159]], i32 1
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE32]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE32]]
	; CHECK: pred.load.continue32:			; CHECK: pred.load.continue32:
	; CHECK-NEXT: [[TMP161:%.*]] = phi <4 x i32> [ [[TMP154]], [[PRED_LOAD_CONTINUE30]] ], [ [[TMP160]], [[PRED_LOAD_IF31]] ]			; CHECK-NEXT: [[TMP151:%.*]] = phi <4 x i32> [ [[TMP145]], [[PRED_LOAD_CONTINUE30]] ], [ [[TMP150]], [[PRED_LOAD_IF31]] ]
	; CHECK-NEXT: [[TMP162:%.*]] = extractelement <4 x i1> [[TMP63]], i32 2			; CHECK-NEXT: [[TMP152:%.*]] = extractelement <4 x i1> [[TMP63]], i32 2
	; CHECK-NEXT: br i1 [[TMP162]], label [[PRED_LOAD_IF33:%.*]], label [[PRED_LOAD_CONTINUE34:%.*]]			; CHECK-NEXT: br i1 [[TMP152]], label [[PRED_LOAD_IF33:%.*]], label [[PRED_LOAD_CONTINUE34:%.*]]
	; CHECK: pred.load.if33:			; CHECK: pred.load.if33:
	; CHECK-NEXT: [[TMP163:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP153:%.*]] = getelementptr inbounds i16, i16* [[TMP67]], i64 [[TMP14]]
	; CHECK-NEXT: [[TMP164:%.*]] = getelementptr inbounds i16, i16* [[TMP163]], i64 [[TMP14]]			; CHECK-NEXT: [[TMP154:%.*]] = bitcast i16* [[TMP153]] to i32*
	; CHECK-NEXT: [[TMP165:%.*]] = bitcast i16* [[TMP164]] to i32*			; CHECK-NEXT: [[TMP155:%.*]] = load i32, i32* [[TMP154]]
	; CHECK-NEXT: [[TMP166:%.*]] = load i32, i32* [[TMP165]]			; CHECK-NEXT: [[TMP156:%.*]] = insertelement <4 x i32> [[TMP151]], i32 [[TMP155]], i32 2
	; CHECK-NEXT: [[TMP167:%.*]] = insertelement <4 x i32> [[TMP161]], i32 [[TMP166]], i32 2
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE34]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE34]]
	; CHECK: pred.load.continue34:			; CHECK: pred.load.continue34:
	; CHECK-NEXT: [[TMP168:%.*]] = phi <4 x i32> [ [[TMP161]], [[PRED_LOAD_CONTINUE32]] ], [ [[TMP167]], [[PRED_LOAD_IF33]] ]			; CHECK-NEXT: [[TMP157:%.*]] = phi <4 x i32> [ [[TMP151]], [[PRED_LOAD_CONTINUE32]] ], [ [[TMP156]], [[PRED_LOAD_IF33]] ]
	; CHECK-NEXT: [[TMP169:%.*]] = extractelement <4 x i1> [[TMP63]], i32 3			; CHECK-NEXT: [[TMP158:%.*]] = extractelement <4 x i1> [[TMP63]], i32 3
	; CHECK-NEXT: br i1 [[TMP169]], label [[PRED_LOAD_IF35:%.*]], label [[PRED_LOAD_CONTINUE36]]			; CHECK-NEXT: br i1 [[TMP158]], label [[PRED_LOAD_IF35:%.*]], label [[PRED_LOAD_CONTINUE36]]
	; CHECK: pred.load.if35:			; CHECK: pred.load.if35:
	; CHECK-NEXT: [[TMP170:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[TMP159:%.*]] = getelementptr inbounds i16, i16* [[TMP67]], i64 [[TMP15]]
	; CHECK-NEXT: [[TMP171:%.*]] = getelementptr inbounds i16, i16* [[TMP170]], i64 [[TMP15]]			; CHECK-NEXT: [[TMP160:%.*]] = bitcast i16* [[TMP159]] to i32*
	; CHECK-NEXT: [[TMP172:%.*]] = bitcast i16* [[TMP171]] to i32*			; CHECK-NEXT: [[TMP161:%.*]] = load i32, i32* [[TMP160]]
	; CHECK-NEXT: [[TMP173:%.*]] = load i32, i32* [[TMP172]]			; CHECK-NEXT: [[TMP162:%.*]] = insertelement <4 x i32> [[TMP157]], i32 [[TMP161]], i32 3
	; CHECK-NEXT: [[TMP174:%.*]] = insertelement <4 x i32> [[TMP168]], i32 [[TMP173]], i32 3
	; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE36]]			; CHECK-NEXT: br label [[PRED_LOAD_CONTINUE36]]
	; CHECK: pred.load.continue36:			; CHECK: pred.load.continue36:
	; CHECK-NEXT: [[TMP175:%.*]] = phi <4 x i32> [ [[TMP168]], [[PRED_LOAD_CONTINUE34]] ], [ [[TMP174]], [[PRED_LOAD_IF35]] ]			; CHECK-NEXT: [[TMP163:%.*]] = phi <4 x i32> [ [[TMP157]], [[PRED_LOAD_CONTINUE34]] ], [ [[TMP162]], [[PRED_LOAD_IF35]] ]
	; CHECK-NEXT: [[TMP176:%.*]] = xor <4 x i1> [[TMP39]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP164:%.*]] = xor <4 x i1> [[TMP39]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP177:%.*]] = xor <4 x i1> [[TMP47]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP165:%.*]] = xor <4 x i1> [[TMP47]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP178:%.*]] = xor <4 x i1> [[TMP55]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP166:%.*]] = xor <4 x i1> [[TMP55]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP179:%.*]] = xor <4 x i1> [[TMP63]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP167:%.*]] = xor <4 x i1> [[TMP63]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP39]], <4 x i32> [[TMP91]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP39]], <4 x i32> [[TMP91]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI37:%.*]] = select <4 x i1> [[TMP47]], <4 x i32> [[TMP119]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI37:%.*]] = select <4 x i1> [[TMP47]], <4 x i32> [[TMP115]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI38:%.*]] = select <4 x i1> [[TMP55]], <4 x i32> [[TMP147]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI38:%.*]] = select <4 x i1> [[TMP55]], <4 x i32> [[TMP139]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI39:%.*]] = select <4 x i1> [[TMP63]], <4 x i32> [[TMP175]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI39:%.*]] = select <4 x i1> [[TMP63]], <4 x i32> [[TMP163]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP180]] = add <4 x i32> [[VEC_PHI]], [[PREDPHI]]			; CHECK-NEXT: [[TMP168]] = add <4 x i32> [[VEC_PHI]], [[PREDPHI]]
	; CHECK-NEXT: [[TMP181]] = add <4 x i32> [[VEC_PHI4]], [[PREDPHI37]]			; CHECK-NEXT: [[TMP169]] = add <4 x i32> [[VEC_PHI4]], [[PREDPHI37]]
	; CHECK-NEXT: [[TMP182]] = add <4 x i32> [[VEC_PHI5]], [[PREDPHI38]]			; CHECK-NEXT: [[TMP170]] = add <4 x i32> [[VEC_PHI5]], [[PREDPHI38]]
	; CHECK-NEXT: [[TMP183]] = add <4 x i32> [[VEC_PHI6]], [[PREDPHI39]]			; CHECK-NEXT: [[TMP171]] = add <4 x i32> [[VEC_PHI6]], [[PREDPHI39]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP184:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP172:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP184]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !8			; CHECK-NEXT: br i1 [[TMP172]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !8
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP181]], [[TMP180]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP169]], [[TMP168]]
	; CHECK-NEXT: [[BIN_RDX40:%.*]] = add <4 x i32> [[TMP182]], [[BIN_RDX]]			; CHECK-NEXT: [[BIN_RDX40:%.*]] = add <4 x i32> [[TMP170]], [[BIN_RDX]]
	; CHECK-NEXT: [[BIN_RDX41:%.*]] = add <4 x i32> [[TMP183]], [[BIN_RDX40]]			; CHECK-NEXT: [[BIN_RDX41:%.*]] = add <4 x i32> [[TMP171]], [[BIN_RDX40]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX41]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[BIN_RDX41]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX42:%.*]] = add <4 x i32> [[BIN_RDX41]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX42:%.*]] = add <4 x i32> [[BIN_RDX41]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF43:%.*]] = shufflevector <4 x i32> [[BIN_RDX42]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF43:%.*]] = shufflevector <4 x i32> [[BIN_RDX42]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX44:%.*]] = add <4 x i32> [[BIN_RDX42]], [[RDX_SHUF43]]			; CHECK-NEXT: [[BIN_RDX44:%.*]] = add <4 x i32> [[BIN_RDX42]], [[RDX_SHUF43]]
	; CHECK-NEXT: [[TMP185:%.*]] = extractelement <4 x i32> [[BIN_RDX44]], i32 0			; CHECK-NEXT: [[TMP173:%.*]] = extractelement <4 x i32> [[BIN_RDX44]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
	; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP185]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP173]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LATCH:%.*]] ]
	; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LATCH]] ]			; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LATCH]] ]
	; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
	; CHECK-NEXT: [[TEST_ADDR:%.*]] = getelementptr inbounds i1, i1* [[TEST_BASE]], i64 [[IV]]			; CHECK-NEXT: [[TEST_ADDR:%.*]] = getelementptr inbounds i1, i1* [[TEST_BASE]], i64 [[IV]]
	; CHECK-NEXT: [[EARLYCND:%.*]] = load i1, i1* [[TEST_ADDR]]			; CHECK-NEXT: [[EARLYCND:%.*]] = load i1, i1* [[TEST_ADDR]]
	; CHECK-NEXT: br i1 [[EARLYCND]], label [[PRED:%.*]], label [[LATCH]]			; CHECK-NEXT: br i1 [[EARLYCND]], label [[PRED:%.*]], label [[LATCH]]
	; CHECK: pred:			; CHECK: pred:
	; CHECK-NEXT: [[BASE_I16P:%.*]] = bitcast i32* [[BASE]] to i16*			; CHECK-NEXT: [[BASE_I16P:%.*]] = bitcast i32* [[BASE]] to i16*
	; CHECK-NEXT: [[ADDR_I16P:%.*]] = getelementptr inbounds i16, i16* [[BASE_I16P]], i64 [[IV]]			; CHECK-NEXT: [[ADDR_I16P:%.*]] = getelementptr inbounds i16, i16* [[BASE_I16P]], i64 [[IV]]
	; CHECK-NEXT: [[ADDR:%.*]] = bitcast i16* [[ADDR_I16P]] to i32*			; CHECK-NEXT: [[ADDR:%.*]] = bitcast i16* [[ADDR_I16P]] to i32*
	; CHECK-NEXT: [[VAL:%.*]] = load i32, i32* [[ADDR]]			; CHECK-NEXT: [[VAL:%.*]] = load i32, i32* [[ADDR]]
	; CHECK-NEXT: br label [[LATCH]]			; CHECK-NEXT: br label [[LATCH]]
	; CHECK: latch:			; CHECK: latch:
	; CHECK-NEXT: [[VAL_PHI:%.*]] = phi i32 [ 0, [[LOOP]] ], [ [[VAL]], [[PRED]] ]			; CHECK-NEXT: [[VAL_PHI:%.*]] = phi i32 [ 0, [[LOOP]] ], [ [[VAL]], [[PRED]] ]
	; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[VAL_PHI]]			; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[VAL_PHI]]
	; CHECK-NEXT: [[EXIT:%.*]] = icmp ugt i64 [[IV]], 4094			; CHECK-NEXT: [[EXIT:%.*]] = icmp ugt i64 [[IV]], 4094
	; CHECK-NEXT: br i1 [[EXIT]], label [[LOOP_EXIT]], label [[LOOP]], !llvm.loop !9			; CHECK-NEXT: br i1 [[EXIT]], label [[LOOP_EXIT]], label [[LOOP]], !llvm.loop !9
	; CHECK: loop_exit:			; CHECK: loop_exit:
	; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[LATCH]] ], [ [[TMP185]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[LATCH]] ], [ [[TMP173]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]			; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
	;			;
	entry:			entry:
	%alloca = alloca [4096 x i32]			%alloca = alloca [4096 x i32]
	%base = bitcast [4096 x i32]* %alloca to i32*			%base = bitcast [4096 x i32]* %alloca to i32*
	call void @init(i32* %base)			call void @init(i32* %base)
	br label %loop			br label %loop
	loop:			loop:

llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll

br i1 %cond, label %for.end, label %scalar.body		br i1 %cond, label %for.end, label %scalar.body

for.end:		for.end:
ret void		ret void
}		}

; UNROLL-NO-IC-LABEL: @constant_folded_previous_value(		; UNROLL-NO-IC-LABEL: @constant_folded_previous_value(
; UNROLL-NO-IC: vector.body:		; UNROLL-NO-IC: vector.body:
; UNROLL-NO-IC: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 undef, i64 undef, i64 undef, i64 0>, %vector.ph ], [ <i64 1, i64 1, i64 1, i64 1>, %vector.body ]		; UNROLL-NO-IC: [[VECTOR_RECUR:%.*]] = phi <4 x i64> [ <i64 undef, i64 undef, i64 undef, i64 0>, %vector.ph ], [ %broadcast.splat4, %vector.body ]
; UNROLL-NO-IC-NEXT: [[TMP0:%.*]] = shufflevector <4 x i64> [[VECTOR_RECUR]], <4 x i64> <i64 1, i64 1, i64 1, i64 1>, <4 x i32> <i32 3, i32 4, i32 5, i32 6>		; UNROLL-NO-IC-NEXT: %broadcast.splatinsert = insertelement <4 x i64> undef, i64 %index, i32 0
		; UNROLL-NO-IC-NEXT: %broadcast.splat = shufflevector <4 x i64> %broadcast.splatinsert, <4 x i64> undef, <4 x i32> zeroinitializer
; UNROLL-NO-IC: br i1 {{.*}}, label %middle.block, label %vector.body		; UNROLL-NO-IC: br i1 {{.*}}, label %middle.block, label %vector.body
;		;
define void @constant_folded_previous_value() {		define void @constant_folded_previous_value() {
entry:		entry:
br label %scalar.body		br label %scalar.body

scalar.body:		scalar.body:
%i = phi i64 [ 0, %entry ], [ %i.next, %scalar.body ]		%i = phi i64 [ 0, %entry ], [ %i.next, %scalar.body ]

llvm/test/Transforms/LoopVectorize/no_outside_user.ll

%.lcssa = phi i32 [ %sum, %.lr.ph.i ]		%.lcssa = phi i32 [ %sum, %.lr.ph.i ]
ret i32 %.lcssa		ret i32 %.lcssa
}		}

@tab = common global [32 x i8] zeroinitializer, align 1		@tab = common global [32 x i8] zeroinitializer, align 1

; CHECK-LABEL: non_uniform_live_out()		; CHECK-LABEL: non_uniform_live_out()
; CHECK-LABEL: vector.body:		; CHECK-LABEL: vector.body:
; CHECK: %vec.ind = phi <2 x i32> [ <i32 0, i32 1>, %vector.ph ], [ %vec.ind.next, %vector.body ]		; CHECK: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; CHECK: [[ADD:%[a-zA-Z0-9.]+]] = add <2 x i32> %vec.ind, <i32 7, i32 7>		; CHECK: [[ADD1:%[a-zA-Z0-9.]+]] = add i32 %index, 0
; CHECK: [[EE:%[a-zA-Z0-9.]+]] = extractelement <2 x i32> [[ADD]], i32 0		; CHECK-NEXT: [[ADD2:%[a-zA-Z0-9.]+]] = add i32 [[ADD1]], 7
; CHECK: [[GEP:%[a-zA-Z0-9.]+]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[EE]]		; CHECK: [[GEP:%[a-zA-Z0-9.]+]] = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 [[ADD2]]
; CHECK-NEXT: [[GEP2:%[a-zA-Z0-9.]+]] = getelementptr inbounds i8, i8* [[GEP]], i32 0		; CHECK-NEXT: [[GEP2:%[a-zA-Z0-9.]+]] = getelementptr inbounds i8, i8* [[GEP]], i32 0
; CHECK-NEXT: [[BC:%[a-zA-Z0-9.]+]] = bitcast i8* [[GEP2]] to <2 x i8>*		; CHECK-NEXT: [[BC:%[a-zA-Z0-9.]+]] = bitcast i8* [[GEP2]] to <2 x i8>*
; CHECK-NEXT: %wide.load = load <2 x i8>, <2 x i8>* [[BC]]		; CHECK-NEXT: %wide.load = load <2 x i8>, <2 x i8>* [[BC]]
; CHECK-NEXT: [[ADD2:%[a-zA-Z0-9.]+]] = add <2 x i8> %wide.load, <i8 1, i8 1>		; CHECK-NEXT: [[ADD3:%[a-zA-Z0-9.]+]] = add <2 x i8> %wide.load, <i8 1, i8 1>
; CHECK: store <2 x i8> [[ADD2]], <2 x i8>*		; CHECK: store <2 x i8> [[ADD3]], <2 x i8>*

; CHECK-LABEL: middle.block:
; CHECK: [[ADDEE:%[a-zA-Z0-9.]+]] = extractelement <2 x i32> [[ADD]], i32 1

; CHECK-LABEL: for.end:		; CHECK-LABEL: for.end:
; CHECK: %lcssa = phi i32 [ %i.09, %for.body ], [ [[ADDEE]], %middle.block ]		; CHECK: %lcssa = phi i32 [ %i.09, %for.body ], [ [[ADD2]], %middle.block ]
; CHECK: %arrayidx.out = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %lcssa		; CHECK: %arrayidx.out = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %lcssa
define i32 @non_uniform_live_out() {		define i32 @non_uniform_live_out() {
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%i.09 = add i32 %i.08, 7		%i.09 = add i32 %i.08, 7

llvm/test/Transforms/LoopVectorize/pr32859.ll

	; RUN: opt < %s -loop-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -S | FileCheck --check-prefix=CM %s
				; RUN: opt -force-vector-width=4 < %s -loop-vectorize -S | FileCheck --check-prefix=FORCE %s

	; Out of the LCSSA form we could have 'phi i32 [ loop-invariant, %for.inc.2.i ]'			; Out of the LCSSA form we could have 'phi i32 [ loop-invariant, %for.inc.2.i ]'
	; but the IR Verifier requires for PHI one entry for each predecessor of			; but the IR Verifier requires for PHI one entry for each predecessor of
	; it's parent basic block. The original PR14725 solution for the issue just			; it's parent basic block. The original PR14725 solution for the issue just
	; added 'undef' for an predecessor BB and which is not correct. We copy the real			; added 'undef' for an predecessor BB and which is not correct. We copy the real
	; value for another predecessor instead of bringing 'undef'.			; value for another predecessor instead of bringing 'undef'.

	; CHECK-LABEL: for.cond.preheader:			; FORCE-LABEL: for.cond.preheader:
	; CHECK: %e.0.ph = phi i32 [ 0, %if.end.2.i ], [ 0, %middle.block ]			; FORCE-NEXT: %e.0.ph = phi i32 [ 0, %if.end.2.i ]

				; Without forcing vectorization, we do not vectorize because we won't generate
				; any vector instructions, besides the loop management code.
				; CM-LABEL: entry:
				; CM-NEXT: br label %for.cond1.preheader.i

				; CM-LABEL: for.cond1.preheader.i:
				; CM-NEXT: %c.06.i = phi i32 [ 0, %entry ], [ %inc5.i, %if.end.2.i ]
				; CM-NEXT: %tobool.i = icmp ne i32 undef, 0
				; CM-NEXT: br label %if.end.2.i

				; CM-LABEL: if.end.2.i:
				; CM-NEXT: %inc5.i = add nsw i32 %c.06.i, 1
				; CM-NEXT: %cmp.i = icmp slt i32 %inc5.i, 16
				; CM-NEXT: br i1 %cmp.i, label %for.cond1.preheader.i, label %for.cond.preheader

				; CM-LABEL: for.cond.preheader:
				; CM-NEXT: %e.0.ph = phi i32 [ 0, %if.end.2.i ]
				; CM-NEXT: unreachable
	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define void @main() #0 {			define void @main() #0 {
	entry:			entry:
	br label %for.cond1.preheader.i			br label %for.cond1.preheader.i

	for.cond1.preheader.i: ; preds = %if.end.2.i, %entry			for.cond1.preheader.i: ; preds = %if.end.2.i, %entry
	%c.06.i = phi i32 [ 0, %entry ], [ %inc5.i, %if.end.2.i ]			%c.06.i = phi i32 [ 0, %entry ], [ %inc5.i, %if.end.2.i ]
	%tobool.i = icmp ne i32 undef, 0			%tobool.i = icmp ne i32 undef, 0

llvm/test/Transforms/LoopVectorize/vector-intrinsic-call-cost.ll

	; RUN: opt -S -loop-vectorize -force-vector-width=4 %s | FileCheck %s			; RUN: opt -S -loop-vectorize -force-vector-width=4 %s | FileCheck %s

	; CHECK-LABEL: @test_fshl			; CHECK-LABEL: @test_fshl_invariant
	; CHECK-LABEL: vector.body:			; CHECK-LABEL: vector.body:
	; CHECK-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]			; CHECK-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECK-NEXT: %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0			; CHECK-NEXT: %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
	; CHECK-NEXT: %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer			; CHECK-NEXT: %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
	; CHECK-NEXT: %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>			; CHECK-NEXT: %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
	; CHECK-NEXT: %0 = add i32 %index, 0			; CHECK-NEXT: %0 = add i32 %index, 0
	; CHECK-NEXT: %1 = call <4 x i16> @llvm.fshl.v4i16(<4 x i16> undef, <4 x i16> undef, <4 x i16> <i16 15, i16 15, i16 15, i16 15>)			; CHECK-NEXT: %1 = tail call i16 @llvm.fshl.i16(i16 undef, i16 undef, i16 15)
	; CHECK-NEXT: %index.next = add i32 %index, 4			; CHECK-NEXT: %index.next = add i32 %index, 4
	; CHECK-NEXT: %2 = icmp eq i32 %index.next, %n.vec			; CHECK-NEXT: %2 = icmp eq i32 %index.next, %n.vec
	; CHECK-NEXT: br i1 %2, label %middle.block, label %vector.body, !llvm.loop !0			; CHECK-NEXT: br i1 %2, label %middle.block, label %vector.body, !llvm.loop !0
	;			;
	define void @test_fshl(i32 %width) {			define void @test_fshl_invariant(i32 %width) {
	entry:			entry:
	br label %for.body9.us.us			br label %for.body9.us.us

	for.cond6.for.cond.cleanup8_crit_edge.us.us: ; preds = %for.body9.us.us			for.cond6.for.cond.cleanup8_crit_edge.us.us: ; preds = %for.body9.us.us
	ret void			ret void

	for.body9.us.us: ; preds = %for.body9.us.us, %entry			for.body9.us.us: ; preds = %for.body9.us.us, %entry
	%x.020.us.us = phi i32 [ 0, %entry ], [ %inc.us.us, %for.body9.us.us ]			%x.020.us.us = phi i32 [ 0, %entry ], [ %inc.us.us, %for.body9.us.us ]
	%conv4.i.us.us = tail call i16 @llvm.fshl.i16(i16 undef, i16 undef, i16 15)			%conv4.i.us.us = tail call i16 @llvm.fshl.i16(i16 undef, i16 undef, i16 15)
	%inc.us.us = add nuw i32 %x.020.us.us, 1			%inc.us.us = add nuw i32 %x.020.us.us, 1
	%exitcond50 = icmp eq i32 %inc.us.us, %width			%exitcond50 = icmp eq i32 %inc.us.us, %width
	br i1 %exitcond50, label %for.cond6.for.cond.cleanup8_crit_edge.us.us, label %for.body9.us.us			br i1 %exitcond50, label %for.cond6.for.cond.cleanup8_crit_edge.us.us, label %for.body9.us.us
	}			}

	declare i16 @llvm.fshl.i16(i16, i16, i16)			declare i16 @llvm.fshl.i16(i16, i16, i16)

				; CHECK-LABEL: @test_fshl(
				; CHECK-LABEL: vector.body:
				; CHECK-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECK-NEXT: %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
				; CHECK-NEXT: %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
				; CHECK-NEXT: %8 = add i32 %index, 0
				; CHECK-NEXT: %9 = getelementptr i16, i16* %A, i32 %8
				; CHECK-NEXT: %10 = getelementptr i16, i16* %9, i32 0
				; CHECK-NEXT: %11 = bitcast i16* %10 to <4 x i16>*
				; CHECK-NEXT: %wide.load = load <4 x i16>, <4 x i16>* %11, align 2
				; CHECK-NEXT: %12 = call <4 x i16> @llvm.fshl.v4i16(<4 x i16> %wide.load, <4 x i16> %wide.load, <4 x i16> <i16 15, i16 15, i16 15, i16 15>)
				; CHECK-NEXT: %index.next = add i32 %index, 4
				; CHECK-NEXT: %13 = icmp eq i32 %index.next, %n.vec
				; CHECK-NEXT: br i1 %13, label %middle.block, label %vector.body, !llvm.loop !4

				define void @test_fshl(i32 %width, i16* %A) {
				entry:
				br label %for.body9.us.us

				for.cond6.for.cond.cleanup8_crit_edge.us.us: ; preds = %for.body9.us.us
				ret void

				for.body9.us.us: ; preds = %for.body9.us.us, %entry
				%x.020.us.us = phi i32 [ 0, %entry ], [ %inc.us.us, %for.body9.us.us ]
				%A.ptr = getelementptr i16, i16* %A, i32 %x.020.us.us
				%a = load i16, i16* %A.ptr
				%conv4.i.us.us = tail call i16 @llvm.fshl.i16(i16 %a, i16 %a, i16 15)
				%inc.us.us = add nuw i32 %x.020.us.us, 1
				%exitcond50 = icmp eq i32 %inc.us.us, %width
				br i1 %exitcond50, label %for.cond6.for.cond.cleanup8_crit_edge.us.us, label %for.body9.us.us
				}
