This is an archive of the discontinued LLVM Phabricator instance.

Differential D75746

[LoopVectorizer] Simplify branch in the remainder loop for trivial cases
Needs Review · Public

Authored by danilaml on Mar 6 2020, 7:35 AM.

Details

Reviewers

Ayal
fhahn
hsaito
gilr
rengolin
dcaballe

Summary

When vectorizing by a factor of 2 (VF*UF == 2), the remainder loop always executes
only one iteration, so there is no need to keep its back-edge branch.

Fixes PR44547
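
As a rough illustration of the claim in the summary, the following C++ sketch models the remainder trip count as `N % Step`, with `Step` standing for VF*UF. The function names are hypothetical and this is not LLVM code. It assumes the trip count `N` does not wrap and that no scalar epilogue is required, two conditions the review discussion below examines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM code: iterations left for the scalar
// remainder loop after the vector loop has consumed N - (N % Step).
// Assumes N = BTC + 1 did not wrap and no scalar epilogue is required.
constexpr uint64_t remainderIterations(uint64_t N, uint64_t Step) {
  return N % Step;
}

// Returns true when the remainder loop can never take its back edge,
// i.e. it executes at most one iteration for every N in [Step, MaxN].
inline bool backEdgeIsDead(uint64_t Step, uint64_t MaxN) {
  assert(Step != 0 && "Step models VF*UF and must be non-zero");
  for (uint64_t N = Step; N <= MaxN; ++N)
    if (remainderIterations(N, Step) > 1)
      return false;
  return true;
}
```

Under these assumptions only `Step == 2` makes the back edge dead; for `Step == 4` the remainder may run up to three iterations.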

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

danilaml created this revision. Mar 6 2020, 7:35 AM

Herald added a project: Restricted Project. Mar 6 2020, 7:35 AM

Herald added subscribers: llvm-commits, hiraditya.

danilaml marked an inline comment as done. Mar 6 2020, 7:40 AM

danilaml added inline comments.

llvm/test/Transforms/LoopVectorize/if-pred-stores.ll
346	This transform seems correct, but I'm not sure whether the original purpose of this test is still fulfilled.

danilaml updated this revision to Diff 248733. Mar 6 2020, 7:57 AM

danilaml updated this revision to Diff 248737. Mar 6 2020, 8:03 AM

danilaml added reviewers: Ayal, fhahn, hsaito, gilr, rengolin, dcaballe. Mar 6 2020, 8:08 AM

danilaml marked an inline comment as done.

danilaml added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	I think this check is enough unless there are other cases in which "remainder loop has `N % (VF*UF)` iterations" doesn't hold.

Harbormaster completed remote builds in B48336: Diff 248728. Mar 6 2020, 8:14 AM

Harbormaster completed remote builds in B48344: Diff 248737. Mar 6 2020, 8:46 AM

Harbormaster completed remote builds in B48341: Diff 248733.

anton.kolesov added a subscriber: anton.kolesov. Mar 10 2020, 11:15 PM

ping

Eliminating a redundant back-edge is clearly good. It would be better if such a special-case cleanup could be handled by some subsequent optimization, and possibly generalized. Note however that the remainder loop may or may not be considered subject for further optimization: LV currently disables unrolling the remainder loop following r231631, OTOH it may be worth vectorizing according to D30247. If desired, optimizations other than vectorization should preferably be taken care of by subsequent passes such as indvars and loop-unroll. LV knows that the trip count of the remainder loop is at most VF*UF, in the absence of overflows and runtime guards, and can make an effort to convey this bound, if GVN and IPSCCP fail to pick it up (referring to PR44547). Unfortunately, introducing an llvm.assume() seems insufficient - perhaps it could be made to work?

LV originally tries to keep the control of the remainder loop intact, adjusting only the starting values of its phi's, including that of its iv. If this control is going to be modified, by hacking its latch branch, another alternative is to replace it altogether with a new canonical {0, +1, %TripCount} iv (as done for the vector loop), possibly following a %TripCount = urem %ComputedTripCount, VF*UF which conveys this upper bound clearly. Somewhat akin to how truncateToMinimalBitwidths() conveys minimal bitwidths.
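
The canonical-IV alternative described above could look roughly like this when modeled in plain C++. This is only a sketch of the loop shape, not LLVM IR or vectorizer code; `sumWithCanonicalRemainder` and its parameters are hypothetical names chosen for illustration, and `Step` stands for VF*UF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a loop split into a "vector" part that handles Step
// elements per iteration and a scalar remainder driven by a canonical
// {0, +1, TripCount} induction variable with TripCount = N % Step.
inline int64_t sumWithCanonicalRemainder(const std::vector<int32_t> &Data,
                                         size_t Step) {
  assert(Step != 0 && "Step models VF*UF and must be non-zero");
  const size_t N = Data.size();
  const size_t RemainderTripCount = N % Step; // the bound < Step is explicit
  const size_t VectorTripCount = N - RemainderTripCount;

  int64_t Sum = 0;
  // Stand-in for the vector loop: consumes Step elements per iteration.
  for (size_t Base = 0; Base < VectorTripCount; Base += Step)
    for (size_t Lane = 0; Lane < Step; ++Lane)
      Sum += Data[Base + Lane];

  // Remainder loop with its own canonical IV, offset by VectorTripCount.
  for (size_t I = 0; I < RemainderTripCount; ++I)
    Sum += Data[VectorTripCount + I];
  return Sum;
}
```

Because the remainder's trip count is written as a `% Step` expression, its upper bound is visible directly from the loop header rather than having to be derived from the original induction variable.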

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	Note that the trip count of the remainder loop may be equal to VF*UF, when loop `requiresScalarEpilogue()`; so in the above special case of VF*UF==2 remainder loop may iterate once or twice. Note that `emitMinimumIterationCountCheck()` also takes care of the case where adding 1 to the backedge-taken count overflows, leading to an incorrect trip count of zero; here too "remainder" loop iterates (much) more than once.
3602	`BI` is aka `ScalarLatchBr`
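
The two exceptions raised in the comment on line 3600 can be illustrated with a small 32-bit model. This is a hedged sketch rather than the vectorizer's actual code: `splitIterations` and `Split` are hypothetical names, and the minimum-iteration check is simplified to `N < Step`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of how iterations are divided between the vector loop
// and the scalar "remainder" loop when the trip count is computed as
// N = BTC + 1 in wrapping 32-bit arithmetic.
struct Split {
  uint32_t VectorTC;      // iterations covered by the vector loop
  uint64_t ScalarIters;   // iterations left for the scalar loop
  bool VectorLoopSkipped; // outcome of the (simplified) min.iters.check
};

inline Split splitIterations(uint32_t BTC, uint32_t Step,
                             bool RequiresScalarEpilogue) {
  assert(Step != 0 && "Step models VF*UF and must be non-zero");
  const uint32_t N = BTC + 1;    // wraps to 0 when BTC == UINT32_MAX
  const bool Skip = N < Step;    // simplified minimum-iteration check
  uint32_t R = N % Step;
  if (RequiresScalarEpilogue && R == 0)
    R = Step;                    // keep at least one scalar iteration
  const uint32_t VTC = Skip ? 0 : N - R;
  const uint64_t TrueN = uint64_t(BTC) + 1; // trip count without wrapping
  return {VTC, TrueN - VTC, Skip};
}
```

With `Step == 2`, the model leaves one scalar iteration for `N == 5`, two for `N == 6` when a scalar epilogue is required, and all 2^32 iterations when `BTC == UINT32_MAX`, because the wrapped `N` of zero fails the minimum-iteration check.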

danilaml updated this revision to Diff 251702. Mar 20 2020, 10:44 AM

danilaml marked 2 inline comments as done.

Herald added a reviewer: aartbik. Mar 20 2020, 10:44 AM

Updated revision with additional checks and rebased.

I'm not sure that llvm.assume can be reliably made to work (or that it would be simpler than just eliminating the back edge).

Going with {0, +1, %TripCount} might be more beneficial in the end (it should also be trivial to call vectorize/unroll recursively on the remainder loop in that case, if I understood things correctly).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	Thanks. I knew I might've missed something. This makes me more skeptical about potential llvm.assume solution. Am I understanding your note correctly, that adding requiresScalarEpilogue check is enough?

Harbormaster completed remote builds in B49923: Diff 251702. Mar 20 2020, 11:24 AM

Ayal added inline comments. Mar 20 2020, 5:28 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	Adding `requiresScalarEpilogue()` check is enough to handle the first case, but not the second case. One way to try and handle the second case is to change `getOrCreateVectorTripCount()` so that it relies on `PSE::getBackedgeTakenCount()` w/o adding 1 to it, as this addition (done by `getOrCreateTripCount()`) may overflow to zero. See r209854, and the `max_i32_backedgetaken()` test it added to test/Transforms/LoopVectorize/induction.ll. Another way may be to check if/when adding 1 is known not to overflow.

danilaml updated this revision to Diff 254766. Apr 3 2020, 6:04 AM

danilaml marked an inline comment as done.

danilaml updated this revision to Diff 254767. Apr 3 2020, 6:07 AM

danilaml marked 2 inline comments as done. Apr 3 2020, 6:46 AM

danilaml added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	I haven't figured out how to make VectorTripCount rely solely on getBackedgeTakenCount without introducing extra checks/instructions (which seems to be the rationale behind the current code, instead of the one in r209854, at the expense of not vectorizing the rare UINT_MAX loops). Instead, I've opted for checking whether the overflow might have occurred by using getConstantMaxBackedgeTakenCount. If I understood things correctly, it would return -1 (all ones) in the overflow case.

Harbormaster failed remote builds in B51620: Diff 254766! Apr 3 2020, 6:57 AM

Harbormaster failed remote builds in B51621: Diff 254767!

danilaml updated this revision to Diff 254791. Apr 3 2020, 7:03 AM

danilaml marked an inline comment as done.

Ayal added inline comments. Apr 3 2020, 7:16 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

3600

A sketch of making VectorTripCount work correctly for loops with BTC = UINTMAX as well:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
N = BTC + 1                     // could overflow to 0 so do not compute N % Step
if (foldTail) N = N + (Step-1)  // for rounding up
R = BTC % Step                  // Fits foldTail: (N+Step-1)%Step == (BTC+1+Step-1)%Step == (BTC+Step)%Step == BTC%Step
if (!foldTail) {
  R = R + 1                     // Fits requiresScalarEpilog: produces 0 < R <= Step
  if (!requiresScalarEpilog) R = (R == Step ? 0 : R)  // i.e., R = R % Step
}
VectorTripCount = N - R

which could be optimized into

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
R = BTC % Step
VTC = BTC - R
if !(requiresScalarEpilog) VTC = (R == Step-1 ? VTC + Step : VTC)
if (foldTail) VTC = VTC + Step
VectorTripCount = VTC

This also allows foldTail to work with Steps (i.e., UF's) that are not a power of 2.
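
The two sketches above can be compared with a small model in wrapping 32-bit arithmetic. The code below is an illustration under stated assumptions, not the vectorizer's implementation: the function names are hypothetical, and only the `!foldTail` configurations are modeled (the `foldTail` branches are left out).

```cpp
#include <cassert>
#include <cstdint>

// First sketch, restricted to !foldTail (hypothetical model).
inline uint32_t firstSketch(uint32_t BTC, uint32_t Step,
                            bool RequiresScalarEpilog) {
  const uint32_t N = BTC + 1;    // may wrap to 0; N % Step is never computed
  uint32_t R = BTC % Step + 1;   // 0 < R <= Step
  if (!RequiresScalarEpilog && R == Step)
    R = 0;                       // same as R % Step
  return N - R;
}

// "Optimized" sketch, restricted to !foldTail (hypothetical model).
inline uint32_t optimizedSketch(uint32_t BTC, uint32_t Step,
                                bool RequiresScalarEpilog) {
  const uint32_t R = BTC % Step;
  uint32_t VTC = BTC - R;
  if (!RequiresScalarEpilog && R == Step - 1)
    VTC += Step;
  return VTC;
}

// Compares both sketches for small BTC values and for values near
// UINT32_MAX, with and without a required scalar epilogue.
inline bool sketchesAgree(uint32_t Step) {
  assert(Step != 0 && "Step models VF*UF and must be non-zero");
  for (int Flag = 0; Flag < 2; ++Flag) {
    const bool RSE = Flag != 0;
    for (uint32_t I = 0; I <= 1000; ++I) {
      const uint32_t Low = I, High = UINT32_MAX - I;
      if (firstSketch(Low, Step, RSE) != optimizedSketch(Low, Step, RSE) ||
          firstSketch(High, Step, RSE) != optimizedSketch(High, Step, RSE))
        return false;
    }
  }
  return true;
}
```

For example, `optimizedSketch(UINT32_MAX, 3, false)` yields `UINT32_MAX`, a multiple of 3 that leaves a single scalar iteration out of the 2^32 total.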

Harbormaster failed remote builds in B51633: Diff 254791! Apr 3 2020, 8:02 AM

danilaml updated this revision to Diff 255335. Apr 6 2020, 7:35 AM

danilaml marked an inline comment as done. Apr 6 2020, 8:09 AM

danilaml added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	Hm, but doesn't it introduce additional instructions/compares that might lead to worse CodeGen? I.e., in the common case, BTC will need to be computed, then `BTC - R`, then a `select` between `VTC` and `VTC+Step`, whereas currently it is just `VectorTripCount = TC - (TC % Step)`, where TC is already computed.

Harbormaster completed remote builds in B51942: Diff 255335. Apr 6 2020, 8:40 AM

Ayal added inline comments. Apr 6 2020, 12:42 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	Hopefully the code-size increase and the slowdown will be insignificant, being outside the loop, but indeed they need to be checked.

Another option is to use the fact that, although BTC+1 might overflow and wrap to zero, BTC-(Step-1) may not underflow if !foldTail thanks to the min.iters.check of TripCount = N = BTC+1 >= Step, and therefore (BTC+1)(w/o overflow) === (BTC-(Step-1)) modulo Step. So for a common case of !foldTail && !requiresScalarEpilog, VectorTripCount can be computed w/o risk of overflow or underflow using `N-((N-Step)%Step)` instead of the current `N - (N % Step)`.

For completeness, current VectorTripCount is computed by:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
N = BTC + 1
if (foldTail) N = N + (Step-1)
R = N % Step
if (requiresScalarEpilog) R = (R == 0 ? Step : R)
VectorTripCount = N - R
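
The difference between the current formula and the proposed `N-((N-Step)%Step)` can be seen in a small 32-bit model. This is a sketch under assumptions, not LLVM code: the function names are hypothetical, and `alternativeVectorTripCount` covers only the `!foldTail && !requiresScalarEpilog` case named in the comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the "current" computation listed above.
inline uint32_t currentVectorTripCount(uint32_t BTC, uint32_t Step,
                                       bool FoldTail,
                                       bool RequiresScalarEpilog) {
  uint32_t N = BTC + 1;          // wraps to 0 when BTC == UINT32_MAX
  if (FoldTail)
    N = N + (Step - 1);          // round up
  uint32_t R = N % Step;
  if (RequiresScalarEpilog)
    R = (R == 0 ? Step : R);
  return N - R;
}

// Hypothetical model of the alternative for !foldTail &&
// !requiresScalarEpilog. N - Step equals BTC - (Step - 1), which does not
// underflow once the minimum-iteration check has passed.
inline uint32_t alternativeVectorTripCount(uint32_t BTC, uint32_t Step) {
  const uint32_t N = BTC + 1;
  return N - ((N - Step) % Step);
}

// Checks that both formulas agree whenever N does not wrap and N >= Step.
// MaxBTC must be smaller than UINT32_MAX so the loop terminates.
inline bool formulasAgree(uint32_t Step, uint32_t MaxBTC) {
  assert(Step != 0 && MaxBTC < UINT32_MAX);
  for (uint32_t BTC = Step - 1; BTC <= MaxBTC; ++BTC)
    if (currentVectorTripCount(BTC, Step, false, false) !=
        alternativeVectorTripCount(BTC, Step))
      return false;
  return true;
}
```

The two only diverge when `BTC + 1` wraps: for `BTC == UINT32_MAX` and `Step == 3` the current formula gives 0, while the alternative gives `UINT32_MAX`, which is a multiple of 3.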

aartbik removed a reviewer: aartbik. Apr 14 2020, 4:15 PM

danilaml updated this revision to Diff 296446. Oct 6 2020, 6:47 AM

Rebased.
Had some time to come back to this.
I've tried implementing the following proposed way to get VTC locally:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
R = BTC % Step
VTC = BTC - R
if !(requiresScalarEpilog) VTC = (R == Step-1 ? VTC + Step : VTC)
if (foldTail) VTC = VTC + Step
VectorTripCount = VTC

But it generally had a negative performance impact on the benchmarks I've run, mainly due to 1) more instructions being required to compute VTC, and 2) the `VTC = (R == Step-1 ? VTC + Step : VTC)` select in the common case making the vectorized loop's exit count uncomputable, thus impeding some later optimizations.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	I'm not sure what using `N-((N-Step)%Step)` instead of `N - (N % Step)` is supposed to solve.

Harbormaster completed remote builds in B74134: Diff 296446. Oct 6 2020, 7:01 AM

danilaml updated this revision to Diff 314186. Dec 31 2020, 5:08 AM

Herald added a subscriber: nemanjai. Dec 31 2020, 5:08 AM

Harbormaster completed remote builds in B83785: Diff 314186. Dec 31 2020, 5:44 AM

Is there no transform that can be taught to do this cleanup instead?

In D75746#2475681, @lebedev.ri wrote:

> Is there no transform that can be taught to do this cleanup instead?

Possibly, but I have a hard time imagining how it could be done. Perhaps someone much more knowledgeable about LLVM loop optimizations than I am can answer that.
I'm not even sure if it's currently possible to reliably reconstruct the unroll factor and find the remainder loop in some pass after unrolling has completed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3600	> I'm not sure what using `N-((N-Step)%Step)` instead of `N - (N % Step)` is supposed to solve.

Currently if BTC = UINTMAX the vector loop is bypassed and the scalar "remainder" loop executes all iterations, under !foldTail. In order for loops requiring no runtime checks besides min.iters.check to execute the vector loop (as much as possible) also for BTC = UINTMAX, thereby ensuring that the scalar loop always executes at most Step-1 iterations -- the original motivation for this patch -- the following may be done:

(1) Change min.iters.check to check if `BTC >= Step-1` instead of checking if `N = BTC+1 >= Step`. The latter overflows for BTC = UINTMAX thereby bypassing the vector loop, whereas the former does not wrap.

(2) Compute the VectorTripCount using `N-((N-Step)%Step)` instead of `N - (N % Step)`. The latter produces zero when BTC = UINTMAX for any Step, which is incorrect for Steps that do not divide BTC+1, i.e., for non-power-of-2 UFs. With requiresScalarEpilog, apply the `R = (R == 0 ? Step : R)` to `R = ((N-Step)%Step)` before subtracting it from N.

Note that VectorTripCount computed in (2) may overflow to zero, i.e., for BTC = UINTMAX and Step(UF) that is a power-of-2. This works correctly, as currently done with foldTail, where min.iters.check is eliminated, and UF is required to be a power-of-2. With foldTail, use `R = ((N-1)%Step) = BTC%Step` as suggested earlier above, which never wraps. In any case, this patch focuses on the tail.

To summarize:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
; min.iters.check:
if (!foldTail): if (BTC < Step-1) goto scalar loop
N = BTC+1 ; may overflow to zero
if (foldTail): R = BTC % Step
if (!foldTail): R = (N-Step) % Step
if (requiresScalarEpilog): R = (R == 0 ? Step : R)
VectorTripCount = N - R

Hopefully this way of ensuring that the tail scalar loop always executes less than Step iterations, also for non-power-of-2 Steps, has no significant negative performance impact.
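
The summarized scheme can be modeled in 32-bit arithmetic to check the claimed bound on the scalar tail. The sketch below is only an illustration under assumptions: `TailSplit`, `splitWithoutFoldTail`, and `tailIsBounded` are hypothetical names, and only the `!foldTail` path is modeled.

```cpp
#include <cassert>
#include <cstdint>

struct TailSplit {
  bool VectorLoopEntered;
  uint32_t VectorTripCount;  // modulo 2^32, so it may wrap to 0
  uint64_t ScalarIterations; // iterations left for the scalar loop
};

// Hypothetical model of the summarized scheme for the !foldTail case.
inline TailSplit splitWithoutFoldTail(uint32_t BTC, uint32_t Step,
                                      bool RequiresScalarEpilog) {
  assert(Step != 0 && "Step models VF*UF and must be non-zero");
  const uint64_t TrueN = uint64_t(BTC) + 1; // trip count without wrapping
  if (BTC < Step - 1)                       // min.iters.check on BTC
    return {false, 0, TrueN};
  const uint32_t N = BTC + 1;               // may wrap to zero
  uint32_t R = (N - Step) % Step;           // N - Step == BTC - (Step - 1)
  if (RequiresScalarEpilog && R == 0)
    R = Step;
  return {true, N - R, R};
}

// Checks one BTC value against the expected bound on the scalar tail.
inline bool boundHolds(uint32_t BTC, uint32_t Step, bool RSE) {
  const TailSplit S = splitWithoutFoldTail(BTC, Step, RSE);
  if (!S.VectorLoopEntered)
    return true; // the bound only applies once the vector loop runs
  if (RSE)
    return S.ScalarIterations >= 1 && S.ScalarIterations <= Step;
  return S.ScalarIterations < Step;
}

// Samples small BTC values and values near UINT32_MAX.
inline bool tailIsBounded(uint32_t Step) {
  for (int Flag = 0; Flag < 2; ++Flag)
    for (uint32_t I = 0; I <= 1000; ++I)
      if (!boundHolds(I, Step, Flag != 0) ||
          !boundHolds(UINT32_MAX - I, Step, Flag != 0))
        return false;
  return true;
}
```

In this model `BTC == UINT32_MAX` with `Step == 3` enters the vector loop and leaves one scalar iteration, while `Step == 4` produces a vector trip count that wraps to zero, matching the note about power-of-2 Steps.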

Ayal mentioned this in D103255: [LV] Mark increment of main vector loop induction variable as NUW. Jun 9 2021, 3:03 PM

Rebased

Harbormaster completed remote builds in B112468: Diff 356531. Jul 5 2021, 11:09 AM

dcaballe resigned from this revision. Oct 8 2021, 2:33 PM

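Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	18 lines
llvm/test/Transforms/LoopVectorize/ARM/sphinx.ll	2 lines
llvm/test/Transforms/LoopVectorize/PowerPC/optimal-epilog-vectorization.ll	6 lines
llvm/test/Transforms/LoopVectorize/SystemZ/predicated-first-order-recurrence.ll	2 lines
llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll	2 lines
llvm/test/Transforms/LoopVectorize/float-induction.ll	18 lines
llvm/test/Transforms/LoopVectorize/if-pred-stores.ll	199 lines
llvm/test/Transforms/LoopVectorize/loop-form.ll	4 lines
llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll	4 lines
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-liveout.ll	2 lines
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll	2 lines
llvm/test/Transforms/LoopVectorize/pr44488-predication.ll	2 lines
llvm/test/Transforms/LoopVectorize/pr44547.ll	62 lines
llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll	2 lines
llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll	20 lines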

Diff 356531

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[...] BasicBlock *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
  // Get ready to start creating new instructions into the vectorized body.
  assert(LoopVectorPreHeader == L->getLoopPreheader() &&
         "Inconsistent vector loop preheader");
  Builder.SetInsertPoint(&*LoopVectorBody->getFirstInsertionPt());

  Optional<MDNode *> VectorizedLoopID =
      makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
                                      LLVMLoopVectorizeFollowupVectorized});

+ // If VFxUF is 2 and vector loop is not skipped then remainder executes once.
+ if (!VF.isScalable() && VF.getKnownMinValue() * UF == 2 &&
+     isa<BranchInst>(ScalarLatchTerm) && !areSafetyChecksAdded() &&
+     !Cost->requiresScalarEpilogue(VF)) {
+   auto *ScalarLatchBr = cast<BranchInst>(ScalarLatchTerm);
+   ScalarLatchBr->setCondition(
+       Builder.getInt1(ScalarLatchBr->getSuccessor(0) == LoopExitBlock));
+ }
+
  if (VectorizedLoopID.hasValue()) {
    L->setLoopID(VectorizedLoopID.getValue());

    // Do not setAlreadyVectorized if loop attributes have been defined
    // explicitly.
    return LoopVectorPreHeader;
  }

[...] void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {

  // Remove redundant induction instructions.
  cse(LoopVectorBody);

  // Set/update profile weights for the vector and remainder loops as original
  // loop iterations are now distributed among them. Note that original loop
  // represented by LoopScalarBody becomes remainder loop after vectorization.
  //
- // For cases like foldTailByMasking() and requiresScalarEpiloque() we may
+ // For cases like foldTailByMasking() and requiresScalarEpilogue() we may
  // end up getting slightly roughened result but that should be OK since
  // profile is not inherently precise anyway. Note also possible bypass of
  // vector code caused by legality checks is ignored, assigning all the weight
  // to the vector loop, optimistically.
  //
- // For scalable vectorization we can't know at compile time how many iterations
- // of the loop are handled in one vector iteration, so instead assume a pessimistic
- // vscale of '1'.
+ // For scalable vectorization we can't know at compile time how many
+ // iterations of the loop are handled in one vector iteration, so instead
+ // assume a pessimistic vscale of '1'.
  setProfileInfoAfterUnrolling(
      LI->getLoopFor(LoopScalarBody), LI->getLoopFor(LoopVectorBody),
      LI->getLoopFor(LoopScalarBody), VF.getKnownMinValue() * UF);
}

void InnerLoopVectorizer::fixCrossIterationPHIs(VPTransformState &State) {
  // In order to support recurrences we need to be able to vectorize Phi nodes.
  // Phi nodes have cycles, so we need to vectorize them in two stages. This is
[...]

llvm/test/Transforms/LoopVectorize/ARM/sphinx.ll

[...]
  ; CHECK-NEXT: [[MUL123:%.*]] = fmul fast double [[CONV122]], [[CONV122]]
  ; CHECK-NEXT: [[ARRAYIDX124:%.*]] = getelementptr inbounds float, float* [[T6]], i32 [[I_2132]]
  ; CHECK-NEXT: [[T11:%.*]] = load float, float* [[ARRAYIDX124]], align 4
  ; CHECK-NEXT: [[CONV125:%.*]] = fpext float [[T11]] to double
  ; CHECK-NEXT: [[MUL126:%.*]] = fmul fast double [[MUL123]], [[CONV125]]
  ; CHECK-NEXT: [[SUB127]] = fsub fast double [[DVAL1_4131]], [[MUL126]]
  ; CHECK-NEXT: [[INC129]] = add nuw nsw i32 [[I_2132]], 1
  ; CHECK-NEXT: [[EXITCOND143:%.*]] = icmp eq i32 [[INC129]], [[T]]
- ; CHECK-NEXT: br i1 [[EXITCOND143]], label [[OUTEREND]], label [[INNERLOOP]], [[LOOP2:!llvm.loop !.*]]
+ ; CHECK-NEXT: br i1 true, label [[OUTEREND]], label [[INNERLOOP]], [[LOOP2:!llvm.loop !.*]]
  ; CHECK: outerend:
  ; CHECK-NEXT: [[SUB127_LCSSA:%.*]] = phi double [ [[SUB127]], [[INNERLOOP]] ], [ [[TMP18]], [[MIDDLE_BLOCK]] ]
  ; CHECK-NEXT: [[CONV138:%.*]] = fptosi double [[SUB127_LCSSA]] to i32
  ; CHECK-NEXT: [[CALL142]] = add nuw nsw i32 [[SCORE_1135]], [[CONV138]]
  ; CHECK-NEXT: [[INC144]] = add nuw nsw i32 [[J_0136]], 1
  ; CHECK-NEXT: [[ARRAYIDX102:%.*]] = getelementptr inbounds i32, i32* @a, i32 [[INC144]]
  ; CHECK-NEXT: [[V17]] = load i32, i32* [[ARRAYIDX102]], align 4
  ; CHECK-NEXT: [[CMP103:%.*]] = icmp sgt i32 [[V17]], -1
[...]

llvm/test/Transforms/LoopVectorize/PowerPC/optimal-epilog-vectorization.ll

	Show First 20 Lines • Show All 237 Lines • ▼ Show 20 Lines
	; VF-TWO-CHECK-NEXT: [[TMP145:%.]] = load float, float [[ARRAYIDX]], align 4			; VF-TWO-CHECK-NEXT: [[TMP145:%.]] = load float, float [[ARRAYIDX]], align 4
	; VF-TWO-CHECK-NEXT: [[ARRAYIDX2:%.]] = getelementptr inbounds float, float [[CC]], i64 [[INDVARS_IV]]			; VF-TWO-CHECK-NEXT: [[ARRAYIDX2:%.]] = getelementptr inbounds float, float [[CC]], i64 [[INDVARS_IV]]
	; VF-TWO-CHECK-NEXT: [[TMP146:%.]] = load float, float [[ARRAYIDX2]], align 4			; VF-TWO-CHECK-NEXT: [[TMP146:%.]] = load float, float [[ARRAYIDX2]], align 4
	; VF-TWO-CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP145]], [[TMP146]]			; VF-TWO-CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP145]], [[TMP146]]
	; VF-TWO-CHECK-NEXT: [[ARRAYIDX4:%.]] = getelementptr inbounds float, float [[AA]], i64 [[INDVARS_IV]]			; VF-TWO-CHECK-NEXT: [[ARRAYIDX4:%.]] = getelementptr inbounds float, float [[AA]], i64 [[INDVARS_IV]]
	; VF-TWO-CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX4]], align 4			; VF-TWO-CHECK-NEXT: store float [[ADD]], float* [[ARRAYIDX4]], align 4
	; VF-TWO-CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; VF-TWO-CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; VF-TWO-CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]			; VF-TWO-CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
	; VF-TWO-CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]			; VF-TWO-CHECK-NEXT: br i1 false, label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]
	; VF-TWO-CHECK: for.end.loopexit.loopexit:			; VF-TWO-CHECK: for.end.loopexit.loopexit:
	; VF-TWO-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]			; VF-TWO-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]
	; VF-TWO-CHECK: for.end.loopexit:			; VF-TWO-CHECK: for.end.loopexit:
	; VF-TWO-CHECK-NEXT: br label [[FOR_END]]			; VF-TWO-CHECK-NEXT: br label [[FOR_END]]
	; VF-TWO-CHECK: for.end:			; VF-TWO-CHECK: for.end:
	; VF-TWO-CHECK-NEXT: ret void			; VF-TWO-CHECK-NEXT: ret void
	;			;
	; VF-FOUR-CHECK-LABEL: @f1(			; VF-FOUR-CHECK-LABEL: @f1(
	▲ Show 20 Lines • Show All 237 Lines • ▼ Show 20 Lines
	; VF-FOUR-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]			; VF-FOUR-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]
	; VF-FOUR-CHECK: for.end.loopexit:			; VF-FOUR-CHECK: for.end.loopexit:
	; VF-FOUR-CHECK-NEXT: br label [[FOR_END]]			; VF-FOUR-CHECK-NEXT: br label [[FOR_END]]
	; VF-FOUR-CHECK: for.end:			; VF-FOUR-CHECK: for.end:
	; VF-FOUR-CHECK-NEXT: ret void			; VF-FOUR-CHECK-NEXT: ret void
	;			;



	entry:			entry:
	%cmp1 = icmp sgt i32 %N, 0			%cmp1 = icmp sgt i32 %N, 0
	br i1 %cmp1, label %for.body.preheader, label %for.end			br i1 %cmp1, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	%wide.trip.count = zext i32 %N to i64			%wide.trip.count = zext i32 %N to i64
	br label %for.body			br label %for.body

	▲ Show 20 Lines • Show All 236 Lines • ▼ Show 20 Lines
	; VF-TWO-CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds float, float [[B]], i64 [[IDXPROM]]			; VF-TWO-CHECK-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds float, float [[B]], i64 [[IDXPROM]]
	; VF-TWO-CHECK-NEXT: [[TMP132:%.]] = load float, float [[ARRAYIDX]], align 4			; VF-TWO-CHECK-NEXT: [[TMP132:%.]] = load float, float [[ARRAYIDX]], align 4
	; VF-TWO-CHECK-NEXT: [[CONV3:%.*]] = fadd fast float [[TMP132]], 1.000000e+00			; VF-TWO-CHECK-NEXT: [[CONV3:%.*]] = fadd fast float [[TMP132]], 1.000000e+00
	; VF-TWO-CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]			; VF-TWO-CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]
	; VF-TWO-CHECK-NEXT: store float [[CONV3]], float* [[ARRAYIDX5]], align 4			; VF-TWO-CHECK-NEXT: store float [[CONV3]], float* [[ARRAYIDX5]], align 4
	; VF-TWO-CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; VF-TWO-CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; VF-TWO-CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_014]], 1			; VF-TWO-CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_014]], 1
	; VF-TWO-CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]			; VF-TWO-CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
	; VF-TWO-CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], !llvm.loop [[LOOP7:![0-9]+]]			; VF-TWO-CHECK-NEXT: br i1 false, label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], !llvm.loop [[LOOP7:![0-9]+]]
	; VF-TWO-CHECK: for.end.loopexit.loopexit:			; VF-TWO-CHECK: for.end.loopexit.loopexit:
	; VF-TWO-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]			; VF-TWO-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]
	; VF-TWO-CHECK: for.end.loopexit:			; VF-TWO-CHECK: for.end.loopexit:
	; VF-TWO-CHECK-NEXT: br label [[FOR_END]]			; VF-TWO-CHECK-NEXT: br label [[FOR_END]]
	; VF-TWO-CHECK: for.end:			; VF-TWO-CHECK: for.end:
	; VF-TWO-CHECK-NEXT: ret i32 0			; VF-TWO-CHECK-NEXT: ret i32 0
	;			;
	; VF-FOUR-CHECK-LABEL: @f2(			; VF-FOUR-CHECK-LABEL: @f2(
	▲ Show 20 Lines • Show All 224 Lines • ▼ Show 20 Lines
	; VF-FOUR-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]			; VF-FOUR-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]
	; VF-FOUR-CHECK: for.end.loopexit:			; VF-FOUR-CHECK: for.end.loopexit:
	; VF-FOUR-CHECK-NEXT: br label [[FOR_END]]			; VF-FOUR-CHECK-NEXT: br label [[FOR_END]]
	; VF-FOUR-CHECK: for.end:			; VF-FOUR-CHECK: for.end:
	; VF-FOUR-CHECK-NEXT: ret i32 0			; VF-FOUR-CHECK-NEXT: ret i32 0
	;			;



	entry:			entry:
	%cmp1 = icmp sgt i32 %n, 1			%cmp1 = icmp sgt i32 %n, 1
	br i1 %cmp1, label %for.body.preheader, label %for.end			br i1 %cmp1, label %for.body.preheader, label %for.end

	for.body.preheader: ; preds = %entry			for.body.preheader: ; preds = %entry
	%0 = add i32 %n, -1			%0 = add i32 %n, -1
	%wide.trip.count = zext i32 %0 to i64			%wide.trip.count = zext i32 %0 to i64
	br label %for.body			br label %for.body
	Show All 25 Lines

llvm/test/Transforms/LoopVectorize/SystemZ/predicated-first-order-recurrence.ll

	Show First 20 Lines • Show All 73 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[LV:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[LV:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[A_PTR:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @A, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[A_PTR:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @A, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[LV]] = load i32, i32* [[A_PTR]], align 4			; CHECK-NEXT: [[LV]] = load i32, i32* [[A_PTR]], align 4
	; CHECK-NEXT: [[B_PTR:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @B, i64 0, i64 [[INDVARS_IV]]			; CHECK-NEXT: [[B_PTR:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @B, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[SCALAR_RECUR]], i32* [[B_PTR]], align 4			; CHECK-NEXT: store i32 [[SCALAR_RECUR]], i32* [[B_PTR]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 5			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 5
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT]], label [[LOOP]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 true, label [[EXIT]], label [[LOOP]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop: ; preds = %loop, %entry			loop: ; preds = %loop, %entry
	%rec = phi i32 [ 0, %entry], [ %lv, %loop ]			%rec = phi i32 [ 0, %entry], [ %lv, %loop ]
	Show All 12 Lines

llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll

	Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[_TMP1:%.*]] = zext i16 0 to i64			; CHECK-NEXT: [[_TMP1:%.*]] = zext i16 0 to i64
	; CHECK-NEXT: [[_TMP2:%.*]] = getelementptr [1 x %rec8], [1 x %rec8]* @a, i16 0, i64 [[_TMP1]]			; CHECK-NEXT: [[_TMP2:%.*]] = getelementptr [1 x %rec8], [1 x %rec8]* @a, i16 0, i64 [[_TMP1]]
	; CHECK-NEXT: [[_TMP4:%.*]] = bitcast %rec8* [[_TMP2]] to i16*			; CHECK-NEXT: [[_TMP4:%.*]] = bitcast %rec8* [[_TMP2]] to i16*
	; CHECK-NEXT: [[_TMP6:%.*]] = sext i16 [[C_1_0]] to i64			; CHECK-NEXT: [[_TMP6:%.*]] = sext i16 [[C_1_0]] to i64
	; CHECK-NEXT: [[_TMP7:%.*]] = getelementptr [2 x i16], [2 x i16]* @b, i16 0, i64 [[_TMP6]]			; CHECK-NEXT: [[_TMP7:%.*]] = getelementptr [2 x i16], [2 x i16]* @b, i16 0, i64 [[_TMP6]]
	; CHECK-NEXT: store i16* [[_TMP4]], i16** [[_TMP7]], align 8			; CHECK-NEXT: store i16* [[_TMP4]], i16** [[_TMP7]], align 8
	; CHECK-NEXT: [[_TMP9]] = add nsw i16 [[C_1_0]], 1			; CHECK-NEXT: [[_TMP9]] = add nsw i16 [[C_1_0]], 1
	; CHECK-NEXT: [[_TMP11:%.*]] = icmp slt i16 [[_TMP9]], 2			; CHECK-NEXT: [[_TMP11:%.*]] = icmp slt i16 [[_TMP9]], 2
	; CHECK-NEXT: br i1 [[_TMP11]], label [[BB2]], label [[BB3]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[BB2]], label [[BB3]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;

	bb1:			bb1:
	br label %bb2			br label %bb2

	bb2:			bb2:
	Show All 14 Lines

llvm/test/Transforms/LoopVectorize/float-induction.ll

	Show First 20 Lines • Show All 270 Lines • ▼ Show 20 Lines
	; VEC1_INTERL2-NEXT: br label [[FOR_BODY:%.*]]			; VEC1_INTERL2-NEXT: br label [[FOR_BODY:%.*]]
	; VEC1_INTERL2: for.body:			; VEC1_INTERL2: for.body:
	; VEC1_INTERL2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; VEC1_INTERL2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; VEC1_INTERL2-NEXT: [[X_05:%.*]] = phi float [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]			; VEC1_INTERL2-NEXT: [[X_05:%.*]] = phi float [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ], [ [[ADD:%.*]], [[FOR_BODY]] ]
	; VEC1_INTERL2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[INDVARS_IV]]			; VEC1_INTERL2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 [[INDVARS_IV]]
	; VEC1_INTERL2-NEXT: store float [[X_05]], float* [[ARRAYIDX]], align 4			; VEC1_INTERL2-NEXT: store float [[X_05]], float* [[ARRAYIDX]], align 4
	; VEC1_INTERL2-NEXT: [[ADD]] = fsub reassoc float [[X_05]], [[FPINC]]			; VEC1_INTERL2-NEXT: [[ADD]] = fsub reassoc float [[X_05]], [[FPINC]]
	; VEC1_INTERL2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; VEC1_INTERL2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; VEC1_INTERL2-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; VEC1_INTERL2-NEXT: br i1 true, label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]]
	; VEC1_INTERL2-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; VEC1_INTERL2-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]]
	; VEC1_INTERL2: for.end.loopexit:			; VEC1_INTERL2: for.end.loopexit:
	; VEC1_INTERL2-NEXT: br label [[FOR_END]]			; VEC1_INTERL2-NEXT: br label [[FOR_END]]
	; VEC1_INTERL2: for.end:			; VEC1_INTERL2: for.end:
	; VEC1_INTERL2-NEXT: ret void			; VEC1_INTERL2-NEXT: ret void
	;			;
	; VEC2_INTERL1_PRED_STORE-LABEL: @fp_iv_loop1_reassoc_FMF(			; VEC2_INTERL1_PRED_STORE-LABEL: @fp_iv_loop1_reassoc_FMF(
	; VEC2_INTERL1_PRED_STORE-NEXT: entry:			; VEC2_INTERL1_PRED_STORE-NEXT: entry:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[CMP4:%.*]] = icmp sgt i32 [[N:%.*]], 0			; VEC2_INTERL1_PRED_STORE-NEXT: [[CMP4:%.*]] = icmp sgt i32 [[N:%.*]], 0
	Show All 29 Lines
	; VEC2_INTERL1_PRED_STORE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; VEC2_INTERL1_PRED_STORE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; VEC2_INTERL1_PRED_STORE-NEXT: [[VEC_IND_NEXT]] = fsub reassoc <2 x float> [[VEC_IND]], [[DOTSPLAT5]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[VEC_IND_NEXT]] = fsub reassoc <2 x float> [[VEC_IND]], [[DOTSPLAT5]]
	; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]			; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
	; VEC2_INTERL1_PRED_STORE: middle.block:			; VEC2_INTERL1_PRED_STORE: middle.block:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
	; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY]]			; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY]]
	; VEC2_INTERL1_PRED_STORE: for.body:			; VEC2_INTERL1_PRED_STORE: for.body:
	; VEC2_INTERL1_PRED_STORE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_LR_PH]] ]			; VEC2_INTERL1_PRED_STORE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[FOR_BODY_LR_PH]] ]
	; VEC2_INTERL1_PRED_STORE-NEXT: [[X_05:%.*]] = phi float [ [[ADD:%.*]], [[FOR_BODY]] ], [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[INIT]], [[FOR_BODY_LR_PH]] ]			; VEC2_INTERL1_PRED_STORE-NEXT: [[BC_RESUME_VAL1:%.*]] = phi float [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[INIT]], [[FOR_BODY_LR_PH]] ]
	; VEC2_INTERL1_PRED_STORE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV]]			; VEC2_INTERL1_PRED_STORE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[BC_RESUME_VAL]]
	; VEC2_INTERL1_PRED_STORE-NEXT: store float [[X_05]], float* [[ARRAYIDX]], align 4			; VEC2_INTERL1_PRED_STORE-NEXT: store float [[BC_RESUME_VAL1]], float* [[ARRAYIDX]], align 4
	; VEC2_INTERL1_PRED_STORE-NEXT: [[ADD]] = fsub reassoc float [[X_05]], [[FPINC]]			; VEC2_INTERL1_PRED_STORE-NEXT: br label [[FOR_END]], !llvm.loop [[LOOP2:![0-9]+]]
	; VEC2_INTERL1_PRED_STORE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; VEC2_INTERL1_PRED_STORE-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; VEC2_INTERL1_PRED_STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], [[N]]
	; VEC2_INTERL1_PRED_STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], [[LOOP5:!llvm.loop !.*]]
	; VEC2_INTERL1_PRED_STORE: for.end:			; VEC2_INTERL1_PRED_STORE: for.end:
	; VEC2_INTERL1_PRED_STORE-NEXT: ret void			; VEC2_INTERL1_PRED_STORE-NEXT: ret void
	;			;
	entry:			entry:
	%cmp4 = icmp sgt i32 %N, 0			%cmp4 = icmp sgt i32 %N, 0
	br i1 %cmp4, label %for.body.lr.ph, label %for.end			br i1 %cmp4, label %for.body.lr.ph, label %for.end

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	▲ Show 20 Lines • Show All 254 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/if-pred-stores.ll

	Show All 33 Lines
	; UNROLL: pred.store.continue3:			; UNROLL: pred.store.continue3:
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128
	; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; UNROLL: for.body:			; UNROLL: for.body:
	; UNROLL-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 128, [[MIDDLE_BLOCK]] ]			; UNROLL-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[F]], i64 128
	; UNROLL-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[F]], i64 [[INDVARS_IV]]
	; UNROLL-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; UNROLL-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; UNROLL-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP9]], 100			; UNROLL-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP9]], 100
	; UNROLL-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; UNROLL: if.then:			; UNROLL: if.then:
	; UNROLL-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20			; UNROLL-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20
	; UNROLL-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4			; UNROLL-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
	; UNROLL-NEXT: br label [[FOR_INC]]			; UNROLL-NEXT: br label [[FOR_INC]]
	; UNROLL: for.inc:			; UNROLL: for.inc:
	; UNROLL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 128, 1
	; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128			; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128
	; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; UNROLL-NEXT: br label [[FOR_END]], !llvm.loop [[LOOP2:![0-9]+]]
	; UNROLL: for.end:			; UNROLL: for.end:
	; UNROLL-NEXT: ret i32 0			; UNROLL-NEXT: ret i32 0
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @test(			; UNROLL-NOSIMPLIFY-LABEL: @test(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NOSIMPLIFY: vector.ph:			; UNROLL-NOSIMPLIFY: vector.ph:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]
	Show All 36 Lines
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: if.then:			; UNROLL-NOSIMPLIFY: if.then:
	; UNROLL-NOSIMPLIFY-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20			; UNROLL-NOSIMPLIFY-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20
	; UNROLL-NOSIMPLIFY-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4			; UNROLL-NOSIMPLIFY-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: for.inc:			; UNROLL-NOSIMPLIFY: for.inc:
	; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128			; UNROLL-NOSIMPLIFY-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	; UNROLL-NOSIMPLIFY: for.end:			; UNROLL-NOSIMPLIFY: for.end:
	; UNROLL-NOSIMPLIFY-NEXT: ret i32 0			; UNROLL-NOSIMPLIFY-NEXT: ret i32 0
	;			;
	; VEC-LABEL: @test(			; VEC-LABEL: @test(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: br label [[VECTOR_BODY:%.*]]			; VEC-NEXT: br label [[VECTOR_BODY:%.*]]
	; VEC: vector.body:			; VEC: vector.body:
	; VEC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]			; VEC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
	Show All 23 Lines
	; VEC: pred.store.continue2:			; VEC: pred.store.continue2:
	; VEC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; VEC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; VEC-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128			; VEC-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128
	; VEC-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; VEC-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; VEC: middle.block:			; VEC: middle.block:
	; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128			; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128
	; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; VEC: for.body:			; VEC: for.body:
	; VEC-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 128, [[MIDDLE_BLOCK]] ]			; VEC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[F]], i64 128
	; VEC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[F]], i64 [[INDVARS_IV]]
	; VEC-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; VEC-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; VEC-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP14]], 100			; VEC-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP14]], 100
	; VEC-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; VEC-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; VEC: if.then:			; VEC: if.then:
	; VEC-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP14]], 20			; VEC-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP14]], 20
	; VEC-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4			; VEC-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
	; VEC-NEXT: br label [[FOR_INC]]			; VEC-NEXT: br label [[FOR_INC]]
	; VEC: for.inc:			; VEC: for.inc:
	; VEC-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; VEC-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 128, 1
	; VEC-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128			; VEC-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128
	; VEC-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]			; VEC-NEXT: br label [[FOR_END]], !llvm.loop [[LOOP2:![0-9]+]]
	; VEC: for.end:			; VEC: for.end:
	; VEC-NEXT: ret i32 0			; VEC-NEXT: ret i32 0
	;			;

	entry:			entry:
	br label %for.body			br label %for.body



	for.body:			for.body:
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]
	%arrayidx = getelementptr inbounds i32, i32* %f, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %f, i64 %indvars.iv
	Show All 25 Lines
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true			; UNROLL-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true
	; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP0]])			; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP0]])
	; UNROLL-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 undef, i32 0)			; UNROLL-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 undef, i32 0)
	; UNROLL-NEXT: [[TMP1:%.*]] = sub i32 [[SMAX]], undef			; UNROLL-NEXT: [[TMP1:%.*]] = sub i32 [[SMAX]], undef
	; UNROLL-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64			; UNROLL-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64
	; UNROLL-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1			; UNROLL-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
	; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2			; UNROLL-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2
	; UNROLL-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NEXT: [[TMP4:%.*]] = xor i1 [[MIN_ITERS_CHECK]], true
	; UNROLL: vector.ph:			; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP4]])
	; UNROLL-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2			; UNROLL-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
	; UNROLL-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]			; UNROLL-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
	; UNROLL-NEXT: [[IND_END:%.*]] = add i64 undef, [[N_VEC]]			; UNROLL-NEXT: [[IND_END:%.*]] = add i64 undef, [[N_VEC]]
	; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL: vector.body:			; UNROLL: vector.body:
	; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE4:%.*]] ]			; UNROLL-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE4:%.*]] ]
	; UNROLL-NEXT: [[VEC_PHI:%.*]] = phi i32 [ undef, [[VECTOR_PH]] ], [ [[PREDPHI:%.*]], [[PRED_STORE_CONTINUE4]] ]			; UNROLL-NEXT: [[VEC_PHI:%.*]] = phi i32 [ undef, [[ENTRY]] ], [ [[PREDPHI:%.*]], [[PRED_STORE_CONTINUE4]] ]
	; UNROLL-NEXT: [[VEC_PHI2:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[PREDPHI5:%.*]], [[PRED_STORE_CONTINUE4]] ]			; UNROLL-NEXT: [[VEC_PHI2:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[PREDPHI5:%.*]], [[PRED_STORE_CONTINUE4]] ]
	; UNROLL-NEXT: [[OFFSET_IDX:%.*]] = add i64 undef, [[INDEX]]			; UNROLL-NEXT: [[OFFSET_IDX:%.*]] = add i64 undef, [[INDEX]]
	; UNROLL-NEXT: [[INDUCTION:%.*]] = add i64 [[OFFSET_IDX]], 0			; UNROLL-NEXT: [[INDUCTION:%.*]] = add i64 [[OFFSET_IDX]], 0
	; UNROLL-NEXT: [[INDUCTION1:%.*]] = add i64 [[OFFSET_IDX]], 1			; UNROLL-NEXT: [[INDUCTION1:%.*]] = add i64 [[OFFSET_IDX]], 1
	; UNROLL-NEXT: [[TMP4:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDUCTION]]			; UNROLL-NEXT: [[TMP5:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDUCTION]]
	; UNROLL-NEXT: [[TMP5:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDUCTION1]]			; UNROLL-NEXT: [[TMP6:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDUCTION1]]
	; UNROLL-NEXT: [[TMP6:%.*]] = load i32, i32* [[TMP4]], align 4
	; UNROLL-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP5]], align 4			; UNROLL-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP5]], align 4
				; UNROLL-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP6]], align 4
	; UNROLL-NEXT: br i1 undef, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE4]]			; UNROLL-NEXT: br i1 undef, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE4]]
	; UNROLL: pred.store.if:			; UNROLL: pred.store.if:
	; UNROLL-NEXT: store i32 2, i32* [[TMP4]], align 4			; UNROLL-NEXT: store i32 2, i32* [[TMP5]], align 4
	; UNROLL-NEXT: br label [[PRED_STORE_CONTINUE4]]			; UNROLL-NEXT: br label [[PRED_STORE_CONTINUE4]]
	; UNROLL: pred.store.continue4:			; UNROLL: pred.store.continue4:
	; UNROLL-NEXT: [[TMP8:%.*]] = add i32 [[VEC_PHI]], 1			; UNROLL-NEXT: [[TMP9:%.*]] = add i32 [[VEC_PHI]], 1
	; UNROLL-NEXT: [[TMP9:%.*]] = add i32 [[VEC_PHI2]], 1			; UNROLL-NEXT: [[TMP10:%.*]] = add i32 [[VEC_PHI2]], 1
	; UNROLL-NEXT: [[PREDPHI]] = select i1 undef, i32 [[VEC_PHI]], i32 [[TMP8]]			; UNROLL-NEXT: [[PREDPHI]] = select i1 undef, i32 [[VEC_PHI]], i32 [[TMP9]]
	; UNROLL-NEXT: [[PREDPHI5]] = select i1 undef, i32 [[VEC_PHI2]], i32 [[TMP9]]			; UNROLL-NEXT: [[PREDPHI5]] = select i1 undef, i32 [[VEC_PHI2]], i32 [[TMP10]]
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; UNROLL-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; UNROLL-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]			; UNROLL-NEXT: [[TMP12:%.*]] = xor i1 [[TMP11]], true
	; UNROLL: middle.block:			; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP12]])
	; UNROLL-NEXT: [[BIN_RDX:%.*]] = add i32 [[PREDPHI5]], [[PREDPHI]]			; UNROLL-NEXT: br label [[VECTOR_BODY]]
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; UNROLL-NEXT: [[TMP11:%.*]] = xor i1 [[CMP_N]], true
	; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP11]])
	; UNROLL-NEXT: br label [[SCALAR_PH]]
	; UNROLL: scalar.ph:
	; UNROLL-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ undef, [[ENTRY:%.*]] ]
	; UNROLL-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ undef, [[ENTRY]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NEXT: br label [[FOR_BODY14:%.*]]
	; UNROLL: for.body14:
	; UNROLL-NEXT: [[INDVARS_IV3:%.*]] = phi i64 [ [[INDVARS_IV_NEXT4:%.*]], [[FOR_INC23:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; UNROLL-NEXT: [[INEWCHUNKS_120:%.*]] = phi i32 [ [[INEWCHUNKS_2:%.*]], [[FOR_INC23]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; UNROLL-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDVARS_IV3]]
	; UNROLL-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX16]], align 4
	; UNROLL-NEXT: br i1 undef, label [[IF_THEN18:%.*]], label [[FOR_INC23]]
	; UNROLL: if.then18:
	; UNROLL-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4
	; UNROLL-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1
	; UNROLL-NEXT: br label [[FOR_INC23]]
	; UNROLL: for.inc23:
	; UNROLL-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]
	; UNROLL-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1
	; UNROLL-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32
	; UNROLL-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0
	; UNROLL-NEXT: call void @llvm.assume(i1 [[CMP13]])
	; UNROLL-NEXT: br label [[FOR_BODY14]]
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @bug18724(			; UNROLL-NOSIMPLIFY-LABEL: @bug18724(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_BODY9:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_BODY9:%.*]]
	; UNROLL-NOSIMPLIFY: for.body9:			; UNROLL-NOSIMPLIFY: for.body9:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[COND:%.*]], label [[FOR_INC26:%.*]], label [[FOR_BODY14_PREHEADER:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 [[COND:%.*]], label [[FOR_INC26:%.*]], label [[FOR_BODY14_PREHEADER:%.*]]
	; UNROLL-NOSIMPLIFY: for.body14.preheader:			; UNROLL-NOSIMPLIFY: for.body14.preheader:
	; UNROLL-NOSIMPLIFY-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 undef, i32 0)			; UNROLL-NOSIMPLIFY-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 undef, i32 0)
	▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines
	; UNROLL-NOSIMPLIFY-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4			; UNROLL-NOSIMPLIFY-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4
	; UNROLL-NOSIMPLIFY-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC23]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC23]]
	; UNROLL-NOSIMPLIFY: for.inc23:			; UNROLL-NOSIMPLIFY: for.inc23:
	; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]			; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]
	; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32			; UNROLL-NOSIMPLIFY-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32
	; UNROLL-NOSIMPLIFY-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0			; UNROLL-NOSIMPLIFY-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[CMP13]], label [[FOR_BODY14]], label [[FOR_INC26_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[FOR_BODY14]], label [[FOR_INC26_LOOPEXIT]], !llvm.loop [[LOOP4:![0-9]+]]
	; UNROLL-NOSIMPLIFY: for.inc26.loopexit:			; UNROLL-NOSIMPLIFY: for.inc26.loopexit:
	; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2_LCSSA:%.*]] = phi i32 [ [[INEWCHUNKS_2]], [[FOR_INC23]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]			; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2_LCSSA:%.*]] = phi i32 [ [[INEWCHUNKS_2]], [[FOR_INC23]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC26]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC26]]
	; UNROLL-NOSIMPLIFY: for.inc26:			; UNROLL-NOSIMPLIFY: for.inc26:
	; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_1_LCSSA:%.*]] = phi i32 [ undef, [[FOR_BODY9]] ], [ [[INEWCHUNKS_2_LCSSA]], [[FOR_INC26_LOOPEXIT]] ]			; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_1_LCSSA:%.*]] = phi i32 [ undef, [[FOR_BODY9]] ], [ [[INEWCHUNKS_2_LCSSA]], [[FOR_INC26_LOOPEXIT]] ]
	; UNROLL-NOSIMPLIFY-NEXT: unreachable			; UNROLL-NOSIMPLIFY-NEXT: unreachable
	;			;
	; VEC-LABEL: @bug18724(			; VEC-LABEL: @bug18724(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true			; VEC-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true
	; VEC-NEXT: call void @llvm.assume(i1 [[TMP0]])			; VEC-NEXT: call void @llvm.assume(i1 [[TMP0]])
	; VEC-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 undef, i32 0)			; VEC-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 undef, i32 0)
	; VEC-NEXT: [[TMP1:%.*]] = sub i32 [[SMAX]], undef			; VEC-NEXT: [[TMP1:%.*]] = sub i32 [[SMAX]], undef
	; VEC-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64			; VEC-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64
	; VEC-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1			; VEC-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
	; VEC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2			; VEC-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2
	; VEC-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; VEC-NEXT: [[TMP4:%.*]] = xor i1 [[MIN_ITERS_CHECK]], true
	; VEC: vector.ph:			; VEC-NEXT: call void @llvm.assume(i1 [[TMP4]])
	; VEC-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2			; VEC-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
	; VEC-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]			; VEC-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
	; VEC-NEXT: [[IND_END:%.*]] = add i64 undef, [[N_VEC]]			; VEC-NEXT: [[IND_END:%.*]] = add i64 undef, [[N_VEC]]
	; VEC-NEXT: br label [[VECTOR_BODY:%.*]]			; VEC-NEXT: br label [[VECTOR_BODY:%.*]]
	; VEC: vector.body:			; VEC: vector.body:
	; VEC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]			; VEC-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
	; VEC-NEXT: [[VEC_PHI:%.*]] = phi <2 x i32> [ <i32 undef, i32 0>, [[VECTOR_PH]] ], [ [[PREDPHI:%.*]], [[PRED_STORE_CONTINUE2]] ]			; VEC-NEXT: [[VEC_PHI:%.*]] = phi <2 x i32> [ <i32 undef, i32 0>, [[ENTRY]] ], [ [[PREDPHI:%.*]], [[PRED_STORE_CONTINUE2]] ]
	; VEC-NEXT: [[OFFSET_IDX:%.*]] = add i64 undef, [[INDEX]]			; VEC-NEXT: [[OFFSET_IDX:%.*]] = add i64 undef, [[INDEX]]
	; VEC-NEXT: [[TMP4:%.*]] = add i64 [[OFFSET_IDX]], 0			; VEC-NEXT: [[TMP5:%.*]] = add i64 [[OFFSET_IDX]], 0
	; VEC-NEXT: [[TMP5:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[TMP4]]			; VEC-NEXT: [[TMP6:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[TMP5]]
	; VEC-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP5]], i32 0			; VEC-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP6]], i32 0
	; VEC-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <2 x i32>*			; VEC-NEXT: [[TMP8:%.*]] = bitcast i32* [[TMP7]] to <2 x i32>*
	; VEC-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP7]], align 4			; VEC-NEXT: [[WIDE_LOAD:%.*]] = load <2 x i32>, <2 x i32>* [[TMP8]], align 4
	; VEC-NEXT: br i1 undef, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE2]]			; VEC-NEXT: br i1 undef, label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE2]]
	; VEC: pred.store.if:			; VEC: pred.store.if:
	; VEC-NEXT: store i32 2, i32* [[TMP5]], align 4			; VEC-NEXT: store i32 2, i32* [[TMP6]], align 4
	; VEC-NEXT: br label [[PRED_STORE_CONTINUE2]]			; VEC-NEXT: br label [[PRED_STORE_CONTINUE2]]
	; VEC: pred.store.continue2:			; VEC: pred.store.continue2:
	; VEC-NEXT: [[TMP8:%.*]] = add <2 x i32> [[VEC_PHI]], <i32 1, i32 1>			; VEC-NEXT: [[TMP9:%.*]] = add <2 x i32> [[VEC_PHI]], <i32 1, i32 1>
	; VEC-NEXT: [[PREDPHI]] = select <2 x i1> undef, <2 x i32> [[VEC_PHI]], <2 x i32> [[TMP8]]			; VEC-NEXT: [[PREDPHI]] = select <2 x i1> undef, <2 x i32> [[VEC_PHI]], <2 x i32> [[TMP9]]
	; VEC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; VEC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; VEC-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; VEC-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; VEC-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; VEC-NEXT: [[TMP11:%.*]] = xor i1 [[TMP10]], true
	; VEC: middle.block:
	; VEC-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v2i32(<2 x i32> [[PREDPHI]])
	; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
	; VEC-NEXT: [[TMP11:%.*]] = xor i1 [[CMP_N]], true
	; VEC-NEXT: call void @llvm.assume(i1 [[TMP11]])			; VEC-NEXT: call void @llvm.assume(i1 [[TMP11]])
	; VEC-NEXT: br label [[SCALAR_PH]]			; VEC-NEXT: br label [[VECTOR_BODY]]
				danilaml (Author): this transform seems correct, but I'm not sure whether the original purpose of this test is still fulfilled.
	; VEC: scalar.ph:
	; VEC-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ undef, [[ENTRY:%.*]] ]
	; VEC-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ undef, [[ENTRY]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]
	; VEC-NEXT: br label [[FOR_BODY14:%.*]]
	; VEC: for.body14:
	; VEC-NEXT: [[INDVARS_IV3:%.*]] = phi i64 [ [[INDVARS_IV_NEXT4:%.*]], [[FOR_INC23:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; VEC-NEXT: [[INEWCHUNKS_120:%.*]] = phi i32 [ [[INEWCHUNKS_2:%.*]], [[FOR_INC23]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; VEC-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDVARS_IV3]]
	; VEC-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX16]], align 4
	; VEC-NEXT: br i1 undef, label [[IF_THEN18:%.*]], label [[FOR_INC23]]
	; VEC: if.then18:
	; VEC-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4
	; VEC-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1
	; VEC-NEXT: br label [[FOR_INC23]]
	; VEC: for.inc23:
	; VEC-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]
	; VEC-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1
	; VEC-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32
	; VEC-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0
	; VEC-NEXT: call void @llvm.assume(i1 [[CMP13]])
	; VEC-NEXT: br label [[FOR_BODY14]]
	;			;

	entry:			entry:
	br label %for.body9			br label %for.body9

	for.body9:			for.body9:
	br i1 %cond, label %for.inc26, label %for.body14			br i1 %cond, label %for.inc26, label %for.body14

	for.body14:			for.body14:
	%indvars.iv3 = phi i64 [ %indvars.iv.next4, %for.inc23 ], [ undef, %for.body9 ]			%indvars.iv3 = phi i64 [ %indvars.iv.next4, %for.inc23 ], [ undef, %for.body9 ]
	[45 lines not shown]
	; UNROLL-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1			; UNROLL-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1
	; UNROLL-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i32			; UNROLL-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i32
	; UNROLL-NEXT: [[TMP7:%.*]] = trunc i32 [[TMP6]] to i8			; UNROLL-NEXT: [[TMP7:%.*]] = trunc i32 [[TMP6]] to i8
	; UNROLL-NEXT: store i8 [[TMP7]], i8* [[TMP4]], align 1			; UNROLL-NEXT: store i8 [[TMP7]], i8* [[TMP4]], align 1
	; UNROLL-NEXT: br label [[PRED_STORE_CONTINUE6]]			; UNROLL-NEXT: br label [[PRED_STORE_CONTINUE6]]
	; UNROLL: pred.store.continue6:			; UNROLL: pred.store.continue6:
	; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; UNROLL-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef
	; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; UNROLL: for.body:			; UNROLL: for.body:
	; UNROLL-NEXT: [[TMP0:%.*]] = phi i64 [ [[TMP6:%.*]], [[FOR_INC:%.*]] ], [ undef, [[MIDDLE_BLOCK]] ]			; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 undef
	; UNROLL-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP7:%.*]], [[FOR_INC]] ], [ undef, [[MIDDLE_BLOCK]] ]
	; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 [[TMP0]]
	; UNROLL-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1			; UNROLL-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1
	; UNROLL-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; UNROLL: if.then:			; UNROLL: if.then:
	; UNROLL-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; UNROLL-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; UNROLL-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; UNROLL-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; UNROLL-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; UNROLL-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; UNROLL-NEXT: br label [[FOR_INC]]			; UNROLL-NEXT: br label [[FOR_INC]]
	; UNROLL: for.inc:			; UNROLL: for.inc:
	; UNROLL-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; UNROLL-NEXT: [[TMP6:%.*]] = add nuw nsw i64 undef, 1
	; UNROLL-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; UNROLL-NEXT: [[TMP7:%.*]] = add i64 undef, -1
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; UNROLL-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; UNROLL-NEXT: br label [[FOR_END]], !llvm.loop [[LOOP4:![0-9]+]]
	; UNROLL: for.end:			; UNROLL: for.end:
	; UNROLL-NEXT: ret void			; UNROLL-NEXT: ret void
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @minimal_bit_widths(			; UNROLL-NOSIMPLIFY-LABEL: @minimal_bit_widths(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NOSIMPLIFY: vector.ph:			; UNROLL-NOSIMPLIFY: vector.ph:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]
	[42 lines not shown]
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; UNROLL-NOSIMPLIFY-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; UNROLL-NOSIMPLIFY-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; UNROLL-NOSIMPLIFY-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; UNROLL-NOSIMPLIFY-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: for.inc:			; UNROLL-NOSIMPLIFY: for.inc:
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; UNROLL-NOSIMPLIFY-NEXT: [[TMP7]] = add i64 [[TMP1]], -1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; UNROLL-NOSIMPLIFY-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; UNROLL-NOSIMPLIFY: for.end:			; UNROLL-NOSIMPLIFY: for.end:
	; UNROLL-NOSIMPLIFY-NEXT: ret void			; UNROLL-NOSIMPLIFY-NEXT: ret void
	;			;
	; VEC-LABEL: @minimal_bit_widths(			; VEC-LABEL: @minimal_bit_widths(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i1> poison, i1 [[C:%.*]], i32 0			; VEC-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i1> poison, i1 [[C:%.*]], i32 0
	; VEC-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT]], <2 x i1> poison, <2 x i32> zeroinitializer			; VEC-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT]], <2 x i1> poison, <2 x i32> zeroinitializer
	; VEC-NEXT: br label [[VECTOR_BODY:%.*]]			; VEC-NEXT: br label [[VECTOR_BODY:%.*]]
	[23 lines not shown]
	; VEC-NEXT: [[TMP12:%.*]] = extractelement <2 x i8> [[WIDE_LOAD]], i32 1			; VEC-NEXT: [[TMP12:%.*]] = extractelement <2 x i8> [[WIDE_LOAD]], i32 1
	; VEC-NEXT: [[TMP13:%.*]] = zext i8 [[TMP12]] to i32			; VEC-NEXT: [[TMP13:%.*]] = zext i8 [[TMP12]] to i32
	; VEC-NEXT: [[TMP14:%.*]] = trunc i32 [[TMP13]] to i8			; VEC-NEXT: [[TMP14:%.*]] = trunc i32 [[TMP13]] to i8
	; VEC-NEXT: store i8 [[TMP14]], i8* [[TMP11]], align 1			; VEC-NEXT: store i8 [[TMP14]], i8* [[TMP11]], align 1
	; VEC-NEXT: br label [[PRED_STORE_CONTINUE3]]			; VEC-NEXT: br label [[PRED_STORE_CONTINUE3]]
	; VEC: pred.store.continue3:			; VEC: pred.store.continue3:
	; VEC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; VEC-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; VEC-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef			; VEC-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef
	; VEC-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]			; VEC-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; VEC: middle.block:			; VEC: middle.block:
	; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef			; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef
	; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; VEC: for.body:			; VEC: for.body:
	; VEC-NEXT: [[TMP0:%.*]] = phi i64 [ [[TMP6:%.*]], [[FOR_INC:%.*]] ], [ undef, [[MIDDLE_BLOCK]] ]			; VEC-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 undef
	; VEC-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP7:%.*]], [[FOR_INC]] ], [ undef, [[MIDDLE_BLOCK]] ]
	; VEC-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 [[TMP0]]
	; VEC-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1			; VEC-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1
	; VEC-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; VEC-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; VEC: if.then:			; VEC: if.then:
	; VEC-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; VEC-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; VEC-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; VEC-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; VEC-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; VEC-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; VEC-NEXT: br label [[FOR_INC]]			; VEC-NEXT: br label [[FOR_INC]]
	; VEC: for.inc:			; VEC: for.inc:
	; VEC-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; VEC-NEXT: [[TMP6:%.*]] = add nuw nsw i64 undef, 1
	; VEC-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; VEC-NEXT: [[TMP7:%.*]] = add i64 undef, -1
	; VEC-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; VEC-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; VEC-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; VEC-NEXT: br label [[FOR_END]], !llvm.loop [[LOOP5:![0-9]+]]
	; VEC: for.end:			; VEC: for.end:
	; VEC-NEXT: ret void			; VEC-NEXT: ret void
	;			;

	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%tmp0 = phi i64 [ %tmp6, %for.inc ], [ 0, %entry ]			%tmp0 = phi i64 [ %tmp6, %for.inc ], [ 0, %entry ]
	%tmp1 = phi i64 [ %tmp7, %for.inc ], [ undef, %entry ]			%tmp1 = phi i64 [ %tmp7, %for.inc ], [ undef, %entry ]
	%tmp2 = getelementptr i8, i8* undef, i64 %tmp0			%tmp2 = getelementptr i8, i8* undef, i64 %tmp0
	%tmp3 = load i8, i8* %tmp2, align 1			%tmp3 = load i8, i8* %tmp2, align 1
	[13 lines not shown]

	for.end:			for.end:
	ret void			ret void
	}			}

	define void @minimal_bit_widths_with_aliasing_store(i1 %c, i8* %ptr) {			define void @minimal_bit_widths_with_aliasing_store(i1 %c, i8* %ptr) {
	; UNROLL-LABEL: @minimal_bit_widths_with_aliasing_store(			; UNROLL-LABEL: @minimal_bit_widths_with_aliasing_store(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: br label [[FOR_BODY:%.*]]			; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* [[PTR:%.*]], i64 0
	; UNROLL: for.body:
	; UNROLL-NEXT: [[TMP0:%.*]] = phi i64 [ [[TMP6:%.*]], [[FOR_INC:%.*]] ], [ 0, [[ENTRY:%.*]] ]
	; UNROLL-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP7:%.*]], [[FOR_INC]] ], [ 0, [[ENTRY]] ]
	; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* [[PTR:%.*]], i64 [[TMP0]]
	; UNROLL-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1			; UNROLL-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1
	; UNROLL-NEXT: store i8 0, i8* [[TMP2]], align 1			; UNROLL-NEXT: store i8 0, i8* [[TMP2]], align 1
	; UNROLL-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; UNROLL: if.then:			; UNROLL: if.then:
	; UNROLL-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; UNROLL-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; UNROLL-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; UNROLL-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; UNROLL-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; UNROLL-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; UNROLL-NEXT: br label [[FOR_INC]]			; UNROLL-NEXT: br label [[FOR_INC]]
	; UNROLL: for.inc:			; UNROLL: for.inc:
	; UNROLL-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; UNROLL-NEXT: [[TMP6:%.*]] = add nuw nsw i64 0, 1
	; UNROLL-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; UNROLL-NEXT: [[TMP7:%.*]] = add i64 0, -1
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; UNROLL-NEXT: br i1 [[TMP8]], label [[FOR_END:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; UNROLL: for.end:
	; UNROLL-NEXT: ret void			; UNROLL-NEXT: ret void
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @minimal_bit_widths_with_aliasing_store(			; UNROLL-NOSIMPLIFY-LABEL: @minimal_bit_widths_with_aliasing_store(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NOSIMPLIFY: vector.ph:			; UNROLL-NOSIMPLIFY: vector.ph:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]
	; UNROLL-NOSIMPLIFY: vector.body:			; UNROLL-NOSIMPLIFY: vector.body:
	[44 lines not shown]
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; UNROLL-NOSIMPLIFY-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; UNROLL-NOSIMPLIFY-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; UNROLL-NOSIMPLIFY-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; UNROLL-NOSIMPLIFY-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: for.inc:			; UNROLL-NOSIMPLIFY: for.inc:
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; UNROLL-NOSIMPLIFY-NEXT: [[TMP7]] = add i64 [[TMP1]], -1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; UNROLL-NOSIMPLIFY-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; UNROLL-NOSIMPLIFY: for.end:			; UNROLL-NOSIMPLIFY: for.end:
	; UNROLL-NOSIMPLIFY-NEXT: ret void			; UNROLL-NOSIMPLIFY-NEXT: ret void
	;			;
	; VEC-LABEL: @minimal_bit_widths_with_aliasing_store(			; VEC-LABEL: @minimal_bit_widths_with_aliasing_store(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: br label [[FOR_BODY:%.*]]			; VEC-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* [[PTR:%.*]], i64 0
	; VEC: for.body:
	; VEC-NEXT: [[TMP0:%.*]] = phi i64 [ [[TMP6:%.*]], [[FOR_INC:%.*]] ], [ 0, [[ENTRY:%.*]] ]
	; VEC-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP7:%.*]], [[FOR_INC]] ], [ 0, [[ENTRY]] ]
	; VEC-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* [[PTR:%.*]], i64 [[TMP0]]
	; VEC-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1			; VEC-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1
	; VEC-NEXT: store i8 0, i8* [[TMP2]], align 1			; VEC-NEXT: store i8 0, i8* [[TMP2]], align 1
	; VEC-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; VEC-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; VEC: if.then:			; VEC: if.then:
	; VEC-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; VEC-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; VEC-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; VEC-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; VEC-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; VEC-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; VEC-NEXT: br label [[FOR_INC]]			; VEC-NEXT: br label [[FOR_INC]]
	; VEC: for.inc:			; VEC: for.inc:
	; VEC-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; VEC-NEXT: [[TMP6:%.*]] = add nuw nsw i64 0, 1
	; VEC-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; VEC-NEXT: [[TMP7:%.*]] = add i64 0, -1
	; VEC-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; VEC-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; VEC-NEXT: br i1 [[TMP8]], label [[FOR_END:%.*]], label [[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; VEC: for.end:
	; VEC-NEXT: ret void			; VEC-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%tmp0 = phi i64 [ %tmp6, %for.inc ], [ 0, %entry ]			%tmp0 = phi i64 [ %tmp6, %for.inc ], [ 0, %entry ]
	%tmp1 = phi i64 [ %tmp7, %for.inc ], [ 0, %entry ]			%tmp1 = phi i64 [ %tmp7, %for.inc ], [ 0, %entry ]
	[20 lines not shown]

llvm/test/Transforms/LoopVectorize/loop-form.ll

	[33 lines not shown]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_COND]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_COND]] ]
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; TAILFOLD-LABEL: @bottom_tested(			; TAILFOLD-LABEL: @bottom_tested(
	; TAILFOLD-NEXT: entry:			; TAILFOLD-NEXT: entry:
	; TAILFOLD-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[N:%.*]], i32 0)			; TAILFOLD-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[N:%.*]], i32 0)
	; TAILFOLD-NEXT: [[TMP0:%.*]] = add nuw i32 [[SMAX]], 1			; TAILFOLD-NEXT: [[TMP0:%.*]] = add nuw i32 [[SMAX]], 1
	; TAILFOLD-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; TAILFOLD-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	[39 lines not shown]
	; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]			; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
	; TAILFOLD: for.cond:			; TAILFOLD: for.cond:
	; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_COND]] ]			; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_COND]] ]
	; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4			; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
	; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1			; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]			; TAILFOLD-NEXT: br i1 false, label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]
	; TAILFOLD: if.end:			; TAILFOLD: if.end:
	; TAILFOLD-NEXT: ret void			; TAILFOLD-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
	[1,104 lines not shown]

llvm/test/Transforms/LoopVectorize/memdep-fold-tail.ll

	[61 lines not shown]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP7]], i32 1
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[TMP12]]
	; CHECK-NEXT: store i8 7, i8* [[TMP13]], align 8			; CHECK-NEXT: store i8 7, i8* [[TMP13]], align 8
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE6]]
	; CHECK: pred.store.continue6:			; CHECK: pred.store.continue6:
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16			; CHECK-NEXT: [[TMP14:%.*]] = icmp eq i32 [[INDEX_NEXT]], 16
	; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; CHECK-NEXT: br i1 [[TMP14]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 16, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[J:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[J_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[J:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[J_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[AJ:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[J]]			; CHECK-NEXT: [[AJ:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[J]]
	; CHECK-NEXT: store i8 69, i8* [[AJ]], align 8			; CHECK-NEXT: store i8 69, i8* [[AJ]], align 8
	; CHECK-NEXT: [[JP3:%.*]] = add nuw nsw i32 3, [[J]]			; CHECK-NEXT: [[JP3:%.*]] = add nuw nsw i32 3, [[J]]
	; CHECK-NEXT: [[AJP3:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[JP3]]			; CHECK-NEXT: [[AJP3:%.*]] = getelementptr inbounds [18 x i8], [18 x i8]* @a, i32 0, i32 [[JP3]]
	; CHECK-NEXT: store i8 7, i8* [[AJP3]], align 8			; CHECK-NEXT: store i8 7, i8* [[AJP3]], align 8
	; CHECK-NEXT: [[J_NEXT]] = add nuw nsw i32 [[J]], 1			; CHECK-NEXT: [[J_NEXT]] = add nuw nsw i32 [[J]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[J_NEXT]], 15			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[J_NEXT]], 15
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !2			; CHECK-NEXT: br i1 true, label [[FOR_END]], label [[FOR_BODY]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%j = phi i32 [ 0, %entry ], [ %j.next, %for.body ]			%j = phi i32 [ 0, %entry ], [ %j.next, %for.body ]
	[15 lines not shown]

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-liveout.ll

	[79 lines not shown]
	; VF-TWO-CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; VF-TWO-CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; VF-TWO-CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]			; VF-TWO-CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]
	; VF-TWO-CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; VF-TWO-CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; VF-TWO-CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDVARS_IV]]			; VF-TWO-CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[INDVARS_IV]]
	; VF-TWO-CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4			; VF-TWO-CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* [[ARRAYIDX2]], align 4
	; VF-TWO-CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP20]], [[TMP21]]			; VF-TWO-CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP20]], [[TMP21]]
	; VF-TWO-CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; VF-TWO-CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; VF-TWO-CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]			; VF-TWO-CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
	; VF-TWO-CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], [[LOOP4:!llvm.loop !.*]]			; VF-TWO-CHECK-NEXT: br i1 false, label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], [[LOOP4:!llvm.loop !.*]]
	; VF-TWO-CHECK: for.end.loopexit.loopexit:			; VF-TWO-CHECK: for.end.loopexit.loopexit:
	; VF-TWO-CHECK-NEXT: [[ADD_LCSSA3:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP19]], [[VEC_EPILOG_MIDDLE_BLOCK]] ]			; VF-TWO-CHECK-NEXT: [[ADD_LCSSA3:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ], [ [[TMP19]], [[VEC_EPILOG_MIDDLE_BLOCK]] ]
	; VF-TWO-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]			; VF-TWO-CHECK-NEXT: br label [[FOR_END_LOOPEXIT]]
	; VF-TWO-CHECK: for.end.loopexit:			; VF-TWO-CHECK: for.end.loopexit:
	; VF-TWO-CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[TMP9]], [[MIDDLE_BLOCK]] ], [ [[ADD_LCSSA3]], [[FOR_END_LOOPEXIT_LOOPEXIT]] ]			; VF-TWO-CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[TMP9]], [[MIDDLE_BLOCK]] ], [ [[ADD_LCSSA3]], [[FOR_END_LOOPEXIT_LOOPEXIT]] ]
	; VF-TWO-CHECK-NEXT: br label [[FOR_END]]			; VF-TWO-CHECK-NEXT: br label [[FOR_END]]
	; VF-TWO-CHECK: for.end:			; VF-TWO-CHECK: for.end:
	; VF-TWO-CHECK-NEXT: [[RES_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_LCSSA]], [[FOR_END_LOOPEXIT]] ]			; VF-TWO-CHECK-NEXT: [[RES_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_LCSSA]], [[FOR_END_LOOPEXIT]] ]
	[29 lines not shown]

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll

	[448 lines not shown]
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC3]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK:%.*]] ]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC3]], [[VEC_EPILOG_MIDDLE_BLOCK]] ], [ [[N_VEC]], [[VEC_EPILOG_ITER_CHECK]] ], [ 0, [[ITER_CHECK:%.*]] ]
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK-PROFITABLE-BY-DEFAULT: for.body:			; CHECK-PROFITABLE-BY-DEFAULT: for.body:
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[VEC_EPILOG_SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 [[IV]]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[A]], i64 [[IV]]
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: store i8 1, i8* [[ARRAYIDX]], align 1			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: store i8 1, i8* [[ARRAYIDX]], align 1
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[IV_NEXT]], [[N]]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[IV_NEXT]], [[N]]
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], [[LOOP4:!llvm.loop !.*]]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br i1 false, label [[FOR_BODY]], label [[FOR_END_LOOPEXIT_LOOPEXIT]], [[LOOP4:!llvm.loop !.*]]
	; CHECK-PROFITABLE-BY-DEFAULT: for.end.loopexit.loopexit:			; CHECK-PROFITABLE-BY-DEFAULT: for.end.loopexit.loopexit:
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br label [[FOR_END_LOOPEXIT]]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br label [[FOR_END_LOOPEXIT]]
	; CHECK-PROFITABLE-BY-DEFAULT: for.end.loopexit:			; CHECK-PROFITABLE-BY-DEFAULT: for.end.loopexit:
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br label [[FOR_END:%.*]]			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: br label [[FOR_END:%.*]]
	; CHECK-PROFITABLE-BY-DEFAULT: for.end:			; CHECK-PROFITABLE-BY-DEFAULT: for.end:
	; CHECK-PROFITABLE-BY-DEFAULT-NEXT: ret void			; CHECK-PROFITABLE-BY-DEFAULT-NEXT: ret void
	;			;
	entry:			entry:
	[16 lines not shown]

llvm/test/Transforms/LoopVectorize/pr44488-predication.ll

	[68 lines not shown]
	; CHECK: cond.false4:			; CHECK: cond.false4:
	; CHECK-NEXT: [[REM:%.*]] = srem i16 5786, [[LV]]			; CHECK-NEXT: [[REM:%.*]] = srem i16 5786, [[LV]]
	; CHECK-NEXT: br label [[FOR_LATCH]]			; CHECK-NEXT: br label [[FOR_LATCH]]
	; CHECK: for.latch:			; CHECK: for.latch:
	; CHECK-NEXT: [[COND6:%.*]] = phi i16 [ [[REM]], [[COND_FALSE4]] ], [ 5786, [[COND_END]] ]			; CHECK-NEXT: [[COND6:%.*]] = phi i16 [ [[REM]], [[COND_FALSE4]] ], [ 5786, [[COND_END]] ]
	; CHECK-NEXT: store i16 [[COND6]], i16* @v_39, align 1			; CHECK-NEXT: store i16 [[COND6]], i16* @v_39, align 1
	; CHECK-NEXT: [[INC7]] = add nsw i16 [[I_07]], 1			; CHECK-NEXT: [[INC7]] = add nsw i16 [[I_07]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[INC7]], 111			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[INC7]], 111
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[EXIT]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[FOR_BODY]], label [[EXIT]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[RV:%.*]] = load i16, i16* @v_39, align 1			; CHECK-NEXT: [[RV:%.*]] = load i16, i16* @v_39, align 1
	; CHECK-NEXT: ret i16 [[RV]]			; CHECK-NEXT: ret i16 [[RV]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.latch			for.body: ; preds = %entry, %for.latch
	[24 lines not shown]

llvm/test/Transforms/LoopVectorize/pr44547.ll

This file was added.

				; RUN: opt -S -loop-vectorize -simplifycfg -force-vector-width=2 -force-vector-interleave=1 < %s | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				;CHECK-LABEL: @single_iter_remainder(
				define void @single_iter_remainder(i16* noalias nocapture readonly %a, i16* noalias nocapture readonly %b, i16* noalias nocapture %c, i32 %n) {
				entry:
				%cmp7 = icmp eq i32 %n, 0
				br i1 %cmp7, label %for.cond.cleanup, label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				;CHECK: vector.body:
				;CHECK: for.body:
				;CHECK: br label %for.cond.cleanup
				for.body: ; preds = %entry, %for.body
				%i.011 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
				%a.addr.010 = phi i16* [ %incdec.ptr, %for.body ], [ %a, %entry ]
				%c.addr.09 = phi i16* [ %incdec.ptr4, %for.body ], [ %c, %entry ]
				%b.addr.08 = phi i16* [ %incdec.ptr1, %for.body ], [ %b, %entry ]
				%incdec.ptr = getelementptr inbounds i16, i16* %a.addr.010, i64 1
				%0 = load i16, i16* %a.addr.010, align 2
				%incdec.ptr1 = getelementptr inbounds i16, i16* %b.addr.08, i64 1
				%1 = load i16, i16* %b.addr.08, align 2
				%add = add i16 %1, %0
				%incdec.ptr4 = getelementptr inbounds i16, i16* %c.addr.09, i64 1
				store i16 %add, i16* %c.addr.09, align 2
				%inc = add nuw nsw i32 %i.011, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}

				;CHECK-LABEL: @single_iter_remainder_checks(
				define void @single_iter_remainder_checks(i16* %a, i16* %b, i16* %c, i32 %n) {
				entry:
				%cmp7 = icmp eq i32 %n, 0
				br i1 %cmp7, label %for.cond.cleanup, label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				;CHECK: vector.body:
				;CHECK: for.body:
				;CHECK-NOT: br label %for.cond.cleanup
				;CHECK: br i1 %exitcond, label %for.cond.cleanup, label %for.body
				for.body: ; preds = %entry, %for.body
				%i.011 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
				%a.addr.010 = phi i16* [ %incdec.ptr, %for.body ], [ %a, %entry ]
				%c.addr.09 = phi i16* [ %incdec.ptr4, %for.body ], [ %c, %entry ]
				%b.addr.08 = phi i16* [ %incdec.ptr1, %for.body ], [ %b, %entry ]
				%incdec.ptr = getelementptr inbounds i16, i16* %a.addr.010, i64 1
				%0 = load i16, i16* %a.addr.010, align 2
				%incdec.ptr1 = getelementptr inbounds i16, i16* %b.addr.08, i64 1
				%1 = load i16, i16* %b.addr.08, align 2
				%add = add i16 %1, %0
				%incdec.ptr4 = getelementptr inbounds i16, i16* %c.addr.09, i64 1
				store i16 %add, i16* %c.addr.09, align 2
				%inc = add nuw nsw i32 %i.011, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}
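The two functions in this test appear to exercise the reasoning from the summary: with `-force-vector-width=2 -force-vector-interleave=1`, a remainder loop entered from the middle block has `N % (VF*UF)` iterations left, which is at most one. The second function drops `noalias`, so the scalar loop presumably also serves as the fallback for failed runtime checks and must keep its conditional branch. The following is a minimal sketch of that arithmetic in C++, not the code in LoopVectorize.cpp. The names `remainderIterations`, `backEdgeIsDead`, and `hasRuntimeCheckBypass` are hypothetical, and the model assumes no scalar epilogue is required.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: iterations left for the scalar remainder loop when it
// is entered from the middle block, assuming the vector loop has covered
// N - N % (VF * UF) iterations and no scalar epilogue is forced.
constexpr std::uint64_t remainderIterations(std::uint64_t N, unsigned VF,
                                            unsigned UF) {
  return N % (static_cast<std::uint64_t>(VF) * UF);
}

// Hypothetical predicate modelling the condition discussed in the review:
// the back-edge of the remainder loop is never taken when VF * UF == 2,
// unless the scalar loop is also reached when runtime checks fail, in which
// case it may have to run all N iterations.
constexpr bool backEdgeIsDead(unsigned VF, unsigned UF,
                              bool hasRuntimeCheckBypass) {
  return VF * UF == 2 && !hasRuntimeCheckBypass;
}

// With VF * UF == 2 the remainder is 0 (loop skipped) or 1 (single pass).
static_assert(remainderIterations(7, 2, 1) == 1, "odd N leaves one iteration");
static_assert(remainderIterations(8, 2, 1) == 0, "even N skips the remainder");
// A wider factor can leave several iterations, so the branch must stay.
static_assert(remainderIterations(7, 4, 1) == 3, "VF=4 can leave up to three");
```

In this model `@single_iter_remainder` corresponds to `backEdgeIsDead(2, 1, false)` and `@single_iter_remainder_checks` to `backEdgeIsDead(2, 1, true)`.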

llvm/test/Transforms/LoopVectorize/pr46525-expander-insertpoint.ll

	[76 lines not shown]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4			; CHECK-NEXT: store i32 0, i32* [[PTR]], align 4
	; CHECK-NEXT: [[V2:%.*]] = trunc i64 [[IV]] to i8			; CHECK-NEXT: [[V2:%.*]] = trunc i64 [[IV]] to i8
	; CHECK-NEXT: [[V3:%.*]] = add i8 [[V2]], 1			; CHECK-NEXT: [[V3:%.*]] = add i8 [[V2]], 1
	; CHECK-NEXT: [[CMP15:%.*]] = icmp slt i8 [[V3]], 5			; CHECK-NEXT: [[CMP15:%.*]] = icmp slt i8 [[V3]], 5
	; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], [[INC]]			; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], [[INC]]
	; CHECK-NEXT: br i1 [[CMP15]], label [[LOOP]], label [[LOOP_EXIT]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[LOOP]], label [[LOOP_EXIT]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: loop.exit:			; CHECK: loop.exit:
	; CHECK-NEXT: [[DIV_1:%.*]] = udiv i64 [[Y]], [[ADD]]			; CHECK-NEXT: [[DIV_1:%.*]] = udiv i64 [[Y]], [[ADD]]
	; CHECK-NEXT: [[V1:%.*]] = add i64 [[DIV_1]], 1			; CHECK-NEXT: [[V1:%.*]] = add i64 [[DIV_1]], 1
	; CHECK-NEXT: br label [[LOOP_2:%.*]]			; CHECK-NEXT: br label [[LOOP_2:%.*]]
	; CHECK: loop.2:			; CHECK: loop.2:
	; CHECK-NEXT: [[IV_1:%.*]] = phi i64 [ [[IV_NEXT_1:%.*]], [[LOOP_2]] ], [ 0, [[LOOP_EXIT]] ]			; CHECK-NEXT: [[IV_1:%.*]] = phi i64 [ [[IV_NEXT_1:%.*]], [[LOOP_2]] ], [ 0, [[LOOP_EXIT]] ]
	; CHECK-NEXT: [[IV_NEXT_1]] = add i64 [[IV_1]], [[V1]]			; CHECK-NEXT: [[IV_NEXT_1]] = add i64 [[IV_1]], [[V1]]
	; CHECK-NEXT: call void @use(i64 [[IV_NEXT_1]])			; CHECK-NEXT: call void @use(i64 [[IV_NEXT_1]])
	[49 lines not shown]

llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll

	[29 lines not shown]
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP7]], <2 x i16> <i16 1, i16 1>, <2 x i16> [[WIDE_LOAD]]			; CHECK-NEXT: [[PREDPHI:%.*]] = select <2 x i1> [[TMP7]], <2 x i16> <i16 1, i16 1>, <2 x i16> [[WIDE_LOAD]]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[TMP0]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[TMP0]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i16* [[TMP10]] to <2 x i16>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i16* [[TMP10]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> [[PREDPHI]], <2 x i16>* [[TMP11]], align 2			; CHECK-NEXT: store <2 x i16> [[PREDPHI]], <2 x i16>* [[TMP11]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32
	; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 32, 32			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 32, 32
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[IV_TRUNC:%.*]] = trunc i64 [[IV]] to i16			; CHECK-NEXT: [[IV_TRUNC:%.*]] = trunc i64 [[IV]] to i16
	; CHECK-NEXT: br label [[LOOP_COND:%.*]]			; CHECK-NEXT: br label [[LOOP_COND:%.*]]
	; CHECK: loop.cond:			; CHECK: loop.cond:
	; CHECK-NEXT: [[BLEND:%.*]] = phi i16 [ [[IV_TRUNC]], [[LOOP_HEADER]] ]			; CHECK-NEXT: [[BLEND:%.*]] = phi i16 [ [[IV_TRUNC]], [[LOOP_HEADER]] ]
	; CHECK-NEXT: [[SRC_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @src, i16 0, i16 [[BLEND]]			; CHECK-NEXT: [[SRC_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @src, i16 0, i16 [[BLEND]]
	; CHECK-NEXT: [[LV:%.*]] = load i16, i16* [[SRC_PTR]], align 1			; CHECK-NEXT: [[LV:%.*]] = load i16, i16* [[SRC_PTR]], align 1
	; CHECK-NEXT: [[CMP_B:%.*]] = icmp sgt i64 [[IV]], [[A]]			; CHECK-NEXT: [[CMP_B:%.*]] = icmp sgt i64 [[IV]], [[A]]
	; CHECK-NEXT: br i1 [[CMP_B]], label [[LOOP_NEXT:%.*]], label [[LOOP_LATCH]]			; CHECK-NEXT: br i1 [[CMP_B]], label [[LOOP_NEXT:%.*]], label [[LOOP_LATCH]]
	; CHECK: loop.next:			; CHECK: loop.next:
	; CHECK-NEXT: br label [[LOOP_LATCH]]			; CHECK-NEXT: br label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[RES:%.*]] = phi i16 [ [[LV]], [[LOOP_COND]] ], [ 1, [[LOOP_NEXT]] ]			; CHECK-NEXT: [[RES:%.*]] = phi i16 [ [[LV]], [[LOOP_COND]] ], [ 1, [[LOOP_NEXT]] ]
	; CHECK-NEXT: [[DST_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[IV]]			; CHECK-NEXT: [[DST_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[IV]]
	; CHECK-NEXT: store i16 [[RES]], i16* [[DST_PTR]], align 2			; CHECK-NEXT: store i16 [[RES]], i16* [[DST_PTR]], align 2
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 31			; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 31
	; CHECK-NEXT: br i1 [[CMP439]], label [[LOOP_HEADER]], label [[EXIT]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[LOOP_HEADER]], label [[EXIT]], !llvm.loop [[LOOP2:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
	[53 lines not shown]
	; CHECK-NEXT: [[PREDPHI1:%.*]] = select <2 x i1> [[TMP12]], <2 x i16> <i16 1, i16 1>, <2 x i16> [[PREDPHI]]			; CHECK-NEXT: [[PREDPHI1:%.*]] = select <2 x i1> [[TMP12]], <2 x i16> <i16 1, i16 1>, <2 x i16> [[PREDPHI]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[TMP0]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[TMP0]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i16, i16* [[TMP13]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i16, i16* [[TMP13]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast i16* [[TMP14]] to <2 x i16>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast i16* [[TMP14]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> [[PREDPHI1]], <2 x i16>* [[TMP15]], align 2			; CHECK-NEXT: store <2 x i16> [[PREDPHI1]], <2 x i16>* [[TMP15]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
	; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 32, 32			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 32, 32
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	[9 lines not shown]
	; CHECK: loop.next:			; CHECK: loop.next:
	; CHECK-NEXT: br label [[LOOP_LATCH]]			; CHECK-NEXT: br label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[RES:%.*]] = phi i16 [ 0, [[LOOP_HEADER]] ], [ [[LV]], [[LOOP_COND]] ], [ 1, [[LOOP_NEXT]] ]			; CHECK-NEXT: [[RES:%.*]] = phi i16 [ 0, [[LOOP_HEADER]] ], [ [[LV]], [[LOOP_COND]] ], [ 1, [[LOOP_NEXT]] ]
	; CHECK-NEXT: [[DST_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[IV]]			; CHECK-NEXT: [[DST_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[IV]]
	; CHECK-NEXT: store i16 [[RES]], i16* [[DST_PTR]], align 2			; CHECK-NEXT: store i16 [[RES]], i16* [[DST_PTR]], align 2
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 31			; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 31
	; CHECK-NEXT: br i1 [[CMP439]], label [[LOOP_HEADER]], label [[EXIT]], [[LOOP5:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[LOOP_HEADER]], label [[EXIT]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
	[47 lines not shown]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @src, i16 0, i16 [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @src, i16 0, i16 [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = load i16, i16* [[TMP5]], align 1			; CHECK-NEXT: [[TMP8:%.*]] = load i16, i16* [[TMP5]], align 1
	; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP7]], align 1			; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP7]], align 1
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
	; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <2 x i16> [[VEC_IND1]], <i16 2, i16 2>			; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <2 x i16> [[VEC_IND1]], <i16 2, i16 2>
	; CHECK-NEXT: [[VEC_IND_NEXT4]] = add <2 x i16> [[VEC_IND3]], <i16 2, i16 2>			; CHECK-NEXT: [[VEC_IND_NEXT4]] = add <2 x i16> [[VEC_IND3]], <i16 2, i16 2>
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 32
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 32, 32			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 32, 32
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 32, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[IV_TRUNC:%.*]] = trunc i64 [[IV]] to i16			; CHECK-NEXT: [[IV_TRUNC:%.*]] = trunc i64 [[IV]] to i16
	; CHECK-NEXT: [[IV_TRUNC_2:%.*]] = trunc i64 [[IV]] to i16			; CHECK-NEXT: [[IV_TRUNC_2:%.*]] = trunc i64 [[IV]] to i16
	; CHECK-NEXT: [[CMP_A:%.*]] = icmp ugt i64 [[IV]], [[A]]			; CHECK-NEXT: [[CMP_A:%.*]] = icmp ugt i64 [[IV]], [[A]]
	; CHECK-NEXT: br i1 [[CMP_A]], label [[LOOP_NEXT:%.*]], label [[LOOP_LATCH]]			; CHECK-NEXT: br i1 [[CMP_A]], label [[LOOP_NEXT:%.*]], label [[LOOP_LATCH]]
	; CHECK: loop.next:			; CHECK: loop.next:
	; CHECK-NEXT: br label [[LOOP_LATCH]]			; CHECK-NEXT: br label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[BLEND:%.*]] = phi i16 [ [[IV_TRUNC]], [[LOOP_HEADER]] ], [ [[IV_TRUNC_2]], [[LOOP_NEXT]] ]			; CHECK-NEXT: [[BLEND:%.*]] = phi i16 [ [[IV_TRUNC]], [[LOOP_HEADER]] ], [ [[IV_TRUNC_2]], [[LOOP_NEXT]] ]
	; CHECK-NEXT: [[SRC_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @src, i16 0, i16 [[BLEND]]			; CHECK-NEXT: [[SRC_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @src, i16 0, i16 [[BLEND]]
	; CHECK-NEXT: [[LV:%.*]] = load i16, i16* [[SRC_PTR]], align 1			; CHECK-NEXT: [[LV:%.*]] = load i16, i16* [[SRC_PTR]], align 1
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 31			; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 31
	; CHECK-NEXT: br i1 [[CMP439]], label [[LOOP_HEADER]], label [[EXIT]], [[LOOP7:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[LOOP_HEADER]], label [[EXIT]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
	[64 lines not shown]
	; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[TMP0]]			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[TMP0]]
	; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i16, i16* [[TMP20]], i32 0			; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i16, i16* [[TMP20]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = bitcast i16* [[TMP21]] to <2 x i16>*			; CHECK-NEXT: [[TMP22:%.*]] = bitcast i16* [[TMP21]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> [[PREDPHI5]], <2 x i16>* [[TMP22]], align 2			; CHECK-NEXT: store <2 x i16> [[PREDPHI5]], <2 x i16>* [[TMP22]], align 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
	; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <2 x i16> [[VEC_IND1]], <i16 2, i16 2>			; CHECK-NEXT: [[VEC_IND_NEXT2]] = add <2 x i16> [[VEC_IND1]], <i16 2, i16 2>
	; CHECK-NEXT: [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT]], 64			; CHECK-NEXT: [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT]], 64
	; CHECK-NEXT: br i1 [[TMP23]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP23]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 64, 64			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 64, 64
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 64, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 64, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	[9 lines not shown]
	; CHECK: loop.next:			; CHECK: loop.next:
	; CHECK-NEXT: br label [[LOOP_LATCH]]			; CHECK-NEXT: br label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[RES:%.*]] = phi i16 [ 0, [[LOOP_HEADER]] ], [ [[LV]], [[LOOP_COND]] ], [ 1, [[LOOP_NEXT]] ]			; CHECK-NEXT: [[RES:%.*]] = phi i16 [ 0, [[LOOP_HEADER]] ], [ [[LV]], [[LOOP_COND]] ], [ 1, [[LOOP_NEXT]] ]
	; CHECK-NEXT: [[DST_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[IV]]			; CHECK-NEXT: [[DST_PTR:%.*]] = getelementptr inbounds [32 x i16], [32 x i16]* @dst, i16 0, i64 [[IV]]
	; CHECK-NEXT: store i16 [[RES]], i16* [[DST_PTR]], align 2			; CHECK-NEXT: store i16 [[RES]], i16* [[DST_PTR]], align 2
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 63			; CHECK-NEXT: [[CMP439:%.*]] = icmp ult i64 [[IV]], 63
	; CHECK-NEXT: br i1 [[CMP439]], label [[LOOP_HEADER]], label [[EXIT]], [[LOOP9:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[LOOP_HEADER]], label [[EXIT]], !llvm.loop [[LOOP9:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
	[39 lines not shown]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[VEC_IND]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i32, i32* [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[VEC_IND]], <2 x i32>* [[TMP4]], align 4			; CHECK-NEXT: store <2 x i32> [[VEC_IND]], <2 x i32>* [[TMP4]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[INDEX_NEXT]], 1000
	; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP10:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 1000, 1000			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 1000, 1000
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[ADD_I:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[ADD_I:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[C_0:%.*]] = icmp ugt i32 [[IV]], [[X]]			; CHECK-NEXT: [[C_0:%.*]] = icmp ugt i32 [[IV]], [[X]]
	; CHECK-NEXT: br i1 [[C_0]], label [[LOOP_LATCH]], label [[LOOP_LATCH]]			; CHECK-NEXT: br i1 [[C_0]], label [[LOOP_LATCH]], label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[P:%.*]] = phi i32 [ [[IV]], [[LOOP_HEADER]] ], [ [[IV]], [[LOOP_HEADER]] ]			; CHECK-NEXT: [[P:%.*]] = phi i32 [ [[IV]], [[LOOP_HEADER]] ], [ [[IV]], [[LOOP_HEADER]] ]
	; CHECK-NEXT: [[GEP_PTR:%.*]] = getelementptr i32, i32* [[PTR]], i32 [[P]]			; CHECK-NEXT: [[GEP_PTR:%.*]] = getelementptr i32, i32* [[PTR]], i32 [[P]]
	; CHECK-NEXT: store i32 [[P]], i32* [[GEP_PTR]], align 4			; CHECK-NEXT: store i32 [[P]], i32* [[GEP_PTR]], align 4
	; CHECK-NEXT: [[ADD_I]] = add nsw i32 [[P]], 1			; CHECK-NEXT: [[ADD_I]] = add nsw i32 [[P]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[ADD_I]], 1000			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[ADD_I]], 1000
	; CHECK-NEXT: br i1 [[CMP]], label [[LOOP_HEADER]], label [[EXIT]], [[LOOP11:!llvm.loop !.*]]			; CHECK-NEXT: br i1 false, label [[LOOP_HEADER]], label [[EXIT]], !llvm.loop [[LOOP11:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %loop.header			br label %loop.header

	loop.header:			loop.header:
	%iv = phi i32 [ 0 , %entry ], [ %add.i, %loop.latch ]			%iv = phi i32 [ 0 , %entry ], [ %add.i, %loop.latch ]
	[14 lines not shown]

Revision Contents

Diff 356531
