This is an archive of the discontinued LLVM Phabricator instance.

Differential D75746

[LoopVectorizer] Simplify branch in the remainder loop for trivial cases
Needs Review · Public

Authored by danilaml on Mar 6 2020, 7:35 AM.

Details

Reviewers

Ayal
fhahn
hsaito
gilr
rengolin
dcaballe

Summary

When vectorizing by factor of 2 the remainder loop always executes
only one iteration so there is no actual need to keep the branch.

Fixes PR44547
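
As a rough source-level picture of the summary (a sketch, not the patch itself): with VF*UF == 2, and assuming no scalar epilogue is required and the trip count does not overflow, the remainder has `n % 2` iterations, so it can be written as a single guarded iteration with no back-edge. The function name `incrementAll` and the hand-written two-lane "vector" body are assumptions made only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: adds 1 to every element, handling two elements per
// "vector" iteration (standing in for VF * UF == 2).
void incrementAll(std::vector<int> &a) {
  const std::size_t n = a.size();
  const std::size_t vectorTripCount = n - n % 2;

  // Main loop: two lanes per iteration.
  for (std::size_t i = 0; i < vectorTripCount; i += 2) {
    a[i] += 1;
    a[i + 1] += 1;
  }

  // Remainder: n % 2 is either 0 or 1, so at most one iteration is left and
  // the remainder loop's back-edge can become a plain conditional.
  if (vectorTripCount != n)
    a[vectorTripCount] += 1;
}
```

The review discussion further down covers the cases in which this simple picture does not hold.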

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

danilaml created this revision. Mar 6 2020, 7:35 AM

Herald added a project: Restricted Project. Mar 6 2020, 7:35 AM

Herald added subscribers: llvm-commits, hiraditya.

danilaml marked an inline comment as done. Mar 6 2020, 7:40 AM

danilaml added inline comments.

llvm/test/Transforms/LoopVectorize/if-pred-stores.ll
278	this transform seems correct, but not sure if the original purpose of this test is still fulfilled

danilaml updated this revision to Diff 248733. Mar 6 2020, 7:57 AM

danilaml updated this revision to Diff 248737. Mar 6 2020, 8:03 AM

danilaml added reviewers: Ayal, fhahn, hsaito, gilr, rengolin, dcaballe. Mar 6 2020, 8:08 AM

danilaml marked an inline comment as done.

danilaml added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	I think this check is enough unless there are other cases in which "remainder loop has `N % (VF*UF)` iterations" doesn't hold.

Harbormaster completed remote builds in B48336: Diff 248728. Mar 6 2020, 8:14 AM

Harbormaster completed remote builds in B48344: Diff 248737. Mar 6 2020, 8:46 AM

Harbormaster completed remote builds in B48341: Diff 248733.

anton.kolesov added a subscriber: anton.kolesov. Mar 10 2020, 11:15 PM

ping

Eliminating a redundant back-edge is clearly good. It would be better if such a special-case cleanup could be handled by some subsequent optimization, and possibly generalized. Note however that the remainder loop may or may not be considered subject for further optimization: LV currently disables unrolling the remainder loop following r231631, OTOH it may be worth vectorizing according to D30247. If desired, optimizations other than vectorization should preferably be taken care of by subsequent passes such as indvars and loop-unroll. LV knows that the trip count of the remainder loop is at most VF*UF, in the absence of overflows and runtime guards, and can make an effort to convey this bound, if GVN and IPSCCP fail to pick it up (referring to PR44547). Unfortunately, introducing an llvm.assume() seems insufficient - perhaps it could be made to work?

LV originally tries to keep the control of the remainder loop intact, adjusting only the starting values of its phi's, including that of its iv. If this control is going to be modified, by hacking its latch branch, another alternative is to replace it altogether with a new canonical {0, +1, %TripCount} iv (as done for the vector loop), possibly following a %TripCount = urem %ComputedTripCount, VF*UF which conveys this upper bound clearly. Somewhat akin to how truncateToMinimalBitwidths() conveys minimal bitwidths.
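
The alternative described above can be pictured at the source level roughly as follows. This is only a sketch under assumptions: `kStep` stands in for VF*UF, the inner lane loop stands in for one vector iteration, and `sumWithCanonicalRemainder` is a made-up name. The remainder loop gets its own induction variable counting from 0 up to `n % kStep`, so its upper bound is visible directly from the `urem`-style computation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kStep = 4; // assumed value standing in for VF * UF

// Hypothetical example: sums the input with a chunked main loop followed by a
// remainder loop that uses a canonical {0, +1, remTripCount} induction variable.
std::int64_t sumWithCanonicalRemainder(const std::vector<std::int32_t> &data) {
  const std::size_t n = data.size();
  const std::size_t vectorTripCount = n - n % kStep;
  std::int64_t sum = 0;

  // Main loop: kStep lanes per iteration.
  for (std::size_t i = 0; i < vectorTripCount; i += kStep)
    for (std::size_t lane = 0; lane < kStep; ++lane)
      sum += data[i + lane];

  // Remainder loop: the trip count is computed as a remainder, which makes
  // the bound remTripCount < kStep explicit.
  const std::size_t remTripCount = n % kStep;
  for (std::size_t j = 0; j < remTripCount; ++j)
    sum += data[vectorTripCount + j];

  return sum;
}
```

The sketch ignores the scalar-epilogue and overflow cases raised in the inline comments below.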

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	Note that the trip count of the remainder loop may be equal to VF*UF, when loop `requiresScalarEpilogue()`; so in the above special case of VF*UF==2 remainder loop may iterate once or twice. Note that `emitMinimumIterationCountCheck()` also takes care of the case where adding 1 to the backedge-taken count overflows, leading to an incorrect trip count of zero; here too "remainder" loop iterates (much) more than once.
3109	`BI` is aka `ScalarLatchBr`
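
The two exceptions in the note on line 3107 can be checked with a small counting model. This is a simplified sketch, not LLVM code: it assumes a 32-bit trip count, no runtime safety checks and no tail folding, and `scalarLoopIterations` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption for illustration): number of iterations the
// scalar loop executes when the trip count N = BTC + 1 is computed in 32 bits.
std::uint64_t scalarLoopIterations(std::uint32_t btc, std::uint32_t step,
                                   bool requiresScalarEpilogue) {
  assert(step > 0);
  // Wraps to 0 when btc == UINT32_MAX.
  const std::uint32_t n = static_cast<std::uint32_t>(btc + 1u);

  // Minimum-iteration check: too few iterations (or a wrapped trip count)
  // bypass the vector loop, so the scalar loop runs every iteration.
  if (n < step)
    return static_cast<std::uint64_t>(btc) + 1;

  std::uint32_t r = n % step;
  // A required scalar epilogue always leaves work for the scalar loop, so an
  // exact multiple of Step leaves a full Step of iterations behind.
  if (requiresScalarEpilogue && r == 0)
    r = step;
  return r;
}
```

With `step == 2`, the model returns 0 or 1 in the plain case, 1 or 2 when a scalar epilogue is required, and the full iteration count when `btc` is `UINT32_MAX`.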

danilaml updated this revision to Diff 251702. Mar 20 2020, 10:44 AM

danilaml marked 2 inline comments as done.

Herald added a reviewer: aartbik. Mar 20 2020, 10:44 AM

Updated revision with additional checks and rebased.

I'm not sure that llvm.assume can be reliably made to work (and be simpler than just eliminating the back edge).

Going with {0, +1, %TripCount} might be more beneficial in the end (it should also be trivial to call vectorize/unroll recursively in the remainder loop in such case, if I understood things correctly).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	Thanks. I knew I might've missed something. This makes me more skeptical about potential llvm.assume solution. Am I understanding your note correctly, that adding requiresScalarEpilogue check is enough?

Harbormaster completed remote builds in B49923: Diff 251702. Mar 20 2020, 11:24 AM

Ayal added inline comments. Mar 20 2020, 5:28 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	Adding `requiresScalarEpilogue()` check is enough to handle the first case, but not the second case. One way to try and handle the second case is to change `getOrCreateVectorTripCount()` so that it relies on `PSE::getBackedgeTakenCount()` w/o adding 1 to it, as this addition (done by `getOrCreateTripCount()`) may overflow to zero. See r209854, and the `max_i32_backedgetaken()` test it added to test/Transforms/LoopVectorize/induction.ll. Another way may be to check if/when adding 1 is known not to overflow.

danilaml updated this revision to Diff 254766. Apr 3 2020, 6:04 AM

danilaml marked an inline comment as done.

danilaml updated this revision to Diff 254767. Apr 3 2020, 6:07 AM

danilaml marked 2 inline comments as done. Apr 3 2020, 6:46 AM

danilaml added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	I haven't figured out how to make VectorTripCount reliant solely on getBackedgeTakenCount without introducing extra checks/instructions (which seems to be the rationale behind the current code, instead of the one in r209854, at the expense of not vectorizing the rare UINT_MAX loops). Instead, I've opted for checking whether the overflow might have occurred by using getConstantMaxBackedgeTakenCount. If I understood things correctly, it would return -1 (all ones) in the overflow case.

Harbormaster failed remote builds in B51620: Diff 254766! Apr 3 2020, 6:57 AM

Harbormaster failed remote builds in B51621: Diff 254767!

danilaml updated this revision to Diff 254791. Apr 3 2020, 7:03 AM

danilaml marked an inline comment as done.

Ayal added inline comments. Apr 3 2020, 7:16 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

3107

A sketch of making VectorTripCount work correctly for loops with BTC = UINTMAX as well:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
N = BTC + 1                     // could overflow to 0 so do not compute N % Step
if (foldTail) N = N + (Step-1)  // for rounding up
R = BTC % Step                  // Fits foldTail: (N+Step-1)%Step == (BTC+1+Step-1)%Step == (BTC+Step)%Step == BTC%Step
if !(foldTail) {
  R = R + 1                     // Fits requiresScalarEpilog: produces 0 < R <= Step
  if !(requiresScalarEpilog) R = (R == Step ? 0 : R)  // equivalent to R % Step
}
VectorTripCount = N - R

which could be optimized into

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
R = BTC % Step
VTC = BTC - R
if !(requiresScalarEpilog) VTC = (R == Step-1 ? VTC + Step : VTC)
if (foldTail) VTC = VTC + Step
VectorTripCount = VTC

This also allows foldTail to work with Steps (i.e., UF's) that are not a power of 2.
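
To sanity-check the two sketches above, here is a small C++ transcription that compares both against a reference computed in wider arithmetic. It is a sketch under assumptions: only the `!foldTail` case is modeled, an 8-bit type is used so that BTC = UINTMAX can be exercised exhaustively, and all function names (`vtcFromSketch`, `vtcOptimized`, `vtcReference`, `sketchesAgreeWithReference`) are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

using U8 = std::uint8_t; // narrow type so that wrap-around is easy to exercise

// First sketch, !foldTail only: N may wrap, but R is derived from BTC.
U8 vtcFromSketch(U8 btc, U8 step, bool requiresScalarEpilog) {
  assert(step > 0);
  const U8 n = static_cast<U8>(btc + 1);  // may wrap to 0
  U8 r = static_cast<U8>(btc % step + 1); // 0 < r <= step
  if (!requiresScalarEpilog && r == step)
    r = 0;
  return static_cast<U8>(n - r);
}

// "Optimized" form, !foldTail only.
U8 vtcOptimized(U8 btc, U8 step, bool requiresScalarEpilog) {
  assert(step > 0);
  const U8 r = static_cast<U8>(btc % step);
  U8 vtc = static_cast<U8>(btc - r);
  if (!requiresScalarEpilog && r == step - 1)
    vtc = static_cast<U8>(vtc + step); // may wrap to 0 for btc == 255
  return vtc;
}

// Reference: computed without overflow, truncated to 8 bits at the end.
U8 vtcReference(U8 btc, U8 step, bool requiresScalarEpilog) {
  assert(step > 0);
  const unsigned n = static_cast<unsigned>(btc) + 1; // never wraps
  unsigned r = n % step;
  if (requiresScalarEpilog && r == 0)
    r = step;
  return static_cast<U8>(n - r);
}

// Exhaustive comparison over all 8-bit BTC values and a range of steps.
bool sketchesAgreeWithReference() {
  for (unsigned btc = 0; btc <= 255; ++btc)
    for (unsigned step = 1; step <= 16; ++step)
      for (bool epilog : {false, true}) {
        const U8 b = static_cast<U8>(btc), s = static_cast<U8>(step);
        const U8 expected = vtcReference(b, s, epilog);
        if (vtcFromSketch(b, s, epilog) != expected ||
            vtcOptimized(b, s, epilog) != expected)
          return false;
      }
  return true;
}
```

The `foldTail` branch is left out on purpose; it would need to be checked separately.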

Harbormaster failed remote builds in B51633: Diff 254791! Apr 3 2020, 8:02 AM

danilaml updated this revision to Diff 255335. Apr 6 2020, 7:35 AM

danilaml marked an inline comment as done. Apr 6 2020, 8:09 AM

danilaml added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	Hm, but doesn't it introduce additional instructions/compares that might lead to worse CodeGen? I.e., in the common case, BTC will need to be computed, then `BTC - R`, then a `select` between `VTC` and `VTC+Step`, whereas currently it is just `VectorTripCount = TC - (TC % Step)`, where TC is already computed.

Harbormaster completed remote builds in B51942: Diff 255335. Apr 6 2020, 8:40 AM

Ayal added inline comments. Apr 6 2020, 12:42 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	Hopefully the code-size increase and the slowdown will be insignificant, being outside the loop, but indeed they need to be checked.

Another option is to use the fact that, although BTC+1 might overflow and wrap to zero, BTC-(Step-1) may not underflow if !foldTail thanks to the min.iters.check of TripCount = N = BTC+1 >= Step, and therefore (BTC+1)(w/o overflow) === (BTC-(Step-1)) modulo Step. So for a common case of !foldTail && !requiresScalarEpilog, VectorTripCount can be computed w/o risk of overflow or underflow using `N-((N-Step)%Step)` instead of the current `N - (N % Step)`.

For completeness, current VectorTripCount is computed by:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
N = BTC + 1
if (foldTail) N = N + (Step-1)
R = N % Step
if (requiresScalarEpilog) R = (R == 0 ? Step : R)
VectorTripCount = N - R
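
The "current" computation listed at the end of the comment above can be transcribed into C++ as follows. This is a sketch for experimenting with the arithmetic, assuming a 32-bit trip count; `currentVectorTripCount` is a hypothetical name, not a function in LoopVectorize.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Transcription of the "current" pseudocode, using 32-bit unsigned arithmetic
// so that BTC + 1 wraps to 0 for BTC == UINT32_MAX.
std::uint32_t currentVectorTripCount(std::uint32_t btc, std::uint32_t step,
                                     bool foldTail, bool requiresScalarEpilog) {
  assert(step > 0);
  std::uint32_t n = static_cast<std::uint32_t>(btc + 1u); // N = BTC + 1
  if (foldTail)
    n = static_cast<std::uint32_t>(n + (step - 1)); // round up to a multiple
  std::uint32_t r = n % step;                       // R = N % Step
  if (requiresScalarEpilog && r == 0)
    r = step; // keep a full Step for the scalar epilogue
  return static_cast<std::uint32_t>(n - r);         // VectorTripCount = N - R
}
```

For example, `btc == 9` and `step == 4` give 8 without tail folding and 12 with it, while `btc == UINT32_MAX` yields 0 because `N` has already wrapped.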

aartbik removed a reviewer: aartbik. Apr 14 2020, 4:15 PM

danilaml updated this revision to Diff 296446. Oct 6 2020, 6:47 AM

Rebased.
Had some time to come back to this.
I've tried implementing the following proposed way to get VTC locally:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
R = BTC % Step
VTC = BTC - R
if !(requiresScalarEpilog) VTC = (R == Step-1 ? VTC + Step : VTC)
if (foldTail) VTC = VTC + Step
VectorTripCount = VTC

But it generally had a negative performance impact on the benchmarks I've run, mainly due to 1) more instructions being required to compute VTC, and 2) the `VTC = (R == Step-1 ? VTC + Step : VTC)` select in the common case making the vectorized loop's exit count uncomputable, thus impeding some later optimizations.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	I'm not sure what using `N-((N-Step)%Step)` instead of `N - (N % Step)` is supposed to solve.

Harbormaster completed remote builds in B74134: Diff 296446. Oct 6 2020, 7:01 AM

danilaml updated this revision to Diff 314186. Dec 31 2020, 5:08 AM

Herald added a subscriber: nemanjai. Dec 31 2020, 5:08 AM

Harbormaster completed remote builds in B83785: Diff 314186. Dec 31 2020, 5:44 AM

Is there no transform that can be taught to do this cleanup instead?

In D75746#2475681, @lebedev.ri wrote:

Is there no transform that can be taught to do this cleanup instead?

Possibly, but I have a hard time imagining how it could be done. Perhaps someone much more knowledgeable in LLVM loop optimizations than I can answer that.
I'm not even sure if it's currently possible to reliably reconstruct the unroll factor and find the remainder loop in some pass after unrolling has been completed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3107	> I'm not sure what using `N-((N-Step)%Step)` instead of `N - (N % Step)` is supposed to solve.

Currently if BTC = UINTMAX the vector loop is bypassed and the scalar "remainder" loop executes all iterations, under !foldTail. In order for loops requiring no runtime checks besides min.iters.check to execute the vector loop (as much as possible) also for BTC = UINTMAX, thereby ensuring that the scalar loop always executes at most Step-1 iterations -- the original motivation for this patch -- the following may be done:

(1) Change min.iters.check to check if `BTC >= Step-1` instead of checking if `N = BTC+1 >= Step`. The latter overflows for BTC = UINTMAX thereby bypassing the vector loop, whereas the former does not wrap.

(2) Compute the VectorTripCount using `N-((N-Step)%Step)` instead of `N - (N % Step)`. The latter produces zero when BTC = UINTMAX for any Step, which is incorrect for Steps that do not divide BTC+1, i.e., for non-power-of-2 UFs. With requiresScalarEpilog, apply the `R = (R == 0 ? Step : R)` to `R = ((N-Step)%Step)` before subtracting it from N.

Note that VectorTripCount computed in (2) may overflow to zero, i.e., for BTC = UINTMAX and Step(UF) that is a power-of-2. This works correctly, as currently done with foldTail, where min.iters.check is eliminated, and UF is required to be a power-of-2. With foldTail, use `R = ((N-1)%Step) = BTC%Step` as suggested earlier above, which never wraps. In any case, this patch focuses on the tail.

To summarize:

Step = VF*UF
BTC = PSE::BackedgeTakenCount()
; min.iters.check:
if (!foldTail): if (BTC < Step-1) goto scalar loop
N = BTC+1 ; may overflow to zero
if (foldTail): R = BTC % Step
if (!foldTail): R = (N-Step) % Step
if (requiresScalarEpilog): R = (R == 0 ? Step : R)
VectorTripCount = N - R

Hopefully this way of ensuring that the tail scalar loop always executes less than Step iterations, also for non-power-of-2 Steps, has no significant negative performance impact.
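
The arithmetic of steps (1) and (2) can be tried out with the following sketch. It is restricted, as an assumption, to the common case `!foldTail && !requiresScalarEpilog` with a 32-bit trip count; `proposedVectorTripCount` is a hypothetical name, and `std::nullopt` is used here to model "goto scalar loop".

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Models the proposed scheme for !foldTail && !requiresScalarEpilog.
// Returns std::nullopt when min.iters.check bypasses the vector loop.
std::optional<std::uint32_t> proposedVectorTripCount(std::uint32_t btc,
                                                     std::uint32_t step) {
  assert(step > 0);
  // (1) min.iters.check on BTC: comparing against Step-1 never wraps.
  if (btc < step - 1)
    return std::nullopt;

  // N may wrap to 0, but N - Step equals BTC - (Step-1), which cannot
  // underflow once the check above has passed.
  const std::uint32_t n = static_cast<std::uint32_t>(btc + 1u);
  // (2) R = (N - Step) % Step, congruent to the true trip count modulo Step.
  const std::uint32_t r = static_cast<std::uint32_t>(n - step) % step;
  // May wrap to 0 when the true trip count is 2^32 and Step divides it.
  return static_cast<std::uint32_t>(n - r);
}
```

For `btc == UINT32_MAX` the sketch returns 4294967295 with `step == 3` (one scalar iteration left) and a wrapped 0 with `step == 4` (no scalar iterations left), in line with the note about power-of-2 steps.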

Ayal mentioned this in D103255: [LV] Mark increment of main vector loop induction variable as NUW. Jun 9 2021, 3:03 PM

Rebased

Harbormaster completed remote builds in B112468: Diff 356531. Jul 5 2021, 11:09 AM

dcaballe resigned from this revision. Oct 8 2021, 2:33 PM

Revision Contents

Path                                                                               Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                                    8 lines
llvm/test/Transforms/LoopVectorize/ARM/sphinx.ll                                   2 lines
llvm/test/Transforms/LoopVectorize/SystemZ/predicated-first-order-recurrence.ll    2 lines
llvm/test/Transforms/LoopVectorize/if-pred-stores.ll                               90 lines
llvm/test/Transforms/LoopVectorize/pr44547.ll                                      32 lines

Diff 248733

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

(3,096 lines not shown; context: BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {)
   // Get ready to start creating new instructions into the vectorized body.
   assert(LoopVectorPreHeader == Lp->getLoopPreheader() &&
          "Inconsistent vector loop preheader");
   Builder.SetInsertPoint(&*LoopVectorBody->getFirstInsertionPt());

   Optional<MDNode *> VectorizedLoopID =
       makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
                                       LLVMLoopVectorizeFollowupVectorized});

+  // If VFxUF is 2 and vector loop is not skipped then remainder executes once.
+  if (VF * UF == 2 && !areSafetyChecksAdded()) {
+    if (BasicBlock *Latch = OrigLoop->getLoopLatch())
+      if (BranchInst *BI = dyn_cast_or_null<BranchInst>(Latch->getTerminator()))
+        BI->setCondition(Builder.getInt1(BI->getSuccessor(0) == LoopExitBlock));
+  }
+
   if (VectorizedLoopID.hasValue()) {
     Lp->setLoopID(VectorizedLoopID.getValue());

     // Do not setAlreadyVectorized if loop attributes have been defined
     // explicitly.
     return LoopVectorPreHeader;
   }

(4,904 lines not shown)

llvm/test/Transforms/LoopVectorize/ARM/sphinx.ll

(90 lines not shown)
 ; CHECK-NEXT: [[MUL123:%.*]] = fmul fast double [[CONV122]], [[CONV122]]
 ; CHECK-NEXT: [[ARRAYIDX124:%.*]] = getelementptr inbounds float, float* [[T6]], i32 [[I_2132]]
 ; CHECK-NEXT: [[T11:%.*]] = load float, float* [[ARRAYIDX124]], align 4
 ; CHECK-NEXT: [[CONV125:%.*]] = fpext float [[T11]] to double
 ; CHECK-NEXT: [[MUL126:%.*]] = fmul fast double [[MUL123]], [[CONV125]]
 ; CHECK-NEXT: [[SUB127]] = fsub fast double [[DVAL1_4131]], [[MUL126]]
 ; CHECK-NEXT: [[INC129]] = add nuw nsw i32 [[I_2132]], 1
 ; CHECK-NEXT: [[EXITCOND143:%.*]] = icmp eq i32 [[INC129]], [[T]]
-; CHECK-NEXT: br i1 [[EXITCOND143]], label [[OUTEREND]], label [[INNERLOOP]], !llvm.loop !2
+; CHECK-NEXT: br i1 true, label [[OUTEREND]], label [[INNERLOOP]], !llvm.loop !2
 ; CHECK: outerend:
 ; CHECK-NEXT: [[SUB127_LCSSA:%.*]] = phi double [ [[SUB127]], [[INNERLOOP]] ], [ [[TMP18]], [[MIDDLE_BLOCK]] ]
 ; CHECK-NEXT: [[CONV138:%.*]] = fptosi double [[SUB127_LCSSA]] to i32
 ; CHECK-NEXT: [[CALL142]] = add nuw nsw i32 [[SCORE_1135]], [[CONV138]]
 ; CHECK-NEXT: [[INC144]] = add nuw nsw i32 [[J_0136]], 1
 ; CHECK-NEXT: [[ARRAYIDX102:%.*]] = getelementptr inbounds i32, i32* @a, i32 [[INC144]]
 ; CHECK-NEXT: [[V17]] = load i32, i32* [[ARRAYIDX102]], align 4
 ; CHECK-NEXT: [[CMP103:%.*]] = icmp sgt i32 [[V17]], -1
(58 lines not shown)

llvm/test/Transforms/LoopVectorize/SystemZ/predicated-first-order-recurrence.ll

(74 lines not shown)
 ; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i32 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[LV:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[A_PTR:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @A, i64 0, i64 [[INDVARS_IV]]
 ; CHECK-NEXT: [[LV]] = load i32, i32* [[A_PTR]], align 4
 ; CHECK-NEXT: [[B_PTR:%.*]] = getelementptr inbounds [5 x i32], [5 x i32]* @B, i64 0, i64 [[INDVARS_IV]]
 ; CHECK-NEXT: store i32 [[SCALAR_RECUR]], i32* [[B_PTR]], align 4
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 5
-; CHECK-NEXT: br i1 [[EXITCOND]], label [[EXIT]], label [[LOOP]], !llvm.loop !2
+; CHECK-NEXT: br i1 true, label [[EXIT]], label [[LOOP]], !llvm.loop !2
 ; CHECK: exit:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %loop

 loop: ; preds = %loop, %entry
   %rec = phi i32 [ 0, %entry], [ %lv, %loop ]
(12 lines not shown)

llvm/test/Transforms/LoopVectorize/if-pred-stores.ll

	Show All 33 Lines
	; UNROLL: pred.store.continue3:			; UNROLL: pred.store.continue3:
	; UNROLL-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; UNROLL-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128
	; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.]], label [[FOR_BODY:%.]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.]], label [[FOR_BODY:%.]]
	; UNROLL: for.body:			; UNROLL: for.body:
	; UNROLL-NEXT: [[INDVARS_IV:%.]] = phi i64 [ [[INDVARS_IV_NEXT:%.]], [[FOR_INC:%.*]] ], [ 128, [[MIDDLE_BLOCK]] ]			; UNROLL-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[F]], i64 128
	; UNROLL-NEXT: [[ARRAYIDX:%.]] = getelementptr inbounds i32, i32 [[F]], i64 [[INDVARS_IV]]
	; UNROLL-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4			; UNROLL-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX]], align 4
	; UNROLL-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP9]], 100			; UNROLL-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP9]], 100
	; UNROLL-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.]], label [[FOR_INC:%.]]
	; UNROLL: if.then:			; UNROLL: if.then:
	; UNROLL-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20			; UNROLL-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20
	; UNROLL-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4			; UNROLL-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
	; UNROLL-NEXT: br label [[FOR_INC]]			; UNROLL-NEXT: br label [[FOR_INC]]
	; UNROLL: for.inc:			; UNROLL: for.inc:
	; UNROLL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 128, 1
	; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128			; UNROLL-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128
	; UNROLL-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !2			; UNROLL-NEXT: br label [[FOR_END]]
	; UNROLL: for.end:			; UNROLL: for.end:
	; UNROLL-NEXT: ret i32 0			; UNROLL-NEXT: ret i32 0
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @test(			; UNROLL-NOSIMPLIFY-LABEL: @test(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.]], label [[VECTOR_PH:%.]]
	; UNROLL-NOSIMPLIFY: vector.ph:			; UNROLL-NOSIMPLIFY: vector.ph:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]
	Show All 36 Lines
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: if.then:			; UNROLL-NOSIMPLIFY: if.then:
	; UNROLL-NOSIMPLIFY-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20			; UNROLL-NOSIMPLIFY-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP9]], 20
	; UNROLL-NOSIMPLIFY-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4			; UNROLL-NOSIMPLIFY-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: for.inc:			; UNROLL-NOSIMPLIFY: for.inc:
	; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128			; UNROLL-NOSIMPLIFY-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !2			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !2
	; UNROLL-NOSIMPLIFY: for.end:			; UNROLL-NOSIMPLIFY: for.end:
	; UNROLL-NOSIMPLIFY-NEXT: ret i32 0			; UNROLL-NOSIMPLIFY-NEXT: ret i32 0
	;			;
	; VEC-LABEL: @test(			; VEC-LABEL: @test(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: br label [[VECTOR_BODY:%.*]]			; VEC-NEXT: br label [[VECTOR_BODY:%.*]]
	; VEC: vector.body:			; VEC: vector.body:
	; VEC-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[ENTRY:%.]] ], [ [[INDEX_NEXT:%.]], [[PRED_STORE_CONTINUE2:%.]] ]			; VEC-NEXT: [[INDEX:%.]] = phi i64 [ 0, [[ENTRY:%.]] ], [ [[INDEX_NEXT:%.]], [[PRED_STORE_CONTINUE2:%.]] ]
	Show All 26 Lines
	; VEC: pred.store.continue2:			; VEC: pred.store.continue2:
	; VEC-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; VEC-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; VEC-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128			; VEC-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 128
	; VEC-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0			; VEC-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !0
	; VEC: middle.block:			; VEC: middle.block:
	; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128			; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 128, 128
	; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; VEC: for.body:			; VEC: for.body:
	; VEC-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ], [ 128, [[MIDDLE_BLOCK]] ]			; VEC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[F]], i64 128
	; VEC-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[F]], i64 [[INDVARS_IV]]
	; VEC-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; VEC-NEXT: [[TMP14:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; VEC-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP14]], 100			; VEC-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP14]], 100
	; VEC-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; VEC-NEXT: br i1 [[CMP1]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; VEC: if.then:			; VEC: if.then:
	; VEC-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP14]], 20			; VEC-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP14]], 20
	; VEC-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4			; VEC-NEXT: store i32 [[ADD]], i32* [[ARRAYIDX]], align 4
	; VEC-NEXT: br label [[FOR_INC]]			; VEC-NEXT: br label [[FOR_INC]]
	; VEC: for.inc:			; VEC: for.inc:
	; VEC-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; VEC-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 128, 1
	; VEC-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128			; VEC-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 128
	; VEC-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !2			; VEC-NEXT: br label [[FOR_END]]
	; VEC: for.end:			; VEC: for.end:
	; VEC-NEXT: ret i32 0			; VEC-NEXT: ret i32 0
	;			;
	entry:			entry:
	br label %for.body			br label %for.body



	[23 unchanged lines hidden]
	; vectorized loop body.			; vectorized loop body.
	; PR18724			; PR18724

	define void @bug18724(i1 %cond) {			define void @bug18724(i1 %cond) {
	; UNROLL-LABEL: @bug18724(			; UNROLL-LABEL: @bug18724(
	; UNROLL-NEXT: entry:			; UNROLL-NEXT: entry:
	; UNROLL-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true			; UNROLL-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true
	; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP0]])			; UNROLL-NEXT: call void @llvm.assume(i1 [[TMP0]])
	; UNROLL-NEXT: br label [[FOR_BODY14:%.*]]			; UNROLL-NEXT: unreachable
	; UNROLL: for.body14:
	; UNROLL-NEXT: [[INDVARS_IV3:%.*]] = phi i64 [ [[INDVARS_IV_NEXT4:%.*]], [[FOR_INC23:%.*]] ], [ undef, [[ENTRY:%.*]] ]
	; UNROLL-NEXT: [[INEWCHUNKS_120:%.*]] = phi i32 [ [[INEWCHUNKS_2:%.*]], [[FOR_INC23]] ], [ undef, [[ENTRY]] ]
	; UNROLL-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDVARS_IV3]]
	; UNROLL-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX16]], align 4
	; UNROLL-NEXT: br i1 undef, label [[IF_THEN18:%.*]], label [[FOR_INC23]]
	; UNROLL: if.then18:
	; UNROLL-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4
	; UNROLL-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1
	; UNROLL-NEXT: br label [[FOR_INC23]]
	; UNROLL: for.inc23:
	; UNROLL-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]
	; UNROLL-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1
	; UNROLL-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32
	; UNROLL-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0
	; UNROLL-NEXT: call void @llvm.assume(i1 [[CMP13]])
	; UNROLL-NEXT: br label [[FOR_BODY14]]
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @bug18724(			; UNROLL-NOSIMPLIFY-LABEL: @bug18724(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_BODY9:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_BODY9:%.*]]
	; UNROLL-NOSIMPLIFY: for.body9:			; UNROLL-NOSIMPLIFY: for.body9:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[COND:%.*]], label [[FOR_INC26:%.*]], label [[FOR_BODY14_PREHEADER:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 [[COND:%.*]], label [[FOR_INC26:%.*]], label [[FOR_BODY14_PREHEADER:%.*]]
	; UNROLL-NOSIMPLIFY: for.body14.preheader:			; UNROLL-NOSIMPLIFY: for.body14.preheader:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	[49 unchanged lines hidden]
	; UNROLL-NOSIMPLIFY-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4			; UNROLL-NOSIMPLIFY-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4
	; UNROLL-NOSIMPLIFY-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC23]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC23]]
	; UNROLL-NOSIMPLIFY: for.inc23:			; UNROLL-NOSIMPLIFY: for.inc23:
	; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]			; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]
	; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32			; UNROLL-NOSIMPLIFY-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32
	; UNROLL-NOSIMPLIFY-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0			; UNROLL-NOSIMPLIFY-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[CMP13]], label [[FOR_BODY14]], label [[FOR_INC26_LOOPEXIT]], !llvm.loop !4			; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[FOR_BODY14]], label [[FOR_INC26_LOOPEXIT]], !llvm.loop !4
	; UNROLL-NOSIMPLIFY: for.inc26.loopexit:			; UNROLL-NOSIMPLIFY: for.inc26.loopexit:
	; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2_LCSSA:%.*]] = phi i32 [ [[INEWCHUNKS_2]], [[FOR_INC23]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]			; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_2_LCSSA:%.*]] = phi i32 [ [[INEWCHUNKS_2]], [[FOR_INC23]] ], [ [[BIN_RDX]], [[MIDDLE_BLOCK]] ]
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC26]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC26]]
	; UNROLL-NOSIMPLIFY: for.inc26:			; UNROLL-NOSIMPLIFY: for.inc26:
	; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_1_LCSSA:%.*]] = phi i32 [ undef, [[FOR_BODY9]] ], [ [[INEWCHUNKS_2_LCSSA]], [[FOR_INC26_LOOPEXIT]] ]			; UNROLL-NOSIMPLIFY-NEXT: [[INEWCHUNKS_1_LCSSA:%.*]] = phi i32 [ undef, [[FOR_BODY9]] ], [ [[INEWCHUNKS_2_LCSSA]], [[FOR_INC26_LOOPEXIT]] ]
	; UNROLL-NOSIMPLIFY-NEXT: unreachable			; UNROLL-NOSIMPLIFY-NEXT: unreachable
	;			;
	; VEC-LABEL: @bug18724(			; VEC-LABEL: @bug18724(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true			; VEC-NEXT: [[TMP0:%.*]] = xor i1 [[COND:%.*]], true
	; VEC-NEXT: call void @llvm.assume(i1 [[TMP0]])			; VEC-NEXT: call void @llvm.assume(i1 [[TMP0]])
	; VEC-NEXT: br label [[FOR_BODY14:%.*]]			; VEC-NEXT: unreachable
				Inline comment by danilaml (author): this transform seems correct, but not sure if the original purpose of this test is still fulfilled
	; VEC: for.body14:
	; VEC-NEXT: [[INDVARS_IV3:%.*]] = phi i64 [ [[INDVARS_IV_NEXT4:%.*]], [[FOR_INC23:%.*]] ], [ undef, [[ENTRY:%.*]] ]
	; VEC-NEXT: [[INEWCHUNKS_120:%.*]] = phi i32 [ [[INEWCHUNKS_2:%.*]], [[FOR_INC23]] ], [ undef, [[ENTRY]] ]
	; VEC-NEXT: [[ARRAYIDX16:%.*]] = getelementptr inbounds [768 x i32], [768 x i32]* undef, i64 0, i64 [[INDVARS_IV3]]
	; VEC-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX16]], align 4
	; VEC-NEXT: br i1 undef, label [[IF_THEN18:%.*]], label [[FOR_INC23]]
	; VEC: if.then18:
	; VEC-NEXT: store i32 2, i32* [[ARRAYIDX16]], align 4
	; VEC-NEXT: [[INC21:%.*]] = add nsw i32 [[INEWCHUNKS_120]], 1
	; VEC-NEXT: br label [[FOR_INC23]]
	; VEC: for.inc23:
	; VEC-NEXT: [[INEWCHUNKS_2]] = phi i32 [ [[INC21]], [[IF_THEN18]] ], [ [[INEWCHUNKS_120]], [[FOR_BODY14]] ]
	; VEC-NEXT: [[INDVARS_IV_NEXT4]] = add nsw i64 [[INDVARS_IV3]], 1
	; VEC-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV3]] to i32
	; VEC-NEXT: [[CMP13:%.*]] = icmp slt i32 [[TMP1]], 0
	; VEC-NEXT: call void @llvm.assume(i1 [[CMP13]])
	; VEC-NEXT: br label [[FOR_BODY14]]
	;			;
	entry:			entry:
	br label %for.body9			br label %for.body9

	for.body9:			for.body9:
	br i1 %cond, label %for.inc26, label %for.body14			br i1 %cond, label %for.inc26, label %for.body14

	for.body14:			for.body14:
	[46 unchanged lines hidden]
	; UNROLL-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1			; UNROLL-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1
	; UNROLL-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i32			; UNROLL-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i32
	; UNROLL-NEXT: [[TMP7:%.*]] = trunc i32 [[TMP6]] to i8			; UNROLL-NEXT: [[TMP7:%.*]] = trunc i32 [[TMP6]] to i8
	; UNROLL-NEXT: store i8 [[TMP7]], i8* [[TMP4]], align 1			; UNROLL-NEXT: store i8 [[TMP7]], i8* [[TMP4]], align 1
	; UNROLL-NEXT: br label [[PRED_STORE_CONTINUE6]]			; UNROLL-NEXT: br label [[PRED_STORE_CONTINUE6]]
	; UNROLL: pred.store.continue6:			; UNROLL: pred.store.continue6:
	; UNROLL-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; UNROLL-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef
	; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !3			; UNROLL-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !2
	; UNROLL: middle.block:			; UNROLL: middle.block:
	; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef			; UNROLL-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef
	; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; UNROLL-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; UNROLL: for.body:			; UNROLL: for.body:
	; UNROLL-NEXT: [[TMP0:%.*]] = phi i64 [ [[TMP6:%.*]], [[FOR_INC:%.*]] ], [ undef, [[MIDDLE_BLOCK]] ]			; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 undef
	; UNROLL-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP7:%.*]], [[FOR_INC]] ], [ undef, [[MIDDLE_BLOCK]] ]
	; UNROLL-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 [[TMP0]]
	; UNROLL-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1			; UNROLL-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1
	; UNROLL-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; UNROLL-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; UNROLL: if.then:			; UNROLL: if.then:
	; UNROLL-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; UNROLL-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; UNROLL-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; UNROLL-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; UNROLL-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; UNROLL-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; UNROLL-NEXT: br label [[FOR_INC]]			; UNROLL-NEXT: br label [[FOR_INC]]
	; UNROLL: for.inc:			; UNROLL: for.inc:
	; UNROLL-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; UNROLL-NEXT: [[TMP6:%.*]] = add nuw nsw i64 undef, 1
	; UNROLL-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; UNROLL-NEXT: [[TMP7:%.*]] = add i64 undef, -1
	; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; UNROLL-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; UNROLL-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !4			; UNROLL-NEXT: br label [[FOR_END]]
	; UNROLL: for.end:			; UNROLL: for.end:
	; UNROLL-NEXT: ret void			; UNROLL-NEXT: ret void
	;			;
	; UNROLL-NOSIMPLIFY-LABEL: @minimal_bit_widths(			; UNROLL-NOSIMPLIFY-LABEL: @minimal_bit_widths(
	; UNROLL-NOSIMPLIFY-NEXT: entry:			; UNROLL-NOSIMPLIFY-NEXT: entry:
	; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; UNROLL-NOSIMPLIFY: vector.ph:			; UNROLL-NOSIMPLIFY: vector.ph:
	; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[VECTOR_BODY:%.*]]
	[42 unchanged lines hidden]
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; UNROLL-NOSIMPLIFY-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; UNROLL-NOSIMPLIFY-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; UNROLL-NOSIMPLIFY-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; UNROLL-NOSIMPLIFY-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]			; UNROLL-NOSIMPLIFY-NEXT: br label [[FOR_INC]]
	; UNROLL-NOSIMPLIFY: for.inc:			; UNROLL-NOSIMPLIFY: for.inc:
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; UNROLL-NOSIMPLIFY-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; UNROLL-NOSIMPLIFY-NEXT: [[TMP7]] = add i64 [[TMP1]], -1
	; UNROLL-NOSIMPLIFY-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; UNROLL-NOSIMPLIFY-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; UNROLL-NOSIMPLIFY-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !6			; UNROLL-NOSIMPLIFY-NEXT: br i1 true, label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !6
	; UNROLL-NOSIMPLIFY: for.end:			; UNROLL-NOSIMPLIFY: for.end:
	; UNROLL-NOSIMPLIFY-NEXT: ret void			; UNROLL-NOSIMPLIFY-NEXT: ret void
	;			;
	; VEC-LABEL: @minimal_bit_widths(			; VEC-LABEL: @minimal_bit_widths(
	; VEC-NEXT: entry:			; VEC-NEXT: entry:
	; VEC-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <2 x i1> undef, i1 [[C:%.*]], i32 0			; VEC-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <2 x i1> undef, i1 [[C:%.*]], i32 0
	; VEC-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT5]], <2 x i1> undef, <2 x i32> zeroinitializer			; VEC-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <2 x i1> [[BROADCAST_SPLATINSERT5]], <2 x i1> undef, <2 x i32> zeroinitializer
	; VEC-NEXT: br label [[VECTOR_BODY:%.*]]			; VEC-NEXT: br label [[VECTOR_BODY:%.*]]
	[29 unchanged lines hidden]
	; VEC-NEXT: [[TMP12:%.*]] = trunc i32 [[TMP11]] to i8			; VEC-NEXT: [[TMP12:%.*]] = trunc i32 [[TMP11]] to i8
	; VEC-NEXT: [[TMP13:%.*]] = add i64 [[INDEX]], 1			; VEC-NEXT: [[TMP13:%.*]] = add i64 [[INDEX]], 1
	; VEC-NEXT: [[TMP14:%.*]] = getelementptr i8, i8* undef, i64 [[TMP13]]			; VEC-NEXT: [[TMP14:%.*]] = getelementptr i8, i8* undef, i64 [[TMP13]]
	; VEC-NEXT: store i8 [[TMP12]], i8* [[TMP14]], align 1			; VEC-NEXT: store i8 [[TMP12]], i8* [[TMP14]], align 1
	; VEC-NEXT: br label [[PRED_STORE_CONTINUE8]]			; VEC-NEXT: br label [[PRED_STORE_CONTINUE8]]
	; VEC: pred.store.continue8:			; VEC: pred.store.continue8:
	; VEC-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; VEC-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; VEC-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef			; VEC-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], undef
	; VEC-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !4			; VEC-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop !2
	; VEC: middle.block:			; VEC: middle.block:
	; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef			; VEC-NEXT: [[CMP_N:%.*]] = icmp eq i64 undef, undef
	; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; VEC-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; VEC: for.body:			; VEC: for.body:
	; VEC-NEXT: [[TMP0:%.*]] = phi i64 [ [[TMP6:%.*]], [[FOR_INC:%.*]] ], [ undef, [[MIDDLE_BLOCK]] ]			; VEC-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 undef
	; VEC-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP7:%.*]], [[FOR_INC]] ], [ undef, [[MIDDLE_BLOCK]] ]
	; VEC-NEXT: [[TMP2:%.*]] = getelementptr i8, i8* undef, i64 [[TMP0]]
	; VEC-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1			; VEC-NEXT: [[TMP3:%.*]] = load i8, i8* [[TMP2]], align 1
	; VEC-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; VEC-NEXT: br i1 [[C]], label [[IF_THEN:%.*]], label [[FOR_INC:%.*]]
	; VEC: if.then:			; VEC: if.then:
	; VEC-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32			; VEC-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i32
	; VEC-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8			; VEC-NEXT: [[TMP5:%.*]] = trunc i32 [[TMP4]] to i8
	; VEC-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1			; VEC-NEXT: store i8 [[TMP5]], i8* [[TMP2]], align 1
	; VEC-NEXT: br label [[FOR_INC]]			; VEC-NEXT: br label [[FOR_INC]]
	; VEC: for.inc:			; VEC: for.inc:
	; VEC-NEXT: [[TMP6]] = add nuw nsw i64 [[TMP0]], 1			; VEC-NEXT: [[TMP6:%.*]] = add nuw nsw i64 undef, 1
	; VEC-NEXT: [[TMP7]] = add i64 [[TMP1]], -1			; VEC-NEXT: [[TMP7:%.*]] = add i64 undef, -1
	; VEC-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0			; VEC-NEXT: [[TMP8:%.*]] = icmp eq i64 [[TMP7]], 0
	; VEC-NEXT: br i1 [[TMP8]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop !5			; VEC-NEXT: br label [[FOR_END]]
	; VEC: for.end:			; VEC: for.end:
	; VEC-NEXT: ret void			; VEC-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%tmp0 = phi i64 [ %tmp6, %for.inc ], [ 0, %entry ]			%tmp0 = phi i64 [ %tmp6, %for.inc ], [ 0, %entry ]
	[20 unchanged lines hidden]

llvm/test/Transforms/LoopVectorize/pr44547.ll

This file was added.

				; RUN: opt -S -loop-vectorize -simplifycfg -force-vector-width=2 -force-vector-interleave=1 < %s | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				;CHECK-LABEL: @single_iter_remainder(
				define void @single_iter_remainder(i16* noalias nocapture readonly %a, i16* noalias nocapture readonly %b, i16* noalias nocapture %c, i32 %n) {
				entry:
				%cmp7 = icmp eq i32 %n, 0
				br i1 %cmp7, label %for.cond.cleanup, label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				;CHECK: vector.body:
				;CHECK: for.body:
				;CHECK: br label %for.cond.cleanup
				for.body: ; preds = %entry, %for.body
				%i.011 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
				%a.addr.010 = phi i16* [ %incdec.ptr, %for.body ], [ %a, %entry ]
				%c.addr.09 = phi i16* [ %incdec.ptr4, %for.body ], [ %c, %entry ]
				%b.addr.08 = phi i16* [ %incdec.ptr1, %for.body ], [ %b, %entry ]
				%incdec.ptr = getelementptr inbounds i16, i16* %a.addr.010, i64 1
				%0 = load i16, i16* %a.addr.010, align 2
				%incdec.ptr1 = getelementptr inbounds i16, i16* %b.addr.08, i64 1
				%1 = load i16, i16* %b.addr.08, align 2
				%add = add i16 %1, %0
				%incdec.ptr4 = getelementptr inbounds i16, i16* %c.addr.09, i64 1
				store i16 %add, i16* %c.addr.09, align 2
				%inc = add nuw nsw i32 %i.011, 1
				%exitcond = icmp eq i32 %inc, %n
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}
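
The RUN line of this test forces a vector width of 2 and an interleave count of 1, so the vector loop advances by 2 per iteration. As a rough illustration of why the remainder loop's back edge can then be dropped, here is a minimal C++ sketch. It assumes the remainder loop executes exactly `N % (VF * UF)` iterations (the author's inline comment on LoopVectorize.cpp notes this may not hold in every case). The helper names below are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): iterations left for the scalar
// remainder loop, ASSUMING it runs exactly N % (VF * UF) iterations.
constexpr std::uint64_t remainderIterations(std::uint64_t tripCount,
                                            std::uint64_t vf,
                                            std::uint64_t uf) {
  return tripCount % (vf * uf);
}

// Largest remainder possible for a given step, under the same assumption.
constexpr std::uint64_t maxRemainderIterations(std::uint64_t vf,
                                               std::uint64_t uf) {
  return vf * uf - 1;
}

// Hypothetical predicate: if the remainder loop can run at most once,
// its back edge is never taken, so the latch branch can become an
// unconditional branch to the exit block.
constexpr bool remainderBackedgeIsDead(std::uint64_t vf, std::uint64_t uf) {
  return maxRemainderIterations(vf, uf) <= 1;
}
```

With `vf = 2` and `uf = 1`, `maxRemainderIterations` is 1, which matches the `;CHECK: br label %for.cond.cleanup` line expected in `for.body` above. With a step of 4 the remainder may run up to 3 iterations, so the conditional branch has to stay.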
