This is an archive of the discontinued LLVM Phabricator instance.

Differential D93317

[LV] Vectorize (some) early and multiple exit loops
Closed · Public

Authored by reames on Dec 15 2020, 10:38 AM.

Details

Reviewers

Ayal
fhahn
anna

Commits

rGe4df6a40dad6: [LV] Vectorize (some) early and multiple exit loops

Summary

This patch is a major step towards supporting multiple exit loops in the vectorizer. On its own, this patch extends the loop forms allowed in two ways:

single exit loops which are not bottom tested
multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)
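For orientation, the two loop forms might look like the following at the source level. This is only an illustrative sketch: the function names are hypothetical, and whether the vectorizer actually sees these shapes depends on earlier passes (loop rotation, for instance, normally turns the first form into a bottom-tested loop).

```cpp
#include <cstddef>
#include <vector>

// Form 1: a single-exit loop that is not bottom tested. The only exit test
// sits at the top of the body, so the exiting block is not the latch.
void incrementAll(std::vector<int> &A) {
  for (std::size_t I = 0;; ++I) {
    if (I >= A.size())
      break; // sole exit, taken from the header rather than the latch
    A[I] += 1;
  }
}

// Form 2: two exiting blocks that both branch to the same exit block. Both
// exit conditions depend only on the induction variable, so each exit count
// is analyzable, and no SSA value defined in the loop is used after it.
void zeroPrefix(std::vector<int> &A, std::size_t Limit) {
  for (std::size_t I = 0; I < A.size(); ++I) { // exit 1
    if (I == Limit)
      break; // exit 2, an early exit
    A[I] = 0;
  }
}
```

Both functions only write to memory, which matches the "no phis in the exit block" restriction: nothing computed inside the loop is live afterwards.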

The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.

The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.

The existing code already had the notion of requiring one iteration in the scalar epilogue; this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).
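The idea can be emulated by hand in plain C++. The sketch below is not the vectorizer's code: `vectorTripCount`, `scalarLoop` and `vectorizedShape` are hypothetical names, the inner lane loop stands in for one vector instruction, and the trip count is computed directly instead of via SCEV. It assumes a two-exit loop whose exits depend only on the induction variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the n.vec computation: TC - (TC % Step), except that a zero
// remainder is bumped to Step when a scalar iteration is required.
std::size_t vectorTripCount(std::size_t TC, std::size_t Step,
                            bool RequiresScalarEpilogue) {
  assert(Step > 0 && TC >= Step && "minimum iteration check assumed");
  std::size_t R = TC % Step;
  if (RequiresScalarEpilogue && R == 0)
    R = Step;
  return TC - R;
}

// Reference: the original loop with two exits and no live-out values.
void scalarLoop(std::vector<int> &A, std::size_t Limit) {
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (I == Limit)
      break;
    A[I] = 0;
  }
}

// Hand-written shape of the transformed loop for a vector width of VF.
void vectorizedShape(std::vector<int> &A, std::size_t Limit, std::size_t VF) {
  const std::size_t N = A.size();
  const std::size_t BTC = std::min(N, Limit); // backedge-taken count
  const std::size_t TC = BTC + 1;             // header executions
  std::size_t I = 0;
  if (TC >= VF) { // minimum iteration check
    const std::size_t NVec = vectorTripCount(TC, VF, true);
    // Bottom-tested "vector" loop. NVec <= BTC, so no exit can be taken
    // here and the body needs no exit checks.
    for (; I < NVec; I += VF)
      for (std::size_t Lane = 0; Lane < VF; ++Lane)
        A[I + Lane] = 0;
  }
  // Scalar epilogue: runs the remaining iterations, including the exiting
  // one, and decides which exit is taken.
  for (; I < N; ++I) {
    if (I == Limit)
      break;
    A[I] = 0;
  }
}
```

Because the remainder is never zero when a scalar epilogue is required, the vector loop covers at most `TC - 1` iterations, which is exactly the "all but the last iteration" guarantee described above.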

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Dec 15 2020, 10:38 AM

Herald added subscribers: dantrushin, javed.absar, bollu and 2 others. Dec 15 2020, 10:38 AM

reames requested review of this revision. Dec 15 2020, 10:38 AM

Herald added a project: Restricted Project. Dec 15 2020, 10:38 AM

Harbormaster completed remote builds in B82499: Diff 311960. Dec 15 2020, 10:38 AM

reames added a reviewer: anna. Dec 15 2020, 10:39 AM

reames mentioned this in rG99ac8868cfb4: [tests][LV] precommit tests for D93317. Dec 15 2020, 10:55 AM

Rebase over landed tests.

reames edited the summary of this revision. Dec 15 2020, 11:02 AM

Update tests to cover cases where we can't vectorize due to either a) size, or b) predication.

Doing this revealed that the handling of the predicate don't vectorize option is broken in the patch. (We correctly vectorize with a scalar epilogue where the user intent was not to vectorize.) Fix forthcoming.

(Note - comment edited as I'd originally misunderstood scope of work to address tail fold case above)

Rebase over a81db8b31. While that change is NFC, the code structure makes it much easier to disable vectorization when predication is demanded and we can't provide it.

One further simplification enabled by previous rebase.

ping

Nice leverage of requiresScalarEpilogue!
Looks good to me, adding some minor comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1098–1106	To avoid confusing exit/ing block terms: `// We currently must have a single "exit block" after the loop. Note that multiple "exiting blocks" inside the loop are allowed, provided they all reach the single exit block.`
1105	nit: a[n] unique
1121	Clarify failure reason, e.g., "The loop must have no live-out values if it has more than one exiting block" ?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1559	Checking that the only exit is the latch can be done (alternatively, literally) by `if (TheLoop->getExitingBlock() == TheLoop->getLoopLatch())`
3018	To avoid confusing exit/ing block/loop terms: `That is, if the loop contains multiple exiting blocks, or a single exiting block which is not the latch.`
5507–5524	// We can't vectorize anything but a bottom tested loop without a scalar epilogue. Perhaps better stated as: `// The only loops we can vectorize without a scalar epilogue, are loops with a bottom-test and a single exiting block.`
5509	nit: i[t]eration
5512	If predication is preferred over a scalar epilog, but the latter is not forbidden (i.e., the CM_ScalarEpilogueNotNeededUsePredicate case), we could "fallback to a vectorization with a scalar epilogue" here, instead of bailing out, as done below? Can test `if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())`
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll
24	Would it be useful to keep this (single exiting, double latched(?)) test?

Thank you very much for putting up this patch! This looks like a good start.

I think there are some remaining code-gen issues; e.g., something like the example below leads to a verifier failure when built with opt -loop-vectorize -force-vector-width=2. I didn't have time to take a closer look at what might cause the failure yet.

define void @test(float* %addr) {
entry:
  br label %loop.header

loop.header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
  %gep = getelementptr float, float* %addr, i64 %iv
  %exitcond.not = icmp eq i64 %iv, 200
  br i1 %exitcond.not, label %exit, label %loop.body

loop.body:
  %0 = load float, float* %gep, align 4
  br i1 undef, label %loop.latch, label %then

then:
  store float 10.0, float* %gep, align 4
  br label %loop.latch

loop.latch:
  %iv.next = add nuw nsw i64 %iv, 1
  br label %loop.header

exit:
  ret void
}

This revision now requires changes to proceed. Dec 22 2020, 8:56 AM

Ayal, thanks for all the great wordsmith comments!

Florian, I can confirm the crash with your test case, will investigate.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1559	I went ahead and switched since you seem to have a preference, but in general, these are not the same. Consider an infinite loop with two latch blocks. Such a loop can't reach here, but it requires context to know that. The form of the check I used is context free. Doesn't really matter here, so I'll go with your preference.
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll
24	Oh, I'd missed the fact this was a double latch, not just a non exiting latch. I'll add a test for that case in loop-form.ll along with the others and rebase.
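The distinction reames draws at line 1559 can be illustrated with a toy model. `ToyLoop` below is a hypothetical stand-in for `llvm::Loop`, with blocks as integer ids and `NoBlock` playing the role of `nullptr`; it only mimics the "unique block or null" behaviour of `getExitingBlock()` and `getLoopLatch()`. The "context-free" predicate is one possible formulation, since the form originally used in the patch is not shown in this thread.

```cpp
#include <vector>

constexpr int NoBlock = -1; // plays the role of nullptr

// Hypothetical stand-in for llvm::Loop, for illustration only.
struct ToyLoop {
  std::vector<int> ExitingBlocks;
  std::vector<int> Latches;

  // The single exiting block, or NoBlock if there are zero or several.
  int getExitingBlock() const {
    return ExitingBlocks.size() == 1 ? ExitingBlocks[0] : NoBlock;
  }
  // The single latch, or NoBlock if there are several.
  int getLoopLatch() const {
    return Latches.size() == 1 ? Latches[0] : NoBlock;
  }
};

// Shorthand form: compares two "unique or null" results directly.
bool shorthandCheck(const ToyLoop &L) {
  return L.getExitingBlock() == L.getLoopLatch();
}

// A context-free variant: additionally requires that an exit exists.
bool contextFreeCheck(const ToyLoop &L) {
  const int Exiting = L.getExitingBlock();
  return Exiting != NoBlock && Exiting == L.getLoopLatch();
}
```

For an infinite loop with two latches both queries return "null", so the shorthand reports that the only exit is the latch even though the loop has no exit at all. As noted in the thread, such a loop cannot reach this code, so the shorthand is safe in context.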

Incorporate wording/style comments from Ayal, test rebase pending.

reames mentioned this in rGf106b281be24: [tests] precommit a test mentioned in review for D93317. Dec 22 2020, 9:49 AM

JFYI, multiple latch tests added in f106b2, but they don't impact the diff as SCEV can't prove exit counts and thus we don't vectorize.

Now to track down the problem Florian found.

Ok, I see what's going on w/Florian's example. It's an interaction with block predication and early exits. Essentially, block predication expects to be able to add uses of any condition in the loop, and the dead code elimination has eliminated the exit condition in the vector loop. I can restrict the legality slightly to avoid this interaction easily enough, but I want to think a bit first if there's a clean integration.

Incorporate a fix for the issue Florian found. Essentially, we can't both treat single use exit conditions as dead, and allow predication to use them in a mask. Since we know they evaluate to true for the entire vector body, we can simply ignore them when forming edge predicates.

Florian, any other edge cases you can think of? I'd completely missed that one. Thank you for finding it!
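A rough model of that fix, written as standalone C++ and not taken from the patch: `Mask` and `edgeMask` are hypothetical names, and a mask is just one boolean per vector lane. The point is only that the edge leaving an exiting block inherits the block mask unchanged, so no new use of the exit condition is created.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<bool>; // one bit per vector lane

// Mask for the in-loop edge leaving a block. SrcMask is the mask of the
// source block and Cond holds the per-lane branch condition for that edge.
Mask edgeMask(const Mask &SrcMask, const Mask &Cond, bool SrcIsLoopExiting) {
  assert(SrcMask.size() == Cond.size() && "masks must have equal width");
  // Inside the vector body no lane takes a loop exit, so the condition
  // guarding the in-loop successor is known to hold. Returning the block
  // mask as is avoids adding a use of a possibly dead exit condition.
  if (SrcIsLoopExiting)
    return SrcMask;
  Mask Result(SrcMask.size());
  for (std::size_t Lane = 0; Lane < SrcMask.size(); ++Lane)
    Result[Lane] = SrcMask[Lane] && Cond[Lane];
  return Result;
}
```

For an ordinary conditional branch the edge mask is the usual conjunction of block mask and condition; only exiting blocks take the shortcut.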

reames mentioned this in D93725: [LV] Relax assumption that LCSSA implies single entry. Dec 22 2020, 12:23 PM

This looks fine to me, thanks; would be good to get @fhahn approval too.

Wonder if a similar optimization may be applied to the loop unroller as well - discarding all exiting edges (keeping one in the latch only) from a single-latched "countable" loop, whose last iteration (or more) is peeled.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1559	Agreed, this form assumes there is a single latch, which is also implied by the comment, and can be asserted.
7956	Good catch and fix!

In D93317#2468673, @reames wrote:

Incorporate a fix for the issue Florian found. Essentially, we can't both treat single use exit conditions as dead, and allow predication to use them in a mask. Since we know they evaluate to true for the entire vector body, we can simply ignore them when forming edge predicates.

Florian, any other edge cases you can think of? I'd completely missed that one. Thank you for finding it!

I think there are some scenarios when we break some LCSSA PHIs that use PHIs created during SCEV expansion when we generate the runtime checks. After a first glance, it appears like an issue after we add the conditional branch from the middle block. The example below should cause a verifier failure with opt -loop-vectorize -force-vector-width=4. I tried to reduce it a bit, but unfortunately it is still quite ugly.

define void @widget(i16** %arg, i64 %N) local_unnamed_addr {
bb:
  br label %bb1

bb1:                                              ; preds = %bb16, %bb1, %bb
  %tmp = load i16*, i16** %arg, align 8
  %tmp2 = load i16*, i16** %arg, align 8
  br i1 undef, label %bb1, label %bb3

bb3:                                              ; preds = %bb15, %bb1
  br i1 undef, label %bb16, label %bb4

bb4:                                              ; preds = %bb3
  br i1 undef, label %bb5, label %bb15

bb5:                                              ; preds = %bb9, %bb4
  %tmp6 = phi i64 [ %tmp7, %bb9 ], [ %N, %bb4 ]
  %tmp7 = add nsw i64 %tmp6, -1
  %tmp8 = icmp sgt i64 %tmp6, 1
  br i1 %tmp8, label %bb9, label %bb13

bb9:                                              ; preds = %bb5
  %tmp10 = getelementptr inbounds i16, i16* %tmp2, i64 %tmp7
  %tmp11 = load i16, i16* %tmp10, align 2
  %tmp12 = getelementptr inbounds i16, i16* %tmp, i64 0
  store i16 %tmp11, i16* %tmp12, align 2
  br label %bb5

bb13:                                             ; preds = %bb15, %bb5
  %tmp14 = load i16, i16* %tmp, align 2
  ret void

bb15:                                             ; preds = %bb4
  br i1 undef, label %bb13, label %bb3

bb16:                                             ; preds = %bb3
  br label %bb1
}

In D93317#2470271, @fhahn wrote:

In D93317#2468673, @reames wrote:

Florian, any other edge cases you can think of? I'd completely missed that one. Thank you for finding it!

I think there are some scenarios when we break some LCSSA PHIs that use PHIs created during SCEV expansion when we generate the runtime checks. After a first glance, it appears like an issue after we add the conditional branch from the middle block. The example below should cause a verifier failure with opt -loop-vectorize -force-vector-width=4. I tried to reduce it a bit, but unfortunately it is still quite ugly.

I can confirm the failure and will debug. Thank you again for finding a corner case.

Florian, do you mind if I landed this under an off by default flag? I realize we have correctness issues outstanding, but it would be a lot easier to test this, and highlight the fixes one by one if I was working off checked in code. Once we'd worked through everything, I'd enable and remove the flag.

In D93317#2470271, @fhahn wrote:

I think there are some scenarios when we break some LCSSA PHIs that use PHIs created during SCEV expansion when we generate the runtime checks. After a first glance, it appears like an issue after we add the conditional branch from the middle block. The example below should cause a verifier failure with opt -loop-vectorize -force-vector-width=4. I tried to reduce it a bit, but unfortunately it is still quite ugly.

Hmm, if it is reduced a bit further by fusing bb9 into bb5, making the loop bottom-tested, same failure still occurs, regardless of this patch?

I.e., replacing:

bb5:                                              ; preds = %bb9, %bb4
  %tmp6 = phi i64 [ %tmp7, %bb9 ], [ %N, %bb4 ]
  %tmp7 = add nsw i64 %tmp6, -1
  %tmp8 = icmp sgt i64 %tmp6, 1
  br i1 %tmp8, label %bb9, label %bb13

bb9:                                              ; preds = %bb5
  %tmp10 = getelementptr inbounds i16, i16* %tmp2, i64 %tmp7
  %tmp11 = load i16, i16* %tmp10, align 2
  %tmp12 = getelementptr inbounds i16, i16* %tmp, i64 0
  store i16 %tmp11, i16* %tmp12, align 2
  br label %bb5

with:

bb5:                                              ; preds = %bb5, %bb4
  %tmp6 = phi i64 [ %tmp7, %bb5 ], [ %N, %bb4 ]
  %tmp7 = add nsw i64 %tmp6, -1
  %tmp8 = icmp sgt i64 %tmp6, 1
  %tmp10 = getelementptr inbounds i16, i16* %tmp2, i64 %tmp7
  %tmp11 = load i16, i16* %tmp10, align 2
  %tmp12 = getelementptr inbounds i16, i16* %tmp, i64 0
  store i16 %tmp11, i16* %tmp12, align 2
  br i1 %tmp8, label %bb5, label %bb13

In D93317#2470271, @fhahn wrote:

I can confirm the failure and will debug. Thank you again for finding a corner case.

Florian, do you mind if I landed this under an off by default flag? I realize we have correctness issues outstanding, but it would be a lot easier to test this, and highlight the fixes one by one if I was working off checked in code. Once we'd worked through everything, I'd enable and remove the flag.

IMO it would be preferable to land this without a flag, so we ensure this gets wide testing and we can flush out & fix the remaining issues early. I think it is almost there. I'll look into the issue over the next few days and I suggest we circle back after the holiday weekend and either land the patch together with a fix or behind a flag.

In D93317#2471303, @Ayal wrote:

Hmm, if it is reduced a bit further by fusing bb9 into bb5, making the loop bottom-tested, same failure still occurs, regardless of this patch?

Thanks Ayal! I had a suspicion that this crash only got surfaced when using the patch, as the code related to setting up the middle block & co is not really touched by this patch. I'll try to take a look over the next few days.

fhahn mentioned this in rG0ea3749b3cde: [LV] Set up branch from middle block earlier. Dec 27 2020, 10:21 AM

LGTM, thanks! I pushed a small fix for the crash discussed earlier in 0ea3749b3cde. It should be a very safe/straight-forward fix, but I would appreciate taking a look post-review.

I also did some additional testing with loop rotation disabled, which should stress test parts of the multi-exit support, on SPEC2006 & MultiSource. That didn't surface any further issues, so I think it should be fine to land this without a flag for now.

There appears to be some trailing whitespace in the diff; it would be good to re-format before landing.

This revision is now accepted and ready to land. Dec 27 2020, 10:25 AM

Closed by commit rGe4df6a40dad6: [LV] Vectorize (some) early and multiple exit loops (authored by reames). Dec 28 2020, 9:41 AM

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGe4df6a40dad6: [LV] Vectorize (some) early and multiple exit loops.

Ayal, Florian, thank you for the efforts on this review. I really appreciate the active review.

As an aside: I realized that supporting tail folding is actually a lot easier than I thought. I didn't do it in this patch, but a future patch just needs to a) not clamp the iteration space if tail folding, and b) use the exit conditions to form predicate masks. Amusingly, it would have been easier to start with tail folding from the beginning if I'd realized that. :)
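The tail-folding idea is only described in prose, so the following is a speculative emulation and not code from any patch. The names `scalarLoop` and `tailFoldedShape` are hypothetical, the inner lane loop stands in for one masked vector instruction, and the example reuses a two-exit loop whose exits depend only on the induction variable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference: the original loop with two exits and no live-out values.
void scalarLoop(std::vector<int> &A, std::size_t Limit) {
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (I == Limit)
      break;
    A[I] = 0;
  }
}

// Emulated tail-folded form for a vector width of VF.
void tailFoldedShape(std::vector<int> &A, std::size_t Limit, std::size_t VF) {
  const std::size_t N = A.size();
  const std::size_t TC = std::min(N, Limit) + 1; // header executions
  // (a) The iteration space is not clamped: the vector loop covers all TC
  //     iterations, rounded up to a multiple of VF.
  for (std::size_t I = 0; I < TC; I += VF) {
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      const std::size_t Idx = I + Lane;
      const bool Active = Idx < TC; // ordinary tail-folding mask
      // (b) Exit conditions become predicates: the store runs only in
      //     lanes where neither exit would have been taken before it.
      const bool StoreMask = Active && Idx < N && Idx != Limit;
      if (StoreMask)
        A[Idx] = 0;
    }
  }
}
```

No scalar epilogue is needed in this shape; the lanes belonging to the exiting iteration are simply masked off for everything that follows a taken exit.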

The next patch in this series is D93725.

aeubanks added a subscriber: aeubanks. Dec 28 2020, 10:08 AM

aeubanks added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1120	This doesn't compile on Windows: http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio, I've reverted this change.

aeubanks added a reverting change: rG4ffcd4fe9ac2: Revert "[LV] Vectorize (some) early and multiple exit loops". Dec 28 2020, 10:09 AM

fhahn added inline comments. Jan 1 2021, 6:02 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1120	I missed this during the initial review, but we might dereference a nullptr here, because when `DoExtraAnalysis` is true, `ExitBB` may be null. I pushed d9f306aa52fe to only execute the checks in the `else` branch of the `!ExitBB` check.
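The control-flow shape of that follow-up fix can be sketched with toy types. This is an assumption-laden model, not the contents of d9f306aa52fe: `ToyBlock` and `checkExitBlock` are hypothetical, and remarks are collected in a vector instead of going through `reportVectorizationFailure`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the exit block; only the two queried facts.
struct ToyBlock {
  bool HasPhis;
  bool HasSinglePredecessor;
};

// Returns whether the exit structure is acceptable. With DoExtraAnalysis
// set, analysis continues after a failure so further remarks are gathered.
bool checkExitBlock(const ToyBlock *ExitBB, bool DoExtraAnalysis,
                    std::vector<std::string> &Remarks) {
  bool Result = true;
  if (!ExitBB) {
    Remarks.push_back("The loop must have a unique exit block");
    if (DoExtraAnalysis)
      Result = false;
    else
      return false;
  } else if (ExitBB->HasPhis && !ExitBB->HasSinglePredecessor) {
    // Only reached when ExitBB is non-null, so the queries are safe.
    Remarks.push_back("The loop must have no live-out values if it has "
                      "more than one exiting block");
    if (DoExtraAnalysis)
      Result = false;
    else
      return false;
  }
  return Result;
}
```

Placing the second check in the `else` branch means a missing exit block with `DoExtraAnalysis` enabled records one remark and falls through without touching the null pointer.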

In D93317#2472174, @fhahn wrote:

LGTM, thanks! I pushed a small fix for the crash discussed earlier in 0ea3749b3cde. It should be a very safe/straight-forward fix, but I would appreciate taking a look post-review.

Thanks, good catch! Added a couple of post-commit nits.

reames mentioned this in rG9f61fbd75ae1: [LV] Relax assumption that LCSSA implies single entry. Jan 12 2021, 12:35 PM

Ayal mentioned this in D103700: [LV] Fix bug when unrolling (only) a loop with non-latch exit. Jun 6 2021, 11:55 AM

Revision Contents

Path                                                          Size
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp   25 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp               59 lines
llvm/test/Transforms/LoopVectorize/control-flow.ll            2 lines
llvm/test/Transforms/LoopVectorize/loop-form.ll               455 lines
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll    23 lines

Diff 313868

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

@@ (1,089 lines not shown) @@ reportVectorizationFailure("The loop must have a single backedge",
         "loop control flow is not understood by vectorizer",
         "CFGNotUnderstood", ORE, TheLoop);
     if (DoExtraAnalysis)
       Result = false;
     else
       return false;
   }
 
-  // We must have a single exiting block.
-  if (!Lp->getExitingBlock()) {
-    reportVectorizationFailure("The loop must have an exiting block",
+  // We currently must have a single "exit block" after the loop. Note that
+  // multiple "exiting blocks" inside the loop are allowed, provided they all
+  // reach the single exit block.
+  // TODO: This restriction can be relaxed in the near future, it's here solely
+  // to allow separation of changes for review. We need to generalize the phi
+  // update logic in a number of places.
+  BasicBlock *ExitBB = Lp->getUniqueExitBlock();
+  if (!ExitBB) {
+    reportVectorizationFailure("The loop must have a unique exit block",
         "loop control flow is not understood by vectorizer",
         "CFGNotUnderstood", ORE, TheLoop);
     if (DoExtraAnalysis)
       Result = false;
     else
       return false;
   }
 
-  // We only handle bottom-tested loops, i.e. loop in which the condition is
-  // checked at the end of each iteration. With that we can assume that all
-  // instructions in the loop are executed the same number of times.
-  if (Lp->getExitingBlock() != Lp->getLoopLatch()) {
-    reportVectorizationFailure("The exiting block is not the loop latch",
+  // The existing code assumes that LCSSA implies that phis are single entry
+  // (which was true when we had at most a single exiting edge from the latch).
+  // In general, there's nothing which prevents an LCSSA phi in exit block from
+  // having two or more values if there are multiple exiting edges leading to
+  // the exit block. (TODO: implement general case)
+  if (!empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) {
+    reportVectorizationFailure("The loop must have no live-out values if "
+                               "it has more than one exiting block",
         "loop control flow is not understood by vectorizer",
         "CFGNotUnderstood", ORE, TheLoop);
     if (DoExtraAnalysis)
       Result = false;
     else
       return false;
   }
 
@@ (177 lines not shown) @@

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ (831 lines not shown) @@ protected:
   BasicBlock *LoopVectorPreHeader;
 
   /// The scalar-loop preheader.
   BasicBlock *LoopScalarPreHeader;
 
   /// Middle Block between the vector and the scalar.
   BasicBlock *LoopMiddleBlock;
 
-  /// The ExitBlock of the scalar loop.
+  /// The (unique) ExitBlock of the scalar loop. Note that
+  /// there can be multiple exiting edges reaching this block.
   BasicBlock *LoopExitBlock;
 
   /// The vector loop body.
   BasicBlock *LoopVectorBody;
 
   /// The scalar loop body.
   BasicBlock *LoopScalarBody;
 

@@ (694 lines not shown) @@ public:
   }
 
   /// Get the interleaved access group that \p Instr belongs to.
   const InterleaveGroup<Instruction> *
   getInterleavedAccessGroup(Instruction *Instr) {
     return InterleaveInfo.getInterleaveGroup(Instr);
   }
 
-  /// Returns true if an interleaved group requires a scalar iteration
-  /// to handle accesses with gaps, and there is nothing preventing us from
-  /// creating a scalar epilogue.
+  /// Returns true if we're required to use a scalar epilogue for at least
+  /// the final iteration of the original loop.
   bool requiresScalarEpilogue() const {
-    return isScalarEpilogueAllowed() && InterleaveInfo.requiresScalarEpilogue();
+    if (!isScalarEpilogueAllowed())
+      return false;
+    // If we might exit from anywhere but the latch, must run the exiting
+    // iteration in scalar form.
+    if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
+      return true;
+    return InterleaveInfo.requiresScalarEpilogue();
   }
 
   /// Returns true if a scalar epilogue is not allowed due to optsize or a
   /// loop hint annotation.
   bool isScalarEpilogueAllowed() const {
     return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed;
   }

@@ (1,343 lines not shown) @@ PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
   setDebugLocFromInst(Builder, OldInst);
 
   // Create i+1 and fill the PHINode.
   Value *Next = Builder.CreateAdd(Induction, Step, "index.next");
   Induction->addIncoming(Start, L->getLoopPreheader());
   Induction->addIncoming(Next, Latch);
   // Create the compare.
   Value *ICmp = Builder.CreateICmpEQ(Next, End);
-  Builder.CreateCondBr(ICmp, L->getExitBlock(), Header);
+  Builder.CreateCondBr(ICmp, L->getUniqueExitBlock(), Header);
 
   // Now we have two terminators. Remove the old one from the block.
   Latch->getTerminator()->eraseFromParent();
 
   return Induction;
 }
 
 Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) {
@@ (71 lines not shown) @@ Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {
 
   // Now we need to generate the expression for the part of the loop that the
   // vectorized body will execute. This is equal to N - (N % Step) if scalar
   // iterations are not required for correctness, or N - Step, otherwise. Step
   // is equal to the vectorization factor (number of SIMD elements) times the
   // unroll factor (number of SIMD instructions).
   Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");
 
-  // If there is a non-reversed interleaved group that may speculatively access
-  // memory out-of-bounds, we need to ensure that there will be at least one
-  // iteration of the scalar epilogue loop. Thus, if the step evenly divides
+  // There are two cases where we need to ensure (at least) the last iteration
+  // runs in the scalar remainder loop. Thus, if the step evenly divides
   // the trip count, we set the remainder to be equal to the step. If the step
   // does not evenly divide the trip count, no adjustment is necessary since
   // there will already be scalar iterations. Note that the minimum iterations
-  // check ensures that N >= Step.
+  // check ensures that N >= Step. The cases are:
+  // 1) If there is a non-reversed interleaved group that may speculatively
+  //    access memory out-of-bounds.
+  // 2) If any instruction may follow a conditionally taken exit. That is, if
+  //    the loop contains multiple exiting blocks, or a single exiting block
+  //    which is not the latch.
   if (VF.isVector() && Cost->requiresScalarEpilogue()) {
     auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
     R = Builder.CreateSelect(IsZero, Step, R);
   }
 
   VectorTripCount = Builder.CreateSub(TC, R, "n.vec");
 
   return VectorTripCount;
@@ (278 lines not shown) @@ case InductionDescriptor::IK_NoInduction:
     return nullptr;
   }
   llvm_unreachable("invalid enum");
 }
 
 Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
   LoopScalarBody = OrigLoop->getHeader();
   LoopVectorPreHeader = OrigLoop->getLoopPreheader();
-  LoopExitBlock = OrigLoop->getExitBlock();
+  LoopExitBlock = OrigLoop->getUniqueExitBlock();
   assert(LoopExitBlock && "Must have an exit block");
   assert(LoopVectorPreHeader && "Invalid loop structure");
 
   LoopMiddleBlock =
       SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
                  LI, nullptr, Twine(Prefix) + "middle.block");
   LoopScalarPreHeader =
       SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
@@ (254 lines not shown) @@ void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
                                        const InductionDescriptor &II,
                                        Value *CountRoundDown, Value *EndValue,
                                        BasicBlock *MiddleBlock) {
   // There are two kinds of external IV usages - those that use the value
   // computed in the last iteration (the PHI) and those that use the penultimate
   // value (the value that feeds into the phi from the loop latch).
   // We allow both, but they, obviously, have different values.
 
-  assert(OrigLoop->getExitBlock() && "Expected a single exit block");
+  assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");
 
   DenseMap<Value *, Value *> MissingVals;
 
   // An external user of the last iteration's value should see the value that
   // the remainder loop uses to initialize its own IV.
   Value *PostInc = OrigPhi->getIncomingValueForBlock(OrigLoop->getLoopLatch());
   for (User *U : PostInc->users()) {
     Instruction *UI = cast<Instruction>(U);
// ... (1,901 lines not shown) ...
case CM_ScalarEpilogueNotAllowedOptSize:
else		else
LLVM_DEBUG(dbgs() << "LV: Not allowing scalar epilogue due to low trip "		LLVM_DEBUG(dbgs() << "LV: Not allowing scalar epilogue due to low trip "
<< "count.\n");		<< "count.\n");

// Bail if runtime checks are required, which are not good when optimising		// Bail if runtime checks are required, which are not good when optimising
// for size.		// for size.
if (runtimeChecksRequired())		if (runtimeChecksRequired())
return None;		return None;

break;		break;
}		}

		// The only loops we can vectorize without a scalar epilogue, are loops with
		// a bottom-test and a single exiting block. We'd have to handle the fact
		// that not every instruction executes on the last iteration. This will
		Ayal (inline comment, done): nit: i[t]eration
		// require a lane mask which varies through the vector loop body. (TODO)
		if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
		// If there was a tail-folding hint/switch, but we can't fold the tail by
		Ayal (inline comment, done): If predication is preferred over a scalar epilog, but the latter is not forbidden (i.e., the CM_ScalarEpilogueNotNeededUsePredicate case), we could "fallback to a vectorization with a scalar epilogue" here, instead of bailing out, as done below? Can test `if (TheLoop->getExitingBlock() != TheLoop->getLatchBlock())`
		// masking, fallback to a vectorization with a scalar epilogue.
		if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
		LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
		"scalar epilogue instead.\n");
		ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
		return MaxVF;
		}
		return None;
		}

// Now try the tail folding		// Now try the tail folding

		Ayal (inline comment, done): > // We can't vectorize anything but a bottom tested loop without a scalar epilogue.
		Perhaps better stated as: `// The only loops we can vectorize without a scalar epilogue, are loops with a bottom-test and a single exiting block.`
// Invalidate interleave groups that require an epilogue if we can't mask		// Invalidate interleave groups that require an epilogue if we can't mask
// the interleave-group.		// the interleave-group.
if (!useMaskedInterleavedAccesses(TTI)) {		if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
// Note: There is no need to invalidate any cost modeling decisions here, as		// Note: There is no need to invalidate any cost modeling decisions here, as
// non where taken so far.		// non where taken so far.
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();		InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
// ... (2,410 lines not shown) ...
VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst,

// The terminator has to be a branch inst!		// The terminator has to be a branch inst!
BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());		BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
assert(BI && "Unexpected terminator found");		assert(BI && "Unexpected terminator found");

if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))		if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))
return EdgeMaskCache[Edge] = SrcMask;		return EdgeMaskCache[Edge] = SrcMask;

		// If source is an exiting block, we know the exit edge is dynamically dead
		// in the vector loop, and thus we don't need to restrict the mask. Avoid
		// adding uses of an otherwise potentially dead instruction.
		if (OrigLoop->isLoopExiting(Src))
		return EdgeMaskCache[Edge] = SrcMask;

		Ayal (inline comment): Good catch and fix!
VPValue *EdgeMask = Plan->getOrAddVPValue(BI->getCondition());		VPValue *EdgeMask = Plan->getOrAddVPValue(BI->getCondition());
assert(EdgeMask && "No Edge Mask found for condition");		assert(EdgeMask && "No Edge Mask found for condition");

if (BI->getSuccessor(0) != Dst)		if (BI->getSuccessor(0) != Dst)
EdgeMask = Builder.createNot(EdgeMask);		EdgeMask = Builder.createNot(EdgeMask);

if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.		if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.
EdgeMask = Builder.createAnd(EdgeMask, SrcMask);		EdgeMask = Builder.createAnd(EdgeMask, SrcMask);
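
The edge-mask rule in this hunk can be modelled outside LLVM. The following is a minimal standalone sketch, not the vectorizer's code: `Mask`, `edgeMask`, and the boolean parameters are hypothetical names, and an empty optional stands in for the all-ones mask (the null `SrcMask` case above).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: one bool per vector lane; std::nullopt means "all lanes on".
using Mask = std::optional<std::vector<bool>>;

// Computes the mask for the edge Src -> Dst in this simplified model.
Mask edgeMask(const Mask &srcMask, const std::vector<bool> &cond,
              bool dstIsTrueSuccessor, bool isConditional, bool srcIsExiting) {
  // Unconditional branch, or an exiting block whose exit edge is never taken
  // inside the vector loop: the edge mask is just the block's own mask.
  if (!isConditional || srcIsExiting)
    return srcMask;

  std::vector<bool> result(cond.size());
  for (std::size_t lane = 0; lane < cond.size(); ++lane) {
    // Negate the condition when Dst is the false successor.
    bool on = dstIsTrueSuccessor ? cond[lane] : !cond[lane];
    // AND with the block mask unless it is all-ones.
    if (srcMask)
      on = on && (*srcMask)[lane];
    result[lane] = on;
  }
  return result;
}
```

In this model an exiting source block never reads `cond`, which mirrors the stated goal of not adding uses of an otherwise dead exit condition.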
// ... (1,499 lines not shown) ...

llvm/test/Transforms/LoopVectorize/control-flow.ll

	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S -pass-remarks-missed='loop-vectorize' 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S -pass-remarks-missed='loop-vectorize' 2>&1 | FileCheck %s

	; C/C++ code for control flow test			; C/C++ code for control flow test
	; int test(int *A, int Length) {			; int test(int *A, int Length) {
	; for (int i = 0; i < Length; i++) {			; for (int i = 0; i < Length; i++) {
	; if (A[i] > 10.0) goto end;			; if (A[i] > 10.0) goto end;
	; A[i] = 0;			; A[i] = 0;
	; }			; }
	; end:			; end:
	; return 0;			; return 0;
	; }			; }

	; CHECK: remark: source.cpp:5:9: loop not vectorized: loop control flow is not understood by vectorizer			; CHECK: remark: source.cpp:5:9: loop not vectorized: could not determine number of loop iterations
	; CHECK: remark: source.cpp:5:9: loop not vectorized			; CHECK: remark: source.cpp:5:9: loop not vectorized

	; CHECK: _Z4testPii			; CHECK: _Z4testPii
	; CHECK-NOT: x i32>			; CHECK-NOT: x i32>
	; CHECK: ret			; CHECK: ret

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	; ... (56 lines not shown) ...

llvm/test/Transforms/LoopVectorize/loop-form.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -loop-vectorize -force-vector-width=2 < %s | FileCheck %s			; RUN: opt -S -loop-vectorize -force-vector-width=2 < %s | FileCheck %s
				; RUN: opt -S -loop-vectorize -force-vector-width=2 -prefer-predicate-over-epilogue=predicate-dont-vectorize < %s | FileCheck --check-prefix TAILFOLD %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	define void @bottom_tested(i16* %p, i32 %n) {			define void @bottom_tested(i16* %p, i32 %n) {
	; CHECK-LABEL: @bottom_tested(			; CHECK-LABEL: @bottom_tested(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0			; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0			; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = add nuw i32 [[SMAX]], 1			; CHECK-NEXT: [[TMP1:%.*]] = add nuw i32 [[SMAX]], 1
	; ... (26 lines not shown) ...
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
				; TAILFOLD-LABEL: @bottom_tested(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
				; TAILFOLD-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
				; TAILFOLD-NEXT: [[TMP1:%.*]] = add nuw i32 [[SMAX]], 1
				; TAILFOLD-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; TAILFOLD: vector.ph:
				; TAILFOLD-NEXT: [[N_RND_UP:%.*]] = add i32 [[TMP1]], 1
				; TAILFOLD-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[N_RND_UP]], 2
				; TAILFOLD-NEXT: [[N_VEC:%.*]] = sub i32 [[N_RND_UP]], [[N_MOD_VF]]
				; TAILFOLD-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i32 [[TMP1]], 1
				; TAILFOLD-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> undef, i32 [[TRIP_COUNT_MINUS_1]], i32 0
				; TAILFOLD-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
				; TAILFOLD-NEXT: br label [[VECTOR_BODY:%.*]]
				; TAILFOLD: vector.body:
				; TAILFOLD-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
				; TAILFOLD-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE2]] ]
				; TAILFOLD-NEXT: [[TMP2:%.*]] = add i32 [[INDEX]], 0
				; TAILFOLD-NEXT: [[TMP3:%.*]] = add i32 [[INDEX]], 1
				; TAILFOLD-NEXT: [[TMP4:%.*]] = icmp ule <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
				; TAILFOLD-NEXT: [[TMP5:%.*]] = sext <2 x i32> [[VEC_IND]] to <2 x i64>
				; TAILFOLD-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP4]], i32 0
				; TAILFOLD-NEXT: br i1 [[TMP6]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; TAILFOLD: pred.store.if:
				; TAILFOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
				; TAILFOLD-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP7]]
				; TAILFOLD-NEXT: store i16 0, i16* [[TMP8]], align 4
				; TAILFOLD-NEXT: br label [[PRED_STORE_CONTINUE]]
				; TAILFOLD: pred.store.continue:
				; TAILFOLD-NEXT: [[TMP9:%.*]] = extractelement <2 x i1> [[TMP4]], i32 1
				; TAILFOLD-NEXT: br i1 [[TMP9]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2]]
				; TAILFOLD: pred.store.if1:
				; TAILFOLD-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
				; TAILFOLD-NEXT: [[TMP11:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[TMP10]]
				; TAILFOLD-NEXT: store i16 0, i16* [[TMP11]], align 4
				; TAILFOLD-NEXT: br label [[PRED_STORE_CONTINUE2]]
				; TAILFOLD: pred.store.continue2:
				; TAILFOLD-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
				; TAILFOLD-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
				; TAILFOLD-NEXT: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; TAILFOLD-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
				; TAILFOLD: middle.block:
				; TAILFOLD-NEXT: br i1 true, label [[IF_END:%.*]], label [[SCALAR_PH]]
				; TAILFOLD: scalar.ph:
				; TAILFOLD-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_COND]] ]
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_COND]], label [[IF_END]], [[LOOP2:!llvm.loop !.*]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: ret void
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
	%iprom = sext i32 %i to i64			%iprom = sext i32 %i to i64
	%b = getelementptr inbounds i16, i16* %p, i64 %iprom			%b = getelementptr inbounds i16, i16* %p, i64 %iprom
	store i16 0, i16* %b, align 4			store i16 0, i16* %b, align 4
	%inc = add nsw i32 %i, 1			%inc = add nsw i32 %i, 1
	%cmp = icmp slt i32 %i, %n			%cmp = icmp slt i32 %i, %n
	br i1 %cmp, label %for.cond, label %if.end			br i1 %cmp, label %for.cond, label %if.end

	if.end:			if.end:
	ret void			ret void
	}			}

	define void @early_exit(i16* %p, i32 %n) {			define void @early_exit(i16* %p, i32 %n) {
	; CHECK-LABEL: @early_exit(			; CHECK-LABEL: @early_exit(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
				; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = add nuw i32 [[SMAX]], 1
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP1]], 2
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 2
				; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
				; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i32 2, i32 [[N_MOD_VF]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP1]], [[TMP3]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[INDEX]], 1
				; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP4]] to i64
				; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP6]]
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[TMP7]], i32 0
				; CHECK-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP8]] to <2 x i16>*
				; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP9]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
				; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_COND:%.*]]
				; CHECK: for.cond:
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
				; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
				; CHECK-NEXT: store i16 0, i16* [[B]], align 4
				; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; CHECK-NEXT: br label [[FOR_COND]], [[LOOP5:!llvm.loop !.*]]
				; CHECK: if.end:
				; CHECK-NEXT: ret void
				;
				; TAILFOLD-LABEL: @early_exit(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: br label [[FOR_COND]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: ret void
				;
				entry:
				br label %for.cond

				for.cond:
				%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%cmp = icmp slt i32 %i, %n
				br i1 %cmp, label %for.body, label %if.end

				for.body:
				%iprom = sext i32 %i to i64
				%b = getelementptr inbounds i16, i16* %p, i64 %iprom
				store i16 0, i16* %b, align 4
				%inc = add nsw i32 %i, 1
				br label %for.cond

				if.end:
				ret void
				}
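
The `vector.ph` checks for `@early_exit` above (`urem`, `icmp eq ..., 0`, `select`, `sub`) compute a vector trip count that always leaves at least one iteration for the scalar loop. A minimal standalone sketch of that arithmetic follows; the function name and parameters are hypothetical and this is not the vectorizer's own code.

```cpp
#include <cassert>
#include <cstdint>

// tripCount: the count held in [[TMP1]] in the checks above.
// vf: the vector width (2 in this test).
// Assumes tripCount > vf, which the min.iters.check branch guarantees.
uint32_t vectorTripCount(uint32_t tripCount, uint32_t vf,
                         bool requiresScalarEpilogue) {
  uint32_t rem = tripCount % vf;            // [[N_MOD_VF]]
  // When a scalar epilogue is required, a zero remainder is bumped to a
  // full vector step so the scalar loop runs at least the last iteration.
  if (requiresScalarEpilogue && rem == 0)
    rem = vf;                               // the select on [[TMP2]]
  return tripCount - rem;                   // [[N_VEC]]
}
```

With `vf == 2`, a count of 8 yields a vector trip count of 6 when the epilogue is required and 8 when it is not; a count of 7 yields 6 either way.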

				; Same as early_exit, but with optsize to prevent the use of
				; a scalar epilogue. -- Can't vectorize this in either case.
				define void @optsize(i16* %p, i32 %n) optsize {
				; CHECK-LABEL: @optsize(
				; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: br label [[FOR_COND]]			; CHECK-NEXT: br label [[FOR_COND]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
				; TAILFOLD-LABEL: @optsize(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: br label [[FOR_COND]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: ret void
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i, %n			%cmp = icmp slt i32 %i, %n
	br i1 %cmp, label %for.body, label %if.end			br i1 %cmp, label %for.body, label %if.end

	for.body:			for.body:
	%iprom = sext i32 %i to i64			%iprom = sext i32 %i to i64
	%b = getelementptr inbounds i16, i16* %p, i64 %iprom			%b = getelementptr inbounds i16, i16* %p, i64 %iprom
	store i16 0, i16* %b, align 4			store i16 0, i16* %b, align 4
	%inc = add nsw i32 %i, 1			%inc = add nsw i32 %i, 1
	br label %for.cond			br label %for.cond

	if.end:			if.end:
	ret void			ret void
	}			}
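
Why `@optsize` stays scalar under both RUN lines follows from the new check in the cost model: a loop that does not exit only from its latch needs a scalar epilogue, and `optsize` forbids one. The sketch below is a simplified, standalone model of that decision; the enum and function names are hypothetical stand-ins for the `CM_ScalarEpilogue*` states in the diff, and the tail-folding path is reduced to returning the candidate VF.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirror of the cost model's scalar-epilogue states.
enum class EpilogueStatus {
  Allowed,
  NotAllowedOptSize,
  NotNeededUsePredicate,
  NotAllowedUsePredicate
};

// Returns the VF to try, or std::nullopt to give up on vectorizing.
std::optional<unsigned> pickVF(bool exitsOnlyAtLatch, EpilogueStatus &status,
                               unsigned maxVF) {
  if (status == EpilogueStatus::Allowed)
    return maxVF;
  // No scalar epilogue: only bottom-tested, single-exiting-block loops qualify.
  if (!exitsOnlyAtLatch) {
    // Predication was only a preference, so fall back to an epilogue.
    if (status == EpilogueStatus::NotNeededUsePredicate) {
      status = EpilogueStatus::Allowed;
      return maxVF;
    }
    return std::nullopt;
  }
  return maxVF; // Tail folding would be attempted here (details omitted).
}
```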


	; multiple exit - no values inside the loop used outside			; multiple exit - no values inside the loop used outside
	define void @multiple_unique_exit(i16* %p, i32 %n) {			define void @multiple_unique_exit(i16* %p, i32 %n) {
	; CHECK-LABEL: @multiple_unique_exit(			; CHECK-LABEL: @multiple_unique_exit(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
				; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[SMAX]], 2096
				; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP1]], i32 [[SMAX]], i32 2096
				; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i32 [[UMIN]], 1
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP2]], 2
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP2]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
				; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 2, i32 [[N_MOD_VF]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP2]], [[TMP4]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[INDEX]], 0
				; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[INDEX]], 1
				; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[TMP5]] to i64
				; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0
				; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*
				; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
				; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP7:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
				; TAILFOLD-LABEL: @multiple_unique_exit(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
				; TAILFOLD-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: ret void
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i, %n			%cmp = icmp slt i32 %i, %n
	br i1 %cmp, label %for.body, label %if.end			br i1 %cmp, label %for.body, label %if.end

	; ... (24 lines not shown) ...
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ], [ [[I]], [[FOR_COND]] ]			; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ], [ [[I]], [[FOR_COND]] ]
	; CHECK-NEXT: ret i32 [[I_LCSSA]]			; CHECK-NEXT: ret i32 [[I_LCSSA]]
	;			;
				; TAILFOLD-LABEL: @multiple_unique_exit2(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
				; TAILFOLD-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ], [ [[I]], [[FOR_COND]] ]
				; TAILFOLD-NEXT: ret i32 [[I_LCSSA]]
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i, %n			%cmp = icmp slt i32 %i, %n
	br i1 %cmp, label %for.body, label %if.end			br i1 %cmp, label %for.body, label %if.end

	; ... (24 lines not shown) ...
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0, [[FOR_COND]] ], [ 1, [[FOR_BODY]] ]			; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0, [[FOR_COND]] ], [ 1, [[FOR_BODY]] ]
	; CHECK-NEXT: ret i32 [[EXIT]]			; CHECK-NEXT: ret i32 [[EXIT]]
	;			;
				; TAILFOLD-LABEL: @multiple_unique_exit3(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
				; TAILFOLD-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: [[EXIT:%.*]] = phi i32 [ 0, [[FOR_COND]] ], [ 1, [[FOR_BODY]] ]
				; TAILFOLD-NEXT: ret i32 [[EXIT]]
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i, %n			%cmp = icmp slt i32 %i, %n
	br i1 %cmp, label %for.body, label %if.end			br i1 %cmp, label %for.body, label %if.end

	; ... (26 lines not shown) ...
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	; CHECK: if.end2:			; CHECK: if.end2:
	; CHECK-NEXT: ret i32 1			; CHECK-NEXT: ret i32 1
	;			;
				; TAILFOLD-LABEL: @multiple_exit_blocks(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
				; TAILFOLD-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END2:%.*]]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: ret i32 0
				; TAILFOLD: if.end2:
				; TAILFOLD-NEXT: ret i32 1
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%cmp = icmp slt i32 %i, %n			%cmp = icmp slt i32 %i, %n
	br i1 %cmp, label %for.body, label %if.end			br i1 %cmp, label %for.body, label %if.end

	; ... (27 lines not shown) ...
	; CHECK-NEXT: switch i32 [[I]], label [[FOR_COND]] [			; CHECK-NEXT: switch i32 [[I]], label [[FOR_COND]] [
	; CHECK-NEXT: i32 2096, label [[IF_END:%.*]]			; CHECK-NEXT: i32 2096, label [[IF_END:%.*]]
	; CHECK-NEXT: i32 2097, label [[IF_END]]			; CHECK-NEXT: i32 2097, label [[IF_END]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ], [ [[I]], [[FOR_COND]] ]			; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ], [ [[I]], [[FOR_COND]] ]
	; CHECK-NEXT: ret i32 [[I_LCSSA]]			; CHECK-NEXT: ret i32 [[I_LCSSA]]
	;			;
				; TAILFOLD-LABEL: @multiple_exit_switch(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND]] ]
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: switch i32 [[I]], label [[FOR_COND]] [
				; TAILFOLD-NEXT: i32 2096, label [[IF_END:%.*]]
				; TAILFOLD-NEXT: i32 2097, label [[IF_END]]
				; TAILFOLD-NEXT: ]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_COND]] ], [ [[I]], [[FOR_COND]] ]
				; TAILFOLD-NEXT: ret i32 [[I_LCSSA]]
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
	%iprom = sext i32 %i to i64			%iprom = sext i32 %i to i64
	%b = getelementptr inbounds i16, i16* %p, i64 %iprom			%b = getelementptr inbounds i16, i16* %p, i64 %iprom
	store i16 0, i16* %b, align 4			store i16 0, i16* %b, align 4
	(23 lines not shown)
	; CHECK-NEXT: i32 2096, label [[IF_END:%.*]]			; CHECK-NEXT: i32 2096, label [[IF_END:%.*]]
	; CHECK-NEXT: i32 2097, label [[IF_END2:%.*]]			; CHECK-NEXT: i32 2097, label [[IF_END2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	; CHECK: if.end2:			; CHECK: if.end2:
	; CHECK-NEXT: ret i32 1			; CHECK-NEXT: ret i32 1
	;			;
				; TAILFOLD-LABEL: @multiple_exit_switch2(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
				; TAILFOLD: for.cond:
				; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_COND]] ]
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I]], 1
				; TAILFOLD-NEXT: switch i32 [[I]], label [[FOR_COND]] [
				; TAILFOLD-NEXT: i32 2096, label [[IF_END:%.*]]
				; TAILFOLD-NEXT: i32 2097, label [[IF_END2:%.*]]
				; TAILFOLD-NEXT: ]
				; TAILFOLD: if.end:
				; TAILFOLD-NEXT: ret i32 0
				; TAILFOLD: if.end2:
				; TAILFOLD-NEXT: ret i32 1
				;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.cond ]
	%iprom = sext i32 %i to i64			%iprom = sext i32 %i to i64
	%b = getelementptr inbounds i16, i16* %p, i64 %iprom			%b = getelementptr inbounds i16, i16* %p, i64 %iprom
	store i16 0, i16* %b, align 4			store i16 0, i16* %b, align 4
	(25 lines not shown)
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[CMPS:%.*]] = icmp sgt i32 [[INC]], 16			; CHECK-NEXT: [[CMPS:%.*]] = icmp sgt i32 [[INC]], 16
	; CHECK-NEXT: br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]]
	; CHECK: for.body.backedge:			; CHECK: for.body.backedge:
	; CHECK-NEXT: br label [[FOR_BODY]]			; CHECK-NEXT: br label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
				; TAILFOLD-LABEL: @multiple_latch1(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_BODY:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[I_02:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY_BACKEDGE:%.*]] ]
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I_02]], 1
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], 16
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY_BACKEDGE]], label [[FOR_SECOND:%.*]]
				; TAILFOLD: for.second:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I_02]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[CMPS:%.*]] = icmp sgt i32 [[INC]], 16
				; TAILFOLD-NEXT: br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]]
				; TAILFOLD: for.body.backedge:
				; TAILFOLD-NEXT: br label [[FOR_BODY]]
				; TAILFOLD: for.end:
				; TAILFOLD-NEXT: ret i32 0
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body.backedge]			%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body.backedge]
	%inc = add nsw i32 %i.02, 1			%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16			%cmp = icmp slt i32 %inc, 16
	br i1 %cmp, label %for.body.backedge, label %for.second			br i1 %cmp, label %for.body.backedge, label %for.second
	(30 lines not shown)
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I_02]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I_02]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[CMPS:%.*]] = icmp sgt i32 [[INC]], 16			; CHECK-NEXT: [[CMPS:%.*]] = icmp sgt i32 [[INC]], 16
	; CHECK-NEXT: br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
				; TAILFOLD-LABEL: @multiple_latch2(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[FOR_BODY:%.*]]
				; TAILFOLD: for.body:
				; TAILFOLD-NEXT: [[I_02:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY_BACKEDGE:%.*]] ]
				; TAILFOLD-NEXT: [[INC]] = add nsw i32 [[I_02]], 1
				; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], 16
				; TAILFOLD-NEXT: br i1 [[CMP]], label [[FOR_BODY_BACKEDGE]], label [[FOR_SECOND:%.*]]
				; TAILFOLD: for.body.backedge:
				; TAILFOLD-NEXT: br label [[FOR_BODY]]
				; TAILFOLD: for.second:
				; TAILFOLD-NEXT: [[IPROM:%.*]] = sext i32 [[I_02]] to i64
				; TAILFOLD-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]
				; TAILFOLD-NEXT: store i16 0, i16* [[B]], align 4
				; TAILFOLD-NEXT: [[CMPS:%.*]] = icmp sgt i32 [[INC]], 16
				; TAILFOLD-NEXT: br i1 [[CMPS]], label [[FOR_BODY_BACKEDGE]], label [[FOR_END:%.*]]
				; TAILFOLD: for.end:
				; TAILFOLD-NEXT: ret i32 0
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]			%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
	%inc = add nsw i32 %i.02, 1			%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16			%cmp = icmp slt i32 %inc, 16
	br i1 %cmp, label %for.body, label %for.second			br i1 %cmp, label %for.body, label %for.second

	for.second:			for.second:
	%iprom = sext i32 %i.02 to i64			%iprom = sext i32 %i.02 to i64
	%b = getelementptr inbounds i16, i16* %p, i64 %iprom			%b = getelementptr inbounds i16, i16* %p, i64 %iprom
	store i16 0, i16* %b, align 4			store i16 0, i16* %b, align 4
	%cmps = icmp sgt i32 %inc, 16			%cmps = icmp sgt i32 %inc, 16
	br i1 %cmps, label %for.body, label %for.end			br i1 %cmps, label %for.body, label %for.end

	for.end:			for.end:
	ret i32 0			ret i32 0
	}			}

	declare void @foo()
				; Check interaction between block predication and early exits. We need the
				; condition on the early exit to remain dead (i.e. not be used when forming
				; the predicate mask).
				define void @scalar_predication(float* %addr) {
				; CHECK-LABEL: @scalar_predication(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[PRED_STORE_CONTINUE2:%.*]] ]
				; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[PRED_STORE_CONTINUE2]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr float, float* [[ADDR:%.*]], i64 [[TMP0]]
				; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, float* [[TMP1]], i32 0
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[TMP2]] to <2 x float>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
				; CHECK-NEXT: [[TMP4:%.*]] = fcmp oeq <2 x float> [[WIDE_LOAD]], zeroinitializer
				; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i1> [[TMP4]], <i1 true, i1 true>
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP5]], i32 0
				; CHECK-NEXT: br i1 [[TMP6]], label [[PRED_STORE_IF:%.*]], label [[PRED_STORE_CONTINUE:%.*]]
				; CHECK: pred.store.if:
				; CHECK-NEXT: store float 1.000000e+01, float* [[TMP1]], align 4
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE]]
				; CHECK: pred.store.continue:
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i1> [[TMP5]], i32 1
				; CHECK-NEXT: br i1 [[TMP7]], label [[PRED_STORE_IF1:%.*]], label [[PRED_STORE_CONTINUE2]]
				; CHECK: pred.store.if1:
				; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr float, float* [[ADDR]], i64 [[TMP8]]
				; CHECK-NEXT: store float 1.000000e+01, float* [[TMP9]], align 4
				; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]
				; CHECK: pred.store.continue2:
				; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
				; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
				; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200
				; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 201, 200
				; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
				; CHECK: loop.header:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
				; CHECK-NEXT: [[GEP:%.*]] = getelementptr float, float* [[ADDR]], i64 [[IV]]
				; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200
				; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP_BODY:%.*]]
				; CHECK: loop.body:
				; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[GEP]], align 4
				; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq float [[TMP11]], 0.000000e+00
				; CHECK-NEXT: br i1 [[PRED]], label [[LOOP_LATCH]], label [[THEN:%.*]]
				; CHECK: then:
				; CHECK-NEXT: store float 1.000000e+01, float* [[GEP]], align 4
				; CHECK-NEXT: br label [[LOOP_LATCH]]
				; CHECK: loop.latch:
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: br label [[LOOP_HEADER]], [[LOOP9:!llvm.loop !.*]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				; TAILFOLD-LABEL: @scalar_predication(
				; TAILFOLD-NEXT: entry:
				; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]
				; TAILFOLD: loop.header:
				; TAILFOLD-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
				; TAILFOLD-NEXT: [[GEP:%.*]] = getelementptr float, float* [[ADDR:%.*]], i64 [[IV]]
				; TAILFOLD-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200
				; TAILFOLD-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_BODY:%.*]]
				; TAILFOLD: loop.body:
				; TAILFOLD-NEXT: [[TMP0:%.*]] = load float, float* [[GEP]], align 4
				; TAILFOLD-NEXT: [[PRED:%.*]] = fcmp oeq float [[TMP0]], 0.000000e+00
				; TAILFOLD-NEXT: br i1 [[PRED]], label [[LOOP_LATCH]], label [[THEN:%.*]]
				; TAILFOLD: then:
				; TAILFOLD-NEXT: store float 1.000000e+01, float* [[GEP]], align 4
				; TAILFOLD-NEXT: br label [[LOOP_LATCH]]
				; TAILFOLD: loop.latch:
				; TAILFOLD-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; TAILFOLD-NEXT: br label [[LOOP_HEADER]]
				; TAILFOLD: exit:
				; TAILFOLD-NEXT: ret void
				;
				entry:
				br label %loop.header

				loop.header:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
				%gep = getelementptr float, float* %addr, i64 %iv
				%exitcond.not = icmp eq i64 %iv, 200
				br i1 %exitcond.not, label %exit, label %loop.body

				loop.body:
				%0 = load float, float* %gep, align 4
				%pred = fcmp oeq float %0, 0.0
				br i1 %pred, label %loop.latch, label %then

				then:
				store float 10.0, float* %gep, align 4
				br label %loop.latch

				loop.latch:
				%iv.next = add nuw nsw i64 %iv, 1
				br label %loop.header

				exit:
				ret void
				}
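
The CHECK lines for @scalar_predication above can be modeled by hand to see how the pieces fit together. The sketch below is an illustrative C++ rendering only, not vectorizer output: the function names are hypothetical, the vector width of 2 is read off the `<2 x float>` types in the checks, and the buffer is assumed to hold at least 200 floats. It shows the bottom-tested vector body covering iv = 0..199, the store mask being formed from `%pred` alone (the early-exit condition is not part of it), and the exiting iteration iv = 200 being left to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference, mirroring the input IR: the exit test sits in the loop
// header, so the header runs 201 times and the body 200 times.
void scalar_predication_ref(std::vector<float> &addr) {
  for (std::size_t iv = 0;; ++iv) {
    if (iv == 200) break;            // loop.header: the only exit
    if (addr[iv] == 0.0f) continue;  // loop.body: %pred true skips the store
    addr[iv] = 10.0f;                // then
  }
}

// Hand-written model of the vectorized form with an assumed width of 2.
void scalar_predication_vf2(std::vector<float> &addr) {
  std::size_t index = 0;
  // vector.body: bottom tested, no exit check inside the body.
  do {
    bool mask[2];
    for (std::size_t lane = 0; lane < 2; ++lane) {
      // The mask comes only from the fcmp (inverted); the exit condition
      // iv == 200 does not feed into it.
      mask[lane] = !(addr[index + lane] == 0.0f);
    }
    for (std::size_t lane = 0; lane < 2; ++lane) {
      if (mask[lane]) addr[index + lane] = 10.0f;  // pred.store.if blocks
    }
    index += 2;
  } while (index != 200);

  // middle.block compares 201 with 200, so control always continues into the
  // scalar loop, which resumes at iv = 200 and takes the exit there.
  for (std::size_t iv = index;; ++iv) {
    if (iv == 200) break;
    if (addr[iv] == 0.0f) continue;
    addr[iv] = 10.0f;
  }
}

// Runs both versions on the same data and reports whether they agree.
bool sketch_matches_reference() {
  std::vector<float> a(200), b;
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = (i % 3 == 0) ? 0.0f : static_cast<float>(i);
  }
  b = a;
  scalar_predication_ref(a);
  scalar_predication_vf2(b);
  return a == b;
}
```

Calling `sketch_matches_reference()` compares the two versions on a buffer in which every third element is zero; elements equal to zero are left alone and all others are overwritten with 10.0 in both.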

llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll

	; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -S -disable-output 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -S -disable-output 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	; Make sure LV legal bails out when the exiting block != loop latch.
	; CHECK-LABEL: "latch_is_not_exiting"
	; CHECK: LV: Not vectorizing: The exiting block is not the loop latch.
	define i32 @latch_is_not_exiting() {
	entry:
	br label %for.body

	for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
	%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16
	br i1 %cmp, label %for.body, label %for.second

	for.second:
	%cmps = icmp sgt i32 %inc, 16
	br i1 %cmps, label %for.body, label %for.end

	for.end:
	ret i32 0
	}

	Ayal: Would it be useful to keep this (single exiting, double latched(?)) test?
	reames (author): Oh, I'd missed the fact this was a double latch, not just a non-exiting latch. I'll add a test for that case in loop-form.ll along with the others and rebase.
	; Make sure LV legal bails out when there is no exiting block			; Make sure LV legal bails out when there is no exiting block
	; CHECK-LABEL: "no_exiting_block"			; CHECK-LABEL: "no_exiting_block"
	; CHECK: LV: Not vectorizing: The loop must have an exiting block.			; CHECK: LV: Not vectorizing: The loop must have a unique exit block.
	define i32 @no_exiting_block() {			define i32 @no_exiting_block() {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]			%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
	%inc = add nsw i32 %i.02, 1			%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16			%cmp = icmp slt i32 %inc, 16
	(47 lines not shown)