This is an archive of the discontinued LLVM Phabricator instance.

Differential D93317

[LV] Vectorize (some) early and multiple exit loops
ClosedPublic

Authored by reames on Dec 15 2020, 10:38 AM.

Details

Reviewers

Ayal
fhahn
anna

Commits

rGe4df6a40dad6: [LV] Vectorize (some) early and multiple exit loops

Summary

This patch is a major step towards supporting multiple exit loops in the vectorizer. On its own, it extends the loop forms allowed in two ways:

single exit loops which are not bottom tested
multiple exit loops w/ a single exit block reached from all exits and no phis in the exit block (because of LCSSA this implies no values defined in the loop used later)
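As a rough illustration of the two loop forms listed above, here are two source-level loops. The function names and data are hypothetical and not taken from the patch, and whether the compiler actually presents these shapes to the vectorizer depends on earlier passes such as loop rotation.

```cpp
#include <cstddef>

// Hypothetical example 1: a single-exit loop written with its test at the
// top. If it reaches the vectorizer unrotated, the only exit is taken from
// the header rather than the latch, i.e. the loop is not bottom tested.
void scaleAll(float *data, std::size_t n, float factor) {
  for (std::size_t i = 0; i < n; ++i)
    data[i] *= factor;
}

// Hypothetical example 2: two exits (the bound check and the early break).
// Both conditions depend only on the induction variable, so both exits are
// analyzable, and nothing computed inside the loop is used after it, so the
// exit block needs no phis.
void scaleUpTo(float *data, std::size_t n, std::size_t limit, float factor) {
  for (std::size_t i = 0; i < n; ++i) {
    if (i == limit)
      break;
    data[i] *= factor;
  }
}
```

An early exit that depends on loaded data (for example, breaking on `data[i] == 0`) would not have an analyzable exit count and stays out of scope, as the summary notes.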

The restrictions on multiple exit loop structures will be removed in follow up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts.

The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to be taken to the scalar code and generate a bottom tested vector loop which runs all but the last iteration.
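The arithmetic behind "all but the last iteration" can be sketched as follows. This is a standalone model of the remainder adjustment described in the getOrCreateVectorTripCount comments of the diff, not the LLVM code itself; the function and parameter names are made up for illustration, and the `VF.isVector()` part of the real condition is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Number of original iterations executed by the vector loop ("n.vec").
//   tripCount: total iterations of the original loop (N).
//   step: vectorization factor times unroll factor.
//   requiresScalarEpilogue: at least the final iteration must run in scalar.
std::uint64_t vectorTripCount(std::uint64_t tripCount, std::uint64_t step,
                              bool requiresScalarEpilogue) {
  // The minimum-iterations check is assumed to have established N >= Step.
  assert(step > 0 && tripCount >= step);
  std::uint64_t remainder = tripCount % step;
  // If the step divides the trip count evenly, hand a full step of iterations
  // to the scalar loop so that it still executes the last iteration.
  if (requiresScalarEpilogue && remainder == 0)
    remainder = step;
  return tripCount - remainder;
}
```

With N = 8 and Step = 4, the vector loop covers 8 iterations when no scalar epilogue is required and only 4 when one is, leaving the final 4 (including the exiting iteration) to the scalar loop.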

The existing code already had the notion of requiring one iteration in the scalar epilogue; this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical).

Event Timeline

reames created this revision. Dec 15 2020, 10:38 AM

Herald added subscribers: dantrushin, javed.absar, bollu and 2 others. Dec 15 2020, 10:38 AM

reames requested review of this revision. Dec 15 2020, 10:38 AM

Herald added a project: Restricted Project. Dec 15 2020, 10:38 AM

Harbormaster completed remote builds in B82499: Diff 311960. Dec 15 2020, 10:38 AM

reames added a reviewer: anna. Dec 15 2020, 10:39 AM

reames mentioned this in rG99ac8868cfb4: [tests][LV] precommit tests for D93317. Dec 15 2020, 10:55 AM

rebase over landed tests

reames edited the summary of this revision. Dec 15 2020, 11:02 AM

Update tests to cover cases where we can't vectorize due to either a) size, or b) predication.

Doing this revealed that the handling of the predicate don't vectorize option is broken in the patch. (We correctly vectorize with a scalar epilogue where the user intent was not to vectorize.) Fix forthcoming.

(Note - comment edited as I'd originally misunderstood scope of work to address tail fold case above)

Rebase over a81db8b31. While that change is NFC, the code structure makes it much easier to disable vectorization when predication is demanded and we can't provide it.

One further simplification enabled by previous rebase.

ping

Nice leverage of requiresScalarEpilogue!
Looks good to me, adding some minor comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1098–1105	To avoid confusing exit/ing block terms: `// We currently must have a single "exit block" after the loop. Note that multiple "exiting blocks" inside the loop are allowed, provided they all reach the single exit block.`
1105	nit: a[n] unique
1120	Clarify failure reason, e.g., "The loop must have no live-out values if it has more than one exiting block" ?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1555	Checking that the only exit is the latch can be done (alternatively, literally) by `if (TheLoop->getExitingBlock() == TheLoop->getLoopLatch())`
3011	To avoid confusing exit/ing block/loop terms: `That is, if the loop contains multiple exiting blocks, or a single exiting block which is not the latch.`
5499–5505	// We can't vectorize anything but a bottom tested loop without a scalar epilogue. Perhaps better stated as: `// The only loops we can vectorize without a scalar epilogue, are loops with a bottom-test and a single exiting block.`
5501	nit: i[t]eration
5504	If predication is preferred over a scalar epilog, but the latter is not forbidden (i.e., the CM_ScalarEpilogueNotNeededUsePredicate case), we could "fallback to a vectorization with a scalar epilogue" here, instead of bailing out, as done below? Can test `if (TheLoop->getExitingBlock() != TheLoop->getLatchBlock())`
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll
24	Would it be useful to keep this (single exiting, double latched(?)) test?

Thank you very much for putting up this patch! This looks like a good start.

I think there are some remaining code-gen issues, e.g. something like the example below leads to a verifier failure when built with opt -loop-vectorize -force-vector-width=2. I didn't have time to take a closer look at what might cause the failure yet.

define void @test(float* %addr) {
entry:
  br label %loop.header

loop.header:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop.latch ]
  %gep = getelementptr float, float* %addr, i64 %iv
  %exitcond.not = icmp eq i64 %iv, 200
  br i1 %exitcond.not, label %exit, label %loop.body

loop.body:
  %0 = load float, float* %gep, align 4
  br i1 undef, label %loop.latch, label %then

then:
  store float 10.0, float* %gep, align 4
  br label %loop.latch

loop.latch:
  %iv.next = add nuw nsw i64 %iv, 1
  br label %loop.header

exit:
  ret void
}

This revision now requires changes to proceed. Dec 22 2020, 8:56 AM

Ayal, thanks for all the great wordsmith comments!

Florian, I can confirm the crash with your test case, will investigate.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1555	I went ahead and switched since you seem to have a preference, but in general, these are not the same. Consider an infinite loop with two latch blocks. Such a loop can't reach here, but it requires context to know that. The form of the check I used is context free. Doesn't really matter here, so I'll go with your preference.
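To make the difference between the two checks concrete, here is a toy model. ToyLoop is a hypothetical stand-in for llvm::Loop, with block ids as plain ints and -1 meaning "none"; its isRotatedForm is written under the assumption that the real method means "a single latch that is also an exiting block".

```cpp
#include <vector>

// Toy stand-in for llvm::Loop (hypothetical, for illustration only).
struct ToyLoop {
  std::vector<int> exitingBlocks; // blocks with an edge leaving the loop
  std::vector<int> latches;       // blocks with a backedge to the header

  // Both accessors return the block only if it is unique, else -1.
  int getExitingBlock() const {
    return exitingBlocks.size() == 1 ? exitingBlocks[0] : -1;
  }
  int getLoopLatch() const { return latches.size() == 1 ? latches[0] : -1; }

  // Assumed semantics: there is a single latch and it is an exiting block.
  bool isRotatedForm() const {
    int latch = getLoopLatch();
    if (latch == -1)
      return false;
    for (int block : exitingBlocks)
      if (block == latch)
        return true;
    return false;
  }
};

// The check as first written in the patch: needs no outside knowledge.
bool mayExitOutsideLatchContextFree(const ToyLoop &loop) {
  return loop.getExitingBlock() == -1 || !loop.isRotatedForm();
}

// The literal form suggested in review, negated to answer the same question.
bool mayExitOutsideLatchLiteral(const ToyLoop &loop) {
  return loop.getExitingBlock() != loop.getLoopLatch();
}
```

For an infinite loop with two latches both accessors return "none", so the literal form compares none with none and reports that the loop only exits from its latch, while the context-free form does not. As noted in the reply, such a loop cannot reach this code, which is why the simpler form is acceptable here.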
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll
24	Oh, I'd missed the fact this was a double latch, not just a non exiting latch. I'll add a test for that case in loop-form.ll along with the others and rebase.

Incorporate wording/style comments from Ayal, test rebase pending.

reames mentioned this in rGf106b281be24: [tests] precommit a test mentioned in review for D93317. Dec 22 2020, 9:49 AM

JFYI, multiple latch tests added in f106b2, but they don't impact the diff as SCEV can't prove exit counts and thus we don't vectorize.

Now to track down the problem Florian found.

Ok, I see what's going on w/Florian's example. It's an interaction with block predication and early exits. Essentially, block predication expects to be able to add uses of any condition in the loop, and the dead code elimination has eliminated the exit condition in the vector loop. I can restrict the legality slightly to avoid this interaction easily enough, but I want to think a bit first if there's a clean integration.

Incorporate a fix for the issue Florian found. Essentially, we can't both treat single use exit conditions as dead, and allow predication to use them in a mask. Since we know they evaluate to true for the entire vector body, we can simply ignore them when forming edge predicates.
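A minimal sketch of that idea, assuming a lane mask modelled as one bit per vector lane. The helper name, its parameters, and the mask type are hypothetical; this is not the patch's actual code.

```cpp
#include <cstdint>

// One bit per vector lane; a set bit means the lane is active.
using LaneMask = std::uint8_t;

// Hypothetical helper: mask for an in-loop edge leaving a block.
//   srcMask: lanes for which the source block executes.
//   takenCond: lanes for which the branch condition selects this edge.
//   srcIsExiting: the source block also has an edge leaving the loop.
LaneMask edgeMask(LaneMask srcMask, LaneMask takenCond, bool srcIsExiting) {
  // Inside the vector body no lane takes an exit edge, so the in-loop edge of
  // an exiting block inherits the block mask without reading the condition.
  // That avoids adding a use of a compare that may already count as dead.
  if (srcIsExiting)
    return srcMask;
  return static_cast<LaneMask>(srcMask & takenCond);
}
```

For an ordinary conditional branch the condition still narrows the mask; only branches in exiting blocks skip it.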

Florian, any other edge cases you can think of? I'd completely missed that one. Thank you for finding it!

reames mentioned this in D93725: [LV] Relax assumption that LCSSA implies single entry. Dec 22 2020, 12:23 PM

This looks fine to me, thanks; would be good to get @fhahn approval too.

Wonder if a similar optimization may be applied to the loop unroller as well - discarding all exiting edges (keeping one in the latch only) from a single-latched "countable" loop, whose last iteration (or more) is peeled.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1555	Agreed, this form assumes there is a single latch, which is also implied by the comment, and can be asserted.
7911	Good catch and fix!

In D93317#2468673, @reames wrote:

Incorporate a fix for the issue Florian found. Essentially, we can't both treat single use exit conditions as dead, and allow predication to use them in a mask. Since we know they evaluate to true for the entire vector body, we can simply ignore them when forming edge predicates.

Florian, any other edge cases you can think of? I'd completely missed that one. Thank you for finding it!

I think there are some scenarios when we break some LCSSA PHIs that use PHIs created during SCEV expansion when we generate the runtime checks. After a first glance, it appears like an issue after we add the conditional branch from the middle block. The example below should cause a verifier failure with opt -loop-vectorize -force-vector-width=4. I tried to reduce it a bit, but unfortunately it is still quite ugly.

define void @widget(i16** %arg, i64 %N) local_unnamed_addr {
bb:
  br label %bb1

bb1:                                              ; preds = %bb16, %bb1, %bb
  %tmp = load i16*, i16** %arg, align 8
  %tmp2 = load i16*, i16** %arg, align 8
  br i1 undef, label %bb1, label %bb3

bb3:                                              ; preds = %bb15, %bb1
  br i1 undef, label %bb16, label %bb4

bb4:                                              ; preds = %bb3
  br i1 undef, label %bb5, label %bb15

bb5:                                              ; preds = %bb9, %bb4
  %tmp6 = phi i64 [ %tmp7, %bb9 ], [ %N, %bb4 ]
  %tmp7 = add nsw i64 %tmp6, -1
  %tmp8 = icmp sgt i64 %tmp6, 1
  br i1 %tmp8, label %bb9, label %bb13

bb9:                                              ; preds = %bb5
  %tmp10 = getelementptr inbounds i16, i16* %tmp2, i64 %tmp7
  %tmp11 = load i16, i16* %tmp10, align 2
  %tmp12 = getelementptr inbounds i16, i16* %tmp, i64 0
  store i16 %tmp11, i16* %tmp12, align 2
  br label %bb5

bb13:                                             ; preds = %bb15, %bb5
  %tmp14 = load i16, i16* %tmp, align 2
  ret void

bb15:                                             ; preds = %bb4
  br i1 undef, label %bb13, label %bb3

bb16:                                             ; preds = %bb3
  br label %bb1
}

In D93317#2470271, @fhahn wrote:

In D93317#2468673, @reames wrote:

Florian, any other edge cases you can think of? I'd completely missed that one. Thank you for finding it!

I think there are some scenarios when we break some LCSSA PHIs that use PHIs created during SCEV expansion when we generate the runtime checks. After a first glance, it appears like an issue after we add the conditional branch from the middle block. The example below should cause a verifier failure with opt -loop-vectorize -force-vector-width=4. I tried to reduce it a bit, but unfortunately it is still quite ugly.

I can confirm the failure and will debug. Thank you again for finding a corner case.

Florian, do you mind if I landed this under an off by default flag? I realize we have correctness issues outstanding, but it would be a lot easier to test this, and highlight the fixes one by one if I was working off checked in code. Once we'd worked through everything, I'd enable and remove the flag.

In D93317#2470271, @fhahn wrote:

I think there are some scenarios when we break some LCSSA PHIs that use PHIs created during SCEV expansion when we generate the runtime checks. After a first glance, it appears like an issue after we add the conditional branch from the middle block. The example below should cause a verifier failure with opt -loop-vectorize -force-vector-width=4. I tried to reduce it a bit, but unfortunately it is still quite ugly.

Hmm, if it is reduced a bit further by fusing bb9 into bb5, making the loop bottom-tested, same failure still occurs, regardless of this patch?

I.e., replacing:

bb5:                                              ; preds = %bb9, %bb4
  %tmp6 = phi i64 [ %tmp7, %bb9 ], [ %N, %bb4 ]
  %tmp7 = add nsw i64 %tmp6, -1
  %tmp8 = icmp sgt i64 %tmp6, 1
  br i1 %tmp8, label %bb9, label %bb13

bb9:                                              ; preds = %bb5
  %tmp10 = getelementptr inbounds i16, i16* %tmp2, i64 %tmp7
  %tmp11 = load i16, i16* %tmp10, align 2
  %tmp12 = getelementptr inbounds i16, i16* %tmp, i64 0
  store i16 %tmp11, i16* %tmp12, align 2
  br label %bb5

with:

bb5:                                              ; preds = %bb5, %bb4
  %tmp6 = phi i64 [ %tmp7, %bb5 ], [ %N, %bb4 ]
  %tmp7 = add nsw i64 %tmp6, -1
  %tmp8 = icmp sgt i64 %tmp6, 1
  %tmp10 = getelementptr inbounds i16, i16* %tmp2, i64 %tmp7
  %tmp11 = load i16, i16* %tmp10, align 2
  %tmp12 = getelementptr inbounds i16, i16* %tmp, i64 0
  store i16 %tmp11, i16* %tmp12, align 2
  br i1 %tmp8, label %bb5, label %bb13

In D93317#2470271, @fhahn wrote:

I can confirm the failure and will debug. Thank you again for finding a corner case.

Florian, do you mind if I landed this under an off by default flag? I realize we have correctness issues outstanding, but it would be a lot easier to test this, and highlight the fixes one by one if I was working off checked in code. Once we'd worked through everything, I'd enable and remove the flag.

IMO it would be preferable to land this without a flag, so we ensure this gets wide testing and we can flush out & fix the remaining issues early. I think it is almost there. I'll look into the issue over the next few days and I suggest to circle back after the holiday weekend and either land the patch together with a fix or behind a flag.

In D93317#2471303, @Ayal wrote:

Hmm, if it is reduced a bit further by fusing bb9 into bb5, making the loop bottom-tested, same failure still occurs, regardless of this patch?

Thanks Ayal! I had a suspicion that this crash only got surfaced when using the patch, as the code related to setting up the middle block & co is not really touched by this patch. I'll try to take a look over the next few days.

fhahn mentioned this in rG0ea3749b3cde: [LV] Set up branch from middle block earlier. Dec 27 2020, 10:21 AM

LGTM, thanks! I pushed a small fix for the crash discussed earlier in 0ea3749b3cde. It should be a very safe/straight-forward fix, but I would appreciate taking a look post-review.

I also did some additional testing with loop rotation disabled, which should stress test parts of the multi-exit support, on SPEC2006 & MultiSource. That didn't surface any further issues, so I think it should be fine to land this without a flag for now.

There appears to be some trailing whitespace in the diff; it would be good to re-format before landing.

This revision is now accepted and ready to land. Dec 27 2020, 10:25 AM

Closed by commit rGe4df6a40dad6: [LV] Vectorize (some) early and multiple exit loops (authored by reames). Dec 28 2020, 9:41 AM

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGe4df6a40dad6: [LV] Vectorize (some) early and multiple exit loops.

Ayal, Florian, thank you for the efforts on this review. I really appreciate the active review.

As an aside: I realized that supporting tail folding is actually a lot easier than I thought. I didn't do it in this patch, but a future patch just needs to a) not clamp the iteration space if tail folding, and b) use the exit conditions to form predicate masks. Amusingly, it would have been easier to start with tail folding from the beginning if I'd realized that. :)

The next patch in this series is D93725.

aeubanks added a subscriber: aeubanks. Dec 28 2020, 10:08 AM

aeubanks added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1119	This doesn't compile on Windows: http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio, I've reverted this change.

aeubanks added a reverting change: rG4ffcd4fe9ac2: Revert "[LV] Vectorize (some) early and multiple exit loops". Dec 28 2020, 10:09 AM

fhahn added inline comments. Jan 1 2021, 6:02 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1119	I missed this during the initial review, but we might dereference a nullptr here, because when `DoExtraAnalysis` is true, `ExitBB` may be null. I pushed d9f306aa52fe to only execute the checks in the `else` branch of the `!ExitBB` check.
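The shape of that fix can be sketched in isolation. ToyExitBlock and checkExitStructure are hypothetical stand-ins, not the code from d9f306aa52fe; the point is only that the phi check sits in the `else` branch and so never runs with a null pointer, even when extra analysis keeps going after the first failure.

```cpp
// Hypothetical stand-in for the properties queried on the exit block.
struct ToyExitBlock {
  bool hasPhis;
  bool hasSinglePredecessor;
};

// Returns true if the toy exit-structure checks pass.
//   exitBB: null when the loop has no unique exit block.
//   doExtraAnalysis: keep checking after a failure to report every reason.
bool checkExitStructure(const ToyExitBlock *exitBB, bool doExtraAnalysis) {
  bool result = true;
  if (!exitBB) {
    if (!doExtraAnalysis)
      return false;
    result = false;
  } else if (exitBB->hasPhis && !exitBB->hasSinglePredecessor) {
    // Only reached with a non-null exitBB, so the dereference is safe.
    if (!doExtraAnalysis)
      return false;
    result = false;
  }
  return result;
}
```

Before the fix, the second check followed the first as a separate `if`, so with extra analysis enabled it could run after the null case had already been recorded.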

In D93317#2472174, @fhahn wrote:

LGTM, thanks! I pushed a small fix for the crash discussed earlier in 0ea3749b3cde. It should be a very safe/straight-forward fix, but I would appreciate taking a look post-review.

Thanks, good catch! Added a couple of post-commit nits.

reames mentioned this in rG9f61fbd75ae1: [LV] Relax assumption that LCSSA implies single entry. Jan 12 2021, 12:35 PM

Ayal mentioned this in D103700: [LV] Fix bug when unrolling (only) a loop with non-latch exit. Jun 6 2021, 11:55 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	27 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	49 lines
llvm/test/Transforms/LoopVectorize/control-flow.ll	2 lines
llvm/test/Transforms/LoopVectorize/loop-form.ll	88 lines
llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll	23 lines

Diff 311967

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

[... 1,089 lines hidden ...]	reportVectorizationFailure("The loop must have a single backedge",
"loop control flow is not understood by vectorizer",		"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);		"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)		if (DoExtraAnalysis)
Result = false;		Result = false;
else		else
return false;		return false;
}		}

// We must have a single exiting block.		// We must have a single exiting block. Note that this allows multiple
if (!Lp->getExitingBlock()) {		// exits provided they all exit to the same block.
reportVectorizationFailure("The loop must have an exiting block",		// TODO: This restriction can be relaxed in the near future, it's here solely
		// to allow separation of changes for review. We need to generalize the phi
		// update logic in a number of places.
		BasicBlock *ExitBB = Lp->getUniqueExitBlock();
		if (!ExitBB) {
		reportVectorizationFailure("The loop must have an unique exit block",
		Ayal (inline comment): To avoid confusing exit/ing block terms: `// We currently must have a single "exit block" after the loop. Note that multiple "exiting blocks" inside the loop are allowed, provided they all reach the single exit block.`
		Ayal (inline comment): nit: a[n] unique
"loop control flow is not understood by vectorizer",		"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);		"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)		if (DoExtraAnalysis)
Result = false;		Result = false;
else		else
return false;		return false;
}		}

// We only handle bottom-tested loops, i.e. loop in which the condition is		// The existing code assumes that LCSSA implies that phis are single entry
// checked at the end of each iteration. With that we can assume that all		// (which was true when we had at most a single exiting edge from the latch).
// instructions in the loop are executed the same number of times.		// In general, there's nothing which prevents an LCSSA phi in exit block from
if (Lp->getExitingBlock() != Lp->getLoopLatch()) {		// having two or more values if there are multiple exiting edges leading to
reportVectorizationFailure("The exiting block is not the loop latch",		// the exit block. (TODO: implement general case)
		if (!empty(ExitBB->phis()) && !ExitBB->getSinglePredecessor()) {
		aeubanks (inline comment): This doesn't compile on Windows: http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio, I've reverted this change.
		fhahn (inline comment): I missed this during the initial review, but we might dereference a nullptr here, because when `DoExtraAnalysis` is true, `ExitBB` may be null. I pushed d9f306aa52fe to only execute the checks in the `else` branch of the `!ExitBB` check.
		reportVectorizationFailure("The loop must have an unique exit block",
		Ayal (inline comment): Clarify failure reason, e.g., "The loop must have no live-out values if it has more than one exiting block" ?
"loop control flow is not understood by vectorizer",		"loop control flow is not understood by vectorizer",
"CFGNotUnderstood", ORE, TheLoop);		"CFGNotUnderstood", ORE, TheLoop);
if (DoExtraAnalysis)		if (DoExtraAnalysis)
Result = false;		Result = false;
else		else
return false;		return false;
}		}

return Result;		return Result;
}		}

bool LoopVectorizationLegality::canVectorizeLoopNestCFG(		bool LoopVectorizationLegality::canVectorizeLoopNestCFG(
Loop *Lp, bool UseVPlanNativePath) {		Loop *Lp, bool UseVPlanNativePath) {
// Store the result and return it at the end instead of exiting early, in case		// Store the result and return it at the end instead of exiting early, in case
// allowExtraAnalysis is used to report multiple reasons for not vectorizing.		// allowExtraAnalysis is used to report multiple reasons for not vectorizing.
bool Result = true;		bool Result = true;
[... 169 lines hidden ...]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 830 lines hidden ...]	protected:
BasicBlock *LoopVectorPreHeader;		BasicBlock *LoopVectorPreHeader;

/// The scalar-loop preheader.		/// The scalar-loop preheader.
BasicBlock *LoopScalarPreHeader;		BasicBlock *LoopScalarPreHeader;

/// Middle Block between the vector and the scalar.		/// Middle Block between the vector and the scalar.
BasicBlock *LoopMiddleBlock;		BasicBlock *LoopMiddleBlock;

/// The ExitBlock of the scalar loop.		/// The (unique) ExitBlock of the scalar loop. Note that
		/// there can be multiple exiting edges reaching this block.
BasicBlock *LoopExitBlock;		BasicBlock *LoopExitBlock;

/// The vector loop body.		/// The vector loop body.
BasicBlock *LoopVectorBody;		BasicBlock *LoopVectorBody;

/// The scalar loop body.		/// The scalar loop body.
BasicBlock *LoopScalarBody;		BasicBlock *LoopScalarBody;

[... 691 lines hidden ...]	public:
}		}

/// Get the interleaved access group that \p Instr belongs to.		/// Get the interleaved access group that \p Instr belongs to.
const InterleaveGroup<Instruction> *		const InterleaveGroup<Instruction> *
getInterleavedAccessGroup(Instruction *Instr) {		getInterleavedAccessGroup(Instruction *Instr) {
return InterleaveInfo.getInterleaveGroup(Instr);		return InterleaveInfo.getInterleaveGroup(Instr);
}		}

/// Returns true if an interleaved group requires a scalar iteration		/// Returns true if we're required to use a scalar epilogue for at least
/// to handle accesses with gaps, and there is nothing preventing us from		/// the final iteration of the original loop.
/// creating a scalar epilogue.
bool requiresScalarEpilogue() const {		bool requiresScalarEpilogue() const {
return isScalarEpilogueAllowed() && InterleaveInfo.requiresScalarEpilogue();		if (!isScalarEpilogueAllowed())
		return false;
		// If we might exit from anywhere but the latch, must run the exiting
		// iteration in scalar form.
		if (!TheLoop->getExitingBlock() || !TheLoop->isRotatedForm())
		Ayal (inline comment): Checking that the only exit is the latch can be done (alternatively, literally) by `if (TheLoop->getExitingBlock() == TheLoop->getLoopLatch())`
		reames (author, inline comment): I went ahead and switched since you seem to have a preference, but in general, these are not the same. Consider an infinite loop with two latch blocks. Such a loop can't reach here, but it requires context to know that. The form of the check I used is context free. Doesn't really matter here, so I'll go with your preference.
		Ayal (inline comment): Agreed, this form assumes there is a single latch, which is also implied by the comment, and can be asserted.
		return true;
		return InterleaveInfo.requiresScalarEpilogue();
}		}

/// Returns true if a scalar epilogue is not allowed due to optsize or a		/// Returns true if a scalar epilogue is not allowed due to optsize or a
/// loop hint annotation.		/// loop hint annotation.
bool isScalarEpilogueAllowed() const {		bool isScalarEpilogueAllowed() const {
return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed;		return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed;
}		}

[... 1,340 lines hidden ...]	PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
setDebugLocFromInst(Builder, OldInst);		setDebugLocFromInst(Builder, OldInst);

// Create i+1 and fill the PHINode.		// Create i+1 and fill the PHINode.
Value *Next = Builder.CreateAdd(Induction, Step, "index.next");		Value *Next = Builder.CreateAdd(Induction, Step, "index.next");
Induction->addIncoming(Start, L->getLoopPreheader());		Induction->addIncoming(Start, L->getLoopPreheader());
Induction->addIncoming(Next, Latch);		Induction->addIncoming(Next, Latch);
// Create the compare.		// Create the compare.
Value *ICmp = Builder.CreateICmpEQ(Next, End);		Value *ICmp = Builder.CreateICmpEQ(Next, End);
Builder.CreateCondBr(ICmp, L->getExitBlock(), Header);		Builder.CreateCondBr(ICmp, L->getUniqueExitBlock(), Header);

// Now we have two terminators. Remove the old one from the block.		// Now we have two terminators. Remove the old one from the block.
Latch->getTerminator()->eraseFromParent();		Latch->getTerminator()->eraseFromParent();

return Induction;		return Induction;
}		}

Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) {		Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) {
[... 71 lines hidden ...]	Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {

// Now we need to generate the expression for the part of the loop that the		// Now we need to generate the expression for the part of the loop that the
// vectorized body will execute. This is equal to N - (N % Step) if scalar		// vectorized body will execute. This is equal to N - (N % Step) if scalar
// iterations are not required for correctness, or N - Step, otherwise. Step		// iterations are not required for correctness, or N - Step, otherwise. Step
// is equal to the vectorization factor (number of SIMD elements) times the		// is equal to the vectorization factor (number of SIMD elements) times the
// unroll factor (number of SIMD instructions).		// unroll factor (number of SIMD instructions).
Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");		Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");

// If there is a non-reversed interleaved group that may speculatively access		// There are two cases where we need to ensure (at least) the last iteration
// memory out-of-bounds, we need to ensure that there will be at least one		// runs in the scalar remainder loop. Thus, if the step evenly divides
// iteration of the scalar epilogue loop. Thus, if the step evenly divides
// the trip count, we set the remainder to be equal to the step. If the step		// the trip count, we set the remainder to be equal to the step. If the step
// does not evenly divide the trip count, no adjustment is necessary since		// does not evenly divide the trip count, no adjustment is necessary since
// there will already be scalar iterations. Note that the minimum iterations		// there will already be scalar iterations. Note that the minimum iterations
// check ensures that N >= Step.		// check ensures that N >= Step. The cases are:
		// 1) If there is a non-reversed interleaved group that may speculatively
		// access memory out-of-bounds.
		// 2) If any instruction may follow a conditionally taken exit. (e.g. due to
		// a multi exit loop, or a non-bottom tested single exit loop)
		Ayal (inline comment): To avoid confusing exit/ing block/loop terms: `That is, if the loop contains multiple exiting blocks, or a single exiting block which is not the latch.`
if (VF.isVector() && Cost->requiresScalarEpilogue()) {		if (VF.isVector() && Cost->requiresScalarEpilogue()) {
auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));		auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
R = Builder.CreateSelect(IsZero, Step, R);		R = Builder.CreateSelect(IsZero, Step, R);
}		}

VectorTripCount = Builder.CreateSub(TC, R, "n.vec");		VectorTripCount = Builder.CreateSub(TC, R, "n.vec");

return VectorTripCount;		return VectorTripCount;
[... 278 lines hidden ...]	case InductionDescriptor::IK_NoInduction:
return nullptr;		return nullptr;
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {		Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopScalarBody = OrigLoop->getHeader();		LoopScalarBody = OrigLoop->getHeader();
LoopVectorPreHeader = OrigLoop->getLoopPreheader();		LoopVectorPreHeader = OrigLoop->getLoopPreheader();
LoopExitBlock = OrigLoop->getExitBlock();		LoopExitBlock = OrigLoop->getUniqueExitBlock();
assert(LoopExitBlock && "Must have an exit block");		assert(LoopExitBlock && "Must have an exit block");
assert(LoopVectorPreHeader && "Invalid loop structure");		assert(LoopVectorPreHeader && "Invalid loop structure");

LoopMiddleBlock =		LoopMiddleBlock =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,		SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
LI, nullptr, Twine(Prefix) + "middle.block");		LI, nullptr, Twine(Prefix) + "middle.block");
LoopScalarPreHeader =		LoopScalarPreHeader =
SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,		SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
[... 256 lines hidden ...]	void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
const InductionDescriptor &II,		const InductionDescriptor &II,
Value *CountRoundDown, Value *EndValue,		Value *CountRoundDown, Value *EndValue,
BasicBlock *MiddleBlock) {		BasicBlock *MiddleBlock) {
// There are two kinds of external IV usages - those that use the value		// There are two kinds of external IV usages - those that use the value
// computed in the last iteration (the PHI) and those that use the penultimate		// computed in the last iteration (the PHI) and those that use the penultimate
// value (the value that feeds into the phi from the loop latch).		// value (the value that feeds into the phi from the loop latch).
// We allow both, but they, obviously, have different values.		// We allow both, but they, obviously, have different values.

assert(OrigLoop->getExitBlock() && "Expected a single exit block");		assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");

DenseMap<Value *, Value *> MissingVals;		DenseMap<Value *, Value *> MissingVals;

// An external user of the last iteration's value should see the value that		// An external user of the last iteration's value should see the value that
// the remainder loop uses to initialize its own IV.		// the remainder loop uses to initialize its own IV.
Value *PostInc = OrigPhi->getIncomingValueForBlock(OrigLoop->getLoopLatch());		Value *PostInc = OrigPhi->getIncomingValueForBlock(OrigLoop->getLoopLatch());
for (User *U : PostInc->users()) {		for (User *U : PostInc->users()) {
Instruction *UI = cast<Instruction>(U);		Instruction *UI = cast<Instruction>(U);
case CM_ScalarEpilogueNotAllowedOptSize:
else		else
LLVM_DEBUG(dbgs() << "LV: Not allowing scalar epilogue due to low trip "		LLVM_DEBUG(dbgs() << "LV: Not allowing scalar epilogue due to low trip "
<< "count.\n");		<< "count.\n");

// Bail if runtime checks are required, which are not good when optimising		// Bail if runtime checks are required, which are not good when optimising
// for size.		// for size.
if (runtimeChecksRequired())		if (runtimeChecksRequired())
return None;		return None;

break;		break;
}		}

		// We can't vectorize anything but a bottom tested loop without a scalar
		// epilogue. Unless this is bottom tested, bail out.
		if (!TheLoop->getExitingBlock() || !TheLoop->isRotatedForm())
		Ayal: nit: i[t]eration
		return None;

// Now try the tail folding		// Now try the tail folding
		Ayal: If predication is preferred over a scalar epilog, but the latter is not forbidden (i.e., the CM_ScalarEpilogueNotNeededUsePredicate case), we could "fallback to a vectorization with a scalar epilogue" here, instead of bailing out, as done below? Can test `if (TheLoop->getExitingBlock() != TheLoop->getLatchBlock())`

		Ayal: > // We can't vectorize anything but a bottom tested loop without a scalar epilogue.
		Perhaps better stated as: `// The only loops we can vectorize without a scalar epilogue, are loops with a bottom-test and a single exiting block.`
// Invalidate interleave groups that require an epilogue if we can't mask		// Invalidate interleave groups that require an epilogue if we can't mask
// the interleave-group.		// the interleave-group.
if (!useMaskedInterleavedAccesses(TTI)) {		if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
// Note: There is no need to invalidate any cost modeling decisions here, as		// Note: There is no need to invalidate any cost modeling decisions here, as
// non where taken so far.		// non where taken so far.
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();		InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst,
if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))		if (!BI->isConditional() || BI->getSuccessor(0) == BI->getSuccessor(1))
return EdgeMaskCache[Edge] = SrcMask;		return EdgeMaskCache[Edge] = SrcMask;

VPValue *EdgeMask = Plan->getOrAddVPValue(BI->getCondition());		VPValue *EdgeMask = Plan->getOrAddVPValue(BI->getCondition());
assert(EdgeMask && "No Edge Mask found for condition");		assert(EdgeMask && "No Edge Mask found for condition");

if (BI->getSuccessor(0) != Dst)		if (BI->getSuccessor(0) != Dst)
EdgeMask = Builder.createNot(EdgeMask);		EdgeMask = Builder.createNot(EdgeMask);

		Ayal: Good catch and fix!
if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.		if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.
EdgeMask = Builder.createAnd(EdgeMask, SrcMask);		EdgeMask = Builder.createAnd(EdgeMask, SrcMask);

return EdgeMaskCache[Edge] = EdgeMask;		return EdgeMaskCache[Edge] = EdgeMask;
}		}
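
The edge-mask logic above can be illustrated outside of VPlan with a small sketch. It assumes a 4-lane vector whose per-lane predicates are packed into the low bits of a byte; `kAllLanes` and `edgeMask` are hypothetical names for illustration and are not part of LLVM. An empty `srcMask` stands in for the "block in-mask is all-one" case. The unconditional-branch early return is not modeled.

```cpp
#include <cstdint>
#include <optional>

// Assumption: 4 lanes, one bit per lane (bit i is lane i).
constexpr std::uint8_t kAllLanes = 0x0F;

// Hypothetical model of the mask for the edge Src -> Dst.
// cond: per-lane branch condition evaluated in Src.
// dstIsTrueSuccessor: true if Dst is the successor taken when cond is true.
// srcMask: block-in mask of Src; std::nullopt means all lanes are active.
constexpr std::uint8_t edgeMask(std::uint8_t cond, bool dstIsTrueSuccessor,
                                std::optional<std::uint8_t> srcMask) {
  // Negate the condition when Dst is the false successor.
  std::uint8_t mask = static_cast<std::uint8_t>(
      (dstIsTrueSuccessor ? cond : ~cond) & kAllLanes);
  // Only AND with the source mask when it is not implicitly all-one.
  if (srcMask)
    mask = static_cast<std::uint8_t>(mask & *srcMask);
  return mask;
}
```

With `cond = 0b0101`, the false-successor edge has mask `0b1010`, and restricting it to a source mask of `0b0011` leaves only lane 1 active.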

VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlanPtr &Plan) {		VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlanPtr &Plan) {
assert(OrigLoop->contains(BB) && "Block is not a part of a loop");		assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
static ScalarEpilogueLowering getScalarEpilogueLowering(
bool PredicateOptDisabled = PreferPredicateOverEpilogue.getNumOccurrences() &&		bool PredicateOptDisabled = PreferPredicateOverEpilogue.getNumOccurrences() &&
!PreferPredicateOverEpilogue;		!PreferPredicateOverEpilogue;

// 2) Next, if disabling predication is requested on the command line, honour		// 2) Next, if disabling predication is requested on the command line, honour
// this and request a scalar epilogue.		// this and request a scalar epilogue.
if (PredicateOptDisabled)		if (PredicateOptDisabled)
return CM_ScalarEpilogueAllowed;		return CM_ScalarEpilogueAllowed;


		// For tail folding of loops which aren't solely bottom tested , we'd have
		// to handle the fact that not every instruction executes on the last
		// ieration. This will require a lane mask which varies through the
		// vector loop body. (TODO)
		if (!L->getExitingBlock() || !L->isRotatedForm())
		return CM_ScalarEpilogueAllowed;

// 3) and 4) look if enabling predication is requested on the command line,		// 3) and 4) look if enabling predication is requested on the command line,
// with a loop hint, or if the TTI hook indicates this is profitable, request		// with a loop hint, or if the TTI hook indicates this is profitable, request
// predication.		// predication.
if (PreferPredicateOverEpilogue ||		if (PreferPredicateOverEpilogue ||
Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||		Hints.getPredicate() == LoopVectorizeHints::FK_Enabled ||
(TTI->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT,		(TTI->preferPredicateOverEpilogue(L, LI, SE, AC, TLI, DT,
LVL.getLAI()) &&		LVL.getLAI()) &&
Hints.getPredicate() != LoopVectorizeHints::FK_Disabled))		Hints.getPredicate() != LoopVectorizeHints::FK_Disabled))

llvm/test/Transforms/LoopVectorize/control-flow.ll

	; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S -pass-remarks-missed='loop-vectorize' 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S -pass-remarks-missed='loop-vectorize' 2>&1 | FileCheck %s

	; C/C++ code for control flow test			; C/C++ code for control flow test
	; int test(int *A, int Length) {			; int test(int *A, int Length) {
	; for (int i = 0; i < Length; i++) {			; for (int i = 0; i < Length; i++) {
	; if (A[i] > 10.0) goto end;			; if (A[i] > 10.0) goto end;
	; A[i] = 0;			; A[i] = 0;
	; }			; }
	; end:			; end:
	; return 0;			; return 0;
	; }			; }

	; CHECK: remark: source.cpp:5:9: loop not vectorized: loop control flow is not understood by vectorizer			; CHECK: remark: source.cpp:5:9: loop not vectorized: could not determine number of loop iterations
	; CHECK: remark: source.cpp:5:9: loop not vectorized			; CHECK: remark: source.cpp:5:9: loop not vectorized

	; CHECK: _Z4testPii			; CHECK: _Z4testPii
	; CHECK-NOT: x i32>			; CHECK-NOT: x i32>
	; CHECK: ret			; CHECK: ret

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"


llvm/test/Transforms/LoopVectorize/loop-form.ll


if.end:		if.end:
ret void		ret void
}		}

define void @early_exit(i16* %p, i32 %n) {		define void @early_exit(i16* %p, i32 %n) {
; CHECK-LABEL: @early_exit(		; CHECK-LABEL: @early_exit(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
		; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
		; CHECK-NEXT: [[TMP1:%.*]] = add nuw i32 [[SMAX]], 1
		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP1]], 2
		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP1]], 2
		; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
		; CHECK-NEXT: [[TMP3:%.*]] = select i1 [[TMP2]], i32 2, i32 [[N_MOD_VF]]
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP1]], [[TMP3]]
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> undef, i32 [[N]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[TMP4:%.*]] = add i32 [[INDEX]], 0
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[INDEX]], 1
		; CHECK-NEXT: [[TMP6:%.*]] = icmp slt <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
		; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[TMP4]] to i64
		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP7]]
		; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0
		; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*
		; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4
		; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
		; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
		; CHECK: middle.block:
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[FOR_COND:%.*]]		; CHECK-NEXT: br label [[FOR_COND:%.*]]
; CHECK: for.cond:		; CHECK: for.cond:
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]		; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64		; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]		; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
; CHECK-NEXT: store i16 0, i16* [[B]], align 4		; CHECK-NEXT: store i16 0, i16* [[B]], align 4
; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1		; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
; CHECK-NEXT: br label [[FOR_COND]]		; CHECK-NEXT: br label [[FOR_COND]], [[LOOP5:!llvm.loop !.*]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %for.cond		br label %for.cond

for.cond:		for.cond:
%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
if.end:
ret void		ret void
}		}
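
The `N_MOD_VF` / `N_VEC` arithmetic in the CHECK lines above can be sketched as a standalone function. This is a minimal illustration, assuming the trip count is already known to exceed the vector factor (the `MIN_ITERS_CHECK` branch guards that case); `vectorTripCount` is a hypothetical name, not an LLVM API. When the remainder is zero, a full VF worth of iterations is held back, so the scalar loop always runs at least one iteration and decides which exit is taken.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the n.mod.vf / n.vec computation when a
// scalar epilogue is required. Precondition (assumed): tripCount > vf > 0.
constexpr std::uint32_t vectorTripCount(std::uint32_t tripCount,
                                        std::uint32_t vf) {
  std::uint32_t rem = tripCount % vf; // n.mod.vf
  if (rem == 0)
    rem = vf;                         // never leave zero iterations for the scalar loop
  return tripCount - rem;             // n.vec: iterations covered by the vector loop
}
```

For VF = 2, a trip count of 9 gives a vector trip count of 8 (one scalar iteration), while a trip count of 8 gives 6 (two scalar iterations).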


; multiple exit - no values inside the loop used outside		; multiple exit - no values inside the loop used outside
define void @multiple_unique_exit(i16* %p, i32 %n) {		define void @multiple_unique_exit(i16* %p, i32 %n) {
; CHECK-LABEL: @multiple_unique_exit(		; CHECK-LABEL: @multiple_unique_exit(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[N:%.*]], 0
		; CHECK-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 0
		; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[SMAX]], 2096
		; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP1]], i32 [[SMAX]], i32 2096
		; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i32 [[UMIN]], 1
		; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ule i32 [[TMP2]], 2
		; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
		; CHECK: vector.ph:
		; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i32 [[TMP2]], 2
		; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[N_MOD_VF]], 0
		; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 2, i32 [[N_MOD_VF]]
		; CHECK-NEXT: [[N_VEC:%.*]] = sub i32 [[TMP2]], [[TMP4]]
		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i32> undef, i32 [[N]], i32 0
		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i32> [[BROADCAST_SPLATINSERT]], <2 x i32> undef, <2 x i32> zeroinitializer
		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
		; CHECK: vector.body:
		; CHECK-NEXT: [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i32> [ <i32 0, i32 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[TMP5:%.*]] = add i32 [[INDEX]], 0
		; CHECK-NEXT: [[TMP6:%.*]] = add i32 [[INDEX]], 1
		; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <2 x i32> [[VEC_IND]], [[BROADCAST_SPLAT]]
		; CHECK-NEXT: [[TMP8:%.*]] = sext i32 [[TMP5]] to i64
		; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[TMP8]]
		; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i16, i16* [[TMP9]], i32 0
		; CHECK-NEXT: [[TMP11:%.*]] = bitcast i16* [[TMP10]] to <2 x i16>*
		; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP11]], align 4
		; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
		; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
		; CHECK: middle.block:
		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
		; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
		; CHECK: scalar.ph:
		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[FOR_COND:%.*]]		; CHECK-NEXT: br label [[FOR_COND:%.*]]
; CHECK: for.cond:		; CHECK: for.cond:
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]		; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64		; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P:%.*]], i64 [[IPROM]]		; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
; CHECK-NEXT: store i16 0, i16* [[B]], align 4		; CHECK-NEXT: store i16 0, i16* [[B]], align 4
; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1		; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096		; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]]		; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP7:!llvm.loop !.*]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %for.cond		br label %for.cond

for.cond:		for.cond:
%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]		%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]

llvm/test/Transforms/LoopVectorize/loop-legality-checks.ll

	; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -S -disable-output 2>&1 | FileCheck %s			; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -S -disable-output 2>&1 | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	; Make sure LV legal bails out when the exiting block != loop latch.
	; CHECK-LABEL: "latch_is_not_exiting"
	; CHECK: LV: Not vectorizing: The exiting block is not the loop latch.
	define i32 @latch_is_not_exiting() {
	entry:
	br label %for.body

	for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
	%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16
	br i1 %cmp, label %for.body, label %for.second

	for.second:
	%cmps = icmp sgt i32 %inc, 16
	br i1 %cmps, label %for.body, label %for.end

	for.end:
	ret i32 0
	}

	Ayal: Would it be useful to keep this (single exiting, double latched(?)) test?
	reames: Oh, I'd missed the fact this was a double latch, not just a non exiting latch. I'll add a test for that case in loop-form.ll along with the others and rebase.
	; Make sure LV legal bails out when there is no exiting block			; Make sure LV legal bails out when there is no exiting block
	; CHECK-LABEL: "no_exiting_block"			; CHECK-LABEL: "no_exiting_block"
	; CHECK: LV: Not vectorizing: The loop must have an exiting block.			; CHECK: LV: Not vectorizing: The loop must have an unique exit block.
	define i32 @no_exiting_block() {			define i32 @no_exiting_block() {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]			%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ], [%inc, %for.second]
	%inc = add nsw i32 %i.02, 1			%inc = add nsw i32 %i.02, 1
	%cmp = icmp slt i32 %inc, 16			%cmp = icmp slt i32 %inc, 16