This is an archive of the discontinued LLVM Phabricator instance.

Differential D103700

[LV] Fix bug when unrolling (only) a loop with non-latch exit
ClosedPublic

Authored by reames on Jun 4 2021, 8:50 AM.

Details

Reviewers

fhahn
stefanp
Ayal

Commits

rGe49d65f36d66: [LV] Fix bug when unrolling (only) a loop with non-latch exit

Summary

If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue be generated for correctness, the code generation must actually do so.

The included test case on an unmodified opt will access memory one past the expected bound.

I believe this to be the root cause of the issues seen with 3e5ce49e, but even if it isn't, it's definitely a bug.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jun 4 2021, 8:50 AM

Herald added subscribers: zzheng, bollu, hiraditya, mcrosier. Jun 4 2021, 8:50 AM

reames requested review of this revision. Jun 4 2021, 8:50 AM

Herald added a project: Restricted Project. Jun 4 2021, 8:50 AM

lebedev.ri added a subscriber: lebedev.ri. Jun 4 2021, 9:02 AM

lebedev.ri added inline comments.

llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
2	I don't think `bug-reduced.ll` is correct here

reames added inline comments. Jun 4 2021, 9:18 AM

llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
2	Er, oops. No. I'm surprised the autoupdate worked with that typo. That should be %s.

Fix silly typo in test.

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

In D103700#2799338, @nemanjai wrote:

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Harbormaster completed remote builds in B107694: Diff 349894. Jun 4 2021, 10:02 AM

In D103700#2799340, @reames wrote:

In D103700#2799338, @nemanjai wrote:

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

Ayal added inline comments. Jun 6 2021, 11:55 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3190	What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog. Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1. Also curious if D94892 should be applicable to epilog vectorization, as commented there.

In D103700#2799532, @nemanjai wrote:

In D103700#2799340, @reames wrote:

In D103700#2799338, @nemanjai wrote:

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

I added both patches and I did a quick test on the Big Endian PowerPC buildbot machine.
This patch does seem to fix the original issue. However, after adding the two patches I have 4 new LIT test failures:

LLVM :: Transforms/LoopVectorize/first-order-recurrence-complex.ll
LLVM :: Transforms/LoopVectorize/loop-form.ll
LLVM :: Transforms/LoopVectorize/multiple-exits-versioning.ll
LLVM :: Transforms/LoopVectorize/unroll_nonlatch.ll

reames added inline comments. Jun 7 2021, 9:28 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3190	> What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

We could reasonably decide that such a loop does not require a scalar epilogue, but if the cost model decides it does (as it might today), code generation had better be consistent about it. That's all this patch does.

> Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

I don't follow your comment here. As demonstrated by the test case, we do need to generate an epilogue loop in some cases even when not vectorizing.

> Also curious if D94892 should be applicable to epilog vectorization, as commented there.

(will reply there)

In D103700#2802779, @stefanp wrote:
In D103700#2799532, @nemanjai wrote:

In D103700#2799340, @reames wrote:

In D103700#2799338, @nemanjai wrote:

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

I added both patches and I did a quick test on the Big Endian PowerPC buildbot machine.
This patch does seem to fix the original issue. However, after adding the two patches I have 4 new LIT test failures:
LLVM :: Transforms/LoopVectorize/first-order-recurrence-complex.ll
LLVM :: Transforms/LoopVectorize/loop-form.ll
LLVM :: Transforms/LoopVectorize/multiple-exits-versioning.ll
LLVM :: Transforms/LoopVectorize/unroll_nonlatch.ll

The original patch would definitely need rebased over changed tests, so that's not surprising.

Ayal added inline comments. Jun 7 2021, 3:05 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3190	> > What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

> We could reasonably decide that such a loop does not require a scalar epilogue, but if the cost model decides it does (as it might today), code generation had better be consistent about it. That's all this patch does.

We do decide rightfully that such a loop does not require a scalar epilogue today; this patch causes it to have a scalar epilogue. Take for example test case even_load_static_tc from interleaved-accesses.ll, where instead of "-force-vector-width=4 -force-vector-interleave=1" (which requires an epilog) we run it with "-force-vector-width=1 -force-vector-interleave=4" (which does not require an epilog).

> > Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

> I don't follow your comment here. As demonstrated by the test case, we do need to generate an epilogue loop in some cases even when not vectorizing.

Sorry if the comment is unclear. The logic for forcing a scalar epilog should conceptually be ((getExitingBlock() != getLoopLatch()) || (VF.isVector() && InterleaveInfo.requiresScalarEpilogue())), right? Asserting that `isScalarEpilogueAllowed()` holds if the above condition is true: because if an epilog is not allowed, Legal should have prevented vectorizing a loop with a non-latch exiting block, and interleave groups requiring epilog should not have been formed.
To have `Cost->requiresScalarEpilogue()` fully convey this logic, one could directly pass it VF or VF.isVector(); or reset InterleaveInfo's requiresScalarEpilogue earlier when VF is set to 1, e.g., by invalidating all its groups.
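
As a rough illustration of the condition spelled out in this comment, the sketch below models the three queries as plain booleans. `LoopFacts` and the free function `requiresScalarEpilogue` are hypothetical stand-ins, not the LLVM classes, and `VF > 1` is assumed to play the role of `VF.isVector()`. The sketch returns false when no epilogue is allowed instead of asserting, which mirrors the early return in the patch rather than the assertion suggested here.

```cpp
#include <cassert>

// Hypothetical stand-ins for the queries named in the comment; plain
// booleans, not the real LLVM objects.
struct LoopFacts {
  bool ScalarEpilogueAllowed;      // models isScalarEpilogueAllowed()
  bool ExitsOnlyFromLatch;         // models getExitingBlock() == getLoopLatch()
  bool InterleaveGroupNeedsEpilog; // models InterleaveInfo.requiresScalarEpilogue()
};

// VF == 1 models "unroll only"; VF > 1 models VF.isVector().
inline bool requiresScalarEpilogue(const LoopFacts &L, unsigned VF) {
  assert(VF >= 1 && "VF must be at least 1");
  if (!L.ScalarEpilogueAllowed)
    return false;
  // A non-latch exit forces a scalar epilogue for any VF, including 1.
  if (!L.ExitsOnlyFromLatch)
    return true;
  // Interleave groups only matter when actually vectorizing.
  return VF > 1 && L.InterleaveGroupNeedsEpilog;
}
```

With a latch-only exit and an interleave group that wants an epilogue, the sketch answers false for VF = 1 and true for VF = 4; with a non-latch exit it answers true for any VF.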

@Ayal If I'm understanding you even partially correctly, it sounds like you're raising a code quality issue. That is, we may generate a dead epilogue loop (e.g. with a condition statically known to result in the epilogue being untaken, but emitted as a condition) when we didn't need to. Is that correct?

If so, I request that we land this patch - which is fixing a functional bug - and handle the code quality issue in a separate patch.

I really don't understand the interweaving code, and would really rather not have to. :)

raghesh added a subscriber: raghesh. Jun 16 2021, 9:56 PM

In D103700#2813906, @reames wrote:

@Ayal If I'm understanding you even partially correctly, it sounds like you're raising a code quality issue. That is, we may generate a dead epilogue loop (e.g. with a condition statically known to result in the epilogue being untaken, but emitted as a condition) when we didn't need to. Is that correct?

No, the emitted code in this case would be explicitly leaving some iterations to the epilogue even if the trip count is a multiple of the unroll factor. So the epilogue is alive and will not be optimized away later.
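
The arithmetic behind this reply can be sketched in isolation. The function below is a simplified model of the remainder adjustment, assuming `Step` stands for VF * UF and that the minimum iterations check has already guaranteed `TC >= Step`; `vectorTripCount` is a hypothetical name, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Number of original iterations covered by the unrolled/vectorized body.
inline uint64_t vectorTripCount(uint64_t TC, uint64_t Step,
                                bool RequiresScalarEpilogue) {
  assert(Step > 0 && TC >= Step && "expects the minimum iterations check");
  uint64_t R = TC % Step; // corresponds to "n.mod.vf"
  // If an epilogue is required and Step divides TC evenly, hand one full
  // step back to the scalar loop.
  if (RequiresScalarEpilogue && R == 0)
    R = Step;
  return TC - R; // corresponds to "n.vec"
}
```

For a trip count of 1024 and a step of 2, the model yields 1024 when no epilogue is required and 1022 when one is, so two iterations are left to the scalar loop even though 2 divides 1024.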

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1576	> To have `Cost->requiresScalarEpilogue()` fully convey this logic, one could directly pass it VF

That seems like the simplest way to both fix the bug and avoid pessimizing the unroll-only case, i.e.: return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();

Attempt to incorporate review feedback.

@Ayal, @gilr - I still don't understand your comments on the review. I think this was what you were asking for (right?), and it doesn't seem to do any harm, but I'm not sure why you think this was needed for correctness. That doesn't seem to match the code. If this was not what you had in mind, can I ask you to simply fix the bug proven by the test case yourself? We don't appear to be making progress here in terms of me understanding the point you're getting at.

Harbormaster completed remote builds in B110720: Diff 354097. Jun 23 2021, 4:01 PM

(include full test diff as opposed to diff from last revision)

Harbormaster completed remote builds in B110721: Diff 354099. Jun 23 2021, 4:35 PM

This was indeed what was suggested, thanks for following up! The patch correctly fixes the bug raised by the test case, right? I.e., the last iteration of the "vector" (unrolled) loop is peeled into UF(*VF) iterations of the scalar epilog.
While keeping even_load_static_tc test case of interleaved-accesses.ll without this peeling, when run with "-force-vector-width=1 -force-vector-interleave=4"; unlike the original patch. Suggest to include this test case "run" in the patch.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8321–8322	"VF" should be "`ForEpilogue ? EPI.EpilogueVF : VF`" here, judging from VFactor above.
8465–8466	"VF" should be EPI.EpilogueVF here.
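
For context, the checks these two comments point at choose between an unsigned less-than and an unsigned less-or-equal comparison of the trip count against VF * UF, depending on whether a scalar epilogue is required for the VF in question. A minimal model of that choice, with the hypothetical name `bypassVectorLoop` and plain integers in place of IR values:

```cpp
#include <cassert>
#include <cstdint>

// Returns true when the (vector or unrolled) loop would be skipped in favor
// of the scalar loop. Step stands for VF * UF.
inline bool bypassVectorLoop(uint64_t Count, uint64_t Step,
                             bool RequiresScalarEpilogue) {
  assert(Step > 0 && "step must be positive");
  return RequiresScalarEpilogue ? Count <= Step // models ICMP_ULE
                                : Count < Step; // models ICMP_ULT
}
```

Because the answer to "is an epilogue required" now depends on VF, passing the wrong VF at these call sites could select the wrong comparison, which is what the comments above guard against.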

In D103700#2839620, @Ayal wrote:

This was indeed what was suggested, thanks for following up! The patch correctly fixes the bug raised by the test case, right? I.e., the last iteration of the "vector" (unrolled) loop is peeled into UF(*VF) iterations of the scalar epilog.
While keeping even_load_static_tc test case of interleaved-accesses.ll without this peeling, when run with "-force-vector-width=1 -force-vector-interleave=4"; unlike the original patch. Suggest to include this test case "run" in the patch.

@Ayal - I'm sorry, but I'm having a really hard time understanding you.

To restate, the correctness issue this is fixing is that the "vectorized" (really unrolled) loop in the test case was running more iterations than the original loop. It did this because different parts of the vectorizer disagreed about whether a scalar epilogue loop was required. That disagreement is the bug being fixed. There is no change in peeling policy intended or desired. This is a fix for a miscompile, nothing more.
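
A small simulation may make the restated bug concrete. It is a sketch under assumptions: the loop shape is taken from the test's description (exit test in the body, stores in the latch, trip count 1024, unroll factor 2), and both function names are hypothetical. Each function reports the highest induction value whose stores execute.

```cpp
#include <cassert>
#include <cstdint>

// Original loop: the exit test sits in the body, so the latch (and its
// stores) runs only while IV + 1 is still below the trip count.
inline int64_t lastStoredIvOriginal(int64_t TripCount) {
  int64_t Last = -1; // -1 means no store executed
  for (int64_t IV = 0; IV + 1 < TripCount; ++IV)
    Last = IV;
  return Last;
}

// Unrolled loop followed by the scalar remainder loop. Every unrolled part
// stores unconditionally; the remainder keeps the original exit test.
inline int64_t lastStoredIvUnrolled(int64_t TripCount, int64_t UF,
                                    int64_t VectorTripCount) {
  assert(UF > 0 && VectorTripCount % UF == 0);
  int64_t Last = -1;
  for (int64_t Index = 0; Index < VectorTripCount; Index += UF)
    for (int64_t Part = 0; Part < UF; ++Part)
      Last = Index + Part;
  for (int64_t IV = VectorTripCount; IV + 1 < TripCount; ++IV)
    Last = IV;
  return Last;
}
```

Under this model the original loop last stores at 1022. An unrolled body that runs to 1024 stores at 1023, one iteration too far, while stopping the unrolled body at 1022 and finishing in the scalar loop also ends at 1022.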

@Ayal - I don't feel that we are making useful progress on this review. You seem to have a very specific view of how you want this fixed, and that is not translating well for me. Instead of continuing to waste both our times, can I ask you to fix the bug yourself? At the moment, I feel we're both wasting time on this conversation.

In D103700#2841557, @reames wrote:

In D103700#2839620, @Ayal wrote:

This was indeed what was suggested, thanks for following up! The patch correctly fixes the bug raised by the test case, right? I.e., the last iteration of the "vector" (unrolled) loop is peeled into UF(*VF) iterations of the scalar epilog.
While keeping even_load_static_tc test case of interleaved-accesses.ll without this peeling, when run with "-force-vector-width=1 -force-vector-interleave=4"; unlike the original patch. Suggest to include this test case "run" in the patch.

@Ayal - I'm sorry, but I'm having a really hard time understanding you.

To restate, the correctness issue this is fixing is that the "vectorized" (really unrolled) loop in the test case was running more iterations than the original loop. It did this because different parts of the vectorizer disagreed about whether a scalar epilogue loop was required. That disagreement is the bug being fixed. There is no change in peeling policy intended or desired. This is a fix for a miscompile, nothing more.

@Ayal - I don't feel that we are making useful progress on this review. You seem to have a very specific view of how you want this fixed, and that is not translating well for me. Instead of continuing to waste both our times, can I ask you to fix the bug yourself? At the moment, I feel we're both wasting time on this conversation.

@reames, your patch is practically ready to land, it just needs Epilog vectorizer to pass the correct VF as noted; do we agree?
The suggestion to include an extra test case can be ignored, if desired.

reames mentioned this in rG716d2fedbfc8: Precommit miscompile test from D103700. Jun 28 2021, 4:01 PM

Address review comment. Additionally rebase over landed test, don't know why I didn't land that a while ago.

Harbormaster completed remote builds in B111397: Diff 355058. Jun 28 2021, 5:11 PM

Ayal accepted this revision. Jun 28 2021, 11:15 PM

This revision is now accepted and ready to land. Jun 28 2021, 11:15 PM

This revision was landed with ongoing or failed builds. Jun 29 2021, 8:04 AM

Closed by commit rGe49d65f36d66: [LV] Fix bug when unrolling (only) a loop with non-latch exit (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGe49d65f36d66: [LV] Fix bug when unrolling (only) a loop with non-latch exit.

Revision Contents

Path                                                     Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp          35 lines
llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll    16 lines

Diff 355249

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ [1,560 lines not shown] @@ public:
   /// Get the interleaved access group that \p Instr belongs to.
   const InterleaveGroup<Instruction> *
   getInterleavedAccessGroup(Instruction *Instr) {
     return InterleaveInfo.getInterleaveGroup(Instr);
   }

   /// Returns true if we're required to use a scalar epilogue for at least
   /// the final iteration of the original loop.
-  bool requiresScalarEpilogue() const {
+  bool requiresScalarEpilogue(ElementCount VF) const {
     if (!isScalarEpilogueAllowed())
       return false;
     // If we might exit from anywhere but the latch, must run the exiting
     // iteration in scalar form.
     if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch())
       return true;
-    return InterleaveInfo.requiresScalarEpilogue();
+    return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();
   }

   /// Returns true if a scalar epilogue is not allowed due to optsize or a
   /// loop hint annotation.
   bool isScalarEpilogueAllowed() const {
     return ScalarEpilogueStatus == CM_ScalarEpilogueAllowed;
   }

@@ [1,591 lines not shown] @@ Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {

   // Now we need to generate the expression for the part of the loop that the
   // vectorized body will execute. This is equal to N - (N % Step) if scalar
   // iterations are not required for correctness, or N - Step, otherwise. Step
   // is equal to the vectorization factor (number of SIMD elements) times the
   // unroll factor (number of SIMD instructions).
   Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");

-  // There are two cases where we need to ensure (at least) the last iteration
-  // runs in the scalar remainder loop. Thus, if the step evenly divides
-  // the trip count, we set the remainder to be equal to the step. If the step
-  // does not evenly divide the trip count, no adjustment is necessary since
-  // there will already be scalar iterations. Note that the minimum iterations
-  // check ensures that N >= Step. The cases are:
-  // 1) If there is a non-reversed interleaved group that may speculatively
-  // access memory out-of-bounds.
-  // 2) If any instruction may follow a conditionally taken exit. That is, if
-  // the loop contains multiple exiting blocks, or a single exiting block
-  // which is not the latch.
-  if (VF.isVector() && Cost->requiresScalarEpilogue()) {
+  // There are cases where we must run at least one iteration in the remainder
+  // loop. See the cost model for when this can happen. If the step evenly
+  // divides the trip count, we set the remainder to be equal to the step. If
+  // the step does not evenly divide the trip count, no adjustment is necessary
+  // since there will already be scalar iterations. Note that the minimum
+  // iterations check ensures that N >= Step.
+  if (Cost->requiresScalarEpilogue(VF)) {
     auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
     R = Builder.CreateSelect(IsZero, Step, R);
   }

   VectorTripCount = Builder.CreateSub(TC, R, "n.vec");

   return VectorTripCount;
 }

@@ [37 lines not shown] @@ void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
   BasicBlock *const TCCheckBlock = LoopVectorPreHeader;
   IRBuilder<> Builder(TCCheckBlock->getTerminator());

   // Generate code to check if the loop's trip count is less than VF * UF, or
   // equal to it in case a scalar epilogue is required; this implies that the
   // vector trip count is zero. This check also covers the case where adding one
   // to the backedge-taken count overflowed leading to an incorrect trip count
   // of zero. In this case we will also jump to the scalar loop.
-  auto P = Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE
+  auto P = Cost->requiresScalarEpilogue(VF) ? ICmpInst::ICMP_ULE
       : ICmpInst::ICMP_ULT;

   // If tail is to be folded, vector loop takes care of all iterations.
   Value *CheckMinIters = Builder.getFalse();
   if (!Cost->foldTailByMasking()) {
     Value *Step =
         createStepForVF(Builder, ConstantInt::get(Count->getType(), UF), VF);
     CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
   }

@@ [5,059 lines not shown] @@ BasicBlock *EpilogueVectorizerMainLoop::emitMinimumIterationCountCheck(
   Value *Count = getOrCreateTripCount(L);
   // Reuse existing vector loop preheader for TC checks.
   // Note that new preheader block is generated for vector loop.
   BasicBlock *const TCCheckBlock = LoopVectorPreHeader;
   IRBuilder<> Builder(TCCheckBlock->getTerminator());

   // Generate code to check if the loop's trip count is less than VF * UF of the
   // main vector loop.
-  auto P =
-      Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;
+  auto P = Cost->requiresScalarEpilogue(ForEpilogue ? EPI.EpilogueVF : VF) ?
+      ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;

   Value *CheckMinIters = Builder.CreateICmp(
       P, Count, ConstantInt::get(Count->getType(), VFactor * UFactor),
       "min.iters.check");

   if (!ForEpilogue)
     TCCheckBlock->setName("vector.main.loop.iter.check");

@@ [126 lines not shown] @@ assert(
       DT->dominates(cast<Instruction>(EPI.TripCount)->getParent(), Insert)) &&
       "saved trip count does not dominate insertion point.");
   Value *TC = EPI.TripCount;
   IRBuilder<> Builder(Insert->getTerminator());
   Value *Count = Builder.CreateSub(TC, EPI.VectorTripCount, "n.vec.remaining");

   // Generate code to check if the loop's trip count is less than VF * UF of the
   // vector epilogue loop.
-  auto P =
-      Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;
+  auto P = Cost->requiresScalarEpilogue(EPI.EpilogueVF) ?
+      ICmpInst::ICMP_ULE : ICmpInst::ICMP_ULT;

   Value *CheckMinIters = Builder.CreateICmp(
       P, Count,
       ConstantInt::get(Count->getType(),
                        EPI.EpilogueVF.getKnownMinValue() * EPI.EpilogueUF),
       "min.epilog.iters.check");

   ReplaceInstWithInst(

[1,907 lines not shown]

llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt %s -S -loop-vectorize -force-vector-interleave=2 | FileCheck %s

 ; Demonstrate a case where we unroll a loop, but don't vectorize it.
-; This currently reveals a miscompile. The original loop runs stores in
-; the latch block on iterations 0 to 1022, and exits when %indvars.iv = 1023.
-; Currently, the unrolled loop produced by the vectorizer runs the iteration
-; where %indvar.iv = 1023 in the vector.body loop before exiting. This results
-; in an out of bounds access..
+; The original loop runs stores in the latch block on iterations 0 to 1022,
+; and exits when %indvars.iv = 1023. (That is, it actually runs the stores
+; for an odd number of iterations.) If we unroll by two in the "vector.body"
+; loop, we must exit to the epilogue on iteration with %indvars.iv = 1022 to
+; avoid an out of bounds access.

 define void @test(double* %data) {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK: vector.ph:
 ; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
 ; CHECK: vector.body:
 ; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
 ; CHECK-NEXT: [[INDUCTION:%.*]] = add i64 [[INDEX]], 0
 ; CHECK-NEXT: [[INDUCTION1:%.*]] = add i64 [[INDEX]], 1
 ; CHECK-NEXT: [[TMP0:%.*]] = shl nuw nsw i64 [[INDUCTION]], 1
 ; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[INDUCTION1]], 1
 ; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 1
 ; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP1]], 1
 ; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds double, double* [[DATA:%.*]], i64 [[TMP2]]
 ; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, double* [[DATA]], i64 [[TMP3]]
 ; CHECK-NEXT: [[TMP6:%.*]] = load double, double* [[TMP4]], align 8
 ; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[TMP5]], align 8
 ; CHECK-NEXT: [[TMP8:%.*]] = fneg double [[TMP6]]
 ; CHECK-NEXT: [[TMP9:%.*]] = fneg double [[TMP7]]
 ; CHECK-NEXT: store double [[TMP8]], double* [[TMP4]], align 8
 ; CHECK-NEXT: store double [[TMP9]], double* [[TMP5]], align 8
 ; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
-; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
+; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1022
 ; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1024
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, 1022
 ; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
 ; CHECK: scalar.ph:
-; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1022, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
 ; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_LATCH:%.*]] ]
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
 ; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END]], label [[FOR_LATCH]]
 ; CHECK: for.latch:
 ; CHECK-NEXT: [[T15:%.*]] = shl nuw nsw i64 [[INDVARS_IV]], 1

[30 lines not shown]