This is an archive of the discontinued LLVM Phabricator instance.

[LV] Fix bug when unrolling (only) a loop with non-latch exit
ClosedPublic

Authored by reames on Jun 4 2021, 8:50 AM.

Details

Summary

If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires a epilogue be generated for correctness, the code generation must actually do so.

The included test case on an unmodified opt will access memory one past the expected bound.

I believe this to be the root cause of the issues seen with 3e5ce49e, but even if it isn't, it's definitely a bug.

Diff Detail

Event Timeline

reames created this revision.Jun 4 2021, 8:50 AM
reames requested review of this revision.Jun 4 2021, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2021, 8:50 AM
lebedev.ri added inline comments.
llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
3

I don't think bug-reduced.ll is correct here

reames added inline comments.Jun 4 2021, 9:18 AM
llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
3

Er, oops. No. I'm surprised the autoupdate worked with that typo. That should be %s.

reames updated this revision to Diff 349894.Jun 4 2021, 9:19 AM

Fix silly typo in test.

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

reames added a comment.Jun 4 2021, 9:26 AM

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

Ayal added inline comments.Jun 6 2021, 11:55 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3175

What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

Also curious if D94892 should be applicable to epilog vectorization, as commented there.

stefanp added a comment.EditedJun 7 2021, 8:04 AM

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

I added both patches and I did a quick test on the Big Endian Power PC buildbot machine.
This patch does seem to fix the original issue. However, after adding the two patches I have 4 new LIT test failues:

LLVM :: Transforms/LoopVectorize/first-order-recurrence-complex.ll
LLVM :: Transforms/LoopVectorize/loop-form.ll
LLVM :: Transforms/LoopVectorize/multiple-exits-versioning.ll
LLVM :: Transforms/LoopVectorize/unroll_nonlatch.ll
reames added inline comments.Jun 7 2021, 9:28 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3175

What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

We could reasonably decide that such a loop does not require a scalar epilogue, but if the cost model decides it does (as it might today), code generation had better be consistent about it. That's all this patch does.

Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

I don't follow your comment here. As demonstrated by the test case, we do need to generate an epilogue loop in some cases even when not vectorizing.

Also curious if D94892 should be applicable to epilog vectorization, as commented there.

(will reply there)

reames added a comment.Jun 7 2021, 9:33 AM

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

I added both patches and I did a quick test on the Big Endian Power PC buildbot machine.
This patch does seem to fix the original issue. However, after adding the two patches I have 4 new LIT test failues:

LLVM :: Transforms/LoopVectorize/first-order-recurrence-complex.ll
LLVM :: Transforms/LoopVectorize/loop-form.ll
LLVM :: Transforms/LoopVectorize/multiple-exits-versioning.ll
LLVM :: Transforms/LoopVectorize/unroll_nonlatch.ll

The original patch would definitely need rebased over changed tests, so that's not surprising.

Ayal added inline comments.Jun 7 2021, 3:05 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3175

What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

We could reasonably decide that such a loop does not require a scalar epilogue, but if the cost model decides it does (as it might today), code generation had better be consistent about it. That's all this patch does.

We do decide rightfully that such a loop does not require a scalar epilogue today; this patch causes it
to have a scalar epilogue.
Take for example test case even_load_static_tc from interleaved-accesses.ll, where instead of "-force-vector-width=4 -force-vector-interleave=1" (which requires an epilog) we run it with "-force-vector-width=1 -force-vector-interleave=4" (which does not require an epilog).

Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

I don't follow your comment here. As demonstrated by the test case, we do need to generate an epilogue loop in some cases even when not vectorizing.

Sorry if the comment is unclear. The logic for forcing a scalar epilog should conceptually be

((getExitingBlock() != getLoopLatch()) ||
 (VF.isVector() && InterleaveInfo.requiresScalarEpilogue()))

, right?
Asserting that isScalarEpilogueAllowed() holds if the above condition is true: because if an epilog is not allowed, Legal should have prevented vectoring a loop with a non-latch exiting block, and interleave groups requiring epilog should not have been formed.

To have Cost->requiresScalarEpilogue() fully convey this logic, one could directly pass it VF or VF.isVector(); or reset InterleaveInfo's requiresScalarEpilogue earlier when VF is set to 1, e.g., by invalidating all its groups.

@Ayal If I'm understanding you even partially correctly, it sounds like you're raising a code quality issue. That is, we may generate a dead epilogue loop (e.g. with a condition statically known to result in the epilogue being untaken, but emitted as a condition) when we didn't need to. Is that correct?

If so, I request that we land this patch - which is fixing a functional bug - and handle the code quality issue in a separate patch.

I really don't understand the interweaving code, and would really rather not have to. :)

gilr added a subscriber: gilr.Jun 17 2021, 10:10 AM

@Ayal If I'm understanding you even partially correctly, it sounds like you're raising a code quality issue. That is, we may generate a dead epilogue loop (e.g. with a condition statically known to result in the epilogue being untaken, but emitted as a condition) when we didn't need to. Is that correct?

No, the emitted code in this case would be explicitly leaving some iterations to the epilogue even if the trip count is a multiple of the unroll factor. So the epilogue is alive and will not be optimized away later.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1567

To have Cost->requiresScalarEpilogue() fully convey this logic, one could directly pass it VF

That seems like the simplest way to both fix the bug and avoid pessimizing the unroll-only case, i.e.:

return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();
reames updated this revision to Diff 354097.EditedJun 23 2021, 4:01 PM

Attempt to incorporate review feedback.

@Ayal, @gilr - I still don't understand your comments on the review. I think this was what you were asking for (right?), and it doesn't seem to do any harm, but I'm not sure why you think this was needed for correctness. That doesn't seem to match the code. If this was not what you had in mind, can I ask you to simply fix the bug proven by the test case yourself? We don't appear to be making progress here in terms of me understanding the point you're getting at.

reames updated this revision to Diff 354099.Jun 23 2021, 4:03 PM

(include full test diff as opposed to diff from last revision)

Ayal added a comment.Jun 24 2021, 2:43 PM

This was indeed what was suggested, thanks for following up! The patch correctly fixes the bug raised by the test case, right? I.e., the last iteration of the "vector" (unrolled) loop is peeled into UF(*VF) iterations of the scalar epilog.
While keeping even_load_static_tc test case of interleaved-accesses.ll without this peeling, when run with "-force-vector-width=1 -force-vector-interleave=4"; unlike the original patch. Suggest to include this test case "run" in the patch.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8300

"VF" should be "ForEpilogue ? EPI.EpilogueVF : VF" here, judging from VFactor above.

8444

"VF" should be EPI.EpilogueVF here.

This was indeed what was suggested, thanks for following up! The patch correctly fixes the bug raised by the test case, right? I.e., the last iteration of the "vector" (unrolled) loop is peeled into UF(*VF) iterations of the scalar epilog.
While keeping even_load_static_tc test case of interleaved-accesses.ll without this peeling, when run with "-force-vector-width=1 -force-vector-interleave=4"; unlike the original patch. Suggest to include this test case "run" in the patch.

@Ayal - I'm sorry, but I'm having a really hard time understanding you.

To restate, the correctness issue this is fixing is that the "vectorized" (really unrolled) loop in the test case was running more iterations than the original loop. It did this because different parts of the vectorizer disagreed about whether a scalar epilogue loop was required. That disagreement is the bug being fixed. There is no change in peeling policy intended or desired. This is a fix for a miscompile, nothing more.

@Ayal - I don't feel that we are making useful progress on this review. You seem to have a very specific view of how you want this fixed, and that is not translating well for me. Instead of continuing to waste both our times, can I ask you to fix the bug yourself? At the moment, I feel we're both wasting time on this conversation.

Ayal added a comment.Jun 26 2021, 1:33 PM

This was indeed what was suggested, thanks for following up! The patch correctly fixes the bug raised by the test case, right? I.e., the last iteration of the "vector" (unrolled) loop is peeled into UF(*VF) iterations of the scalar epilog.
While keeping even_load_static_tc test case of interleaved-accesses.ll without this peeling, when run with "-force-vector-width=1 -force-vector-interleave=4"; unlike the original patch. Suggest to include this test case "run" in the patch.

@Ayal - I'm sorry, but I'm having a really hard time understanding you.

To restate, the correctness issue this is fixing is that the "vectorized" (really unrolled) loop in the test case was running more iterations than the original loop. It did this because different parts of the vectorizer disagreed about whether a scalar epilogue loop was required. That disagreement is the bug being fixed. There is no change in peeling policy intended or desired. This is a fix for a miscompile, nothing more.

@Ayal - I don't feel that we are making useful progress on this review. You seem to have a very specific view of how you want this fixed, and that is not translating well for me. Instead of continuing to waste both our times, can I ask you to fix the bug yourself? At the moment, I feel we're both wasting time on this conversation.

@reames, your patch is practically ready to land, it just needs Epilog vectorizer to pass the correct VF as noted; do we agree?
The suggestion to include an extra test case can be ignored, if desired.

reames updated this revision to Diff 355058.Jun 28 2021, 4:11 PM

Address review comment. Additionally rebase over landed test, don't know why I didn't land that a while ago.

Ayal accepted this revision.Jun 28 2021, 11:15 PM
This revision is now accepted and ready to land.Jun 28 2021, 11:15 PM
This revision was landed with ongoing or failed builds.Jun 29 2021, 8:04 AM
This revision was automatically updated to reflect the committed changes.