Page MenuHomePhabricator

[LV] Fix bug when unrolling (only) a loop with non-latch exit
Needs ReviewPublic

Authored by reames on Fri, Jun 4, 8:50 AM.

Details

Reviewers
fhahn
stefanp
Ayal
Summary

If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires a epilogue be generated for correctness, the code generation must actually do so.

The included test case on an unmodified opt will access memory one past the expected bound.

I believe this to be the root cause of the issues seen with 3e5ce49e, but even if it isn't, it's definitely a bug.

Diff Detail

Event Timeline

reames created this revision.Fri, Jun 4, 8:50 AM
reames requested review of this revision.Fri, Jun 4, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Jun 4, 8:50 AM
lebedev.ri added inline comments.
llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
3

I don't think bug-reduced.ll is correct here

reames added inline comments.Fri, Jun 4, 9:18 AM
llvm/test/Transforms/LoopVectorize/unroll_nonlatch.ll
3

Er, oops. No. I'm surprised the autoupdate worked with that typo. That should be %s.

reames updated this revision to Diff 349894.Fri, Jun 4, 9:19 AM

Fix silly typo in test.

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

reames added a comment.Fri, Jun 4, 9:26 AM

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

Ayal added inline comments.Sun, Jun 6, 11:55 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3175

What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

Also curious if D94892 should be applicable to epilog vectorization, as commented there.

stefanp added a comment.EditedMon, Jun 7, 8:04 AM

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

I added both patches and I did a quick test on the Big Endian Power PC buildbot machine.
This patch does seem to fix the original issue. However, after adding the two patches I have 4 new LIT test failues:

LLVM :: Transforms/LoopVectorize/first-order-recurrence-complex.ll
LLVM :: Transforms/LoopVectorize/loop-form.ll
LLVM :: Transforms/LoopVectorize/multiple-exits-versioning.ll
LLVM :: Transforms/LoopVectorize/unroll_nonlatch.ll
reames added inline comments.Mon, Jun 7, 9:28 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3175

What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

We could reasonably decide that such a loop does not require a scalar epilogue, but if the cost model decides it does (as it might today), code generation had better be consistent about it. That's all this patch does.

Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

I don't follow your comment here. As demonstrated by the test case, we do need to generate an epilogue loop in some cases even when not vectorizing.

Also curious if D94892 should be applicable to epilog vectorization, as commented there.

(will reply there)

reames added a comment.Mon, Jun 7, 9:33 AM

We will be happy to test this on both our LE and BE systems. @stefanp do you have time to test this out or should I do it?

Seems like overkill for this one given the change is trivial and the test is target independent.

I'd save that for when I go to reapply the original patch that exposed this.

Oh, sorry. I meant to say that we'd test with this patch and the reverted one applied together.

I added both patches and I did a quick test on the Big Endian Power PC buildbot machine.
This patch does seem to fix the original issue. However, after adding the two patches I have 4 new LIT test failues:

LLVM :: Transforms/LoopVectorize/first-order-recurrence-complex.ll
LLVM :: Transforms/LoopVectorize/loop-form.ll
LLVM :: Transforms/LoopVectorize/multiple-exits-versioning.ll
LLVM :: Transforms/LoopVectorize/unroll_nonlatch.ll

The original patch would definitely need rebased over changed tests, so that's not surprising.

Ayal added inline comments.Mon, Jun 7, 3:05 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3175

What if a loop has a single exiting block - the loop latch, and an interleave group that requires a scalar epilog, but we decide to unroll the loop w/o vectorizing it? Such a (test)case should be unrolled w/o a scalar epilog.

We could reasonably decide that such a loop does not require a scalar epilogue, but if the cost model decides it does (as it might today), code generation had better be consistent about it. That's all this patch does.

We do decide rightfully that such a loop does not require a scalar epilogue today; this patch causes it
to have a scalar epilogue.
Take for example test case even_load_static_tc from interleaved-accesses.ll, where instead of "-force-vector-width=4 -force-vector-interleave=1" (which requires an epilog) we run it with "-force-vector-width=1 -force-vector-interleave=4" (which does not require an epilog).

Note that InterleaveInfo.requiresScalarEpilogue() is relevant only when vectorizing but is otherwise independent of VF, so should force a scalar epilog in conjunction with the original "if (VF > 1)" of D19487. Exiting from a non-latch block, OTOH, as introduced in D93317, should force a scalar epilog for any VF, including 1.

I don't follow your comment here. As demonstrated by the test case, we do need to generate an epilogue loop in some cases even when not vectorizing.

Sorry if the comment is unclear. The logic for forcing a scalar epilog should conceptually be

((getExitingBlock() != getLoopLatch()) ||
 (VF.isVector() && InterleaveInfo.requiresScalarEpilogue()))

, right?
Asserting that isScalarEpilogueAllowed() holds if the above condition is true: because if an epilog is not allowed, Legal should have prevented vectoring a loop with a non-latch exiting block, and interleave groups requiring epilog should not have been formed.

To have Cost->requiresScalarEpilogue() fully convey this logic, one could directly pass it VF or VF.isVector(); or reset InterleaveInfo's requiresScalarEpilogue earlier when VF is set to 1, e.g., by invalidating all its groups.

@Ayal If I'm understanding you even partially correctly, it sounds like you're raising a code quality issue. That is, we may generate a dead epilogue loop (e.g. with a condition statically known to result in the epilogue being untaken, but emitted as a condition) when we didn't need to. Is that correct?

If so, I request that we land this patch - which is fixing a functional bug - and handle the code quality issue in a separate patch.

I really don't understand the interweaving code, and would really rather not have to. :)

gilr added a subscriber: gilr.Thu, Jun 17, 10:10 AM

@Ayal If I'm understanding you even partially correctly, it sounds like you're raising a code quality issue. That is, we may generate a dead epilogue loop (e.g. with a condition statically known to result in the epilogue being untaken, but emitted as a condition) when we didn't need to. Is that correct?

No, the emitted code in this case would be explicitly leaving some iterations to the epilogue even if the trip count is a multiple of the unroll factor. So the epilogue is alive and will not be optimized away later.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1567

To have Cost->requiresScalarEpilogue() fully convey this logic, one could directly pass it VF

That seems like the simplest way to both fix the bug and avoid pessimizing the unroll-only case, i.e.:

return VF.isVector() && InterleaveInfo.requiresScalarEpilogue();