This is an archive of the discontinued LLVM Phabricator instance.

[LV] Consider if scalar epilogue is required in getMaximizedVFForTarget.
ClosedPublic

Authored by fhahn on Jun 30 2023, 2:15 PM.

Details

Summary

When a scalar epilogue is required, at least one iteration of the scalar loop
has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF
that results in a dead vector loop.

Diff Detail

Event Timeline

fhahn created this revision.Jun 30 2023, 2:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 2:15 PM
fhahn requested review of this revision.Jun 30 2023, 2:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 2:15 PM
Ayal added inline comments.Jul 2 2023, 8:38 AM
llvm/test/Transforms/LoopVectorize/X86/pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll
7–9

Comment above deserves updating? It refers to skipping over the main loop vectorized to VF=32 and jumping to the epilog loop vectorized to VF=8 (with original trip count of 32 and requires scalar epilog), whereas now the main loop is vectorized to VF=8 and the epilog loop is not vectorized at all.

fhahn updated this revision to Diff 536809.Jul 3 2023, 9:50 AM

Rebase on top of e561edaaa56c9a8818d546774b141dead7224b50 which updates the test in pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll so it keeps testing for the original issue.

fhahn marked an inline comment as done.Jul 3 2023, 9:52 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/X86/pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll
7–9

Thanks, I moved the test to limit-vf-by-tripcount.ll in e561edaaa56c9a8818d546774b141dead7224b50 and update the trip counts in the test so that there's an iteration of the main loop and epilogue loop, so it keeps testing for the original issue.

Ayal added a comment.Jul 4 2023, 5:58 AM

Rebase on top of e561edaaa56c9a8818d546774b141dead7224b50 which updates the test in pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll so it keeps testing for the original issue.

Ah, the original issue of pr56319 was to ensure that the epilog loop leaves at-least one iteration for the scalar loop (or actually epilogVF iterations) rather than vectorizing all remaining iterations? An odd trip count of 39 will always end up with a last scalar iteration, due to main and epilog VF's being even. Perhaps a trip count of 48 with (forced?) main-loop VF of 32 and epilog-loop VF of 8 should ensure the epilog loop runs only once.

llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll
285–286

nit: name of test is a bit misleading, the VF of the main loop is limited to avoid excessive values rather than avoiding epilog vectorization.

fhahn updated this revision to Diff 537095.Jul 4 2023, 7:41 AM
fhahn marked an inline comment as done.

Rebase on top of e561edaaa56c9a8818d546774b141dead7224b50 which updates the test in pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll so it keeps testing for the original issue.

Ah, the original issue of pr56319 was to ensure that the epilog loop leaves at-least one iteration for the scalar loop (or actually epilogVF iterations) rather than vectorizing all remaining iterations? An odd trip count of 39 will always end up with a last scalar iteration, due to main and epilog VF's being even. Perhaps a trip count of 48 with (forced?) main-loop VF of 32 and epilog-loop VF of 8 should ensure the epilog loop runs only once.

The original issue was that we simplified the branch in the main vector loop to exit in the first iteration (VF == vector TC) and because epilogue and main vector loops share the same VPlan, it was also simplified for the epilogue vector loop, but there where more than 1 vector iteration (fix was https://github.com/llvm/llvm-project/commit/0dddf04caba55a64f8534518d65311bdac05cf39).

Now, the test still checks that we don't simplify neither the main nor epilogue vector loop branch. But in this case, we could simplify both, because the epilogue vector loop only executes a single iteration; we should also have a variant where the epilogue vector loops executes multiple times (e.g. main VF=32, epilogue VF=8, TC = 49), but unfortunately my attempts so far weren't successful, because there's some code in isMoreProfitable that estimates the cost of the vector loop + scalar tail for the given VFs, which ignores possible epilogue vectorization( https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L5384). For all cases I tried, this leads to a small main VF ends up being chose (8 in that case) if there would be more than a single iteration of the epilogue vector loop.

fhahn marked an inline comment as done.Jul 4 2023, 7:41 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll
285–286

adjusted the name, thanks!

Ayal added a comment.Jul 4 2023, 1:17 PM

Rebase on top of e561edaaa56c9a8818d546774b141dead7224b50 which updates the test in pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll so it keeps testing for the original issue.

Ah, the original issue of pr56319 was to ensure that the epilog loop leaves at-least one iteration for the scalar loop (or actually epilogVF iterations) rather than vectorizing all remaining iterations? An odd trip count of 39 will always end up with a last scalar iteration, due to main and epilog VF's being even. Perhaps a trip count of 48 with (forced?) main-loop VF of 32 and epilog-loop VF of 8 should ensure the epilog loop runs only once.

The original issue was that we simplified the branch in the main vector loop to exit in the first iteration (VF == vector TC) and because epilogue and main vector loops share the same VPlan, it was also simplified for the epilogue vector loop, but there where more than 1 vector iteration (fix was https://github.com/llvm/llvm-project/commit/0dddf04caba55a64f8534518d65311bdac05cf39).

Now, the test still checks that we don't simplify neither the main nor epilogue vector loop branch. But in this case, we could simplify both, because the epilogue vector loop only executes a single iteration; we should also have a variant where the epilogue vector loops executes multiple times (e.g. main VF=32, epilogue VF=8, TC = 49), but unfortunately my attempts so far weren't successful, because there's some code in isMoreProfitable that estimates the cost of the vector loop + scalar tail for the given VFs, which ignores possible epilogue vectorization( https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L5384). For all cases I tried, this leads to a small main VF ends up being chose (8 in that case) if there would be more than a single iteration of the epilogue vector loop.

Ah, ok, that optimization was later outlined by having LVP::execute_plan() do if (!IsEpilogueVectorization) VPlanTransforms::optimizeForVFAndUF()).
(And may explain the observed clean-up behavior observed in https://reviews.llvm.org/D154264#inline-1491906 :-)

The explanation for test @pr56319 may be misleading, though, as it specifically deals with having the epilog loop vectorization consider requiresScalarEpilog:
; Test case where the exit condition in the main vector loop can be optimized
; to true, but not in the epilogue vector loop. In the test the interleave
; group requires to execute at least one scalar iteration
, meaning the last
; vector iteration of the epilogue vector loop cannot be executed.

Could a test with trip count of, say, 34, work - be vectorized with main VF=32 and epilog VF=2 w/o requiresScalarEpilog - when last member of the interleave-group is accessed, while with requiresScalarEpilog main VF=32 will continue but epilog vectorization will be abandoned?

fhahn updated this revision to Diff 537171.Jul 4 2023, 1:47 PM
fhahn marked an inline comment as done.

Ah, ok, that optimization was later outlined by having LVP::execute_plan() do if (!IsEpilogueVectorization) VPlanTransforms::optimizeForVFAndUF()).
(And may explain the observed clean-up behavior observed in https://reviews.llvm.org/D154264#inline-1491906 :-)

The explanation for test @pr56319 may be misleading, though, as it specifically deals with having the epilog loop vectorization consider requiresScalarEpilog:
; Test case where the exit condition in the main vector loop can be optimized
; to true, but not in the epilogue vector loop. In the test the interleave
; group requires to execute at least one scalar iteration
, meaning the last
; vector iteration of the epilogue vector loop cannot be executed.

Agreed, I think the interleave group requiring one scalar iteration is a red herring with respect to the orinal issue. Tried to clarify it in this patch.

Could a test with trip count of, say, 34, work - be vectorized with main VF=32 and epilog VF=2 w/o requiresScalarEpilog - when last member of the interleave-group is accessed, while with requiresScalarEpilog main VF=32 will continue but epilog vectorization will be abandoned?

With 34, we could simplify the branch in the epilogue vector loop, as the epilogue vector loop would be dead (or shouldn't get generated, as in follow-on patches). I *think* TC = 37 with forced epilogue VF=2 should work (single main loop iteration with VF=32, 2 iterations of the epilogue vector loop (so branch cannot be simplified), final scalar iteration. Updated accoringly

Ayal accepted this revision.Jul 4 2023, 2:37 PM

Ok, very well, then better drop the last requireScalarEpilog part from the explanation of the test?

llvm/test/Transforms/LoopVectorize/X86/pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll
9–10

The scalar loop will get to run the final iteration in any case, due to the trip-count being odd, regardless of any interleave group requiring it (aka red herring).

29

(Just noting that we're refraining from optimizing this single iteration main loop with an unconditional branch exiting to middle block, because that would also cause the double iteration epilog loop to bail out after its first iteration.)

This revision is now accepted and ready to land.Jul 4 2023, 2:37 PM
This revision was landed with ongoing or failed builds.Jul 6 2023, 5:36 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
fhahn marked an inline comment as done.Jul 6 2023, 5:36 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/X86/pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll
9–10

Adjusted in the committed version.

29

Exactly