When the trip count is provably divisible by the maximal/chosen VF, no tail remains, so folding the loop's tail during vectorization is redundant.
This commit extends the existing test, which handles constant trip counts, to any trip count that SCEV can prove is divisible by the maximal/selected VF.
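To make the arithmetic behind this concrete, here is a minimal standalone sketch. It does not use any LLVM API; `needsTailHandling` and `splitLoop` are hypothetical names chosen for illustration. It only shows that a trip count divisible by VF leaves zero remainder iterations, so there is nothing for a folded (masked) tail to cover.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): a tail exists only when the trip
// count is not a whole multiple of the vectorization factor.
bool needsTailHandling(std::uint64_t TripCount, std::uint64_t VF) {
  return TripCount % VF != 0;
}

// How a loop of TripCount iterations splits into full vector iterations
// (each covering VF scalar iterations) and leftover scalar iterations.
struct LoopSplit {
  std::uint64_t VectorIterations;
  std::uint64_t RemainderIterations;
};

LoopSplit splitLoop(std::uint64_t TripCount, std::uint64_t VF) {
  return {TripCount / VF, TripCount % VF};
}
```

For example, `splitLoop(64, 8)` has no remainder iterations, whereas `splitLoop(70, 8)` leaves some iterations over that would need either a scalar epilogue or a folded tail.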
Details
- Reviewers: fhahn, Ayal, SjoerdMeijer
- Commits: rGa56280094e08: [LV] Avoid needless fold tail
Event Timeline
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5505 | Was surprised to see this change, because I thought we were handling it here already. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5505 | Yes, since IC may take non-power-of-2 values. Will add a test to cover that. |
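One plausible reading of this reply, sketched below under assumptions: if a check can only reason about the power-of-2 factor of the trip count, it cannot prove divisibility by VF * IC when IC is not a power of 2, so an exact check on constant trip counts remains useful. The thread does not spell out how the new check is implemented; the function names here are hypothetical and this is not the LLVM code.

```cpp
#include <cstdint>

// Largest power of two that divides TripCount (TripCount must be non-zero).
// Isolates the lowest set bit using two's-complement arithmetic.
std::uint64_t largestPowerOf2Factor(std::uint64_t TripCount) {
  return TripCount & (~TripCount + 1);
}

// Conservative check (assumed for illustration): only the power-of-2 factor
// of the trip count is known, so only that factor is tested.
bool noTailByPow2Factor(std::uint64_t TripCount, std::uint64_t VFTimesIC) {
  return largestPowerOf2Factor(TripCount) % VFTimesIC == 0;
}

// Exact check, possible when the trip count is a compile-time constant.
bool noTailExact(std::uint64_t TripCount, std::uint64_t VFTimesIC) {
  return TripCount % VFTimesIC == 0;
}
```

With VF = 4 and IC = 3, a constant trip count of 24 is a multiple of 12, which the exact check sees, while the power-of-2 check only knows about the factor 8 and must stay conservative.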
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5505 | Ah yeah, I do see that now. |
LGTM, thanks
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll | ||
---|---|---|
10 ↗ | (On Diff #313091) | Is this enough? I think it might be better to check the whole vector body? |
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll | ||
---|---|---|
10 ↗ | (On Diff #313091) | Tried to keep the checks to the minimum proving unmasked vectorization, but perhaps indeed better to check the whole context - will switch to update_test format for easy maintenance. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5505 | Thinking a bit more about this, I think we should be able to use ScalarEvolution::getURemExpr to check if the trip count is a multiple of any VF. That should work for both the constant and variable trip-count cases. As I missed commenting on that before the patch landed, I put up D93677 |
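The idea of taking the remainder symbolically can be illustrated with a toy model. The sketch below does not call LLVM; `ToyTripCount`, `toyURem`, and `isMultipleOf` are hypothetical stand-ins, and the model only covers trip counts of the form `Scale * N` for an unknown runtime `N`. In LLVM itself the suggestion is to build the remainder with `ScalarEvolution::getURemExpr` and test whether it is zero.

```cpp
#include <cstdint>
#include <optional>

// Toy stand-in for a symbolic trip count. If HasUnknown is true the trip
// count is Scale * N for some unknown runtime N; otherwise it is exactly
// Scale (a constant trip count).
struct ToyTripCount {
  std::uint64_t Scale;
  bool HasUnknown;
};

// Returns the remainder of the trip count divided by Divisor when that
// remainder is the same for every possible N, and std::nullopt otherwise.
std::optional<std::uint64_t> toyURem(const ToyTripCount &TC,
                                     std::uint64_t Divisor) {
  if (!TC.HasUnknown)
    return TC.Scale % Divisor;   // constant case: plain arithmetic
  if (TC.Scale % Divisor == 0)
    return std::uint64_t{0};     // Scale * N is always a multiple
  return std::nullopt;           // remainder depends on N
}

// One check that serves both the constant and the variable case.
bool isMultipleOf(const ToyTripCount &TC, std::uint64_t Divisor) {
  std::optional<std::uint64_t> Rem = toyURem(TC, Divisor);
  return Rem.has_value() && *Rem == 0;
}
```

In this model a single remainder-based test handles a constant trip count of 24 against a divisor of 12 as well as a variable trip count of `8 * N` against a divisor of 4, which is the unification the comment is after.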
Was surprised to see this change, because I thought we were handling it here already.
Is this check here still relevant? Or can we "merge" this with the one that you added below?