LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for the patch! Some comments inline.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5629–5630 | IIUC the problem here is in cases when MaxInterleaveCount == 0, right? The code below expects MaxInterleaveCount > 0 and I think it would be slightly preferable to make sure MaxInterleaveCount is at least 1. With that, the code setting IC should do the right thing, and other (potential future) users of MaxInterleaveCount can expect a sane value. | |
llvm/test/Transforms/LoopVectorize/zero_unroll.ll | ||
1 ↗ | (On Diff #291847) | I think the test needs moving to llvm/test/Transforms/LoopVectorize/SystemZ/, because it relies on SytemZ's cost model and might fail if LLVM is build without the SystemZ backend. Could you also add a check line to make sure a vector body is generated? |
5 ↗ | (On Diff #291847) | nit: dso_local local_unnamed_addr unneeded? |
25 ↗ | (On Diff #291847) | nit: are those attributes needed? |
llvm/test/Transforms/LoopVectorize/zero_unroll.ll | ||
---|---|---|
12 ↗ | (On Diff #291847) | This looks like an infinite outer loop? It maybe doesn't matter a large amount, but if it shows the same problem with a ret instead of looping back, that would make a conceptually simpler test. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5629–5630 |
That's right.
I don't see any uses of MaxInterleaveCount bellow this point. |
LGTM, thanks. Suggestion inline, but quite subjective, so feel free to ignore.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5629–5630 |
Right, personally I think it would still be slightly preferable to ensure the variables have sane values as early as possible, especially when a MaxInterleaveCount of 0 does not really make sense in this context and makes the existing code for IC handling work as expected. But it probably it doesn't make much of a difference in practice. Also, using std::max would seem slightly more explicit to me, together with a comment that IC must be at least 1. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5605–5607 | this is the culprit: *BestKnownTC < VF.getKnownMinValue() leads to zeroing MaxInterleaveCount, right? Best check here if this is the case, and if so reset MaxInterleaveCount to 1 instead of 0. Also worth updating the above comment. | |
5629–5630 |
Agreed. Adding an assert that MaxInterleaveCount is strictly positive is also recommended. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5605–5607 | In this particular case yes, *BestKnownTC < VF.getKnownMinValue() is the problem. But MaxInterleaveCount can potentially be set to 0 in early assignments as well. Is there any guarantee that user won't set ForceTargetMaxScalarInterleaveFactor/ForceTargetMaxVectorInterleaveFactor to 0? Also I don't see anything preventing TTI::getMaxInterleaveFactor to return 0. In fact there is one target returning 0: unsigned HexagonTTIImpl::getMaxInterleaveFactor(unsigned VF) { return useHVX() ? 2 : 0; } | |
5629–5630 |
What I don't like in this approach is over complication. That doesn't help code readability. Instead of one line IC = std::max(1, IC) which { unsigned MaxIC=V1; if (c1) MaxIC=V2; ... if (IC > MaxIC) IC = MaxIC } IC = std::max(1, IC); What do you think?
Agreed. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5605–5607 |
I don't think there's anything currently preventing targets to return 0 for the max interleave factor, but I think it is reasonable to require them to return a MaxInterleaveFactor >= 1 and assert otherwise. But I think it should be sufficient to make sure that max IC >= 1 here with a comment, as Ayal suggested | |
5629–5630 | Given that @Ayal also seems to prefer sanitizing the value earlier I think it would be preferable to do so earlier, but don't have a strong opinion on that. |
LGTM, thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5609–5622 | nit: seems like something like 'count' is missing from the sentence below. Or say something like MaxInterleave count must be at least 1. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5605–5607 | I have changed the Hexagon code to return 1 (https://reviews.llvm.org/rG99cafe009477). |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5601–5602 | My suggestion would be to (1) update the above comment: the interleave count is limited to be less than the trip count divided by VF, provided it is at-least 1. (2) place the MaxInterleaveCount = std::max(1u, MaxInterleaveCount); fix immediately after setting MaxInterleaveCount = std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount); and (3) introduce an assert(MaxInterleaveCount > 0 && "Max Interleave Count must be at-least 1"); later -- setting it to 0 by targets should be treated as an error (thanks @kparzysz for the fix!) |
Fixed according to Ayal's request.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5629–5630 | Done |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5622 | nit: suffice to only introduce the assert(MaxInterleaveCount > 0 && "Maximum interleave count must be greater than 0"); in the above change hunk, right? Thanks! |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
5622 | That was done intentionally. By sanitizing IC value as early as possible we avoid potential issues with using incorrect value. This is similar in spirit to what is done for MaxInterliveCount. |
My suggestion would be to (1) update the above comment: the interleave count is limited to be less than the trip count divided by VF, provided it is at-least 1.
(2) place the MaxInterleaveCount = std::max(1u, MaxInterleaveCount); fix immediately after setting
and (3) introduce an assert(MaxInterleaveCount > 0 && "Max Interleave Count must be at-least 1"); later -- setting it to 0 by targets should be treated as an error (thanks @kparzysz for the fix!)