The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7594 | isTooManyRuntimeChecks -> hasTooManyRuntimeChecks ? | |
7675 | Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help? | |
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll | ||
1 | If you use -debug, it needs to REQUIRES: asserts | |
3 | CHECK-LABEL | |
10 | There are quite a lot of extra blocks in this test. Can a lot of them be removed? |
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll | ||
---|---|---|
76 | Please avoid using undef in the test unless necessary. |
Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?
Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.
Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7596 | Should this be checking Hints.allowReordering like the other uses of PragmaVectorizeMemoryCheckThreshold? |
- Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?<br>
- I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7675 | Probably to reduce compilation time (although the impact is small) ? | |
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll | ||
1 | Done, thanks very much! | |
76 | Done, thanks very much! |
- Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?<br>
Yeah - it might need to pass Requirements.getNumRuntimePointerChecks() through to selectInterleaveCount. I think it's OK as you have it right now, but Florian (or anyone else) can complain if they disagree.
- I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.
I was just hoping that if it got to that block then it would prevent the interleaving. It looks like it doesn't work like that though.
LGTM. Perhaps wait a day or two in case anyone else has comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7681–7682 | Maybe just use a single if now: if (SelectedVF.Width.getKnownMinValue() > 1 && hasTooManyRuntimeChecks()) { |
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h | ||
---|---|---|
322 | It would be good to document the helper. Also, requiresTooManyRuntimeChecks may be slightly better, because has seems to imply that the checks are already there to me. | |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
10470 | This check here should be sufficient, there should be no need to also check in selectInterleaveCount. Could you just move the remark generation & early exit from ::plan here? You might want to skip those checks if there's a UserVF or UserIC used, with those I think we should always vectorize if possible. It also might be good to add a check line to your test which forces an interleave count > 1. | |
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll | ||
2 | As this relies on the PPC cost-model, this needs to go into the PowerPC subdirectory. | |
13 | nit: can change %z_e_8676_1962 to i64 and remove the first load. Also, names could be tidied up a bit. | |
14 | nit: it would be good to clean up the names of the blocks a bit more. |
I think SelectedVF.Width.getKnownMinValue() > 1 is used to avoid the check when VF is scalar, but ElementCount::isScalar() seems better.
I don't think the position for the runtimeCheck is good.
Now, the code looks like:
1.If UserVF is OK, do vectorization using UserVF
2.Populate VFCandidates
3.Collect information for vplans
4.Build vplans for all vf candidates and select best VF
5.Check the number of runtime checks for selected VF if it's scalar. If it's too many, stop vectorization.
15234 would be a better order?
1.If UserVF is OK, do vectorization using UserVF
5.Check the number of runtime checks(No need to consider about the VF because the number of runtime checks are same for all VF). If it's too many, stop vectorization.
2.Populate VFCandidates
3.Collect information for vplans
4.Build vplans for all vf candidates
I'm not sure of my opinion. If it's wrong, happy to get a response :)
Thanks for the update!
It's indeed not at the most optimal position focused solely at compile-time. But I think for this patch, I think it would be good to roughly keep roughly the original order, especially if we manage to check & error at a single place.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
10470 | Can the handling be merged into a single check & diagnostic? |
Fix the failed case (optimization-remark-options.c), because the remark info should be updated
(Updated)
Difference with accepted version: Move memory runtime checks to processLoop to control both VF and IC
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
7681–7682 | Done. Thanks for your review! | |
10470 | Hi, @fhahn, thanks for your review! It sounds similar to doesNotMeet (https://reviews.llvm.org/D98634), but the difference is that I need to use UserIC and UserVF to control whether this check needs to be performed, right? E.g. if (!UserVF && LVP.requiresTooManyRuntimeChecks()) { /*generate remarks*/ VF = VectorizationFactor::Disabled(); } if (!UserIC && LVP.requiresTooManyRuntimeChecks()) { /*generate remarks*/ IC = 1; }
| |
10470 | Hi,@fhahn, thanks for your reply! Does the current version meet the requirements? | |
10470 | Hi, @fhahn, is there any other problem with this patch? ping |
LGTM with additional suggestions inline, thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
10474 | I think we usually try to use early exits to reduce the indentation, so it might be worth doing something like if (LVP.requiresTooManyRuntimeChecks()) { ORE->emit([&]() { return OptimizationRemarkAnalysisAliasing( DEBUG_TYPE, "CantReorderMemOps", L->getStartLoc(), L->getHeader()) << "loop not vectorized: cannot prove it is safe to reorder " "memory operations"; }); LLVM_DEBUG(dbgs() << "LV: Too many memory checks needed.\n"); Hints.emitRemarkWithHints(); return false; } // Select the interleave count. IC = CM.selectInterleaveCount(VF.Width, *VF.Cost.getValue()); (this has the added benefit of not checking for !LVP.requiresTooManyRuntimeChecks() but the unnegated version, which is slightly more straight forward) | |
llvm/test/Transforms/LoopVectorize/PowerPC/interleaved-pointer-runtime-check-unprofitable.ll | ||
3 ↗ | (On Diff #428930) | Might be good to precommit the test case and then just show the difference in this diff (without the fix ; CHECK: vector.memcheck) |
Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h | ||
---|---|---|
323 | nit: could be turned into const if possible. |
Thanks, @fhahn! I'll turn the function into const and add the precommit test when committing the patch
It would be good to document the helper. Also, requiresTooManyRuntimeChecks may be slightly better, because has seems to imply that the checks are already there to me.