This patch fixes pr34283, which exposed that the computation of maximum legal width for vectorization was wrong, because it relied on MaxInterleaveFactor to obtain the maximum stride used in the loop, however not all strided accesses in the loop have an interleave-group associated with them. In pr34283 the interleave group gets invalidated because it is a store with gaps; as a result MaxInterleaveFactor is 1, and the computed maximum legal width for vectorization comes out larger than it should be.
Instead of recording the maximum stride in the loop, which can be over conservative (e.g. if the access with the maximum stride is not involved in the dependence limitation), this patch tracks the actual maximum legal width imposed by accesses that are involved in dependencies.
Details
Diff Detail
Event Timeline
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
6281 | Not strictly related to your patch, but since you are already adding a missing "Invalidate candidate interleave group" message above for the case that shows up in PR34283 (thanks!) , might as well also add a similar message that's missing here (where we invalidate a candidate interleave group due to a reverse access with gaps). | |
6378 | Another small nit not related to your patch, but while we're here: would be nice to make this message a bit clearer, e.g. "... Widest register safe to use is..." |
include/llvm/Analysis/LoopAccessAnalysis.h | ||
---|---|---|
205 | Should MaxSafeRegisterWidth replace MaxSafeDepDistBytes, or are both needed? | |
265 |
>> that is most restrictive. | |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
6226 | Can simplify: "LV: Invalidate candidate interleaved store group due to gaps.\n" | |
6363 | This comment about being fairly conservative (i.e., could be room for improvement) still holds, right? | |
6366 | So another way of fixing this could be to have Legal record the MaxStride, instead of asking it for the MaxInterleaveFactor here? | |
6367 |
that is most restrictive | |
test/Transforms/LoopVectorize/X86/pr34283-1.ll | ||
1 | No need to require asserts here, right? Better place the test say in memdep.ll, rather than opening a new file? Following TestingGuide.html#writing-new-regression-tests. In any case, place both tests in same file, using the PR number in the function names instead of foo. | |
25 | Can we also/instead check the runtime guard that is generated, perhaps where M is not a compile-time constant? | |
26 | min[i]mum | |
test/Transforms/LoopVectorize/X86/pr34283-2.ll | ||
1 | The "2>&1" is not needed here. | |
25 | This is a bit fragile, as it's relying on the cost-model to choose exactly VF=4. Perhaps better to simply CHECK for a vector whose VF is in the range 2-7, and no vectors of VF 8 or larger? |
include/llvm/Analysis/LoopAccessAnalysis.h | ||
---|---|---|
205 | I think we need both. MaxSafeDepDistBytes is used in a few other LAA functions, and is correct. | |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
6363 | We are actually now dropping this over-conservativeness, which was also not conservative enough. | |
6366 | yes, this was one of the possible solutions we thought of, but it is less accurate. Furthermore we can also drop getMaxInterleaveFactor() as it becomes dead. | |
test/Transforms/LoopVectorize/X86/pr34283-1.ll | ||
1 | The asserts are needed for checking the debug output, but we can check the IR instead, as in the second test. | |
25 | This fix involves compile time distances that affect max VF; no runtime guards involved. | |
test/Transforms/LoopVectorize/X86/pr34283-2.ll | ||
25 | Can force-vectorization-width=4 and CHECK that it vectorized, and force-vectorization-width=8 and CHECK that it did not. |
Only additional comment is whether the test needs to be X86/skx specific, or whether it can be placed in, say, memdep.ll
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
1677 | Best assert that LAI is not null before dereferencing it. | |
6374 | While you're touching this, best use std::min. | |
test/Transforms/LoopVectorize/memdep.ll | ||
230 ↗ | (On Diff #114571) | Better clarify: |
Should MaxSafeRegisterWidth replace MaxSafeDepDistBytes, or are both needed?