Do not limit the LoopVectorize interleave count using MaxLocalUsers when MaxLocalUsers is zero.
Do you have any performance data motivating the change and ruling out any regressions?
llvm/test/Transforms/LoopVectorize/X86/interleave-count.ll, line 38:
Please update the test to use opaque pointers. Also, it would be good to put up a patch that just adds the test separately, and then include only the changes caused by this patch in the diff.
No. I investigated AVX512 memset code generated by a non-LLVM compiler. It used vectorized move instructions on zmm registers and was unrolled as if it had an interleave count of 16. While trying to achieve the same result with LoopVectorize, I found this nit. It is not a problem as long as the number of vector registers is large enough, that is, as long as that number, even after being decremented and bit-floored, is still larger than the other interleave count limits (e.g. X86TTIImpl::getMaxInterleaveFactor() returns 4 for AVX).
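The argument above can be illustrated with a hypothetical, simplified C++ model of the interleave-count (IC) computation. The function names (`floorPow2`, `registerCap`, `selectIC`), the clamping of zero users to one in the unpatched path, and the single `TargetMaxIC` standing in for all other limits are assumptions made for illustration. This is a sketch, not LLVM's actual LoopVectorize code.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Round X down to the nearest power of two; 0 stays 0.
unsigned floorPow2(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical, simplified model of the register-pressure cap on the
// interleave count for one register class. NOT LLVM's actual code.
// SkipZeroUsers models the patched behavior: a register class with zero
// local users imposes no limit at all.
unsigned registerCap(unsigned NumRegs, unsigned InvariantRegs,
                     unsigned MaxLocalUsers, bool SkipZeroUsers) {
  if (SkipZeroUsers && MaxLocalUsers == 0)
    return UINT_MAX; // no register-pressure limit
  if (NumRegs <= InvariantRegs + 1)
    return 1; // no spare registers: do not interleave
  // Assumption: zero users are clamped to one, so the cap degenerates to
  // "register count, decremented and bit-floored".
  unsigned Users = std::max(1u, MaxLocalUsers);
  // Reserve one register for the induction variable and exclude it from
  // the interleaved users.
  unsigned Avail = NumRegs - InvariantRegs - 1;
  return std::max(1u, floorPow2(Avail / std::max(1u, Users - 1)));
}

// The final IC is the minimum of the register cap and the other limits,
// modelled here by a single target maximum (e.g. 4 for AVX).
unsigned selectIC(unsigned TargetMaxIC, unsigned NumRegs,
                  unsigned InvariantRegs, unsigned MaxLocalUsers,
                  bool SkipZeroUsers) {
  assert(TargetMaxIC >= 1 && "target must allow at least IC = 1");
  return std::min(TargetMaxIC, registerCap(NumRegs, InvariantRegs,
                                           MaxLocalUsers, SkipZeroUsers));
}
```

With 16 vector registers and a target maximum of 4 (as for AVX), `selectIC(4, 16, 0, 0, false)` and `selectIC(4, 16, 0, 0, true)` both return 4: the register cap of 8 never binds. With a hypothetical target that has only 8 vector registers but allows an IC of 16, the unpatched model returns 4 while the patched one returns 16.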
llvm/test/Transforms/LoopVectorize/X86/interleave-count.ll, line 38:
Done. See D147588.