Do not limit the LoopVectorize interleave count using MaxLocalUsers when MaxLocalUsers is zero.
- Repository: rG LLVM Github Monorepo
Event Timeline
Do you have any performance data motivating the change and ruling out any regressions?
Inline comment on llvm/test/Transforms/LoopVectorize/X86/interleave-count.ll, line 38: Please update the test to use opaque pointers. Also, it would be good to put up a separate patch that adds just the test, and then include only the changes caused by this patch in the diff.
No. I was investigating AVX512 memset code generated by a non-LLVM compiler, which used vectorized moves through zmm registers and was unrolled as if it had an interleave count of 16. While trying to achieve the same result with LoopVectorize, I found this nit. It is not a problem as long as the number of vector registers is large enough, that is, as long as the register count, even after being decremented and bit-floored, is still larger than the other interleave count limits (e.g. X86TTIImpl::getMaxInterleaveFactor() returns 4 for AVX).
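The argument above can be checked with a small arithmetic model. This is a hedged sketch, not LLVM's code: it assumes the register-pressure limit has the shape bitFloor((NumRegs - InvariantRegs - 1) / LocalUsers), with one register reserved for the induction variable and the divisor clamped to at least 1, and that the final count is the smaller of that limit and the target cap, but never below 1. All names (`bitFloor`, `registerLimit`, `interleaveCount`, `SkipZeroUsers`) are hypothetical, and the 16-register count is an illustrative assumption.

```cpp
#include <algorithm>
#include <climits>

// Hypothetical helper: largest power of two <= X, or 0 when X == 0.
constexpr unsigned bitFloor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical model of the register-pressure limit on the interleave count.
// SkipZeroUsers == false: the divisor is clamped to 1, so a class with zero
//   local users still limits the count to bitFloor(NumRegs - InvariantRegs - 1).
// SkipZeroUsers == true: a class with zero local users imposes no limit.
constexpr unsigned registerLimit(unsigned NumRegs, unsigned InvariantRegs,
                                 unsigned LocalUsers, bool SkipZeroUsers) {
  if (LocalUsers == 0 && SkipZeroUsers)
    return UINT_MAX;
  // Reserve one register for the induction variable; avoid unsigned underflow.
  unsigned Avail =
      NumRegs > InvariantRegs + 1 ? NumRegs - InvariantRegs - 1 : 0;
  return bitFloor(Avail / std::max(1u, LocalUsers));
}

// Final count: the tighter of the register limit and the target cap, at least 1.
constexpr unsigned interleaveCount(unsigned RegLimit, unsigned MaxFactor) {
  return std::max(1u, std::min(RegLimit, MaxFactor));
}

// Worked numbers, assuming 16 vector registers, no loop invariants and zero
// local users in the register class.
static_assert(registerLimit(16, 0, 0, false) == 8, "old rule: bitFloor(15)");
static_assert(registerLimit(16, 0, 0, true) == UINT_MAX, "new rule: no limit");
// With a cap of 4 (the AVX value quoted above), both rules agree.
static_assert(interleaveCount(registerLimit(16, 0, 0, false), 4) == 4, "");
static_assert(interleaveCount(registerLimit(16, 0, 0, true), 4) == 4, "");
// With a higher cap of 16, only the old rule lowers the count.
static_assert(interleaveCount(registerLimit(16, 0, 0, false), 16) == 8, "");
static_assert(interleaveCount(registerLimit(16, 0, 0, true), 16) == 16, "");
```

The `static_assert`s are evaluated at compile time. The middle pair illustrates why the change is invisible while the target cap (4 for AVX) is below the decremented, bit-floored register count; the last pair shows that it only matters once the cap exceeds that value.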
Inline comment on llvm/test/Transforms/LoopVectorize/X86/interleave-count.ll, line 38: Done. See D147588.