This additional unrolling (interleaving) increases register usage and will most likely hurt performance.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Comment Actions
Add a test case.
I was thinking that we should still be able to use the backend option -force-target-max-scalar-interleave, but it seems this code in LoopVectorize.cpp prevents us from doing so:
  // Don't attempt if
  // 1. the target claims to have no vector registers, and
  // 2. interleaving won't help ILP.
  //
  // The second condition is necessary because, even if the target has no
  // vector registers, loop vectorization may still enable scalar
  // interleaving.
  if (!TTI->getNumberOfRegisters(true) && TTI->getMaxInterleaveFactor(1) < 2) {
    return false;
  }
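To make the effect of that guard concrete, here is a minimal standalone sketch of the same condition. StubTTI and passesEarlyGuard are hypothetical names introduced only for illustration; they are not part of LLVM, and the stub models just the two queries that the quoted condition uses. The scalar register count of 32 is likewise an arbitrary placeholder.

```cpp
// Hypothetical stand-in for TargetTransformInfo (not the real LLVM class).
// Only the two queries used by the quoted guard are modeled.
struct StubTTI {
  unsigned NumVectorRegs;       // what the target reports for vector registers
  unsigned MaxScalarInterleave; // what the target reports for VF == 1

  unsigned getNumberOfRegisters(bool Vector) const {
    // 32 scalar registers is an arbitrary placeholder value.
    return Vector ? NumVectorRegs : 32u;
  }

  unsigned getMaxInterleaveFactor(unsigned VF) const {
    // For VF != 1 this stub simply reports no interleaving benefit.
    return VF == 1 ? MaxScalarInterleave : 1u;
  }
};

// Mirrors the quoted condition: returns false when the pass would bail out
// early, true when it would go on to consider the loop.
bool passesEarlyGuard(const StubTTI &TTI) {
  if (!TTI.getNumberOfRegisters(true) && TTI.getMaxInterleaveFactor(1) < 2) {
    return false;
  }
  return true;
}
```

With this stub, passesEarlyGuard(StubTTI{0, 1}) is false: a target reporting no vector registers and a scalar interleave factor of 1 is rejected on the basis of the target's own answers alone, which is consistent with the observation above that the command-line override does not get a chance to apply. Reporting either a nonzero vector register count or a scalar interleave factor of at least 2 lets the check pass.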
Comment Actions
Add a lit.local.cfg file to the newly created test/Transforms/LoopVectorize/AMDGPU directory.