This additional unrolling (interleaving) increases register usage and will most likely hurt performance.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Add a test case.
I was thinking that we should still be able to use the backend option -force-target-max-scalar-interleave.
But it seems the code in LoopVectorize.cpp prevents us from doing this:
// Don't attempt if
// 1. the target claims to have no vector registers, and
// 2. interleaving won't help ILP.
//
// The second condition is necessary because, even if the target has no
// vector registers, loop vectorization may still enable scalar
// interleaving.
if (!TTI->getNumberOfRegisters(true) && TTI->getMaxInterleaveFactor(1) < 2) {
  return false;
}

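To make the guard above easier to reason about, here is a minimal, self-contained sketch of its logic. The struct `TargetInfoSketch` and the function `shouldAttemptLoopVectorize` are hypothetical names invented for illustration and are not part of LLVM. The two fields only stand in for the results of `TTI->getNumberOfRegisters(true)` and `TTI->getMaxInterleaveFactor(1)`. The sketch does not model any command-line overrides.

```cpp
// Hypothetical stand-in for the two TargetTransformInfo queries in the guard.
struct TargetInfoSketch {
  unsigned NumVectorRegisters;  // models TTI->getNumberOfRegisters(true)
  unsigned MaxScalarInterleave; // models TTI->getMaxInterleaveFactor(1)
};

// Returns false exactly when the quoted guard would bail out, i.e. when the
// target reports no vector registers AND a scalar interleave factor below 2.
bool shouldAttemptLoopVectorize(const TargetInfoSketch &TTI) {
  if (!TTI.NumVectorRegisters && TTI.MaxScalarInterleave < 2)
    return false;
  return true;
}
```

Under these assumptions, a target that reports zero vector registers and an interleave factor of 1 is rejected before any later interleaving decision is reached, while raising either value lets the pass continue.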
Add a lit.local.cfg file to the newly created test/Transforms/LoopVectorize/AMDGPU directory.