We may form a zero-stride vector load when lowering a gather to a strided
load. As in D137699, we use load+splat for this form if
there is no optimized implementation.
We currently restrict this to unmasked loads, given the
complexity of handling all-false masks.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
This isn't correct. The strided load can be masked. For the case where all lanes are masked off, executing the scalar load is unsound and could introduce a fault.
You could allow any mask where you can prove at least one lane is active, or make the scalar load conditional, but there's a bunch of complexity there. As a starting point, I suggest you restrict your transformation to when the instruction is unmasked.
Thanks. I hadn't considered these situations.
I tried transforming masked loads into scalar load + splat + vrgather, but it may not be worth doing since three instructions are needed. And handling a runtime all-zeros mask would add some cost. So I think we should handle only unmasked loads for now. :-)
LGTM
llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store-asm.ll:230
Hm, this test case is interesting.
This is a case where even with a fast broadcast load, putting the value into a scalar allows the splat to be folded into the using instruction. This trades a scalar register for a vector one, and might be generally interesting.
Maybe a case to give some further thought, definitely not blocking for this patch.