We may form a zero-stride vector load when lowering a gather to a strided load. As D137699 did, we lower this form to a scalar load plus a splat when the target has no optimized zero-stride load implementation.
For now we restrict this to unmasked loads, given the complexity of handling all-false masks.
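As a rough illustration (not taken from the patch itself; register choices are arbitrary and an appropriate vsetvli with SEW=32 is assumed), the zero-stride form and its scalar-load-plus-splat expansion look like this:

```
# Zero-stride strided load: every lane reads the same address held in a0.
vlse32.v  v8, (a0), zero        # stride register is x0, i.e. stride 0

# Scalar load + splat, used when the target has no fast zero-stride load:
lw        a1, 0(a0)             # load the element once
vmv.v.x   v8, a1                # broadcast it to all lanes of v8
```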
Details
Diff Detail
Repository: rG LLVM Github Monorepo
Unit Tests
Time | Test
---|---
60,050 ms | x64 debian > libFuzzer.libFuzzer::minimize_crash.test
Event Timeline
This isn't correct. The strided load can be masked. For the case where all lanes are masked off, executing the scalar load is unsound and could introduce a fault.
You could allow any mask where you can prove at least one lane is active, or make the scalar load conditional, but there's a bunch of complexity there. As a starting point, I suggest you restrict your transformation to the case where the instruction is unmasked.
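To make the fault concern concrete, a sketch (register choices arbitrary, not taken from the patch): under the masked-load/gather semantics, inactive lanes must not access memory, so an all-false mask performs no access at all, while the scalar expansion always dereferences the pointer:

```
# Masked zero-stride load: lanes with a 0 bit in v0 are inactive and
# perform no memory access, so an all-zeroes v0 cannot fault.
vlse32.v  v8, (a0), zero, v0.t

# Unconditional scalar expansion: this dereferences a0 even when every
# lane is masked off, which can introduce a fault the original masked
# load would never have taken.
lw        a1, 0(a0)
vmv.v.x   v8, a1
```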
Thanks. I hadn't considered these situations.
I tried transforming masked loads into a scalar load + splat + vrgather, but that may not be worth doing since it takes three instructions. Handling a runtime all-zeros mask would also add some cost. So I think we should only handle unmasked loads for now. :-)
LGTM
llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store-asm.ll, line 230:
Hm, this test case is interesting.
This is a case where even with a fast broadcast load, putting the value into a scalar allows the splat to be folded into the using instruction. This trades a scalar register for a vector one, and might be generally interesting.
Maybe a case to give some further thought, definitely not blocking for this patch.
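A sketch of the trade-off being described (register names arbitrary, not taken from the test; an appropriate vsetvli is assumed): keeping the loaded value in a scalar register lets the splat fold into the .vx form of the consuming instruction, at the cost of occupying a scalar register instead of a vector one:

```
# Broadcast kept in a vector register: needs a separate vector temporary.
vlse32.v  v9, (a0), zero
vadd.vv   v8, v8, v9

# Value kept in a scalar register: the splat folds into the user.
lw        a1, 0(a0)
vadd.vx   v8, v8, a1
```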