This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Don't use zero-stride vector load for gather if not optimized
Closed · Public

Authored by pcwang-thead on Nov 14 2022, 3:29 AM.

Details

Summary

We may form a zero-stride vector load when lowering a gather to a strided
load. As in D137699, we use a scalar load + splat for this form if the
target has no optimized implementation of zero-stride loads.
We currently restrict this to unmasked loads because of the
complexity of handling all-false masks.
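
To illustrate the equivalence the summary relies on, here is a minimal C++ sketch of the semantics, not the actual lowering code from the patch. The function names `stridedLoad` and `loadAndSplat` are hypothetical, and the stride is counted in elements rather than bytes for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models an unmasked strided vector load: lane i reads base[i * stride].
// Assumption: stride is in elements here, not bytes.
std::vector<std::int32_t> stridedLoad(const std::int32_t *base,
                                      std::ptrdiff_t stride, std::size_t vl) {
  std::vector<std::int32_t> result(vl);
  for (std::size_t i = 0; i < vl; ++i)
    result[i] = base[static_cast<std::ptrdiff_t>(i) * stride];
  return result;
}

// Models the replacement sequence: one scalar load, then a splat.
// Assumes vl > 0, so the scalar load is always one the vector load
// would have performed as well.
std::vector<std::int32_t> loadAndSplat(const std::int32_t *base,
                                       std::size_t vl) {
  assert(vl > 0);
  const std::int32_t scalar = *base;            // single scalar load
  return std::vector<std::int32_t>(vl, scalar); // broadcast to every lane
}
```

With a stride of zero every lane reads the same address, so `stridedLoad(p, 0, vl)` and `loadAndSplat(p, vl)` produce the same vector whenever the load is unmasked.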

Event Timeline

pcwang-thead requested review of this revision. Nov 14 2022, 3:29 AM
Herald added a project: Restricted Project. Nov 14 2022, 3:29 AM
reames requested changes to this revision. Nov 14 2022, 12:28 PM

This isn't correct. The strided load can be masked. For the case where all lanes are masked off, executing the scalar load is unsound and could introduce a fault.

You could allow any mask where you can prove at least one lane is active, or make the scalar load conditional, but there's a bunch of complexity there. As a starting point, I suggest you restrict your transformation to when the instruction is unmasked.
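
The soundness problem described in this comment can be sketched in C++ as follows. This is an illustrative model only, under stated assumptions: a null pointer stands in for an unmapped address, a thrown exception stands in for a hardware fault, and all names (`checkedLoad`, `maskedZeroStrideLoad`, `naiveLoadAndSplat`, `guardedLoadAndSplat`, `faults`) are hypothetical rather than taken from LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for a memory access that may trap (assumption: null == unmapped).
std::int32_t checkedLoad(const std::int32_t *addr) {
  if (addr == nullptr)
    throw std::runtime_error("fault");
  return *addr;
}

// Masked zero-stride load: only active lanes touch memory.
std::vector<std::int32_t> maskedZeroStrideLoad(
    const std::int32_t *base, const std::vector<bool> &mask,
    const std::vector<std::int32_t> &passthru) {
  assert(mask.size() == passthru.size());
  std::vector<std::int32_t> result = passthru;
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      result[i] = checkedLoad(base);
  return result;
}

// Naive rewrite: the scalar load runs even if no lane is active.
std::vector<std::int32_t> naiveLoadAndSplat(
    const std::int32_t *base, const std::vector<bool> &mask,
    const std::vector<std::int32_t> &passthru) {
  assert(mask.size() == passthru.size());
  const std::int32_t scalar = checkedLoad(base); // unconditional access
  std::vector<std::int32_t> result = passthru;
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      result[i] = scalar;
  return result;
}

// One suggested alternative: make the scalar load conditional on the mask.
std::vector<std::int32_t> guardedLoadAndSplat(
    const std::int32_t *base, const std::vector<bool> &mask,
    const std::vector<std::int32_t> &passthru) {
  if (std::none_of(mask.begin(), mask.end(), [](bool b) { return b; }))
    return passthru; // no active lane, so no memory access at all
  return naiveLoadAndSplat(base, mask, passthru);
}

// Helper: reports whether calling fn triggers the modeled fault.
template <typename Fn> bool faults(Fn &&fn) {
  try {
    fn();
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

With an all-false mask and an unmapped base, the masked load returns the passthru value without touching memory, while the naive rewrite faults. The guarded variant avoids the fault at the price of an extra runtime check, which is the complexity the comment refers to.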

This revision now requires changes to proceed. Nov 14 2022, 12:28 PM

Restrict this to unmasked loads only.

pcwang-thead edited the summary of this revision. Nov 14 2022, 11:58 PM

This isn't correct. The strided load can be masked. For the case where all lanes are masked off, executing the scalar load is unsound and could introduce a fault.

You could allow any mask where you can prove at least one lane is active, or make the scalar load conditional, but there's a bunch of complexity there. As a starting point, I suggest you restrict your transformation to when the instruction is unmasked.

Thanks. I didn't consider these situations before.

I tried transforming masked loads into scalar load + splat + vrgather, but it may not be worth doing since three instructions are needed. Handling an all-zeros mask at runtime would also add some cost. So I think we should handle only unmasked loads for now. :-)

reames accepted this revision. Nov 15 2022, 7:27 AM

LGTM

llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store-asm.ll, line 230

Hm, this test case is interesting.

This is a case where even with a fast broadcast load, putting the value into a scalar allows the splat to be folded into the using instruction. This trades a scalar register for a vector one, and might be generally interesting.

Maybe a case to give some further thought, definitely not blocking for this patch.
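
The folding opportunity mentioned here can be modeled with a small C++ sketch. It is an illustration of the idea only: `addVV` and `addVX` are hypothetical stand-ins for a vector-vector and a vector-scalar (`.vx`) form of the same operation, and unsigned elements are used so that wraparound is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Vector-vector form: the splatted value must occupy a vector register.
std::vector<std::uint32_t> addVV(const std::vector<std::uint32_t> &a,
                                 const std::vector<std::uint32_t> &b) {
  assert(a.size() == b.size());
  std::vector<std::uint32_t> result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    result[i] = a[i] + b[i];
  return result;
}

// Vector-scalar form: the scalar operand is applied to every lane, so the
// splat never needs to be materialized as a separate vector.
std::vector<std::uint32_t> addVX(const std::vector<std::uint32_t> &a,
                                 std::uint32_t x) {
  std::vector<std::uint32_t> result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    result[i] = a[i] + x;
  return result;
}
```

Adding a splat of `x` with `addVV` gives the same result as `addVX` with `x` directly, which is why keeping the loaded value in a scalar register can save a vector register.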

This revision is now accepted and ready to land. Nov 15 2022, 7:27 AM
pcwang-thead marked an inline comment as done. Nov 16 2022, 12:35 AM
pcwang-thead added inline comments.
llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store-asm.ll, line 230

After some thought, I think there are opportunities to fold the splat into .vx instructions; see D138101.