Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5671 | Should get the alignment from the ABI type alignment? On second thought though, the MMO should already exist at this point so I’m not sure why this is reconstructing one |
Using the alignment from the ABI type alignment.
The s_buffer_load intrinsic is not marked with SDNPMemOperand, so I think
that is why we need to create MMO here.
It probably should be marked with SDNPMemOperand, and the fact that it's IntrNoMem is another problem that should eventually be solved
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll | ||
---|---|---|
27–42 | Most of these test changes look unrelated? | |
97–98 | There is no load dwordx3, so I'm slightly confused about why you need this, but I would expect this ot widen to 4x loads? |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll | ||
---|---|---|
27–42 | I added the v3 test which exercises the code I am modifying (divergent index): s_buffer_loadx3_index_divergent. Also added analogous s_buffer_load_index_divergent and s_buffer_loadx2_index_divergent for consistency. | |
97–98 | The big picture is that I am working on cutting down the number of loaded components with various buffer loads. I have another change in instcombine (soon to be uploaded for review) that trims loads based on the components used. With that patch vec3 s_buffer_load crashes in the lowering so I am adding support for that. It is useful to have s_buffer_load.v3 for the case with divergent index, where s_buffer_load cannot be used and buffer_load_dword is generated instead. On newer GPU (VI and later) buffer_load_dwordx3 is present, only on SI we generate buffer_load_dwordx4 for that (see s_buffer_loadx3_index_divergent test). As for whether it is better to split or widen the s_buffer_load (non-divergent index), the advantage of splitting is that the split loads can be merged with an adjacent load more easily. But I do not have a strong opinion on that. |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll | ||
---|---|---|
97–98 | On second thought widening seems to make more sense, will update the patch. |
LGMT
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5690 | Hardcoding the index type to i32 is fine in target specific code |
Should get the alignment from the ABI type alignment? On second thought though, the MMO should already exist at this point so I’m not sure why this is reconstructing one