This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Lower llvm.amdgcn.s.buffer.load.v3[i|f]32
ClosedPublic

Authored by piotr on Nov 12 2019, 4:00 AM.

Details

Summary

Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic.

Diff Detail

Event Timeline

piotr created this revision.Nov 12 2019, 4:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2019, 4:00 AM
arsenm added inline comments.Nov 12 2019, 5:08 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5671

Should get the alignment from the ABI type alignment? On second thought though, the MMO should already exist at this point so I’m not sure why this is reconstructing one

piotr updated this revision to Diff 229028.Nov 13 2019, 1:41 AM

Using the alignment from the ABI type alignment.

The s_buffer_load intrinsic is not marked with SDNPMemOperand, so I think
that is why we need to create MMO here.

Using the alignment from the ABI type alignment.

The s_buffer_load intrinsic is not marked with SDNPMemOperand, so I think
that is why we need to create MMO here.

It probably should be marked with SDNPMemOperand, and the fact that it's IntrNoMem is another problem that should eventually be solved

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
27–42

Most of these test changes look unrelated?

97–98

There is no load dwordx3, so I'm slightly confused about why you need this, but I would expect this ot widen to 4x loads?

piotr marked 2 inline comments as done.Nov 13 2019, 3:27 AM
piotr added inline comments.
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
27–42

I added the v3 test which exercises the code I am modifying (divergent index): s_buffer_loadx3_index_divergent. Also added analogous s_buffer_load_index_divergent and s_buffer_loadx2_index_divergent for consistency.

97–98

The big picture is that I am working on cutting down the number of loaded components with various buffer loads. I have another change in instcombine (soon to be uploaded for review) that trims loads based on the components used. With that patch vec3 s_buffer_load crashes in the lowering so I am adding support for that.

It is useful to have s_buffer_load.v3 for the case with divergent index, where s_buffer_load cannot be used and buffer_load_dword is generated instead. On newer GPU (VI and later) buffer_load_dwordx3 is present, only on SI we generate buffer_load_dwordx4 for that (see s_buffer_loadx3_index_divergent test).

As for whether it is better to split or widen the s_buffer_load (non-divergent index), the advantage of splitting is that the split loads can be merged with an adjacent load more easily. But I do not have a strong opinion on that.

piotr marked an inline comment as done.Nov 15 2019, 12:42 AM
piotr added inline comments.
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
97–98

On second thought widening seems to make more sense, will update the patch.

piotr updated this revision to Diff 229506.Nov 15 2019, 4:30 AM

Widening instead of splitting.

arsenm accepted this revision.Nov 15 2019, 4:47 AM

LGMT

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5690

Hardcoding the index type to i32 is fine in target specific code

This revision is now accepted and ready to land.Nov 15 2019, 4:47 AM
This revision was automatically updated to reflect the committed changes.