( See
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798
for some of the context )
Currently, the intrinsics for operating on AMDGPU's buffer descriptors
take the descriptor as a <4 x i32> value. This is problematic, in that
these intrinsics cannot be analyzed like the load/store/atomic
operations they are, which weakens our ability to perform compiler
optimizations and blocks the address space 7 work done elsewhere.
Therefore, this commit:
- Changes all intrinsics that take a 128-bit buffer resource (V#) to
take a ptr addrspace(8) instead.
- Updates the definitions of those intrinsics to indicate that they
only operate on memory related to their arguments.
- Updates Global ISel and SelectionDAG to, early in their operation,
lower these address space 8 pointers (which are 128-bit scalars) to
<4 x i32> vectors. This must be done because the definitions of the
resource instructions themselves (and all the code for reasoning about
them) expects a <4 x i32> value, and this cannot be changed because
assigning a register class to i128 values would cause breakages in our
SelectionDAG usage.
- Specifically, in GlobalISel, we use G_UNMERGE/G_BUIlD_VECTOR and VECTOR_EXTRACT_ELEMENT/G_MERGE in the non-vector case, and G_{INTTOPTR,PTRTOINT} and G_BITCAST in the complex case, as we have found this produces better codegen
- In SelectionDAG, we simply use a bitcast We have found that these produce better-looking codegen.
- Update some of the creation of MachineMemOperands to keep a
reference to the original buffer descriptor. Designing a
PseudoSourceValue that also holds the original index/offset/... values
is future work.
- Define auto-upgrade from the old intrinscic forms to the new ones
that performs the transformation
%v = call T @llvm.amdgcn.*buffer*(...a, <4 x i32> %rsrc, ...b)
>
%.rsrc.int = bitcast <4 x i32> %rsrc to i128
%.rsrc.ptr = inttoptr i128 %.rsrc.int to ptr addrspace(8)
%v = call T @llvm.amdgcn.*buffer*(...a, ptr addrspace(8) %.rsrc.ptr, ...b)
Tests have been updated to the new intrinsic forms.
The test changes, generally, fall into
- Changing the declarations of buffer intrinsics to their new forms
- Updating MIR tests to account for the new types or for additional
COPY instructions that are temporarily introduced by the new
legalizations
- Occasional drifts of a movk to the other side of a buffer operation
- Changes to the behavior of loading a ptr addrspace(8) from an undef
location, which can now act like [???0, ???1, ???0, ???1] : <4 x i32>
instead of [??? 0, ???1, ???2, ???3] : <4 x i32>. These were often
avoided by removing such loads and replacing them with loads from the
relevant null pointer.
One of the unrelated changes, caused by the legalizaiton not being
idempotent, is llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll 's lack
of a mad, which is, per discussion, fine.
This patch does not appear to affect the final generated code, except
for the introduction of additional moves at O0 in some cases.
Depends on D145441