This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Make buffer intrinsics pointer-valued
AbandonedPublic

Authored by krzysz00 on Mar 23 2023, 2:38 PM.

Details

Reviewers
arsenm
nhaehnle
Group Reviewers
Restricted Project
Summary

( See
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798
for some of the context )

Currently, the intrinsics for operating on AMDGPU's buffer descriptors
take the descriptor as a <4 x i32> value. This is problematic, in that
these intrinsics cannot be analyzed like the load/store/atomic
operations they are, which weakens our ability to perform compiler
optimizations and blocks the address space 7 work done elsewhere.
Therefore, this commit:

  1. Changes all intrinsics that take a 128-bit buffer resource (V#) to

take a ptr addrspace(8) instead.

  1. Updates the definitions of those intrinsics to indicate that they

only operate on memory related to their arguments.

  1. Updates Global ISel and SelectionDAG to, early in their operation,

lower these address space 8 pointers (which are 128-bit scalars) to
<4 x i32> vectors. This must be done because the definitions of the
resource instructions themselves (and all the code for reasoning about
them) expects a <4 x i32> value, and this cannot be changed because
assigning a register class to i128 values would cause breakages in our
SelectionDAG usage.

    • Specifically, in GlobalISel, we use G_UNMERGE/G_BUIlD_VECTOR and VECTOR_EXTRACT_ELEMENT/G_MERGE in the non-vector case, and G_{INTTOPTR,PTRTOINT} and G_BITCAST in the complex case, as we have found this produces better codegen
    • In SelectionDAG, we simply use a bitcast We have found that these produce better-looking codegen.
  1. Update some of the creation of MachineMemOperands to keep a

reference to the original buffer descriptor. Designing a
PseudoSourceValue that also holds the original index/offset/... values
is future work.

  1. Define auto-upgrade from the old intrinscic forms to the new ones

that performs the transformation

%v = call T @llvm.amdgcn.*buffer*(...a, <4 x i32> %rsrc, ...b)

>

%.rsrc.int = bitcast <4 x i32> %rsrc to i128
%.rsrc.ptr = inttoptr i128 %.rsrc.int to ptr addrspace(8)
%v = call T @llvm.amdgcn.*buffer*(...a, ptr addrspace(8) %.rsrc.ptr, ...b)

Tests have been updated to the new intrinsic forms.
The test changes, generally, fall into

  1. Changing the declarations of buffer intrinsics to their new forms
  2. Updating MIR tests to account for the new types or for additional

COPY instructions that are temporarily introduced by the new
legalizations

  1. Occasional drifts of a movk to the other side of a buffer operation
  2. Changes to the behavior of loading a ptr addrspace(8) from an undef

location, which can now act like [???0, ???1, ???0, ???1] : <4 x i32>
instead of [??? 0, ???1, ???2, ???3] : <4 x i32>. These were often
avoided by removing such loads and replacing them with loads from the
relevant null pointer.

One of the unrelated changes, caused by the legalizaiton not being
idempotent, is llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll 's lack
of a mad, which is, per discussion, fine.

This patch does not appear to affect the final generated code, except
for the introduction of additional moves at O0 in some cases.

Depends on D145441

Diff Detail

Event Timeline

krzysz00 created this revision.Mar 23 2023, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 2:38 PM
krzysz00 requested review of this revision.Mar 23 2023, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 2:38 PM
krzysz00 added reviewers: arsenm, nhaehnle, Restricted Project.Mar 23 2023, 2:44 PM
krzysz00 updated this revision to Diff 508179.Mar 24 2023, 11:42 AM

Add basic cest for alias analysis on buffer intrinsics

krzysz00 updated this revision to Diff 508185.Mar 24 2023, 12:06 PM

Add tests for constructing buffer descriptors from integers

foad added a comment.Mar 29 2023, 1:10 AM

While here, update amdgcn.s.buffer.load to have the correct memory
effects (namely, a read from its argument).

Can you split this part out into its own patch? It sounds more straightforward and less controversial than the rest.

krzysz00 updated this revision to Diff 509762.Mar 30 2023, 11:59 AM
krzysz00 edited the summary of this revision. (Show Details)

Factor out s.buffer.load memory effect change to earlier patch, rebase

krzysz00 abandoned this revision.Mar 31 2023, 9:04 AM

Abandoning because this'll break backwards compatibility as currently implemented.