This commit adds the -lower-buffer-fat-pointers pass, which is
applicable to all AMDGCN compilations.
The purpose of this pass is to remove the type ptr addrspace(7) from
incoming IR. This must be done at the LLVM IR level because `ptr
addrspace(7)`, as a 160-bit primitive type, cannot be correctly
handled by SelectionDAG.
The detailed operation of the pass is described in comments, but, in
summary, the removal proceeds by:
- Rewriting loads and stores of ptr addrspace(7) to loads and stores
of i160 (including vectors and aggregates). This is needed because the
in-register representation of these pointers stops matching their
in-memory representation after the next step, so ptrtoint/inttoptr
casts are used to preserve the expected memory layout.
- Mutating the IR to replace all occurrences of ptr addrspace(7)
with the type {ptr addrspace(8), ptr addrspace(6)}, which makes the
two parts of a buffer fat pointer (the 128-bit address space 8
resource and the 32-bit address space 6 offset) visible in the IR.
This also impacts the argument and return types of functions.
- *Splitting* the resource and offset parts. All instructions that
produce or consume buffer fat pointers (like GEP or load) are
rewritten to produce or consume the resource and offset parts
separately. For example, GEP updates the offset part of the result and
a load uses the resource and offset parts to populate the relevant
llvm.amdgcn.raw.ptr.buffer.load intrinsic call.
At the end of this process, the original mutated instructions are
replaced by their new split counterparts, ensuring that no
invalidly-typed IR escapes this pass. (For operations like call, where
the struct form is needed, insertvalue operations are inserted to
reconstruct it.)
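To illustrate the splitting step, here is a rough sketch of the rewrite
for a GEP followed by a load. The function and value names are
illustrative, the offset is shown as a plain i32 for simplicity, and the
intrinsic operands (soffset, aux flags) are abbreviated; the actual
output of the pass may differ in detail.

```llvm
; Before: GEP and load through a buffer fat pointer.
define float @sample(ptr addrspace(7) %p) {
  %q = getelementptr float, ptr addrspace(7) %p, i32 4
  %v = load float, ptr addrspace(7) %q
  ret float %v
}

; After (conceptually): the resource and offset parts travel separately,
; the GEP becomes offset arithmetic on the offset part only, and the load
; becomes a call to the raw buffer load intrinsic.
define float @sample.split(ptr addrspace(8) %p.rsrc, i32 %p.off) {
  %q.off = add i32 %p.off, 16   ; 4 * sizeof(float)
  %v = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(
           ptr addrspace(8) %p.rsrc, i32 %q.off, i32 0, i32 0)
  ret float %v
}
```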
Compared to LGC's PatchBufferOp
(https://github.com/GPUOpen-Drivers/llpc/blob/32cda89776980202597d5bf4ed4447a1bae64047/lgc/patch/PatchBufferOp.cpp),
this pass
- Also handles vectors of ptr addrspace(7)s
- Also handles function boundaries
- Includes the same uniform buffer optimization for loops and
conditionals
- Does *not* handle memcpy() and friends (this is future work)
- Does *not* break up large loads and stores into smaller parts. This
should be handled by extending the legalization
of *.buffer.{load,store} to handle larger types by producing multiple
instructions (the same way ordinary LOAD and STORE are legalized).
That work is planned for a followup commit.
- Does *not* have special logic for handling divergent buffer
descriptors. The logic in LGC is, as far as I can tell, incorrect in
general, and, per discussions with @nhaehnle, isn't widely used.
Therefore, divergent descriptors are handled with waterfall loops
later in legalization.
As a final matter, this commit updates atomic expansion to treat
buffer operations analogously to global ones.
(One question for reviewers: is the new pass in the right place?
Should it be later in the pipeline?)