This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add buffer intrinsics that take resources as pointers
ClosedPublic

Authored by krzysz00 on Apr 4 2023, 10:21 AM.

Details

Summary

In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use ptr addrspace(8) instead of <4 x i32> to represent their
rsrc arguments.

The new intrinsics are named by replacing buffer. with buffer.ptr.

One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
noalias buffer can be pulled out of a loop that is modifying a
distinct buffer.

In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.

This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.

Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.

Depends on D145441

Diff Detail

Event Timeline

krzysz00 created this revision.Apr 4 2023, 10:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2023, 10:21 AM
krzysz00 requested review of this revision.Apr 4 2023, 10:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2023, 10:21 AM
krzysz00 added reviewers: arsenm, foad, nhaehnle, Restricted Project.Apr 4 2023, 10:22 AM

Thank you! I could only go over the intrinsics definitions right now and they look good to me.

piotr added a comment.Apr 7 2023, 4:01 AM

Thanks for working on this. Just added a couple of nits.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
493

Typo trunsform.

1331

Typo haev.

llvm/lib/Target/AMDGPU/SIISelLowering.h
255–256

a addrspace -> an addrspace ?
expent -> expect

llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll
3

This should be testing gfx90a, not gfx940, right?

arsenm added inline comments.Apr 7 2023, 4:13 PM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1074–1075

I'd lean towards swapping the naming around, so that you would have "int_amdgcn_raw_ptr_buffer_load". That way the ISA opcode name part remains unbroken

krzysz00 updated this revision to Diff 512514.Apr 11 2023, 10:40 AM
krzysz00 marked 4 inline comments as done.

Rename intrinsics, fix typos

krzysz00 added inline comments.Apr 12 2023, 4:31 PM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1074–1075

I don't see any reason why not, done.

llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll
3

The corresponding non-gisel test has both gfx90a and gfx940 - I figured I should update this one to match while I'm here.

krzysz00 edited the summary of this revision. (Show Details)
krzysz00 edited the summary of this revision. (Show Details)
krzysz00 updated this revision to Diff 516966.Apr 25 2023, 3:59 PM

Rebase, requiring test updates due to some AND/$scc change

arsenm added inline comments.May 1 2023, 1:49 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
513

Can just use an std::array since if this is only 4 piece case?

515–517

Can do VectorElements[I]= B.buildExtractVectorElementConstant(S32, ...).getReg().

Also we really should have a scalarize vector utility in MachineIRBuilder like the DAG does

527

auto BitCast = B.buildBitcast(ScalarTy, BitcastReg)

529

Missing observer notification?

1055–1058

Move this to the end, legal cases should be first and ordered with the most common cases first

2521–2523

This seems very special cased and I don't understand why you need specially handle vector extracts

2529

Fold register creation into the build

2565–2566

Ditto with the extract case

llvm/test/CodeGen/AMDGPU/GlobalISel/fp64-atomics-gfx90a.ll
3

Add new run line in pre-commit

krzysz00 updated this revision to Diff 518829.May 2 2023, 12:34 PM
krzysz00 marked 6 inline comments as done.

Address review comments, update comments

krzysz00 added inline comments.May 2 2023, 12:36 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
134

changeElementType() exists

529

Should I pass the Observer in or have it called at the call sites or?

(Also, is it OK to changingInstruction()/changedInstruction() recursively?)

1055–1058

Having checked, the legal rule matches before the unsupported, and the matching is done in order, so this needs to come first in order to make sure buffer pointer PTR_ADD gets caught in legalization (as opposed to relying on the fact that we currently can't select it)

2521–2523

Updated the comment, and I think we need to handle them both for generality and since @nhaehnle mentioned they could come up in Vulkan

krzysz00 updated this revision to Diff 518867.May 2 2023, 2:53 PM

Add new versions of the legalization tests that got added while I was out

krzysz00 updated this revision to Diff 519292.May 3 2023, 4:04 PM

Split adding gfx40 to the gisel fp64 atomics test to its own commit.

krzysz00 marked 2 inline comments as done.May 17 2023, 7:59 AM
arsenm added a comment.Jun 1 2023, 9:44 AM

This should get a mention in the release notes

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1046

Comment should be updated to have ptr first

1285

Should move this with the other gfx908 intrinsics

1362

Same

1365

Same

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
159–160

Can just pass in getElementCount to LLT::vector

krzysz00 updated this revision to Diff 527623.Jun 1 2023, 2:26 PM
krzysz00 marked 5 inline comments as done.

Add release notes items for this patch stack, fix review comments.

piotr added a comment.Jun 2 2023, 2:53 AM

LGTM with a nit, but please wait for Matt's approval.

I also did some extra sanity testing and verified that all your three outstanding patches are NFC for the graphics workloads.

llvm/lib/Target/AMDGPU/SIISelLowering.h
255–256

expct -> expect

(no need to re-submit to phab just for that)

arsenm accepted this revision.Jun 2 2023, 12:20 PM
This revision is now accepted and ready to land.Jun 2 2023, 12:20 PM
krzysz00 updated this revision to Diff 528471.Jun 5 2023, 9:05 AM

Hopefully this makes arc land work

This revision was landed with ongoing or failed builds.Jun 5 2023, 9:59 AM
This revision was automatically updated to reflect the committed changes.