This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add intrinsic for converting global pointers to resources
ClosedPublic

Authored by krzysz00 on Apr 21 2023, 1:46 PM.

Details

Summary

Define the function @llvm.amdgcn.make.buffer.rsrc, which take a 64-bit
pointer, the 16-bit stride/swizzling constant that replace the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
wards of the resource), and combines them into a ptr addrspace(8).

This intrinsic is lowered during the early phases of the backend.

This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.

Depends on D148184

Diff Detail

Event Timeline

krzysz00 created this revision.Apr 21 2023, 1:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2023, 1:46 PM
krzysz00 requested review of this revision.Apr 21 2023, 1:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2023, 1:46 PM
gandhi21299 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432

Might be helpful to have an assertion on the number of operands of MI. Is it possible that any of the operands is not a register?

arsenm added inline comments.Apr 24 2023, 11:55 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1009–1010

There's no reason to have different intrinsics for different source address spaces. Just accept a type mangling operand for the input pointer

arsenm added inline comments.Apr 24 2023, 11:57 AM
llvm/lib/Analysis/ValueTracking.cpp
5804

Typo necassarily

arsenm added inline comments.Apr 24 2023, 12:02 PM
llvm/lib/Analysis/ValueTracking.cpp
5810–5811

This handling needs a test (I'm assuming that was the intent of ptr-buffer-alias-scheduling.ll, but I think we also need a pure IR one that doesn't depend on codegen)

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4460–4461

You can combine all of these createGenericVirtualRegister calls like:

auto ExtStride = B.buildAnyExt(S32, Stride)
krzysz00 added inline comments.Apr 24 2023, 1:28 PM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1009–1010
  1. Does "any pointer" work for different address spaces? The documentation's a bit fuzzy
  2. If we're accepting arbitrary pointers, will we then need to, during legalization, reject pointer types that don't make sense (ex. LDS)?
llvm/lib/Analysis/ValueTracking.cpp
5810–5811

The test over in LICM is handling this, though there might be a more straightforward way to do it.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432

From what I can tell of all the surrounding code ... no?

arsenm added inline comments.Apr 25 2023, 3:48 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1009–1010
  1. Yes
  2. Yes. Ideally we would have a target IR verifier for these sorts of things. In general we just get selection errors for weird things like this. If you just handle any 64-bit pointer I think it will work out that way without having to do anything special
arsenm added inline comments.Apr 25 2023, 1:53 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432

Only if any operands are immarg, which they aren't (IIRC this was a MachineVerifier check which is missing)

krzysz00 updated this revision to Diff 516896.Apr 25 2023, 2:02 PM
krzysz00 retitled this revision from [AMDGPU] Add intrinsics for converting global pointers to resources to [AMDGPU] Add intrinsic for converting global pointers to resources.
krzysz00 edited the summary of this revision. (Show Details)

Merge intrinsics into one variadic one, add negative test for 32-bit pointers

krzysz00 updated this revision to Diff 516981.Apr 25 2023, 4:36 PM

One more rebasing commit

krzysz00 updated this revision to Diff 517203.Apr 26 2023, 8:59 AM
krzysz00 marked 6 inline comments as done.

Code simplifications - thanks for the info, Matt!

krzysz00 marked an inline comment as done.Apr 26 2023, 9:00 AM
krzysz00 added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
5810–5811

Unless you can think of a "purer" way to check this

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432

Ok so this is basically fine as is, sounds like

What should happen if I do something like:

  %alloca = ...
  %cast = addrspacecast ptr addrspace(5) %alloca to ptr
  %buffer = call ptr addrspace(8) @llvm.amdgcn.as.buffer.rsrc.p0(ptr %cast ...)

Now you can do evil things like access other lanes private stack items. Should we define this to poison or something?
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
998

Drop this comment, the i8 reference is vestigial

999

make_buffer_rsrc? as makes it sound like a regular cast?

Re the evil code: you're right, but I could also write that as

%alloca = ...
%cast = addrspacecast ptr addrspace(5) %alloca to ptr
%as.int = ptrtoint ptr %cast as i64
%ext = zext i64 %as.int to i128
%buffer.int = or i128 %ext, i128 [...]
; Either
%buffer = inttoptr i128 %buffer.int to ptr addrspace(8)
; or
%buffer = bitcast i128 %buffer.int to <4 x i32>
; Muck around in everyone else's stack, etc etc.
%buffer = call ptr addrspace(8) @llvm.amdgcn.as.buffer.rsrc.p0(ptr %cast ...)

As to what we should *do* about it ... I'm not 100% sure, but it seems to me that, from an IR perspective, this sort of cracking open the stack is allowed, but using it to step on something that's not in bounds of your pointer isn't.
So the conversion itself is valid, but you'd get undef (conceptually) from trying to read things you shouldn't.

In other words, I'd argue IR that takes alloca() results and builds the buffer manually is the same kind of syntactically unavoidable but semantically forbidden behavior as

%alloca = alloca [1 x i8]
%poke = getelementptr i8, ptr %alloca, i64 1024
store i64 ..., ptr %poke

even though creating %poke itself is allowed (if I understand the semantics of LLVM IR correctly).

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
999

I'm thinking of it as an addrspacecast that takes arguments, hence the naming, but I'm not too tied to the name.

arsenm added inline comments.May 1 2023, 2:03 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4442

I thought you had to do B.buildUnmerge({S32, S32}, Pointer)?

4450–4451

Do you need really need the version that returns APInt and the register, or can you use the one that returns int64_t?

4452

Can do !StrideConst

4454

StrideConst

4455

can you just get out of APInt?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8678

Should this be *ConstStride != 0?

krzysz00 added a subscriber: fhahn.

Adding people who most recently touched/reviewed the isIntrinsicReturningPointerAliasingArgumentWithoutCapturing function for their input on whether my patch to it is correct. ( @fhahn in particular since it looks like MustPreserveNullness is yours)

krzysz00 updated this revision to Diff 519308.May 3 2023, 5:05 PM
krzysz00 marked 6 inline comments as done.

Update address space documentation, address some review comments

krzysz00 marked an inline comment as not done.May 3 2023, 5:06 PM
krzysz00 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4450–4451

Having looked around, the APInt version seems to do things like look through chains of sext/trunc/copy/... and otherwise does that sort of constant folding. That might be worth it?

krzysz00 updated this revision to Diff 522297.May 15 2023, 12:30 PM

Rename Extent to NumRecords to match ISA docs.

arsenm added inline comments.May 16 2023, 7:38 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
999

I think "make" or "create" would be better

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.as.buffer.rsrc.ll
2 ↗(On Diff #522297)

Why split the dag and globalisel versions of the tests? The dag version should also use an explicit -global-isel=0 run line

krzysz00 updated this revision to Diff 523060.May 17 2023, 8:21 AM
krzysz00 edited the summary of this revision. (Show Details)

Rename intrinsic per review comments

arsenm accepted this revision.May 23 2023, 3:32 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8686

Hardcoding this to an i32 constant is fine instead of going through DAG.getShiftAmountConstant

8693–8694

can fold to direct return

This revision is now accepted and ready to land.May 23 2023, 3:32 AM
chapuni added a subscriber: chapuni.Jun 5 2023, 2:13 PM
chapuni added inline comments.
llvm/test/CodeGen/AMDGPU/make-buffer-rsrc-lds-fails.ll
2–3

Would they really crash? I guess they require +asserts.