This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] add LDS f32 intrinsics
ClosedPublic

Authored by dfukalov on Sep 18 2017, 11:17 AM.

Details

Summary

added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Diff Detail

Event Timeline

dfukalov created this revision.Sep 18 2017, 11:17 AM
This revision is now accepted and ready to land.Sep 18 2017, 11:23 AM

Since it's atomic it needs to be added to isIntrinsicSourceOfDivergence. Also getTgtMemIntrinsic, and getAddrModeArguments (all with associated tests)

include/llvm/IR/IntrinsicsAMDGPU.td
303

Should this have an operand added for the ordering?

304

This is certainly not IntrNoReturn

arsenm requested changes to this revision.Sep 18 2017, 11:27 AM
This revision now requires changes to proceed.Sep 18 2017, 11:27 AM
dfukalov updated this revision to Diff 115694.Sep 18 2017, 11:40 AM
dfukalov edited edge metadata.
dfukalov marked an inline comment as done.
dfukalov added inline comments.
include/llvm/IR/IntrinsicsAMDGPU.td
303

No, these intrinsics are created by request to be able to generate ds_{add|min|max}[_rtn]_f32 in case of OpenCL local memory atomics only. They work only for pointers to floats located in addrspace 3

304

You're right, I interpreted it as intrinsic that has no return value, diff updated

arsenm added inline comments.Sep 18 2017, 11:50 AM
include/llvm/IR/IntrinsicsAMDGPU.td
303

That doesn't change the ordering. Also needs an operand for volatile

t-tye added inline comments.Sep 18 2017, 11:54 AM
include/llvm/IR/IntrinsicsAMDGPU.td
303

How are the memory ordering, memory scope and volatile carried through so that those fields can be set in the Machine Memory Operand? All these properties are needed to generate the correct waitcnt in the memory legalizer (see AMDGPUUsage.rst section on memory model).

arsenm added inline comments.Sep 18 2017, 2:27 PM
include/llvm/IR/IntrinsicsAMDGPU.td
303

Those are supposed to be handled by the intrinsic callbacks I mentioned that need to be implemented (at least for volatile). I'm not sure anything really correctly considers the atomic scope for possibly atomic intrinsics.

The most similar case we have is for amdgcn_atomic_inc/dec.

t-tye added inline comments.Sep 19 2017, 12:47 PM
include/llvm/IR/IntrinsicsAMDGPU.td
300

Need to add the same fields as for AMDGPUAtomicIncIntrin, namely:

llvm_i32_ty, // ordering
llvm_i32_ty, // scope
llvm_i1_ty], // isVolatile
305

Need to add the same fields as for AMDGPUAtomicIncIntrin, namely:

llvm_i32_ty, // ordering
llvm_i32_ty, // scope
llvm_i1_ty], // isVolatile
lib/Target/AMDGPU/DSInstructions.td
776–785

Also, somehow a MachineMemoryOperand needs to be created with values from the ordering, scope and isVolatile LLVM IR instruction operands.

t-tye requested changes to this revision.Sep 19 2017, 12:47 PM
This revision now requires changes to proceed.Sep 19 2017, 12:47 PM

This also should have an intermediate node added, so it is easy to add it to the various switches over memory operations (e.g. we look for all memory operations in PerformDAGCombine)

dfukalov marked an inline comment as done.Sep 20 2017, 12:14 PM

Am I right that since we should have almost the same processing as atomic inc intrinsics,
it would be better idea to define ds_add/min/max intrinsics the same way as AMDGPUAtomicIncIntrin (or unify them),
and then update AMDGPU BE to correctly process these ds_ instrisics the same ways as atomic inc/dec?

dfukalov updated this revision to Diff 126743.Dec 13 2017, 5:49 AM
dfukalov edited the summary of this revision. (Show Details)

updates as requested by reviewers

dfukalov marked 4 inline comments as done.Dec 13 2017, 5:51 AM
arsenm added inline comments.Dec 20 2017, 9:19 AM
include/llvm/IR/IntrinsicsAMDGPU.td
306

These should probably be named fadd.. etc to match the IR operations

lib/Target/AMDGPU/SIISelLowering.cpp
574–576

This will need to be rebased since I just changed these last week

6549–6552

Tests for these with this combine would be nice

test/CodeGen/AMDGPU/lds_atomic_f32.ll
1

Can you also add a pre-gfx9 run line, and check for the m0 initialization

dfukalov updated this revision to Diff 129829.Jan 15 2018, 4:02 AM

diff updated according to latest comments

dfukalov marked 4 inline comments as done.Jan 15 2018, 4:03 AM
arsenm accepted this revision.Jan 15 2018, 8:38 AM

LGTM except for the mystery name mangling in the test

test/CodeGen/AMDGPU/lds_atomic_f32.ll
5

These weren't declared as mangled, so why do these have the .f32? I'm kind of surprised this is accepted

dfukalov set the repository for this revision to rL LLVM.Jan 17 2018, 6:02 AM
This revision was not accepted when it landed; it landed in state Needs Review.Jan 17 2018, 6:06 AM
This revision was automatically updated to reflect the committed changes.