added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS
Details
Diff Detail
Event Timeline
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
303 | No, these intrinsics are created by request to be able to generate ds_{add|min|max}[_rtn]_f32 in case of OpenCL local memory atomics only. They work only for pointers to floats located in addrspace 3 | |
304 | You're right, I interpreted it as intrinsic that has no return value, diff updated |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
303 | That doesn't change the ordering. Also needs an operand for volatile |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
303 | How are the memory ordering, memory scope and volatile carried through so that those fields can be set in the Machine Memory Operand? All these properties are needed to generate the correct waitcnt in the memory legalizer (see AMDGPUUsage.rst section on memory model). |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
303 | Those are supposed to be handled by the intrinsic callbacks I mentioned that need to be implemented (at least for volatile). I'm not sure anything really correctly considers the atomic scope for possibly atomic intrinsics. The most similar case we have is for amdgcn_atomic_inc/dec. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
300 | Need to add the same fields as for AMDGPUAtomicIncIntrin, namely: llvm_i32_ty, // ordering llvm_i32_ty, // scope llvm_i1_ty], // isVolatile | |
305 | Need to add the same fields as for AMDGPUAtomicIncIntrin, namely: llvm_i32_ty, // ordering llvm_i32_ty, // scope llvm_i1_ty], // isVolatile | |
lib/Target/AMDGPU/DSInstructions.td | ||
619–628 | Also, somehow a MachineMemoryOperand needs to be created with values from the ordering, scope and isVolatile LLVM IR instruction operands. |
This also should have an intermediate node added, so it is easy to add it to the various switches over memory operations (e.g. we look for all memory operations in PerformDAGCombine)
Am I right that since we should have almost the same processing as atomic inc intrinsics,
it would be better idea to define ds_add/min/max intrinsics the same way as AMDGPUAtomicIncIntrin (or unify them),
and then update AMDGPU BE to correctly process these ds_ instrisics the same ways as atomic inc/dec?
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
306 | These should probably be named fadd.. etc to match the IR operations | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
574–576 ↗ | (On Diff #126743) | This will need to be rebased since I just changed these last week |
6549–6552 ↗ | (On Diff #126743) | Tests for these with this combine would be nice |
test/CodeGen/AMDGPU/lds_atomic_f32.ll | ||
1 ↗ | (On Diff #126743) | Can you also add a pre-gfx9 run line, and check for the m0 initialization |
LGTM except for the mystery name mangling in the test
test/CodeGen/AMDGPU/lds_atomic_f32.ll | ||
---|---|---|
4 ↗ | (On Diff #129829) | These weren't declared as mangled, so why do these have the .f32? I'm kind of surprised this is accepted |
Need to add the same fields as for AMDGPUAtomicIncIntrin, namely: