This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
ClosedPublic

Authored by tianshilei1992 on Jul 13 2022, 1:28 PM.

Details

Summary

The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
emitExpandAtomicRMW. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Diff Detail

Event Timeline

tianshilei1992 created this revision.Jul 13 2022, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 1:28 PM
tianshilei1992 requested review of this revision.Jul 13 2022, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 1:28 PM

I would expect to have a test in test/Transforms/AtomicExpand/AMDGPU like the others there

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13009–13010

This is ignoring some of the edge case behavior treatment for the atomic instructions. I would have to look up the details again

13039

assert is redundant with the cast<>

13045

this doesn't look opaque pointer friendly? CreatePointerCast is heavier than you need anyway

13050

Should use getIntrinsic with the enum, not refer to the intrinsic by name (or CreateIntrinsic)

13058–13059

getNullValue

13060–13061

getFalse

13069

Ditto

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
2

Should also make sure to cover gfx908 and 90a

5–9

This doesn't demonstrate any of the looping structure

17

Don't need most of these attributes

arsenm added inline comments.Jul 13 2022, 1:52 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13003

Can cast to private and do a non-atomic load

13005

Same for the store

arsenm added inline comments.Jul 13 2022, 1:57 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13062–13064

You shouldn't need to use the intrinsic. You can use the atomicrmw with the new address space and rely on the existing handling

13083–13084

Same here, could just emit the atomicrmw with addrspace(1)

arsenm added inline comments.Jul 13 2022, 2:00 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13076

Pass through AA mteadata?

tianshilei1992 added inline comments.Jul 13 2022, 2:04 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13050

Well, I agree, but that intrinsic is not in llvm yet. clang directly lowers the compiler built-in to this. As a result, directly using the name is a WA.

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
5–9

There is no loop.

arsenm added inline comments.Jul 13 2022, 2:06 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13050

Yes it is, the intrinsic wouldn't work at all if it weren't

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
5–9

I mean branching

arsenm added inline comments.Jul 13 2022, 2:08 PM
llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
5–9

Plus the global case does still require the cmpxchg loop in some cases. e.g. everything in shouldExpandAtomicRMWInIR still applies for the atomics you are emitting

rampitec added inline comments.Jul 13 2022, 2:35 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12768–12769

If this atomic falls into system scope it has to be expanded into CAS. This code breaks the logic.
The check below was done after the AS check to perform a fast check first since the outcome is the same anyway. This is not true anymore.

tianshilei1992 marked 12 inline comments as done.

partially fix comments

tianshilei1992 added inline comments.Jul 20 2022, 7:59 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13050

It looks like I didn't use CreateIntrinsic correctly. It hits the following assertion (llvm/lib/IR/Function.cpp:894):

assert((Tys.empty() || Intrinsic::isOverloaded(Id)) &&
       "This version of getName is for overloaded intrinsics only");

Isn't Intrinsic::amdgcn_is_shared the right intrinsic ID?

fix assertion

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13050

K, I fixed that.

add the check for branch instruction in test and remove unnecessary features

tianshilei1992 marked 3 inline comments as done.Jul 21 2022, 11:50 AM
tianshilei1992 marked 2 inline comments as done.Jul 21 2022, 12:00 PM
tianshilei1992 added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13076

Can you expatiate it? I didn't get it.

rampitec added inline comments.Jul 21 2022, 12:01 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12768–12769

Thanks, I believe it is correct now for the CAS vs expand logic ans system scope.

update test for GFX90A

tianshilei1992 marked 2 inline comments as done.Jul 21 2022, 4:12 PM

I'd still like to have an IR to IR test in test/Transforms/AtomicExpand

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13004

put addrspace(5) here

13076

It's probably not important, but you can forward any aliasing metadata through from the original atomic to the new memory operation.

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
46–47

These two attribute groups are the same. Also you can drop the target-features

tianshilei1992 marked 3 inline comments as done.Jul 21 2022, 7:51 PM

I'd still like to have an IR to IR test in test/Transforms/AtomicExpand

Oh, that will be added soon!

add an IR test to llvm/test/Transforms/AtomicExpand/AMDGPU

tianshilei1992 marked an inline comment as done.Jul 22 2022, 3:09 AM

update comments

Is anything else needed to be done? I'd like to get it in before the code freeze such that we could directly pull it down to internal repo.

arsenm added inline comments.Aug 1 2022, 1:49 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13057–13058

There are other metadata nodes, maybe there is a helper for it?

13064–13065

Should be able to unconditionally call CreateBitCast

llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd-flat-specialization.ll
120

Also should test with this off to make sure it's appropriately expanded. The pass may need something to re-visit the newly emitted atomicrmw

tianshilei1992 marked 3 inline comments as done.

rebase and update comments

New week, new ping. :-)

rebase and ping

rampitec added inline comments.Sep 6 2022, 12:46 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12781

At this point this is gfx908 and gfx11. Then gfx11 has flat_atomic_add_f32.

It also appears to return Expand for double, but emitExpandAtomicRMW does not support doubles.

tianshilei1992 added inline comments.Sep 8 2022, 12:08 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12781

Thanks for the info. I'll make the change accordingly.
Is there any place listing those support among different versions? In that way I can have a complete picture?

rampitec added inline comments.Sep 8 2022, 12:09 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12781

I was checking our own MC tests. I found it easiest.

tianshilei1992 added inline comments.Sep 8 2022, 12:11 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12781

K, gotcha. Thx!

arsenm accepted this revision.Sep 22 2022, 9:11 AM

I think this LGTM, but I'm having a real hard time re-sorting through the mess of atomic legality conditions

This revision is now accepted and ready to land.Sep 22 2022, 9:11 AM

@Petar.Avramovic has sorted through this mess more recently than I

I am not sure about changes in SIISelLowering.cpp, it looks correct for gfx90a but not for gfx908. Can you rebase on top of D131560?
There are some additions to when rmw fadd atomics are expanded.
If I am reading this correctly, flat f32 fadd that is non-system scope and function has "amdgpu-unsafe-fp-atomics"="true" will use expand from this patch on gfx908 no-rtn fadd and gfx90a?
Remaining two targets (gfx940 and gfx11) that have global fadd f32 also have flat fadd f32 instructions.
Can you also update summary, there are a few targets that have flat/global fadd.
You are changing way of expansion on targets that have global fadd but does not have flat fadd instruction (if atomic is non-system scope and function has "amdgpu-unsafe-fp-atomics"="true" attribute)?
Also there is no check if target has hasLDSFPAtomicAdd before using AtomicExpansionKind::Expand (targets affected by this change have it but should probably add feature check before expanding)

llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd-flat-specialization.ll
28

There are some changes in D131560, this will have to be expanded for gfx908.

rebase, add more tests

I am not sure about changes in SIISelLowering.cpp, it looks correct for gfx90a but not for gfx908. Can you rebase on top of D131560?
There are some additions to when rmw fadd atomics are expanded.
If I am reading this correctly, flat f32 fadd that is non-system scope and function has "amdgpu-unsafe-fp-atomics"="true" will use expand from this patch on gfx908 no-rtn fadd and gfx90a?
Remaining two targets (gfx940 and gfx11) that have global fadd f32 also have flat fadd f32 instructions.
Can you also update summary, there are a few targets that have flat/global fadd.
You are changing way of expansion on targets that have global fadd but does not have flat fadd instruction (if atomic is non-system scope and function has "amdgpu-unsafe-fp-atomics"="true" attribute)?
Also there is no check if target has hasLDSFPAtomicAdd before using AtomicExpansionKind::Expand (targets affected by this change have it but should probably add feature check before expanding)

Thanks for the info. I rebased the patch and refined the logic to determine. Does it look right now?

tianshilei1992 requested review of this revision.Oct 5 2022, 8:18 AM

When to expand part LGTM.
For clarity, you could also check for Subtarget->hasLDSFPAtomicAdd() together with Subtarget->hasAtomicFaddRtnInsts() and Subtarget->hasAtomicFaddNoRtnInsts() to match feature description and instructions generated during expansion (It looks to me that expand assumes that target has ds_add).
Can you re-check tests? There should be some changes in llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll, also autogenerate llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll (btw it failed for me).

fix comments and update tests

tianshilei1992 marked 3 inline comments as done.Oct 5 2022, 10:45 AM
nhaehnle removed a subscriber: nhaehnle.Oct 6 2022, 2:46 AM
arsenm accepted this revision.Nov 1 2022, 2:44 PM

LGTM with nit

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12787

Typo lsd, s/lsd/LDS/

This revision is now accepted and ready to land.Nov 1 2022, 2:44 PM

rebase and fix typo

arsenm accepted this revision.Nov 4 2022, 10:07 AM