Page MenuHomePhabricator

[LLVM][AMDGPU] Specialize 32-bit atomic fadd instruction for generic address space
AcceptedPublic

Authored by tianshilei1992 on Jul 13 2022, 1:28 PM.

Details

Summary

The 32-bit floating-point atomic add instructions on AMDGPUs does not support a
"flat" or "generic" address space. So, if the address space cannot be determined
statically, the AMDGPU backend will fall back to a CAS loop (which does support
"flat" addressing). Instead, this patch emits runtime address-space checks to
allow native FP atomic add instructions for global and LDS memory (and non-atomic
FP add instructions for private/scratch memory).

In order to do that, this patch introduces a new interface function
emitExpandAtomicRMW. It is expected to be called when a common atomic expand
doesn't work for a specific target, such as the case we discussed here.

Diff Detail

Unit TestsFailed

TimeTest
60,030 msx64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LeakTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-leak.test.tmp-LeakTest

Event Timeline

tianshilei1992 created this revision.Jul 13 2022, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 1:28 PM
tianshilei1992 requested review of this revision.Jul 13 2022, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 1:28 PM

I would expect to have a test in test/Transforms/AtomicExpand/AMDGPU like the others there

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13010

Can cast to private and do a non-atomic load

13012

Same for the store

13016–13017

This is ignoring some of the edge case behavior treatment for the atomic instructions. I would have to look up the details again

13046

assert is redundant with the cast<>

13052

this doesn't look opaque pointer friendly? CreatePointerCast is heavier than you need anyway

13057

Should use getIntrinsic with the enum, not refer to the intrinsic by name (or CreateIntrinsic)

13065–13066

getNullValue

13067–13068

getFalse

13076

Ditto

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
2

Should also make sure to cover gfx908 and 90a

5–9

This doesn't demonstrate any of the looping structure

17

Don't need most of these attributes

arsenm added inline comments.Jul 13 2022, 1:57 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13069–13071

You shouldn't need to use the intrinsic. You can use the atomicrmw with the new address space and rely on the existing handling

13090–13091

Same here, could just emit the atomicrmw with addrspace(1)

arsenm added inline comments.Jul 13 2022, 2:00 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13083

Pass through AA mteadata?

tianshilei1992 added inline comments.Jul 13 2022, 2:04 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13057

Well, I agree, but that intrinsic is not in llvm yet. clang directly lowers the compiler built-in to this. As a result, directly using the name is a WA.

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
5–9

There is no loop.

arsenm added inline comments.Jul 13 2022, 2:06 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13057

Yes it is, the intrinsic wouldn't work at all if it weren't

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
5–9

I mean branching

arsenm added inline comments.Jul 13 2022, 2:08 PM
llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
5–9

Plus the global case does still require the cmpxchg loop in some cases. e.g. everything in shouldExpandAtomicRMWInIR still applies for the atomics you are emitting

rampitec added inline comments.Jul 13 2022, 2:35 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12791–12792

If this atomic falls into system scope it has to be expanded into CAS. This code breaks the logic.
The check below was done after the AS check to perform a fast check first since the outcome is the same anyway. This is not true anymore.

tianshilei1992 marked 12 inline comments as done.

partially fix comments

tianshilei1992 added inline comments.Jul 20 2022, 7:59 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13057

It looks like I didn't use CreateIntrinsic correctly. It hits the following assertion (llvm/lib/IR/Function.cpp:894):

assert((Tys.empty() || Intrinsic::isOverloaded(Id)) &&
       "This version of getName is for overloaded intrinsics only");

Isn't Intrinsic::amdgcn_is_shared the right intrinsic ID?

fix assertion

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13057

K, I fixed that.

add the check for branch instruction in test and remove unnecessary features

tianshilei1992 marked 3 inline comments as done.Jul 21 2022, 11:50 AM
tianshilei1992 marked 2 inline comments as done.Jul 21 2022, 12:00 PM
tianshilei1992 added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13083

Can you expatiate it? I didn't get it.

rampitec added inline comments.Jul 21 2022, 12:01 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12791–12792

Thanks, I believe it is correct now for the CAS vs expand logic ans system scope.

update test for GFX90A

tianshilei1992 marked 2 inline comments as done.Jul 21 2022, 4:12 PM

I'd still like to have an IR to IR test in test/Transforms/AtomicExpand

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13011

put addrspace(5) here

13083

It's probably not important, but you can forward any aliasing metadata through from the original atomic to the new memory operation.

llvm/test/CodeGen/AMDGPU/atomicrmw-expand.ll
46–47

These two attribute groups are the same. Also you can drop the target-features

tianshilei1992 marked 3 inline comments as done.Jul 21 2022, 7:51 PM

I'd still like to have an IR to IR test in test/Transforms/AtomicExpand

Oh, that will be added soon!

add an IR test to llvm/test/Transforms/AtomicExpand/AMDGPU

tianshilei1992 marked an inline comment as done.Jul 22 2022, 3:09 AM

update comments

Is anything else needed to be done? I'd like to get it in before the code freeze such that we could directly pull it down to internal repo.

arsenm added inline comments.Aug 1 2022, 1:49 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13064–13065

There are other metadata nodes, maybe there is a helper for it?

13071–13072

Should be able to unconditionally call CreateBitCast

llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd-flat-specialization.ll
120

Also should test with this off to make sure it's appropriately expanded. The pass may need something to re-visit the newly emitted atomicrmw

tianshilei1992 marked 3 inline comments as done.

rebase and update comments

New week, new ping. :-)

rebase and ping

rampitec added inline comments.Tue, Sep 6, 12:46 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12804

At this point this is gfx908 and gfx11. Then gfx11 has flat_atomic_add_f32.

It also appears to return Expand for double, but emitExpandAtomicRMW does not support doubles.

tianshilei1992 added inline comments.Thu, Sep 8, 12:08 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12804

Thanks for the info. I'll make the change accordingly.
Is there any place listing those support among different versions? In that way I can have a complete picture?

rampitec added inline comments.Thu, Sep 8, 12:09 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12804

I was checking our own MC tests. I found it easiest.

tianshilei1992 added inline comments.Thu, Sep 8, 12:11 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12804

K, gotcha. Thx!

arsenm accepted this revision.Thu, Sep 22, 9:11 AM

I think this LGTM, but I'm having a real hard time re-sorting through the mess of atomic legality conditions

This revision is now accepted and ready to land.Thu, Sep 22, 9:11 AM

@Petar.Avramovic has sorted through this mess more recently than I

I am not sure about changes in SIISelLowering.cpp, it looks correct for gfx90a but not for gfx908. Can you rebase on top of D131560?
There are some additions to when rmw fadd atomics are expanded.
If I am reading this correctly, flat f32 fadd that is non-system scope and function has "amdgpu-unsafe-fp-atomics"="true" will use expand from this patch on gfx908 no-rtn fadd and gfx90a?
Remaining two targets (gfx940 and gfx11) that have global fadd f32 also have flat fadd f32 instructions.
Can you also update summary, there are a few targets that have flat/global fadd.
You are changing way of expansion on targets that have global fadd but does not have flat fadd instruction (if atomic is non-system scope and function has "amdgpu-unsafe-fp-atomics"="true" attribute)?
Also there is no check if target has hasLDSFPAtomicAdd before using AtomicExpansionKind::Expand (targets affected by this change have it but should probably add feature check before expanding)

llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd-flat-specialization.ll
27

There are some changes in D131560, this will have to be expanded for gfx908.