
Let clang atomic builtins fetch add/sub support floating point types
Needs Review · Public

Authored by yaxunl on Dec 19 2019, 1:14 PM.

Details

Summary

Recently atomicrmw started to support fadd/fsub:

https://reviews.llvm.org/D53965

However, the clang atomic fetch add/sub builtins still do not support emitting atomicrmw fadd/fsub.

This patch adds that.

TargetInfo::isFPAtomicFetchAddSubSupportedByBits is added to gate support for floating-point types
in the atomic fetch add/sub builtins. By default it returns false; a target needs to override it
to enable the feature. This patch enables it only for amdgcn; for other targets it is NFC.
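
For illustration, on a target where the hook returns true, an FP fetch-add builtin like the one below lowers to an atomicrmw fadd (a sketch; the exact IR depends on target and flags):

```cpp
// Sketch: with this patch, on a target whose TargetInfo hook enables FP
// atomics, the builtin lowers to an 'atomicrmw fadd' instead of being
// rejected with a diagnostic.
float add_one(float *p) {
  return __atomic_fetch_add(p, 1.0f, __ATOMIC_SEQ_CST);
  // expected IR, roughly:
  //   %old = atomicrmw fadd float* %p, float 1.000000e+00 seq_cst
}
```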

Diff Detail

Event Timeline

yaxunl created this revision. Dec 19 2019, 1:14 PM

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support?

jfb added a subscriber: __simt__. Dec 19 2019, 2:53 PM
yaxunl added a comment (edited). Dec 20 2019, 7:35 AM
In D71726#1791904, @jfb wrote:

> This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

For x86_64, amdgcn, aarch64, armv7, and mips64, it is translated to cmpxchg by AtomicExpandPass, and the backends codegen it successfully.

For hexagon and riscv32, it is translated to a call to __atomic_fetch_add_4 for fadd on float. This is concerning, since that sized integer libcall cannot perform a floating-point add. Probably we need to add __atomic_fetch_{add|sub}_{f16|f32|f64}?
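
The suggested libcalls might look like the following declarations (hypothetical; no such symbols exist in compiler-rt/libatomic at the time of this review):

```cpp
// Hypothetical FP counterparts to the sized integer libcalls such as
// __atomic_fetch_add_4. These do not exist in compiler-rt/libatomic as of
// this review; they are shown only to illustrate the suggestion.
extern "C" float  __atomic_fetch_add_f32(float *ptr, float val, int memorder);
extern "C" double __atomic_fetch_add_f64(double *ptr, double val, int memorder);
// ...plus __atomic_fetch_sub_f32/_f64 and an _f16 variant where the target
// has a 16-bit float type.
```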

arsenm added a subscriber: arsenm. Jan 2 2020, 7:53 AM
arsenm added inline comments.
clang/lib/CodeGen/CGAtomic.cpp
598–600

Should this really be based on the type, or should the builtin name be different for FP?

> For hexagon and riscv32, it is translated to a call to __atomic_fetch_add_4 for fadd on float.

For systems that have load-link/store-conditional architectures, like ARM / PPC / base RISC-V without extensions, I would imagine that a cmpxchg loop is much worse than simply doing the floating-point add/sub in the middle of the atomic mini-transaction. Surely we want backends to be capable of implementing this better than what this pass does, even when they don't have "native" FP atomics.

You listed amdgcn... what does this do on nvptx?
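
As a source-level sketch of that contrast (not what AtomicExpandPass literally emits): with the compare-exchange expansion below, on an LL/SC machine each compare_exchange_weak is itself an LL/SC sequence, so the expansion is effectively a loop within a loop, whereas a back end could instead perform the fadd once between the load-link and the store-conditional.

```cpp
#include <atomic>

// cmpxchg-style expansion of an FP fetch-add, written at the source level.
// compare_exchange_weak refreshes 'old' with the current value on failure,
// so the floating-point add is retried until the CAS succeeds.
float fetch_add_via_cas(std::atomic<float> &a, float v) {
  float old = a.load(std::memory_order_relaxed);
  while (!a.compare_exchange_weak(old, old + v,
                                  std::memory_order_seq_cst,
                                  std::memory_order_relaxed)) {
  }
  return old;
}
```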

> You listed amdgcn... what does this do on nvptx?

Targets can implement shouldExpandAtomicRMWInIR to get the desired behavior; NVPTX currently does not implement it. Looking at AtomicExpandPass, it looks like either the cmpxchg or the LL/SC expansion should already work for the FP atomics.
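
A rough sketch of what such an override could look like (MyTargetLowering is a hypothetical target; the hook and enum are LLVM's, but details vary by version):

```cpp
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical target opting FP atomicrmw into the cmpxchg expansion
// performed by AtomicExpandPass.
class MyTargetLowering : public TargetLowering {
public:
  using TargetLowering::TargetLowering;

  AtomicExpansionKind
  shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const override {
    switch (AI->getOperation()) {
    case AtomicRMWInst::FAdd:
    case AtomicRMWInst::FSub:
      return AtomicExpansionKind::CmpXChg; // expand to a cmpxchg loop in IR
    default:
      return AtomicExpansionKind::None;    // leave other operations alone
    }
  }
};
```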

yaxunl updated this revision to Diff 265325. Wed, May 20, 12:38 PM
yaxunl added a reviewer: arsenm.

rebase

yaxunl marked 2 inline comments as done. Wed, May 20, 1:28 PM
> Targets can implement shouldExpandAtomicRMWInIR to get the desired behavior; NVPTX currently does not implement it.

nvptx is similar to hexagon and riscv32: FP atomics are translated to calls to __atomic_fetch_add_4.

Since currently only amdgcn supports FP atomics, I am going to add a TargetInfo hook for whether FP atomics are supported, and only emit FP atomics for targets that support them.

clang/lib/CodeGen/CGAtomic.cpp
598–600

I think the original name is better. The builtins do exactly what their names say; they just could not handle FP types before, which is why they used to emit a diagnostic when an FP type was passed. Now they can handle FP types.

In D71726#1791904, @jfb wrote:

> Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support?

Yes, I believe we do, in order to implement fetch_add & friends on floating-point types (https://wg21.link/P0020R6).

The builtins would need to work on float, double, and long double. The code seems to suggest they do, but the tests only check float. Does this support __atomic_fetch_{add,sub} on double and long double?

ldionne added inline comments. Wed, May 20, 2:08 PM
clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

Sorry if that's a dumb question, but I'm a bit confused: p is a float*, but then we add a double 1.0 to it. Is that intended, or should that be double *p instead (or 1.0f)?

yaxunl marked 3 inline comments as done. Wed, May 20, 4:14 PM
> Does this support __atomic_fetch_{add,sub} on double and long double?

It depends on the target. For x86_64, __atomic_fetch_{add,sub} on double and long double are translated to the corresponding 8- and 16-byte libcalls (e.g. __atomic_fetch_sub_8 and __atomic_fetch_sub_16 for fsub).
For amdgcn, __atomic_fetch_{add,sub} on double is translated to FP atomic instructions; long double is the same as double on amdgcn.

clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

In this case, the value type is converted to the pointee type of the pointer operand.
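
A small illustration of that conversion rule (assuming a target where this patch enables FP atomic fetch add):

```cpp
// The value operand is implicitly converted to the pointee type of the
// pointer operand, so the double literal 1.0 is converted to float (1.0f)
// before the atomic add.
float f = 0.0f;
float prev = __atomic_fetch_add(&f, 1.0, __ATOMIC_SEQ_CST);
```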

jfb added a comment. Wed, May 20, 6:20 PM
> The builtins would need to work on float, double, and long double.

libc++ could implement atomic<float> using a cmpxchg loop with bit_cast and the FP instruction in most cases, and only use these builtins when available.
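
A minimal sketch of that fallback, assuming C++20 std::bit_cast and a 32-bit float (real library code would need to handle other widths and the object-representation subtleties):

```cpp
#include <atomic>
#include <bit>
#include <cstdint>

// cmpxchg loop over the raw bits, with the FP add performed on
// bit_cast'ed values; no FP atomic builtin is required.
float fetch_add_fallback(std::atomic<std::uint32_t> &bits, float v) {
  std::uint32_t old = bits.load(std::memory_order_relaxed);
  while (!bits.compare_exchange_weak(
      old,
      std::bit_cast<std::uint32_t>(std::bit_cast<float>(old) + v),
      std::memory_order_seq_cst, std::memory_order_relaxed)) {
  }
  return std::bit_cast<float>(old);
}
```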

yaxunl updated this revision to Diff 265479. Thu, May 21, 5:24 AM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

Added TargetInfo::isFPAtomicFetchAddSubSupported to guard FP atomics.
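
A rough shape of the hook described here (names from this update; shown as fragments, with the default on clang's TargetInfo and an override in the AMDGPU target):

```cpp
// Default on clang's TargetInfo (clang/include/clang/Basic/TargetInfo.h):
virtual bool isFPAtomicFetchAddSubSupported() const { return false; }

// Override in the AMDGPU target (clang/lib/Basic/Targets/AMDGPU.h):
bool isFPAtomicFetchAddSubSupported() const override { return true; }
```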

tra added a subscriber: tra. Thu, May 21, 10:49 AM
tra added inline comments.
clang/include/clang/Basic/TargetInfo.h
1424

I think it should be predicated on the specific type.
E.g. NVPTX supports atomic ops on fp32 ~everywhere, but fp64 atomic add/sub is only supported on newer GPUs.
And then there's fp16...

ldionne added inline comments. Thu, May 21, 11:45 AM
clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

Ok, thanks for the clarification. Yeah, it was a dumb question after all. I still think it should be made clearer by using 1.0f.

yaxunl marked 3 inline comments as done. Thu, May 21, 2:32 PM
yaxunl added inline comments.
clang/include/clang/Basic/TargetInfo.h
1424

Will do, and I will add tests for fp16.

clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

This test has been removed; the new tests do not have this issue.

yaxunl updated this revision to Diff 265606. Thu, May 21, 2:39 PM
yaxunl marked 2 inline comments as done.
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: tra.

Check supported FP atomics by bits.

ldionne added inline comments. Mon, May 25, 11:10 AM
clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
26

Nitpick, but this should be 1.0L to be consistent.

tra added inline comments. Tue, May 26, 9:58 AM
clang/include/clang/Basic/TargetInfo.h
1424

The number of bits alone may not be sufficient to differentiate the FP variants.
E.g. 16-bit floats currently have two variants: IEEE FP16 and BFloat16 (supported by Intel and newer NVIDIA GPUs).
CUDA 11 introduced the TF32 FP format, so we're likely to have more than one 32-bit FP type, too.
I think PPC has an odd long double variant represented as a pair of 64-bit doubles.
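
A hypothetical type-granular variant of the hook, keyed on llvm::fltSemantics so that same-width formats can be distinguished (a sketch only, not the interface this patch proposes):

```cpp
#include "llvm/ADT/APFloat.h" // llvm::fltSemantics, llvm::APFloat::IEEEsingle()

// Hypothetical member on clang::TargetInfo: query support per FP semantics,
// so IEEE FP16 and BFloat16 (both 16 bits wide) can be told apart.
virtual bool
isFPAtomicFetchAddSubSupported(const llvm::fltSemantics &Sem) const {
  return &Sem == &llvm::APFloat::IEEEsingle(); // e.g. only fp32 supported
}
```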