This is an archive of the discontinued LLVM Phabricator instance.

Let clang atomic builtins fetch add/sub support floating point types
ClosedPublic

Authored by yaxunl on Dec 19 2019, 1:14 PM.

Details

Summary

Recently atomicrmw started to support fadd/fsub:

https://reviews.llvm.org/D53965

However clang atomic builtins fetch add/sub still does not support emitting atomicrmw fadd/fsub.

This patch adds that.

Diff Detail

Event Timeline

yaxunl created this revision.Dec 19 2019, 1:14 PM

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support?

jfb added a subscriber: __simt__.Dec 19 2019, 2:53 PM
yaxunl added a comment.EditedDec 20 2019, 7:35 AM
In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

For x86_64, amdgcn, aarch64, armv7, mips64, it is translated to cmpxchg by AtomicExpandPass and backends did codegen successfully.

For hexagon, riscv32, it is translated to call of __atomic_fetch_add_4 for fadd float. This is concerning. Probably we need to add __atomic_fetch_{add|sub}_{f16|f32|f64} ?

arsenm added a subscriber: arsenm.Jan 2 2020, 7:53 AM
arsenm added inline comments.
clang/lib/CodeGen/CGAtomic.cpp
605–607

Should this really be based on the type, or should the builtin name be different for FP?

In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

For x86_64, amdgcn, aarch64, armv7, mips64, it is translated to cmpxchg by AtomicExpandPass and backends did codegen successfully.

For hexagon, riscv32, it is translated to call of __atomic_fetch_add_4 for fadd float. This is concerning. Probably we need to add __atomic_fetch_{add|sub}_{f16|f32|f64} ?

For systems that have load-link/store-conditional architectures, like ARM / PPC / base RISC-V without extension, I would imagine that using a cmpxchg loop is much worse than simply doing the floating-point add/sub in the middle of the atomic mini-transaction. I'm sure that we want back-ends to be capable of implementing this better than what this pass is doing, even when they don't have "native" fp atomics.

You listed amdgcn... what does this do on nvptx?

In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

For x86_64, amdgcn, aarch64, armv7, mips64, it is translated to cmpxchg by AtomicExpandPass and backends did codegen successfully.

For hexagon, riscv32, it is translated to call of __atomic_fetch_add_4 for fadd float. This is concerning. Probably we need to add __atomic_fetch_{add|sub}_{f16|f32|f64} ?

For systems that have load-link/store-conditional architectures, like ARM / PPC / base RISC-V without extension, I would imagine that using a cmpxchg loop is much worse than simply doing the floating-point add/sub in the middle of the atomic mini-transaction. I'm sure that we want back-ends to be capable of implementing this better than what this pass is doing, even when they don't have "native" fp atomics.

You listed amdgcn... what does this do on nvptx?

Targets can implement shouldExpandAtomicRMWInIR for the desired behavior, which NVPTX currently does not implement. Looking at AtomicExpandPass, it looks like either cmpxchg or LLSC expansions should work for the FP atomics already

yaxunl updated this revision to Diff 265325.May 20 2020, 12:38 PM
yaxunl added a reviewer: arsenm.

rebase

yaxunl marked 2 inline comments as done.May 20 2020, 1:28 PM
In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

For x86_64, amdgcn, aarch64, armv7, mips64, it is translated to cmpxchg by AtomicExpandPass and backends did codegen successfully.

For hexagon, riscv32, it is translated to call of __atomic_fetch_add_4 for fadd float. This is concerning. Probably we need to add __atomic_fetch_{add|sub}_{f16|f32|f64} ?

For systems that have load-link/store-conditional architectures, like ARM / PPC / base RISC-V without extension, I would imagine that using a cmpxchg loop is much worse than simply doing the floating-point add/sub in the middle of the atomic mini-transaction. I'm sure that we want back-ends to be capable of implementing this better than what this pass is doing, even when they don't have "native" fp atomics.

You listed amdgcn... what does this do on nvptx?

Targets can implement shouldExpandAtomicRMWInIR for the desired behavior, which NVPTX currently does not implement. Looking at AtomicExpandPass, it looks like either cmpxchg or LLSC expansions should work for the FP atomics already

nvptx is similar to hexagon and riscv32, where fp atomics is translated to call of __atomic_fetch_add_4.

Since currently only amdgcn supports fp atomics, I am going to add a TargetInfo hook about whether fp atomics is supported and only emit fp atomics for targets supporting it.

clang/lib/CodeGen/CGAtomic.cpp
605–607

I think the original name is better. They are exactly what they are intended to be. They were not able to handle fp types therefore they used to emit diagnostics when fp types were passed to them. However now they are able to handle fp types.

In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support?

Yes, I guess we do in order to implement fetch_add & friends on floating point types (https://wg21.link/P0020R6).

The builtins would need to work on float, double and long double. The code seems to suggest it does, however the tests only check for float. Does this support __atomic_fetch_{add,sub} on double and long double?

ldionne added inline comments.May 20 2020, 2:08 PM
clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

Sorry if that's a dumb question, but I'm a bit confused: p is a float*, but then we add a double 1.0 to it. Is that intended, or should that be double *p instead (or 1.0f)?

yaxunl marked 3 inline comments as done.May 20 2020, 4:14 PM
In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support?

Yes, I guess we do in order to implement fetch_add & friends on floating point types (https://wg21.link/P0020R6).

The builtins would need to work on float, double and long double. The code seems to suggest it does, however the tests only check for float. Does this support __atomic_fetch_{add,sub} on double and long double?

It depends on target. For x86_64, __atomic_fetch_{add,sub} on double and long double are translated to __atomic_fetch_sub_8 and __atomic_fetch_sub_16.
For amdgcn, __atomic_fetch_{add,sub} on double is translated to fp atomic insts. long double is the same as double on amdgcn.

clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

In this case, the value type is converted to the pointee type of the pointer operand.

jfb added a comment.May 20 2020, 6:20 PM
In D71726#1791904, @jfb wrote:

This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :)

Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support?

Yes, I guess we do in order to implement fetch_add & friends on floating point types (https://wg21.link/P0020R6).

The builtins would need to work on float, double and long double. The code seems to suggest it does, however the tests only check for float. Does this support __atomic_fetch_{add,sub} on double and long double?

libc++ could implement atomic<float> using a cmpxchg loop with bit_cast and the FP instruction in most cases, and only use these builtins if available.

yaxunl updated this revision to Diff 265479.May 21 2020, 5:24 AM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

Added TargetInfo::isFPAtomicFetchAddSubSupported to guard fp atomic.

tra added a subscriber: tra.May 21 2020, 10:49 AM
tra added inline comments.
clang/include/clang/Basic/TargetInfo.h
1418 ↗(On Diff #265479)

I think it should be predicated on specific type.
E.g. NVPTX supports atomic ops on fp32 ~everywhere, but fp64 atomic add/sub is only supported on newer GPUs.
And then there's fp16...

ldionne added inline comments.May 21 2020, 11:45 AM
clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

Ok, thanks for the clarification. Yeah, it was a dumb question after all. I still think it should be made clearer by using 1.0f.

yaxunl marked 3 inline comments as done.May 21 2020, 2:32 PM
yaxunl added inline comments.
clang/include/clang/Basic/TargetInfo.h
1418 ↗(On Diff #265479)

will do and add tests for fp16

clang/test/CodeGen/atomic-ops.c
296 ↗(On Diff #265325)

this test has been removed. the new tests do not have this issue.

yaxunl updated this revision to Diff 265606.May 21 2020, 2:39 PM
yaxunl marked 2 inline comments as done.
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: tra.

check supported fp atomics by bits.

ldionne added inline comments.May 25 2020, 11:10 AM
clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
27

Nitpick, but this should be 1.0L to be consistent.

tra added inline comments.May 26 2020, 9:58 AM
clang/include/clang/Basic/TargetInfo.h
1418 ↗(On Diff #265479)

The number of bits alone may not be sufficient to differentiate the FP variants.
E.g. 16-bit floats currently have 2 variants: IEEE FP16 and BFloat16 (supported by intel and newer NVIDIA GPUs).
CUDA-11 has introduced TF32 FP format, so we're likely to have more than one 32-bit FP type, too.
I think PPC has an odd long double variant represented as pair of 64-bit doubles.

yaxunl marked 4 inline comments as done.Jul 18 2020, 3:44 PM
yaxunl added inline comments.
clang/include/clang/Basic/TargetInfo.h
1418 ↗(On Diff #265479)

will use llvm::fltSemantics for checking, which should cover different fp types.

clang/test/CodeGenCUDA/amdgpu-atomic-ops.cu
27

done

yaxunl updated this revision to Diff 279032.Jul 18 2020, 3:46 PM
yaxunl marked 2 inline comments as done.

use llvm::fltSemantics for checking

Why not have clang always emit atomicrmw for floats, and let AtomicExpandPass handle legalizing that into integer atomics if necessary, rather than adding a target hook in clang?

Why not have clang always emit atomicrmw for floats, and let AtomicExpandPass handle legalizing that into integer atomics if necessary, rather than adding a target hook in clang?

Not all targets can legalize fp atomics by AtomicExpandPass. Some targets need library support.

jfb added a comment.Jul 21 2020, 3:22 PM

Why not have clang always emit atomicrmw for floats, and let AtomicExpandPass handle legalizing that into integer atomics if necessary, rather than adding a target hook in clang?

Not all targets can legalize fp atomics by AtomicExpandPass. Some targets need library support.

What are they missing? It can be expanded to a cmpxchg loop with bitcast to an integer type of the same size.

Why not have clang always emit atomicrmw for floats, and let AtomicExpandPass handle legalizing that into integer atomics if necessary, rather than adding a target hook in clang?

Not all targets can legalize fp atomics by AtomicExpandPass. Some targets need library support.

That isn't true, because you can do so generically with a cmpxchg loop, assuming that size of atomic is supported by the target. This might not be the most efficient lowering choice, but it's always possible as a fallback. (And if the size is too large, then AtomicExpandPass will lower the cmpxchg to the libatomic call.)

If a target wants to tell AtomicExpandPass that fp add/sub are supported, and then lower the resulting ATOMIC_LOAD_FSUB sdag node into a libcall of its choice, that's also ok (as long as the libcall is lock-free).

Why not have clang always emit atomicrmw for floats, and let AtomicExpandPass handle legalizing that into integer atomics if necessary, rather than adding a target hook in clang?

Not all targets can legalize fp atomics by AtomicExpandPass. Some targets need library support.

That isn't true, because you can do so generically with a cmpxchg loop, assuming that size of atomic is supported by the target. This might not be the most efficient lowering choice, but it's always possible as a fallback. (And if the size is too large, then AtomicExpandPass will lower the cmpxchg to the libatomic call.)

If a target wants to tell AtomicExpandPass that fp add/sub are supported, and then lower the resulting ATOMIC_LOAD_FSUB sdag node into a libcall of its choice, that's also ok (as long as the libcall is lock-free).

how about other fp types e.g. bf16, half, long double? Do we need to diagnose them or not?

yaxunl updated this revision to Diff 281296.Jul 28 2020, 11:07 AM

Make IEEE single and double type as supported for fp atomics in all targets by default. This is based on the assumption that AtomicExpandPass or its ongoing work is sufficient to support fp atomics for all targets. This is to facilitate middle end and backend end development to support fp atomics.

If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo.

ping. I think I have addressed all the issues in FE. I think issues in AtomicExpandPass should be addressed by separate patches. Can we land this? Thanks.

tra added a comment.Jul 29 2020, 10:45 AM

LGTM, modulo couple of nits.

@jyknight are you OK with this?

Make IEEE single and double type as supported for fp atomics in all targets by default. This is based on the assumption that AtomicExpandPass or its ongoing work is sufficient to support fp atomics for all targets. This is to facilitate middle end and backend end development to support fp atomics.

If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo.

Do we have sufficient test coverage on all platforms to make sure we're not generating something that LLVM can't handle everywhere?
If not, perhaps we should default to unsupported and only enable it for known working targets.

clang/lib/CodeGen/CGAtomic.cpp
906–908

ShouldCastToIntPtrTy = !MemTy->isFloatingType();

clang/test/Sema/atomic-ops.c
107–109

Rename arguments? d -> f, d2 -> d, d3 -> ld ?

yaxunl updated this revision to Diff 283028.Aug 4 2020, 2:11 PM
yaxunl marked 2 inline comments as done.

revised by Artem's comments.

yaxunl updated this revision to Diff 283067.Aug 4 2020, 5:09 PM

added tests for targets supporting fp atomics.

yaxunl added a comment.Aug 4 2020, 5:17 PM
In D71726#2182667, @tra wrote:

LGTM, modulo couple of nits.

@jyknight are you OK with this?

Make IEEE single and double type as supported for fp atomics in all targets by default. This is based on the assumption that AtomicExpandPass or its ongoing work is sufficient to support fp atomics for all targets. This is to facilitate middle end and backend end development to support fp atomics.

If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo.

Do we have sufficient test coverage on all platforms to make sure we're not generating something that LLVM can't handle everywhere?
If not, perhaps we should default to unsupported and only enable it for known working targets.

I updated TargetInfo for fp atomic support for common targets. Basically by default fp atomic support is now off. It is enabled only for targets which do not generate lib calls for fp atomics. This is because the availability of lib call depends on platform, so it is up to the Target owners to determine whether the support is available if lib call is needed. For those targets which are able to generate llvm fp atomic instructions, fp atomic support is enabled in clang, and tests are added to cover them.

clang/lib/CodeGen/CGAtomic.cpp
906–908

done

clang/test/Sema/atomic-ops.c
107–109

done

yaxunl edited the summary of this revision. (Show Details)Aug 4 2020, 5:20 PM
ldionne removed a subscriber: ldionne.Aug 5 2020, 9:05 AM
In D71726#2182667, @tra wrote:

If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo.

I really don't think this should be a target option at all. Every target can support the atomic fadd/fsub IR instruction (via lowering to a cmpxchg loop if nothing else). If it doesn't work, that's a bug in LLVM. We shouldn't be adding target hooks in Clang to workaround LLVM bugs, rather, we should fix them.

There is one nit -- atomicrmw doesn't (yet) support specifying alignment. There's work now to fix that, but until that's submitted, only naturally-aligned atomicrmw instructions can be created. So, for now, supporting only a naturally-aligned floating-point add would be a reasonable temporary measure.

Do we have sufficient test coverage on all platforms to make sure we're not generating something that LLVM can't handle everywhere?

Probably not.

If not, perhaps we should default to unsupported and only enable it for known working targets.

No, I don't think that's a good way to go. We should fix LLVM if it's broken.

jyknight added inline comments.Aug 10 2020, 9:11 AM
clang/lib/CodeGen/CGAtomic.cpp
952

convertToAtomicIntPointer does more than just cast to an int pointer, are you sure the rest is not necessary for fp types?

clang/lib/Sema/SemaChecking.cpp
5188–5190

This is confusing, and took me a bit to understand what you're doing. I'd suggest reordering the clauses, putting the pointer case first, e.g.:

if (Form == Arithmetic && ValType->isPointerType())
  Ty = Context.getPointerDiffType();
else if (Form == Init || Form == Arithmetic)
  Ty = ValType;
else if (Form == Copy || Form == Xchg) .....
else ......
...

Oh, one more note, C11 has -- and clang already supports -- _Atomic long double x; x += 4; via lowering to a cmpxchg loop. Now that we have an LLVM IR representation for atomicrmw fadd/fsub, clang should be lowering the _Atomic += to that, too. (Doesn't need to be in this patch, but it should be done.)

yaxunl marked 2 inline comments as done.Aug 10 2020, 11:46 AM
In D71726#2182667, @tra wrote:

If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo.

I really don't think this should be a target option at all. Every target can support the atomic fadd/fsub IR instruction (via lowering to a cmpxchg loop if nothing else). If it doesn't work, that's a bug in LLVM. We shouldn't be adding target hooks in Clang to workaround LLVM bugs, rather, we should fix them.

There is one nit -- atomicrmw doesn't (yet) support specifying alignment. There's work now to fix that, but until that's submitted, only naturally-aligned atomicrmw instructions can be created. So, for now, supporting only a naturally-aligned floating-point add would be a reasonable temporary measure.

clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.

Do we have sufficient test coverage on all platforms to make sure we're not generating something that LLVM can't handle everywhere?

Probably not.

In clang, we only test IR generation, as is done for other atomic builtins. fp atomics do not have less coverage compared with other atomic builtins. Actually for other atomic builtins we do not even test them on different targets. The ISA generation of fp atomics should be done in llvm tests and should not be blocking clang change.

clang/lib/CodeGen/CGAtomic.cpp
952

it is not needed for fp types. If the value type does not match the pointer type, clang automatically inserts proper llvm instructions to convert the value type to a value type that matches the pointer type. Two codegen tests are added (atomic_fetch_add(double*, float) and atomic_fetch_add(double*, int)) to test such situations.

clang/lib/Sema/SemaChecking.cpp
5188–5190

done

yaxunl updated this revision to Diff 284463.Aug 10 2020, 11:49 AM
yaxunl marked 2 inline comments as done.

Revised by James' comments.

clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.

If we want the frontend to emit an error when the target doesn't support library-based atomics, that seems fine, but there's no reason to only do so for floating-point types. That is, we should have a TargetInfo method that asks whether atomics at a given size and alignment are supported at all, similar to what we have for "builtin" (lock-free) atomics, and we should check it for all the atomic types and operations.

Actually, maybe we should take the existing hook and have it return one of { LockFree, Library, Unsupported }.

clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.

The LLVM AtomicExpandPass is _also_ introducing libcalls (or cmpxchg loops), as is appropriate for a given target. We currently, redundantly, support the same thing in two places. It's a long-term goal of mine to simplify the atomics code in clang, by deferring more of it to LLVM, but some prerequisites (e.g. supporting misaligned atomicrmw) are not yet in place. The intent is that it is always valid to emit the LLVM atomic IR, and it will be transformed into whatever is best on a given target. As such, there's no reason to restrict these clang intrinsics.

Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required.

Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required.

I suspect Yaxun's target cannot provide libcalls at all, which is why he wants to diagnose up-front. But I agree that we should be thinking about this uniformly, and that his target should be diagnosing *all* unsupported atomics.

yaxunl updated this revision to Diff 307206.Nov 23 2020, 3:12 PM
yaxunl edited the summary of this revision. (Show Details)

revised by John's comments. Added target hook and diagnostics for generic atomic operations.

Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required.

I suspect Yaxun's target cannot provide libcalls at all, which is why he wants to diagnose up-front. But I agree that we should be thinking about this uniformly, and that his target should be diagnosing *all* unsupported atomics.

amdgpu target currently does not support atomic libcalls. I added a target hook for atomic operation support and diagnostics for generic atomic operations by John's comments.

Clang has existing diagnostics for unsupported atomic load/store for some platforms, and functions about atomic support scattered in target info, AST context, and codegen. This change refactors these codes and unify them as a target hook.

@rjmccall I have addressed the comments about diagnostics. Could you please review it? Thanks.

rjmccall added inline comments.Jan 27 2021, 6:51 PM
clang/include/clang/Basic/TargetInfo.h
1478 ↗(On Diff #307206)

This shouldn't be here; if you have places that don't always represent an atomic operation, queries for the kind should return an Optional<AtomicOperationKind> from the classification.

1479 ↗(On Diff #307206)

atomic_init is not actually an atomic operation, so there's never an inherent reason it can't be supported.

In general, I am torn about this list, because it's simultaneously rather fine-grained while not seeming nearly fine-grained enough to be truly general. What's actually going on on your target? You have ISA support for doing some specific operations atomically, but not a general atomic compare-and-swap operation? Which means that you then cannot support support other operations?

It is unfortunate that our layering prevents TargetInfo from simply being passed the appropriate expression.

1497 ↗(On Diff #307206)

I think this reflects our current strategies for emitting atomics, but it's a somewhat misleading enum in general because this isn't an exhaustive list of the options — there are certainly possible inline expansions that aren't lock-free. (For example, you could have an inline spin-lock embedded in the atomic object.) The goal of this enum is so that TargetInfo only has to have one hook for checking atomic operations? I would be happier if you included an inline-but-not-lock-free alternative in this enum, even if it's never currently used, so that clients can do the right test.

1501 ↗(On Diff #307206)

Why is this needed as a separate hook?

clang/lib/AST/ASTContext.cpp
11046 ↗(On Diff #307206)

Should this be a method on AtomicExpr? It seems like an intrinsic, target-independent property of the expression.

clang/lib/Basic/TargetInfo.cpp
870 ↗(On Diff #307206)

Darwin targets should all be subclasses of DarwinTargetInfo in OSTargets.h, so you should be able to just override this there instead of having it in the base case.

clang/lib/Basic/Targets/AArch64.h
143 ↗(On Diff #307206)

Why can't targets reliably expand this to an atomic compare-and-exchange if they support that for the target width?

yaxunl marked 7 inline comments as done.Feb 1 2021, 8:16 AM
yaxunl added inline comments.
clang/include/clang/Basic/TargetInfo.h
1478 ↗(On Diff #307206)

Removed.

1479 ↗(On Diff #307206)

The target hook getAtomicSupport needs an argument for atomic operation. Since not all targets support fp add/sub, we need an enum for add/sub. Since certain release of iOS/macOS does not support C11 load/store, we need an enum for C11 load/store. We could define the enums as {AddSub, C11LoadStore, Other}. However, this would cause a difficulty for emitting diagnostic message for unsupported atomic operations since we map this enum to a string for the atomic operation and use it in the diagnostic message. 'Other' would be mapped to 'other atomic operation' which is not clear what it is.

1497 ↗(On Diff #307206)

Added InlineWithLock

1501 ↗(On Diff #307206)

Most target shares getAtomicSupport except FP atomic support, so define a virtual function for FP atomic support and let getAtomicSupport call it.

clang/lib/AST/ASTContext.cpp
11046 ↗(On Diff #307206)

Yes. moved to AtomicExpr

clang/lib/Basic/TargetInfo.cpp
870 ↗(On Diff #307206)

done

clang/lib/Basic/Targets/AArch64.h
143 ↗(On Diff #307206)

There are some bugs in either the middle end or backend causing this not working. For example, half type atomic fadd on amdgcn is not lowered to cmpxchg and the backend has isel failure, bf16 type atomic fadd on arm is not lowered to cmpxchg and the backend has isel failure. The support for each fp type needs to be done case by case. So far there is no target support atomic fadd/sub with half and bf16 type.

yaxunl updated this revision to Diff 320477.Feb 1 2021, 8:19 AM
yaxunl marked 7 inline comments as done.

revised by John's comments

I still have the same fundamental objection as before to the parts of this patch for prohibiting FP add/sub on some targets.

If a particular LLVM target cannot handle transforming an FP add/sub (or any other RMW operations!) into the correct cmpxchg or LL/SC loop, that's a bug in the backend which should be fixed. I don't see why we ought to add a bunch of functionality in the frontend to workaround this?

(Some of the other changes, e.g. to diagnose lack of support for large atomics is useful, though.)

rjmccall added inline comments.Feb 1 2021, 12:09 PM
clang/include/clang/Basic/TargetInfo.h
1479 ↗(On Diff #307206)

It's not obviously true that not all targets support FP add/sub, though. Any target that provides compare-and-swap at the width of an FP type can do an atomic FP add/sub at that width; it might be less efficient than it would be with specific ISA support, but that's true for a lot of atomic operations. Surely it's better to just fix whatever bugs LLVM has with lowering atomic FP add/sub than to add more abstraction to Clang to handle a special case that shouldn't exist.

I don't know what issues Darwin has with C11 load/store; that might be a more compelling reason to have this abstraction, although again it seems strange that we're outlawing a specific operation when in principle we can just emit it less efficiently.

clang/lib/Basic/Targets/AArch64.h
143 ↗(On Diff #307206)

Are we legalizing atomicrmw to cmpxchg loops in the backend instead of as LLVM IR pass? That seems like an architectural mistake. Regardless, this bug should just be fixed.

yaxunl added a comment.Feb 2 2021, 9:03 AM

This patch focuses on clang work for enabling fp atomics. There is a middle end pass for lowering fp atomics to cmpxchg, however not all targets enable it or enable it properly. From clang point of view, those targets are not ready to say they support fp atomics, therefore it diagnose those situations and let clang fail gracefully instead of crashing with isel failure or missing symbols in linker.

I have limited resources to work on middle end and backend for all targets. If a backend really cares about fp atomics, they should fix the atomic lowering pass then enable fp atomics support in clang Target info.

This patch implements fp atomic support in clang. It does not make things worse in regarding the bugs in middle ends and backends. I think it is not beneficial to blocking this clang change due to middle end and backend issues.

yaxunl added a comment.Feb 2 2021, 9:25 AM

If the concern is that diagnose fp atomics as unsupported hinders middle end and backend work for fixing fp atomic issues, how about adding a -fenable-fp-atomics to clang which can override target info about fp atomics support.

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

tra added a comment.Feb 2 2021, 10:01 AM

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

+1. I agree with James.

Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code.

Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround.

In D71726#2537054, @tra wrote:

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

+1. I agree with James.

Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code.

Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround.

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

For other targets, I can make changes to assume fp atomics are supported if width is within max inline atomic width of the target. Basically this will let fp atomics emitted for these targets and assuming middle end or backend will handle them properly.

In D71726#2537054, @tra wrote:

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

+1. I agree with James.

Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code.

Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround.

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

Diagnosing that you don't support atomics your target can't reasonably support is completely fine. (You could actually actually inline a locking approach if you really wanted to, though; Microsoft's std::atomic does that in the general case, although admittedly that's library code.) I would like to understand whether that's really type-specific or just size-specific, though, and I don't think we've gotten a plain answer about that. Is it true that amdgpu simply does not have a generic cmpxchg?

For other targets, I can make changes to assume fp atomics are supported if width is within max inline atomic width of the target. Basically this will let fp atomics emitted for these targets and assuming middle end or backend will handle them properly.

I think that's reasonable.

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

If this is limited simply to not supporting oversized or misaligned atomics, I'd find that a lot less objectionable. At that point you just need a single boolean variable/accessor for whether the target can support atomic library calls. I note that we already have warning messages: warn_atomic_op_misaligned and warn_atomic_op_oversized. Maybe those can just be promoted to errors on AMDGPU.

yaxunl marked 2 inline comments as done.Feb 7 2021, 3:55 PM

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

If this is limited simply to not supporting oversized or misaligned atomics, I'd find that a lot less objectionable. At that point you just need a single boolean variable/accessor for whether the target can support atomic library calls. I note that we already have warning messages: warn_atomic_op_misaligned and warn_atomic_op_oversized. Maybe those can just be promoted to errors on AMDGPU.

Good points. Will do.

yaxunl updated this revision to Diff 322005.Feb 7 2021, 3:58 PM
yaxunl edited the summary of this revision. (Show Details)

Revised by James, Artem, and John's comments.

@jyknight @rjmccall ping. diagnostic issue addressed.

yaxunl added a comment.Mar 2 2021, 9:21 AM

@rjmccall @jyknight Ping. Any further concerns? Thanks.

yaxunl updated this revision to Diff 332658.Mar 23 2021, 6:37 AM

Re-use existing warning instead of introducing new diagnostics.

Ping. Can some one help review this patch? I believe all comments addressed. Thanks.

yaxunl edited the summary of this revision. (Show Details)Mar 23 2021, 6:42 AM
tra added a comment.Mar 23 2021, 10:22 AM

@jyknight - James, do you have further concerns about the patch?

clang/lib/Driver/ToolChains/Clang.cpp
6454 ↗(On Diff #332658)

If we rely on promoting the warnings to errors for correctness, I think we may need a more robust mechanism to enforce that than trying to guess the state based on provided options.
E.g. can these diagnostics be enabled/disabled with a wider scope option like -W[no-]extra or -W[no-]all?

Maybe we should add a cc1-only option --enforce-atomic-alignment and use that to determine if misalignment should be an error at the point where we issue the diagnostics?

6457 ↗(On Diff #332658)

This should be else if, or, maybe use llvm::StringSwitch()instead:

DiagAtomicLibCall = llvm::StringSwitch<bool>(A->getValue())
   .Case("no-error=atomic-alignment", false)
   .Case("error=atomic-alignment", true)
   .Default(DiagAtomicLibCall)
yaxunl updated this revision to Diff 332730.Mar 23 2021, 11:06 AM
yaxunl edited the summary of this revision. (Show Details)

separate diagnosing unaligned atomc for amdgpu to another review.

In D71726#2645269, @tra wrote:

@jyknight - James, do you have further concerns about the patch?

I separated the change about diagnosing unaligned atomics for amdgpu to https://reviews.llvm.org/D99201 since these two changes are orthogonal.

rjmccall added inline comments.Mar 29 2021, 10:29 PM
clang/lib/Sema/SemaChecking.cpp
5046

Does LLVM support atomics on all floating-point types?

yaxunl added inline comments.Apr 4 2021, 8:25 AM
clang/lib/Sema/SemaChecking.cpp
5046

LLVM IR parser requires atomicrmw value operand must have size of power of 2, therefore LLVM does not support atomicrmw on x86_fp80 which has size of 80 bytes. LLVM supports atomicrmw on all other floating-point types (bfloat, half, float, double, fp128, ppc_fp128).

rjmccall added inline comments.Apr 5 2021, 10:53 AM
clang/lib/Sema/SemaChecking.cpp
5046

Okay. So this needs to check the underlying FP semantics and disallow atomics on unsupported types.

yaxunl marked 2 inline comments as done.Apr 5 2021, 6:28 PM
yaxunl added inline comments.
clang/lib/Sema/SemaChecking.cpp
5046

will do

yaxunl updated this revision to Diff 335366.Apr 5 2021, 6:31 PM
yaxunl marked an inline comment as done.

Revised by John's comments. Do not allow atomic fetch add with x86_fp80.

Alright, mostly looks good.

clang/lib/Sema/SemaChecking.cpp
5046

Could you extract this whole condition into a function and make it a bit more readable?

yaxunl updated this revision to Diff 335489.Apr 6 2021, 5:19 AM

revised by John's comments

rjmccall accepted this revision.Apr 6 2021, 11:52 AM

Thanks, LGTM

This revision is now accepted and ready to land.Apr 6 2021, 11:52 AM
This revision was landed with ongoing or failed builds.Apr 6 2021, 12:45 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2021, 12:45 PM
kpet added a subscriber: kpet.May 26 2021, 1:12 AM