Page MenuHomePhabricator

Let clang atomic builtins fetch add/sub support floating point types
ClosedPublic

Authored by yaxunl on Dec 19 2019, 1:14 PM.

Details

Summary

Recently atomicrmw started to support fadd/fsub:

https://reviews.llvm.org/D53965

However clang atomic builtins fetch add/sub still does not support emitting atomicrmw fadd/fsub.

This patch adds that.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jyknight added inline comments.Aug 10 2020, 9:11 AM
clang/lib/CodeGen/CGAtomic.cpp
952

convertToAtomicIntPointer does more than just cast to an int pointer, are you sure the rest is not necessary for fp types?

clang/lib/Sema/SemaChecking.cpp
5188–5190

This is confusing, and took me a bit to understand what you're doing. I'd suggest reordering the clauses, putting the pointer case first, e.g.:

if (Form == Arithmetic && ValType->isPointerType())
  Ty = Context.getPointerDiffType();
else if (Form == Init || Form == Arithmetic)
  Ty = ValType;
else if (Form == Copy || Form == Xchg) .....
else ......
...

Oh, one more note, C11 has -- and clang already supports -- _Atomic long double x; x += 4; via lowering to a cmpxchg loop. Now that we have an LLVM IR representation for atomicrmw fadd/fsub, clang should be lowering the _Atomic += to that, too. (Doesn't need to be in this patch, but it should be done.)

yaxunl marked 2 inline comments as done.Aug 10 2020, 11:46 AM
In D71726#2182667, @tra wrote:

If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo.

I really don't think this should be a target option at all. Every target can support the atomic fadd/fsub IR instruction (via lowering to a cmpxchg loop if nothing else). If it doesn't work, that's a bug in LLVM. We shouldn't be adding target hooks in Clang to workaround LLVM bugs, rather, we should fix them.

There is one nit -- atomicrmw doesn't (yet) support specifying alignment. There's work now to fix that, but until that's submitted, only naturally-aligned atomicrmw instructions can be created. So, for now, supporting only a naturally-aligned floating-point add would be a reasonable temporary measure.

clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.

Do we have sufficient test coverage on all platforms to make sure we're not generating something that LLVM can't handle everywhere?

Probably not.

In clang, we only test IR generation, as is done for other atomic builtins. fp atomics do not have less coverage compared with other atomic builtins. Actually for other atomic builtins we do not even test them on different targets. The ISA generation of fp atomics should be done in llvm tests and should not be blocking clang change.

clang/lib/CodeGen/CGAtomic.cpp
952

it is not needed for fp types. If the value type does not match the pointer type, clang automatically inserts proper llvm instructions to convert the value type to a value type that matches the pointer type. Two codegen tests are added (atomic_fetch_add(double*, float) and atomic_fetch_add(double*, int)) to test such situations.

clang/lib/Sema/SemaChecking.cpp
5188–5190

done

yaxunl updated this revision to Diff 284463.Aug 10 2020, 11:49 AM
yaxunl marked 2 inline comments as done.

Revised by James' comments.

clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.

If we want the frontend to emit an error when the target doesn't support library-based atomics, that seems fine, but there's no reason to only do so for floating-point types. That is, we should have a TargetInfo method that asks whether atomics at a given size and alignment are supported at all, similar to what we have for "builtin" (lock-free) atomics, and we should check it for all the atomic types and operations.

Actually, maybe we should take the existing hook and have it return one of { LockFree, Library, Unsupported }.

clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.

The LLVM AtomicExpandPass is _also_ introducing libcalls (or cmpxchg loops), as is appropriate for a given target. We currently, redundantly, support the same thing in two places. It's a long-term goal of mine to simplify the atomics code in clang, by deferring more of it to LLVM, but some prerequisites (e.g. supporting misaligned atomicrmw) are not yet in place. The intent is that it is always valid to emit the LLVM atomic IR, and it will be transformed into whatever is best on a given target. As such, there's no reason to restrict these clang intrinsics.

Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required.

Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required.

I suspect Yaxun's target cannot provide libcalls at all, which is why he wants to diagnose up-front. But I agree that we should be thinking about this uniformly, and that his target should be diagnosing *all* unsupported atomics.

yaxunl updated this revision to Diff 307206.Nov 23 2020, 3:12 PM
yaxunl edited the summary of this revision. (Show Details)

revised by John's comments. Added target hook and diagnostics for generic atomic operations.

Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required.

I suspect Yaxun's target cannot provide libcalls at all, which is why he wants to diagnose up-front. But I agree that we should be thinking about this uniformly, and that his target should be diagnosing *all* unsupported atomics.

amdgpu target currently does not support atomic libcalls. I added a target hook for atomic operation support and diagnostics for generic atomic operations by John's comments.

Clang has existing diagnostics for unsupported atomic load/store for some platforms, and functions about atomic support scattered in target info, AST context, and codegen. This change refactors these codes and unify them as a target hook.

@rjmccall I have addressed the comments about diagnostics. Could you please review it? Thanks.

rjmccall added inline comments.Jan 27 2021, 6:51 PM
clang/include/clang/Basic/TargetInfo.h
1478 ↗(On Diff #307206)

This shouldn't be here; if you have places that don't always represent an atomic operation, queries for the kind should return an Optional<AtomicOperationKind> from the classification.

1479 ↗(On Diff #307206)

atomic_init is not actually an atomic operation, so there's never an inherent reason it can't be supported.

In general, I am torn about this list, because it's simultaneously rather fine-grained while not seeming nearly fine-grained enough to be truly general. What's actually going on on your target? You have ISA support for doing some specific operations atomically, but not a general atomic compare-and-swap operation? Which means that you then cannot support support other operations?

It is unfortunate that our layering prevents TargetInfo from simply being passed the appropriate expression.

1497 ↗(On Diff #307206)

I think this reflects our current strategies for emitting atomics, but it's a somewhat misleading enum in general because this isn't an exhaustive list of the options — there are certainly possible inline expansions that aren't lock-free. (For example, you could have an inline spin-lock embedded in the atomic object.) The goal of this enum is so that TargetInfo only has to have one hook for checking atomic operations? I would be happier if you included an inline-but-not-lock-free alternative in this enum, even if it's never currently used, so that clients can do the right test.

1501 ↗(On Diff #307206)

Why is this needed as a separate hook?

clang/lib/AST/ASTContext.cpp
11046 ↗(On Diff #307206)

Should this be a method on AtomicExpr? It seems like an intrinsic, target-independent property of the expression.

clang/lib/Basic/TargetInfo.cpp
870 ↗(On Diff #307206)

Darwin targets should all be subclasses of DarwinTargetInfo in OSTargets.h, so you should be able to just override this there instead of having it in the base case.

clang/lib/Basic/Targets/AArch64.h
143 ↗(On Diff #307206)

Why can't targets reliably expand this to an atomic compare-and-exchange if they support that for the target width?

yaxunl marked 7 inline comments as done.Feb 1 2021, 8:16 AM
yaxunl added inline comments.
clang/include/clang/Basic/TargetInfo.h
1478 ↗(On Diff #307206)

Removed.

1479 ↗(On Diff #307206)

The target hook getAtomicSupport needs an argument for atomic operation. Since not all targets support fp add/sub, we need an enum for add/sub. Since certain release of iOS/macOS does not support C11 load/store, we need an enum for C11 load/store. We could define the enums as {AddSub, C11LoadStore, Other}. However, this would cause a difficulty for emitting diagnostic message for unsupported atomic operations since we map this enum to a string for the atomic operation and use it in the diagnostic message. 'Other' would be mapped to 'other atomic operation' which is not clear what it is.

1497 ↗(On Diff #307206)

Added InlineWithLock

1501 ↗(On Diff #307206)

Most target shares getAtomicSupport except FP atomic support, so define a virtual function for FP atomic support and let getAtomicSupport call it.

clang/lib/AST/ASTContext.cpp
11046 ↗(On Diff #307206)

Yes. moved to AtomicExpr

clang/lib/Basic/TargetInfo.cpp
870 ↗(On Diff #307206)

done

clang/lib/Basic/Targets/AArch64.h
143 ↗(On Diff #307206)

There are some bugs in either the middle end or backend causing this not working. For example, half type atomic fadd on amdgcn is not lowered to cmpxchg and the backend has isel failure, bf16 type atomic fadd on arm is not lowered to cmpxchg and the backend has isel failure. The support for each fp type needs to be done case by case. So far there is no target support atomic fadd/sub with half and bf16 type.

yaxunl updated this revision to Diff 320477.Feb 1 2021, 8:19 AM
yaxunl marked 7 inline comments as done.

revised by John's comments

I still have the same fundamental objection as before to the parts of this patch for prohibiting FP add/sub on some targets.

If a particular LLVM target cannot handle transforming an FP add/sub (or any other RMW operations!) into the correct cmpxchg or LL/SC loop, that's a bug in the backend which should be fixed. I don't see why we ought to add a bunch of functionality in the frontend to workaround this?

(Some of the other changes, e.g. to diagnose lack of support for large atomics is useful, though.)

rjmccall added inline comments.Feb 1 2021, 12:09 PM
clang/include/clang/Basic/TargetInfo.h
1479 ↗(On Diff #307206)

It's not obviously true that not all targets support FP add/sub, though. Any target that provides compare-and-swap at the width of an FP type can do an atomic FP add/sub at that width; it might be less efficient than it would be with specific ISA support, but that's true for a lot of atomic operations. Surely it's better to just fix whatever bugs LLVM has with lowering atomic FP add/sub than to add more abstraction to Clang to handle a special case that shouldn't exist.

I don't know what issues Darwin has with C11 load/store; that might be a more compelling reason to have this abstraction, although again it seems strange that we're outlawing a specific operation when in principle we can just emit it less efficiently.

clang/lib/Basic/Targets/AArch64.h
143 ↗(On Diff #307206)

Are we legalizing atomicrmw to cmpxchg loops in the backend instead of as LLVM IR pass? That seems like an architectural mistake. Regardless, this bug should just be fixed.

yaxunl added a comment.Feb 2 2021, 9:03 AM

This patch focuses on clang work for enabling fp atomics. There is a middle end pass for lowering fp atomics to cmpxchg, however not all targets enable it or enable it properly. From clang point of view, those targets are not ready to say they support fp atomics, therefore it diagnose those situations and let clang fail gracefully instead of crashing with isel failure or missing symbols in linker.

I have limited resources to work on middle end and backend for all targets. If a backend really cares about fp atomics, they should fix the atomic lowering pass then enable fp atomics support in clang Target info.

This patch implements fp atomic support in clang. It does not make things worse in regarding the bugs in middle ends and backends. I think it is not beneficial to blocking this clang change due to middle end and backend issues.

yaxunl added a comment.Feb 2 2021, 9:25 AM

If the concern is that diagnose fp atomics as unsupported hinders middle end and backend work for fixing fp atomic issues, how about adding a -fenable-fp-atomics to clang which can override target info about fp atomics support.

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

tra added a comment.Feb 2 2021, 10:01 AM

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

+1. I agree with James.

Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code.

Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround.

In D71726#2537054, @tra wrote:

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

+1. I agree with James.

Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code.

Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround.

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

For other targets, I can make changes to assume fp atomics are supported if width is within max inline atomic width of the target. Basically this will let fp atomics emitted for these targets and assuming middle end or backend will handle them properly.

In D71726#2537054, @tra wrote:

My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy.

I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them)

+1. I agree with James.

Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code.

Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround.

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

Diagnosing that you don't support atomics your target can't reasonably support is completely fine. (You could actually actually inline a locking approach if you really wanted to, though; Microsoft's std::atomic does that in the general case, although admittedly that's library code.) I would like to understand whether that's really type-specific or just size-specific, though, and I don't think we've gotten a plain answer about that. Is it true that amdgpu simply does not have a generic cmpxchg?

For other targets, I can make changes to assume fp atomics are supported if width is within max inline atomic width of the target. Basically this will let fp atomics emitted for these targets and assuming middle end or backend will handle them properly.

I think that's reasonable.

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

If this is limited simply to not supporting oversized or misaligned atomics, I'd find that a lot less objectionable. At that point you just need a single boolean variable/accessor for whether the target can support atomic library calls. I note that we already have warning messages: warn_atomic_op_misaligned and warn_atomic_op_oversized. Maybe those can just be promoted to errors on AMDGPU.

yaxunl marked 2 inline comments as done.Feb 7 2021, 3:55 PM

For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld.

If this is limited simply to not supporting oversized or misaligned atomics, I'd find that a lot less objectionable. At that point you just need a single boolean variable/accessor for whether the target can support atomic library calls. I note that we already have warning messages: warn_atomic_op_misaligned and warn_atomic_op_oversized. Maybe those can just be promoted to errors on AMDGPU.

Good points. Will do.

yaxunl updated this revision to Diff 322005.Feb 7 2021, 3:58 PM
yaxunl edited the summary of this revision. (Show Details)

Revised by James, Artem, and John's comments.

@jyknight @rjmccall ping. diagnostic issue addressed.

yaxunl added a comment.Mar 2 2021, 9:21 AM

@rjmccall @jyknight Ping. Any further concerns? Thanks.

yaxunl updated this revision to Diff 332658.Mar 23 2021, 6:37 AM

Re-use existing warning instead of introducing new diagnostics.

Ping. Can some one help review this patch? I believe all comments addressed. Thanks.

yaxunl edited the summary of this revision. (Show Details)Mar 23 2021, 6:42 AM
tra added a comment.Mar 23 2021, 10:22 AM

@jyknight - James, do you have further concerns about the patch?

clang/lib/Driver/ToolChains/Clang.cpp
6454 ↗(On Diff #332658)

If we rely on promoting the warnings to errors for correctness, I think we may need a more robust mechanism to enforce that than trying to guess the state based on provided options.
E.g. can these diagnostics be enabled/disabled with a wider scope option like -W[no-]extra or -W[no-]all?

Maybe we should add a cc1-only option --enforce-atomic-alignment and use that to determine if misalignment should be an error at the point where we issue the diagnostics?

6457 ↗(On Diff #332658)

This should be else if, or, maybe use llvm::StringSwitch()instead:

DiagAtomicLibCall = llvm::StringSwitch<bool>(A->getValue())
   .Case("no-error=atomic-alignment", false)
   .Case("error=atomic-alignment", true)
   .Default(DiagAtomicLibCall)
yaxunl updated this revision to Diff 332730.Mar 23 2021, 11:06 AM
yaxunl edited the summary of this revision. (Show Details)

separate diagnosing unaligned atomc for amdgpu to another review.

In D71726#2645269, @tra wrote:

@jyknight - James, do you have further concerns about the patch?

I separated the change about diagnosing unaligned atomics for amdgpu to https://reviews.llvm.org/D99201 since these two changes are orthogonal.

rjmccall added inline comments.Mar 29 2021, 10:29 PM
clang/lib/Sema/SemaChecking.cpp
5046

Does LLVM support atomics on all floating-point types?

yaxunl added inline comments.Apr 4 2021, 8:25 AM
clang/lib/Sema/SemaChecking.cpp
5046

LLVM IR parser requires atomicrmw value operand must have size of power of 2, therefore LLVM does not support atomicrmw on x86_fp80 which has size of 80 bytes. LLVM supports atomicrmw on all other floating-point types (bfloat, half, float, double, fp128, ppc_fp128).

rjmccall added inline comments.Apr 5 2021, 10:53 AM
clang/lib/Sema/SemaChecking.cpp
5046

Okay. So this needs to check the underlying FP semantics and disallow atomics on unsupported types.

yaxunl marked 2 inline comments as done.Apr 5 2021, 6:28 PM
yaxunl added inline comments.
clang/lib/Sema/SemaChecking.cpp
5046

will do

yaxunl updated this revision to Diff 335366.Apr 5 2021, 6:31 PM
yaxunl marked an inline comment as done.

Revised by John's comments. Do not allow atomic fetch add with x86_fp80.

Alright, mostly looks good.

clang/lib/Sema/SemaChecking.cpp
5046

Could you extract this whole condition into a function and make it a bit more readable?

yaxunl updated this revision to Diff 335489.Apr 6 2021, 5:19 AM

revised by John's comments

rjmccall accepted this revision.Apr 6 2021, 11:52 AM

Thanks, LGTM

This revision is now accepted and ready to land.Apr 6 2021, 11:52 AM
This revision was landed with ongoing or failed builds.Apr 6 2021, 12:45 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2021, 12:45 PM
kpet added a subscriber: kpet.Wed, May 26, 1:12 AM