Recently atomicrmw started to support fadd/fsub:
https://reviews.llvm.org/D53965
However clang atomic builtins fetch add/sub still does not support emitting atomicrmw fadd/fsub.
This patch adds that.
Differential D71726
Let clang atomic builtins fetch add/sub support floating point types yaxunl on Dec 19 2019, 1:14 PM. Authored by
Details Recently atomicrmw started to support fadd/fsub: https://reviews.llvm.org/D53965 However clang atomic builtins fetch add/sub still does not support emitting atomicrmw fadd/fsub. This patch adds that.
Diff Detail Event TimelineComment Actions This generally seems fine. Does it work on most backends? I want to make sure it doesn't fail in backends :) Also, @ldionne / @EricWF / @mclow.lists do you need this in libc++ for floating-point atomic support? Comment Actions For x86_64, amdgcn, aarch64, armv7, mips64, it is translated to cmpxchg by AtomicExpandPass and backends did codegen successfully. For hexagon, riscv32, it is translated to call of __atomic_fetch_add_4 for fadd float. This is concerning. Probably we need to add __atomic_fetch_{add|sub}_{f16|f32|f64} ?
Comment Actions For systems that have load-link/store-conditional architectures, like ARM / PPC / base RISC-V without extension, I would imagine that using a cmpxchg loop is much worse than simply doing the floating-point add/sub in the middle of the atomic mini-transaction. I'm sure that we want back-ends to be capable of implementing this better than what this pass is doing, even when they don't have "native" fp atomics. You listed amdgcn... what does this do on nvptx? Comment Actions Targets can implement shouldExpandAtomicRMWInIR for the desired behavior, which NVPTX currently does not implement. Looking at AtomicExpandPass, it looks like either cmpxchg or LLSC expansions should work for the FP atomics already Comment Actions nvptx is similar to hexagon and riscv32, where fp atomics is translated to call of __atomic_fetch_add_4. Since currently only amdgcn supports fp atomics, I am going to add a TargetInfo hook about whether fp atomics is supported and only emit fp atomics for targets supporting it.
Comment Actions Yes, I guess we do in order to implement fetch_add & friends on floating point types (https://wg21.link/P0020R6). The builtins would need to work on float, double and long double. The code seems to suggest it does, however the tests only check for float. Does this support __atomic_fetch_{add,sub} on double and long double?
Comment Actions It depends on target. For x86_64, __atomic_fetch_{add,sub} on double and long double are translated to __atomic_fetch_sub_8 and __atomic_fetch_sub_16.
Comment Actions libc++ could implement atomic<float> using a cmpxchg loop with bit_cast and the FP instruction in most cases, and only use these builtins if available.
Comment Actions Why not have clang always emit atomicrmw for floats, and let AtomicExpandPass handle legalizing that into integer atomics if necessary, rather than adding a target hook in clang? Comment Actions Not all targets can legalize fp atomics by AtomicExpandPass. Some targets need library support. Comment Actions What are they missing? It can be expanded to a cmpxchg loop with bitcast to an integer type of the same size. Comment Actions That isn't true, because you can do so generically with a cmpxchg loop, assuming that size of atomic is supported by the target. This might not be the most efficient lowering choice, but it's always possible as a fallback. (And if the size is too large, then AtomicExpandPass will lower the cmpxchg to the libatomic call.) If a target wants to tell AtomicExpandPass that fp add/sub are supported, and then lower the resulting ATOMIC_LOAD_FSUB sdag node into a libcall of its choice, that's also ok (as long as the libcall is lock-free). Comment Actions how about other fp types e.g. bf16, half, long double? Do we need to diagnose them or not? Comment Actions Make IEEE single and double type as supported for fp atomics in all targets by default. This is based on the assumption that AtomicExpandPass or its ongoing work is sufficient to support fp atomics for all targets. This is to facilitate middle end and backend end development to support fp atomics. If a target would like to treat single and double fp atomics as unsupported, it can override the default behavior in its own TargetInfo. Comment Actions ping. I think I have addressed all the issues in FE. I think issues in AtomicExpandPass should be addressed by separate patches. Can we land this? Thanks. Comment Actions LGTM, modulo couple of nits. @jyknight are you OK with this? Do we have sufficient test coverage on all platforms to make sure we're not generating something that LLVM can't handle everywhere?
Comment Actions I updated TargetInfo for fp atomic support for common targets. Basically by default fp atomic support is now off. It is enabled only for targets which do not generate lib calls for fp atomics. This is because the availability of lib call depends on platform, so it is up to the Target owners to determine whether the support is available if lib call is needed. For those targets which are able to generate llvm fp atomic instructions, fp atomic support is enabled in clang, and tests are added to cover them.
Comment Actions I really don't think this should be a target option at all. Every target can support the atomic fadd/fsub IR instruction (via lowering to a cmpxchg loop if nothing else). If it doesn't work, that's a bug in LLVM. We shouldn't be adding target hooks in Clang to workaround LLVM bugs, rather, we should fix them. There is one nit -- atomicrmw doesn't (yet) support specifying alignment. There's work now to fix that, but until that's submitted, only naturally-aligned atomicrmw instructions can be created. So, for now, supporting only a naturally-aligned floating-point add would be a reasonable temporary measure.
Probably not.
No, I don't think that's a good way to go. We should fix LLVM if it's broken.
Comment Actions Oh, one more note, C11 has -- and clang already supports -- _Atomic long double x; x += 4; via lowering to a cmpxchg loop. Now that we have an LLVM IR representation for atomicrmw fadd/fsub, clang should be lowering the _Atomic += to that, too. (Doesn't need to be in this patch, but it should be done.) Comment Actions clang does not always emit atomic instructions for atomic builtins. Clang may emit lib calls for atomic builtins. Basically clang checks target info about max atomic inline width and if the desired atomic operation exceeds the supported atomic inline width, clang will emit lib calls for atomic builtins. The rationale is that the lib calls may be faster than the IR generated by the LLVM pass. This behavior has long existed and it also applies to fp atomics. I don't think emitting lib calls for atomic builtins is a bug. However, this does introduce the issue about whether the library functions for atomics are available for a specific target. As I said, only the target owners have the answer and therefore I introduced the target hook.
In clang, we only test IR generation, as is done for other atomic builtins. fp atomics do not have less coverage compared with other atomic builtins. Actually for other atomic builtins we do not even test them on different targets. The ISA generation of fp atomics should be done in llvm tests and should not be blocking clang change.
Comment Actions If we want the frontend to emit an error when the target doesn't support library-based atomics, that seems fine, but there's no reason to only do so for floating-point types. That is, we should have a TargetInfo method that asks whether atomics at a given size and alignment are supported at all, similar to what we have for "builtin" (lock-free) atomics, and we should check it for all the atomic types and operations. Actually, maybe we should take the existing hook and have it return one of { LockFree, Library, Unsupported }. Comment Actions The LLVM AtomicExpandPass is _also_ introducing libcalls (or cmpxchg loops), as is appropriate for a given target. We currently, redundantly, support the same thing in two places. It's a long-term goal of mine to simplify the atomics code in clang, by deferring more of it to LLVM, but some prerequisites (e.g. supporting misaligned atomicrmw) are not yet in place. The intent is that it is always valid to emit the LLVM atomic IR, and it will be transformed into whatever is best on a given target. As such, there's no reason to restrict these clang intrinsics. Yes, there are no generically available libcalls for atomic float math -- but that's okay -- let LLVM handle transform into a cmpxchg loop when required. Comment Actions
I suspect Yaxun's target cannot provide libcalls at all, which is why he wants to diagnose up-front. But I agree that we should be thinking about this uniformly, and that his target should be diagnosing *all* unsupported atomics. Comment Actions revised by John's comments. Added target hook and diagnostics for generic atomic operations. Comment Actions amdgpu target currently does not support atomic libcalls. I added a target hook for atomic operation support and diagnostics for generic atomic operations by John's comments. Clang has existing diagnostics for unsupported atomic load/store for some platforms, and functions about atomic support scattered in target info, AST context, and codegen. This change refactors these codes and unify them as a target hook. Comment Actions @rjmccall I have addressed the comments about diagnostics. Could you please review it? Thanks.
Comment Actions I still have the same fundamental objection as before to the parts of this patch for prohibiting FP add/sub on some targets. If a particular LLVM target cannot handle transforming an FP add/sub (or any other RMW operations!) into the correct cmpxchg or LL/SC loop, that's a bug in the backend which should be fixed. I don't see why we ought to add a bunch of functionality in the frontend to workaround this? (Some of the other changes, e.g. to diagnose lack of support for large atomics is useful, though.)
Comment Actions This patch focuses on clang work for enabling fp atomics. There is a middle end pass for lowering fp atomics to cmpxchg, however not all targets enable it or enable it properly. From clang point of view, those targets are not ready to say they support fp atomics, therefore it diagnose those situations and let clang fail gracefully instead of crashing with isel failure or missing symbols in linker. I have limited resources to work on middle end and backend for all targets. If a backend really cares about fp atomics, they should fix the atomic lowering pass then enable fp atomics support in clang Target info. This patch implements fp atomic support in clang. It does not make things worse in regarding the bugs in middle ends and backends. I think it is not beneficial to blocking this clang change due to middle end and backend issues. Comment Actions If the concern is that diagnose fp atomics as unsupported hinders middle end and backend work for fixing fp atomic issues, how about adding a -fenable-fp-atomics to clang which can override target info about fp atomics support. Comment Actions My concern is that this is treating a backend _bug_ as if it were just an optional feature. But it's not the case that it might be reasonable to either implement or not implement this in a backend -- it should be implemented, and those that don't are buggy. I'd be happier with just having an ISEL failure when you try to use fp atomics on broken targets, rather than adding all this code and configuration to Clang in order to avoid that. (And, of course, the target maintainers should also fix them) Comment Actions +1. I agree with James. Removing code is often harder than adding it. When you're adding things, you're the only user. Once things are in, they will start growing dependencies that will need to be dealt with if you ever want to remove the code. Clean solution that works for AMDGPU only for now is better than a potentially permanent workaround. Comment Actions For amdgpu target, we do need diagnose unsupported atomics (not limited to fp atomics) since we do not support libcall due to ISA level linking not supported. This is something we cannot fix in a short time and we would rather diagnose it than confusing the users with missing symbols in lld. For other targets, I can make changes to assume fp atomics are supported if width is within max inline atomic width of the target. Basically this will let fp atomics emitted for these targets and assuming middle end or backend will handle them properly. Comment Actions Diagnosing that you don't support atomics your target can't reasonably support is completely fine. (You could actually actually inline a locking approach if you really wanted to, though; Microsoft's std::atomic does that in the general case, although admittedly that's library code.) I would like to understand whether that's really type-specific or just size-specific, though, and I don't think we've gotten a plain answer about that. Is it true that amdgpu simply does not have a generic cmpxchg?
I think that's reasonable. Comment Actions If this is limited simply to not supporting oversized or misaligned atomics, I'd find that a lot less objectionable. At that point you just need a single boolean variable/accessor for whether the target can support atomic library calls. I note that we already have warning messages: warn_atomic_op_misaligned and warn_atomic_op_oversized. Maybe those can just be promoted to errors on AMDGPU. Comment Actions Re-use existing warning instead of introducing new diagnostics. Ping. Can some one help review this patch? I believe all comments addressed. Thanks. Comment Actions @jyknight - James, do you have further concerns about the patch?
Comment Actions I separated the change about diagnosing unaligned atomics for amdgpu to https://reviews.llvm.org/D99201 since these two changes are orthogonal.
Comment Actions Alright, mostly looks good.
|
I think it should be predicated on specific type.
E.g. NVPTX supports atomic ops on fp32 ~everywhere, but fp64 atomic add/sub is only supported on newer GPUs.
And then there's fp16...