This was one of the last required functions missing from Clang's libatomic.
Changes in behaviours:
- On x86-64, it'll now be able to detect and use cmpxchg16b
- The rest should be unchanged, given it'd already been able to figure it out statically.
oontvoo on Jul 31 2020, 1:59 PM.Authored by
There are a very large number of changes, so older changes are hidden. Show Older Changes
There is going to be a bunch more complexity required here, I'm afraid.
Not only do we need to detect the CPU capabilities in order to implement atomic_is_lock_free, but we also need to ensure that the implementations of atomic_load_c / etc can actually use those atomic instructions as appropriate.
This means we need to compile the function with different flags, e.g. -mcx16 for x86-64, in order for the compiler to be able to emit cmpxchg16b, or -march=i586 on 32bit x86. But -- we can't compile the entire file that way. Only the functions called after the runtime detection has assured that we're running on the appropriate hardware can be built with those flags.
GCC handles this by compiling the source files multiple times, with different flags, adding some additional #defines which change the function names for each of those compiles. Then, it arranges to dispatch to the correct function depending on the runtime CPU detection. Ideally, we'd also be able to use the "target" attribute to do this on an individual function within the file, but it looks like that may not actually work properly.
The implementation there provides the *right* answers for the cases that can be determined statically.
Most of the gunk in this patch is to set up for these checks to be extensible.
Then there's the actual making use of CMPXCHG8B/CMPXCHG16B when we know it's available in *_atomic_load/store.
Doesn't D92302 also check for the alignment of the pointer? My understanding is that with D92302, we're returning true from atomic_is_lock_free whenever the implementation in compiler-rt, as of D92302, would be lockfree. Whether that's the very best we can do on the hardware is a different story -- but D92302's atomic_is_lockfree will be consistent with the actual implementation. Am i mistaken?
So, basically, this patch does the above, but also allows using a lockfree implementation in more cases, is that it? I'm trying to figure out whether it makes sense to land D92302, and then this patch, since this one seems a lot more involved (and still has TODOs). Please let me know if I'm not getting this right, I'm a bit out of my depth trying to read this patch TBH.
Yes, D92302 provides an implementation of is_lock_free that is consistent with the current _atomic_load_* implementations.
The two TODOs in this patch was the attempt to simply this so that we have the basic framework in place before filling in more meaty details (ARMs and odd sizes support) . I suspect we'd do those in subsequent patches.
Got it, thanks a lot for the clarifications. Unless we're confident that we can move forward with this patch in the near future, I would start by doing D92302 since it solves real problems for libc++ (and Apple platforms to the extent that we're currently shipping a broken compiler-rt). @arichardson There are some tests in this patch, perhaps you could grab them? @oontvoo would you be OK with us trying to land D92302 and then rebasing your patch on top?
I don't think I'll be able to contribute to this discussion meaningfully without a significant time investment since I don't touch compiler-rt a lot. Since I have a huge review queue in libc++, I'd rather resign from the review than give the impression that it's blocked on me.