This is an archive of the discontinued LLVM Phabricator instance.

Add __atomic_is_lock_free to compiler-rt
Needs ReviewPublic

Authored by oontvoo on Jul 31 2020, 1:59 PM.

Details

Summary

This was one of the last required functions missing from Clang's libatomic.

Changes in behaviours:

  • On x86-64, it'll now be able to detect and use cmpxchg16b
  • The rest should be unchanged, given it'd already been able to figure it out statically.

Diff Detail

Event Timeline

oontvoo created this revision.Jul 31 2020, 1:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2020, 1:59 PM
Herald added subscribers: Restricted Project, jfb. · View Herald Transcript
oontvoo updated this revision to Diff 282788.Aug 3 2020, 7:54 PM

Handled the remaining sizes

oontvoo retitled this revision from [WIP] __atomic_is_lock_free to __atomic_is_lock_free.Aug 3 2020, 7:59 PM
oontvoo edited reviewers, added: jyknight; removed: jfb.
oontvoo published this revision for review.Aug 3 2020, 7:59 PM

This is technically a behavior change, someone like @ldionne ought to chime in, and we probably want to synchronize with libstdc++ @jwakely.

ldionne requested changes to this revision.Aug 4 2020, 6:03 AM

Can we please get a description of this change in the commit message and the Phab review?

This revision now requires changes to proceed.Aug 4 2020, 6:03 AM
oontvoo retitled this revision from __atomic_is_lock_free to Add __atomic_is_lock_free to compiler-rt.Aug 4 2020, 9:13 AM
oontvoo edited the summary of this revision. (Show Details)
oontvoo updated this revision to Diff 282976.Aug 4 2020, 11:29 AM

Removed use of GNU case range extension
(Triggered a bunch of clang-tidys)

oontvoo updated this revision to Diff 285166.Aug 12 2020, 12:46 PM
  • Remove the checking of non-power-of-2 sizes from is_lock_free for now (because the atomic_* ops don't handle them yet)
  • Rename function to _c and add pragma redefine
jfb added a comment.Aug 12 2020, 1:10 PM

Does this x86 change actually match what the runtime does in the same file?

oontvoo updated this revision to Diff 285230.Aug 12 2020, 6:16 PM

Updated diff

In D85044#2213896, @jfb wrote:

Does this x86 change actually match what the runtime does in the same file?

Yes

jfb added a comment.Aug 13 2020, 9:22 AM

Can you please add tests for this?

compiler-rt/lib/builtins/atomic.c
36

These are OS specific and won't work everywhere.

172

Please document what you're looking up here. It's definitely not "have_cas".

oontvoo updated this revision to Diff 285512.Aug 13 2020, 3:28 PM
oontvoo marked 2 inline comments as done.

Updated diff

Herald added a project: Restricted Project. · View Herald TranscriptAug 13 2020, 3:28 PM
In D85044#2216023, @jfb wrote:

Can you please add tests for this?

I don't see this file being built anywhere. Can you point me to where to add the test?

oontvoo updated this revision to Diff 286408.Aug 18 2020, 3:06 PM

Simple test

There is going to be a bunch more complexity required here, I'm afraid.

Not only do we need to detect the CPU capabilities in order to implement atomic_is_lock_free, but we also need to ensure that the implementations of atomic_load_c / etc can actually use those atomic instructions as appropriate.

This means we need to compile the function with different flags, e.g. -mcx16 for x86-64, in order for the compiler to be able to emit cmpxchg16b, or -march=i586 on 32bit x86. But -- we can't compile the entire file that way. Only the functions called after the runtime detection has assured that we're running on the appropriate hardware can be built with those flags.

GCC handles this by compiling the source files multiple times, with different flags, adding some additional #defines which change the function names for each of those compiles. Then, it arranges to dispatch to the correct function depending on the runtime CPU detection. Ideally, we'd also be able to use the "target" attribute to do this on an individual function within the file, but it looks like that may not actually work properly.

compiler-rt/lib/builtins/atomic.c
34

Same check used twice? Also, I think we don't need a configure check for this, #if __has_include(<asm/hwcap.h>) would be ok.

154–156

This isn't threadsafe as written. It would be okay to use a relaxed atomic to load and store __have_atomic_cas (Relaxed is okay, because it's okay to potentially call check_x86_atomic_cas multiple times if this is reached simultaneously on multiple threads, and it will result in the same answer each time).

162–166

It's confusing to have "have_atomic_cas" sometimes mean "have CMPXCHG8b" and sometimes "have CMPXCHG16b".

And that confusion has led to a bug here -- on x86-64, cmpxchg8b is always available, but this code returns false for 8-byte atomics, unless CMPXCHG16b is also available.

177

AArch64 always supports atomics, the HWCAP_ATOMICS is needed only for the newer optimized instructions instead of the older LL/SC loop. (We may want to use those instructions for performance in the future, but it's not necessary for correctness, so we can leave out that complexity for now.

197

Let's leave aside ARM32 support for the first draft and just concentrate on x86 to get the overall framework in place.

202–203

At the end, it'll be better to fail to compile on platforms we haven't implemented, than to do the wrong thing.

But in order to allow implementing platforms piecemeal at the beginning, I'm ok to leave this fallback for now.

229

In C code (compiled with Clang), this can be spelled __attribute__((fallthrough)).

compiler-rt/test/builtins/Unit/atomic_lock_free_test.cc
3

Given that the contract for __atomic_is_lock_free doesn't require actual pointers, I'd leave out the construction of real objects in this test, and just pass in constructed values e.g. (void*)~7.

Also, all these assertions will need to be platform-specific.

oontvoo updated this revision to Diff 287068.Aug 21 2020, 11:00 AM
oontvoo marked 5 inline comments as done.

[WIP] mov stuff to target-specific dirs

oontvoo updated this revision to Diff 308488.Nov 30 2020, 2:56 PM
oontvoo marked an inline comment as done.

@jyknight: friendly ping ...

oontvoo updated this revision to Diff 308708.Dec 1 2020, 10:33 AM

rebase.

Note: This has effectively undone D86510 because the patch was trying to avoid _atomic_is_lock_free, which hadn't existed then.

I'm definitely naive w.r..t compiler-rt, but why is a simple approach like D92302 not sufficient?

I'm definitely naive w.r..t compiler-rt, but why is a simple approach like D92302 not sufficient?

The implementation there provides the *right* answers for the cases that can be determined statically.
There are cases where it can only be determined at runtime.
For eg., unaligned non-power of 2 sizes, or more importantly 16-byte atomic ops can have lock-free implementations on some but not all x86-64 .

Most of the gunk in this patch is to set up for these checks to be extensible.
Eg., for x86/x86-64, we wants to look for CMPXCHG8B/CMPXCHG16B

Then there's the actual making use of CMPXCHG8B/CMPXCHG16B when we know it's available in *_atomic_load/store.

I'm definitely naive w.r..t compiler-rt, but why is a simple approach like D92302 not sufficient?

The implementation there provides the *right* answers for the cases that can be determined statically.

Doesn't D92302 also check for the alignment of the pointer? My understanding is that with D92302, we're returning true from atomic_is_lock_free whenever the implementation in compiler-rt, as of D92302, would be lockfree. Whether that's the very best we can do on the hardware is a different story -- but D92302's atomic_is_lockfree will be consistent with the actual implementation. Am i mistaken?

So, basically, this patch does the above, but also allows using a lockfree implementation in more cases, is that it? I'm trying to figure out whether it makes sense to land D92302, and then this patch, since this one seems a lot more involved (and still has TODOs). Please let me know if I'm not getting this right, I'm a bit out of my depth trying to read this patch TBH.

Doesn't D92302 also check for the alignment of the pointer? My understanding is that with D92302, we're returning true from atomic_is_lock_free whenever the implementation in compiler-rt, as of D92302, would be lockfree. Whether that's the very best we can do on the hardware is a different story -- but D92302's atomic_is_lockfree will be consistent with the actual implementation. Am i mistaken?

Yes, D92302 provides an implementation of is_lock_free that is consistent with the current _atomic_load_* implementations.
(it's a no change in behaviour)

So, basically, this patch does the above, but also allows using a lockfree implementation in more cases, is that it?

Yes.

I'm trying to figure out whether it makes sense to land D92302, and then this patch, since this one seems a lot more involved (and still has TODOs). Please let me know if I'm not getting this right, I'm a bit out of my depth trying to read this patch TBH.

The two TODOs in this patch was the attempt to simply this so that we have the basic framework in place before filling in more meaty details (ARMs and odd sizes support) . I suspect we'd do those in subsequent patches.

Doesn't D92302 also check for the alignment of the pointer? My understanding is that with D92302, we're returning true from atomic_is_lock_free whenever the implementation in compiler-rt, as of D92302, would be lockfree. Whether that's the very best we can do on the hardware is a different story -- but D92302's atomic_is_lockfree will be consistent with the actual implementation. Am i mistaken?

Yes, D92302 provides an implementation of is_lock_free that is consistent with the current _atomic_load_* implementations.
(it's a no change in behaviour)

So, basically, this patch does the above, but also allows using a lockfree implementation in more cases, is that it?

Yes.

I'm trying to figure out whether it makes sense to land D92302, and then this patch, since this one seems a lot more involved (and still has TODOs). Please let me know if I'm not getting this right, I'm a bit out of my depth trying to read this patch TBH.

The two TODOs in this patch was the attempt to simply this so that we have the basic framework in place before filling in more meaty details (ARMs and odd sizes support) . I suspect we'd do those in subsequent patches.

Got it, thanks a lot for the clarifications. Unless we're confident that we can move forward with this patch in the near future, I would start by doing D92302 since it solves real problems for libc++ (and Apple platforms to the extent that we're currently shipping a broken compiler-rt). @arichardson There are some tests in this patch, perhaps you could grab them? @oontvoo would you be OK with us trying to land D92302 and then rebasing your patch on top?

Yes, sounds good. I don't see why it should wait.

Yes, sounds good. I don't see why it should wait.

Excellent, thank you!

oontvoo updated this revision to Diff 309948.Dec 7 2020, 10:30 AM

updated tests

oontvoo updated this revision to Diff 309954.Dec 7 2020, 10:38 AM

rename aux file to .c

arichardson added inline comments.Dec 8 2020, 3:06 AM
compiler-rt/test/builtins/Unit/atomic_lock_free_test.cc
58

This will fail on many 32-bit architectures (e.g. RISC-V 32)

oontvoo updated this revision to Diff 310402.Dec 8 2020, 6:19 PM
oontvoo marked an inline comment as done.

addressed comment: restrict size-8 testing to x86-64

ldionne resigned from this revision.Feb 25 2021, 10:22 AM

I don't think I'll be able to contribute to this discussion meaningfully without a significant time investment since I don't touch compiler-rt a lot. Since I have a huge review queue in libc++, I'd rather resign from the review than give the impression that it's blocked on me.

Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2022, 10:00 AM
Herald added a subscriber: Enna1. · View Herald Transcript