This is an archive of the discontinued LLVM Phabricator instance.

Implementation of C++20's P1135R6 for libcxx
ClosedPublic

Authored by __simt__ on Oct 4 2019, 1:11 PM.

Details

Summary

This is the first review of this code so there's a lot to look at, and I'm fully expecting a lot of changes will be requested.

This patch contains...

A) Changes to __threading_support that introduce:

  1. Low-level semaphores on POSIX and Apple GCD, and futex on Linux.
  2. Declarations for a sharded table of contention state in the dylib, used by atomic::wait.
  3. Declarations for a thread_local variable in the dylib, used by barriers.

B) The <atomic> changes from P1135:

  1. High QoI: multi-layered back-off, using either/both the state from 1b and futexes from 1a.
  2. Low QoI: exponential time back-off, using chrono only.

C) Barrier:

  1. High QoI: a tree barrier, using the acceleration state in 1c to amortize the extra round.
  2. Low QoI: a central barrier, with a specialization for the empty completion function.

D) Semaphore:

  1. All QoI: a general template semaphore for very large ptrdiff_t values.
  2. High QoI: a specialization for “reasonable” ptrdiff_t values, using semaphores in 1a and acceleration atomics.
  3. Low QoI: a specialization for unit count.

E) Latch:

  1. All QoI (low): a central latch. (If there's a desire to see it, I could borrow the same QoI knobs from barrier, but sizeof() would grow a lot.)

F) The first basic tests for each facility from P1135.

G) Miscellaneous tweaks I needed to get this to build as libcu++ (the CUDA variant). We can drop these or you can take them as improvements. One of them is a legit macro bug.

Diff Detail

Repository
rCXX libc++

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
__simt__ added inline comments.Oct 10 2019, 2:50 PM
include/atomic
1557

This one is fun.

The model for a condition variable is that there is a lock that guards all reads and writes of the variable that has a condition evaluated on it. In this case here, that variable is an atomic, and we both read and write it outside the lock - we evaluate the condition outside the lock with much greater concurrency. However we're still exposed to a race here, because threads may be transitioning into the sleep state as we come to notify.

Taking the lock here resolves this race condition, even if we do nothing with the critical section.

include/barrier
81

It can be. How do we do that?

200

This one doesn't need to be. Removing.

217

That's not how it works in <atomic> and I patterned on that. Also it helps the CUDA port be less tedious.

If you feel strongly about this one I can obviously take the extra steps, but do you?

include/chrono
1065

No.

1571

Yes. CUDA can't make OS calls but we do have another way of implementing this. So I made it external-izable.

Are the lock free algorithms used by this implementation published somewhere? That would give me a lot more confidence in their correctness.

__simt__ marked 3 inline comments as done.Oct 10 2019, 8:23 PM

Are the lock free algorithms used by this implementation published somewhere? That would give me a lot more confidence in their correctness.

Now that's getting to the meat of the matter! The short answer is no, and the long answer is that it depends on what we're going to call an algorithm.

The <latch> code is more-or-less the spec's exposition converted into code, but I think it needs to become more complex later, for QoI reasons. In <barrier> we have well-known barrier algorithms like linear/central and tree/hierarchical barriers, but the pseudocode you'll find in literature needs a lot of adaption to fit into C++ in general (which doesn't do thread-local non-static members for classes), and into the C++20 std::barrier interface in particular (which doesn't assign a dense index space of IDs to thread that could be used to access O(P)-sized internal structures) -- and all that adaptation is 90% of the code in that file, not the algorithm itself. Finally, the gnarly code in <semaphore> is not the semaphore algorithm proper either (because that's been left to the OS except for the fallback cases) but the acceleration layer that one would need to put on top of it in order to make it fast (avoid calling the OS when it's not strictly necessary).

To be clear, I am reasonably concerned that there are bugs in this code, as well. The simpler / lower-QoI paths which CUDA will eventually ship are decently solid, but the complex / high-QoI paths that a performant CPU implementation needs (read: existentially needs, even in ~6 core laptop) are not as well tested relative to their complexity.

On our end, I think our intentions are to write more fuzzer-like tests that apply stress on them. I also have in the back of my mind the idea of proposing a C++Now talk in the vein of "How-to P1135 @ High QoI", to capture more eyeballs / document the post-adaptation versions of the algorithms better.

Where would you like to take this chat next?

griwes added a subscriber: griwes.Oct 16 2019, 8:14 PM
griwes added inline comments.
libcxx/include/semaphore
67 ↗(On Diff #224121)

Use a reserved identifier for the template parameter.

409 ↗(On Diff #224121)

Use a reserved identifier for the template parameter.

420 ↗(On Diff #224121)

Use a reserved identifier for the template parameter.

libcxx/test/std/thread/thread.barrier/arrive.pass.cpp
19 ↗(On Diff #224121)

This depends on CTAD, which makes it only work in C++17 and above. Instead it should probably say std::barrier<> b;.

Same comment for all the other barrier and semaphore tests.

__simt__ marked 4 inline comments as done.Oct 17 2019, 7:29 PM
__simt__ updated this revision to Diff 225563.Oct 17 2019, 7:38 PM

Refactored the semaphores into layers that each do one clear thing, and dispensed with a lot of #ifdefs. Moved as much semaphore code as possible out of the header and into the dylib.

Refactored the atomic code which previously necessitated type punning so it no longer does that, and narrowed the declarations of the contention state type. Made use of private contention state in the internal libcxx types that make use of wait/notify so they don't create contention with the user's uses of wait/notify.

Refactored some of the classes which previously used public inheritance of interface members, to use containment of the base type and forwarding member calls instead. This also let me simplify the bases to only define the core functionality and put the sugars on the one user interface type.

Removed asserts and alignas, and fixed several names which were not reserved before.

Fixed some issues Michał pointed out.

Fixed bugs where semaphores would randomly assert on APPLE.

Fixed bugs where semaphores would randomly fail timed waits on linux.

__simt__ updated this revision to Diff 225615.Oct 18 2019, 7:28 AM

Simplified how the contention state is conditionally-used.

Fixed several breaks in the CUDA version from the last patch.

__simt__ updated this revision to Diff 225651.Oct 18 2019, 9:28 AM

Fixed an incompatibility between the _Atomic(T) backend of <atomic> and the Futex function SFINAE in <__threading_support>.

__simt__ updated this revision to Diff 225668.Oct 18 2019, 11:11 AM

Added some documentation for the semaphore and barrier algorithms as comments.

Fixed a few issues hit on Linux when overriding to disable optimized paths.

__simt__ updated this revision to Diff 225670.Oct 18 2019, 11:13 AM

Oof, didn't attach the patch.

zoecarver added inline comments.Oct 20 2019, 9:43 AM
libcxx/include/atomic
1477 ↗(On Diff #225670)

I think this can be a static_cast.

1533 ↗(On Diff #225670)

Might be wrong but, can't this be a CAS (because you are both comparing and swapping 0)?

1535 ↗(On Diff #225670)

Why do you lock, then immediately unlock the mutex here?

1563 ↗(On Diff #225670)

Is this another "magic" number? If so, can it be a macro too?

1696 ↗(On Diff #225670)

This should only be defined after C++17.

1697 ↗(On Diff #225670)

You can use memory_order::seq_cst here.

1821 ↗(On Diff #225670)

Is this available in C++03? If so, use typedef.

Actually, do we support any compilers that don't support using type aliases?

libcxx/include/semaphore
105 ↗(On Diff #225670)

Does __cxx_atomic_notify_one always call out to __cxx_atomic_notify_all? If so, can we get rid of __cxx_atomic_notify_one?

__simt__ marked 7 inline comments as done.Oct 20 2019, 10:05 AM
__simt__ added inline comments.
libcxx/include/atomic
1477 ↗(On Diff #225670)

Actually it can and should go away now. I'll remove the cast.

1533 ↗(On Diff #225670)

It's not conditionally exchanging, it's exchanging unconditionally and then conditionally taking a computation step.

1535 ↗(On Diff #225670)
1563 ↗(On Diff #225670)

Sure.

1696 ↗(On Diff #225670)

That would be unfortunate, I think, <atomic> in general tries to offer its functionality in back versions.

Also, the CUDA port definitely doesn't want to be tied to '17 for quite some time.

Can we leave it on in all dialects that <atomic> supports, like the rest?

1697 ↗(On Diff #225670)

Pending other comment's resolution.

1821 ↗(On Diff #225670)

Will do.

__simt__ updated this revision to Diff 226373.Oct 24 2019, 9:53 PM

In this update I repaired the usefulness of <atomic> in C++03 mode. Other headers won't be supported there, but I'm avoiding regression to <atomic>.

__simt__ updated this revision to Diff 226543.Oct 26 2019, 11:15 AM

In this version I moved the tree barrier's core algorithm into the dylib so that any functional or performance issue found in it later could still be fixed without breaking ABI. By the same fact it narrows the visibility of the definition of the thread_local symbol it uses.

__simt__ updated this revision to Diff 226990.Oct 29 2019, 4:36 PM

This revision enables cross-ABI compatibility for building the dylib and the user application with different combinations of these options:

  • _LIBCPP_HAS_NO_PLATFORM_WAIT
  • _LIBCPP_HAS_NO_PLATFORM_WAIT_TABLE
  • _LIBCPP_HAS_NO_PLATFORM_WAIT_STATE

When the ABIs are mixed, the total application still gets the best performance available for the facilities actually available in common. This is somewhat gnarly (apologies for the several void*...) but was requested offline, to allow for a platform to progress from not having Futex support (say, today) to having Futex (eventually, later).

What this ABI hardening does not include is support for changing the type of "__libcpp_platform_contention_t" from a uint32_t to a uint64_t, later. Once a platform chooses to expose Futex, it's an ABI break to change the size of that, so it's worth pondering whether the platform should be held back until it can support uint64_t Futex.

This revision also introduces an internal-only abstraction layer called "__atomic_positive_ptrdiff_t" which centralizes a lot of turning concerns for both latches and semaphores, when layered on top of atomics, for different configurations.

In this revision the APPLE configuration is a bit quirky: 1) the headers build as if Futex is enabled and is a uint64_t, but 2) the dylib builds as if Futex is disabled and leans on the cross-ABI compatibility to fall-back to mutexes and condvars. Also, this is not quirky but my new recommendation, on APPLE it no longer uses dispatch semaphores.

__simt__ marked 8 inline comments as done.Oct 29 2019, 4:53 PM

Closing out some obsolete comments after changes and/or testing.

Hi guys. Could I get some action items or other movement on this review? Would you like me to make it possible to merge it as disabled or as experimental, and send me DRs to fix afterwards?

ldionne requested changes to this revision.Nov 18 2019, 1:15 PM
  • _LIBCPP_HAS_NO_PLATFORM_WAIT
  • _LIBCPP_HAS_NO_PLATFORM_WAIT_TABLE
  • _LIBCPP_HAS_NO_PLATFORM_WAIT_STATE

It would be great to document these macros somewhere in the code or in a .rst document.

Also, this paper didn't add any feature-test macro?

libcxx/include/atomic
1850 ↗(On Diff #226990)

Can we add a test for this?

1871 ↗(On Diff #226990)

This too!

libcxx/include/chrono
1571 ↗(On Diff #226990)

What's that?

libcxx/include/semaphore
77 ↗(On Diff #226990)

Thanks for the comments in this file, I wish we did that more often.

libcxx/include/type_traits
4032 ↗(On Diff #226990)

What are these drive-by fixes?

This revision now requires changes to proceed.Nov 18 2019, 1:15 PM
__simt__ marked 6 inline comments as done.Nov 18 2019, 8:20 PM
__simt__ added inline comments.
libcxx/include/atomic
1850 ↗(On Diff #226990)

Yes, I can do that.

1871 ↗(On Diff #226990)

Ack.

libcxx/include/chrono
1571 ↗(On Diff #226990)

Timed wait functions need <chrono>. That meant that I had to port <chrono> to CUDA as part of this effort. In CUDA you can't ask the OS for the time, you need to ask the silicon for the time. I followed the precedent of external threading to implement this as external clocks. Makes sense?

libcxx/include/semaphore
77 ↗(On Diff #226990)

NP, Eric had requested something here. Feel free to ask for comments elsewhere.

libcxx/include/type_traits
4032 ↗(On Diff #226990)

It's important for the CUDA port that there not be naked "namespace std { ... }" that don't use the macros, because we need to inject our namespace name. I tried to keep the drive-by to a minimum.

Thank you for all the work you've put into this patch. Here are a few more comments. Still working through all this code :)

I'm becoming increasingly worried about the amount of inline configuration and platform-specific code. I want to make sure we have tests for all of these (and ideally CI that can run all the tests).

libcxx/src/atomic.cpp
28 ↗(On Diff #226990)

For what platforms is _LIBCPP_HAS_PLATFORM_WAIT_STATE false, and have you tested on those platforms? I'm worried that there might be compiler errors.

32 ↗(On Diff #226990)

What's the point of this macro, ATOMIC_VAR_INIT (I realize you didn't add it, but I'm still curious)?

46 ↗(On Diff #226990)

This is different if _LIBCPP_HAS_PLATFORM_WAIT_STATE is false, right?

129 ↗(On Diff #226990)

Will this ever be true?

133 ↗(On Diff #226990)

same as below

139 ↗(On Diff #226990)

Is there a test for this case? What will be the effect of __s not getting updated?

148 ↗(On Diff #226990)

Instead of having void pointers that are casted, I don't see any reason these couldn't be defined as their actual types (__cxx_atomic_impl* and __libcpp_platform_contention_t*).

162 ↗(On Diff #226990)

Is __libcpp_platform_wait defined on non-linux machines?

libcxx/src/barrier.cpp
44 ↗(On Diff #226990)

Will this ever not only happen on the first iteration (if so, move it out of the loop maybe)?

libcxx/src/semaphore.cpp
27 ↗(On Diff #226990)

Why is this only needed for apple?

123 ↗(On Diff #226990)

Where does 50 come from? Maybe make this a macro.

152 ↗(On Diff #226990)

Is this the same as while (__old == 0)?

__simt__ marked 24 inline comments as done.Jan 9 2020, 1:55 PM

No problem. I just want to know what I need to do in the next patch in order to move forward.

libcxx/src/atomic.cpp
28 ↗(On Diff #226990)

It's false on CUDA. It would be false on platforms that can't rely on OS support for efficient waiting and have to fall back to polling with backoff.

32 ↗(On Diff #226990)

Prior to P0883 merging into C++20, atomics are constructed in an uninitialized state. You're supposed to use this macro to give it a static initialization. This macro is deprecated after C++20.

46 ↗(On Diff #226990)

If you have a platform wait state, then it's used by this facility.

If you don't have a platform wait state, then a condvar is used instead.

129 ↗(On Diff #226990)

Oh yes. Every time that there is a contending waiter concurrent with this notifier. This condition is true whenever the facility's use is not trivial.

133 ↗(On Diff #226990)

Answered below.

139 ↗(On Diff #226990)

Undefined behavior. Usually a segfault.

148 ↗(On Diff #226990)

This is the uglier part of the patch. We have this situation where one platform (Apple) is about to go from not having platform wait states, to having them. This API is trying to be able to take either kind of efficient wait structure in the application OR in the dylib, and provide efficient waiting either way.

I think it would not be unreasonable if you guys asked me to cut this part out in order to get a first commit of the facility, and then work in the background with Louis on recovering the capability in some way, with or without a public patch.

162 ↗(On Diff #226990)

Yes.

libcxx/src/barrier.cpp
44 ↗(On Diff #226990)

This needs to be inside the loop unfortunately. During the first round, we need to record our effective start (leaf) location in the tree. I could express it differently - record it only at the end of the first round - but it would remain in the loop nest.

What we could potentially do is dumb down this favorite barrier index concept, or remove it entirely. It's worth a relatively small amount of performance by comparison to using the tree barrier in the first place.

I would not be offended if you asked me to give you an ordered list of things I could delete and tell you what it costs you to delete them, approximately. Then we could stop where we are more comfortable.

libcxx/src/semaphore.cpp
27 ↗(On Diff #226990)

Because of how GCD semaphores work, unfortunately.

We could delete this by sending all Apple semaphores to the generic template based on atomics.

Last I spoke with Louis, we thought that would be acceptable.

123 ↗(On Diff #226990)

Yeah. Aren't there enough macros though? I think I might come back later with a different patch to propose a set of macros to configure back-offs.

152 ↗(On Diff #226990)

It's not. This is masking the lower 32-bits of a 64-bit value, and then comparing that with 0.

zoecarver added inline comments.Jan 10 2020, 2:54 PM
libcxx/src/atomic.cpp
32 ↗(On Diff #226990)

Heh. I didn't realize that was part of the standard (I was wondering why it wasn't mangled). Good to know.

46 ↗(On Diff #226990)

I see. I got confused for a second while trying to follow the #ifs.

139 ↗(On Diff #226990)

Isn't that a problem if _LIBCPP_HAS_NO_PLATFORM_WAIT_TABLE isn't defined?

148 ↗(On Diff #226990)

I'll defer to others on this but, (assuming it wouldn't be much more work for you) it might be better to remove that from this patch and add it as a follow-up patch.

In general, the less code in this patch, the faster it can get committed.

libcxx/src/barrier.cpp
44 ↗(On Diff #226990)

What we could potentially do is dumb down this favorite barrier index concept, or remove it entirely. It's worth a relatively small amount of performance by comparison to using the tree barrier in the first place.

I don't know enough about this piece of code or potential methods of implementation to comment on what we should do. I'll defer to you/others on this.

libcxx/src/semaphore.cpp
123 ↗(On Diff #226990)

Yes, there are an unfortunate number of macros in this bit of code.

I don't feel strongly about adding a macro now or later, maybe add a comment, though (that the number isn't magic or referenced elsewhere).

152 ↗(On Diff #226990)

I'm pretty sure they are the same. Look at this example. The first and last functions generate the same optimized assembly.

__simt__ updated this revision to Diff 240389.Jan 25 2020, 10:34 AM
__simt__ marked 11 inline comments as done.Jan 25 2020, 10:34 AM

A lot of changes in this patch, mostly simplifications based on the areas that got feedback about complexity.

For the most part, I have deleted code:

  1. I deleted the condvar-based implementation of atomic_wait because no current platform would make use of it. That also allowed me to delete much of the complicated ABI tricks added in the last revision. That removed several configuration macros.
  2. After discussing it with Louis, I switched the Apple config to use atomics-based semaphores. Not only did that delete a lot of code on its own, but it made some other optimizations in semaphores no longer needed, so I deleted them too. That removed several more configuration macros as well.
  3. I deleted code dealing with the thread_local optimization in barrier. Using this_thread::get::id to inject a thread-stable search seed for the algorithm is about just as good, and everything for that already exists.
  4. Latches, barriers and semaphores make more direct use of the wait/notify functionality now. Where they had "exceptions" with customized waiting, now there's a unified interface there, which deletes code.

In addition to this, the internal abstractions are much better now. I particularly like how natively-supported Futex types pass-through the intermediate layer now, it just works.

I hope you find this version is close enough that we might soon switch to delta patches on top, in trunk.

Cheers,

Olivier

jfb added a subscriber: MadCoder.Jan 27 2020, 10:42 AM

Not finished looking through this but, here are some comments.

Overall, I really like this change. It looks fantastic. It seems like a much cleaner/simpler implementation. Thanks for all the time you've spent on this.

libcxx/include/__threading_support
29 ↗(On Diff #240389)

Are all the platforms we support guaranteed to have this header (they very well might, I just don't know)?

31 ↗(On Diff #240389)

Is this macro still needed?

33 ↗(On Diff #240389)

This could be an else block, no?

77 ↗(On Diff #240389)

Are there any platforms where SEM_VALUE_MAX doesn't exist? Maybe some BSD platform? Could you check that it is defined (or maybe that we're on a certain platform)?

libcxx/include/atomic
2447 ↗(On Diff #240389)

I might just be missing it but, is this in the standard? Otherwise, could you mangle it?

2448 ↗(On Diff #240389)

nit: space between the equals.

libcxx/include/barrier
274 ↗(On Diff #240389)

Will __barrier_base be defined if _LIBCPP_HAS_NO_TREE_BARRIER isn't?

libcxx/include/latch
53 ↗(On Diff #240389)

Can we make it so that these headers just don't exist before C++11 via cmake? That might be a nicer way to fail.

__simt__ marked 10 inline comments as done.Jan 27 2020, 1:02 PM

Thanks Zoe!

Some responses ->

libcxx/include/__threading_support
29 ↗(On Diff #240389)

This header is part of Pthreads and this #include is in the Pthreads section of the header, so I think the answer is yes. The Apple case is special in that they have deprecated this header.

31 ↗(On Diff #240389)

Yes. We need a way to disable the platform native semaphores. Both Apple (in the current design) and CUDA need it.

33 ↗(On Diff #240389)

It would have to be an #elif !defined(...) so that CUDA could trigger it too.

77 ↗(On Diff #240389)

Like the header, it's mandated by Pthreads. It's conceivable that some BSD is non-conforming and we still want to work on it, but I don't have a BSD system handy to try.

libcxx/include/atomic
2447 ↗(On Diff #240389)

It is, in C++20.

2448 ↗(On Diff #240389)

OK.

libcxx/include/barrier
274 ↗(On Diff #240389)

Yes. The macro ends up selecting between two different base classes -- one is the scalable tree barrier, the other is the simpler central barrier.

libcxx/include/latch
53 ↗(On Diff #240389)

I don't have this skill. :^/

But I do think that we want to support them in C++11/14/17. This author's production team is going to present it to users in these dialects, for instance.

This is looking pretty good, only a few nitpicks. Once the nitpicks are addressed we can commit this and then go incrementally.

libcxx/include/__threading_support
34 ↗(On Diff #240389)

Any reason to introduce this macro instead of just use !defined(_LIBCPP_NO_NATIVE_SEMAPHORES) when you need it?

513 ↗(On Diff #240389)

static constexpr?

libcxx/include/atomic
550 ↗(On Diff #240389)

Just wondering: will this in any way make it harder to support <atomic> in freestanding?

1465 ↗(On Diff #240389)

I'm not seeing this used anywhere -- am I missing something?

libcxx/include/semaphore
61 ↗(On Diff #240389)

Olivier and I spoke offline, and it's reasonable to request C++14 at least here. Please do this in all the headers you're adding.

The idea is to avoid having headers that are supported in older dialects than they need to be (within reason), which is often a source of technical debt.

libcxx/src/atomic.cpp
47 ↗(On Diff #240389)

In apple_availability.h, add:

#if defined(__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__)
#if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 101500
#define _LIBCPP_USE_ULOCK
#endif
#elif defined(__ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__)
#if __ENVIRONMENT_IPHONE_OS_VERSION_MIN_REQUIRED__ >= 130000
#define _LIBCPP_USE_ULOCK
#endif
#elif defined(__ENVIRONMENT_TV_OS_VERSION_MIN_REQUIRED__)
#if __ENVIRONMENT_TV_OS_VERSION_MIN_REQUIRED__ >= 130000
#define _LIBCPP_USE_ULOCK
#endif
#elif defined(__ENVIRONMENT_WATCH_OS_VERSION_MIN_REQUIRED__)
#if __ENVIRONMENT_WATCH_OS_VERSION_MIN_REQUIRED__ >= 60000
#define _LIBCPP_USE_ULOCK
#endif
#endif // __ENVIRONMENT_.*_VERSION_MIN_REQUIRED__

That should do it for all platforms aligned to Mac OS 10.15. Feel free to use whatever name for _LIBCPP_USE_ULOCK -- _LIBCPP_USE_APPLE_ULOCK probably makes the most sense since it's an Apple-specific API.

Then, your #elif becomes #elif defined(__APPLE__) && defined(_LIBCPP_USE_APPLE_ULOCK).

__simt__ marked 11 inline comments as done.Feb 14 2020, 8:35 AM

Replies. Also noting that I will include an update to the implementation status HTML page.

libcxx/include/__threading_support
34 ↗(On Diff #240389)

I was going to say it’s more convenient for me, but I’m not even convinced.

I can streamline this.

513 ↗(On Diff #240389)

Yes.

libcxx/include/atomic
550 ↗(On Diff #240389)

A bit, yeah. There are a few ways to proceed.

1465 ↗(On Diff #240389)

I intend to use that to implement the normative encouragement that atomic_signed/unsigned[...]_t should be the efficient ones for waiting. I added those types below, but they don’t follow that encouragement.

libcxx/include/semaphore
61 ↗(On Diff #240389)

Yep

libcxx/src/atomic.cpp
47 ↗(On Diff #240389)

Thanks!

__simt__ updated this revision to Diff 245055.Feb 17 2020, 4:21 PM

This revision addresses the outstanding comments, and incorporates Louis' recommended macros for ulock detection.

I also added missing max() members to barrier/latch that were added by NB comment resolution, and I implemented the normative encouragement for the lock-free types to match the ideal contention type, if there is one. I added tests for these things.

Finally, I further improved the ABI resilience of the barrier type so it's almost a full pimpl now.

ldionne accepted this revision.Feb 18 2020, 7:11 AM

Thanks a lot for all the work and patience, Olivier. I think we're good to go now. Do you have commit access?

Also, can you please confirm that you meant to introduce all the following new symbols in the dylib, and nothing else:

std::__1::__libcpp_atomic_wait(std::__1::__cxx_atomic_impl<long long, std::__1::__cxx_atomic_base_impl<long long> > const volatile*, long long)
std::__1::__libcpp_atomic_wait(void const volatile*, long long)
std::__1::__cxx_atomic_notify_all(std::__1::__cxx_atomic_impl<long long, std::__1::__cxx_atomic_base_impl<long long> > const volatile*)
std::__1::__cxx_atomic_notify_all(void const volatile*)
std::__1::__cxx_atomic_notify_one(std::__1::__cxx_atomic_impl<long long, std::__1::__cxx_atomic_base_impl<long long> > const volatile*)
std::__1::__cxx_atomic_notify_one(void const volatile*)
std::__1::__libcpp_atomic_monitor(std::__1::__cxx_atomic_impl<long long, std::__1::__cxx_atomic_base_impl<long long> > const volatile*)
std::__1::__libcpp_atomic_monitor(void const volatile*)
std::__1::__arrive_barrier_algorithm_base(std::__1::__barrier_algorithm_base*, unsigned char)
std::__1::__destroy_barrier_algorithm_base(std::__1::__barrier_algorithm_base*)
std::__1::__construct_barrier_algorithm_base(long&)

Those will have to be added to the ABI list file (I can do that).

This revision is now accepted and ready to land.Feb 18 2020, 7:11 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2020, 8:03 AM
ldionne added inline comments.Feb 24 2020, 8:50 AM
libcxx/include/atomic
443 ↗(On Diff #246218)

We forgot to update the synopsis for this header with the C++20 Synchronization Library. @__simt__ would you be willing to do that in a followup change?

teemperor added inline comments.
libcxx/include/module.modulemap
234 ↗(On Diff #246218)

Maybe I'm missing something here, but due to this submodule we are always parsing the barrier header even when building the module with a language standard < C++14. This means that everyone using C++11 is no longer able to use the 'std' Clang module after this commit. Is this intentional?

ldionne added inline comments.Feb 24 2020, 10:50 AM
libcxx/include/module.modulemap
234 ↗(On Diff #246218)

No, this is not intentional. Sorry, there were several failures that needed fixing after committing this (failures that were impossible to notice without throwing the change at all the build bots) -- we're getting there.

davide added a subscriber: davide.Feb 24 2020, 10:58 AM

This is the second time something like this happened. When you make such large scale changes -- make sure to run the lldb testsuite, or be ready to revert. We already had this discussion in the past, but clearly it didn't prevent the problem from happening again.

teemperor added inline comments.Feb 24 2020, 11:22 AM
libcxx/include/module.modulemap
234 ↗(On Diff #246218)

No worries, pushed a fix in b61e83eb0e31c1e6006569b43bb98a61ff44ca4c

This is the second time something like this happened. When you make such large scale changes -- make sure to run the lldb testsuite, or be ready to revert. We already had this discussion in the past, but clearly it didn't prevent the problem from happening again.

Last time's discussion didn't get anywhere, as running the LLDB test suite on each commit we make to libc++ isn't a viable option.

Libc++'s test matrix is insanely large, and actually we can't even see all of it cause some uses are behind closed doors. Whenever we make a non-trivial change (and BTW this is a purely additive change), it breaks someone somewhere. And you can rest assured that we do run a lot of testing locally to make sure we don't break people before committing, but it can't catch everything. That's just the way it is, and we try to fix it as quickly as possible -- I've spent all day so far trying to fix the consequences of applying this patch. This is not a lack of diligence, it's just that the nature of libc++ makes it difficult to test comprehensively.

I don't know what this commit broke in LLDB, if anything, but instead it's more useful to comment here with a link to the failure so we can help fix it. Being combative is no help, as we're all in the same boat.

This is the second time something like this happened. When you make such large scale changes -- make sure to run the lldb testsuite, or be ready to revert. We already had this discussion in the past, but clearly it didn't prevent the problem from happening again.

Last time's discussion didn't get anywhere, as running the LLDB test suite on each commit we make to libc++ isn't a viable option.

Libc++'s test matrix is insanely large, and actually we can't even see all of it cause some uses are behind closed doors. Whenever we make a non-trivial change (and BTW this is a purely additive change), it breaks someone somewhere. And you can rest assured that we do run a lot of testing locally to make sure we don't break people before committing, but it can't catch everything. That's just the way it is, and we try to fix it as quickly as possible -- I've spent all day so far trying to fix the consequences of applying this patch. This is not a lack of diligence, it's just that the nature of libc++ makes it difficult to test comprehensively.

I don't know what this commit broke in LLDB, if anything, but instead it's more useful to comment here with a link to the failure so we can help fix it. Being combative is no help, as we're all in the same boat.

I would like to reiterate that the policy in LLVM is that commit that break projects can be reverted willy-nilly.
I would also like to stress that my use cases are not behind closed doors -- LLDB is part of the LLVM umbrella.
I thought we reached an agreement last time, but looks like there was a miscommunication or misunderstanding on your side, so let me reiterate that you have two options:

  1. Run the lldb testsuite for changes that impact layout of structures -- or in any case you consider non-trivial, or if you don't think you can do this, at least ping somebody from lldb to take a look at the changes before they're committed. This of course requires some judgement on your side. When in doubt ask, as sending an e-mail is cheap.
  2. You commit without pre-commit checking lldb because you consider the additive cost of running ninja check-lldb is prohibitive. This is your choice, but don't be surprised if people will revert your commit if it breaks things.

Hopefully this clarifies my position.

FWIW, this change broke building for windows. It seemed fairly straightforward to fix though, see D75102.

Hi, a bisect shows that this patch seems to cause time.h to not be found for us:

[1958/45287] CXX kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o
FAILED: kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o
../../../recipe_cleanup/clangEPCywe/bin/clang++ -MD -MF kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o.d -o kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o -DTOOLCHAIN_VERSION=/b/s/w/ir/k/recipe_cleanup/clangEPCywe/bin -DZX_ASSERT_LEVEL=2 -DWITH_FRAME_POINTERS=1 -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -DKERNEL_BASE=0xffffffff80100000 -DSMP_MAX_CPUS=32 -D_KERNEL -DLK -DENABLE_PANIC_SHELL -DWITH_DEBUG_LINEBUFFER -DZIRCON_TOOLCHAIN -DLK_DEBUGLEVEL=2 -DWITH_KERNEL_PCIE -DKERNEL_RETPOLINE=1 -DWITH_UNIFIED_SCHEDULER=1 -DSCHEDULER_TRACING_LEVEL=0 -DARCH_X86 -DKERNEL_LOAD_OFFSET=0x00100000 -D_LIBCPP_DISABLE_EXTERN_TEMPLATE -I../../zircon/kernel/include -I../../zircon/kernel/lib/libc/include -I../../zircon/kernel/lib/ktl/include -I../../zircon/kernel/lib/io/include -I../../zircon/kernel/lib/heap/include -I../../zircon/system/ulib/lazy_init/include -I../../zircon/system/ulib/lockdep/include -I../../zircon/system/ulib/ffl/include -I../../zircon/kernel/vm/include -I../../zircon/kernel/lib/user_copy/include -I../../zircon/system/ulib/zircon-internal/include -I../../zircon/kernel/lib/ktrace/include -I../../zircon/system/ulib/fbl/include -I../../zircon/kernel/lib/fbl/include -I../../zircon/system/public -I../../zircon/kernel/arch/x86/include -I../../zircon/system/ulib/bitmap/include -I../../zircon/kernel/arch/x86/page_tables/include -I../../zircon/system/ulib/hwreg/include -I../../zircon/kernel/lib/heap/include -I../../zircon/kernel/lib/io/include -I../../zircon/kernel/lib/ktl/include -idirafter ../../zircon/kernel/lib/libc/limits-dummy -fno-common --target=x86_64-fuchsia -mcx16 -march=x86-64 -fcrash-diagnostics-dir=clang-crashreports -fcolor-diagnostics -ffile-prefix-map=/b/s/w/ir/k/fuchsia/out/default.zircon=. -ffile-prefix-map=/b/s/w/ir/k/fuchsia/out=.. -ffile-prefix-map=/b/s/w/ir/k/fuchsia=../.. -no-canonical-prefixes -O2 -g3 -Wall -Wextra -Wno-unused-parameter -Wno-address-of-packed-member -Wnewline-eof -Wno-unknown-warning-option -Wno-c99-designator -Wno-int-in-bool-context -Wno-range-loop-analysis -fno-omit-frame-pointer -ffunction-sections -fdata-sections -Wthread-safety -Wimplicit-fallthrough -fvisibility=hidden -ftrivial-auto-var-init=pattern -Werror -Wno-error=deprecated-declarations -fpie -mretpoline -mretpoline-external-thunk -ffreestanding -include ../../zircon/kernel/include/hidden.h -fno-unwind-tables -mno-red-zone -Wformat=2 -Wvla -mno-red-zone -msoft-float -mno-mmx -mno-sse -mno-sse2 -mno-3dnow -mno-avx -mno-avx2 -mcmodel=kernel -fdata-sections -Wno-gnu-string-literal-operator-template -std=c++17 -Wconversion -Wno-sign-conversion -Wextra-semi -Wno-deprecated-copy -Wno-non-c-typedef-for-linkage -ftemplate-backtrace-limit=0 -fno-exceptions -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -faligned-new=8 -fno-exceptions -c ../../zircon/kernel/lib/libc/snprintf.cc
In file included from ../../zircon/kernel/lib/libc/snprintf.cc:10:
In file included from ../../zircon/kernel/lib/ktl/include/ktl/algorithm.h:10:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/algorithm:643:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/memory:666:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/atomic:550:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/__threading_support:14:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/chrono:827:
../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/ctime:49:10: fatal error: 'time.h' file not found
#include <time.h>

Do you know if this is a known or unintended side effect of this patch? I don't think we changed anything on our side that would've caused this.

Builder: https://ci.chromium.org/p/fuchsia/builders/ci/clang_toolchain.fuchsia-x64-debug-subbuild/b8887404211928925456?

Hi, a bisect shows that this patch seems to cause time.h to not be found for us:
[...]

Do you know if this is a known or unintended side effect of this patch? I don't think we changed anything on our side that would've caused this.

This must be cause we're now including <chrono> in <atomic>. This is not really something we can (or want to) workaround, since that's in the spec. Are you folks using some fancy C library that doesn't provide <time.h>? If so, I would argue that the fact it worked before this patch is just a coincidence.

jfb added a comment.Feb 26 2020, 10:18 AM

Hi, a bisect shows that this patch seems to cause time.h to not be found for us:

[1958/45287] CXX kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o
FAILED: kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o
../../../recipe_cleanup/clangEPCywe/bin/clang++ -MD -MF kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o.d -o kernel-x64-clang/obj/kernel/lib/libc/libc.snprintf.cc.o -DTOOLCHAIN_VERSION=/b/s/w/ir/k/recipe_cleanup/clangEPCywe/bin -DZX_ASSERT_LEVEL=2 -DWITH_FRAME_POINTERS=1 -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -DKERNEL_BASE=0xffffffff80100000 -DSMP_MAX_CPUS=32 -D_KERNEL -DLK -DENABLE_PANIC_SHELL -DWITH_DEBUG_LINEBUFFER -DZIRCON_TOOLCHAIN -DLK_DEBUGLEVEL=2 -DWITH_KERNEL_PCIE -DKERNEL_RETPOLINE=1 -DWITH_UNIFIED_SCHEDULER=1 -DSCHEDULER_TRACING_LEVEL=0 -DARCH_X86 -DKERNEL_LOAD_OFFSET=0x00100000 -D_LIBCPP_DISABLE_EXTERN_TEMPLATE -I../../zircon/kernel/include -I../../zircon/kernel/lib/libc/include -I../../zircon/kernel/lib/ktl/include -I../../zircon/kernel/lib/io/include -I../../zircon/kernel/lib/heap/include -I../../zircon/system/ulib/lazy_init/include -I../../zircon/system/ulib/lockdep/include -I../../zircon/system/ulib/ffl/include -I../../zircon/kernel/vm/include -I../../zircon/kernel/lib/user_copy/include -I../../zircon/system/ulib/zircon-internal/include -I../../zircon/kernel/lib/ktrace/include -I../../zircon/system/ulib/fbl/include -I../../zircon/kernel/lib/fbl/include -I../../zircon/system/public -I../../zircon/kernel/arch/x86/include -I../../zircon/system/ulib/bitmap/include -I../../zircon/kernel/arch/x86/page_tables/include -I../../zircon/system/ulib/hwreg/include -I../../zircon/kernel/lib/heap/include -I../../zircon/kernel/lib/io/include -I../../zircon/kernel/lib/ktl/include -idirafter ../../zircon/kernel/lib/libc/limits-dummy -fno-common --target=x86_64-fuchsia -mcx16 -march=x86-64 -fcrash-diagnostics-dir=clang-crashreports -fcolor-diagnostics -ffile-prefix-map=/b/s/w/ir/k/fuchsia/out/default.zircon=. -ffile-prefix-map=/b/s/w/ir/k/fuchsia/out=.. -ffile-prefix-map=/b/s/w/ir/k/fuchsia=../.. -no-canonical-prefixes -O2 -g3 -Wall -Wextra -Wno-unused-parameter -Wno-address-of-packed-member -Wnewline-eof -Wno-unknown-warning-option -Wno-c99-designator -Wno-int-in-bool-context -Wno-range-loop-analysis -fno-omit-frame-pointer -ffunction-sections -fdata-sections -Wthread-safety -Wimplicit-fallthrough -fvisibility=hidden -ftrivial-auto-var-init=pattern -Werror -Wno-error=deprecated-declarations -fpie -mretpoline -mretpoline-external-thunk -ffreestanding -include ../../zircon/kernel/include/hidden.h -fno-unwind-tables -mno-red-zone -Wformat=2 -Wvla -mno-red-zone -msoft-float -mno-mmx -mno-sse -mno-sse2 -mno-3dnow -mno-avx -mno-avx2 -mcmodel=kernel -fdata-sections -Wno-gnu-string-literal-operator-template -std=c++17 -Wconversion -Wno-sign-conversion -Wextra-semi -Wno-deprecated-copy -Wno-non-c-typedef-for-linkage -ftemplate-backtrace-limit=0 -fno-exceptions -fno-rtti -fno-threadsafe-statics -fvisibility-inlines-hidden -faligned-new=8 -fno-exceptions -c ../../zircon/kernel/lib/libc/snprintf.cc
In file included from ../../zircon/kernel/lib/libc/snprintf.cc:10:
In file included from ../../zircon/kernel/lib/ktl/include/ktl/algorithm.h:10:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/algorithm:643:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/memory:666:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/atomic:550:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/__threading_support:14:
In file included from ../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/chrono:827:
../../../recipe_cleanup/clangEPCywe/bin/../include/c++/v1/ctime:49:10: fatal error: 'time.h' file not found
#include <time.h>

Do you know if this is a known or unintended side effect of this patch? I don't think we changed anything on our side that would've caused this.

Builder: https://ci.chromium.org/p/fuchsia/builders/ci/clang_toolchain.fuchsia-x64-debug-subbuild/b8887404211928925456?

Is zircon freestanding, and not providing chrono (which then includes time.h)? That would explain the issue, we'd need to fix freestanding atomic by not exposing time-related stuff if time.h ins't available.

In D68480#1893987, @jfb wrote:

Hi, a bisect shows that this patch seems to cause time.h to not be found for us:

[...]

Do you know if this is a known or unintended side effect of this patch? I don't think we changed anything on our side that would've caused this.

Builder: https://ci.chromium.org/p/fuchsia/builders/ci/clang_toolchain.fuchsia-x64-debug-subbuild/b8887404211928925456?

Is zircon freestanding, and not providing chrono (which then includes time.h)? That would explain the issue, we'd need to fix freestanding atomic by not exposing time-related stuff if time.h ins't available.

So, actually, I think we need to have a discussion about the support of Freestanding in libc++. We should either implement it properly and have testers for it, or stop pretending that we do. Right now all I see us doing is adding complexity to our code to fix stuff that breaks on user configurations we don't know about, understand or test. As someone who has to maintain the code with that complexity, this concerns me.

And I do care about Freestanding -- I think it's important and we should support it. But I also think that we should not do it halfway like right now. In the current state of things, I literally have no way to ensure that we won't break that configuration again tomorrow. @leonardchan @phosek Is there a way that you could add a libc++ builder that runs freestanding?

jfb added a comment.Feb 26 2020, 10:37 AM
In D68480#1893987, @jfb wrote:

Hi, a bisect shows that this patch seems to cause time.h to not be found for us:

[...]

Do you know if this is a known or unintended side effect of this patch? I don't think we changed anything on our side that would've caused this.

Builder: https://ci.chromium.org/p/fuchsia/builders/ci/clang_toolchain.fuchsia-x64-debug-subbuild/b8887404211928925456?

Is zircon freestanding, and not providing chrono (which then includes time.h)? That would explain the issue, we'd need to fix freestanding atomic by not exposing time-related stuff if time.h ins't available.

So, actually, I think we need to have a discussion about the support of Freestanding in libc++. We should either implement it properly and have testers for it, or stop pretending that we do. Right now all I see us doing is adding complexity to our code to fix stuff that breaks on user configurations we don't know about, understand or test. As someone who has to maintain the code with that complexity, this concerns me.

And I do care about Freestanding -- I think it's important and we should support it. But I also think that we should not do it halfway like right now. In the current state of things, I literally have no way to ensure that we won't break that configuration again tomorrow. @leonardchan @phosek Is there a way that you could add a libc++ builder that runs freestanding?

Agreed, it seems like we want to figure out what individual freestanding subsets can "carve out" of regular C++, and how to do so cleanly. "time" seems like an easy thing to carve out, but it shouldn't just be done here, it should be part of the configuration header, applied consistently through libc++, and extensively tested. Ideally it wouldn't just be a builder: you'd have a directory which tests freestanding configuration options (i.e. "can I include atomic without time support?").

@leonardchan @phosek Is there a way that you could add a libc++ builder that runs freestanding?

We have it as a TODO but can bump it in priority.

Agreed, it seems like we want to figure out what individual freestanding subsets can "carve out" of regular C++, and how to do so cleanly. "time" seems like an easy thing to carve out, but it shouldn't just be done here, it should be part of the configuration header, applied consistently through libc++, and extensively tested. Ideally it wouldn't just be a builder: you'd have a directory which tests freestanding configuration options (i.e. "can I include atomic without time support?").

How simple would it be to carve specifically time.h out? I'm unfamiliar with libcxx internals + configurations. I'm also not insisting that progress on this patch be halted, but I'd like to know if there's anything we could do at least now to provide a workaround for freestanding targets,

jfb added a comment.Feb 26 2020, 2:04 PM

@leonardchan @phosek Is there a way that you could add a libc++ builder that runs freestanding?

We have it as a TODO but can bump it in priority.

Agreed, it seems like we want to figure out what individual freestanding subsets can "carve out" of regular C++, and how to do so cleanly. "time" seems like an easy thing to carve out, but it shouldn't just be done here, it should be part of the configuration header, applied consistently through libc++, and extensively tested. Ideally it wouldn't just be a builder: you'd have a directory which tests freestanding configuration options (i.e. "can I include atomic without time support?").

How simple would it be to carve specifically time.h out? I'm unfamiliar with libcxx internals + configurations. I'm also not insisting that progress on this patch be halted, but I'd like to know if there's anything we could do at least now to provide a workaround for freestanding targets,

As a non-libc++ maintainer (so take what I say with some doubt), I'd imagine things being detected in libcxx/include/__config, say through __has_include(time.h), and then you'd have a macro which says "no time". I'd then figure out the transitive inclusions of time.h in all libc++ headers, figure out want pulls them it, and use the macro to not expose the corresponding functions / types on those platforms.

You'd then need a decent amount of testing to make sure this works well.

Concretely here, there are a few atomic functions which require chrono, so you'd want to conditionally include chrono and conditionally define those functions. I think the patch which does this should also handle time in other libc++ headers, so we're not in a situation of "we kinda support having no time.h, but not consistently".

ldionne added inline comments.Mar 18 2020, 2:14 PM
libcxx/include/atomic
443 ↗(On Diff #246218)
Herald added a reviewer: Restricted Project. · View Herald TranscriptMar 18 2020, 2:14 PM
__simt__ marked an inline comment as done.Mar 18 2020, 3:09 PM
__simt__ added inline comments.
libcxx/include/atomic
443 ↗(On Diff #246218)

Yep. Thanks for the ping.

This appears to have baked in some ABI details that don't permit efficient implementation on some of our supported platforms. In the future, please can you post an RFC from things that need to integrate with platform-specific code? We're now stuck with this interface until we are willing to do an ABI-breaking change. In particular:

  • OpenBSD provides a Linux-compatible futex that requires a 32-bit integer to be used as the key. This can't be used with the existing ABI, which has baked in 64-bit integers as the size for OpenBSD.
  • FreeBSD's _umtx_op uses long and so is able to support the ABI on 64-bit platforms (LP64) but not on 32-bit ones.
  • Windows' WaitOnAddress supports 1, 2, 4, and 8-byte objects, yet can be used only with 8-byte objects in this interface.
  • The semaphore interface does not require exposing the underlying semaphore type, so could have been implemented directly with _umtx_op on FreeBSD, rather than with the wrapper. A binary semaphore is trivial to implement with a futex with the fast path in the uncontended case a single inline atomic op. This is not possible without breaking the ABI in the current design.

The atomic wait / wake problems could have been addressed by making the size an explicit argument, rather than relying on overload resolution. The semaphore issue could have been addressed by asking for contributions to the __threading_support bits before a release was cut.

This appears to have baked in some ABI details that don't permit efficient implementation on some of our supported platforms.

This is unfortunate. Have you folks shipped this yet? Perhaps it's not too late to fix things now if you've only released it very recently.

In the future, please can you post an RFC from things that need to integrate with platform-specific code? We're now stuck with this interface until we are willing to do an ABI-breaking change.

With all due respect, I consider that the responsibility is yours. In May 2019, Olivier started a thread where he was gathering feedback on what was to become the synchronization library: https://lists.llvm.org/pipermail/libcxx-dev/2019-May/000396.html. The thread got very little traction (see the June 2019 archive for the only few replies). Then, this patch was created and it was under review for roughly 6 months. Anyone paying attention to libc++ development could have seen this go by.

The reality is that if you ship libc++ on your platform, you should have someone caring about libc++ development to ensure it plays well with your platform. There are so many people using libc++ in various ways (vendors and others) that it's difficult for the core contributors to make sure that everybody "gets the memo" whenever we do something. Don't get me wrong, I very often go out of my way to ensure that vendors get notified when we make potentially contentious changes (I often ping a bunch of people that I know vend the library). However, all of that is based on basically me knowing (or thinking I know) who vends the library for what platform. None of it is really formalized, and misses can and do happen.

Furthermore, I would like to point out that libc++ goes out of its way to ensure that these sorts of things don't happen. We have availability annotations, for example, that allow vendors to control when users are able to start relying on features. In this case, I used those annotations for solving exactly the problem you encountered: I ensured that users were not going to start relying on the synchronization library on Apple platforms until we could confirm it was ABI stable, and I only flipped the switch one year after it had been implemented (see D96790). I've set things up so that any vendor could take advantage of those (they were previously Apple-specific), and I've always been surprised that no other vendors cared enough about QoI on their platforms to start using that infrastructure. If you're not using the tools provided to control what you ship on your platform, you can't expect someone who might not even know your existence to do it for you.

I would also like to point out that more than one year after pre-commit CI has been set up, we still don't have a FreeBSD and OpenBSD bots in BuildKite. Technically, that makes those two platforms unsupported. They are important platforms and we definitely want to support them, but we need a bit more energy from the vendor side here. Not very much, just a bit.

So, concretely, here are some action items for OpenBSD and FreeBSD vendors to become a well-behaved citizen and make sure such issues don't happen again in the future:

  1. Please set up BuildKite bots for OpenBSD and FreeBSD. I'll help you do it, it's easy. Then, we'll officially support those platforms and document it in our documentation.
  2. If you vend libc++, please add yourself to this Herald group: https://reviews.llvm.org/project/view/109/. I created that group recently so I could formalize the notion of "begin a libc++ vendor" and make it easier to ping everybody.
  3. Please have someone subscribe to libcxx-commits or use some Herald rules to be notified about reviews. This should ensure that, even if we don't ping the vendor group, there's some chance that you folks will see it. You don't have to engage in reviews or anything, but even just reviewing the subject lines once per week should be enough to ensure things like this don't happen again. That seems like a reasonable investment for an organization that ships libc++ on their platform.
  4. Setup availability annotations for your platform. See libcxx/include/__availability for details.

Now, concretely, please let me know if you think it's feasible to fix this now that you've shipped it. If so, we can work together on that. Otherwise, I'm sorry this happened, and the steps above should ensure it doesn't again.

This appears to have baked in some ABI details that don't permit efficient implementation on some of our supported platforms.

This is unfortunate. Have you folks shipped this yet? Perhaps it's not too late to fix things now if you've only released it very recently.

I am not sure who you mean when you say 'you' here. I am writing this as:

  • A member of the LLVM project and libc++ contributor
  • A libc++ consumer.

Since Howard's initial commit, libc++ has had strong ABI backwards compatibility guarantees for anything outside of the experimental namespace. Any time we define an interface between the library and the headers, that is an ABI that we need to support in the long term.

I came to this particular patch because one of my colleagues pointed me at atomic_wait as a possible replacement for platform-specific wrappers around futex-like abstractions in our code. I then tried to add FreeBSD and Windows support (the two platforms I care about for our code that do not have platform-specific code paths in libc++) and discovered that the ABI that we have committed to supporting cannot be implemented *at all* on 32-bit architectures with FreeBSD and cannot expose all of the functionality on Windows.

This is because this patch set decided to add a new pattern, rather than copying the same approach that every other atomic op uses: provide _1, _2, _4, _8, _16 and _n implementations. This also means that if Linux adds a futex64 system call (which I consider fairly probable, given that it is painful on Linux that you can't use a futex with pointers currently) then we can't take advantage of it without an ABI break.

In the future, please can you post an RFC from things that need to integrate with platform-specific code? We're now stuck with this interface until we are willing to do an ABI-breaking change.

With all due respect, I consider that the responsibility is yours. In May 2019, Olivier started a thread where he was gathering feedback on what was to become the synchronization library: https://lists.llvm.org/pipermail/libcxx-dev/2019-May/000396.html. The thread got very little traction (see the June 2019 archive for the only few replies). Then, this patch was created and it was under review for roughly 6 months. Anyone paying attention to libc++ development could have seen this go by.

The subject of that thread does not in any way mention that this requires platform-specific logic. Putting 'platform-specific futex-like' or similar in the subject would have made people pay attention. No one was tagged as a reviewer here to request perspectives from other platforms (dim or joerg, for example). It is the responsibility of the committer to ensure that a patch has adequate reviews. This did not happen here and we are now stuck with this ABI. I don't know how we can fix this, without adding a new version in the __2 namespace and eventually having a SONAME bump.

ldionne added a subscriber: dim.Jan 3 2022, 2:11 PM

This appears to have baked in some ABI details that don't permit efficient implementation on some of our supported platforms.

This is unfortunate. Have you folks shipped this yet? Perhaps it's not too late to fix things now if you've only released it very recently.

I am not sure who you mean when you say 'you' here. I am writing this as:

  • A member of the LLVM project and libc++ contributor
  • A libc++ consumer.

Right, "you" might not be the appropriate target here, I was thinking about whoever vends libc++ on FreeBSD (I wrongly assumed you were working for that vendor). My point is that nobody cares about ABI breaks for a platform unless the library is vended on that platform, in which case that vendor is the one who cares about ABI stability. In this case, that vendor is FreeBSD, and I consider that whoever vends libc++ on FreeBSD unfortunately missed the boat on this review. In FreeBSD's defence, it's true that we were not as well organized as we are now when we shipped this patch -- we now have a more detailed support policy and ties between various vendors and developers are somewhat clearer.

I will argue that breaking this ABI on Windows is not a big deal since nobody's vending libc++ on Windows. Or at least if they do, they are completely invisible to us ("us" being the people who develop libc++).

Since Howard's initial commit, libc++ has had strong ABI backwards compatibility guarantees for anything outside of the experimental namespace. Any time we define an interface between the library and the headers, that is an ABI that we need to support in the long term.

While I agree to some extent, I think it is important to highlight that ABI stability is not a property of the library itself, but a property of a specific vended instance of the library. For example, Chrome uses libc++, but they don't care about ABI stability because they link it statically. So we provide them with knobs to change libc++ behavior in various ways that are not ABI stable. On the other hand, some vendors like Apple do care about ABI stability for various reasons, and those vendors ensure that the ABI isn't broken on their platform, with the flavor of libc++ that they ship. And similarly, libc++ provides various knobs to help these vendors NOT break the ABI on their platforms.

In other words, over time, libc++ has grown from "ABI stability at all costs" to "providing tools to tweak it the way you want and be ABI stable if you want, but ABI unstable if you don't". That was the response we came up with because different consumers have different use cases for the library.

Where I'm going with this is that when Olivier created this review, I saw it had ABI implications and did what I needed to do for the vendor I represent. In the presence of a vast number of consumers of libc++, it's impossible for Olivier, or I, or any individual, to know about all the consumers that might not like the specific ABI that was selected. It might be obvious to you cause you've looked into it, but in the general case, the responsibility is on the vendors of the library to be at least a bit aware of what's going on and make sure they know what they are shipping on their platform. Cause that's the root cause of the issue -- this sub-optimal ABI was shipped on FreeBSD (hence locking them into it) without FreeBSD even knowing about it. That's really unfortunate, but like I said, libc++ provides ways to help vendors control what they ship on their platform, and they need only minimal involvement for this to happen. For instance, I would really welcome FreeBSD (and other vendors) taking advantage of the availability annotations I maintain. If FreeBSD had used those, it might have given some indication that FreeBSD might not want to ship those new symbols immediately.

I came to this particular patch because one of my colleagues pointed me at atomic_wait as a possible replacement for platform-specific wrappers around futex-like abstractions in our code. I then tried to add FreeBSD and Windows support (the two platforms I care about for our code that do not have platform-specific code paths in libc++) and discovered that the ABI that we have committed to supporting cannot be implemented *at all* on 32-bit architectures with FreeBSD and cannot expose all of the functionality on Windows.

Would you be willing to open a patch showing what changes are necessary for each of those? I think we can totally break the ABI on Windows since nobody's shipping it. And on 32-bit FreeBSD, if things don't work at all right now, then we can also "break the ABI" on that configuration since nobody can be depending on something that doesn't work. Perhaps things are not as bad as they seem?

Also, on a different note: I'm pretty serious about FreeBSD needing to implement pre-commit CI if it wants to be supported officially. It's a small investment, but it needs to happen for libc++ to keep FreeBSD as part of its list of supported platforms. @dim can you help with this?

I will argue that breaking this ABI on Windows is not a big deal since nobody's vending libc++ on Windows. Or at least if they do, they are completely invisible to us ("us" being the people who develop libc++).

Would you be willing to open a patch showing what changes are necessary for each of those? I think we can totally break the ABI on Windows since nobody's shipping it.

I would like to add some nuance to these statements here.

I do ship shared linked libc++ on Windows (as part of the llvm-mingw toolchain distribution), but without any strong ABI guarantees. Ideally ABI wouldn't break between releases, but if it does (especially in a fringe area) it's probably tolerable.

MSYS2 also ships an environment where libc++ is the standard system C++ library, linked shared. That environment is considered somewhat experimental afaik (CC @mati865), so if it breaks some package (which can be fixed by rebuilding), I think they'd consider it tolerable. If none of their packages actually have ended up using and depending on the aspects that may change (I haven't followed closely exactly how widespread it is), the ABI break should be totally transparent though.

dim added a subscriber: emaste.Jan 3 2022, 2:47 PM

Also, on a different note: I'm pretty serious about FreeBSD needing to implement pre-commit CI if it wants to be supported officially. It's a small investment, but it needs to happen for libc++ to keep FreeBSD as part of its list of supported platforms. @dim can you help with this?

I'm a FreeBSD committer but I don't have the access to setup e.g. bots on FreeBSD infrastructure. I hope @emaste can assist with this, as a FreeBSD Foundation member; I have discussed it with him before, and he seemed to be willing but probably runs out of time all the time... :)

I will argue that breaking this ABI on Windows is not a big deal since nobody's vending libc++ on Windows. Or at least if they do, they are completely invisible to us ("us" being the people who develop libc++).

Would you be willing to open a patch showing what changes are necessary for each of those? I think we can totally break the ABI on Windows since nobody's shipping it.

I would like to add some nuance to these statements here.

I do ship shared linked libc++ on Windows (as part of the llvm-mingw toolchain distribution), but without any strong ABI guarantees. Ideally ABI wouldn't break between releases, but if it does (especially in a fringe area) it's probably tolerable.

MSYS2 also ships an environment where libc++ is the standard system C++ library, linked shared. That environment is considered somewhat experimental afaik (CC @mati865), so if it breaks some package (which can be fixed by rebuilding), I think they'd consider it tolerable. If none of their packages actually have ended up using and depending on the aspects that may change (I haven't followed closely exactly how widespread it is), the ABI break should be totally transparent though.

Thanks for pinging me.
Actually I'd not consider it that much experimental now, it's available by default using pacman, it's mentioned on the website and people are interested in it.
Personally I won't hold back small breakage when it's justified (like here). Ping about such change welcome though so we can better prepare for it.

I will argue that breaking this ABI on Windows is not a big deal since nobody's vending libc++ on Windows. Or at least if they do, they are completely invisible to us ("us" being the people who develop libc++).

Would you be willing to open a patch showing what changes are necessary for each of those? I think we can totally break the ABI on Windows since nobody's shipping it.

I would like to add some nuance to these statements here.

I do ship shared linked libc++ on Windows (as part of the llvm-mingw toolchain distribution), but without any strong ABI guarantees. Ideally ABI wouldn't break between releases, but if it does (especially in a fringe area) it's probably tolerable.

MSYS2 also ships an environment where libc++ is the standard system C++ library, linked shared. That environment is considered somewhat experimental afaik (CC @mati865), so if it breaks some package (which can be fixed by rebuilding), I think they'd consider it tolerable. If none of their packages actually have ended up using and depending on the aspects that may change (I haven't followed closely exactly how widespread it is), the ABI break should be totally transparent though.

Thanks for clarifying, I wasn't aware of that. Like I said, it's incredibly difficult to keep track of everyone who ships libc++, and especially who provides what guarantees to their users.

I added @emaste and @mati865 to the libcxx-vendors "team" here so we can ping you in these situations.

MBkkt added a subscriber: MBkkt.Oct 14 2022, 12:02 PM
MBkkt added inline comments.
libcxx/src/atomic.cpp
38 ↗(On Diff #246218)

Why do you use timed wait here?
It's strange, and also different with macos(ulock) behavior

Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 12:02 PM
__simt__ added inline comments.Oct 14 2022, 12:44 PM
libcxx/src/atomic.cpp
38 ↗(On Diff #246218)

Because it would be incorrect otherwise.

There's an infinitesimal but not zero probability that the serial number being waited on can roll over, incremented by precisely the right value, and then you might think it didn't change when it did.

There are a lot of weird discussions we could have from here. Does this negate Futex? I don't think it should. Is it even worth worrying about (like, the computer might be hit by a neutron flying in from space much more often than this)? I think it's cheap to mitigate.

There's a judgement-call here about how long to wait. It's arbitrary.

We wouldn't need to do this if Linux supported 64-bit Futex. Partly because we would rarely use serial numbers like this, and also partly because 64-bit numbers don't roll over inside of the useful life span of computer hardware.

MBkkt added inline comments.Oct 14 2022, 12:55 PM
libcxx/src/atomic.cpp
38 ↗(On Diff #246218)

First of thanks for answer.

But if I understand correctly it's only about not atomic<int> behavior (any atomic which use waiter pool)
So for atomic<int> it's not needed (also why unsigned int not used in wait without waiter pool?)

Another thought, for ulock behavior should be same?