This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement `stop_token`
ClosedPublic

Authored by huixie90 on Mar 2 2023, 1:24 PM.

Details

Reviewers
ldionne
Mordante
Group Reviewers
Restricted Project
Summary

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
huixie90 marked 14 inline comments as done.Mar 28 2023, 11:53 AM

address feedback

huixie90 updated this revision to Diff 512113.Apr 10 2023, 4:23 AM

formatting

Could you extract atomic_unique_lock.h, intrusive_list.h and intrusive_shared_ptr.h into a separate review? I think we can check those in soon and that will greatly simplify this review. I don't know where to put them. It's not great, but I'd suggest leaving them in __stop_token and we can think about reorganization later.

libcxx/include/__stop_token/atomic_unique_lock.h
2

Since this is general purpose, it makes little sense to keep this under __stop_token/, but we also don't have any good place for these general-purpose utilities. No action item, but let's keep this at the back of our mind that we might need to reorganize things a bit in the future.

5

Can you add just a few simple tests for this class under libcxx/test/libcxx?

41

Would it make sense to use std::try_to_lock_t here to convey that we're taking the lock? Otherwise, this can be confusing with std::unique_lock.

67

Maybe we could call this __owns_lock() to mirror unique_lock?

88

I think a short comment explaining what __fail_to_lock and __state_after_lock are used for, it's not obvious from the names alone.

Actually, maybe __fail_to_lock should be called __give_up_locking? __fail_to_lock can sound like something that is called in case you've failed to take the lock, which is not quite correct.

118
libcxx/include/__stop_token/intrusive_list.h
2

Can you add just a few simple tests for this class under libcxx/test/libcxx?

32

__empty()? Or __is_empty()?

43–48

I think this is fine, but you should add a comment explaining that it's fine not to set __front->__next_ = nullptr, since it's not part of the linked list anymore.

58–60

If we don't have a prev, then we are necessarily the head. I would instead change this to:

if (...) { ... } else {
  _LIBCPP_ASSERT(__node == __head_, "message");
  __pop_front();
}

IOW, I think this function needs the precondition that __node is in the list, similarly to how std::list requires the iterator to be from this list when you call erase(). I think that's a really useful and important model to have.

libcxx/include/__stop_token/intrusive_shared_ptr.h
28–29 ↗(On Diff #512113)

Can you please add a comment documenting what this type is and what its contents should be for anyone specializing it?

31–32 ↗(On Diff #512113)

Let's also add a short comment documenting this class, since it's general purpose in nature and we might reuse it in the future.

35 ↗(On Diff #512113)

I'd make this explicit, like the one for std::shared_ptr.

libcxx/include/__stop_token/stop_callback.h
85

Isn't this always nullptr since you default initialize it above?

libcxx/include/__stop_token/stop_state.h
151
154

I think we shouldn't call this __remove if we know that __cb is not in the list anymore, which can be checked via __cb->__prev_ == nullptr && !__callback_list_.__is_head(__cb). Oh and that's exactly the check you make above with __potentially_executing_now!

And now I think we have the precondition we want for __remove.

ldionne added inline comments.May 5 2023, 10:15 AM
libcxx/include/__stop_token/stop_source.h
49

I would add a comment like:

// increment `other` first so that we don't hit 0 in case of self-assignment
73–75

A default constructed stop_token has no associated stop-state, and thus has not had stop requested.

I assume this is tested in stop_requested.pass.cpp?

libcxx/include/__stop_token/stop_state.h
43–51
54–56
83
89–90
124

Or something!

151
libcxx/test/std/thread/thread.stoptoken/stopcallback/cons.const.token.pass.cpp
12–13

You can replace this by // XFAIL: availability-synchronization_library-missing now. Here and everywhere else.

libcxx/test/std/thread/thread.stoptoken/stopcallback/dtor.pass.cpp
156

Maybe a bit more explicit?

158

Is there any reason why you don't initialize sc_ directly with std::make_unique<>? That would make it clear that ptr is not actually meaningful other than a temporary variable.

ldionne added inline comments.May 12 2023, 9:20 AM
libcxx/include/__stop_token/stop_state.h
35

Here I think we do need to store a pointer to a bool saying whether we've been destroyed to handle https://eel.is/c++draft/thread.stoptoken#stopcallback.cons-6 like we've discussed.

huixie90 updated this revision to Diff 523010.May 17 2023, 5:25 AM
huixie90 marked 27 inline comments as done.

rebase

huixie90 added inline comments.May 17 2023, 5:28 AM
libcxx/include/__stop_token/atomic_unique_lock.h
41

std::try_to_lock_t is non-blocking. here we are still blocking so might not be appropriate

libcxx/include/__stop_token/stop_callback.h
85

__state is a function parameter which is not always nullptr
__state_ is the member which is nullptr

ldionne accepted this revision.May 19 2023, 9:58 AM

I am extremely happy with the state of this patch! Thanks so much for all the iterations, I feel like we really have something good with great test coverage now. I'm especially happy about how things ended up after intrusive shared ptr and other generic bits were extracted, I think the code is really neat now.

Once the remaining comments have been addressed and the CI is passing, feel free to ship this. Thank you!

libcxx/include/__stop_token/stop_callback.h
43

Do you have a test for this typedef?

85

Do you have a test for the fact that the stop_callback won't get shared ownership of the __stop_state if the constructor runs the callback immediately (aka if a stop has been requested at the moment of calling this constructor).

Edit: I don't think there's a way to test that without poking into the library internals -- IOW I'm not sure this is user-observable.

85–86

I might combine these two ifs into if (__state && __state->__add_callback(this)).

libcxx/include/__stop_token/stop_source.h
29

Do you have a test for this? We need to make sure that:

  1. It exists
  2. The constructor is explicit
  3. Maybe that it's empty and trivially default constructible?
  4. That std::nostopstate (the variable, not the class) exists and is constexpr.
libcxx/include/__stop_token/stop_state.h
2

Recording the discussion: We considered whether we should move some of these classes to the dylib, and I think the answer should be no. There isn't *that much* code, it's not platform-specific (e.g. no including windows.h), and it would add backdeployment requirements. All in all, I don't think there's enough bang for the buck.

57
71

Could we add a _LIBCPP_ASSERT here to make sure that we don't overflow? Same below for the decrement: if we decrement when we are already at 0, I guess it means that there's a lifetime bug in the program somewhere and someone's using a stop_source that has already been destroyed. Since that's easy to catch, I think we should add those assertions.

110

Taking note of what you explained just now:

Since there's only one thread looping through the list and executing callbacks, it should be possible to store a single pointer to the currently-executing callback in the stop state. That would avoid requiring a __completed_ and a __destroyed_ member in the callback. Then in __remove_callback, we would atomic-wait on the currently-executing callback not being this.

I would suggest that we ship this patch with the current approach, and we investigate this improvement in a separate patch.

176
libcxx/include/__stop_token/stop_token.h
53

This private is redundant with the one above.

libcxx/test/std/thread/thread.stoptoken/stopcallback/cons.const.token.pass.cpp
96

Don't change anything, but we generally use snake_case for variable names. Just in the future.

libcxx/test/std/thread/thread.stoptoken/stopsource/cons.copy.pass.cpp
10

This is duplicated with the line below (please double-check throughout all your tests).

13

Here you're missing // XFAIL: availability-synchronization_library-missing, that would fix some of your CI.

libcxx/utils/generate_header_tests.py
1

To fix your CI issue, I think you need to add "stop_token": ["UNSUPPORTED: no-threads, availability-synchronization_library-missing"], to libcxx/utils/generate_feature_test_macro_components.py.

This revision is now accepted and ready to land.May 19 2023, 9:58 AM
huixie90 updated this revision to Diff 523897.May 19 2023, 12:33 PM
huixie90 marked 9 inline comments as done.

address review comments

huixie90 updated this revision to Diff 524049.May 20 2023, 12:38 PM
huixie90 marked an inline comment as done.

CI

huixie90 updated this revision to Diff 524387.May 22 2023, 10:39 AM
  • [libc++] Implement stop_token
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: NoQ. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript

Something seems wrong with this patch. It contains multiple unrelated changes.

huixie90 removed projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.
huixie90 removed subscribers: JDevlieghere, jholewinski, arsenm and 98 others.
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: herhut. · View Herald Transcript
Herald added a reviewer: rriddle. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: NoQ. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
huixie90 removed projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2023, 12:47 PM
Mordante accepted this revision.May 23 2023, 10:55 AM

Thanks for working on this. I've a few nits, otherwise LGTM!

libcxx/include/__stop_token/stop_callback.h
79
libcxx/include/__stop_token/stop_state.h
35
53

please search for other places too.

65
libcxx/test/std/thread/thread.stoptoken/nostopstate/cons.default.pass.cpp
37 ↗(On Diff #524441)

How about testing the type?

libcxx/test/std/thread/thread.stoptoken/stopcallback/cons.const.token.pass.cpp
62

I really dislike using auto here. Either use a real type or the same_as.
In general I'm not a fan of auto, but here we should make sure we get the expected types. Or is the token an unspecified type?

huixie90 updated this revision to Diff 525105.May 24 2023, 4:12 AM
huixie90 marked 6 inline comments as done.

address comments

ldionne added inline comments.May 26 2023, 9:07 AM
libcxx/include/__stop_token/stop_token.h
26

I think my comment about fixing the backdeployment CI using availability-synchronization_library-missing was misguided. I'm not sure, but it seems like adding _LIBCPP_AVAILABILITY_SYNC here should tell the compiler that the class is only available when the synchronization library is available, and it should avoid getting an error when you merely #include <stop_token>. If you try to use std::stop_token, then it would fail with an error message but that's intended and it wouldn't impact the generated tests like modules_include.sh.cpp.

huixie90 updated this revision to Diff 526961.May 31 2023, 1:57 AM

indentation

huixie90 updated this revision to Diff 527018.May 31 2023, 6:41 AM

try again

huixie90 updated this revision to Diff 527065.May 31 2023, 8:44 AM

try again

huixie90 updated this revision to Diff 527173.May 31 2023, 12:52 PM

fix apple platform

ldionne accepted this revision.Jun 2 2023, 8:43 AM

You should rebase onto main, that'll fix CI. LGTM once CI is green.

huixie90 updated this revision to Diff 527888.Jun 2 2023, 9:56 AM
huixie90 marked an inline comment as done.

rebase

huixie90 updated this revision to Diff 527923.Jun 2 2023, 11:54 AM

fix stop_token lit_header_restrictions

Can you confirm whether this landed as b77e50e6aef5650d1ce0ce7ad1a41049395b9426? If so, let's close this.

Sorry for the belated feedback - I only just noticed that the stop_token implementation had been published.

The implementation looks like it's doing all the right things from a correctness point of view, which is good!

I have a few suggestions for some changes that might help reduce the number of atomic ops needed for some operations which you could consider if performance was problematic.

libcxx/include/__stop_token/stop_source.h
65

The way that you have separated the ref-count from the stop_source_counter in the state_t means that destroying a stop_source object now needs to update two separate atomic variables, resulting in two atomic operations instead of one.
If you instead combine both fields into a single 64-bit atomic value, then you can combine this into a single atomic decrement on destruction.

libcxx/include/__stop_token/stop_state.h
59

Initialising the ref-count to zero means that you need to do an atomic increment when the object is first constructed.

Since you know that a state is only ever allocated by a stop_source, which will immediately take a reference, you could just initialise the reference to 1 here and have the stop_source constructor's intrusive_shared_ptr "adopt" the reference.
You could also initialise the ref-count to 1 on construction as well.
This would save two atomic op on every stop source construction.

Further, if you merged the ref-count and state_t atomics into a single 64-bit atomic then you could reduce the destruction of a stop_source to a single atomic subtraction.

Making both of these changes would allow reducing the number of atomics executed when constructing a stop_source and immediately destructing it from 4 atomic ops in the current implementation, to just 1 atomic operation.

133

Note that you don't need to lock again here if you know that this callback was the last callback in the list, since once you have set the "stop-requested" flag, then no more callback items should be enqueued. This should save you an additional atomic lock/unlock at the cost of a branch here.

160

If you were to combine the state_t and the ref-count you could potentially merge the ref-count increment with the unlock into a single atomic operation by adding (__ref_count_increment - _lock_bit) atomically (in the case where the stop_callback is being constructed with a const stop_token&). This would save an extra atomic op.

Ideally the compiler would be able to merge consecutive atomic ops like this, but I don't think I've seen clang do such a thing before.

Since we want to investigate some improvements, I would start by making stop_token experimental. Then we can add some benchmarks that faithfully represent the intended use cases for stop_token, not just some random benchmarks that test the performance of some arbitrary operations. @lewissbaker are you aware of any benchmarks that would be representative of the way stop_token intends to be used?

Then we can try a few things (in no order):

  1. Try to remove some atomic operations as laid out by @lewissbaker above
  2. Try implementing the lock with try { mutex.lock(); } catch (...) { ... }; instead and see if it's faster. We need try-catch cause the methods on stop_token are noexcept.
  3. Try to see if there's something obviously wrong with our atomic_unique_lock implementation. Maybe we're using compare_exchange_weak when we should be using strong, or maybe there's some other perf issue with our implementation.

I would also not eliminate the possibility that the current implementation actually performs quite well for the intended use cases of stop_token. But the benchmarks will give us more insight into that.

lewissbaker added a comment.EditedJun 28 2023, 11:14 PM

@lewissbaker are you aware of any benchmarks that would be representative of the way stop_token intends to be used?

I wrote up a few basic tests/benchmarks for an older implementation of the design in cppcoro here:
https://github.com/lewissbaker/cppcoro/blob/master/test/cancellation_token_tests.cpp

Although I don't think these tests are necessarily representative of real-world usage.

There are a few use-cases that are going to be common for stop_tokens which could be good candidates for benchmarks:

  1. The std::jthread use-case.

In this case, you will have a single thread created by std::jthread consuming the stop_token:

  • registering/deregistering callbacks, usually just one at a time (e.g. in implementation of condition_variable::wait functions that take a stop-token), but sometimes might register one or more long-lived callbacks combined with single ephemeral callbacks.
  • polling for a stop_request / stop_possible
  • copying the stop_token (e.g. when passing by-value to nested function calls)

And then a main thread will eventually call request_stop() from the destructor of the std::jthread.

A rough sketch:

int main() {
  std::uint64_t count = 0;
  {
    std::jthread t([&](std::stop_token st) {
      std::uint64_t local_count = 0;
      while (!st.stop_requested()) {
        std::stop_callback cb{st, [&]() noexcept  {}};
        ++local_count;
      }
      count = local_count;
    });
    std::this_thread::sleep_for(30s);
  }
  std::cout << "Thread did " << count << " callback registration/deregistration in 30s\n";
}

If you also have an implementation of the condition_variable::wait() functions then you could also try calling the stop_token-taking versions of those instead.

  1. Async use-case #1 - Shutdown stop_token

At app startup, the app creates a single stop_source which it will then pass an associated stop_token to every request.

Assume a thread-pool handles these requests and for each request it polls for stop_requested(), then attaches a stop-callback, does some work, then detaches the stop-callback some time later.
The lifetime of requests/callbacks would overlap with other requests/callback from the same thread.

Say something like each thread keeping a circular buffer of N stop-callbacks and destroying the stop-callbacks in FIFO order

Rough sketch:

constexpr size_t thread_count = 20;
constexpr size_t concurrent_request_count = 1000;
std::atomic<bool> ready{false};

struct dummy_stop_callback { void operator()() const noexcept {} };

void thread_func(uint64_t& count, std::stop_token st) {
  std::vector<std::optional<std::stop_callback<dummy_stop_callback>>> cbs(concurrent_request_count);

  ready.wait(false);

  std::uint32_t index = 0;
  std::uint64_t local_count = 0;
  while (!st.stop_requested()) {
    cbs[index].emplace(st, dummy_stop_callback{});
    index = (index + 1) % concurrent_request_count;
    ++local_count;
  }
  count = local_count;
} 

int main() {
  std::vector<std::uint64_t> counts(thread_count, 0);
  std::stop_source ss;
  {
    std::vector<std::jthread> threads;

    {
      auto release_on_exit = on_scope_exit([&]() { ready = true; ready.notify_all(); });
      for (size_t i = 0; i < thread_count; ++i) {
        threads.emplace_back(thread_func, counts[i], ss.get_token());
      }
    }
  
    std::this_thread::sleep_for(std::chrono::seconds(10));

    ss.request_stop();
  }

  std::uint64_t total_count = std::reduce(counts.begin(), counts.end());

  std::cout << "Total iterations of " << thread_count << " threads for 10s was " << total_count << "\n";
}

Variations on this might involve deregistering stop-callbacks on different threads than the threads they were registered on.
But this is more involved to setup.

  1. Async use-case #2 - Per-request stop_source

This is a variation on the previous use-case, but where each task ends up creating a bunch of different chained stop_sources to simulate the concurrent sub-task nature of some operations.
e.g. for each request, we create a new stop_source, A, and then attach a stop_callback to the global shutdown stop-token which then calls request_stop() on the new stop_source if called.
Then we simulate passing the a stop_token to two child operations, each of which then creates its own stop_source (B and C) and subscribes a stop_callback to A which calls request_stop() on B or C respectively.

This can be done to an arbitrary number of ops per request, with varying tree widths/depths.

This should help to evaluate the performance of creation/destruction of stop_source and non-contended registration of stop-callbacks with less of a focus on lots of threads hitting the same stop-state.

You could also try a variation on this without the top-level shutdown stop_callback.

It is worth noting, however, that with P2300 there is also a proposed std::in_place_stop_source, which provides an alternative that does not heap-allocate/ref-count but for which the caller is required to keep the stop-source alive until all child ops that use the stop-tokens are done with it, and so I'd expect a lot of the structured async use-cases to eventually use that where possible instead of std::stop_source. This in-place version was not appropriate for jthread since that needs to be movable.

Thanks a lot, this is great! @huixie90 I think this is something we can work with -- let's make this experimental, add a few benchmarks like laid out above and then iterate on that!

ldionne closed this revision.Jul 7 2023, 6:21 AM

I created https://github.com/llvm/llvm-project/issues/63738 to track these optimizations. Let's close this patch since it has been shipped in b77e50e6aef5650d1ce0ce7ad1a41049395b9426