This is an archive of the discontinued LLVM Phabricator instance.

[libc++] [DO NOT MERGE] benchmark stop_token and use std::mutex in the implementation of stop_token
AbandonedPublic

Authored by ldionne on Jul 7 2023, 4:05 AM.

Details

Reviewers
EricWF
lewissbaker
huixie90
Group Reviewers
Restricted Project
Summary

[libc++] use mutex in stop_token

These are the results from the benchmark provided by @lewissbaker

Higher numbers are better.

  1. Baseline (using std::atomic)

Test1: jthread

Thread did 92610720 callback registration/deregistration in 30s
Thread did 94441617 callback registration/deregistration in 30s
Thread did 95045828 callback registration/deregistration in 30s
Thread did 92417235 callback registration/deregistration in 30s
Thread did 93926989 callback registration/deregistration in 30s
Thread did 94425982 callback registration/deregistration in 30s
Thread did 91251163 callback registration/deregistration in 30s
Thread did 92966481 callback registration/deregistration in 30s
Thread did 92799169 callback registration/deregistration in 30s
Thread did 92522792 callback registration/deregistration in 30s

Test2: async shutdown

Total iterations of 20 threads for 10s was 17164430
Total iterations of 20 threads for 10s was 15261966
Total iterations of 20 threads for 10s was 14647555
Total iterations of 20 threads for 10s was 14204384
Total iterations of 20 threads for 10s was 13803872
Total iterations of 20 threads for 10s was 13950054
Total iterations of 20 threads for 10s was 13941287
Total iterations of 20 threads for 10s was 14106324
Total iterations of 20 threads for 10s was 13442434
Total iterations of 20 threads for 10s was 13770722

  2. Using std::mutex

Test1: jthread

Thread did 176115775 callback registration/deregistration in 30s
Thread did 175788755 callback registration/deregistration in 30s
Thread did 175913759 callback registration/deregistration in 30s
Thread did 175611145 callback registration/deregistration in 30s
Thread did 175465895 callback registration/deregistration in 30s
Thread did 176001367 callback registration/deregistration in 30s
Thread did 176113327 callback registration/deregistration in 30s
Thread did 175989687 callback registration/deregistration in 30s
Thread did 175891133 callback registration/deregistration in 30s
Thread did 174903412 callback registration/deregistration in 30s

Test2: async shutdown

Total iterations of 20 threads for 10s was 13181221
Total iterations of 20 threads for 10s was 11502741
Total iterations of 20 threads for 10s was 10966349
Total iterations of 20 threads for 10s was 10615504
Total iterations of 20 threads for 10s was 10795518
Total iterations of 20 threads for 10s was 10297964
Total iterations of 20 threads for 10s was 10800617
Total iterations of 20 threads for 10s was 10540040
Total iterations of 20 threads for 10s was 10687574
Total iterations of 20 threads for 10s was 11015713

Using std::mutex in the implementation gives about an 80-90% speedup in Test1, but a slowdown of roughly 25% in Test2 (about 11.0M iterations on average, versus about 14.4M for the baseline).
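As a rough cross-check of these percentages, the means of the runs listed above can be compared directly. The sketch below is only illustrative arithmetic; the helper names (`mean`, `percent_change`, `print_summary`) are made up for this example and are not part of the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of benchmark runs.
double mean(const std::vector<double>& runs) {
  assert(!runs.empty());
  return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Relative change of `candidate` versus `baseline`, in percent.
// Positive means more iterations (faster), negative means fewer (slower).
double percent_change(double baseline, double candidate) {
  return (candidate - baseline) / baseline * 100.0;
}

// Values copied from the runs listed in the summary.
const std::vector<double> test1_atomic = {
    92610720, 94441617, 95045828, 92417235, 93926989,
    94425982, 91251163, 92966481, 92799169, 92522792};
const std::vector<double> test1_mutex = {
    176115775, 175788755, 175913759, 175611145, 175465895,
    176001367, 176113327, 175989687, 175891133, 174903412};
const std::vector<double> test2_atomic = {
    17164430, 15261966, 14647555, 14204384, 13803872,
    13950054, 13941287, 14106324, 13442434, 13770722};
const std::vector<double> test2_mutex = {
    13181221, 11502741, 10966349, 10615504, 10795518,
    10297964, 10800617, 10540040, 10687574, 11015713};

// Prints the relative change of std::mutex versus the std::atomic baseline.
void print_summary() {
  std::printf("Test1 (jthread):        %+.1f%%\n",
              percent_change(mean(test1_atomic), mean(test1_mutex)));
  std::printf("Test2 (async shutdown): %+.1f%%\n",
              percent_change(mean(test2_atomic), mean(test2_mutex)));
}
```

Calling `print_summary()` reports a gain of roughly 88% for Test1 and a loss of roughly 23% for Test2. Put the other way around, the std::atomic baseline does about 30% more iterations than std::mutex in Test2.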

Diff Detail

Event Timeline

huixie90 created this revision. Jul 7 2023, 4:05 AM
Herald added a project: Restricted Project. Jul 7 2023, 4:05 AM
huixie90 requested review of this revision. Jul 7 2023, 4:05 AM
Herald added a reviewer: Restricted Project.
huixie90 edited the summary of this revision. Jul 7 2023, 4:12 AM
huixie90 added reviewers: EricWF, lewissbaker, ldionne.
huixie90 added a subscriber: lewissbaker.

This is actually quite nice! The second test (libcxx/benchmarks/stop_token.async_shutdown.bench.cpp) is the one that's more realistic (by far), so IMO it's the one that we should try to optimize for.

However, here are the timings I got on my arm64 Mac Studio:

With std::atomic:
stop_token.async_shutdown.pass.cpp: Total iterations of 20 threads for 10s was 3 100 911
stop_token.jthread.pass.cpp: Thread did 1 434 114 717 callback registration/deregistration in 30s

With std::mutex:
stop_token.async_shutdown.pass.cpp: Total iterations of 20 threads for 10s was 228 318 798
stop_token.jthread.pass.cpp: Thread did 1 618 248 541 callback registration/deregistration in 30s

This is rather bad: in the async shutdown test, the std::atomic version completes roughly 70 times fewer iterations than the std::mutex version. I think there's probably a significant problem with the implementation of our atomic notify functions. I think we need to figure out that bug before we can draw any conclusions about stop_token, since the current state is just bonkers.

libcxx/benchmarks/stop_token.async_shutdown.bench.cpp, line 1

We should modify these benchmarks to instead use GoogleBenchmark and time how long it takes to do N registrations.

I think we can abandon this since this is now https://github.com/llvm/llvm-project/pull/69117.

ldionne commandeered this revision. Nov 3 2023, 8:26 AM
ldionne edited reviewers, added: huixie90; removed: ldionne.

[GH PR Transition] Commandeering to abandon.

ldionne abandoned this revision. Nov 3 2023, 8:27 AM