[libc++] use mutex in stop_token
These are the results of the benchmark provided by @lewissbaker.
Higher numbers are better.
- Baseline (using std::atomic)
Test1: jthread
Thread did 92610720 callback registration/deregistration in 30s
Thread did 94441617 callback registration/deregistration in 30s
Thread did 95045828 callback registration/deregistration in 30s
Thread did 92417235 callback registration/deregistration in 30s
Thread did 93926989 callback registration/deregistration in 30s
Thread did 94425982 callback registration/deregistration in 30s
Thread did 91251163 callback registration/deregistration in 30s
Thread did 92966481 callback registration/deregistration in 30s
Thread did 92799169 callback registration/deregistration in 30s
Thread did 92522792 callback registration/deregistration in 30s
Test2: async shutdown
Total iterations of 20 threads for 10s was 17164430
Total iterations of 20 threads for 10s was 15261966
Total iterations of 20 threads for 10s was 14647555
Total iterations of 20 threads for 10s was 14204384
Total iterations of 20 threads for 10s was 13803872
Total iterations of 20 threads for 10s was 13950054
Total iterations of 20 threads for 10s was 13941287
Total iterations of 20 threads for 10s was 14106324
Total iterations of 20 threads for 10s was 13442434
Total iterations of 20 threads for 10s was 13770722
- Using std::mutex
Test1: jthread
Thread did 176115775 callback registration/deregistration in 30s
Thread did 175788755 callback registration/deregistration in 30s
Thread did 175913759 callback registration/deregistration in 30s
Thread did 175611145 callback registration/deregistration in 30s
Thread did 175465895 callback registration/deregistration in 30s
Thread did 176001367 callback registration/deregistration in 30s
Thread did 176113327 callback registration/deregistration in 30s
Thread did 175989687 callback registration/deregistration in 30s
Thread did 175891133 callback registration/deregistration in 30s
Thread did 174903412 callback registration/deregistration in 30s
Test2: async shutdown
Total iterations of 20 threads for 10s was 13181221
Total iterations of 20 threads for 10s was 11502741
Total iterations of 20 threads for 10s was 10966349
Total iterations of 20 threads for 10s was 10615504
Total iterations of 20 threads for 10s was 10795518
Total iterations of 20 threads for 10s was 10297964
Total iterations of 20 threads for 10s was 10800617
Total iterations of 20 threads for 10s was 10540040
Total iterations of 20 threads for 10s was 10687574
Total iterations of 20 threads for 10s was 11015713
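Averaged over the ten runs, the std::mutex implementation is about 89% faster than the std::atomic baseline in Test1, but has about 23% lower throughput in Test2 (equivalently, the baseline completes about 31% more iterations).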
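The percentages can be recomputed from the runs listed above. The following is a minimal sketch that takes the mean of each set of ten runs and reports the change relative to the std::atomic baseline; the helper names (`mean`, `relative_change_percent`) and the variable names are illustrative only and are not part of the original benchmark.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of benchmark samples.
inline double mean(const std::vector<double>& samples) {
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), 0.0) /
         static_cast<double>(samples.size());
}

// Change of `candidate` relative to `baseline`, in percent.
// Positive means the candidate has higher throughput than the baseline.
inline double relative_change_percent(const std::vector<double>& baseline,
                                      const std::vector<double>& candidate) {
  return (mean(candidate) / mean(baseline) - 1.0) * 100.0;
}

// Samples copied from the runs listed above.
const std::vector<double> atomic_test1 = {
    92610720, 94441617, 95045828, 92417235, 93926989,
    94425982, 91251163, 92966481, 92799169, 92522792};
const std::vector<double> mutex_test1 = {
    176115775, 175788755, 175913759, 175611145, 175465895,
    176001367, 176113327, 175989687, 175891133, 174903412};
const std::vector<double> atomic_test2 = {
    17164430, 15261966, 14647555, 14204384, 13803872,
    13950054, 13941287, 14106324, 13442434, 13770722};
const std::vector<double> mutex_test2 = {
    13181221, 11502741, 10966349, 10615504, 10795518,
    10297964, 10800617, 10540040, 10687574, 11015713};
```

Calling `relative_change_percent(atomic_test1, mutex_test1)` gives the Test1 speedup, and `relative_change_percent(atomic_test2, mutex_test2)` gives the Test2 change as a negative number. Note that the Test2 figure depends on the direction of the comparison: swapping the arguments expresses the same gap as the baseline's advantage over std::mutex, which is a larger percentage.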
These benchmarks should be ported to Google Benchmark and changed to time how long N registrations take, rather than counting how many complete in a fixed interval.