Page MenuHomePhabricator

[libc++] Replaces std::sort by Bitset sorting algorithm.
Needs ReviewPublic

Authored by minjaehwang on Dec 14 2020, 9:27 AM.

Details

Reviewers
jdoerfert
Group Reviewers
Restricted Project
Summary

Bitset sorting shows up to 5x improvements on sorting primitive types and 1.5x improvements for complex types such as std::string.
The evaluation is done by the libcxx algorithms benchmark.

std::sort with Bitset sorting algorithm.

Running projects/libcxx/benchmarks/algorithms.libcxx.out
Run on (12 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 15360K (x1)
Load Average: 0.57, 0.76, 0.55
------------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
------------------------------------------------------------------------------------------------------------------
BM_Sort_uint32_Random_1                                                      3.03 ns         3.02 ns    230162432
BM_Sort_uint32_Random_4                                                      4.09 ns         4.09 ns    170917888
BM_Sort_uint32_Random_16                                                     8.46 ns         8.46 ns     80740352
BM_Sort_uint32_Random_64                                                     13.1 ns         13.1 ns     52690944
BM_Sort_uint32_Random_256                                                    16.4 ns         16.4 ns     41680896
BM_Sort_uint32_Random_1024                                                   18.8 ns         18.8 ns     36438016
BM_Sort_uint32_Random_16384                                                  23.0 ns         23.0 ns     29884416
BM_Sort_uint32_Random_262144                                                 27.1 ns         27.1 ns     25427968
BM_Sort_uint32_Ascending_1                                                   3.27 ns         3.26 ns    213123072
BM_Sort_uint32_Ascending_4                                                  0.981 ns        0.980 ns    716177408
BM_Sort_uint32_Ascending_16                                                 0.879 ns        0.879 ns    800063488
BM_Sort_uint32_Ascending_64                                                  3.66 ns         3.66 ns    190840832
BM_Sort_uint32_Ascending_256                                                 3.45 ns         3.45 ns    202375168
BM_Sort_uint32_Ascending_1024                                                3.33 ns         3.33 ns    210239488
BM_Sort_uint32_Ascending_16384                                               3.26 ns         3.26 ns    213385216
BM_Sort_uint32_Ascending_262144                                              3.23 ns         3.23 ns    215482368
BM_Sort_uint32_Descending_1                                                  3.27 ns         3.26 ns    214695936
BM_Sort_uint32_Descending_4                                                  1.26 ns         1.26 ns    556269568
BM_Sort_uint32_Descending_16                                                 4.88 ns         4.88 ns    144179200
BM_Sort_uint32_Descending_64                                                 4.88 ns         4.88 ns    139460608
BM_Sort_uint32_Descending_256                                                4.86 ns         4.86 ns    144179200
BM_Sort_uint32_Descending_1024                                               4.58 ns         4.58 ns    153354240
BM_Sort_uint32_Descending_16384                                              4.47 ns         4.47 ns    156237824
BM_Sort_uint32_Descending_262144                                             4.46 ns         4.46 ns    156499968
BM_Sort_uint32_SingleElement_1                                               3.28 ns         3.27 ns    214433792
BM_Sort_uint32_SingleElement_4                                              0.978 ns        0.977 ns    713293824
BM_Sort_uint32_SingleElement_16                                             0.910 ns        0.909 ns    772276224
BM_Sort_uint32_SingleElement_64                                              3.72 ns         3.72 ns    187695104
BM_Sort_uint32_SingleElement_256                                             3.48 ns         3.48 ns    201588736
BM_Sort_uint32_SingleElement_1024                                            3.43 ns         3.43 ns    205258752
BM_Sort_uint32_SingleElement_16384                                           3.31 ns         3.31 ns    211550208
BM_Sort_uint32_SingleElement_262144                                          3.42 ns         3.42 ns    212336640
BM_Sort_uint32_PipeOrgan_1                                                   3.39 ns         3.38 ns    206045184
BM_Sort_uint32_PipeOrgan_4                                                   1.08 ns         1.08 ns    655622144
BM_Sort_uint32_PipeOrgan_16                                                  2.04 ns         2.04 ns    338165760
BM_Sort_uint32_PipeOrgan_64                                                  8.71 ns         8.71 ns     79429632
BM_Sort_uint32_PipeOrgan_256                                                 6.95 ns         6.95 ns     98828288
BM_Sort_uint32_PipeOrgan_1024                                                7.05 ns         7.04 ns     96993280
BM_Sort_uint32_PipeOrgan_16384                                               8.40 ns         8.40 ns     82313216
BM_Sort_uint32_PipeOrgan_262144                                              9.19 ns         9.19 ns     73138176
BM_Sort_uint64_Random_1                                                      2.86 ns         2.85 ns    243007488
BM_Sort_uint64_Random_4                                                      4.06 ns         4.06 ns    171442176
BM_Sort_uint64_Random_16                                                     8.42 ns         8.41 ns     81264640
BM_Sort_uint64_Random_64                                                     12.9 ns         12.9 ns     52953088
BM_Sort_uint64_Random_256                                                    16.4 ns         16.4 ns     42205184
BM_Sort_uint64_Random_1024                                                   18.7 ns         18.7 ns     36700160
BM_Sort_uint64_Random_16384                                                  22.9 ns         22.9 ns     30146560
BM_Sort_uint64_Random_262144                                                 27.1 ns         27.1 ns     25427968
BM_Sort_uint64_Ascending_1                                                   2.85 ns         2.84 ns    246415360
BM_Sort_uint64_Ascending_4                                                  0.859 ns        0.858 ns    819462144
BM_Sort_uint64_Ascending_16                                                 0.817 ns        0.817 ns    855638016
BM_Sort_uint64_Ascending_64                                                  3.73 ns         3.73 ns    188219392
BM_Sort_uint64_Ascending_256                                                 3.55 ns         3.55 ns    196870144
BM_Sort_uint64_Ascending_1024                                                3.44 ns         3.44 ns    202899456
BM_Sort_uint64_Ascending_16384                                               3.34 ns         3.34 ns    211288064
BM_Sort_uint64_Ascending_262144                                              3.30 ns         3.30 ns    211288064
BM_Sort_uint64_Descending_1                                                  2.85 ns         2.84 ns    246415360
BM_Sort_uint64_Descending_4                                                  1.31 ns         1.31 ns    532938752
BM_Sort_uint64_Descending_16                                                 3.61 ns         3.61 ns    194248704
BM_Sort_uint64_Descending_64                                                 4.85 ns         4.85 ns    144179200
BM_Sort_uint64_Descending_256                                                5.00 ns         5.00 ns    136839168
BM_Sort_uint64_Descending_1024                                               4.70 ns         4.70 ns    148635648
BM_Sort_uint64_Descending_16384                                              4.53 ns         4.53 ns    154402816
BM_Sort_uint64_Descending_262144                                             4.49 ns         4.49 ns    155975680
BM_Sort_uint64_SingleElement_1                                               2.86 ns         2.85 ns    246677504
BM_Sort_uint64_SingleElement_4                                              0.854 ns        0.853 ns    821297152
BM_Sort_uint64_SingleElement_16                                             0.818 ns        0.817 ns    854327296
BM_Sort_uint64_SingleElement_64                                              3.78 ns         3.78 ns    185335808
BM_Sort_uint64_SingleElement_256                                             3.60 ns         3.60 ns    194248704
BM_Sort_uint64_SingleElement_1024                                            3.49 ns         3.49 ns    199229440
BM_Sort_uint64_SingleElement_16384                                           3.33 ns         3.33 ns    207618048
BM_Sort_uint64_SingleElement_262144                                          3.31 ns         3.31 ns    209977344
BM_Sort_uint64_PipeOrgan_1                                                   2.87 ns         2.86 ns    245366784
BM_Sort_uint64_PipeOrgan_4                                                  0.994 ns        0.994 ns    702808064
BM_Sort_uint64_PipeOrgan_16                                                  1.96 ns         1.96 ns    355729408
BM_Sort_uint64_PipeOrgan_64                                                  8.35 ns         8.35 ns     81526784
BM_Sort_uint64_PipeOrgan_256                                                 7.04 ns         7.04 ns     96731136
BM_Sort_uint64_PipeOrgan_1024                                                7.11 ns         7.11 ns     96468992
BM_Sort_uint64_PipeOrgan_16384                                               8.42 ns         8.41 ns     81526784
BM_Sort_uint64_PipeOrgan_262144                                              9.23 ns         9.23 ns     74448896
BM_Sort_pair<uint32, uint32>_Random_1                                        3.59 ns         3.59 ns    195821568
BM_Sort_pair<uint32, uint32>_Random_4                                        5.30 ns         5.30 ns    128450560
BM_Sort_pair<uint32, uint32>_Random_16                                       20.8 ns         20.8 ns     33030144
BM_Sort_pair<uint32, uint32>_Random_64                                       30.2 ns         30.2 ns     22806528
BM_Sort_pair<uint32, uint32>_Random_256                                      36.8 ns         36.8 ns     18874368
BM_Sort_pair<uint32, uint32>_Random_1024                                     41.7 ns         41.7 ns     16515072
BM_Sort_pair<uint32, uint32>_Random_16384                                    52.2 ns         52.2 ns     13107200
BM_Sort_pair<uint32, uint32>_Random_262144                                   60.3 ns         60.3 ns     11010048
BM_Sort_pair<uint32, uint32>_Ascending_1                                     3.43 ns         3.43 ns    203423744
BM_Sort_pair<uint32, uint32>_Ascending_4                                     1.75 ns         1.75 ns    399245312
BM_Sort_pair<uint32, uint32>_Ascending_16                                    2.36 ns         2.36 ns    297271296
BM_Sort_pair<uint32, uint32>_Ascending_64                                    1.38 ns         1.38 ns    506986496
BM_Sort_pair<uint32, uint32>_Ascending_256                                   1.33 ns         1.33 ns    525860864
BM_Sort_pair<uint32, uint32>_Ascending_1024                                  1.15 ns         1.15 ns    612368384
BM_Sort_pair<uint32, uint32>_Ascending_16384                                 1.00 ns         1.00 ns    697827328
BM_Sort_pair<uint32, uint32>_Ascending_262144                               0.992 ns        0.992 ns    700448768
BM_Sort_pair<uint32, uint32>_Descending_1                                    3.47 ns         3.46 ns    202637312
BM_Sort_pair<uint32, uint32>_Descending_4                                    2.39 ns         2.39 ns    292814848
BM_Sort_pair<uint32, uint32>_Descending_16                                   5.72 ns         5.72 ns    119799808
BM_Sort_pair<uint32, uint32>_Descending_64                                   3.58 ns         3.58 ns    197394432
BM_Sort_pair<uint32, uint32>_Descending_256                                  3.25 ns         3.25 ns    215744512
BM_Sort_pair<uint32, uint32>_Descending_1024                                 2.87 ns         2.87 ns    243531776
BM_Sort_pair<uint32, uint32>_Descending_16384                                2.71 ns         2.71 ns    258211840
BM_Sort_pair<uint32, uint32>_Descending_262144                               2.84 ns         2.84 ns    260571136
BM_Sort_pair<uint32, uint32>_SingleElement_1                                 3.65 ns         3.64 ns    197132288
BM_Sort_pair<uint32, uint32>_SingleElement_4                                 1.82 ns         1.82 ns    384040960
BM_Sort_pair<uint32, uint32>_SingleElement_16                                2.28 ns         2.28 ns    305922048
BM_Sort_pair<uint32, uint32>_SingleElement_64                                1.84 ns         1.84 ns    379060224
BM_Sort_pair<uint32, uint32>_SingleElement_256                               1.58 ns         1.58 ns    442236928
BM_Sort_pair<uint32, uint32>_SingleElement_1024                              1.40 ns         1.40 ns    500957184
BM_Sort_pair<uint32, uint32>_SingleElement_16384                             1.31 ns         1.31 ns    533463040
BM_Sort_pair<uint32, uint32>_SingleElement_262144                            1.31 ns         1.31 ns    535035904
BM_Sort_pair<uint32, uint32>_PipeOrgan_1                                     3.47 ns         3.46 ns    203423744
BM_Sort_pair<uint32, uint32>_PipeOrgan_4                                     1.83 ns         1.83 ns    380370944
BM_Sort_pair<uint32, uint32>_PipeOrgan_16                                    2.89 ns         2.89 ns    242745344
BM_Sort_pair<uint32, uint32>_PipeOrgan_64                                    4.47 ns         4.47 ns    155713536
BM_Sort_pair<uint32, uint32>_PipeOrgan_256                                   3.10 ns         3.10 ns    225705984
BM_Sort_pair<uint32, uint32>_PipeOrgan_1024                                  7.63 ns         7.63 ns     91750400
BM_Sort_pair<uint32, uint32>_PipeOrgan_16384                                 8.54 ns         8.54 ns     80216064
BM_Sort_pair<uint32, uint32>_PipeOrgan_262144                                9.54 ns         9.54 ns     72613888
BM_Sort_tuple<uint32, uint64, uint32>_Random_1                               3.80 ns         3.79 ns    184811520
BM_Sort_tuple<uint32, uint64, uint32>_Random_4                               6.02 ns         6.02 ns    113508352
BM_Sort_tuple<uint32, uint64, uint32>_Random_16                              25.5 ns         25.5 ns     27000832
BM_Sort_tuple<uint32, uint64, uint32>_Random_64                              37.1 ns         37.1 ns     18612224
BM_Sort_tuple<uint32, uint64, uint32>_Random_256                             44.1 ns         44.1 ns     15728640
BM_Sort_tuple<uint32, uint64, uint32>_Random_1024                            50.3 ns         50.2 ns     13369344
BM_Sort_tuple<uint32, uint64, uint32>_Random_16384                           63.7 ns         63.7 ns     10747904
BM_Sort_tuple<uint32, uint64, uint32>_Random_262144                          76.5 ns         76.4 ns      8650752
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1                            3.80 ns         3.80 ns    184549376
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_4                            2.21 ns         2.21 ns    316932096
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16                           2.79 ns         2.79 ns    250347520
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_64                           2.15 ns         2.15 ns    326631424
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_256                          2.19 ns         2.19 ns    319553536
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1024                         1.80 ns         1.80 ns    387448832
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16384                        1.65 ns         1.65 ns    426508288
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_262144                       1.60 ns         1.60 ns    437256192
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1                           3.80 ns         3.79 ns    185335808
BM_Sort_tuple<uint32, uint64, uint32>_Descending_4                           3.27 ns         3.27 ns    213909504
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16                          6.43 ns         6.43 ns    107479040
BM_Sort_tuple<uint32, uint64, uint32>_Descending_64                          4.88 ns         4.88 ns    146014208
BM_Sort_tuple<uint32, uint64, uint32>_Descending_256                         4.41 ns         4.41 ns    158859264
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1024                        3.94 ns         3.94 ns    177471488
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16384                       3.96 ns         3.96 ns    177209344
BM_Sort_tuple<uint32, uint64, uint32>_Descending_262144                      3.89 ns         3.89 ns    179568640
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1                        3.80 ns         3.80 ns    184025088
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_4                        2.28 ns         2.28 ns    305659904
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16                       2.98 ns         2.98 ns    234618880
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_64                       2.70 ns         2.70 ns    259784704
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_256                      2.37 ns         2.37 ns    294649856
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1024                     2.04 ns         2.04 ns    343670784
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16384                    1.94 ns         1.94 ns    359399424
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_262144                   1.91 ns         1.91 ns    367525888
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1                            3.81 ns         3.80 ns    184287232
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_4                            2.43 ns         2.43 ns    290193408
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16                           3.62 ns         3.61 ns    193986560
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_64                           5.48 ns         5.48 ns    125304832
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_256                          4.04 ns         4.04 ns    173539328
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1024                         8.63 ns         8.63 ns     79167488
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16384                        10.2 ns         10.2 ns     67371008
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_262144                       11.6 ns         11.6 ns     59506688
BM_Sort_string_Random_1                                                      3.85 ns         3.85 ns    181665792
BM_Sort_string_Random_4                                                      18.1 ns         18.1 ns     38535168
BM_Sort_string_Random_16                                                     45.5 ns         45.5 ns     14942208
BM_Sort_string_Random_64                                                     62.6 ns         62.6 ns     10747904
BM_Sort_string_Random_256                                                    74.4 ns         74.4 ns      9175040
BM_Sort_string_Random_1024                                                   85.9 ns         85.9 ns      7864320
BM_Sort_string_Random_16384                                                   121 ns          121 ns      5767168
BM_Sort_string_Random_262144                                                  179 ns          179 ns      3407872
BM_Sort_string_Ascending_1                                                   3.84 ns         3.84 ns    182714368
BM_Sort_string_Ascending_4                                                   12.9 ns         12.9 ns     53215232
BM_Sort_string_Ascending_16                                                  21.7 ns         21.7 ns     31457280
BM_Sort_string_Ascending_64                                                  22.2 ns         22.2 ns     30670848
BM_Sort_string_Ascending_256                                                 21.4 ns         21.4 ns     32243712
BM_Sort_string_Ascending_1024                                                21.6 ns         21.6 ns     31981568
BM_Sort_string_Ascending_16384                                               27.3 ns         27.3 ns     25427968
BM_Sort_string_Ascending_262144                                              44.3 ns         44.3 ns     14942208
BM_Sort_string_Descending_1                                                  3.86 ns         3.85 ns    182190080
BM_Sort_string_Descending_4                                                  14.7 ns         14.7 ns     46923776
BM_Sort_string_Descending_16                                                 31.7 ns         31.7 ns     21757952
BM_Sort_string_Descending_64                                                 30.1 ns         30.1 ns     23068672
BM_Sort_string_Descending_256                                                28.0 ns         28.0 ns     24379392
BM_Sort_string_Descending_1024                                               27.6 ns         27.6 ns     24641536
BM_Sort_string_Descending_16384                                              37.3 ns         37.3 ns     18350080
BM_Sort_string_Descending_262144                                             62.2 ns         62.2 ns     11272192
BM_Sort_string_SingleElement_1                                               3.85 ns         3.85 ns    182452224
BM_Sort_string_SingleElement_4                                               10.8 ns         10.8 ns     63438848
BM_Sort_string_SingleElement_16                                              23.6 ns         23.6 ns     29360128
BM_Sort_string_SingleElement_64                                              18.6 ns         18.6 ns     37224448
BM_Sort_string_SingleElement_256                                             15.9 ns         15.9 ns     42991616
BM_Sort_string_SingleElement_1024                                            14.1 ns         14.1 ns     48234496
BM_Sort_string_SingleElement_16384                                           13.4 ns         13.5 ns     50855936
BM_Sort_string_SingleElement_262144                                          17.8 ns         17.9 ns     38797312
BM_Sort_string_PipeOrgan_1                                                   3.87 ns         3.86 ns    182190080
BM_Sort_string_PipeOrgan_4                                                   14.1 ns         14.1 ns     49283072
BM_Sort_string_PipeOrgan_16                                                  24.8 ns         24.8 ns     27787264
BM_Sort_string_PipeOrgan_64                                                  37.9 ns         37.9 ns     18087936
BM_Sort_string_PipeOrgan_256                                                 31.0 ns         31.0 ns     22282240
BM_Sort_string_PipeOrgan_1024                                                48.6 ns         48.6 ns     13893632
BM_Sort_string_PipeOrgan_16384                                               64.2 ns         64.2 ns     10485760
BM_Sort_string_PipeOrgan_262144                                               105 ns          105 ns      6553600

std::sort with the existing algorithm

2020-12-14 12:09:27
Running projects/libcxx/benchmarks/algorithms.libcxx.out
Run on (12 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 15360K (x1)
Load Average: 1.54, 4.34, 4.83
------------------------------------------------------------------------------------------------------------------
Benchmark                                                                       Time             CPU   Iterations
------------------------------------------------------------------------------------------------------------------
BM_Sort_uint32_Random_1                                                      3.54 ns         3.53 ns    198180864
BM_Sort_uint32_Random_4                                                      13.7 ns         13.7 ns     50855936
BM_Sort_uint32_Random_16                                                     33.6 ns         33.6 ns     20447232
BM_Sort_uint32_Random_64                                                     55.2 ns         55.2 ns     12320768
BM_Sort_uint32_Random_256                                                    72.5 ns         72.5 ns      9437184
BM_Sort_uint32_Random_1024                                                   89.6 ns         89.6 ns      7602176
BM_Sort_uint32_Random_16384                                                   123 ns          123 ns      5505024
BM_Sort_uint32_Random_262144                                                  155 ns          155 ns      4456448
BM_Sort_uint32_Ascending_1                                                   3.77 ns         3.77 ns    183500800
BM_Sort_uint32_Ascending_4                                                   3.41 ns         3.41 ns    206045184
BM_Sort_uint32_Ascending_16                                                  3.08 ns         3.08 ns    227016704
BM_Sort_uint32_Ascending_64                                                  6.26 ns         6.26 ns    108789760
BM_Sort_uint32_Ascending_256                                                 5.52 ns         5.52 ns    124256256
BM_Sort_uint32_Ascending_1024                                                5.39 ns         5.38 ns    127401984
BM_Sort_uint32_Ascending_16384                                               5.28 ns         5.28 ns    129236992
BM_Sort_uint32_Ascending_262144                                              5.26 ns         5.26 ns    130285568
BM_Sort_uint32_Descending_1                                                  3.79 ns         3.78 ns    185335808
BM_Sort_uint32_Descending_4                                                  9.89 ns         9.89 ns     69468160
BM_Sort_uint32_Descending_16                                                 42.8 ns         42.8 ns     16252928
BM_Sort_uint32_Descending_64                                                 11.8 ns         11.8 ns     58195968
BM_Sort_uint32_Descending_256                                                11.1 ns         11.1 ns     62128128
BM_Sort_uint32_Descending_1024                                               13.3 ns         13.3 ns     51904512
BM_Sort_uint32_Descending_16384                                              12.7 ns         12.7 ns     53477376
BM_Sort_uint32_Descending_262144                                             12.7 ns         12.7 ns     54001664
BM_Sort_uint32_SingleElement_1                                               3.78 ns         3.78 ns    184287232
BM_Sort_uint32_SingleElement_4                                               3.42 ns         3.42 ns    203423744
BM_Sort_uint32_SingleElement_16                                              3.80 ns         3.80 ns    183500800
BM_Sort_uint32_SingleElement_64                                              5.64 ns         5.64 ns    121634816
BM_Sort_uint32_SingleElement_256                                             5.26 ns         5.26 ns    129761280
BM_Sort_uint32_SingleElement_1024                                            5.20 ns         5.20 ns    131858432
BM_Sort_uint32_SingleElement_16384                                           5.10 ns         5.10 ns    132644864
BM_Sort_uint32_SingleElement_262144                                          5.10 ns         5.10 ns    133955584
BM_Sort_uint32_PipeOrgan_1                                                   3.77 ns         3.76 ns    185597952
BM_Sort_uint32_PipeOrgan_4                                                   5.20 ns         5.20 ns    131858432
BM_Sort_uint32_PipeOrgan_16                                                  13.4 ns         13.4 ns     51118080
BM_Sort_uint32_PipeOrgan_64                                                  43.0 ns         43.0 ns     15990784
BM_Sort_uint32_PipeOrgan_256                                                 16.6 ns         16.6 ns     41156608
BM_Sort_uint32_PipeOrgan_1024                                                21.0 ns         21.0 ns     32768000
BM_Sort_uint32_PipeOrgan_16384                                               27.4 ns         27.4 ns     24641536
BM_Sort_uint32_PipeOrgan_262144                                              33.3 ns         33.3 ns     20709376
BM_Sort_uint64_Random_1                                                      3.53 ns         3.52 ns    198705152
BM_Sort_uint64_Random_4                                                      13.4 ns         13.4 ns     51642368
BM_Sort_uint64_Random_16                                                     33.4 ns         33.4 ns     20709376
BM_Sort_uint64_Random_64                                                     55.1 ns         55.1 ns     12058624
BM_Sort_uint64_Random_256                                                    72.2 ns         72.2 ns      9437184
BM_Sort_uint64_Random_1024                                                   89.1 ns         89.1 ns      7602176
BM_Sort_uint64_Random_16384                                                   122 ns          122 ns      5505024
BM_Sort_uint64_Random_262144                                                  155 ns          155 ns      4456448
BM_Sort_uint64_Ascending_1                                                   3.53 ns         3.52 ns    198443008
BM_Sort_uint64_Ascending_4                                                   3.43 ns         3.43 ns    204210176
BM_Sort_uint64_Ascending_16                                                  3.10 ns         3.10 ns    224395264
BM_Sort_uint64_Ascending_64                                                  6.31 ns         6.31 ns    108789760
BM_Sort_uint64_Ascending_256                                                 5.58 ns         5.58 ns    120586240
BM_Sort_uint64_Ascending_1024                                                5.40 ns         5.40 ns    127401984
BM_Sort_uint64_Ascending_16384                                               5.31 ns         5.31 ns    127664128
BM_Sort_uint64_Ascending_262144                                              5.30 ns         5.30 ns    130285568
BM_Sort_uint64_Descending_1                                                  3.53 ns         3.52 ns    198705152
BM_Sort_uint64_Descending_4                                                  9.79 ns         9.79 ns     69992448
BM_Sort_uint64_Descending_16                                                 43.3 ns         43.3 ns     15990784
BM_Sort_uint64_Descending_64                                                 11.8 ns         11.8 ns     58195968
BM_Sort_uint64_Descending_256                                                11.1 ns         11.1 ns     62128128
BM_Sort_uint64_Descending_1024                                               13.2 ns         13.2 ns     51904512
BM_Sort_uint64_Descending_16384                                              12.8 ns         12.8 ns     53739520
BM_Sort_uint64_Descending_262144                                             12.7 ns         12.7 ns     54001664
BM_Sort_uint64_SingleElement_1                                               3.51 ns         3.50 ns    199753728
BM_Sort_uint64_SingleElement_4                                               3.42 ns         3.42 ns    204996608
BM_Sort_uint64_SingleElement_16                                              3.11 ns         3.11 ns    226230272
BM_Sort_uint64_SingleElement_64                                              5.63 ns         5.63 ns    120324096
BM_Sort_uint64_SingleElement_256                                             5.29 ns         5.29 ns    130809856
BM_Sort_uint64_SingleElement_1024                                            5.17 ns         5.17 ns    131334144
BM_Sort_uint64_SingleElement_16384                                           5.12 ns         5.12 ns    133955584
BM_Sort_uint64_SingleElement_262144                                          5.12 ns         5.12 ns    133955584
BM_Sort_uint64_PipeOrgan_1                                                   3.53 ns         3.53 ns    198967296
BM_Sort_uint64_PipeOrgan_4                                                   5.23 ns         5.23 ns    131858432
BM_Sort_uint64_PipeOrgan_16                                                  13.5 ns         13.5 ns     50855936
BM_Sort_uint64_PipeOrgan_64                                                  43.1 ns         43.1 ns     15728640
BM_Sort_uint64_PipeOrgan_256                                                 16.7 ns         16.7 ns     40894464
BM_Sort_uint64_PipeOrgan_1024                                                21.2 ns         21.2 ns     32505856
BM_Sort_uint64_PipeOrgan_16384                                               27.6 ns         27.6 ns     24903680
BM_Sort_uint64_PipeOrgan_262144                                              33.4 ns         33.4 ns     20709376
BM_Sort_pair<uint32, uint32>_Random_1                                        3.36 ns         3.35 ns    208666624
BM_Sort_pair<uint32, uint32>_Random_4                                        5.21 ns         5.21 ns    130285568
BM_Sort_pair<uint32, uint32>_Random_16                                       19.2 ns         19.2 ns     35913728
BM_Sort_pair<uint32, uint32>_Random_64                                       27.3 ns         27.3 ns     25165824
BM_Sort_pair<uint32, uint32>_Random_256                                      33.1 ns         33.1 ns     20709376
BM_Sort_pair<uint32, uint32>_Random_1024                                     39.2 ns         39.2 ns     17563648
BM_Sort_pair<uint32, uint32>_Random_16384                                    51.6 ns         51.6 ns     13107200
BM_Sort_pair<uint32, uint32>_Random_262144                                   64.3 ns         64.3 ns     10485760
BM_Sort_pair<uint32, uint32>_Ascending_1                                     3.38 ns         3.38 ns    207880192
BM_Sort_pair<uint32, uint32>_Ascending_4                                     1.70 ns         1.70 ns    413401088
BM_Sort_pair<uint32, uint32>_Ascending_16                                    2.12 ns         2.12 ns    337379328
BM_Sort_pair<uint32, uint32>_Ascending_64                                    1.28 ns         1.28 ns    581435392
BM_Sort_pair<uint32, uint32>_Ascending_256                                   1.28 ns         1.28 ns    545783808
BM_Sort_pair<uint32, uint32>_Ascending_1024                                  1.14 ns         1.14 ns    610795520
BM_Sort_pair<uint32, uint32>_Ascending_16384                                 1.00 ns         1.00 ns    692322304
BM_Sort_pair<uint32, uint32>_Ascending_262144                               0.986 ns        0.985 ns    704643072
BM_Sort_pair<uint32, uint32>_Descending_1                                    3.36 ns         3.35 ns    208404480
BM_Sort_pair<uint32, uint32>_Descending_4                                    2.37 ns         2.37 ns    296222720
BM_Sort_pair<uint32, uint32>_Descending_16                                   4.33 ns         4.33 ns    162004992
BM_Sort_pair<uint32, uint32>_Descending_64                                   2.76 ns         2.76 ns    252968960
BM_Sort_pair<uint32, uint32>_Descending_256                                  2.41 ns         2.41 ns    290193408
BM_Sort_pair<uint32, uint32>_Descending_1024                                 2.61 ns         2.61 ns    268435456
BM_Sort_pair<uint32, uint32>_Descending_16384                                2.36 ns         2.36 ns    297009152
BM_Sort_pair<uint32, uint32>_Descending_262144                               2.36 ns         2.35 ns    295960576
BM_Sort_pair<uint32, uint32>_SingleElement_1                                 3.41 ns         3.40 ns    206045184
BM_Sort_pair<uint32, uint32>_SingleElement_4                                 1.76 ns         1.76 ns    397934592
BM_Sort_pair<uint32, uint32>_SingleElement_16                                1.40 ns         1.40 ns    485752832
BM_Sort_pair<uint32, uint32>_SingleElement_64                                1.41 ns         1.41 ns    490995712
BM_Sort_pair<uint32, uint32>_SingleElement_256                               1.30 ns         1.30 ns    542638080
BM_Sort_pair<uint32, uint32>_SingleElement_1024                              1.21 ns         1.21 ns    576454656
BM_Sort_pair<uint32, uint32>_SingleElement_16384                             1.07 ns         1.07 ns    651689984
BM_Sort_pair<uint32, uint32>_SingleElement_262144                            1.05 ns         1.05 ns    666107904
BM_Sort_pair<uint32, uint32>_PipeOrgan_1                                     3.36 ns         3.35 ns    208404480
BM_Sort_pair<uint32, uint32>_PipeOrgan_4                                     1.76 ns         1.76 ns    397672448
BM_Sort_pair<uint32, uint32>_PipeOrgan_16                                    4.01 ns         4.01 ns    174587904
BM_Sort_pair<uint32, uint32>_PipeOrgan_64                                    3.82 ns         3.82 ns    182976512
BM_Sort_pair<uint32, uint32>_PipeOrgan_256                                   4.61 ns         4.61 ns    152043520
BM_Sort_pair<uint32, uint32>_PipeOrgan_1024                                  5.27 ns         5.27 ns    130285568
BM_Sort_pair<uint32, uint32>_PipeOrgan_16384                                 7.09 ns         7.09 ns     96468992
BM_Sort_pair<uint32, uint32>_PipeOrgan_262144                                6.82 ns         6.82 ns    103022592
BM_Sort_tuple<uint32, uint64, uint32>_Random_1                               3.34 ns         3.34 ns    210239488
BM_Sort_tuple<uint32, uint64, uint32>_Random_4                               5.94 ns         5.93 ns    111149056
BM_Sort_tuple<uint32, uint64, uint32>_Random_16                              23.3 ns         23.2 ns     29622272
BM_Sort_tuple<uint32, uint64, uint32>_Random_64                              33.5 ns         33.5 ns     20447232
BM_Sort_tuple<uint32, uint64, uint32>_Random_256                             39.3 ns         39.3 ns     17563648
BM_Sort_tuple<uint32, uint64, uint32>_Random_1024                            45.1 ns         45.1 ns     15204352
BM_Sort_tuple<uint32, uint64, uint32>_Random_16384                           58.6 ns         58.6 ns     11796480
BM_Sort_tuple<uint32, uint64, uint32>_Random_262144                          70.5 ns         70.5 ns      9699328
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1                            3.32 ns         3.31 ns    210763776
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_4                            2.10 ns         2.10 ns    332398592
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16                           2.58 ns         2.58 ns    271319040
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_64                           2.31 ns         2.31 ns    281018368
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_256                          2.25 ns         2.24 ns    313524224
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1024                         1.85 ns         1.84 ns    379322368
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16384                        1.71 ns         1.71 ns    408420352
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_262144                       1.65 ns         1.65 ns    419168256
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1                           3.33 ns         3.33 ns    209453056
BM_Sort_tuple<uint32, uint64, uint32>_Descending_4                           3.28 ns         3.28 ns    214171648
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16                          5.29 ns         5.29 ns    130285568
BM_Sort_tuple<uint32, uint64, uint32>_Descending_64                          3.93 ns         3.93 ns    177995776
BM_Sort_tuple<uint32, uint64, uint32>_Descending_256                         3.26 ns         3.26 ns    214958080
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1024                        3.50 ns         3.50 ns    200278016
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16384                       3.62 ns         3.61 ns    193200128
BM_Sort_tuple<uint32, uint64, uint32>_Descending_262144                      3.61 ns         3.61 ns    192413696
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1                        3.32 ns         3.32 ns    209977344
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_4                        2.27 ns         2.27 ns    307757056
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16                       2.10 ns         2.10 ns    332398592
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_64                       2.48 ns         2.48 ns    281280512
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_256                      2.38 ns         2.38 ns    294387712
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1024                     2.11 ns         2.11 ns    333447168
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16384                    1.83 ns         1.83 ns    378535936
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_262144                   1.82 ns         1.82 ns    381157376
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1                            3.31 ns         3.31 ns    211288064
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_4                            2.37 ns         2.36 ns    294649856
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16                           5.39 ns         5.39 ns    126877696
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_64                           5.29 ns         5.29 ns    128974848
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_256                          6.39 ns         6.39 ns    107216896
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1024                         6.98 ns         6.98 ns     98304000
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16384                        7.95 ns         7.95 ns     86507520
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_262144                       9.46 ns         9.46 ns     72351744
BM_Sort_string_Random_1                                                      4.05 ns         4.05 ns    173277184
BM_Sort_string_Random_4                                                      17.9 ns         17.9 ns     38010880
BM_Sort_string_Random_16                                                     50.4 ns         50.3 ns     13369344
BM_Sort_string_Random_64                                                     73.3 ns         73.3 ns      9437184
BM_Sort_string_Random_256                                                    94.6 ns         94.4 ns      7077888
BM_Sort_string_Random_1024                                                    115 ns          115 ns      6029312
BM_Sort_string_Random_16384                                                   177 ns          177 ns      3932160
BM_Sort_string_Random_262144                                                  279 ns          279 ns      2097152
BM_Sort_string_Ascending_1                                                   4.07 ns         4.07 ns    171704320
BM_Sort_string_Ascending_4                                                   12.9 ns         12.9 ns     52953088
BM_Sort_string_Ascending_16                                                  21.7 ns         21.7 ns     32243712
BM_Sort_string_Ascending_64                                                  23.9 ns         23.9 ns     29360128
BM_Sort_string_Ascending_256                                                 23.4 ns         23.4 ns     29360128
BM_Sort_string_Ascending_1024                                                24.1 ns         24.1 ns     28311552
BM_Sort_string_Ascending_16384                                               30.6 ns         30.6 ns     22282240
BM_Sort_string_Ascending_262144                                              47.3 ns         47.3 ns     14155776
BM_Sort_string_Descending_1                                                  4.07 ns         4.06 ns    171442176
BM_Sort_string_Descending_4                                                  14.6 ns         14.6 ns     46923776
BM_Sort_string_Descending_16                                                 30.9 ns         30.9 ns     22282240
BM_Sort_string_Descending_64                                                 31.4 ns         31.4 ns     22020096
BM_Sort_string_Descending_256                                                31.1 ns         31.1 ns     22282240
BM_Sort_string_Descending_1024                                               35.0 ns         35.0 ns     19660800
BM_Sort_string_Descending_16384                                              50.0 ns         50.0 ns     13631488
BM_Sort_string_Descending_262144                                             79.9 ns         79.9 ns      8388608
BM_Sort_string_SingleElement_1                                               4.08 ns         4.07 ns    171966464
BM_Sort_string_SingleElement_4                                               10.8 ns         10.8 ns     63438848
BM_Sort_string_SingleElement_16                                              22.6 ns         22.6 ns     30408704
BM_Sort_string_SingleElement_64                                              23.6 ns         23.6 ns     29097984
BM_Sort_string_SingleElement_256                                             18.3 ns         18.3 ns     37486592
BM_Sort_string_SingleElement_1024                                            16.9 ns         16.9 ns     40894464
BM_Sort_string_SingleElement_16384                                           13.8 ns         13.8 ns     49807360
BM_Sort_string_SingleElement_262144                                          15.5 ns         15.5 ns     44564480
BM_Sort_string_PipeOrgan_1                                                   4.07 ns         4.07 ns    171966464
BM_Sort_string_PipeOrgan_4                                                   14.0 ns         14.0 ns     48496640
BM_Sort_string_PipeOrgan_16                                                  29.2 ns         29.2 ns     23592960
BM_Sort_string_PipeOrgan_64                                                  39.3 ns         39.3 ns     17563648
BM_Sort_string_PipeOrgan_256                                                 47.0 ns         47.0 ns     14942208
BM_Sort_string_PipeOrgan_1024                                                53.3 ns         53.3 ns     12845056
BM_Sort_string_PipeOrgan_16384                                               74.2 ns         74.2 ns      9175040
BM_Sort_string_PipeOrgan_262144                                               123 ns          123 ns      5505024

Diff Detail

Unit TestsFailed

TimeTest
2,590 mslibcxx CI -fno-exceptions > libc++.libcxx/modules::stds_include.sh.cpp
Script: -- : 'RUN: at line 28'; /usr/bin/clang++ -v --target=x86_64-unknown-linux-gnu -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/__config_site -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/include -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/include/c++build -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -fno-exceptions -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/test/libcxx/modules/Output/stds_include.sh.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -fmodules -fcxx-modules -fsyntax-only -std=c++03 -DINVALIDATE_CACHE_CXX03 /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/libcxx/modules/stds_include.sh.cpp
2,590 mslibcxx CI -fno-exceptions > libc++.libcxx/modules::stds_include.sh.cpp
Script: -- : 'RUN: at line 28'; /usr/bin/clang++ -v --target=x86_64-unknown-linux-gnu -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/__config_site -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/include -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/include/c++build -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -fno-exceptions -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/test/libcxx/modules/Output/stds_include.sh.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -fmodules -fcxx-modules -fsyntax-only -std=c++03 -DINVALIDATE_CACHE_CXX03 /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/libcxx/modules/stds_include.sh.cpp
2,590 mslibcxx CI -fno-exceptions > libc++.std/algorithms/alg_sorting/alg_merge::inplace_merge.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/clang++ /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/std/algorithms/alg.sorting/alg.merge/inplace_merge.pass.cpp -v --target=x86_64-unknown-linux-gnu -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/__config_site -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/include -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/include/c++build -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -fno-exceptions -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/test/std/algorithms/alg.sorting/alg.merge/Output/inplace_merge.pass.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -L/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/./lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/./lib -nodefaultlibs -lc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s…
2,590 mslibcxx CI -fno-exceptions > libc++.std/algorithms/alg_sorting/alg_merge::inplace_merge.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/clang++ /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/std/algorithms/alg.sorting/alg.merge/inplace_merge.pass.cpp -v --target=x86_64-unknown-linux-gnu -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/__config_site -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/include -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/include/c++build -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -fno-exceptions -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/test/std/algorithms/alg.sorting/alg.merge/Output/inplace_merge.pass.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -L/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/./lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/./lib -nodefaultlibs -lc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s…
3,020 mslibcxx CI -fno-exceptions > libc++.std/algorithms/alg_sorting/alg_merge::inplace_merge_comp.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/clang++ /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/std/algorithms/alg.sorting/alg.merge/inplace_merge_comp.pass.cpp -v --target=x86_64-unknown-linux-gnu -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/__config_site -include /home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/include -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/include/c++build -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -I/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -fno-exceptions -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/projects/libcxx/test/std/algorithms/alg.sorting/alg.merge/Output/inplace_merge_comp.pass.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -L/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/./lib -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/f355e457fd03-1/llvm-project/libcxx-ci/build/generic-noexceptions/./lib -nodefaultlibs -lc++ -lm -lgcc_s -lgcc -lpthread -lc…
View Full Test Results (6,164 Failed)

Event Timeline

minjaehwang created this revision.Dec 14 2020, 9:27 AM
minjaehwang requested review of this revision.Dec 14 2020, 9:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2020, 9:27 AM
minjaehwang edited the summary of this revision. (Show Details)Dec 14 2020, 9:29 AM
minjaehwang edited the summary of this revision. (Show Details)
minjaehwang edited the summary of this revision. (Show Details)
lebedev.ri retitled this revision from Replaces std::sort by Bitset sorting algorithm. to [TableGen] Replaces std::sort by Bitset sorting algorithm..Dec 14 2020, 9:34 AM
lebedev.ri edited reviewers, added: Paul-C-Anagnostopoulos; removed: EricWF.

Did you upload the right diff? The code does not show sort.

Made the incorrect patch previously.

Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript

Did you upload the right diff? The code does not show sort.

I am very sorry. Just updated the review with the sort.

lebedev.ri retitled this revision from [TableGen] Replaces std::sort by Bitset sorting algorithm. to [libc++] Replaces std::sort by Bitset sorting algorithm..Dec 14 2020, 10:02 AM
lebedev.ri removed a reviewer: Paul-C-Anagnostopoulos.
lebedev.ri added a subscriber: lebedev.ri.

What are the perf changes on CPU's without support for those instructions?

IMHO bitset sort is not a standard term, so it needs some clarification. From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

IMHO bitset sort is not a standard term, so it needs some clarification. From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

There's a list of standard algorithms that are allowed to use temporary heap storage.
That list does not include std::sort (it consists of stable_sort, stable_partition and inplace_merge)

abidh added a subscriber: abidh.Dec 14 2020, 11:06 AM

This is an interesting contribution. I'm trying to wrap my head around it.

From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

Unless I missed a subtlety, it doesn't require heap allocation. The "bitsets" used are actually 64 (or 32) bit integers. The current algorithm also doesn't have heap allocation, so keeping that property is something we want (and something this patch does AFAICT).

@minjaehwang Is there documentation for this algorithm? I believe it would help us (at least myself) understand the algorithm and the potential tradeoffs involved. Also, are you aware of Timsort? IIRC there had been an analysis conducted that concluded that we should be using Timsort (as it was faster than our sort and respected the complexity requirements of the Standard).

libcxx/test/libcxx/algorithms/sort.pass.cpp
21

Shouldn't this test work with all implementations? Why not?

This is an interesting contribution. I'm trying to wrap my head around it.

Thanks!

From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

Unless I missed a subtlety, it doesn't require heap allocation. The "bitsets" used are actually 64 (or 32) bit integers. The current algorithm also doesn't have heap allocation, so keeping that property is something we want (and something this patch does AFAICT).

You are right. It only requires two 64 (or 32) bit integers to represent bitsets on left partition and right partition.

@minjaehwang Is there documentation for this algorithm? I believe it would help us (at least myself) understand the algorithm and the potential tradeoffs involved. Also, are you aware of Timsort? IIRC there had been an analysis conducted that concluded that we should be using Timsort (as it was faster than our sort and respected the complexity requirements of the Standard).

I have the documentation which was intended to present in my employer. I will work with the company to publish the document.

I am aware of Timsort. Timsort is probably good for replacing std::stable_sort. It is a variant of merge sort and could be generally slower compared to unstable sorts.

What are the perf changes on CPU's without support for those instructions?

This is definitely something that I would like to do. Could you suggest which platforms or architectures should I try?

This implementation benefits from bsr or lzcnt instructions. I am guessing bitset will speed up even without specialized instructions because lzcnt and bsr are not terribly slow to implement with general instructions.

IMHO bitset sort is not a standard term, so it needs some clarification. From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

You are right. I coined bitset sort terminology and have not published anything publicly yet. I will work on that.

Bitset partition works with stack variables without heap memory allocation.

minjaehwang added inline comments.Dec 14 2020, 12:25 PM
libcxx/test/libcxx/algorithms/sort.pass.cpp
21

You are right. This is a general sort test. Will fix it.

Hi, @ldionne asked me whether it was possible to review the algorithm, and so did I over the last three days. I eventually decided to create an account here and share again what I already sent him by mail, with additional information.

First, it looks like the main new trick is an implementation of the idea described in *BlockQuicksort: How Branch Mispredictions don't affect Quicksort* by Edelkamp and Weiß: basically, the partitioning algorithm performs comparisons over blocks of 32~64 elements, stores their results as booleans (here in a bitset), then performs the swaps to put the misplaced elements into place. It nicely avoids issues linked to branch misprediction, and thus is kind of a silver bullet for quicksort as long as the comparisons and projections passed to the algorithm are themselves branchless. As we can see for the strings benchmark, this partitioning scheme can be slower than a more traditional one when there are branches in the comparisons, which is likely due to the overhead from the additional logic involved.

On the good side, I tweaked bitset_sort so that I could inject it in my cpp-sort library, and ran the full test suite: the algorithm doesn't seem to have obvious bugs, and ubsan and asan didn't fid any issue either. Also the benchmarks show good results (here for a std::vector<double>: https://i.imgur.com/Nu6l8lI.png

On the not-so-good side, bitset_sort seems to be O(n²), just like the current implementation of std::sort in C++. I reran the quicksort killer implementation of Orson Peters from this issue, and got the same results. The quicksort killer is based on *A Killer Adversary for Quicksort* by M. D. McIlroy. Here is a graph showing the O(n log n) vs. O(n²) behaviour of the different algorithms: https://i.imgur.com/IWa2teO.png

So far my overall conclusion is that the idea of reimplementing the BlockQuicksort logic over bitsets is nice, but pdqsort otherwise seems better in every regard: it is truly O(n log n) - and even O(n log k) for k unique elements when k is small -, is efficient at breaking common quicksort-adverse patterns such as the pipe organ pattern in my first benchmark, and it actually switches over to a simpler non-branchless partitioning algorithm when it can't prove at compile time that the comparison will be branchless, avoiding the regression on types such as std::string.

If std::sort has to be changed, then I'd definitely reconsider using pdqsort instead. It should be possible to reimplement it over bitsets à la bitset_sort if you care about the reduced impact on stack memory.

minjaehwang added a comment.EditedDec 21 2020, 1:27 PM

Thank @Morwenn very much for looking into this code.

@Morwenn, you are right that the idea is largely based on Block Quicksort except that Bitset sort uses a 32/64 bit integer instead of an array. A bitset will be kept within CPU registers and won't require store instructions like Block Quicksort does.

As you know, Pdqsort is a variation of Block Quicksort with a pattern recognition and a different pivot selection. It recognizes ascending ranges, descending ranges, and O(n ^ 2) cases. Pdqsort shows O(n) performance on these known patterns and guarantees O(n lg n) in the worse case.

The bitset trick can be applied to Pdqsort's partition function just like this code does for the existing std::sort's partition function. I have implemented Bitset trick on top of pdqsort as well but chose the current implementation over pdqsort-based Bitset sort. The current implementation introduces the minimal change to std::sort. It replaces the partition function but keeps most of the outer layer of std::sort.

The reason for keeping the outer layer of std::sort is that it will keep most of performance characteristics unchanged and adds a little overhead when there is a branch in the comparison function.

@Morwenn mentioned the O(n ^ 2) case for quicksort which also happens for the existing std::sort. As you might know, a simple change can avoid O(n ^ 2). Introsort avoids O(n ^ 2) by calling heap sort when a quicksort depth goes above O(lg n). There has been efforts to introduce introsort into libc++ (Kumar's introsort). For unknown reason to me, it has not been submitted. Pdqsort avoids O(n ^ 2) by calling heap sort when partitions are highly unbalanced.

I understand that Pdqsort is faster in many known patterns over std::sort. If we are not afraid of changing the entire sort implementation, I can certainly bring Bitset-on-pdqsort as another code review. @Morwenn and @ldionne, could you give your thoughts on the future direction?

@Morwenn Bitset sort is faster for std::strings than pdqsort which turns off Block sort technique for non-arithmetic types. See the following comparison. Block sort technique can be faster even with comparison functions with branches.

BM_PdqSort_string_Random_1                                                   4.65 ns         4.64 ns    150470656
BM_PdqSort_string_Random_4                                                   19.0 ns         19.0 ns     36175872
BM_PdqSort_string_Random_16                                                  46.4 ns         46.4 ns     14942208
BM_PdqSort_string_Random_64                                                  64.7 ns         64.7 ns     10485760
BM_PdqSort_string_Random_256                                                 84.1 ns         84.1 ns      8126464
BM_PdqSort_string_Random_1024                                                 103 ns          103 ns      6553600
BM_PdqSort_string_Random_16384                                                160 ns          160 ns      4456448
BM_PdqSort_string_Random_262144                                               242 ns          242 ns      2359296
BM_BitsetSort_string_Random_1                                                3.85 ns         3.85 ns    181665792
BM_BitsetSort_string_Random_4                                                18.1 ns         18.1 ns     38535168
BM_BitsetSort_string_Random_16                                               45.5 ns         45.5 ns     14942208
BM_BitsetSort_string_Random_64                                               62.6 ns         62.6 ns     10747904
BM_BitsetSort_string_Random_256                                              74.4 ns         74.4 ns      9175040
BM_BitsetSort_string_Random_1024                                             85.9 ns         85.9 ns      7864320
BM_BitsetSort_string_Random_16384                                             121 ns          121 ns      5767168
BM_BitsetSort_string_Random_262144                                            179 ns          179 ns      3407872
BM_Sort_string_Random_1                                                      4.05 ns         4.05 ns    173277184
BM_Sort_string_Random_4                                                      17.9 ns         17.9 ns     38010880
BM_Sort_string_Random_16                                                     50.4 ns         50.3 ns     13369344
BM_Sort_string_Random_64                                                     73.3 ns         73.3 ns      9437184
BM_Sort_string_Random_256                                                    94.6 ns         94.4 ns      7077888
BM_Sort_string_Random_1024                                                    115 ns          115 ns      6029312
BM_Sort_string_Random_16384                                                   177 ns          177 ns      3932160
BM_Sort_string_Random_262144                                                  279 ns          279 ns      2097152

The bitset trick can be applied to Pdqsort's partition function just like this code does for the existing std::sort's partition function. I have implemented Bitset trick on top of pdqsort as well but chose the current implementation over pdqsort-based Bitset sort. The current implementation introduces the minimal change to std::sort. It replaces the partition function but keeps most of the outer layer of std::sort.

The reason for keeping the outer layer of std::sort is that it will keep most of performance characteristics unchanged and adds a little overhead when there is a branch in the comparison function.

That's surely a fair concern.

@Morwenn mentioned the O(n ^ 2) case for quicksort which also happens for the existing std::sort. As you might know, a simple change can avoid O(n ^ 2). Introsort avoids O(n ^ 2) by calling heap sort when a quicksort depth goes above O(lg n). There has been efforts to introduce introsort into libc++ (Kumar's introsort). For unknown reason to me, it has not been submitted. Pdqsort avoids O(n ^ 2) by calling heap sort when partitions are highly unbalanced.

Yeah, I'm still rather surprised that the libc++ implementation of std::sort isn't some flavour of introsort yet. It's not excessively hard to implement, and it would make the algorithm standard-compliant. I know that a few years ago they wanted proper sorting benchmarks before giving the "go" to a new sort implementation, but it seems like something that could have been fixed regardless.

I understand that Pdqsort is faster in many known patterns over std::sort. If we are not afraid of changing the entire sort implementation, I can certainly bring Bitset-on-pdqsort as another code review. @Morwenn and @ldionne, could you give your thoughts on the future direction?

@Morwenn Bitset sort is faster for std::strings than pdqsort which turns off Block sort technique for non-arithmetic types. See the following comparison. Block sort technique can be faster even with comparison functions with branches.

Oh, I thought that deactivating the branchless partitioning algorithm was done because it introduced regressions for non-arithmetic types. I guess I blindly believed that it had been compared against std::string of all things to make this decision, but never ran benchmarks to back that claim. That's new info to me.

As for the future direction I do believe that pdqsort has some interesting ideas that would be nice to have, but I'm not part of the libc++ project in the first place, so it's not really my call to make. I'm glad to see that things are moving in this area nonetheless, and bitset_sort would be a step in the right direction compared to the status quo anyway :)