Page MenuHomePhabricator

Modify std::sort: add BlockQuickSort partitioning algorithm for arithmetic types
ClosedPublic

Authored by nilayvaish on Mar 30 2022, 7:29 PM.

Details

Summary

This diff modifies std::sort in two ways:

  • for arithmetic types we update the core partitioning algorithm to use BlockQuickSort for partitioning. The partition function was carefully written to let the compiler generates SIMD instructions without actually writing SIMD intrinsics in the loop. We see up to 50% better performance for sorting arithmetic types. The use of the BlockQuickSort partitioning has been limited to arithmetic types since the algorithm works well when branch instructions can be avoided during partitioning. This usually not true for types other than the arithmetic ones.
  • for other types (tuples, strings) updates have been made to improve performance by about 10%.

Performance numbers comparing std::sort (old) and Bitset sort (new) on libcxx benchmark.

name                                                             old cpu/op  new cpu/op  delta
BM_Sort_uint32_Random_1                                          3.72ns ± 5%  3.78ns ±16%      ~     (p=0.819 n=36+34)
BM_Sort_uint32_Random_4                                          5.42ns ± 5%  5.29ns ± 7%    -2.42%  (p=0.000 n=35+31)
BM_Sort_uint32_Random_16                                         10.5ns ± 3%  11.9ns ±15%   +13.08%  (p=0.000 n=36+40)
BM_Sort_uint32_Random_64                                         18.6ns ± 7%  18.5ns ±15%    -0.95%  (p=0.002 n=33+40)
BM_Sort_uint32_Random_256                                        26.2ns ± 4%  21.3ns ± 8%   -18.89%  (p=0.000 n=37+34)
BM_Sort_uint32_Random_1024                                       33.4ns ± 5%  23.3ns ± 4%   -30.37%  (p=0.000 n=39+35)
BM_Sort_uint32_Random_16384                                      47.7ns ± 5%  26.7ns ± 5%   -44.06%  (p=0.000 n=39+35)
BM_Sort_uint32_Random_262144                                     62.6ns ± 3%  30.1ns ± 6%   -51.81%  (p=0.000 n=37+36)
BM_Sort_uint32_Ascending_1                                       3.71ns ± 3%  4.28ns ± 3%   +15.53%  (p=0.000 n=37+35)
BM_Sort_uint32_Ascending_4                                       1.47ns ± 3%  1.46ns ± 3%      ~     (p=0.083 n=36+37)
BM_Sort_uint32_Ascending_16                                      0.93ns ± 4%  1.02ns ± 3%    +9.32%  (p=0.000 n=36+36)
BM_Sort_uint32_Ascending_64                                      1.23ns ± 5%  1.51ns ± 3%   +22.56%  (p=0.000 n=34+36)
BM_Sort_uint32_Ascending_256                                     1.21ns ± 3%  1.57ns ± 4%   +29.77%  (p=0.000 n=33+35)
BM_Sort_uint32_Ascending_1024                                    1.03ns ± 4%  1.43ns ± 3%   +38.44%  (p=0.000 n=32+35)
BM_Sort_uint32_Ascending_16384                                   0.94ns ± 8%  1.36ns ± 5%   +44.09%  (p=0.000 n=32+35)
BM_Sort_uint32_Ascending_262144                                  0.93ns ± 3%  1.35ns ± 7%   +45.06%  (p=0.000 n=32+36)
BM_Sort_uint32_Descending_1                                      3.69ns ± 2%  4.27ns ± 3%   +15.73%  (p=0.000 n=31+36)
BM_Sort_uint32_Descending_4                                      1.74ns ± 2%  1.78ns ± 3%    +2.29%  (p=0.000 n=31+38)
BM_Sort_uint32_Descending_16                                     3.92ns ± 4%  4.20ns ± 4%    +7.13%  (p=0.000 n=32+38)
BM_Sort_uint32_Descending_64                                     2.09ns ± 4%  3.25ns ± 4%   +55.10%  (p=0.000 n=33+37)
BM_Sort_uint32_Descending_256                                    1.98ns ± 7%  2.93ns ± 4%   +47.95%  (p=0.000 n=34+36)
BM_Sort_uint32_Descending_1024                                   2.23ns ± 6%  2.64ns ± 3%   +18.22%  (p=0.000 n=34+38)
BM_Sort_uint32_Descending_16384                                  1.93ns ± 6%  2.43ns ± 4%   +25.99%  (p=0.000 n=34+35)
BM_Sort_uint32_Descending_262144                                 1.89ns ± 3%  2.38ns ± 4%   +25.41%  (p=0.000 n=33+35)
BM_Sort_uint32_SingleElement_1                                   3.67ns ± 2%  4.28ns ± 4%   +16.60%  (p=0.000 n=34+34)
BM_Sort_uint32_SingleElement_4                                   1.48ns ± 4%  1.48ns ± 5%      ~     (p=0.951 n=35+33)
BM_Sort_uint32_SingleElement_16                                  0.93ns ± 3%  1.02ns ± 4%    +9.51%  (p=0.000 n=36+33)
BM_Sort_uint32_SingleElement_64                                  0.76ns ± 3%  1.59ns ± 8%  +109.78%  (p=0.000 n=36+32)
BM_Sort_uint32_SingleElement_256                                 0.82ns ± 4%  1.45ns ± 5%   +76.62%  (p=0.000 n=37+34)
BM_Sort_uint32_SingleElement_1024                                0.77ns ± 4%  1.31ns ± 4%   +71.40%  (p=0.000 n=34+34)
BM_Sort_uint32_SingleElement_16384                               0.64ns ± 4%  1.24ns ± 6%   +93.29%  (p=0.000 n=35+36)
BM_Sort_uint32_SingleElement_262144                              0.63ns ± 3%  1.23ns ± 4%   +95.17%  (p=0.000 n=35+35)
BM_Sort_uint32_PipeOrgan_1                                       3.68ns ± 2%  4.42ns ± 3%   +20.31%  (p=0.000 n=34+36)
BM_Sort_uint32_PipeOrgan_4                                       1.54ns ± 3%  1.53ns ± 3%      ~     (p=0.128 n=34+36)
BM_Sort_uint32_PipeOrgan_16                                      2.22ns ± 3%  1.99ns ± 3%   -10.28%  (p=0.000 n=33+36)
BM_Sort_uint32_PipeOrgan_64                                      4.41ns ± 3%  3.39ns ± 4%   -23.17%  (p=0.000 n=35+37)
BM_Sort_uint32_PipeOrgan_256                                     2.75ns ± 5%  3.07ns ± 3%   +11.74%  (p=0.000 n=37+37)
BM_Sort_uint32_PipeOrgan_1024                                    3.58ns ± 2%  5.48ns ± 3%   +52.97%  (p=0.000 n=37+36)
BM_Sort_uint32_PipeOrgan_16384                                   4.10ns ± 3%  6.53ns ± 3%   +59.27%  (p=0.000 n=37+37)
BM_Sort_uint32_PipeOrgan_262144                                  4.90ns ± 3%  7.39ns ± 3%   +50.71%  (p=0.000 n=34+37)
BM_Sort_uint32_QuickSortAdversary_1                              3.68ns ± 2%  4.28ns ± 3%   +16.19%  (p=0.000 n=36+37)
BM_Sort_uint32_QuickSortAdversary_4                              1.46ns ± 4%  1.46ns ± 3%      ~     (p=0.736 n=35+38)
BM_Sort_uint32_QuickSortAdversary_16                             0.93ns ± 3%  1.02ns ± 4%    +9.69%  (p=0.000 n=36+37)
BM_Sort_uint32_QuickSortAdversary_64                             13.6ns ± 4%  17.9ns ± 8%   +31.37%  (p=0.000 n=36+35)
BM_Sort_uint32_QuickSortAdversary_256                            20.0ns ± 4%  25.7ns ± 4%   +28.69%  (p=0.000 n=36+35)
BM_Sort_uint32_QuickSortAdversary_1024                           28.3ns ± 6%  31.7ns ± 3%   +12.12%  (p=0.000 n=36+37)
BM_Sort_uint32_QuickSortAdversary_16384                          45.8ns ± 3%  50.6ns ± 4%   +10.32%  (p=0.000 n=38+36)
BM_Sort_uint32_QuickSortAdversary_262144                         61.6ns ± 4%  68.2ns ± 4%   +10.68%  (p=0.000 n=37+37)
BM_Sort_uint64_Random_1                                          3.71ns ± 4%  4.00ns ± 4%    +7.93%  (p=0.000 n=34+35)
BM_Sort_uint64_Random_4                                          5.52ns ± 8%  5.22ns ± 6%    -5.41%  (p=0.000 n=32+32)
BM_Sort_uint64_Random_16                                         10.7ns ±15%  10.2ns ± 7%      ~     (p=0.077 n=40+31)
BM_Sort_uint64_Random_64                                         19.0ns ±14%  18.2ns ±14%    -4.31%  (p=0.001 n=40+40)
BM_Sort_uint64_Random_256                                        25.7ns ± 9%  22.1ns ±15%   -13.82%  (p=0.000 n=33+40)
BM_Sort_uint64_Random_1024                                       32.4ns ± 6%  23.8ns ±16%   -26.64%  (p=0.000 n=33+40)
BM_Sort_uint64_Random_16384                                      46.8ns ± 3%  27.1ns ±16%   -42.15%  (p=0.000 n=33+40)
BM_Sort_uint64_Random_262144                                     61.3ns ± 4%  30.4ns ±16%   -50.34%  (p=0.000 n=34+40)
BM_Sort_uint64_Ascending_1                                       3.67ns ± 3%  3.87ns ±16%    +5.36%  (p=0.049 n=35+40)
BM_Sort_uint64_Ascending_4                                       1.46ns ± 3%  1.46ns ± 3%      ~     (p=0.130 n=37+31)
BM_Sort_uint64_Ascending_16                                      1.09ns ± 3%  0.91ns ± 6%   -16.79%  (p=0.000 n=38+32)
BM_Sort_uint64_Ascending_64                                      1.25ns ± 3%  1.29ns ± 5%    +3.11%  (p=0.000 n=38+34)
BM_Sort_uint64_Ascending_256                                     1.37ns ± 3%  1.42ns ± 3%    +3.07%  (p=0.000 n=39+35)
BM_Sort_uint64_Ascending_1024                                    1.12ns ± 3%  1.17ns ± 3%    +5.28%  (p=0.000 n=37+36)
BM_Sort_uint64_Ascending_16384                                   0.98ns ± 3%  1.09ns ± 3%   +10.95%  (p=0.000 n=36+37)
BM_Sort_uint64_Ascending_262144                                  0.98ns ± 3%  1.08ns ± 3%   +10.97%  (p=0.000 n=36+37)
BM_Sort_uint64_Descending_1                                      3.68ns ± 3%  3.67ns ± 3%      ~     (p=0.652 n=36+36)
BM_Sort_uint64_Descending_4                                      1.71ns ± 3%  1.73ns ± 3%    +1.50%  (p=0.000 n=33+34)
BM_Sort_uint64_Descending_16                                     4.96ns ± 2%  5.49ns ± 3%   +10.73%  (p=0.000 n=31+36)
BM_Sort_uint64_Descending_64                                     2.14ns ± 6%  3.03ns ± 3%   +41.72%  (p=0.000 n=32+35)
BM_Sort_uint64_Descending_256                                    2.03ns ± 4%  2.86ns ± 4%   +40.55%  (p=0.000 n=32+34)
BM_Sort_uint64_Descending_1024                                   2.20ns ± 2%  2.29ns ± 3%    +4.20%  (p=0.000 n=31+36)
BM_Sort_uint64_Descending_16384                                  1.89ns ± 3%  2.08ns ± 3%   +10.00%  (p=0.000 n=31+37)
BM_Sort_uint64_Descending_262144                                 1.92ns ± 3%  2.07ns ± 4%    +7.95%  (p=0.000 n=31+36)
BM_Sort_uint64_SingleElement_1                                   3.68ns ± 5%  3.67ns ± 3%      ~     (p=0.716 n=31+37)
BM_Sort_uint64_SingleElement_4                                   1.46ns ± 3%  1.46ns ± 3%      ~     (p=0.557 n=34+37)
BM_Sort_uint64_SingleElement_16                                  1.09ns ± 2%  0.91ns ± 3%   -16.93%  (p=0.000 n=33+36)
BM_Sort_uint64_SingleElement_64                                  0.83ns ± 4%  1.47ns ± 4%   +78.03%  (p=0.000 n=34+34)
BM_Sort_uint64_SingleElement_256                                 0.95ns ± 4%  1.28ns ± 4%   +35.17%  (p=0.000 n=35+35)
BM_Sort_uint64_SingleElement_1024                                0.76ns ± 3%  1.05ns ± 3%   +37.78%  (p=0.000 n=35+33)
BM_Sort_uint64_SingleElement_16384                               0.71ns ± 2%  0.98ns ± 5%   +38.43%  (p=0.000 n=34+33)
BM_Sort_uint64_SingleElement_262144                              0.72ns ± 3%  0.98ns ± 4%   +35.93%  (p=0.000 n=35+33)
BM_Sort_uint64_PipeOrgan_1                                       3.68ns ± 3%  3.68ns ± 3%      ~     (p=0.650 n=35+33)
BM_Sort_uint64_PipeOrgan_4                                       1.53ns ± 2%  1.54ns ± 4%      ~     (p=0.424 n=33+36)
BM_Sort_uint64_PipeOrgan_16                                      2.23ns ± 3%  2.06ns ± 4%    -7.68%  (p=0.000 n=34+35)
BM_Sort_uint64_PipeOrgan_64                                      5.46ns ± 2%  3.41ns ± 4%   -37.67%  (p=0.000 n=33+36)
BM_Sort_uint64_PipeOrgan_256                                     2.92ns ± 4%  2.91ns ± 3%      ~     (p=0.257 n=35+35)
BM_Sort_uint64_PipeOrgan_1024                                    3.72ns ± 3%  5.35ns ± 4%   +43.95%  (p=0.000 n=35+35)
BM_Sort_uint64_PipeOrgan_16384                                   4.12ns ± 3%  6.37ns ± 3%   +54.74%  (p=0.000 n=34+36)
BM_Sort_uint64_PipeOrgan_262144                                  4.99ns ± 3%  7.25ns ± 5%   +45.45%  (p=0.000 n=35+35)
BM_Sort_uint64_QuickSortAdversary_1                              3.67ns ± 2%  3.65ns ± 3%      ~     (p=0.071 n=35+37)
BM_Sort_uint64_QuickSortAdversary_4                              1.46ns ± 3%  1.46ns ± 3%      ~     (p=0.214 n=36+37)
BM_Sort_uint64_QuickSortAdversary_16                             1.09ns ± 3%  0.91ns ± 3%   -16.73%  (p=0.000 n=36+38)
BM_Sort_uint64_QuickSortAdversary_64                             13.7ns ± 3%  17.8ns ± 5%   +29.86%  (p=0.000 n=36+37)
BM_Sort_uint64_QuickSortAdversary_256                            20.0ns ± 3%  25.9ns ± 3%   +29.25%  (p=0.000 n=35+38)
BM_Sort_uint64_QuickSortAdversary_1024                           28.1ns ± 3%  31.0ns ± 4%   +10.35%  (p=0.000 n=33+37)
BM_Sort_uint64_QuickSortAdversary_16384                          45.8ns ± 2%  50.5ns ± 4%   +10.29%  (p=0.000 n=36+37)
BM_Sort_uint64_QuickSortAdversary_262144                         64.9ns ± 3%  69.5ns ± 3%    +7.15%  (p=0.000 n=36+36)
BM_Sort_pair<uint32, uint32>_Random_1                            4.03ns ± 5%  4.33ns ± 4%    +7.31%  (p=0.000 n=36+36)
BM_Sort_pair<uint32, uint32>_Random_4                            6.78ns ± 5%  6.71ns ± 4%    -1.09%  (p=0.040 n=35+35)
BM_Sort_pair<uint32, uint32>_Random_16                           25.2ns ± 6%  16.8ns ± 7%   -33.35%  (p=0.000 n=35+35)
BM_Sort_pair<uint32, uint32>_Random_64                           35.6ns ± 7%  27.2ns ± 8%   -23.73%  (p=0.000 n=34+36)
BM_Sort_pair<uint32, uint32>_Random_256                          43.5ns ±13%  34.0ns ± 8%   -21.78%  (p=0.000 n=32+34)
BM_Sort_pair<uint32, uint32>_Random_1024                         50.6ns ± 8%  40.8ns ± 5%   -19.35%  (p=0.000 n=32+32)
BM_Sort_pair<uint32, uint32>_Random_16384                        66.0ns ± 3%  55.9ns ± 6%   -15.24%  (p=0.000 n=32+32)
BM_Sort_pair<uint32, uint32>_Random_262144                       82.4ns ± 4%  72.0ns ± 5%   -12.64%  (p=0.000 n=32+31)
BM_Sort_pair<uint32, uint32>_Ascending_1                         4.00ns ± 2%  4.50ns ±16%   +12.59%  (p=0.000 n=33+40)
BM_Sort_pair<uint32, uint32>_Ascending_4                         2.22ns ± 3%  2.34ns ±16%    +5.46%  (p=0.041 n=33+40)
BM_Sort_pair<uint32, uint32>_Ascending_16                        2.33ns ± 4%  1.30ns ±15%   -44.33%  (p=0.000 n=34+40)
BM_Sort_pair<uint32, uint32>_Ascending_64                        1.39ns ± 4%  1.50ns ± 8%    +8.48%  (p=0.000 n=35+32)
BM_Sort_pair<uint32, uint32>_Ascending_256                       1.47ns ± 4%  1.56ns ± 3%    +5.96%  (p=0.000 n=37+31)
BM_Sort_pair<uint32, uint32>_Ascending_1024                      1.34ns ± 3%  1.35ns ± 4%    +1.22%  (p=0.000 n=38+31)
BM_Sort_pair<uint32, uint32>_Ascending_16384                     1.18ns ± 2%  1.18ns ± 3%      ~     (p=0.687 n=37+32)
BM_Sort_pair<uint32, uint32>_Ascending_262144                    1.18ns ± 3%  1.17ns ± 2%      ~     (p=0.153 n=38+34)
BM_Sort_pair<uint32, uint32>_Descending_1                        4.00ns ± 2%  4.29ns ± 3%    +7.22%  (p=0.000 n=37+36)
BM_Sort_pair<uint32, uint32>_Descending_4                        2.91ns ± 3%  2.92ns ± 3%      ~     (p=0.065 n=37+35)
BM_Sort_pair<uint32, uint32>_Descending_16                       4.96ns ± 4%  6.51ns ± 2%   +31.36%  (p=0.000 n=37+30)
BM_Sort_pair<uint32, uint32>_Descending_64                       3.13ns ± 2%  2.92ns ± 3%    -6.71%  (p=0.000 n=36+37)
BM_Sort_pair<uint32, uint32>_Descending_256                      2.56ns ± 3%  2.73ns ± 5%    +6.55%  (p=0.000 n=35+37)
BM_Sort_pair<uint32, uint32>_Descending_1024                     3.11ns ± 3%  2.34ns ± 4%   -24.85%  (p=0.000 n=36+35)
BM_Sort_pair<uint32, uint32>_Descending_16384                    2.84ns ± 3%  2.14ns ± 5%   -24.48%  (p=0.000 n=37+37)
BM_Sort_pair<uint32, uint32>_Descending_262144                   2.86ns ± 3%  2.15ns ± 3%   -25.08%  (p=0.000 n=36+35)
BM_Sort_pair<uint32, uint32>_SingleElement_1                     3.99ns ± 3%  4.28ns ± 3%    +7.08%  (p=0.000 n=33+35)
BM_Sort_pair<uint32, uint32>_SingleElement_4                     2.32ns ± 6%  2.30ns ± 3%    -0.77%  (p=0.032 n=32+35)
BM_Sort_pair<uint32, uint32>_SingleElement_16                    1.67ns ± 4%  1.27ns ± 4%   -24.13%  (p=0.000 n=32+35)
BM_Sort_pair<uint32, uint32>_SingleElement_64                    1.64ns ± 7%  1.83ns ± 4%   +11.54%  (p=0.000 n=31+35)
BM_Sort_pair<uint32, uint32>_SingleElement_256                   1.57ns ± 3%  1.90ns ± 3%   +21.46%  (p=0.000 n=31+36)
BM_Sort_pair<uint32, uint32>_SingleElement_1024                  1.49ns ±15%  1.63ns ± 3%    +9.42%  (p=0.000 n=40+37)
BM_Sort_pair<uint32, uint32>_SingleElement_16384                 1.29ns ±17%  1.57ns ± 3%   +21.51%  (p=0.000 n=33+36)
BM_Sort_pair<uint32, uint32>_SingleElement_262144                1.26ns ± 4%  1.56ns ± 4%   +24.11%  (p=0.000 n=33+36)
BM_Sort_pair<uint32, uint32>_PipeOrgan_1                         4.01ns ± 2%  4.28ns ± 3%    +6.68%  (p=0.000 n=32+35)
BM_Sort_pair<uint32, uint32>_PipeOrgan_4                         2.38ns ± 5%  2.42ns ± 4%    +1.61%  (p=0.000 n=34+35)
BM_Sort_pair<uint32, uint32>_PipeOrgan_16                        4.83ns ± 2%  2.71ns ± 7%   -43.96%  (p=0.000 n=34+34)
BM_Sort_pair<uint32, uint32>_PipeOrgan_64                        4.53ns ± 3%  3.89ns ± 7%   -14.11%  (p=0.000 n=35+33)
BM_Sort_pair<uint32, uint32>_PipeOrgan_256                       5.53ns ± 4%  2.81ns ± 4%   -49.13%  (p=0.000 n=36+33)
BM_Sort_pair<uint32, uint32>_PipeOrgan_1024                      6.49ns ± 4%  5.29ns ± 3%   -18.50%  (p=0.000 n=35+32)
BM_Sort_pair<uint32, uint32>_PipeOrgan_16384                     7.21ns ± 4%  5.97ns ± 3%   -17.24%  (p=0.000 n=36+33)
BM_Sort_pair<uint32, uint32>_PipeOrgan_262144                    7.98ns ± 5%  6.59ns ± 3%   -17.46%  (p=0.000 n=33+33)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_1                3.99ns ± 3%  4.27ns ± 3%    +6.95%  (p=0.000 n=36+34)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_4                2.40ns ± 3%  2.37ns ± 3%    -1.00%  (p=0.007 n=34+34)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_16               4.96ns ± 5%  2.72ns ± 7%   -45.07%  (p=0.000 n=35+35)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_64               7.24ns ± 4%  7.51ns ± 4%    +3.63%  (p=0.000 n=34+35)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_256              9.85ns ± 5%  7.12ns ± 4%   -27.70%  (p=0.000 n=34+35)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_1024             11.6ns ± 6%   8.8ns ± 5%   -23.86%  (p=0.000 n=35+35)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_16384            32.7ns ± 3%  20.8ns ± 4%   -36.26%  (p=0.000 n=35+35)
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_262144           36.4ns ± 3%  24.0ns ± 4%   -34.12%  (p=0.000 n=34+36)
BM_Sort_tuple<uint32, uint64, uint32>_Random_1                   4.04ns ± 6%  4.34ns ± 4%    +7.55%  (p=0.000 n=37+37)
BM_Sort_tuple<uint32, uint64, uint32>_Random_4                   7.19ns ± 6%  7.26ns ± 5%    +0.99%  (p=0.042 n=36+38)
BM_Sort_tuple<uint32, uint64, uint32>_Random_16                  30.4ns ± 6%  21.8ns ± 7%   -28.28%  (p=0.000 n=34+37)
BM_Sort_tuple<uint32, uint64, uint32>_Random_64                  42.8ns ±11%  33.5ns ± 9%   -21.70%  (p=0.000 n=36+38)
BM_Sort_tuple<uint32, uint64, uint32>_Random_256                 49.9ns ± 6%  40.3ns ± 9%   -19.20%  (p=0.000 n=35+38)
BM_Sort_tuple<uint32, uint64, uint32>_Random_1024                56.3ns ± 3%  46.1ns ± 4%   -18.08%  (p=0.000 n=35+35)
BM_Sort_tuple<uint32, uint64, uint32>_Random_16384               72.2ns ± 5%  62.1ns ± 3%   -14.05%  (p=0.000 n=37+36)
BM_Sort_tuple<uint32, uint64, uint32>_Random_262144              88.7ns ± 6%  79.0ns ± 6%   -10.93%  (p=0.000 n=36+36)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1                3.96ns ± 3%  4.36ns ± 3%    +9.96%  (p=0.000 n=34+37)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_4                2.39ns ± 2%  2.39ns ± 3%      ~     (p=0.604 n=36+37)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16               3.04ns ± 4%  1.48ns ± 3%   -51.20%  (p=0.000 n=34+35)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_64               2.44ns ± 3%  2.30ns ± 5%    -5.61%  (p=0.000 n=36+35)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_256              2.35ns ± 3%  2.39ns ± 5%    +1.78%  (p=0.000 n=33+34)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1024             2.12ns ± 5%  2.08ns ± 4%    -1.80%  (p=0.000 n=33+34)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16384            2.02ns ± 3%  2.00ns ± 5%    -1.25%  (p=0.000 n=32+32)
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_262144           2.06ns ± 5%  2.11ns ± 9%      ~     (p=0.618 n=32+40)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1               3.97ns ± 2%  4.57ns ±16%   +15.19%  (p=0.000 n=32+40)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_4               3.64ns ± 3%  4.05ns ±15%   +11.05%  (p=0.000 n=33+40)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16              5.68ns ± 5%  9.36ns ±16%   +64.92%  (p=0.000 n=35+40)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_64              4.27ns ± 4%  3.88ns ± 8%    -9.13%  (p=0.000 n=35+32)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_256             3.58ns ± 3%  3.76ns ±14%    +5.12%  (p=0.002 n=38+40)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1024            4.16ns ± 3%  3.21ns ± 5%   -22.77%  (p=0.000 n=38+31)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16384           3.90ns ± 4%  3.00ns ± 3%   -23.12%  (p=0.000 n=38+32)
BM_Sort_tuple<uint32, uint64, uint32>_Descending_262144          4.52ns ± 3%  3.42ns ± 3%   -24.29%  (p=0.000 n=38+33)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1            3.97ns ± 3%  4.31ns ± 3%    +8.78%  (p=0.000 n=39+34)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_4            2.54ns ± 2%  2.54ns ± 4%      ~     (p=0.341 n=38+36)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16           2.39ns ± 3%  1.70ns ± 6%   -28.90%  (p=0.000 n=38+35)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_64           2.61ns ± 2%  3.23ns ± 3%   +24.07%  (p=0.000 n=35+35)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_256          2.83ns ± 2%  2.97ns ± 4%    +4.83%  (p=0.000 n=35+37)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1024         2.44ns ± 4%  2.44ns ± 3%      ~     (p=0.481 n=36+36)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16384        2.19ns ± 3%  2.37ns ± 6%    +8.01%  (p=0.000 n=36+37)
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_262144       2.34ns ± 2%  2.36ns ± 5%    +1.11%  (p=0.001 n=36+36)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1                3.96ns ± 2%  4.31ns ± 3%    +8.76%  (p=0.000 n=33+35)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_4                2.65ns ± 6%  2.67ns ± 4%      ~     (p=0.139 n=32+37)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16               5.64ns ± 3%  3.56ns ± 3%   -36.80%  (p=0.000 n=31+35)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_64               6.12ns ±16%  5.04ns ± 4%   -17.64%  (p=0.000 n=40+37)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_256              6.78ns ± 6%  3.73ns ± 3%   -44.94%  (p=0.000 n=31+36)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1024             8.36ns ±15%  6.51ns ± 4%   -22.13%  (p=0.000 n=40+37)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16384            9.24ns ±15%  7.91ns ± 3%   -14.34%  (p=0.000 n=40+37)
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_262144           10.7ns ± 3%   9.3ns ± 6%   -12.36%  (p=0.000 n=32+36)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_1       3.97ns ± 3%  4.31ns ± 3%    +8.63%  (p=0.000 n=32+35)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_4       2.79ns ± 3%  2.76ns ± 4%    -0.95%  (p=0.002 n=33+33)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_16      5.07ns ± 3%  3.69ns ± 4%   -27.35%  (p=0.000 n=35+33)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_64      9.26ns ± 3%  8.34ns ± 7%    -9.88%  (p=0.000 n=35+33)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_256     11.8ns ± 5%   9.7ns ± 3%   -17.83%  (p=0.000 n=37+33)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_1024    19.2ns ± 4%  14.5ns ±10%   -24.59%  (p=0.000 n=36+33)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_16384   45.5ns ± 4%  37.4ns ± 9%   -17.71%  (p=0.000 n=35+33)
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_262144  50.0ns ± 4%  43.2ns ± 3%   -13.69%  (p=0.000 n=35+34)
BM_Sort_string_Random_1                                          4.66ns ± 6%  4.40ns ± 4%    -5.55%  (p=0.000 n=35+37)
BM_Sort_string_Random_4                                          14.9ns ± 3%  15.0ns ± 6%      ~     (p=0.863 n=36+38)
BM_Sort_string_Random_16                                         45.5ns ± 6%  35.8ns ± 8%   -21.37%  (p=0.000 n=36+36)
BM_Sort_string_Random_64                                         66.6ns ± 4%  58.2ns ± 3%   -12.69%  (p=0.000 n=36+37)
BM_Sort_string_Random_256                                        86.0ns ± 5%  77.4ns ± 3%   -10.01%  (p=0.000 n=37+37)
BM_Sort_string_Random_1024                                        106ns ± 3%    96ns ± 6%    -9.39%  (p=0.000 n=37+37)
BM_Sort_string_Random_16384                                       154ns ± 3%   141ns ± 5%    -8.03%  (p=0.000 n=35+36)
BM_Sort_string_Random_262144                                      213ns ± 4%   197ns ± 4%    -7.59%  (p=0.000 n=34+34)
BM_Sort_string_Ascending_1                                       4.59ns ± 2%  4.56ns ±17%    -0.60%  (p=0.002 n=32+40)
BM_Sort_string_Ascending_4                                       7.52ns ± 9%  7.54ns ±12%      ~     (p=0.554 n=37+40)
BM_Sort_string_Ascending_16                                      13.1ns ± 6%   8.8ns ±12%   -33.26%  (p=0.000 n=39+38)
BM_Sort_string_Ascending_64                                      14.8ns ±10%  14.5ns ±11%    -2.15%  (p=0.013 n=40+37)
BM_Sort_string_Ascending_256                                     14.0ns ± 6%  14.1ns ±10%      ~     (p=0.760 n=37+40)
BM_Sort_string_Ascending_1024                                    12.9ns ±10%  12.8ns ±20%      ~     (p=0.055 n=35+40)
BM_Sort_string_Ascending_16384                                   17.2ns ±13%  17.4ns ±21%      ~     (p=1.000 n=37+40)
BM_Sort_string_Ascending_262144                                  17.5ns ±12%  17.5ns ±25%      ~     (p=0.392 n=35+39)
BM_Sort_string_Descending_1                                      4.59ns ± 3%  4.34ns ± 3%    -5.51%  (p=0.000 n=32+33)
BM_Sort_string_Descending_4                                      10.1ns ± 5%   9.8ns ± 4%    -2.84%  (p=0.000 n=36+34)
BM_Sort_string_Descending_16                                     22.0ns ± 4%  39.6ns ± 4%   +79.84%  (p=0.000 n=36+33)
BM_Sort_string_Descending_64                                     21.4ns ±12%  21.3ns ±14%      ~     (p=0.542 n=37+39)
BM_Sort_string_Descending_256                                    19.4ns ±13%  18.9ns ±13%    -2.74%  (p=0.039 n=37+39)
BM_Sort_string_Descending_1024                                   22.7ns ± 5%  17.6ns ±15%   -22.52%  (p=0.000 n=35+40)
BM_Sort_string_Descending_16384                                  27.9ns ±14%  22.6ns ±10%   -19.11%  (p=0.000 n=40+37)
BM_Sort_string_Descending_262144                                 33.8ns ±14%  26.1ns ±21%   -22.74%  (p=0.000 n=39+38)
BM_Sort_string_SingleElement_1                                   4.58ns ± 2%  4.35ns ± 3%    -5.14%  (p=0.000 n=35+37)
BM_Sort_string_SingleElement_4                                   7.92ns ± 3%  7.92ns ± 7%      ~     (p=0.625 n=38+39)
BM_Sort_string_SingleElement_16                                  18.0ns ± 3%   7.9ns ± 6%   -56.23%  (p=0.000 n=36+35)
BM_Sort_string_SingleElement_64                                  20.3ns ± 5%  19.3ns ±15%    -4.83%  (p=0.000 n=34+38)
BM_Sort_string_SingleElement_256                                 19.4ns ± 7%  18.1ns ±14%    -6.67%  (p=0.000 n=36+39)
BM_Sort_string_SingleElement_1024                                19.3ns ± 9%  17.4ns ±17%    -9.40%  (p=0.000 n=35+40)
BM_Sort_string_SingleElement_16384                               17.5ns ±12%  16.2ns ±20%    -7.91%  (p=0.000 n=37+40)
BM_Sort_string_SingleElement_262144                              16.7ns ±18%  15.3ns ±27%    -8.56%  (p=0.000 n=40+40)
BM_Sort_string_PipeOrgan_1                                       4.60ns ± 2%  4.33ns ± 3%    -5.80%  (p=0.000 n=33+31)
BM_Sort_string_PipeOrgan_4                                       8.29ns ± 4%  8.17ns ± 8%    -1.50%  (p=0.004 n=39+36)
BM_Sort_string_PipeOrgan_16                                      22.9ns ± 3%  16.4ns ± 6%   -28.45%  (p=0.000 n=39+38)
BM_Sort_string_PipeOrgan_64                                      30.7ns ± 4%  28.9ns ± 7%    -6.05%  (p=0.000 n=38+37)
BM_Sort_string_PipeOrgan_256                                     38.1ns ± 3%  22.5ns ± 9%   -40.78%  (p=0.000 n=37+37)
BM_Sort_string_PipeOrgan_1024                                    45.4ns ± 4%  36.2ns ± 6%   -20.33%  (p=0.000 n=37+37)
BM_Sort_string_PipeOrgan_16384                                   56.2ns ± 4%  49.0ns ± 8%   -12.73%  (p=0.000 n=36+38)
BM_Sort_string_PipeOrgan_262144                                  77.8ns ±13%  62.8ns ±10%   -19.27%  (p=0.000 n=39+39)
BM_Sort_string_QuickSortAdversary_1                              4.80ns ±16%  4.34ns ± 4%    -9.56%  (p=0.000 n=39+34)
BM_Sort_string_QuickSortAdversary_4                              14.8ns ± 5%  14.7ns ± 4%    -0.80%  (p=0.037 n=33+33)
BM_Sort_string_QuickSortAdversary_16                             44.6ns ± 4%  34.8ns ± 5%   -21.98%  (p=0.000 n=35+34)
BM_Sort_string_QuickSortAdversary_64                             66.2ns ± 3%  58.1ns ± 4%   -12.32%  (p=0.000 n=36+35)
BM_Sort_string_QuickSortAdversary_256                            85.4ns ± 5%  76.9ns ± 6%    -9.99%  (p=0.000 n=36+36)
BM_Sort_string_QuickSortAdversary_1024                            106ns ± 4%    96ns ± 3%    -9.62%  (p=0.000 n=34+37)
BM_Sort_string_QuickSortAdversary_16384                           153ns ± 3%   141ns ± 4%    -8.22%  (p=0.000 n=34+37)
BM_Sort_string_QuickSortAdversary_262144                          211ns ± 5%   195ns ± 6%    -7.77%  (p=0.000 n=35+38)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
philnik added inline comments.Apr 22 2022, 11:51 AM
libcxx/include/__algorithm/sort.h
497–498

Could you add a _LIBCPP_ASSERT for the precondition?

500–502

Could you shorten this to something like // __bitset_partition uses bitsets for storing the outcomes of the comparisons? I think that makes it clear.

505

Could you split this up into a few more functions? There are very clear sections to divide this functions.

657–658

Again, please add a _LIBCPP_ASSERT.

731–733

Why can't these be _LIBCPP_CONSTEXPR?

805–812
874–879
904–917

Where do you check that the default comparator is used?

nilayvaish marked 10 inline comments as done.

Changes to address reviewer comments.

nilayvaish added inline comments.Apr 27 2022, 2:09 PM
libcxx/include/__algorithm/sort.h
346–348

Dropped _3. This was named so likely because it was using __sort3 for the first three elements. I have dropped that code.

380

Changed _VSTD to std throughout the file.

731–733

Fixed.

805–812

I have fixed it. But clang-format might reformat it.

874–879

Same comment as above.

904–917

Fixed. Actually I was not aware how to test for the default comparator. I have now switched to using __use_branchless_sort declaration that is used at other places in this file.

Previous diff failed to apply. So uploading a new one.

Move block_size to detail namespace
Include bits for libcpp_* functions.

Drop inline for __block_size

Change block_size to enum.

We found similar code in string header file as well.

Pinging the reviewers!

Pinging the reviewers!

Pinging the reviewers again!

Sorry it's taking so long. Louis doesn't have a lot of time currently and I think he should take a look first and say if we want to go this direction.

Sorry it's taking so long. Louis doesn't have a lot of time currently and I think he should take a look first and say if we want to go this direction.

I think Louis is in favor of this change. You can read his comment in the earlier version of this diff: https://reviews.llvm.org/D93233. Is there something I can do to speed up the process? Maybe we can involve some of the other libcxx reviewers. What do you think?

Pinging the reviewers!

thakis added a subscriber: thakis.Jun 19 2022, 9:43 AM

Have you checked what this does to:

  • impact of binary size of outputs
  • compile time

? (E.g. for the clang binary, or your favorite medium-size open source project.)

I am different. I would enjoy a description of the algorithm at the top of the header and not on line 644.

I am in favour of making our std::sort implementation better, and at some point this did seem like the right path forward. Since then we also made some changes with branchless sorting (ping @marcogelmi), and we added a bunch of benchmarks so that we have a better understanding of the impact of this change. I wonder whether this is still the right approach given that information.

It seems like you rebased onto main with the branchless sort changes and also re-ran the benchmarks against that. Looking at the results, it looks like this basically improves the performance by a fairly significant margin for randomly distributed integers, but seems to also decrease the performance quite a bit for integers with some sort of structure (e.g. already sorted, descending or ascending). In particular, I assume the reduction of branch mispredictions is not a huge factor anymore since we introduced branchless sorting, or am I off here? The fact that it doesn't seem to provide improvements across the board definitely make it a trickier change to vet, since we'll have to decide whether we want to make this tradeoff.

As it stands, we basically have to decide whether to optimize for randomly sorted data or to also take into account common patterns IIUC. That's not a super easy call to make, since we're talking about 40-50% improvements and 40-50% regressions in each case. I remember @Morwenn had mentioned pdqsort previously, and that seemed to be an algorithm that would handle both random data and also patterns properly. Would that solve the problems of this algorithm?

Sorry if this isn't super helpful, but I literally don't know how to make this call. I'd very much welcome the perspective of other folks who have given a lot of thought to sorting algorithms.

FWIW, making a major change to std::sort isn't a problem, as long as we have sufficient tests to validate its correctness, and that we are confident about its performance.

It seems like you rebased onto main with the branchless sort changes and also re-ran the benchmarks against that. Looking at the results, it looks like this basically improves the performance by a fairly significant margin for randomly distributed integers, but seems to also decrease the performance quite a bit for integers with some sort of structure (e.g. already sorted, descending or ascending). In particular, I assume the reduction of branch mispredictions is not a huge factor anymore since we introduced branchless sorting, or am I off here? The fact that it doesn't seem to provide improvements across the board definitely make it a trickier change to vet, since we'll have to decide whether we want to make this tradeoff.

As it stands, we basically have to decide whether to optimize for randomly sorted data or to also take into account common patterns IIUC. That's not a super easy call to make, since we're talking about 40-50% improvements and 40-50% regressions in each case. I remember @Morwenn had mentioned pdqsort previously, and that seemed to be an algorithm that would handle both random data and also patterns properly. Would that solve the problems of this algorithm?

Sorry if this isn't super helpful, but I literally don't know how to make this call. I'd very much welcome the perspective of other folks who have given a lot of thought to sorting algorithms.

FWIW, making a major change to std::sort isn't a problem, as long as we have sufficient tests to validate its correctness, and that we are confident about its performance.

Below I provide the benchmark numbers for both BlockQuickSort and PdqSort. The baseline in both cases is the current implementation of std::sort. The numbers were collected on Intel's Skylake machines. Each benchmark was executed 20 times. To get to a combined performance number, I have added weights for each of the benchmark. The weights are just some bunch numbers I chose, I do not have any empirical data to justify these. I assumed that randomly permuted inputs have a significantly higher probability and that medium length sequences are also more probable than super short and super long sequences. The weighted average appears at the end: 15% improvement overall. Both PdqSort and BlockQuickSort arrive at a similar weighted average. I genreally think that the two implementations perform close to each other. So we can choose either.

`															
			  					BlockQuickSort							PdqSort						
name						old cpu/op		new cpu/op		delta			old cpu/op		new cpu/op		delta		Weight
BM_Sort_uint32_Random_1				4.27ns ± 1%		4.41ns ± 1%		3.16%			4.49ns ±13%		3.77ns ± 2%		-16.01%		10
BM_Sort_uint32_Random_4				1.96ns ± 1%		1.96ns ± 1%		0.00%			1.98ns ± 6%		6.25ns ± 3%		216.00%		100
BM_Sort_uint32_Random_16			10.5ns ± 0%		11.2ns ± 0%		6.31%			10.6ns ± 1%		11.3ns ± 2%		6.93%		1000
BM_Sort_uint32_Random_64			18.9ns ± 1%		17.1ns ± 1%		-9.36%			19.0ns ± 2%		16.9ns ± 2%		-10.94%		1000
BM_Sort_uint32_Random_256			26.6ns ± 1%		21.3ns ± 1%		-20.03%			26.5ns ± 1%		20.5ns ± 1%		-22.84%		1000
BM_Sort_uint32_Random_1024			34.1ns ± 1%		24.5ns ± 5%		-28.15%			34.2ns ± 1%		23.0ns ± 1%		-32.61%		1000
BM_Sort_uint32_Random_16384			49.1ns ± 1%		29.8ns ± 0%		-39.30%			49.4ns ± 2%		27.4ns ± 1%		-44.58%		100
BM_Sort_uint32_Random_262144			63.8ns ± 1%		35.3ns ± 1%		-44.61%			64.0ns ± 1%		31.7ns ± 1%		-50.51%		10
BM_Sort_uint32_Ascending_1			4.27ns ± 0%		4.42ns ± 1%		3.47%			4.28ns ± 1%		3.78ns ± 1%		-11.66%		1
BM_Sort_uint32_Ascending_4			2.05ns ±14%		1.98ns ± 0%		-3.68%			1.97ns ± 1%		1.74ns ± 1%		-11.46%		1
BM_Sort_uint32_Ascending_16			0.93ns ±13%		0.85ns ± 1%		-8.51%			0.89ns ± 0%		0.89ns ± 1%		0.00%		1
BM_Sort_uint32_Ascending_64			1.15ns ±13%		1.29ns ± 1%		11.64%			1.09ns ± 0%		1.52ns ± 1%		38.61%		1
BM_Sort_uint32_Ascending_256			1.29ns ±13%		1.44ns ± 0%		11.95%			1.24ns ± 1%		1.55ns ± 1%		25.71%		1
BM_Sort_uint32_Ascending_1024			1.09ns ±14%		1.26ns ± 1%		15.89%			1.04ns ± 1%		1.37ns ± 3%		31.41%		1
BM_Sort_uint32_Ascending_16384			1.01ns ±13%		1.20ns ± 1%		19.09%			0.96ns ± 0%		1.31ns ± 1%		35.76%		1
BM_Sort_uint32_Ascending_262144			0.96ns ± 1%		1.18ns ± 1%		22.80%			0.96ns ± 1%		1.30ns ± 2%		34.60%		1
BM_Sort_uint32_Descending_1			4.27ns ± 1%		4.42ns ± 1%		3.63%			4.28ns ± 1%		3.77ns ± 1%		-11.87%		1
BM_Sort_uint32_Descending_4			1.96ns ± 1%		1.95ns ± 1%		0.00%			2.05ns ±13%		2.44ns ± 1%		18.65%		1
BM_Sort_uint32_Descending_16			4.28ns ±11%		4.32ns ± 2%		0.00%			4.21ns ±13%		4.60ns ± 3%		9.17%		1
BM_Sort_uint32_Descending_64			2.12ns ± 9%		2.95ns ± 1%		38.73%			2.05ns ±13%		3.00ns ± 1%		46.11%		1
BM_Sort_uint32_Descending_256			2.29ns ± 9%		3.01ns ± 0%		31.56%			2.20ns ±14%		2.77ns ± 1%		26.14%		1
BM_Sort_uint32_Descending_1024			2.47ns ± 9%		2.86ns ± 0%		15.65%			2.38ns ±13%		2.58ns ± 2%		8.41%		1
BM_Sort_uint32_Descending_16384			2.15ns ± 9%		2.73ns ± 1%		27.04%			2.07ns ±13%		2.42ns ± 1%		17.10%		1
BM_Sort_uint32_Descending_262144		2.12ns ± 9%		2.71ns ± 1%		27.39%			2.04ns ±13%		2.40ns ± 2%		17.67%		1
BM_Sort_uint32_SingleElement_1			4.66ns ± 9%		4.42ns ± 0%		0.00%			4.44ns ±11%		4.08ns ± 9%		-8.09%		1
BM_Sort_uint32_SingleElement_4			2.10ns ±11%		2.08ns ±11%		0.00%			1.94ns ± 1%		1.93ns ±10%		0.00%		1
BM_Sort_uint32_SingleElement_16			0.89ns ± 0%		0.92ns ±10%		0.00%			0.90ns ± 1%		1.05ns ±10%		16.78%		1
BM_Sort_uint32_SingleElement_64			0.73ns ± 0%		1.67ns ±10%		129.82%			0.72ns ± 1%		1.53ns ±10%		111.51%		1
BM_Sort_uint32_SingleElement_256		0.83ns ± 1%		1.43ns ±10%		72.31%			0.83ns ± 1%		1.30ns ±10%		56.47%		1
BM_Sort_uint32_SingleElement_1024		0.76ns ± 2%		1.30ns ±10%		70.36%			0.76ns ± 2%		1.14ns ± 8%		49.38%		1
BM_Sort_uint32_SingleElement_16384		0.66ns ± 1%		1.22ns ±13%		85.32%			0.66ns ± 1%		1.02ns ± 5%		54.70%		1
BM_Sort_uint32_SingleElement_262144		0.65ns ± 1%		1.14ns ± 1%		76.14%			0.65ns ± 1%		0.98ns ± 3%		51.70%		1
BM_Sort_uint32_PipeOrgan_1			4.27ns ± 0%		4.42ns ± 1%		3.43%			4.29ns ± 1%		3.78ns ± 1%		-12.01%		1
BM_Sort_uint32_PipeOrgan_4			1.94ns ± 0%		1.97ns ± 1%		1.38%			1.94ns ± 0%		1.89ns ± 1%		-2.76%		1
BM_Sort_uint32_PipeOrgan_16			1.77ns ± 0%		1.82ns ± 0%		2.67%			1.77ns ± 1%		2.07ns ± 1%		17.10%		1
BM_Sort_uint32_PipeOrgan_64			4.56ns ±13%		3.49ns ± 0%		-23.44%			4.35ns ± 1%		4.50ns ± 1%		3.53%		1
BM_Sort_uint32_PipeOrgan_256			2.71ns ±13%		3.13ns ±13%		15.54%			2.59ns ± 1%		3.80ns ± 1%		46.54%		1
BM_Sort_uint32_PipeOrgan_1024			3.64ns ±13%		6.13ns ±11%		68.19%			3.47ns ± 1%		5.18ns ± 2%		49.18%		1
BM_Sort_uint32_PipeOrgan_16384			4.43ns ±13%		7.65ns ±10%		72.70%			4.21ns ± 0%		6.00ns ± 1%		42.45%		1
BM_Sort_uint32_PipeOrgan_262144			5.28ns ±13%		8.83ns ±11%		67.17%			5.04ns ± 0%		6.79ns ± 1%		34.71%		1
BM_Sort_uint32_QuickSortAdversary_1		4.47ns ±13%		4.70ns ±11%		5.06%			4.27ns ± 0%		3.77ns ± 1%		-11.77%		1
BM_Sort_uint32_QuickSortAdversary_4		2.00ns ± 9%		1.98ns ± 1%		0.00%			1.97ns ± 0%		1.74ns ± 1%		-11.27%		1
BM_Sort_uint32_QuickSortAdversary_16		0.89ns ± 1%		0.90ns ±16%		0.63%			0.89ns ± 0%		0.93ns ± 1%		4.28%		1
BM_Sort_uint32_QuickSortAdversary_64		13.9ns ± 0%		18.4ns ± 0%		32.53%			14.7ns ±12%		15.5ns ± 1%		5.29%		1
BM_Sort_uint32_QuickSortAdversary_256		20.5ns ± 0%		26.8ns ± 0%		31.00%			21.8ns ±12%		20.6ns ± 1%		0.00%		1
BM_Sort_uint32_QuickSortAdversary_1024		29.3ns ± 0%		37.4ns ± 1%		27.73%			31.1ns ±12%		28.2ns ± 3%		-9.36%		1
BM_Sort_uint32_QuickSortAdversary_16384		47.2ns ± 0%		59.7ns ± 2%		26.46%			50.1ns ±12%		44.7ns ± 1%		-10.89%		1
BM_Sort_uint32_QuickSortAdversary_262144	66.6ns ± 0%		78.6ns ± 1%		17.96%			70.8ns ±12%		60.0ns ± 1%		-15.32%		1
BM_Sort_uint64_Random_1				4.26ns ± 0%		4.41ns ± 1%		3.44%			4.52ns ±13%		3.77ns ± 1%		-16.41%		10
BM_Sort_uint64_Random_4				1.96ns ± 0%		1.95ns ± 1%		-0.34%			2.06ns ±13%		6.22ns ± 1%		202.72%		100
BM_Sort_uint64_Random_16			10.5ns ± 0%		11.6ns ± 1%		10.77%			10.9ns ±14%		10.5ns ± 1%		0.00%		1000
BM_Sort_uint64_Random_64			18.8ns ± 0%		17.5ns ± 1%		-6.60%			18.8ns ± 3%		16.7ns ± 1%		-11.33%		1000
BM_Sort_uint64_Random_256			26.4ns ± 0%		21.7ns ± 1%		-17.88%			26.4ns ± 0%		20.4ns ± 1%		-22.67%		1000
BM_Sort_uint64_Random_1024			33.8ns ± 1%		24.6ns ± 1%		-27.15%			33.8ns ± 0%		22.7ns ± 1%		-32.91%		1000
BM_Sort_uint64_Random_16384			48.7ns ± 0%		30.1ns ± 1%		-38.10%			48.7ns ± 0%		27.1ns ± 1%		-44.21%		100
BM_Sort_uint64_Random_262144			63.4ns ± 1%		35.7ns ± 1%		-43.71%			63.3ns ± 1%		31.5ns ± 2%		-50.24%		10
BM_Sort_uint64_Ascending_1			4.28ns ± 1%		4.41ns ± 1%		3.08%			4.27ns ± 1%		3.77ns ± 1%		-11.80%		1
BM_Sort_uint64_Ascending_4			1.96ns ± 1%		1.96ns ± 0%		0.00%			1.95ns ± 0%		1.81ns ± 1%		-7.50%		1
BM_Sort_uint64_Ascending_16			0.89ns ± 1%		0.93ns ± 1%		3.79%			0.89ns ± 1%		1.09ns ± 1%		21.85%		1
BM_Sort_uint64_Ascending_64			1.20ns ±13%		1.31ns ± 1%		9.64%			1.15ns ± 1%		1.52ns ± 1%		32.50%		1
BM_Sort_uint64_Ascending_256			1.42ns ±12%		1.50ns ± 1%		5.25%			1.37ns ± 2%		1.50ns ± 3%		9.43%		1
BM_Sort_uint64_Ascending_1024			1.17ns ±12%		1.28ns ± 1%		9.80%			1.13ns ± 1%		1.31ns ± 2%		16.35%		1
BM_Sort_uint64_Ascending_16384			1.05ns ±12%		1.21ns ± 1%		15.72%			1.00ns ± 1%		1.16ns ± 1%		15.71%		1
BM_Sort_uint64_Ascending_262144			1.05ns ±13%		1.20ns ± 1%		14.62%			1.01ns ± 1%		1.15ns ± 1%		14.46%		1
BM_Sort_uint64_Descending_1			4.27ns ± 1%		4.42ns ± 1%		3.41%			4.27ns ± 1%		3.77ns ± 1%		-11.74%		1
BM_Sort_uint64_Descending_4			1.95ns ± 0%		1.96ns ± 0%		0.32%			2.04ns ±14%		2.44ns ± 1%		19.76%		1
BM_Sort_uint64_Descending_16			3.87ns ± 1%		4.13ns ± 1%		6.73%			4.08ns ±13%		5.15ns ± 1%		26.21%		1
BM_Sort_uint64_Descending_64			1.98ns ± 5%		2.97ns ± 1%		50.29%			2.05ns ±13%		2.94ns ± 2%		43.02%		1
BM_Sort_uint64_Descending_256			2.18ns ±10%		3.09ns ± 1%		41.56%			2.14ns ±12%		2.88ns ± 2%		34.52%		1
BM_Sort_uint64_Descending_1024			2.46ns ± 9%		2.86ns ± 1%		16.28%			2.36ns ±12%		2.43ns ± 1%		2.65%		1
BM_Sort_uint64_Descending_16384			2.13ns ± 9%		2.75ns ± 2%		29.10%			2.04ns ±13%		2.28ns ± 1%		11.95%		1
BM_Sort_uint64_Descending_262144		2.14ns ± 9%		2.72ns ± 1%		27.48%			2.06ns ±12%		2.30ns ± 1%		11.59%		1
BM_Sort_uint64_SingleElement_1			4.67ns ± 9%		4.42ns ± 0%		0.00%			4.38ns ±10%		4.18ns ±10%		0.00%		1
BM_Sort_uint64_SingleElement_4			2.14ns ± 9%		2.04ns ±14%		0.00%			1.96ns ± 0%		2.01ns ±10%		0.00%		1
BM_Sort_uint64_SingleElement_16			0.89ns ± 1%		0.99ns ±11%		10.85%			0.89ns ± 0%		1.21ns ±10%		35.14%		1
BM_Sort_uint64_SingleElement_64			1.09ns ± 1%		1.83ns ±11%		68.80%			1.09ns ± 1%		1.70ns ± 9%		56.29%		1
BM_Sort_uint64_SingleElement_256		1.26ns ± 3%		1.48ns ±10%		18.25%			1.26ns ± 2%		1.46ns ± 4%		16.46%		1
BM_Sort_uint64_SingleElement_1024		1.09ns ± 1%		1.30ns ±10%		19.87%			1.08ns ± 1%		1.26ns ± 2%		16.01%		1
BM_Sort_uint64_SingleElement_16384		1.04ns ± 1%		1.23ns ±12%		19.17%			1.04ns ± 1%		1.24ns ± 5%		19.30%		1
BM_Sort_uint64_SingleElement_262144		1.01ns ± 1%		1.15ns ± 2%		13.41%			1.01ns ± 1%		1.17ns ± 3%		14.88%		1
BM_Sort_uint64_PipeOrgan_1			4.27ns ± 1%		4.42ns ± 1%		3.40%			4.27ns ± 0%		3.78ns ± 1%		-11.58%		1
BM_Sort_uint64_PipeOrgan_4			1.96ns ± 0%		1.96ns ± 1%		0.00%			1.96ns ± 0%		1.97ns ± 1%		0.42%		1
BM_Sort_uint64_PipeOrgan_16			1.82ns ± 9%		1.91ns ± 1%		4.60%			1.81ns ± 0%		2.17ns ± 1%		20.11%		1
BM_Sort_uint64_PipeOrgan_64			4.56ns ±13%		3.48ns ± 4%		-23.73%			4.37ns ± 1%		4.30ns ± 1%		-1.66%		1
BM_Sort_uint64_PipeOrgan_256			2.81ns ±13%		3.31ns ±10%		17.65%			2.70ns ± 1%		3.69ns ± 1%		37.06%		1
BM_Sort_uint64_PipeOrgan_1024			3.73ns ±14%		6.35ns ±10%		70.14%			3.60ns ± 2%		5.11ns ± 1%		42.01%		1
BM_Sort_uint64_PipeOrgan_16384			4.40ns ±13%		7.91ns ±10%		79.99%			4.20ns ± 1%		5.86ns ± 1%		39.65%		1
BM_Sort_uint64_PipeOrgan_262144			5.30ns ±13%		9.10ns ±10%		71.74%			5.06ns ± 1%		6.68ns ± 1%		32.06%		1
BM_Sort_uint64_QuickSortAdversary_1		4.32ns ± 8%		4.47ns ± 5%		3.35%			4.27ns ± 1%		3.77ns ± 1%		-11.70%		1
BM_Sort_uint64_QuickSortAdversary_4		1.95ns ± 0%		1.96ns ± 1%		0.29%			2.03ns ±14%		1.81ns ± 1%		-11.02%		1
BM_Sort_uint64_QuickSortAdversary_16		0.89ns ± 1%		0.91ns ± 1%		2.17%			0.95ns ±12%		1.08ns ± 2%		14.17%		1
BM_Sort_uint64_QuickSortAdversary_64		13.9ns ± 0%		18.7ns ± 0%		34.55%			14.7ns ±12%		16.0ns ± 1%		0.00%		1
BM_Sort_uint64_QuickSortAdversary_256		20.6ns ± 1%		26.7ns ± 1%		29.68%			21.9ns ±11%		21.0ns ± 1%		0.00%		1
BM_Sort_uint64_QuickSortAdversary_1024		29.1ns ± 0%		37.3ns ± 1%		28.18%			30.9ns ±12%		28.2ns ± 1%		-8.58%		1
BM_Sort_uint64_QuickSortAdversary_16384		47.6ns ± 0%		59.6ns ± 1%		25.26%			51.5ns ±10%		45.7ns ± 1%		-11.16%		1
BM_Sort_uint64_QuickSortAdversary_262144	66.9ns ± 1%		80.8ns ± 1%		20.77%			73.2ns ±11%		63.2ns ± 2%		-13.65%		1
BM_Sort_pair<uint32, uint32>_Random_1		4.43ns ± 1%		4.42ns ± 1%		-0.29%			4.67ns ±13%		3.78ns ± 1%		-19.16%		10
BM_Sort_pair<uint32, uint32>_Random_4		6.70ns ± 0%		6.78ns ± 1%		1.18%			6.70ns ± 1%		6.54ns ± 2%		-2.42%		100
BM_Sort_pair<uint32, uint32>_Random_16		25.6ns ± 1%		17.3ns ± 2%		-32.40%			25.5ns ± 1%		15.3ns ± 1%		-39.91%		1000
BM_Sort_pair<uint32, uint32>_Random_64		35.9ns ± 1%		28.4ns ± 1%		-20.89%			35.9ns ± 1%		27.6ns ± 1%		-23.20%		1000
BM_Sort_pair<uint32, uint32>_Random_256		43.6ns ± 0%		35.6ns ± 1%		-18.15%			43.6ns ± 1%		35.1ns ± 1%		-19.36%		1000
BM_Sort_pair<uint32, uint32>_Random_1024	51.5ns ± 0%		42.8ns ± 1%		-16.76%			51.5ns ± 1%		42.6ns ± 2%		-17.35%		1000
BM_Sort_pair<uint32, uint32>_Random_16384	67.6ns ± 1%		58.5ns ± 1%		-13.51%			67.8ns ± 2%		58.3ns ± 2%		-14.09%		100
BM_Sort_pair<uint32, uint32>_Random_262144	83.9ns ± 4%		74.4ns ± 4%		-11.37%			83.7ns ± 4%		73.9ns ± 4%		-11.72%		10
BM_Sort_pair<uint32, uint32>_Ascending_1	4.44ns ± 1%		4.42ns ± 1%		-0.27%			4.44ns ± 0%		3.78ns ± 1%		-14.92%		1
BM_Sort_pair<uint32, uint32>_Ascending_4	2.13ns ± 0%		2.12ns ± 1%		0.00%			2.13ns ± 1%		2.10ns ± 4%		-1.26%		1
BM_Sort_pair<uint32, uint32>_Ascending_16	2.23ns ± 0%		1.61ns ± 2%		-27.64%			2.22ns ± 1%		1.58ns ± 1%		-28.86%		1
BM_Sort_pair<uint32, uint32>_Ascending_64	1.39ns ± 1%		1.50ns ± 1%		7.76%			1.39ns ± 1%		1.66ns ± 1%		19.24%		1
BM_Sort_pair<uint32, uint32>_Ascending_256	1.56ns ±13%		1.57ns ± 2%		0.94%			1.49ns ± 1%		1.81ns ± 2%		21.72%		1
BM_Sort_pair<uint32, uint32>_Ascending_1024	1.43ns ±13%		1.38ns ± 1%		0.00%			1.37ns ± 1%		1.66ns ± 2%		21.47%		1
BM_Sort_pair<uint32, uint32>_Ascending_16384	1.27ns ±13%		1.21ns ± 1%		-4.34%			1.21ns ± 1%		1.49ns ± 2%		23.07%		1
BM_Sort_pair<uint32, uint32>_Ascending_262144	1.26ns ±12%		1.21ns ± 2%		0.00%			1.21ns ± 1%		1.48ns ± 1%		22.73%		1
BM_Sort_pair<uint32, uint32>_Descending_1	4.63ns ±13%		4.42ns ± 0%		-4.44%			4.43ns ± 1%		3.79ns ± 1%		-14.39%		1
BM_Sort_pair<uint32, uint32>_Descending_4	3.00ns ± 1%		2.97ns ± 1%		-0.85%			2.99ns ± 1%		3.31ns ± 1%		10.61%		1
BM_Sort_pair<uint32, uint32>_Descending_16	4.83ns ± 1%		7.39ns ± 1%		52.92%			5.02ns ±14%		7.30ns ± 2%		45.36%		1
BM_Sort_pair<uint32, uint32>_Descending_64	3.25ns ± 1%		2.85ns ± 1%		-12.27%			3.43ns ±12%		3.12ns ± 1%		-8.86%		1
BM_Sort_pair<uint32, uint32>_Descending_256	2.78ns ± 1%		2.78ns ± 1%		0.00%			2.89ns ±13%		3.22ns ± 1%		11.36%		1
BM_Sort_pair<uint32, uint32>_Descending_1024	3.38ns ± 0%		2.32ns ± 1%		-31.18%			3.56ns ±13%		2.61ns ± 1%		-26.54%		1
BM_Sort_pair<uint32, uint32>_Descending_16384	3.04ns ± 1%		2.12ns ± 1%		-30.14%			3.19ns ±13%		2.40ns ± 1%		-24.79%		1
BM_Sort_pair<uint32, uint32>_Descending_262144	3.21ns ±13%		2.13ns ± 0%		-33.77%			3.21ns ±13%		2.40ns ± 1%		-25.28%		1
BM_Sort_pair<uint32, uint32>_SingleElement_1	4.83ns ± 9%		4.42ns ± 0%		-8.47%			4.66ns ±13%		3.95ns ±14%		-15.29%		1
BM_Sort_pair<uint32, uint32>_SingleElement_4	2.41ns ± 9%		2.13ns ± 0%		-11.64%			2.31ns ±13%		2.44ns ±10%		0.00%		1
BM_Sort_pair<uint32, uint32>_SingleElement_16	2.02ns ±23%		1.75ns ± 1%		0.00%			1.74ns ±17%		1.81ns ±10%		0.00%		1
BM_Sort_pair<uint32, uint32>_SingleElement_64	1.53ns ± 9%		1.91ns ±13%		24.46%			1.41ns ± 1%		2.01ns ±11%		43.16%		1
BM_Sort_pair<uint32, uint32>_SingleElement_256	1.71ns ± 8%		2.00ns ±11%		17.25%			1.59ns ± 2%		1.85ns ±10%		16.66%		1
BM_Sort_pair<uint32, uint32>_SingleElement_1024		1.50ns ±13%		1.75ns ±12%		16.59%			1.45ns ± 1%		1.52ns ±10%		0.00%		1
BM_Sort_pair<uint32, uint32>_SingleElement_16384	1.30ns ± 1%		1.71ns ±11%		31.45%			1.31ns ± 1%		1.44ns ±10%		10.59%		1
BM_Sort_pair<uint32, uint32>_SingleElement_262144	1.29ns ± 1%		1.71ns ±11%		32.70%			1.29ns ± 0%		1.37ns ±12%		0.00%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_1		4.43ns ± 0%		4.74ns ±11%		0.00%			4.44ns ± 1%		3.78ns ± 1%		-14.72%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_4		2.29ns ± 0%		2.44ns ±11%		0.00%			2.29ns ± 0%		2.35ns ± 1%		2.76%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_16		4.68ns ± 1%		3.01ns ± 1%		-35.66%			4.68ns ± 1%		2.83ns ± 1%		-39.52%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_64		4.52ns ± 1%		4.05ns ± 1%		-10.38%			4.52ns ± 1%		5.29ns ± 1%		17.23%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_256		5.60ns ± 1%		2.80ns ± 1%		-49.95%			5.59ns ± 1%		4.33ns ± 1%		-22.61%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_1024		6.41ns ± 1%		5.54ns ± 1%		-13.62%			6.40ns ± 1%		5.49ns ± 1%		-14.31%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_16384		7.27ns ± 1%		6.27ns ± 1%		-13.68%			7.26ns ± 0%		5.97ns ± 1%		-17.72%		1
BM_Sort_pair<uint32, uint32>_PipeOrgan_262144		8.11ns ± 1%		6.98ns ± 5%		-13.96%			8.11ns ± 1%		6.65ns ± 1%		-18.00%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_1	4.45ns ± 1%		4.42ns ± 1%		-0.71%			4.43ns ± 0%		3.78ns ± 1%		-14.72%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_4	2.30ns ± 8%		2.39ns ±12%		0.00%			2.28ns ± 1%		2.28ns ± 1%		0.00%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_16	4.94ns ±13%		3.11ns ±10%		-36.97%			4.73ns ± 1%		2.78ns ± 1%		-41.18%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_64	7.06ns ±13%		7.96ns ±10%		12.87%			6.74ns ± 0%		7.36ns ± 3%		9.20%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_256	9.69ns ±13%		7.81ns ±11%		-19.36%			9.27ns ± 1%		7.41ns ± 1%		-20.07%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_1024	11.3ns ±14%		10.0ns ±11%		-11.64%			10.9ns ± 1%		9.0ns ± 3%		-17.40%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_16384	34.2ns ±13%		24.3ns ±12%		-28.84%			32.7ns ± 0%		22.9ns ± 1%		-30.03%		1
BM_Sort_pair<uint32, uint32>_QuickSortAdversary_262144	36.3ns ± 0%		26.8ns ±13%		-26.00%			36.3ns ± 0%		25.1ns ± 4%		-30.91%		1
BM_Sort_tuple<uint32, uint64, uint32>_Random_1		4.78ns ± 1%		4.76ns ± 1%		-0.54%			5.07ns ±12%		4.11ns ± 1%		-18.90%		10
BM_Sort_tuple<uint32, uint64, uint32>_Random_4		7.20ns ± 0%		7.29ns ± 3%		1.23%			7.91ns ±14%		7.89ns ± 1%		0.00%		100
BM_Sort_tuple<uint32, uint64, uint32>_Random_16		31.6ns ± 1%		19.7ns ± 2%		-37.59%			34.3ns ±10%		20.4ns ± 1%		-40.67%		1000
BM_Sort_tuple<uint32, uint64, uint32>_Random_64		44.4ns ± 1%		33.7ns ± 1%		-24.11%			48.2ns ±11%		33.8ns ± 1%		-30.04%		1000
BM_Sort_tuple<uint32, uint64, uint32>_Random_256	51.7ns ± 1%		40.4ns ± 0%		-21.84%			56.0ns ±12%		41.2ns ± 1%		-26.43%		1000
BM_Sort_tuple<uint32, uint64, uint32>_Random_1024	59.3ns ± 1%		47.1ns ± 1%		-20.69%			63.6ns ±11%		48.0ns ± 1%		-24.51%		1000
BM_Sort_tuple<uint32, uint64, uint32>_Random_16384	75.6ns ± 1%		63.6ns ± 2%		-15.89%			81.0ns ±11%		64.7ns ± 1%		-20.16%		100
BM_Sort_tuple<uint32, uint64, uint32>_Random_262144	93.5ns ± 2%		81.1ns ± 5%		-13.24%			98.1ns ±13%		81.4ns ± 3%		-17.09%		10
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1	4.81ns ± 2%		4.76ns ± 1%		-0.94%			4.85ns ± 6%		4.13ns ± 1%		-14.86%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_4	2.33ns ± 2%		2.33ns ± 4%		0.00%			2.35ns ± 3%		2.23ns ± 4%		-5.15%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16	3.17ns ± 1%		1.74ns ± 2%		-45.24%			3.18ns ± 2%		1.61ns ± 2%		-49.24%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_64	2.57ns ± 3%		2.50ns ± 5%		-2.81%			2.62ns ± 6%		2.77ns ± 2%		6.03%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_256	2.59ns ± 2%		2.49ns ± 4%		-3.67%			2.61ns ± 3%		2.88ns ± 3%		10.45%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_1024	2.23ns ± 4%		2.18ns ± 3%		-2.32%			2.23ns ± 2%		2.48ns ± 2%		10.98%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_16384	2.10ns ± 2%		2.06ns ± 2%		-1.54%			2.11ns ± 2%		2.34ns ± 1%		10.78%		1
BM_Sort_tuple<uint32, uint64, uint32>_Ascending_262144	2.12ns ± 2%		2.08ns ± 2%		-1.59%			2.12ns ± 1%		2.32ns ± 2%		9.44%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1	4.81ns ± 1%		4.78ns ± 2%		-0.55%			4.84ns ± 2%		4.12ns ± 1%		-14.87%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_4	3.89ns ± 3%		3.77ns ± 1%		-3.04%			3.89ns ± 1%		4.07ns ± 1%		4.81%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16	5.95ns ±10%		10.26ns ± 1%		72.39%			5.92ns ± 0%		9.28ns ± 3%		56.86%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_64	4.63ns ±12%		4.06ns ± 2%		-12.37%			4.48ns ± 3%		4.11ns ± 2%		-8.15%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_256	3.82ns ±13%		3.74ns ± 3%		0.00%			3.69ns ± 2%		4.25ns ± 1%		15.14%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_1024	4.56ns ±13%		3.23ns ± 1%		-29.31%			4.39ns ± 1%		3.68ns ± 5%		-16.16%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_16384	4.23ns ±13%		3.00ns ± 1%		-29.09%			4.06ns ± 1%		3.32ns ± 2%		-18.30%		1
BM_Sort_tuple<uint32, uint64, uint32>_Descending_262144	4.65ns ± 3%		3.40ns ± 1%		-27.01%			4.66ns ± 2%		3.65ns ± 1%		-21.53%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1		4.82ns ± 2%		4.77ns ± 2%		-1.05%			5.06ns ±14%		4.13ns ± 1%		-18.47%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_4		2.50ns ± 2%		2.51ns ± 4%		0.00%			2.64ns ±15%		2.35ns ± 7%		-10.75%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16		2.45ns ± 1%		1.99ns ± 2%		-18.72%			2.60ns ±12%		1.69ns ± 5%		-34.99%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_64		2.78ns ±11%		3.36ns ± 4%		20.88%			2.87ns ±13%		2.94ns ± 5%		0.00%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_256		3.11ns ±24%		3.09ns ± 3%		0.00%			3.05ns ±16%		2.99ns ± 9%		0.00%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_1024	2.70ns ± 9%		2.52ns ± 3%		-6.38%			2.63ns ±11%		2.36ns ± 9%		-10.36%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_16384	2.48ns ±12%		2.44ns ± 3%		0.00%			2.30ns ± 2%		2.32ns ± 9%		0.00%		1
BM_Sort_tuple<uint32, uint64, uint32>_SingleElement_262144	2.60ns ± 8%		2.45ns ± 7%		-5.71%			2.43ns ± 3%		2.38ns ± 9%		0.00%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1		5.73ns ±10%		5.05ns ±13%		-11.87%			5.27ns ± 4%		4.61ns ±11%		-12.43%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_4		2.71ns ±12%		2.71ns ±13%		0.00%			2.58ns ± 0%		2.95ns ±10%		14.52%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16		5.81ns ± 2%		4.29ns ±13%		-26.10%			5.81ns ± 1%		3.64ns ±10%		-37.30%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_64		6.05ns ± 3%		5.64ns ±14%		-6.68%			6.04ns ± 2%		8.07ns ±13%		33.50%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_256		7.43ns ± 3%		4.01ns ±12%		-46.12%			7.45ns ± 1%		5.67ns ± 2%		-23.87%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_1024		8.91ns ± 1%		6.92ns ±14%		-22.30%			8.98ns ± 1%		7.61ns ± 1%		-15.22%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_16384		9.65ns ± 1%		7.87ns ± 1%		-18.47%			9.75ns ± 1%		8.66ns ± 1%		-11.20%		1
BM_Sort_tuple<uint32, uint64, uint32>_PipeOrgan_262144		11.7ns ± 1%		9.2ns ± 1%		-21.28%			11.8ns ± 1%		9.8ns ± 1%		-16.53%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_1	4.80ns ± 2%		4.79ns ± 2%		0.00%			4.83ns ± 2%		4.14ns ± 4%		-14.23%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_4	2.71ns ± 1%		2.71ns ± 2%		0.00%			2.71ns ± 1%		2.69ns ± 1%		-0.74%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_16	5.97ns ± 1%		4.22ns ±13%		-29.33%			5.97ns ± 0%		3.98ns ± 1%		-33.38%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_64	10.0ns ± 2%		9.3ns ±12%		-6.99%			9.95ns ± 1%		9.07ns ± 1%		-8.84%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_256	12.5ns ± 1%		10.7ns ±12%		-13.94%			12.6ns ± 4%		10.2ns ± 1%		-18.73%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_1024	23.2ns ± 2%		15.5ns ±11%		-33.05%			23.3ns ± 4%		14.6ns ± 1%		-37.55%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_16384	48.5ns ± 1%		41.0ns ±10%		-15.46%			48.5ns ± 1%		39.0ns ± 2%		-19.70%		1
BM_Sort_tuple<uint32, uint64, uint32>_QuickSortAdversary_262144	53.6ns ± 6%		47.2ns ±10%		-11.88%			53.5ns ± 1%		43.7ns ± 1%		-18.20%		1
BM_Sort_string_Random_1				5.01ns ± 1%		5.03ns ± 2%		0.00%			5.29ns ±12%		5.82ns ± 1%		10.12%		10
BM_Sort_string_Random_4				16.3ns ± 4%		16.0ns ± 5%		0.00%			17.2ns ±16%		18.2ns ± 5%		5.71%		100
BM_Sort_string_Random_16			48.7ns ± 3%		38.2ns ± 4%		-21.57%			52.4ns ±13%		41.5ns ± 4%		-20.66%		1000
BM_Sort_string_Random_64			74.6ns ± 3%		65.0ns ± 6%		-12.84%			75.5ns ± 3%		66.6ns ± 4%		-11.81%		1000
BM_Sort_string_Random_256			96.0ns ± 2%		85.1ns ± 2%		-11.35%			96.9ns ± 3%		86.6ns ± 4%		-10.59%		1000
BM_Sort_string_Random_1024			119ns ± 3%		105ns ± 4%		-11.64%			120ns ± 3%		107ns ± 4%		-11.06%		1000
BM_Sort_string_Random_16384			182ns ± 5%		161ns ± 8%		-11.68%			186ns ± 5%		163ns ± 6%		-12.11%		100
BM_Sort_string_Random_262144			265ns ±17%		230ns ±13%		-13.39%			276ns ±22%		227ns ± 9%		-17.60%		10
BM_Sort_string_Ascending_1			5.03ns ± 2%		5.00ns ± 1%		-0.75%			5.05ns ± 2%		5.81ns ± 1%		15.09%		1
BM_Sort_string_Ascending_4			9.03ns ±12%		8.84ns ±14%		0.00%			8.87ns ± 6%		8.98ns ±14%		0.00%		1
BM_Sort_string_Ascending_16			16.0ns ± 9%		10.9ns ±22%		-32.14%			16.5ns ±22%		11.1ns ±12%		-32.85%		1
BM_Sort_string_Ascending_64			20.6ns ±17%		19.8ns ±19%		0.00%			20.6ns ±31%		20.0ns ±23%		0.00%		1
BM_Sort_string_Ascending_256			19.6ns ±19%		19.1ns ±20%		0.00%			19.6ns ±14%		20.0ns ±17%		0.00%		1
BM_Sort_string_Ascending_1024			18.7ns ±14%		17.9ns ±23%		0.00%			19.4ns ±18%		19.2ns ±27%		0.00%		1
BM_Sort_string_Ascending_16384			36.4ns ±28%		34.4ns ±46%		0.00%			39.7ns ±17%		38.0ns ±18%		0.00%		1
BM_Sort_string_Ascending_262144			40.3ns ±32%		36.1ns ±49%		0.00%			44.4ns ±21%		38.8ns ±29%		-12.62%		1
BM_Sort_string_Descending_1			5.05ns ± 2%		5.03ns ± 2%		-0.50%			5.04ns ± 2%		5.81ns ± 2%		15.46%		1
BM_Sort_string_Descending_4			11.3ns ± 7%		11.8ns ±14%		4.51%			11.5ns ± 5%		14.0ns ± 6%		22.42%		1
BM_Sort_string_Descending_16			27.7ns ±13%		47.9ns ±16%		73.17%			27.2ns ± 5%		50.3ns ± 9%		85.19%		1
BM_Sort_string_Descending_64			28.0ns ±19%		26.4ns ±18%		0.00%			28.1ns ±11%		26.3ns ±15%		-6.21%		1
BM_Sort_string_Descending_256			25.7ns ±14%		24.5ns ±27%		0.00%			26.9ns ±15%		24.4ns ±19%		-9.56%		1
BM_Sort_string_Descending_1024			30.5ns ±16%		23.0ns ±19%		-24.39%			32.6ns ±15%		24.3ns ±16%		-25.51%		1
BM_Sort_string_Descending_16384			47.9ns ±20%		42.9ns ±37%		-10.55%			49.1ns ±42%		42.1ns ±30%		-14.24%		1
BM_Sort_string_Descending_262144		69.3ns ±35%		53.2ns ±67%		-23.20%			74.0ns ±50%		56.8ns ±47%		-23.32%		1
BM_Sort_string_SingleElement_1			5.07ns ± 2%		5.09ns ± 7%		0.00%			5.21ns ±13%		5.81ns ± 1%		11.44%		1
BM_Sort_string_SingleElement_4			8.94ns ± 8%		9.14ns ±12%		0.00%			9.03ns ±13%		9.08ns ±11%		0.00%		1
BM_Sort_string_SingleElement_16			20.1ns ± 6%		9.2ns ±13%		-54.12%			20.5ns ± 5%		9.2ns ±11%		-55.09%		1
BM_Sort_string_SingleElement_64			27.9ns ±10%		26.9ns ±20%		0.00%			28.2ns ± 9%		25.1ns ±16%		-11.04%		1
BM_Sort_string_SingleElement_256		25.9ns ± 8%		24.5ns ±23%		-5.43%			26.6ns ± 5%		24.5ns ± 8%		-7.90%		1
BM_Sort_string_SingleElement_1024		24.1ns ±17%		21.4ns ±20%		-11.23%			25.0ns ± 8%		21.6ns ± 6%		-13.53%		1
BM_Sort_string_SingleElement_16384		22.8ns ±16%		21.4ns ±26%		0.00%			24.0ns ±18%		21.7ns ± 9%		-9.38%		1
BM_Sort_string_SingleElement_262144		21.0ns ±23%		20.7ns ±28%		0.00%			22.5ns ±35%		21.7ns ±14%		0.00%		1
BM_Sort_string_PipeOrgan_1			5.10ns ± 2%		5.04ns ± 3%		-1.02%			5.07ns ± 2%		5.83ns ± 2%		14.89%		1
BM_Sort_string_PipeOrgan_4			9.89ns ±13%		9.61ns ±12%		0.00%			10.0ns ±22%		10.3ns ±15%		0.00%		1
BM_Sort_string_PipeOrgan_16			27.3ns ± 8%		18.8ns ±12%		-31.28%			28.4ns ±17%		21.1ns ± 7%		-25.66%		1
BM_Sort_string_PipeOrgan_64			39.1ns ±13%		34.3ns ±18%		-12.35%			40.3ns ±20%		47.8ns ±11%		18.65%		1
BM_Sort_string_PipeOrgan_256			49.4ns ±13%		27.9ns ±16%		-43.50%			49.4ns ± 6%		34.1ns ±15%		-30.98%		1
BM_Sort_string_PipeOrgan_1024			59.6ns ±13%		44.5ns ±20%		-25.30%			58.2ns ± 8%		48.6ns ±16%		-16.42%		1
BM_Sort_string_PipeOrgan_16384			80.8ns ±10%		65.0ns ±19%		-19.64%			82.8ns ±26%		76.8ns ±22%		-7.21%		1
BM_Sort_string_PipeOrgan_262144			126ns ±26%		92ns ±43%		-27.23%			122ns ±33%		106ns ±31%		-12.89%		1
BM_Sort_string_QuickSortAdversary_1		5.02ns ± 1%		5.00ns ± 1%		-0.40%			5.00ns ± 1%		5.82ns ± 1%		16.23%		1
BM_Sort_string_QuickSortAdversary_4		16.3ns ± 4%		16.9ns ±13%		0.00%			16.4ns ± 4%		18.0ns ± 4%		9.75%		1
BM_Sort_string_QuickSortAdversary_16		49.3ns ± 6%		40.3ns ±13%		-18.24%			50.1ns ± 5%		43.3ns ±15%		-13.41%		1
BM_Sort_string_QuickSortAdversary_64		79.1ns ±12%		68.6ns ±15%		-13.24%			77.2ns ± 5%		66.0ns ± 4%		-14.51%		1
BM_Sort_string_QuickSortAdversary_256		97.1ns ± 4%		86.0ns ± 5%		-11.42%			97.6ns ± 3%		86.3ns ± 2%		-11.57%		1
BM_Sort_string_QuickSortAdversary_1024		119ns ± 3%		106ns ± 8%		-11.30%			120ns ± 3%		107ns ± 1%		-10.91%		1
BM_Sort_string_QuickSortAdversary_16384		180ns ±13%		158ns ±12%		-12.06%			190ns ±17%		162ns ± 6%		-14.34%		1
BM_Sort_string_QuickSortAdversary_262144	252ns ±13%		220ns ±13%		-12.61%			272ns ±21%		227ns ± 7%		-16.38%		1
BM_Sort_float_Random_1				4.10ns ± 0%		4.41ns ± 0%		7.43%			4.34ns ±12%		3.63ns ± 2%		-16.37%		10
BM_Sort_float_Random_4				1.72ns ± 0%		1.76ns ± 1%		1.87%			1.83ns ±12%		6.87ns ± 1%		275.77%		100
BM_Sort_float_Random_16				11.0ns ± 1%		11.7ns ± 1%		6.85%			11.0ns ± 2%		12.6ns ± 1%		14.94%		1000
BM_Sort_float_Random_64				19.6ns ± 0%		18.3ns ± 1%		-6.60%			19.6ns ± 1%		19.0ns ± 2%		-3.13%		1000
BM_Sort_float_Random_256			28.1ns ± 0%		22.5ns ± 0%		-19.87%			28.1ns ± 1%		22.9ns ± 1%		-18.46%		1000
BM_Sort_float_Random_1024			36.6ns ± 1%		25.5ns ± 0%		-30.31%			36.7ns ± 1%		25.9ns ± 1%		-29.44%		1000
BM_Sort_float_Random_16384			53.2ns ± 0%		30.9ns ± 1%		-42.00%			53.2ns ± 0%		31.1ns ± 1%		-41.56%		100
BM_Sort_float_Random_262144			69.5ns ± 2%		36.2ns ± 1%		-47.84%			69.5ns ± 2%		36.4ns ± 1%		-47.59%		10
BM_Sort_float_Ascending_1			4.11ns ± 1%		4.41ns ± 1%		7.32%			4.11ns ± 1%		3.62ns ± 1%		-11.90%		1
BM_Sort_float_Ascending_4			1.73ns ± 0%		1.76ns ± 1%		1.77%			1.73ns ± 1%		2.04ns ± 2%		17.76%		1
BM_Sort_float_Ascending_16			0.94ns ± 1%		0.91ns ± 1%		-3.25%			0.94ns ± 0%		1.10ns ± 1%		17.31%		1
BM_Sort_float_Ascending_64			1.15ns ± 1%		1.80ns ± 0%		56.09%			1.15ns ± 0%		1.57ns ± 1%		36.45%		1
BM_Sort_float_Ascending_256			1.27ns ± 2%		1.81ns ± 1%		42.73%			1.27ns ± 1%		1.63ns ± 2%		28.86%		1
BM_Sort_float_Ascending_1024			1.11ns ± 1%		1.65ns ± 1%		49.39%			1.11ns ± 1%		1.44ns ± 3%		29.65%		1
BM_Sort_float_Ascending_16384			1.04ns ± 1%		1.59ns ± 1%		52.25%			1.04ns ± 0%		1.35ns ± 4%		29.89%		1
BM_Sort_float_Ascending_262144			1.03ns ± 1%		1.58ns ± 0%		52.92%			1.03ns ± 0%		1.22ns ± 2%		18.32%		1
BM_Sort_float_Descending_1			4.28ns ±14%		4.41ns ± 1%		2.89%			4.12ns ± 8%		3.62ns ± 1%		-12.15%		1
BM_Sort_float_Descending_4			1.73ns ± 1%		1.76ns ± 1%		1.66%			1.80ns ±14%		2.53ns ± 1%		40.19%		1
BM_Sort_float_Descending_16			5.04ns ± 1%		4.72ns ± 1%		-6.36%			5.26ns ±14%		4.76ns ± 1%		-9.44%		1
BM_Sort_float_Descending_64			2.07ns ± 1%		3.44ns ± 1%		66.02%			2.18ns ±13%		3.47ns ± 1%		59.20%		1
BM_Sort_float_Descending_256			2.02ns ± 1%		3.34ns ± 0%		65.55%			2.12ns ±13%		3.25ns ± 2%		53.09%		1
BM_Sort_float_Descending_1024			2.38ns ± 1%		3.18ns ± 1%		33.86%			2.48ns ±14%		2.83ns ± 1%		14.08%		1
BM_Sort_float_Descending_16384			2.12ns ± 1%		3.08ns ± 1%		45.43%			2.12ns ± 2%		2.57ns ± 2%		21.13%		1
BM_Sort_float_Descending_262144			2.09ns ± 1%		3.07ns ± 1%		46.51%			2.10ns ± 1%		2.57ns ± 1%		22.61%		1
BM_Sort_float_SingleElement_1			4.30ns ±13%		4.47ns ±11%		3.80%			4.11ns ± 1%		3.63ns ± 2%		-11.60%		1
BM_Sort_float_SingleElement_4			1.84ns ±11%		1.86ns ±12%		0.93%			1.73ns ± 1%		2.06ns ± 8%		19.13%		1
BM_Sort_float_SingleElement_16			1.02ns ±10%		1.56ns ±13%		53.04%			0.94ns ± 1%		1.18ns ±11%		25.23%		1
BM_Sort_float_SingleElement_64			1.24ns ±12%		2.09ns ±13%		68.70%			1.16ns ± 0%		1.58ns ±10%		35.98%		1
BM_Sort_float_SingleElement_256			1.34ns ±11%		1.91ns ± 3%		42.54%			1.26ns ± 1%		1.41ns ±10%		12.00%		1
BM_Sort_float_SingleElement_1024		1.26ns ±11%		1.80ns ± 3%		42.89%			1.19ns ± 1%		1.28ns ± 9%		0.00%		1
BM_Sort_float_SingleElement_16384		1.18ns ±14%		1.75ns ± 1%		48.21%			1.13ns ± 0%		1.20ns ±10%		0.00%		1
BM_Sort_float_SingleElement_262144		1.13ns ± 2%		1.75ns ± 8%		55.35%			1.12ns ± 0%		1.17ns ±10%		0.00%		1
BM_Sort_float_PipeOrgan_1			4.11ns ± 1%		4.42ns ± 1%		7.55%			4.11ns ± 1%		3.84ns ±12%		0.00%		1
BM_Sort_float_PipeOrgan_4			1.73ns ± 0%		1.78ns ±11%		3.07%			1.73ns ± 1%		2.04ns ± 1%		17.77%		1
BM_Sort_float_PipeOrgan_16			1.96ns ± 1%		2.05ns ±11%		0.00%			1.96ns ± 1%		2.08ns ± 1%		6.46%		1
BM_Sort_float_PipeOrgan_64			6.41ns ± 1%		3.95ns ±11%		-38.40%			6.41ns ± 0%		4.97ns ± 1%		-22.52%		1
BM_Sort_float_PipeOrgan_256			2.76ns ± 1%		3.56ns ±10%		28.91%			2.76ns ± 0%		4.13ns ± 2%		49.87%		1
BM_Sort_float_PipeOrgan_1024			3.67ns ± 1%		6.24ns ± 9%		70.06%			3.67ns ± 1%		5.72ns ± 1%		56.04%		1
BM_Sort_float_PipeOrgan_16384			4.33ns ± 1%		7.35ns ±10%		69.67%			4.33ns ± 1%		6.51ns ± 1%		50.30%		1
BM_Sort_float_PipeOrgan_262144			5.16ns ± 0%		8.17ns ±11%		58.39%			5.16ns ± 0%		7.33ns ± 1%		42.12%		1
BM_Sort_float_QuickSortAdversary_1		4.13ns ± 8%		4.69ns ±11%		13.50%			4.11ns ± 1%		3.61ns ± 1%		-12.03%		1
BM_Sort_float_QuickSortAdversary_4		1.73ns ± 0%		1.85ns ±13%		6.97%			1.73ns ± 1%		1.96ns ± 1%		13.01%		1
BM_Sort_float_QuickSortAdversary_16		0.94ns ± 1%		0.91ns ± 0%		-3.23%			0.99ns ±13%		1.06ns ± 2%		7.42%		1
BM_Sort_float_QuickSortAdversary_64		10.9ns ± 1%		15.8ns ± 1%		44.76%			11.5ns ±13%		13.2ns ± 1%		14.68%		1
BM_Sort_float_QuickSortAdversary_256		14.7ns ± 1%		22.9ns ± 0%		55.56%			15.5ns ±13%		16.7ns ± 1%		8.02%		1
BM_Sort_float_QuickSortAdversary_1024		34.1ns ± 3%		38.3ns ± 1%		12.36%			35.9ns ±13%		42.3ns ± 4%		18.04%		1
BM_Sort_float_QuickSortAdversary_16384		65.1ns ± 4%		73.6ns ± 2%		12.99%			68.4ns ±13%		70.1ns ± 1%		2.46%		1
BM_Sort_float_QuickSortAdversary_262144		69.8ns ± 1%		80.9ns ± 1%		15.87%			74.1ns ±11%		81.0ns ± 2%		0.00%		1
																
Weighted Average										-15.93%									-15.16%

As it stands, we basically have to decide whether to optimize for randomly sorted data or to also take into account common patterns IIUC. That's not a super easy call to make, since we're talking about 40-50% improvements and 40-50% regressions in each case. I remember @Morwenn had mentioned pdqsort previously, and that seemed to be an algorithm that would handle both random data and also patterns properly. Would that solve the problems of this algorithm?

@ldionne : if we collect and provide data on the relative change in cycles spent on std::sort for Google's data centers with the new implementation, would that be sufficient to land this diff in libcxx? How much improvement would you expect from the new implementation for tilting the balance in its favor?

FWIW I think this would be a good change to accept for the following reasons:

  1. random patterns are more likely than specific patterns, especially because some of these patterns are quite unrealistic (e.g. single element repeated 256k times)
  2. these deterministic patterns tend to be much faster to sort anyway (e.g. ~1ns vs ~64ns for ascending vs random)
  3. the fact that they are so fast also increases the measurement noise so the numbers might not be entirely reliable
  4. for the reasons above, I see the benchmarks for these edge cases to be more of a sanity check (i.e. are we making ascending terrible? this does not seem to be the case here)
  5. the numbers for strings are great and sorting strings is one of the most common use cases

Pinging @ldionne. Can you comment on the data that would help you to make the decision? I am trying to convince folks at Google on trying out this diff. The main question being asked is what data you would like to see?

As it stands, we basically have to decide whether to optimize for randomly sorted data or to also take into account common patterns IIUC. That's not a super easy call to make, since we're talking about 40-50% improvements and 40-50% regressions in each case. I remember @Morwenn had mentioned pdqsort previously, and that seemed to be an algorithm that would handle both random data and also patterns properly. Would that solve the problems of this algorithm?

@ldionne : if we collect and provide data on the relative change in cycles spent on std::sort for Google's data centers with the new implementation, would that be sufficient to land this diff in libcxx? How much improvement would you expect from the new implementation for tilting the balance in its favor?

ldionne accepted this revision.Nov 11 2022, 4:22 PM

@ldionne : if we collect and provide data on the relative change in cycles spent on std::sort for Google's data centers with the new implementation, would that be sufficient to land this diff in libcxx? How much improvement would you expect from the new implementation for tilting the balance in its favor?

Thanks for adding the weights to the benchmarks here, I think this was the missing link for me. You and @marcogelmi are right, the patterns that are the most important get an improvement, so I think this is good to go.

Let's rebase this, wait for the CI to be green, and merge this. Let me or another libc++ review group member know if you need help with merging this (if you do, please provide Author Name <email@domain> for commit attribution).

Is https://reviews.llvm.org/D93233 still relevant?

This revision is now accepted and ready to land.Nov 11 2022, 4:22 PM

As you rebase, it may be worth adding a release note explaining that we revised our implementation of std::sort, that it should be much faster in most cases now and that they can file a bug report in case they notice a notable performance regression.

nilayvaish added a comment.EditedNov 28 2022, 1:53 PM

@ldionne : thanks for the accepting the change. We are trying to evaluate the effect of the change on Android APK's size. You might recall there were several comments about increased APK size in https://reviews.llvm.org/D113413 after it was submitted. I think it would be better if we are aware of the change in APK's size beforehand.

The evaluation is taking time, hopefully would get done soon.

@ldionne : thanks for the accepting the change. We are trying to evaluate the effect of the change on Android APK's size. You might recall there were several comments about increased APK size in https://reviews.llvm.org/D113413 after it was submitted. I think it would be better if we are aware of the change in APK's size beforehand.

The evaluation is taking time, hopefully would get done soon.

Thanks for the update. We'll be eager to hear about the results if you are allowed to share them. Hopefully there is no prohibitive code size regression. If so, it may be worth investigating if externally instantiating some of these templates in the dylib could mitigate the regression. IIRC this optimization was only enabled for integer types, right?

@ldionne : thanks for the accepting the change. We are trying to evaluate the effect of the change on Android APK's size. You might recall there were several comments about increased APK size in https://reviews.llvm.org/D113413 after it was submitted. I think it would be better if we are aware of the change in APK's size beforehand.

The evaluation is taking time, hopefully would get done soon.

Thanks for the update. We'll be eager to hear about the results if you are allowed to share them. Hopefully there is no prohibitive code size regression. If so, it may be worth investigating if externally instantiating some of these templates in the dylib could mitigate the regression. IIRC this optimization was only enabled for integer types, right?

@ayzhao reported that this change would increase Chrome Android APK size by about 65.54 KB. The Chrome team is OK with this for now. For comparison, when we shifted to introsort, the increase was about 120 KB.

The optimization would be enabled for all arithmetic types.

I would next rebase this change and redo the internal testing I did previously. Would post the rebased diff soon.

Thanks for looking into the APK size impact. Could you give me a couple of days (I'll try my best to get to this tomorrow) to evaluate this on Meta's Android apps as well? I maintain the libc++ used by those, and we have some apps that are incredibly sensitive to APK size (increases of even a few KB are flagged), so I'd appreciate the chance to evaluate this before it lands. (If I haven't been able to get to this by Wednesday of next week, feel free to go ahead regardless.)

I don't know if the NDK folks are also interested in the size implications here, but CCing @rprichard just in case.

Ah, I was hoping I'd be able to rebase the current version onto the 15.x release branch cleanly, but I'm running into conflicts that I'm not sure how to resolve. I'll wait for the rebased version and then try that.

Rebase to the HEAD revision

ldionne added inline comments.Dec 5 2022, 11:41 AM
libcxx/include/__algorithm/sort.h
343

Your rebase is incorrect. You should be using _Ops::__iter_move(__k) instead of std::move(*__k) here and in a bunch of other places.

nilayvaish updated this revision to Diff 480614.Dec 6 2022, 2:06 PM

Changed to use _Ops::__iter_move instead of std::move.

nilayvaish updated this revision to Diff 480919.Dec 7 2022, 8:08 AM

Changed more std::move to _Ops::__iter_move

Passes tests on my machine.

I'm not able to apply the latest version of the patch, and it looks like CI isn't either. Which revision is this patch based on top of?

Your stack still has D122858 in it, which is making arc patch unhappy locally, but even taking the raw patch and using git apply is failing for me.

Properly update the diff

I'm not able to apply the latest version of the patch, and it looks like CI isn't either. Which revision is this patch based on top of?

Your stack still has D122858 in it, which is making arc patch unhappy locally, but even taking the raw patch and using git apply is failing for me.

@smeenai , can you try again? I think I don't understand the 'arc' tool that well.

I'm not able to apply the latest version of the patch, and it looks like CI isn't either. Which revision is this patch based on top of?

Your stack still has D122858 in it, which is making arc patch unhappy locally, but even taking the raw patch and using git apply is failing for me.

@smeenai , can you try again? I think I don't understand the 'arc' tool that well.

It's a pretty hard tool to understand, to be fair :)

I'm able to apply this now, thanks. I'm working on the size evaluation today; I should have that today if I don't run into any unexpected snags.

Corrected errors pointed out by the CI

The size regressions from this are acceptably small for us, so we have no problems with that. Thanks for flagging it.

Rebase to HEAD

nilayvaish marked an inline comment as done.Dec 19 2022, 2:05 PM

@ldionne : this diff is ready for submission. If you could do so, that would be great.

libcxx/include/__algorithm/sort.h
343

I have fixed all the places. std::move is being only used to move the value, not the iterator.

ldionne accepted this revision.Dec 20 2022, 4:38 PM

@nilayvaish Can you follow these steps to get commit access? https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access You can CC me on the email.

I would normally be happy to apply this for you, but I'm about to go OOO and I think it would be wise for you to have commit access in case this needs to be reverted (hopefully that does not happen).

nilayvaish marked an inline comment as done.

Rebase to head

Rebase to HEAD

Buildkite seems to have been failing for unrelated reasons.