This is an archive of the discontinued LLVM Phabricator instance.

Improve std::stable_sort using radix sort algorithm
DraftPublic

Authored by izvolov on Oct 23 2021, 2:56 AM.
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
None
Summary

std::stable_sort uses a temporary buffer, which can be used to sort integers with radix sort algorithm. It dramatically improves the performance in case of sorting integers.

2021-10-23T13:05:54+03:00
Running ./projects/libcxx/benchmarks/algorithms.libcxx.out
Run on (12 X 2600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 3.27, 2.56, 2.18
--------------------------------------------------------------------------
Benchmark                                          Before            After 
--------------------------------------------------------------------------
BM_StableSort_uint32_Random_1                     3.43 ns          3.09 ns
BM_StableSort_uint32_Random_4                     4.97 ns          5.20 ns
BM_StableSort_uint32_Random_16                    8.87 ns          8.65 ns
BM_StableSort_uint32_Random_64                    13.9 ns          14.4 ns
BM_StableSort_uint32_Random_256                   22.1 ns          5.31 ns
BM_StableSort_uint32_Random_1024                  31.0 ns          3.97 ns
BM_StableSort_uint32_Random_16384                 47.7 ns          8.07 ns
BM_StableSort_uint32_Random_262144                66.5 ns          13.0 ns
BM_StableSort_uint32_Random_1048576               71.5 ns          20.8 ns
BM_StableSort_uint32_Random_33554432              95.4 ns          38.8 ns
BM_StableSort_uint32_Ascending_1                  3.71 ns          3.55 ns
BM_StableSort_uint32_Ascending_4                  1.43 ns          1.32 ns
BM_StableSort_uint32_Ascending_16                0.828 ns         0.963 ns
BM_StableSort_uint32_Ascending_64                0.637 ns         0.784 ns
BM_StableSort_uint32_Ascending_256                1.94 ns          4.52 ns
BM_StableSort_uint32_Ascending_1024               2.78 ns          2.56 ns
BM_StableSort_uint32_Ascending_16384              4.80 ns          1.91 ns
BM_StableSort_uint32_Ascending_262144             7.03 ns          2.77 ns
BM_StableSort_uint32_Ascending_1048576            9.50 ns          4.85 ns
BM_StableSort_uint32_Ascending_33554432           12.6 ns          8.25 ns
BM_StableSort_uint32_Descending_1                 3.67 ns          3.56 ns
BM_StableSort_uint32_Descending_4                 1.90 ns          2.19 ns
BM_StableSort_uint32_Descending_16                3.45 ns          5.12 ns
BM_StableSort_uint32_Descending_64                13.6 ns          18.9 ns
BM_StableSort_uint32_Descending_256               15.7 ns          5.26 ns
BM_StableSort_uint32_Descending_1024              15.9 ns          3.86 ns
BM_StableSort_uint32_Descending_16384             18.1 ns          9.01 ns
BM_StableSort_uint32_Descending_262144            20.5 ns          14.5 ns
BM_StableSort_uint32_Descending_1048576           21.8 ns          16.2 ns
BM_StableSort_uint32_Descending_33554432          39.7 ns          27.6 ns
BM_StableSort_uint32_SingleElement_1              3.68 ns          3.59 ns
BM_StableSort_uint32_SingleElement_4              1.41 ns          1.37 ns
BM_StableSort_uint32_SingleElement_16            0.822 ns         0.914 ns
BM_StableSort_uint32_SingleElement_64            0.632 ns         0.784 ns
BM_StableSort_uint32_SingleElement_256            1.94 ns          4.62 ns
BM_StableSort_uint32_SingleElement_1024           2.80 ns          2.76 ns
BM_StableSort_uint32_SingleElement_16384          4.81 ns          2.11 ns
BM_StableSort_uint32_SingleElement_262144         6.99 ns          2.95 ns
BM_StableSort_uint32_SingleElement_1048576        8.83 ns          5.49 ns
BM_StableSort_uint32_SingleElement_33554432       12.3 ns          8.51 ns
BM_StableSort_uint32_PipeOrgan_1                  3.65 ns          3.57 ns
BM_StableSort_uint32_PipeOrgan_4                  1.52 ns          1.57 ns
BM_StableSort_uint32_PipeOrgan_16                 1.86 ns          2.02 ns
BM_StableSort_uint32_PipeOrgan_64                 3.55 ns          5.08 ns
BM_StableSort_uint32_PipeOrgan_256                8.57 ns          5.26 ns
BM_StableSort_uint32_PipeOrgan_1024               9.38 ns          3.82 ns
BM_StableSort_uint32_PipeOrgan_16384              11.9 ns          9.58 ns
BM_StableSort_uint32_PipeOrgan_262144             13.8 ns          8.50 ns
BM_StableSort_uint32_PipeOrgan_1048576            15.4 ns          10.6 ns
BM_StableSort_uint32_PipeOrgan_33554432           25.5 ns          17.5 ns
BM_StableSort_uint64_Random_1                     3.27 ns          3.33 ns
BM_StableSort_uint64_Random_4                     5.31 ns          4.97 ns
BM_StableSort_uint64_Random_16                    8.97 ns          9.44 ns
BM_StableSort_uint64_Random_64                    14.6 ns          15.0 ns
BM_StableSort_uint64_Random_256                   23.2 ns          22.9 ns
BM_StableSort_uint64_Random_1024                  31.1 ns          7.06 ns
BM_StableSort_uint64_Random_16384                 47.2 ns          13.7 ns
BM_StableSort_uint64_Random_262144                63.9 ns          21.8 ns
BM_StableSort_uint64_Random_1048576               71.7 ns          19.7 ns
BM_StableSort_uint64_Random_33554432              97.1 ns          44.3 ns
BM_StableSort_uint64_Ascending_1                  3.34 ns          3.69 ns
BM_StableSort_uint64_Ascending_4                  1.54 ns          1.48 ns
BM_StableSort_uint64_Ascending_16                 1.00 ns         0.743 ns
BM_StableSort_uint64_Ascending_64                0.889 ns         0.605 ns
BM_StableSort_uint64_Ascending_256                2.00 ns          1.64 ns
BM_StableSort_uint64_Ascending_1024               2.71 ns          5.07 ns
BM_StableSort_uint64_Ascending_16384              4.44 ns          3.89 ns
BM_StableSort_uint64_Ascending_262144             7.39 ns          3.76 ns
BM_StableSort_uint64_Ascending_1048576            11.0 ns          7.27 ns
BM_StableSort_uint64_Ascending_33554432           15.3 ns          13.1 ns
BM_StableSort_uint64_Descending_1                 3.34 ns          3.70 ns
BM_StableSort_uint64_Descending_4                 2.04 ns          1.82 ns
BM_StableSort_uint64_Descending_16                4.21 ns          3.64 ns
BM_StableSort_uint64_Descending_64                16.2 ns          14.5 ns
BM_StableSort_uint64_Descending_256               17.7 ns          15.9 ns
BM_StableSort_uint64_Descending_1024              18.8 ns          7.12 ns
BM_StableSort_uint64_Descending_16384             20.3 ns          12.3 ns
BM_StableSort_uint64_Descending_262144            23.2 ns          27.9 ns
BM_StableSort_uint64_Descending_1048576           26.9 ns          31.1 ns
BM_StableSort_uint64_Descending_33554432          45.5 ns          36.6 ns
BM_StableSort_uint64_SingleElement_1              3.38 ns          3.87 ns
BM_StableSort_uint64_SingleElement_4              1.52 ns          1.45 ns
BM_StableSort_uint64_SingleElement_16             1.01 ns         0.744 ns
BM_StableSort_uint64_SingleElement_64            0.968 ns         0.619 ns
BM_StableSort_uint64_SingleElement_256            2.01 ns          1.64 ns
BM_StableSort_uint64_SingleElement_1024           2.70 ns          6.10 ns
BM_StableSort_uint64_SingleElement_16384          4.43 ns          5.07 ns
BM_StableSort_uint64_SingleElement_262144         6.94 ns          5.14 ns
BM_StableSort_uint64_SingleElement_1048576        10.8 ns          9.03 ns
BM_StableSort_uint64_SingleElement_33554432       15.4 ns          14.6 ns
BM_StableSort_uint64_PipeOrgan_1                  3.37 ns          3.67 ns
BM_StableSort_uint64_PipeOrgan_4                  1.71 ns          1.49 ns
BM_StableSort_uint64_PipeOrgan_16                 1.84 ns          1.74 ns
BM_StableSort_uint64_PipeOrgan_64                 4.99 ns          3.80 ns
BM_StableSort_uint64_PipeOrgan_256                10.6 ns          8.74 ns
BM_StableSort_uint64_PipeOrgan_1024               11.3 ns          7.29 ns
BM_StableSort_uint64_PipeOrgan_16384              12.8 ns          12.8 ns
BM_StableSort_uint64_PipeOrgan_262144             15.4 ns          28.2 ns
BM_StableSort_uint64_PipeOrgan_1048576            19.6 ns          20.4 ns
BM_StableSort_uint64_PipeOrgan_33554432           30.9 ns          25.2 ns

Diff Detail

Event Timeline

izvolov created this revision.Oct 23 2021, 2:56 AM
izvolov edited the summary of this revision. (Show Details)Oct 23 2021, 3:12 AM
izvolov edited the summary of this revision. (Show Details)Oct 23 2021, 3:19 AM
izvolov updated this revision to Diff 381723.Oct 23 2021, 4:26 AM

Update formatting, include lost files.

izvolov updated this revision to Diff 381726.Oct 23 2021, 4:42 AM

Update formatting.

izvolov updated this revision to Diff 381728.Oct 23 2021, 5:16 AM

no comments

izvolov updated this revision to Diff 383655.Oct 31 2021, 8:28 AM

C++17 only for now

izvolov updated this revision to Diff 385288.Nov 6 2021, 11:24 AM

defaultinit