This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Optimize std::ranges::{min, max} for types that are cheap to copy
ClosedPublic

Authored by philnik on Feb 8 2023, 10:48 AM.

Details

Summary

Don't forward to min_element for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases.

Diff Detail

Event Timeline

philnik created this revision.Feb 8 2023, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2023, 10:48 AM
philnik added a comment.EditedFeb 8 2023, 10:50 AM

Benchmark results:

----------------------------------------------------------------------
Benchmark                                          old             new
----------------------------------------------------------------------
BM_std_min<char>/1                            0.396 ns        0.353 ns
BM_std_min<char>/2                             1.06 ns        0.941 ns
BM_std_min<char>/3                             1.46 ns         1.52 ns
BM_std_min<char>/4                             1.94 ns         2.00 ns
BM_std_min<char>/5                             1.88 ns         2.47 ns
BM_std_min<char>/6                             2.24 ns         3.10 ns
BM_std_min<char>/7                             2.64 ns         3.57 ns
BM_std_min<char>/8                             3.04 ns         4.39 ns
BM_std_min<char>/9                             3.04 ns         5.01 ns
BM_std_min<char>/10                            3.44 ns         5.11 ns
BM_std_min<char>/11                            3.86 ns         6.76 ns
BM_std_min<char>/12                            4.30 ns         6.26 ns
BM_std_min<char>/13                            4.47 ns         6.94 ns
BM_std_min<char>/14                            4.70 ns         7.58 ns
BM_std_min<char>/15                            5.30 ns         8.34 ns
BM_std_min<char>/16                            5.69 ns         9.23 ns
BM_std_min<char>/17                            5.89 ns         1.41 ns
BM_std_min<char>/18                            6.25 ns         1.88 ns
BM_std_min<char>/19                            6.68 ns         2.40 ns
BM_std_min<char>/20                            7.19 ns         2.86 ns
BM_std_min<char>/21                            7.26 ns         3.44 ns
BM_std_min<char>/22                            7.64 ns         3.96 ns
BM_std_min<char>/23                            8.16 ns         4.65 ns
BM_std_min<char>/24                            8.91 ns         5.32 ns
BM_std_min<char>/25                            9.13 ns         5.91 ns
BM_std_min<char>/26                            9.55 ns         6.62 ns
BM_std_min<char>/27                            9.95 ns         7.30 ns
BM_std_min<char>/28                            10.5 ns         7.92 ns
BM_std_min<char>/29                            10.7 ns         8.77 ns
BM_std_min<char>/30                            11.1 ns         9.69 ns
BM_std_min<char>/31                            11.5 ns         10.6 ns
BM_std_min<char>/32                            12.0 ns         11.3 ns
BM_std_min<char>/64                            27.7 ns         11.9 ns
BM_std_min<char>/512                            290 ns         17.8 ns
BM_std_min<char>/1024                           592 ns         20.9 ns
BM_std_min<char>/4000                          2326 ns         37.1 ns
BM_std_min<char>/4096                          2368 ns         38.1 ns
BM_std_min<char>/5500                          3200 ns         38.7 ns
BM_std_min<char>/64000                        37334 ns          487 ns
BM_std_min<char>/65536                        38101 ns          498 ns
BM_std_min<char>/70000                        40719 ns          530 ns
BM_std_min<short>/1                           0.396 ns        0.472 ns
BM_std_min<short>/2                           0.792 ns        0.942 ns
BM_std_min<short>/3                            1.19 ns         1.27 ns
BM_std_min<short>/4                            1.58 ns         1.60 ns
BM_std_min<short>/5                            1.98 ns         1.97 ns
BM_std_min<short>/6                            2.38 ns         2.60 ns
BM_std_min<short>/7                            2.78 ns         2.83 ns
BM_std_min<short>/8                            3.17 ns         3.52 ns
BM_std_min<short>/9                            3.57 ns         2.09 ns
BM_std_min<short>/10                           3.96 ns         2.13 ns
BM_std_min<short>/11                           4.35 ns         2.37 ns
BM_std_min<short>/12                           4.75 ns         2.68 ns
BM_std_min<short>/13                           5.16 ns         3.13 ns
BM_std_min<short>/14                           5.54 ns         3.70 ns
BM_std_min<short>/15                           5.94 ns         4.31 ns
BM_std_min<short>/16                           6.33 ns         4.70 ns
BM_std_min<short>/17                           6.73 ns         1.90 ns
BM_std_min<short>/18                           7.13 ns         2.13 ns
BM_std_min<short>/19                           7.52 ns         2.41 ns
BM_std_min<short>/20                           7.92 ns         2.81 ns
BM_std_min<short>/21                           8.31 ns         3.32 ns
BM_std_min<short>/22                           8.72 ns         3.81 ns
BM_std_min<short>/23                           9.15 ns         4.33 ns
BM_std_min<short>/24                           9.57 ns         4.82 ns
BM_std_min<short>/25                           9.98 ns         2.14 ns
BM_std_min<short>/26                           10.4 ns         2.36 ns
BM_std_min<short>/27                           10.8 ns         2.63 ns
BM_std_min<short>/28                           11.3 ns         2.99 ns
BM_std_min<short>/29                           11.7 ns         3.47 ns
BM_std_min<short>/30                           12.1 ns         3.97 ns
BM_std_min<short>/31                           13.2 ns         4.52 ns
BM_std_min<short>/32                           13.0 ns         5.12 ns
BM_std_min<short>/64                           28.4 ns         5.86 ns
BM_std_min<short>/512                           341 ns         11.2 ns
BM_std_min<short>/1024                          699 ns         15.8 ns
BM_std_min<short>/4000                         2790 ns         51.1 ns
BM_std_min<short>/4096                         2858 ns         52.2 ns
BM_std_min<short>/5500                         3847 ns         69.2 ns
BM_std_min<short>/64000                       45064 ns          947 ns
BM_std_min<short>/65536                       46449 ns          970 ns
BM_std_min<short>/70000                       49629 ns         1034 ns
BM_std_min<int>/1                             0.398 ns        0.352 ns
BM_std_min<int>/2                             0.798 ns        0.939 ns
BM_std_min<int>/3                              1.21 ns         1.23 ns
BM_std_min<int>/4                              1.58 ns         1.56 ns
BM_std_min<int>/5                              1.98 ns         1.88 ns
BM_std_min<int>/6                              2.38 ns         2.31 ns
BM_std_min<int>/7                              3.37 ns         2.88 ns
BM_std_min<int>/8                              3.17 ns         3.47 ns
BM_std_min<int>/9                              3.56 ns         3.66 ns
BM_std_min<int>/10                             3.96 ns         4.12 ns
BM_std_min<int>/11                             4.43 ns         5.09 ns
BM_std_min<int>/12                             4.75 ns         5.05 ns
BM_std_min<int>/13                             5.15 ns         5.45 ns
BM_std_min<int>/14                             5.54 ns         5.93 ns
BM_std_min<int>/15                             5.94 ns         6.46 ns
BM_std_min<int>/16                             6.33 ns         6.89 ns
BM_std_min<int>/17                             6.73 ns         7.50 ns
BM_std_min<int>/18                             7.59 ns         8.18 ns
BM_std_min<int>/19                             7.52 ns         8.73 ns
BM_std_min<int>/20                             7.92 ns         9.38 ns
BM_std_min<int>/21                             8.31 ns         9.98 ns
BM_std_min<int>/22                             9.27 ns         10.4 ns
BM_std_min<int>/23                             9.11 ns         10.9 ns
BM_std_min<int>/24                             9.51 ns         11.4 ns
BM_std_min<int>/25                             9.91 ns         12.0 ns
BM_std_min<int>/26                             10.3 ns         12.5 ns
BM_std_min<int>/27                             10.7 ns         13.0 ns
BM_std_min<int>/28                             11.1 ns         13.4 ns
BM_std_min<int>/29                             11.5 ns         14.1 ns
BM_std_min<int>/30                             11.9 ns         14.7 ns
BM_std_min<int>/31                             13.2 ns         15.0 ns
BM_std_min<int>/32                             12.8 ns         15.6 ns
BM_std_min<int>/64                             28.9 ns         19.4 ns
BM_std_min<int>/512                             342 ns         29.2 ns
BM_std_min<int>/1024                            699 ns         43.6 ns
BM_std_min<int>/4000                           2785 ns          114 ns
BM_std_min<int>/4096                           2849 ns          118 ns
BM_std_min<int>/5500                           3832 ns          147 ns
BM_std_min<int>/64000                         44817 ns         1898 ns
BM_std_min<int>/65536                         45893 ns         1949 ns
BM_std_min<int>/70000                         49026 ns         2068 ns
BM_std_min<long long>/1                       0.396 ns        0.353 ns
BM_std_min<long long>/2                       0.791 ns        0.938 ns
BM_std_min<long long>/3                        1.19 ns         1.17 ns
BM_std_min<long long>/4                        1.59 ns         2.33 ns
BM_std_min<long long>/5                        2.12 ns         1.87 ns
BM_std_min<long long>/6                        3.06 ns         2.33 ns
BM_std_min<long long>/7                        2.96 ns         2.73 ns
BM_std_min<long long>/8                        3.19 ns         3.14 ns
BM_std_min<long long>/9                        3.58 ns         3.61 ns
BM_std_min<long long>/10                       3.97 ns         4.10 ns
BM_std_min<long long>/11                       4.35 ns         4.54 ns
BM_std_min<long long>/12                       4.75 ns         4.96 ns
BM_std_min<long long>/13                       5.15 ns         5.43 ns
BM_std_min<long long>/14                       5.54 ns         5.90 ns
BM_std_min<long long>/15                       5.94 ns         6.46 ns
BM_std_min<long long>/16                       6.33 ns         6.91 ns
BM_std_min<long long>/17                       6.73 ns         2.70 ns
BM_std_min<long long>/18                       7.12 ns         3.13 ns
BM_std_min<long long>/19                       7.52 ns         3.47 ns
BM_std_min<long long>/20                       7.92 ns         3.85 ns
BM_std_min<long long>/21                       8.34 ns         4.26 ns
BM_std_min<long long>/22                       8.72 ns         4.86 ns
BM_std_min<long long>/23                       9.14 ns         5.53 ns
BM_std_min<long long>/24                       9.57 ns         6.18 ns
BM_std_min<long long>/25                       9.96 ns         6.68 ns
BM_std_min<long long>/26                       10.4 ns         7.24 ns
BM_std_min<long long>/27                       10.8 ns         7.74 ns
BM_std_min<long long>/28                       11.2 ns         8.37 ns
BM_std_min<long long>/29                       11.6 ns         9.10 ns
BM_std_min<long long>/30                       12.1 ns         9.74 ns
BM_std_min<long long>/31                       13.3 ns         10.5 ns
BM_std_min<long long>/32                       13.0 ns         11.2 ns
BM_std_min<long long>/64                       28.4 ns         12.6 ns
BM_std_min<long long>/512                       342 ns         43.0 ns
BM_std_min<long long>/1024                      699 ns         76.5 ns
BM_std_min<long long>/4000                     2776 ns          270 ns
BM_std_min<long long>/4096                     2844 ns          278 ns
BM_std_min<long long>/5500                     3823 ns          374 ns
BM_std_min<long long>/64000                   44663 ns         4872 ns
BM_std_min<long long>/65536                   45742 ns         4985 ns
BM_std_min<long long>/70000                   48852 ns         5476 ns
BM_std_min<__int128>/1                        0.396 ns        0.471 ns
BM_std_min<__int128>/2                         1.32 ns         1.20 ns
BM_std_min<__int128>/3                         2.24 ns         2.01 ns
BM_std_min<__int128>/4                         3.17 ns         2.77 ns
BM_std_min<__int128>/5                         4.11 ns         3.57 ns
BM_std_min<__int128>/6                         5.02 ns         4.46 ns
BM_std_min<__int128>/7                         5.96 ns         5.40 ns
BM_std_min<__int128>/8                         6.93 ns         6.30 ns
BM_std_min<__int128>/9                         7.90 ns         7.24 ns
BM_std_min<__int128>/10                        8.84 ns         8.22 ns
BM_std_min<__int128>/11                        9.85 ns         9.22 ns
BM_std_min<__int128>/12                        10.8 ns         10.2 ns
BM_std_min<__int128>/13                        11.8 ns         11.2 ns
BM_std_min<__int128>/14                        12.8 ns         12.2 ns
BM_std_min<__int128>/15                        13.9 ns         13.2 ns
BM_std_min<__int128>/16                        14.8 ns         14.2 ns
BM_std_min<__int128>/17                        15.9 ns         15.3 ns
BM_std_min<__int128>/18                        24.1 ns         16.4 ns
BM_std_min<__int128>/19                        17.8 ns         17.4 ns
BM_std_min<__int128>/20                        18.8 ns         18.5 ns
BM_std_min<__int128>/21                        19.7 ns         19.5 ns
BM_std_min<__int128>/22                        20.8 ns         20.6 ns
BM_std_min<__int128>/23                        21.7 ns         21.7 ns
BM_std_min<__int128>/24                        22.9 ns         22.7 ns
BM_std_min<__int128>/25                        23.7 ns         23.8 ns
BM_std_min<__int128>/26                        24.7 ns         24.8 ns
BM_std_min<__int128>/27                        25.7 ns         25.9 ns
BM_std_min<__int128>/28                        26.8 ns         27.0 ns
BM_std_min<__int128>/29                        27.8 ns         28.0 ns
BM_std_min<__int128>/30                        28.6 ns         29.1 ns
BM_std_min<__int128>/31                        32.9 ns         30.1 ns
BM_std_min<__int128>/32                        30.8 ns         31.1 ns
BM_std_min<__int128>/64                        63.6 ns         64.7 ns
BM_std_min<__int128>/512                        538 ns          542 ns
BM_std_min<__int128>/1024                      1082 ns         1084 ns
BM_std_min<__int128>/4000                      4244 ns         4242 ns
BM_std_min<__int128>/4096                      4353 ns         4326 ns
BM_std_min<__int128>/5500                      5841 ns         5823 ns
BM_std_min<__int128>/64000                    67872 ns        68063 ns
BM_std_min<__int128>/65536                    70434 ns        69720 ns
BM_std_min<__int128>/70000                    74210 ns        74484 ns
BM_std_min<unsigned char>/1                   0.396 ns        0.354 ns
BM_std_min<unsigned char>/2                    1.06 ns        0.953 ns
BM_std_min<unsigned char>/3                    1.45 ns         2.08 ns
BM_std_min<unsigned char>/4                    1.86 ns         1.56 ns
BM_std_min<unsigned char>/5                    1.88 ns         1.93 ns
BM_std_min<unsigned char>/6                    2.24 ns         2.34 ns
BM_std_min<unsigned char>/7                    2.64 ns         2.76 ns
BM_std_min<unsigned char>/8                    3.04 ns         3.23 ns
BM_std_min<unsigned char>/9                    3.04 ns         3.67 ns
BM_std_min<unsigned char>/10                   3.44 ns         4.15 ns
BM_std_min<unsigned char>/11                   3.85 ns         4.59 ns
BM_std_min<unsigned char>/12                   4.30 ns         5.09 ns
BM_std_min<unsigned char>/13                   4.44 ns         5.59 ns
BM_std_min<unsigned char>/14                   4.71 ns         6.12 ns
BM_std_min<unsigned char>/15                   5.29 ns         6.57 ns
BM_std_min<unsigned char>/16                   5.69 ns         7.10 ns
BM_std_min<unsigned char>/17                   5.85 ns         1.42 ns
BM_std_min<unsigned char>/18                   6.26 ns         1.77 ns
BM_std_min<unsigned char>/19                   6.64 ns         2.01 ns
BM_std_min<unsigned char>/20                   7.17 ns         2.36 ns
BM_std_min<unsigned char>/21                   7.20 ns         2.77 ns
BM_std_min<unsigned char>/22                   7.65 ns         3.29 ns
BM_std_min<unsigned char>/23                   8.16 ns         3.75 ns
BM_std_min<unsigned char>/24                   8.88 ns         4.31 ns
BM_std_min<unsigned char>/25                   9.10 ns         4.80 ns
BM_std_min<unsigned char>/26                   9.54 ns         5.34 ns
BM_std_min<unsigned char>/27                   9.94 ns         5.82 ns
BM_std_min<unsigned char>/28                   10.4 ns         6.36 ns
BM_std_min<unsigned char>/29                   10.6 ns         6.85 ns
BM_std_min<unsigned char>/30                   11.0 ns         7.32 ns
BM_std_min<unsigned char>/31                   12.4 ns         8.01 ns
BM_std_min<unsigned char>/32                   12.0 ns         8.68 ns
BM_std_min<unsigned char>/64                   27.8 ns         9.23 ns
BM_std_min<unsigned char>/512                   292 ns         14.0 ns
BM_std_min<unsigned char>/1024                  590 ns         16.0 ns
BM_std_min<unsigned char>/4000                 2330 ns         29.6 ns
BM_std_min<unsigned char>/4096                 2377 ns         31.6 ns
BM_std_min<unsigned char>/5500                 3204 ns         37.1 ns
BM_std_min<unsigned char>/64000               37238 ns          481 ns
BM_std_min<unsigned char>/65536               38185 ns          492 ns
BM_std_min<unsigned char>/70000               40794 ns          526 ns
BM_std_min<unsigned short>/1                  0.396 ns        0.377 ns
BM_std_min<unsigned short>/2                  0.792 ns         1.18 ns
BM_std_min<unsigned short>/3                   1.19 ns         1.70 ns
BM_std_min<unsigned short>/4                   1.66 ns         2.19 ns
BM_std_min<unsigned short>/5                   2.01 ns         2.76 ns
BM_std_min<unsigned short>/6                   2.49 ns         3.30 ns
BM_std_min<unsigned short>/7                   2.88 ns         3.81 ns
BM_std_min<unsigned short>/8                   3.23 ns         4.24 ns
BM_std_min<unsigned short>/9                   3.68 ns         1.49 ns
BM_std_min<unsigned short>/10                  4.06 ns         1.76 ns
BM_std_min<unsigned short>/11                  5.20 ns         2.18 ns
BM_std_min<unsigned short>/12                  5.27 ns         2.65 ns
BM_std_min<unsigned short>/13                  6.71 ns         3.16 ns
BM_std_min<unsigned short>/14                  6.34 ns         3.77 ns
BM_std_min<unsigned short>/15                  7.19 ns         4.24 ns
BM_std_min<unsigned short>/16                  7.74 ns         4.76 ns
BM_std_min<unsigned short>/17                  7.89 ns         1.65 ns
BM_std_min<unsigned short>/18                  8.51 ns         1.96 ns
BM_std_min<unsigned short>/19                  7.80 ns         2.41 ns
BM_std_min<unsigned short>/20                  8.60 ns         2.96 ns
BM_std_min<unsigned short>/21                  8.78 ns         3.45 ns
BM_std_min<unsigned short>/22                  9.67 ns         3.99 ns
BM_std_min<unsigned short>/23                  11.1 ns         5.66 ns
BM_std_min<unsigned short>/24                  11.4 ns         4.94 ns
BM_std_min<unsigned short>/25                  12.4 ns         3.02 ns
BM_std_min<unsigned short>/26                  12.3 ns         2.15 ns
BM_std_min<unsigned short>/27                  12.4 ns         2.65 ns
BM_std_min<unsigned short>/28                  13.6 ns         3.10 ns
BM_std_min<unsigned short>/29                  13.4 ns         3.62 ns
BM_std_min<unsigned short>/30                  13.8 ns         4.28 ns
BM_std_min<unsigned short>/31                  14.0 ns         4.74 ns
BM_std_min<unsigned short>/32                  14.4 ns         5.24 ns
BM_std_min<unsigned short>/64                  29.4 ns         7.18 ns
BM_std_min<unsigned short>/512                  318 ns         9.53 ns
BM_std_min<unsigned short>/1024                 641 ns         14.8 ns
BM_std_min<unsigned short>/4000                2576 ns         51.6 ns
BM_std_min<unsigned short>/4096                2466 ns         53.2 ns
BM_std_min<unsigned short>/5500                3543 ns         67.8 ns
BM_std_min<unsigned short>/64000              42742 ns          949 ns
BM_std_min<unsigned short>/65536              44433 ns          971 ns
BM_std_min<unsigned short>/70000              47769 ns         1036 ns
BM_std_min<unsigned int>/1                    0.398 ns        0.352 ns
BM_std_min<unsigned int>/2                    0.791 ns        0.890 ns
BM_std_min<unsigned int>/3                     1.19 ns         1.10 ns
BM_std_min<unsigned int>/4                     1.58 ns         1.34 ns
BM_std_min<unsigned int>/5                     1.98 ns         1.60 ns
BM_std_min<unsigned int>/6                     3.27 ns         1.90 ns
BM_std_min<unsigned int>/7                     2.78 ns         2.26 ns
BM_std_min<unsigned int>/8                     3.18 ns         2.60 ns
BM_std_min<unsigned int>/9                     3.56 ns         2.94 ns
BM_std_min<unsigned int>/10                    3.96 ns         3.26 ns
BM_std_min<unsigned int>/11                    4.35 ns         3.78 ns
BM_std_min<unsigned int>/12                    4.75 ns         3.94 ns
BM_std_min<unsigned int>/13                    5.15 ns         4.32 ns
BM_std_min<unsigned int>/14                    5.54 ns         4.65 ns
BM_std_min<unsigned int>/15                    5.94 ns         5.02 ns
BM_std_min<unsigned int>/16                    6.33 ns         5.32 ns
BM_std_min<unsigned int>/17                    6.73 ns         5.67 ns
BM_std_min<unsigned int>/18                    7.25 ns         6.00 ns
BM_std_min<unsigned int>/19                    7.52 ns         6.38 ns
BM_std_min<unsigned int>/20                    7.92 ns         6.89 ns
BM_std_min<unsigned int>/21                    9.26 ns         7.24 ns
BM_std_min<unsigned int>/22                    9.44 ns         11.0 ns
BM_std_min<unsigned int>/23                    9.11 ns         8.00 ns
BM_std_min<unsigned int>/24                    9.51 ns         8.41 ns
BM_std_min<unsigned int>/25                    9.91 ns         8.80 ns
BM_std_min<unsigned int>/26                    10.3 ns         9.12 ns
BM_std_min<unsigned int>/27                    10.7 ns         9.26 ns
BM_std_min<unsigned int>/28                    11.1 ns         9.82 ns
BM_std_min<unsigned int>/29                    11.6 ns         10.2 ns
BM_std_min<unsigned int>/30                    13.3 ns         10.4 ns
BM_std_min<unsigned int>/31                    13.7 ns         10.9 ns
BM_std_min<unsigned int>/32                    12.8 ns         11.5 ns
BM_std_min<unsigned int>/64                    29.4 ns         14.0 ns
BM_std_min<unsigned int>/512                    342 ns         24.9 ns
BM_std_min<unsigned int>/1024                   700 ns         36.9 ns
BM_std_min<unsigned int>/4000                  2783 ns          109 ns
BM_std_min<unsigned int>/4096                  2851 ns          112 ns
BM_std_min<unsigned int>/5500                  3832 ns          144 ns
BM_std_min<unsigned int>/64000                44828 ns         1895 ns
BM_std_min<unsigned int>/65536                45891 ns         1946 ns
BM_std_min<unsigned int>/70000                49018 ns         2068 ns
BM_std_min<unsigned long long>/1              0.396 ns        0.352 ns
BM_std_min<unsigned long long>/2              0.791 ns        0.824 ns
BM_std_min<unsigned long long>/3               1.21 ns         1.10 ns
BM_std_min<unsigned long long>/4               1.58 ns         1.37 ns
BM_std_min<unsigned long long>/5               1.98 ns         1.61 ns
BM_std_min<unsigned long long>/6               2.37 ns         1.89 ns
BM_std_min<unsigned long long>/7               2.78 ns         2.31 ns
BM_std_min<unsigned long long>/8               3.17 ns         2.81 ns
BM_std_min<unsigned long long>/9               3.56 ns         3.05 ns
BM_std_min<unsigned long long>/10              3.96 ns         3.28 ns
BM_std_min<unsigned long long>/11              4.35 ns         3.60 ns
BM_std_min<unsigned long long>/12              4.75 ns         3.94 ns
BM_std_min<unsigned long long>/13              5.15 ns         4.36 ns
BM_std_min<unsigned long long>/14              5.54 ns         4.65 ns
BM_std_min<unsigned long long>/15              5.94 ns         4.99 ns
BM_std_min<unsigned long long>/16              6.33 ns         5.33 ns
BM_std_min<unsigned long long>/17              6.73 ns         4.11 ns
BM_std_min<unsigned long long>/18              7.12 ns         4.43 ns
BM_std_min<unsigned long long>/19              7.52 ns         4.65 ns
BM_std_min<unsigned long long>/20              7.91 ns         4.96 ns
BM_std_min<unsigned long long>/21              8.31 ns         5.26 ns
BM_std_min<unsigned long long>/22              8.71 ns         5.56 ns
BM_std_min<unsigned long long>/23              9.57 ns         5.93 ns
BM_std_min<unsigned long long>/24              10.1 ns         6.35 ns
BM_std_min<unsigned long long>/25              9.94 ns         6.70 ns
BM_std_min<unsigned long long>/26              10.4 ns         7.02 ns
BM_std_min<unsigned long long>/27              15.8 ns         7.62 ns
BM_std_min<unsigned long long>/28              16.1 ns         8.23 ns
BM_std_min<unsigned long long>/29              11.6 ns         8.60 ns
BM_std_min<unsigned long long>/30              12.0 ns         8.90 ns
BM_std_min<unsigned long long>/31              17.5 ns         9.47 ns
BM_std_min<unsigned long long>/32              12.9 ns         9.95 ns
BM_std_min<unsigned long long>/64              30.8 ns         11.8 ns
BM_std_min<unsigned long long>/512              340 ns         54.4 ns
BM_std_min<unsigned long long>/1024             698 ns          100 ns
BM_std_min<unsigned long long>/4000            2774 ns          370 ns
BM_std_min<unsigned long long>/4096            2842 ns          378 ns
BM_std_min<unsigned long long>/5500            3822 ns          512 ns
BM_std_min<unsigned long long>/64000          44665 ns         6236 ns
BM_std_min<unsigned long long>/65536          45744 ns         6399 ns
BM_std_min<unsigned long long>/70000          48850 ns         7008 ns
BM_std_min<unsigned __int128>/1               0.397 ns        0.469 ns
BM_std_min<unsigned __int128>/2                1.32 ns         1.20 ns
BM_std_min<unsigned __int128>/3                2.24 ns         1.98 ns
BM_std_min<unsigned __int128>/4                3.17 ns         2.71 ns
BM_std_min<unsigned __int128>/5                4.13 ns         3.46 ns
BM_std_min<unsigned __int128>/6                5.02 ns         4.18 ns
BM_std_min<unsigned __int128>/7                5.97 ns         4.88 ns
BM_std_min<unsigned __int128>/8                6.94 ns         5.57 ns
BM_std_min<unsigned __int128>/9                7.89 ns         6.25 ns
BM_std_min<unsigned __int128>/10               8.92 ns         6.97 ns
BM_std_min<unsigned __int128>/11               9.90 ns         7.74 ns
BM_std_min<unsigned __int128>/12               10.9 ns         8.46 ns
BM_std_min<unsigned __int128>/13               11.9 ns         9.19 ns
BM_std_min<unsigned __int128>/14               12.9 ns         9.92 ns
BM_std_min<unsigned __int128>/15               14.0 ns         10.6 ns
BM_std_min<unsigned __int128>/16               15.0 ns         11.3 ns
BM_std_min<unsigned __int128>/17               16.0 ns         12.0 ns
BM_std_min<unsigned __int128>/18               27.9 ns         12.8 ns
BM_std_min<unsigned __int128>/19               18.0 ns         13.3 ns
BM_std_min<unsigned __int128>/20               19.0 ns         14.1 ns
BM_std_min<unsigned __int128>/21               20.0 ns         14.8 ns
BM_std_min<unsigned __int128>/22               21.0 ns         15.5 ns
BM_std_min<unsigned __int128>/23               22.0 ns         16.4 ns
BM_std_min<unsigned __int128>/24               23.0 ns         16.9 ns
BM_std_min<unsigned __int128>/25               24.1 ns         17.6 ns
BM_std_min<unsigned __int128>/26               25.1 ns         18.3 ns
BM_std_min<unsigned __int128>/27               26.2 ns         19.0 ns
BM_std_min<unsigned __int128>/28               27.2 ns         19.7 ns
BM_std_min<unsigned __int128>/29               28.3 ns         20.4 ns
BM_std_min<unsigned __int128>/30               29.4 ns         21.1 ns
BM_std_min<unsigned __int128>/31               41.0 ns         21.9 ns
BM_std_min<unsigned __int128>/32               31.5 ns         22.6 ns
BM_std_min<unsigned __int128>/64               65.6 ns         45.2 ns
BM_std_min<unsigned __int128>/512               538 ns          371 ns
BM_std_min<unsigned __int128>/1024             1089 ns          742 ns
BM_std_min<unsigned __int128>/4000             4261 ns         2889 ns
BM_std_min<unsigned __int128>/4096             4407 ns         2950 ns
BM_std_min<unsigned __int128>/5500             5920 ns         3956 ns
BM_std_min<unsigned __int128>/64000           69070 ns        46258 ns
BM_std_min<unsigned __int128>/65536           70673 ns        47416 ns
BM_std_min<unsigned __int128>/70000           75566 ns        50518 ns
philnik updated this revision to Diff 497082.Feb 13 2023, 1:15 PM

Simplify code

philnik retitled this revision from [libc++] Optmize std::ranges::min for integral types to [libc++] Optmize std::ranges::min for types that are cheap to copy.Feb 13 2023, 1:20 PM
philnik updated this revision to Diff 501239.Feb 28 2023, 11:32 AM

Try to fix CI

philnik published this revision for review.Feb 28 2023, 7:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2023, 7:38 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne requested changes to this revision.Mar 6 2023, 8:42 AM
ldionne added a subscriber: ldionne.

I think this is an interesting change, thanks for noticing! I have a few comments.

libcxx/benchmarks/CMakeLists.txt
103

Can you record what's the goal of this patch in the commit message + Phab description?

106

Commit message typo: Optmize

libcxx/include/__algorithm/ranges_max.h
1

Separate patch: We should get rid of the // TODO(ranges): ranges::min_element can now simply delegate to std::__min_element todo comment.

12

That seems like a leftover.

71

Can you please make sure that both ranges::min and ranges::max have tests that call the algorithm on values that would be cheap to copy and non-cheap to copy? And also with iterators that would allow copying and disallow copying? We probably already do, but if we don't, let's make sure that we are testing this code path.

libcxx/include/__algorithm/ranges_min.h
13

This doesn't look used.

66–67
// different header?
template <class _Tp>
constexpr bool __is_cheap_to_copy = is_trivially_copyable_v<_Tp> && sizeof(_Tp) <= sizeof(std::intmax_t) /* strawman proposal */;

// here
if constexpr (!__is_cheap_to_copy<range_value_t<_Rp>> && forward_range<_Rp>)
  use-min_element();
else
  use-the-naive-loop();

IMO this is easier to read and more self-explanatory.

This revision now requires changes to proceed.Mar 6 2023, 8:42 AM
philnik retitled this revision from [libc++] Optmize std::ranges::min for types that are cheap to copy to [libc++] Optimize std::ranges::min for types that are cheap to copy.Mar 7 2023, 11:02 AM
philnik edited the summary of this revision. (Show Details)
philnik retitled this revision from [libc++] Optimize std::ranges::min for types that are cheap to copy to [libc++] Optimize std::ranges::{min, max} for types that are cheap to copy.Mar 7 2023, 11:18 AM
philnik updated this revision to Diff 503341.Mar 8 2023, 6:05 AM
philnik marked 6 inline comments as done.

Address comments

philnik updated this revision to Diff 503352.Mar 8 2023, 6:29 AM

Try to fix CI

ldionne accepted this revision.Mar 9 2023, 8:30 AM

Don't forward to min_element for small types that are trivially copyable. This allows the compiler to vectorize the loop in some cases.

I think this would be clearer:

Don't forward to min_element for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases.

This revision is now accepted and ready to land.Mar 9 2023, 8:30 AM
philnik edited the summary of this revision. (Show Details)Mar 9 2023, 1:30 PM
philnik updated this revision to Diff 503921.Mar 9 2023, 1:36 PM

Try to fix CI

philnik updated this revision to Diff 504060.Mar 10 2023, 2:06 AM

Try to fix CI