Don't forward to min_element for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases.
Details
- Reviewers
ldionne - Group Reviewers
Restricted Project - Commits
- rGaff3cdc604b0: [libc++] Optimize std::ranges::{min, max} for types that are cheap to copy
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Benchmark results:
---------------------------------------------------------------------- Benchmark old new ---------------------------------------------------------------------- BM_std_min<char>/1 0.396 ns 0.353 ns BM_std_min<char>/2 1.06 ns 0.941 ns BM_std_min<char>/3 1.46 ns 1.52 ns BM_std_min<char>/4 1.94 ns 2.00 ns BM_std_min<char>/5 1.88 ns 2.47 ns BM_std_min<char>/6 2.24 ns 3.10 ns BM_std_min<char>/7 2.64 ns 3.57 ns BM_std_min<char>/8 3.04 ns 4.39 ns BM_std_min<char>/9 3.04 ns 5.01 ns BM_std_min<char>/10 3.44 ns 5.11 ns BM_std_min<char>/11 3.86 ns 6.76 ns BM_std_min<char>/12 4.30 ns 6.26 ns BM_std_min<char>/13 4.47 ns 6.94 ns BM_std_min<char>/14 4.70 ns 7.58 ns BM_std_min<char>/15 5.30 ns 8.34 ns BM_std_min<char>/16 5.69 ns 9.23 ns BM_std_min<char>/17 5.89 ns 1.41 ns BM_std_min<char>/18 6.25 ns 1.88 ns BM_std_min<char>/19 6.68 ns 2.40 ns BM_std_min<char>/20 7.19 ns 2.86 ns BM_std_min<char>/21 7.26 ns 3.44 ns BM_std_min<char>/22 7.64 ns 3.96 ns BM_std_min<char>/23 8.16 ns 4.65 ns BM_std_min<char>/24 8.91 ns 5.32 ns BM_std_min<char>/25 9.13 ns 5.91 ns BM_std_min<char>/26 9.55 ns 6.62 ns BM_std_min<char>/27 9.95 ns 7.30 ns BM_std_min<char>/28 10.5 ns 7.92 ns BM_std_min<char>/29 10.7 ns 8.77 ns BM_std_min<char>/30 11.1 ns 9.69 ns BM_std_min<char>/31 11.5 ns 10.6 ns BM_std_min<char>/32 12.0 ns 11.3 ns BM_std_min<char>/64 27.7 ns 11.9 ns BM_std_min<char>/512 290 ns 17.8 ns BM_std_min<char>/1024 592 ns 20.9 ns BM_std_min<char>/4000 2326 ns 37.1 ns BM_std_min<char>/4096 2368 ns 38.1 ns BM_std_min<char>/5500 3200 ns 38.7 ns BM_std_min<char>/64000 37334 ns 487 ns BM_std_min<char>/65536 38101 ns 498 ns BM_std_min<char>/70000 40719 ns 530 ns BM_std_min<short>/1 0.396 ns 0.472 ns BM_std_min<short>/2 0.792 ns 0.942 ns BM_std_min<short>/3 1.19 ns 1.27 ns BM_std_min<short>/4 1.58 ns 1.60 ns BM_std_min<short>/5 1.98 ns 1.97 ns BM_std_min<short>/6 2.38 ns 2.60 ns BM_std_min<short>/7 2.78 ns 2.83 ns BM_std_min<short>/8 3.17 ns 3.52 ns BM_std_min<short>/9 3.57 ns 2.09 ns BM_std_min<short>/10 3.96 ns 2.13 ns BM_std_min<short>/11 4.35 ns 2.37 ns BM_std_min<short>/12 4.75 ns 2.68 ns BM_std_min<short>/13 5.16 ns 3.13 ns BM_std_min<short>/14 5.54 ns 3.70 ns BM_std_min<short>/15 5.94 ns 4.31 ns BM_std_min<short>/16 6.33 ns 4.70 ns BM_std_min<short>/17 6.73 ns 1.90 ns BM_std_min<short>/18 7.13 ns 2.13 ns BM_std_min<short>/19 7.52 ns 2.41 ns BM_std_min<short>/20 7.92 ns 2.81 ns BM_std_min<short>/21 8.31 ns 3.32 ns BM_std_min<short>/22 8.72 ns 3.81 ns BM_std_min<short>/23 9.15 ns 4.33 ns BM_std_min<short>/24 9.57 ns 4.82 ns BM_std_min<short>/25 9.98 ns 2.14 ns BM_std_min<short>/26 10.4 ns 2.36 ns BM_std_min<short>/27 10.8 ns 2.63 ns BM_std_min<short>/28 11.3 ns 2.99 ns BM_std_min<short>/29 11.7 ns 3.47 ns BM_std_min<short>/30 12.1 ns 3.97 ns BM_std_min<short>/31 13.2 ns 4.52 ns BM_std_min<short>/32 13.0 ns 5.12 ns BM_std_min<short>/64 28.4 ns 5.86 ns BM_std_min<short>/512 341 ns 11.2 ns BM_std_min<short>/1024 699 ns 15.8 ns BM_std_min<short>/4000 2790 ns 51.1 ns BM_std_min<short>/4096 2858 ns 52.2 ns BM_std_min<short>/5500 3847 ns 69.2 ns BM_std_min<short>/64000 45064 ns 947 ns BM_std_min<short>/65536 46449 ns 970 ns BM_std_min<short>/70000 49629 ns 1034 ns BM_std_min<int>/1 0.398 ns 0.352 ns BM_std_min<int>/2 0.798 ns 0.939 ns BM_std_min<int>/3 1.21 ns 1.23 ns BM_std_min<int>/4 1.58 ns 1.56 ns BM_std_min<int>/5 1.98 ns 1.88 ns BM_std_min<int>/6 2.38 ns 2.31 ns BM_std_min<int>/7 3.37 ns 2.88 ns BM_std_min<int>/8 3.17 ns 3.47 ns BM_std_min<int>/9 3.56 ns 3.66 ns BM_std_min<int>/10 3.96 ns 4.12 ns BM_std_min<int>/11 4.43 ns 5.09 ns BM_std_min<int>/12 4.75 ns 5.05 ns BM_std_min<int>/13 5.15 ns 5.45 ns BM_std_min<int>/14 5.54 ns 5.93 ns BM_std_min<int>/15 5.94 ns 6.46 ns BM_std_min<int>/16 6.33 ns 6.89 ns BM_std_min<int>/17 6.73 ns 7.50 ns BM_std_min<int>/18 7.59 ns 8.18 ns BM_std_min<int>/19 7.52 ns 8.73 ns BM_std_min<int>/20 7.92 ns 9.38 ns BM_std_min<int>/21 8.31 ns 9.98 ns BM_std_min<int>/22 9.27 ns 10.4 ns BM_std_min<int>/23 9.11 ns 10.9 ns BM_std_min<int>/24 9.51 ns 11.4 ns BM_std_min<int>/25 9.91 ns 12.0 ns BM_std_min<int>/26 10.3 ns 12.5 ns BM_std_min<int>/27 10.7 ns 13.0 ns BM_std_min<int>/28 11.1 ns 13.4 ns BM_std_min<int>/29 11.5 ns 14.1 ns BM_std_min<int>/30 11.9 ns 14.7 ns BM_std_min<int>/31 13.2 ns 15.0 ns BM_std_min<int>/32 12.8 ns 15.6 ns BM_std_min<int>/64 28.9 ns 19.4 ns BM_std_min<int>/512 342 ns 29.2 ns BM_std_min<int>/1024 699 ns 43.6 ns BM_std_min<int>/4000 2785 ns 114 ns BM_std_min<int>/4096 2849 ns 118 ns BM_std_min<int>/5500 3832 ns 147 ns BM_std_min<int>/64000 44817 ns 1898 ns BM_std_min<int>/65536 45893 ns 1949 ns BM_std_min<int>/70000 49026 ns 2068 ns BM_std_min<long long>/1 0.396 ns 0.353 ns BM_std_min<long long>/2 0.791 ns 0.938 ns BM_std_min<long long>/3 1.19 ns 1.17 ns BM_std_min<long long>/4 1.59 ns 2.33 ns BM_std_min<long long>/5 2.12 ns 1.87 ns BM_std_min<long long>/6 3.06 ns 2.33 ns BM_std_min<long long>/7 2.96 ns 2.73 ns BM_std_min<long long>/8 3.19 ns 3.14 ns BM_std_min<long long>/9 3.58 ns 3.61 ns BM_std_min<long long>/10 3.97 ns 4.10 ns BM_std_min<long long>/11 4.35 ns 4.54 ns BM_std_min<long long>/12 4.75 ns 4.96 ns BM_std_min<long long>/13 5.15 ns 5.43 ns BM_std_min<long long>/14 5.54 ns 5.90 ns BM_std_min<long long>/15 5.94 ns 6.46 ns BM_std_min<long long>/16 6.33 ns 6.91 ns BM_std_min<long long>/17 6.73 ns 2.70 ns BM_std_min<long long>/18 7.12 ns 3.13 ns BM_std_min<long long>/19 7.52 ns 3.47 ns BM_std_min<long long>/20 7.92 ns 3.85 ns BM_std_min<long long>/21 8.34 ns 4.26 ns BM_std_min<long long>/22 8.72 ns 4.86 ns BM_std_min<long long>/23 9.14 ns 5.53 ns BM_std_min<long long>/24 9.57 ns 6.18 ns BM_std_min<long long>/25 9.96 ns 6.68 ns BM_std_min<long long>/26 10.4 ns 7.24 ns BM_std_min<long long>/27 10.8 ns 7.74 ns BM_std_min<long long>/28 11.2 ns 8.37 ns BM_std_min<long long>/29 11.6 ns 9.10 ns BM_std_min<long long>/30 12.1 ns 9.74 ns BM_std_min<long long>/31 13.3 ns 10.5 ns BM_std_min<long long>/32 13.0 ns 11.2 ns BM_std_min<long long>/64 28.4 ns 12.6 ns BM_std_min<long long>/512 342 ns 43.0 ns BM_std_min<long long>/1024 699 ns 76.5 ns BM_std_min<long long>/4000 2776 ns 270 ns BM_std_min<long long>/4096 2844 ns 278 ns BM_std_min<long long>/5500 3823 ns 374 ns BM_std_min<long long>/64000 44663 ns 4872 ns BM_std_min<long long>/65536 45742 ns 4985 ns BM_std_min<long long>/70000 48852 ns 5476 ns BM_std_min<__int128>/1 0.396 ns 0.471 ns BM_std_min<__int128>/2 1.32 ns 1.20 ns BM_std_min<__int128>/3 2.24 ns 2.01 ns BM_std_min<__int128>/4 3.17 ns 2.77 ns BM_std_min<__int128>/5 4.11 ns 3.57 ns BM_std_min<__int128>/6 5.02 ns 4.46 ns BM_std_min<__int128>/7 5.96 ns 5.40 ns BM_std_min<__int128>/8 6.93 ns 6.30 ns BM_std_min<__int128>/9 7.90 ns 7.24 ns BM_std_min<__int128>/10 8.84 ns 8.22 ns BM_std_min<__int128>/11 9.85 ns 9.22 ns BM_std_min<__int128>/12 10.8 ns 10.2 ns BM_std_min<__int128>/13 11.8 ns 11.2 ns BM_std_min<__int128>/14 12.8 ns 12.2 ns BM_std_min<__int128>/15 13.9 ns 13.2 ns BM_std_min<__int128>/16 14.8 ns 14.2 ns BM_std_min<__int128>/17 15.9 ns 15.3 ns BM_std_min<__int128>/18 24.1 ns 16.4 ns BM_std_min<__int128>/19 17.8 ns 17.4 ns BM_std_min<__int128>/20 18.8 ns 18.5 ns BM_std_min<__int128>/21 19.7 ns 19.5 ns BM_std_min<__int128>/22 20.8 ns 20.6 ns BM_std_min<__int128>/23 21.7 ns 21.7 ns BM_std_min<__int128>/24 22.9 ns 22.7 ns BM_std_min<__int128>/25 23.7 ns 23.8 ns BM_std_min<__int128>/26 24.7 ns 24.8 ns BM_std_min<__int128>/27 25.7 ns 25.9 ns BM_std_min<__int128>/28 26.8 ns 27.0 ns BM_std_min<__int128>/29 27.8 ns 28.0 ns BM_std_min<__int128>/30 28.6 ns 29.1 ns BM_std_min<__int128>/31 32.9 ns 30.1 ns BM_std_min<__int128>/32 30.8 ns 31.1 ns BM_std_min<__int128>/64 63.6 ns 64.7 ns BM_std_min<__int128>/512 538 ns 542 ns BM_std_min<__int128>/1024 1082 ns 1084 ns BM_std_min<__int128>/4000 4244 ns 4242 ns BM_std_min<__int128>/4096 4353 ns 4326 ns BM_std_min<__int128>/5500 5841 ns 5823 ns BM_std_min<__int128>/64000 67872 ns 68063 ns BM_std_min<__int128>/65536 70434 ns 69720 ns BM_std_min<__int128>/70000 74210 ns 74484 ns BM_std_min<unsigned char>/1 0.396 ns 0.354 ns BM_std_min<unsigned char>/2 1.06 ns 0.953 ns BM_std_min<unsigned char>/3 1.45 ns 2.08 ns BM_std_min<unsigned char>/4 1.86 ns 1.56 ns BM_std_min<unsigned char>/5 1.88 ns 1.93 ns BM_std_min<unsigned char>/6 2.24 ns 2.34 ns BM_std_min<unsigned char>/7 2.64 ns 2.76 ns BM_std_min<unsigned char>/8 3.04 ns 3.23 ns BM_std_min<unsigned char>/9 3.04 ns 3.67 ns BM_std_min<unsigned char>/10 3.44 ns 4.15 ns BM_std_min<unsigned char>/11 3.85 ns 4.59 ns BM_std_min<unsigned char>/12 4.30 ns 5.09 ns BM_std_min<unsigned char>/13 4.44 ns 5.59 ns BM_std_min<unsigned char>/14 4.71 ns 6.12 ns BM_std_min<unsigned char>/15 5.29 ns 6.57 ns BM_std_min<unsigned char>/16 5.69 ns 7.10 ns BM_std_min<unsigned char>/17 5.85 ns 1.42 ns BM_std_min<unsigned char>/18 6.26 ns 1.77 ns BM_std_min<unsigned char>/19 6.64 ns 2.01 ns BM_std_min<unsigned char>/20 7.17 ns 2.36 ns BM_std_min<unsigned char>/21 7.20 ns 2.77 ns BM_std_min<unsigned char>/22 7.65 ns 3.29 ns BM_std_min<unsigned char>/23 8.16 ns 3.75 ns BM_std_min<unsigned char>/24 8.88 ns 4.31 ns BM_std_min<unsigned char>/25 9.10 ns 4.80 ns BM_std_min<unsigned char>/26 9.54 ns 5.34 ns BM_std_min<unsigned char>/27 9.94 ns 5.82 ns BM_std_min<unsigned char>/28 10.4 ns 6.36 ns BM_std_min<unsigned char>/29 10.6 ns 6.85 ns BM_std_min<unsigned char>/30 11.0 ns 7.32 ns BM_std_min<unsigned char>/31 12.4 ns 8.01 ns BM_std_min<unsigned char>/32 12.0 ns 8.68 ns BM_std_min<unsigned char>/64 27.8 ns 9.23 ns BM_std_min<unsigned char>/512 292 ns 14.0 ns BM_std_min<unsigned char>/1024 590 ns 16.0 ns BM_std_min<unsigned char>/4000 2330 ns 29.6 ns BM_std_min<unsigned char>/4096 2377 ns 31.6 ns BM_std_min<unsigned char>/5500 3204 ns 37.1 ns BM_std_min<unsigned char>/64000 37238 ns 481 ns BM_std_min<unsigned char>/65536 38185 ns 492 ns BM_std_min<unsigned char>/70000 40794 ns 526 ns BM_std_min<unsigned short>/1 0.396 ns 0.377 ns BM_std_min<unsigned short>/2 0.792 ns 1.18 ns BM_std_min<unsigned short>/3 1.19 ns 1.70 ns BM_std_min<unsigned short>/4 1.66 ns 2.19 ns BM_std_min<unsigned short>/5 2.01 ns 2.76 ns BM_std_min<unsigned short>/6 2.49 ns 3.30 ns BM_std_min<unsigned short>/7 2.88 ns 3.81 ns BM_std_min<unsigned short>/8 3.23 ns 4.24 ns BM_std_min<unsigned short>/9 3.68 ns 1.49 ns BM_std_min<unsigned short>/10 4.06 ns 1.76 ns BM_std_min<unsigned short>/11 5.20 ns 2.18 ns BM_std_min<unsigned short>/12 5.27 ns 2.65 ns BM_std_min<unsigned short>/13 6.71 ns 3.16 ns BM_std_min<unsigned short>/14 6.34 ns 3.77 ns BM_std_min<unsigned short>/15 7.19 ns 4.24 ns BM_std_min<unsigned short>/16 7.74 ns 4.76 ns BM_std_min<unsigned short>/17 7.89 ns 1.65 ns BM_std_min<unsigned short>/18 8.51 ns 1.96 ns BM_std_min<unsigned short>/19 7.80 ns 2.41 ns BM_std_min<unsigned short>/20 8.60 ns 2.96 ns BM_std_min<unsigned short>/21 8.78 ns 3.45 ns BM_std_min<unsigned short>/22 9.67 ns 3.99 ns BM_std_min<unsigned short>/23 11.1 ns 5.66 ns BM_std_min<unsigned short>/24 11.4 ns 4.94 ns BM_std_min<unsigned short>/25 12.4 ns 3.02 ns BM_std_min<unsigned short>/26 12.3 ns 2.15 ns BM_std_min<unsigned short>/27 12.4 ns 2.65 ns BM_std_min<unsigned short>/28 13.6 ns 3.10 ns BM_std_min<unsigned short>/29 13.4 ns 3.62 ns BM_std_min<unsigned short>/30 13.8 ns 4.28 ns BM_std_min<unsigned short>/31 14.0 ns 4.74 ns BM_std_min<unsigned short>/32 14.4 ns 5.24 ns BM_std_min<unsigned short>/64 29.4 ns 7.18 ns BM_std_min<unsigned short>/512 318 ns 9.53 ns BM_std_min<unsigned short>/1024 641 ns 14.8 ns BM_std_min<unsigned short>/4000 2576 ns 51.6 ns BM_std_min<unsigned short>/4096 2466 ns 53.2 ns BM_std_min<unsigned short>/5500 3543 ns 67.8 ns BM_std_min<unsigned short>/64000 42742 ns 949 ns BM_std_min<unsigned short>/65536 44433 ns 971 ns BM_std_min<unsigned short>/70000 47769 ns 1036 ns BM_std_min<unsigned int>/1 0.398 ns 0.352 ns BM_std_min<unsigned int>/2 0.791 ns 0.890 ns BM_std_min<unsigned int>/3 1.19 ns 1.10 ns BM_std_min<unsigned int>/4 1.58 ns 1.34 ns BM_std_min<unsigned int>/5 1.98 ns 1.60 ns BM_std_min<unsigned int>/6 3.27 ns 1.90 ns BM_std_min<unsigned int>/7 2.78 ns 2.26 ns BM_std_min<unsigned int>/8 3.18 ns 2.60 ns BM_std_min<unsigned int>/9 3.56 ns 2.94 ns BM_std_min<unsigned int>/10 3.96 ns 3.26 ns BM_std_min<unsigned int>/11 4.35 ns 3.78 ns BM_std_min<unsigned int>/12 4.75 ns 3.94 ns BM_std_min<unsigned int>/13 5.15 ns 4.32 ns BM_std_min<unsigned int>/14 5.54 ns 4.65 ns BM_std_min<unsigned int>/15 5.94 ns 5.02 ns BM_std_min<unsigned int>/16 6.33 ns 5.32 ns BM_std_min<unsigned int>/17 6.73 ns 5.67 ns BM_std_min<unsigned int>/18 7.25 ns 6.00 ns BM_std_min<unsigned int>/19 7.52 ns 6.38 ns BM_std_min<unsigned int>/20 7.92 ns 6.89 ns BM_std_min<unsigned int>/21 9.26 ns 7.24 ns BM_std_min<unsigned int>/22 9.44 ns 11.0 ns BM_std_min<unsigned int>/23 9.11 ns 8.00 ns BM_std_min<unsigned int>/24 9.51 ns 8.41 ns BM_std_min<unsigned int>/25 9.91 ns 8.80 ns BM_std_min<unsigned int>/26 10.3 ns 9.12 ns BM_std_min<unsigned int>/27 10.7 ns 9.26 ns BM_std_min<unsigned int>/28 11.1 ns 9.82 ns BM_std_min<unsigned int>/29 11.6 ns 10.2 ns BM_std_min<unsigned int>/30 13.3 ns 10.4 ns BM_std_min<unsigned int>/31 13.7 ns 10.9 ns BM_std_min<unsigned int>/32 12.8 ns 11.5 ns BM_std_min<unsigned int>/64 29.4 ns 14.0 ns BM_std_min<unsigned int>/512 342 ns 24.9 ns BM_std_min<unsigned int>/1024 700 ns 36.9 ns BM_std_min<unsigned int>/4000 2783 ns 109 ns BM_std_min<unsigned int>/4096 2851 ns 112 ns BM_std_min<unsigned int>/5500 3832 ns 144 ns BM_std_min<unsigned int>/64000 44828 ns 1895 ns BM_std_min<unsigned int>/65536 45891 ns 1946 ns BM_std_min<unsigned int>/70000 49018 ns 2068 ns BM_std_min<unsigned long long>/1 0.396 ns 0.352 ns BM_std_min<unsigned long long>/2 0.791 ns 0.824 ns BM_std_min<unsigned long long>/3 1.21 ns 1.10 ns BM_std_min<unsigned long long>/4 1.58 ns 1.37 ns BM_std_min<unsigned long long>/5 1.98 ns 1.61 ns BM_std_min<unsigned long long>/6 2.37 ns 1.89 ns BM_std_min<unsigned long long>/7 2.78 ns 2.31 ns BM_std_min<unsigned long long>/8 3.17 ns 2.81 ns BM_std_min<unsigned long long>/9 3.56 ns 3.05 ns BM_std_min<unsigned long long>/10 3.96 ns 3.28 ns BM_std_min<unsigned long long>/11 4.35 ns 3.60 ns BM_std_min<unsigned long long>/12 4.75 ns 3.94 ns BM_std_min<unsigned long long>/13 5.15 ns 4.36 ns BM_std_min<unsigned long long>/14 5.54 ns 4.65 ns BM_std_min<unsigned long long>/15 5.94 ns 4.99 ns BM_std_min<unsigned long long>/16 6.33 ns 5.33 ns BM_std_min<unsigned long long>/17 6.73 ns 4.11 ns BM_std_min<unsigned long long>/18 7.12 ns 4.43 ns BM_std_min<unsigned long long>/19 7.52 ns 4.65 ns BM_std_min<unsigned long long>/20 7.91 ns 4.96 ns BM_std_min<unsigned long long>/21 8.31 ns 5.26 ns BM_std_min<unsigned long long>/22 8.71 ns 5.56 ns BM_std_min<unsigned long long>/23 9.57 ns 5.93 ns BM_std_min<unsigned long long>/24 10.1 ns 6.35 ns BM_std_min<unsigned long long>/25 9.94 ns 6.70 ns BM_std_min<unsigned long long>/26 10.4 ns 7.02 ns BM_std_min<unsigned long long>/27 15.8 ns 7.62 ns BM_std_min<unsigned long long>/28 16.1 ns 8.23 ns BM_std_min<unsigned long long>/29 11.6 ns 8.60 ns BM_std_min<unsigned long long>/30 12.0 ns 8.90 ns BM_std_min<unsigned long long>/31 17.5 ns 9.47 ns BM_std_min<unsigned long long>/32 12.9 ns 9.95 ns BM_std_min<unsigned long long>/64 30.8 ns 11.8 ns BM_std_min<unsigned long long>/512 340 ns 54.4 ns BM_std_min<unsigned long long>/1024 698 ns 100 ns BM_std_min<unsigned long long>/4000 2774 ns 370 ns BM_std_min<unsigned long long>/4096 2842 ns 378 ns BM_std_min<unsigned long long>/5500 3822 ns 512 ns BM_std_min<unsigned long long>/64000 44665 ns 6236 ns BM_std_min<unsigned long long>/65536 45744 ns 6399 ns BM_std_min<unsigned long long>/70000 48850 ns 7008 ns BM_std_min<unsigned __int128>/1 0.397 ns 0.469 ns BM_std_min<unsigned __int128>/2 1.32 ns 1.20 ns BM_std_min<unsigned __int128>/3 2.24 ns 1.98 ns BM_std_min<unsigned __int128>/4 3.17 ns 2.71 ns BM_std_min<unsigned __int128>/5 4.13 ns 3.46 ns BM_std_min<unsigned __int128>/6 5.02 ns 4.18 ns BM_std_min<unsigned __int128>/7 5.97 ns 4.88 ns BM_std_min<unsigned __int128>/8 6.94 ns 5.57 ns BM_std_min<unsigned __int128>/9 7.89 ns 6.25 ns BM_std_min<unsigned __int128>/10 8.92 ns 6.97 ns BM_std_min<unsigned __int128>/11 9.90 ns 7.74 ns BM_std_min<unsigned __int128>/12 10.9 ns 8.46 ns BM_std_min<unsigned __int128>/13 11.9 ns 9.19 ns BM_std_min<unsigned __int128>/14 12.9 ns 9.92 ns BM_std_min<unsigned __int128>/15 14.0 ns 10.6 ns BM_std_min<unsigned __int128>/16 15.0 ns 11.3 ns BM_std_min<unsigned __int128>/17 16.0 ns 12.0 ns BM_std_min<unsigned __int128>/18 27.9 ns 12.8 ns BM_std_min<unsigned __int128>/19 18.0 ns 13.3 ns BM_std_min<unsigned __int128>/20 19.0 ns 14.1 ns BM_std_min<unsigned __int128>/21 20.0 ns 14.8 ns BM_std_min<unsigned __int128>/22 21.0 ns 15.5 ns BM_std_min<unsigned __int128>/23 22.0 ns 16.4 ns BM_std_min<unsigned __int128>/24 23.0 ns 16.9 ns BM_std_min<unsigned __int128>/25 24.1 ns 17.6 ns BM_std_min<unsigned __int128>/26 25.1 ns 18.3 ns BM_std_min<unsigned __int128>/27 26.2 ns 19.0 ns BM_std_min<unsigned __int128>/28 27.2 ns 19.7 ns BM_std_min<unsigned __int128>/29 28.3 ns 20.4 ns BM_std_min<unsigned __int128>/30 29.4 ns 21.1 ns BM_std_min<unsigned __int128>/31 41.0 ns 21.9 ns BM_std_min<unsigned __int128>/32 31.5 ns 22.6 ns BM_std_min<unsigned __int128>/64 65.6 ns 45.2 ns BM_std_min<unsigned __int128>/512 538 ns 371 ns BM_std_min<unsigned __int128>/1024 1089 ns 742 ns BM_std_min<unsigned __int128>/4000 4261 ns 2889 ns BM_std_min<unsigned __int128>/4096 4407 ns 2950 ns BM_std_min<unsigned __int128>/5500 5920 ns 3956 ns BM_std_min<unsigned __int128>/64000 69070 ns 46258 ns BM_std_min<unsigned __int128>/65536 70673 ns 47416 ns BM_std_min<unsigned __int128>/70000 75566 ns 50518 ns
I think this is an interesting change, thanks for noticing! I have a few comments.
libcxx/benchmarks/CMakeLists.txt | ||
---|---|---|
103 | Can you record what's the goal of this patch in the commit message + Phab description? | |
106 | Commit message typo: Optmize | |
libcxx/include/__algorithm/ranges_max.h | ||
1 | Separate patch: We should get rid of the // TODO(ranges): ranges::min_element can now simply delegate to std::__min_element todo comment. | |
12 | That seems like a leftover. | |
71 | Can you please make sure that both ranges::min and ranges::max have tests that call the algorithm on values that would be cheap to copy and non-cheap to copy? And also with iterators that would allow copying and disallow copying? We probably already do, but if we don't, let's make sure that we are testing this code path. | |
libcxx/include/__algorithm/ranges_min.h | ||
13 | This doesn't look used. | |
66–67 | // different header? template <class _Tp> constexpr bool __is_cheap_to_copy = is_trivially_copyable_v<_Tp> && sizeof(_Tp) <= sizeof(std::intmax_t) /* strawman proposal */; // here if constexpr (!__is_cheap_to_copy<range_value_t<_Rp>> && forward_range<_Rp>) use-min_element(); else use-the-naive-loop(); IMO this is easier to read and more self-explanatory. |
Don't forward to min_element for small types that are trivially copyable. This allows the compiler to vectorize the loop in some cases.
I think this would be clearer:
Don't forward to min_element for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases.
Can you record what's the goal of this patch in the commit message + Phab description?