Current implementation of nth_element will be quadratic under some cases, use introselect to avoid quadratic behavior.
I also add a benchmark for nth_element.
related issue https://github.com/llvm/llvm-project/issues/52747
Differential D141205
[libcxx] nth_element use introselect to avoid quadratic behavior Backl1ght on Jan 7 2023, 12:00 PM. Authored by
Details Current implementation of nth_element will be quadratic under some cases, use introselect to avoid quadratic behavior. I also add a benchmark for nth_element. related issue https://github.com/llvm/llvm-project/issues/52747
Diff Detail
Unit Tests Event TimelineComment Actions Here is the benchmark result for this patch. ------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------- BM_NthElement_uint32_QuickSelectAdversary_4 7.97 ns 7.96 ns 87818240 BM_NthElement_uint32_QuickSelectAdversary_16 64.9 ns 64.8 ns 10928128 BM_NthElement_uint32_QuickSelectAdversary_64 712 ns 711 ns 991232 BM_NthElement_uint32_QuickSelectAdversary_256 9542 ns 9537 ns 73728 BM_NthElement_uint32_QuickSelectAdversary_1024 132177 ns 132090 ns 5376 BM_NthElement_uint32_QuickSelectAdversary_16384 32366447 ns 32343198 ns 16 BM_NthElement_uint32_QuickSelectAdversary_262144 8268646976 ns 8260294657 ns 1 BM_NthElement_uint64_QuickSelectAdversary_4 7.98 ns 7.97 ns 88211456 BM_NthElement_uint64_QuickSelectAdversary_16 73.5 ns 73.3 ns 9814016 BM_NthElement_uint64_QuickSelectAdversary_64 977 ns 976 ns 737280 BM_NthElement_uint64_QuickSelectAdversary_256 13829 ns 13817 ns 51200 BM_NthElement_uint64_QuickSelectAdversary_1024 195468 ns 195371 ns 3584 BM_NthElement_uint64_QuickSelectAdversary_16384 47547973 ns 47521170 ns 16 BM_NthElement_uint64_QuickSelectAdversary_262144 12350804915 ns 12340965027 ns 1 ------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------------------------------- BM_NthElement_uint32_QuickSelectAdversary_4 9.70 ns 9.69 ns 72351744 BM_NthElement_uint32_QuickSelectAdversary_16 66.9 ns 66.8 ns 10387456 BM_NthElement_uint32_QuickSelectAdversary_64 676 ns 676 ns 1028096 BM_NthElement_uint32_QuickSelectAdversary_256 3950 ns 3947 ns 178176 BM_NthElement_uint32_QuickSelectAdversary_1024 19214 ns 19203 ns 36608 BM_NthElement_uint32_QuickSelectAdversary_16384 417974 ns 417712 ns 1680 BM_NthElement_uint32_QuickSelectAdversary_262144 8901424 ns 8895905 ns 79 BM_NthElement_uint64_QuickSelectAdversary_4 10.5 ns 10.5 ns 68026368 BM_NthElement_uint64_QuickSelectAdversary_16 71.9 ns 71.9 ns 9732096 BM_NthElement_uint64_QuickSelectAdversary_64 835 ns 834 ns 839680 BM_NthElement_uint64_QuickSelectAdversary_256 5169 ns 5166 ns 136192 BM_NthElement_uint64_QuickSelectAdversary_1024 25318 ns 25302 ns 27904 BM_NthElement_uint64_QuickSelectAdversary_16384 565411 ns 565056 ns 1248 BM_NthElement_uint64_QuickSelectAdversary_262144 11918315 ns 11911538 ns 59 RUNNING: ./nth_element.libcxx.before.out --benchmark_filter=BM_NthElement_(uint32|uint64)_QuickSelectAdversary.* --benchmark_out=/tmp/tmpm2fwl5zr RUNNING: ./nth_element.libcxx.after.out --benchmark_filter=BM_NthElement_(uint32|uint64)_QuickSelectAdversary.* --benchmark_out=/tmp/tmpeoej8bia Comparing ./nth_element.libcxx.before.out to ./nth_element.libcxx.after.out Benchmark Time CPU Time Old Time New CPU Old CPU New ----------------------------------------------------------------------------------------------------------------------------------------------- BM_NthElement_uint32_QuickSelectAdversary_4 +0.2172 +0.2170 8 10 8 10 BM_NthElement_uint32_QuickSelectAdversary_16 +0.0304 +0.0308 65 67 65 67 BM_NthElement_uint32_QuickSelectAdversary_64 -0.0500 -0.0500 712 676 711 676 BM_NthElement_uint32_QuickSelectAdversary_256 -0.5861 -0.5861 9542 3950 9537 3947 BM_NthElement_uint32_QuickSelectAdversary_1024 -0.8546 -0.8546 132177 19214 132090 19203 BM_NthElement_uint32_QuickSelectAdversary_16384 -0.9871 -0.9871 32366447 417974 32343198 417712 BM_NthElement_uint32_QuickSelectAdversary_262144 -0.9989 -0.9989 8268646976 8901424 8260294657 8895905 BM_NthElement_uint64_QuickSelectAdversary_4 +0.3182 +0.3182 8 11 8 11 BM_NthElement_uint64_QuickSelectAdversary_16 -0.0206 -0.0192 73 72 73 72 BM_NthElement_uint64_QuickSelectAdversary_64 -0.1454 -0.1451 977 835 976 834 BM_NthElement_uint64_QuickSelectAdversary_256 -0.6262 -0.6261 13829 5169 13817 5166 BM_NthElement_uint64_QuickSelectAdversary_1024 -0.8705 -0.8705 195468 25318 195371 25302 BM_NthElement_uint64_QuickSelectAdversary_16384 -0.9881 -0.9881 47547973 565411 47521170 565056 BM_NthElement_uint64_QuickSelectAdversary_262144 -0.9990 -0.9990 12350804915 11918315 12340965027 11911538 OVERALL_GEOMEAN -0.8659 -0.8658 0 0 0 0 Comment Actions There are some other select algorithms outperform introselect, ie pdqselect, adqselect. But I think we could prevent quadratic behavior first, and then pursue extreme performance. Comment Actions Do you plan to make a patch for one of them? If yes, it's probably easier to just go for it directly, since it's probably a re-implementation anyways? Comment Actions I do plan to make one, and I agree with that maybe go for one of them directly will be better. Comment Actions Thanks for working on this. Can you also provide benchmarks to see the effect on the other kinds of data?
|
perhaps use typename _IterOps<_AlgPolicy>::__diference_type