This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] nth_element use introselect to avoid quadratic behavior
Needs ReviewPublic

Authored by Backl1ght on Jan 7 2023, 12:00 PM.

Details

Reviewers
ldionne
danlark
philnik
Group Reviewers
Restricted Project
Summary

Current implementation of nth_element will be quadratic under some cases, use introselect to avoid quadratic behavior.

I also add a benchmark for nth_element.

related issue https://github.com/llvm/llvm-project/issues/52747

Diff Detail

Event Timeline

Backl1ght created this revision.Jan 7 2023, 12:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2023, 12:00 PM
Backl1ght requested review of this revision.Jan 7 2023, 12:00 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptJan 7 2023, 12:00 PM

Here is the benchmark result for this patch.

-------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations
-------------------------------------------------------------------------------------------
BM_NthElement_uint32_QuickSelectAdversary_4            7.97 ns         7.96 ns     87818240
BM_NthElement_uint32_QuickSelectAdversary_16           64.9 ns         64.8 ns     10928128
BM_NthElement_uint32_QuickSelectAdversary_64            712 ns          711 ns       991232
BM_NthElement_uint32_QuickSelectAdversary_256          9542 ns         9537 ns        73728
BM_NthElement_uint32_QuickSelectAdversary_1024       132177 ns       132090 ns         5376
BM_NthElement_uint32_QuickSelectAdversary_16384    32366447 ns     32343198 ns           16
BM_NthElement_uint32_QuickSelectAdversary_262144 8268646976 ns   8260294657 ns            1
BM_NthElement_uint64_QuickSelectAdversary_4            7.98 ns         7.97 ns     88211456
BM_NthElement_uint64_QuickSelectAdversary_16           73.5 ns         73.3 ns      9814016
BM_NthElement_uint64_QuickSelectAdversary_64            977 ns          976 ns       737280
BM_NthElement_uint64_QuickSelectAdversary_256         13829 ns        13817 ns        51200
BM_NthElement_uint64_QuickSelectAdversary_1024       195468 ns       195371 ns         3584
BM_NthElement_uint64_QuickSelectAdversary_16384    47547973 ns     47521170 ns           16
BM_NthElement_uint64_QuickSelectAdversary_262144 12350804915 ns   12340965027 ns            1
-------------------------------------------------------------------------------------------
Benchmark                                                 Time             CPU   Iterations
-------------------------------------------------------------------------------------------
BM_NthElement_uint32_QuickSelectAdversary_4            9.70 ns         9.69 ns     72351744
BM_NthElement_uint32_QuickSelectAdversary_16           66.9 ns         66.8 ns     10387456
BM_NthElement_uint32_QuickSelectAdversary_64            676 ns          676 ns      1028096
BM_NthElement_uint32_QuickSelectAdversary_256          3950 ns         3947 ns       178176
BM_NthElement_uint32_QuickSelectAdversary_1024        19214 ns        19203 ns        36608
BM_NthElement_uint32_QuickSelectAdversary_16384      417974 ns       417712 ns         1680
BM_NthElement_uint32_QuickSelectAdversary_262144    8901424 ns      8895905 ns           79
BM_NthElement_uint64_QuickSelectAdversary_4            10.5 ns         10.5 ns     68026368
BM_NthElement_uint64_QuickSelectAdversary_16           71.9 ns         71.9 ns      9732096
BM_NthElement_uint64_QuickSelectAdversary_64            835 ns          834 ns       839680
BM_NthElement_uint64_QuickSelectAdversary_256          5169 ns         5166 ns       136192
BM_NthElement_uint64_QuickSelectAdversary_1024        25318 ns        25302 ns        27904
BM_NthElement_uint64_QuickSelectAdversary_16384      565411 ns       565056 ns         1248
BM_NthElement_uint64_QuickSelectAdversary_262144   11918315 ns     11911538 ns           59
RUNNING: ./nth_element.libcxx.before.out --benchmark_filter=BM_NthElement_(uint32|uint64)_QuickSelectAdversary.* --benchmark_out=/tmp/tmpm2fwl5zr
RUNNING: ./nth_element.libcxx.after.out --benchmark_filter=BM_NthElement_(uint32|uint64)_QuickSelectAdversary.* --benchmark_out=/tmp/tmpeoej8bia
Comparing ./nth_element.libcxx.before.out to ./nth_element.libcxx.after.out
Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
BM_NthElement_uint32_QuickSelectAdversary_4                     +0.2172         +0.2170             8            10             8            10
BM_NthElement_uint32_QuickSelectAdversary_16                    +0.0304         +0.0308            65            67            65            67
BM_NthElement_uint32_QuickSelectAdversary_64                    -0.0500         -0.0500           712           676           711           676
BM_NthElement_uint32_QuickSelectAdversary_256                   -0.5861         -0.5861          9542          3950          9537          3947
BM_NthElement_uint32_QuickSelectAdversary_1024                  -0.8546         -0.8546        132177         19214        132090         19203
BM_NthElement_uint32_QuickSelectAdversary_16384                 -0.9871         -0.9871      32366447        417974      32343198        417712
BM_NthElement_uint32_QuickSelectAdversary_262144                -0.9989         -0.9989    8268646976       8901424    8260294657       8895905
BM_NthElement_uint64_QuickSelectAdversary_4                     +0.3182         +0.3182             8            11             8            11
BM_NthElement_uint64_QuickSelectAdversary_16                    -0.0206         -0.0192            73            72            73            72
BM_NthElement_uint64_QuickSelectAdversary_64                    -0.1454         -0.1451           977           835           976           834
BM_NthElement_uint64_QuickSelectAdversary_256                   -0.6262         -0.6261         13829          5169         13817          5166
BM_NthElement_uint64_QuickSelectAdversary_1024                  -0.8705         -0.8705        195468         25318        195371         25302
BM_NthElement_uint64_QuickSelectAdversary_16384                 -0.9881         -0.9881      47547973        565411      47521170        565056
BM_NthElement_uint64_QuickSelectAdversary_262144                -0.9990         -0.9990   12350804915      11918315   12340965027      11911538
OVERALL_GEOMEAN                                                 -0.8659         -0.8658             0             0             0             0
Backl1ght added a comment.EditedJan 7 2023, 12:16 PM

There are some other select algorithms outperform introselect, ie pdqselect, adqselect. But I think we could prevent quadratic behavior first, and then pursue extreme performance.

Backl1ght updated this revision to Diff 487133.Jan 7 2023, 8:28 PM

fix error: ADL lookup

There are some other select algorithms outperform introselect, ie pdqselect, adqselect. But I think we could prevent quadratic behavior first, and then pursue extreme performance.

Do you plan to make a patch for one of them? If yes, it's probably easier to just go for it directly, since it's probably a re-implementation anyways?

There are some other select algorithms outperform introselect, ie pdqselect, adqselect. But I think we could prevent quadratic behavior first, and then pursue extreme performance.

Do you plan to make a patch for one of them? If yes, it's probably easier to just go for it directly, since it's probably a re-implementation anyways?

I do plan to make one, and I agree with that maybe go for one of them directly will be better.

Thanks for working on this. Can you also provide benchmarks to see the effect on the other kinds of data?

libcxx/include/__algorithm/nth_element.h
104

Can you add comment why this is done? Currently the comment and the code tell the same thing.
For maintainability it would be very good to know why we switch the algorithm.

danlark added inline comments.Jan 8 2023, 9:04 AM
libcxx/include/__algorithm/nth_element.h
104

Even though I am in favor of this patch, this does not fully fix the issue. With the comparator that changes the inputs on the fly, it's possible to get into a O(n log n) average case.

Otherwise LGTM, thanks!

Backl1ght added inline comments.Jan 9 2023, 5:21 AM
libcxx/include/__algorithm/nth_element.h
104

I think the average time complexity of introselect should be O(n), since the average time complexity of quickselect is O(n), and according to the original paper of introselect:

If as in introsort we limit the depth to a logarithmic bound the average time remains linear but the worst case is O(N log N).

danlark added inline comments.Jan 9 2023, 5:33 AM
libcxx/include/__algorithm/nth_element.h
104

The example from the link https://github.com/llvm/llvm-project/issues/52747#issuecomment-996916909 will still be of average case O(N log N)

huixie90 added inline comments.
libcxx/include/__algorithm/nth_element.h
52

perhaps use typename _IterOps<_AlgPolicy>::__diference_type

104

I think that example should be UB. I can't remember the legacy version's wording, but I am pretty sure the ranges version requires the predicate to be "equality-preserving". The same inputs to the predicate should return the same output. But that example's predicate seems to modify the states in such way that same inputs can result in different output

105

nit: _Compare can be deduced