This is an archive of the discontinued LLVM Phabricator instance.

[libc++] nth_element change to partial sort when range is really small
Needs ReviewPublic

Authored by Backl1ght on Jan 12 2023, 7:52 AM.

Details

Reviewers
philnik
ldionne
Mordante
Group Reviewers
Restricted Project
Summary

For really small range, to make the element pointed at by __nth correct, we don't need to selection sort the whole range, we only need to do partial sort on the range so that [__first, __nth] contains the sorted __nth - __first + 1 smallest elements in the range [__first, __last).

And this can be done by only modifying the terminate condition of outer loop in selection sort since element before __first will not be changed anymore in the future round.

The difference is that the latter will end up earlier and make less comparison unless __nth equal to __last - 1 in which case there are same number of comparisons.

Diff Detail

Event Timeline

Backl1ght created this revision.Jan 12 2023, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 7:52 AM
Backl1ght requested review of this revision.Jan 12 2023, 7:52 AM
Herald added a reviewer: Restricted Project. · View Herald TranscriptJan 12 2023, 7:52 AM

Thanks for your contribution!

Can you show the benchmark results of the improvement, if needed by adding a new benchmark.

libcxx/include/__algorithm/nth_element.h
29

Please make this a _LIBCPP_ASSERT instead.

30

I really love to have some documentation for this function.

33

Why not use __nth? The new variable name does not help me what this does.

34

If __nth == __last this will fail.

37

Is _Compare not deduced?

Backl1ght updated this revision to Diff 489235.Jan 14 2023, 5:42 AM

refine code

Backl1ght marked 5 inline comments as done.Jan 14 2023, 5:53 AM

For previous diff I just copy code of __selection_sort from sort.h and modify the terminate condition.

Backl1ght updated this revision to Diff 489237.Jan 14 2023, 6:27 AM

It seems like the improvement is too small to make sense in benchmark. But from the output of code bellow we can tell that this reduce some calculations.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main(int argc, char** argv) {
  const int REPETITIONS = 100'000;
  const int N = 7;

  int seed = 998244353;
  std::mt19937 rng(seed);
  int num_comp = 0;
  std::vector<int> a(N);
  for (int i = 0; i < REPETITIONS; ++i) {
    std::iota(a.begin(), a.end(), 0);
    std::shuffle(a.begin(), a.end(), rng);

    int k = rng() % N;
    std::nth_element(a.begin(), a.begin() + k, a.end(),
                     [&](int x, int y) -> bool {
                       ++num_comp;
                       return x < y;
                     });
  }
  std::cout << "num comp: " << num_comp << std::endl;
  return 0;
}

output before: num comp: 2100000
output after: num comp: 1597554

Backl1ght updated this revision to Diff 489249.EditedJan 14 2023, 7:34 AM

fix build error by replace U+2212 with U+002d

It seems like the improvement is too small to make sense in benchmark. But from the output of code bellow we can tell that this reduce some calculations.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main(int argc, char** argv) {
  const int REPETITIONS = 100'000;
  const int N = 7;

  int seed = 998244353;
  std::mt19937 rng(seed);
  int num_comp = 0;
  std::vector<int> a(N);
  for (int i = 0; i < REPETITIONS; ++i) {
    std::iota(a.begin(), a.end(), 0);
    std::shuffle(a.begin(), a.end(), rng);

    int k = rng() % N;
    std::nth_element(a.begin(), a.begin() + k, a.end(),
                     [&](int x, int y) -> bool {
                       ++num_comp;
                       return x < y;
                     });
  }
  std::cout << "num comp: " << num_comp << std::endl;
  return 0;
}

output before: num comp: 2100000
output after: num comp: 1597554

I the improvement is too small to show up in a benchmark, I don't think it's actually an improvement. Maybe you could use a comparator that isn't trivial (e.g. use std::string)? This part at least shows that it isn't a pessimization for trivial comparators.

libcxx/include/__algorithm/nth_element.h
42

Why not min_element? Using the public API makes it always easier to refactor internal stuff, since that part is stable.

Backl1ght updated this revision to Diff 498886.Feb 20 2023, 9:50 AM
Backl1ght edited the summary of this revision. (Show Details)
Backl1ght updated this revision to Diff 500406.Feb 25 2023, 3:46 AM
Backl1ght added inline comments.
libcxx/include/__algorithm/nth_element.h
42

I use __min_element since selection_sort in sort.h uses __min_element.

I agree with that using the public API will be better. So I tried using min_element but got fail in a unittest defined in ranges_robust_against_proxy_iterators.pass.cpp.

I find out that selection_sort in sort.h used min_element before and changed to __min_element in D129823 to make range algorithms support proxy iterators

This is the benchmark result.

This is the benchmark result for BM_NthElement_string* after I changed length of each string to 8192.

Backl1ght marked an inline comment as done.Mar 2 2023, 11:53 PM