This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Introsort based sorting function
AbandonedPublic

Authored by philnik on Aug 7 2017, 12:36 PM.

Details

Summary

The sorting algorithm currently employed in libc+ library uses quicksort with tail recursion elimination, as a result of which the worst case complexity turns out to be O(N^2).
This patch reduces the worst case time complexity, by employing Introsort algorithm. Introsort is a sorting technique, which begins with quicksort and when the recursion depth (or depth limit) goes beyond a threshold value, then it switches to Heapsort .As a result the worst case complexity reduces to O(NlogN)

Worked in collaboration with Aditya Kumar.

Diff Detail

Event Timeline

DIVYA created this revision.Aug 7 2017, 12:36 PM
bcraig edited edge metadata.Aug 7 2017, 3:41 PM

This patch needs benchmarks that demonstrate the performance changes.

bcraig added a comment.Aug 7 2017, 6:54 PM

alternatively, you could report the comparison of the old code vs. the new code with an existing benchmark, like benchmarks/algorithms.bench.cpp

DIVYA added a comment.EditedAug 8 2017, 9:57 AM

benchmarks/algorithms.bench.cpp Results

CPU Time With old code (in ns)

BM_sort_std_common<std::vector<int>>/16384 : 730752
BM_sort_std_common<std::vector<int>>/32768 : 1.58E+06
BM_sort_std_ascending<std::vector<int>>/16384 : 17160.5
BM_sort_std_ascending<std::vector<int>>/32768 : 35350.1
BM_sort_std_descending<std::vector<int>>/16384 : 35809
BM_sort_std_descending<std::vector<int>>/32768 : 72133
BM_sort_std_list_with_vector<std::list<int>>/16384 : 124250
BM_sort_std_list_with_vector<std::list<int>>/32768 : 247705
BM_sort_std_worst_quick<std::vector<int>>/16384 : 1.03E+07
BM_sort_std_worst_quick<std::vector<int>>/32768 : 4.04E+07

CPU Time With new code (in ns)

BM_sort_std_common<std::vector<int>>/16384 : 720510
BM_sort_std_common<std::vector<int>>/32768 : 1.55E+06
BM_sort_std_ascending<std::vector<int>>/16384 : 17164.9
BM_sort_std_ascending<std::vector<int>>/32768 : 34726.7
BM_sort_std_descending<std::vector<int>>/16384 : 35671
BM_sort_std_descending<std::vector<int>>/32768 : 72100.7
BM_sort_std_list_with_vector<std::list<int>>/16384 : 125816
BM_sort_std_list_with_vector<std::list<int>>/32768 : 247450
BM_sort_std_worst_quick<std::vector<int>>/16384 : 987016
BM_sort_std_worst_quick<std::vector<int>>/32768 : 2.14E+06

bcraig added a comment.Aug 8 2017, 6:46 PM

Those are interesting (and useful) results... but they don't look like they came from the same algorithms.bench.cpp that I'm looking at...
https://github.com/llvm-mirror/libcxx/blob/master/benchmarks/algorithms.bench.cpp

That being said, the benchmark there only does 1k elements at a time, and doesn't have the worst case for quick sort like yours does. Adding to the upstream algorithms.bench.cpp seems valuable.

bcraig added a comment.Aug 8 2017, 6:55 PM

I like this change in general. Dinkumware has been using introsort for 10+ years, so I'm a bit surprised that libc++ wasn't already.

include/algorithm
4208

This comment says basically the same thing as the code. The comment would be more useful if it said why 2*log2(size) is used.

DIVYA added a comment.Aug 9 2017, 1:18 PM
include/algorithm
4208

We tested the code with depth limit from log2(size) to 4*log2(size).It was giving good performance around 2*log2(size).So the depth limit was fixed a this value.

If you want the performance improvements in the BM_sort_std_worst_quick cases preserved, you really need to port the benchmark from Aditya's repo into the libcxx benchmark code base. Otherwise, the next person that comes along to improve std::sort performance may very well wreck the performance gains you achieved.

DIVYA updated this revision to Diff 111027.Aug 14 2017, 10:22 AM
DIVYA marked an inline comment as done.

Added benchmark from Aditya's repo into the libcxx benchmark code base

The test headers should not be in the production include folder. They should probably be in the benchmark folder.

If possible, follow the style and conventions of the existing tests. When you can't follow the style and convention of the existing tests, try to make it obvious (or leave a comment) as to why the new test is special.

For example, one way to follow the existing conventions would be to have a test that looks like this...

BENCHMARK_CAPTURE(BM_Sort, qsort_worst_uint32, getQSortKiller<uint32_t>)->Arg(TestNumInputs);

That test would be called qsort_worst_uint32. You would need to author the sequence generator so that it had a signature like this...
template <class T> std::vector<T> getQSortKiller(size_t N)

On an encouraging note, I don't see anything wrong with the production code. I'm optimistic about getting this in once we iron out the test issues.

hiraditya added a comment.EditedAug 20 2017, 1:19 PM

Results with the patch.

Before:

Run on (8 X 3900 MHz CPU s)
2017-08-20 15:11:41
-------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations
-------------------------------------------------------------------------------
BM_Sort/random_uint32/65536               14202353 ns   14203202 ns         48
BM_Sort/sorted_ascending_uint32/65536       254100 ns     254108 ns       2754
BM_Sort/sorted_descending_uint32/65536      552118 ns     552151 ns       1232
BM_Sort/single_element_uint32/65536         170140 ns     170136 ns       4090
BM_Sort/pipe_organ_uint32/65536            5989117 ns    5989494 ns        113
BM_Sort/random_strings/65536             105697682 ns  105702553 ns          7
BM_Sort/sorted_ascending_strings/65536    13324109 ns   13324186 ns         50
BM_Sort/sorted_descending_strings/65536   19057303 ns   19058005 ns         36
BM_Sort/single_element_strings/65536      57941433 ns   57944691 ns         12
BM_Sort/qsort_worst_uint32/65536         694858550 ns  694894213 ns          1

After:

Run on (8 X 3900 MHz CPU s)
2017-08-20 15:15:14
-------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations
-------------------------------------------------------------------------------
BM_Sort/random_uint32/65536               14073209 ns   14073732 ns         49
BM_Sort/sorted_ascending_uint32/65536       257596 ns     257610 ns       2740
BM_Sort/sorted_descending_uint32/65536      560208 ns     560069 ns       1226
BM_Sort/single_element_uint32/65536         170543 ns     170549 ns       4075
BM_Sort/pipe_organ_uint32/65536            6008832 ns    6009173 ns        113
BM_Sort/random_strings/65536             104672888 ns  104677220 ns          7
BM_Sort/sorted_ascending_strings/65536    13334016 ns   13334393 ns         54
BM_Sort/sorted_descending_strings/65536   18883275 ns   18883831 ns         37
BM_Sort/single_element_strings/65536      57022905 ns   57025206 ns         12
BM_Sort/qsort_worst_uint32/65536          16870788 ns   16871828 ns         41

The last test which exploits the worst case behavior in quick sort improves greatly while others are mostly unaffected.

DIVYA updated this revision to Diff 111998.Aug 21 2017, 9:24 AM
  • added test qsort_worst_uint32 in algorithm.bench.cpp

LGTM. You should probably get one other person to approve though. I'm hoping that @EricWF or @mclow.lists will take a look

philnik commandeered this revision.Jun 1 2022, 12:42 AM
philnik added a reviewer: DIVYA.
philnik added a subscriber: philnik.

This has been superseded by D113413.

Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 12:42 AM
Herald added a subscriber: mgrang. · View Herald Transcript
philnik abandoned this revision.Jun 1 2022, 12:42 AM