This is an archive of the discontinued LLVM Phabricator instance.

Differential D93233

[libc++] Replaces std::sort by Bitset sorting algorithm.
AbandonedPublic

Authored by nilayvaish on Dec 14 2020, 9:27 AM.

Details

Reviewers

jdoerfert
minjaehwang
ldionne

Group Reviewers

Restricted Project

Summary

Bitset Sort

Bitset Sort is a variant of quick sort, specifically BlockQuickSort. Bitset Sort uses a carefully written partition function that lets the compiler generate SIMD instructions without any SIMD intrinsics being written in the loop.
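
As an illustration of what such a vectorization-friendly loop can look like, here is a minimal sketch. It is not the code from this patch: it assumes unsigned 64-bit keys and a block of at most 64 elements, and the name less_than_mask is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compare up to 64 elements against a pivot and record the results as bits.
// Bit i is set when block[i] < pivot. The loop body has no data-dependent
// branch, which is the property that gives an optimizer room to vectorize it.
inline std::uint64_t less_than_mask(const std::uint64_t* block, std::size_t n,
                                    std::uint64_t pivot) {
  assert(n <= 64 && "one bit per element, so at most 64 elements per block");
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < n; ++i) {
    mask |= static_cast<std::uint64_t>(block[i] < pivot) << i;
  }
  return mask;
}
```

Whether a given compiler actually emits SIMD instructions for this loop depends on the target and optimization flags, so it is worth checking the generated assembly rather than assuming it.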

Bitset Sort is interface compatible with std::sort and meant to replace std::sort in libc++.

Bitset Sort is 3.4x faster (or spends 71% less time) than libc++ std::sort when sorting uint64s and 1.58x faster (or spends 37% less time) when sorting std::string.
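
The two ways of stating each result above are related by a simple conversion: a speedup factor s corresponds to a fraction 1 - 1/s of time saved. Worked out for the two figures quoted:

```latex
\[
\text{time saved} = 1 - \frac{1}{s}, \qquad
1 - \frac{1}{3.4} \approx 0.71, \qquad
1 - \frac{1}{1.58} \approx 0.37
\]
```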

Bitset Sort uses multiple techniques to improve the runtime performance of sort. These include sorting networks, a variant of merge sort called Bitonic Order Merge Sort that is faster for small N, and pattern recognition.

Please see the GitHub page for more detailed documentation and full results.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

minjaehwang created this revision.Dec 14 2020, 9:27 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptDec 14 2020, 9:27 AM

minjaehwang requested review of this revision.Dec 14 2020, 9:27 AM

Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2020, 9:27 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

minjaehwang edited the summary of this revision. (Show Details)Dec 14 2020, 9:29 AM

minjaehwang edited the summary of this revision. (Show Details)

minjaehwang added a reviewer: EricWF.Dec 14 2020, 9:32 AM

lebedev.ri retitled this revision from Replaces std::sort by Bitset sorting algorithm. to [TableGen] Replaces std::sort by Bitset sorting algorithm..Dec 14 2020, 9:34 AM

lebedev.ri edited reviewers, added: Paul-C-Anagnostopoulos; removed: EricWF.

Did you upload the right diff? The code does not show sort.

Made the incorrect patch previously.

Herald added a reviewer: jdoerfert. · View Herald TranscriptDec 14 2020, 9:58 AM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added a reviewer: Restricted Project. · View Herald Transcript

Herald added subscribers: libcxx-commits, sstefan1, mgrang. · View Herald Transcript

In D93233#2452523, @MaskRay wrote:

Did you upload the right diff? The code does not show sort.

I am very sorry. Just updated the review with the sort.

What are the perf changes on CPU's without support for those instructions?

IMHO bitset sort is not a standard term, so it needs some clarification. From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

In D93233#2452718, @MaskRay wrote:

IMHO bitset sort is not a standard term, so it needs some clarification. From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

There's a list of standard algorithms that are allowed to use temporary heap storage.
That list does not include std::sort (it consists of stable_sort, stable_partition and inplace_merge)

abidh added a subscriber: abidh.Dec 14 2020, 11:06 AM

Harbormaster completed remote builds in B82309: Diff 311640.Dec 14 2020, 11:08 AM

This is an interesting contribution. I'm trying to wrap my head around it.

In D93233#2452718, @MaskRay wrote:

From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

Unless I missed a subtlety, it doesn't require heap allocation. The "bitsets" used are actually 64 (or 32) bit integers. The current algorithm also doesn't have heap allocation, so keeping that property is something we want (and something this patch does AFAICT).

@minjaehwang Is there documentation for this algorithm? I believe it would help us (at least myself) understand the algorithm and the potential tradeoffs involved. Also, are you aware of Timsort? IIRC there had been an analysis conducted that concluded that we should be using Timsort (as it was faster than our sort and respected the complexity requirements of the Standard).

libcxx/test/libcxx/algorithms/sort.pass.cpp
21 ↗	(On Diff #311640)	Shouldn't this test work with all implementations? Why not?

Harbormaster completed remote builds in B82306: Diff 311632.Dec 14 2020, 11:16 AM

In D93233#2452760, @ldionne wrote:

This is an interesting contribution. I'm trying to wrap my head around it.

Thanks!

In D93233#2452718, @MaskRay wrote:

From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

Unless I missed a subtlety, it doesn't require heap allocation. The "bitsets" used are actually 64 (or 32) bit integers. The current algorithm also doesn't have heap allocation, so keeping that property is something we want (and something this patch does AFAICT).

You are right. It only requires two 64 (or 32) bit integers to represent bitsets on left partition and right partition.

@minjaehwang Is there documentation for this algorithm? I believe it would help us (at least myself) understand the algorithm and the potential tradeoffs involved. Also, are you aware of Timsort? IIRC there had been an analysis conducted that concluded that we should be using Timsort (as it was faster than our sort and respected the complexity requirements of the Standard).

I have documentation that was intended for presentation at my employer. I will work with the company to publish the document.

I am aware of Timsort. Timsort is probably good for replacing std::stable_sort. It is a variant of merge sort and could be generally slower compared to unstable sorts.

In D93233#2452579, @lebedev.ri wrote:

What are the perf changes on CPU's without support for those instructions?

This is definitely something that I would like to do. Could you suggest which platforms or architectures I should try?

This implementation benefits from bsr or lzcnt instructions. I am guessing bitset sort will still be faster even without specialized instructions because lzcnt and bsr are not terribly slow to emulate with general instructions.
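
To make the "general instructions" point concrete, here is a rough sketch of a bit scan written with only shifts, masks and compares, plus the loop that walks the set bits of a bitset. It counts trailing zeros rather than leading zeros, which is an assumption of this example and not necessarily what the patch does; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-trailing-zeros: a binary search over the word using only
// shifts, masks and compares. Precondition: x != 0.
inline int ctz64_portable(std::uint64_t x) {
  assert(x != 0);
  int n = 0;
  if ((x & 0xFFFFFFFFull) == 0) { n += 32; x >>= 32; }
  if ((x & 0xFFFFull) == 0)     { n += 16; x >>= 16; }
  if ((x & 0xFFull) == 0)       { n += 8;  x >>= 8; }
  if ((x & 0xFull) == 0)        { n += 4;  x >>= 4; }
  if ((x & 0x3ull) == 0)        { n += 2;  x >>= 2; }
  if ((x & 0x1ull) == 0)        { n += 1; }
  return n;
}

// Visit the index of every set bit, lowest first; returns how many were seen.
template <class Visitor>
int for_each_set_bit(std::uint64_t bits, Visitor visit) {
  int count = 0;
  while (bits != 0) {
    visit(ctz64_portable(bits));
    bits &= bits - 1;  // clear the lowest set bit
    ++count;
  }
  return count;
}
```

The fallback costs six compare-and-shift steps per scan, so the question of how much of the speedup survives on such targets still needs to be measured.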

In D93233#2452718, @MaskRay wrote:

IMHO bitset sort is not a standard term, so it needs some clarification. From a glance this uses a bitset partition and needs temporary storage (heap memory allocation?). Does the original algorithm have heap memory allocation?

You are right. I coined bitset sort terminology and have not published anything publicly yet. I will work on that.

Bitset partition works with stack variables without heap memory allocation.

minjaehwang added inline comments.Dec 14 2020, 12:25 PM

libcxx/test/libcxx/algorithms/sort.pass.cpp
21 ↗	(On Diff #311640)	You are right. This is a general sort test. Will fix it.

Hi, @ldionne asked me whether it was possible to review the algorithm, and so I did over the last three days. I eventually decided to create an account here and share again what I already sent him by mail, with additional information.

First, it looks like the main new trick is an implementation of the idea described in *BlockQuicksort: How Branch Mispredictions don't affect Quicksort* by Edelkamp and Weiß: basically, the partitioning algorithm performs comparisons over blocks of 32~64 elements, stores their results as booleans (here in a bitset), then performs the swaps to put the misplaced elements into place. It nicely avoids issues linked to branch misprediction, and thus is kind of a silver bullet for quicksort as long as the comparisons and projections passed to the algorithm are themselves branchless. As we can see for the strings benchmark, this partitioning scheme can be slower than a more traditional one when there are branches in the comparisons, which is likely due to the overhead from the additional logic involved.

On the good side, I tweaked bitset_sort so that I could inject it in my cpp-sort library, and ran the full test suite: the algorithm doesn't seem to have obvious bugs, and ubsan and asan didn't find any issues either. The benchmarks also show good results (here for a std::vector<double>: https://i.imgur.com/Nu6l8lI.png).

On the not-so-good side, bitset_sort seems to be O(n²), just like the current implementation of std::sort in libc++. I reran the quicksort killer implementation of Orson Peters from this issue, and got the same results. The quicksort killer is based on *A Killer Adversary for Quicksort* by M. D. McIlroy. Here is a graph showing the O(n log n) vs. O(n²) behaviour of the different algorithms: https://i.imgur.com/IWa2teO.png

So far my overall conclusion is that the idea of reimplementing the BlockQuicksort logic over bitsets is nice, but pdqsort otherwise seems better in every regard: it is truly O(n log n) - and even O(n log k) for k unique elements when k is small -, is efficient at breaking common quicksort-adverse patterns such as the pipe organ pattern in my first benchmark, and it actually switches over to a simpler non-branchless partitioning algorithm when it can't prove at compile time that the comparison will be branchless, avoiding the regression on types such as std::string.
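
The compile-time switch described above can be sketched with a small type trait. This is only an illustration under assumptions: the names is_default_compare and use_branchless_partition are hypothetical here, and the sketch treats only std::less as a known branchless comparator.

```cpp
#include <functional>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical trait: true only for comparators known to be plain operator<.
template <class Compare>
struct is_default_compare : std::false_type {};
template <class T>
struct is_default_compare<std::less<T> > : std::true_type {};

// Hypothetical selector: choose the branchless partition only when both the
// element type and the comparator are known at compile time to be cheap and
// branch-free; anything else falls back to a classic partition.
template <class Iter, class Compare>
struct use_branchless_partition
    : std::integral_constant<
          bool,
          std::is_arithmetic<
              typename std::iterator_traits<Iter>::value_type>::value &&
              is_default_compare<Compare>::value> {};

// Arithmetic keys with std::less qualify; std::string keys do not.
static_assert(use_branchless_partition<int*, std::less<int> >::value, "");
static_assert(!use_branchless_partition<std::string*,
                                        std::less<std::string> >::value, "");
```

A sort implementation could then pick between two partition functions by tag dispatch on this trait, at no runtime cost.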

If std::sort has to be changed, then I'd definitely reconsider using pdqsort instead. It should be possible to reimplement it over bitsets à la bitset_sort if you care about the reduced impact on stack memory.

Thank you very much, @Morwenn, for looking into this code.

@Morwenn, you are right that the idea is largely based on Block Quicksort except that Bitset sort uses a 32/64 bit integer instead of an array. A bitset will be kept within CPU registers and won't require store instructions like Block Quicksort does.
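
For readers following along, the idea can be sketched roughly as below. This is a simplified illustration and not the partition function from this patch: it works on raw pointers, takes the pivot by value, finishes the tail with std::partition, and uses a naive bit scan. All names are made up for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Index of the lowest set bit; x must be nonzero. Real code would use a
// hardware bit-scan instruction instead of this loop.
inline int lowest_set_bit(std::uint64_t x) {
  int i = 0;
  while ((x & 1u) == 0) { x >>= 1; ++i; }
  return i;
}

// Partition [first, last) so that elements < pivot come first.
// Returns a pointer to the first element that is not < pivot.
template <class T>
T* bitset_partition(T* first, T* last, const T& pivot) {
  constexpr int kBlock = 64;
  std::uint64_t left_bits = 0;   // bit i: first[i] is misplaced (not < pivot)
  std::uint64_t right_bits = 0;  // bit i: *(last - 1 - i) is misplaced (< pivot)
  while (last - first >= 2 * kBlock) {
    if (left_bits == 0) {        // scan a fresh left block, branch-free
      for (int i = 0; i < kBlock; ++i)
        left_bits |= static_cast<std::uint64_t>(!(first[i] < pivot)) << i;
    }
    if (right_bits == 0) {       // scan a fresh right block, branch-free
      for (int i = 0; i < kBlock; ++i)
        right_bits |= static_cast<std::uint64_t>(*(last - 1 - i) < pivot) << i;
    }
    // Pair up misplaced elements from both sides and swap them.
    while (left_bits != 0 && right_bits != 0) {
      std::swap(first[lowest_set_bit(left_bits)],
                *(last - 1 - lowest_set_bit(right_bits)));
      left_bits &= left_bits - 1;
      right_bits &= right_bits - 1;
    }
    if (left_bits == 0) first += kBlock;   // left block is fully placed
    if (right_bits == 0) last -= kBlock;   // right block is fully placed
  }
  // Fewer than two blocks remain: finish with an ordinary partition.
  return std::partition(first, last,
                        [&pivot](const T& x) { return x < pivot; });
}
```

The two masks are plain local integers, so nothing here needs heap storage or an offsets array.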

As you know, Pdqsort is a variation of Block Quicksort with pattern recognition and a different pivot selection. It recognizes ascending ranges, descending ranges, and O(n ^ 2) cases. Pdqsort shows O(n) performance on these known patterns and guarantees O(n lg n) in the worst case.

The bitset trick can be applied to Pdqsort's partition function just like this code does for the existing std::sort's partition function. I have implemented Bitset trick on top of pdqsort as well but chose the current implementation over pdqsort-based Bitset sort. The current implementation introduces the minimal change to std::sort. It replaces the partition function but keeps most of the outer layer of std::sort.

The reason for keeping the outer layer of std::sort is that it keeps most of the performance characteristics unchanged and adds little overhead when there is a branch in the comparison function.

@Morwenn mentioned the O(n ^ 2) case for quicksort which also happens for the existing std::sort. As you might know, a simple change can avoid O(n ^ 2). Introsort avoids O(n ^ 2) by calling heap sort when the quicksort recursion depth goes above O(lg n). There have been efforts to introduce introsort into libc++ (Kumar's introsort). For reasons unknown to me, it has not been submitted. Pdqsort avoids O(n ^ 2) by calling heap sort when partitions are highly unbalanced.
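
The depth-limit idea can be sketched in a few lines. This is a generic illustration and not the libc++ code or Kumar's patch: the pivot choice (middle element), the small-range threshold of 16, and the budget of roughly 2*log2(n) are all assumptions picked for the example.

```cpp
#include <algorithm>
#include <iterator>

// Quicksort with a recursion budget. When the budget reaches zero the range
// is finished with heap sort, which bounds the worst case at O(n log n).
template <class RandomIt>
void introsort_loop(RandomIt first, RandomIt last, int depth_budget) {
  typedef typename std::iterator_traits<RandomIt>::value_type T;
  while (last - first > 16) {
    if (depth_budget == 0) {
      std::make_heap(first, last);
      std::sort_heap(first, last);
      return;
    }
    --depth_budget;
    const T pivot = *(first + (last - first) / 2);
    // Three-way split: [first, lo) < pivot, [lo, hi) == pivot, [hi, last) > pivot.
    RandomIt lo = std::partition(first, last,
                                 [&pivot](const T& x) { return x < pivot; });
    RandomIt hi = std::partition(lo, last,
                                 [&pivot](const T& x) { return !(pivot < x); });
    introsort_loop(first, lo, depth_budget);  // recurse on the left part
    first = hi;                               // iterate on the right part
  }
  // Insertion sort for the remaining small range.
  for (RandomIt i = first; i != last; ++i)
    std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

template <class RandomIt>
void introsort(RandomIt first, RandomIt last) {
  int budget = 0;
  for (auto n = last - first; n > 1; n >>= 1) budget += 2;  // about 2*log2(n)
  introsort_loop(first, last, budget);
}
```

Passing a budget of zero forces the heap sort path immediately, which is a convenient way to exercise the fallback in a test.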

I understand that Pdqsort is faster in many known patterns over std::sort. If we are not afraid of changing the entire sort implementation, I can certainly bring Bitset-on-pdqsort as another code review. @Morwenn and @ldionne, could you give your thoughts on the future direction?

@Morwenn Bitset sort is faster for std::strings than pdqsort which turns off Block sort technique for non-arithmetic types. See the following comparison. Block sort technique can be faster even with comparison functions with branches.

Benchmark                                                                    Time             CPU   Iterations
BM_PdqSort_string_Random_1                                                   4.65 ns         4.64 ns    150470656
BM_PdqSort_string_Random_4                                                   19.0 ns         19.0 ns     36175872
BM_PdqSort_string_Random_16                                                  46.4 ns         46.4 ns     14942208
BM_PdqSort_string_Random_64                                                  64.7 ns         64.7 ns     10485760
BM_PdqSort_string_Random_256                                                 84.1 ns         84.1 ns      8126464
BM_PdqSort_string_Random_1024                                                 103 ns          103 ns      6553600
BM_PdqSort_string_Random_16384                                                160 ns          160 ns      4456448
BM_PdqSort_string_Random_262144                                               242 ns          242 ns      2359296
BM_BitsetSort_string_Random_1                                                3.85 ns         3.85 ns    181665792
BM_BitsetSort_string_Random_4                                                18.1 ns         18.1 ns     38535168
BM_BitsetSort_string_Random_16                                               45.5 ns         45.5 ns     14942208
BM_BitsetSort_string_Random_64                                               62.6 ns         62.6 ns     10747904
BM_BitsetSort_string_Random_256                                              74.4 ns         74.4 ns      9175040
BM_BitsetSort_string_Random_1024                                             85.9 ns         85.9 ns      7864320
BM_BitsetSort_string_Random_16384                                             121 ns          121 ns      5767168
BM_BitsetSort_string_Random_262144                                            179 ns          179 ns      3407872
BM_Sort_string_Random_1                                                      4.05 ns         4.05 ns    173277184
BM_Sort_string_Random_4                                                      17.9 ns         17.9 ns     38010880
BM_Sort_string_Random_16                                                     50.4 ns         50.3 ns     13369344
BM_Sort_string_Random_64                                                     73.3 ns         73.3 ns      9437184
BM_Sort_string_Random_256                                                    94.6 ns         94.4 ns      7077888
BM_Sort_string_Random_1024                                                    115 ns          115 ns      6029312
BM_Sort_string_Random_16384                                                   177 ns          177 ns      3932160
BM_Sort_string_Random_262144                                                  279 ns          279 ns      2097152

In D93233#2466828, @minjaehwang wrote:

The bitset trick can be applied to Pdqsort's partition function just like this code does for the existing std::sort's partition function. I have implemented Bitset trick on top of pdqsort as well but chose the current implementation over pdqsort-based Bitset sort. The current implementation introduces the minimal change to std::sort. It replaces the partition function but keeps most of the outer layer of std::sort.

The reason for keeping the outer layer of std::sort is that it keeps most of the performance characteristics unchanged and adds little overhead when there is a branch in the comparison function.

That's surely a fair concern.

@Morwenn mentioned the O(n ^ 2) case for quicksort which also happens for the existing std::sort. As you might know, a simple change can avoid O(n ^ 2). Introsort avoids O(n ^ 2) by calling heap sort when the quicksort recursion depth goes above O(lg n). There have been efforts to introduce introsort into libc++ (Kumar's introsort). For reasons unknown to me, it has not been submitted. Pdqsort avoids O(n ^ 2) by calling heap sort when partitions are highly unbalanced.

Yeah, I'm still rather surprised that the libc++ implementation of std::sort isn't some flavour of introsort yet. It's not excessively hard to implement, and it would make the algorithm standard-compliant. I know that a few years ago they wanted proper sorting benchmarks before giving the "go" to a new sort implementation, but it seems like something that could have been fixed regardless.

I understand that Pdqsort is faster in many known patterns over std::sort. If we are not afraid of changing the entire sort implementation, I can certainly bring Bitset-on-pdqsort as another code review. @Morwenn and @ldionne, could you give your thoughts on the future direction?

@Morwenn Bitset sort is faster for std::strings than pdqsort which turns off Block sort technique for non-arithmetic types. See the following comparison. Block sort technique can be faster even with comparison functions with branches.

Oh, I thought that deactivating the branchless partitioning algorithm was done because it introduced regressions for non-arithmetic types. I guess I blindly believed that it had been compared against std::string of all things to make this decision, but never ran benchmarks to back that claim. That's new info to me.

As for the future direction I do believe that pdqsort has some interesting ideas that would be nice to have, but I'm not part of the libc++ project in the first place, so it's not really my call to make. I'm glad to see that things are moving in this area nonetheless, and bitset_sort would be a step in the right direction compared to the status quo anyway :)

I worked on performance improvements and wrote documentation explaining what bitset sort is and why it is faster. Thanks to LLVM's ability to generate SIMD instructions, bitset sort lets LLVM generate SIMD instructions for the critical path (bitset partition).

The documentation and the newer implementation are here: https://github.com/minjaehwang/bitsetsort. This implementation is faster than pdqsort for all randomized inputs. For example, for a 256k-element uint64 set, bitset sort takes only 15ns per element while pdqsort takes 24ns per element.

Bitset sort shows a small regression on patterns such as ascending or descending. This is something I can work on, but I believe the speed-ups on randomized inputs outweigh the regression on these patterns because the differences there are so small.

minjaehwang edited the summary of this revision. (Show Details)Jun 16 2021, 11:37 PM

Also, are you aware of Timsort? IIRC there had been an analysis conducted that concluded that we should be using Timsort (as it was faster than our sort and respected the complexity requirements of the Standard).

Swift's standard library went from introsort to timsort: https://github.com/apple/swift/pull/19717

Improved randomized sort performance for all types.
Added introsort feature.

Harbormaster completed remote builds in B109655: Diff 352634.Jun 17 2021, 12:44 AM

minjaehwang edited the summary of this revision. (Show Details)Jun 17 2021, 12:49 AM

nilayvaish added a subscriber: nilayvaish.Oct 6 2021, 12:48 PM

I stumbled upon this again while looking at the review queue and realized I had never followed up. Please rest assured that the lack of response here is only a measure of how backed up we are with reviews and not how interested we are in this change. I also want to give special thanks to @Morwenn for responding to my request to review this and for the detailed explanation and interesting discussion that followed. For a non-sort-expert like me, that is essential in shedding light on the whole thing.

So my takeaway from the discussion is that Bitsetsort and pdqsort are comparable in efficiency, but Bitsetsort still suffers from an O(n^2) worst case. Is that correct? If so, is it possible to modify Bitsetsort to avoid this O(n^2) worst case? That's a pretty bad offender in our conformance right now.

I'm going to commandeer this so I can rebase it onto main and apply a few changes necessary for libc++. I'm sorry this slept for so long. In particular, this patch as-is breaks the ABI because we stop exporting the internal __sort_whatever symbols from the dylib. We can't do that -- we're stuck with these symbols forever unfortunately. However, nothing prevents us from introducing new ones (even though for the time being I would try to avoid baking parts of the sort implementation in the dylib at all).

Rebase onto main
Keep the old std::sort implementation around for ABI compatibility
Do not explicitly instantiate bitset sort in the library -- that ties our hands too much

Still left to do:

Some tests are failing, they need to be investigated
Do basic code size testing since we're not instantiating the sorts in the dylib anymore -- does it cause the size of user programs to blow up badly?
Make sure I applied the diff correctly - I had to do it manually but the diff was generated weirdly, so I'm not sure the diff I applied is correct.
We need to decide on the visibility of implementation detail functions. We probably want to sprinkle _LIBCPP_HIDE_FROM_ABI all around.

Ideally @minjaehwang if you could commandeer this again and make those fixes, that would be awesome. From here on, it should be fairly straightforward to make changes until we can ship this, since I should have handled most of the annoying libc++ specific things.

Herald added a subscriber: mgorny. · View Herald TranscriptOct 6 2021, 3:08 PM

Harbormaster completed remote builds in B127414: Diff 377704.Oct 6 2021, 5:42 PM

In D93233#3046799, @ldionne wrote:

Rebase onto main

Keep the old std::sort implementation around for ABI compatibility

Do not explicitly instantiate bitset sort in the library -- that ties our hands too much

Still left to do:

Some tests are failing, they need to be investigated

Do basic code size testing since we're not instantiating the sorts in the dylib anymore -- does it cause the size of user programs to blow up badly?

Make sure I applied the diff correctly - I had to do it manually but the diff was generated weirdly, so I'm not sure the diff I applied is correct.

We need to decide on the visibility of implementation detail functions. We probably want to sprinkle _LIBCPP_HIDE_FROM_ABI all around.

Ideally @minjaehwang if you could commandeer this again and make those fixes, that would be awesome. From here on, it should be fairly straightforward to make changes until we can ship this, since I should have handled most of the annoying libc++ specific things.

ldionne@, I'll work on this on MinJae's behalf. I will post a new diff sometime soon.

In D93233#3047865, @nilayvaish wrote:

ldionne@, I'll work on this on MinJae's behalf. I will post a new diff sometime soon.

Awesome, thanks! I saw your post on libcxx-dev just now. To reproduce CI failures locally, you can use libcxx/utils/ci/run-buildbot-container to spawn a Docker container matching our CI nodes, and then libcxx/utils/ci/run-buildbot <config you want> to run the actual build/test. There's documentation in run-buildbot-container and also the Dockerfile in the same directory.

nilayvaish commandeered this revision.Oct 12 2021, 5:24 PM

nilayvaish added a reviewer: ldionne.

Fixed the implementation. No tests fail now: https://buildkite.com/llvm-project/libcxx-ci/builds/5917.
Added _LIBCPP_HIDE_FROM_ABI to most functions.

Questions for ldionne@

How do we test code size?
Do we need to hide member functions in classes from the ABI?

Harbormaster completed remote builds in B128507: Diff 379230.Oct 12 2021, 6:25 PM

It seems that we cannot use _LIBCPP_HIDE_FROM_ABI with __bitsetsort_loop. It causes the compiler to report errors like the following:

589 | __bitsetsort_loop(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp,
    | ^~~~~~~~~~~~~~~~~

/home/libcxx-builder/.buildkite-agent/builds/75f0779fb064-1/llvm-project/libcxx-ci/build/generic-gcc/include/c++/v1/__algorithm/sort.h:642:34: note: called from here

642 |       __bitsetsort_loop<_Compare>(__first, __ret.first, __comp, __buff, __limit);
    |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/home/libcxx-builder/.buildkite-agent/builds/75f0779fb064-1/llvm-project/libcxx-ci/build/generic-gcc/include/c++/v1/__algorithm/sort.h:589:1: error: inlining failed in call to 'always_inline' 'void std::__1::__bitsetsort::__bitsetsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Compare, typename std::__1::iterator_traits<_RandomAccessIterator1>::value_type*, typename std::__1::iterator_traits<_RandomAccessIterator1>::difference_type) [with _Compare = std::__1::__less<double, double>; _RandomAccessIterator = double*]': recursive inlining

589 | __bitsetsort_loop(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp,
    | ^~~~~~~~~~~~~~~~~

/home/libcxx-builder/.buildkite-agent/builds/75f0779fb064-1/llvm-project/libcxx-ci/build/generic-gcc/include/c++/v1/__algorithm/sort.h:645:34: note: called from here

645 |       __bitsetsort_loop<_Compare>(__ret.first + 1, __last, __comp, __buff, __limit);
    |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
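
The log above indicates that, in this GCC configuration, the function ended up marked always_inline, and a function that calls itself cannot be inlined into itself. A common way around this kind of error is to keep the attribute on the non-recursive entry point only. The sketch below illustrates that split; SKETCH_ALWAYS_INLINE is a hypothetical stand-in and not libc++'s real macro definition, and sort_loop/sort_entry are made-up names.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical stand-in for a macro that implies always_inline on GCC-like
// compilers. This is NOT how libc++ defines _LIBCPP_HIDE_FROM_ABI.
#if defined(__GNUC__)
#  define SKETCH_ALWAYS_INLINE inline __attribute__((__always_inline__))
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

// Recursive worker: deliberately NOT always_inline, since the recursive
// calls could never all be inlined.
template <class It>
void sort_loop(It first, It last) {
  typedef typename std::iterator_traits<It>::value_type T;
  if (last - first < 2) return;
  const T pivot = *(first + (last - first) / 2);
  It lo = std::partition(first, last,
                         [&pivot](const T& x) { return x < pivot; });
  It hi = std::partition(lo, last,
                         [&pivot](const T& x) { return !(pivot < x); });
  sort_loop(first, lo);
  sort_loop(hi, last);
}

// Non-recursive entry point: safe to force-inline.
template <class It>
SKETCH_ALWAYS_INLINE void sort_entry(It first, It last) {
  sort_loop(first, last);
}
```

Whether this is the right fix for the patch depends on how the visibility macro is defined for each supported compiler, which the log alone does not show.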

Harbormaster completed remote builds in B128517: Diff 379248.Oct 12 2021, 8:08 PM

nilayvaish added inline comments.Oct 13 2021, 11:34 AM

libcxx/src/legacy-sort.cpp
340–394	ldionne@, I am wondering if these symbols need to be in this file. Can we continue with the setup from before, i.e. these symbols have extern declarations in sort.h and are defined in algorithm.cpp? Further, is there a need for retaining the existing sorting algorithm? It seems to me we need to retain __insertion_sort_incomplete only. What do you think?

Sorry for the delay, see my answers below.

In D93233#3060110, @nilayvaish wrote:

Fixed the implementation. No tests fail now: https://buildkite.com/llvm-project/libcxx-ci/builds/5917.

Added _LIBCPP_HIDE_FROM_ABI to most functions.

Questions for ldionne@

How do we test code size?

I think we have benchmarks for std::sort -- one thing we could do is look at the size of that executable before and after the change.

Can you also provide the runtime output of these benchmarks before and after the change?

Do we need to hide member functions in classes from the ABI?

Yes, but I think you have figured that out already.

libcxx/include/__algorithm/sort.h
34	Why is this called `__sorting_network`? Sounds like a weird name.
libcxx/src/legacy-sort.cpp
340–394	I think we do need to retain all those functions since they were previously exported from the shared library. Removing these functions would be an ABI break. I've put it in `legacy-sort.cpp` because we don't ever want to deal with these functions again anymore -- they are only there for ABI compatibility purpose. I thought it was better to separate them into their own little file than keeping them around in the headers that we actually use. If you have a strong reason to keep them around in `sort.h`, let me know and we can discuss.
341	This should be guarded by `_LIBCPP_HAS_NO_WIDE_CHARACTERS`. I think this must have happened when you (or I) rebased the patch.
364–365	Same.

This revision now requires changes to proceed.Oct 27 2021, 2:56 PM

danlark mentioned this in D96946: [libcxx][RFC] Unspecified behavior randomization in libcxx.Oct 29 2021, 5:25 AM

Hi ldionne@, I have a question for you. Do the tests we have for libc++ ensure that the new implementation works well across all versions of the C++ standard? I want to make sure that I am not going to unintentionally break someone's standard-compliant code out there.

Fixed some compilation issues and reduced the number of changes in ordering of equivalent elements

Harbormaster completed remote builds in B132301: Diff 384547.Nov 3 2021, 1:46 PM

ldionne@, I am going to break this change into a few parts and provide the benchmark results for each part separately. To begin with, I have posted https://reviews.llvm.org/D113413, which makes the introsort change to the std::sort implementation.

libcxx/include/__algorithm/sort.h
34	I think the name has been prevalent in the theory community for a long time. More info here: https://en.wikipedia.org/wiki/Sorting_network. The code in functions __sort[3...8] mimics what the sorting networks for those lengths would look like. The name is strange from a namespace perspective. I am willing to rename it to something else that you prefer.

nilayvaish added inline comments.Nov 8 2021, 4:04 PM

libcxx/src/legacy-sort.cpp
340–394	I do not have a preference for where these symbols are kept. Is there a way to avoid having sort implementation in legacy-sort.cpp file?

In D93233#3101571, @nilayvaish wrote:

Hi ldionne@, I have a question for you. Do the tests we have for libc++ ensure that the new implementation works well across all versions of the C++ standard? I want to make sure that I am not going to unintentionally break someone's standard-compliant code out there.

They should, however you're welcome to look at them - our tests are sometimes not up to our quality standards, so you should make sure they will catch all issues you can think about.

We do run those tests in C++03, C++11, C++14, C++17, C++20 and C++23 modes, if that was the sole intent of your question.

In D93233#3116956, @nilayvaish wrote:

ldionne@, I am going to break this change into a few parts and provide the benchmark results for each part separately. To begin with, I have posted https://reviews.llvm.org/D113413, which makes the introsort change to the std::sort implementation.

That's fine by me if you prefer doing it that way, however at the moment I am failing to understand where https://reviews.llvm.org/D113413 fits into the picture. Can you explain what's the plan? Basically introduce introsort in D113413 and then replace it by this bitset sort? Sorry if that's a stupid question.

libcxx/include/__algorithm/sort.h
34	Oh, that's all good. If there's prior art with this name, ignore my comment -- I was not aware of it.

In D93233#3118424, @ldionne wrote:

In D93233#3101571, @nilayvaish wrote:

Hi ldionne@, I have a question for you. Do the tests we have for libc++ ensure that the new implementation works well across all versions of the C++ standard? I want to make sure that I am not going to unintentionally break someone's standard-compliant code out there.

They should, however you're welcome to look at them - our tests are sometimes not up to our quality standards, so you should make sure they will catch all issues you can think about.

We do run those tests in C++03, C++11, C++14, C++17, C++20 and C++23 modes, if that was the sole intent of your question.

I think the current tests do not cover some of the ways in which folks use sort. I did some internal testing, and that revealed a few compilation issues with one of the previous versions of the diff. I'll try to reproduce those issues and add them to the tests.

In D93233#3116956, @nilayvaish wrote:

ldionne@, I am going to break this change into a few parts and provide the benchmark results for each part separately. To begin with, I have posted https://reviews.llvm.org/D113413, which makes the introsort change to the std::sort implementation.

That's fine by me if you prefer doing it that way, however at the moment I am failing to understand where https://reviews.llvm.org/D113413 fits into the picture. Can you explain what's the plan? Basically introduce introsort in D113413 and then replace it by this bitset sort? Sorry if that's a stupid question.

D93233 has three different changes in it which I would like to break into parts. So the plan is to do the following:

D113413: introduce introsort (improves the worst case from O(n^2) to O(n log n)).
D93233 or another commit: introduce bitset sort (improves the core of the quicksort implementation to avoid branch mispredictions).
D93233 or another commit: improve small-sized sort (improves sorting of small ranges so that it can possibly be vectorized by the compiler and uses the optimal number of comparisons).
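
(Editorial sketch, not part of the review.) The first step of the plan above can be illustrated with a minimal introsort. This is not the code from D113413: the name `introsort_sketch`, the small-range cutoff of 16, the middle-element pivot, and the budget of roughly 2*log2(n) partitioning levels are all assumptions made for the example. Quicksort runs as usual, but once the budget is exhausted the remaining range is handed to heapsort, which bounds the worst case at O(n log n).

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper, not libc++ code: quicksort with a depth budget.
template <class It>
void introsort_sketch(It first, It last, int depth_budget) {
  typedef typename std::iterator_traits<It>::value_type T;
  while (last - first > 16) {
    if (depth_budget == 0) {
      // Too many partitioning levels: fall back to heapsort, O(n log n).
      std::make_heap(first, last);
      std::sort_heap(first, last);
      return;
    }
    --depth_budget;
    // Copy the pivot out so that partitioning cannot move it under us.
    const T pivot = *(first + (last - first) / 2);
    // Three-way split: [first, lt_end) < pivot, [lt_end, eq_end) == pivot.
    It lt_end = std::partition(first, last,
                               [&pivot](const T& v) { return v < pivot; });
    It eq_end = std::partition(lt_end, last,
                               [&pivot](const T& v) { return !(pivot < v); });
    introsort_sketch(first, lt_end, depth_budget);  // recurse on the left part
    first = eq_end;                                 // loop on the right part
  }
  // Small range: plain insertion sort.
  for (It i = first; i != last; ++i)
    std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

template <class It>
void introsort_sketch(It first, It last) {
  int budget = 0;
  for (auto n = last - first; n > 1; n >>= 1)
    budget += 2;  // roughly 2 * floor(log2(n))
  introsort_sketch(first, last, budget);
}
```

Calling `introsort_sketch(v.begin(), v.end())` on a `std::vector<int>` sorts it in place. The equal-to-pivot middle range always contains at least the pivot itself, so every iteration shrinks the range even when all elements are equal.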

Gentle ping here -- we have now fixed the O(N^2) behavior with introsort, but do we still want to make these improvements? (I suspect we do)
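
(Editorial sketch, not part of the review.) The partitioning improvement that is still open here rests on one idea: record comparison results as bits instead of branching on them, then exchange misplaced elements by walking the set bits. The snippet below is a simplified illustration under stated assumptions, not the `__bitset_partition` from this diff: the helper names are hypothetical, it handles a single pair of equally sized blocks of at most 64 `int`s, and it indexes the right block forward, whereas the diff scans it backward from the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Index of the lowest set bit; x must be nonzero.
// Portable stand-in for a builtin such as __builtin_ctzll.
inline int lowest_set_bit(std::uint64_t x) {
  int n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: exchanges misplaced elements between two disjoint
// blocks of n <= 64 ints. left[i] is misplaced when !(left[i] < pivot),
// right[i] is misplaced when right[i] < pivot. Returns the leftover masks;
// at least one of them is zero on return.
inline std::pair<std::uint64_t, std::uint64_t>
swap_misplaced(int* left, int* right, std::size_t n, int pivot) {
  std::uint64_t left_mask = 0, right_mask = 0;
  // Classification loop without data-dependent branches: each comparison
  // result is shifted into a bit position.
  for (std::size_t i = 0; i < n; ++i) {
    left_mask |= static_cast<std::uint64_t>(!(left[i] < pivot)) << i;
    right_mask |= static_cast<std::uint64_t>(right[i] < pivot) << i;
  }
  // Pair up misplaced elements, lowest set bit first on both sides.
  while (left_mask != 0 && right_mask != 0) {
    std::swap(left[lowest_set_bit(left_mask)], right[lowest_set_bit(right_mask)]);
    left_mask &= left_mask - 1;    // clear the lowest set bit
    right_mask &= right_mask - 1;
  }
  return std::make_pair(left_mask, right_mask);
}
```

A full partition repeats this block by block and refills only the mask that has been emptied, which is what the loop over `__left_bitset` and `__right_bitset` in the diff does.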

nilayvaish mentioned this in D122780: Modify std::sort: add BlockQuickSort partitioning algorithm for arithmetic types.Mar 31 2022, 10:47 AM

nilayvaish abandoned this revision.Nov 11 2022, 4:37 PM

Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2022, 4:37 PM

Revision Contents

Path	Size
libcxx/include/__algorithm/nth_element.h	22 lines
libcxx/include/__algorithm/sort.h	1095 lines
libcxx/include/__algorithm/stable_sort.h	43 lines
libcxx/src/CMakeLists.txt	2 lines
libcxx/src/algorithm.cpp
libcxx/src/legacy-sort.cpp	395 lines

Diff 384547

libcxx/include/__algorithm/nth_element.h

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef _LIBCPP___ALGORITHM_NTH_ELEMENT_H			#ifndef _LIBCPP___ALGORITHM_NTH_ELEMENT_H
	#define _LIBCPP___ALGORITHM_NTH_ELEMENT_H			#define _LIBCPP___ALGORITHM_NTH_ELEMENT_H

	#include <__config>
	#include <__algorithm/comp.h>
	#include <__algorithm/comp_ref_type.h>			#include <__algorithm/comp_ref_type.h>
				#include <__algorithm/comp.h>
				#include <__algorithm/min_element.h>
	#include <__algorithm/sort.h>			#include <__algorithm/sort.h>
				#include <__config>
	#include <__iterator/iterator_traits.h>			#include <__iterator/iterator_traits.h>
	#include <__utility/swap.h>			#include <__utility/swap.h>

	#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)			#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
	#pragma GCC system_header			#pragma GCC system_header
	#endif			#endif

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

				// Assumes size > 0
				template <class _Compare, class _BidirectionalIterator>
				_LIBCPP_CONSTEXPR_AFTER_CXX11 void __selection_sort(_BidirectionalIterator __first, _BidirectionalIterator __last,
				_Compare __comp) {
				_BidirectionalIterator __lm1 = __last;
				for (--__lm1; __first != __lm1; ++__first) {
				_BidirectionalIterator __i = _VSTD::min_element<_BidirectionalIterator, _Compare&>(__first, __last, __comp);
				if (__i != __first)
				swap(*__first, *__i);
				}
				}

	template<class _Compare, class _RandomAccessIterator>			template<class _Compare, class _RandomAccessIterator>
	_LIBCPP_CONSTEXPR_AFTER_CXX11 bool			_LIBCPP_CONSTEXPR_AFTER_CXX11 bool
	__nth_element_find_guard(_RandomAccessIterator& __i, _RandomAccessIterator& __j,			__nth_element_find_guard(_RandomAccessIterator& __i, _RandomAccessIterator& __j,
	_RandomAccessIterator __m, _Compare __comp)			_RandomAccessIterator __m, _Compare __comp)
	{			{
	// manually guard downward moving __j against __i			// manually guard downward moving __j against __i
	while (true) {			while (true) {
	if (__i == --__j) {			if (__i == --__j) {
	return false;			return false;
	}			}
	if (__comp(*__j, *__m)) {			if (__comp(*__j, *__m)) {
	return true; // found guard for downward moving __j, now use unguarded partition			return true; // found guard for downward moving __j, now use unguarded partition
	}			}
	}			}
	}			}

	template <class _Compare, class _RandomAccessIterator>			template <class _Compare, class _RandomAccessIterator>
	_LIBCPP_CONSTEXPR_AFTER_CXX11 void			_LIBCPP_CONSTEXPR_AFTER_CXX11 void
	__nth_element(_RandomAccessIterator __first, _RandomAccessIterator __nth, _RandomAccessIterator __last, _Compare __comp)			__nth_element(_RandomAccessIterator __first, _RandomAccessIterator __nth, _RandomAccessIterator __last, _Compare __comp)
	{			{
	// _Compare is known to be a reference type			// _Compare is known to be a reference type
	typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;			typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;
	const difference_type __limit = 7;			const difference_type __limit = 7;
				__sorting_network::__conditional_swap<_RandomAccessIterator, _Compare> __cond_swap(__comp);
	while (true)			while (true)
	{			{
	if (__nth == __last)			if (__nth == __last)
	return;			return;
	difference_type __len = __last - __first;			difference_type __len = __last - __first;
	switch (__len)			switch (__len)
	{			{
	case 0:			case 0:
	case 1:			case 1:
	return;			return;
	case 2:			case 2:
	if (__comp(*--__last, *__first))			if (__comp(*--__last, *__first))
	swap(*__first, *__last);			swap(*__first, *__last);
	return;			return;
	case 3:			case 3:
	{			{
	_RandomAccessIterator __m = __first;			_RandomAccessIterator __m = __first;
	_VSTD::__sort3<_Compare>(__first, ++__m, --__last, __comp);			__sorting_network::__sort3(__first, ++__m, --__last, __cond_swap);
	return;			return;
	}			}
	}			}
	if (__len <= __limit)			if (__len <= __limit)
	{			{
	_VSTD::__selection_sort<_Compare>(__first, __last, __comp);			_VSTD::__selection_sort<_Compare>(__first, __last, __comp);
	return;			return;
	}			}
	// __len > __limit >= 3			// __len > __limit >= 3
	_RandomAccessIterator __m = __first + __len/2;			_RandomAccessIterator __m = __first + __len/2;
	_RandomAccessIterator __lm1 = __last;			_RandomAccessIterator __lm1 = __last;
	unsigned __n_swaps = _VSTD::__sort3<_Compare>(__first, __m, --__lm1, __comp);			unsigned __n_swaps = __sorting_network::__sort3_with_number_of_swaps<_Compare>(__first, __m, --__lm1, __comp);
	// *__m is median			// *__m is median
	// partition [__first, __m) < __m and __m <= [__m, __last)			// partition [__first, __m) < __m and __m <= [__m, __last)
	// (this inhibits tossing elements equivalent to __m around unnecessarily)			// (this inhibits tossing elements equivalent to __m around unnecessarily)
	_RandomAccessIterator __i = __first;			_RandomAccessIterator __i = __first;
	_RandomAccessIterator __j = __lm1;			_RandomAccessIterator __j = __lm1;
	// j points beyond range to be tested, __lm1 is known to be <= __m			// j points beyond range to be tested, __lm1 is known to be <= __m
	// The search going up is known to be guarded but the search coming down isn't.			// The search going up is known to be guarded but the search coming down isn't.
	// Prime the downward search with a guard.			// Prime the downward search with a guard.
	(154 lines not shown)

libcxx/include/__algorithm/sort.h

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef _LIBCPP___ALGORITHM_SORT_H			#ifndef _LIBCPP___ALGORITHM_SORT_H
	#define _LIBCPP___ALGORITHM_SORT_H			#define _LIBCPP___ALGORITHM_SORT_H

	#include <__config>
	#include <__algorithm/comp.h>			#include <__algorithm/comp.h>
	#include <__algorithm/comp_ref_type.h>			#include <__algorithm/comp_ref_type.h>
	#include <__algorithm/min_element.h>			#include <__algorithm/iter_swap.h>
				#include <__algorithm/make_heap.h>
	#include <__algorithm/partial_sort.h>			#include <__algorithm/partial_sort.h>
				#include <__algorithm/sort_heap.h>
	#include <__algorithm/unwrap_iter.h>			#include <__algorithm/unwrap_iter.h>
				#include <__config>
				#include <__iterator/iterator_traits.h>
				#include <__iterator/move_iterator.h>
				#include <__utility/move.h>
				#include <__utility/pair.h>
	#include <__utility/swap.h>			#include <__utility/swap.h>
	#include <memory>			#include <memory>
				#include <type_traits>

	#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)			#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
	#pragma GCC system_header			#pragma GCC system_header
	#endif			#endif

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

				namespace __sorting_network {
				ldionne: Why is this called `__sorting_network`? Sounds like a weird name.
				nilayvaish (author): I think the name has been prevalent in the theory community for a long time. More info here: https://en.wikipedia.org/wiki/Sorting_network. The code in functions __sort[3...8] mimics what the sorting networks for those lengths would look like. The name is strange from a namespace perspective. I am willing to rename it to something else that you prefer.
				ldionne: Oh, that's all good. If there's prior art with this name, ignore my comment -- I was not aware of it.
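
(Editorial sketch, not part of the review.) For readers unfamiliar with the term discussed above, a sorting network is a fixed sequence of compare-exchange steps that does not depend on the data. The snippet below is an illustration only: `cond_swap` and `sort4_network` are hypothetical names working on plain `int`s, and the comparator sequence is the one used by `__sort4` in this diff.

```cpp
#include <algorithm>
#include <array>

// Compare-exchange written with selects instead of a branch around a swap;
// afterwards a <= b. Compilers can often lower this to min/max or cmov.
inline void cond_swap(int& a, int& b) {
  const int lo = b < a ? b : a;
  const int hi = b < a ? a : b;
  a = lo;
  b = hi;
}

// Five-comparator network for four elements: (0,1) (2,3) (0,2) (1,3) (1,2).
inline void sort4_network(int* v) {
  cond_swap(v[0], v[1]);
  cond_swap(v[2], v[3]);
  cond_swap(v[0], v[2]);
  cond_swap(v[1], v[3]);
  cond_swap(v[1], v[2]);
}
```

Because the sequence is fixed, correctness can be checked exhaustively: running `sort4_network` over all 24 permutations of four distinct values and testing each result with `std::is_sorted` covers every case.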

				template <class _RandomAccessIterator, class _Compare>
				class __conditional_swap {
				public:
				typedef typename __comp_ref_type<_Compare>::type _Comp_ref;

				_LIBCPP_CONSTEXPR_AFTER_CXX11 _Comp_ref get() const { return comp_; }
				_LIBCPP_CONSTEXPR_AFTER_CXX11 __conditional_swap(const _Comp_ref __comp) : comp_(__comp) {}
				_LIBCPP_CONSTEXPR_AFTER_CXX11 inline void operator()(_RandomAccessIterator __x, _RandomAccessIterator __y) {
				typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
				bool __result = comp_(*__y, *__x);
				// Expect a compiler would short-circuit the following if-block.
				// 4 * sizeof(size_t) is a magic number. Expect a compiler to use SIMD
				// instruction on them.
				if (_VSTD::is_trivially_copy_constructible<value_type>::value &&
				_VSTD::is_trivially_copy_assignable<value_type>::value && sizeof(value_type) <= 4 * sizeof(size_t)) {
				value_type __min = __result ? _VSTD::move(*__y) : _VSTD::move(*__x);
				*__y = __result ? _VSTD::move(*__x) : _VSTD::move(*__y);
				*__x = _VSTD::move(__min);
				} else {
				if (__result) {
				_VSTD::iter_swap(__x, __y);
				}
				}
				}

				private:
				_Comp_ref comp_;
				};

				template <class _RandomAccessIterator, class _Compare>
				class __reverse_conditional_swap {
				typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
				_Comp_ref comp_;

				public:
				_LIBCPP_CONSTEXPR_AFTER_CXX11 _Comp_ref get() const { return comp_; }
				_LIBCPP_CONSTEXPR_AFTER_CXX11
				__reverse_conditional_swap(const _Comp_ref __comp) : comp_(__comp) {}
				inline void operator()(_RandomAccessIterator __x, _RandomAccessIterator __y) {
				typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
				bool __result = !comp_(*__x, *__y);
				// Expect a compiler would short-circuit the following if-block.
				if (_VSTD::is_trivially_copy_constructible<value_type>::value &&
				_VSTD::is_trivially_copy_assignable<value_type>::value && sizeof(value_type) <= 4 * sizeof(size_t)) {
				value_type __min = __result ? _VSTD::move(*__x) : _VSTD::move(*__y);
				*__y = __result ? _VSTD::move(*__y) : _VSTD::move(*__x);
				*__x = _VSTD::move(__min);
				} else {
				if (!__result) {
				_VSTD::iter_swap(__x, __y);
				}
				}
				}
				};

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort2(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 0, __a + 1);
				}

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort3(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 1, __a + 2);
				__cond_swap(__a + 0, __a + 1);
				__cond_swap(__a + 1, __a + 2);
				}

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort4(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 0, __a + 1);
				__cond_swap(__a + 2, __a + 3);
				__cond_swap(__a + 0, __a + 2);
				__cond_swap(__a + 1, __a + 3);
				__cond_swap(__a + 1, __a + 2);
				}

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort5(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 0, __a + 1);
				__cond_swap(__a + 3, __a + 4);
				__cond_swap(__a + 2, __a + 3);
				__cond_swap(__a + 3, __a + 4);
				__cond_swap(__a + 0, __a + 3);
				__cond_swap(__a + 1, __a + 4);
				__cond_swap(__a + 0, __a + 2);
				__cond_swap(__a + 1, __a + 3);
				__cond_swap(__a + 1, __a + 2);
				}

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort6(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 1, __a + 2);
				__cond_swap(__a + 4, __a + 5);
				__cond_swap(__a + 0, __a + 1);
				__cond_swap(__a + 3, __a + 4);
				__cond_swap(__a + 1, __a + 2);
				__cond_swap(__a + 4, __a + 5);
				__cond_swap(__a + 0, __a + 3);
				__cond_swap(__a + 1, __a + 4);
				__cond_swap(__a + 2, __a + 5);
				__cond_swap(__a + 2, __a + 4);
				__cond_swap(__a + 1, __a + 3);
				__cond_swap(__a + 2, __a + 3);
				}
				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort7(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 1, __a + 2);
				__cond_swap(__a + 3, __a + 4);
				__cond_swap(__a + 5, __a + 6);
				__cond_swap(__a + 0, __a + 1);
				__cond_swap(__a + 3, __a + 5);
				__cond_swap(__a + 4, __a + 6);
				__cond_swap(__a + 1, __a + 2);
				__cond_swap(__a + 4, __a + 5);
				__cond_swap(__a + 0, __a + 4);
				__cond_swap(__a + 1, __a + 5);
				__cond_swap(__a + 2, __a + 6);
				__cond_swap(__a + 0, __a + 3);
				__cond_swap(__a + 2, __a + 5);
				__cond_swap(__a + 1, __a + 3);
				__cond_swap(__a + 2, __a + 4);
				__cond_swap(__a + 2, __a + 3);
				}

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort8(_RandomAccessIterator __a, _ConditionalSwap __cond_swap) {
				__cond_swap(__a + 0, __a + 1);
				__cond_swap(__a + 2, __a + 3);
				__cond_swap(__a + 4, __a + 5);
				__cond_swap(__a + 6, __a + 7);
				__cond_swap(__a + 0, __a + 2);
				__cond_swap(__a + 1, __a + 3);
				__cond_swap(__a + 4, __a + 6);
				__cond_swap(__a + 5, __a + 7);
				__cond_swap(__a + 1, __a + 2);
				__cond_swap(__a + 5, __a + 6);
				__cond_swap(__a + 0, __a + 4);
				__cond_swap(__a + 1, __a + 5);
				__cond_swap(__a + 2, __a + 6);
				__cond_swap(__a + 3, __a + 7);
				__cond_swap(__a + 1, __a + 4);
				__cond_swap(__a + 3, __a + 6);
				__cond_swap(__a + 2, __a + 4);
				__cond_swap(__a + 3, __a + 5);
				__cond_swap(__a + 3, __a + 4);
				}

				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_HIDE_FROM_ABI void __sort1to8(_RandomAccessIterator __a,
				typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type __len,
				_ConditionalSwap __cond_swap) {
				switch (__len) {
				case 0:
				case 1:
				return;
				case 2:
				__sort2(__a, __cond_swap);
				return;
				case 3:
				__sort3(__a, __cond_swap);
				return;
				case 4:
				__sort4(__a, __cond_swap);
				return;
				case 5:
				__sort5(__a, __cond_swap);
				return;
				case 6:
				__sort6(__a, __cond_swap);
				return;
				case 7:
				__sort7(__a, __cond_swap);
				return;
				case 8:
				__sort8(__a, __cond_swap);
				return;
				}
				// ignore
				}
				template <class _RandomAccessIterator, class _ConditionalSwap>
				_LIBCPP_CONSTEXPR_AFTER_CXX11 _LIBCPP_HIDE_FROM_ABI void __sort3(_RandomAccessIterator __a0, _RandomAccessIterator __a1,
				_RandomAccessIterator __a2,
				_ConditionalSwap __cond_swap) {
				__cond_swap(__a1, __a2);
				__cond_swap(__a0, __a2);
				__cond_swap(__a0, __a1);
				}

	// stable, 2-3 compares, 0-2 swaps			// stable, 2-3 compares, 0-2 swaps

	template <class _Compare, class _ForwardIterator>			template <class _Compare, class _ForwardIterator>
	_LIBCPP_CONSTEXPR_AFTER_CXX11 unsigned			_LIBCPP_CONSTEXPR_AFTER_CXX11 _LIBCPP_HIDE_FROM_ABI unsigned
	__sort3(_ForwardIterator __x, _ForwardIterator __y, _ForwardIterator __z, _Compare __c)			__sort3_with_number_of_swaps(_ForwardIterator __x, _ForwardIterator __y, _ForwardIterator __z, _Compare __c) {
	{
	unsigned __r = 0;			unsigned __r = 0;
	if (!__c(*__y, *__x)) // if x <= y			if (!__c(*__y, *__x)) // if x <= y
	{			{
	if (!__c(*__z, *__y)) // if y <= z			if (!__c(*__z, *__y)) // if y <= z
	return __r; // x <= y && y <= z			return __r; // x <= y && y <= z
	// x <= y && y > z			// x <= y && y > z
	swap(*__y, *__z); // x <= z && y < z			swap(*__y, *__z); // x <= z && y < z
	__r = 1;			__r = 1;
	if (__c(*__y, *__x)) // if x > y			if (__c(*__y, *__x)) // if x > y
	{			{
	swap(*__x, *__y); // x < y && y <= z			swap(*__x, *__y); // x < y && y <= z
	__r = 2;			__r = 2;
	}			}
	return __r; // x <= y && y < z			return __r; // x <= y && y < z
	}			}
	if (__c(*__z, *__y)) // x > y, if y > z			if (__c(*__z, *__y)) // x > y, if y > z
	{			{
	swap(*__x, *__z); // x < y && y < z			swap(*__x, *__z); // x < y && y < z
	__r = 1;			__r = 1;
	return __r;			return __r;
	}			}
	swap(*__x, *__y); // x > y && y <= z			swap(*__x, *__y); // x > y && y <= z
	__r = 1; // x < y && x <= z			__r = 1; // x < y && x <= z
	if (__c(*__z, *__y)) // if y > z			if (__c(*__z, *__y)) // if y > z
	{			{
	swap(*__y, *__z); // x <= y && y < z			swap(*__y, *__z); // x <= y && y < z
	__r = 2;			__r = 2;
	}			}
	return __r;			return __r;
	} // x <= y && y <= z			}

	// stable, 3-6 compares, 0-5 swaps			} // namespace __sorting_network

	template <class _Compare, class _ForwardIterator>			namespace __bitonic {
	unsigned			class __detail {
	__sort4(_ForwardIterator __x1, _ForwardIterator __x2, _ForwardIterator __x3,			public:
	_ForwardIterator __x4, _Compare __c)			enum {
	{			__batch = 8,
	unsigned __r = _VSTD::__sort3<_Compare>(__x1, __x2, __x3, __c);			__bitonic_batch = __batch * 2,
	if (__c(*__x4, *__x3))			__small_sort_max = __bitonic_batch * 2,
	{			};
	swap(*__x3, *__x4);			};
	++__r;
	if (__c(*__x3, *__x2))			template <class _RandomAccessIterator, class _ConditionalSwap, class _ReverseConditionalSwap>
	{			_LIBCPP_HIDE_FROM_ABI void __enforce_order(_RandomAccessIterator __first, _RandomAccessIterator __last,
	swap(*__x2, *__x3);			_ConditionalSwap __cond_swap, _ReverseConditionalSwap __reverse_cond_swap) {
	++__r;			_RandomAccessIterator __i = __first;
	if (__c(*__x2, *__x1))			while (__detail::__bitonic_batch <= __last - __i) {
	{			__sorting_network::__sort8(__i, __cond_swap);
	swap(*__x1, *__x2);			__sorting_network::__sort8(__i + __detail::__batch, __reverse_cond_swap);
	++__r;			__i += __detail::__bitonic_batch;
				}
				if (__detail::__batch <= __last - __i) {
				__sorting_network::__sort8(__i, __cond_swap);
				__i += __detail::__batch;
				__sorting_network::__sort1to8(__i, __last - __i, __reverse_cond_swap);
				} else {
				__sorting_network::__sort1to8(__i, __last - __i, __cond_swap);
				}
	}			}

				class __construct {
				public:
				template <class _Type1, class _Type2>
				static inline void __op(_Type1* __result, _Type2&& __val) {
				new ((void*)__result) _Type1(_VSTD::move(__val));
				}
				};

				class __move_assign {
				public:
				template <class _Type1, class _Type2>
				static inline void __op(_Type1 __result, _Type2&& __val) {
				*__result = _VSTD::move(__val);
				}
				};

				template <class _Copy, class _Compare, class _InputIterator, class _OutputIterator>
				_LIBCPP_HIDE_FROM_ABI void __forward_merge(_InputIterator __first, _InputIterator __last, _OutputIterator __result,
				_Compare __comp) {
				--__last;
				// The len used here is one less than the actual length. This is so that the
				// comparison is carried out against 0. The final move is done
				// unconditionally at the end.
				typename _VSTD::iterator_traits<_InputIterator>::difference_type __len = __last - __first;
				for (; __len > 0; __len--) {
				if (__comp(*__last, *__first)) {
				_Copy::__op(__result, _VSTD::move(*__last));
				--__last;
				} else {
				_Copy::__op(__result, _VSTD::move(*__first));
				++__first;
	}			}
				++__result;
	}			}
	return __r;			_Copy::__op(__result, _VSTD::move(*__first));
	}			}

	// stable, 4-10 compares, 0-9 swaps			template <class _Copy, class _Compare, class _InputIterator, class _OutputIterator>
				_LIBCPP_HIDE_FROM_ABI void __backward_merge(_InputIterator __first, _InputIterator __last, _OutputIterator __result,
				_Compare __comp) {
				--__last;
				__result += __last - __first;
				// The len used here is one less than the actual length. This is so that the
				// comparison is carried out against 0. The final move is done
				// unconditionally at the end.
				typename _VSTD::iterator_traits<_InputIterator>::difference_type __len = __last - __first;
				for (; __len > 0; __len--) {
				if (__comp(*__first, *__last)) {
				_Copy::__op(__result, _VSTD::move(*__first));
				++__first;
				} else {
				_Copy::__op(__result, _VSTD::move(*__last));
				--__last;
				}
				--__result;
				}
				_Copy::__op(__result, _VSTD::move(*__first));
				}

	template <class _Compare, class _ForwardIterator>			template <class _RandomAccessIterator, class _ConditionalSwap, class _ReverseConditionalSwap>
	_LIBCPP_HIDDEN			inline _LIBCPP_HIDE_FROM_ABI bool
	unsigned			__small_sort(_RandomAccessIterator __first,
	__sort5(_ForwardIterator __x1, _ForwardIterator __x2, _ForwardIterator __x3,			typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type __len,
	_ForwardIterator __x4, _ForwardIterator __x5, _Compare __c)			typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type* __buff, _ConditionalSwap __cond_swap,
	{			_ReverseConditionalSwap __reverse_cond_swap) {
	unsigned __r = _VSTD::__sort4<_Compare>(__x1, __x2, __x3, __x4, __c);			typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
	if (__c(*__x5, *__x4))			typedef typename _ConditionalSwap::_Comp_ref _Comp_ref;
	{			if (__len > __detail::__small_sort_max) {
	swap(*__x4, *__x5);			return false;
	++__r;			}
	if (__c(*__x4, *__x3))			_RandomAccessIterator __last = __first + __len;
	{			__enforce_order(__first, __last, __cond_swap, __reverse_cond_swap);
	swap(*__x3, *__x4);			if (__len <= __detail::__batch) {
	++__r;			// sorted.
	if (__c(*__x3, *__x2))			return true;
	{
	swap(*__x2, *__x3);
	++__r;
	if (__c(*__x2, *__x1))
	{
	swap(*__x1, *__x2);
	++__r;
	}			}
				const _Comp_ref __comp = __cond_swap.get();
				if (__len <= __detail::__bitonic_batch) {
				// single bitonic order merge.
				__forward_merge<__construct, _Comp_ref>(__first, __last, __buff, _Comp_ref(__comp));
				_VSTD::copy(_VSTD::make_move_iterator(__buff), _VSTD::make_move_iterator(__buff + __len), __first);
				for (auto __iter = __buff; __iter < __buff + __len; __iter++) {
				(*__iter).~value_type();
	}			}
				return true;
	}			}
				// double bitonic order merge.
				__forward_merge<__construct, _Comp_ref>(__first, __first + __detail::__bitonic_batch, __buff, _Comp_ref(__comp));
				__backward_merge<__construct, _Comp_ref>(__first + __detail::__bitonic_batch, __last,
				__buff + __detail::__bitonic_batch, _Comp_ref(__comp));
				__forward_merge<__move_assign, _Comp_ref>(__buff, __buff + __len, __first, _Comp_ref(__comp));
				for (auto __iter = __buff; __iter < __buff + __len; __iter++) {
				(*__iter).~value_type();
	}			}
	return __r;			return true;
	}			}
				} // namespace __bitonic

	// Assumes size > 0			namespace __bitsetsort {
	template <class _Compare, class _BidirectionalIterator>			struct __64bit_set {
	_LIBCPP_CONSTEXPR_AFTER_CXX11 void			typedef uint64_t __storage_t;
	__selection_sort(_BidirectionalIterator __first, _BidirectionalIterator __last, _Compare __comp)			enum { __block_size = 64 };
	{			static __storage_t __blsr(__storage_t x) {
	_BidirectionalIterator __lm1 = __last;			// _blsr_u64 can be used here but it did not make any performance
	for (--__lm1; __first != __lm1; ++__first)			// difference in practice.
	{			return x ^ (x & -x);
	_BidirectionalIterator __i = _VSTD::min_element<_BidirectionalIterator, _Compare&>(__first, __last, __comp);			}
	if (__i != __first)			static int __clz(__storage_t x) { return __builtin_clzll(x); }
	swap(*__first, *__i);			static int __ctz(__storage_t x) { return __builtin_ctzll(x); }
				};

				struct __32bit_set {
				typedef uint32_t __storage_t;
				enum { __block_size = 32 };
				static __storage_t __blsr(__storage_t x) {
				// _blsr_u32 can be used here but it did not make any performance
				// difference in practice.
				return x ^ (x & -x);
				}
				static int __clz(__storage_t x) { return __builtin_clzl(x); }
				static int __ctz(__storage_t x) { return __builtin_ctzl(x); }
				};

				template <int _Width>
				struct __set_selector {
				typedef __64bit_set __set;
				};

				template <>
				struct __set_selector<4> {
				typedef __32bit_set __set;
				};

				template <class _Bitset, class _RandomAccessIterator>
				inline _LIBCPP_HIDE_FROM_ABI void __swap_bitmap_pos(_RandomAccessIterator __first, _RandomAccessIterator __last,
				typename _Bitset::__storage_t& __left_bitset,
				typename _Bitset::__storage_t& __right_bitset) {
				while (__left_bitset != 0 & __right_bitset != 0) {
				int tz_left = _Bitset::__ctz(__left_bitset);
				__left_bitset = _Bitset::__blsr(__left_bitset);
				int tz_right = _Bitset::__ctz(__right_bitset);
				__right_bitset = _Bitset::__blsr(__right_bitset);
				_VSTD::iter_swap(__first + tz_left, __last - tz_right);
				}
				}

				template <class _Bitset, class _RandomAccessIterator, class _Compare>
				_LIBCPP_HIDE_FROM_ABI _VSTD::pair<_RandomAccessIterator, bool>
				__bitset_partition(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
				typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
				typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type difference_type;
				typedef typename _Bitset::__storage_t __storage_t;
				_RandomAccessIterator __begin = __first;
				value_type __pivot(_VSTD::move(*__first));

				// Check if pivot is less than the last element. Checking this first avoids
				// comparing the first and the last iterators on each iteration as done in the
				// else part.
				if (__comp(__pivot, *(__last - 1))) {
				// Guarded.
				while (!__comp(__pivot, *++__first)) {
				}
				} else {
				while (++__first < __last && !__comp(__pivot, *__first)) {
	}			}
	}			}

	template <class _Compare, class _BidirectionalIterator>			if (__first < __last) {
	void			// It will be always guarded because __bitset_sort will do the
	__insertion_sort(_BidirectionalIterator __first, _BidirectionalIterator __last, _Compare __comp)			// median-of-three before calling this.
	{			while (__comp(__pivot, *--__last)) {
	typedef typename iterator_traits<_BidirectionalIterator>::value_type value_type;			}
	if (__first != __last)			}
	{			bool __already_partitioned = __first >= __last;
	_BidirectionalIterator __i = __first;			if (!__already_partitioned) {
	for (++__i; __i != __last; ++__i)			_VSTD::iter_swap(__first, __last);
	{			++__first;
	_BidirectionalIterator __j = __i;			}
	value_type __t(_VSTD::move(*__j));
	for (_BidirectionalIterator __k = __i; __k != __first && __comp(__t, *--__k); --__j)			// In [__first, __last) __last is not inclusive. From now on, it uses last
	*__j = _VSTD::move(*__k);			// minus one to be inclusive on both sides.
	*__j = _VSTD::move(__t);			_RandomAccessIterator __lm1 = __last - 1;
				__storage_t __left_bitset = 0;
				__storage_t __right_bitset = 0;

				// Reminder: length = __lm1 - __first + 1.
				while (__lm1 - __first >= 2 * _Bitset::__block_size - 1) {
				if (__left_bitset == 0) {
				// Possible vectorization. With a proper "-march" flag, the following loop
				// will be compiled into a set of SIMD instructions.
				_RandomAccessIterator __iter = __first;
				for (int __j = 0; __j < _Bitset::__block_size;) {
				bool __comp_result = __comp(__pivot, *__iter);
				__left_bitset |= (static_cast<__storage_t>(__comp_result) << __j);
				__j++;
				++__iter;
				}
				}
				if (__right_bitset == 0) {
				// Possible vectorization. With a proper "-march" flag, the following loop
				// will be compiled into a set of SIMD instructions.
				_RandomAccessIterator __iter = __lm1;
				for (int __j = 0; __j < _Bitset::__block_size;) {
				bool __comp_result = __comp(*__iter, __pivot);
				__right_bitset |= (static_cast<__storage_t>(__comp_result) << __j);
				__j++;
				--__iter;
				}
				}
				__swap_bitmap_pos<_Bitset>(__first, __lm1, __left_bitset, __right_bitset);
				__first += (__left_bitset == 0) ? _Bitset::__block_size : 0;
				__lm1 -= (__right_bitset == 0) ? _Bitset::__block_size : 0;
				}
				// Now, we have a less-than a block on each side.
				difference_type __remaining_len = __lm1 - __first + 1;
				difference_type __l_size;
				difference_type __r_size;
				if (__left_bitset == 0 && __right_bitset == 0) {
				__l_size = __remaining_len / 2;
				__r_size = __remaining_len - __l_size;
				} else if (__left_bitset == 0) {
				// We know at least one side is a full block.
				__l_size = __remaining_len - _Bitset::__block_size;
				__r_size = _Bitset::__block_size;
				} else { // if (__right_bitset == 0)
				__l_size = _Bitset::__block_size;
				__r_size = __remaining_len - _Bitset::__block_size;
				}
				if (__left_bitset == 0) {
				_RandomAccessIterator __iter = __first;
				for (int j = 0; j < __l_size; j++) {
				bool __comp_result = __comp(__pivot, *__iter);
				__left_bitset |= (static_cast<__storage_t>(__comp_result) << j);
				++__iter;
				}
				}
				if (__right_bitset == 0) {
				_RandomAccessIterator __iter = __lm1;
				for (int j = 0; j < __r_size; j++) {
				bool __comp_result = __comp(*__iter, __pivot);
				__right_bitset |= (static_cast<__storage_t>(__comp_result) << j);
				--__iter;
				}
				}
				__swap_bitmap_pos<_Bitset>(__first, __lm1, __left_bitset, __right_bitset);
				__first += (__left_bitset == 0) ? __l_size : 0;
				__lm1 -= (__right_bitset == 0) ? __r_size : 0;

				if (__left_bitset) {
				// Swap within the left side.
				// Need to find set positions in the reverse order.
				while (__left_bitset != 0) {
				int __tz_left = _Bitset::__block_size - 1 - _Bitset::__clz(__left_bitset);
				__left_bitset &= (static_cast<__storage_t>(1) << __tz_left) - 1;
				_RandomAccessIterator it = __first + __tz_left;
				if (it != __lm1) {
				_VSTD::iter_swap(it, __lm1);
	}			}
				--__lm1;
	}			}
				__first = __lm1 + 1;
				} else if (__right_bitset) {
				// Swap within the right side.
				// Need to find set positions in the reverse order.
				while (__right_bitset != 0) {
				int __tz_right = _Bitset::__block_size - 1 - _Bitset::__clz(__right_bitset);
				__right_bitset &= (static_cast<__storage_t>(1) << __tz_right) - 1;
				_RandomAccessIterator it = __lm1 - __tz_right;
				if (it != __first) {
				_VSTD::iter_swap(it, __first);
				}
				++__first;
	}			}

	template <class _Compare, class _RandomAccessIterator>
	void
	__insertion_sort_3(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp)
	{
	typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;
	_RandomAccessIterator __j = __first+2;
	_VSTD::__sort3<_Compare>(__first, __first+1, __j, __comp);
	for (_RandomAccessIterator __i = __j+1; __i != __last; ++__i)
	{
	if (__comp(*__i, *__j))
	{
	value_type __t(_VSTD::move(*__i));
	_RandomAccessIterator __k = __j;
	__j = __i;
	do
	{
	*__j = _VSTD::move(*__k);
	__j = __k;
	} while (__j != __first && __comp(__t, *--__k));
	*__j = _VSTD::move(__t);
	}			}
	__j = __i;
				_RandomAccessIterator __pivot_pos = __first - 1;
				if (__begin != __pivot_pos) {
				*__begin = _VSTD::move(*__pivot_pos);
	}			}
				*__pivot_pos = _VSTD::move(__pivot);
				return _VSTD::make_pair(__pivot_pos, __already_partitioned);
	}			}

	template <class _Compare, class _RandomAccessIterator>			template <class _Compare, class _RandomAccessIterator>
	bool			inline _LIBCPP_HIDE_FROM_ABI bool __partial_insertion_sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
	__insertion_sort_incomplete(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp)			_Compare __comp) {
	{			typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
	switch (__last - __first)			if (__first == __last)
	{
	case 0:
	case 1:
	return true;
	case 2:
	if (__comp(*--__last, *__first))
	swap(*__first, *__last);
	return true;
	case 3:
	_VSTD::__sort3<_Compare>(__first, __first+1, --__last, __comp);
	return true;			return true;
	case 4:
	_VSTD::__sort4<_Compare>(__first, __first+1, __first+2, --__last, __comp);
	return true;
	case 5:
	_VSTD::__sort5<_Compare>(__first, __first+1, __first+2, __first+3, --__last, __comp);
	return true;
	}
	typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;
	_RandomAccessIterator __j = __first+2;
	_VSTD::__sort3<_Compare>(__first, __first+1, __j, __comp);
	const unsigned __limit = 8;			const unsigned __limit = 8;
	unsigned __count = 0;			unsigned __count = 0;
	for (_RandomAccessIterator __i = __j+1; __i != __last; ++__i)			_RandomAccessIterator __j = __first;
	{			for (_RandomAccessIterator __i = __j + 1; __i != __last; ++__i) {
	if (__comp(*__i, *__j))			if (__comp(*__i, *__j)) {
	{
	value_type __t(_VSTD::move(*__i));			value_type __t(_VSTD::move(*__i));
	_RandomAccessIterator __k = __j;			_RandomAccessIterator __k = __j;
	__j = __i;			__j = __i;
	do			do {
	{
	*__j = _VSTD::move(*__k);			*__j = _VSTD::move(*__k);
	__j = __k;			__j = __k;
	} while (__j != __first && __comp(__t, *--__k));			} while (__j != __first && __comp(__t, *--__k));
	*__j = _VSTD::move(__t);			*__j = _VSTD::move(__t);
	if (++__count == __limit)			if (++__count == __limit)
	return ++__i == __last;			return ++__i == __last;
	}			}
	__j = __i;			__j = __i;
	}			}
	return true;			return true;
	}			}

	template <class _Compare, class _BidirectionalIterator>
	void
	__insertion_sort_move(_BidirectionalIterator __first1, _BidirectionalIterator __last1,
	typename iterator_traits<_BidirectionalIterator>::value_type* __first2, _Compare __comp)
	{
	typedef typename iterator_traits<_BidirectionalIterator>::value_type value_type;
	if (__first1 != __last1)
	{
	__destruct_n __d(0);
	unique_ptr<value_type, __destruct_n&> __h(__first2, __d);
	value_type* __last2 = __first2;
	::new ((void*)__last2) value_type(_VSTD::move(*__first1));
	__d.template __incr<value_type>();
	for (++__last2; ++__first1 != __last1; ++__last2)
	{
	value_type* __j2 = __last2;
	value_type* __i2 = __j2;
	if (__comp(*__first1, *--__i2))
	{
	::new ((void*)__j2) value_type(_VSTD::move(*__i2));
	__d.template __incr<value_type>();
	for (--__j2; __i2 != __first2 && __comp(*__first1, *--__i2); --__j2)
	*__j2 = _VSTD::move(*__i2);
	*__j2 = _VSTD::move(*__first1);
	}
	else
	{
	::new ((void*)__j2) value_type(_VSTD::move(*__first1));
	__d.template __incr<value_type>();
	}
	}
	__h.release();
	}
	}

	template <class _Compare, class _RandomAccessIterator>			template <class _Compare, class _RandomAccessIterator>
	void			void __bitsetsort_loop(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp,
	__sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp)			typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type* __buff,
	{			typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type __limit) {
	typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;			_LIBCPP_CONSTEXPR_AFTER_CXX11 int __ninther_threshold = 128;
	typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;			typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type difference_type;
	const difference_type __limit = is_trivially_copy_constructible<value_type>::value &&			typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
	is_trivially_copy_assignable<value_type>::value ? 30 : 6;			__sorting_network::__conditional_swap<_RandomAccessIterator, _Compare> __cond_swap(__comp);
	while (true)			__sorting_network::__reverse_conditional_swap<_RandomAccessIterator, _Compare> __reverse_cond_swap(__comp);
	{			while (true) {
	__restart:			if (__limit == 0) {
	difference_type __len = __last - __first;			// Fall back to heap sort as Introsort suggests.
	switch (__len)			_VSTD::make_heap<_RandomAccessIterator, _Comp_ref>(__first, __last, _Comp_ref(__comp));
	{			_VSTD::sort_heap<_RandomAccessIterator, _Comp_ref>(__first, __last, _Comp_ref(__comp));
	case 0:
	case 1:
	return;
	case 2:
	if (__comp(*--__last, *__first))
	swap(*__first, *__last);
	return;
	case 3:
	_VSTD::__sort3<_Compare>(__first, __first+1, --__last, __comp);
	return;
	case 4:
	_VSTD::__sort4<_Compare>(__first, __first+1, __first+2, --__last, __comp);
	return;
	case 5:
	_VSTD::__sort5<_Compare>(__first, __first+1, __first+2, __first+3, --__last, __comp);
	return;			return;
	}			}
	if (__len <= __limit)			__limit--;
	{			difference_type __len = __last - __first;
	_VSTD::__insertion_sort_3<_Compare>(__first, __last, __comp);			if (__len <= __bitonic::__detail::__batch) {
				__sorting_network::__sort1to8(__first, __len, __cond_swap);
	return;			return;
	}			} else if (__len <= __bitonic::__detail::__small_sort_max) {
	// __len > 5			__bitonic::__small_sort(__first, __len, __buff, __cond_swap, __reverse_cond_swap);
	_RandomAccessIterator __m = __first;
	_RandomAccessIterator __lm1 = __last;
	--__lm1;
	unsigned __n_swaps;
	{
	difference_type __delta;
	if (__len >= 1000)
	{
	__delta = __len/2;
	__m += __delta;
	__delta /= 2;
	__n_swaps = _VSTD::__sort5<_Compare>(__first, __first + __delta, __m, __m+__delta, __lm1, __comp);
	}
	else
	{
	__delta = __len/2;
	__m += __delta;
	__n_swaps = _VSTD::__sort3<_Compare>(__first, __m, __lm1, __comp);
	}
	}
	// *__m is median
	// partition [__first, __m) < *__m and *__m <= [__m, __last)
	// (this inhibits tossing elements equivalent to __m around unnecessarily)
	_RandomAccessIterator __i = __first;
	_RandomAccessIterator __j = __lm1;
	// j points beyond range to be tested, *__m is known to be <= *__lm1
	// The search going up is known to be guarded but the search coming down isn't.
	// Prime the downward search with a guard.
	if (!__comp(*__i, *__m)) // if *__first == *__m
	{
	// *__first == *__m, *__first doesn't go in first part
	// manually guard downward moving __j against __i
	while (true)
	{
	if (__i == --__j)
	{
	// *__first == *__m, *__m <= all other elements
	// Partition instead into [__first, __i) == *__first and *__first < [__i, __last)
	++__i; // __first + 1
	__j = __last;
	if (!__comp(*__first, *--__j)) // we need a guard if *__first == *(__last-1)
	{
	while (true)
	{
	if (__i == __j)
	return; // [__first, __last) all equivalent elements
	if (__comp(*__first, *__i))
	{
	swap(*__i, *__j);
	++__n_swaps;
	++__i;
	break;
	}
	++__i;
	}
	}
	// [__first, __i) == *__first and *__first < [__j, __last) and __j == __last - 1
	if (__i == __j)
	return;			return;
	while (true)
	{
	while (!__comp(*__first, *__i))
	++__i;
	while (__comp(*__first, *--__j))
	;
	if (__i >= __j)
	break;
	swap(*__i, *__j);
	++__n_swaps;
	++__i;
	}
	// [__first, __i) == *__first and *__first < [__i, __last)
	// The first part is sorted, sort the second part
	// _VSTD::__sort<_Compare>(__i, __last, __comp);
	__first = __i;
	goto __restart;
	}
	if (__comp(*__j, *__m))
	{
	swap(*__i, *__j);
	++__n_swaps;
	break; // found guard for downward moving __j, now use unguarded partition
	}
	}
	}
	// It is known that *__i < *__m
	++__i;
	// j points beyond range to be tested, *__m is known to be <= *__lm1
	// if not yet partitioned...
	if (__i < __j)
	{
	// known that *(__i - 1) < *__m
	// known that __i <= __m
	while (true)
	{
	// __m still guards upward moving __i
	while (__comp(*__i, *__m))
	++__i;
	// It is now known that a guard exists for downward moving __j
	while (!__comp(*--__j, *__m))
	;
	if (__i > __j)
	break;
	swap(*__i, *__j);
	++__n_swaps;
	// It is known that __m != __j
	// If __m just moved, follow it
	if (__m == __i)
	__m = __j;
	++__i;
	}			}
				difference_type __half_len = __len / 2;
				if (__len > __ninther_threshold) {
				__sorting_network::__sort3(__first, __first + __half_len, __last - 1, __cond_swap);
				__sorting_network::__sort3(__first + 1, __first + (__half_len - 1), __last - 2, __cond_swap);
				__sorting_network::__sort3(__first + 2, __first + (__half_len + 1), __last - 3, __cond_swap);
				__sorting_network::__sort3(__first + (__half_len - 1), __first + __half_len, __first + (__half_len + 1),
				__cond_swap);
				_VSTD::iter_swap(__first, __first + __half_len);
				} else {
				__sorting_network::__sort3(__first + __half_len, __first, __last - 1, __cond_swap);
	}			}
	// [__first, __i) < *__m and *__m <= [__i, __last)			auto __ret = __bitset_partition<__64bit_set, _RandomAccessIterator, _Comp_ref>(__first, __last, _Comp_ref(__comp));
	if (__i != __m && __comp(*__m, *__i))			if (__ret.second) {
	{			bool __left = __partial_insertion_sort<_Comp_ref>(__first, __ret.first, _Comp_ref(__comp));
	swap(*__i, *__m);			bool __right = __partial_insertion_sort<_Comp_ref>(__ret.first + 1, __last, _Comp_ref(__comp));
	++__n_swaps;			if (__right) {
	}			if (__left)
	// [__first, __i) < *__i and *__i <= [__i+1, __last)
	// If we were given a perfect partition, see if insertion sort is quick...
	if (__n_swaps == 0)
	{
	bool __fs = _VSTD::__insertion_sort_incomplete<_Compare>(__first, __i, __comp);
	if (_VSTD::__insertion_sort_incomplete<_Compare>(__i+1, __last, __comp))
	{
	if (__fs)
	return;			return;
	__last = __i;			__last = __ret.first;
	continue;			continue;
	}			} else {
	else			if (__left) {
	{			__first = ++__ret.first;
	if (__fs)
	{
	__first = ++__i;
	continue;			continue;
	}			}
	}			}
	}			}
	// sort smaller range with recursive call and larger with tail recursion elimination
	if (__i - __first < __last - __i)			// Sort smaller range with recursive call and larger with tail recursion
	{			// elimination.
	_VSTD::__sort<_Compare>(__first, __i, __comp);			if (__ret.first - __first < __last - __ret.first) {
	// _VSTD::__sort<_Compare>(__i+1, __last, __comp);			__bitsetsort_loop<_Compare>(__first, __ret.first, __comp, __buff, __limit);
	__first = ++__i;			__first = ++__ret.first;
	}			} else {
	else			__bitsetsort_loop<_Compare>(__ret.first + 1, __last, __comp, __buff, __limit);
	{			__last = __ret.first;
	_VSTD::__sort<_Compare>(__i+1, __last, __comp);
	// _VSTD::__sort<_Compare>(__first, __i, __comp);
	__last = __i;
	}			}
	}			}
	}			}

	template <class _Compare, class _Tp>			template <typename _Number>
	inline _LIBCPP_INLINE_VISIBILITY			inline _LIBCPP_HIDE_FROM_ABI _Number __log2i(_Number __n) {
	void			_Number __log2 = 0;
	__sort(_Tp** __first, _Tp** __last, __less<_Tp*>&)			while (__n > 1) {
	{			__log2++;
	__less<uintptr_t> __comp;			__n >>= 1;
	_VSTD::__sort<__less<uintptr_t>&, uintptr_t*>((uintptr_t*)__first, (uintptr_t*)__last, __comp);			}
				return __log2;
	}			}

	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<char>&, char*>(char*, char*, __less<char>&))			template <class _Compare, class _RandomAccessIterator>
	#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS			inline _LIBCPP_HIDE_FROM_ABI void __bitsetsort_internal(_RandomAccessIterator __first, _RandomAccessIterator __last,
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&))			_Compare __comp) {
	#endif			typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::value_type value_type;
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&))			typedef typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type difference_type;
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&))			typename _VSTD::aligned_storage<sizeof(value_type)>::type __buff[__bitonic::__detail::__small_sort_max];
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<short>&, short*>(short*, short*, __less<short>&))			typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<int>&, int*>(int*, int*, __less<int>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long>&, long*>(long*, long*, __less<long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<float>&, float*>(float*, float*, __less<float>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<double>&, double*>(double*, double*, __less<double>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long double>&, long double*>(long double*, long double*, __less<long double>&))

	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<char>&, char*>(char*, char*, __less<char>&))
	#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&))
	#endif
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<short>&, short*>(short*, short*, __less<short>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<int>&, int*>(int*, int*, __less<int>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<long>&, long*>(long*, long*, __less<long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<long long>&, long long*>(long long*, long long*, __less<long long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<float>&, float*>(float*, float*, __less<float>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<double>&, double*>(double*, double*, __less<double>&))
	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<long double>&, long double*>(long double*, long double*, __less<long double>&))

	_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS unsigned __sort5<__less<long double>&, long double*>(long double*, long double*, long double*, long double*, long double*, __less<long double>&))			// 2*log2 comes from Introsort https://reviews.llvm.org/D36423.
				difference_type __depth_limit = 2 * __log2i(__last - __first);
				__bitsetsort_loop<_Comp_ref>(__first, __last, _Comp_ref(__comp), reinterpret_cast<value_type*>(&__buff[0]),
				__depth_limit);
				}
				} // namespace __bitsetsort

	template <class _RandomAccessIterator, class _Compare>			template <class _RandomAccessIterator, class _Compare>
	inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX17			inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX17 void sort(_RandomAccessIterator __first,
	void			_RandomAccessIterator __last, _Compare __comp) {
	sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp)
	{
	typedef typename __comp_ref_type<_Compare>::type _Comp_ref;			typedef typename __comp_ref_type<_Compare>::type _Comp_ref;
	if (__libcpp_is_constant_evaluated()) {			if (__libcpp_is_constant_evaluated()) {
	_VSTD::__partial_sort<_Comp_ref>(__first, __last, __last, _Comp_ref(__comp));			_VSTD::__partial_sort<_Comp_ref>(__first, __last, __last, _Comp_ref(__comp));
	} else {			} else {
	_VSTD::__sort<_Comp_ref>(_VSTD::__unwrap_iter(__first), _VSTD::__unwrap_iter(__last), _Comp_ref(__comp));			__bitsetsort::__bitsetsort_internal<_Comp_ref>(_VSTD::__unwrap_iter(__first), _VSTD::__unwrap_iter(__last),
				_Comp_ref(__comp));
	}			}
	}			}

	template <class _RandomAccessIterator>			template <class _RandomAccessIterator>
	inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX17			inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX17 void sort(_RandomAccessIterator __first,
	void			_RandomAccessIterator __last) {
	sort(_RandomAccessIterator __first, _RandomAccessIterator __last)
	{
	_VSTD::sort(__first, __last, __less<typename iterator_traits<_RandomAccessIterator>::value_type>());			_VSTD::sort(__first, __last, __less<typename iterator_traits<_RandomAccessIterator>::value_type>());
	}			}

	_LIBCPP_END_NAMESPACE_STD			_LIBCPP_END_NAMESPACE_STD

	#endif // _LIBCPP___ALGORITHM_SORT_H			#endif // _LIBCPP___ALGORITHM_SORT_H
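
The block-partitioning idea used by the new sort.h above (record comparison results in a 64-bit word, then swap only the flagged positions) can be illustrated in isolation. What follows is a minimal sketch under stated assumptions, not the patch's implementation: `bitset_partition_sketch` and `count_trailing_zeros` are hypothetical names, the sketch handles only `std::vector<int>`, and it finishes the tail with `std::partition` rather than the patch's hand-written remainder handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: index of the lowest set bit of a nonzero word.
inline int count_trailing_zeros(std::uint64_t x) {
  assert(x != 0);
  int n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Moves every element < pivot in front of every element >= pivot.
// Returns the index of the first element >= pivot.
inline std::size_t bitset_partition_sketch(std::vector<int>& v, int pivot) {
  const std::size_t block = 64;
  std::size_t first = 0;         // [0, first) is known to be < pivot
  std::size_t last = v.size();   // [last, size) is known to be >= pivot
  std::uint64_t left_bits = 0;   // bit j set: v[first + j] belongs on the right
  std::uint64_t right_bits = 0;  // bit j set: v[last - 1 - j] belongs on the left

  while (last - first >= 2 * block) {
    // Branch-free scans: each comparison result is shifted into the bitset.
    if (left_bits == 0)
      for (std::size_t j = 0; j < block; ++j)
        left_bits |= static_cast<std::uint64_t>(!(v[first + j] < pivot)) << j;
    if (right_bits == 0)
      for (std::size_t j = 0; j < block; ++j)
        right_bits |= static_cast<std::uint64_t>(v[last - 1 - j] < pivot) << j;

    // Swap misplaced pairs until one side has no flagged element left.
    while (left_bits != 0 && right_bits != 0) {
      int l = count_trailing_zeros(left_bits);
      int r = count_trailing_zeros(right_bits);
      left_bits &= left_bits - 1;    // clear the lowest set bit
      right_bits &= right_bits - 1;  // clear the lowest set bit
      std::swap(v[first + l], v[last - 1 - r]);
    }
    if (left_bits == 0) first += block;
    if (right_bits == 0) last -= block;
  }

  // Fewer than two blocks remain; finish with the standard algorithm.
  auto mid = std::partition(v.begin() + first, v.begin() + last,
                            [pivot](int x) { return x < pivot; });
  return static_cast<std::size_t>(mid - v.begin());
}
```

Calling `bitset_partition_sketch(v, 500)` on a vector holding a permutation of 0..999 should return 500 and leave every value below 500 in the first half. Keeping the scan loops free of branches is what gives the compiler a chance to vectorize them, which is the effect the revision summary describes.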

libcxx/include/__algorithm/stable_sort.h

	#include <type_traits> // swap			#include <type_traits> // swap

	#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)			#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
	#pragma GCC system_header			#pragma GCC system_header
	#endif			#endif

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

				template <class _Compare, class _BidirectionalIterator>
				void __insertion_sort(_BidirectionalIterator __first, _BidirectionalIterator __last, _Compare __comp) {
				typedef typename iterator_traits<_BidirectionalIterator>::value_type value_type;
				if (__first != __last) {
				_BidirectionalIterator __i = __first;
				for (++__i; __i != __last; ++__i) {
				_BidirectionalIterator __j = __i;
				value_type __t(_VSTD::move(*__j));
				for (_BidirectionalIterator __k = __i; __k != __first && __comp(__t, *--__k); --__j)
				*__j = _VSTD::move(*__k);
				*__j = _VSTD::move(__t);
				}
				}
				}

				template <class _Compare, class _BidirectionalIterator>
				void __insertion_sort_move(_BidirectionalIterator __first1, _BidirectionalIterator __last1,
				typename iterator_traits<_BidirectionalIterator>::value_type* __first2, _Compare __comp) {
				typedef typename iterator_traits<_BidirectionalIterator>::value_type value_type;
				if (__first1 != __last1) {
				__destruct_n __d(0);
				unique_ptr<value_type, __destruct_n&> __h(__first2, __d);
				value_type* __last2 = __first2;
				::new ((void*)__last2) value_type(_VSTD::move(*__first1));
				__d.template __incr<value_type>();
				for (++__last2; ++__first1 != __last1; ++__last2) {
				value_type* __j2 = __last2;
				value_type* __i2 = __j2;
				if (__comp(*__first1, *--__i2)) {
				::new ((void*)__j2) value_type(_VSTD::move(*__i2));
				__d.template __incr<value_type>();
				for (--__j2; __i2 != __first2 && __comp(*__first1, *--__i2); --__j2)
				*__j2 = _VSTD::move(*__i2);
				*__j2 = _VSTD::move(*__first1);
				} else {
				::new ((void*)__j2) value_type(_VSTD::move(*__first1));
				__d.template __incr<value_type>();
				}
				}
				__h.release();
				}
				}

	template <class _Compare, class _InputIterator1, class _InputIterator2>			template <class _Compare, class _InputIterator1, class _InputIterator2>
	void			void
	__merge_move_construct(_InputIterator1 __first1, _InputIterator1 __last1,			__merge_move_construct(_InputIterator1 __first1, _InputIterator1 __last1,
	_InputIterator2 __first2, _InputIterator2 __last2,			_InputIterator2 __first2, _InputIterator2 __last2,
	typename iterator_traits<_InputIterator1>::value_type* __result, _Compare __comp)			typename iterator_traits<_InputIterator1>::value_type* __result, _Compare __comp)
	{			{
	typedef typename iterator_traits<_InputIterator1>::value_type value_type;			typedef typename iterator_traits<_InputIterator1>::value_type value_type;
	__destruct_n __d(0);			__destruct_n __d(0);

libcxx/src/CMakeLists.txt

	set(LIBCXX_LIB_CMAKEFILES_DIR "${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}" PARENT_SCOPE)			set(LIBCXX_LIB_CMAKEFILES_DIR "${CMAKE_CURRENT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}" PARENT_SCOPE)

	# Get sources			# Get sources
	set(LIBCXX_SOURCES			set(LIBCXX_SOURCES
	algorithm.cpp
	any.cpp			any.cpp
	atomic.cpp			atomic.cpp
	barrier.cpp			barrier.cpp
	bind.cpp			bind.cpp
	charconv.cpp			charconv.cpp
	chrono.cpp			chrono.cpp
	condition_variable.cpp			condition_variable.cpp
	condition_variable_destructor.cpp			condition_variable_destructor.cpp
	exception.cpp			exception.cpp
	functional.cpp			functional.cpp
	future.cpp			future.cpp
	hash.cpp			hash.cpp
	include/apple_availability.h			include/apple_availability.h
	include/atomic_support.h			include/atomic_support.h
	include/config_elast.h			include/config_elast.h
	include/refstring.h			include/refstring.h
				legacy-sort.cpp
	memory.cpp			memory.cpp
	mutex.cpp			mutex.cpp
	mutex_destructor.cpp			mutex_destructor.cpp
	new.cpp			new.cpp
	optional.cpp			optional.cpp
	random_shuffle.cpp			random_shuffle.cpp
	shared_mutex.cpp			shared_mutex.cpp
	stdexcept.cpp			stdexcept.cpp

libcxx/src/algorithm.cpp

This file was deleted.

	//===----------------------- algorithm.cpp --------------------------------===//
	//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//
	//===----------------------------------------------------------------------===//

	#include "algorithm"

	_LIBCPP_BEGIN_NAMESPACE_STD

	template void __sort<__less<char>&, char*>(char*, char*, __less<char>&);
	#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
	template void __sort<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&);
	#endif
	template void __sort<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
	template void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&);
	template void __sort<__less<short>&, short*>(short*, short*, __less<short>&);
	template void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&);
	template void __sort<__less<int>&, int*>(int*, int*, __less<int>&);
	template void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&);
	template void __sort<__less<long>&, long*>(long*, long*, __less<long>&);
	template void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&);
	template void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&);
	template void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&);
	template void __sort<__less<float>&, float*>(float*, float*, __less<float>&);
	template void __sort<__less<double>&, double*>(double*, double*, __less<double>&);
	template void __sort<__less<long double>&, long double*>(long double*, long double*, __less<long double>&);

	template bool __insertion_sort_incomplete<__less<char>&, char*>(char*, char*, __less<char>&);
	#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
	template bool __insertion_sort_incomplete<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&);
	#endif
	template bool __insertion_sort_incomplete<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
	template bool __insertion_sort_incomplete<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&);
	template bool __insertion_sort_incomplete<__less<short>&, short*>(short*, short*, __less<short>&);
	template bool __insertion_sort_incomplete<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&);
	template bool __insertion_sort_incomplete<__less<int>&, int*>(int*, int*, __less<int>&);
	template bool __insertion_sort_incomplete<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&);
	template bool __insertion_sort_incomplete<__less<long>&, long*>(long*, long*, __less<long>&);
	template bool __insertion_sort_incomplete<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&);
	template bool __insertion_sort_incomplete<__less<long long>&, long long*>(long long*, long long*, __less<long long>&);
	template bool __insertion_sort_incomplete<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&);
	template bool __insertion_sort_incomplete<__less<float>&, float*>(float*, float*, __less<float>&);
	template bool __insertion_sort_incomplete<__less<double>&, double*>(double*, double*, __less<double>&);
	template bool __insertion_sort_incomplete<__less<long double>&, long double*>(long double*, long double*, __less<long double>&);

	template unsigned __sort5<__less<long double>&, long double*>(long double*, long double*, long double*, long double*, long double*, __less<long double>&);

	_LIBCPP_END_NAMESPACE_STD
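
The file deleted above consisted of explicit instantiation definitions, which pair with the `_LIBCPP_EXTERN_TEMPLATE` declarations removed from sort.h. As a rough illustration of that mechanism in standard C++ (outside libc++), the sketch below uses a hypothetical `legacy_helper` template. It is an assumption-laden toy, not libc++ code: in a real library the `extern template` line would live in a header and the `template void ...` line in exactly one source file.

```cpp
#include <cassert>

// Hypothetical stand-in for an internal helper such as __sort.
template <class T>
void legacy_helper(T* first, T* last) {
  assert(first <= last);
  // Plain insertion sort; the algorithm itself is beside the point here.
  for (T* i = first; i != last; ++i) {
    T t = *i;
    T* j = i;
    while (j != first && t < *(j - 1)) {
      *j = *(j - 1);
      --j;
    }
    *j = t;
  }
}

// Header side: promise that some translation unit provides this instantiation,
// so users of the header do not instantiate it implicitly.
extern template void legacy_helper<int>(int*, int*);

// Library side: the explicit instantiation definition that emits the symbol.
template void legacy_helper<int>(int*, int*);
```

Once binaries have been linked against such an exported symbol, removing the explicit instantiation definition from the library breaks them at load time, which is why the patch keeps the old definitions alive in a separate source file.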

libcxx/src/legacy-sort.cpp

This file was added.

				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				//
				// This file contains the legacy implementation of std::sort that used to ship
				// with libc++. Since we used to explicitly instantiate some specializations of
				// the sorting functions in the built library (to improve code size), we can't
				// just remove those symbols without breaking ABI.
				//
				// Hence, the old std::sort implementation remains in this file and we keep
				// exporting the necessary symbols to keep our ABI stable, however programs
				// compiled against newer versions of libc++ will never actually rely on those
				// symbols.
				//

				#include "algorithm"
				#include "memory"
				#include "utility"

				_LIBCPP_BEGIN_NAMESPACE_STD

				// stable, 2-3 compares, 0-2 swaps

				template <class _Compare, class _ForwardIterator>
				_LIBCPP_CONSTEXPR_AFTER_CXX11 unsigned __sort3(_ForwardIterator __x, _ForwardIterator __y, _ForwardIterator __z,
				_Compare __c) {
				unsigned __r = 0;
				if (!__c(*__y, *__x)) // if x <= y
				{
				if (!__c(*__z, *__y)) // if y <= z
				return __r; // x <= y && y <= z
				// x <= y && y > z
				swap(*__y, *__z); // x <= z && y < z
				__r = 1;
				if (__c(*__y, *__x)) // if x > y
				{
				swap(*__x, *__y); // x < y && y <= z
				__r = 2;
				}
				return __r; // x <= y && y < z
				}
				if (__c(*__z, *__y)) // x > y, if y > z
				{
				swap(*__x, *__z); // x < y && y < z
				__r = 1;
				return __r;
				}
				swap(*__x, *__y); // x > y && y <= z
				__r = 1; // x < y && x <= z
				if (__c(*__z, *__y)) // if y > z
				{
				swap(*__y, *__z); // x <= y && y < z
				__r = 2;
				}
				return __r;
				} // x <= y && y <= z

				// stable, 3-6 compares, 0-5 swaps

				template <class _Compare, class _ForwardIterator>
				unsigned __sort4(_ForwardIterator __x1, _ForwardIterator __x2, _ForwardIterator __x3, _ForwardIterator __x4,
				_Compare __c) {
				unsigned __r = _VSTD::__sort3<_Compare>(__x1, __x2, __x3, __c);
				if (__c(*__x4, *__x3)) {
				swap(*__x3, *__x4);
				++__r;
				if (__c(*__x3, *__x2)) {
				swap(*__x2, *__x3);
				++__r;
				if (__c(*__x2, *__x1)) {
				swap(*__x1, *__x2);
				++__r;
				}
				}
				}
				return __r;
				}

				// stable, 4-10 compares, 0-9 swaps

				template <class _Compare, class _ForwardIterator>
				_LIBCPP_HIDDEN unsigned __sort5(_ForwardIterator __x1, _ForwardIterator __x2, _ForwardIterator __x3,
				_ForwardIterator __x4, _ForwardIterator __x5, _Compare __c) {
				unsigned __r = _VSTD::__sort4<_Compare>(__x1, __x2, __x3, __x4, __c);
				if (__c(*__x5, *__x4)) {
				swap(*__x4, *__x5);
				++__r;
				if (__c(*__x4, *__x3)) {
				swap(*__x3, *__x4);
				++__r;
				if (__c(*__x3, *__x2)) {
				swap(*__x2, *__x3);
				++__r;
				if (__c(*__x2, *__x1)) {
				swap(*__x1, *__x2);
				++__r;
				}
				}
				}
				}
				return __r;
				}

				template <class _Compare, class _RandomAccessIterator>
				void __insertion_sort_3(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
				typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;
				_RandomAccessIterator __j = __first + 2;
				_VSTD::__sort3<_Compare>(__first, __first + 1, __j, __comp);
				for (_RandomAccessIterator __i = __j + 1; __i != __last; ++__i) {
				if (__comp(*__i, *__j)) {
				value_type __t(_VSTD::move(*__i));
				_RandomAccessIterator __k = __j;
				__j = __i;
				do {
				*__j = _VSTD::move(*__k);
				__j = __k;
				} while (__j != __first && __comp(__t, *--__k));
				*__j = _VSTD::move(__t);
				}
				__j = __i;
				}
				}

				template <class _Compare, class _RandomAccessIterator>
				bool __insertion_sort_incomplete(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
				switch (__last - __first) {
				case 0:
				case 1:
				return true;
				case 2:
				if (__comp(*--__last, *__first))
				swap(*__first, *__last);
				return true;
				case 3:
				_VSTD::__sort3<_Compare>(__first, __first + 1, --__last, __comp);
				return true;
				case 4:
				_VSTD::__sort4<_Compare>(__first, __first + 1, __first + 2, --__last, __comp);
				return true;
				case 5:
				_VSTD::__sort5<_Compare>(__first, __first + 1, __first + 2, __first + 3, --__last, __comp);
				return true;
				}
				typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;
				_RandomAccessIterator __j = __first + 2;
				_VSTD::__sort3<_Compare>(__first, __first + 1, __j, __comp);
				const unsigned __limit = 8;
				unsigned __count = 0;
				for (_RandomAccessIterator __i = __j + 1; __i != __last; ++__i) {
				if (__comp(*__i, *__j)) {
				value_type __t(_VSTD::move(*__i));
				_RandomAccessIterator __k = __j;
				__j = __i;
				do {
				*__j = _VSTD::move(*__k);
				__j = __k;
				} while (__j != __first && __comp(__t, *--__k));
				*__j = _VSTD::move(__t);
				if (++__count == __limit)
				return ++__i == __last;
				}
				__j = __i;
				}
				return true;
				}

				template <class _Compare, class _RandomAccessIterator>
				void __sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
				typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;
				typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;
				const difference_type __limit =
				is_trivially_copy_constructible<value_type>::value && is_trivially_copy_assignable<value_type>::value ? 30 : 6;
				while (true) {
				__restart:
				difference_type __len = __last - __first;
				switch (__len) {
				case 0:
				case 1:
				return;
				case 2:
				if (__comp(*--__last, *__first))
				swap(*__first, *__last);
				return;
				case 3:
				_VSTD::__sort3<_Compare>(__first, __first + 1, --__last, __comp);
				return;
				case 4:
				_VSTD::__sort4<_Compare>(__first, __first + 1, __first + 2, --__last, __comp);
				return;
				case 5:
				_VSTD::__sort5<_Compare>(__first, __first + 1, __first + 2, __first + 3, --__last, __comp);
				return;
				}
				if (__len <= __limit) {
				_VSTD::__insertion_sort_3<_Compare>(__first, __last, __comp);
				return;
				}
				// __len > 5
				_RandomAccessIterator __m = __first;
				_RandomAccessIterator __lm1 = __last;
				--__lm1;
				unsigned __n_swaps;
				{
				difference_type __delta;
				if (__len >= 1000) {
				__delta = __len / 2;
				__m += __delta;
				__delta /= 2;
				__n_swaps = _VSTD::__sort5<_Compare>(__first, __first + __delta, __m, __m + __delta, __lm1, __comp);
				} else {
				__delta = __len / 2;
				__m += __delta;
				__n_swaps = _VSTD::__sort3<_Compare>(__first, __m, __lm1, __comp);
				}
				}
				// *__m is median
				// partition [__first, __m) < *__m and *__m <= [__m, __last)
				// (this inhibits tossing elements equivalent to __m around unnecessarily)
				_RandomAccessIterator __i = __first;
				_RandomAccessIterator __j = __lm1;
				// j points beyond range to be tested, *__m is known to be <= *__lm1
				// The search going up is known to be guarded but the search coming down isn't.
				// Prime the downward search with a guard.
				if (!__comp(*__i, *__m)) // if *__first == *__m
				{
				// *__first == *__m, *__first doesn't go in first part
				// manually guard downward moving __j against __i
				while (true) {
				if (__i == --__j) {
				// *__first == *__m, *__m <= all other elements
				// Partition instead into [__first, __i) == *__first and *__first < [__i, __last)
				++__i; // __first + 1
				__j = __last;
				if (!__comp(*__first, *--__j)) // we need a guard if *__first == *(__last-1)
				{
				while (true) {
				if (__i == __j)
				return; // [__first, __last) all equivalent elements
				if (__comp(*__first, *__i)) {
				swap(*__i, *__j);
				++__n_swaps;
				++__i;
				break;
				}
				++__i;
				}
				}
				// [__first, __i) == *__first and *__first < [__j, __last) and __j == __last - 1
				if (__i == __j)
				return;
				while (true) {
				while (!__comp(*__first, *__i))
				++__i;
				while (__comp(*__first, *--__j))
				;
				if (__i >= __j)
				break;
				swap(*__i, *__j);
				++__n_swaps;
				++__i;
				}
				// [__first, __i) == *__first and *__first < [__i, __last)
				// The first part is sorted, sort the second part
				// _VSTD::__sort<_Compare>(__i, __last, __comp);
				__first = __i;
				goto __restart;
				}
				if (__comp(*__j, *__m)) {
				swap(*__i, *__j);
				++__n_swaps;
				break; // found guard for downward moving __j, now use unguarded partition
				}
				}
				}
				// It is known that *__i < *__m
				++__i;
				// j points beyond range to be tested, *__m is known to be <= *__lm1
				// if not yet partitioned...
				if (__i < __j) {
				// known that *(__i - 1) < *__m
				// known that __i <= __m
				while (true) {
				// __m still guards upward moving __i
				while (__comp(*__i, *__m))
				++__i;
				// It is now known that a guard exists for downward moving __j
				while (!__comp(*--__j, *__m))
				;
				if (__i > __j)
				break;
				swap(*__i, *__j);
				++__n_swaps;
				// It is known that __m != __j
				// If __m just moved, follow it
				if (__m == __i)
				__m = __j;
				++__i;
				}
				}
				// [__first, __i) < *__m and *__m <= [__i, __last)
				if (__i != __m && __comp(*__m, *__i)) {
				swap(*__i, *__m);
				++__n_swaps;
				}
				// [__first, __i) < *__i and *__i <= [__i+1, __last)
				// If we were given a perfect partition, see if insertion sort is quick...
				if (__n_swaps == 0) {
				bool __fs = _VSTD::__insertion_sort_incomplete<_Compare>(__first, __i, __comp);
				if (_VSTD::__insertion_sort_incomplete<_Compare>(__i + 1, __last, __comp)) {
				if (__fs)
				return;
				__last = __i;
				continue;
				} else {
				if (__fs) {
				__first = ++__i;
				continue;
				}
				}
				}
				// sort smaller range with recursive call and larger with tail recursion elimination
				if (__i - __first < __last - __i) {
				_VSTD::__sort<_Compare>(__first, __i, __comp);
				// _VSTD::__sort<_Compare>(__i+1, __last, __comp);
				__first = ++__i;
				} else {
				_VSTD::__sort<_Compare>(__i + 1, __last, __comp);
				// _VSTD::__sort<_Compare>(__first, __i, __comp);
				__last = __i;
				}
				}
				}

				template _LIBCPP_FUNC_VIS void __sort<__less<char>&, char*>(char*, char*, __less<char>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&);
				ldionne (inline comment, Done): This should be guarded by `_LIBCPP_HAS_NO_WIDE_CHARACTERS`. I think this must have happened when you (or I) rebased the patch.
				template _LIBCPP_FUNC_VIS void __sort<__less<signed char>&, signed char*>(signed char*, signed char*,
				__less<signed char>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*,
				__less<unsigned char>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<short>&, short*>(short*, short*, __less<short>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*,
				__less<unsigned short>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<int>&, int*>(int*, int*, __less<int>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<long>&, long*>(long*, long*, __less<long>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*,
				__less<unsigned long>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*,
				unsigned long long*,
				__less<unsigned long long>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<float>&, float*>(float*, float*, __less<float>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<double>&, double*>(double*, double*, __less<double>&);
				template _LIBCPP_FUNC_VIS void __sort<__less<long double>&, long double*>(long double*, long double*,
				__less<long double>&);

				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<char>&, char*>(char*, char*, __less<char>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*,
				__less<wchar_t>&);
				ldionne (inline comment, Done): Same.
				template _LIBCPP_FUNC_VIS bool
				__insertion_sort_incomplete<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&);
				template _LIBCPP_FUNC_VIS bool
				__insertion_sort_incomplete<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*,
				__less<unsigned char>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<short>&, short*>(short*, short*, __less<short>&);
				template _LIBCPP_FUNC_VIS bool
				__insertion_sort_incomplete<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*,
				__less<unsigned short>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<int>&, int*>(int*, int*, __less<int>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<unsigned>&, unsigned*>(unsigned*, unsigned*,
				__less<unsigned>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<long>&, long*>(long*, long*, __less<long>&);
				template _LIBCPP_FUNC_VIS bool
				__insertion_sort_incomplete<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*,
				__less<unsigned long>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<long long>&, long long*>(long long*, long long*,
				__less<long long>&);
				template _LIBCPP_FUNC_VIS bool
				__insertion_sort_incomplete<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*,
				__less<unsigned long long>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<float>&, float*>(float*, float*, __less<float>&);
				template _LIBCPP_FUNC_VIS bool __insertion_sort_incomplete<__less<double>&, double*>(double*, double*, __less<double>&);
				template _LIBCPP_FUNC_VIS bool
				__insertion_sort_incomplete<__less<long double>&, long double*>(long double*, long double*, __less<long double>&);

				template _LIBCPP_FUNC_VIS unsigned __sort5<__less<long double>&, long double*>(long double*, long double*, long double*,
				long double*, long double*,
				__less<long double>&);

				nilayvaish (author, inline comment, Done): ldionne@, I am wondering if these symbols need to be in this file. Can we continue with the setup from before i.e. these symbols have extern declarations in sort.h and are defined in algorithm.cpp? Further is there a need for retaining the existing sorting algorithm? It seems to me we need to retain __insertion_sort_incomplete only. What do you think?
				ldionne (inline comment, Not Done): I think we do need to retain all those functions since they were previously exported from the shared library. Removing these functions would be an ABI break. I've put it in `legacy-sort.cpp` because we don't ever want to deal with these functions again anymore -- they are only there for ABI compatibility purpose. I thought it was better to separate them into their own little file than keeping them around in the headers that we actually use. If you have a strong reason to keep them around in `sort.h`, let me know and we can discuss.
				nilayvaish (author, inline comment, Done): I do not have a preference for where these symbols are kept. Is there a way to avoid having sort implementation in legacy-sort.cpp file?
				_LIBCPP_END_NAMESPACE_STD
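
The ABI point raised in the inline thread above can be illustrated with a minimal, self-contained sketch. Everything named here is an assumption for illustration only: `legacy_sort` and the `demo` namespace are hypothetical, `std::less` stands in for libc++'s `__less`, a plain insertion sort stands in for the real algorithm, and the hypothetical macro `DEMO_HAS_NO_WIDE_CHARACTERS` plays the role of `_LIBCPP_HAS_NO_WIDE_CHARACTERS`. In a real library the `extern template` declarations would typically live in a header and the explicit instantiation definitions in a separate source file; both are placed in one translation unit here so the example compiles on its own.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>

namespace demo { // hypothetical namespace, not part of libc++

// Header part: the template itself (insertion sort as a stand-in algorithm).
template <class Compare, class RandomIt>
void legacy_sort(RandomIt first, RandomIt last, Compare comp) {
  for (RandomIt i = first; i != last; ++i)
    for (RandomIt j = i; j != first && comp(*j, *(j - 1)); --j)
      std::iter_swap(j, j - 1);
}

// Header part: explicit instantiation declarations. They tell includers
// not to instantiate these specializations and to link against the
// library's exported symbols instead.
extern template void legacy_sort<std::less<int>&, int*>(int*, int*, std::less<int>&);
#ifndef DEMO_HAS_NO_WIDE_CHARACTERS // hypothetical guard macro
extern template void legacy_sort<std::less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*,
                                                                std::less<wchar_t>&);
#endif

// Source-file part: explicit instantiation definitions. These emit the
// symbols that already-compiled programs may still reference.
template void legacy_sort<std::less<int>&, int*>(int*, int*, std::less<int>&);
#ifndef DEMO_HAS_NO_WIDE_CHARACTERS
template void legacy_sort<std::less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*,
                                                         std::less<wchar_t>&);
#endif

// Usage: sorts a small array through the int* instantiation.
inline bool sorts_small_array() {
  int data[] = {3, 1, 2};
  std::less<int> less_than;
  legacy_sort<std::less<int>&>(std::begin(data), std::end(data), less_than);
  return std::is_sorted(std::begin(data), std::end(data));
}

} // namespace demo
```

Once a shared library has exported instantiations like these, binaries built against older headers may still reference the symbols, which is why deleting the definitions changes the ABI even if no current header uses them.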