This is an archive of the discontinued LLVM Phabricator instance.

Differential D113413

Add introsort to avoid O(n^2) behavior and a benchmark for adversarial quick sort input.
ClosedPublic

Authored by nilayvaish on Nov 8 2021, 8:48 AM.

Details

Reviewers

jdoerfert
ldionne

Group Reviewers

Restricted Project

Commits

rG7f287390d78d: [libc++] Add introsort to avoid O(n^2) behavior

Summary

There are two commits in this request. The first adds a benchmark that tests std::sort on adversarial inputs. The second adds introsort to std::sort: for inputs where the partitions are still unbalanced after 2 log(n) pivots have been selected, the algorithm switches to heap sort, which avoids spending O(n^2) time sorting the input. Benchmark results show that the introsort implementation does significantly better on large adversarial inputs: at 262144 elements the per-element time drops from about 61,400 ns to about 210 ns, while inputs of 64 elements or fewer are somewhat slower.
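The strategy described in the summary can be sketched as follows. This is a minimal illustration, not the code from this patch: the names `log2_floor`, `introsort_impl`, and `intro_sort` are hypothetical, the middle-element pivot and the small-range cutoff of 16 are assumptions, and the fallback uses `std::make_heap`/`std::sort_heap` where the patch calls `__partial_sort`.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: floor(log2(n)) for n >= 1, and 0 for n <= 1.
template <class Number>
Number log2_floor(Number n) {
  Number result = 0;
  while (n > 1) {
    ++result;
    n >>= 1;
  }
  return result;
}

template <class It, class Compare>
void introsort_impl(It first, It last, Compare comp,
                    typename std::iterator_traits<It>::difference_type depth) {
  typedef typename std::iterator_traits<It>::value_type value_type;
  while (last - first > 16) {  // assumed cutoff for small ranges
    if (depth == 0) {
      // Depth budget exhausted: finish this range with heap sort, O(n log n).
      std::make_heap(first, last, comp);
      std::sort_heap(first, last, comp);
      return;
    }
    --depth;
    // Simplified pivot choice (an assumption): copy of the middle element.
    const value_type pivot = *(first + (last - first) / 2);
    // Three-way split: [first, lower) < pivot, [lower, upper) equivalent to
    // pivot, [upper, last) > pivot. The middle part is never empty.
    It lower = std::partition(
        first, last, [&](const value_type& x) { return comp(x, pivot); });
    It upper = std::partition(
        lower, last, [&](const value_type& x) { return !comp(pivot, x); });
    introsort_impl(first, lower, comp, depth);  // recurse on the left part
    first = upper;                              // loop on the right part
  }
  // Small ranges: binary insertion sort.
  for (It i = first; i != last; ++i)
    std::rotate(std::upper_bound(first, i, *i, comp), i, i + 1);
}

template <class It, class Compare>
void intro_sort(It first, It last, Compare comp) {
  // Depth budget of 2 * floor(log2(n)), as described in the summary.
  introsort_impl(first, last, comp, 2 * log2_floor(last - first));
}
```

Calling `intro_sort(v.begin(), v.end(), std::less<int>())` sorts a `std::vector<int>`. Because every level consumes one unit of the depth budget, quicksort can run for at most 2 log2(n) levels before the heap-sort fallback takes over.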

Benchmarking results before this change.  Time represents the sorting time
required per element.

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1                                   3.75 ns         3.74 ns    187432960
BM_Sort_uint32_QuickSortAdversary_4                                   3.05 ns         3.05 ns    231211008
BM_Sort_uint32_QuickSortAdversary_16                                  2.45 ns         2.45 ns    288096256
BM_Sort_uint32_QuickSortAdversary_64                                  32.8 ns         32.8 ns     21495808
BM_Sort_uint32_QuickSortAdversary_256                                  132 ns          132 ns      5505024
BM_Sort_uint32_QuickSortAdversary_1024                                 498 ns          497 ns      1572864
BM_Sort_uint32_QuickSortAdversary_16384                               3846 ns         3845 ns       262144
BM_Sort_uint32_QuickSortAdversary_262144                             61431 ns        61400 ns       262144
BM_Sort_uint64_QuickSortAdversary_1                                   3.93 ns         3.92 ns    181141504
BM_Sort_uint64_QuickSortAdversary_4                                   3.10 ns         3.09 ns    222560256
BM_Sort_uint64_QuickSortAdversary_16                                  2.50 ns         2.50 ns    283639808
BM_Sort_uint64_QuickSortAdversary_64                                  33.2 ns         33.2 ns     21757952
BM_Sort_uint64_QuickSortAdversary_256                                  132 ns          132 ns      5505024
BM_Sort_uint64_QuickSortAdversary_1024                                 478 ns          477 ns      1572864
BM_Sort_uint64_QuickSortAdversary_16384                               3932 ns         3930 ns       262144
BM_Sort_uint64_QuickSortAdversary_262144                             61646 ns        61615 ns       262144

Benchmarking results after this change

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1                                   6.31 ns         6.30 ns    107741184
BM_Sort_uint32_QuickSortAdversary_4                                   4.51 ns         4.50 ns    158859264
BM_Sort_uint32_QuickSortAdversary_16                                  3.00 ns         3.00 ns    223608832
BM_Sort_uint32_QuickSortAdversary_64                                  44.8 ns         44.8 ns     15990784
BM_Sort_uint32_QuickSortAdversary_256                                 69.0 ns         68.9 ns      9961472
BM_Sort_uint32_QuickSortAdversary_1024                                 118 ns          118 ns      6029312
BM_Sort_uint32_QuickSortAdversary_16384                                175 ns          175 ns      4194304
BM_Sort_uint32_QuickSortAdversary_262144                               210 ns          210 ns      3407872
BM_Sort_uint64_QuickSortAdversary_1                                   6.75 ns         6.73 ns    103809024
BM_Sort_uint64_QuickSortAdversary_4                                   4.53 ns         4.53 ns    160432128
BM_Sort_uint64_QuickSortAdversary_16                                  2.98 ns         2.97 ns    234356736 
BM_Sort_uint64_QuickSortAdversary_64                                  44.3 ns         44.3 ns     15990784
BM_Sort_uint64_QuickSortAdversary_256                                 69.2 ns         69.2 ns     10223616
BM_Sort_uint64_QuickSortAdversary_1024                                 119 ns          119 ns      6029312
BM_Sort_uint64_QuickSortAdversary_16384                                173 ns          173 ns      4194304
BM_Sort_uint64_QuickSortAdversary_262144                               212 ns          212 ns      3407872

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nilayvaish created this revision. Nov 8 2021, 8:48 AM

Herald added a subscriber: mgrang. Nov 8 2021, 8:48 AM

Harbormaster completed remote builds in B133037: Diff 385518. Nov 8 2021, 10:13 AM

Fix compilation issues with cxx03

Harbormaster completed remote builds in B133073: Diff 385570. Nov 8 2021, 12:48 PM

Updated the commit message.

nilayvaish edited the summary of this revision. Nov 8 2021, 3:33 PM

nilayvaish published this revision for review. Nov 8 2021, 3:40 PM

nilayvaish edited the summary of this revision.

Herald added a reviewer: jdoerfert. Nov 8 2021, 3:42 PM

Herald added a project: Restricted Project.

Herald added a reviewer: Restricted Project.

Herald added subscribers: libcxx-commits, sstefan1.

nilayvaish added a reviewer: ldionne. Nov 8 2021, 3:42 PM

nilayvaish mentioned this in D93233: [libc++] Replaces std::sort by Bitset sorting algorithm. Nov 8 2021, 4:02 PM

Harbormaster completed remote builds in B133130: Diff 385648. Nov 8 2021, 6:58 PM

ldionne requested changes to this revision. Nov 9 2021, 7:03 AM

ldionne added inline comments.

libcxx/include/__algorithm/sort.h
Lines 264–265: It seems to me like you reformatted a lot of this code, making the change itself a lot more difficult to understand. Please do the re-formatting either before or after your functional change, in a separate "no functionality change" patch.

This revision now requires changes to proceed. Nov 9 2021, 7:03 AM

Moved the diffs due to formatting by arc to a separate commit.

Remove formatting diffs from the diff. It seems Phabricator is unable to show the diffs between different commits.

Another attempt at uploading changes without formatting diff.

Move the depth check after the initial switch so that small sized sorts are not affected by the depth check.

nilayvaish edited the summary of this revision. Nov 9 2021, 8:17 AM

nilayvaish added inline comments. Nov 9 2021, 8:20 AM

libcxx/include/__algorithm/sort.h
Lines 264–265: I have dropped the formatting changes. Those were done by arc and I just accepted them.

Harbormaster completed remote builds in B133268: Diff 385833. Nov 9 2021, 11:37 AM

danlark mentioned this in D96946: [libcxx][RFC] Unspecified behavior randomization in libcxx. Nov 11 2021, 9:22 AM

Pinging @ldionne.

Do you need someone to commit this for you? If so, please provide Author Name <email@domain> for attribution. Thanks!

libcxx/include/__algorithm/sort.h
264–265

This revision is now accepted and ready to land. Nov 16 2021, 8:09 AM

In D113413#3134968, @ldionne wrote:

Do you need someone to commit this for you? If so, please provide Author Name <email@domain> for attribution. Thanks!

Yes, if you can commit this for me that would be great. You can use 'Nilay Vaish <nilayvaish@google.com>' for attribution. Thanks for all the help!

Closed by commit rG7f287390d78d: [libc++] Add introsort to avoid O(n^2) behavior (authored by nilayvaish, committed by ldionne). Nov 16 2021, 8:41 AM

This revision was automatically updated to reflect the committed changes.

ldionne added a commit: rG7f287390d78d: [libc++] Add introsort to avoid O(n^2) behavior.

We are seeing build breakages on the Fuchsia builders after this change. The compiler complains that the definition of the implicit copy constructor is deprecated because the class has a user-provided copy assignment operator. An example of the failure can be found at https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.arm64-release/b8830264610876104385/overview . Bisection confirms that this change triggers it. The code that fails to build can be found at https://fuchsia.googlesource.com/third_party/flatbuffers/+/f22618113aea6d04ed10bcaf28cc3621eea146d2/include/flatbuffers/flatbuffers.h#1154

Would you mind taking a look? We can suppress the error on our end, but we still haven't figured out why this change triggers this error (warning).
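For readers unfamiliar with this diagnostic, the sketch below shows the general pattern that Clang's deprecated-copy warnings flag. It is an illustration under assumptions, not the flatbuffers code and not a confirmed diagnosis of the Fuchsia failure: `ByValueComparator`, `less_by_value`, and `less_by_ref` are hypothetical names. The point is only that copying an object of a class with a user-provided copy assignment operator and no declared copy constructor uses the deprecated implicit copy constructor, so any new by-value copy of such a comparator can surface the warning.

```cpp
// Hypothetical comparator: it has a user-provided copy assignment operator
// but does not declare a copy constructor, so the compiler generates one
// implicitly (a deprecated combination).
struct ByValueComparator {
  ByValueComparator() = default;
  ByValueComparator& operator=(const ByValueComparator&) { return *this; }
  bool operator()(int a, int b) const { return a < b; }
};

// Taking the comparator by value copies it with the implicit copy
// constructor. Compilers that warn about deprecated copies diagnose this
// use, and the warning becomes an error under -Werror.
template <class Compare>
bool less_by_value(Compare comp, int a, int b) {
  return comp(a, b);
}

// Taking the comparator by reference performs no copy, so nothing is
// diagnosed.
template <class Compare>
bool less_by_ref(Compare& comp, int a, int b) {
  return comp(a, b);
}
```

Both helpers return the same result; they differ only in whether the comparator object is copied on the way in.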

Quuxplusone added a subscriber: Quuxplusone. Nov 17 2021, 7:10 PM

Quuxplusone added inline comments.

libcxx/include/__algorithm/sort.h
Line 468: This line needs to use `_VSTD::__introsort` (ADL-proofing; hmm, how did this pass `robust_against_adl.pass.cpp`? I should investigate). Also, it needs to use `_Comp_ref<_Compare>` instead of making a copy of `__comp`. @nilayvaish, please hunt through the rest of this patch to see if there are any other similar cases. (I did not look.) I will investigate adding a new test along the lines of `algorithms/robust_*.cpp` to verify that we never accidentally copy comparators or predicates.
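The comparator-copy half of this comment can be illustrated with a small sketch. Everything here is hypothetical (`CountingLess`, `sort_two`, `entry_by_value`, `entry_by_ref`); it only mirrors the general technique of instantiating an internal helper with a reference type so that the comparator is not copied, and it does not reproduce libc++'s `_Comp_ref` machinery or the ADL issue.

```cpp
#include <algorithm>

// Hypothetical comparator that counts how many times it is copied.
struct CountingLess {
  static int copies;
  CountingLess() = default;
  CountingLess(const CountingLess&) { ++copies; }
  CountingLess& operator=(const CountingLess&) = default;
  bool operator()(int a, int b) const { return a < b; }
};
int CountingLess::copies = 0;

// Internal helper: Compare is an explicit template parameter, so the caller
// decides whether it is a value type or a reference type.
template <class Compare, class It>
void sort_two(It first, It second, Compare comp) {
  if (comp(*second, *first))
    std::iter_swap(first, second);
}

// Compare is deduced as a value type here, so the helper call copies it.
template <class It, class Compare>
void entry_by_value(It first, It second, Compare& comp) {
  sort_two(first, second, comp);
}

// Instantiating the helper with Compare& passes the comparator by reference.
template <class It, class Compare>
void entry_by_ref(It first, It second, Compare& comp) {
  sort_two<Compare&>(first, second, comp);
}
```

A test that counts copies in this way is one possible shape for the "never accidentally copy comparators" check mentioned in the comment.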

Quuxplusone mentioned this in D114133: [libc++] Minor fixups in the new introsort code. Nov 17 2021, 7:31 PM

nilayvaish added inline comments. Nov 17 2021, 7:54 PM

libcxx/include/__algorithm/sort.h
Line 468: Noted. Will send a diff later today.

Quuxplusone added inline comments. Nov 17 2021, 7:58 PM

libcxx/include/__algorithm/sort.h
Line 468: Actually, nvm, I got nerdsniped into taking a look myself: D114133

We see that this patch is causing the size of the Chrome Android APK to increase by ~120 KB. This is confirmed via bisection. A breakdown of the symbols contributing to the size increase can be found at https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2022%2F05%2F10%2F1112911%2Fsupersize_diff.sizediff&group_by=template .

At this point in time, I'm not exactly sure why this patch would increase the size by such a large amount. Does anyone have any ideas?

Herald added a project: Restricted Project. May 13 2022, 2:03 PM

In D113413#3512743, @ayzhao wrote:

We see that this patch is causing the size of the Chrome Android APK to increase by ~120 KB. This is confirmed via bisection. A breakdown of the symbols contributing to the size increase can be found at https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2022%2F05%2F10%2F1112911%2Fsupersize_diff.sizediff&group_by=template .

At this point in time, I'm not exactly sure why this patch would increase the size by such a large amount. Does anyone have any ideas?

From the symbol diffs on e.g. ng_grid_layout_algorithm.cc, it looks like we have lots of negative deltas for sort and larger positive deltas for introsort, which suggests that the difference is the cumulative effect of inlining many instances of __introsort with its slightly longer (len > 5) case in the new code.

Perhaps this code is being built with excessive inlining and needs -Os or similar? Perhaps Clang itself needs some inliner heuristic tuning? Perhaps this is an unavoidable size hit unless you want to take a speed hit?

In D113413#3516599, @pkasting wrote:

From the symbol diffs on e.g. ng_grid_layout_algorithm.cc, it looks like we have lots of negative deltas for sort and larger positive deltas for introsort, which suggests that the difference is the cumulative effect of inlining many instances of __introsort with its slightly longer (len > 5) case in the new code.

Perhaps this code is being built with excessive inlining and needs -Os or similar? Perhaps Clang itself needs some inliner heuristic tuning? Perhaps this is an unavoidable size hit unless you want to take a speed hit?

For documentation purposes:

I chatted offline with pkasting@ and we found out that adding __attribute__((noinline)) to __introsort actually made the binary size even worse - it added an additional ~40KB.

hans mentioned this in D125958: [libc++] Use __libcpp_clz for a tighter __log2i function. May 19 2022, 2:49 AM

re: binary size - std::sort-related symbols make up more than 1% of Chrome's binary size (for arm32 Android at least). Any work to reduce the size overhead of std::sort() would be very welcome. From what I can tell, the main contributing factor is the templated nature of the function. It gets stamped out a *lot* of times, and in a way that identical-code-folding is often not applicable.

hans mentioned this in rG865ad6bd2165: [libc++] Use __libcpp_clz for a tighter __log2i function. May 27 2022, 9:58 AM

Glad to see this land. We had a very similar patch almost 5 years ago, but it got lost in the review process: https://reviews.llvm.org/D36423

In D113413#3528143, @agrieve wrote:

re: binary size - std::sort-related symbols make up more than 1% of Chrome's binary size (for arm32 Android at least). Any work to reduce the size overhead of std::sort() would be very welcome. From what I can tell, the main contributing factor is the templated nature of the function. It gets stamped out a *lot* of times, and in a way that identical-code-folding is often not applicable.

Will it help to explicitly specialize some of the commonly used templates in Chrome?

philnik mentioned this in D36423: [libc++] Introsort based sorting function. Jun 1 2022, 12:42 AM

In D113413#3548857, @hiraditya wrote:

Will it help to explicitly specialize some of the commonly used templates in Chrome?

We enable identical code folding, so I don't see how that would help.

If we could identify at compile time when sort is being called with aggregates / PODs and route all such calls through qsort(), I think that might help.

I'm working on upgrading Meta's Android apps from libc++ 13 to libc++ 15, and seeing large size regressions caused by this change (similar to the Chrome case above). Was there any progress on figuring out how to offset the binary size increase?

(I recognize it's pretty annoying on my part to be raising the issue 9 months after the code was committed, and I apologize for that. We're working on being able to catch these issues earlier in the future.)

nilayvaish mentioned this in D122780: Modify std::sort: add BlockQuickSort partitioning algorithm for arithmetic types. Nov 28 2022, 1:53 PM

Revision Contents

Path                                                                  Size
libcxx/benchmarks/algorithms.bench.cpp                                56 lines
libcxx/include/__algorithm/sort.h                                     356 lines
libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp    60 lines

Diff 385819

libcxx/benchmarks/algorithms.bench.cpp

[32 lines not shown] std::conditional_t<
     std::string> > > >;

 enum class Order {
   Random,
   Ascending,
   Descending,
   SingleElement,
   PipeOrgan,
-  Heap
+  Heap,
+  QuickSortAdversary,
 };
-struct AllOrders : EnumValuesAsTuple<AllOrders, Order, 6> {
+struct AllOrders : EnumValuesAsTuple<AllOrders, Order, 7> {
   static constexpr const char* Names[] = {"Random", "Ascending",
                                           "Descending", "SingleElement",
-                                          "PipeOrgan", "Heap"};
+                                          "PipeOrgan", "Heap",
+                                          "QuickSortAdversary"};
 };

+// fillAdversarialQuickSortInput fills the input vector with N int-like values.
+// These values are arranged in such a way that they would invoke O(N^2)
+// behavior on any quick sort implementation that satisfies certain conditions.
+// Details are available in the following paper:
+// "A Killer Adversary for Quicksort", M. D. McIlroy, Software: Practice &
+// Experience, Volume 29, Issue 4, April 10, 1999, pp 341–344.
+// https://dl.acm.org/doi/10.5555/311868.311871.
+template <class T>
+void fillAdversarialQuickSortInput(T& V, size_t N) {
+  assert(N > 0);
+  // If an element is equal to gas, it indicates that the value of the element
+  // is still to be decided and may change over the course of time.
+  const int gas = N - 1;
+  V.resize(N);
+  for (int i = 0; i < N; ++i) {
+    V[i] = gas;
+  }
+  // Candidate for the pivot position.
+  int candidate = 0;
+  int nsolid = 0;
+  // Populate all positions in the generated input to gas.
+  std::vector<int> ascVals(V.size());
+  // Fill up with ascending values from 0 to V.size()-1. These will act as
+  // indices into V.
+  std::iota(ascVals.begin(), ascVals.end(), 0);
+  std::sort(ascVals.begin(), ascVals.end(), [&](int x, int y) {
+    if (V[x] == gas && V[y] == gas) {
+      // We are comparing two inputs whose value is still to be decided.
+      if (x == candidate) {
+        V[x] = nsolid++;
+      } else {
+        V[y] = nsolid++;
+      }
+    }
+    if (V[x] == gas) {
+      candidate = x;
+    } else if (V[y] == gas) {
+      candidate = y;
+    }
+    return V[x] < V[y];
+  });
+}
+
 template <typename T>
 void fillValues(std::vector<T>& V, size_t N, Order O) {
   if (O == Order::SingleElement) {
     V.resize(N, 0);
+  } else if (O == Order::QuickSortAdversary) {
+    fillAdversarialQuickSortInput(V, N);
   } else {
     while (V.size() < N)
       V.push_back(V.size());
   }
 }

 template <typename T>
 void fillValues(std::vector<std::pair<T, T> >& V, size_t N, Order O) {
[62 lines not shown] case Order::SingleElement:
     break;
   case Order::PipeOrgan:
     std::sort(V.begin(), V.end());
     std::reverse(V.begin() + V.size() / 2, V.end());
     break;
   case Order::Heap:
     std::make_heap(V.begin(), V.end());
     break;
+  case Order::QuickSortAdversary:
+    // Nothing to do
+    break;
   }
 }

 constexpr size_t TestSetElements =
 #if !TEST_HAS_FEATURE(memory_sanitizer)
     1 << 18;
 #else
     1 << 14;
[199 lines not shown]

libcxx/include/__algorithm/sort.h

[255 lines not shown] if (__first1 != __last1)

::new ((void*)__j2) value_type(_VSTD::move(*__first1)); ::new ((void*)__j2) value_type(_VSTD::move(*__first1));

__d.template __incr<value_type>(); __d.template __incr<value_type>();

} }

__h.release(); __h.release();

} }

template <class _Compare, class _RandomAccessIterator> template <class _Compare, class _RandomAccessIterator>

void void __introsort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp,

Inline comment from ldionne (done):

It seems to me like you reformatted a lot of this code, making the change itself a lot more difficult to understand. Please do the re-formatting either before or after your functional change, in a separate "no functionality change" patch.

Inline comment from nilayvaish, author (done):

I have dropped the formatting changes. Those were done by arc and I just accepted them.

Inline comment from ldionne (not done), suggested change:

template <class _Compare, class _RandomAccessIterator>
- void
+ void _LIBCPP_HIDE_FROM_ABI
__introsort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp,

__sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) typename _VSTD::iterator_traits<_RandomAccessIterator>::difference_type __depth) {

{

typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type; typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;

typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type; typedef typename iterator_traits<_RandomAccessIterator>::value_type value_type;

const difference_type __limit = is_trivially_copy_constructible<value_type>::value && const difference_type __limit =

is_trivially_copy_assignable<value_type>::value ? 30 : 6; is_trivially_copy_constructible<value_type>::value && is_trivially_copy_assignable<value_type>::value ? 30 : 6;

while (true) typedef typename __comp_ref_type<_Compare>::type _Comp_ref;

{ while (true) {

__restart: __restart:

+    if (__depth == 0) {
+      // Fallback to heap sort as Introsort suggests.
+      _VSTD::__partial_sort<_Comp_ref>(__first, __last, __last, _Comp_ref(__comp));
+      return;
+    }
+    --__depth;

difference_type __len = __last - __first; difference_type __len = __last - __first;

switch (__len) switch (__len) {

{

case 0: case 0:

case 1: case 1:

return; return;

case 2: case 2:

if (__comp(*--__last, *__first)) if (__comp(*--__last, *__first))

swap(*__first, *__last); swap(*__first, *__last);

return; return;

case 3: case 3:

_VSTD::__sort3<_Compare>(__first, __first+1, --__last, __comp); _VSTD::__sort3<_Compare>(__first, __first + 1, --__last, __comp);

return; return;

case 4: case 4:

_VSTD::__sort4<_Compare>(__first, __first+1, __first+2, --__last, __comp); _VSTD::__sort4<_Compare>(__first, __first + 1, __first + 2, --__last, __comp);

return; return;

case 5: case 5:

_VSTD::__sort5<_Compare>(__first, __first+1, __first+2, __first+3, --__last, __comp); _VSTD::__sort5<_Compare>(__first, __first + 1, __first + 2, __first + 3, --__last, __comp);

return; return;

} }

if (__len <= __limit) if (__len <= __limit) {

{

_VSTD::__insertion_sort_3<_Compare>(__first, __last, __comp); _VSTD::__insertion_sort_3<_Compare>(__first, __last, __comp);

return; return;

} }

// __len > 5 // __len > 5

_RandomAccessIterator __m = __first; _RandomAccessIterator __m = __first;

_RandomAccessIterator __lm1 = __last; _RandomAccessIterator __lm1 = __last;

--__lm1; --__lm1;

unsigned __n_swaps; unsigned __n_swaps;

{ {

difference_type __delta; difference_type __delta;

if (__len >= 1000) if (__len >= 1000) {

{

__delta = __len/2; __delta = __len / 2;

__m += __delta; __m += __delta;

__delta /= 2; __delta /= 2;

__n_swaps = _VSTD::__sort5<_Compare>(__first, __first + __delta, __m, __m+__delta, __lm1, __comp); __n_swaps = _VSTD::__sort5<_Compare>(__first, __first + __delta, __m, __m + __delta, __lm1, __comp);

} } else {

else

{

__delta = __len/2; __delta = __len / 2;

__m += __delta; __m += __delta;

__n_swaps = _VSTD::__sort3<_Compare>(__first, __m, __lm1, __comp); __n_swaps = _VSTD::__sort3<_Compare>(__first, __m, __lm1, __comp);

} }

// *__m is median // *__m is median

// partition [__first, __m) < *__m and *__m <= [__m, __last) // partition [__first, __m) < *__m and *__m <= [__m, __last)

// (this inhibits tossing elements equivalent to __m around unnecessarily) // (this inhibits tossing elements equivalent to __m around unnecessarily)

_RandomAccessIterator __i = __first; _RandomAccessIterator __i = __first;

_RandomAccessIterator __j = __lm1; _RandomAccessIterator __j = __lm1;

// j points beyond range to be tested, *__m is known to be <= *__lm1 // j points beyond range to be tested, *__m is known to be <= *__lm1

// The search going up is known to be guarded but the search coming down isn't. // The search going up is known to be guarded but the search coming down isn't.

// Prime the downward search with a guard. // Prime the downward search with a guard.

if (!__comp(*__i, *__m)) // if *__first == *__m if (!__comp(*__i, *__m)) // if *__first == *__m

{ {

// *__first == *__m, *__first doesn't go in first part // *__first == *__m, *__first doesn't go in first part

// manually guard downward moving __j against __i // manually guard downward moving __j against __i

while (true) while (true) {

{ if (__i == --__j) {

if (__i == --__j)

{

// *__first == *__m, *__m <= all other elements // *__first == *__m, *__m <= all other elements

// Parition instead into [__first, __i) == *__first and *__first < [__i, __last) // Parition instead into [__first, __i) == *__first and *__first < [__i, __last)

++__i; // __first + 1 ++__i; // __first + 1

__j = __last; __j = __last;

if (!__comp(*__first, *--__j)) // we need a guard if *__first == *(__last-1) if (!__comp(*__first, *--__j)) // we need a guard if *__first == *(__last-1)

{ {

while (true) while (true) {

{

if (__i == __j) if (__i == __j)

return; // [__first, __last) all equivalent elements return; // [__first, __last) all equivalent elements

if (__comp(*__first, *__i)) if (__comp(*__first, *__i)) {

{

swap(*__i, *__j); swap(*__i, *__j);

++__n_swaps; ++__n_swaps;

++__i; ++__i;

break; break;

} }

++__i; ++__i;

} }

// [__first, __i) == *__first and *__first < [__j, __last) and __j == __last - 1 // [__first, __i) == *__first and *__first < [__j, __last) and __j == __last - 1

if (__i == __j) if (__i == __j)

return; return;

while (true) while (true) {

{

while (!__comp(*__first, *__i)) while (!__comp(*__first, *__i))

++__i; ++__i;

while (__comp(*__first, *--__j)) while (__comp(*__first, *--__j))

; ;

if (__i >= __j) if (__i >= __j)

break; break;

swap(*__i, *__j); swap(*__i, *__j);

++__n_swaps; ++__n_swaps;

++__i; ++__i;

} }

// [__first, __i) == *__first and *__first < [__i, __last) // [__first, __i) == *__first and *__first < [__i, __last)

// The first part is sorted, sort the second part // The first part is sorted, sort the second part

// _VSTD::__sort<_Compare>(__i, __last, __comp); // _VSTD::__sort<_Compare>(__i, __last, __comp);

__first = __i; __first = __i;

goto __restart; goto __restart;

} }

if (__comp(*__j, *__m)) if (__comp(*__j, *__m)) {

{

swap(*__i, *__j); swap(*__i, *__j);

++__n_swaps; ++__n_swaps;

break; // found guard for downward moving __j, now use unguarded partition break; // found guard for downward moving __j, now use unguarded partition

} }

// It is known that *__i < *__m // It is known that *__i < *__m

++__i; ++__i;

// j points beyond range to be tested, *__m is known to be <= *__lm1 // j points beyond range to be tested, *__m is known to be <= *__lm1

// if not yet partitioned... // if not yet partitioned...

if (__i < __j) if (__i < __j) {

{

// known that *(__i - 1) < *__m // known that *(__i - 1) < *__m

// known that __i <= __m // known that __i <= __m

while (true) while (true) {

{

// __m still guards upward moving __i // __m still guards upward moving __i

while (__comp(*__i, *__m)) while (__comp(*__i, *__m))

++__i; ++__i;

// It is now known that a guard exists for downward moving __j // It is now known that a guard exists for downward moving __j

while (!__comp(*--__j, *__m)) while (!__comp(*--__j, *__m))

; ;

if (__i > __j) if (__i > __j)

break; break;

swap(*__i, *__j); swap(*__i, *__j);

++__n_swaps; ++__n_swaps;

// It is known that __m != __j // It is known that __m != __j

// If __m just moved, follow it // If __m just moved, follow it

if (__m == __i) if (__m == __i)

__m = __j; __m = __j;

++__i; ++__i;

} }

// [__first, __i) < *__m and *__m <= [__i, __last) // [__first, __i) < *__m and *__m <= [__i, __last)

if (__i != __m && __comp(*__m, *__i)) if (__i != __m && __comp(*__m, *__i)) {

{

swap(*__i, *__m); swap(*__i, *__m);

++__n_swaps; ++__n_swaps;

} }

// [__first, __i) < *__i and *__i <= [__i+1, __last) // [__first, __i) < *__i and *__i <= [__i+1, __last)

// If we were given a perfect partition, see if insertion sort is quick... // If we were given a perfect partition, see if insertion sort is quick...

if (__n_swaps == 0) if (__n_swaps == 0) {

{

bool __fs = _VSTD::__insertion_sort_incomplete<_Compare>(__first, __i, __comp); bool __fs = _VSTD::__insertion_sort_incomplete<_Compare>(__first, __i, __comp);

if (_VSTD::__insertion_sort_incomplete<_Compare>(__i+1, __last, __comp)) if (_VSTD::__insertion_sort_incomplete<_Compare>(__i + 1, __last, __comp)) {

{

if (__fs) if (__fs)

return; return;

__last = __i; __last = __i;

continue; continue;

} } else {

else if (__fs) {

{

if (__fs)

{

__first = ++__i; __first = ++__i;

continue; continue;

} }

// sort smaller range with recursive call and larger with tail recursion elimination // sort smaller range with recursive call and larger with tail recursion elimination

if (__i - __first < __last - __i) if (__i - __first < __last - __i) {

{ _VSTD::__introsort<_Compare>(__first, __i, __comp, __depth);

_VSTD::__sort<_Compare>(__first, __i, __comp);

// _VSTD::__sort<_Compare>(__i+1, __last, __comp);

__first = ++__i; __first = ++__i;

} } else {

else _VSTD::__introsort<_Compare>(__i + 1, __last, __comp, __depth);

{

_VSTD::__sort<_Compare>(__i+1, __last, __comp);

// _VSTD::__sort<_Compare>(__first, __i, __comp);

__last = __i; __last = __i;

} }

+template <typename _Number>
+inline _LIBCPP_HIDE_FROM_ABI _Number __log2i(_Number __n) {
+  _Number __log2 = 0;
+  while (__n > 1) {
+    __log2++;
+    __n >>= 1;
+  }
+  return __log2;
+}
+
+template <class _Compare, class _RandomAccessIterator>
+void __sort(_RandomAccessIterator __first, _RandomAccessIterator __last, _Compare __comp) {
+  typedef typename iterator_traits<_RandomAccessIterator>::difference_type difference_type;
+  difference_type __depth_limit = 2 * __log2i(__last - __first);
+  __introsort(__first, __last, __comp, __depth_limit);
+}

template <class _Compare, class _Tp> template <class _Compare, class _Tp>

inline _LIBCPP_INLINE_VISIBILITY inline _LIBCPP_INLINE_VISIBILITY

void void

__sort(_Tp** __first, _Tp** __last, __less<_Tp*>&) __sort(_Tp** __first, _Tp** __last, __less<_Tp*>&)

{ {

__less<uintptr_t> __comp; __less<uintptr_t> __comp;

_VSTD::__sort<__less<uintptr_t>&, uintptr_t*>((uintptr_t*)__first, (uintptr_t*)__last, __comp); _VSTD::__sort<__less<uintptr_t>&, uintptr_t*>((uintptr_t*)__first, (uintptr_t*)__last, __comp);

} }

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<char>&, char*>(char*, char*, __less<char>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<char>&, char*>(char*, char*, __less<char>&))

#ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS #ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<wchar_t>&, wchar_t*>(wchar_t*, wchar_t*, __less<wchar_t>&))

#endif #endif

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<signed char>&, signed char*>(signed char*, signed char*, __less<signed char>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned char>&, unsigned char*>(unsigned char*, unsigned char*, __less<unsigned char>&))

Quuxplusone (Unsubmitted)

Not Done

This line needs to use _VSTD::__introsort (ADL-proofing — hmm, how did this pass robust_against_adl.pass.cpp? I should investigate).
Also, it needs to use _Comp_ref<_Compare> instead of making a copy of __comp.
@nilayvaish, please hunt through the rest of this patch to see if there are any other similar cases. (I did not look.)

I will investigate adding a new test along the lines of algorithms/robust_*.cpp to verify that we never accidentally copy comparators or predicates.


Quuxplusone (Unsubmitted)

Not Done

Actually, nvm, I got nerdsniped into taking a look myself: D114133


nilayvaish (Author, Unsubmitted)

Not Done

Noted. Will send a diff later today.
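The two points raised in this thread (qualifying internal calls, and not copying the comparator) can be illustrated with a small standalone sketch. This is not libc++ code: the names `lib::helper`, `lib::algo_unqualified`, `lib::algo_qualified`, `lib::inner`, `lib::pass_deduced`, `lib::pass_by_ref`, `user::Widget` and `user::CountingLess` are all hypothetical, and overload hijacking is only one simple way an unqualified call can misbehave.

```cpp
namespace lib {

// Stand-in for an internal library helper.
template <class It>
int helper(It, It) { return 1; }

// Unqualified call: argument-dependent lookup also searches the namespaces
// associated with It, so a user-provided overload can be selected instead.
template <class It>
int algo_unqualified(It first, It last) { return helper(first, last); }

// Qualified call: ADL is suppressed and lib::helper is always used.
template <class It>
int algo_qualified(It first, It last) { return lib::helper(first, last); }

// Takes the comparator as whatever Compare is: a value or a reference.
template <class Compare>
bool inner(int a, int b, Compare comp) { return comp(a, b); }

// Compare is deduced as a value type here, so comp is copied once.
template <class Compare>
bool pass_deduced(int a, int b, Compare& comp) {
  return lib::inner(a, b, comp);
}

// Compare& is named explicitly, so inner binds a reference and never copies.
template <class Compare>
bool pass_by_ref(int a, int b, Compare& comp) {
  return lib::inner<Compare&>(a, b, comp);
}

} // namespace lib

namespace user {

struct Widget {};

// A non-template overload in Widget's namespace; ADL finds it and overload
// resolution prefers it over the lib::helper template.
inline int helper(Widget*, Widget*) { return 2; }

// Comparator that records how many times it has been copied.
struct CountingLess {
  static int copies;
  CountingLess() {}
  CountingLess(const CountingLess&) { ++copies; }
  bool operator()(int a, int b) const { return a < b; }
};
int CountingLess::copies = 0;

} // namespace user
```

With a `user::Widget` array `w`, `lib::algo_unqualified(w, w + 2)` dispatches to `user::helper` while `lib::algo_qualified(w, w + 2)` stays inside `lib`. Likewise, `lib::pass_deduced` increments `CountingLess::copies` and `lib::pass_by_ref` does not, which is the kind of difference a comparator-copy test could observe.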


_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<short>&, short*>(short*, short*, __less<short>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<short>&, short*>(short*, short*, __less<short>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned short>&, unsigned short*>(unsigned short*, unsigned short*, __less<unsigned short>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<int>&, int*>(int*, int*, __less<int>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<int>&, int*>(int*, int*, __less<int>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned>&, unsigned*>(unsigned*, unsigned*, __less<unsigned>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long>&, long*>(long*, long*, __less<long>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long>&, long*>(long*, long*, __less<long>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned long>&, unsigned long*>(unsigned long*, unsigned long*, __less<unsigned long>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<long long>&, long long*>(long long*, long long*, __less<long long>&))

_LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&)) _LIBCPP_EXTERN_TEMPLATE(_LIBCPP_FUNC_VIS void __sort<__less<unsigned long long>&, unsigned long long*>(unsigned long long*, unsigned long long*, __less<unsigned long long>&))
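For readers who want the overall shape of the change in sort.h without the libc++ internals, the following is a minimal standalone sketch of the same idea: quicksort with a depth budget of 2 * floor(log2(n)) that falls back to heap sort once the budget is exhausted. It is an illustration under stated assumptions, not the patch's implementation. The namespace `sketch`, the functions `floor_log2`, `introsort_loop` and `introsort`, the cutoff of 16 elements, the median-of-three pivot, and the Lomuto partition are all choices made for this example.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>

namespace sketch {

// floor(log2(n)) for n >= 1, and 0 for n <= 1 (same loop as __log2i in the diff).
template <class Number>
Number floor_log2(Number n) {
  Number log2 = 0;
  while (n > 1) {
    ++log2;
    n >>= 1;
  }
  return log2;
}

template <class It, class Compare>
void introsort_loop(It first, It last, Compare& comp,
                    typename std::iterator_traits<It>::difference_type depth) {
  while (last - first > 16) {
    if (depth == 0) {
      // Budget exhausted: heap sort caps the cost of this range at O(n log n).
      std::make_heap(first, last, std::ref(comp));
      std::sort_heap(first, last, std::ref(comp));
      return;
    }
    --depth;
    // Median-of-three: order *first, *mid, *back, then park the median at back.
    It mid = first + (last - first) / 2;
    It back = last - 1;
    if (comp(*mid, *first)) std::iter_swap(mid, first);
    if (comp(*back, *mid)) {
      std::iter_swap(back, mid);
      if (comp(*mid, *first)) std::iter_swap(mid, first);
    }
    std::iter_swap(mid, back);
    // Lomuto partition around the pivot stored at *back.
    It store = first;
    for (It it = first; it != back; ++it) {
      if (comp(*it, *back)) {
        std::iter_swap(it, store);
        ++store;
      }
    }
    std::iter_swap(store, back); // pivot is now in its final position
    // Recurse into the smaller side, loop on the larger one.
    if (store - first < last - store) {
      sketch::introsort_loop(first, store, comp, depth);
      first = store + 1;
    } else {
      sketch::introsort_loop(store + 1, last, comp, depth);
      last = store;
    }
  }
  // Insertion sort for the remaining short range.
  for (It i = first; i != last; ++i)
    for (It j = i; j != first && comp(*j, *(j - 1)); --j)
      std::iter_swap(j, j - 1);
}

template <class It, class Compare>
void introsort(It first, It last, Compare comp) {
  sketch::introsort_loop(first, last, comp, 2 * sketch::floor_log2(last - first));
}

} // namespace sketch
```

A range of identical elements is a convenient way to exercise the fallback in this sketch: the Lomuto partition puts everything on one side, so the depth budget runs out after a few iterations and the rest of the range is finished by the heap sort branch.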


libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp

for (int i = 0; i < array_size; i++) {		for (int i = 0; i < array_size; i++) {
pv[i] = &v[array_size - 1 - i];		pv[i] = &v[array_size - 1 - i];
}		}
std::sort(pv, pv + array_size);		std::sort(pv, pv + array_size);
assert(*pv[0] == v[0]);		assert(*pv[0] == v[0]);
assert(*pv[1] == v[1]);		assert(*pv[1] == v[1]);
assert(*pv[array_size - 1] == v[array_size - 1]);		assert(*pv[array_size - 1] == v[array_size - 1]);
}		}

		// test_adversarial_quicksort generates a vector with values arranged in such a
		// way that they would invoke O(N^2) behavior on any quick sort implementation
		// that satisfies certain conditions. Details are available in the following
		// paper:
		// "A Killer Adversary for Quicksort", M. D. McIlroy, Software: Practice &
		// Experience, Volume 29, Issue 4, April 10, 1999, pp. 341–344.
		// https://dl.acm.org/doi/10.5555/311868.311871.
		struct AdversaryComparator {
		AdversaryComparator(int N, std::vector<int>& input) : gas(N - 1), V(input) {
		V.resize(N);
		// Populate all positions in the generated input to gas to indicate that
		// none of the values have been fixed yet.
		for (int i = 0; i < N; ++i)
		V[i] = gas;
		}

		bool operator()(int x, int y) {
		if (V[x] == gas && V[y] == gas) {
		// We are comparing two inputs whose value is still to be decided.
		if (x == candidate) {
		V[x] = nsolid++;
		} else {
		V[y] = nsolid++;
		}
		}
		if (V[x] == gas) {
		candidate = x;
		} else if (V[y] == gas) {
		candidate = y;
		}
		return V[x] < V[y];
		}

		private:
		// If an element is equal to gas, it indicates that the value of the element
		// is still to be decided and may change over the course of time.
		const int gas;
		// This is a reference so that we can manipulate the input vector later.
		std::vector<int>& V;
		// Candidate for the pivot position.
		int candidate = 0;
		int nsolid = 0;
		};

		void test_adversarial_quicksort(int N) {
		assert(N > 0);
		std::vector<int> ascVals(N);
		// Fill up with ascending values from 0 to N-1. These will act as indices
		// into V.
		std::iota(ascVals.begin(), ascVals.end(), 0);
		std::vector<int> V;
		AdversaryComparator comp(N, V);
		std::sort(ascVals.begin(), ascVals.end(), comp);
		std::sort(V.begin(), V.end());
		assert(std::is_sorted(V.begin(), V.end()));
		}

int main(int, char**)		int main(int, char**)
{		{
// test null range		// test null range
int d = 0;		int d = 0;
std::sort(&d, &d);		std::sort(&d, &d);
// exhaustively test all possibilities up to length 8		// exhaustively test all possibilities up to length 8
test_sort_<1>();		test_sort_<1>();
test_sort_<2>();		test_sort_<2>();
test_sort_<3>();		test_sort_<3>();
test_sort_<4>();		test_sort_<4>();
test_sort_<5>();		test_sort_<5>();
test_sort_<6>();		test_sort_<6>();
test_sort_<7>();		test_sort_<7>();
test_sort_<8>();		test_sort_<8>();

test_larger_sorts(256);		test_larger_sorts(256);
test_larger_sorts(257);		test_larger_sorts(257);
test_larger_sorts(499);		test_larger_sorts(499);
test_larger_sorts(500);		test_larger_sorts(500);
test_larger_sorts(997);		test_larger_sorts(997);
test_larger_sorts(1000);		test_larger_sorts(1000);
test_larger_sorts(1009);		test_larger_sorts(1009);

test_pointer_sort();		test_pointer_sort();
		test_adversarial_quicksort(1 << 20);

return 0;		return 0;
}		}
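The new test only checks that sorting under the adversary terminates with a sorted result. To see how much work the sort actually does, the same adversary can be wrapped so that it counts comparisons. The sketch below is an illustration, not part of the patch: `sketch::AdversaryState`, `sketch::AdversaryLess`, `sketch::AdversaryResult` and `sketch::run_adversary` are hypothetical names. It also deviates from the test in one respect: the adversary's state lives outside the comparator and is reached through a pointer, on the assumption that a `std::sort` implementation may copy its comparator internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

namespace sketch {

// Shared adversary state, kept outside the comparator so that every copy of
// the comparator made inside std::sort observes the same data.
struct AdversaryState {
  explicit AdversaryState(int n) : gas(n - 1), values(n, n - 1) {}
  int gas;                      // sentinel meaning "value not decided yet"
  std::vector<int> values;      // values[i] is the value assigned to index i
  int candidate = 0;            // index the adversary suspects is the pivot
  int nsolid = 0;               // next concrete value to hand out
  std::size_t comparisons = 0;  // number of comparator calls so far
};

// Same decision logic as AdversaryComparator in the test, plus a counter.
struct AdversaryLess {
  AdversaryState* s;
  bool operator()(int x, int y) const {
    ++s->comparisons;
    std::vector<int>& v = s->values;
    if (v[x] == s->gas && v[y] == s->gas) {
      // Both values are undecided: freeze one of them.
      if (x == s->candidate)
        v[x] = s->nsolid++;
      else
        v[y] = s->nsolid++;
    }
    if (v[x] == s->gas)
      s->candidate = x;
    else if (v[y] == s->gas)
      s->candidate = y;
    return v[x] < v[y];
  }
};

struct AdversaryResult {
  std::vector<int> order;   // indices 0..n-1 in the order std::sort produced
  std::vector<int> values;  // values the adversary ended up assigning
  std::size_t comparisons;  // comparator calls made by std::sort
};

// Sorts the indices 0..n-1 under the adversary. Requires n > 0.
inline AdversaryResult run_adversary(int n) {
  AdversaryState state(n);
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), AdversaryLess{&state});
  return {order, state.values, state.comparisons};
}

} // namespace sketch
```

Comparing `run_adversary(n).comparisons` for a few doubling values of n gives a rough picture of the growth rate: with a plain quicksort the count would be expected to roughly quadruple per doubling, while an introsort-based `std::sort` should stay close to n log n.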

Revision Contents

Diff 385819

libcxx/benchmarks/algorithms.bench.cpp

libcxx/include/__algorithm/sort.h

libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
