This is an archive of the discontinued LLVM Phabricator instance.

Introduce branchless sorting functions for sort3, sort4 and sort5.
ClosedPublic

Authored by marcogelmi on Jan 24 2022, 3:50 AM.

Details

Summary

We are introducing branchless variants for sort3, sort4 and sort5.
These sorting functions have been generated using Reinforcement
Learning and aim to replace sort3, sort4 and __sort5 variants
for integral types.

The libc++ benchmarks were run on isolated machines for Skylake, ARM and
AMD architectures and achieve statistically significant improvement in
sorting random integers on test cases from sort1 to sort262144 for
uint32 and uint64.

A full performance overview for Intel Skylake, AMD and Arm can be
found here: https://bit.ly/3AtesYf

Diff Detail

Event Timeline

marcogelmi created this revision.Jan 24 2022, 3:50 AM
marcogelmi requested review of this revision.Jan 24 2022, 3:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2022, 3:50 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik requested changes to this revision.Jan 24 2022, 4:12 AM
philnik added a subscriber: philnik.

A few points right now:
(1) Please use using x = . It's a clang extension in C++03, so we can use it.
(2) __magic_swap is a really bad name. Please change that to something more meaningful.
(3) Don't qualify any type traits with _VSTD::.
(4) You can't use enable_if_t. Use __enable_if_t instead.
I'll have a closer look later.

This revision now requires changes to proceed.Jan 24 2022, 4:12 AM

fix initial style issues

@philnik just wondering whether you had a chance to look at the new changes we have made to the patch?

I'm guessing all the ternaries are making the compiler generate cmovs?
I noticed that the benchmarks for size 4 are generally pretty bad. Do you know if anything can be done there?
I haven't yet looked at the logic itself very closely.

libcxx/include/__algorithm/sort.h
152

Ditto below

160

Is there a reason you restricted this to integrals? Does it pessimize some code? I guessing at least all primitive types should be allowed. I haven't benchmarked, but looking at code generation it seems floating points should also benefit from this change.

I noticed that the benchmarks for size 4 are generally pretty bad. Do you know if anything can be done there?

I noticed this also, but assume that we don't really care about the performance of std::sort on arrays of size 4 (or less). ;) If we're trading some perf on unrealistically small arrays, for better perf on large arrays, that sounds like a good tradeoff to me.
But I, also, have not particularly reviewed this code. Just glanced briefly.

libcxx/include/__algorithm/sort.h
152

And vice versa, _VSTD::__sort3_internal. All function calls (that take arguments of user-defined types, anyway) need qualification specifically so that you don't get ADL. Type/variable/concept/... names don't need qualification because they never do ADL.
Also, almost certainly _VSTD::__sort3_internal<_Compare>, right? We should have a test for "no unnecessary copies of predicates" that catches this, and if it's not failing on this PR then it needs a bit of investigation why not.

160

is_scalar might be relevant.

marcogelmi updated this revision to Diff 404904.Feb 1 2022, 6:04 AM

use is_arithmetic instead of is_integral

marcogelmi added a comment.EditedFeb 1 2022, 6:08 AM

I'm guessing all the ternaries are making the compiler generate cmovs?

yes that's correct

I noticed that the benchmarks for size 4 are generally pretty bad. Do you know if anything can be done there?

Random 4 is actually very good. It's bad for deterministic orderings because the benchmark just keeps on calling sort4 with the same ordering over and over again, so the branch predictor has a very easy job. I suspect this type of data pattern is not very realistic.

marcogelmi marked an inline comment as done.Feb 1 2022, 6:15 AM
marcogelmi added inline comments.
libcxx/include/__algorithm/sort.h
152

sorry what line of code is this referring to? I thought I used _VSTD::__sort3_internal<_Compare> everywhere I'm calling it.

160

this relies on cmovs, on larger data types it could pessimize the code if it compiles to extra copies instead, or not compile at all if it's not trivially copyable.

160

switched to is_arithmetic and added benchmarks here: https://bit.ly/3rgv2r9

Random 4 is actually very good. It's bad for deterministic orderings because the benchmark just keeps on calling sort4 with the same ordering over and over again, so the branch predictor has a very easy job. I suspect this type of data pattern is not very realistic.

Ok, that makes sense.
I checked the sorting logic now and that part LGTM. Now the only question is which types should be allowed.

libcxx/include/__algorithm/sort.h
160

Could we generalize it to something like sizeof(typename iterator_traits<_RandomAccessIterator>::value_type) <= sizeof(size_t) && is_trivially_copyable_v<typename iterator_traits<_RandomAccessIterator>::value_type>? I'm assuming here that sizeof(size_t) is less than or equal to the word size. This would allow this optimization to be performed on something like std::optional<int>. Could you check if that still yields a performance improvement? std::optional doesn't have a trivial comparison operator, so it might be interesting.

marcogelmi added inline comments.Feb 2 2022, 9:33 AM
libcxx/include/__algorithm/sort.h
160

so I tried testing it on optional<int> and the results are underwhelming (pretty much equivalent on intel skylake and actually slower on ARM - see below).

I think this is because the compiler doesn't generate conditional moves, see https://godbolt.org/z/o6648s6f1

I wonder if there is a way to ensure it's enabled only for types for which conditional moves will be generated...

Results on ARM:

name                            old cpu/op  new cpu/op  delta
BM_Sort_optional_Random_1       4.02ns ± 0%  4.01ns ± 1%   -0.36%   (p=0.000 n=99+98)
BM_Sort_optional_Random_3       5.33ns ± 0%  5.46ns ± 0%   +2.42%   (p=0.000 n=94+93)
BM_Sort_optional_Random_4       5.39ns ± 0%  6.17ns ± 0%  +14.45%   (p=0.000 n=99+99)
BM_Sort_optional_Random_5       6.47ns ± 0%  7.35ns ± 0%  +13.54%   (p=0.000 n=99+98)
BM_Sort_optional_Random_16      10.2ns ± 1%  10.8ns ± 2%   +5.74%  (p=0.000 n=95+100)
philnik accepted this revision as: philnik.Feb 2 2022, 10:45 AM

I don't see a way to force clang to generate the cmovs, so I'm OK with the patch % nit. I'd like @ldionne to have a look too, since I don't know if the want the new functions to be part of the ABI or not. I'd say we don't, but that's not my decision to make.

libcxx/benchmarks/algorithms.bench.cpp
30–32
marcogelmi updated this revision to Diff 405556.Feb 3 2022, 2:35 AM

fix whitespace

marcogelmi marked an inline comment as done.Feb 3 2022, 2:36 AM

hi @ldionne I was wondering if you had a chance to look at this?

ldionne requested changes to this revision.Feb 7 2022, 12:58 PM

Thanks for the patch. So, to summarize, these ternary conditionals generate cmov instead of normal branches when the types involved are arithmetic, right? And that's basically where the improvements come from.

This looks sensible to me, the benchmarks show a fairly consistent improvement for the sizes that we care most about.

libcxx/include/__algorithm/sort.h
128

Also, does the inline make a difference in your benchmarks? If not, I'd drop it, because they are semantically equivalent (templates have inline semantics by default).

This comment applies below in several places too.

201

Instead, would it be possible to put the enable_if_t on __sort5 directly, and then only add one overload of __sort5 for when is_arithmetic is true? This would avoid introducing a new layer of complexity like __sort5_internal. This applies elsewhere too.

This revision now requires changes to proceed.Feb 7 2022, 12:58 PM

thanks for the review!

libcxx/include/__algorithm/sort.h
128

Removing the inline seems to negatively affect the performance for some combinations of tests and architectures. Would it be ok to keep the inline for extra safety?

201

unfortunately the original version of __sortN also returns the number of swaps, while the new one doesn't, so we need both versions

ldionne added inline comments.Feb 10 2022, 7:26 AM
libcxx/include/__algorithm/sort.h
128

Can you share those numbers? IMO that's a LLVM bug, since we like to say that inline doesn't have any impact on the codegen. But yeah, at the end of the day, if it makes a positive difference we should keep it.

201

Right -- that actually highlights the problem I have with the name chosen (__sort5_internal). There's nothing "internal" about that version of __sort5. Instead, we could rename it to __sort5_maybe_branchless? It's not great, but at least it explains what it does.

rename sortN_internal to sortN_maybe_branchless

  • rename sortN_internal to sortN_maybe_branchless
  • use _LIBCPP_HIDE_FROM_ABI
marcogelmi marked an inline comment as done.Feb 10 2022, 9:51 AM
marcogelmi added inline comments.
libcxx/include/__algorithm/sort.h
128

sure, here they are: https://bit.ly/34Q6z3q

libcxx/include/__algorithm/sort.h
134

Either call _VSTD::__cond_swap<_Compare>(...) below, or (less good IMHO) change it up here to take _Compare& instead of _Compare. Otherwise we end up making copies of the predicate — and I would have expected that to fail libcxx/test/libcxx/algorithms/robust_against_copying_comparators.pass.cpp! Could you also find out why that test isn't failing, and add a regression test to it? (I imagine it's just not taking the offending codepath because of the shape of the input data values, so adding a few extra test cases of the correct shape will solve the problem.)

Ditto __partially_sorted_swap.

187–191

Nit: It's more libc++ style to use enable_if_t<>* = nullptr than return type SFINAE; I'd say the advantages are readability and that return type SFINAE greatly increases the length of the mangled name while an extra void* type parameter doesn't so much.
Drive-by s/typename/class/ and whitespace.
Likewise below.

add regression test for comparator copies of ints

fix template issues

additional fixes for missing temmplates

fix c++03 test

marcogelmi marked an inline comment as done.Feb 11 2022, 11:19 AM
marcogelmi added inline comments.
libcxx/include/__algorithm/sort.h
187–191

I'm not entirely sure why, but this seems to be failing for the c++03 build (it works fine for the others):, e.g. https://reviews.llvm.org/harbormaster/unit/view/2520185/

candidate template ignored: substitution failure [with _Compare = std::__debug_less<NonConstArgCmp>, _RandomAccessIterator = int *]: non-type template argument does not refer to any declaration
inline _LIBCPP_HIDE_FROM_ABI void __sort3_maybe_branchless(_RandomAccessIterator __x1, _RandomAccessIterator __x2,

revert small, unintended test change

libcxx/benchmarks/algorithms.bench.cpp
30–32

It might be (long past) time to change this to simply

using Types = std::tuple<
    uint32_t, uint64_t, std::pair<uint32_t, uint64_t>, std::tuple<uint32_t, uint64_t, uint32_t>,
    std::string, float
>;

template<class V>
using Value = std::tuple_element_t<(int)V(), Types>;
libcxx/include/__algorithm/sort.h
152

sorry what line of code is this referring to?

My best guess is that I misread line 153

__sort3_maybe_branchless(_RandomAccessIterator __x1, _RandomAccessIterator __x2, _RandomAccessIterator __x3,

as a function call. :) No action required.

160

FWIW, I think it would be worth pulling out the condition into a named trait, like

template<class _Tp>
using __use_branchless_sort = bool_constant<
  sizeof(_Tp) <= sizeof(void*) && is_scalar<_Tp>::value
>;

All this focus on value_type has recently made me paranoid about vector<bool> and proxy iterators in general. I see your __partially_sorted_swap doing like 13 calls to operator*; are we sure that's still going to be efficient when the _RandomAccessIterator in question is an arbitrarily complicated Ranges thing? (We don't support ranges::sort yet, but it would be unfortunate if we poured all this effort into std::sort and then had to re-evaluate the entire thing for ranges::sort.)

Do you actually have a use-case for arbitrary random-access iterators? Could we limit this to contiguous iterators?

template<class _Iter, class _Tp = typename iterator_traits<_Iter>::value_type>
using __use_branchless_sort = bool_constant<
  __is_cpp17_contiguous_iterator<_Iter> &&
  sizeof(_Tp) <= sizeof(void*) && is_scalar<_Tp>::value
>;

Also note I keep trying to expand arithmetic to scalar. :)

187–191

Yuck, I see that on Godbolt (C++03 is unhappy with giving a non-type template parameter a default value). So I guess the return type SFINAE is fine.

libcxx/test/libcxx/algorithms/robust_against_copying_comparators.pass.cpp
177 ↗(On Diff #407968)

IIUC, what you've done here is just add a call to std::sort with value_type=int, which triggers your new thing, specifically because your new thing is constrained to arithmetic value_types instead of (as I keep trying to drive-by it into) scalar value_types. (Pointer types are scalar but not arithmetic.)

Is this the only change that was needed in order to trigger your comparator-copying codepath?
If so, I'd slightly rather see you just expand your new thing to include scalar types!

I don't really object to the idea of templatizing the whole thing on T and testing with different Ts; but it feels like a lot of churn if the only culprit was arithmetic/scalar, and also I'm skeptical that that was the only culprit here.

I had thought that the problem was more about the shape of the input data — I mean, notice that is_sorted(first, last) is already true going into this call.

In particular, if we now know that there are different codepaths taken for arrays of length 3, 4, 5, then shouldn't we be testing here

(void)std::sort(first, first+3, Less(&copies)); assert(copies == 0);
(void)std::sort(first, first+4, Less(&copies)); assert(copies == 0);
(void)std::sort(first, first+5, Less(&copies)); assert(copies == 0);
(void)std::sort(first, last, Less(&copies)); assert(copies == 0);

?

address style comments, use named trait

marcogelmi marked 5 inline comments as done.Feb 14 2022, 9:40 AM

Thanks for the suggestions.

Regarding is_scalar vs is_arithmetic: expanding to is_scalar would include pointer types (which are likely to have a custom comparator) so we would have to test whether our changes provide similar improvements when you have a complex custom comparator. This is also not trivial to do with the current test setup.

Can we therefore first submit this CL and then look into the feasibility of incorporating those additional types into our setup?

libcxx/benchmarks/algorithms.bench.cpp
30–32

note that I'm using clang-format and it's changing the format slightly. Also, I had to change to a plain enum

libcxx/test/libcxx/algorithms/robust_against_copying_comparators.pass.cpp
177 ↗(On Diff #407968)

yes this is triggered by int specifically because void* is not included by is_arithmetic

the reason why this is triggered even if the shape of the data is of length 10 is because there is a base case (insertion_sort_3) which calls __sort3 once and is invoked when 5 < size <= 30.

Anyway, I've added the test cases you recommended.

marcogelmi marked 2 inline comments as done.Feb 15 2022, 2:06 AM
marcogelmi added inline comments.
libcxx/include/__algorithm/sort.h
127

bool_constant seems to need >= c++14, any suggestion on how to make this work for c++03?

philnik added inline comments.Feb 15 2022, 2:09 AM
libcxx/include/__algorithm/sort.h
127

use _LIBCPP_BOOL_CONSTANT

use _LIBCPP_BOOL_CONSTANT instead of bool_constant

rebase to head

hi @Quuxplusone, @ldionne just pinging this thread: what do you think of the current state of the diff?

FWIW, I have no strong desire to be obstructionist, and I definitely don't think we should abandon this direction, but I have one bad and one good(?) reason I think we should hold this PR in suspended animation for a while longer:

  • Bad: I'm still uncomfortable with the invasiveness of the changes to robust_against_copying_comparators.pass.cpp.
  • Good(?): Very soon now we're going to have to implement ranges::sort, which innovates in a bunch of different directions — we're going to have to deal with sentinels as well as iterators, and we're going to have to replace __c(*x, *y) with invoke(__c, invoke(__p, *x), invoke(__p, *y), and so on. I feel like it would be a good idea to get that stuff out of the way first, and then revisit algorithmic changes like this PR, rather than vice versa. Unfortunately I have no particular estimate of when ranges::sort will get done; @philnik might have a long-range plan to share, but being as we're still down in the foothills of ranges::min_element etc., personally I'd predict it'll be a couple months at least.

Anyway, @ldionne will be the ultimate arbiter (but IIRC he's on vacation this week).

I'll also keep recommending that you replace is_arithmetic with is_scalar, until you actually do it! or explain why not. :)

libcxx/benchmarks/algorithms.bench.cpp
30–32

For my curiosity, why did you have to change to a plain enum? (All else being equal, we should prefer enum class, but if it really technically doesn't work, then I don't imagine it's a big deal.)

FWIW, I have no strong desire to be obstructionist, and I definitely don't think we should abandon this direction, but I have one bad and one good(?) reason I think we should hold this PR in suspended animation for a while longer:

  • Bad: I'm still uncomfortable with the invasiveness of the changes to robust_against_copying_comparators.pass.cpp.
  • Good(?): Very soon now we're going to have to implement ranges::sort, which innovates in a bunch of different directions — we're going to have to deal with sentinels as well as iterators, and we're going to have to replace __c(*x, *y) with invoke(__c, invoke(__p, *x), invoke(__p, *y), and so on. I feel like it would be a good idea to get that stuff out of the way first, and then revisit algorithmic changes like this PR, rather than vice versa. Unfortunately I have no particular estimate of when ranges::sort will get done; @philnik might have a long-range plan to share, but being as we're still down in the foothills of ranges::min_element etc., personally I'd predict it'll be a couple months at least.

Anyway, @ldionne will be the ultimate arbiter (but IIRC he's on vacation this week).

I'll also keep recommending that you replace is_arithmetic with is_scalar, until you actually do it! or explain why not. :)

I think ranges::sort is one of the last algorithms on the list. Unlike the ones I'm working on currently, the most naive implementation isn't the fastest. So it will be much more work, and even the ones currently under review take a long time, since we don't have many copy/paste-able tests currently. Assuming we will get faster I'd guess the earliest I would start to implement sort is in 4-6 months. I wouldn't be surprised if sort alone will be under review for a few months.

marcogelmi marked an inline comment as done.Feb 22 2022, 2:53 AM

FWIW, I have no strong desire to be obstructionist, and I definitely don't think we should abandon this direction, but I have one bad and one good(?) reason I think we should hold this PR in suspended animation for a while longer:

  • Bad: I'm still uncomfortable with the invasiveness of the changes to robust_against_copying_comparators.pass.cpp.
  • Good(?): Very soon now we're going to have to implement ranges::sort, which innovates in a bunch of different directions — we're going to have to deal with sentinels as well as iterators, and we're going to have to replace __c(*x, *y) with invoke(__c, invoke(__p, *x), invoke(__p, *y), and so on. I feel like it would be a good idea to get that stuff out of the way first, and then revisit algorithmic changes like this PR, rather than vice versa. Unfortunately I have no particular estimate of when ranges::sort will get done; @philnik might have a long-range plan to share, but being as we're still down in the foothills of ranges::min_element etc., personally I'd predict it'll be a couple months at least.

Anyway, @ldionne will be the ultimate arbiter (but IIRC he's on vacation this week).

I'll also keep recommending that you replace is_arithmetic with is_scalar, until you actually do it! or explain why not. :)

Regarding is_arithmetic vs is_scalar: expanding to is_scalar would include pointer types (which are likely to have a custom comparator) so we would have to test whether our changes provide similar improvements when you have a complex custom comparator. This is also not trivial to do with the current test setup. Can we therefore first submit this CL and then look into the feasibility of incorporating those additional types into our setup?

libcxx/benchmarks/algorithms.bench.cpp
30–32

this is just because we need to cast it to int to index into the Types tuple and that's more tricky with an enum class.

libcxx/benchmarks/algorithms.bench.cpp
30–32

Ahhh, I see now. There's a pair of parens missing from my original code:

using Value = std::tuple_element_t<(int)V(), Types>;

should have been either

using Value = std::tuple_element_t<(int)V()(), Types>;  // or, better,
using Value = std::tuple_element_t<(int)V::value, Types>;

Use either of those instead, please. (Preferably the latter.)

libcxx/include/__algorithm/sort.h
160

Regarding is_arithmetic vs is_scalar: expanding to is_scalar would include pointer types (which are likely to have a custom comparator) so we would have to test whether our changes provide similar improvements when you have a complex custom comparator.

Hmm. IOW, you're saying that you expect this optimization to perform well on simple predicates like

int is[1000];
Widget *ps[1000];
std::sort(is, is+1000, std::less<>);  // A
std::sort(ps, ps+1000, std::less<>);  // B

but poorly on complicated predicates like

Widget ws[10];
std::sort(is, is+1000, [&](int i, int j) { return ws[i] < ws[j]; });  // C
std::sort(ps, ps+1000, [](auto *pi, auto *pj) { return *pi < *pj; });  // D

That is, you expect that this optimization, applied across the board, would measurably speed up A and B, but measurably slow down C and D. And you expect that "pointer-ness" correlates heavily with "complicated-predicate-ness," so A is relatively much more common than C but B is much less common than D. So, your patch intentionally applies only to A and C, and excludes B and D. Is that all correct?
I do still think it'd be interesting to find out whether this patch slows down C and D.

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 11:10 AM
marcogelmi updated this revision to Diff 413004.Mar 4 2022, 7:25 AM
marcogelmi marked an inline comment as done.

revert to using enum class in tests

marcogelmi marked an inline comment as done.Mar 4 2022, 7:33 AM
marcogelmi added inline comments.
libcxx/include/__algorithm/sort.h
160

You are correct. Our patch applies to A and B and we are pretty certain that our changes will slow down C and D, because our branchless version calls the comparator more times in the average case. This is why we have limited ourselves to A and B. Note that our patch not only applies to ints but also has similar performance improvements for floats. @ldionne it would be great to get your thoughts on the patch as to whether we can submit as is? Thanks in advance.

hi @ldionne just wondering if you had time to review the latest changes? do you think it can be submitted as is? thanks again!

Thanks for the ping! I still have some comments, I'm still trying to make sure I understand all the implications of this change since std::sort is such an important (and non-trivial as we can see) algorithm!

libcxx/benchmarks/algorithms.bench.cpp
30–31
libcxx/include/__algorithm/sort.h
118

Commenting here for lack of better place: With this patch, I think we need to add tests for std::sort for a random access but non-contiguous range, otherwise we may end up testing. And for sorting on non arithmetic types, too. It looks like our std::sort tests might be lacking quite a bit, now that I look at them.

127

I would just go for integral_constant<bool, EXPR>. I actually had to look up _LIBCPP_BOOL_CONSTANT, and IMO it does not pull its weight. I'll upload a patch to remove it and we can discuss its fate there.

154–155

We usually use the following idiom for SFINAE (this is supported in C++03 as a Clang extension):

template <class _Compare, class _RandomAccessIterator, __enable_if_t<
    __use_branchless_sort<_RandomAccessIterator>::value
>* = 0>
inline _LIBCPP_HIDE_FROM_ABI
void __sort3_maybe_branchless(...) { ... }

It has the benefit that the return type is not hidden in the __enable_if_t.

160

Let me just try to state this in the most naive way possible to make sure I'm on the same page as you folks:

Basically, you assume that sorts on arithmetic types over contiguous ranges will not be using complicated comparators, and that you'll save more by avoiding branches and performing more calls to compare than reducing the number of calls to compare but having more branches. However, if we have a contiguous range of arithmetic types but the compare is extremely expensive, this optimization will actually be a pessimization over the naive approach. Is that correct?

If that is correct, then basically in an ideal world, what you'd want would be to enable this optimization only when you know that the cost of the comparator is negligible compared to the cost of branching, since that's what you're reducing with your approach. Of course, that is difficult if not impossible to tell. One approach would be to enable this optimization only when the comparator is something trivial like std::less and friends, but IDK if that would prevent this optimization in too many cases.

Enable branchless sort for simple comparators only. Increase test coverage.

fix test errors for c++03

marcogelmi marked an inline comment as done.Mar 31 2022, 5:01 AM

Thanks for your in-depth feedback!

libcxx/include/__algorithm/sort.h
118

I've added tests for non-contiguous iterators (using a std::deque<int>) and non-arithmetic types (using a std::pair<int, int>). Let me know if there are other tests I'm missing. I've also added more checks using std::is_permutation because the current tests only check for std::is_sorted

154–155

this was mentioned by another reviewer but if I try using this idiom I get failures for c++03:

candidate template ignored: substitution failure [with _Compare = std::__debug_less<NonConstArgCmp>, _RandomAccessIterator = int *]: non-type template argument does not refer to any declaration
inline _LIBCPP_HIDE_FROM_ABI void __sort3_maybe_branchless(_RandomAccessIterator __x1, _RandomAccessIterator __x2,
160

Yes your understanding is correct. After discussing your feedback amongst ourselves, we think that you had a great suggestion to enable our patch for simple comparators.

Note that I've had to enable it for __less<_Tp>& because that seems to be the default when nothing is passed in. Also, the comparator seems to be transformed into a reference using __comp_ref_type so that's why it's templated on the reference type.

ldionne accepted this revision.Apr 6 2022, 10:45 AM

Thanks a lot! I think this is now a no-brainer since we've made this more conservative (it only triggers on a few comparators that we know are going to benefit from it).

Please ship this once you've addressed my comments and the CI is passing. If you need help committing this, please provide Author Name <email@domain> and we can commit it on your behalf. Thanks again for the patch and for your patience+collaboration going through the review!

libcxx/test/std/algorithms/alg.sorting/alg.sort/sort/sort.pass.cpp
52–56 ↗(On Diff #419160)

Instead of specializing the function template, can you instead add an overload?

inline void set_value(std::pair<int, int>& dest, int value)
{
    dest.first = value;
    dest.second = value;
}
74 ↗(On Diff #419160)

Please don't use reserved identifiers in the tests. This should be class Constainer instead of class _Container.

98 ↗(On Diff #419160)

Same comment, overload instead of specialization please :-)

113 ↗(On Diff #419160)

Should be Iter, not _Iter.

We only use _Foo and __foo in the source code of the library itself because it ensures that no users can possibly define a macro with a name that conflicts with our identifiers. Inside our test code, this isn't needed (or desirable since it obscures everything).

137 ↗(On Diff #419160)

I like that we're adding this coverage!

This revision is now accepted and ready to land.Apr 6 2022, 10:45 AM
marcogelmi updated this revision to Diff 421157.Apr 7 2022, 4:22 AM

address review comments on tests style

marcogelmi marked 4 inline comments as done.Apr 7 2022, 7:22 AM

Thanks everyone for your comments and improvements to this patch!

I don't think I have committer rights so if you could commit this on my behalf that would be great. The attribution should be: Marco Gelmi <marcogelmi@deepmind.com>

Thanks again!

m4ph3w added a subscriber: m4ph3w.Jun 15 2023, 5:55 AM
Sdopn added a subscriber: Sdopn.Jul 14 2023, 2:48 AM