Page MenuHomePhabricator

[libc++] Implement `lexicographical_compare_three_way`
ClosedPublic

Authored by avogelsgesang on Aug 8 2022, 6:17 AM.

Details

Summary

The implementation makes use of the freedom added by LWG 3410. We have
two variants of this algorithm:

  • a fast path for random access iterators: This fast path computes the maximum number of loop iterations up-front and does not compare the iterators against their limits on every loop iteration.
  • A basic implementation for all other iterators: This implementation compares the iterators against their limits in every loop iteration. However, it still takes advantage of the freedom added by LWG 3410 to avoid unnecessary additional iterator comparisons, as originally specified by P1614R2.

https://godbolt.org/z/7xbMEen5e shows the benefit of the fast path:
The hot loop generated of lexicographical_compare_three_way3 is
more tight than for lexicographical_compare_three_way1. The added
benchmark illustrates how this leads to a 30% performance improvement on
integer vectors.

Implements part of P1614R2 "The Mothership has Landed"

Fixes LWG 3410 and LWG 3350

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
var-const requested changes to this revision.Nov 30 2022, 1:43 AM

Thanks a lot for the updated benchmarks! I really appreciate the effort you put into this.

All in all, the results look pretty good, but I really hope we can get to the bottom of the difference between int* and random_access_iterator<int*>.

The results for {fast,slow}_path/1 vary from run to run. The slow_path is not faster than the fast_path consistently across runs. I think this is primarily variance

I suspect that this might be more than just variance. I haven't tried plotting this, but from looking at the numbers, it seems that the fast path starts _slower_ for the smallest input of 1, becomes roughly equal in time for a slightly larger input of 4, and becomes faster with larger inputs, with the difference becoming larger and larger as the input grows. So if we were to plot this (with the x axis being input size and y axis being time), it seems like we would get two curves where the fast path curve starts out slightly above the slow path curve, but then they intersect almost immediately and from that point the slow curve goes up at a sharper angle, so to say, than the fast curve. I could also imagine how the implementation for the fast path could end up doing slightly more work for tiny inputs. So in short, I think we might be seeing a real issue where the optimization is actually slightly less efficient for very small inputs.

I doubt, however, that this is an issue worth fixing. Adding a runtime check for small inputs would introduce branching that is likely to do way more harm than good, and optimizing for large inputs is a lot more important than for small ones. In short, this seems like a good (and probably unavoidable) tradeoff.

three_way<random_access_iterator<int*>>/1048576 is less efficient than three_way<int*>/1048576. As shown below, the assembly code is exactly identical between the two, though. So this seems to be due to some micro-architectural shenanigans (maybe code alignment?)
[...]
I am not sure how to further pinpoint this difference between int* and random_access_iterator<int*>.
Either way, I think we can say for sure: The assembly for the fast path is more efficient than the assembly for the slow path.
There seems to be some performance variability, though, due to some aspects which I don't understand fully.

Here, too, I suspect we're seeing a real thing and not just variance. I say that because the difference tends to stay pretty consistent, with random_access_iterator being ~40% (30-50%) slower. This seems to point to the issue being real.

I don't immediately see how the same assembly could produce consistently different timings, so I'd start with the hypothesis that the difference happens somewhere else. On the other hand, as the input grows, any factors such as inlining or not inlining the function or copying the iterators should be completely drowned out by the body of the algorithm. But perhaps looking at a larger piece of generated assembly would help pinpoint the issue. I'll try to find the time this week to see if I can reproduce the numbers on my side.

I would expect the results for int* and random_access_iterator<int*> to be within the margin of error in an optimized build. If they differ, it might imply we accidentally do something inefficient with iterators, and it would be great to get to the bottom of this because it's such a common use case. I certainly don't want to block this patch forever on this, though. Basically, I think we should dig more into the issue, but if we run out of ideas, I'd be okay to ship it as is and deal with it later.

libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp
74

Nit: passing this is unnecessary because it's the default value, right?

74

Nit: move?

libcxx/include/__algorithm/lexicographical_compare_three_way.h
49

FWIW, I don't have a strong preference here, but for me, one of the most important and informative aspects of auto is "guaranteed no conversion". This is as relevant for the author as it is for the reader. While it's true that auto can be misused by a writer to avoid thinking about which types are returned (IMO that's a bigger time "saver" than the literal typing which can often be autocompleted), I think it has legitimate uses where it makes the intention clearer.

67

Would it make sense to pass __comp by reference? It could, in theory, be e.g. a lambda that captures a lot of state. In fact, we generally commit to avoid copying user-provided functors. However, see also the other comment about __comp_ref_type (which would make this a reference but also have an additional effect in the debug mode).

92

As a libc++-specific extension, we mark the return value of lexicographical_compare with _LIBCPP_NODISCARD_EXT. We should do the same here (and add a check to test/libcxx/diagnostics/nodiscard_extensions.verify.cpp).

101

@JohenEGP I presume you referring to the __comp_ref_type, right? It's a great suggestion.

@avogelsgesang For context, there is an existing pattern in algorithms where the internal implementation of the function wraps the comparator in a typedef that is defined differently in debug and non-debug modes (see include/__algorithm/comp_ref_type.h):

#ifdef _LIBCPP_ENABLE_DEBUG_MODE
template <class _Comp>
using __comp_ref_type = __debug_less<_Comp>;
#else
template <class _Comp>
using __comp_ref_type = _Comp&;
#endif

So in non-debug modes, this resolves to simply a reference. In debug mode, however, it creates a temporary of a helper type __debug_less that additionally checks that the comparator in fact does induce a strict weak order.

I think we want to continue using this pattern going forward (e.g. lexicographical_compare uses it). The only issue is that the existing __debug_less only supports comparators returning a boolean. You would probably need to create a separate __three_way_comp_ref_type typedef and a separate __debug_three_way_comp helper struct (names are subject to change).

Please let me know if you'd like any additional context on this (this can be kinda confusing). Many existing algorithms can be used as examples of this pattern; a caveat is that more recent code uses a (very) slightly different approach:

  • older code tends to call have _Compare as a template parameter of the internal function, declare the parameter as _Compare __comp, and have the caller specify the template argument explicitly, like std::__foo<_comp_ref_type<_Comp>>(__first, __last, __comp). See e.g. lexicographical_compare;
  • in newer code, the pattern was to declare the parameter as a forwarding reference _Compare&& __comp and have the caller do a static_cast, like std::__foo(__first, __last, static_cast<__comp_ref_type(__comp)> (see e.g. inplace_merge).

If you prefer to, I'm fine with doing this in a follow up since this patch has been open for a while now.

110

Move the iterators? (in the other function as well)

You might additionally static_assert that the given iterators are copyable, to prevent users from accidentally passing move-only iterators (that our implementation would happen to accept due to the optimization but which isn't guaranteed by the Standard). A few of the existing algorithms do that, see e.g. upper_bound.

libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way.pass.cpp
4

We have a few tests that check for a certain behavior across a wide range of algorithms; those have to be updated to include lexicographical_compare_three_way now:

  • test/libcxx/diagnostics/nodiscard_extensions.verify.cpp (potentially);
  • test/libcxx/algorithms/robust_against_copying_comparators.pass.cpp;
  • test/libcxx/algorithms/robust_against_cpp20_hostile_iterators.compile.pass.cpp;
  • test/std/algorithms/robust_re_difference_type.compile.pass.cpp;
  • test/std/algorithms/robust_against_adl.compile.pass.cpp.

Please let me know if you need any help with those.

30

Nit: s/auto/decltype(auto)/. That way we always check the _exact_ returned type, without potentially stripping away references. While it's very unlikely that this algorithm could plausibly return a reference, I think the main advantage of using decltype(auto) is just not having to think about it and have it guaranteed to always catch the exact type.

libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp
32

Question: what is the purpose of comparing just the last digit instead of just the two given numbers? It makes the implementation slightly more complicated (and the variable names longer), but all the inputs are single-digit anyway.

35

Is this branch ever taken?

58

Nit: I'd put expected last so that all the algorithm inputs are next to each other.

68

Optional: create a local constant with a shorter name for the comparator to cut down on the boilerplate a little?

154

Nit: s/both/the/.

libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.verify.cpp
41

Nit: please also check that passing RandomAccessIteratorBadDifferenceType iterators for the second range fails as well (i.e., calling std::lexicographical_compare_three_way(c, d, a, b, std::compare_three_way())).

libcxx/test/support/almost_satisfies_types.h
425

Ultranit: there's one extraneous blank line.

libcxx/test/support/test_comparisons.h
263

Is this branch ever taken?

This revision now requires changes to proceed.Nov 30 2022, 1:43 AM
philnik added inline comments.Nov 30 2022, 4:13 AM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
101

IMO we should make the debug stuff a separate effort. AFAIK we don't test it anywhere and because of that I'm pretty sure it regressed in some places. Instead of adding another untested branch, I'd suggest creating a patch that adds tests to all algorithms and, depending on scope, update the algorithms in follow-up patches or as part of the test-patch.

var-const added inline comments.Nov 30 2022, 11:34 AM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
101

I would agree to that if someone volunteers to do that follow-up in the near future -- in fact, it would be great. Would you or @avogelsgesang be willing to take this on?

avogelsgesang marked 9 inline comments as done.

address varconst's commments

avogelsgesang added inline comments.Nov 30 2022, 7:05 PM
libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp
74

Nit: passing this is unnecessary because it's the default value, right?

correct. removed

Nit: move?

move what? the input iterators? I don't think that is possible without a "use-after-move". I need them for each iteration of the for loop.

libcxx/include/__algorithm/lexicographical_compare_three_way.h
67

passing as _Cmp& now

101

added the __three_way_comp_ref_type as requested

110

Is adding the additional calls to std::move standard compliant? https://eel.is/c++draft/alg.three.way defines lexicographical_compare_three_way as not moving the iterators

libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way.pass.cpp
4

I added lexicographical_compare_three_way to all of them, except for robust_re_difference_type.compile.pass.cpp.
Afaict this test case relies on undefined behavior. The difference_type must be an integer type (https://eel.is/c++draft/iterator.iterators#2.2), but this test case violates that requirement.
As requested by @philnik I added a static_assert to lexicographical_compare_three_way which guards against non-integer different_types. Hence, adding lexicographical_compare_three_way triggers this static_assert.
I see two ways forward: Not adding lexicographical_compare_three_way to robust_re_difference_type.compile.pass.cpp or removing the static_assert. Which one do you prefer?

libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way_comp.pass.cpp
32

the idea is to test a "non-default" comparator. If I just used normal integer comparisons here, the test cases wouldn't catch it if lexicographical_compare_three_way just ignored the comparator and used the standard comparator instead

35

see "Check for a partial_ordering::unordered result" inside test_comparison_categories

libcxx/test/support/test_comparisons.h
263

See "Check for a partial_ordering::unordered result" in lexicographical_compare_three_way.pass.cpp

huixie90 added inline comments.Dec 3 2022, 9:55 AM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
46

I would Not use common_type_t as it might not exist a common_type as common_type does not work for types that need implicit conversions.
The relevant spec here:
https://eel.is/c++draft/iterator.concept.winc#6.sentence-1
In short, integral types are only "explicitly" convertible to each other, not implicitly.
There are related discussion here: https://github.com/ericniebler/range-v3/issues/1745

I think we want to find which difference_type is wider (or we need another type to cover both) and then we need to static_cast to that type

avogelsgesang added inline comments.Dec 5 2022, 4:47 AM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
46

I cannot quite follow. Which part of common_type_t does not work for this use case?

I think we want to find which difference_type is wider (or we need another type to cover both)

My understanding is that common_type_t does exactly that. Note that https://eel.is/c++draft/iterator.iterators#2.2 guarantees that difference_type is a signed integral. As such, there can be no mismatches on signedness, and common_type_t should always give the wider `difference_type

then we need to static_cast to that type

Why do we need static_casts instead of relying on implicit conversions? https://eel.is/c++draft/iterator.concept.winc#6.sentence-1 states that casting to a wider integer type is an implicit conversion.

@avogelsgesang I tried out the benchmarks, and it looks like you were probably accidentally running the benchmarks on the debug version of the build (unfortunately, it's a very easy mistake to make since it's the default). Using the debug build, I get timings very similar to what you saw earlier:

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
BM_lexicographical_compare_three_way<IntPtr>/1                                     25.7 ns         25.7 ns     27179505
BM_lexicographical_compare_three_way<IntPtr>/4                                     64.1 ns         64.0 ns     10934595
BM_lexicographical_compare_three_way<IntPtr>/16                                     316 ns          249 ns      2924673
BM_lexicographical_compare_three_way<IntPtr>/64                                     880 ns          864 ns       820595
BM_lexicographical_compare_three_way<IntPtr>/256                                   3328 ns         3324 ns       211011
BM_lexicographical_compare_three_way<IntPtr>/1024                                 13151 ns        13145 ns        53256
BM_lexicographical_compare_three_way<IntPtr>/4096                                 52541 ns        52531 ns        13321
BM_lexicographical_compare_three_way<IntPtr>/16384                               210420 ns       210398 ns         3342
BM_lexicographical_compare_three_way<IntPtr>/65536                               847864 ns       847372 ns          825
BM_lexicographical_compare_three_way<IntPtr>/262144                             3394918 ns      3393657 ns          207
BM_lexicographical_compare_three_way<IntPtr>/1048576                           13569716 ns     13563941 ns           51
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1             37.7 ns         37.7 ns     18632581
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4             85.6 ns         85.6 ns      8158698
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16             286 ns          285 ns      2474206
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/64            1036 ns         1035 ns       676002
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/256           4018 ns         4017 ns       167247
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1024         16138 ns        15993 ns        43945
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4096         63604 ns        63602 ns        10994
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16384       255664 ns       255574 ns         2749
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/65536      1025497 ns      1025498 ns          683
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/262144     4113879 ns      4111688 ns          170
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1048576   16455950 ns     16454558 ns           43
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1               29.5 ns         29.5 ns     23748940
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4               99.7 ns         99.7 ns      6798031
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16               300 ns          300 ns      2339940
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/64              1116 ns         1115 ns       632037
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/256             4347 ns         4344 ns       161197
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1024           17271 ns        17270 ns        40467
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4096           68959 ns        68958 ns        10135
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16384         276437 ns       276348 ns         2533
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/65536        1111640 ns      1111475 ns          629
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/262144       4451482 ns      4450650 ns          157
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1048576     17807151 ns     17807154 ns           39

With a similar observation where random_access_iterator<int*> is consistently slower than int* (which makes sense in an unoptimized build).

Rerunning the build with -DCMAKE_BUILD_TYPE=Release, however, makes the difference go away -- now the timings are within the margin of error (not to mention many times faster):

BM_lexicographical_compare_three_way<IntPtr>/1                                    0.447 ns        0.444 ns   1000000000
BM_lexicographical_compare_three_way<IntPtr>/4                                     1.99 ns         1.96 ns    361161702
BM_lexicographical_compare_three_way<IntPtr>/16                                    5.86 ns         5.81 ns    121777252
BM_lexicographical_compare_three_way<IntPtr>/64                                    21.6 ns         21.4 ns     32736439
BM_lexicographical_compare_three_way<IntPtr>/256                                   97.6 ns         96.2 ns      7379296
BM_lexicographical_compare_three_way<IntPtr>/1024                                   346 ns          343 ns      2028815
BM_lexicographical_compare_three_way<IntPtr>/4096                                  1352 ns         1339 ns       515878
BM_lexicographical_compare_three_way<IntPtr>/16384                                 5356 ns         5312 ns       132878
BM_lexicographical_compare_three_way<IntPtr>/65536                                21379 ns        21179 ns        32955
BM_lexicographical_compare_three_way<IntPtr>/262144                               85883 ns        84892 ns         8327
BM_lexicographical_compare_three_way<IntPtr>/1048576                             359011 ns       355722 ns         1956
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1            0.448 ns        0.444 ns   1000000000
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4             1.96 ns         1.94 ns    359261768
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16            5.89 ns         5.83 ns    121641817
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/64            21.5 ns         21.3 ns     32731847
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/256           97.2 ns         96.2 ns      7337449
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1024           346 ns          344 ns      2040876
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4096          1347 ns         1336 ns       516457
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16384         5378 ns         5329 ns       131757
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/65536        21296 ns        21168 ns        32939
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/262144       85516 ns        84770 ns         8164
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1048576     362302 ns       358839 ns         1979
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1              0.576 ns        0.568 ns   1000000000
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4               2.27 ns         2.26 ns    309139488
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16              7.85 ns         7.80 ns     87860227
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/64              37.9 ns         37.7 ns     18627871
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/256              131 ns          130 ns      5375353
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1024             503 ns          501 ns      1386276
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4096            1995 ns         1986 ns       352332
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16384           7947 ns         7919 ns        88986
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/65536          31716 ns        31642 ns        22141
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/262144        127179 ns       126551 ns         5454
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1048576       517207 ns       514653 ns         1360

I think that explains it -- we were assuming we're seeing optimized results which wasn't actually the case. It also means the code is doing the right thing, so there's no actual issue, which is great!

var-const added a comment.EditedDec 8 2022, 10:02 PM

Edit: never mind my original comment, I got incorrect results. Update below:

@avogelsgesang Thanks for pointing out you were using RelWithDebInfo. Running in that mode, I get similar results to Release, with no divergence between int* and random_access_iterator<int*>:

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
BM_lexicographical_compare_three_way<IntPtr>/1                                    0.590 ns        0.568 ns   1000000000
BM_lexicographical_compare_three_way<IntPtr>/4                                     2.24 ns         2.24 ns    312197559
BM_lexicographical_compare_three_way<IntPtr>/16                                    6.05 ns         6.05 ns    116194144
BM_lexicographical_compare_three_way<IntPtr>/64                                    21.4 ns         21.4 ns     32794105
BM_lexicographical_compare_three_way<IntPtr>/256                                   94.8 ns         94.8 ns      7386226
BM_lexicographical_compare_three_way<IntPtr>/1024                                   340 ns          340 ns      2051468
BM_lexicographical_compare_three_way<IntPtr>/4096                                  1319 ns         1319 ns       530854
BM_lexicographical_compare_three_way<IntPtr>/16384                                 5332 ns         5284 ns       132817
BM_lexicographical_compare_three_way<IntPtr>/65536                                20946 ns        20945 ns        33471
BM_lexicographical_compare_three_way<IntPtr>/262144                               83706 ns        83701 ns         8370
BM_lexicographical_compare_three_way<IntPtr>/1048576                             348663 ns       348648 ns         1981
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1            0.557 ns        0.557 ns   1000000000
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4             2.20 ns         2.20 ns    313624287
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16            6.04 ns         6.04 ns    117445723
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/64            27.8 ns         27.8 ns     25139524
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/256           95.5 ns         94.7 ns      7396059
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1024           339 ns          339 ns      2072201
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/4096          1319 ns         1319 ns       530504
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/16384         5245 ns         5245 ns       133212
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/65536        20917 ns        20917 ns        33457
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/262144       83482 ns        83481 ns         8363
BM_lexicographical_compare_three_way<random_access_iterator<IntPtr>>/1048576     346843 ns       346843 ns         2018
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1              0.627 ns        0.627 ns   1000000000
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4               2.92 ns         2.92 ns    239622629
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16              12.4 ns         12.4 ns     55910543
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/64              50.1 ns         50.1 ns     13777900
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/256              206 ns          206 ns      3390586
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1024             807 ns          807 ns       861390
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/4096            3286 ns         3286 ns       214230
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/16384          12841 ns        12841 ns        54509
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/65536          52191 ns        51713 ns        13388
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/262144        204980 ns       204978 ns         3411
BM_lexicographical_compare_three_way<cpp17_input_iterator<IntPtr>>/1048576       824384 ns       824375 ns          847

So my original guess about Debug was probably incorrect (though it did reproduce the relative difference in numbers pretty well). I'm not sure what happened, but since I'm seeing the expected results locally, I think everything's good with the patch, and it's good to proceed.

var-const added inline comments.Dec 8 2022, 10:46 PM
libcxx/benchmarks/lexicographical_compare_three_way.bench.cpp
29

Hmm, I think the slow/fast_path functions still need the comparator passed explicitly.

74

Right, please disregard (I might have been looking at a previous iteration of this code where iterators were initialized inside the loop).

libcxx/include/__algorithm/lexicographical_compare_three_way.h
110

I think we are allowed to do that (as long as the difference is not observable), and it is done in a few places. I can't think of a valid way a user could observe the difference -- since we are allowed to copy or move the iterators in the implementation, they can't rely on move constructor not being called or a copy constructor only being called once.

libcxx/include/__algorithm/three_way_comp_ref_type.h
10

You might need to run something like ninja -C build libcxx-generate-files to take the new header into account.

28

Functions in this file should be marked _LIBCPP_HIDE_FROM_ABI.

31

Hmm, looks like the return type can be just auto? (if I'm reading this correctly, it can never return a reference)

38

Note: _LIBCPP_INLINE_VISIBILITY was used in old code, in new code we use _LIBCPP_HIDE_FROM_ABI for this (they do the same thing, though).

40

Question: do we need this requires clause? We only use the class in our internal code, so it seems like we shouldn't need this. If it's to validate the comparator given by the user, then it shouldn't only be done in the debug mode.

47

Hmm, shouldn't this be == __expected? IIUC, __expected is already "reversed".

54

Since this class's definition is only available in C++20 and above, this can be just constexpr (throughout the file).

libcxx/test/std/algorithms/alg.sorting/alg.three.way/lexicographical_compare_three_way.pass.cpp
4

Hmm, it's a great question. I'm not sure why this wasn't just static_asserted before. I'll check -- in the meantime, I think it's best to leave the static_assert in place and add a comment to the test file to explain that lexicographical_compare_three_way is omitted intentionally.

libcxx/test/std/algorithms/robust_against_adl.compile.pass.cpp
109

This looks like it was commented out accidentally?

112

Can this be uncommented?

avogelsgesang marked 18 inline comments as done.

Adress a couple of varconst's comments. Not completely done, yet. Want to get feedback from the CI

libcxx/include/__algorithm/three_way_comp_ref_type.h
10

That seems like a no-op to me. At least, git does not mark any files as changed after running this

ldionne added inline comments.Dec 14 2022, 11:20 AM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
44

@avogelsgesang @philnik @var-const

Is there a reason why we're testing for this *here* specifically? Isn't this a general property that we require from iterators?

Also, why don't we check that the difference type is signed? That's what https://eel.is/c++draft/iterator.iterators#2.2 says.

FWIW, I believe it would be preferable to add this assertion in a more principled way across the library, and also (especially) to do it in a separate patch. We probably need to add a temporary escape hatch for this one cause I would assume that it may break a lot of users.

philnik added inline comments.Dec 14 2022, 11:28 AM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
44

@avogelsgesang @philnik @var-const

Is there a reason why we're testing for this *here* specifically? Isn't this a general property that we require from iterators?

We're adding it here because it's a new algorithm, so we can't break people.

Also, why don't we check that the difference type is signed? That's what https://eel.is/c++draft/iterator.iterators#2.2 says.

I don't remember why we only check that it's integral; I assume it was just an oversight.

FWIW, I believe it would be preferable to add this assertion in a more principled way across the library, and also (especially) to do it in a separate patch. We probably need to add a temporary escape hatch for this one cause I would assume that it may break a lot of users.

I agree that it would make sense to add it to all the algorithms, but I don't think it makes a lot of sense to add an escape hatch to a new algorithm.

ldionne added inline comments.Dec 14 2022, 4:09 PM
libcxx/include/__algorithm/lexicographical_compare_three_way.h
44

Good point about the new algorithm. Let's add it here unconditionally. And then we can pursue lifting this into something like static_assert(__iterator_requirements_whatever<_It>) that we can perhaps sprinkle in a few places (with a escape hatch for existing algorithms).

Basically I'd like to avoid this being just a one-off. I think this assert is a good idea and would like to expand its use.

Let's address the signed integral issue before we ship, though.

avogelsgesang marked 6 inline comments as done.
  • static_assert for signed integral
  • add missing includes
ldionne accepted this revision.Dec 20 2022, 2:59 PM

LGTM w/ green CI and current review comments addressed. Some of us will be out for the holidays and I don't want this patch to be blocked from making progress artificially since it seems like everything has been addressed or at least discussed.

Is this blocked on anything? It would be awesome to finish it up and merge it in time for LLVM 17. I think this had pretty much all the approvals it needed to go ahead and only a few review comments had to be addressed.

libcxx/docs/Status/Cxx20Issues.csv
265–266
libcxx/docs/Status/Cxx2bIssues.csv
66
avogelsgesang marked 3 inline comments as done.Feb 12 2023, 11:02 AM

Is this blocked on anything?

I am unfortunately still running into linking issues with libc++ (latest update here) which means I am unable to run any tests locally. So far, my strategy was to give at a bit of time since maybe someone else with more experience than me might be running into the same problem and fix it for me. I just rebased on latest main, but the problems still persist. I guess, I will have to dig deeper on what is broken there.

Rebase on main; update status pages: 16.0 -> 17.0

Add test for __debug_three_way_comp

avogelsgesang added inline comments.Feb 12 2023, 1:02 PM
libcxx/include/__algorithm/three_way_comp_ref_type.h
40

The idea is: For comparators which can be called as cmp(Type1, Type2) but not as cmp(Type2, Type1), we want to skip this check, and fallback to the other __do_compare_assert further down.

I copied this from __algorithm/comp_ref_type.h, which had

decltype((void)std::declval<_Compare&>()(
    std::declval<_LHS &>(), std::declval<_RHS &>()))

to check whether the parameters can be swapped

fix clang-tidy by replacing declval with std::declval

Landing despite red CI, because:

  • clang-format is red because it is complaining about incorrectly formatting in a pre-existing file
  • "Apple back-deployment macosx-10.15" is failing due to some unrelated infrastructure error
  • clang-cl failed due to std/thread/thread.mutex/thread.mutex.requirements/thread.timedmutex.requirements/thread.timedmutex.recursive/try_lock_for.pass.cpp which is unrelated to this commit

Build error from Apple back-deployment test:

+ tar -xz --strip-components=1 -C /Users/libcxx-buildkite-agent/libcxx.buildkite-agent/builds/y10-8-macminivault-com/llvm-project/libcxx-ci/build/apple-system-backdeployment-10.15/macos-roots
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   934  100   934    0     0     31      0  0:00:30  0:00:30 --:--:--   246
tar: Error opening archive: Unrecognized archive format
This revision was not accepted when it landed; it landed in state Needs Review.Feb 12 2023, 2:51 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
h-vetinari added inline comments.
libcxx/docs/Status/Cxx20Issues.csv
264–265

This line is now duplicated.

libcxx/test/std/algorithms/robust_re_difference_type.compile.pass.cpp
148–153

Is it intentional that these tests are commented out? If so, the comment doesn't really elucidate why, or what would be necessary to enable them.

avogelsgesang added inline comments.Feb 13 2023, 12:48 AM
libcxx/test/std/algorithms/robust_re_difference_type.compile.pass.cpp
148–153

Yes, commenting them out was intentional, see discussion in https://reviews.llvm.org/D131395#inline-1342123

I hoped that the comment

lexicographical_compare_three_way static_asserts that the difference type is an integer, as required by https://eel.is/c++draft/iterator.iterators#2.2

would explain why lexicographical_compare_three_way would reject the difference_type used in this test here, but now I realize that the comment is missing the *signed* integer requirement...

On 2nd thought: maybe it would have been sufficient to use PickyIterator<void**, unsigned long>(a); instead of PickyIterator<void**, long>(a); ...

avogelsgesang marked an inline comment as done.Feb 13 2023, 4:34 PM
avogelsgesang added inline comments.
libcxx/docs/Status/Cxx20Issues.csv
264–265

Thanks! Fixed :)