Page MenuHomePhabricator

[libc++][ranges]Refactor `copy{,_backward}` and `move{,_backward}`
ClosedPublic

Authored by var-const on Jul 28 2022, 2:57 AM.

Details

Summary

Instead of using reverse_iterator, share the optimization between the 4 algorithms. The key observation here that memmove applies to both copy and move identically, and to their _backward versions very similarly. All algorithms now follow the same pattern along the lines of:

if constexpr (can_memmove<InIter, OutIter>) {
  memmove(first, last, out);
} else {
  naive_implementation(first, last, out);
}

A follow-up will delete unconstrained_reverse_iterator.

This patch removes duplication and divergence between std::copy, std::move and std::move_backward. It also improves testing:

  • the test for whether the optimization is used only applied to std::copy and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
  • there were no tests to make sure the optimization is not used when the effect would be visible.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
var-const updated this revision to Diff 456895.Aug 31 2022, 1:53 AM
var-const marked 21 inline comments as done.

Address feedback

libcxx/include/__algorithm/copy.h
34

Restored _Sent. I didn't think this qualifies as a range algorithm, but now that you mention it, I presume it does (I still don't think having two different name conventions is a good idea).

34–35

Yes, the idea is that pointers are more specialized. I actually had mutually-exclusive constraints in an earlier revision, but relying on pointers seems to provide a nice simplification.

51

I'd rather keep everything related to the memmove optimization in one place. One of the goals of this patch was to make these details leak out as little as possible into the actual algorithms.

libcxx/include/__algorithm/copy_backward.h
38

I think @huixie90 pointed out previously that make_pair creates copies of the arguments in C++03 mode.

58–67

Thanks for noticing! I started working on this patch a while ago, this is probably a rudiment from that time.

libcxx/include/__algorithm/copy_move_common.h
35

The original form of this code was (with slight NFC reformatting):

if (__libcpp_is_constant_evaluated()
#ifndef _LIBCPP_COMPILER_GCC
      && !is_trivially_copyable<_InValueT>::value
#endif
     ) {
    return std::__copy_impl<_InValueT*, _InValueT*, _OutValueT*>(__first, __last, __result);
  }
  const size_t __n = static_cast<size_t>(__last - __first);
  if (__n > 0)
    ::__builtin_memmove(__result, __first, __n * sizeof(_OutValueT));
  return std::make_pair(__first + __n, __result + __n);

Unless I'm missing something obvious, __builtin_memmove was never called during constant evaluation.

110–119

Probably. What would be the advantage of using _Lazy here?

116

This isn't necessarily a sentinel in the C++20-concept sense. This can be instantiated with e.g. a pair of input iterators that aren't copyable.

libcxx/include/__algorithm/move.h
32

I think this is a great point, thanks. I'm inclined to leave it as is for this patch because it's a preexistent thing, but I will address it in a follow-up.

libcxx/include/__algorithm/move_backward.h
42

This is preexisting and I'd rather not change it in this patch.

libcxx/include/__algorithm/unwrap_iter.h
36 ↗(On Diff #456586)

@philnik I find that logically, *re*wrapping has to go after unwrapping, since *re* implies adding something that was previously removed. I originally had more changes to this file, but those went away eventually. I reverted the change from this patch and will submit it separately.

libcxx/include/__algorithm/unwrap_range.h
34 ↗(On Diff #456586)

Hmm, I remember adding this to avoid a compilation error, but it looks like that error no longer applies. Removed.

var-const marked an inline comment as done.Aug 31 2022, 1:53 AM
philnik added inline comments.Aug 31 2022, 5:06 AM
libcxx/include/__algorithm/copy_backward.h
38

Then we should just remove the #ifndef _LIBCPP_CXX03_LANG, since we require rvalue support in C++03. See D133013.

58–67

Ah OK, so it was just a merge conflict probably.

libcxx/include/__algorithm/copy_move_common.h
35

To me it looks like it was called when the type is_trivially_copyable.

116

C++17 input iterators are always copyable and input_iterators can't be their own sentinel if they aren't copyable .

var-const updated this revision to Diff 457753.Fri, Sep 2, 7:15 PM
var-const marked 4 inline comments as done.

Address feedback, fix CI.

libcxx/include/__algorithm/copy.h
59

Now because of using __builtin_memmove, it's necessary to get the value_type of the iterator, so _AlgPolicy is now necessary for copy as well.

libcxx/include/__algorithm/copy_backward.h
38

Thanks for doing the change!

libcxx/include/__algorithm/copy_move_common.h
35

Okay, I restored using __builtin_memmove during constant evaluation, with the necessary additional conditions. I also added a test to make sure the is_trivially_copyable condition is honored.

I'm still not sure if it's worth the extra complexity (although now the complexity is all in one place, making this a little more manageable). @ldionne What do you think? I'm somewhat on the fence and can be convinced either way.

ldionne added inline comments.Fri, Sep 9, 6:56 AM
libcxx/include/__algorithm/copy_move_common.h
35

Yeah, I think restoring the __builtin_memmove behavior is probably worth it given the surprising (to me) compile-time speedup.

55–88

IMO those should be defined in copy.h, move.h, copy_backward.h and move_backward.h instead.

libcxx/include/__algorithm/move.h
32

Is this really something we should fix? Aren't we allowed to bypass the iterator's iter_move?

philnik accepted this revision.Sun, Sep 11, 1:29 AM

LGTM with comments addressed and a green CI.

libcxx/include/__algorithm/copy_backward.h
62

Please static_assert that the iterators are copyable.

libcxx/include/__algorithm/copy_move_common.h
59

It's a weird quirk of clang-format right now and I don't know exactly why it does this, but these spaces can be removed and clang-format won't add them again. I think it's because of SpacesInAngles: Keep, but that shouldn't mean they are kept after a an opening one. Anyways, if you feel like removing these it would be nice, but not a must from my side.

libcxx/include/__algorithm/move_backward.h
66–67

Please also static_assert here.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_nontrivial.pass.cpp
26

Maybe use a requires since the test requires C++20 anyways?

107

This is technically still ill-formed, since the return type has to be element_type*, but I guess it's OK for us to do it in this case.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
24

To make it less prone to failing when we change includes, could you move the <type_traits> include down and static_assert the trivially_copyable later? It should still catch a problem before anything unpredictable happens. AFAICT you don't actually need the <cstring> include. <cstddef> should suffice for size_t.

51

Also requires here.

69

What is this Foo for? Shouldn't the compiler be able to deduce the correct type?

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp
9–11 ↗(On Diff #457753)

Why is this required? Did it hit the constexpr limit without the __builtin_memmove? In that case, please remove the bumped limit again.

This revision is now accepted and ready to land.Sun, Sep 11, 1:29 AM
philnik requested changes to this revision.Sun, Sep 11, 4:00 PM

Sorry, I know I approved it before, but I just noticed that this patch still regresses on some optimizations (in particular optimizing reverse_iterator<contiguous-iterator>), so maybe I'm missing more and I'm also questioning the usefulness of the changes more and more, so I'd like some clarification. What exactly is the goal of the refactoring here? How does the refactored code do the job better than forwarding copy_backward, move and move_backward to copy? The answers to these questions would probably also be good to have in the commit message. I first thought that his patch removed some of the complication in copy.h, but the longer I think about it it the more it looks to me like it's just shifting the complexity around.

This revision now requires changes to proceed.Sun, Sep 11, 4:00 PM
var-const edited the summary of this revision. (Show Details)Tue, Sep 20, 4:59 PM
var-const edited the summary of this revision. (Show Details)
var-const updated this revision to Diff 461776.Tue, Sep 20, 7:13 PM
var-const marked 9 inline comments as done.

Feedback + rebase

Sorry, I know I approved it before, but I just noticed that this patch still regresses on some optimizations (in particular optimizing reverse_iterator<contiguous-iterator>), so maybe I'm missing more and I'm also questioning the usefulness of the changes more and more, so I'd like some clarification. What exactly is the goal of the refactoring here? How does the refactored code do the job better than forwarding copy_backward, move and move_backward to copy? The answers to these questions would probably also be good to have in the commit message. I first thought that his patch removed some of the complication in copy.h, but the longer I think about it it the more it looks to me like it's just shifting the complexity around.

The main motivation is to reduce complexity and remove code duplication. Majority of the remaining complexity is due to having to support C++03. The ideal state I envision is essentially something like

if constexpr (__can_memmove<_InIter, _OutIter>) {
  return __memmove(__first, __last, __out);
} else {
  return __copy_loop(__first, __last, __out);
}

in every algorithm (as a side note, I really hope we get Clang to support if constexpr in C++03 mode at some point). This patch brings this as close to that state as I could within the limitations of C++03.

As for specific improvements:

  • the optimization was fully duplicated between std::copy and std::move. Moreover, std::move_backward had a divergent, less thorough implementation of the optimization. Now the duplication is removed and everything related to this optimization is encapsulated in one place.
  • all 4 algorithms now have symmetrical, self-contained implementations which makes them easier to read and debug.
  • reverse_iterator is no longer used in the implementation, which makes it significantly easier to understand and removes artificial types from backtraces. Using reverse_iterator has lead to at least one serious bug already.
  • unconstrainted_reverse_iterator can now be removed (which I intend to do in a follow-up), removing a lot of code duplication and complexity.
  • the test for whether the optimization is used only applied to std::copy and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
  • there were no tests to make sure the optimization is not used when the effect would be visible.

Regarding the reverse_iterator optimization you mention, I think it was solution for a problem that the previous approach introduced and that is no longer applicable. The optimization was very limited in that it would only apply to the case where both the input and the output iterators were reverse iterators. I see no inherent reason why users would reverse both the input and the output sequence rather than just one of them, outside of the artificial case where our implementation itself would do the wrapping. If this optimization were to be restored, it would have to apply equally to input and output iterators. I don’t think, however, it would be pulling its weight.

libcxx/include/__algorithm/move.h
32

Hmm, inlining the definition from the standard a little bit, it looks like:

Effects: [...] For each non-negative integer `n < N`, performs `*(result + n) = ranges​::​iter_­move(first + n)`.

I'm not sure if there's anything that would allow us to bypass this?
(see https://eel.is/c++draft/alg.move)

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_nontrivial.pass.cpp
26

Good point.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
24

Replaced the <cstring> include (it's from the previous iteration which used memmove). I'd prefer to keep the static assertion close to the point of class definition, though. Can you elaborate on how the test might fail if we change includes?

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp
9–11 ↗(On Diff #457753)

It failed on CI. I'll remove the flags and see if the current state stays within the limit or not.

The main motivation is to reduce complexity and remove code duplication. Majority of the remaining complexity is due to having to support C++03. The ideal state I envision is essentially something like

if constexpr (__can_memmove<_InIter, _OutIter>) {
  return __memmove(__first, __last, __out);
} else {
  return __copy_loop(__first, __last, __out);
}

in every algorithm (as a side note, I really hope we get Clang to support if constexpr in C++03 mode at some point). This patch brings this as close to that state as I could within the limitations of C++03.

It would definitely be nice if we could write it that simple. Unfortunately we can't currently.

As for specific improvements:

  • the optimization was fully duplicated between std::copy and std::move. Moreover, std::move_backward had a divergent, less thorough implementation of the optimization. Now the duplication is removed and everything related to this optimization is encapsulated in one place.

That would also be the case if we forward to std::copy though, right?

  • all 4 algorithms now have symmetrical, self-contained implementations which makes them easier to read and debug.

I'm not sure this is actually simpler to debug. I haven't checked the current backtrace, so maybe it's worse, but here is one similar to what I encountered while working on D132505:

* thread #1, name = 't.tmp.exe', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x000055555555967c t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::__segmented_iterator_traits<std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>::__end[abi:v16000](__iter=0x0000000000000000) at deque:406:91
    frame #1: 0x000055555555967c t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const at copy.h:83:27
    frame #2: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::pair<int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__unwrap_and_dispatch[abi:v16000]<std::__1::__overload<std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial>, int const*, int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(__first=0x0000000000000000, __last=0x0000000000000000, __out_first=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x0000000001c45b10) at copy_move_common.h:81:19
    frame #3: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::pair<int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__dispatch_copy_or_move[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial, int const*, int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=0x0000000000000000, __last=0x0000000000000000, __out_first=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x0000000001c13090) at copy_move_common.h:119:10
    frame #4: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::pair<int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy[abi:v16000]<std::__1::_ClassicAlgPolicy, int const*, int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=0x0000000000000000, __last=0x0000000000000000, __result=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x0000000001c13090) at copy.h:108:10
    frame #5: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator(this=0x00007fffffffda18, __first=__deque_iterator<int, const int *, const int &, const int *const *, long, 1024L> @ 0x00000000019d5fd0, __last=__deque_iterator<int, const int *, const int &, const int *const *, long, 1024L> @ 0x00000000019d5fd0, __result=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x00007fffffffda00)[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const at copy.h:50:22
    frame #6: 0x00005555555554cc t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__unwrap_and_dispatch[abi:v16000]<std::__1::__overload<std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(__first=<unavailable>, __last=<unavailable>, __out_first=<unavailable>) at copy_move_common.h:81:19
    frame #7: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__dispatch_copy_or_move[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=<unavailable>, __last=<unavailable>, __out_first=<unavailable>) at copy_move_common.h:119:10
    frame #8: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=<unavailable>, __last=<unavailable>, __result=<unavailable>) at copy.h:108:10
    frame #9: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l> std::__1::copy[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=<unavailable>, __last=<unavailable>, __result=<unavailable>) at copy.h:115:10
    frame #10: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(start=0, N=0) at copy.pass.cpp:56:5
    frame #11: 0x000055555555520d t.tmp.exe`main((null)=<unavailable>, (null)=<unavailable>) at copy.pass.cpp:77:13
    frame #12: 0x00007ffff7b2d20a libc.so.6`___lldb_unnamed_symbol3135 + 122
    frame #13: 0x00007ffff7b2d2bc libc.so.6`__libc_start_main + 124
    frame #14: 0x0000555555555121 t.tmp.exe`_start + 33
  • reverse_iterator is no longer used in the implementation, which makes it significantly easier to understand and removes artificial types from backtraces. Using reverse_iterator has lead to at least one serious bug already.

AFAICT this was a language bug that will get fixed. When it's fixed I don't think we'd need an __unconstrained_reverse_iterator to forward using an iterator wrapper. Not that that helps us currently.

  • unconstrainted_reverse_iterator can now be removed (which I intend to do in a follow-up), removing a lot of code duplication and complexity.

Do you already have that follow-up? I don't know how much extra complexity will result in removing __unconstrained_reverse_iterator.

  • the test for whether the optimization is used only applied to std::copy and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
  • there were no tests to make sure the optimization is not used when the effect would be visible.

I agree with you here. The tests are definitely a huge improvement. The only gripe I have with them is that they have UB, but I guess it would be a huge red flag if that wasn't the case.

Regarding the reverse_iterator optimization you mention, I think it was solution for a problem that the previous approach introduced and that is no longer applicable. The optimization was very limited in that it would only apply to the case where both the input and the output iterators were reverse iterators. I see no inherent reason why users would reverse both the input and the output sequence rather than just one of them, outside of the artificial case where our implementation itself would do the wrapping. If this optimization were to be restored, it would have to apply equally to input and output iterators. I don’t think, however, it would be pulling its weight.

This optimization was reported as a performance bug. See https://github.com/llvm/llvm-project/issues/33687. This was one of the reasons the overload was introduced. If we remove this optimization we should at least comment there that it was removed any why.

Just to be clear, I'm not against this change. I don't know which way to implement copy and friends is the right way, but I want to make sure we don't regress unexpectedly in some area.

Another thing to consider is that I think D132505 could implement the algorithm once instead of four times if we use the "forward with iterator wrappers" approach and adding specializations for the wrappers. D132505 is an improvement either way, so I'm not that concerned. I also haven't tried to implement the forwarding version, so take it with a grain of salt. Other suggestions to remove the duplicates are always appreciated!

libcxx/include/__algorithm/move.h
32

The contiguous_iterator concept guarantees that ranges::iter_move(a) has the same type, value category and effects as std::move(*a). AFAICT it would be UB to add an overload to std::move, so a specialized ranges::iter_move() that does anything else than casting to an rvalue would be UB.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
24

It would break if <type_traits> would include copy for some reason. https://godbolt.org/z/P9Eq13ErY

  • unconstrainted_reverse_iterator can now be removed (which I intend to do in a follow-up), removing a lot of code duplication and complexity.

Do you already have that follow-up? I don't know how much extra complexity will result in removing __unconstrained_reverse_iterator.

We looked at it briefly and it didn't seem to incur much penalty. FWIW, I think removing the need for __unconstrained_reverse_iterator (and generally speaking the use of reverse_iterator as an implementation detail of low-level algorithms) is by far the most important motivation, at least for me.

var-const marked 5 inline comments as done.Fri, Sep 23, 9:53 PM

It would definitely be nice if we could write it that simple. Unfortunately we can't currently.

I really hope we would be able to at some point. The code structure from this patch, while far from this ideal, can be transformed into this form in a straightforward way.

By the way, are there any C++11-and-later features that you feel would *really* make a difference if they were available in C++03? I gave it some thought, and if constexpr seemed to be number one (and number 2 would be auto and decltype(auto) as the return type of a function).

That would also be the case if we forward to std::copy though, right?

Not really. The code structure in this patch is basically:

std::copy          -> <memmove>
std::copy_backward -> <memmove>
std::move          -> <memmove>
std::move_backward -> <memmove>

All implementations are independent of each other, can be read in isolation, and are symmetrical. If I know how copy is implemented, I pretty much know how the other 3 are implemented as well.

With forwarding, the structure is

std::copy                                  -> <memmove>
std::copy_backward -> std::copy            -> <memmove>
std::move          -> std::copy            -> <memmove>
std::move_backward -> std::copy (or move?) -> <memmove>

It is inherently more complicated. Now knowing the implementation of copy doesn't tell me anything about the implementation of copy_backward. And move, IIUC, is even worse because it *sometimes* uses its own implementation and *sometimes* forwards to copy. This is even ignoring the issue of reverse iterators for the moment.

I'm not sure this is actually simpler to debug. I haven't checked the current backtrace, so maybe it's worse, but here is one similar to what I encountered while working on D132505:

I ran a simple test. The current state is:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001000020ad t.tmp.exe`std::__1::pair<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*> > std::__1::__copy_impl[abi:v16000]<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*> >(__first=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefeed8, __last=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefeed0, __result=__unconstrained_reverse_iterator<int *> @ 0x00007ff7bfefeec8) at copy.h:39:6
    frame #1: 0x0000000100001e80 t.tmp.exe`std::__1::pair<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*> > std::__1::__copy[abi:v16000]<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*>, 0>(__first=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefef98, __last=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefef90, __result=__unconstrained_reverse_iterator<int *> @ 0x00007ff7bfefef88) at copy.h:96:18
    frame #2: 0x0000000100001d9f t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__copy_backward[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__list_iterator<int, void*>, int*, 0>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff028, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff020, __result=0x00007ff7bfeff0e0) at copy_backward.h:36:16
    frame #3: 0x0000000100001b95 t.tmp.exe`int* std::__1::copy_backward[abi:v16000]<std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff088, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff080, __result=0x00007ff7bfeff0e0) at copy_backward.h:57:10
    frame #4: 0x0000000100001ac9 t.tmp.exe`foo() at copy.pass.cpp:93:3
    frame #5: 0x0000000100001c6b t.tmp.exe`main((null)=1, (null)=0x00007ff7bfeff280) at copy.pass.cpp:102:3
    frame #6: 0x00000001000454fe dyld`start + 462

With this patch it becomes

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100003b31 t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__copy_backward_loop<std::__1::_ClassicAlgPolicy>::operator(this=0x00007ff7bfefef40, __first=__list_iterator<int, void *> @ 0x00007ff7bfefeec8, __last=__list_iterator<int, void *> @ 0x00007ff7bfefeec0, __result=0x00007ff7bfeff0c8)[abi:v16000]<std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*>(std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*) const at copy_backward.h:36:8
    frame #1: 0x000000010000398f t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__unwrap_and_dispatch[abi:v16000]<std::__1::__overload<std::__1::__copy_backward_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_backward_trivial>, std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*, 0>(__first=__list_iterator<int, void *> @ 0x00007ff7bfefef88, __last=__list_iterator<int, void *> @ 0x00007ff7bfefef80, __out_first=0x00007ff7bfeff0e0) at copy_move_common.h:81:19
    frame #2: 0x00000001000038dd t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__dispatch_copy_or_move[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__copy_backward_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_backward_trivial, std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfefefd8, __last=__list_iterator<int, void *> @ 0x00007ff7bfefefd0, __out_first=0x00007ff7bfeff0e0) at copy_move_common.h:119:10
    frame #3: 0x000000010000384d t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__copy_backward[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff028, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff020, __result=0x00007ff7bfeff0e0) at copy_backward.h:55:10
    frame #4: 0x000000010000365d t.tmp.exe`int* std::__1::copy_backward[abi:v16000]<std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff088, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff080, __result=0x00007ff7bfeff0e0) at copy_backward.h:68:10
    frame #5: 0x0000000100003579 t.tmp.exe`foo() at copy.pass.cpp:93:3
    frame #6: 0x000000010000372b t.tmp.exe`main((null)=1, (null)=0x00007ff7bfeff280) at copy.pass.cpp:102:3
    frame #7: 0x000000010004d4fe dyld`start + 462

I find the types way more readable now that they are no longer wrapped in __unconstrained_reverse_iterator.

AFAICT this was a language bug that will get fixed.

I mean the wrong return value for ranges::{copy,move}_backward (see D130968). We even had wrong tests for those (i.e., the tests were testing for the *wrong* value). This sort of confusion of whether we are talking about the begin/end of the actual range or the reversed range is inherent when using reverse iterators. I take it as empirical evidence that using reverse iterators in the implementation adds complexity and potential to introduce bugs.

Do you already have that follow-up? I don't know how much extra complexity will result in removing __unconstrained_reverse_iterator.

Not yet. The change amounts to a couple additional functions in inplace_merge.

This optimization was reported as a performance bug. See https://github.com/llvm/llvm-project/issues/33687. This was one of the reasons the overload was introduced. If we remove this optimization we should at least comment there that it was removed any why.

Reading the original issue, I'm not convinced we should fix this at all. Unfortunately, it doesn't really explain the actual problem the issue author had, which would be the most important part. As for open source examples, it seems that the vector::insert function could easily use copy_backward instead of copying with reverse iterators, and ToolbarActionBar::GetActions() doesn't seem like the kind of function where this kind of microoptimization could ever matter. Addressing this issue is not just a matter of putting in the effort. We are paying real costs for the added complexity, and by extension so are our users (when we introduce a bug, for example) -- and note that this applies to pretty much *all* of our users, whereas the optimization would benefit only a certain subset. I can certainly be convinced otherwise by some concrete, real-world examples.

In any case, if we do this optimization at all, it should apply equally to the input and the output sequence. If that is too complicated to implement, I'll take it as further evidence that this optimization doesn't justify the added cost.

Just to be clear, I'm not against this change. I don't know which way to implement copy and friends is the right way, but I want to make sure we don't regress unexpectedly in some area.

It sounds like we can agree that if constexpr would be the best way if it were feasible. I really hope it becomes feasible at some point in the future.

Another thing to consider is that I think D132505 could implement the algorithm once instead of four times if we use the "forward with iterator wrappers" approach and adding specializations for the wrappers. D132505 is an improvement either way, so I'm not that concerned. I also haven't tried to implement the forwarding version, so take it with a grain of salt. Other suggestions to remove the duplicates are always appreciated!

I don't particularly like this sort of decorator pattern, although it can definitely be the lesser of two evils (for D132505, it's likely justified, though I haven't looked closely yet).

libcxx/include/__algorithm/move.h
32

@philnik Thanks a lot for digging this up! Looks like we don't have to do anything (thankfully, because it would complicate the logic quite a bit).

var-const updated this revision to Diff 462645.Fri, Sep 23, 9:54 PM
var-const marked an inline comment as done.

Feedback.

philnik accepted this revision.Sat, Sep 24, 1:29 AM

By the way, are there any C++11-and-later features that you feel would *really* make a difference if they were available in C++03? I gave it some thought, and if constexpr seemed to be number one (and number 2 would be auto and decltype(auto) as the return type of a function).

Other than concepts (would probably a bit too much to ask), lambdas would definitely be my number one. It would just be so much nicer to use algorithms instead of raw loops everywhere. C++11 attributes would also be quite nice.

AFAICT this was a language bug that will get fixed.

I mean the wrong return value for ranges::{copy,move}_backward (see D130968). We even had wrong tests for those (i.e., the tests were testing for the *wrong* value). This sort of confusion of whether we are talking about the begin/end of the actual range or the reversed range is inherent when using reverse iterators. I take it as empirical evidence that using reverse iterators in the implementation adds complexity and potential to introduce bugs.

Yeah, I don't think this was because of reverse_iterators. My intuition about what {copy, move}_backward should return is just backwards.

This revision is now accepted and ready to land.Sat, Sep 24, 1:29 AM

Fix the CI.

var-const updated this revision to Diff 464186.Fri, Sep 30, 2:24 AM

Fix the CI.

var-const updated this revision to Diff 464466.Fri, Sep 30, 8:38 PM

Fix the CI.

var-const updated this revision to Diff 464482.Sat, Oct 1, 2:13 AM

Fix the CI.

var-const updated this revision to Diff 464527.Sat, Oct 1, 3:50 PM

Fix the CI.