This is an archive of the discontinued LLVM Phabricator instance.

[libc++][ranges]Refactor `copy{,_backward}` and `move{,_backward}`
AbandonedPublic

Authored by var-const on Jul 28 2022, 2:57 AM.

Details

Summary

Instead of using reverse_iterator, share the optimization between the 4 algorithms. The key observation here that memmove applies to both copy and move identically, and to their _backward versions very similarly. All algorithms now follow the same pattern along the lines of:

if constexpr (can_memmove<InIter, OutIter>) {
  memmove(first, last, out);
} else {
  naive_implementation(first, last, out);
}

A follow-up will delete unconstrained_reverse_iterator.

This patch removes duplication and divergence between std::copy, std::move and std::move_backward. It also improves testing:

  • the test for whether the optimization is used only applied to std::copy and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
  • there were no tests to make sure the optimization is not used when the effect would be visible.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ldionne added inline comments.Aug 4 2022, 12:03 PM
libcxx/include/__algorithm/copy_move_common.h
28–29

We do need a different trait to implement std::copy and std::move. We can't check is_trivially_move_assignable or is_trivially_copy_assignable for both algorithms.

43

I would use std::memmove instead. We're not using this optimization during constexpr anyway.

107

I think this is actually insufficient. There's another bug that we rewrap iterators and sentinels, leading to running the algorithm on a non-sensical range (delimited by [unwrapped-iterator, wrapped-sentinel)).

See the TODO: some algorithms calls std::__copy in libcxx/test/std/algorithms/alg.sorting/sortable_helpers.h. I'm not 100% sure what's the right fix, but I would maybe explore querying whether __unwrap_range can be called on [iter, sent) -- that would require changes to the semantics of __unwrap_range, though. Or maybe you could call __unwrap_range unconditionally and have it be a no-op if it can't unwrap.

That also makes me think that __can_unwrap should actually be called __can_rewrap, and then everything makes a bit more sense.

TLDR: Rename __can_unwrap to __can_rewrap, and then use __unwrap_range instead of __unwrap_iter for __first and __last. That way, we'll either unwrap both or none of them, and we'll know that we're able to rewrap them after.

128–132

Same above. It's easier to read enable_if when they are in the template argument list, and it's what we try to do in new code elsewhere in the library.

var-const retitled this revision from [libc++][ranges][NFC] Refactor `move` and `move_backward`. to [libc++][ranges][NFC] Refactor `copy{,_backward}` and `move{,_backward}`.Aug 8 2022, 6:54 PM
var-const edited the summary of this revision. (Show Details)
var-const updated this revision to Diff 451023.Aug 8 2022, 6:58 PM

Fix the CI.

var-const updated this revision to Diff 451024.Aug 8 2022, 6:59 PM

Fix the CI.

var-const updated this revision to Diff 451199.Aug 9 2022, 10:25 AM
var-const marked 7 inline comments as done.

Address feedback

libcxx/include/__algorithm/copy_move_common.h
107

Renamed the trait and replaced __unwrap_iter with __unwrap_range.

Or maybe you could call __unwrap_range unconditionally and have it be a no-op if it can't unwrap.

I might be misreading it, but it seems to already be the case. __unwrap_range only calls __unwrap_iter either if both the iterator and the sentinel are the same type or if the end iterator can be created in constant time; otherwise, it simply returns the given arguments.

128–132

Thanks for explaining!

libcxx/include/__algorithm/move.h
77

I'm not sure if this optimization is still pulling its weight in complexity now that we no longer use reverse iterators in the implementation. I also don't really like that it only applies to the case where both the input and the output iterators are reverse, even though these are orthogonal.

var-const updated this revision to Diff 451203.Aug 9 2022, 10:30 AM

Fix the CI.

ldionne added inline comments.Aug 11 2022, 12:10 PM
libcxx/include/__algorithm/copy.h
65 ↗(On Diff #451203)

I think we can get rid of __is_reverse_iterator in this patch too.

A folllow-up patch should get rid of __unconstrained_reverse_iterator, but that may require a bit of additional work in e.g. inplace_merge. Still worth doing.

libcxx/include/__algorithm/copy_move_common.h
2

Draft from our discussion just now:

template <class _AlgPolicy>
struct __copy_loop {
  template <class _InIter, class _Sentinel, class _OutIter>
  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX11
  pair<_InIter, _OutIter>
  operator()(_InIter __first, _Sentinel __last, _OutIter __result) const {
    while (__first != __last) {
      *__result = *__first;
      ++__first;
      ++__result;
    }

    return pair<_InIter, _OutIter>(std::move(__first), std::move(__result));
  }
};

template <class _AlgPolicy>
struct __copy_trivial {
  // At this point, the iterators have been unwrapped so any contiguous_iterator
  // has been unwrapped to a pointer.
  template <class _In, class _Out, class = __enable_if_t<
    is_trivially_assignable<_Out&, _In&>::value
  > >
  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX11
  pair<_In*, _Out*>
  operator()(_In* __first, _In* __last, _Out* __result) const {
    const size_t __n = static_cast<size_t>(__last - __first);
    std::memmove(__result, __first, __n * sizeof(_Out));

    return pair<_In*, _Out*>(__last, __result + __n);
  }
};

template <class _F1, class _F2>
struct __overload : _F1, _F2 {
    using _F1::operator();
    using _F2::operator();
};

template <class _Algorithm, class _InIter, class _Sent, class _OutIter,
        __enable_if_t<__can_rewrap<_InIter, _Sent, _OutIter>::value, int> = 0>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
pair<_InIter, _OutIter>
__unwrap_and_dispatch(_InIter __first, _Sent __last, _OutIter __out_first) {
  auto __range = std::__unwrap_range(__first, std::move(__last));
  auto __result = _Algorithm()(std::move(__range.first),
                               std::move(__range.second),
                               std::__unwrap_iter(__out_first));
  return pair<_InIter, _OutIter>(
      std::__rewrap_range<_Sent>(std::move(__first), std::move(__result.first)),
      std::__rewrap_iter(std::move(__out_first), std::move(__result.second)));
}

template <class _Algorithm, class _InIter, class _Sent, class _OutIter,
          __enable_if_t<!__can_rewrap<_InIter, _Sent, _OutIter>::value, int> = 0>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
pair<_InIter, _OutIter>
__unwrap_and_dispatch(_InIter __first, _Sent __last, _OutIter __out_first) {
  return _Algorithm()(std::move(__first), std::move(__last), std::move(__out_first));
}

template <class _NaiveAlgorithm, class _OptimizedAlgorithm, class _InIter, class _Sent, class _OutIter>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX14
pair<_InIter, _OutIter>
__dispatch_copy_or_move(_InIter __first, _Sent __last, _OutIter __out_first) {
  if (__libcpp_is_constant_evaluated()) {
    return std::__unwrap_and_dispatch<_NaiveAlgorithm>(std::move(__first), std::move(__last), std::move(__out_first));
  }

  using _Algorithm = std::__overload<_NaiveAlgorithm, _OptimizedAlgorithm>;
  return std::__unwrap_and_dispatch<_Algorithm>(std::move(__first), std::move(__last), std::move(__out_first));
}


template <class _AlgPolicy, class _InIter, class _Sent, class _OutIter>
pair<_InIter, _OutIter>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX11
__copy(_InIter __first, _Sent __last, _OutIter __result) {
  return std::__dispatch_copy_or_move<__copy_loop<_AlgPolicy>, __copy_trivial<_AlgPolicy> >(
      std::move(__first), std::move(__last), std::move(__result));
}

template <class _InputIterator, class _OutputIterator>
inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX17
_OutputIterator
copy(_InputIterator __first, _InputIterator __last, _OutputIterator __result) {
  return std::__copy<_ClassicAlgPolicy>(__first, __last, __result).second;
}
98

Per your comment just now: maybe we can take T* in the trivial version of __run since contiguous iterators are already unwrapped by unwrap_iter by the time we get there. That would simplify the sfinae and perhaps more.

libcxx/include/__algorithm/ranges_move_backward.h
19

Not needed anymore.

libcxx/include/__algorithm/unwrap_iter.h
36 ↗(On Diff #451203)

Not sure whether the reordering is worth doing. Neutral on this.

libcxx/test/std/algorithms/alg.modifying.operations/alg.copy/ranges.copy_backward.pass.cpp
106–109 ↗(On Diff #451203)

I don't think that's still the case?

var-const updated this revision to Diff 456586.Aug 30 2022, 2:58 AM
var-const marked 7 inline comments as done.

Feedback, add tests.

libcxx/include/__algorithm/copy_move_common.h
83

Now that our implementation no longer relies on reverse iterators, I don't think the extra complexity is justified.

libcxx/include/__algorithm/unwrap_iter.h
36 ↗(On Diff #451203)

In that case, I'd keep it -- I think it makes more sense to place _re_wrapping after unwrapping.

libcxx/test/std/algorithms/alg.modifying.operations/alg.copy/ranges.copy_backward.pass.cpp
106–109 ↗(On Diff #451203)

Yes, this is fixed now!

libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp
1

Should all be covered now.

var-const marked an inline comment as done.Aug 30 2022, 2:59 AM
var-const published this revision for review.Aug 30 2022, 3:01 AM
var-const retitled this revision from [libc++][ranges][NFC] Refactor `copy{,_backward}` and `move{,_backward}` to [libc++][ranges]Refactor `copy{,_backward}` and `move{,_backward}`.
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
var-const added inline comments.Aug 30 2022, 3:09 AM
libcxx/test/libcxx/algorithms/alg.modifying.operations/copy.pass.cpp
74 ↗(On Diff #456586)

This test was actually incorrect -- because unwrapping iterators uses to_address to reduce them to pointers, the fact that this iterator is not incrementable doesn't matter -- it would work if the implementation chooses to forward to memmove just fine.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_nontrivial.pass.cpp
9 ↗(On Diff #456586)

I think these tests can be made to work pre-C++20, but it would be painful, partially because contiguous_iterator_tag is a C++20 addition. Please let me know if you think it's worth the trouble.

66 ↗(On Diff #456586)

I have manually verified that this check works -- if I change NonTrivialCopy/Move to become trivial, this test will no longer compile, complaining that contiguous_iterator cannot be converted to void*.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
37 ↗(On Diff #456586)

I don't think it's possible to test this any other way, since it is by design that whether the optimization was applied is not observable by normal means. While it is undefined behavior to add things to the std namespace for user code, I would consider tests to be part of the implementation (and in any case, we control the implementation). This logic would break if the optimization changes, e.g. if the call to memmove were to be replaced by __builtin_memmove, but in that case the test will start failing, so I don't think that's a significant problem.

philnik requested changes to this revision.Aug 30 2022, 4:01 AM
philnik added a subscriber: philnik.
philnik added inline comments.
libcxx/include/__algorithm/copy.h
24 ↗(On Diff #456586)

Could you clang-format all the changes?

27 ↗(On Diff #456586)

Why did you rename _Sent to _Sentinel? AFAIK we settled on _Sent for the naming in the ranges algorithms. Same for _InputIterator and similar stuff.

54 ↗(On Diff #456586)

Why does copy get an _AlgPolicy now? AFAICT it's never used anywhere, which means it just produces duplicate code.

libcxx/include/__algorithm/copy_backward.h
38 ↗(On Diff #456586)

std::make_pair?

52 ↗(On Diff #456586)

?!

libcxx/include/__algorithm/copy_move_common.h
21

Could you use the granularized includes instead?

35

Why did you get rid of calling __builtin_memmove during constant evaluation?

116

Sentinels are always copy-constructible.

151
libcxx/include/__algorithm/move_backward.h
63–64

Please add a static_assert to make sure the iterators are copyable.

66

Why constexpr since 20? Others are marked as constexpr since 14.

libcxx/include/__algorithm/unwrap_iter.h
36 ↗(On Diff #456586)

Why did you move this code?

libcxx/include/__algorithm/unwrap_range.h
34 ↗(On Diff #456586)

AFAICT this isn't correct. Lines 43-45 should handle this case just fine. Also, why not _v?

35 ↗(On Diff #456586)

Why the indentation change? This file was properly clang-formatted before.

This revision now requires changes to proceed.Aug 30 2022, 4:01 AM
ldionne accepted this revision as: ldionne.Aug 30 2022, 11:27 AM

I am quite happy with this patch. There are a few nits, but other than that, LGTM. Thanks for the numerous iterations offline on this one.

@philnik I'll let you give final approval once you're satisfied.

libcxx/include/__algorithm/copy.h
54 ↗(On Diff #456586)

For consistency.

To address the code size concern, I think what we should do is change __copy above like so:

template <class _AlgPolicy, class _InIter, class _Sent, class _OutIter>
pair<_InIter, _OutIter>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14
__copy(_InIter __first, _Sent __last, _OutIter __result) {
  // NOTE: There is no difference between the ranges and the STL classic version of this algorithm,
  //       so we always use _ClassicAlgPolicy to reduce code bloat.
  return std::__dispatch_copy_or_move<__copy_loop<_ClassicAlgPolicy>, __copy_trivial<_ClassicAlgPolicy> >(
      std::move(__first), std::move(__last), std::move(__result));
}
libcxx/include/__algorithm/copy_move_common.h
35

GCC didn't support it, which added a bunch of complexity. So now what we do instead is unwrap the iterators and use the loop version when we're inside constant evaluation. The performance during constant evaluation is less of a concern than performance at runtime.

libcxx/include/__algorithm/unwrap_iter.h
36 ↗(On Diff #451203)

Please land it as a separate NFC patch, that should solve the issue and close this thread.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_nontrivial.pass.cpp
9 ↗(On Diff #456586)

I don't think it's worth the trouble.

50 ↗(On Diff #456586)
64 ↗(On Diff #456586)
philnik added inline comments.Aug 30 2022, 12:05 PM
libcxx/include/__algorithm/copy.h
54 ↗(On Diff #456586)

This isn't just affecting copy though. This also results in multiple versions for all the set algorithms. I really don't think that this bit of consistency is worth lots of extra instantiations for these algorithms.

libcxx/include/__algorithm/copy_move_common.h
35

AFAICT the only complexity added by GCC not supporting it is an extra #ifndef _LIBCPP_COMPILER_GCC. How does calling __builtin_memmove result in worse runtime performance? IMO we shouldn't massively pessimize constant evaluation performance for removing a few lines of code.

For example:

#include <cstddef>

constexpr void copy_arrays(int* begin, int* end, int* out) {
#ifdef USE_MEMMOVE
  ::__builtin_memmove(out, begin, static_cast<std::size_t>(end - begin));
#else
  while (begin != end) {
    *out = *begin;
    ++begin;
    ++out;
  }
#endif
}

constexpr bool use_data() {
  int a[100000] = {};
  int b[100000] = {};
  ::copy_arrays(a, a + 100000, b);

  return true;
}

static_assert(use_data());

Without using __builtin_memmove takes 200ms to compile, with __builtin_memmove is takes 30ms.

huixie90 accepted this revision as: huixie90.Aug 30 2022, 12:56 PM
huixie90 added a subscriber: huixie90.

LGMT with nits and with green CI

libcxx/include/__algorithm/copy.h
24 ↗(On Diff #456586)

would it be possible to make this not a template?

27–30 ↗(On Diff #456586)

I understand it is not a problem but putting a completely unconstrained overload into the __overload struct makes me a bit concerned. I understand it is working right now because the other overload's signature takes a pointer, which is more specific than this one. Having said that, I don't think putting a constraint helps it in any way.

46 ↗(On Diff #456586)

would it be possible to put __copy_trivial definition inside this header? AFAICT, the common header only reuses __copy_trivial_impl not __copy_trivial

libcxx/include/__algorithm/copy_move_common.h
105–108

AFAIK, this is not an aggregate pre c++17 and you will need a constructor

110–119

nit: would _Lazy<_And<... work here?

143

extra nit and not an issue: I don't like the name Naive but I don't anything better

libcxx/include/__algorithm/move.h
33

question: if the input is a contiguous iterator pointing to a trivially copyable/moveable type, but the iterator has specialised iter_move ( yes it is very contrived), would the move dispatch to the overload that unwraps the contiguous iterator to pointer and call memmove without going through the specialised iter_move?

var-const updated this revision to Diff 456895.Aug 31 2022, 1:53 AM
var-const marked 21 inline comments as done.

Address feedback

libcxx/include/__algorithm/copy.h
27 ↗(On Diff #456586)

Restored _Sent. I didn't think this qualifies as a range algorithm, but now that you mention it, I presume it does (I still don't think having two different name conventions is a good idea).

27–30 ↗(On Diff #456586)

Yes, the idea is that pointers are more specialized. I actually had mutually-exclusive constraints in an earlier revision, but relying on pointers seems to provide a nice simplification.

46 ↗(On Diff #456586)

I'd rather keep everything related to the memmove optimization in one place. One of the goals of this patch was to make these details leak out as little as possible into the actual algorithms.

libcxx/include/__algorithm/copy_backward.h
38 ↗(On Diff #456586)

I think @huixie90 pointed out previously that make_pair creates copies of the arguments in C++03 mode.

52 ↗(On Diff #456586)

Thanks for noticing! I started working on this patch a while ago, this is probably a rudiment from that time.

libcxx/include/__algorithm/copy_move_common.h
35

The original form of this code was (with slight NFC reformatting):

if (__libcpp_is_constant_evaluated()
#ifndef _LIBCPP_COMPILER_GCC
      && !is_trivially_copyable<_InValueT>::value
#endif
     ) {
    return std::__copy_impl<_InValueT*, _InValueT*, _OutValueT*>(__first, __last, __result);
  }
  const size_t __n = static_cast<size_t>(__last - __first);
  if (__n > 0)
    ::__builtin_memmove(__result, __first, __n * sizeof(_OutValueT));
  return std::make_pair(__first + __n, __result + __n);

Unless I'm missing something obvious, __builtin_memmove was never called during constant evaluation.

110–119

Probably. What would be the advantage of using _Lazy here?

116

This isn't necessarily a sentinel in the C++20-concept sense. This can be instantiated with e.g. a pair of input iterators that aren't copyable.

libcxx/include/__algorithm/unwrap_iter.h
36 ↗(On Diff #456586)

@philnik I find that logically, *re*wrapping has to go after unwrapping, since *re* implies adding something that was previously removed. I originally had more changes to this file, but those went away eventually. I reverted the change from this patch and will submit it separately.

libcxx/include/__algorithm/unwrap_range.h
34 ↗(On Diff #456586)

Hmm, I remember adding this to avoid a compilation error, but it looks like that error no longer applies. Removed.

var-const added inline comments.Aug 31 2022, 1:53 AM
libcxx/include/__algorithm/move.h
33

I think this is a great point, thanks. I'm inclined to leave it as is for this patch because it's a preexistent thing, but I will address it in a follow-up.

libcxx/include/__algorithm/move_backward.h
66

This is preexisting and I'd rather not change it in this patch.

var-const marked an inline comment as done.Aug 31 2022, 1:53 AM
philnik added inline comments.Aug 31 2022, 5:06 AM
libcxx/include/__algorithm/copy_backward.h
38 ↗(On Diff #456586)

Then we should just remove the #ifndef _LIBCPP_CXX03_LANG, since we require rvalue support in C++03. See D133013.

52 ↗(On Diff #456586)

Ah OK, so it was just a merge conflict probably.

libcxx/include/__algorithm/copy_move_common.h
35

To me it looks like it was called when the type is_trivially_copyable.

116

C++17 input iterators are always copyable and input_iterators can't be their own sentinel if they aren't copyable .

var-const updated this revision to Diff 457753.Sep 2 2022, 7:15 PM
var-const marked 4 inline comments as done.

Address feedback, fix CI.

libcxx/include/__algorithm/copy.h
54 ↗(On Diff #456586)

Now because of using __builtin_memmove, it's necessary to get the value_type of the iterator, so _AlgPolicy is now necessary for copy as well.

libcxx/include/__algorithm/copy_backward.h
38 ↗(On Diff #456586)

Thanks for doing the change!

libcxx/include/__algorithm/copy_move_common.h
35

Okay, I restored using __builtin_memmove during constant evaluation, with the necessary additional conditions. I also added a test to make sure the is_trivially_copyable condition is honored.

I'm still not sure if it's worth the extra complexity (although now the complexity is all in one place, making this a little more manageable). @ldionne What do you think? I'm somewhat on the fence and can be convinced either way.

ldionne added inline comments.Sep 9 2022, 6:56 AM
libcxx/include/__algorithm/copy_move_common.h
35

Yeah, I think restoring the __builtin_memmove behavior is probably worth it given the surprising (to me) compile-time speedup.

55–88

IMO those should be defined in copy.h, move.h, copy_backward.h and move_backward.h instead.

libcxx/include/__algorithm/move.h
33

Is this really something we should fix? Aren't we allowed to bypass the iterator's iter_move?

philnik accepted this revision.Sep 11 2022, 1:29 AM

LGTM with comments addressed and a green CI.

libcxx/include/__algorithm/copy_backward.h
53 ↗(On Diff #457753)

Please static_assert that the iterators are copyable.

libcxx/include/__algorithm/copy_move_common.h
59

It's a weird quirk of clang-format right now and I don't know exactly why it does this, but these spaces can be removed and clang-format won't add them again. I think it's because of SpacesInAngles: Keep, but that shouldn't mean they are kept after a an opening one. Anyways, if you feel like removing these it would be nice, but not a must from my side.

libcxx/include/__algorithm/move_backward.h
63–64

Please also static_assert here.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_nontrivial.pass.cpp
25 ↗(On Diff #457753)

Maybe use a requires since the test requires C++20 anyways?

106 ↗(On Diff #457753)

This is technically still ill-formed, since the return type has to be element_type*, but I guess it's OK for us to do it in this case.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
23 ↗(On Diff #457753)

To make it less prone to failing when we change includes, could you move the <type_traits> include down and static_assert the trivially_copyable later? It should still catch a problem before anything unpredictable happens. AFAICT you don't actually need the <cstring> include. <cstddef> should suffice for size_t.

50 ↗(On Diff #457753)

Also requires here.

68 ↗(On Diff #457753)

What is this Foo for? Shouldn't the compiler be able to deduce the correct type?

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp
9–11 ↗(On Diff #457753)

Why is this required? Did it hit the constexpr limit without the __builtin_memmove? In that case, please remove the bumped limit again.

This revision is now accepted and ready to land.Sep 11 2022, 1:29 AM
philnik requested changes to this revision.Sep 11 2022, 4:00 PM

Sorry, I know I approved it before, but I just noticed that this patch still regresses on some optimizations (in particular optimizing reverse_iterator<contiguous-iterator>), so maybe I'm missing more and I'm also questioning the usefulness of the changes more and more, so I'd like some clarification. What exactly is the goal of the refactoring here? How does the refactored code do the job better than forwarding copy_backward, move and move_backward to copy? The answers to these questions would probably also be good to have in the commit message. I first thought that his patch removed some of the complication in copy.h, but the longer I think about it it the more it looks to me like it's just shifting the complexity around.

This revision now requires changes to proceed.Sep 11 2022, 4:00 PM
var-const edited the summary of this revision. (Show Details)Sep 20 2022, 4:59 PM
var-const edited the summary of this revision. (Show Details)
var-const updated this revision to Diff 461776.Sep 20 2022, 7:13 PM
var-const marked 9 inline comments as done.

Feedback + rebase

Sorry, I know I approved it before, but I just noticed that this patch still regresses on some optimizations (in particular optimizing reverse_iterator<contiguous-iterator>), so maybe I'm missing more and I'm also questioning the usefulness of the changes more and more, so I'd like some clarification. What exactly is the goal of the refactoring here? How does the refactored code do the job better than forwarding copy_backward, move and move_backward to copy? The answers to these questions would probably also be good to have in the commit message. I first thought that his patch removed some of the complication in copy.h, but the longer I think about it it the more it looks to me like it's just shifting the complexity around.

The main motivation is to reduce complexity and remove code duplication. Majority of the remaining complexity is due to having to support C++03. The ideal state I envision is essentially something like

if constexpr (__can_memmove<_InIter, _OutIter>) {
  return __memmove(__first, __last, __out);
} else {
  return __copy_loop(__first, __last, __out);
}

in every algorithm (as a side note, I really hope we get Clang to support if constexpr in C++03 mode at some point). This patch brings this as close to that state as I could within the limitations of C++03.

As for specific improvements:

  • the optimization was fully duplicated between std::copy and std::move. Moreover, std::move_backward had a divergent, less thorough implementation of the optimization. Now the duplication is removed and everything related to this optimization is encapsulated in one place.
  • all 4 algorithms now have symmetrical, self-contained implementations which makes them easier to read and debug.
  • reverse_iterator is no longer used in the implementation, which makes it significantly easier to understand and removes artificial types from backtraces. Using reverse_iterator has lead to at least one serious bug already.
  • unconstrainted_reverse_iterator can now be removed (which I intend to do in a follow-up), removing a lot of code duplication and complexity.
  • the test for whether the optimization is used only applied to std::copy and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
  • there were no tests to make sure the optimization is not used when the effect would be visible.

Regarding the reverse_iterator optimization you mention, I think it was solution for a problem that the previous approach introduced and that is no longer applicable. The optimization was very limited in that it would only apply to the case where both the input and the output iterators were reverse iterators. I see no inherent reason why users would reverse both the input and the output sequence rather than just one of them, outside of the artificial case where our implementation itself would do the wrapping. If this optimization were to be restored, it would have to apply equally to input and output iterators. I don’t think, however, it would be pulling its weight.

libcxx/include/__algorithm/move.h
33

Hmm, inlining the definition from the standard a little bit, it looks like:

Effects: [...] For each non-negative integer `n < N`, performs `*(result + n) = ranges​::​iter_­move(first + n)`.

I'm not sure if there's anything that would allow us to bypass this?
(see https://eel.is/c++draft/alg.move)

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_nontrivial.pass.cpp
25 ↗(On Diff #457753)

Good point.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
23 ↗(On Diff #457753)

Replaced the <cstring> include (it's from the previous iteration which used memmove). I'd prefer to keep the static assertion close to the point of class definition, though. Can you elaborate on how the test might fail if we change includes?

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp
9–11 ↗(On Diff #457753)

It failed on CI. I'll remove the flags and see if the current state stays within the limit or not.

The main motivation is to reduce complexity and remove code duplication. Majority of the remaining complexity is due to having to support C++03. The ideal state I envision is essentially something like

if constexpr (__can_memmove<_InIter, _OutIter>) {
  return __memmove(__first, __last, __out);
} else {
  return __copy_loop(__first, __last, __out);
}

in every algorithm (as a side note, I really hope we get Clang to support if constexpr in C++03 mode at some point). This patch brings this as close to that state as I could within the limitations of C++03.

It would definitely be nice if we could write it that simple. Unfortunately we can't currently.

As for specific improvements:

  • the optimization was fully duplicated between std::copy and std::move. Moreover, std::move_backward had a divergent, less thorough implementation of the optimization. Now the duplication is removed and everything related to this optimization is encapsulated in one place.

That would also be the case if we forward to std::copy though, right?

  • all 4 algorithms now have symmetrical, self-contained implementations which makes them easier to read and debug.

I'm not sure this is actually simpler to debug. I haven't checked the current backtrace, so maybe it's worse, but here is one similar to what I encountered while working on D132505:

* thread #1, name = 't.tmp.exe', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x000055555555967c t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::__segmented_iterator_traits<std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>::__end[abi:v16000](__iter=0x0000000000000000) at deque:406:91
    frame #1: 0x000055555555967c t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const at copy.h:83:27
    frame #2: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::pair<int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__unwrap_and_dispatch[abi:v16000]<std::__1::__overload<std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial>, int const*, int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(__first=0x0000000000000000, __last=0x0000000000000000, __out_first=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x0000000001c45b10) at copy_move_common.h:81:19
    frame #3: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::pair<int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__dispatch_copy_or_move[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial, int const*, int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=0x0000000000000000, __last=0x0000000000000000, __out_first=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x0000000001c13090) at copy_move_common.h:119:10
    frame #4: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator()[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const [inlined] std::__1::pair<int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy[abi:v16000]<std::__1::_ClassicAlgPolicy, int const*, int const*, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=0x0000000000000000, __last=0x0000000000000000, __result=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x0000000001c13090) at copy.h:108:10
    frame #5: 0x0000555555559662 t.tmp.exe`std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>::operator(this=0x00007fffffffda18, __first=__deque_iterator<int, const int *, const int &, const int *const *, long, 1024L> @ 0x00000000019d5fd0, __last=__deque_iterator<int, const int *, const int &, const int *const *, long, 1024L> @ 0x00000000019d5fd0, __result=__deque_iterator<int, int *, int &, int **, long, 1024L> @ 0x00007fffffffda00)[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>) const at copy.h:50:22
    frame #6: 0x00005555555554cc t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__unwrap_and_dispatch[abi:v16000]<std::__1::__overload<std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>, 0>(__first=<unavailable>, __last=<unavailable>, __out_first=<unavailable>) at copy_move_common.h:81:19
    frame #7: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__dispatch_copy_or_move[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__copy_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_trivial, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=<unavailable>, __last=<unavailable>, __out_first=<unavailable>) at copy_move_common.h:119:10
    frame #8: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::pair<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>> std::__1::__copy[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=<unavailable>, __last=<unavailable>, __result=<unavailable>) at copy.h:108:10
    frame #9: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(int, int) [inlined] std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l> std::__1::copy[abi:v16000]<std::__1::__deque_iterator<int, int const*, int const&, int const* const*, long, 1024l>, std::__1::__deque_iterator<int, int*, int&, int**, long, 1024l>>(__first=<unavailable>, __last=<unavailable>, __result=<unavailable>) at copy.h:115:10
    frame #10: 0x000055555555549e t.tmp.exe`void testN<std::__1::deque<int, std::__1::allocator<int>>>(start=0, N=0) at copy.pass.cpp:56:5
    frame #11: 0x000055555555520d t.tmp.exe`main((null)=<unavailable>, (null)=<unavailable>) at copy.pass.cpp:77:13
    frame #12: 0x00007ffff7b2d20a libc.so.6`___lldb_unnamed_symbol3135 + 122
    frame #13: 0x00007ffff7b2d2bc libc.so.6`__libc_start_main + 124
    frame #14: 0x0000555555555121 t.tmp.exe`_start + 33
  • reverse_iterator is no longer used in the implementation, which makes it significantly easier to understand and removes artificial types from backtraces. Using reverse_iterator has lead to at least one serious bug already.

AFAICT this was a language bug that will get fixed. When it's fixed I don't think we'd need an __unconstrained_reverse_iterator to forward using an iterator wrapper. Not that that helps us currently.

  • unconstrainted_reverse_iterator can now be removed (which I intend to do in a follow-up), removing a lot of code duplication and complexity.

Do you already have that follow-up? I don't know how much extra complexity will result in removing __unconstrained_reverse_iterator.

  • the test for whether the optimization is used only applied to std::copy and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
  • there were no tests to make sure the optimization is not used when the effect would be visible.

I agree with you here. The tests are definitely a huge improvement. The only gripe I have with them is that they have UB, but I guess it would be a huge red flag if that wasn't the case.

Regarding the reverse_iterator optimization you mention, I think it was solution for a problem that the previous approach introduced and that is no longer applicable. The optimization was very limited in that it would only apply to the case where both the input and the output iterators were reverse iterators. I see no inherent reason why users would reverse both the input and the output sequence rather than just one of them, outside of the artificial case where our implementation itself would do the wrapping. If this optimization were to be restored, it would have to apply equally to input and output iterators. I don’t think, however, it would be pulling its weight.

This optimization was reported as a performance bug. See https://github.com/llvm/llvm-project/issues/33687. This was one of the reasons the overload was introduced. If we remove this optimization we should at least comment there that it was removed any why.

Just to be clear, I'm not against this change. I don't know which way to implement copy and friends is the right way, but I want to make sure we don't regress unexpectedly in some area.

Another thing to consider is that I think D132505 could implement the algorithm once instead of four times if we use the "forward with iterator wrappers" approach and adding specializations for the wrappers. D132505 is an improvement either way, so I'm not that concerned. I also haven't tried to implement the forwarding version, so take it with a grain of salt. Other suggestions to remove the duplicates are always appreciated!

libcxx/include/__algorithm/move.h
33

The contiguous_iterator concept guarantees that ranges::iter_move(a) has the same type, value category and effects as std::move(*a). AFAICT it would be UB to add an overload to std::move, so a specialized ranges::iter_move() that does anything else than casting to an rvalue would be UB.

libcxx/test/libcxx/algorithms/alg.modifying.operations/copy_move_trivial.pass.cpp
23 ↗(On Diff #457753)

It would break if <type_traits> would include copy for some reason. https://godbolt.org/z/P9Eq13ErY

  • unconstrainted_reverse_iterator can now be removed (which I intend to do in a follow-up), removing a lot of code duplication and complexity.

Do you already have that follow-up? I don't know how much extra complexity will result in removing __unconstrained_reverse_iterator.

We looked at it briefly and it didn't seem to incur much penalty. FWIW, I think removing the need for __unconstrained_reverse_iterator (and generally speaking the use of reverse_iterator as an implementation detail of low-level algorithms) is by far the most important motivation, at least for me.

var-const marked 5 inline comments as done.Sep 23 2022, 9:53 PM

It would definitely be nice if we could write it that simple. Unfortunately we can't currently.

I really hope we would be able to at some point. The code structure from this patch, while far from this ideal, can be transformed into this form in a straightforward way.

By the way, are there any C++11-and-later features that you feel would *really* make a difference if they were available in C++03? I gave it some thought, and if constexpr seemed to be number one (and number 2 would be auto and decltype(auto) as the return type of a function).

That would also be the case if we forward to std::copy though, right?

Not really. The code structure in this patch is basically:

std::copy          -> <memmove>
std::copy_backward -> <memmove>
std::move          -> <memmove>
std::move_backward -> <memmove>

All implementations are independent of each other, can be read in isolation, and are symmetrical. If I know how copy is implemented, I pretty much know how the other 3 are implemented as well.

With forwarding, the structure is

std::copy                                  -> <memmove>
std::copy_backward -> std::copy            -> <memmove>
std::move          -> std::copy            -> <memmove>
std::move_backward -> std::copy (or move?) -> <memmove>

It is inherently more complicated. Now knowing the implementation of copy doesn't tell me anything about the implementation of copy_backward. And move, IIUC, is even worse because it *sometimes* uses its own implementation and *sometimes* forwards to copy. This is even ignoring the issue of reverse iterators for the moment.

I'm not sure this is actually simpler to debug. I haven't checked the current backtrace, so maybe it's worse, but here is one similar to what I encountered while working on D132505:

I ran a simple test. The current state is:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001000020ad t.tmp.exe`std::__1::pair<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*> > std::__1::__copy_impl[abi:v16000]<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*> >(__first=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefeed8, __last=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefeed0, __result=__unconstrained_reverse_iterator<int *> @ 0x00007ff7bfefeec8) at copy.h:39:6
    frame #1: 0x0000000100001e80 t.tmp.exe`std::__1::pair<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*> > std::__1::__copy[abi:v16000]<std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void*> >, std::__1::__unconstrained_reverse_iterator<int*>, 0>(__first=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefef98, __last=__unconstrained_reverse_iterator<std::__1::__list_iterator<int, void *> > @ 0x00007ff7bfefef90, __result=__unconstrained_reverse_iterator<int *> @ 0x00007ff7bfefef88) at copy.h:96:18
    frame #2: 0x0000000100001d9f t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__copy_backward[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__list_iterator<int, void*>, int*, 0>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff028, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff020, __result=0x00007ff7bfeff0e0) at copy_backward.h:36:16
    frame #3: 0x0000000100001b95 t.tmp.exe`int* std::__1::copy_backward[abi:v16000]<std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff088, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff080, __result=0x00007ff7bfeff0e0) at copy_backward.h:57:10
    frame #4: 0x0000000100001ac9 t.tmp.exe`foo() at copy.pass.cpp:93:3
    frame #5: 0x0000000100001c6b t.tmp.exe`main((null)=1, (null)=0x00007ff7bfeff280) at copy.pass.cpp:102:3
    frame #6: 0x00000001000454fe dyld`start + 462

With this patch it becomes

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100003b31 t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__copy_backward_loop<std::__1::_ClassicAlgPolicy>::operator(this=0x00007ff7bfefef40, __first=__list_iterator<int, void *> @ 0x00007ff7bfefeec8, __last=__list_iterator<int, void *> @ 0x00007ff7bfefeec0, __result=0x00007ff7bfeff0c8)[abi:v16000]<std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*>(std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*) const at copy_backward.h:36:8
    frame #1: 0x000000010000398f t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__unwrap_and_dispatch[abi:v16000]<std::__1::__overload<std::__1::__copy_backward_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_backward_trivial>, std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*, 0>(__first=__list_iterator<int, void *> @ 0x00007ff7bfefef88, __last=__list_iterator<int, void *> @ 0x00007ff7bfefef80, __out_first=0x00007ff7bfeff0e0) at copy_move_common.h:81:19
    frame #2: 0x00000001000038dd t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__dispatch_copy_or_move[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__copy_backward_loop<std::__1::_ClassicAlgPolicy>, std::__1::__copy_backward_trivial, std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfefefd8, __last=__list_iterator<int, void *> @ 0x00007ff7bfefefd0, __out_first=0x00007ff7bfeff0e0) at copy_move_common.h:119:10
    frame #3: 0x000000010000384d t.tmp.exe`std::__1::pair<std::__1::__list_iterator<int, void*>, int*> std::__1::__copy_backward[abi:v16000]<std::__1::_ClassicAlgPolicy, std::__1::__list_iterator<int, void*>, std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff028, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff020, __result=0x00007ff7bfeff0e0) at copy_backward.h:55:10
    frame #4: 0x000000010000365d t.tmp.exe`int* std::__1::copy_backward[abi:v16000]<std::__1::__list_iterator<int, void*>, int*>(__first=__list_iterator<int, void *> @ 0x00007ff7bfeff088, __last=__list_iterator<int, void *> @ 0x00007ff7bfeff080, __result=0x00007ff7bfeff0e0) at copy_backward.h:68:10
    frame #5: 0x0000000100003579 t.tmp.exe`foo() at copy.pass.cpp:93:3
    frame #6: 0x000000010000372b t.tmp.exe`main((null)=1, (null)=0x00007ff7bfeff280) at copy.pass.cpp:102:3
    frame #7: 0x000000010004d4fe dyld`start + 462

I find the types way more readable now that they are no longer wrapped in __unconstrained_reverse_iterator.

AFAICT this was a language bug that will get fixed.

I mean the wrong return value for ranges::{copy,move}_backward (see D130968). We even had wrong tests for those (i.e., the tests were testing for the *wrong* value). This sort of confusion of whether we are talking about the begin/end of the actual range or the reversed range is inherent when using reverse iterators. I take it as empirical evidence that using reverse iterators in the implementation adds complexity and potential to introduce bugs.

Do you already have that follow-up? I don't know how much extra complexity will result in removing __unconstrained_reverse_iterator.

Not yet. The change amounts to a couple additional functions in inplace_merge.

This optimization was reported as a performance bug. See https://github.com/llvm/llvm-project/issues/33687. This was one of the reasons the overload was introduced. If we remove this optimization we should at least comment there that it was removed any why.

Reading the original issue, I'm not convinced we should fix this at all. Unfortunately, it doesn't really explain the actual problem the issue author had, which would be the most important part. As for open source examples, it seems that the vector::insert function could easily use copy_backward instead of copying with reverse iterators, and ToolbarActionBar::GetActions() doesn't seem like the kind of function where this kind of microoptimization could ever matter. Addressing this issue is not just a matter of putting in the effort. We are paying real costs for the added complexity, and by extension so are our users (when we introduce a bug, for example) -- and note that this applies to pretty much *all* of our users, whereas the optimization would benefit only a certain subset. I can certainly be convinced otherwise by some concrete, real-world examples.

In any case, if we do this optimization at all, it should apply equally to the input and the output sequence. If that is too complicated to implement, I'll take it as further evidence that this optimization doesn't justify the added cost.

Just to be clear, I'm not against this change. I don't know which way to implement copy and friends is the right way, but I want to make sure we don't regress unexpectedly in some area.

It sounds like we can agree that if constexpr would be the best way if it were feasible. I really hope it becomes feasible at some point in the future.

Another thing to consider is that I think D132505 could implement the algorithm once instead of four times if we use the "forward with iterator wrappers" approach and adding specializations for the wrappers. D132505 is an improvement either way, so I'm not that concerned. I also haven't tried to implement the forwarding version, so take it with a grain of salt. Other suggestions to remove the duplicates are always appreciated!

I don't particularly like this sort of decorator pattern, although it can definitely be the lesser of two evils (for D132505, it's likely justified, though I haven't looked closely yet).

libcxx/include/__algorithm/move.h
33

@philnik Thanks a lot for digging this up! Looks like we don't have to do anything (thankfully, because it would complicate the logic quite a bit).

var-const updated this revision to Diff 462645.Sep 23 2022, 9:54 PM
var-const marked an inline comment as done.

Feedback.

philnik accepted this revision.Sep 24 2022, 1:29 AM

By the way, are there any C++11-and-later features that you feel would *really* make a difference if they were available in C++03? I gave it some thought, and if constexpr seemed to be number one (and number 2 would be auto and decltype(auto) as the return type of a function).

Other than concepts (would probably a bit too much to ask), lambdas would definitely be my number one. It would just be so much nicer to use algorithms instead of raw loops everywhere. C++11 attributes would also be quite nice.

AFAICT this was a language bug that will get fixed.

I mean the wrong return value for ranges::{copy,move}_backward (see D130968). We even had wrong tests for those (i.e., the tests were testing for the *wrong* value). This sort of confusion of whether we are talking about the begin/end of the actual range or the reversed range is inherent when using reverse iterators. I take it as empirical evidence that using reverse iterators in the implementation adds complexity and potential to introduce bugs.

Yeah, I don't think this was because of reverse_iterators. My intuition about what {copy, move}_backward should return is just backwards.

This revision is now accepted and ready to land.Sep 24 2022, 1:29 AM

Fix the CI.

var-const updated this revision to Diff 464186.Sep 30 2022, 2:24 AM

Fix the CI.

var-const updated this revision to Diff 464466.Sep 30 2022, 8:38 PM

Fix the CI.

var-const updated this revision to Diff 464482.Oct 1 2022, 2:13 AM

Fix the CI.

var-const updated this revision to Diff 464527.Oct 1 2022, 3:50 PM

Fix the CI.

I suspect issues in this commit. If Clang is built on top of libc++ with this change in, Clang misbehaves (produces incorrect behaviour and then crashes) - I have bisected the issue down to this exact commit.

I don't know exactly what/why etc yet.

I suspect issues in this commit. If Clang is built on top of libc++ with this change in, Clang misbehaves (produces incorrect behaviour and then crashes) - I have bisected the issue down to this exact commit.

I don't know exactly what/why etc yet.

I don't think I'll be able to spend time on figuring out exactly what this patch breaks, but to reproduce the issue:

  1. Build libc++
  2. Build clang using the libc++ from 1. as standard C++ library
  3. Use clang from 2. to build a trivial C++ application

The test application I've tried to build in step 3 is https://github.com/mstorsjo/llvm-mingw/blob/master/test/hello-cpp.cpp, with a trivial compilation command like i686-w64-mingw32-clang++ hello-cpp.cpp -o i686/hello-cpp.exe. When building this trivial C++ example, Clang incorrectly produces these diagnostics:

In file included from hello-cpp.cpp:19:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/iostream:43:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/ios:221:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/__locale:15:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/__memory/shared_ptr.h:22:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/__memory/allocation_guard.h:14:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/__memory/allocator_traits.h:14:
In file included from D:/a/_temp/msys64/llvm-mingw/include/c++/v1/__memory/construct_at.h:20:
D:/a/_temp/msys64/llvm-mingw/include/c++/v1/new:285:12: error: no matching function for call to '__libcpp_operator_new'
    return __libcpp_operator_new(__size, __align_val);
           ^~~~~~~~~~~~~~~~~~~~~
D:/a/_temp/msys64/llvm-mingw/include/c++/v1/new:262:7: note: candidate template ignored: substitution failure: deduced incomplete pack <size_t, (no value)> for template parameter '_Args'
void* __libcpp_operator_new(_Args ...__args) {
      ^
D:/a/_temp/msys64/llvm-mingw/include/c++/v1/new:326:14: error: no matching function for call to '__libcpp_operator_delete'
      return __libcpp_operator_delete(__ptr, __align_val);
             ^~~~~~~~~~~~~~~~~~~~~~~~
D:/a/_temp/msys64/llvm-mingw/include/c++/v1/new:272:6: note: candidate template ignored: substitution failure: deduced incomplete pack <void *, (no value)> for template parameter '_Args'
void __libcpp_operator_delete(_Args ...__args) {
     ^

And after that, Clang crashes. See https://github.com/mstorsjo/llvm-mingw/actions/runs/3166862572/jobs/5158428014 for the issue in action.

Note that while this could look like an issue in the libc++ headers that Clang is including here, the same issue does not appear when Clang is built on top of a preexisting libstdc++ or older libc++.

I haven't tested, but would presume that the same issue appears when Clang targets other OSes too.

vitalybuka reopened this revision.Oct 2 2022, 4:54 PM
This revision is now accepted and ready to land.Oct 2 2022, 4:54 PM

@mstorsjo Thank you for the report! The patch has been reverted by @vitalybuka. The reverting commit links to some sanitizer findings which are hopefully related to the Clang breakage you were seeing. I'll investigate.

var-const abandoned this revision.Jan 30 2023, 12:33 AM