This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Future-proof std::copy for ranges
Abandoned · Public

Authored by ldionne on May 5 2021, 2:56 PM.

Details

Reviewers: None
Group Reviewers: Restricted Project
Summary

When we eventually get to implementing algorithms on ranges, we'll want
to avoid re-writing the algorithms from scratch. To do so, we'll need to
refactor how we write algorithms so that the same core implementation can
be used by both normal algorithms and ranges algorithms. The ranges
algorithms can't just call the iterator ones because the ranges algorithms
are more general (e.g. they accept an iterator and a sentinel).

The trick is to factor the algorithm into a private name and make sure
it works on an (iterator, sentinel) pair, and then use that from the
normal algorithm. Once we add the ranges algorithm, we will only need
to shim the result into the appropriate ranges result type (in_out_result
for ranges::copy).
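
For concreteness, here is a rough sketch of what the private core could look like (illustrative only, not the literal contents of this diff; libc++ visibility/constexpr macros and #includes are omitted):

// The end position is a sentinel type, not necessarily the same type as the
// iterator, and both final positions are returned so that a ranges result
// (e.g. in_out_result) can later be built on top of it.
template <class _InIter, class _Sent, class _OutIter>
constexpr std::pair<_InIter, _OutIter> __copy(_InIter __first, _Sent __last, _OutIter __result) {
  for (; __first != __last; ++__first, (void)++__result)
    *__result = *__first;
  return {std::move(__first), std::move(__result)};
}

std::copy itself then just forwards its (first, last, result) arguments to this core and returns the output iterator.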

Event Timeline

ldionne created this revision. · May 5 2021, 2:56 PM
ldionne requested review of this revision. · May 5 2021, 2:56 PM
Herald added a project: Restricted Project. · May 5 2021, 2:56 PM
Herald added a reviewer: Restricted Project.

I am putting this up as a strawman because we will need to go through all of the algorithms and do something similar at some point (before we implement the ranges:: algorithms).

Basically, we could apply a similar change to all algorithms that require it so that we're ready to implement the ranges:: version when the time comes. Here's how we would implement ranges::copy:

template <std::input_iterator I, std::sentinel_for<I> S, std::weakly_incrementable O>
  requires std::indirectly_copyable<I, O>
constexpr ranges::copy_result<I, O> copy(I first, S last, O result) {
  // Delegate to the internal (iterator, sentinel) core, then repackage the
  // result as a ranges::copy_result (an in_out_result).
  auto [r1, r2] = _VSTD::__copy(first, last, result);
  return {r1, r2};
}

template <ranges::input_range R, std::weakly_incrementable O>
  requires std::indirectly_copyable<ranges::iterator_t<R>, O>
constexpr ranges::copy_result<ranges::borrowed_iterator_t<R>, O> copy(R&& r, O result) {
  // The range overload simply forwards to the (iterator, sentinel) overload above.
  return ranges::copy(ranges::begin(r), ranges::end(r), result);
}

This avoids duplicating any of the actual code in the copy algorithm. That doesn't save *that much* for std::copy, but for other algorithms it'll be more significant, so we want to have a mechanical way of doing it. For algorithms that take a projection, we can implement the internal algorithm so that it takes a projection, and simply pass an identity function to implement the regular std:: algorithm. There are still some issues to think about with this approach, notably that we're copying/moving iterators around a lot more than we used to, which may or may not be a problem depending on whether iterators are cheap to copy.
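
As a hedged sketch of the projection idea, using find_if and hypothetical helper names that are not part of this diff (#includes omitted):

// Internal version: (iterator, sentinel) plus a projection.
template <class _Iter, class _Sent, class _Pred, class _Proj>
constexpr _Iter __find_if(_Iter __first, _Sent __last, _Pred __pred, _Proj __proj) {
  for (; __first != __last; ++__first)
    if (__pred(__proj(*__first)))
      break;
  return __first;
}

// Stand-in for std::identity so the classic algorithm also works pre-C++20.
struct __identity {
  template <class _Tp>
  constexpr _Tp&& operator()(_Tp&& __t) const noexcept { return std::forward<_Tp>(__t); }
};

// Classic entry point: forwards to the internal version with the identity projection.
template <class _InputIterator, class _Predicate>
constexpr _InputIterator find_if(_InputIterator __first, _InputIterator __last, _Predicate __pred) {
  return _VSTD::__find_if(__first, __last, __pred, __identity());
}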

libcxx/include/__algorithm/unwrap_iter.h:88

This is pretty terrible, and we'll need to introduce one for algorithms that return triples.

I think we should probably write our own little in_out_result and in_in_out_result pre-C++20 to avoid depending on pair here.
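
Something along these lines, for instance (the names and exact shape are illustrative only, not actual definitions):

// Pre-C++20 result structs so the internal algorithms don't need std::pair/std::tuple.
template <class _InIter, class _OutIter>
struct __in_out_result {
  _InIter in;
  _OutIter out;
};

template <class _InIter1, class _InIter2, class _OutIter>
struct __in_in_out_result {
  _InIter1 in1;
  _InIter2 in2;
  _OutIter out;
};

In C++20 mode, the ranges:: algorithms could then convert these into (or alias them to) ranges::in_out_result and ranges::in_in_out_result.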

You can guess that I dislike any approach that seems to lead to one-file-per-function. :) What I personally would do is,
(1) Rename <algorithm> to <__algorithm/base.h>, and have <algorithm> simply #include <__algorithm/base.h>.
(2) Move all public entrypoints (sort, lower_bound, copy...) into <__algorithm/classic.h>, and have <algorithm> also include that. Leave all the private implementations (__sort, __lower_bound, __copy...) in <__algorithm/base.h>.
(3) Change all private implementations from template<class _Iter> to template<class _Iter, class _Sent>. (This change will be local to <__algorithm/base.h>.)
(4) Implement <__algorithm/ranges.h> as more-or-less an exact copy of <__algorithm/classic.h>, just using niebloids instead of plain old function templates.

The important thing IMO is to have a way for internal users to get the classic stuff without the Ranges stuff.
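
Under that split, the public entry points in <__algorithm/classic.h> would be thin shims over the (iterator, sentinel) implementations left in <__algorithm/base.h>. A sketch only, assuming __copy returns a pair as in this diff (macros omitted):

template <class _InputIterator, class _OutputIterator>
constexpr _OutputIterator copy(_InputIterator __first, _InputIterator __last, _OutputIterator __result) {
  // A same-type (first, last) pair is just a special case of (iterator, sentinel),
  // so the classic entry point needs no logic of its own.
  return _VSTD::__copy(__first, __last, __result).second;
}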

Btw, I observe that right now <string_view> includes <algorithm> AFAICT only for std::min. For that reason, I suggest that a "common prefix" (maybe the longest common prefix :P) of our two approaches would be for you to make a PR pulling out <__algorithm/min_max.h>, which could also serve as a testbed for some aspects of Ranges. (Eventually we'll have to create std::ranges::min and std::ranges::max, but we certainly don't want to drag all of Ranges into <string_view>...)

std::copy is also an "interesting" first testbed, because it's all tangled up with __wrap_iter and __is_cpp17_contiguous_iterator, which are still quite actively mutating at the moment.

> You can guess that I dislike any approach that seems to lead to one-file-per-function. :) What I personally would do is,
> [...]
>
> std::copy is also an "interesting" first testbed, because it's all tangled up with __wrap_iter and __is_cpp17_contiguous_iterator, which are still quite actively mutating at the moment.

Thanks for your comments. It seems that our two approaches are basically the same, except for how we split up the header files.

I'd like to have more discussion about how to bridge between the ranges and classic algorithms in the simplest and most efficient way possible.

tcanens added a subscriber: tcanens. · May 7 2021, 3:48 PM

I think something like remove or rotate (and more generally algorithms that require the use of the new iter_move and iter_swap customization points) might be a more interesting exercise.
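
For illustration only (not from this diff): a swap-based internal helper would have to go through the customization point instead of swapping the dereferenced iterators directly, e.g. something like:

// Sketch: a ranges-ready __swap_ranges must use ranges::iter_swap so that
// user customizations and proxy iterators are honored.
template <class _Iter1, class _Sent1, class _Iter2>
constexpr _Iter2 __swap_ranges(_Iter1 __first1, _Sent1 __last1, _Iter2 __first2) {
  for (; __first1 != __last1; ++__first1, (void)++__first2)
    std::ranges::iter_swap(__first1, __first2);
  return __first2;
}

The classic algorithm is specified in terms of swap(*first1, *first2), so this is one place where the refactoring isn't purely mechanical.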

zoecarver added inline comments.
libcxx/include/__algorithm/copy.h:28

What's the benefit of putting these into a namespace? Once we add the CPOs, we're going to create two more namespaces. I think this namespace might add a bit of confusion, and I don't see any benefit (especially in such a small file).

libcxx/include/__algorithm/unwrap_iter.h:1

My personal preference: do this in three PRs:

  1. Move unwrap_iter into its own header. This you could just land without review; it's an obviously correct NFC.
  2. Move copy into its own header.
  3. Add the CPO.

I'd also support combining 2 and 3.

ldionne abandoned this revision. · Sep 24 2021, 9:37 AM

After giving more thought to this, it appears that we're going to have to handle each algorithm on its own. We'll try to avoid duplication as much as possible; however, it's not clear to me that a 100% mechanical approach is going to work for all algorithms.