This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][ranges] Add `ranges::join_view`.

Authored by zoecarver on Aug 6 2021, 3:34 PM.


Group Reviewers
Restricted Project
rGdf324bba5c4c: [libcxx][ranges] Add `ranges::join_view`.

Diff Detail

Event Timeline

zoecarver created this revision.Aug 6 2021, 3:34 PM
zoecarver requested review of this revision.Aug 6 2021, 3:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2021, 3:34 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
tcanens added a subscriber: tcanens.Aug 6 2021, 9:31 PM
tcanens added inline comments.
ldionne requested changes to this revision.Aug 11 2021, 12:52 PM
ldionne added inline comments.

Not needed.


If you agree with what I say below, this should be removed.


Instead, I would do:

static constexpr bool _UseCache = !is_reference_v<_InnerRange>;
using _Cache = _If<_UseCache, __non_propagating_cache<remove_cv_t<_InnerRange>>, __empty_cache>;
[[no_unique_address]] _Cache __cache_;

That way, you get rid of __inner_cache altogether.


__iterator and __sentinel are templates -- can we define them out-of-line?


I think you should remove the TODO since this is and it will almost certainly be voted into the Standard.


IMO the spec is easier to understand as-is, and it doesn't require introducing a _CONSTEXPR_TERNARY macro that is only used in two places.

I don't mind if you use a IIFE or not.


I don't understand why the spec allows modifying a member of the parent join_view from the iterator. That's very sneaky and can lead to e.g. race conditions if someone does:

join_view v = ...;
auto it1 = v.begin();
auto it2 = v.begin();

  ++it1; // calls satisfy() under the hood, which modifies `v`

  ++it2; // calls satisfy() under the hood, which modifies `v`

// sneaky race condition since you had a copy of different iterators in each thread

Basically, I think it's quite vexing that incrementing an iterator modifies the underlying view in this case, and I don't fully understand why that is even required. What use case does that enable? Is the criteria is_reference_v<range_reference_t<V>> meant to mean something like "the outer range creates its inner ranges on-the-fly and they are temporary objects"? I'm having trouble following here.

CC @tcanens in case you can share some insight here.


You don't need the constraint anymore, since you're storing an optional and never default-constructing your inner iterator.

That isn't what the proposed resolution of says right now, but I'll send an email to LWG to request a change to the proposed resolution, since I think it's likely just an oversight.

Can you please add tests where the inner iterator is not default constructible to make sure that we can default construct the join_view::__iterator? Basically what we'd add if we were fixing the LWG issue after having implemented join_view per the current spec.


I think this should not be return ++*this, but merely ++*this. Otherwise, if ++*this returns something else than void, you'll get an error like void function '<name>' should not return a value.

Please add a test that exercises that.


Please define out-of-line.


Make sure you test explicit-ness.


This simple re-formatting makes the similarity between this and the const version below more obvious.


Same reason as above, this makes the similarity between the const and the non-const versions more obvious.


Please make sure you test the explicit-ness of this deduction guide with something like that:

join_view v = some_range; // should not compile
join_view v(some_range); // should compile

I'll make this change in a separate patch, I think I have something that works in a WIP branch right now. Thanks for bringing this to our attention.


This is a general comment not only applicable to this test. We seem to be missing coverage for a lot of range "topologies" (what I mean by that is an empty range, a range with 1 subrange, 2 subranges, an empty subrange, subranges of empty sizes, etc..).

I don't think it is reasonable to replicate each test for all topologies, but we should add tests with something like a random access range (const and non-const) checking all topologies. What I mean here is that I don't care if you don't test a range like [[x], [], [y, z]] with all combinations of archetypes, but you should test it at least with a few that are relevant (that probably is restricted to const and non-const).


Not needed (here and everywhere else).

This revision now requires changes to proceed.Aug 11 2021, 12:52 PM
tcanens added inline comments.Aug 11 2021, 2:41 PM

It's an input range for this case, so everything from the second begin() onwards is undefined.

Yes, this is the case where the inner ranges are created on the fly and we need to store the range somewhere while it is being iterated over, and the only reasonable place to store it is the view.

ldionne added inline comments.Aug 11 2021, 2:58 PM

Is there a reason why we don't use the criteria input_range<V> && !forward_range<V> to denote "it's only an input range and I can only do a single-pass on it" instead of !is_reference_v<range_reference_t<V>>?

Also, if everything from the second begin() onwards is undefined, then we can only ever have one iterator to that range. So would it be valid to store the temporary inner range in the iterator itself instead of storing it in the parent join_view?

zoecarver marked 13 inline comments as done.Aug 11 2021, 3:31 PM
zoecarver added inline comments.

I have removed this cache altogether now but if I need to add it back I'll do it like this.


No longer have two paths here.

tcanens added inline comments.Aug 11 2021, 3:55 PM

V, the underlying range, can be anything up to random access (for instance, iota(0, 42) | transform(to_string)), but if it is a range of prvalue ranges, then join_view<V> can only ever be input.

It's not valid to store the inner range in the iterator itself without an indirection. Iterators are movable, join_view's iterator must store an iterator into the inner range that is currently being iterated over, but moving a range can invalidate iterators into it.

IIRC Eric once said something along the lines of "views are lazy algorithms, and some algorithms have state". This is one of those stateful algorithms.

zoecarver marked 6 inline comments as done.Aug 11 2021, 6:24 PM
zoecarver added inline comments.

I can't seem to figure out how to test this. It seems like both compile:

I'll think some more and see if I can come up with something.


Added four tests: children all have different sizes (0-4), parent is empty, parent size is one, parent and child size is one. Let me know what you think.

zoecarver updated this revision to Diff 365889.Aug 11 2021, 6:26 PM
zoecarver marked an inline comment as done.

Fix based on review.

Add some tests.

Quuxplusone added inline comments.

@zoecarver: Remember that SFINAE works only on immediate contexts. requires(T r) { invokesImplicitCTAD(r); } is invariably true, regardless of what the body of invokesImplicitCTAD looks like.
I recall you had some macro in one of these recent reviews that was also trying to SFINAE on the body of a function (return-type-SFINAEing on the decltype of a lambda whose body contained the interesting part); again, that doesn't work. SFINAE only looks at the well-formedness of the exact expression being evaluated, not at the-bodies-of-all-the-things-that-expression-might-call.

The other problem with your example is that X(T) creates an implicit deduction guide. Change it to X(int) or X(std::type_identity_t<T>) to fix that part.

I don't think there's any way to test CTAD-ability with SFINAE. With a literal type, you can try something like this but right now this works only on Clang (not GCC nor MSVC), and it certainly doesn't help with join_view specifically. So the "should not compile" test would have to be a .fail.cpp, unless I'm mistaken (in which case I smell a blog post in the offing... ;))

ldionne accepted this revision.Aug 12 2021, 12:05 PM

LGTM with my comments applied and caching restored.


Okay, I think I understand now.

@zoecarver You need to restore caching, otherwise this kind of code won't work since you'll be re-evaluating *outer every time you increment the inner iterator:

std::set<int> already_called;

auto rng = std::views::iota(0, 42) | std::views::transform([&](int i) -> std::string {
    return std::to_string(i);

for (char c : std::views::join(rng)) {
    // fails without caching cause you re-evaluate the lambda every time

Please add tests for that.

Also, please add tests with a view that invalidates its iterators upon move (it could simply make a copy and hence reallocate the underlying storage to another place). If you cached *outer inside the iterator (instead of in the parent view), your test should fail when you move the join_view::iterator, because that'll move the cached value of *outer (and we just said that invalidates iterators into it). IIUC that's what Tim is saying above.


You can use a .verify.pass.cpp test, which can contain statements.


We also want tests like:

  • Only one subrange and it is empty
  • All subranges are empty
  • First subrange is empty, other are not
  • Only last subrange is empty

I think it would be reasonable to test those in a fashion similar to:

auto rng = std::vector{std::vector{...}, ..., std::vector{...}} | std::views::join;
std::vector<T> joined;
for (auto x : rng) // since we don't have ranges::copy yet
assert(joined == std::vector{...});
This revision is now accepted and ready to land.Aug 12 2021, 12:05 PM
zoecarver updated this revision to Diff 366155.Aug 12 2021, 5:26 PM
zoecarver marked 2 inline comments as done.

Address review comments.

Note: I have not fixed the bots yet.

I have a few inline comments that I want to resolve tomorrow and I also need to fix the bots. Then I'll land this.


I know you wanted me to do

static constexpr bool _UseCache = !is_reference_v<_InnerRange>;
using _Cache = _If<_UseCache, __non_propagating_cache<remove_cv_t<_InnerRange>>, __empty_cache>;
[[no_unique_address]] _Cache __cache_;

which I agree would be better, but I was getting compiler errors, because __non_propagating_cache<remove_cv_t<_InnerRange>> is not valid when _InnerRange is a reference type (because __non_propagating_cache requires is_object_v).

There may be a creative solution to work around this.


I spent a bit of time trying to find a small reproducer and this is as far as I got. I'll try to file a bug tomorrow.


Also, please add tests with a view that invalidates its iterators upon move (it could simply make a copy and hence reallocate the underlying storage to another place).

Done. I made ValueView's move constructor invalidate the moved-from iterator. This should catches both issues.

(Side note: I made the buffer in increment.pass.cpp into two buffers, because while debugging this, I found that we will just read past end, so it *looks* like there isn't a problem.)


I cannot use the standard's suggestion here because of


I'm just using the same method as the other tests. If you want vector for some reason, let me know, but I'd rather not pull that in here.


You have no idea how fun it was to debug this :P

tcanens added inline comments.Aug 12 2021, 5:57 PM

Just use remove_cvref_t? If it's not a reference, the ref part has no effect. If it is, then we don't care what it ends up being anyway (but it is an object type so it satisfies the constraint).

zoecarver marked an inline comment as done.Aug 13 2021, 9:24 AM
zoecarver added inline comments.

A simple solution that solves this nicely. Thanks, Tim.

zoecarver updated this revision to Diff 366298.Aug 13 2021, 9:24 AM
zoecarver marked an inline comment as done.

Use inline cache.

Fix GCC.

This revision was landed with ongoing or failed builds.Aug 13 2021, 11:31 AM
This revision was automatically updated to reflect the committed changes.