This is an archive of the discontinued LLVM Phabricator instance.

[libc++][ranges] implement `std::ranges::split_view` (targeting llvm 16)
ClosedPublic

Authored by huixie90 on Jan 18 2023, 3:52 PM.

Details

Summary
  • implement std::ranges::split_view (last c++20 view)

Diff Detail

Event Timeline

huixie90 created this revision.Jan 18 2023, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2023, 3:52 PM
huixie90 requested review of this revision.Jan 18 2023, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2023, 3:52 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
huixie90 added inline comments.Jan 18 2023, 3:54 PM
libcxx/include/__ranges/split_view.h
60

pointless _LIBCPP_NO_UNIQUE_ADDRESS .

The implementation looks mostly good. I didn't look at the tests yet.

libcxx/include/__ranges/split_view.h
121

Having this first seems like a bad idea. On 64-bit platforms this results in 7 bytes of padding. Maybe it would make sense to bit-pack this into the split_view<_View, _Pattern>*?

176

Why the comment here specifically, but not anywhere else?

190–194

Can you move this directly below the split_view, like we normally do?

201

same for the other one.

Looking great so far, thank you for working on this!

libcxx/include/__ranges/split_view.h
60

Can we omit it?

91

Nit: s/ele/elem/ (or el if you prefer a shorter name).

121

Also I don't think we have to follow the order in the standard, but there it's the last member variable.

201

@philnik We've been doing it this way for many (perhaps all) other views, should we change the others as well?

libcxx/test/std/ranges/range.adaptors/range.split/begin.pass.cpp
152

Question: is it not guaranteed to be 1?

170

I'll just leave a comment here to make sure we don't forget to revisit this test later (I presume it's a TODO).

libcxx/test/std/ranges/range.adaptors/range.split/ctor.range.pass.cpp
31

Optional: can counting be refactored out into a base class that is used as a mixin for RangeWithCounting?

libcxx/test/std/ranges/range.adaptors/range.split/end.pass.cpp
40

Nit: can you use a more visible number for splitting, e.g. 0 or -1? 3 doesn't really stand out.

56

Nit: I'm not sure what st stands for, can you consider a more descriptive name?

libcxx/test/std/ranges/range.adaptors/range.split/general.pass.cpp
2

Would it be possible to reuse tests from /range.adaptors/range.lazy.split/general.pass.cpp?

philnik added inline comments.Jan 19 2023, 8:57 PM
libcxx/include/__ranges/split_view.h
201

I think so, since it's an extension. I think we used [[nodiscard]] before because we wanted the extension for most users, but back then we didn't enable the [[nodiscard]] extensions by default. Now we do, so we can just say "don't disable the extension if you want it".

Looking through the views, it seems like we use [[nodiscard]] on views, and didn't declare views::iota, views::istream and views::zip with any [[nodiscard]] version.

huixie90 updated this revision to Diff 490806.Jan 20 2023, 6:21 AM
huixie90 marked 9 inline comments as done.

address comments

huixie90 added inline comments.Jan 20 2023, 7:45 AM
libcxx/include/__ranges/split_view.h
121

Yes I will move the bool to the end
Could you please elaborate on bit-packing?

201

let's do the others in a separate change

libcxx/test/std/ranges/range.adaptors/range.split/begin.pass.cpp
152

I am not totally sure as that is the implementation detail of std::ranges::search. What we care here is that the first time it is called the numbers go up and from the second time, the numbers won't go up anymore.

philnik added inline comments.Jan 20 2023, 8:09 AM
libcxx/include/__ranges/split_view.h
121

We can make sure that split_view is always at least 2 byte aligned (which it is in most cases anyways), which guarantees that the low bit of the pointer is always zero. Since we know that the low bit is always zero, we can use it as a flag. We just have to bit-and the pointer if we want to dereference it. I have an implementation of a utility class for this purpose in D140278, but that's currently not constexpr-friendly and needs some more work in general. Using it would essentially mean that we can't ship split_view in LLVM16, but I think QoI is more important than having split_view now.

ldionne accepted this revision as: ldionne.Jan 20 2023, 10:05 AM

This looks reasonable to me. This is missing tests for iterators though. I'll let @var-const give the final approval.

libcxx/docs/Status/Cxx2bIssues.csv
134–137

For each issue being marked as completed, can you please make sure that we have a regression test? I don't mind if that means you don't mark them as completed in this patch, but I would like to avoid marking something as completed if we don't have a test for it.

libcxx/include/__ranges/split_view.h
121

Can you remove this constraint and see if anything fails? If not, it means we're missing test coverage. In particular, if you have a Pattern that would both work with all_t and is also the range_value_t of your range, I guess we want to take this deduction guide, not the one above.

121

I agree we should try to use pointer_int_pair here. One thing we could do is ship split_view in LLVM 16, but ship (just that view) as experimental, still. We could guard just that view with _LIBCPP_ENABLE_EXPERIMENTAL.

Edit: Thinking about it some more, I think we'll always need to reinterpret_cast stuff around to make this work. It's not clear to me that it'll ever work inside constexpr (quite unfortunately). If we don't think we have a reasonable path forward to make it work inside constexpr, it probably doesn't make sense to artificially delay this "just because". @philnik , unless you can think of a way to make pointer_int_pair constexpr in the future and you think it is likely to materialize, I wouldn't delay this.

libcxx/test/std/ranges/range.adaptors/range.split/begin.pass.cpp
2

High-level comment: @var-const had found some really surprising behavior when using lazy_split with e.g. C-style strings, because the separator would be e.g. abc\0. He had a test file where he tested a bunch of these corner cases, which are (unfortunately) mandated by the standard, despite being really surprising for users. I think split probably has the same corner cases, so we should also test those. It's probably easy to copy the tests over.

var-const accepted this revision as: var-const.Jan 20 2023, 2:25 PM

Addressed comments look good, I'll hold off final approval until the test coverage is finished. Thanks!

libcxx/include/__ranges/split_view.h
121

We don't do this optimization in a very similar case of lazy_split_view::outer-iterator (neither do the other implementations, for lazy_split_view or split_view). More importantly, is this optimization worth doing? It adds a runtime penalty on many invocations of operator++ and operator==, while I'm not sure if the benefit is worth it -- it makes the iterator type occupy less space, but it still won't fit into a register, and it doesn't seem likely that users are going to store it often. It seems like it would penalize the most common use case for the sake of a more niche one.

philnik added inline comments.Jan 20 2023, 6:46 PM
libcxx/include/__ranges/split_view.h
121

@philnik , unless you can think of a way to make pointer_int_pair constexpr in the future and you think it is likely to materialize, I wouldn't delay this.

While not super efficient, we can always just allocate a chunk of memory during constant evaluation to store a pointer and int.

We don't do this optimization in a very similar case of lazy_split_view::outer-iterator (neither do the other implementations, for lazy_split_view or split_view). More importantly, is this optimization worth doing? It adds a runtime penalty on many invocations of operator++ and operator==, while I'm not sure if the benefit is worth it -- it makes the iterator type occupy less space, but it still won't fit into a register, and it doesn't seem likely that users are going to store it often. It seems like it would penalize the most common use case for the sake of a more niche one.

I would expect the penalty to be extremely small. The operation itself is very fast on modern machines, and the compiler should be able to hoist the bit-and out of a loop. Here is a simple example: https://godbolt.org/z/Y7T1Ge6zW

huixie90 added inline comments.Jan 21 2023, 5:05 AM
libcxx/include/__ranges/split_view.h
121

@philnik

While not super efficient, we can always just allocate a chunk of memory during constant evaluation to store a pointer and int.

What if it is a constexpr variable in namespace scope? Presumably you can't leak the memory allocation in the constexpr context?

philnik added inline comments.Jan 21 2023, 5:29 AM
libcxx/include/__ranges/split_view.h
121

Right, that doesn't work for constant initialization. Then I don't think it's possible without compiler magic, which I don't think is viable for now.

127–130

This makes it possible to pack the bool into the subrange.

huixie90 updated this revision to Diff 491181.Jan 22 2023, 10:01 AM
huixie90 marked 2 inline comments as done.

added remaining tests

huixie90 retitled this revision from [libc++][ranges] implement `std::ranges::split_view` (WIP on testing, targeting llvm 16) to [libc++][ranges] implement `std::ranges::split_view` (targeting llvm 16).Jan 22 2023, 10:02 AM
huixie90 edited the summary of this revision. (Show Details)
huixie90 marked 2 inline comments as done.
huixie90 added inline comments.
libcxx/include/__ranges/split_view.h
121

The constraint does not seem to be able to disambiguate these two overloads.
https://godbolt.org/z/h6Kcj53qY

var-const accepted this revision.Jan 23 2023, 1:16 PM

Thanks a lot for working on this! LGTM with just a few comments.

(two more important comments are to fix a stray lazy_split and update the release notes (for release notes it's fine to do a follow-up). The rest is optional)

libcxx/docs/Status/Cxx20Papers.csv
197

This should also be mentioned in the release notes. It might be easier to do that in a follow-up.

libcxx/test/std/ranges/range.adaptors/range.split/general.pass.cpp
46

s/lazy_split/split/.

libcxx/test/std/ranges/range.adaptors/range.split/iterator/increment.pass.cpp
27 ↗(On Diff #491181)

Optional: maybe make one of the next ranges have more characters than one?

27 ↗(On Diff #491181)

Do we need to also check for cases where there are two or more separators in a row and when the input is empty? If it's covered by general.pass.cpp, that's fine.

37 ↗(On Diff #491181)

Nit: any reason not to use same_as in the statement above? (If it's a compiler bug or something like that, never mind)

libcxx/test/std/ranges/range.adaptors/range.split/iterator/member_types.compile.pass.cpp
27 ↗(On Diff #491181)

Optional: a typedef for SplitIter<Range<Iter>, Range<PatternIter> would cut a little boilerplate here. If you prefer having this more explicit, that's totally fine too.

libcxx/test/std/ranges/range.adaptors/range.split/sentinel/equal.pass.cpp
36 ↗(On Diff #491181)

Optional: maybe advance the iterator to also check the case when it equals the sentinel?

This revision is now accepted and ready to land.Jan 23 2023, 1:16 PM
huixie90 updated this revision to Diff 491610.Jan 23 2023, 11:22 PM
huixie90 marked 9 inline comments as done.

addressing comments

libcxx/test/std/ranges/range.adaptors/range.split/iterator/increment.pass.cpp
27 ↗(On Diff #491181)

yes it is covered in general.pass.cpp

37 ↗(On Diff #491181)

some apple platforms based on an older version of clang cannot deduce reference types in the same_as checking

This revision was automatically updated to reflect the committed changes.
h-vetinari added inline comments.
libcxx/docs/ReleaseNotes.rst
45–46 ↗(On Diff #491649)

I guess the second sentence here has now become obsolete, and should be replaced?

ldionne added inline comments.Jan 24 2023, 10:56 AM
libcxx/docs/ReleaseNotes.rst
37 ↗(On Diff #491649)

@huixie90 I think you forgot to update the Ranges status page, since https://libcxx.llvm.org/Status/Ranges.html is still showing split_view as not being done!