- implement std::ranges::split_view (last c++20 view)
var-const philnik ldionne
- Group Reviewers
- rGa2b3ab8f7786: [libc++][ranges] implement `std::ranges::split_view`
The implementation looks mostly good. I didn't look at the tests yet.
Having this first seems like a bad idea. On 64-bit platforms this results in 7 bytes of padding. Maybe it would make sense to bit-pack this into the split_view<_View, _Pattern>*?
Why the comment here specifically, but not anywhere else?
Can you move this directly below the split_view, like we normally do?
same for the other one.
Looking great so far, thank you for working on this!
Can we omit it?
Nit: s/ele/elem/ (or el if you prefer a shorter name).
Also I don't think we have to follow the order in the standard, but there it's the last member variable.
@philnik We've been doing it this way for many (perhaps all) other views, should we change the others as well?
Question: is it not guaranteed to be 1?
I'll just leave a comment here to make sure we don't forget to revisit this test later (I presume it's a TODO).
Optional: can counting be refactored out into a base class that is used as a mixin for RangeWithCounting?
Nit: can you use a more visible number for splitting, e.g. 0 or -1? 3 doesn't really stand out.
Nit: I'm not sure what st stands for, can you consider a more descriptive name?
Would it be possible to reuse tests from /range.adaptors/range.lazy.split/general.pass.cpp?
I think so, since it's an extension. I think we used [[nodiscard]] before because we wanted the extension for most users, but back then we didn't enable the [[nodiscard]] extensions by default. Now we do, so we can just say "don't disable the extension if you want it".
Looking through the views, it seems like we use [[nodiscard]] on views, and didn't declare views::iota, views::istream and views::zip with any [[nodiscard]] version.
Yes I will move the bool to the end
let's do the others in a separate change
I am not totally sure as that is the implementation detail of std::ranges::search. What we care here is that the first time it is called the numbers go up and from the second time, the numbers won't go up anymore.
We can make sure that split_view is always at least 2 byte aligned (which it is in most cases anyways), which guarantees that the low bit of the pointer is always zero. Since we know that the low bit is always zero, we can use it as a flag. We just have to bit-and the pointer if we want to dereference it. I have an implementation of a utility class for this purpose in D140278, but that's currently not constexpr-friendly and needs some more work in general. Using it would essentially mean that we can't ship split_view in LLVM16, but I think QoI is more important than having split_view now.
This looks reasonable to me. This is missing tests for iterators though. I'll let @var-const give the final approval.
For each issue being marked as completed, can you please make sure that we have a regression test? I don't mind if that means you don't mark them as completed in this patch, but I would like to avoid marking something as completed if we don't have a test for it.
Can you remove this constraint and see if anything fails? If not, it means we're missing test coverage. In particular, if you have a Pattern that would both work with all_t and is also the range_value_t of your range, I guess we want to take this deduction guide, not the one above.
I agree we should try to use pointer_int_pair here. One thing we could do is ship split_view in LLVM 16, but ship (just that view) as experimental, still. We could guard just that view with _LIBCPP_ENABLE_EXPERIMENTAL.
Edit: Thinking about it some more, I think we'll always need to reinterpret_cast stuff around to make this work. It's not clear to me that it'll ever work inside constexpr (quite unfortunately). If we don't think we have a reasonable path forward to make it work inside constexpr, it probably doesn't make sense to artificially delay this "just because". @philnik , unless you can think of a way to make pointer_int_pair constexpr in the future and you think it is likely to materialize, I wouldn't delay this.
High-level comment: @var-const had found some really surprising behavior when using lazy_split with e.g. C-style strings, because the separator would be e.g. abc\0. He had a test file where he tested a bunch of these corner cases, which are (unfortunately) mandated by the standard, despite being really surprising for users. I think split probably has the same corner cases, so we should also test those. It's probably easy to copy the tests over.
Addressed comments look good, I'll hold off final approval until the test coverage is finished. Thanks!
We don't do this optimization in a very similar case of lazy_split_view::outer-iterator (neither do the other implementations, for lazy_split_view or split_view). More importantly, is this optimization worth doing? It adds a runtime penalty on many invocations of operator++ and operator==, while I'm not sure if the benefit is worth it -- it makes the iterator type occupy less space, but it still won't fit into a register, and it doesn't seem likely that users are going to store it often. It seems like it would penalize the most common use case for the sake of a more niche one.
While not super efficient, we can always just allocate a chunk of memory during constant evaluation to store a pointer and int.
I would expect the penalty to be extremely small. The operation itself is very fast on modern machines, and the compiler should be able to hoist the bit-and out of a loop. Here is a simple example: https://godbolt.org/z/Y7T1Ge6zW
What if it is a constexpr variable in namespace scope? Presumably you can't leak the memory allocation in the constexpr context?
The constraint does not seem to be able to disambiguate these two overloads.
Thanks a lot for working on this! LGTM with just a few comments.
(two more important comments are to fix a stray lazy_split and update the release notes (for release notes it's fine to do a follow-up). The rest is optional)
This should also be mentioned in the release notes. It might be easier to do that in a follow-up.
Optional: maybe make one of the next ranges have more characters than one?
Do we need to also check for cases where there are two or more separators in a row and when the input is empty? If it's covered by general.pass.cpp, that's fine.
Nit: any reason not to use same_as in the statement above? (If it's a compiler bug or something like that, never mind)
Optional: a typedef for SplitIter<Range<Iter>, Range<PatternIter> would cut a little boilerplate here. If you prefer having this more explicit, that's totally fine too.
Optional: maybe advance the iterator to also check the case when it equals the sentinel?
I guess the second sentence here has now become obsolete, and should be replaced?
@huixie90 I think you forgot to update the Ranges status page, since https://libcxx.llvm.org/Status/Ranges.html is still showing split_view as not being done!