Implement std::ranges::merge. added unit tests
Details
- Reviewers
philnik ldionne var-const - Group Reviewers
Restricted Project - Commits
- rG25607d143d1d: [libc++] Implement `std::ranges::merge`
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
219 | Typo! |
Thanks a lot for working on this!
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
49 | Nit: I don't think you need to qualify the ranges namespace on merge_result. Applies in a few places below as well. In general, we only qualify function calls due to ADL. | |
50 | Question: why not place this function in the internal __merge namespace? | |
75 | Formatting nit: namespace contents don't have to be indented in LLVM style. We generally follow that, and I think avoiding extra indentation where possible makes code more readable. | |
84 | Ultranit: in __merge_impl, there is no space before the closing > angle brace, can you please make this consistent? | |
95 | Move the iterators / possibly the other arguments as well? | |
111 | Weird formatting here -- can you split the line after the template argument list? | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
38–39 | FWIW:
| |
109–112 | FWIW, I also think that testing each type gets into the area of diminishing returns pretty quickly. | |
113 | Question: what's the reason to take the arrays by lvalue reference? | |
116 | Nit: it looks like this line should be split into two. | |
123 | Optional: consider adding a typedef for merge_result -- it's pretty verbose and makes this line harder to parse. | |
153 | Nit: consider removing this comment. This function doesn't directly deal with different iterator types (its caller does), so this comment seems a little confusing. | |
155 | Can you add a brief comment to each test case to explain what is being tested? It makes quickly skimming through the file much easier. | |
171 | Can you please also add tests for the following cases?
| |
175 | Nit: s/Borrow/Borrowed/. | |
193 | Nit: using would be helpful here as well for the merge_result type. | |
211 | Question: what does In1In2 stand for? | |
256 | Hmm, the preconditions section in the standard doesn't indicate that the output range should be large enough to hold the result. Do you know if this precondition comes from some other section implicitly? | |
265 | This needs to be asserted, right? (Alternatively, you could assert in the implementation of TracedCopy). | |
291 | Nit: s/merged/be merged/ (or s/merged/merge/). | |
343 | Is defining a custom struct necessary? It seems that passing something like ranges::greater would achieve the same goal with less test code. | |
399 | Optional: s/std::ranges::equal_to{}/{}/ to decrease verbosity and make it more obvious that this argument is unimportant. | |
400 | FWIW, I called the equivalent class for testing stable_sort OrderedValue (and used doubles for ordering because it allows for nice "syntax" to number duplicates, like {6, 6.1}, {6, 6.2}, etc.). But feel free to keep as is. | |
423 | A couple of questions:
| |
470 | Should this be tested for the iterator version as well? (applies to the ConvertibleToBool test case below as well) | |
553 | Optional: I'm not sure this check buys us much. Consider moving these test cases into the "general" section in testImpl. |
address feedback
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
75 | fixed. I used the .clang-format file under the libcxx folder and it is done by clang-format. Perhaps we should update the .clang-format file to do this? | |
95 | the impl takes iterators by reference and use them as l-values, std::move them don't actually have any effects. or do I miss anything? | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
38–39 | The main reason I dislike having default argument is, for example if I do template <class InIter1, class InIter2, class OutIter, class Sent1 = sentinel_wrapper<InIter1>, class Sent2 = sentinel_wrapper<InIter2>> concept hasMerge = ...; Then in the test if I write static_assert(hasMerge<Iter1, Iter2, Iter3, Iter4, Iter5>); As I reader of the test (without scrolling to the top to see the definition of the hasMerge), I would expect the order of these 5 arguments would match the order of the arguments in std::ranges::merge, i.e. InIter1 (Iter1), Sent1 (Iter2), InIter2 (Iter3), Sent2 (Iter4), OutIter (Iter5) But, in fact, because of the default arguments, the actual order it is testing is InIter1 (Iter1), InIter2 (Iter2), OutIter (Iter3), Sent1 (Iter4), Sent2 (Iter5) The change of the order might lead to unnecessary confusion IMO. Regarding the name, I don't like canMerge or hasMerge. I realised that I don't need to name it. there is already a thing I need in the standard std::invocable, so I changed it to use invocable. Regarding the maintain of the table, another idea I have is to have inline comments to indicate what they are static_assert( std::invocable<MergeFn, int* /*InIter1*/, int* /*Sent1*/, int* /*InIter2*/, int* /*Sent2*/, int* /*OutIter*/>); But I don't like it much Alternatively, we can format arguments in different lines and the order of lines matches the order of the original argument lists in std::ranges::merge (because I use std::invocable) static_assert( std::invocable<MergeFn, int*, int*, int*, int*, int*>); This should be relatively low effort to maintain What do you think? | |
113 | The type of input iterators I am testing has the reference type of non-const l-value reference to int. .e.g | |
171 | added suggested cases. | |
211 | renamed In1 -> InIter1 For withAllIn1Iterators, given the type InIter2 and OutIter, it instantiates the tests for different types for InIter1 Any suggestions for these names? | |
256 | In https://eel.is/c++draft/alg.merge#3
[result, result_last) has to be a range. If in this case array is not large enough [result, result_last) does not denote a range | |
343 | right. I used it for a couple of more tests. e.g. testing when a comparator is a member function pointer or the projection is a member variable pointer. (to test it is calling std::invoke rather than calling operator(). This test is suggested by @philnik | |
423 | This test is suggested by @philnik , this is to make sure we are calling std::invoke rather than opeartor() in the implementation as member pointers won't work with operator() | |
553 | this checks that when both inputs are empty, we don't assign the output. Or did I miss anything? The general testImpl works on array/std::array and this one cannot because array/std::array cannot be zero size. also the output has slightly different configuration. in the testImpl, the output array size is N1 + N2. but here N1 + N2 == 0 and we cannot have empty array. what do you think? |
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
95 | In the other ranges algorithms the _impl function takes the iterators by value. Is there a reason you don't do this here? | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
38–39 | Do you have any objections to using one concept per range argument? That would allow you to keep the definition of the concepts close to the tests for increased readability and remove a lot of noise. I don't care much for the name, except that it is consistent between algorithm tests. | |
113 | Why not take it by value? We have to care about correctness and not performance in the tests. | |
553 | You've got it the other way around. std::array<int, 0> is well-formed: https://eel.is/c++draft/array.zero. int a[0]; is a compiler-extension. |
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
95 | It also puzzled me a lot why those algorithms have impl function by value. My original revision doesn't have the impl function at all, the range overload delegate to iterator overload, and you pointed out that having an impl function could save extra "moves" of the comparator. But if you take the iterator by value, then you have to move these 5 iterators. Then I am confused by the purpose of the impl function, saving the 3 extra moves of the comparator+projection, at the cost of having extra 5 moves of iterators. (note that the default comparator/projections are stateless and cheaper than moving the iterators, which can be expensive, e.g. zip_iterator) If we don't care about moves, why do we need this impl and not revert it to my original implementations? | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
38–39 | I don't have objections. in my latest revision, I don't have any concepts defined, I just used std::invocable. I feel that it is more obvious the intent of the test. I am ok to create 2 extra concepts if std::invocable isn't as readable as a good name (any name suggestions?) regarding the argument lists, i think the problem is the nature of this algorithm takes 5 iterators and it is very easy to get lost which iterator is what. Having default arguments of types in my opinion is fine only if we don't change the order of the arguments. but unfortunately the type we want the default are the 2nd and the 4th argument, which will result in changing the order. I do understand having this table format will increase the formatting cost. so what do you think about the other alternatives I suggested? | |
113 | I don't have strong feelings and happy to change it to either. by reference or by value sound the same as long as it is not const. any reasons to prefer one not the other? |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
38–39 | anyway, i think we should agree on the disagreement and move on. since you are the main reader of the test, I applied all your suggestions and hopefully it is the most readable form to you. I followed the pattern in ranges move test, this line has a bug If you have a test for move only input iterator, your test would fail because you are passing lvalue to ranges::move in the concept, but actually it is OK to call ranges::move with prvalue move only input iterators |
At this point it looks good to me (I'll leave the final approval to the other reviewers). Once again, thanks a lot for working on this, and also for starting the discussion on different approaches to testing.
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
51 | Question: can you give an example where decay_t makes an observable difference in this case? | |
75 | Yes, I suspect it's the NamespaceIndentation: Inner setting. | |
95 | Sorry, I got used to the pattern of passing by value and missed that it's actually forwarding references. I think that pattern was following the assumption that iterators are always cheap to copy, which might be somewhat outdated at this point (like the example you give with zip_iterator). Switching to references in impl functions seems like a good idea to me. | |
99 | Nit: there are still a few unnecessary namespace qualifications -- e.g. here, std::weakly_incrementable and std::mergeable below, etc. | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
38–39 | Some random thoughts: I feel like the best possible syntax for this would be named template arguments if those existed. Toying around with the idea, I came up with the following. I'm not sure if the complexity is worth it, but perhaps it might be useful to discuss. The basic idea is to make it possible to define a list of valid arguments, then override just a single argument with an invalid type. There should be no need to write out each valid argument again, and it should be clear which argument is being overridden. The syntax would look like: using ValidArguments = ArgumentList<int*, int*, int*, int*, int*>; using BadIter = int; using BadSentinel = int; static_assert( is_invocable<ValidArguments>); static_assert(!is_invocable<Override1st<BadIter, ValidArguments>>); static_assert(!is_invocable<Override2nd<BadSentinel, ValidArguments>>); // In theory, this is even chainable: static_assert(!is_invocable<Override1st<BadIter, Override2nd<BadSentinel, ValidArguments>>>); I'm not sure if it's worth the extra boilerplate, but it could use names instead of numbers for the arguments as well: template <class T> using OverrideInIter1 = Override1st<T, ValidArguments>; template <class T> using OverrideSent1 = Override2nd<T, ValidArguments>; static_assert(!is_invocable<OverrideInIter1<BadIter>>); static_assert(!is_invocable<OverrideSent1<BadSentinel>>); It would require a bunch of boilerplate, though it would be shared across many tests, making it perhaps more palatable. Draft (can definitely be simplified): template <class T1, class T2, class T3, class T4, class T5> struct ArgumentList5 { using _1st = T1; using _2nd = T2; using _3rd = T3; using _4th = T4; using _5th = T5; }; // Or: template <class T1, class T2, class T3, class T4, class T5> struct ArgumentList5 : ArgumentList4<T1, T2, T3, T4> { using _5th = T5; }; template <class T, class Arguments> struct Override1st : Arguments { using _1st = T; }; template <class T, class Arguments> struct Override2nd : Arguments { using _2nd = T; }; // ...and so on for each argument... template <class Func, class ArgList> concept is_invocable2 = std::invocable<Func, typename ArgList::_1st, typename ArgList::_2nd>; // ... template <class Func, class ArgList> concept is_invocable5 = std::invocable<Func, typename ArgList::_1st, typename ArgList::_2nd, typename ArgList::_3rd, typename ArgList::_4th, typename ArgList::_5th>; // In the test file: template <class ArgList> concept is_invocable = is_invocable5<std::ranges::merge, ArgList>; See toy example: https://wandbox.org/permlink/jveUh8rCiI7fExpi To be clear, this is just for discussion, not to be addressed in the current patch. | |
113 | I think passing by value is more readable in this case -- it makes it clear that the value can only change within this function's scope and never affects the caller, and it avoids giving the (misleading) impression that the caller expects to get an updated value from the function. The overhead from doing an extra copy is negligible given that it's a test and the value is pretty cheap to copy anyway. | |
171 | std::array<T, 0> should be fine (if I'm understanding the issue correctly). Anyway, the new state looks good. | |
211 | Thanks, it's clearer now with the renaming. I don't have a great suggestion. A perhaps more readable, but even more verbose name could be something like withAllPermutationsOfInIter1 (which leads to withAllPermutationsOfInIter2 and withAllPermutationsOfInIter1AndInIter2). But feel free to keep as is, unless you happen to like this name (or it gives you an idea for a better name). | |
256 | Just curious -- do you happen to have the quote handy that says that a range has to be a certain size to be considered a range? |
libcxx/docs/Status/RangesAlgorithms.csv | ||
---|---|---|
63–67 | Please add a link to this patch. |
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
51 | since the they are passed by reference, _InIter2 can be references (the caller passed lvalue references) and we need to decay the result | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
38–39 | Thanks for sending the example. Yeah I think this is the right direction we can explore for testing algorithms that take so many arguments, which hopefully can make those tests easier to read. (all the set algorithms and merge algorithm take (InputIterator1, Sentinel1, InputIterator2, Sentinel2, OutputIterator) By slightly tweak your example and use the text book's Name Template Argument approach, chaining arguments can be made linear instead of nested Barry has a different way of naming the arguments with designated initializers and wrap types with values | |
211 | thanks I adopted the names you suggested | |
256 | I think it is not the "size" that important. it is because result_last isn't reachable if the array isn't large enough I think say out array is std::array<int ,3> let me know what do you think of this. I can ask around in sg9 to be sure |
LGTM with my comments addressed.
libcxx/include/__algorithm/ranges_merge.h | ||
---|---|---|
51 | Then shouldn't you use remove_reference_t? | |
98 | I don't think we qualify the concepts anywhere else. | |
105 | I don't think we qualify iterator_t anywhere else. | |
124 | You don't forward in __merge_impl and you didn't move the parameters in the iterator overload. Why are you moving them here? | |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
81 | Why are you forwarding here? I doesn't look like you are making use of that. (Also the iterator version) | |
98 | We normally avoid using declarations in the tests. For types I don't really care though. | |
102–107 | Could you branch locally and rebase that branch on top of D129040 to ensure you don't have any bugs with contiguous iterators? You can download a patch with arc patch D129040. | |
207–221 | Is this in any way different than ranges::owning_view with regards to borrowed-ness? | |
253–255 | Why isn't this checked with forward_iterator, bidirectional_iterator etc.? | |
263 | Is there a reason you don't use a bool and assert that it's not been copy-constructed? That makes the backtrace a lot nicer than using ranges::all_of, since it's crashing/failing to compile right at the source of the problem. | |
290 | I think checking the return type here again is a bit redundant. You're already checking it in testImpl thoroughly. Removing the check makes the test a bit nicer to read. |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
81 | The forwarding is needed, otherwise the tests are not testing what you'd expect. For example, if test is testing HasMergeIter<MoveOnlyInputIter>, it is actually OK to call the algorithm because the algorithm takes it by value. But without forwarding, this helper concept is checking if it is possible to "COPY" the lvalue and call the algorithm, which would fail, | |
98 | I think this is suggested by var-const since the fully qualified name makes the tests less readable due to its long name (plus three template arguments), if I understand his comment correctly. | |
207–221 | ranges::owning_view does not help here. It is a template and you need a concrete container to create a ranges::owning_view With this helper struct , it is very easy to control the iterator type and sentinel type | |
253–255 | Adding more types here make the CI fail because of the execution limits of the compile time execution. These tests are parameterized in three dimensions. The number of permutations grow in cubic. I can enable the runtime tests with those additional iterators with the guard of is_constant_time_evaluated() though | |
263 | The test is testing the elements are copied exactly once. But only assert the test could still pass if it is copy-assigned zero times. Then we need to test all_of the bools are true, which is essentially the same code here? |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
253–255 | You can avoid that by static_asserting in withAllPermutationsOfInIter1AndInIter2 for example. The execution step count is limited per static_assert. | |
263 | You are already checking data, which guarantees that every element has been copied to. If every element is copied from at most once the logical consequence is that every element has been copied exactly once. |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
102–107 | applied D129040 and removed the workarounds in this test and verified that the tests are passing | |
253–255 | This is good to know. Just to clarify, are you suggesting to add static_assert(true) so that template <class OutIter> constexpr void withAllPermutationsOfInIter1AndInIter2() { withAllPermutationsOfInIter1<cpp20_input_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<forward_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<bidirectional_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<random_access_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<contiguous_iterator<int*>, OutIter>(); static_assert(true); // workaround of constant execution step count } | |
263 | I will consider it in the next patch when I refactored this class into a common place to be reused in the set* tests. To confirm, I think we need to assert in both copy constructor and copy assignment operator. To be honest, I still prefer to an explicit line in the test to say "all of the elements are copied exactly once" what about adding a member function bool copiedOnce() const { return copy_assign == 1;} assert(std::ranges::all_of(out, &TracedCopy::copiedOnce)); or if you use the bool approach bool copiedOnce() const { return copied;} then |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
253–255 | No, unfortunately you have to do: template <class OutIter> void withAllPermutationsOfInIter1AndInIter2() { withAllPermutationsOfInIter1<cpp20_input_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<forward_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<bidirectional_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<random_access_iterator<int*>, OutIter>(); withAllPermutationsOfInIter1<contiguous_iterator<int*>, OutIter>(); static_assert(withAllPermutationsOfInIter1<cpp20_input_iterator<int*>, OutIter>()); static_assert(withAllPermutationsOfInIter1<forward_iterator<int*>, OutIter>()); static_assert(withAllPermutationsOfInIter1<bidirectional_iterator<int*>, OutIter>()); static_assert(withAllPermutationsOfInIter1<random_access_iterator<int*>, OutIter>()); static_assert(withAllPermutationsOfInIter1<contiguous_iterator<int*>, OutIter>()); } | |
263 | We can decide that later. It would probably be a good idea to put a few of these things in support. For example CopyOnce is used in quite a few tests and all of them have their own (differing) versions. |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
98 | I was suggesting a type alias (using Foo = Bar;), but FWIW, a using declaration is fine by me. |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
56–74 | In my opinion, this provides the most readability. It makes what we're testing really obvious and avoids the need for more sophisticated tools that would not provide as much clarity as this. The long lines are kind of annoying, but we could use shorter aliases to alleviate that. |
libcxx/test/std/algorithms/alg.sorting/alg.merge/ranges_merge.pass.cpp | ||
---|---|---|
56–74 | I think I also like this format. I dropped it because in the review there were concerns that the cost to maintain this format might outweigh the benefits |
This should be ✅, since this done once it's landed.