This doesn't affect our ABI because std::string::substr() isn't in the dylib and the mangling of substr() const and substr() const& are different.
Details
- Reviewers
ldionne Mordante var-const fsb4000 avogelsgesang - Group Reviewers
Restricted Project - Commits
- rG29378ab24b98: [libc++] Implement P2438R2 (std::string::substr() &&)
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
libcxx/include/string | ||
---|---|---|
268–270 | I would rather split this into two lines; basic_string substr(size_type pos = 0, size_type n = npos) const; // constexpr since C++20, removed in C++23 basic_string substr(size_type pos = 0, size_type n = npos) const&; // since C++23 | |
895 | So far, all constructors and functions were defined outside the interface definition. Was there a conscious decision to no longer do so? If not, I think we should keep implementations out of the class definitions for consistency. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
27 | maybe use DisableAllocationGuard to check that this substring operation here doesn't allocate any memory? | |
86 | afaict, we have no coverage for the S substr(std::move(orig), pos) variant of the constructor | |
137 | until here, the calls don't use the provided alloc. Still, they are executed multiple times as part of the test_string invocations with varying allocators. Maybe split this function into a test_string_without_allocator and test_string_with_allocator and call test_string_without_allocator only once from test | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp | ||
27 | DisableAllocationGuard`? |
libcxx/include/string | ||
---|---|---|
895 | That's not true. See basic_string(__uininitialized_size_tag, size_type, const allocator_type&) for example. I don't actually know why the function definitions were written out of line. It just leads to massive amounts of boiler-plate. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
27 | That's not actually guaranteed by the standard. An implementation is still allowed to allocate. | |
86 | Did you name the wrong constructor, or did you miss test_string_pos? | |
137 | These functions test with a default-constructed allocator. | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp | ||
27 | Same reason as above: an implementation is allowed to allocate. |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
---|---|---|
27 | I know that the standard doesn't guarantee that. But your implementation intentionally avoids this allocation. As such, I think it makes sense to test this behavior to guard ourselves against unintended regressions. Maybe using a LIBCPP_ASSERT, so that other standard-conforming libraries still pass the test case? | |
54 | is it intentional, that this call doesn't pas the alloc parameter? | |
86 | Indeed, I missed the constructor. All 4 variants are covered | |
137 | I know. But the code test_string<std::string>(std::allocator<char>{}); test_string<std::basic_string<char, std::char_traits<char>, min_allocator<char>>>(min_allocator<char>{}); test_string<std::basic_string<char, std::char_traits<char>, test_allocator<char>>>(test_allocator<char>{42}); below leads to the test_string_pos_n function being called 3 times, with the exact same parameters, given that they only rely on the default constructed allocator and ignore the alloc passed to test_string. That seems unnecessarily repetitive |
- Address comments
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
---|---|---|
54 | Nope, good catch! | |
137 | Running the tests with different allocators ensures that the allocator gets properly default-constructed. std::allocator doesn't have any state, so nobody would notice if the allocator wasn't constructed at all. It's also always a good idea to give templates user-defined types to ensure that the library doesn't just work with library-defined types. |
LGTM % one more nit
libcxx/include/string | ||
---|---|---|
895 | ok, your call | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
137 | makes sense! Thanks for the explanation | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr.pass.cpp | ||
11–12 | also split into two functions here, same as in the string synopsis |
Thanks for working on this. Some comments and I see the entire MacOS build fails. Is that a glitch or this patch?
libcxx/docs/ReleaseNotes.rst | ||
---|---|---|
41 | I miss an update to the status page. | |
41 | Interestingly there's no feature-test macro in the paper. | |
libcxx/include/string | ||
268–270 | Can you improve the formatting and alignment here and above in the synopsis? | |
903 | Did you measure the overhead for this move when __pos == 0? | |
1319 | This seems to be https://wg21.link/lwg3752, can you mention in the status page we implement this not yet accepted LWG-issue? | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
164 | For new tests I really would like to see all character types being tested. | |
164 | How about using different char_traits too? | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp | ||
14 | Please update the synopsis in the constexpr basic_string substr(size_type pos = 0, size_type n = npos) const; test. | |
66 | For new tests I really would like to see all character types being tested. | |
66 | How about using different char_traits too? |
- Address comments
libcxx/include/string | ||
---|---|---|
903 | I haven't measured, but AFAICT every memmove implementation is in that case basically a no-op. Our char_traits::move has if (__s1 < __s2 ) {...} else if (__s2 < __s1) {...}, the LLVM libc has if (dst < src) ... else if (dst > src) ... and musl has if (d == s) return d;. I haven't checked glibc, but I would be very surprised if if wasn't optimized there. | |
1319 | I wouldn't consider LWG3752 to be implemented. It doesn't propose a solution and IMO the right thing would be to use select_on_container_copy_construction on the const& overload and passing the allocator on the && overload. |
Looking at the paper I also noticed 4.5. Concerns on ABI stability, did you verify whether this isn't an issue for us?
libcxx/include/string | ||
---|---|---|
113 | Can you improve these too? | |
903 | If this is optimized out I'm fine with it. | |
1319 | But this deviates from the wording in the paper, there it's defined as | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
186 | Why test1 and test2 instead of just one test? |
libcxx/include/string | ||
---|---|---|
884–885 | Does this need _LIBCPP_HIDE_FROM_ABI | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
36 | I guess this check does not check too much. Would it be worth to add a test in test/libcxx that checks for a long string the original string is "moved"? | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr.pass.cpp | ||
110 | Out of interest: none of the tests are using non-ascii characters. I guess it is irrelevant to what this particular test. |
libcxx/include/string | ||
---|---|---|
268–270 | Ultranit: s/since/in/? Since it's removed in the very next release, it doesn't really make sense to say it's constexpr _since_ C++20. | |
268–270 | Using a single line is consistent with other entries in the synopsis and makes it obvious to which function the comment refers to (whereas for the two-line version, it could refer to either the preceding or the following line). | |
895 | I think we've been trying to move away from the old style of defining these out-of-class. In addition to boilerplate, it makes modifying the interface, SFINAE in particular, a bigger pain than it should be. | |
895 | Other constructors also insert the newly-created string into the debug database, should this be done here as well? | |
897 | Can you please add some blank lines so that this function is split into logical blocks visually? I'd suggest one after the error check in the beginning (to separate argument validation from the main flow), one before the else branch, and possibly one before _Traits::move. | |
900 | Is this check necessary? Wouldn't __alloc == __str.__alloc() return true in that case? |
- Address comments
libcxx/include/string | ||
---|---|---|
113 | what exactly are you asking for? | |
884–885 | No, this constructor is part of the ABI. | |
895 | Yes, I think so. I couldn't find any comprehensive tests for the debug mode though. Do you know if we have any/where they are? | |
900 | Yes, but that allows the compiler to always remove the comparison. | |
903 | Do you mean if it's optimized out when the compiler knows that __pos == 0? Or do you mean it's fine as-is, since this scenario is optimized in memmove? | |
1319 | I don't know why we have the non-conforming behaviour. I just didn't want to change the behaviour in this patch. I've filed https://llvm.org/PR57190. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
186 | As usual, constexpr step limit. |
libcxx/include/string | ||
---|---|---|
113 | To split the line in two and align the since C++23 with the other since comments. | |
903 | I'm fine with this as is, so leave it to memmove. It gives some overhead by calling a function, but I don't think that's a big deal. | |
1319 | I still would like a comment in the code mentioning this is non-standard behaviour and a link to the bug report. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
186 | maybe add a comment, however I really love to increment the limit; but you're already seen that patch ;-) |
In general LGTM from my point of view, however I see all MacOS CI runs failing so I expect you have more things to do.
Can you have a look at this question too before I approve?
Looking at the paper I also noticed 4.5. Concerns on ABI stability, did you verify whether this isn't an issue for us?
libcxx/include/string | ||
---|---|---|
268–270 |
Sorry, looks like I misread the original comment -- please disregard. |
libcxx/include/string | ||
---|---|---|
887–916 | I'm a little surprised this constructor is not conditionally noexcept. Do you know if this is deliberate or an omission in the paper? | |
895 | I suspect we don't have those tests, but let's confirm with @ldionne. | |
901 | Can you confirm that the is_always_equal check optimizes away even without using if constexpr? | |
902 | Hmm, the move constructor instead does this: if (__libcpp_is_constant_evaluated()) { __zero(); __r_.first().__l = __str.__r_.first().__l; } else { __r_.first().__r = __str.__r_.first().__r; } Your version is simpler. Is there a functional difference? If these are equivalent, I think we should simplify the move constructor as well (not in this patch). If not equivalent, is there a reason why the new constructor can use a simple assignment without the conditional logic? | |
905 | Consider a comment like Move the given string then shrink it in place. | |
905 | Reevaluating data() calls to_address() and is_long() repeatedly. Would it make sense to store the result in an intermediate variable? | |
909 | Consider adding a comment like Perform a copy because allocators have different types. | |
912 | The move constructor also performs if (__is_long()) std::__debug_db_swap(this, &__str); I presume this is necessary here as well -- @ldionne would know more. | |
1318 | +1 to @Mordante's comment re. ABI stability. This should also be documented in the commit message (either why the ABI break doesn't happen in our case, or what we are doing to mitigate it). | |
1319 | Is the proposed resolution of LWG3752 to pass the allocator? I couldn't find the discussion it refers to, but because it says "we will fix it", I take it to mean that the current behavior, i.e., passing __alloc() should be changed, and the paper merely asks for confirmation. |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
---|---|---|
36 | Optional: consider moving the checks into a helper function to cut down on boilerplate. | |
54 | Can we do some equivalent of DisableAllocationGuard in this test as well? | |
62 | I'd split the normal case and the exceptional case in two functions. These are very different concerns, and I find the current triple-check (if pos is not valid, and it's not constant-evaluated, and exceptions are enabled) a little overwhelming. Making the expected parameter double as essentially a boolean also doesn't feel very clean. This would also be clearer at the call site (I'd group all normal cases and all exception cases together, i.e. a bunch of test_string_pos, test_string_pos_n, etc., followed by a bunch of test_string_pos_exception, test_string_pos_n_exception, etc. I know it will increase the number of test_string_* functions even more, but I think it's the lesser evil. The amount of code won't increase substantially, and the smaller functions will be easier to read. | |
119 | Question (not for this patch): do we still need these macros even in the latest language modes, or can this now be achieved by regular code? | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp | ||
27 | The implementation is recommended to avoid allocation, though, and I think we want to check that we follow that. | |
29 | orig_string_copy seems unused? | |
30 | What's the benefit of calculating rlen, given that the str == expected check would also compare lengths? If it's necessary to check the length explicitly for some reason, I think it should be an argument to this function. | |
33 | In the constructor test we don't check __invariants() on str, can you make this consistent? | |
libcxx/test/support/count_new.h | ||
439–443 | Why is this a no-op in consteval mode? | |
453 | Why are the constexpr language versions different between member functions? |
- Address comments
libcxx/include/string | ||
---|---|---|
887–916 | Why are you surprised this isn't conditionally noexcept? Only the default constructor is conditionally noexcept, and the allocator and move constructors are always noexcept. Nothing else is. | |
901 | Do you mean something like this? | |
902 | I don't think there is any difference. I'll make a follow-up. | |
905 | If the compiler can't optimize it IMO we should give the compiler hints on data() instead of decreasing the readability of the code. | |
905 | I think this comment is more misleading than helpful. _Traits::move() and __set_size() seem to me quite intuitive. shrink in place sounds a lot like we shrink_to_fit() here, which we definitely don't want to do. | |
909 | Would it make more sense to rename __init to something like __init_from_char_array to make it obvious that this path copies the array? | |
1318 | It's simple actually. The mangling for the functions is different (https://godbolt.org/z/P8dfdYPs7) and we don't have it in our ABI. | |
1319 | There is no officially proposed solution (as you can see). If you're asking if that's what the standard says, yes. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
54 | Not really. Here the constructor sometimes has to allocate. | |
62 | I really wouldn't group normal and exception cases together. Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else. IMO it's also cleaner to always call the same function for testing the same function in this case. Having the "exception" at the end makes it quite obvious which are throwing while making the rest of the arguments easily comparable. | |
119 | AFAIK we still don't have any standard conversion functions, so I guess we still need the macro. | |
libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp | ||
27 | Again, we can't do that because the constructor might have to allocate. | |
30 | There isn't any. That was a leftover from a previous iteration. | |
libcxx/test/support/count_new.h | ||
439–443 | Because we use global variables here. | |
453 | Destructors can only be constexpr in C++20 and later. |
I still see a lot of open comments of @var-const and CI failures, but I've no additional issues.
So I approve this patch to make that clear and leave the final approval to @var-const.
libcxx/include/string | ||
---|---|---|
1318 | Thanks for the explanation and +1 adding this information to the commit message. | |
1319 | For me it's fine to keep this since we already have this behaviour in our original substr code. So I don't mind waiting for the final resolution of LWG3752 before changing the code. |
libcxx/include/string | ||
---|---|---|
887–916 | To be clear, I'm surprised that this isn't noexcept in some way (I just think it would have to be conditionally, but I didn't give much thought to it), since it's similar to the move constructor in many ways. | |
901 | The interesting part, IMO, is that it depends on the template type, but I presume it won't make a difference to the compiler. | |
905 | Personally, I find using a variable more readable (for me, the repeated function calls are "noisy" and add extra cognitive overhead of thinking whether the function result might change upon subsequent calls). Also, performance in debug mode is a common concern among users. | |
909 | I'd still prefer a comment. Even __init_from_char_array doesn't make it obvious that a copy is performed, and the part about allocators having different types is helpful as well. | |
1318 | Please add that explanation to the patch / commit description. | |
1319 | In that case, I feel we shouldn't pass __alloc for the rvalue overload -- yes, it would be inconsistent, but I don't think we should introduce more divergence and have more stuff to fix later. This is definitely something to check with @ldionne. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
36 | Friendly ping (feel free to push back). | |
62 |
I don't see how it would be less readable. The current state is test_string_pos<S>(STR(""), 0, STR("")); test_string_pos<S>(STR(""), 1, STR("exception")); test_string_pos<S>(STR("Banane"), 1, STR("anane")); test_string_pos<S>(STR("Banane"), 6, STR("")); test_string_pos<S>(STR("Banane"), 7, STR("exception")); test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO")); I would prefer test_string_pos<S>(STR(""), 0, STR("")); test_string_pos<S>(STR("Banane"), 1, STR("anane")); test_string_pos<S>(STR("Banane"), 6, STR("")); test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO")); test_string_pos_exception<S>(STR(""), 1); test_string_pos_exception<S>(STR("Banane"), 7); Which I think makes it even more obvious that now we're testing corner cases. Alternatively, keeping the same order is fine too: test_string_pos<S>(STR(""), 0, STR("")); test_string_pos_exception<S>(STR(""), 1); test_string_pos<S>(STR("Banane"), 1, STR("anane")); test_string_pos<S>(STR("Banane"), 6, STR("")); test_string_pos_exception<S>(STR("Banane"), 7); test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO")); Personally, I think this would be more readable, or at the very least no less readable, while making the existing issues (overcomplicated condition in test_string_pos, magic value in expected) go away. |
libcxx/include/string | ||
---|---|---|
1319 | Since @ldionne mentioned on Discord LWG-3752 probably will be closed as NAD I agree more with @var-const.(I had expected LWG to mandate the current MSVC STL and libc++ implementation.) |
libcxx/include/string | ||
---|---|---|
909 | I'm not sure if you are just a bit sloppy with the wording or if you misunderstand something. The allocators always have the same type, but the allocators may not compare equal. | |
1319 | I'd still rather do it in a separate patch, because it is a significant behaviour change that has nothing to do with P2438. The overloads should not have different behaviour IMO. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
36 | In my experience having the checks in it's own function just makes the test harder to debug, so I'd rather keep them duplicated. | |
62 | I prefer the first one, so I guess we just disagree here. IMO the second one makes it a lot less obvious that we're checking corner cases, because I can't immediately see that STR("Banane"), 7 is the exact case where it should throw, which I do immediately see if the line before was STR("Banane"), 6. The last one seems just messy to me, but that's nothing you can argue much about. |
The implementation LGTM, I have a few comments.
libcxx/include/string | ||
---|---|---|
895 | I don't think we have extremely comprehensive tests for the debug mode for string, but we do have e.g. libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.index.pass.cpp & friends in the same directory. I think we do want a simple test to make sure that the added constructor works properly with the debug mode. Concretely, I would imagine something like: std::string s = "hello world"; std::string s2(std::move(s), 0, 5); // "hello" std::string::iterator i = s2.begin(); assert(i[0] == 'h'); TEST_LIBCPP_ASSERT_FAILURE(i[5], "Attempted to subscript an iterator outside its valid range"); | |
909 | I think what @var-const meant was: // Perform a copy because the allocators are not compatible. The allocators have the same type, they just compare unequal, which in alllocator-land means that the memory allocated by one can't be freed by the other. | |
912 | Test needed! | |
1319 | I agree with @philnik here: let's implement this consistently with our current substr behavior, but then fix both substr behaviors to be standards conforming per LWG3752's expected resolution ASAP. Alternatively, we could implement LWG3752's expected resolution first, and then land this patch consistently with the fixed substr(). I don't care in what order this happens, as long as we don't ship another release without LWG3752's resolution. But I think we should really avoid introducing a difference between our two substr() implementations. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
28–50 | Instead, I would suggest the following: template <class S> constexpr void test_string_pos(S orig, typename S::size_type pos, S expected) { #ifdef _LIBCPP_VERSION DisableAllocationGuard g; #endif S substr(std::move(orig), pos); LIBCPP_ASSERT(orig.__invariants()); LIBCPP_ASSERT(orig.empty()); LIBCPP_ASSERT(substr.__invariants()); assert(substr == expected); } constexpr struct should_throw_exception_t { } should_throw_exception{}; template <class S> constexpr void test_string_pos(S orig, typename S::size_type pos, should_throw_exception_t) { #ifndef TEST_HAS_NO_EXCEPTIONS if (!std::is_constant_evaluated()) { try { [[maybe_unused]] S substr = S(std::move(orig), pos); assert(false); } catch (const std::out_of_range&) { } } #endif } We could also use a differently-named function, but you seemed to like the naming consistency at the callsite IIUC. Basically, what I care about is that we remove this logic from the test since it's so easy to remove, and also since those two functions do entirely different things. |
libcxx/include/string | ||
---|---|---|
905 | Ping. | |
909 | Yeah, I'm not sure why I was thinking about types when I was typing this. Current wording LGTM. | |
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp | ||
62 | I think the function name having _exception conveys that this tests corner cases a lot clearer than anything else could. On the other hand, there isn't anything that semantically connects the 6 and 7 test cases together -- this is at best very brittle (in the future, it's easy for someone to rearrange these test cases, insert a new test case between them, etc.) and requires reading the code closely and comparing test cases across the lines. |
- Rebased
- Try to fix CI
libcxx/include/string | ||
---|---|---|
905 | I don't think we can do much about debug performance in the library. Saving the result won't make much of a difference compared to letting the compiler optimize a bit. Even calling an empty function has quite a bit of overhead that isn't observable in any way AFAIK. void func() {} produces push rbp mov rbp, rsp pop rbp ret but it could just be ret without any problems. That's not something we can influence in the library. |
I miss an update to the status page.