This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement P2438R2 (std::string::substr() &&)
ClosedPublic

Authored by philnik on Aug 11 2022, 4:31 AM.

Details

Summary

This doesn't affect our ABI because std::string::substr() isn't in the dylib and the mangling of substr() const and substr() const& are different.

Diff Detail

Event Timeline

philnik created this revision.Aug 11 2022, 4:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2022, 4:31 AM
philnik requested review of this revision.Aug 11 2022, 4:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2022, 4:31 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
avogelsgesang added inline comments.Aug 12 2022, 5:25 PM
libcxx/include/string
261–263

I would rather split this into two lines;

basic_string substr(size_type pos = 0, size_type n = npos) const;                           // constexpr since C++20, removed in C++23
basic_string substr(size_type pos = 0, size_type n = npos) const&;                           // since C++23
880

So far, all constructors and functions were defined outside the interface definition. Was there a conscious decision to no longer do so? If not, I think we should keep implementations out of the class definitions for consistency.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
27

maybe use DisableAllocationGuard to check that this substring operation here doesn't allocate any memory?

86

afaict, we have no coverage for the S substr(std::move(orig), pos) variant of the constructor

137

until here, the calls don't use the provided alloc. Still, they are executed multiple times as part of the test_string invocations with varying allocators. Maybe split this function into a test_string_without_allocator and test_string_with_allocator and call test_string_without_allocator only once from test

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp
27

DisableAllocationGuard`?

philnik marked 5 inline comments as done.Aug 13 2022, 2:37 AM
philnik added inline comments.
libcxx/include/string
880

That's not true. See basic_string(__uininitialized_size_tag, size_type, const allocator_type&) for example. I don't actually know why the function definitions were written out of line. It just leads to massive amounts of boiler-plate.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
27

That's not actually guaranteed by the standard. An implementation is still allowed to allocate.

86

Did you name the wrong constructor, or did you miss test_string_pos?

137

These functions test with a default-constructed allocator.

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp
27

Same reason as above: an implementation is allowed to allocate.

avogelsgesang added inline comments.Aug 13 2022, 3:11 AM
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
27

I know that the standard doesn't guarantee that. But your implementation intentionally avoids this allocation. As such, I think it makes sense to test this behavior to guard ourselves against unintended regressions. Maybe using a LIBCPP_ASSERT, so that other standard-conforming libraries still pass the test case?

54

is it intentional, that this call doesn't pas the alloc parameter?

86

Indeed, I missed the constructor. All 4 variants are covered

137

I know. But the code

test_string<std::string>(std::allocator<char>{});
test_string<std::basic_string<char, std::char_traits<char>, min_allocator<char>>>(min_allocator<char>{});
test_string<std::basic_string<char, std::char_traits<char>, test_allocator<char>>>(test_allocator<char>{42});

below leads to the test_string_pos_n function being called 3 times, with the exact same parameters, given that they only rely on the default constructed allocator and ignore the alloc passed to test_string. That seems unnecessarily repetitive

philnik updated this revision to Diff 452399.Aug 13 2022, 3:41 AM
philnik marked 10 inline comments as done.
  • Address comments
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
54

Nope, good catch!

137

Running the tests with different allocators ensures that the allocator gets properly default-constructed. std::allocator doesn't have any state, so nobody would notice if the allocator wasn't constructed at all. It's also always a good idea to give templates user-defined types to ensure that the library doesn't just work with library-defined types.

avogelsgesang accepted this revision.Aug 13 2022, 4:04 AM

LGTM % one more nit

libcxx/include/string
880

ok, your call

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
137

makes sense! Thanks for the explanation

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr.pass.cpp
11–12

also split into two functions here, same as in the string synopsis

Thanks for working on this. Some comments and I see the entire MacOS build fails. Is that a glitch or this patch?

libcxx/docs/ReleaseNotes.rst
41

I miss an update to the status page.

41

Interestingly there's no feature-test macro in the paper.

libcxx/include/string
261–263

Can you improve the formatting and alignment here and above in the synopsis?

888

Did you measure the overhead for this move when __pos == 0?

1301

This seems to be https://wg21.link/lwg3752, can you mention in the status page we implement this not yet accepted LWG-issue?

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
164

For new tests I really would like to see all character types being tested.

164

How about using different char_traits too?

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp
14

Please update the synopsis in the constexpr basic_string substr(size_type pos = 0, size_type n = npos) const; test.

66

For new tests I really would like to see all character types being tested.

66

How about using different char_traits too?

philnik updated this revision to Diff 452631.Aug 15 2022, 5:06 AM
philnik marked 10 inline comments as done.
  • Address comments
libcxx/include/string
888

I haven't measured, but AFAICT every memmove implementation is in that case basically a no-op. Our char_traits::move has if (__s1 < __s2 ) {...} else if (__s2 < __s1) {...}, the LLVM libc has if (dst < src) ... else if (dst > src) ... and musl has if (d == s) return d;. I haven't checked glibc, but I would be very surprised if if wasn't optimized there.

1301

I wouldn't consider LWG3752 to be implemented. It doesn't propose a solution and IMO the right thing would be to use select_on_container_copy_construction on the const& overload and passing the allocator on the && overload.

Looking at the paper I also noticed 4.5. Concerns on ABI stability, did you verify whether this isn't an issue for us?

libcxx/include/string
108

Can you improve these too?

888

If this is optimized out I'm fine with it.

1301

But this deviates from the wording in the paper, there it's defined as
return basic_string(*this, pos, n); which uses basic_string's default allocator.
So there should be some comments as why we have this non-conforming behaviour.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
186

Why test1 and test2 instead of just one test?

huixie90 added inline comments.
libcxx/include/string
869–870

Does this need _LIBCPP_HIDE_FROM_ABI

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
36

I guess this check does not check too much. Would it be worth to add a test in test/libcxx that checks for a long string the original string is "moved"?

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr.pass.cpp
110

Out of interest: none of the tests are using non-ascii characters. I guess it is irrelevant to what this particular test.

var-const requested changes to this revision.Aug 16 2022, 3:33 PM
var-const added inline comments.
libcxx/include/string
261–263

Ultranit: s/since/in/? Since it's removed in the very next release, it doesn't really make sense to say it's constexpr _since_ C++20.

261–263

Using a single line is consistent with other entries in the synopsis and makes it obvious to which function the comment refers to (whereas for the two-line version, it could refer to either the preceding or the following line).

880

I think we've been trying to move away from the old style of defining these out-of-class. In addition to boilerplate, it makes modifying the interface, SFINAE in particular, a bigger pain than it should be.

880

Other constructors also insert the newly-created string into the debug database, should this be done here as well?

882

Can you please add some blank lines so that this function is split into logical blocks visually? I'd suggest one after the error check in the beginning (to separate argument validation from the main flow), one before the else branch, and possibly one before _Traits::move.

885

Is this check necessary? Wouldn't __alloc == __str.__alloc() return true in that case?

This revision now requires changes to proceed.Aug 16 2022, 3:33 PM
philnik updated this revision to Diff 453243.Aug 17 2022, 2:55 AM
philnik marked 7 inline comments as done.
  • Address comments
libcxx/include/string
108

what exactly are you asking for?

869–870

No, this constructor is part of the ABI.

880

Yes, I think so. I couldn't find any comprehensive tests for the debug mode though. Do you know if we have any/where they are?

885

Yes, but that allows the compiler to always remove the comparison.

888

Do you mean if it's optimized out when the compiler knows that __pos == 0? Or do you mean it's fine as-is, since this scenario is optimized in memmove?

1301

I don't know why we have the non-conforming behaviour. I just didn't want to change the behaviour in this patch. I've filed https://llvm.org/PR57190.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
186

As usual, constexpr step limit.

Mordante added inline comments.Aug 17 2022, 9:18 AM
libcxx/include/string
108

To split the line in two and align the since C++23 with the other since comments.

888

I'm fine with this as is, so leave it to memmove. It gives some overhead by calling a function, but I don't think that's a big deal.

1301

I still would like a comment in the code mentioning this is non-standard behaviour and a link to the bug report.
That way when somebody sees this and wonders why it's clear this was done intentionally.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
186

maybe add a comment, however I really love to increment the limit; but you're already seen that patch ;-)

philnik updated this revision to Diff 453632.Aug 18 2022, 5:32 AM
philnik marked 9 inline comments as done.
  • Address comments

In general LGTM from my point of view, however I see all MacOS CI runs failing so I expect you have more things to do.

Can you have a look at this question too before I approve?
Looking at the paper I also noticed 4.5. Concerns on ABI stability, did you verify whether this isn't an issue for us?

var-const requested changes to this revision.Aug 18 2022, 3:35 PM
var-const added inline comments.
libcxx/include/string
261–263

Using a single line is consistent with other entries in the synopsis and makes it obvious to which function the comment refers to (whereas for the two-line version, it could refer to either the preceding or the following line).

Sorry, looks like I misread the original comment -- please disregard.

var-const added inline comments.Aug 18 2022, 3:35 PM
libcxx/include/string
872–899

I'm a little surprised this constructor is not conditionally noexcept. Do you know if this is deliberate or an omission in the paper?

880

I suspect we don't have those tests, but let's confirm with @ldionne.

886

Can you confirm that the is_always_equal check optimizes away even without using if constexpr?

887

Hmm, the move constructor instead does this:

if (__libcpp_is_constant_evaluated()) {
    __zero();
    __r_.first().__l = __str.__r_.first().__l;
} else {
    __r_.first().__r = __str.__r_.first().__r;
}

Your version is simpler. Is there a functional difference? If these are equivalent, I think we should simplify the move constructor as well (not in this patch). If not equivalent, is there a reason why the new constructor can use a simple assignment without the conditional logic?

890

Consider a comment like Move the given string then shrink it in place.

890

Reevaluating data() calls to_address() and is_long() repeatedly. Would it make sense to store the result in an intermediate variable?

894

Consider adding a comment like Perform a copy because allocators have different types.

897

The move constructor also performs

if (__is_long())
    std::__debug_db_swap(this, &__str);

I presume this is necessary here as well -- @ldionne would know more.

1300

+1 to @Mordante's comment re. ABI stability. This should also be documented in the commit message (either why the ABI break doesn't happen in our case, or what we are doing to mitigate it).

1301

Is the proposed resolution of LWG3752 to pass the allocator? I couldn't find the discussion it refers to, but because it says "we will fix it", I take it to mean that the current behavior, i.e., passing __alloc() should be changed, and the paper merely asks for confirmation.

This revision now requires changes to proceed.Aug 18 2022, 3:35 PM
var-const added inline comments.Aug 19 2022, 12:01 AM
libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
36

Optional: consider moving the checks into a helper function to cut down on boilerplate.

54

Can we do some equivalent of DisableAllocationGuard in this test as well?

62

I'd split the normal case and the exceptional case in two functions. These are very different concerns, and I find the current triple-check (if pos is not valid, and it's not constant-evaluated, and exceptions are enabled) a little overwhelming. Making the expected parameter double as essentially a boolean also doesn't feel very clean. This would also be clearer at the call site (I'd group all normal cases and all exception cases together, i.e. a bunch of test_string_pos, test_string_pos_n, etc., followed by a bunch of test_string_pos_exception, test_string_pos_n_exception, etc.

I know it will increase the number of test_string_* functions even more, but I think it's the lesser evil. The amount of code won't increase substantially, and the smaller functions will be easier to read.

119

Question (not for this patch): do we still need these macros even in the latest language modes, or can this now be achieved by regular code?

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp
27

The implementation is recommended to avoid allocation, though, and I think we want to check that we follow that.

29

orig_string_copy seems unused?

30

What's the benefit of calculating rlen, given that the str == expected check would also compare lengths? If it's necessary to check the length explicitly for some reason, I think it should be an argument to this function.

33

In the constructor test we don't check __invariants() on str, can you make this consistent?

libcxx/test/support/count_new.h
439–443

Why is this a no-op in consteval mode?

453

Why are the constexpr language versions different between member functions?

philnik updated this revision to Diff 454077.Aug 19 2022, 12:14 PM
philnik marked 13 inline comments as done.
  • Address comments
libcxx/include/string
872–899

Why are you surprised this isn't conditionally noexcept? Only the default constructor is conditionally noexcept, and the allocator and move constructors are always noexcept. Nothing else is.

886

Do you mean something like this?

887

I don't think there is any difference. I'll make a follow-up.

890

If the compiler can't optimize it IMO we should give the compiler hints on data() instead of decreasing the readability of the code.

890

I think this comment is more misleading than helpful. _Traits::move() and __set_size() seem to me quite intuitive. shrink in place sounds a lot like we shrink_to_fit() here, which we definitely don't want to do.

894

Would it make more sense to rename __init to something like __init_from_char_array to make it obvious that this path copies the array?

1300

It's simple actually. The mangling for the functions is different (https://godbolt.org/z/P8dfdYPs7) and we don't have it in our ABI.

1301

There is no officially proposed solution (as you can see). If you're asking if that's what the standard says, yes.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
54

Not really. Here the constructor sometimes has to allocate.

62

I really wouldn't group normal and exception cases together. Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else. IMO it's also cleaner to always call the same function for testing the same function in this case. Having the "exception" at the end makes it quite obvious which are throwing while making the rest of the arguments easily comparable.

119

AFAIK we still don't have any standard conversion functions, so I guess we still need the macro.

libcxx/test/std/strings/basic.string/string.ops/string_substr/substr_rvalue.pass.cpp
27

Again, we can't do that because the constructor might have to allocate.

30

There isn't any. That was a leftover from a previous iteration.

libcxx/test/support/count_new.h
439–443

Because we use global variables here.

453

Destructors can only be constexpr in C++20 and later.

Mordante accepted this revision as: Mordante.Aug 20 2022, 7:39 AM

I still see a lot of open comments of @var-const and CI failures, but I've no additional issues.
So I approve this patch to make that clear and leave the final approval to @var-const.

libcxx/include/string
1300

Thanks for the explanation and +1 adding this information to the commit message.

1301

For me it's fine to keep this since we already have this behaviour in our original substr code. So I don't mind waiting for the final resolution of LWG3752 before changing the code.

var-const added inline comments.Aug 23 2022, 5:54 PM
libcxx/include/string
872–899

To be clear, I'm surprised that this isn't noexcept in some way (I just think it would have to be conditionally, but I didn't give much thought to it), since it's similar to the move constructor in many ways.

886

The interesting part, IMO, is that it depends on the template type, but I presume it won't make a difference to the compiler.

890

Personally, I find using a variable more readable (for me, the repeated function calls are "noisy" and add extra cognitive overhead of thinking whether the function result might change upon subsequent calls). Also, performance in debug mode is a common concern among users.

894

I'd still prefer a comment. Even __init_from_char_array doesn't make it obvious that a copy is performed, and the part about allocators having different types is helpful as well.

1300

Please add that explanation to the patch / commit description.

1301

In that case, I feel we shouldn't pass __alloc for the rvalue overload -- yes, it would be inconsistent, but I don't think we should introduce more divergence and have more stuff to fix later. This is definitely something to check with @ldionne.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
36

Friendly ping (feel free to push back).

62

Currently you can easily see that the exceptions cases are testing the edge cases, which wouldn't be the case anymore when moving them somewhere else.

I don't see how it would be less readable. The current state is

test_string_pos<S>(STR(""), 0, STR(""));
test_string_pos<S>(STR(""), 1, STR("exception"));
test_string_pos<S>(STR("Banane"), 1, STR("anane"));
test_string_pos<S>(STR("Banane"), 6, STR(""));
test_string_pos<S>(STR("Banane"), 7, STR("exception"));
test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));

I would prefer

test_string_pos<S>(STR(""), 0, STR(""));
test_string_pos<S>(STR("Banane"), 1, STR("anane"));
test_string_pos<S>(STR("Banane"), 6, STR(""));
test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));

test_string_pos_exception<S>(STR(""), 1);
test_string_pos_exception<S>(STR("Banane"), 7);

Which I think makes it even more obvious that now we're testing corner cases. Alternatively, keeping the same order is fine too:

test_string_pos<S>(STR(""), 0, STR(""));
test_string_pos_exception<S>(STR(""), 1);
test_string_pos<S>(STR("Banane"), 1, STR("anane"));
test_string_pos<S>(STR("Banane"), 6, STR(""));
test_string_pos_exception<S>(STR("Banane"), 7);
test_string_pos<S>(STR("long long string so no SSO"), 0, STR("long long string so no SSO"));

Personally, I think this would be more readable, or at the very least no less readable, while making the existing issues (overcomplicated condition in test_string_pos, magic value in expected) go away.

Mordante added inline comments.Aug 24 2022, 10:38 AM
libcxx/include/string
1301

Since @ldionne mentioned on Discord LWG-3752 probably will be closed as NAD I agree more with @var-const.(I had expected LWG to mandate the current MSVC STL and libc++ implementation.)

philnik edited the summary of this revision. (Show Details)Sep 6 2022, 10:30 AM
philnik marked 4 inline comments as done.Sep 6 2022, 10:47 AM
philnik added inline comments.
libcxx/include/string
894

I'm not sure if you are just a bit sloppy with the wording or if you misunderstand something. The allocators always have the same type, but the allocators may not compare equal.

1301

I'd still rather do it in a separate patch, because it is a significant behaviour change that has nothing to do with P2438. The overloads should not have different behaviour IMO.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
36

In my experience having the checks in it's own function just makes the test harder to debug, so I'd rather keep them duplicated.

62

I prefer the first one, so I guess we just disagree here. IMO the second one makes it a lot less obvious that we're checking corner cases, because I can't immediately see that STR("Banane"), 7 is the exact case where it should throw, which I do immediately see if the line before was STR("Banane"), 6. The last one seems just messy to me, but that's nothing you can argue much about.

ldionne accepted this revision as: ldionne.Sep 8 2022, 9:14 AM

The implementation LGTM, I have a few comments.

libcxx/include/string
880

I don't think we have extremely comprehensive tests for the debug mode for string, but we do have e.g. libcxx/test/libcxx/strings/basic.string/string.iterators/debug.iterator.index.pass.cpp & friends in the same directory.

I think we do want a simple test to make sure that the added constructor works properly with the debug mode. Concretely, I would imagine something like:

std::string s = "hello world";
std::string s2(std::move(s), 0, 5); // "hello"
std::string::iterator i = s2.begin();
assert(i[0] == 'h');
TEST_LIBCPP_ASSERT_FAILURE(i[5], "Attempted to subscript an iterator outside its valid range");
894

I think what @var-const meant was:

// Perform a copy because the allocators are not compatible.

The allocators have the same type, they just compare unequal, which in alllocator-land means that the memory allocated by one can't be freed by the other.

897

Test needed!

1301

I agree with @philnik here: let's implement this consistently with our current substr behavior, but then fix both substr behaviors to be standards conforming per LWG3752's expected resolution ASAP.

Alternatively, we could implement LWG3752's expected resolution first, and then land this patch consistently with the fixed substr(). I don't care in what order this happens, as long as we don't ship another release without LWG3752's resolution. But I think we should really avoid introducing a difference between our two substr() implementations.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
29–51

Instead, I would suggest the following:

template <class S>
constexpr void test_string_pos(S orig, typename S::size_type pos, S expected) {
#ifdef _LIBCPP_VERSION
  DisableAllocationGuard g;
#endif
  S substr(std::move(orig), pos);
  LIBCPP_ASSERT(orig.__invariants());
  LIBCPP_ASSERT(orig.empty());
  LIBCPP_ASSERT(substr.__invariants());
  assert(substr == expected);
}

constexpr struct should_throw_exception_t { } should_throw_exception{};

template <class S>
constexpr void test_string_pos(S orig, typename S::size_type pos, should_throw_exception_t) {
#ifndef TEST_HAS_NO_EXCEPTIONS
  if (!std::is_constant_evaluated()) {
    try {
      [[maybe_unused]] S substr = S(std::move(orig), pos);
      assert(false);
    } catch (const std::out_of_range&) {

    }
  }
#endif
}

We could also use a differently-named function, but you seemed to like the naming consistency at the callsite IIUC. Basically, what I care about is that we remove this logic from the test since it's so easy to remove, and also since those two functions do entirely different things.

philnik updated this revision to Diff 461369.Sep 19 2022, 2:38 PM
philnik marked 11 inline comments as done.
  • Rebased
  • Address comments
var-const accepted this revision as: var-const.Sep 20 2022, 4:05 PM
var-const added inline comments.
libcxx/include/string
890

Ping.

894

Yeah, I'm not sure why I was thinking about types when I was typing this. Current wording LGTM.

libcxx/test/std/strings/basic.string/string.cons/substr_rvalue.pass.cpp
62

I think the function name having _exception conveys that this tests corner cases a lot clearer than anything else could. On the other hand, there isn't anything that semantically connects the 6 and 7 test cases together -- this is at best very brittle (in the future, it's easy for someone to rearrange these test cases, insert a new test case between them, etc.) and requires reading the code closely and comparing test cases across the lines.

philnik updated this revision to Diff 463865.Sep 29 2022, 6:18 AM
  • Rebased
  • Try to fix CI
libcxx/include/string
890

I don't think we can do much about debug performance in the library. Saving the result won't make much of a difference compared to letting the compiler optimize a bit. Even calling an empty function has quite a bit of overhead that isn't observable in any way AFAIK.

void func() {}

produces

push    rbp
mov     rbp, rsp
pop     rbp
ret

but it could just be ret without any problems. That's not something we can influence in the library.

ldionne accepted this revision.Sep 29 2022, 2:37 PM

LGTM but let's not forget about LWG3752.

This revision is now accepted and ready to land.Sep 29 2022, 2:37 PM
philnik updated this revision to Diff 472574.Nov 2 2022, 4:35 AM

Try to fix CI

philnik updated this revision to Diff 472677.Nov 2 2022, 10:27 AM

Fix debug test

This revision was automatically updated to reflect the committed changes.