This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement P1391 for string_view
ClosedPublic

Authored by jloser on Sep 29 2021, 8:00 AM.

Details

Summary

Implement P1391 (https://wg21.link/p1391) which allows
std::string_view to be constructible from any contiguous range of
characters.

Note that a different paper (http://wg21.link/P1989) handles the generic
range constructor for std::string_view.

Diff Detail

Event Timeline

jloser requested review of this revision.Sep 29 2021, 8:00 AM
jloser created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2021, 8:00 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
jloser updated this revision to Diff 375895.Sep 29 2021, 8:01 AM

Adjust #ifdef to apply constructor for C++20 mode, not C++23

jloser updated this revision to Diff 375947.Sep 29 2021, 10:21 AM

Mark tests as XFAIL: apple-clang-12.0 since AppleClang 12.0 doesn't support concepts.

Mordante requested changes to this revision.Sep 29 2021, 12:13 PM

In general looks good, but the tests need some improvements.

libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
10

Minor nit, XFAIL isn't required, since // UNSUPPORTED: libcpp-no-concepts achieves the same.

26

Since this test only tests the types, it could be a compile.pass.cpp. But I see the other deduction tests are a pass.cpp. They also test whether the result is the expected result. Can you also add the other character types in this test? For example have a look at libcxx/test/std/strings/basic.string/string.cons/string_view_deduction.pass.cpp

32

Why do you want to test with int?

34

There should be a return 0;. On some platforms this is even required for main.

libcxx/test/std/strings/string.view/string.view.cons/from_iterator_iterator.pass.cpp
20

Here it would be nice to add a test for all character types. This can be done using templates. You can use MAKE_CSTRING from libcxx/test/support/make_string.h to create the val based on a template argument.

This revision now requires changes to proceed.Sep 29 2021, 12:13 PM
jloser updated this revision to Diff 376002.Sep 29 2021, 12:48 PM

Test other character types
Address other review feedback

jloser marked 3 inline comments as done.Sep 29 2021, 12:52 PM
jloser added inline comments.
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
10

Are you sure? Without the XFAIL, I was seeing CI failures only for AppleClang 12.0.0 as it couldn't resolve contiguous_iterator from what I could tell. The build failure link is https://buildkite.com/llvm-project/libcxx-ci/builds/5619.

BTW, we're about to drop support for AppleClang 12.0.0 as I talked with @ldionne the other day. AppleClang 13.0.0 just came out on 9/20 IIRC which has full support for concepts and then we can remove this XFAIL.

26

Added other character types and also test the result now. Good call!

32

Just as another character type. I decided to stick with char, wchar_t, and friends instead now.

libcxx/test/std/strings/string.view/string.view.cons/from_iterator_iterator.pass.cpp
20

Added a test for a variety of other character types. I added a MAKE_STRING_VIEW macro so our test function can be used in a constexpr context still (since we don't have fully constexpr string yet). This allows me to have static_assert(test())) work still.

jloser updated this revision to Diff 376007.Sep 29 2021, 12:54 PM
jloser marked 2 inline comments as done.

Test the result in deduct.pass.cpp
Don't rely on deduction guide in from_iterator_iterator.pass.cpp

ldionne requested changes to this revision.Sep 29 2021, 3:02 PM
ldionne added inline comments.
libcxx/include/string_view
281
693
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
10

Actually, I believe @Mordante is correct here. We can remove the XFAIL: apple-clang-12 since libcpp-no-concepts handle it. What you were seeing in https://buildkite.com/llvm-project/libcxx-ci/builds/5619 is a failure to build the library, we were not even getting to this test.

This revision now requires changes to proceed.Sep 29 2021, 3:02 PM
jloser updated this revision to Diff 376052.Sep 29 2021, 3:10 PM

Remove XFAIL: apple-clang-12.0.0
Adjust #ifdefs to require ranges support

jloser marked 5 inline comments as done.Sep 29 2021, 3:12 PM
Mordante accepted this revision as: Mordante.Sep 30 2021, 9:33 AM

LGTM, except for some small details.

libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
35

char8_t should be guarded by #ifndef _LIBCPP_HAS_NO_CHAR8_T.

libcxx/test/std/strings/string.view/string.view.cons/from_iterator_iterator.pass.cpp
23

This function can be a void.

libcxx/test/support/make_string.h
61 ↗(On Diff #376052)

Please align the \ at the end of the line.

jloser updated this revision to Diff 376253.Sep 30 2021, 9:43 AM

Guard use of char8_t
Align MAKE_STRING_VIEW \ macro

jloser marked 2 inline comments as done.Sep 30 2021, 9:45 AM
jloser added inline comments.
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
35

I just added the guard, but can you elaborate on why it's necessary? char8_t is a C++20 type and this is a C++20-or-later constructor we've added in string_view. Given that we support char8_t, I thought it could be used unconditionally without loss of generality. FWIW, CI passed without the guard.

Mordante added inline comments.Sep 30 2021, 10:13 AM
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
35

At the moment all usage of char8_t is guarded by this. Looking at cppreference it seems all our supported compilers implement this feature. So it's indeed no longer required. I'll create a patch to remove _LIBCPP_HAS_NO_CHAR8_T from __config instead.

jloser marked an inline comment as done.Sep 30 2021, 10:19 AM
jloser added inline comments.
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
35

Sounds good to me, thanks!. Are you happy with this PR as it stands then?

jloser updated this revision to Diff 376267.Sep 30 2021, 10:23 AM

Remove _LIBCPP_HAS_NO_CHAR8_t guard as it's not needed

jloser updated this revision to Diff 376268.Sep 30 2021, 10:24 AM
jloser edited the summary of this revision. (Show Details)

[NFC] Update commit message to reflect other paper for generic range constructor

Mordante added inline comments.Sep 30 2021, 11:42 AM
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
35

Yes I'm happy.

jloser updated this revision to Diff 376511.Oct 1 2021, 7:09 AM

Try poking CI

jloser marked an inline comment as done.Oct 1 2021, 7:09 AM
Quuxplusone requested changes to this revision.Oct 1 2021, 8:52 AM
Quuxplusone added a subscriber: cor3ntin.
Quuxplusone added inline comments.
libcxx/include/string_view
88–89

@cor3ntin: The Tony Table at the beginning of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1391r4.pdf claims that after this paper, foo(vec) will compile... but that's just plain false, correct? IIUC, the actual achievement of p1391 was to make foo(vec.begin(), vec.end()) compile; and then http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1989r2.pdf is the one that would actually support foo(vec)?
Just checking, because if p1391 isn't wrong, then I'm super confused.

285

__size(__end - __begin), please. We know the expression is well-formed because we know _End is a sized sentinel for _It.

287

Vice versa, I see no reason for __begin <= __end to be valid. Please add a test case using a non-iterator sentinel type. (I think my SFINAE test below is not sufficient, because it won't instantiate the body of this template, and it's the body that explodes.)

libcxx/test/std/strings/string.view/string.view.cons/from_iterator_iterator.pass.cpp
44

This test file should probably be named from_iterator_sentinel.pass.cpp.
I'd also like to see some SFINAE tests, and some mixed-constness tests. Just these would be fine:

static_assert( std::is_constructible_v<std::string_view, const char*, char*>); // OK
static_assert( std::is_constructible_v<std::string_view, char*, const char*>); // OK
static_assert(!std::is_constructible_v<std::string_view, char*, void*>); // not a sentinel
static_assert(!std::is_constructible_v<std::string_view, signed char*, signed char*>); // wrong char type
static_assert(!std::is_constructible_v<std::string_view, random_access_iterator<char*>, random_access_iterator<char*>>); // not contiguous
static_assert( std::is_constructible_v<std::string_view, contiguous_iterator<char*>, contiguous_iterator<char*>>); // OK
This revision now requires changes to proceed.Oct 1 2021, 8:52 AM
cor3ntin added inline comments.Oct 1 2021, 9:13 AM
libcxx/include/string_view
88–89

Yes. LWG decided to adopt half the paper in 20, so the change was not reflected in the motivation and the other half was done in 23 in P1989

jloser updated this revision to Diff 376881.Oct 4 2021, 6:34 AM
jloser marked 3 inline comments as done.
jloser retitled this revision from [libc++] Implement P1391 to [libc++] Implement P1391 for string_view.
jloser edited the summary of this revision. (Show Details)

Address Arthur's feedback

jloser added inline comments.Oct 4 2021, 6:34 AM
libcxx/include/string_view
285

Oops, we definitely don't want to do the std::distance dance since we have a iterator/sentinel. Just fixed -- thanks!

287

Well, the standard states that [begin, end) is a valid range as a precondition. I noticed we have some inconsistencies with string and string_view when it comes to these precondition checking. For the most part, we don't seem to actually check valid ranges, but we do in std::string::erase (https://github.com/llvm/llvm-project/blob/fd9bc13803ee24bfa674311584126b83e059d756/libcxx/include/string#L3228)

What's the recommended advice here @ldionne? From my perspective, a friendly debug _LIBCPP_ASSERT leads to a higher quality implementation given that the standard states the precondition explicitly.

For consistency, I'll update the message to say (iterator, sentinel) instead of (iterator, iterator), but let's decide if we want to have it at all first.

Mordante added inline comments.Oct 4 2021, 11:26 AM
libcxx/docs/Status/Cxx20Papers.csv
174

This looks wrong. Is it a rebase error?

jloser added inline comments.Oct 4 2021, 12:42 PM
libcxx/docs/Status/Cxx20Papers.csv
174

Mind elaborating? This PR only touches one line in Cxx20Papers.csv which is to mark P1391 as complete.

diff --git a/libcxx/docs/Status/Cxx20Papers.csv b/libcxx/docs/Status/Cxx20Papers.csv
index ee59d657b9f2..c722b9676b4a 100644
--- a/libcxx/docs/Status/Cxx20Papers.csv
+++ b/libcxx/docs/Status/Cxx20Papers.csv
@@ -134,7 +134,7 @@
 "`P1754 <https://wg21.link/P1754>`__","LWG","Rename concepts to standard_case for C++20, while we still can","Cologne","|In Progress|",""
 "","","","","",""
 "`P0883 <https://wg21.link/P0883>`__","LWG","Fixing Atomic Initialization","Belfast","|Complete| [#note-P0883]_","13.0"
-"`P1391 <https://wg21.link/P1391>`__","LWG","Range constructor for std::string_view","Belfast","* *",""
+"`P1391 <https://wg21.link/P1391>`__","LWG","Range constructor for std::string_view","Belfast","|Complete|","14.0"
 "`P1394 <https://wg21.link/P1394>`__","LWG","Range constructor for std::span","Belfast","* *",""
 "`P1456 <https://wg21.link/P1456>`__","LWG","Move-only views","Belfast","* *",""
 "`P1622 <https://wg21.link/P1622>`__","LWG","Mandating the Standard Library: Clause 32 - Thread support library","Belfast","* *",""
jloser updated this revision to Diff 377008.Oct 4 2021, 12:42 PM

Rebase to trigger CI

Mordante added inline comments.Oct 4 2021, 12:57 PM
libcxx/docs/Status/Cxx20Papers.csv
174

Very odd, have a look at this link https://reviews.llvm.org/D110718?vs=376511&id=376881#toc
This is the history between Diff 10 and Diff 11, this was the state when I wrote my comment.
In that view you see a change to P1868. I see the latest diff doesn't have this change.
I've no clue what happened, but it seems fine now.

jloser added inline comments.Oct 4 2021, 1:51 PM
libcxx/docs/Status/Cxx20Papers.csv
174

Hmm, that's weird. It was a clean rebase, so that's certainly odd.

jloser marked 2 inline comments as done.Oct 4 2021, 1:52 PM
Quuxplusone added inline comments.Oct 4 2021, 3:15 PM
libcxx/include/string_view
287

I think either we shouldn't have the assert, or (if we do) we should hide it behind a new macro, something like _LIBCPP_ASSERT_VALID_RANGE(__begin, __end). Then all the magic can go inside that macro and be reused consistently — whether it's requires, or calling _VSTD::distance, or what. Plus (although Louis will hate this idea ;)) anyone who wants ordinary assertions without paying for valid-range assertions can just compile with -D_LIBCPP_ASSERT_VALID_RANGE(x,y)=(void)0.

libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
26

Let's also add auto sv = std::basic_string_view((const CharT*)str.begin(), str.end()); (with different iterator types), which should also come out to std::basic_string_view<CharT>.

Also: Not a blocker necessarily, but I'd strongly prefer to use the auto sv = std::basic_string_view(str.begin(), str.end()); style for line 25 as well. Using the curly-brace sequence-initialization style makes it look as if we're initializing the string_view from a sequence of two elements, and that's not actually what's going on here.

Note that it is physically possible for auto x = foo{1,2} and auto x = foo(1,2) to wind up using different deduction guides (in fact, this happens with foo=std::vector), so if we really want exhaustive tests we should test both. But I think that's overkill. Then, assuming that we're only going to test one of the two syntaxes, I opine strongly that we should test the one people might ever use, which is the round-parens syntax, not the list-initialization syntax.

libcxx/test/std/strings/string.view/string.view.cons/from_iterator_sentinel.pass.cpp
24–30 ↗(On Diff #377008)

I suggest

template<class CharT, class Sentinel>
constexpr void test() {
  auto val = MAKE_STRING_VIEW(CharT, "test");
  auto sv = std::basic_string_view(val.begin(), Sentinel(val.end()));
  ASSERT_SAME_TYPE(decltype(sv), std::basic_string_view<CharT>);
  assert(sv.size() == val.size());
  assert(sv.data() == val.data());
}

and then test with

test<char, char*>();
test<wchar_t, wchar_t*>();
test<char8_t, char8_t*>();
test<char16_t, char16_t*>();
test<char32_t, char32_t*>();
test<char, const char*>();
test<char, sized_sentinel_wrapper<char*>>();

On the last line, sized_sentinel_wrapper would go into test_iterators.h, right after sentinel_wrapper; it would just provide operator- in addition to the sentinel_wrapper stuff... ick, actually, I see several existing tests overloading operator- for sentinel_wrapper in incompatible ways. I'll clean that up in a new PR myself. Needn't block this on that, although, I won't complain if you do. ;)

jloser added inline comments.Oct 4 2021, 4:46 PM
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
26

Switched to round-parens syntax. Sorry, just a habit of using braces everywhere as you can tell.

As for the mixed iterator types, I'm not sure how to make that work with our debug iterators. For example, your proposed test won't work with __wrap_iter:

error: cannot cast from type 'std::basic_string<char>::iterator' (aka '__wrap_iter<char *>') to pointer type 'const char *'
    std::basic_string_view sv((const CharT*)str.begin(), str.end());
jloser updated this revision to Diff 377058.Oct 4 2021, 4:57 PM

Prefer parens over braces
Add mixed iterator test in from_iterator_sentinel.pass.cpp

jloser updated this revision to Diff 377076.Oct 4 2021, 8:49 PM

Use local sentinel in from_iterator_sentinel.pass.cpp

jloser marked 3 inline comments as done.Oct 4 2021, 9:00 PM
jloser added inline comments.
libcxx/include/string_view
287

I'd be in favor of the assert personally, but I don't feel like pushing for it in this PR. I like the idea of factoring it out into a macro as I can see valid range checks being used all over in libc++ preconditions.

I don't imagine many people actually (except @Quuxplusone of course :)) wanting all-but-range-check assertions.

We'll see what @ldionne says, but nothing to block this PR on I'd imagine.

libcxx/test/std/strings/string.view/string.view.cons/from_iterator_sentinel.pass.cpp
24–30 ↗(On Diff #377008)

Added a test using a locally defined sentinel with appropriate operator== and operator-. I played for a bit in test_iterators.h with a sized_sentinel_wrapper. However, things didn't play so nicely due to constness and other things. Perhaps I overlooked something, but I'd say let's refactor this new use among others to use the new sized_sentinel_wrapper in a follow-up PR. I'm interested to see your PR for cleaning up the operator-- usage.

In the refactor, we can clean up the other uses in libcxx/test/std/ranges/range.adaptors/range.common.view/types.h and libcxx/test/std/ranges/range.adaptors/range.take/types.h from a quick look around.

jloser updated this revision to Diff 377077.Oct 4 2021, 9:03 PM
jloser marked 2 inline comments as done.

[NFC] Clean up spacing in string_view constructor

jloser added a comment.Oct 5 2021, 9:34 AM

ping @ldionne since CI is passing. Are you still happy with this?

jloser updated this revision to Diff 377380.Oct 5 2021, 4:21 PM

Introduce sized_sentinel_wrapper and use it in from_iterator_sentinel.pass.cpp

jloser added inline comments.Oct 5 2021, 4:23 PM
libcxx/test/std/strings/string.view/string.view.cons/from_iterator_sentinel.pass.cpp
24–30 ↗(On Diff #377008)

@Quuxplusone just reworked this to use sized_sentinel_wrapper and things worked out nicely this time around.

ldionne accepted this revision.Oct 5 2021, 6:14 PM

LGTM with an assertion added. Also, bonus points for adding a test for the assertion, but that's non-blocking. Thanks a lot for working on this!

libcxx/include/string_view
284

Those are the same, but _LIBCPP_HIDE_FROM_ABI is the preferred name.

287

I think it's pretty important to add these assertions when we can. And it's a lot easier to do it when we write the function than come back later to add it.

I am working on a patch that will decouple assertions and the debug mode, with the mid/short term goal to allow people to ship libc++ with assertions enabled. So even though it's not super useful today, it will soon be.

I have no strong opinion on adding a _LIBCPP_ASSERT_VALID_RANGE macro or doing something like if constexpr (i-can-order-begin-and-end) { _LIBCPP_ASSERT(...); }, but an assertion in *some* form seems like a good idea.

Quuxplusone accepted this revision.Oct 5 2021, 6:28 PM
Quuxplusone added inline comments.
libcxx/include/string_view
287

I strongly prefer a macro over if constexpr-anything.
However, it occurs to me that even though _LIBCPP_ASSERT(__begin <= __end) would be ill-formed, we could still simply do _LIBCPP_ASSERT((__end - __begin) >= 0). I don't object if we want to add that, and I don't think it needs to be hidden behind a macro.

libcxx/test/support/test_iterators.h
919 ↗(On Diff #377380)

If I weren't working on a PR for this, I would have some style comments here; but as I will eventually post a PR to clean this up, I don't object to this for now.
The one thing that would be helpful to do now, if it's easy enough for you, would be to rename sized_sentinel_wrapper<It> to just sized_sentinel<It>. Unfortunately we can't trivially rename sentinel_wrapper<It> to just sentinel<It> because the global name sentinel has already been land-grabbed by a few tests; but we can use the shorter and more consistent name sized_sentinel no problem. (Note that we use e.g. random_access_iterator<It>, not random_access_iterator_wrapper<It>.)

This revision is now accepted and ready to land.Oct 5 2021, 6:28 PM
jloser updated this revision to Diff 377528.Oct 6 2021, 6:48 AM

Add assert for valid range in string_view constructor
Rename sized_sentinel_wrapper to sized_sentinel

jloser marked 4 inline comments as done.Oct 6 2021, 6:58 AM
jloser added inline comments.
libcxx/include/string_view
284

Renamed. I used _LIBCPP_INLINE_VISIBILITY originally since it's what is used in the rest of this file. WDYT about a global s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/g? And then we just adjust _LIBCPP_HIDE_FROM_ABI macro definition to not be defined in terms of _LIBCPP_INLINE_VISIBILITY but instead just be

#define _LIBCPP_HIDE_FROM_ABI __attribute__ ((__visibility__("hidden"), __always_inline__))

Then we avoid this issue in the future of having two identifiers to do the same thing.

287

Added an assert with _LIBCPP_ASSERT((__end - __begin) >= 0).

libcxx/test/support/test_iterators.h
919 ↗(On Diff #377380)

Renamed to sized_sentinel. It's unfortunate that sentinel is such a common name used in the range tests so it's off the table without a bit of work for renaming sentinel_wrapper to sentinel.

jloser updated this revision to Diff 377531.Oct 6 2021, 6:58 AM
jloser marked 3 inline comments as done.

s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI

If BuildKit is happy with this, I'll land it.

Quuxplusone accepted this revision.Oct 6 2021, 7:04 AM
Quuxplusone added inline comments.
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
26

your proposed test won't work with __wrap_iter

You could do

std::basic_string<CharT> str = MAKE_STRING(CharT, "test");
auto sv = std::basic_string_view(std::as_const(str).begin(), str.end());

(or even use .cbegin(); I'm not a fan of .cbegin() but I admit it'd be shorter than as_const(...).begin() in this case).
However, right now you're already testing the deduction guide in from_iterator_sentinel.pass.cpp, so maybe just copy-paste that test over into this file.

jloser updated this revision to Diff 377535.Oct 6 2021, 7:16 AM

Add mixed-const iterator test in deduct.pass.cpp.
Test a sentinel while we're at it in deduct.pass.cpp
Don't rely on deduction guide in from_iterator_sentinel.pass.cpp

jloser marked 2 inline comments as done.Oct 6 2021, 7:18 AM
jloser added inline comments.
libcxx/test/std/strings/string.view/string.view.cons/deduct.pass.cpp
26

Added tests for mixed const iterators in the deduct.pass.cpp by just copying the test function template from from_iterator_sentinel_pass.cpp.

Also changed from_iterator_sentinel.pass.cpp to not rely on the deduction guide. Otherwise, deduct.pass is completely superfluous with the old from_iterator_sentinel.pass.cpp that used the deduction guide.

Quuxplusone accepted this revision.Oct 6 2021, 9:04 AM

@jloser if buildkite is happy, ship it!
FYI when/if you start working on [p1989] string_view's range ctor, note that https://cplusplus.github.io/LWG/issue3581 is relevant. (span's range ctor is already appropriately constrained, so D110503 is still good.)

This revision was automatically updated to reflect the committed changes.