This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement P0980R1 (constexpr std::string)
ClosedPublic

Authored by philnik on Sep 27 2021, 4:48 PM.

Details

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • Changed split non-/constexpr tests into one and used TEST_IS_CONSTANT_EVALUATED
  • replaced main() with main(int, char**)
  • Updated feature-test macro
philnik added a subscriber: ldionne.

Maybe @ldionne should take a look at this PR too, since it's his paper.

philnik updated this revision to Diff 388153.Nov 18 2021, 4:07 AM

Rebased and added GCC workaround

philnik planned changes to this revision.Nov 18 2021, 4:33 AM
philnik planned changes to this revision.Nov 22 2021, 6:50 AM
philnik updated this revision to Diff 390217.Nov 28 2021, 10:35 AM
  • Rebased
  • check for nullptr
philnik updated this revision to Diff 390219.Nov 28 2021, 11:04 AM

removed XFAIL for GCC

philnik updated this revision to Diff 390222.Nov 28 2021, 12:39 PM
  • Removed debugging stuff for constant evaluation
  • fixed some tests
philnik updated this revision to Diff 390226.Nov 28 2021, 2:35 PM
  • Added check
  • Using __wrap_iter now only during runtime
philnik planned changes to this revision.Nov 28 2021, 3:54 PM

Maybe @ldionne should take a look at this PR too, since it's his paper.

Will do, but I was waiting for the build to be green. Can you ping me (on Discord) when you want me to take a look?

philnik updated this revision to Diff 390438.EditedNov 29 2021, 12:20 PM
  • Rebased onto D114733
  • Make test pass
philnik updated this revision to Diff 390843.Nov 30 2021, 3:58 PM
  • rebased
  • debug iterators should work now
philnik updated this revision to Diff 391005.Dec 1 2021, 6:42 AM
  • Some formatting changes

Please update the PR summary (i.e. the proposed commit message and general info about the patch). Particularly, I'd like to see some paragraph-form text explaining the general approach. I'm sure it's been explained and discussed several times, e.g. on Discord, probably a couple times in this PR's comments... but IIUC the details have also changed over time. An up-to-date description belongs in the PR summary, and ultimately in the commit message. It would even be appropriate to add something to libcxx/docs/DesignDocs/. All of this will benefit the next person to maintain this and/or to implement constexpr vector or constexpr whatever-they-do-in-C++23.

It would help if this were split into relatively orthogonal chunks, most of which could be "NFC." For example:

  • The test_iterators.h diff, which is just adding constexpr to one type, can be landed today AFAIC.
  • The test_macros.h diff is probably unnecessary?
  • A huge number of diffs in <string> are just adding the word _LIBCPP_CONSTEXPR_AFTER_CXX17; since this is all template code where the compiler shouldn't complain about incorrect constexpr keywords, can we land those as a separate patch, before the behavioral changes?
  • Vice versa, can we land the behavioral changes first and then the _LIBCPP_CONSTEXPR_AFTER_CXX17 markings second?
  • Several new member functions: __begin_lifetime, __finish_replace, __free_memory_constexpr, __insert_not_in_range. Can we introduce those first (changing the "shape" of the control flow), and then in a second step change their behavior to include the constexpr-friendly codepaths?
  • The constexpr-ified tests do need to be landed in the same commit as the behavior they test... but for review purposes they are orthogonal/distracting.
  • Even if we land this PR as a monolith, it would still be vastly easier to review in small pieces like that. If you're trying to review the control flow, all the _LIBCPP_CONSTEXPR_AFTER_CXX17 markings are distracting; and vice versa.

I don't find Phab's interface very convenient for splitting up patches in these ways. You can make a bunch of Phab PRs and link them together with dependencies, but really it might be easier to just edit the issue summary to include [the design info, plus] a link to a branch on your own GitHub (I mean a URL like https://github.com/Quuxplusone/llvm-project/compare/main...trivially-relocatable ) and say "Hey reviewers, if you prefer to see this PR broken down into its orthogonal component parts, take a look at this series of commits in GitHub."

libcxx/include/string
1820–1822

_VSTD::construct_at(_VSTD::addressof(__p[__i]), value_type())); for ADL.
On line 1785, please linebreak before void.

4231–4239

The PR summary (and possibly a design doc) should explain why the class invariants need to change in constexpr-land.
But aside from that, I find this diff hard to reason about. After some fiddling with it, I think the cleanest version introduces the concept of "SSO capacity," like this:

size_type __sso_capacity = __libcpp_is_constant_evaluated() ? 0 : (__min_cap - 1);
if (size() > capacity())
    return false;
if (capacity() < __sso_capacity)
    return false;
if (data() == nullptr)
    return false;
if (data()[size()] != value_type())
    return false;
return true;

IIUC, you use this calculation several places in this PR. Consider introducing a function __sso_capacity() for this purpose... except that at first glance I'm worried about any function whose return value changes depending on constexprness. I'm not sure that such a function is safe to use. So maybe pulling it out into a function would be very bad, I'm not sure.

philnik updated this revision to Diff 394399.Dec 14 2021, 4:09 PM
philnik retitled this revision from [libc++] P0980R1 (constexpr std::string) to [libc++] Implement P0980R1 (constexpr std::string).Mar 19 2022, 1:26 PM
philnik edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 1:26 PM
philnik updated this revision to Diff 416734.Mar 19 2022, 1:36 PM

Updates; There are still a few test changes but most are already done.

philnik updated this revision to Diff 416738.Mar 19 2022, 1:58 PM
  • generate files
rriddle removed a subscriber: rriddle.Mar 19 2022, 1:59 PM
philnik updated this revision to Diff 416807.Mar 20 2022, 4:52 PM
  • Initialize union before allocating
philnik updated this revision to Diff 416810.Mar 20 2022, 5:54 PM
  • Fix alternate string layout
philnik updated this revision to Diff 416937.Mar 21 2022, 7:43 AM
  • Try to fix 32bit builds

Thanks for working on this huge task!
In general I'm happy, but I'm not familiar enough with the internals of our string implementation to validate you didn't miss any important parts. Therefore I leave approval to somebody more familiar with string.

(This patch basically touches every function in string and thus "breaks" other patches for string. Would it be an idea to make some cleanup patches after this patch lands? Then we can use _LIBCPP_HIDE_FROM_ABI and std::.)

libcxx/include/string
77

Please update the synopsis.

1003–1004

The test test/std/strings/basic.string/string.capacity/resize_and_overwrite.pass.cpp can be updated by removing
#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L

1466–1467

Nit: I would keep the order LIBCPP_INLINE_VISIBILITY constexpr for consistency with the other changes.

1521

Should these new functions be private members?

libcxx/test/std/strings/basic.string/string.access/at.pass.cpp
12

Please update the synopsis in all tests.

libcxx/test/std/strings/basic.string/string.capacity/max_size.pass.cpp
82 ↗(On Diff #416937)

Why don't you test this in constexpr? The signature has changed in the paper.

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp
53 ↗(On Diff #416937)

Why is this part removed?

libcxx/test/std/strings/basic.string/string.starts_with/starts_with.char.pass.cpp
19 ↗(On Diff #416937)

These functions are only available in C++20. (The same for ends_with and contains.)

philnik updated this revision to Diff 418040.Mar 24 2022, 2:06 PM
philnik marked 8 inline comments as done.
  • Update synopsis
  • Address comments
philnik added inline comments.Mar 24 2022, 8:20 PM
libcxx/include/string
1466–1467

I'd keep it as-is for now. I don't want to make this patch unnecessarily larger.

libcxx/test/std/strings/basic.string/string.capacity/max_size.pass.cpp
82 ↗(On Diff #416937)

This test makes extremely large allocations, like numeric_limits<size_t>::max() large. Because of this the constant evaluations fails.

philnik added inline comments.Mar 24 2022, 8:20 PM
libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp
53 ↗(On Diff #416937)

This is duplicated for std::string and std::basic_string<char, std::char_traits<char>, min_allocator<char>>. I just made this a template and removed the duplicated code.

SGTM modulo some small issues.

libcxx/test/std/strings/basic.string/string.capacity/max_size.pass.cpp
82 ↗(On Diff #416937)

Then please a test to validate the function can be called compile-time and add comment why we done the minimal testing for compile-time testing.

libcxx/test/std/strings/basic.string/string.contains/contains.char.pass.cpp
19 ↗(On Diff #418040)
philnik updated this revision to Diff 419393.Mar 31 2022, 4:05 AM
philnik marked 2 inline comments as done.
  • Address comments
ldionne requested changes to this revision.Apr 4 2022, 9:50 AM

Thanks for the patch! Mostly small comments about forgetting stuff in the tests, however I think we'll need to figure out a better story for the string's invariant inside constexpr. I think we agreed on a direction to investigate during our live review, so let's try that and re-evaluate after. Thanks!

cross-project-tests/debuginfo-tests/llvm-prettyprinters/gdb/mlir-support.cpp
12 ↗(On Diff #391005)

I don't think you intended to upload that.

libcxx/include/string
918–919

Please consider not reformatting as part of this patch since it's good to keep it as mechanical as possible.

1956–1957

What if instead of doing this, we made sure to have the invariants of the string hold before calling __init, and then we'd "simply" have to check whether we already have a long string allocated at the beginning of __init?

2110

This line does not look correct to me. Indeed, if __str is a short string, after the change we are accessing __str's long representation instead of its raw representation, which means that we might not be copying some part of the small buffer, if the small buffer is larger than the long representation.

This could happen if we had a very large character type where 2 * sizeof(CharType) is larger than sizeof(LongRepresentation). In that case, our short string buffer will contain two CharTypes at least, yet our long representation will be smaller than that. Please add a test to catch this, or explain why that doesn't work (you mentioned a bug related to large character types during live review).

2385–2391

Why don't we call __default_init() on a moved-from string instead?

Right now, after calling __zero() during constexpr evaluation, the string is invalid to access, thus we don't have the invariant __is_long(). Instead, I think we should make sure that the string satisfies __is_long() at all times during constexpr evaluation, which probably means replacing a few calls to __zero() by calls to __default_init().

This would allow removing the diff above.

2596

Same here -- if the invariant __is_long() held during constexpr evaluation, this nullptr check could go away.

2614

For instance, I think this is one place where we should call __default_init() instead.

2924–2930

I'd like to understand why this diff is needed. What happens if you remove it?

3166–3169

Same question here -- why is this required?

4233–4234

If we never put the string in an invalid state inside constexpr, is it possible to remove this diff entirely?

4475–4476

Could we investigate adding a private constructor that allows passing a capacity to pre-allocate, so this could be implemented like this:

using _String = basic_string<_CharT, _Traits, _Allocator>;
typename _String::size_type __lhs_sz = __lhs.size();
typename _String::size_type __rhs_sz = __rhs.size();
_String __r(_String::__alloc_traits::select_on_container_copy_construction(__lhs.get_allocator()), __capacity_tag{}, __lhs_sz + __rhs_sz);
__r.append(__lhs.data(), __lhs_sz);
__r.append(__rhs.data(), __rhs_sz);
return __r;

Can we please do this as a separate review? That should be easy to split out.

libcxx/test/std/strings/basic.string/string.cons/default_noexcept.pass.cpp
26–37 ↗(On Diff #419393)

Unless I am mistaken, we don't seem to have any test for basic_string::basic_string() that runs at runtime. This is kind of crazy.

As a different review, can you please move this test to default.pass.cpp, make it run the default constructor at runtime, and prepare it for constexpr-friendliness as well? That way, this diff can look like the other ones where you basically uncomment static_assert(test());.

libcxx/test/std/strings/basic.string/string.cons/dtor_noexcept.pass.cpp
13 ↗(On Diff #419393)

Same thing here -- we don't have a runtime test for the string destructor outside of the static variable below. Please add one -- this can be done in the same review as the default ctor.

You could use a special allocator and check that deallocate() has been called after the string is destroyed.

libcxx/test/std/strings/basic.string/string.iterators/iterators.pass.cpp
81 ↗(On Diff #419393)

I think this needs to change to TEST_STD_VER > 17?

Please also grep for __cpp_lib_constexpr_string after applying your patch just to make sure there aren't any stray ones left where they shouldn't be.

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_T_size_size.pass.cpp
1867–1898 ↗(On Diff #419393)

I think you missed those.

I'm also not a big fan of the 2k lines of test here, but this has nothing to do with this review.

libcxx/test/std/strings/basic.string/string.modifiers/string_insert/size_string_size_size.pass.cpp
1827–1857 ↗(On Diff #419393)

You forgot those too! Please grep for // static_assert in the test suite just to make sure you haven't missed other ones.

libcxx/test/std/strings/basic.string/string.modifiers/string_replace/iter_iter_iter_iter.pass.cpp
1060–1069 ↗(On Diff #419393)

Here!

libcxx/test/std/strings/basic.string/string.modifiers/string_replace/iter_iter_pointer_size.pass.cpp
995–1006 ↗(On Diff #419393)

Here!

libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_T_size_size.pass.cpp
6079 ↗(On Diff #419393)

Here!

libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_pointer_size.pass.cpp
1340 ↗(On Diff #419393)

Here!

libcxx/test/std/strings/basic.string/string.modifiers/string_replace/size_size_string_size_size.pass.cpp
6069 ↗(On Diff #419393)

Here!

Wow, 6k lines of test. I'm sure they are all 100% relevant -_-.

libcxx/test/std/strings/basic.string/string.nonmembers/string.io/get_line.pass.cpp
14 ↗(On Diff #419393)

I don't think you meant to make this one constexpr.

libcxx/test/std/strings/basic.string/string.nonmembers/string.io/get_line_delim.pass.cpp
14 ↗(On Diff #419393)

Same, not constexpr.

libcxx/test/std/strings/basic.string/string.nonmembers/string.io/get_line_delim_rv.pass.cpp
14 ↗(On Diff #419393)

Same!

libcxx/test/std/strings/basic.string/string.nonmembers/string.io/get_line_rv.pass.cpp
14 ↗(On Diff #419393)

Same, and for all string.io probably.

libcxx/test/std/strings/basic.string/string.ops/string_compare/size_size_T_size_size.pass.cpp
5955 ↗(On Diff #419393)

We are missing some constexpr-ification here for sure.

libcxx/test/std/strings/basic.string/string.ops/string_compare/size_size_pointer_size.pass.cpp
1335 ↗(On Diff #419393)

Missed these ones!

libcxx/test/std/strings/basic.string/string.ops/string_compare/size_size_string_size_size.pass.cpp
6041 ↗(On Diff #419393)

Missed!

libcxx/test/std/strings/basic.string/string.ops/string_find.last.not.of/string_view_size.pass.cpp
154–157 ↗(On Diff #419393)

Can you try uncommenting this and seeing what happens?

libcxx/test/support/constexpr_char_traits.h
121 ↗(On Diff #419393)

// fails in constexpr because we might be comparing unrelated pointers

This revision now requires changes to proceed.Apr 4 2022, 9:50 AM
philnik updated this revision to Diff 421733.Apr 9 2022, 7:47 AM
philnik marked 31 inline comments as done.
  • Rebased
  • Address comments
libcxx/include/string
1956–1957

I think we agreed that this is OK and we make sure that __init is only ever called inside constructors.

2924–2930

We may compare unrelated pointers with __p + __pos <= __s && __s < __p + __sz.

3166–3169

Same here - we may compare unrelated pointers.

4475–4476

This is D123058 now.

libcxx/test/std/strings/basic.string/string.cons/dtor_noexcept.pass.cpp
13 ↗(On Diff #419393)

This is now D123129.

libcxx/test/std/strings/basic.string/string.ops/string_find.last.not.of/string_view_size.pass.cpp
154–157 ↗(On Diff #419393)

Seems to just work.

philnik updated this revision to Diff 421756.Apr 9 2022, 2:34 PM
  • Fix CI
ldionne requested changes to this revision.Apr 13 2022, 8:30 AM
ldionne added a subscriber: daltenty.

Thanks! Still a few comments and questions, but this is looking really good now.

libcxx/include/string
1591–1592

I don't understand why this diff is required when running in constexpr. What happens if you remove it?

1624

Is there any reason why we don't simply always do __r_.first() = __rep()? That would work both in constexpr and non-constexpr, and it seems to be pretty much equivalent: https://godbolt.org/z/T85f13ozT

And when that's done, perhaps we can even remove __zero() altogether?

1795–1796

This needs a comment!

3166–3168

Note to future me: I looked at this and it seems OK, no need to spend another 15 minutes validating.

3543

Can we repace this by a call to __fits_in_sso(__target_capacity)? (or maybe __target_capacity +/- 1, please double-check)

libcxx/test/support/constexpr_char_traits.h
130 ↗(On Diff #421756)

@daltenty The CI is currently failing in pretty bad ways on AIX. The compiler is being reported as IBM Open XL C/C++ for AIX 17.1.0 (5725-C72, 5765-J18), clang version 13.0.0, which we claim to support on our page.

I believe there might be some patches to the constant evaluator on top of Clang 13 that have not been cherry-picked to your internal branch. I suspect there are probably no reasonable workarounds to make std::string work inside constexpr on that compiler without fixing the compiler itself. Hence, I am unsure what's the best course of action here. I think what we should do is either:

  1. Mark those tests as XFAIL temporarily on AIX (LIBCXX-AIX-FIXME). This has the downside that you'll lose coverage for std::string at runtime too. Otherwise,
  2. Introduce a macro and #ifdef out only the constexpr part of the tests on AIX until the compiler is fixed or updated. I'm fine with doing that, but only if we know that it's a temporary state (< 6 months roughly).

Do you have thoughts on that?

Note to @philnik: Since I'm about to go OOO, if there is no resolution for this by EOW, feel free to land this with option (1) above (which is simplest) and we can always do (2) afterwards if that's the preferred solution.

This revision now requires changes to proceed.Apr 13 2022, 8:30 AM
philnik updated this revision to Diff 422649.Apr 13 2022, 2:14 PM
philnik marked 4 inline comments as done.
  • Rebased onto D123580
  • Address comments
libcxx/include/string
1591–1592

Fixed the bug!

ldionne added inline comments.Apr 25 2022, 8:53 AM
libcxx/include/string
1641–1643

Can you explain this?

philnik added inline comments.Apr 25 2022, 9:13 AM
libcxx/include/string
1641–1643

During runtime we never actually allocate __min_cap bytes, so it's OK that we return an even number in this case. It only exists for checking if the string fits in SSO or is in SSO mode AFAICT. (This should probably be changed at some point) During constant evaluation we have to return an odd number to make the allocation size even, otherwise we save garbage in the capacity member. That's why the other way to fix the bug was saving the actual capacity.

ldionne accepted this revision.Apr 26 2022, 12:51 PM

Please rebase onto main to get a clean CI run, and ship it. Thanks!

libcxx/include/string
1641–1643

Ok. As discussed offline, let's land this as-is, but let's also make a follow-up patch that tries to fix places that assume __min_cap - 1 is returned when the string fits in the SSO, so we can remove this if and always return static_cast<size_type>(__min_cap);.

philnik updated this revision to Diff 425299.Apr 26 2022, 2:03 PM
  • Rebased
  • XFAIL AIX
philnik updated this revision to Diff 425436.Apr 27 2022, 1:11 AM
  • Add a few more XFAILs
This revision was not accepted when it landed; it landed in state Needs Review.Apr 27 2022, 3:25 AM
This revision was automatically updated to reflect the committed changes.