This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement P0980R1 (constexpr std::string)

Authored by philnik on Sep 27 2021, 4:48 PM.


Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • Changed split non-/constexpr tests into one and used TEST_IS_CONSTANT_EVALUATED
  • replaced main() with main(int, char**)
  • Updated feature-test macro
philnik added a subscriber: ldionne.

Maybe @ldionne should take a look at this PR too, since it's his paper.

philnik updated this revision to Diff 388153.Nov 18 2021, 4:07 AM

Rebased and added GCC workaround

philnik planned changes to this revision.Nov 18 2021, 4:33 AM
philnik planned changes to this revision.Nov 22 2021, 6:50 AM
philnik updated this revision to Diff 390217.Nov 28 2021, 10:35 AM
  • Rebased
  • check for nullptr
philnik updated this revision to Diff 390219.Nov 28 2021, 11:04 AM

removed XFAIL for GCC

philnik updated this revision to Diff 390222.Nov 28 2021, 12:39 PM
  • Removed debugging stuff for constant evaluation
  • fixed some tests
philnik updated this revision to Diff 390226.Nov 28 2021, 2:35 PM
  • Added check
  • Using __wrap_iter now only during runtime
philnik planned changes to this revision.Nov 28 2021, 3:54 PM

Maybe @ldionne should take a look at this PR too, since it's his paper.

Will do, but I was waiting for the build to be green. Can you ping me (on Discord) when you want me to take a look?

philnik updated this revision to Diff 390438.EditedNov 29 2021, 12:20 PM
  • Rebased onto D114733
  • Make test pass
philnik updated this revision to Diff 390843.Nov 30 2021, 3:58 PM
  • rebased
  • debug iterators should work now
philnik updated this revision to Diff 391005.Dec 1 2021, 6:42 AM
  • Some formatting changes

Please update the PR summary (i.e. the proposed commit message and general info about the patch). Particularly, I'd like to see some paragraph-form text explaining the general approach. I'm sure it's been explained and discussed several times, e.g. on Discord, probably a couple times in this PR's comments... but IIUC the details have also changed over time. An up-to-date description belongs in the PR summary, and ultimately in the commit message. It would even be appropriate to add something to libcxx/docs/DesignDocs/. All of this will benefit the next person to maintain this and/or to implement constexpr vector or constexpr whatever-they-do-in-C++23.

It would help if this were split into relatively orthogonal chunks, most of which could be "NFC." For example:

  • The test_iterators.h diff, which is just adding constexpr to one type, can be landed today AFAIC.
  • The test_macros.h diff is probably unnecessary?
  • A huge number of diffs in <string> are just adding the word _LIBCPP_CONSTEXPR_AFTER_CXX17; since this is all template code where the compiler shouldn't complain about incorrect constexpr keywords, can we land those as a separate patch, before the behavioral changes?
  • Vice versa, can we land the behavioral changes first and then the _LIBCPP_CONSTEXPR_AFTER_CXX17 markings second?
  • Several new member functions: __begin_lifetime, __finish_replace, __free_memory_constexpr, __insert_not_in_range. Can we introduce those first (changing the "shape" of the control flow), and then in a second step change their behavior to include the constexpr-friendly codepaths?
  • The constexpr-ified tests do need to be landed in the same commit as the behavior they test... but for review purposes they are orthogonal/distracting.
  • Even if we land this PR as a monolith, it would still be vastly easier to review in small pieces like that. If you're trying to review the control flow, all the _LIBCPP_CONSTEXPR_AFTER_CXX17 markings are distracting; and vice versa.

I don't find Phab's interface very convenient for splitting up patches in these ways. You can make a bunch of Phab PRs and link them together with dependencies, but really it might be easier to just edit the issue summary to include [the design info, plus] a link to a branch on your own GitHub (I mean a URL like ) and say "Hey reviewers, if you prefer to see this PR broken down into its orthogonal component parts, take a look at this series of commits in GitHub."


_VSTD::construct_at(_VSTD::addressof(__p[__i]), value_type())); for ADL.
On line 1785, please linebreak before void.


The PR summary (and possibly a design doc) should explain why the class invariants need to change in constexpr-land.
But aside from that, I find this diff hard to reason about. After some fiddling with it, I think the cleanest version introduces the concept of "SSO capacity," like this:

size_type __sso_capacity = __libcpp_is_constant_evaluated() ? 0 : (__min_cap - 1);
if (size() > capacity())
    return false;
if (capacity() < __sso_capacity)
    return false;
if (data() == nullptr)
    return false;
if (data()[size()] != value_type())
    return false;
return true;

IIUC, you use this calculation several places in this PR. Consider introducing a function __sso_capacity() for this purpose... except that at first glance I'm worried about any function whose return value changes depending on constexprness. I'm not sure that such a function is safe to use. So maybe pulling it out into a function would be very bad, I'm not sure.

philnik updated this revision to Diff 394399.Dec 14 2021, 4:09 PM
philnik retitled this revision from [libc++] P0980R1 (constexpr std::string) to [libc++] Implement P0980R1 (constexpr std::string).Mar 19 2022, 1:26 PM
philnik edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 1:26 PM
philnik updated this revision to Diff 416734.Mar 19 2022, 1:36 PM

Updates; There are still a few test changes but most are already done.

philnik updated this revision to Diff 416738.Mar 19 2022, 1:58 PM
  • generate files
rriddle removed a subscriber: rriddle.Mar 19 2022, 1:59 PM
philnik updated this revision to Diff 416807.Mar 20 2022, 4:52 PM
  • Initialize union before allocating
philnik updated this revision to Diff 416810.Mar 20 2022, 5:54 PM
  • Fix alternate string layout
philnik updated this revision to Diff 416937.Mar 21 2022, 7:43 AM
  • Try to fix 32bit builds

Thanks for working on this huge task!
In general I'm happy, but I'm not familiar enough with the internals of our string implementation to validate you didn't miss any important parts. Therefore I leave approval to somebody more familiar with string.

(This patch basically touches every function in string and thus "breaks" other patches for string. Would it be an idea to make some cleanup patches after this patch lands? Then we can use _LIBCPP_HIDE_FROM_ABI and std::.)


Please update the synopsis.


The test test/std/strings/basic.string/string.capacity/resize_and_overwrite.pass.cpp can be updated by removing
#if defined(__cpp_lib_constexpr_string) && __cpp_lib_constexpr_string >= 201907L


Nit: I would keep the order LIBCPP_INLINE_VISIBILITY constexpr for consistency with the other changes.


Should these new functions be private members?


Please update the synopsis in all tests.


Why don't you test this in constexpr? The signature has changed in the paper.


Why is this part removed?


These functions are only available in C++20. (The same for ends_with and contains.)

philnik updated this revision to Diff 418040.Mar 24 2022, 2:06 PM
philnik marked 8 inline comments as done.
  • Update synopsis
  • Address comments
philnik added inline comments.Mar 24 2022, 8:20 PM

I'd keep it as-is for now. I don't want to make this patch unnecessarily larger.


This test makes extremely large allocations, like numeric_limits<size_t>::max() large. Because of this the constant evaluations fails.

philnik added inline comments.Mar 24 2022, 8:20 PM

This is duplicated for std::string and std::basic_string<char, std::char_traits<char>, min_allocator<char>>. I just made this a template and removed the duplicated code.

SGTM modulo some small issues.


Then please a test to validate the function can be called compile-time and add comment why we done the minimal testing for compile-time testing.

philnik updated this revision to Diff 419393.Mar 31 2022, 4:05 AM
philnik marked 2 inline comments as done.
  • Address comments
ldionne requested changes to this revision.Apr 4 2022, 9:50 AM

Thanks for the patch! Mostly small comments about forgetting stuff in the tests, however I think we'll need to figure out a better story for the string's invariant inside constexpr. I think we agreed on a direction to investigate during our live review, so let's try that and re-evaluate after. Thanks!

12 ↗(On Diff #391005)

I don't think you intended to upload that.


Please consider not reformatting as part of this patch since it's good to keep it as mechanical as possible.


What if instead of doing this, we made sure to have the invariants of the string hold before calling __init, and then we'd "simply" have to check whether we already have a long string allocated at the beginning of __init?


This line does not look correct to me. Indeed, if __str is a short string, after the change we are accessing __str's long representation instead of its raw representation, which means that we might not be copying some part of the small buffer, if the small buffer is larger than the long representation.

This could happen if we had a very large character type where 2 * sizeof(CharType) is larger than sizeof(LongRepresentation). In that case, our short string buffer will contain two CharTypes at least, yet our long representation will be smaller than that. Please add a test to catch this, or explain why that doesn't work (you mentioned a bug related to large character types during live review).


Why don't we call __default_init() on a moved-from string instead?

Right now, after calling __zero() during constexpr evaluation, the string is invalid to access, thus we don't have the invariant __is_long(). Instead, I think we should make sure that the string satisfies __is_long() at all times during constexpr evaluation, which probably means replacing a few calls to __zero() by calls to __default_init().

This would allow removing the diff above.


Same here -- if the invariant __is_long() held during constexpr evaluation, this nullptr check could go away.


For instance, I think this is one place where we should call __default_init() instead.


I'd like to understand why this diff is needed. What happens if you remove it?


Same question here -- why is this required?


If we never put the string in an invalid state inside constexpr, is it possible to remove this diff entirely?


Could we investigate adding a private constructor that allows passing a capacity to pre-allocate, so this could be implemented like this:

using _String = basic_string<_CharT, _Traits, _Allocator>;
typename _String::size_type __lhs_sz = __lhs.size();
typename _String::size_type __rhs_sz = __rhs.size();
_String __r(_String::__alloc_traits::select_on_container_copy_construction(__lhs.get_allocator()), __capacity_tag{}, __lhs_sz + __rhs_sz);
__r.append(, __lhs_sz);
__r.append(, __rhs_sz);
return __r;

Can we please do this as a separate review? That should be easy to split out.

26–37 ↗(On Diff #419393)

Unless I am mistaken, we don't seem to have any test for basic_string::basic_string() that runs at runtime. This is kind of crazy.

As a different review, can you please move this test to default.pass.cpp, make it run the default constructor at runtime, and prepare it for constexpr-friendliness as well? That way, this diff can look like the other ones where you basically uncomment static_assert(test());.

13 ↗(On Diff #419393)

Same thing here -- we don't have a runtime test for the string destructor outside of the static variable below. Please add one -- this can be done in the same review as the default ctor.

You could use a special allocator and check that deallocate() has been called after the string is destroyed.


I think this needs to change to TEST_STD_VER > 17?

Please also grep for __cpp_lib_constexpr_string after applying your patch just to make sure there aren't any stray ones left where they shouldn't be.


I think you missed those.

I'm also not a big fan of the 2k lines of test here, but this has nothing to do with this review.


You forgot those too! Please grep for // static_assert in the test suite just to make sure you haven't missed other ones.











Wow, 6k lines of test. I'm sure they are all 100% relevant -_-.

14 ↗(On Diff #419393)

I don't think you meant to make this one constexpr.

14 ↗(On Diff #419393)

Same, not constexpr.

14 ↗(On Diff #419393)


14 ↗(On Diff #419393)

Same, and for all probably.


We are missing some constexpr-ification here for sure.


Missed these ones!




Can you try uncommenting this and seeing what happens?


// fails in constexpr because we might be comparing unrelated pointers

This revision now requires changes to proceed.Apr 4 2022, 9:50 AM
philnik updated this revision to Diff 421733.Apr 9 2022, 7:47 AM
philnik marked 31 inline comments as done.
  • Rebased
  • Address comments

I think we agreed that this is OK and we make sure that __init is only ever called inside constructors.


We may compare unrelated pointers with __p + __pos <= __s && __s < __p + __sz.


Same here - we may compare unrelated pointers.


This is D123058 now.

13 ↗(On Diff #419393)

This is now D123129.


Seems to just work.

philnik updated this revision to Diff 421756.Apr 9 2022, 2:34 PM
  • Fix CI
ldionne requested changes to this revision.Apr 13 2022, 8:30 AM
ldionne added a subscriber: daltenty.

Thanks! Still a few comments and questions, but this is looking really good now.


I don't understand why this diff is required when running in constexpr. What happens if you remove it?


Is there any reason why we don't simply always do __r_.first() = __rep()? That would work both in constexpr and non-constexpr, and it seems to be pretty much equivalent:

And when that's done, perhaps we can even remove __zero() altogether?


This needs a comment!


Note to future me: I looked at this and it seems OK, no need to spend another 15 minutes validating.


Can we repace this by a call to __fits_in_sso(__target_capacity)? (or maybe __target_capacity +/- 1, please double-check)


@daltenty The CI is currently failing in pretty bad ways on AIX. The compiler is being reported as IBM Open XL C/C++ for AIX 17.1.0 (5725-C72, 5765-J18), clang version 13.0.0, which we claim to support on our page.

I believe there might be some patches to the constant evaluator on top of Clang 13 that have not been cherry-picked to your internal branch. I suspect there are probably no reasonable workarounds to make std::string work inside constexpr on that compiler without fixing the compiler itself. Hence, I am unsure what's the best course of action here. I think what we should do is either:

  1. Mark those tests as XFAIL temporarily on AIX (LIBCXX-AIX-FIXME). This has the downside that you'll lose coverage for std::string at runtime too. Otherwise,
  2. Introduce a macro and #ifdef out only the constexpr part of the tests on AIX until the compiler is fixed or updated. I'm fine with doing that, but only if we know that it's a temporary state (< 6 months roughly).

Do you have thoughts on that?

Note to @philnik: Since I'm about to go OOO, if there is no resolution for this by EOW, feel free to land this with option (1) above (which is simplest) and we can always do (2) afterwards if that's the preferred solution.

This revision now requires changes to proceed.Apr 13 2022, 8:30 AM
philnik updated this revision to Diff 422649.Apr 13 2022, 2:14 PM
philnik marked 4 inline comments as done.
  • Rebased onto D123580
  • Address comments

Fixed the bug!

ldionne added inline comments.Apr 25 2022, 8:53 AM

Can you explain this?

philnik added inline comments.Apr 25 2022, 9:13 AM

During runtime we never actually allocate __min_cap bytes, so it's OK that we return an even number in this case. It only exists for checking if the string fits in SSO or is in SSO mode AFAICT. (This should probably be changed at some point) During constant evaluation we have to return an odd number to make the allocation size even, otherwise we save garbage in the capacity member. That's why the other way to fix the bug was saving the actual capacity.

ldionne accepted this revision.Apr 26 2022, 12:51 PM

Please rebase onto main to get a clean CI run, and ship it. Thanks!


Ok. As discussed offline, let's land this as-is, but let's also make a follow-up patch that tries to fix places that assume __min_cap - 1 is returned when the string fits in the SSO, so we can remove this if and always return static_cast<size_type>(__min_cap);.

philnik updated this revision to Diff 425299.Apr 26 2022, 2:03 PM
  • Rebased
philnik updated this revision to Diff 425436.Apr 27 2022, 1:11 AM
  • Add a few more XFAILs
This revision was not accepted when it landed; it landed in state Needs Review.Apr 27 2022, 3:25 AM
This revision was automatically updated to reflect the committed changes.