This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
libcxx/include/
-
include/
22/22
string

Differential D123580

[libc++] Use bit field for checking if string is in long or short mode
ClosedPublic

Authored by philnik on Apr 12 2022, 2:46 AM.

Download Raw Diff

Details

Reviewers

ldionne
Mordante

Group Reviewers

Restricted Project
Restricted Project

Commits

rG29c8c070a177: [libc++] Use bit field for checking if string is in long or short mode

Summary

This makes the code a bit simpler and (I think) removes the undefined behaviour from the normal string layout.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	14,930 ms	libcxx CI GCC 11 / C++latest > llvm-libc++-shared-gcc-cfg-in.libcxx/gdb::gdb_pretty_printer_test.sh.cpp

Event Timeline

philnik created this revision.Apr 12 2022, 2:46 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 2:46 AM

philnik requested review of this revision.Apr 12 2022, 2:46 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 2:46 AM

Herald added a reviewer: Restricted Project. · View Herald Transcript

Herald added a subscriber: libcxx-commits. · View Herald Transcript

Harbormaster completed remote builds in B159179: Diff 422154.Apr 12 2022, 3:04 AM

Fix pretty printers

Herald added a subscriber: arichardson. · View Herald TranscriptApr 12 2022, 5:14 AM

Harbormaster completed remote builds in B159207: Diff 422187.Apr 12 2022, 7:47 AM

In general I like this direction, but I'm somewhat concerned regarding ABI breaks.

libcxx/include/string
681	Do we have a test that validates the size of this struct? On some platforms (IIRC AIX) the members `__cap_` and `__is_long_` won't be stored in one `size_type` since they have different types.
681	Did you verify this isn't an ABI break when libc++14 and libc++15 compiled code is mixed? We should make sure that doesn't break an big and little endian.

Mordante added inline comments.Apr 12 2022, 8:45 AM

libcxx/include/string
696	Since you now understand how this works, it would be nice to add some documentation.
703	Is this a copy paste?
1486	Mainly curious, but why not a `_LIBCPP_ASSERT`?
libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Does this also break the LLDB pretty printer?

philnik marked 2 inline comments as done.Apr 12 2022, 9:10 AM

philnik added inline comments.

libcxx/include/string
681	https://godbolt.org/z/axTMxqaj3 is the code I used to verify that everything is OK. I think this should be good enough to say that they are equivalent. Do you know if there is any documentation about the bit field layout on different platforms?
703	No, depending on the string layout the factors have to be reversed. It's essentially: If you use the LSB for `is_long` you have to multiply by two, otherwise keep the same number.
1486	This ensures optimal code generation. GCC doesn't have `__builtin_assume`. This is a workaround.
libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Probably. Would be nice to have a test runner for that.

Mordante added inline comments.Apr 12 2022, 9:44 AM

libcxx/include/string
681	Good to know. BTW it's not AIX but win32 where the different types are an issue. (AIX had something else with bitfields) https://godbolt.org/z/n3zr5v4sr The different types make the struct 32 bytes on Windows. I don't know where the information regarding the bit field layout can be found. (I guess Google or the LLVM sources can help out.)
703	I see, I missed the `#else` on line 701.
1486	From a maintainability PoV I like `_LIBCPP_ASSERT` better. Would it be an option to adjust `_LIBCPP_ASSERT?` the "generate" this code when using GCC?
libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	I already planned to look into that, D97044#3440904 ;-)

ldionne added inline comments.Apr 12 2022, 11:15 AM

libcxx/include/string
696	Indeed -- it would be awesome if you could add some documentation explaining how these representations work, and that may also help in catching potential issues with this patch.

philnik marked 7 inline comments as done.Apr 12 2022, 2:06 PM

philnik added inline comments.

libcxx/include/string
1486	I agree that `_LIBCPP_ASSERT` would be nicer to read. The problem is that using this trick for `_LIBCPP_ASSERT` could lead to extra code generation in more complex scenarios. We would have to introduce something else like `_LIBCPP_ASSUME`, which should only be used for trivial cases. But for two instances I don't think this is justified.
libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Do you know where I would have to look to know what the LLDB pretty printers do?

Address comments; Try to fix CI

Harbormaster completed remote builds in B159322: Diff 422330.Apr 12 2022, 3:40 PM

I really like this cleanup, I think it makes it much easier to understand what's going on. I do have some comments, but this is looking pretty good.

libcxx/include/string
695–700	IMO this adds a bit of clarity.
1485	I think I understand that you based the bound on the number representable by the `__size_` bitfield. We could also establish it based on the maximum capacity of the small string buffer by using `__s >= __min_cap` instead. (`>=` because of the null terminator).
1486	Can you try again with `_LIBCPP_ASSERT` by making `__libcpp_assertion_handler` `noreturn`? This is how GCC implements their own `assert(...)`, so presumably there might be some optimization opportunities for them there. I think this would be much better than using `unreachable()`. Edit: Yuck, this indeed won't help when assertions are disabled on GCC. However, looking at the codegen, it appears that the only difference is that the compiler adds a single instruction to ensure that `__s <= 127`. Unless you can show us a case where there is a significant difference, I think the way to go is to use `_LIBCPP_ASSERT` and for GCC to be able to pick it up.
1493–1494	Same comment about using `_LIBCPP_ASSERT` -- unless we have a very convincing reason not to use it, let's use it.
3215	After discussing it, I think we agree this is a bit easier to digest.
3219	I see the CI is failing on backdeployment macOS arm64. You just explained to me that this was caused by a bug where we used to return an incorrect `max_size()` in Big Endian -- the test would expect that we throw `length_error`, but we don't and instead we try allocating, which leads to `bad_alloc` instead. This patch fixes that, but since `max_size()` has been inlined into the dylib, the test will fail in backdeployment. Since it seems that we are fixing a bug as part of this patch, let's add a regression test for it. We can then mark that test (and the failing `over_max_size.pass.cpp`) as `// XFAIL: use_system_cxx_lib && target=arm64{{.+}}-apple-macosx{{11.0\|12.0}}`, with an appropriate comment.
libcxx/utils/gdb/libcxx/printers.py
1 ↗	(On Diff #422330)	@saugustine The pretty-printer tests have been disabled on recent Clang versions since D118067. It would be nice to fix them.

This revision now requires changes to proceed.Apr 13 2022, 9:35 AM

philnik marked 7 inline comments as done.Apr 13 2022, 1:52 PM

philnik added inline comments.

libcxx/include/string
3219	I don't think we have to mark the new test as `XFAIL` because it doesn't use the dylib version.

Address comments

philnik added a child revision: D110598: [libc++] Implement P0980R1 (constexpr std::string).Apr 13 2022, 2:13 PM

philnik mentioned this in D110598: [libc++] Implement P0980R1 (constexpr std::string).Apr 13 2022, 2:14 PM

Use the correct max_size

LGTM with fixed test and passing CI.

Note that this is really tricky. The code looks correct upon inspection, and I think we should catch pretty much any issue via CI. However, if weird stuff starts happening with std::string in the upcoming few weeks (when I'm OOO), this should be the number one suspect, and it should be considered fine to revert this in case we suspect std::string is broken.

libcxx/test/libcxx/strings/basic.string/string.capacity/max_size.pass.cpp
10 ↗	(On Diff #422651)	Please add a short comment explaining what you're testing in this test. We already have a `max_size.pass.cpp` file in `test/std`, so how is this one different? The comment should address that.
25–27 ↗	(On Diff #422651)	Instead of those `#ifdefs`, I think we should instead be checking for specific platforms/architectures. Otherwise, this test informs us less about the behavior that we expect on specific platforms, and more about what the library itself does. So for example, if we changed the library in a way that broke behavior on a given platform, this test could keep passing as long as we were consistent w.r.t. `_LIBCPP_BIG_ENDIAN` and `_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT`. So instead, I suggest going for something like #if defined(__APPLE__) && defined(__aarch64__) // test behavior for alternate string layout #elif ... // test behavior for normal string layout #endif I don't think it should be too hard to come up with those `#ifdef`s to satisfy the CI.
libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp
11 ↗	(On Diff #422651)	Please add a comment explaining why you're adding this one. Something like // Prior to http://llvm.org/D123580, there was a bug with how the max_size() // was calculated. That was inlined into some functions in the dylib, which leads // to failures when running this test against an older system dylib. // XFAIL: use_system_cxx_lib && target=arm64{{.+}}-apple-macosx{{11.0\|12.0}} I don't know why the above `XFAIL` line doesn't have a comment, but one should have been added when we `XFAIL`ed it. Of course you don't need to add one, but that's the spirit.

This revision is now accepted and ready to land.Apr 13 2022, 3:15 PM

Harbormaster completed remote builds in B159550: Diff 422651.Apr 13 2022, 4:22 PM

LGTM, with some minor remarks. I think it would be good to wait for feedback from the LLDB devs to validate whether this breaks the data formatters.

libcxx/include/string
707	Thanks for the description!
1485
libcxx/test/libcxx/strings/basic.string/string.capacity/max_size.pass.cpp
20 ↗	(On Diff #422651)	For completeness please add tests for `std::wstring` and `std::u8string`, both properly guarded. Note that the `sizeof(wchar_t)` can be 16 or 32-bit.
libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Unfortunately no. @jingham seems to be the Data formatter code owner.

jgorbe added a subscriber: jgorbe.Apr 14 2022, 1:59 PM

jgorbe added inline comments.Apr 14 2022, 2:01 PM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	There was a recent lldb change fixing prettyprinters after a similar change to string: https://github.com/llvm/llvm-project/commit/45428412fd7c9900d3d6ac9803aa7dcf6adfa6fe If the gdb prettyprinter needed fixing for this change, chances are that lldb will need a similar update too.

Address comments

Herald added a subscriber: krytarowski. · View Herald TranscriptApr 16 2022, 1:30 PM

philnik added a reviewer: Restricted Project.Apr 16 2022, 1:30 PM

philnik added inline comments.

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Could someone from #lldb help me figure out what to change in the pretty printers? I looked at the file, but I don't really understand how it works and TBH I don't really feel like spending a lot of time figuring it out. If nobody says anything I'll land this in a week. As a side note: it would be really nice if there were a few more comments inside `LibCxx.cpp` to explain what happens there. That would make fixing the pretty printer a lot easier. The code is probably not very hard (at least it doesn't look like it), but I am looking for a few things that I can't find and I have no idea what some of the things mean.

philnik added a project: Restricted Project.Apr 16 2022, 1:31 PM

Herald added a subscriber: JDevlieghere. · View Herald TranscriptApr 16 2022, 1:31 PM

Harbormaster completed remote builds in B159949: Diff 423260.Apr 16 2022, 2:57 PM

Fix CI

Harbormaster completed remote builds in B159953: Diff 423266.Apr 16 2022, 6:40 PM

Add Windows

Harbormaster completed remote builds in B159994: Diff 423316.Apr 17 2022, 3:58 PM

dblaikie added a subscriber: dblaikie.Apr 18 2022, 10:21 AM

dblaikie added inline comments.

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Looks like something around https://github.com/llvm/llvm-project/blob/2e6ac54cf48aa04f7b05c382c33135b16d3f01ea/lldb/source/Plugins/Language/CPlusPlus/LibCxx.cpp#L597 (& the similar masking in the `else` block a few lines down) - I guess a specific lookup for the new field would be needed, rather than the bitmasking.

philnik added inline comments.Apr 19 2022, 3:45 AM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Yes, but what do the numbers in `size_mode_locations` mean? Why is there no checking if it's big or little endian? Is it unsupported maybe? Does it work because of something else? Is there a reason that `g_data_name` exists instead of comparing directly? Should I add another layout style or should I just update the code for the new layout? I don't know anything about the LLDB codebase, so I don't understand the code and I don't know how I should change it.

Try to fix Windows

labath added a subscriber: labath.Apr 20 2022, 3:49 AM

labath added inline comments.

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	I don't think there's been any official policy decision either way, but historically we haven't been asking libc++ authors to update lldb pretty printers -- we would just fix them up on the lldb side when we noticed the change. The thing that has changed recently is that google started relying (and testing) more on lldb, which considerably shortened the time it takes to notice this change, and also makes it difficult for some people to make progress while we are in this state. But I don't think that means that updating the pretty printer is suddenly your responsibility. As for your questions, I'll try to answer them as best as I can: what do the numbers in size_mode_locations mean? These are the indexes of fields in the string object. For some reason (unknown to me), the pretty printer uses indexes rather than field names for its work. Prompted by the previous patch, I've been trying to change that, but I haven't done it yet, as I was trying to improve the testing story (more on that later). Why is there no checking if it's big or little endian? Is it unsupported maybe? Most likely yes. Although most parts of lldb support big endian, I am not aware of anyone testing it on a regular basis, so it's quite likely that a lot of things are in fact broken. Is there a reason that g_data_name exists instead of comparing directly? LLDB uses a global string pool, so this is an attempt to reduce the number of string pool queries. The pattern is not consistently used everywhere, and overall, I wouldn't be too worried about it. Should I add another layout style or should I just update the code for the new layout? As the pretty printers ship with lldb, they are expected to support not just the current format, but also the past ones (within reason). This is what makes adding a new format (or just refactoring the existing code) difficult, and it's why I was trying to come up with better tests for this (it remains to be seen if I am successful). Anyway, I think I should be able to make that pretty printer work with this patch. I should have something today or tomorrow, if you're ok with waiting that long.

Harbormaster completed remote builds in B160405: Diff 423841.Apr 20 2022, 3:56 AM

philnik added inline comments.Apr 20 2022, 9:11 AM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Thanks for the answers! I think that it wouldn't be that hard for us to update the pretty printers if we have some test coverage and documentation for it. For now, is there any person/group we should ping if we suspect that we break the pretty printers? I'll wait a few days. It's not that important to land this patch soon.

labath mentioned this in rG1056c56786c1: [lldb] Adjust libc++ string formatter for changes in D123580.Apr 21 2022, 5:18 AM

This revision was landed with ongoing or failed builds.Apr 21 2022, 5:20 AM

Closed by commit rG29c8c070a177: [libc++] Use bit field for checking if string is in long or short mode (authored by philnik). · Explain Why

This revision was automatically updated to reflect the committed changes.

philnik added a commit: rG29c8c070a177: [libc++] Use bit field for checking if string is in long or short mode.

ldionne added inline comments.Apr 25 2022, 9:31 AM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	The situation with pretty printers has been a source of frustration for the 4 years I've worked on libc++. I have been reaching out to various LLDB folks to get help setting up pre-commit CI for the LLDB pretty-printers in libc++'s own pipeline so that we can detect breakages in advance, but this has not been conclusive so far. @labath @jgorbe Would you be willing to help us set up a CI job that runs the LLDB pretty printers (and only that) in our pre-commit CI infrastructure? We have the machines and all the infrastructure in place. We just need the right CMake + `lit` invocations. Also CC @Mordante , since he had been investigating that IIRC. If we could notice breakages in advance, we could fix the pretty printers in the same patch where we make a change to libc++. We could call out for help from LLDB folks when needed. This would be a much smoother experience for everyone -- we would not need to revert our patches ever, and the LLDB folks would not be broken by changes that come out of the blue (as far as they are concerned).

Mordante added inline comments.Apr 26 2022, 8:40 AM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	I've indeed been working on that, but not managed yet. I can build lldb, but the dataformatter tests return `UNSUPPORTED`. I haven't had time to investigate this further. I hope to find some time soon. But if @labath or @jgorbe have hints how to do this I would be interested to know. Alternatively what's the best way to contact you Discourse or Discord?

labath added inline comments.Apr 28 2022, 6:09 AM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	I'm sorry about the delay. The lldb-libc++ integration is broken in several ways (different ways on different platforms), so I'm not all that surprised that it's not working for you. I don't really consider the data formatters my responsibility so I only go near them when I really have to. Still, I agree that this is not a good situation to be in, and the offer of offloading the data formatter work to the libc++ team is definitely appealing. So, I'll try to find some time to make that happen. I'm sorry I can't promise anything specific.

Mordante added inline comments.Apr 28 2022, 8:08 AM

libcxx/utils/gdb/libcxx/printers.py
192 ↗	(On Diff #422187)	Thanks for the information. The libc++ CI uses a Ubuntu 20.04 Docker image on amd64.

hubert.reinterpretcast mentioned this in D125496: [libc++] Simplify the string structures a bit more.Jun 11 2022, 11:23 AM

xingxue mentioned this in D128285: [libc++][AIX] Make basic_string layout compatible with earlier version.Jun 21 2022, 9:00 AM

xingxue mentioned this in rG60f7bdfd0317: [libc++][AIX] Make basic_string layout compatible with earlier version.Jun 24 2022, 2:26 PM

mib mentioned this in D128694: [lldb/Dataformatters] Adapt C++ std::string dataformatter for D128285.Jun 27 2022, 6:02 PM

Revision Contents

Path

Size

libcxx/

include/

string

113 lines

Diff 422154

libcxx/include/string

Show First 20 Lines • Show All 526 Lines • ▼ Show 20 Lines

#include <__debug> #include <__debug>

#include <__format/enable_insertable.h> #include <__format/enable_insertable.h>

#include <__ios/fpos.h> #include <__ios/fpos.h>

#include <__iterator/wrap_iter.h> #include <__iterator/wrap_iter.h>

#include <__memory/allocate_at_least.h> #include <__memory/allocate_at_least.h>

#include <__utility/auto_cast.h> #include <__utility/auto_cast.h>

#include <__utility/move.h> #include <__utility/move.h>

#include <__utility/swap.h> #include <__utility/swap.h>

#include <__utility/unreachable.h>

#include <climits>

#include <compare> #include <compare>

#include <cstdio> // EOF #include <cstdio> // EOF

#include <cstdlib> #include <cstdlib>

#include <cstring> #include <cstring>

#include <initializer_list> #include <initializer_list>

#include <iosfwd> #include <iosfwd>

#include <iterator> #include <iterator>

#include <limits>

#include <memory> #include <memory>

#include <stdexcept> #include <stdexcept>

#include <string_view> #include <string_view>

#include <type_traits> #include <type_traits>

#include <version> #include <version>

// TODO: remove these headers // TODO: remove these headers

#include <__functional/binary_function.h> #include <__functional/binary_function.h>

▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines

private: private:

#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT #ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

struct __long struct __long

{ {

pointer __data_; pointer __data_;

size_type __size_; size_type __size_;

size_type __cap_; size_type __cap_ : sizeof(size_type) * CHAR_BIT - 1;

bool __is_long_ : 1;

MordanteUnsubmitted

Done

size_type __cap_ : sizeof(size_type) * CHAR_BIT - 1;

- bool __is_long_ : 1;

+ size_type __is_long_ : 1;

};

enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?

Do we have a test that validates the size of this struct?

On some platforms (IIRC AIX) the members __cap_ and __is_long_ won't be stored in one size_type since they have different types.

Mordante: Do we have a test that validates the size of this struct? On some platforms (IIRC AIX) the…

MordanteUnsubmitted

Done

Did you verify this isn't an ABI break when libc++14 and libc++15 compiled code is mixed?
We should make sure that doesn't break an big and little endian.

Mordante: Did you verify this isn't an ABI break when libc++14 and libc++15 compiled code is mixed? We…

philnikAuthorUnsubmitted

Done

https://godbolt.org/z/axTMxqaj3 is the code I used to verify that everything is OK. I think this should be good enough to say that they are equivalent. Do you know if there is any documentation about the bit field layout on different platforms?

philnik: https://godbolt.org/z/axTMxqaj3 is the code I used to verify that everything is OK. I think…

MordanteUnsubmitted

Done

Good to know.
BTW it's not AIX but win32 where the different types are an issue. (AIX had something else with bitfields)
https://godbolt.org/z/n3zr5v4sr The different types make the struct 32 bytes on Windows.

I don't know where the information regarding the bit field layout can be found.
(I guess Google or the LLVM sources can help out.)

Mordante: Good to know. BTW it's not AIX but win32 where the different types are an issue. (AIX had…

}; };

#ifdef _LIBCPP_BIG_ENDIAN

static const size_type __short_mask = 0x01;

static const size_type __long_mask = 0x1ul;

#else // _LIBCPP_BIG_ENDIAN

static const size_type __short_mask = 0x80;

static const size_type __long_mask = ~(size_type(~0) >> 1);

#endif // _LIBCPP_BIG_ENDIAN

enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ? enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?

(sizeof(__long) - 1)/sizeof(value_type) : 2}; (sizeof(__long) - 1)/sizeof(value_type) : 2};

struct __short struct __short

{ {

value_type __data_[__min_cap]; value_type __data_[__min_cap];

unsigned char __padding[sizeof(value_type) - 1]; unsigned char __padding[sizeof(value_type) - 1];

unsigned char __size_; unsigned char __size_ : 7;

unsigned char __is_long_ : 1;

}; };

#ifdef _LIBCPP_BIG_ENDIAN

static const size_type __endian_factor = 2;

MordanteUnsubmitted

Done

Since you now understand how this works, it would be nice to add some documentation.

Mordante: Since you now understand how this works, it would be nice to add some documentation.

ldionneUnsubmitted

Done

Indeed -- it would be awesome if you could add some documentation explaining how these representations work, and that may also help in catching potential issues with this patch.

ldionne: Indeed -- it would be awesome if you could add some documentation explaining how these…

#else

static const size_type __endian_factor = 1;

#endif

ldionneUnsubmitted

Done

unsigned char __is_long_ : 1;

};

- // The __endian_factor is required, because the size_type has now one bit less space.

- // If the LSB is used in short string mode we have to multiply the size by two when the it is stored and

- // divide by two when it is loaded. In long string mode we ignore it,

- // because we can assume that we always allocate an even amount of value_types.

- // If the MSB is used the max_size() is numeric_limits<size_type>::max() / 2.

- // If the factor is 2 the LSB is used and if it is 1 the MSB is used.

+ // The __endian_factor is required because the field we use to store the size

+ // (either size_type or unsigned char depending on long/short) has one fewer

+ // bit than it would if it were not a bitfield.

+ //

+ // If the LSB is used to store the short-flag in the short string representation,

+ // we have to multiply the size by two when it is stored and divide it by two when

+ // it is loaded to make sure that we always store an even number. In the long string

+ // representation, we can ignore this because we can assume that we always allocate

+ // an even amount of value_types.

+ //

+ // If the MSB is used for the short-flag, the max_size() is numeric_limits<size_type>::max() / 2.

+ // This doesn't impact the short string representation, since we never need the MSB

+ // for representing the size of a short string anyway.

#ifdef _LIBCPP_BIG_ENDIAN

IMO this adds a bit of clarity.

ldionne: IMO this adds a bit of clarity.

#else #else

#ifdef _LIBCPP_BIG_ENDIAN

MordanteUnsubmitted

Done

Is this a copy paste?

Mordante: Is this a copy paste?

philnikAuthorUnsubmitted

Done

No, depending on the string layout the factors have to be reversed. It's essentially: If you use the LSB for is_long you have to multiply by two, otherwise keep the same number.

philnik: No, depending on the string layout the factors have to be reversed. It's essentially: If you…

MordanteUnsubmitted

Done

I see, I missed the #else on line 701.

Mordante: I see, I missed the `#else` on line 701.

static const size_type __endian_factor = 1;

#else

static const size_type __endian_factor = 2;

#endif

MordanteUnsubmitted

Done

Thanks for the description!

Mordante: Thanks for the description!

struct __long struct __long

{ {

size_type __cap_; bool __is_long_ : 1;

size_type __cap_ : sizeof(size_type) * CHAR_BIT - 1;

size_type __size_; size_type __size_;

pointer __data_; pointer __data_;

}; };

#ifdef _LIBCPP_BIG_ENDIAN

static const size_type __short_mask = 0x80;

static const size_type __long_mask = ~(size_type(~0) >> 1);

#else // _LIBCPP_BIG_ENDIAN

static const size_type __short_mask = 0x01;

static const size_type __long_mask = 0x1ul;

#endif // _LIBCPP_BIG_ENDIAN

enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ? enum {__min_cap = (sizeof(__long) - 1)/sizeof(value_type) > 2 ?

(sizeof(__long) - 1)/sizeof(value_type) : 2}; (sizeof(__long) - 1)/sizeof(value_type) : 2};

struct __short struct __short

{ {

union union

{ {

unsigned char __size_; struct {

bool __is_long_ : 1;

unsigned char __size_ : 7;

};

value_type __lx; value_type __lx;

}; };

value_type __data_[__min_cap]; value_type __data_[__min_cap];

}; };

#endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT #endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

union __ulx{__long __lx; __short __lxx;}; union __ulx{__long __lx; __short __lxx;};

▲ Show 20 Lines • Show All 690 Lines • ▼ Show 20 Lines #endif

_LIBCPP_INLINE_VISIBILITY bool __invariants() const; _LIBCPP_INLINE_VISIBILITY bool __invariants() const;

_LIBCPP_INLINE_VISIBILITY void __clear_and_shrink() _NOEXCEPT; _LIBCPP_INLINE_VISIBILITY void __clear_and_shrink() _NOEXCEPT;

_LIBCPP_INLINE_VISIBILITY void __shrink_or_extend(size_type __target_capacity); _LIBCPP_INLINE_VISIBILITY void __shrink_or_extend(size_type __target_capacity);

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

bool __is_long() const _NOEXCEPT bool __is_long() const _NOEXCEPT {

{return bool(__r_.first().__s.__size_ & __short_mask);} return __r_.first().__s.__is_long_;

}

#if _LIBCPP_DEBUG_LEVEL == 2 #if _LIBCPP_DEBUG_LEVEL == 2

bool __dereferenceable(const const_iterator* __i) const; bool __dereferenceable(const const_iterator* __i) const;

bool __decrementable(const const_iterator* __i) const; bool __decrementable(const const_iterator* __i) const;

bool __addable(const const_iterator* __i, ptrdiff_t __n) const; bool __addable(const const_iterator* __i, ptrdiff_t __n) const;

bool __subscriptable(const const_iterator* __i, ptrdiff_t __n) const; bool __subscriptable(const const_iterator* __i, ptrdiff_t __n) const;

Show All 30 Lines iterator __insert_from_safe_copy(size_type __n, size_type __ip, _ForwardIterator __first, _ForwardIterator __last) {

traits_type::assign(*__p, *__first); traits_type::assign(*__p, *__first);

return begin() + __ip; return begin() + __ip;

} }

_LIBCPP_HIDE_FROM_ABI allocator_type& __alloc() _NOEXCEPT { return __r_.second(); } _LIBCPP_HIDE_FROM_ABI allocator_type& __alloc() _NOEXCEPT { return __r_.second(); }

_LIBCPP_HIDE_FROM_ABI const allocator_type& __alloc() const _NOEXCEPT { return __r_.second(); } _LIBCPP_HIDE_FROM_ABI const allocator_type& __alloc() const _NOEXCEPT { return __r_.second(); }

#ifdef _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

void __set_short_size(size_type __s) _NOEXCEPT void __set_short_size(size_type __s) _NOEXCEPT {

# ifdef _LIBCPP_BIG_ENDIAN if (__s > 127)

ldionneUnsubmitted

Done

I think I understand that you based the bound on the number representable by the __size_ bitfield. We could also establish it based on the maximum capacity of the small string buffer by using __s >= __min_cap instead. (>= because of the null terminator).

ldionne: I think I understand that you based the bound on the number representable by the `__size_`…

MordanteUnsubmitted

Done

void __set_short_size(size_type __s) _NOEXCEPT {

- _LIBCPP_ASSERT(__s < __min_cap, "__s should never be greater or equal to the short string capacity");

+ _LIBCPP_ASSERT(__s < __min_cap, "__s should never be greater than or equal to the short string capacity");

__r_.first().__s.__size_ = __s;

Mordante:

{__r_.first().__s.__size_ = (unsigned char)(__s << 1);} __libcpp_unreachable();

MordanteUnsubmitted

Done

Mainly curious, but why not a _LIBCPP_ASSERT?

Mordante: Mainly curious, but why not a `_LIBCPP_ASSERT`?

philnikAuthorUnsubmitted

Done

This ensures optimal code generation. GCC doesn't have __builtin_assume. This is a workaround.

philnik: This ensures optimal code generation. GCC doesn't have `__builtin_assume`. This is a workaround.

MordanteUnsubmitted

Done

From a maintainability PoV I like _LIBCPP_ASSERT better. Would it be an option to adjust _LIBCPP_ASSERT? the "generate" this code when using GCC?

Mordante: From a maintainability PoV I like `_LIBCPP_ASSERT` better. Would it be an option to adjust…

philnikAuthorUnsubmitted

Done

I agree that _LIBCPP_ASSERT would be nicer to read. The problem is that using this trick for _LIBCPP_ASSERT could lead to extra code generation in more complex scenarios. We would have to introduce something else like _LIBCPP_ASSUME, which should only be used for trivial cases. But for two instances I don't think this is justified.

philnik: I agree that `_LIBCPP_ASSERT` would be nicer to read. The problem is that using this trick for…

ldionneUnsubmitted

Done

Can you try again with _LIBCPP_ASSERT by making __libcpp_assertion_handler noreturn? This is how GCC implements their own assert(...), so presumably there might be some optimization opportunities for them there. I think this would be much better than using unreachable().

Edit: Yuck, this indeed won't help when assertions are disabled on GCC. However, looking at the codegen, it appears that the only difference is that the compiler adds a single instruction to ensure that __s <= 127. Unless you can show us a case where there is a significant difference, I think the way to go is to use _LIBCPP_ASSERT and for GCC to be able to pick it up.

ldionne: Can you try again with `_LIBCPP_ASSERT` by making `__libcpp_assertion_handler` `noreturn`? This…

# else __r_.first().__s.__size_ = __s;

{__r_.first().__s.__size_ = (unsigned char)(__s);} __r_.first().__s.__is_long_ = false;

# endif }

_LIBCPP_INLINE_VISIBILITY

size_type __get_short_size() const _NOEXCEPT

# ifdef _LIBCPP_BIG_ENDIAN

{return __r_.first().__s.__size_ >> 1;}

# else

{return __r_.first().__s.__size_;}

# endif

#else // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

_LIBCPP_INLINE_VISIBILITY

void __set_short_size(size_type __s) _NOEXCEPT

# ifdef _LIBCPP_BIG_ENDIAN

{__r_.first().__s.__size_ = (unsigned char)(__s);}

# else

{__r_.first().__s.__size_ = (unsigned char)(__s << 1);}

# endif

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

size_type __get_short_size() const _NOEXCEPT size_type __get_short_size() const _NOEXCEPT {

# ifdef _LIBCPP_BIG_ENDIAN if (__r_.first().__s.__is_long_)

{return __r_.first().__s.__size_;} __libcpp_unreachable();

ldionneUnsubmitted

Done

Same comment about using _LIBCPP_ASSERT -- unless we have a very convincing reason not to use it, let's use it.

ldionne: Same comment about using `_LIBCPP_ASSERT` -- unless we have a very convincing reason not to use…

# else return __r_.first().__s.__size_;

{return __r_.first().__s.__size_ >> 1;} }

# endif

#endif // _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

void __set_long_size(size_type __s) _NOEXCEPT void __set_long_size(size_type __s) _NOEXCEPT

{__r_.first().__l.__size_ = __s;} {__r_.first().__l.__size_ = __s;}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

size_type __get_long_size() const _NOEXCEPT size_type __get_long_size() const _NOEXCEPT

{return __r_.first().__l.__size_;} {return __r_.first().__l.__size_;}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

void __set_size(size_type __s) _NOEXCEPT void __set_size(size_type __s) _NOEXCEPT

{if (__is_long()) __set_long_size(__s); else __set_short_size(__s);} {if (__is_long()) __set_long_size(__s); else __set_short_size(__s);}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

void __set_long_cap(size_type __s) _NOEXCEPT void __set_long_cap(size_type __s) _NOEXCEPT {

{__r_.first().__l.__cap_ = __long_mask | __s;} __r_.first().__l.__cap_ = __s / __endian_factor;

__r_.first().__l.__is_long_ = true;

}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

size_type __get_long_cap() const _NOEXCEPT size_type __get_long_cap() const _NOEXCEPT {

{return __r_.first().__l.__cap_ & size_type(~__long_mask);} return __r_.first().__l.__cap_ * __endian_factor;

}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

void __set_long_pointer(pointer __p) _NOEXCEPT void __set_long_pointer(pointer __p) _NOEXCEPT

{__r_.first().__l.__data_ = __p;} {__r_.first().__l.__data_ = __p;}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

pointer __get_long_pointer() _NOEXCEPT pointer __get_long_pointer() _NOEXCEPT

{return __r_.first().__l.__data_;} {return __r_.first().__l.__data_;}

_LIBCPP_INLINE_VISIBILITY _LIBCPP_INLINE_VISIBILITY

▲ Show 20 Lines • Show All 1,681 Lines • ▼ Show 20 Lines

} }

template <class _CharT, class _Traits, class _Allocator> template <class _CharT, class _Traits, class _Allocator>

inline inline

typename basic_string<_CharT, _Traits, _Allocator>::size_type typename basic_string<_CharT, _Traits, _Allocator>::size_type

basic_string<_CharT, _Traits, _Allocator>::max_size() const _NOEXCEPT basic_string<_CharT, _Traits, _Allocator>::max_size() const _NOEXCEPT

{ {

size_type __m = __alloc_traits::max_size(__alloc()); size_type __m = __alloc_traits::max_size(__alloc());

#ifdef _LIBCPP_BIG_ENDIAN return ((__m <= std::numeric_limits<size_type>::max() / 2) ? __m : __m / __endian_factor) - __alignment;

ldionneUnsubmitted

Done

size_type __m = __alloc_traits::max_size(__alloc());

- return ((__m <= std::numeric_limits<size_type>::max() / 2) ? __m : __m / (2 / __endian_factor)) - __alignment;

+ if (__m <= std::numeric_limits<size_type>::max() / 2) {

+ return __m - __alignment;

+ } else {

+ bool __uses_lsb = __endian_factor == 2;

+ return __uses_lsb ? __m - __alignment : (__m / 2) - __alignment;

+ }

}

template <class _CharT, class _Traits, class _Allocator>

After discussing it, I think we agree this is a bit easier to digest.

ldionne: After discussing it, I think we agree this is a bit easier to digest.

return (__m <= ~__long_mask ? __m : __m/2) - __alignment;

#else

return __m - __alignment;

#endif

} }

template <class _CharT, class _Traits, class _Allocator> template <class _CharT, class _Traits, class _Allocator>

void void

ldionneUnsubmitted

Done

I see the CI is failing on backdeployment macOS arm64. You just explained to me that this was caused by a bug where we used to return an incorrect max_size() in Big Endian -- the test would expect that we throw length_error, but we don't and instead we try allocating, which leads to bad_alloc instead. This patch fixes that, but since max_size() has been inlined into the dylib, the test will fail in backdeployment.

Since it seems that we are fixing a bug as part of this patch, let's add a regression test for it. We can then mark that test (and the failing over_max_size.pass.cpp) as // XFAIL: use_system_cxx_lib && target=arm64{{.+}}-apple-macosx{{11.0|12.0}}, with an appropriate comment.

ldionne: I see the CI is failing on backdeployment macOS arm64. You just explained to me that this was…

philnikAuthorUnsubmitted

Done

I don't think we have to mark the new test as XFAIL because it doesn't use the dylib version.

philnik: I don't think we have to mark the new test as `XFAIL` because it doesn't use the dylib version.

basic_string<_CharT, _Traits, _Allocator>::reserve(size_type __requested_capacity) basic_string<_CharT, _Traits, _Allocator>::reserve(size_type __requested_capacity)

{ {

if (__requested_capacity > max_size()) if (__requested_capacity > max_size())

__throw_length_error(); __throw_length_error();

// Make sure reserve(n) never shrinks. This is technically only required in C++20 // Make sure reserve(n) never shrinks. This is technically only required in C++20

// and later (since P0966R1), however we provide consistent behavior in all Standard // and later (since P0966R1), however we provide consistent behavior in all Standard

// modes because this function is instantiated in the shared library. // modes because this function is instantiated in the shared library.

▲ Show 20 Lines • Show All 1,297 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Use bit field for checking if string is in long or short modeClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 422154

libcxx/include/string

[libc++] Use bit field for checking if string is in long or short mode
ClosedPublic