Page MenuHomePhabricator

[libc++] Remove std::basic_string's base class in ABIv2

Authored by philnik on Dec 28 2021, 8:34 AM.



Remove std::basic_string's base class in ABI version 2

Diff Detail

Event Timeline

philnik requested review of this revision.Dec 28 2021, 8:34 AM
philnik created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 28 2021, 8:34 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Interesting, I thought this looked somewhat familiar and it's indeed something that I expected to be done in D111173.

I like to hear @ldionne's opinion why it wasn't done in D111173


This looks more consistent with the other ABI macros.


This should also be guarded by the ABI macro.

Quuxplusone requested changes to this revision.Dec 28 2021, 10:17 AM
Quuxplusone added inline comments.

I agree that the name shouldn't involve REMOVE; it should describe the new state of affairs, not just a diff against the obsolete state of affairs. FWLIW, I'd have said _LIBCPP_ABI_NO_BASIC_STRING_BASE_CLASSNO_STRING_BASE just seems too short/vague. But I acknowledge I'm suffering from a little bit of "obscure things should have obscure scary names" (the same reason we've written template<class ...> in front of templates for 30 years, you know), and _LIBCPP_ABI_NO_STRING_BASE is consistent with _LIBCPP_ABI_NO_BINDER_BASES and _LIBCPP_ABI_NO_ITERATOR_BASES, both of which I myself introduced.

Also, I notice that _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT already exists. I suspect we cannot combine these two ABI flags into one, because that would mess with people who are (unsupportedly) a-la-carte'ing that flag today. But I'd like to hear @ldionne's take on that idea anyway.

For reference, right now our two string-related ABI flags are _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT and _LIBCPP_ABI_STRING_OPTIMIZED_EXTERNAL_INSTANTIATION. Yet another option for this one would be _LIBCPP_ABI_STRING_NO_BASE_CLASS.

TLDR: I'm reasonably happy with @Mordante's suggested edit. ;)


(1) My understanding is that the _LIBCPP_NO_EXCEPTIONS branch needs to remain. Someone who compiles with -D_LIBCPP_NO_EXCEPTIONS must not see any calls to __throw_out_of_range popping up at link time.

(2) I have the (more nebulous) impression that we call a dedicated library function __basic_string_common<true>::__throw_out_of_range() instead of __throw_out_of_range("basic_string") on purpose, because std::string is used so heavily, and we don't want the user to pay for the 13-byte string literal "basic_string" in every single one of their TUs. (This rationale does not apply in lesser-used places such as deque.) However, off the top of my head I don't know that we have ever benchmarked this or that anyone really cares.

This revision now requires changes to proceed.Dec 28 2021, 10:17 AM
Mordante added inline comments.Dec 28 2021, 10:45 AM

(1) No that branch can be removed. I had the same initial feeling but include/stdexcept has the branch.

(2) That may be an issue indeed. I wonder whether the linker is/can be smart enough to remove the duplicates.

philnik added inline comments.Jan 1 2022, 4:07 PM

I ran a few tests now. My general conclusion would be to put the throws directly into the header. The TU size doesn't really matter I think, because even changing the build location can have a bigger impact on the build directory size. (At least with CMake)

I used export LINK_FLAGS="-nostdlib++ -L$LLVM/default/build/lib -Wl,-rpath,$LLVM/default/build/lib -lc++"; CC=clang CXX=clang++ cmake -G Ninja -DLLVM_ENABLE_PROJECTS="llvm;clang" -DCMAKE_BUILD_TYPE=Release ../../../llvm/ -DCMAKE_CXX_FLAGS="-nostdinc++ -nostdlib++ -isystem $LLVM/default/build/include/c++/v1" -DCMAKE_EXE_LINKER_FLAGS="$LINK_FLAGS" -DCMAKE_SHARED_LINKER_FLAGS="$LINK_FLAGS" -DLLVM_USE_LLD=On -DLLVM_ENABLE_EH=On -DLLVM_ENABLE_RTTI=On (-DLLVM_ENABLE_LTO=Thin) to generate the build.
I got these results:

throw in headerThinLTOclang-14 binary size (bytes)

I guess that especially with ThinLTO the functions can now be inlined or eliminated.
I also timed the first tests but there was no significant difference between the two.

philnik updated this revision to Diff 396921.Jan 2 2022, 4:20 AM
philnik marked 3 inline comments as done.
  • Renamed macro to _LIBCPP_ABI_NO_STRING_BASE
  • Guard __basic_string_common

All of my comments have been addressed. I think the only remaining question is "do we want to do this," which is above my pay grade. :) @ldionne ping?

ldionne requested changes to this revision.Jan 11 2022, 9:05 AM

I like to hear @ldionne's opinion why it wasn't done in D111173

Probably oversight. In D111173, I was solving a specific problem I saw internally, and I had no reason to expand the scope of the patch to removing the base class in the unstable ABI.

I am favourable to this change, but we'll have to tackle my comment (or show me I'm wrong).

Also, perhaps we should have a similar ABI flag for std::vector, which does the same thing?


Also, I notice that _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT already exists. I suspect we cannot combine these two ABI flags into one, because that would mess with people who are (unsupportedly) a-la-carte'ing that flag today. But I'd like to hear @ldionne's take on that idea anyway.

Indeed, ABI flags are meant to be a-la-carte'ed. For example, when we shipped libc++ on ARM64, we enabled the new string layout (see __config):

#if defined(__APPLE__) && !defined(__i386__) && !defined(__x86_64__) &&       \
    (!defined(__arm__) || __ARM_ARCH_7K__ >= 2)

So individual ABI flags need to stay stable once they are introduced.

Regarding the name, I much prefer _LIBCPP_ABI_NO_BASIC_STRING_BASE_CLASS too. It more clearly explains what it does.


If you read the description in D111173, I explain that one of the ideas behind hiding the exception-throwing-business behind a function call is this:

Encapsulate the code to throw an exception (which is non-trivial) in an externally-defined function so that the important code paths that call it (e.g. vector::at) are free from that code. Basically, the intent is for the "hot" code path to contain a single conditional jump (based on checking the error condition) to an externally-defined function, which handles all the exception-throwing business.

If you look at the code generation for std::vector::at before and after your change, you should see that it's more complicated after the change, because there's code to throw the exception. Before, it was a conditional jump to the function defined in the dylib, which then contained that business. I remember looking at the codegen and coming to that conclusion.

This patch shouldn't regress that.

This revision now requires changes to proceed.Jan 11 2022, 9:05 AM
philnik added inline comments.Jan 11 2022, 3:22 PM

I don't know what you have looked at exactly, but I can't reproduce your problem. It seems to be that clang actually goes out of its way to not include the call in the hot path:

With llvm-objdump --disassemble-symbols=_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE2atEm --x86-asm-syntax=intel | c++filt I get:	file format elf64-x86-64

Disassembly of section .text:

0000000000054ec0 <std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::at(unsigned long)>:
   54ec0: 50                           	push	rax
   54ec1: 0f b6 07                     	movzx	eax, byte ptr [rdi]
   54ec4: a8 01                        	test	al, 1
   54ec6: 74 06                        	je	0x54ece <std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::at(unsigned long)+0xe>
   54ec8: 48 8b 4f 08                  	mov	rcx, qword ptr [rdi + 8]
   54ecc: eb 06                        	jmp	0x54ed4 <std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::at(unsigned long)+0x14>
   54ece: 48 89 c1                     	mov	rcx, rax
   54ed1: 48 d1 e9                     	shr	rcx
   54ed4: 48 39 f1                     	cmp	rcx, rsi
   54ed7: 76 1c                        	jbe	0x54ef5 <std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::at(unsigned long)+0x35>
   54ed9: a8 01                        	test	al, 1
   54edb: 74 0c                        	je	0x54ee9 <std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::at(unsigned long)+0x29>
   54edd: 48 8b 7f 10                  	mov	rdi, qword ptr [rdi + 16]
   54ee1: 48 01 f7                     	add	rdi, rsi
   54ee4: 48 89 f8                     	mov	rax, rdi
   54ee7: 59                           	pop	rcx
   54ee8: c3                           	ret
   54ee9: 48 83 c7 01                  	add	rdi, 1
   54eed: 48 01 f7                     	add	rdi, rsi
   54ef0: 48 89 f8                     	mov	rax, rdi
   54ef3: 59                           	pop	rcx
   54ef4: c3                           	ret
   54ef5: e8 f6 e3 ff ff               	call	0x532f0 <std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__throw_out_of_range() const>
   54efa: 66 0f 1f 44 00 00            	nop	word ptr [rax + rax]

Note the extra lines 54ef5 and 54efa. Clang went out of its way to even just have a jump (54ecc) in the hot path.

Can we please change the name to _LIBCPP_ABI_NO_BASIC_STRING_BASE_CLASS?


So I looked at this again, and indeed I am unable to reproduce what I was seeing, although I do clearly remember seeing *something*, otherwise it would have been a no-brainer to just scratch all the code and just call _VSTD::__throw_foo() back in D111173. But hey, I won't block this improvement if I can't find a downside!

Godbolt does show me they are the same:

philnik updated this revision to Diff 402590.Jan 24 2022, 10:47 AM
  • Rename macro
ldionne accepted this revision.Jan 24 2022, 10:53 AM
This revision is now accepted and ready to land.Jan 24 2022, 10:53 AM
This revision was automatically updated to reflect the committed changes.