This change optimizes the copy constructor using partial inlining.
- adds default_value_tag() to memory, to support default initialization
- inlines copy constructor: non-SSO init is delegated to the instantiated __init_long() method
Note that this change does not consider existing value initialization ctors; most would likely benefit from default initialization.
Generated code is small, i.e., considerably smaller than other hot inlined functions such as the move ctor.
given:
void StringCopyCtor(void* mem, const std::string& s) { std::string*p = new(mem) std::string{s}; }
asm:
    cmp     byte ptr [rsi + 23], 0
    js      .LBB0_2
    mov     rax, qword ptr [rsi + 16]
    mov     qword ptr [rdi + 16], rax
    movups  xmm0, xmmword ptr [rsi]
    movups  xmmword ptr [rdi], xmm0
    ret
.LBB0_2:
    jmp     std::basic_string::__init_long  # TAILCALL
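The shape of that code, an inline short-string check with a tail call for the long case, can be sketched with a toy type. This is only an illustration under assumptions: TinyString, kInlineCap and init_long() are hypothetical names, the layout is not libc++'s, and C++17 is assumed for std::string_view. In a single translation unit the compiler is still free to inline init_long(); the patch itself relies on the function being instantiated in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical toy string; NOT libc++'s real representation.
class TinyString {
 public:
  static constexpr std::size_t kInlineCap = 15;

  explicit TinyString(std::string_view s) {
    if (s.size() <= kInlineCap) {
      if (!s.empty()) std::memcpy(buf_, s.data(), s.size());
      size_ = s.size();  // buf_ is zero-initialized, so it stays terminated
    } else {
      init_long(s.data(), s.size());
    }
  }

  // Copy constructor: only the cheap short-string path lives in the header.
  TinyString(const TinyString& other) {
    if (other.heap_ == nullptr) {
      // Fixed-size copy of the inline buffer: no loop, no allocation.
      std::memcpy(buf_, other.buf_, sizeof(buf_));
      size_ = other.size_;
    } else {
      // Long string: hand off to the out-of-line slow path.
      init_long(other.heap_, other.size_);
    }
  }

  TinyString& operator=(const TinyString&) = delete;
  ~TinyString() { delete[] heap_; }

  bool is_long() const { return heap_ != nullptr; }
  std::string_view view() const { return {is_long() ? heap_ : buf_, size_}; }

 private:
  void init_long(const char* data, std::size_t n);  // defined out of line

  char buf_[kInlineCap + 1] = {};
  char* heap_ = nullptr;
  std::size_t size_ = 0;
};

// Slow path: allocates and copies. In a real library this definition would
// sit in a separate translation unit (or the shared library).
void TinyString::init_long(const char* data, std::size_t n) {
  assert(n > kInlineCap);
  heap_ = new char[n + 1];
  std::memcpy(heap_, data, n);
  heap_[n] = '\0';
  size_ = n;
}
```

Copying a short TinyString touches only the inline buffer, while copying a long one goes through init_long() and yields an independent heap allocation.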
Benchmarks:
name                 old time/op  new time/op  delta
BM_StringCopy_Empty  5.18ns ± 7%  1.53ns ± 5%  -70.45%  (p=0.000 n=10+10)
BM_StringCopy_Small  5.18ns ± 7%  1.54ns ± 5%  -70.21%  (p=0.000 n=10+10)
BM_StringCopy_Large  19.0ns ± 1%  19.3ns ± 1%   +1.77%  (p=0.000 n=10+10)
BM_StringCopy_Huge    321ns ± 4%   310ns ± 1%     ~     (p=0.408 n=10+8)
I'm really uncomfortable with piggy-backing on the fact that non-inline member functions with an explicit instantiation declaration won't be inlined into user code. We're also excluding this optimization from most builds, namely those that are not using the unstable ABI.
I think it's possible to do better by encoding the knowledge of when __init_long() was added to the shared library. As far as you're concerned, you could basically write code using _LIBCPP_HAS_INIT_LONG_IN_DYLIB (or whatever), and then vendors could go and define that macro when suitable for them.
We could also define all these _LIBCPP_HAS_<function-name>_IN_DYLIB macros when we're using the unstable ABI. WDYT?
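Roughly what I have in mind, as a sketch only. The DEMO_ names are hypothetical stand-ins for the _LIBCPP_ macros above (a standalone example should not define reserved names), and uses_partially_inlined_copy() exists purely to make the selected branch observable.

```cpp
// Hypothetical stand-in for the unstable ABI switch; here we pretend this
// build uses the unstable ABI.
#define DEMO_ABI_UNSTABLE 1

// Under the unstable ABI every "function is in the dylib" macro can be
// turned on automatically. Under the stable ABI a vendor would define each
// macro explicitly once their shipped library exports the symbol.
#if defined(DEMO_ABI_UNSTABLE) && !defined(DEMO_HAS_INIT_LONG_IN_DYLIB)
#  define DEMO_HAS_INIT_LONG_IN_DYLIB 1
#endif

// Reports which copy strategy the header would pick for this configuration.
constexpr bool uses_partially_inlined_copy() {
#if defined(DEMO_HAS_INIT_LONG_IN_DYLIB)
  return true;   // inline SSO fast path, out-of-line init for long strings
#else
  return false;  // legacy path: copy constructor stays fully out of line
#endif
}

static_assert(uses_partially_inlined_copy(),
              "unstable ABI should enable the partially inlined copy");
```

The header then only ever tests the per-function macro, so the optimization no longer depends on the unstable ABI directly.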