When a string grows from short (size < __min_cap) to large, the
minimum capacity is currently 2*__min_cap.
While growing this way allows amortized operations in general, I don't think
there is a good reason to force the minimal allocation to be 48 bytes. This
is wasteful for strings that are small but do not fit in the SSO buffer
(22 <= size < 32), as they use 48 rather than 32 bytes (note that
all allocation sizes are aligned to 16 bytes).
We have measured this overhead to be significant on our (Google) internal
workloads, as 64% of global string allocations (and 26% of allocated bytes)
are 48 bytes.
As an example of publicly available code, protobuf descriptors, which
use strings to store field names, have 95% of allocated buffers that
have a capacity of 48 bytes in our datacenters.
This patch changes the min capacity to 0 when a string moves from short to
long, which has the effect that, in that case, a string now always allocates
exactly instead of only when size >= 32. This does not change anything
for sizes outside of in [22,32).
Adding the branch is neutral on speed on the string benchmarks (see numbers
below) and the branch is constant-folded when called from __assign_no_alias).
We have seen no runtime regressions on our production benchmarks.
libcxx string benchmarks:
BM_StringAssignStr_Empty_Opaque 1.64ns ± 5% 1.64ns ± 5% ~ (p=0.457 n=47+43) BM_StringAssignStr_Empty_Transparent 1.35ns ± 5% 1.34ns ± 6% ~ (p=0.089 n=47+43) BM_StringAssignStr_Small_Opaque 1.36ns ± 3% 1.35ns ± 4% ~ (p=0.248 n=44+44) BM_StringAssignStr_Small_Transparent 1.33ns ± 5% 1.33ns ± 6% ~ (p=0.822 n=43+45) BM_StringAssignStr_Large_Opaque 17.6ns ± 5% 17.6ns ± 8% ~ (p=0.909 n=44+43) BM_StringAssignStr_Large_Transparent 16.4ns ± 4% 16.4ns ± 4% ~ (p=0.699 n=43+42) BM_StringAssignStr_Huge_Opaque 233ns ±13% 231ns ±10% ~ (p=0.119 n=49+48) BM_StringAssignStr_Huge_Transparent 232ns ±13% 228ns ±10% ~ (p=0.064 n=50+43) BM_StringAssignAsciiz_Empty_Opaque 6.54ns ± 4% 6.53ns ± 5% ~ (p=0.822 n=44+42) BM_StringAssignAsciiz_Empty_Transparent 0.82ns ± 5% 0.82ns ± 4% ~ (p=0.750 n=44+40) BM_StringAssignAsciiz_Small_Opaque 7.81ns ± 5% 7.82ns ± 8% ~ (p=0.966 n=47+40) BM_StringAssignAsciiz_Small_Transparent 1.33ns ± 6% 1.33ns ± 6% ~ (p=0.509 n=47+40) BM_StringAssignAsciiz_Large_Opaque 23.3ns ± 4% 23.3ns ± 4% ~ (p=0.952 n=47+40) BM_StringAssignAsciiz_Large_Transparent 18.5ns ± 3% 18.4ns ± 5% -0.66% (p=0.029 n=47+40) BM_StringAssignAsciiz_Huge_Opaque 269ns ± 6% 278ns ±13% ~ (p=0.078 n=44+50) BM_StringAssignAsciiz_Huge_Transparent 227ns ± 8% 232ns ±10% ~ (p=0.099 n=44+50) BM_StringAssignAsciizMix_Opaque 11.3ns ± 4% 11.4ns ± 8% ~ (p=0.386 n=42+40) BM_StringAssignAsciizMix_Transparent 5.02ns ± 5% 5.01ns ± 5% ~ (p=0.301 n=41+44)
FWIW, I'd rather see bool __was_short = __old_cap < __min_cap; — no const (for consistency with every other variable in this function even though, sure, we're not planning to modify any of them), and shorter/simpler condition. Assuming the condition __old_cap < __min_cap remains correct, that is.