This is an archive of the discontinued LLVM Phabricator instance.

[libc++] string: Allocate exactly when growing from short to long string.
Needs ReviewPublic

Authored by courbet on Sep 9 2021, 7:31 AM.

Details

Reviewers
Quuxplusone
ldionne
Group Reviewers
Restricted Project
Summary

When a string grows from short (size < __min_cap) to large, the
minimum capacity is currently 2*__min_cap.

While growing this way allows amortized operations in general, I don't think
there is a good reason to force the minimal allocation to be 48 bytes. This
is wasteful for strings that are small but do not fit in the SSO buffer
(22 <= size < 32), as they use 48 rather than 32 bytes (note that
all allocation sizes are aligned to 16 bytes).

We have measured this overhead to be significant on our (Google) internal
workloads, as 64% of global string allocations (and 26% of allocated bytes)
are 48 bytes.

As an example of publicly available code, protobuf descriptors, which
use strings to store field names, have 95% of allocated buffers that
have a capacity of 48 bytes in our datacenters.

This patch changes the min capacity to 0 when a string moves from short to
long, which has the effect that, in that case, a string now always allocates
exactly instead of only when size >= 32. This does not change anything
for sizes outside of in [22,32).

Adding the branch is neutral on speed on the string benchmarks (see numbers
below) and the branch is constant-folded when called from __assign_no_alias).
We have seen no runtime regressions on our production benchmarks.

libcxx string benchmarks:

BM_StringAssignStr_Empty_Opaque                              1.64ns ± 5%
1.64ns ± 5%    ~           (p=0.457 n=47+43)
BM_StringAssignStr_Empty_Transparent                         1.35ns ± 5%
1.34ns ± 6%    ~           (p=0.089 n=47+43)
BM_StringAssignStr_Small_Opaque                              1.36ns ± 3%
1.35ns ± 4%    ~           (p=0.248 n=44+44)
BM_StringAssignStr_Small_Transparent                         1.33ns ± 5%
1.33ns ± 6%    ~           (p=0.822 n=43+45)
BM_StringAssignStr_Large_Opaque                              17.6ns ± 5%
17.6ns ± 8%    ~           (p=0.909 n=44+43)
BM_StringAssignStr_Large_Transparent                         16.4ns ± 4%
16.4ns ± 4%    ~           (p=0.699 n=43+42)
BM_StringAssignStr_Huge_Opaque                                233ns ±13%
231ns ±10%    ~           (p=0.119 n=49+48)
BM_StringAssignStr_Huge_Transparent                           232ns ±13%
228ns ±10%    ~           (p=0.064 n=50+43)
BM_StringAssignAsciiz_Empty_Opaque                           6.54ns ± 4%
6.53ns ± 5%    ~           (p=0.822 n=44+42)
BM_StringAssignAsciiz_Empty_Transparent                      0.82ns ± 5%
0.82ns ± 4%    ~           (p=0.750 n=44+40)
BM_StringAssignAsciiz_Small_Opaque                           7.81ns ± 5%
7.82ns ± 8%    ~           (p=0.966 n=47+40)
BM_StringAssignAsciiz_Small_Transparent                      1.33ns ± 6%
1.33ns ± 6%    ~           (p=0.509 n=47+40)
BM_StringAssignAsciiz_Large_Opaque                           23.3ns ± 4%
23.3ns ± 4%    ~           (p=0.952 n=47+40)
BM_StringAssignAsciiz_Large_Transparent                      18.5ns ± 3%
18.4ns ± 5%  -0.66%        (p=0.029 n=47+40)
BM_StringAssignAsciiz_Huge_Opaque                             269ns ± 6%
278ns ±13%    ~           (p=0.078 n=44+50)
BM_StringAssignAsciiz_Huge_Transparent                        227ns ± 8%
232ns ±10%    ~           (p=0.099 n=44+50)
BM_StringAssignAsciizMix_Opaque                              11.3ns ± 4%
11.4ns ± 8%    ~           (p=0.386 n=42+40)
BM_StringAssignAsciizMix_Transparent                         5.02ns ± 5%
5.01ns ± 5%    ~           (p=0.301 n=41+44)

Diff Detail

Event Timeline

courbet requested review of this revision.Sep 9 2021, 7:31 AM
courbet created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2021, 7:31 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

We have measured this overhead to be significant on our (Google) internal workloads, as 64% of global string allocations (and 26% of allocated bytes) are 48 bytes.

And after this patch, what are the numbers? Does it become, for example, "30% of global string allocations are 48 bytes and 34% are 32 bytes"?

Seems plausibly like a good idea to me, but I think your rationale is conspicuously incomplete without those "after" numbers.

libcxx/include/string
2316

FWIW, I'd rather see bool __was_short = __old_cap < __min_cap; — no const (for consistency with every other variable in this function even though, sure, we're not planning to modify any of them), and shorter/simpler condition. Assuming the condition __old_cap < __min_cap remains correct, that is.

We have measured this overhead to be significant on our (Google) internal workloads, as 64% of global string allocations (and 26% of allocated bytes) are 48 bytes.

And after this patch, what are the numbers? Does it become, for example, "30% of global string allocations are 48 bytes and 34% are 32 bytes"?

Seems plausibly like a good idea to me, but I think your rationale is conspicuously incomplete without those "after" numbers.

Unfortunately I can't give the "after" number for now, because this analysis is for the whole fleet, which includes an extremely large number of binaries which I cannot all run with my patch.

One easy example for which I can give number is the protobuf descriptors I mentioned in the description (0.17% of the fleet memory) given how ubiquitous protobuf usage is at google. The exact numbers depend on what protos are linked into a given binary, but I looked at a few of our largest jobs and here's the percentage of allocations that are switched from 48 bytes to 32 bytes:

% in [22;31]
server 148.1
batch 142.1
batch 242.9
batch 348.9
libcxx/include/string
2316

I had originally kept the old condition, but I agree that this one is clearer. Both should be equivalent as __invariants checks that __cap < __min_cap => __cap + 1 == __min_cap. Done.

courbet updated this revision to Diff 371920.Sep 10 2021, 7:31 AM

Address review comments

courbet edited reviewers, added: ldionne; removed: ldam666fr.