Detailed description of change:
In std::string::resize/__resize_default_init, when the new size is greater than the previous capacity of the string, we currently use exponential growth. In this change, we instead use precise growth, i.e., we allocate a new buffer of exactly the requested size. We expect that for use cases that don't require amortized exponential growth (i.e. are not repeatedly growing the same string), we can save up to ~25% of the allocated memory without significantly increasing CPU usage (based on the assumptions of 2x growth rate and uniform distribution of lengths).
Evidence of efficiency win:
We believe that using precise growth in std::string::resize is the right tradeoff to make for a performance-focused standard library implementation because
(a) we can save significant memory by doing this,
(b) experiments indicate that measurable CPU regressions are rare, and
(c) when CPU regressions are detected, it is very simple to fix them.
We have experimented with this change and similar ones internally in Google and found them to be beneficial so we expect everyone can similarly benefit from this change. We recently changed TensorFlow’s tstring (github; which has a very similar API to std::string) to use precise growth instead of amortized exponential growth in resize(), and in Google use cases, we found no measurable CPU regression, and memory savings of ~23% of memory allocated in resize() - note that this was measured using Google Wide Profiling (GWP; see 1, 2). From GWP data and memory profiles from a load test of a large internal workload, we estimate that we can save ~0.1% of Google’s data center memory usage by making this change. We expect that other general C++ workloads can also save a similar amount of memory from this change, and that string-heavy workloads can save more.
Our internal profiles of Chrome memory usage show that std::string::resize uses at least ~1.2% of Chrome memory. The largest source of this is ~1.0% coming from base::Base64Decode, and we found that this case is manually working around the lack of precise sizing in std::string::resize by calling resize on a default constructed string. By doing precise sizing in resize, we could potentially remove the workaround in Chrome and improve performance there.
Risk of quadratic behavior:
There are cases in which we introduce quadratic CPU usage where it was previously linear, but these cases are straightforward to fix by using APIs with amortized constant complexity such as push_back/append or by using reserve, either with a known maximum size or to do amortized exponential growth if the maximum size is unknown.
The cases in which we introduce quadratic complexity are when resize is used to repeatedly grow a string gradually (e.g. in a loop) - the current code will grow exponentially and take O(N) time in resize whereas the new code will grow precisely and take O(N^2) time in resize, where N is the final size of the string. An example of manual amortized growth using reserve is here:
void StringResizeAmortized(std::string& s, size_t new_size) { const size_t cap = s.capacity(); if (new_size > cap) { // Make sure to always grow by at least a factor of 2x. s.reserve(std::max(new_size, 2 * cap)); } s.resize(new_size); }
Note also that reserve already uses precise growth rather than exponential growth, and the standard does not explicitly define the complexity of either reserve or resize - reference.
Evidence that this issue is rare:
Load tests for a large internal workload were neutral in CPU after fixing just one case of newly quadratic behavior to use push_back/append.
We have also run all Google tests (>1 million) on our internal repository (hundreds of millions of lines of C++ code), and we found only 7 cases in which code became quadratic and needed to be fixed in order to avoid test timeouts. Without exception, these cases were all in library code that provided append-like functionality but was implemented using resize/__resize_default_init rather than append.
Based on this evidence, we expect that usage of std::string::resize that noticeably regresses would also be very rare in other codebases.
Implementation notes:
We make these changes directly inside resize/__resize_default_init rather than inside __grow_by/__grow_by_and_replace (which are also called by push_back/append) because push_back is required to have amortized constant complexity (reference), and in general, append usage seems more likely to benefit from amortized constant complexity even though it is not required by the standard to have this complexity (e.g. see above test failures due to append-like APIs being used to repeatedly grow strings).
Can you add a libc++ specific test (under libcxx/test/libcxx) to check this new behavior?