Page MenuHomePhabricator

Use precise growth rather than amortized growth in std::string::resize/__resize_default_init.
Needs RevisionPublic

Authored by ezbr on May 18 2021, 2:37 PM.

Details

Reviewers
EricWF
ldionne
mvels
Group Reviewers
Restricted Project
Summary

Detailed description of change:

In std::string::resize/__resize_default_init, when the new size is greater than the previous capacity of the string, we currently use exponential growth. In this change, we instead use precise growth, i.e., we allocate a new buffer of exactly the requested size. We expect that for use cases that don't require amortized exponential growth (i.e. are not repeatedly growing the same string), we can save up to ~25% of the allocated memory without significantly increasing CPU usage (based on the assumptions of 2x growth rate and uniform distribution of lengths).

Evidence of efficiency win:

We believe that using precise growth in std::string::resize is the right tradeoff to make for a performance-focused standard library implementation because
(a) we can save significant memory by doing this,
(b) experiments indicate that measurable CPU regressions are rare, and
(c) when CPU regressions are detected, it is very simple to fix them.

We have experimented with this change and similar ones internally in Google and found them to be beneficial so we expect everyone can similarly benefit from this change. We recently changed TensorFlow’s tstring (github; which has a very similar API to std::string) to use precise growth instead of amortized exponential growth in resize(), and in Google use cases, we found no measurable CPU regression, and memory savings of ~23% of memory allocated in resize() - note that this was measured using Google Wide Profiling (GWP; see 1, 2). From GWP data and memory profiles from a load test of a large internal workload, we estimate that we can save ~0.1% of Google’s data center memory usage by making this change. We expect that other general C++ workloads can also save a similar amount of memory from this change, and that string-heavy workloads can save more.

Our internal profiles of Chrome memory usage show that std::string::resize uses at least ~1.2% of Chrome memory. The largest source of this is ~1.0% coming from base::Base64Decode, and we found that this case is manually working around the lack of precise sizing in std::string::resize by calling resize on a default constructed string. By doing precise sizing in resize, we could potentially remove the workaround in Chrome and improve performance there.

Risk of quadratic behavior:

There are cases in which we introduce quadratic CPU usage where it was previously linear, but these cases are straightforward to fix by using APIs with amortized constant complexity such as push_back/append or by using reserve, either with a known maximum size or to do amortized exponential growth if the maximum size is unknown.

The cases in which we introduce quadratic complexity are when resize is used to repeatedly grow a string gradually (e.g. in a loop) - the current code will grow exponentially and take O(N) time in resize whereas the new code will grow precisely and take O(N^2) time in resize, where N is the final size of the string. An example of manual amortized growth using reserve is here:

void StringResizeAmortized(std::string& s, size_t new_size) {
  const size_t cap = s.capacity();
  if (new_size > cap) {
    // Make sure to always grow by at least a factor of 2x.
    s.reserve(std::max(new_size, 2 * cap));
  }
  s.resize(new_size);
}

Note also that reserve already uses precise growth rather than exponential growth, and the standard does not explicitly define the complexity of either reserve or resize - reference.

Evidence that this issue is rare:

Load tests for a large internal workload were neutral in CPU after fixing just one case of newly quadratic behavior to use push_back/append.

We have also run all Google tests (>1 million) on our internal repository (hundreds of millions of lines of C++ code), and we found only 7 cases in which code became quadratic and needed to be fixed in order to avoid test timeouts. Without exception, these cases were all in library code that provided append-like functionality but was implemented using resize/__resize_default_init rather than append.

Based on this evidence, we expect that usage of std::string::resize that noticeably regresses would also be very rare in other codebases.

Implementation notes:

We make these changes directly inside resize/__resize_default_init rather than inside __grow_by/__grow_by_and_replace (which are also called by push_back/append) because push_back is required to have amortized constant complexity (reference), and in general, append usage seems more likely to benefit from amortized constant complexity even though it is not required by the standard to have this complexity (e.g. see above test failures due to append-like APIs being used to repeatedly grow strings).

Diff Detail

Unit TestsFailed

TimeTest
1,060 mslibcxx CI C++03 > libcxx-trunk-shared.std/strings/basic_string/string_capacity::over_max_size.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp --target=x86_64-unknown-linux-gnu -nostdinc++ -isystem /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/support -std=c++03 -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -lc++experimental -nostdlib++ -L /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/lib -lc++ -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/lib -pthread -o /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/projects/libcxx/std/strings/basic.string/string.capacity/Output/over_max_size.pass.cpp.dir/t.tmp.exe
1,060 mslibcxx CI C++03 > libcxx-trunk-shared.std/strings/basic_string/string_capacity::resize_size.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/std/strings/basic.string/string.capacity/resize_size.pass.cpp --target=x86_64-unknown-linux-gnu -nostdinc++ -isystem /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/support -std=c++03 -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -lc++experimental -nostdlib++ -L /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/lib -lc++ -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/lib -pthread -o /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/projects/libcxx/std/strings/basic.string/string.capacity/Output/resize_size.pass.cpp.dir/t.tmp.exe
1,020 mslibcxx CI C++03 > libcxx-trunk-shared.std/strings/basic_string/string_capacity::resize_size_char.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/std/strings/basic.string/string.capacity/resize_size_char.pass.cpp --target=x86_64-unknown-linux-gnu -nostdinc++ -isystem /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/support -std=c++03 -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -lc++experimental -nostdlib++ -L /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/lib -lc++ -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/lib -pthread -o /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx03/projects/libcxx/std/strings/basic.string/string.capacity/Output/resize_size_char.pass.cpp.dir/t.tmp.exe
1,180 mslibcxx CI C++11 > libcxx-trunk-shared.std/strings/basic_string/string_capacity::over_max_size.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/std/strings/basic.string/string.capacity/over_max_size.pass.cpp --target=x86_64-unknown-linux-gnu -nostdinc++ -isystem /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/support -std=c++11 -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -lc++experimental -nostdlib++ -L /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/lib -lc++ -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/lib -pthread -o /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/projects/libcxx/std/strings/basic.string/string.capacity/Output/over_max_size.pass.cpp.dir/t.tmp.exe
1,190 mslibcxx CI C++11 > libcxx-trunk-shared.std/strings/basic_string/string_capacity::resize_size.pass.cpp
Script: -- : 'COMPILED WITH'; /usr/bin/c++ /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/std/strings/basic.string/string.capacity/resize_size.pass.cpp --target=x86_64-unknown-linux-gnu -nostdinc++ -isystem /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/libcxx/test/support -std=c++11 -Werror -Wall -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -lc++experimental -nostdlib++ -L /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/lib -lc++ -Wl,-rpath,/home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/lib -pthread -o /home/libcxx-builder/.buildkite-agent/builds/c27b61b1a64f-1/llvm-project/libcxx-ci/build/generic-cxx11/projects/libcxx/std/strings/basic.string/string.capacity/Output/resize_size.pass.cpp.dir/t.tmp.exe
View Full Test Results (21 Failed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ezbr requested review of this revision.May 18 2021, 2:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2021, 2:37 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript

We're going to want some documentation about this option. Please add it here: https://libcxx.llvm.org/UsingLibcxx.html#id7
(The files that generate this are under libcxx/docs).

Generally options are discouraged, but this seems small enough. Can we describe in more depth what benefits this change hopes to provide?

ezbr updated this revision to Diff 352550.Jun 16 2021, 2:01 PM

Updated configuration macro documentation.

ezbr edited the summary of this revision. (Show Details)Jun 16 2021, 2:07 PM

I updated the documentation and added more details about the motivation to the revision summary.

ldionne requested changes to this revision.Jul 5 2021, 9:05 AM
ldionne added a subscriber: ldionne.

If this is only an experiment and we're not sure we want to keep it, I would much prefer that you enable it internally in your downstream fork and then upstream it alongside with the data you gathered if the experiment turns out to be positive. When we add options to libc++, people are free to start using them and they are a lot harder to remove. I'd be happy to make this the default in libc++ if it's shown to be an improvement and keeps us conforming.

Regarding the change itself, let me make sure I understand. Basically, this patch calls reserve(n) automatically whenever someone calls resize(n), so that we end up not growing the string exponentially in append() if we don't have enough capacity. Another way to achieve this would be to *not* perform 2 * __old_cap in __grow_by_and_replace(), however this would also mean that a loop using push_back() repeatedly would have quadratic complexity (because we'd be growing by exactly 1 element each time), which is bad. Is that right, and is that why you're only adding this knob in resize()?

This revision now requires changes to proceed.Jul 5 2021, 9:05 AM
ezbr updated this revision to Diff 358762.Jul 14 2021, 2:49 PM

Change from adding an option for precise resize to doing so by default and update the change description.

ezbr retitled this revision from Add a compiler option to make string resize/__resize_default_init use precise growth instead of amortized growth. to Use precise growth rather than amortized growth in std::string::resize/__resize_default_init..Jul 14 2021, 2:50 PM
ezbr edited the summary of this revision. (Show Details)
ezbr edited the summary of this revision. (Show Details)Jul 14 2021, 2:55 PM

If this is only an experiment and we're not sure we want to keep it, I would much prefer that you enable it internally in your downstream fork and then upstream it alongside with the data you gathered if the experiment turns out to be positive. When we add options to libc++, people are free to start using them and they are a lot harder to remove. I'd be happy to make this the default in libc++ if it's shown to be an improvement and keeps us conforming.

Changed to update the default behavior instead of adding a compiler option. We've experimented internally and find that this is a desirable change (see updated change description).

Regarding the change itself, let me make sure I understand. Basically, this patch calls reserve(n) automatically whenever someone calls resize(n), so that we end up not growing the string exponentially in append() if we don't have enough capacity. Another way to achieve this would be to *not* perform 2 * __old_cap in __grow_by_and_replace(), however this would also mean that a loop using push_back() repeatedly would have quadratic complexity (because we'd be growing by exactly 1 element each time), which is bad. Is that right, and is that why you're only adding this knob in resize()?

Your understanding is correct. push_back() also relies on __grow_by() and is required to have amortized constant complexity. Updated the change description to include this information.

ezbr edited the summary of this revision. (Show Details)Jul 14 2021, 3:04 PM
ldionne requested changes to this revision.Jul 15 2021, 9:04 AM
ldionne added a subscriber: ckennelly.

I like the way this change is done better now, but I'm still not sure we want/can do this -- I have a few questions I need the answer to before this LGTM. Sorry if that sounds like I'm trying to poke holes in your optimization, but doing my due diligence requires me to try for a type as important as std::string and a method as widely used as resize(). We really don't want to get this wrong.

  1. Does this change how iterators are invalidated?
  1. http://eel.is/c++draft/string.capacity#5.2 says: "If n > size(), appends n - size() copies of c". Now, std::string::append has the strong exception guarantee when an exception is thrown, which means that std::string::resize also does when n > size(). Does your change modify that (please explain)?
  1. Can you explain what are the cases where this change introduced quadratic CPU usage? I assume this is whenever someone was calling resize() in a loop with a gradually larger size: previously, the code would have reserved exponentially so we would have reallocated very rarely, but with your change, we will reallocate every time.

@ckennelly I know you did a lot of work on std::string, can you take a look at this change?

libcxx/include/string
3271

Can you add a libc++ specific test (under libcxx/test/libcxx) to check this new behavior?

3280

Please put the __shrink_or_extend call on its own line (same below).

This revision now requires changes to proceed.Jul 15 2021, 9:04 AM

@ckennelly I know you did a lot of work on std::string, can you take a look at this change?

I've added @mvels who has done a lot more looking at std::string than I have.

@ckennelly I know you did a lot of work on std::string, can you take a look at this change?

I've added @mvels who has done a lot more looking at std::string than I have.

I'll take a peek at this

mvels added a comment.Jul 15 2021, 4:44 PM

Some 2 cents from my side.

The change here LGTM, but I do think the O(SQR) use case is a sharp edge. I think we should shield the 'stable' folks from these sharp edges imho. The libc++ spec states nothing about the 'growth' of resize(), which would make me caution on the side of DTRT instead of does the minimum memory thing. The latter is more a faang thing. I would imagine that a 'stable' user might be surprised with sudden regressions without maybe being too concerned with (temporary) allocations being +/- 20%.

My suggestion would be to make this the default for UNSTABLE only. Which means we don't create new knobs and defines, but simply #ifdef the few new loc under unstable.

mvels added inline comments.Jul 15 2021, 4:48 PM
libcxx/include/string
3280

as per my general comment, I'd suggest putting this inside #if defined(_LIBCPP_ABI_UNSTABLE)

miyuki added a subscriber: miyuki.Jul 19 2021, 4:59 AM

[...] which would make me caution on the side of DTRT instead of does the minimum memory thing. The latter is more a faang thing. I would imagine that a 'stable' user might be surprised with sudden regressions without maybe being too concerned with (temporary) allocations being +/- 20%.

I would say it's not just FAANG. Embedded developers would also benefit from the reduction of memory overhead.

My suggestion would be to make this the default for UNSTABLE only. Which means we don't create new knobs and defines, but simply #ifdef the few new loc under unstable.

But this change has nothing to do with the ABI.

Quuxplusone added inline comments.
libcxx/include/string
3280

But this change has nothing to do with the ABI.

It has to do with ABI in the sense that one can imagine compiling one TU with this patch, and one TU without this patch, and then linking them together, and having the finished program misbehave; whereas if both TUs were compiled with the old library, they'd have worked OK. In that sense, anything having to do with behavior — and that is not detectable in "API" — is classified as "ABI" by definition.

Note that this should absolutely not be guarded by _LIBCPP_ABI_UNSTABLE; that would make it a "receding target" that would never become the default, not even when libc++ moved to "ABI version 2," because when the stable version is 2, the unstable version will become 3. If we were going to do this as an ABI flag, we'd:

  • Add a new flag _LIBCPP_ABI_EXACT_VECTOR_RESIZE circa https://github.com/llvm/llvm-project/blob/2ff5a56/libcxx/include/__config#L95-L96
  • Guard the new lines in <vector> under _LIBCPP_ABI_EXACT_VECTOR_RESIZE
  • (We don't mention _LIBCPP_ABI_VERSION>=2 in the <vector> header, because (A) that's too easy to thinko e.g. as _LIBCPP_ABI_VERSION==2, and (B) it wouldn't be self-documenting. Every libc++ ABI change comes with a greppable name, so we don't end up with a confusing web of "miscellaneous ABI version 2 changes" that are hard to disentangle. This also serves as a good brake on people adding "miscellaneous version-2 changes" willy-nilly.)

Then users who compile libc++ with ABI version 2 (or newer) will receive the new behavior.

FWIW, personally I don't care whether this change is done under an ABI flag, or unconditionally, or not-at-all.

ezbr edited the summary of this revision. (Show Details)Jul 19 2021, 2:57 PM
ezbr updated this revision to Diff 359932.Jul 19 2021, 3:04 PM
ezbr edited the summary of this revision. (Show Details)

Guard changes behind new _LIBCPP_ABI_EXACT_STRING_RESIZE config flag and add a test.

We enable the new config flag in unstable ABI/ABI version 2.

ezbr added a comment.Jul 19 2021, 3:06 PM

I like the way this change is done better now, but I'm still not sure we want/can do this -- I have a few questions I need the answer to before this LGTM. Sorry if that sounds like I'm trying to poke holes in your optimization, but doing my due diligence requires me to try for a type as important as std::string and a method as widely used as resize(). We really don't want to get this wrong.

  1. Does this change how iterators are invalidated?

Yes, I think this change would potentially result in iterators being invalidated more frequently. __invalidate_all_iterators() is called when we reallocate the buffer in __grow_by here and in __shrink_or_extend here. IIUC, if a user repeatedly uses resize to grow a string by a small constant size increment (e.g. s.resize(s.size()+1)) from size 0 to size N, we will get O(N) buffer reallocations in the new code whereas in the current code, we have O(log(N)) buffer reallocations.

  1. http://eel.is/c++draft/string.capacity#5.2 says: "If n > size(), appends n - size() copies of c". Now, std::string::append has the strong exception guarantee when an exception is thrown, which means that std::string::resize also does when n > size(). Does your change modify that (please explain)?

From https://en.cppreference.com/w/cpp/string/basic_string/resize, resize() also has the strong exception guarantee since C++11 so we can't violate that here. IIUC, we don't because the allocation here is the only new place that can throw (we always follow that branch of __shrink_or_extend), and the modification calls (i.e. __set_long_cap, etc.) are all after that at the end of the function.

  1. Can you explain what are the cases where this change introduced quadratic CPU usage? I assume this is whenever someone was calling resize() in a loop with a gradually larger size: previously, the code would have reserved exponentially so we would have reallocated very rarely, but with your change, we will reallocate every time.

Yes, that's right. Added to the change description.

ezbr updated this revision to Diff 359941.Jul 19 2021, 3:18 PM

Try again to update phabricator review. The previous attempt somehow only had the test file change.

ezbr added a comment.Jul 19 2021, 3:26 PM

Somehow the current version of this phabricator review doesn't seem to include my changes other than the new test file. I'll try to figure it out tomorrow.

Note: my first attempt to arc diff --update was blocked on clang-format, and this diff seems to have only picked up the small formatting commit and not the underlying commit with the other files, but the string changes are still present in my local git directory. (I'm not used to git/arc, sorry.)

ezbr updated this revision to Diff 360164.Jul 20 2021, 9:02 AM

Fix an issue with the previous version of the review.

I had accidentally done git commit instead of git commit --amend so it was missing my original changes.

ezbr marked 4 inline comments as done.Jul 20 2021, 9:05 AM

I followed @Quuxplusone's recommendation of using a new __config flag and enabling it by default for unstable ABI/ABI version 2.

LGTM at this point mod nits.
However, I still express no opinion as to whether we want to do this at all.
I'm just saying that if we wanted to do this, then I think this patch now reflects the appropriate way to do it.

libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp
28 ↗(On Diff #360164)

Should this be alignof(std::max_align_t)? Or what's the significance of 16 — how was it computed?

29 ↗(On Diff #360164)

Style nit: I'd prefer to see i < (1 << 10) or simply i < 1024, instead of the unparenthesized i < 1 << 10. Mixing < and << reads too much like a typo.

50 ↗(On Diff #360164)

Please also test std::wstring, for completeness. (And as "documentation" that this optimization is intended to apply to all basic_string specializations, not just to std::string specifically.)

ezbr updated this revision to Diff 360200.Jul 20 2021, 10:34 AM

Add wstring coverage and add a comment in the test.

ezbr marked 3 inline comments as done.Jul 20 2021, 10:39 AM

Regarding kAlign, this is basic_string::__alignment, which is private. Do you think it's okay to include this implementation detail here? If not, I think it could be computed, but I thought that could add complexity to the test and that this value is unlikely to change often. Do you prefer computing it in the test?

libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp
28 ↗(On Diff #360164)

Regarding kAlign, this is [basic_string::__alignment, which is private.](https://github.com/llvm/llvm-project/blob/2d0f1fa/libcxx/include/string#L1577) Do you think it's okay to include this implementation detail here?

Well, I have no better suggestion; defer to @ldionne here.

ldionne requested changes to this revision.Jul 20 2021, 12:22 PM

The change here LGTM, but I do think the O(SQR) use case is a sharp edge. I think we should shield the 'stable' folks from these sharp edges imho. The libc++ spec states nothing about the 'growth' of resize(), which would make me caution on the side of DTRT instead of does the minimum memory thing. The latter is more a faang thing. I would imagine that a 'stable' user might be surprised with sudden regressions without maybe being too concerned with (temporary) allocations being +/- 20%.

I don't think this is ABI breaking, per my reply to Arthur. However, I want to address a misconception that seems to exist here: _LIBCPP_ABI_UNSTABLE is meant for improvements (in correctness, performance or other) that require breaking the ABI. They are the defaults of "tomorrow", when (if) we move to a newer ABI. _LIBCPP_ABI_UNSTABLE is not _LIBCPP_ENABLE_NOT_QUITE_ASSUMED_DECISIONS, which seems to be how you suggest using it.

In other words, we should figure out whether we want this change, and if so, then the question becomes how to do it. If it turns out not to be an ABI break, we do it by default. If it is an ABI break, then we hide it behind the macro as it has been done in the current version of this patch.

[...]

Thanks for your replies. I'd like to understand: how did you measure the memory improvements you found? Ideally, it would be nice to have something that we can reproduce on a large open-source code base (e.g. Chrome) so that the experiment can be verified. Is that possible? Otherwise, you have to understand I'm blindly basing the decision on "an internal Google experiment", which while I do trust, is not the best from a due diligence perspective. I'm being very conservative because silently introducing quadratic behavior in user code is pretty close to the scariest thing a maintainer can think about -- very bad but also very difficult to detect.

Also, if this came with some sort of Clang warning or clang-tidy check that flagged using resize() in a loop (because the standard doesn't guarantee amortized O(1) for that but people might be expecting it), I might ease up a bit. It still wouldn't fix the case where resize() is called in a loop lower in the call stack, but it would be good if we could detect obvious misuses automatically. What was the quadratic case you encountered and fixed? Is that something you can share?

I still left some comments on the patch in case this is an ABI break (which I don't think it is), but I feel like we need to figure out whether it's acceptable for libc++ to do this by default (either now or in a future ABI version) before we make progress.

libcxx/include/__config
116 ↗(On Diff #360164)

You could expand the comment a bit more -- something like the above.

libcxx/include/string
3280

It has to do with ABI in the sense that one can imagine compiling one TU with this patch, and one TU without this patch, and then linking them together, and having the finished program misbehave; whereas if both TUs were compiled with the old library, they'd have worked OK. In that sense, anything having to do with behavior — and that is not detectable in "API" — is classified as "ABI" by definition.

std::string::resize is exported from the dylib, so I don't think we have that issue. All programs should be producing an external call to resize and should just pick up the new implementation after this change. I also don't think this change is observable except:

  1. performance-wise, with the quadratic behavior sharp edge
  2. the frequency of iterator invalidation (programs relying on that are wrong but may exist - I wouldn't block this change on only that, though)

Do you see an ABI issue I'm missing?

libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp
13 ↗(On Diff #360200)

Enabling tests for the unstable ABI like this is not necessarily going to work. It may happen to work, but it could just as well not work, because the shared library will not have been built with the same ABI settings.

In other words, the ABI in use is not a property that can be changed by users with a mere #define, it's a configuration-wide setting that should be decided by vendors when they ship the library. I think what we need to do here instead is:

  1. add a CI configuration that builds libc++ with the unstable ABI and runs the tests under that configuration
  2. move the existing tests that do something like this to using // REQUIRES: libcpp-abi-unstable (we have a couple of tricky ones that define things like _LIBCPP_ABI_ENABLE_UNIQUE_PTR_TRIVIAL_ABI)
1 ↗(On Diff #360164)

This file needs to be called resize.exact.pass.cpp.

28 ↗(On Diff #360164)

How would you compute that? If it can be done non-intrusively, then I would do that, even if it adds a bit of complexity to the test.

This revision now requires changes to proceed.Jul 20 2021, 12:22 PM
mvels added a comment.Jul 20 2021, 2:12 PM

_LIBCPP_ABI_UNSTABLE is meant for improvements (in correctness, performance or other) that require breaking the ABI.

My main point is that this change is a good opportunity for something at Google scale, but of lesser value for most other users. An O(0.2) win on string heap that comes with an O(SQR) sharp edge on hot 'resize()` loops which used to be amortized O(N) is a sharp edge and scary proposition.

I understand your point on the ABI flag being targeted at 'stable ABI' only, yet we have no means at all for other 'tomorrow's invariants and behaviors'. The fact that the standard leaves the amortized growth 'open' is unfortunate, as it's not unreasonable for users to expect amortized growth, and we are basically rolling the "this is fine" dice for everybody else, and we know that there "will" be such pathological use cases in practice. I don't expect the world to burn down, but from a "respect the user" perspective, I'd really like some form / way to have users consciously opt in for a new version.

whether we want this change

It depends on how we define 'we'. Mandatory xkcd: https://xkcd.com/1172/

ezbr marked 3 inline comments as done.Jul 21 2021, 11:41 AM

Thanks for your replies. I'd like to understand: how did you measure the memory improvements you found? Ideally, it would be nice to have something that we can reproduce on a large open-source code base (e.g. Chrome) so that the experiment can be verified. Is that possible? Otherwise, you have to understand I'm blindly basing the decision on "an internal Google experiment", which while I do trust, is not the best from a due diligence perspective. I'm being very conservative because silently introducing quadratic behavior in user code is pretty close to the scariest thing a maintainer can think about -- very bad but also very difficult to detect.

We used Google Wide Profiling to detect the memory improvements. It's described in this paper, but I don't know of a similar system in place externally, unfortunately. We can see if there's a way to get similar data for Chrome.

Also, if this came with some sort of Clang warning or clang-tidy check that flagged using resize() in a loop (because the standard doesn't guarantee amortized O(1) for that but people might be expecting it), I might ease up a bit. It still wouldn't fix the case where resize() is called in a loop lower in the call stack, but it would be good if we could detect obvious misuses automatically.

The clang-tidy check seems reasonable. I'll try to find someone who could write one.

What was the quadratic case you encountered and fixed? Is that something you can share?

I can't share the code exactly, but I can say that it was using __resize_default_init to repeatedly extend a string and copy into the new space at the end of the string. I changed it to just use push_back/append. In that case, the problematic resize was in a different function from the loop so it wouldn't be caught by a simple clang-tidy.

Re: we shouldn't use ABI flag unless it's an ABI break. I can see @mvels's point that when users update to a new ABI version and/or opt-in to the unstable ABI they may be more aware that significant changes can occur and would potentially be less likely to have a bad experience with this change. Martijn had a related idea (discussing outside this review), which is: we could make resize() inline, and delegate to a new __resize_grow_exact() for n > size(). This would break the ABI to (a) improve performance, and (b) allow for a potentially more user-friendly transition to precise resize (guarded by unstable/version 2 ABI). @ldionne, what do you think of this idea?

libcxx/test/libcxx/strings/basic.string/string.capacity/resize.exact.cpp
13 ↗(On Diff #360200)

Is there documentation about how to add a CI configuration? Note that once I renamed the test to end in "pass.cpp", it was failing and seems to be using exponential growth so I think this is necessary.

ezbr updated this revision to Diff 360541.Jul 21 2021, 11:42 AM

Update config comment and change test to not hardcode basic_string::alignment.

I just cancelled the build since it would fail anyway. We changed some things in our build scripts which causes the build failures.
Could you rebase your patch on main? Sorry for the inconvenience.

[...]

Re: we shouldn't use ABI flag unless it's an ABI break. I can see @mvels's point that when users update to a new ABI version and/or opt-in to the unstable ABI they may be more aware that significant changes can occur and would potentially be less likely to have a bad experience with this change. Martijn had a related idea (discussing outside this review), which is: we could make resize() inline, and delegate to a new __resize_grow_exact() for n > size(). This would break the ABI to (a) improve performance, and (b) allow for a potentially more user-friendly transition to precise resize (guarded by unstable/version 2 ABI). @ldionne, what do you think of this idea?

As I said, it's really not about whether to break the ABI. It's about whether this change is good in the general case or not, and I'm not so sure about that. For example, I just came across code in locale.cpp that calls install() successively, and install() precisely calls resize(id + 1), growing the vector by one each time (I think): https://github.com/llvm/llvm-project/blob/main/libcxx/src/locale.cpp#L485 and https://github.com/llvm/llvm-project/blob/main/libcxx/src/locale.cpp#L199. And it's spread across multiple functions, so even a clang-tidy check would not catch it. It would not occur to me to use this pattern, but if even libc++ is doing it, I can imagine others are doing it too despite the lack of formal complexity guarantees on resize().

I also just did an internal code search and in just 5 minutes, I can find at least a couple places that appear to be resizing in a loop with an increasing size like that.

Instead of changing resize() like this, have you considered using shrink_to_fit() more aggressively in places that have a noticeable memory impact?

I would be curious to write a paper proposing for resize() to be spec'd to have amortized O(1) complexity (which would effectively make this change non-conforming), as a way of seeing what other implementers think, and whether there has been other experience with this. If other people say "it's a useful implementation to shrink-to-fit on resize()", that would be valuable information to have. If everybody says "hmm, the intent always have been that resize() would behave consistently with push_back() w.r.t. amortized complexity", then that would be valuable input.

I'm sorry, but I can't approve something with potentially such a sharp edge without more guidance. I realize you're just trying to upstream a change that is useful on your end and I appreciate that, but I'm a bit stuck because I have to try to represent everybody's interests, and I'm really lacking data to make anything but the conservative "let's keep the status quo" decision.

ezbr updated this revision to Diff 362872.Jul 29 2021, 1:41 PM

Remove ifdef guards (so we change the default behavior) and rebase onto a newer version of llvm.

ezbr edited the summary of this revision. (Show Details)Jul 29 2021, 1:42 PM
ezbr edited the summary of this revision. (Show Details)
ezbr added a comment.Jul 29 2021, 1:44 PM

Could you rebase your patch on main? Sorry for the inconvenience.

Done

I'm sorry, but I can't approve something with potentially such a sharp edge without more guidance. I realize you're just trying to upstream a change that is useful on your end and I appreciate that, but I'm a bit stuck because I have to try to represent everybody's interests, and I'm really lacking data to make anything but the conservative "let's keep the status quo" decision.

I understand. I've gathered more evidence and revised the patch description. I've tried to provide more evidence that this change is more beneficial than it is risky, and that it's the right tradeoff to make for everyone's benefit. Please take another look.

I also just did an internal code search and in just 5 minutes, I can find at least a couple places that appear to be resizing in a loop with an increasing size like that.

I think there's a distinction between code that calls resize in a loop, and code that would have a measurable regression. For example, if the loop's trip count is always small or if the loop is not run frequently, then it's not likely to be a problem. From my experimentation and testing, code that measurably regresses due to precise resizing is exceedingly rare within Google, and I expect that outside of Google, it would also be very rare.

Instead of changing resize() like this, have you considered using shrink_to_fit() more aggressively in places that have a noticeable memory impact?

I added to the change description that we expect to save ~0.1% of total memory usage with this change, but the savings are spread diffusely across a very large code base so I think it would not be feasible for me to get the same savings with shrink_to_fit. I'd also like to reemphasize that from my testing, I expect the fraction of uses of resize that measurably regress in performance to be very small so I think it is much more feasible to change those small fraction of cases to manually use exponential growth than it is to change the overwhelming majority of cases that can benefit from precise growth.

ezbr added a comment.Aug 9 2021, 8:59 AM

Friendly ping

ldionne requested changes to this revision.Aug 10 2021, 1:45 PM

Thanks a lot for the diligent reply and explanation you added to the patch summary. I still wasn't sure about this so I asked the question on the LWG mailing list, which basically consists of other implementers and experts on the library parts of the Standard -- all folks with a lot of experience and usually great judgement.

I mainly wanted to know whether there was unwritten intent in the spec that resize behaves like push_back and append with respect to geometric growth, and also to know whether other implementers had considered or done the change proposed here in their own implementation. The feedback I got is in line with my initial intuition and the reason why I've been so cautious with this change. Note that technically, this change does conform to the Standard, so I'm not pushing back on that basis. Instead, I'm pushing back based on the following reasons, which is a mix of the feedback I got on the LWG thread and my own opinion:

  1. The Standard uses the language "appends" to define resize (http://eel.is/c++draft/string.capacity#5.2), which is consistent with how push_back is defined. Those should behave the same for consistency, and it can be seen as a lack of clarity that the Standard doesn't currently specify a complexity guarantee for resize. From a user perspective, the library is easier to learn and use if resize, push_back, append and all the other ways to append (emplace_back, etc.) behave similarly, with the only knob to control the capacity being .reserve().
  1. Making this change would also make basic_string behave unlike vector, which adds to the inconsistency. I think we should be consistent across container types.
  1. People do use patterns like StringBuilder and are used to resize behaving in amortized O(N). While regressions to O(N^2) may be rare, the library's primary role is to be suitable to a general audience more than being aligned with any single code base's needs. Even though I understand that the memory improvements would probably benefit other code bases as well, I think the possibility for regressions (and especially nasty regressions at that) outweighs that potential benefit.
  1. Other surveyed implementations all use exponential growth. Diverging from this in libc++ would create implementation divergence, which is usually (although not always) bad. In this case, I think it may hurt portability.

I understand this isn't the resolution you're looking for and I am thankful for the diligence you've put into explaining the change and measuring it. However, someone has to make a judgement call somewhere, and in this case, I'm going with consistency-in-the-API and conservatism instead of the memory consumption gains this could provide.

I will also follow up with LWG to see whether we want to make it a hard requirement for resize to behave consistently with push_back and friends, which would provide a portability benefit (users would not have to worry about their code being quadratic on some implementations).

One way forward that was suggested on the LWG thread was to add new methods resize_exactly and reserve_exactly that would allow specifying the exact capacity of the container. Inside Google, you could potentially change all calls to resize into calls to resize_exactly, which would have the same effect as this patch. Of course, that has to go through the Committee first, but I think it would have chances of success.

This revision now requires changes to proceed.Aug 10 2021, 1:45 PM
ezbr added a comment.Aug 11 2021, 11:02 AM

Hi Louis, I appreciate your detailed explanation of your decision, and I understand. We will consider the resize_exactly idea.

Best,
Evan

As a follow-up, a LWG issue has been created here to clarify what the Standard expects from implementations: http://wg21.link/LWG3579

If my opinion changes based on the resolution of that issue, I'll ping you here to revive this. Again, thanks for understanding and please do not hesitate to contribute further improvements to the library.