Page MenuHomePhabricator

[libc++] Make sure basic_string::reserve never shrinks in all standard modes
ClosedPublic

Authored by ldionne on Jan 14 2022, 9:47 AM.

Details

Summary

Since basic_string::reserve is instantiated in the shared library but also
available to the compiler for inlining, its definition should not depend on
things like the Standard mode in use. Indeed, that flag may not match between
how the shared library is compiled and how users are compiling their own code,
resulting in ODR violations.

A valid implementation of reserve() has always been to never shrink,
even before C++20. This patch uses that implementation freedom to resolve
the issue by always implementing the C++20 behavior.

Fixes llvm-project#53170

Diff Detail

Event Timeline

ldionne requested review of this revision.Jan 14 2022, 9:47 AM
ldionne created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 14 2022, 9:47 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Note that we could technically have made the change only for reserve(n), while leaving reserve() to actually perform shrink_to_fit. However, my opinion is that reserve() should be equivalent to reserve(0) (which is how it used to be specified before C++20), and so it should behave consistently with it.

ldionne updated this revision to Diff 400839.Jan 18 2022, 7:10 AM

Add XFAIL for older Apple dylibs and rebase onto main.

Gentle ping -- I'd like to get other people's eyes on this and ship this quickly since this is a pretty bad bug.

Mordante accepted this revision as: Mordante.Jan 18 2022, 11:19 AM
Mordante added a subscriber: Mordante.

LGTM! I really like the amount of comments.

libcxx/include/string
3269

Everything here makes sense except the decision to make s.reserve() nonequivalent to s.shrink_to_fit(). I don't see any ABI issue with that, so that part of the patch is orthogonal to the rest — and it seems to be altering behavior that is both useful and cppreference-documented. I suggest doing everything else you're doing, but not that specific one-line diff, because I don't see its upside.

libcxx/docs/ReleaseNotes.rst
114–118

Consider reflowing to fit in Phab's column count. :)
"so that both these calls never shrink the string consistently in all Standard modes" — so they always shrink it inconsistently? 😛 Suggestion:

- C++20 requires that ``std::basic_string::reserve`` never reduce the capacity
  of the string. (For that, use ``shrink_to_fit``.) Prior to this release, libc++'s
  ``std::basic_string::reserve(n)`` could reduce capacity in C++17 and before,
  but not in C++20 and later. This caused ODR violations when mixing code compiled
  under different Standard modes. After this change,
  ``std::basic_string::reserve(n)`` never reduces capacity, even in C++17 
  and before.
  Also, after this change, `std::basic_string::reserve()`` is treated as a no-op,
  rather than as a synonym for ``shrink_to_fit()``. [Arthur says: Why?]
libcxx/include/string
989

Why not? https://en.cppreference.com/w/cpp/string/basic_string/reserve specifically says that the zero-argument overload (which is deprecated in C++20) is supposed to be a synonym for shrink_to_fit. Why wouldn't we just do that?

Everything here makes sense except the decision to make s.reserve() nonequivalent to s.shrink_to_fit(). I don't see any ABI issue with that, so that part of the patch is orthogonal to the rest — and it seems to be altering behavior that is both useful and cppreference-documented. I suggest doing everything else you're doing, but not that specific one-line diff, because I don't see its upside.

The rationale is that reserve() is supposed to be the same as reserve(0) -- actually in C++03 reserve is specified as a single overload with a default argument: reserve(int = 0). If I make reserve() a synonym for shrink_to_fit(), it won't be equivalent to reserve(0) anymore. I think we have to pick our poison here: either reserve() and reserve(0) are inconsistent, or reserve() and reserve(0) are consistent but reserve() is inconsistent with what the Standard suggests.

Thoughts? Does that change your opinion?

Quuxplusone added inline comments.
libcxx/docs/ReleaseNotes.rst
114–118

The rationale is that reserve() is supposed to be the same as reserve(0) -- actually in C++03 reserve is specified as a single overload with a default argument: reserve(int = 0). If I make reserve() a synonym for shrink_to_fit(), it won't be equivalent to reserve(0) anymore.

reserve() is explicitly supposed to not-be-equivalent to reserve(0), in C++20 and later... but, at the same time, it's also explicitly deprecated. Okay, I buy that. I'd explain it like this:

- C++20 requires that ``std::basic_string::reserve(n)`` never reduce the capacity
  of the string. (For that, use ``shrink_to_fit``.) Prior to this release, libc++'s
  ``std::basic_string::reserve`` could reduce capacity in C++17 and before,
  but not in C++20 and later. This caused ODR violations when mixing code compiled
  under different Standard modes. After this change, libc++'s
  ``std::basic_string::reserve`` never reduces capacity, even in C++17 
  and before.
  C++20 deprecates the zero-argument overload of ``std::basic_string::reserve()``,
  but specifically permits it to reduce capacity. libc++ does not take advantage
  of this permission: we treat ``reserve()`` as a synonym for ``reserve(0)``
  in all Standard modes.

I am reconsidering my decision now. Let's put ourselves in the shoes of a programmer that wrote code in C++03 when the standard did say "if n < size(), this is a non-binding shrink-to-fit request". They most likely used str.reserve() with the exact expectation that it was going to shrink_to_fit.

API consistency is relevant for new code being written, however new code being written won't use reserve() because it's deprecated. So my opinion is now that if we change reserve() to never shrink, we're making the wrong tradeoff - existing code may change performance characteristics, and new code won't benefit from the consistency because new code won't use reserve() anyways.

Thoughts? If you agree, I'll turn this patch around and make reserve(n) never shrink in all standard modes, but I won't touch reserve() (so that it shrinks to fit in all standard modes).

I am reconsidering my decision now. Let's put ourselves in the shoes of a programmer that wrote code in C++03 when the standard did say "if n < size(), this is a non-binding shrink-to-fit request". They most likely used str.reserve() with the exact expectation that it was going to shrink_to_fit.

API consistency is relevant for new code being written, however new code being written won't use reserve() because it's deprecated. So my opinion is now that if we change reserve() to never shrink, we're making the wrong tradeoff - existing code may change performance characteristics, and new code won't benefit from the consistency because new code won't use reserve() anyways.

Thoughts? If you agree, I'll turn this patch around and make reserve(n) never shrink in all standard modes, but I won't touch reserve() (so that it shrinks to fit in all standard modes).

Sounds like I convinced you right as you convinced me? 😛 I think part of my confusion is that there are like three different things we might be worried about:
(A) Behavior change between s.reserve() compiled with Clang 14 -std=c++17 versus the same source code compiled with Clang 14 -std=c++20 (which leads to ODR violation when linked together).
(B) Behavior change between C++17 s.reserve() compiled with Clang 13, versus the same source code compiled with Clang 14.
(C) Behavior change between C++17 s.reserve() compiled with Clang 13 -O0 and linked against the Clang 13 dylib, versus the same object file linked against the Clang 14 dylib.
(D) Behavior change between s.reserve() (compiled with Clang 14 in any mode), versus s.reserve(0) (ditto).

Prior to D91778, libc++ used the default-argument implementation (Default arguments are the devil!), so a source-level call to s.reserve() in old code would have been secretly calling basic_string::reserve(0).

What @rsmith was complaining about was (A). Your original proposal (and what I was agreeing to when I clicked "Accept" :)) fixes (A) and (D) but breaks (B) and (C).
My proposal to leave C++20 .reserve() as shrink-to-fit (and what you just offered to do, IIUC) fixes (A) and (B) but breaks (C) and (D).

I think we cannot avoid breaking (C), because the dylib must implement the C++20 behavior for basic_string::reserve(0), so a compiled call to basic_string::reserve(0) must not shrink, and in Clang 13 s.reserve() was secretly a call to basic_string::reserve(0).

Anyway, you can go with the solution I suggested or the solution I accepted — I guess I have no call to be unhappy either way. ;)

I've no strong preference to which solution we take.

ldionne updated this revision to Diff 402567.Jan 24 2022, 9:23 AM
ldionne marked 4 inline comments as done.

Don't change the behavior of basic_string::reserve().

Quuxplusone accepted this revision.Jan 24 2022, 9:38 AM
Quuxplusone added inline comments.
libcxx/docs/ReleaseNotes.rst
127
This revision is now accepted and ready to land.Jan 24 2022, 9:38 AM
ldionne marked an inline comment as done.Jan 24 2022, 12:42 PM

Applying Arthur's comment upon committing. The AIX CI failure is unrelated, so I'll ignore it.

This revision was landed with ongoing or failed builds.Jan 24 2022, 12:43 PM
This revision was automatically updated to reflect the committed changes.