This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Don't return alloc_size - 1 in __recommend to add 1 when allocating
AbandonedPublic

Authored by philnik on Feb 12 2022, 4:46 AM.

Details

Reviewers
Quuxplusone
ldionne
EricWF
Group Reviewers
Restricted Project
Summary

For some reason __recommend returns the recommended allocation size minus one, and at almost every call site one is added to the returned value.

Diff Detail

Event Timeline

philnik requested review of this revision.Feb 12 2022, 4:46 AM
philnik created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2022, 4:46 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

ping

It seems like the CI doesn't like this at all. I assumed there's a missing +1 (or an extra +1) somewhere, so I haven't looked at it at all — waiting for the CI to be green before we think about whether this is a good idea or not.

I do think you should eliminate the _VSTD and HIDE_FROM_ABI diffs from this PR; this is subtle enough that fewer distractions will be better.

philnik updated this revision to Diff 409730.Feb 17 2022, 11:29 AM
  • Fix ASAN
libcxx/include/string
1547

Clang-format strikes again. (If you must clang-format, please remember to sanity-check the code when it's finished.)

3246–3249

This diff strikes me as questionable (which probably means the entire patch is questionable). The logic here previously was, "If the target capacity is equal to our current capacity, nothing to do, return. Otherwise, shrink or extend to the target capacity." Now the logic is, "If the target capacity is one greater than our current capacity, do nothing! Otherwise, shrink or extend to one less than the target capacity." Regardless of whether the arithmetic works out in practice, that logic seems very wrong. Maybe it can be saved by renaming some variables, e.g. maybe __target_capacity doesn't actually represent a capacity? Or maybe prior to line 3239 it's a capacity but after line 3239 it's a one-greater-than-a-capacity?

philnik updated this revision to Diff 410850.Feb 23 2022, 9:14 AM
philnik marked an inline comment as done.
  • Remove more ones
libcxx/include/string
1547

That wasn't clang-format, but hey.

ldionne requested changes to this revision.Feb 28 2022, 7:23 AM

I think we need to understand the situation with reserve and shrink_to_fit before moving forward with this, otherwise I think we might just be making the code more difficult to understand.

Also, this is pretty tricky because it's possible that already compiled code has the old definition of e.g. __init inlined, and hence the capacity of the string is set to N + 1. If such a string is passed to code compiled with the new definition, the new code would think that the capacity represents N, not N + 1, and would most likely mis-behave. In other words, I think this is ABI breaking.

libcxx/include/string
3246–3249

I agree, this looks fishy to me. Are you certain this wasn't a bug previously and that it should instead be if (__target_capacity == capacity()) return; *even* in the new diff? Do we have a test for that?

This revision now requires changes to proceed.Feb 28 2022, 7:23 AM

I think we need to understand the situation with reserve and shrink_to_fit before moving forward with this, otherwise I think we might just be making the code more difficult to understand.

Also, this is pretty tricky because it's possible that already compiled code has the old definition of e.g. __init inlined, and hence the capacity of the string is set to N + 1. If such a string is passed to code compiled with the new definition, the new code would think that the capacity represents N, not N + 1, and would most likely mis-behave. In other words, I think this is ABI breaking.

I'm not changing what is saved in long cap. That has always been the capacity including the null terminator AFAICT. Or is there another potential ABI break that I'm not seeing?

libcxx/include/string
3246–3249

IIUC basic_string::capacity() returns the number of characters without the null terminator while __recommend() (with this patch) returns the number of characters including the null terminator.

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 5:21 AM
EricWF requested changes to this revision.Mar 3 2022, 6:58 AM
EricWF added a subscriber: EricWF.

I'm not OK with this change given that the description indicates a lack of understanding of the current situation.

EricWF added inline comments.Mar 3 2022, 7:00 AM
libcxx/include/string
1547

Please don't request whitespace changes on reviews.

EricWF added a comment.Mar 3 2022, 7:00 AM

I also agree with @ldionne's comment about this being potentially ABI breaking because of how std::string is externally instantiated.

I'm not OK with this change given that the description indicates a lack of understanding of the current situation.

Then tell me what I'm not understanding.

I also agree with @ldionne's comment about this being potentially ABI breaking because of how std::string is externally instantiated.

Can you give a scenario where that is the case? Just saying 'could be an ABI break' isn't an argument.

it's possible that already compiled code has the old definition of e.g. __init inlined, and hence the capacity of the string is set to N + 1. If such a string is passed to code compiled with the new definition, the new code would think that the capacity represents N, not N + 1, and would most likely mis-behave. In other words, I think this is ABI breaking.

I'm not changing what is saved in long cap. That has always been the capacity including the null terminator AFAICT. Or is there another potential ABI break that I'm not seeing?

I believe what Louis is getting at is that any time you change the meaning of a function's arguments (where the function is called across an ABI boundary), that can be an ABI break. For example, if f is calling arctan(a, b), and then the maintainer changes the prototype of arctan from arctan(int x, int y) to arctan(int y, int x), that's an ABI break. Or if f is calling __set_long_cap(n+1), and then the maintainer changes the prototype of __set_long_cap from __set_long_cap(int nplus1) to __set_long_cap(int n), that's an ABI break. But in this case I've double-checked that both __set_long_cap and __recommend are hide-from-abi, thus never called across an ABI boundary, so I think that's not a concern here; and if we were really really paranoid, we could even change the names of some of these helper functions.

libcxx/include/string
3246–3249

FWIW, I would bet that this is not a "real bug"; I would bet the machine code is correct. But I feel moderately strongly that we shouldn't ship source code that looks this confusing.

It is also possible that there is a bug in this vicinity after all. See https://llvm.org/pr51816 .

! In D119628#3356907, @philnik wrote:

! In D119628#3348916, @ldionne wrote:
it's possible that already compiled code has the old definition of e.g. __init inlined, and hence the capacity of the string is set to N + 1. If such a string is passed to code compiled with the new definition, the new code would think that the capacity represents N, not N + 1, and would most likely mis-behave. In other words, I think this is ABI breaking.

I'm not changing what is saved in long cap. That has always been the capacity including the null terminator AFAICT. Or is there another potential ABI break that I'm not seeing?

Ahhhhhh, that's actually what I missed. I saw the change from __set_long_cap(__cap+1); to __set_long_cap(__cap);, and I went "hmm this changes the meaning of the capacity held inside the string, so that's an ABI break". But in reality, it doesn't, because __recommend returns 1 more than it used to after applying your patch. So in both cases we are setting the capacity to the same value. Please double-check my reasoning, and if we're on the same page, then thanks for correcting me -- I don't think there's any ABI break in this patch.

libcxx/include/string
3246–3249

Let me reformulate my ask: @philnik (or someone else), can you please explain why this is correct:

if (__target_capacity == capacity() + 1)
  return;

It seems to me that it should instead be

if (__target_capacity == capacity())
  return;

I suspect I'll be much happier shipping this patch once I understand it (and it will probably mandate a comment since it's not trivial to understand).

@philnik explained what I was misunderstanding on Discord. I believe this patch is correct, and the code might be easier to understand if we clarify a few things in the code, like I suggest below.

Basically, what I understand is that when Howard initially wrote string, he made __recommend() exclude the null terminator so that it is consistent with capacity(). However, the capacity stored in the string object always included the null-terminator, which led to having to adjust the result of __recommend() in a bunch of places. This patch moves the inconsistency: it makes __recommend() and the stored capacity consistent, at the cost of making __recommend() and capacity() inconsistent with each other, needing a couple of adjustments (two to be exact, where I requested some comments). Given this, I don't think this patch is an absolute must-do -- we're just shifting complexity around. However, I do feel like it does reduce the complexity overall, so I'm favourable.

libcxx/include/string
933–938
1498

We should add a comment here (I would have added it to the actual data member, but it's hard because it's defined in the weirdest way) saying something like this:

// Note that the capacity we store always includes the null-terminator.
1542–1544

Can you please add a comment explaining what this returns? I.e. something like

// Returns the recommended capacity to store a string of size __s, including the null-terminator.
// Note that unlike __recommend(), string::capacity() returns the capacity *excluding* the null-terminator.
3246–3247

WDYT?

3258–3259
philnik updated this revision to Diff 412787.Mar 3 2022, 11:30 AM
philnik marked 10 inline comments as done.
  • Address comments
ldionne accepted this revision as: ldionne.Mar 3 2022, 12:02 PM

This looks correct to me, and I think it's an improvement in clarity over the status quo. LGTM, but I'd like someone else to also take a close look since this is std::string, i.e. one of the most used classes in the library.

Quuxplusone requested changes to this revision.Mar 3 2022, 3:30 PM
Quuxplusone added inline comments.
libcxx/include/string
2261–2264

(1) I'm still very unsure that this is correct. @philnik, can you maybe make up a cheat-sheet that shows which of these numbers will include +1 for the null terminator and which won't?

  • x = capacity()
  • __set_long_cap(x)
  • x = __recommend(...)
  • __grow_by(x, ...)
  • __grow_by_and_replace(x, ...)

Ideally, in the end, they'd all use the same meaning for "capacity" — which means it should not include the null terminator, because capacity() doesn't include the null terminator.

(2) On this diff in particular, aren't you accidentally failing to +1 the capacity when (__old_cap < __ms / 2 - __alignment)? Please investigate and add a regression test for what you find.

This revision now requires changes to proceed.Mar 3 2022, 3:30 PM
philnik added inline comments.Mar 3 2022, 3:56 PM
libcxx/include/string
2261–2264

(1)

functionx includes null-terminator?
x = capacity()N
__set_long_cap(x)Y
x = __recommend()Y
__grow_by(x, ...)N
__grow_by_and_replace(x, ...)N
x = __get_long_cap()Y

It's not possible for all of these to have the same meaning, because of ABI. We can't change __grow_by{, _and_replace} and __{get, set}_long_cap() because of ABI and capacity() because the standard demands different behavior.

(2) Do you mean the false branch? In that case I think it should be __ms. I'll add a regression test.

philnik abandoned this revision.Jun 26 2022, 12:46 AM