This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Remove the special logic for "noexcept iterators" in basic_string
ClosedPublic

Authored by Quuxplusone on Mar 12 2021, 7:59 PM.

Details

Summary
This reverts a large chunk of http://reviews.llvm.org/D15862 ,
and also fixes bugs in `insert`, `append`, and `assign`, which are now regression-tested.
(Thanks to Tim Song for pointing out the bug in `append`!)

Before this patch, we did a special dance in `append`, `assign`, and `insert`
(but not `replace`). All of these require the strong exception guarantee,
even when the user-provided InputIterator might have throwing operations.

The naive way to accomplish this is to construct a temporary string and
then append/assign/insert from the temporary; i.e., finish all the potentially
throwing InputIterator operations *before* starting to modify the `*this`
object. We need to do this in the following situations:

- If reallocation is needed, but reallocation would invalidate the original
    iterators. (Append-from-self, for example.)

- If reallocation is needed, but the original iterators' operations are
    throwing. We must provide the strong guarantee, and reallocation is
    an observable side effect, so if an iterator operation might throw,
    we must do all iterator operations prior to reallocating.

- If the original iterators' operations are throwing and we're going to
    be making any "irreversible" modifications to the string's data.
    Appending to the end of the string doesn't count as irreversible,
    because we can just restore the null byte and we're all good.
    Inserting in the middle of the string counts as irreversible simply
    because we haven't written a codepath to reverse it.

Most iterators have throwing (that is, non-noexcept) iterator operations.
Pointers and some libc++-provided iterators can mark themselves as
"trivial iterators" in order to get the non-naive behavior.

The old SFINAE condition attempted to check the specific iterator operations
++, ==, etc., even for user-provided iterators; but it was so complicated
that of course it was wrong; the new regression tests
`basic.string/string.modifiers/string_{insert,append}/strong_guarantee.pass.cpp`
both fail without this patch.

Finally, ADL-proof the call to `__ptr_in_range` and add a regression test.

( Background: https://stackoverflow.com/questions/66459162/where-do-standard-library-or-compilers-leverage-noexcept-move-semantics-other-t/66498981#66498981 )

Diff Detail

Event Timeline

Quuxplusone requested review of this revision.Mar 12 2021, 7:59 PM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald TranscriptMar 12 2021, 7:59 PM

Remove a stray assertion in the new strong_guarantee.pass.cpp.

tcanens added a subscriber: tcanens.EditedMar 14 2021, 5:32 PM

For append, we can always use the optimized approach, because if an iterator operation throws, we just restore the null terminator and we're back where we started, no problem.

Not if you have already reallocated the string and invalidated existing pointers and iterators.

Not if you have already reallocated the string and invalidated existing pointers and iterators.

Yikes!! Yes, you're right. I've added a new regression test for append (which fails before this patch, and also failed with the previous iteration of this patch). Am I adding too many epicycles to append? Should we just do the libstdc++ thing and make these functions always allocate a temporary string? Perhaps "anybody who cares that much about performance should just use the pointer+length overloads of these methods, not the arbitrary-input-iterator overloads"?

UNSUPPORTED: no-exceptions on the new tests.

ldionne requested changes to this revision.Apr 12 2021, 11:40 AM

Sorry for taking so long, I had this open for a long time but I was struggling with the condition in basic_string::append.

I'd be curious to know how you stumbled upon this.

libcxx/include/string
2660

Nit: I find the parens around !__libcpp_is_trivial_iterator<_ForwardIterator>::value confusing, are you OK with removing them?

Also, I would add a short comment saying something like

// If we're appending from a range inside this string or if the iterators have non-trivial operations AND we need to allocate to complete the operation, make a temporary copy so we can provide the strong exception guarantee.

... or something like that. You're much better at English than I am, but I think a comment would be helpful since this line is quite packed with important information.

2820–2824

I'm not sure I understand this special case. Does it only save us from using a temporary string in case we're doing self assignment (line 2836 below) *and* inserting at the end of the string? If so, I think it's fine to leave it out and not special-case it.

This revision now requires changes to proceed.Apr 12 2021, 11:40 AM
Quuxplusone planned changes to this revision.Apr 12 2021, 12:05 PM
Quuxplusone added inline comments.
libcxx/include/string
2660

Re parens: I prefer never to write !x && y because I believe people often thinko that when they meant !(x && y). However, swapping them to give (y && !x) has no such issue, and in this case it's safe to write it that way, so I probably will do that.

Between this and your comment below, I think I need to revisit this patch and make sure I still understand the situations in which these codepaths are hit! :) I'll repost with more code comments.

libcxx/include/string
2660

I'd be curious to know how you stumbled upon this.

I stumbled upon it in the process of answering https://stackoverflow.com/questions/66459162/where-do-standard-library-or-compilers-leverage-noexcept-move-semantics-other-t/66498981#66498981
I didn't actually hit the bug in real life; I just looked at the code and said "this is doing wacky complicated stuff with user-defined types, I guarantee it's wrong" and lo and behold, it was wrong. :)

This is also the second time this week I have an excuse to link to https://quuxplusone.github.io/blog/2018/06/12/attribute-noexcept-verify/ ;) in that the bug here is basically of the shape "I'm testing operation X for noexceptness, but then going and doing operation Y instead."

Quuxplusone edited the summary of this revision. (Show Details)

Rebase. Add a code comment and refactor a condition. Remove the simple but strictly unnecessary special case for "insert at end of string." Completely rewrite the commit message again. Drive-by ADL-proofing.

/>>/> >/ in test

Quuxplusone marked 2 inline comments as done.Apr 15 2021, 10:31 AM
Quuxplusone added inline comments.
libcxx/include/string
517

Oops, left over from debugging. Will let this sit for now, to let buildkite run the tests; but this new include will go away.

Poke buildkite.

libcxx/include/string
2661

It occurred to me just now (and I've asked in #standardese on Slack) that this condition is insufficient to deal with the trailing null byte, which is accessible but not part of the range [data, data+size). Here are two problematic cases in the wild today:
https://godbolt.org/z/q6dnKEvqc (libstdc++ seems to fail but only on Clang, somehow)
https://godbolt.org/z/da7zq58c6 (libc++ fails)
If these cases are legitimate, then I might need to continue messing with this code.

libcxx/include/string
2661

Botheration! I don't think we can use any of the fast paths here.
https://godbolt.org/z/e8hTj5Ysf
http://eel.is/c++draft/string.modifiers#string.append-14 is super clear that the effects of s.append(first, last) must be the same as the naïve s.append(string(first, last)); and it's super easy to create a pathological iterator where *__first is not in the range [data, data+size] but *(__first+1) is in that range.

libcxx/include/string
2661

Oh yeah, duh. https://godbolt.org/z/TdhnTeq1Y
__libcpp_trivial_iterator ensures that the iterator operations are noexcept, but it completely fails to check that the conversion to char is noexcept, so we lose the strong guarantee there, as well. Sounds like we have a good excuse to kill off not only __libcpp_string_gets_noexcept_iterator<T> but also __libcpp_is_trivial_iterator<T>.

Well, this has certainly been a can of worms!

Quuxplusone edited the summary of this revision. (Show Details)

Nth time's the charm!
Add more regression tests, putting them next to the old tests this time instead of in new files.
Fix an off-by-one error in __ptr_in_range, and rename it to __addr_in_range — notice that addressof(*__first) is ill-formed when __first is a move_iterator, so we shuffle the addressof operation into the context where we know __t is an lvalue.

Quuxplusone retitled this revision from [libc++] Remove most of the special logic for "noexcept iterators" in basic_string to [libc++] Remove the special logic for "noexcept iterators" in basic_string.Apr 16 2021, 6:16 PM

Poke buildkite.

Fix the C++03 build: can't instantiate test_exceptions with local struct type Widget, so make it a non-local struct type.

The gift that keeps on giving... ASAN finds a memory leak in basic_string::__init — apparently __init is called from constructors, so if it throws, it is responsible for deallocating the buffer itself. This is now fixed.
https://buildkite.com/llvm-project/libcxx-ci/builds/2613#fc0d2573-8a43-4b02-b142-009a84ef9b06

libcxx/include/string
2161–2176

@ldionne @EricWF I'd appreciate some extra eyeballs on this part. This memory leak was detected by ASAN: https://buildkite.com/llvm-project/libcxx-ci/builds/2613#fc0d2573-8a43-4b02-b142-009a84ef9b06 I'm not real familiar with the role of basic_string::__init; this fix seems straightforward, but I wonder if there's a subtle reason someone left it off originally. Compare to lines 2118–2132 above.

@ldionne: The Twitter/Reddit reaction to my blog post ( https://quuxplusone.github.io/blog/2021/04/17/pathological-string-appends/ ) indicates that we can probably do something MUCH simpler than what we are doing even in this PR. However, I'd like to land this PR as-is and asap, being as it adds a bunch of new tests that any alternative solution would also have to be able to pass. Thus providing a nice baseline for me or anyone else who wants to pick up the "simplify further" part, down the road.

ldionne requested changes to this revision.Apr 26 2021, 5:59 AM

I remember you had a guard that would put back '\0' at the end of the string if the operation failed in a previous version of this patch. To be sure I understand, that was only relevant when you tried using the non-allocating fast path with iterators that don't have noexcept operations, which you don't do anymore (now those will allocate "naively"). Correct?

libcxx/include/string
640

I don't understand why only pointers to arithmetic types are considered trivial here?

1701

_LIBCPP_INLINE_VISIBILTY

2161–2176

My knee jerk reaction would be that it was left off because traits_type::assign was thought of as being non-throwing, and nobody thought that the iterator types themselves could throw in their operations. IOW, I would be tempted to say it's an oversight.

2479

You can't dereference __first here if __first == __last.

This revision now requires changes to proceed.Apr 26 2021, 5:59 AM
Quuxplusone marked 4 inline comments as done.Apr 26 2021, 7:19 AM

I remember you had a guard that would put back '\0' at the end of the string if the operation failed in a previous version of this patch.

That guard was too naïve; it failed to deal correctly with https://quuxplusone.github.io/blog/2021/04/17/pathological-string-appends/#self-appending-via-conversion-operators
The guard could reliably "undo" the unzeroing of the \0; but the problem was that during the iteration the user might ask to look at that \0 again, and if its value had already changed, then boom. It's not enough to "plan to undo our change to the \0"; we must actually never change the \0 (until the very last step). So, no more guard.

libcxx/include/string
640

We need to exclude value_types like struct Evil; i.e., the permissible value_types are non-class-types-convertible-to-_CharT. In practice I figured that boils down to "arithmetic types," since other non-class value_types (e.g. int*) won't be convertible to _CharT in the first place.
We could get cleverer with something like _BOOL_CONSTANT<!is_class<_Tp>::value>, but that's "cleverness" (which is how we got into the mess) and vastly diminishing returns.

2479

If we get to that subexpression, we know that __string_is_trivial_iterator<_ForwardIterator>::value (so __n is meaningful) and !(__cap >= __n) (so __n > __cap), so we know __n > 0, so we know __first != __last.

(FYI: We are still probably being non-conforming by dereferencing __first both here and below. I think we're mandated to evaluate *it only once for each it in the range. But that was a pre-existing issue, and I'd like to land this "bugfix" patch before worrying about the quantum-leap-simplification to a different algorithm.)

Quuxplusone marked 2 inline comments as done.

@ldionne's review comments.

ldionne accepted this revision.Apr 26 2021, 8:30 AM
ldionne added inline comments.
libcxx/include/string
2479

You're right in your analysis, I didn't take into account that you were short-circuiting the || if n == 0.

This revision is now accepted and ready to land.Apr 26 2021, 8:30 AM
libcxx/include/string