This is an archive of the discontinued LLVM Phabricator instance.

[libc++][NFC] Remove goto from std::string
ClosedPublic

Authored by philnik on Dec 12 2021, 5:34 AM.

Details

Summary

Remove goto from std::string

Diff Detail

Event Timeline

philnik requested review of this revision.Dec 12 2021, 5:34 AM
philnik created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2021, 5:34 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone requested changes to this revision.Dec 12 2021, 8:24 AM
Quuxplusone added inline comments.
libcxx/include/string
1697–1704

We could make this

basic_string<_CharT, _Traits, _Allocator>& __null_terminate_at(value_type *__p, size_type __newsz)
{
    __set_size(__newsz);
    __invalidate_iterators_past(__newsz);
    traits_type::assign(__p[__newsz], value_type());
    return *this;
}

and then change about 10 callers to the appropriate choice out of

return __null_terminate_at(__p, __n);
__null_terminate_at(__get_pointer(), __n);
__null_terminate_at(__p, __sz + __n);
return __null_terminate_at(__p, __sz + (__n2 - __n1));

etc. The only friction I see to that is that some of these callers, e.g. append and insert, are currently skipping the call to __invalidate_iterators_past(__sz) because they know there aren't any valid iterators past the new size. So either we pessimize them in debug mode, or more likely we just leave them as-is and don't upgrade them to this new helper function.

1698–1699

This comment seems obsolete; I have no idea what "check" it's talking about. Let's remove the comment.

This revision now requires changes to proceed.Dec 12 2021, 8:24 AM
nilayvaish added inline comments.
libcxx/include/string
1698–1699

Maybe the UBSAN annotation on line 2988 and 3038. I am guessing this will get sanitizer's attention now. Should we annotate this function with _LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK?

philnik updated this revision to Diff 393772.Dec 12 2021, 11:45 AM
philnik marked 2 inline comments as done.

Applied Quuxplusone's suggestion

nilayvaish accepted this revision.Dec 12 2021, 12:04 PM
nilayvaish requested changes to this revision.Dec 12 2021, 12:11 PM
nilayvaish added inline comments.
libcxx/include/string
1697

The first param should be value_type* and update places where pointer is being to passed to pass value_type*.

This revision now requires changes to proceed.Dec 12 2021, 12:11 PM

LGTM now, although it'd be nice to find the answer to my question about pop_back and __erase_to_end's history, just in case it matters.

libcxx/include/string
1698–1699

Thanks, that's definitely it! Instead of keeping the ubsan macro at all, though, I believe we could just change new line 3020 to avoid the overflow: return __null_terminate_at(__p, __sz - (__n1 - __n2)); because at that point __n1 >= __n2, unless I'm mistaken.

2361

This can return __null_terminate_at(__p, __n) too. Tail-call for the win.

3001

Here and below: __sz + __n2 - __n1, right? I hope this caused some test failures! :)

3222

Huh. Anyone got any idea why the left-hand code was so... hand-inlined? Anything in the git log that sheds light on this? Ditto __erase_to_end below.

philnik updated this revision to Diff 393778.Dec 12 2021, 2:06 PM
philnik marked 4 inline comments as done.
  • Replaced null_terminate_at() with erase_to_end() in pop_back()
  • Added _LIBCPP_HIDE_FROM_ABI
  • Fixed bugs
nilayvaish accepted this revision.Dec 12 2021, 2:18 PM
nilayvaish added inline comments.
libcxx/include/string
3222

I think we want to avoid checking whether the string is long or short again and again. Would likely save a few cycles.

philnik marked 2 inline comments as done.Dec 12 2021, 2:21 PM
philnik added inline comments.
libcxx/include/string
3222

This code is part of the initial libc++ commit. Unless you know where to get any (svn?) log, I think this question can't be answered. The only thing I could guess is some sort of compiler limitation 12+ years ago.

ldionne accepted this revision.Dec 13 2021, 6:16 AM

LGTM with comment. Thanks for the refactoring, and happy this gets us closer to being constexpr friendly.

libcxx/include/string
3020
This revision is now accepted and ready to land.Dec 13 2021, 6:16 AM
Quuxplusone accepted this revision.Dec 13 2021, 8:55 AM
Quuxplusone added inline comments.
libcxx/include/string
3001

Same issue as line 3020: either __sz + (__n2 - __n1) or __sz - __n1 + __n2.

3020

Or "match line 3023" should also work, I think: __sz - __n1 + __n2
The important thing is not to let __sz + __n2 be a subexpression, because it might wrap around, triggering that UBSan checker.

philnik updated this revision to Diff 394081.Dec 13 2021, 4:22 PM
philnik marked 4 inline comments as done.

Added paranthesis

This revision was automatically updated to reflect the committed changes.