This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Remove recursion in basic_string::insert(const_iterator, ForwardIterator, ForwardIterator)
ClosedPublic

Authored by philnik on Feb 12 2022, 8:41 AM.

Details

Summary

__addr_in_range is a non-constexpr function, so we can't call it during constant evaluation.

Diff Detail

Event Timeline

philnik requested review of this revision.Feb 12 2022, 8:41 AM
philnik created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2022, 8:41 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
miscco accepted this revision.Feb 12 2022, 1:12 PM
miscco added a subscriber: miscco.

That was one of the things that pained me the most. We should really have something like __builtin_ranges_overlap without all the UB

I think this code looks correct enough, but it would be a lot less scary if you could just make __addr_in_range a non-constexpr function, instead of having it "lie" during constexpr evaluation and get us into infinite loops. E.g.:

if (__libcpp_is_constant_evaluated() || !__string_is_trivial_iterator<_ForwardIterator>::value || __addr_in_range(*__first)) {
    // Make a copy and then insert from the copy.
    const basic_string __temp(__first, __last, __alloc());
    return __insert_from_safe_copy(__n, __ip, __temp.begin(), __temp.end());
} else {
    return __insert_from_safe_copy(__n, __ip, __first, __last);
}

Note that __insert_from_safe_copy might be a better name than __insert_not_in_range, because being overlapping is just one of the many failure modes we need to guard against here.

Read (or re-read) https://quuxplusone.github.io/blog/2021/04/17/pathological-string-appends/ before deciding on this approach. "Burn the whole thing to the ground and rewrite it" might be a reasonable alternative, especially if the alternative is increasing the complexity of this already-complex code.

I think this code looks correct enough, but it would be a lot less scary if you could just make __addr_in_range a non-constexpr function, instead of having it "lie" during constexpr evaluation and get us into infinite loops. E.g.:

if (__libcpp_is_constant_evaluated() || !__string_is_trivial_iterator<_ForwardIterator>::value || __addr_in_range(*__first)) {
    // Make a copy and then insert from the copy.
    const basic_string __temp(__first, __last, __alloc());
    return __insert_from_safe_copy(__n, __ip, __temp.begin(), __temp.end());
} else {
    return __insert_from_safe_copy(__n, __ip, __first, __last);
}

Note that __insert_from_safe_copy might be a better name than __insert_not_in_range, because being overlapping is just one of the many failure modes we need to guard against here.

Read (or re-read) https://quuxplusone.github.io/blog/2021/04/17/pathological-string-appends/ before deciding on this approach. "Burn the whole thing to the ground and rewrite it" might be a reasonable alternative, especially if the alternative is increasing the complexity of this already-complex code.

What would you like to see changed, other than the name? I don't really see a way to make this code simpler.

Note that __insert_from_safe_copy might be a better name than __insert_not_in_range, because being overlapping is just one of the many failure modes we need to guard against here.

Read (or re-read) https://quuxplusone.github.io/blog/2021/04/17/pathological-string-appends/ before deciding on this approach. "Burn the whole thing to the ground and rewrite it" might be a reasonable alternative, especially if the alternative is increasing the complexity of this already-complex code.

What would you like to see changed, other than the name? I don't really see a way to make this code simpler.

Yeah, simplifying might not be possible, and on closer inspection is at least not low-hanging fruit. Microsoft's insert is about as complicated these days: https://github.com/microsoft/STL/blob/8096a27/stl/inc/xstring#L3437-L3455

And __addr_in_range is already non-constexpr; let's keep it that way. My comment was based on your commit message, which said "__addr_in_range cannot be known during constant evaluation, so it has to always return true during constant evaluation" — that's incorrect. __addr_in_range is simply a non-constexpr function; it doesn't return anything during constant evaluation because it can't be called. (And this is as it should be.) Please update the commit message to say something like "__addr_in_range is a non-constexpr function, so we shouldn't call it during constant evaluation" or something like that.

philnik updated this revision to Diff 408983.Feb 15 2022, 11:38 AM
  • rename to __insert_from_safe_copy
philnik edited the summary of this revision. (Show Details)Feb 15 2022, 11:40 AM

Woops, I just saw this after I landed 5c53afe5aac0d295a9345cd439c6caf3ac5eb8ba. You'll want to rebase onto main.

Quuxplusone accepted this revision.Feb 25 2022, 7:31 AM

LGTM; some remaining nits but they're all optional IIUC.

libcxx/include/string
1439

Nit: I suggest following the guideline "Use the most aggressive constexpr possible for internal details," and I see no reason this couldn't be constexpr in C++14, right?

1465–1466

Personally I wouldn't have touched these lines. But I don't really care.

2818

Should you make my suggested change right now? Or are you planning to make it in whatever subsequent PR adds _LIBCPP_CONSTEXPR_AFTER_CXX17 to this insert method?

This revision is now accepted and ready to land.Feb 25 2022, 7:31 AM
This revision was automatically updated to reflect the committed changes.
philnik marked 3 inline comments as done.