This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Fix bug in ranges::advance and refactor the tests
ClosedPublic

Authored by ldionne on Jan 13 2022, 11:04 AM.

Details

Reviewers
Quuxplusone
Group Reviewers
Restricted Project
Commits
rGd27cbfa9d366: [libc++] Fix bug in ranges::advance
Summary

In ranges::advance(iter, n, bound), we'd incorrectly handle the case
where bound < iter and n is 0:

int a[10];
int *p = a+5;
int *bound = a+3;
std::ranges::advance(p, 0, bound);
assert(p - a == 5); // we'd return 3 before this patch

This was caused by an incorrect handling of 0 inside __magnitude_eq.

Diff Detail

Event Timeline

ldionne requested review of this revision.Jan 13 2022, 11:04 AM
ldionne created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2022, 11:04 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne added inline comments.Jan 13 2022, 11:06 AM
libcxx/include/__iterator/advance.h
75–76

I'm open to bikeshedding this implementation.

AFAICT, this implementation is a general-purpose magnitude->= implementation that doesn't suffer from overflow (I think).

We technically don't need something as general, but my preference would be to keep it general unless we have a noticeably more performant implementation, because keeping it general makes it easier to understand.

Quuxplusone added inline comments.
libcxx/include/__iterator/advance.h
158

I would prefer to inline it at its one call site, here, rather than generalize it unnecessarily.

Unfortunately https://eel.is/c++draft/range.iter.op.advance#6.1.1 seems to mandate that even when __n == 0, we need to evaluate __bound - __i and then, if that's also zero, we must __i = __bound instead of simply returning immediately with 0. Is there any blanket-wording loophole that gets us out of that obligation? This code could get much simpler if we were allowed to put if (__n == 0) return 0; on line 168.

Also, notice that if !(bidirectional_iterator<_Ip> && same_as<_Ip, _Sp>), then comparing __n <= __M is going to be superfluous — we know __n can't be negative unless (bidirectional_iterator<_Ip> && same_as<_Ip, _Sp>), so, if !(bidirectional_iterator<_Ip> && same_as<_Ip, _Sp>), it would be really nice not to generate any test for that possibility.

https://godbolt.org/z/jE66PEP4E — I haven't really looked at this code but it sure looks like libc++ is doing something suboptimal right now, doesn't it?

libcxx/test/std/iterators/iterator.primitives/range.iter.ops/range.iter.ops.advance/iterator_count_sentinel.pass.cpp
202

This is obviously better than it was. But I'd love to see some simple regression tests like we were doing on Discord:

int a[10];
int *p;
const int *pc;
using ConstPtr = const int*;

// forward with same type
p = a+5; assert(std::ranges::advance(p, 0, a+7) == 0); assert(p == a+5);
p = a+5; assert(std::ranges::advance(p, 1, a+7) == 0); assert(p == a+6);
p = a+5; assert(std::ranges::advance(p, 2, a+7) == 0); assert(p == a+7);
p = a+5; assert(std::ranges::advance(p, 3, a+7) == 1); assert(p == a+7);

// forward with different, non-assignable sentinel
p = a+5; assert(std::ranges::advance(p, 0, ConstPtr(a+7)) == 0); assert(p == a+5);
p = a+5; assert(std::ranges::advance(p, 1, ConstPtr(a+7)) == 0); assert(p == a+6);
p = a+5; assert(std::ranges::advance(p, 2, ConstPtr(a+7)) == 0); assert(p == a+7);
p = a+5; assert(std::ranges::advance(p, 3, ConstPtr(a+7)) == 1); assert(p == a+7);

// forward with different, assignable sentinel
pc = a+5; assert(std::ranges::advance(pc, 0, a+7) == 0); assert(pc == a+5);
pc = a+5; assert(std::ranges::advance(pc, 1, a+7) == 0); assert(pc == a+6);
pc = a+5; assert(std::ranges::advance(pc, 2, a+7) == 0); assert(pc == a+7);
pc = a+5; assert(std::ranges::advance(pc, 3, a+7) == 1); assert(pc == a+7);

// backward with same type
p = a+5; assert(std::ranges::advance(p, 0, a+3) == 0); assert(p == a+5);
p = a+5; assert(std::ranges::advance(p, -1, a+3) == 0); assert(p == a+4);
p = a+5; assert(std::ranges::advance(p, -2, a+3) == 0); assert(p == a+3);
p = a+5; assert(std::ranges::advance(p, -3, a+3) == -1); assert(p == a+3);

// backward with different type is UB so don't test it

This covers all the cases, I think. The only other thing that needs testing is whether we assign from Sent to It in all the right places (and not in the wrong places); that'll require a sentinel type with a user-defined operator= so we can see that it's being called.

philnik added inline comments.
libcxx/include/__iterator/advance.h
75–76

If you don't want to do the early returns at least put braces around the outer if and else.

CaseyCarter added inline comments.
libcxx/include/__iterator/advance.h
158

Unfortunately https://eel.is/c++draft/range.iter.op.advance#6.1.1 seems to mandate that even when __n == 0, we need to evaluate __bound - __i and then, if that's also zero, we must __i = __bound instead of simply returning immediately with 0. Is there any blanket-wording loophole that gets us out of that obligation? This code could get much simpler if we were allowed to put if (__n == 0) return 0; on line 168.

It is unfortunate that the wording requires the difference to be performed when n == 0, and the assignment to be performed even when n == 0 and last - first == 0, which implies first == last already. MSVC does compute the distance (I think n == 0 is a weird enough corner not to be worth the cost of checking) but not the assignment when n == 0. I suggest you implement what you think is most performant, and file an LWG issue to allow your implementation and ours.

ldionne updated this revision to Diff 402937.Jan 25 2022, 8:56 AM
ldionne marked 3 inline comments as done.

Address review comments

I didn't look closely but noticed a style issue.

libcxx/include/__iterator/advance.h
153
ldionne updated this revision to Diff 402953.Jan 25 2022, 9:56 AM
ldionne marked an inline comment as done.

Uglify parameters and submit comments I had forgotten to submit.

ldionne added inline comments.
libcxx/include/__iterator/advance.h
153

Ugh of course, thanks.

158

I would prefer to inline it at its one call site, here, rather than generalize it unnecessarily.

After looking at the code generation and seeing that we are indeed paying for the generality, I agree that we should special case it: https://godbolt.org/z/Ez6W9Yfej. I'll still use a lambda just to avoid having a super complicated inlined if condition. I tried various implementations and yours was basically the most efficient and correct one. Even using if-else instead of a ternary somehow produces slightly worse code.

[about evaluating __bound - __i when __n == 0]

I still think it's better not to generate a special check for __n == 0, since that should be fairly rare (as Casey mentioned).

https://godbolt.org/z/jE66PEP4E — I haven't really looked at this code but it sure looks like libc++ is doing something suboptimal right now, doesn't it?

I looked, and I think it boils down to libstdc++ not protecting against signed integer overflow like what we fixed back in D106735. @jwakely , you might be interested in taking a look, here's a distilled version of libc++ vs libstdc++: https://godbolt.org/z/aYbq1nxhf

libcxx/test/std/iterators/iterator.primitives/range.iter.ops/range.iter.ops.advance/iterator_count_sentinel.pass.cpp
202

The only other thing that needs testing is whether we assign from Sent to It in all the right places (and not in the wrong places); that'll require a sentinel type with a user-defined operator= so we can see that it's being called.

Actually, in order to test that, I would need to define a whole random access iterator with an operator= that takes a sentinel, and count the assignments from the iterator's operator=. Otherwise, to only define a sentinel, I have to fall back to some trick like the conversion operator I've used elsewhere, but it wouldn't be clean to count the number of times the sentinel gets converted to the iterator type as a proxy for counting the number of assignments. In other words, defining only the sentinel as a shortcut would be brittle IMO.

Given that the intent is for the Standard not to force us to evaluate those anyway, and given that we are already testing this "from the side" by counting operations with stride_counting_iterator to ensure that we don't increment when we should be assigning, I think there is diminishing returns in adding this test. Let me know if you agree -- if not, I'll add it.

Quuxplusone added inline comments.
libcxx/include/__iterator/advance.h
158

I continue to see no reason to name __magnitude_geq, nor to define __M as const ( https://quuxplusone.github.io/blog/2022/01/23/dont-const-all-the-things/ ), nor to define __M using C++20 if-syntax instead of making it an ordinary declaration on its own line. But whatevs.
Vice versa, I'd prefer to see lines 154-155 either done as four lines, or done as one ternary. It's basically a (Fortran ;)) three-way arithmetic IF: if a<0 do one thing, else if a==0 do another, else if a>0 do a third.

It occurs to me that (unless I screwed up the math) it can also be done branchless:

auto __M = __bound - __i;
if (((__n < 0) | (__n >= __M)) & ((__n > 0) | (__n <= __M))) {
    __i = std::move(__bound);
    return __n - __M;
}

__i += __n;
return 0;

but this does not seem like better codegen on GCC, and not a clear-cut winner on Clang either, so I'd go with what you've got.

libcxx/test/std/iterators/iterator.primitives/range.iter.ops/range.iter.ops.advance/iterator_count_sentinel.pass.cpp
202

I agree that the returns aren't worth it. I didn't test an instrumented operator- in D117940 either.

But please compare against my tests in D117940 and see if they trigger anything you want to add over here too.
Since ranges::distance became an LWG-worthy rabbit hole, it occurs to me that you should also test

// backward with array as sentinel
p = a+5; assert(std::ranges::advance(p, 0, a) == 0); assert(p == a+5);
p = a+5; assert(std::ranges::advance(p, -1, a) == 0); assert(p == a+4);
p = a+5; assert(std::ranges::advance(p, -5, a) == 0); assert(p == a);
p = a+5; assert(std::ranges::advance(p, -6, a) == -1); assert(p == a);

I expect this to be fine today (because the sentinel is taken by value and that should never ever change), but it'll protect against its getting broken tomorrow (when LWG decides to change the thing that shouldn't change ;)).

jwakely added inline comments.Jan 26 2022, 8:18 AM
libcxx/include/__iterator/advance.h
158

I suggest you implement what you think is most performant, and file an LWG issue to allow your implementation and ours.

Yes, please do. I don't care about n==0 but doing a pointless assignment for bound - i == 0 is pointless.

@jwakely , you might be interested in taking a look

Thanks, I'd missed the reflector thread about it last July.

ldionne updated this revision to Diff 403387.Jan 26 2022, 1:08 PM
ldionne marked 2 inline comments as done.

Address review comments

ldionne accepted this revision as: Restricted Project.Jan 26 2022, 1:08 PM

Will ship once CI passes

libcxx/include/__iterator/advance.h
158

@Quuxplusone

Ah yes, I meant to use a single ternary, that's the best codegen of the two. I'm not going to do it branchless though, it's too obscure.

Regarding using __magnitude_geq: it allows me to document what this long condition is doing and arguably makes the code easier to read.

libcxx/test/std/iterators/iterator.primitives/range.iter.ops/range.iter.ops.advance/iterator_count_sentinel.pass.cpp
202

I've added the "basic" tests you suggested here.

It looks like we could use SFINAE tests in all of prev, next and the various advance overloads, but it doesn't belong in this patch.

This revision is now accepted and ready to land.Jan 26 2022, 1:08 PM
This revision was automatically updated to reflect the committed changes.