Page MenuHomePhabricator

[SCEV] Don't require ControlsExit for gt/lt NoWrap
Changes PlannedPublic

Authored by nikic on May 2 2021, 9:46 AM.

Details

Summary

When determining the exit count for gt/lt predicates for an AddRec with non-unit stride, we need to check whether the AddRec may overflow. NoWrap flags are used for this purpose, but currently only if ControlsExit is true. Among other things, this means that it must be a single-exit loop.

I believe the ControlsExit requirement is not necessary. To explain why, it's useful to consider the case where it is needed, which is for equality exit counts (https://github.com/llvm/llvm-project/blob/ec2e3e331e6d60dee77e4da83bbb192597069f15/llvm/lib/Analysis/ScalarEvolution.cpp#L9261). If we have something like {0,+,3}<nw> == 4, then it would be not correct to just return 4 u/ 3 as the exit count: It could be that a prior normal or abnormal exit exits before the UB branch on poison from the <nw> flag violation is encountered. It would be wrong to upper-bound the trip count using this exit.

The same issue doesn't apply to gt/lt predicates. If we have something like {0,+,2}<nuw> < Len then the loop will exit after ((1 + Len) /u 2) iterations, if it exits through that exit. It could also exit through an earlier (normal or abnormal) exit that might prevent a poison value from reaching the branch, but that ultimately doesn't matter.

Diff Detail

Event Timeline

nikic created this revision.May 2 2021, 9:46 AM
nikic requested review of this revision.May 2 2021, 9:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2021, 9:46 AM
reames added a comment.EditedMay 5 2021, 12:53 PM

I think this code may simply be too strict in general. Let me explain my reasoning.

For each exit, we are only computing the exit count for the loop *if and only if* that exit is taken. The fact that the loop might exit through another exit previously is handled by the fact we take the min of all exit counts. We have an invariant in this code that all cached exit counts are for exits which dominate the backedge (and thus are guaranteed not to be bypassed).

If we have an exit which depends on poison which dominates the backedge, we know that said loop either a) returns through another exit before that one on the final iteration, b) returns through another exit on a previous iteration, or b) is undefined.

I think the single exit check can be replaced with a check for whether the current exit dominates the backedge. One point of remaining confusion is why the check for abnormal exits at all, SCEVs exit counts only describe exits through explicit exits. This seems to have been added in c7f69b92, which I think may simply be wrong?

I mention this mostly because I didn't follow your reasoning about equality vs inequalities. :)

reames added a subscriber: aqjune.May 5 2021, 12:56 PM

One thing we need to be careful of is that we've been inconsistent about treating branch on poison as full UB. I know @aqjune has been working on that, but we might not be ready to be much more aggressive here yet. I'll let him comment.

nikic added a comment.May 5 2021, 1:28 PM

I think this code may simply be too strict in general. Let me explain my reasoning.

For each exit, we are only computing the exit count for the loop *if and only if* that exit is taken. The fact that the loop might exit through another exit previously is handled by the fact we take the min of all exit counts.

This is correct at a high level, but the details are a bit subtle. In particular, exactly because we take the min of all exit counts, we cannot under-estimate the exit count of an exit that is never taken.

As I didn't do a good job of explaining this, let's take the example from PR28012:

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  %iv.inc = add nsw i32 %iv, 3
  call void @may_exit()
  %becond = icmp ne i32 %iv.inc, 46
  br i1 %becond, label %loop, label %leave

Reporting 46 u/ 3 (which does not divide evenly) as the exit count would be incorrect. @may_exit() could bail out somewhere after iteration 46 /u 3, but before the iteration where nsw causes poison. To perform poison -> UB reasoning in this specific case, we can't have any exits (normal or abnormal) before the branch.

The same issue doesn't apply to a ult predicate, because we can't end up under-estimating the exit count. We definitely can't do more iterations than that exit count (if latch dominating), though we could do less (through a different exit).

We have an invariant in this code that all cached exit counts are for exits which dominate the backedge (and thus are guaranteed not to be bypassed).

I think we do compute all exit counts, but don't use the non-dominating ones when calculating the loop trip count (and also exclude them in various indvars optimizations).

aqjune added a comment.May 5 2021, 9:18 PM

One thing we need to be careful of is that we've been inconsistent about treating branch on poison as full UB. I know @aqjune has been working on that, but we might not be ready to be much more aggressive here yet. I'll let him comment.

The full fix isn't done yet (we have a few places in SimplifyCFG and loop opts where freeze is needed), but I believe it is those that should be fixed if there is any issue.

reames added a comment.EditedMay 19 2021, 11:02 AM

@nikic I'm going to try to expand a bit on my current thinking, but this may be easier to discuss on a call. Back and forth with instant clarification seems quite useful. :)

All of the following will use the same example you did. Specifically:
loop:

%iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
%iv.inc = add nsw i32 %iv, 3
call void @may_exit()
%becond = icmp ne i32 %iv.inc, 46
br i1 %becond, label %loop, label %leave

There are two interrelated sources of exit count information here:

  1. If the exit is actually taken.
  2. If the loop must hit UB on this exiting branch

As you pointed out, in this example without the nsw flag, this is an infinite loop. As such, part (a) contributes nothing to our analysis. Part (b) gives us an exit count of UINT32_MAX udiv 3 - because we can prove that a) poison is produced on that iteration and b) that poison reaches a UB triggering instruction.

In the current code, computeBECount also relies on wrapping being impossible (either by proof or by assumption) with it's reasoning about delta offsets. (I have separate reason to believe there's potentially a latent bug somewhere here, but haven't yet found it.) Note that this is still computing part (a) above, it isn't really computing (b) per se. This is one reason I'm really nervous about your change.

When reasoning, I think we need to carefully distinguish how we use part (b) facts. In particular, we need to distinguish the case where we know that UB would trigger on EC=x for this exit, and the case where the loop is guaranteed to exit through this exit (and thus *must* trigger UB). For the first, we can clearly report UINT32_MAX udiv 3 as an exit count for this exit without considering any other exit or the structure of the loop. For the second, we need the fact that this exit is the only exit and there are no abnormal exits to return anything less. But interestingly, if we can return anything less, we can also return any arbitrary value (e.g. 0) because we have now proven this loop *must* execute UB. (On this example, we can not do this last bit because @may_exit may be an abnormal exit.)

Hopefully I didn't make any critical mistakes in the above. This stuff is subtle and confusing.

nikic planned changes to this revision.Jun 8 2021, 1:16 PM

Okay, I think the change is not right after all. In particular, if we have an exit count like ((1 + %len) /u 2) that does represent the right exit count in arbitrary precision (via either exit or executed UB), but the expression for calculating the exit count itself may still overflow. If %len is UINT_MAX then the resulting exit count would be 0, while it should be UINT_MAX/2 (which is the iteration at which UB occurs). That's clearly not right.

If I'm understanding correctly, the issue you're describing is that if %len is ULONG_MAX in the following loop, on trunk LLVM, the backedge-taken count comes out to zero. This is because of the math in ScalarEvolution::computeBECount: we try to compute ceil(a/b) by converting it to floor((a + b - 1) / b), which doesn't work. There are different ways we could address this. For example, we could change around the computation: we can compute ceil(a/b) by instead expanding it to floor(a/b) + umin(a % b, 1).

I guess in the meantime, we should add the missing loopHasNoAbnormalExits() check.

declare void @maybe_abort()
define void @test1(i64 %len) {
start:
  br label %loop

loop:
  %iv = phi i64 [ 0, %start ], [ %iv.inc2, %loop ]
  %cmp1 = icmp ult i64 %iv, %len
  %iv.inc2 = add nuw i64 %iv, 2
  call void @maybe_abort()
  br i1 %cmp1, label %loop, label %exit

exit:
  ret void
}
nikic added a comment.Jun 9 2021, 10:11 AM

If I'm understanding correctly, the issue you're describing is that if %len is ULONG_MAX in the following loop, on trunk LLVM, the backedge-taken count comes out to zero.

Yes, exactly.

This is because of the math in ScalarEvolution::computeBECount: we try to compute ceil(a/b) by converting it to floor((a + b - 1) / b), which doesn't work. There are different ways we could address this. For example, we could change around the computation: we can compute ceil(a/b) by instead expanding it to floor(a/b) + umin(a % b, 1).

Right. However, I'd expect alternative encodings to be a pessimization in practice. We only really get in this situation if a is non-constant, so this won't be folding away.

I guess in the meantime, we should add the missing loopHasNoAbnormalExits() check.

Yeah, we should.

reames added a comment.Jun 9 2021, 2:12 PM

@nikic I'm increasingly thinking that the computeBECount logic is just wrong. I believe you can trip the overflow to zero case with a loop with explicit (correct) flags. There's some comments and a guard on negative steps which appear incorrect, and I'm guessing were an attempt to work around an existing bug.

I think we need to change the computeBECount function to correctly compute the trip count for the overflowing (computation) case, but doing that "cheap enough" has has me slightly stuck. I was hoping to get a patch up for this in the next day or two once I had some time to sit down and focus on that aspect.

reames added a subscriber: test.Jun 9 2021, 8:58 PM

Here is what I think an example of a bug in the current computeBECount. Overflow is complicated, so I might be wrong here. Sanity check me?

define void @test(i8 %n) {
entry:

br label %loop

loop:

%iv = phi i8 [0, %entry], [%iv.next, %loop]
%iv.next = add nuw i8 %iv, 127
%cmp = icmp ult i8 %iv, 254
br i1 %cmp, label %loop, label %exit

exit:

ret void

}

SCEV computes a backedge taken count for this loop of zero. I believe this is incorrect, and that this loop actually exits on the third iteration without executing UB.

On iteration #1:
%iv = 0
%iv.next = 127
%cmp = true

On iteration #2:
%iv = 127
%iv.next = 254
%cmp = true

On iteration #3:
%iv = 254
%iv.next = poison
%cmp = false

If we replace the exit test of 254 with an unknown %n, we can see why the result is wrong. We get:
((126 + %n) /u 127)

I believe this is the simplified form of:
((%n - 0 + 127 - 1) /u 127)

Which is what howManyLessThans + computeBECount should produce on this example if I'm reading it correctly.

Anything I'm missing here?

Looks like a real issue for strides that aren't a power of two.

doing that "cheap enough" has has me slightly stuck.

Spent a bit more time thinking about it. Maybe "umin(n, 1) + floor((n - umin(n, 1)) / stride)"?