This is an archive of the discontinued LLVM Phabricator instance.

[IndVarSimplify] Fixup nowrap flags during LFTR when moving to post-inc (PR31181)
ClosedPublic

Authored by nikic on Apr 20 2019, 5:38 AM.

Details

Summary

Fix for https://bugs.llvm.org/show_bug.cgi?id=31181. When LFTR moves a condition prior to increment to a condition after increment, it may depend on a value that is now poison due to nowrap flags. We need to make sure to drop the nowrap flags on the add in such a situation. I'm doing this by copying the nowrap flags that SCEV computed.

Once this issue is fixed, we can reenable computation of nowrap flags in CVP.

Diff Detail

Event Timeline

nikic created this revision.Apr 20 2019, 5:38 AM
hans added a comment.Apr 24 2019, 6:34 AM

Based on https://bugs.llvm.org/show_bug.cgi?id=31181#c9 it sounds like Sanjoy was the closest to a solution back then. Can you take a look?

sanjoy requested changes to this revision.May 4 2019, 1:17 PM

I'm not sure if the current fix is correct. I'd suggest doing one of the following:

  • Remove LFTR and deal with the fallout somehow
  • Change LFTR to strip out all no-wrap flags

We can lower the cost of the second option by moving LFTR to later in the optimization pipeline right before codegen since mostly mid level passes care about nowrap flags.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2382

I don't think the no-wrap flags on the SCEV expression can be propagated to the increment operation like this. I wrote up some background here: https://www.playingwithpointers.com/blog/scev-integer-overflow.html but in short, say your pre-increment SCEV expression is {S,+,X} then the post-inc SCEV expression, SE->getSCEV(CmpIndVar) down below, is {S+X,+,X}. Whether this is nsw/nuw has nothing to do with whether S+X can overflow -- all it says is that on all but the last iteration of the loop Add(CmpIndVar,X) will not overflow where CmpIndVar starts from S+X and is incremented by X on every iteration.

As a concrete example, say the pre-increment expression is {-1,+,1} and let's say the loop body executes 10 times. Then the post-inc IV is {0,+,1}, which is both nuw and nsw (the value of the IV is < 10 and adding 1 to it does not overflow). But you can't mark the increment operation as nuw because on the very first iteration it computes Add(-1,1) which unsigned overflows.

This revision now requires changes to proceed.May 4 2019, 1:17 PM
nikic updated this revision to Diff 198288.May 6 2019, 8:32 AM

Use both pre-inc and post-inc AddRec nowrap flags.

nikic marked an inline comment as done.May 6 2019, 8:38 AM
nikic added inline comments.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2382

Thanks, your blog post was very useful! You were also right that a nuw flag would be incorrectly added for your {-1,+,1} example.

Before moving on to more drastic measures, I'd like to suggest the approach in the updated patch: If I understood correctly, the problem here is basically that using the pre-increment addrec, we don't have a no-overflow guarantee on the last iteration (by semantics), while if we use the post-increment addrec, we don't have a no-overflow guarantee on the first iteration (because it's folded into the start value). However, if we combine overflow flags from both, we can cover the full range, including first and last iteration. Is that correct?

discovered an unfinished review that Sanjoy was doing to address PR31181: D30446
(mentioning just in case it appears to be useful)

reames added a subscriber: reames.May 14 2019, 4:10 PM

Reading through the discussion, it sounds like the only place where LFTR goes wrong is dealing with the post-increment form of the IV right?

If so, what about splitting LFTR into two parts. The first part does LFTR *without changing pre-to-post or vice versa*, and then a second part which converts to post-increment where possible. The value of the later seems quite questionable to me honestly, and if we have to restrict it greatly it wouldn't matter as much as having to restrict the first part. I'd also really like to be able to extend the first part to multiple exit loops whereas extending the second part seems actively harmful on some examples.

nikic added a comment.May 15 2019, 1:05 AM

Reading through the discussion, it sounds like the only place where LFTR goes wrong is dealing with the post-increment form of the IV right?

If so, what about splitting LFTR into two parts. The first part does LFTR *without changing pre-to-post or vice versa*, and then a second part which converts to post-increment where possible. The value of the later seems quite questionable to me honestly, and if we have to restrict it greatly it wouldn't matter as much as having to restrict the first part. I'd also really like to be able to extend the first part to multiple exit loops whereas extending the second part seems actively harmful on some examples.

Unfortunately this is not the case. Based on the additional discussion in D30446 the problem is not just the switch from pre- to post-increment. In addition to that LFTR may switch to a (more canonical) IV that is dynamically dead and as such may be poison. The example in https://reviews.llvm.org/D30446#702870 illustrates such a case. I've checked that this patch also fixes that case (i.e. change the add nsw into an add nuw).

One case this patch currently doesn't address is the following variation:

define i32 @test(i32* %ptr, i1 %always_false) {
entry:
  br label %for.cond

for.cond:
  %iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
  %iv2 = phi i32 [ 0, %entry ], [ %iv2.inc, %always_taken ]
  store i32 %iv, i32* %ptr
  %cmp = icmp slt i32 %iv, 20
  br i1 %cmp, label %for.body, label %for.end

for.body:
  br i1 %always_false, label %never_taken, label %always_taken

never_taken:
  store volatile i32 %iv2, i32* %ptr
  br label %always_taken

always_taken:
  %iv.inc = add nsw i32 %iv, 1
  %iv2.inc = add nsw i32 %iv2, 1
  br label %for.cond

for.end:
  ret i32 0
}

This is the case where the exit block and the loop latch don't match and a pre-increment is used, and I'm currently only recomputing the flags in the post-increment case. When I tried to extend it to that case I ran into a snag with SCEV seemingly generating incorrect nowrap flags:

%iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
-->  {-2147483648,+,1}<nsw><%for.cond> U: [-2147483648,21) S: [-2147483648,21)Exits: 20		LoopDispositions: { %for.cond: Computable }
%iv2 = phi i32 [ 0, %entry ], [ %iv2.inc, %always_taken ]
-->  {0,+,1}<nuw><nsw><%for.cond> U: [0,-2147483648) S: [0,-2147483648)	Exits: -2147483628		LoopDispositions: { %for.cond: Computable }
%iv.inc = add nsw i32 %iv, 1
-->  {-2147483647,+,1}<nsw><%for.cond> U: [-2147483647,22) S: [-2147483647,22)Exits: 21		LoopDispositions: { %for.cond: Computable }
%iv2.inc = add nsw i32 %iv2, 1
-->  {1,+,1}<nuw><%for.cond> U: [1,-2147483626) S: [1,-2147483626)		Exits: -2147483627		LoopDispositions: { %for.cond: Computable }

Here %iv2 is {0,+,1}<nuw><nsw>, while it should be {0,+,1}<nuw>, unless I'm misunderstanding something about the validity constraints on these flags (%iv2.inc is correct though). I haven't looked into what happens here yet.

So, while I think that the switch from pre-inc to post-inc might be the part that is most problematic in practice (and prevents us from enabling CVP optimizations), it's not really the fundamental problem in LFTR (which is reusing an IV while ignoring poison flags).

FYI, I'm slowly building a mental model for LFTR and it's approach and limitations. As part of that, I've been checking in a set of tests changes, NFC code motion, and strengthened assertions. You may find them interesting. I'm a couple of days at current rate of progress from being able to meaningfully respond to your comments above.

nikic updated this revision to Diff 200356.May 20 2019, 2:08 PM

Handle pre-increment case as well, improve comment, rebase over additional tests.

reames requested changes to this revision.May 28 2019, 4:00 PM

Ok, after reading through the code and trying out a couple of ideas, I'm fairly sure I understand what's going on here. There's actually (at least) three distinct bugs in the same logic. The core thing to keep in mind is that we truncate the IV to the backedge taken type, then perform the math on that type. The trivial alternative would be to extend both to the same type, but that results in markedly worse codegen. (Still exploring whether that's fundamental.)

The first bug is that if we have a backedge taken count of uint_max<T>, then adding 1 results in 0. This results in the truncated limit instruction simply being the trunc(IV.Start) which while correct for the exit value, is *also* the starting value. I haven't been able to write a test case to demonstrate this yet.

The second bug is that once we (correctly) compute the exit value, we rewrite the exit test as IV + 1 Cmp Limit. The problem is that poison is defined such that *branching on* a poison result is what triggers UB. As such, we could have had an nsw IV whose post-increment was poison w/o being UB (since we tested against the pre-increment which wasn't). As such, unless we can prove that (IV+1) isn't poison, we have three choices:

  1. Don't convert the post increment, but do the pre-increment version only.
  2. Strip nsw/nuw from the increment. (This is what the current patch does.)
  3. Bail out of the transform in this case entirely.

The third is the dead IV case mentioned in patch comments. Good catch, I'd missed that one until rereading the patch again. Nikita, can you list a specific test case which exercises this? I'd have thought that hasConcreteDef case would have caught this?

At the moment, I believe the patch posted is correct, but is unnecessarily conservative. Instead of only stripping nsw/nuw where necessary, it *always* strips. A simple, but noteworthy improvement would be to use SCEV's getSignedRange/getUnsignedRange to ensure the increment doesn't overflow and then only stripping on cases where it does.

I'd also prefer to see us simply refuse to switch to post-increment rather than stripping the nsw/nuw. That seems like a generally better tradeoff for a middle-end transform. LSR (or some other late transform) could do the stripping form if needed. (I doubt it would be.)

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2402

This test can never fail given the definition of LoopCounter used.

This revision now requires changes to proceed.May 28 2019, 4:00 PM
nikic marked an inline comment as done.May 29 2019, 12:33 AM

Ok, after reading through the code and trying out a couple of ideas, I'm fairly sure I understand what's going on here. There's actually (at least) three distinct bugs in the same logic. The core thing to keep in mind is that we truncate the IV to the backedge taken type, then perform the math on that type. The trivial alternative would be to extend both to the same type, but that results in markedly worse codegen. (Still exploring whether that's fundamental.)

The first bug is that if we have a backedge taken count of uint_max<T>, then adding 1 results in 0. This results in the truncated limit instruction simply being the trunc(IV.Start) which while correct for the exit value, is *also* the starting value. I haven't been able to write a test case to demonstrate this yet.

The second bug is that once we (correctly) compute the exit value, we rewrite the exit test as IV + 1 Cmp Limit. The problem is that poison is defined such that *branching on* a poison result is what triggers UB. As such, we could have had an nsw IV whose post-increment was poison w/o being UB (since we tested against the pre-increment which wasn't). As such, unless we can prove that (IV+1) isn't poison, we have three choices:

  1. Don't convert the post increment, but do the pre-increment version only.
  2. Strip nsw/nuw from the increment. (This is what the current patch does.)
  3. Bail out of the transform in this case entirely.

The third is the dead IV case mentioned in patch comments. Good catch, I'd missed that one until rereading the patch again. Nikita, can you list a specific test case which exercises this? I'd have thought that hasConcreteDef case would have caught this?

These are the switch_to_different_iv_pos_inc and switch_to_different_iv_pre_inc test cases. I'm assuming you mean AlmostDeadIV here rather than hasConcreteDef. AlmostDeadIV doesn't trigger here because the IV is only dynamically dead, it has static uses.

hasConcreteDef ostensibly is supposed to prevent the problem that this patch deals with in the first place, which is the IV becoming undef. However it does this in a rather naive manner and in particular does not consider poison flags. (And just checking for poison flags would pessimize this a lot -- just because they are present does not mean that it will actually become poison.)

At the moment, I believe the patch posted is correct, but is unnecessarily conservative. Instead of only stripping nsw/nuw where necessary, it *always* strips. A simple, but noteworthy improvement would be to use SCEV's getSignedRange/getUnsignedRange to ensure the increment doesn't overflow and then only stripping on cases where it does.

No, this patch only strips where necessary. It uses the nowrap flag computed by SCEV on the post-inc addrec, which are determined through a number of ways, including both range analysis, as well as nowrap flags on the original IR in conjunction with poison based reasoning. From my experiments this analysis seems to be pretty good, which is also why there is no fallout on existing tests.

See test_no_drop_nuw and test_no_drop_nsw for cases where flags are not dropped based on range analysis in particular, despite a change to post-inc IV.

I'd also prefer to see us simply refuse to switch to post-increment rather than stripping the nsw/nuw. That seems like a generally better tradeoff for a middle-end transform. LSR (or some other late transform) could do the stripping form if needed. (I doubt it would be.)

Sounds reasonable, I'll give this a try later.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2402

This can also be a GEP.

(Separating responses into two threads since these are largely separate points now.)

The third is the dead IV case mentioned in patch comments. Good catch, I'd missed that one until rereading the patch again. Nikita, can you list a specific test case which exercises this? I'd have thought that hasConcreteDef case would have caught this?

These are the switch_to_different_iv_pos_inc and switch_to_different_iv_pre_inc test cases. I'm assuming you mean AlmostDeadIV here rather than hasConcreteDef. AlmostDeadIV doesn't trigger here because the IV is only dynamically dead, it has static uses.

Ah, yes. I see your point. We're still adding a branch on a value which previously could have been poison without triggering UB. Dang, that's a much more general problem than the post-increment one I'd been mostly focused on.

So, how about this? What if we implemented the following:

  1. When changing IVs, always clear the flags. All of the other options (selective clearing, changing hasConcreteDef, bailing out) appear difficult to implement. If we can get away without doing so performance wise, we should.
  2. When reusing the same IV, only switch to post increment if we can do so without clearing flags.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2402

You are correct... Though, does that mean we have the same problem with inbounds? I think it may actually.

(Response part 2)

At the moment, I believe the patch posted is correct, but is unnecessarily conservative. Instead of only stripping nsw/nuw where necessary, it *always* strips. A simple, but noteworthy improvement would be to use SCEV's getSignedRange/getUnsignedRange to ensure the increment doesn't overflow and then only stripping on cases where it does.

No, this patch only strips where necessary. It uses the nowrap flag computed by SCEV on the post-inc addrec, which are determined through a number of ways, including both range analysis, as well as nowrap flags on the original IR in conjunction with poison based reasoning. From my experiments this analysis seems to be pretty good, which is also why there is no fallout on existing tests.

See test_no_drop_nuw and test_no_drop_nsw for cases where flags are not dropped based on range analysis in particular, despite a change to post-inc IV.

Yeah, you're right I simply misread the code. You are using the flags off the AR.

At first, I thought your placement was problematic (since it might use cached information), but after further digging, I've convinced myself that SCEV would return a result which must be valid in this case. SCEV appears to only set flags if those flags would be valid for *all locations* within the loop. Given we haven't inserted the new use yet (the compare), the analysis can't go wrong based on the presence of the nsw/nuw bits either. So, I think what you have here is actually 100% correct. :)

Hm, elsewhere in this thread, I'd encouraged you not to do the pre-to-post conversion if it causes us to drop flags. I think I'm going to drop that request given the further complexities involved in changing IVs. Let's go with the simple scheme (yours of dropping flags with re-inference) and only come back to the more complicated ones if performance shows us we need to. Let's just focus on getting a correct bug fix checked in for the moment.

Given that change in strategy, we only have two open items before I can LGTM this. 1) is the GEP/inbounds question and 2) the TODO question about hasConcreteDef.

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2397

I don't think this is possible given hasConcreteDef. It ensures the value on the first iteration is non-poison.

nikic marked an inline comment as done.May 29 2019, 2:15 PM
nikic added inline comments.
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
2397

hasConcreteDef() ensures that it's not poison on entry into the loop, but doesn't make sure that it's non-poison after the first iteration. The post-inc addrec only applies after the first iteration. Here is an example (CHECK lines are with this patch):

define i32 @switch_to_different_iv_first_poison(i32* %ptr, i1 %always_false) {
; CHECK-LABEL: @switch_to_different_iv_first_poison(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br label [[FOR_COND:%.*]]
; CHECK:       for.cond:
; CHECK-NEXT:    [[IV2:%.*]] = phi i32 [ -1, [[ENTRY:%.*]] ], [ [[IV2_INC:%.*]], [[ALWAYS_TAKEN:%.*]] ]
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ -2147483648, [[ENTRY]] ], [ [[IV_INC:%.*]], [[ALWAYS_TAKEN]] ]
; CHECK-NEXT:    store i32 [[IV]], i32* [[PTR:%.*]]
; CHECK-NEXT:    br i1 [[ALWAYS_FALSE:%.*]], label [[NEVER_TAKEN:%.*]], label [[ALWAYS_TAKEN]]
; CHECK:       never_taken:
; CHECK-NEXT:    store volatile i32 [[IV2]], i32* [[PTR]]
; CHECK-NEXT:    br label [[ALWAYS_TAKEN]]
; CHECK:       always_taken:
; CHECK-NEXT:    [[IV2_INC]] = add nuw nsw i32 [[IV2]], 1
; CHECK-NEXT:    [[IV_INC]] = add nsw i32 [[IV]], 1
; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i32 [[IV2_INC]], -2147483628
; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_COND]], label [[FOR_END:%.*]]
; CHECK:       for.end:
; CHECK-NEXT:    ret i32 0
;
entry:
  br label %for.cond

for.cond:
  %iv2 = phi i32 [ -1, %entry ], [ %iv2.inc, %always_taken ]
  %iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
  store i32 %iv, i32* %ptr
  br i1 %always_false, label %never_taken, label %always_taken

never_taken:
  store volatile i32 %iv2, i32* %ptr
  br label %always_taken

always_taken:
  %iv2.inc = add nuw nsw i32 %iv2, 1
  %iv.inc = add nsw i32 %iv, 1
  %cmp = icmp slt i32 %iv, 20
  br i1 %cmp, label %for.cond, label %for.end

for.end:
  ret i32 0
}

As you can see, we switched to %iv2.inc, but did not drop any nowrap flags.

Unfortunately I found out that there seems to be another issue that affects not just the first iteration. Here is a small variation on the above test case, starting from -2 instead of -1:

define i32 @switch_to_different_iv_second_poison(i32* %ptr, i1 %always_false) {
; CHECK-LABEL: @switch_to_different_iv_second_poison(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br label [[FOR_COND:%.*]]
; CHECK:       for.cond:
; CHECK-NEXT:    [[IV2:%.*]] = phi i32 [ -2, [[ENTRY:%.*]] ], [ [[IV2_INC:%.*]], [[ALWAYS_TAKEN:%.*]] ]
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ -2147483648, [[ENTRY]] ], [ [[IV_INC:%.*]], [[ALWAYS_TAKEN]] ]
; CHECK-NEXT:    store i32 [[IV]], i32* [[PTR:%.*]]
; CHECK-NEXT:    br i1 [[ALWAYS_FALSE:%.*]], label [[NEVER_TAKEN:%.*]], label [[ALWAYS_TAKEN]]
; CHECK:       never_taken:
; CHECK-NEXT:    store volatile i32 [[IV2]], i32* [[PTR]]
; CHECK-NEXT:    br label [[ALWAYS_TAKEN]]
; CHECK:       always_taken:
; CHECK-NEXT:    [[IV2_INC]] = add nsw i32 [[IV2]], 1
; CHECK-NEXT:    [[IV_INC]] = add nsw i32 [[IV]], 1
; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i32 [[IV2_INC]], -2147483629
; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_COND]], label [[FOR_END:%.*]]
; CHECK:       for.end:
; CHECK-NEXT:    ret i32 0
;
entry:
  br label %for.cond

for.cond:
  %iv2 = phi i32 [ -2, %entry ], [ %iv2.inc, %always_taken ]
  %iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
  store i32 %iv, i32* %ptr
  br i1 %always_false, label %never_taken, label %always_taken

never_taken:
  store volatile i32 %iv2, i32* %ptr
  br label %always_taken

always_taken:
  %iv2.inc = add nuw nsw i32 %iv2, 1
  %iv.inc = add nsw i32 %iv, 1
  %cmp = icmp slt i32 %iv, 20
  br i1 %cmp, label %for.cond, label %for.end

for.end:
  ret i32 0
}

In this case the nuw flag is correctly dropped, but the nsw flag stays. I think this might be related to the fact that for the pre-inc addrec nowrap flags from IR are adopted unconditionally, and we may end up inferring something incorrect for the post-inc addrec from that (e.g. because it over-constrains ranges). I get the impression that SCEV might just not be prepared to deal with dynamically dead IVs.

reames accepted this revision.May 31 2019, 10:17 AM

Nikita and I talked offline. We decided that both the dead IV handling and gep inbounds cases raised previously are arguably distinct bugs in this code. The current patch does appear to fix one real issue (lftr post-incr flags), and that getting that functional fix in is valuable in the near term. Given that, I'm approving the review as it stands, and we're going to continue the discussion of the other possible issues in email, and then separate reviews for each.

This revision was not accepted when it landed; it landed in state Needs Revision.Jun 1 2019, 2:38 AM
This revision was automatically updated to reflect the committed changes.