I'm not super-clear on how the original code worked (but AFAICT scev does the wrong thing on the test case); so please consider this more of an RFC.
Details
Diff Detail
Event Timeline
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1370 | Seems like we could still pick up some trivial cases (loops with only one exit block, only one backedge, etc.). Maybe we could do more with a dominance query? |
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1370 | I think we can prove overflow if the loop has only one exit. Having only one backedge does not give us anything though (the counterexample in this change also has only one backedge). That said, it is not clear to me how important this optimization is (for instance, removing the optimization does not fail any existing unit tests). I'd like to fix the bug first and check in a correct version of this optimization as a later change. I plan to generalize this part of SCEV to nuws so I'll be touching this area in the future anyway. |
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1370 | Well, I don't know either. Not affecting regressions tests, unfortunately, is not a good measure of how important it is. Please add the loop-has-one-exit check, and a regression test -- this handles some large fraction of performance-sensitive loops in general -- and otherwise, LGTM. (We should try not to regress trunk, even performance-wise, any more than necessary, even in between official releases, because most of us pull internal releases on offset schedules). |
Well, this is the usual problem of transferring nsw from one expression to another equivalent expression without checking control equivalence. I have no idea how you would break this without an early exit.
The purpose of this canonicalization step is to support LoopStrength reduce, which normalizes all of its induction variables so that pre and post-increment uses are associated with the same recurrence.
I agree there should be unit tests that are covering this if it is an important optimization. I think at the time there was no way to write stand-alone LSR tests.
You mentioned that you can replace this with a correct version. I would love to see that!
If you don't have a replacement for the broken code, then please just add a check that the "preincrement" recurrence must be incremented on every path that the "post-increment" recurrence will be used.
Something like: L->getExitBlock() == L->getLoopLatch()
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1367 | recurrence |
Actually, you could probably break this in theory without an early exit with a recurrence defined by a loop header phi that is not actually used within the loop except for the increment, then only used conditionally outside the loop. I say "in theory" because of the way the poison bit theoretically propagates. In SCEV, we basically assume undefined behavior the moment the value is used in a recurrence. That issue is much bigger than this bit of code though. You would really have to strip NSW/NUW from SCEV to change that assumption.
You mentioned that you can replace this with a correct version. I would love to see that!
I initially thought that this optimization was valid if the loop had at most one exiting block; but that is not correct. That one exiting block could be an early exit that leaves the loop before computing the pre-incremented induction variable.
If we make the assumption that a PHI node taking a poison value is UB (big assumption since poison values do not have consistent semantics yet) then I think we can do this optimization as long as we can prove that the backedge will be taken at least once.
Actually, you could probably break this in theory without an early exit with a recurrence defined by a loop header phi that is not actually used within the loop except for the increment, then only used conditionally outside the loop. I say "in theory" because of the way the poison bit theoretically propagates. In SCEV, we basically assume undefined behavior the moment the value is used in a recurrence. That issue is much bigger than this bit of code though. You would really have to strip NSW/NUW from SCEV to change that assumption.
I agree -- I think SCEV should stick with this stronger notion of nsw/nuw until LLVM's semantics gets a consistent description of what poison does.
Right, you'd need to phrase the condition as Andy suggested:
Something like: L->getExitBlock() == L->getLoopLatch()
(where you also need to make sure that both are not nullptr). If the exit block is also the latch block, then there is no early-exit problem.
(where you also need to make sure that both are not nullptr). If the exit block is also the latch block, then there is no early-exit problem.
It depends on how you interpret poison semantics -- if you pretend that an add nsw "cannot" overflow then the above is sufficient. But if you interpret poison to cause UB when you *use* the poison, then the above condition is not enough: you could exit from the latch block in the first iteration, and not have a use of the overflowed nsw add and hence could not conclude that the addition would not overflow (since having an unused overflowed nsw add is okay). In other words, the early exit could be as late as the latch block if the only use of the add nsw is the phi for the add recurrence:
define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit) { ; CHECK-LABEL: Classifying expressions for: @foo entry: %postinc.start = add i32 %start, 1 br label %loop loop: %idx = phi i32 [ %start, %entry ], [ %idx.inc, %loop ] %postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %loop ] %postinc.inc = add nsw i32 %postinc, 1 %postinc.sext = sext i32 %postinc to i64 %break.early = icmp slt i32 %postinc, %low.limit %idx.inc = add nsw i32 %idx, 1 br i1 %break.early, label %loop, label %early.exit early.exit: ret i64 %postinc.sext }
The only way out, AFAICT, is to prove that the backedge is taken at least once, because this means (assuming the pre-inc is {S,+,X}<nsw>) there has been a use of add nsw S, X in the phi node for the recurrence (even if the phi node has itself not been used). But depending on how you interpret poison semantics, that may not be enough either.
All of this reasoning is shaky because currently LLVM does not have a consistent semantics for poison. We may have to revisit this reasoning once it does :)
Here's what I'd like to do:
- For now, let's stick with your strongest interpretation, and assume the local check is sufficient.
- Add a loud WARNING comment explaining that the check is only correct if we assume the overflow cases of nsw are unreachable.
- Bring up this example in the llvmdev thread discussing poison semantics
FWIW, I believe that the strong interpretation is consistent with a confirming implementation of C++, for example, which allows a program to trap upon encountering undefined behavior (1.3.24), and overflowing an integer operation is undefined behavior (5p4); there is no requirement that the result be used in order for undefined behavior to result.
- For now, let's stick with your strongest interpretation, and assume the local check is sufficient.
- Add a loud WARNING comment explaining that the check is only correct if we assume the overflow cases of nsw are unreachable.
I'd consider this approach somewhat risky -- the rest of LLVM is
allowed to do transforms that break this assumption (i.e. they assume
arithmetic is not side-effecting). Consider a slightly modified
fragment of the test case with this patch (differences commented):
define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit, i32* %f) {
entry:
%postinc.start = add i32 %start, 1 br label %loop
loop:
%idx = phi i32 [ %start, %entry ], [ %idx.inc, %continue ] %postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %continue ] %postinc.inc = add nsw i32 %postinc, 1 %postinc.sext = sext i32 %postinc to i64 %break.early = icmp slt i32 %postinc, %low.limit store volatile i32 %idx, i32* %f ;; something to hold %idx in place br i1 %break.early, label %continue, label %early.exit
continue:
%idx.inc = add nsw i32 %idx, 1 br i1 %break.early, label %loop, label %exit ;; repeated use of %break.early
exit:
ret i64 0
early.exit:
ret i64 %postinc.sext
}
For the exact same situation / reason as in the test case,
SCEV(%postinc)->getStart() is not nsw.
Now if I run this through 'opt -simplifycfg -loop-rotate' I get
define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit, i32* %f) {
entry:
%postinc.start = add i32 %start, 1 br label %loop
loop: ; preds = %loop, %entry
%idx = phi i32 [ %start, %entry ], [ %idx.inc, %loop ] %postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %loop ] %postinc.inc = add nsw i32 %postinc, 1 %postinc.sext = sext i32 %postinc to i64 %break.early = icmp slt i32 %postinc, %low.limit store volatile i32 %idx, i32* %f %idx.inc = add nsw i32 %idx, 1 br i1 %break.early, label %loop, label %early.exit
early.exit: ; preds = %loop
%postinc.sext.lcssa = phi i64 [ %postinc.sext, %loop ] ret i64 %postinc.sext.lcssa
}
L->getExitingBlock() == L->getLoopLatch() is true for the loop and
we'll trigger the exact same bug even if we add that additional
constraint. This happens with the optimization removed (i.e. with
this present version of the change applied) so there is no circular
reasoning here.
(Btw, I've assumed Andy really meant `L->getExitingBlock() ==
L->getLoopLatch() and not L->getExitBlock() == L->getLoopLatch()`
since AFAICT the latter is trivially false -- the exit block is always
outside the loop while the latch is always inside).
Actually, pretending that a phi consuming poison is unreachable is
also risky since I'd expect loop-rerolling to break that assumption.
While this means there is no good answer at this time, pretending that
a phi consuming poison is UB has the advantage that scev already
assumes this; and we won't be introducing more tricky corner cases.
- Sanjoy
Added back the optimization with a condition on L->getExitingBlock() == L->getLoopLatch().
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1380 | I think that you need to make sure that both L->getExitingBlock() and L->getLoopLatch() are not nullptr (which could happen if there is no unique loop latch and also no unique exiting block). Otherwise, LGTM. |
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
1380 | You're right, that was a slip on my part; will fix and check in. Thanks for the review. :) |
This more aggressive version of this change that was checked in has caused a regression: http://llvm.org/bugs/show_bug.cgi?id=22641
Unless anyone has a better idea, I'll send in a more pessimistic version of this patch for review (something close to the initial version).
recurrence