This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Analysis/
-
Analysis/
6
ScalarEvolution.cpp
-
test/Analysis/ScalarEvolution/
-
Analysis/
-
ScalarEvolution/
-
bad-scev-prestart-nowrap.ll

Differential D7331

ScalarEvolution incorrectly assumes that the start of certain add recurrences don't overflow
ClosedPublic

Authored by sanjoy on Feb 2 2015, 1:24 AM.

Download Raw Diff

Details

Reviewers

majnemer
atrick
hfinkel

Commits

rGf2e931cae902: Bugfix: ScalarEvolution incorrectly assumes that the start of certain add…
rL228552: Bugfix: ScalarEvolution incorrectly assumes that the start of certain

Summary

I'm not super-clear on how the original code worked (but AFAICT scev does the wrong thing on the test case); so please consider this more of an RFC.

Diff Detail

Event Timeline

sanjoy updated this revision to Diff 19133.Feb 2 2015, 1:24 AM

sanjoy retitled this revision from to ScalarEvolution incorrectly assumes that the start of certain add recurrences don't overflow.

sanjoy updated this object.

sanjoy edited the test plan for this revision. (Show Details)

sanjoy added reviewers: atrick, majnemer.

sanjoy added a subscriber: Unknown Object (MLST).

ping!

hfinkel added a subscriber: hfinkel.Feb 6 2015, 5:41 PM

hfinkel added inline comments.

lib/Analysis/ScalarEvolution.cpp
1370	Seems like we could still pick up some trivial cases (loops with only one exit block, only one backedge, etc.). Maybe we could do more with a dominance query?

sanjoy added inline comments.Feb 6 2015, 5:51 PM

lib/Analysis/ScalarEvolution.cpp
1370	I think we can prove overflow if the loop has only one exit. Having only one backedge does not give us anything though (the counterexample in this change also has only one backedge). That said, it is not clear to me how important this optimization is (for instance, removing the optimization does not fail any existing unit tests). I'd like to fix the bug first and check in a correct version of this optimization as a later change. I plan to generalize this part of SCEV to `nuw`s so I'll be touching this area in the future anyway.

hfinkel accepted this revision.Feb 6 2015, 5:57 PM

hfinkel added a reviewer: hfinkel.

hfinkel added inline comments.

lib/Analysis/ScalarEvolution.cpp
1370	Well, I don't know either. Not affecting regressions tests, unfortunately, is not a good measure of how important it is. Please add the loop-has-one-exit check, and a regression test -- this handles some large fraction of performance-sensitive loops in general -- and otherwise, LGTM. (We should try not to regress trunk, even performance-wise, any more than necessary, even in between official releases, because most of us pull internal releases on offset schedules).

This revision is now accepted and ready to land.Feb 6 2015, 5:57 PM

Well, this is the usual problem of transferring nsw from one expression to another equivalent expression without checking control equivalence. I have no idea how you would break this without an early exit.

The purpose of this canonicalization step is to support LoopStrength reduce, which normalizes all of its induction variables so that pre and post-increment uses are associated with the same recurrence.

I agree there should be unit tests that are covering this if it is an important optimization. I think at the time there was no way to write stand-alone LSR tests.

You mentioned that you can replace this with a correct version. I would love to see that!

If you don't have a replacement for the broken code, then please just add a check that the "preincrement" recurrence must be incremented on every path that the "post-increment" recurrence will be used.

Something like: L->getExitBlock() == L->getLoopLatch()

lib/Analysis/ScalarEvolution.cpp
1367	recurrence

This revision now requires changes to proceed.Feb 6 2015, 6:36 PM

Actually, you could probably break this in theory without an early exit with a recurrence defined by a loop header phi that is not actually used within the loop except for the increment, then only used conditionally outside the loop. I say "in theory" because of the way the poison bit theoretically propagates. In SCEV, we basically assume undefined behavior the moment the value is used in a recurrence. That issue is much bigger than this bit of code though. You would really have to strip NSW/NUW from SCEV to change that assumption.

You mentioned that you can replace this with a correct version. I would love to see that!

I initially thought that this optimization was valid if the loop had at most one exiting block; but that is not correct. That one exiting block could be an early exit that leaves the loop before computing the pre-incremented induction variable.

If we make the assumption that a PHI node taking a poison value is UB (big assumption since poison values do not have consistent semantics yet) then I think we can do this optimization as long as we can prove that the backedge will be taken at least once.

Actually, you could probably break this in theory without an early exit with a recurrence defined by a loop header phi that is not actually used within the loop except for the increment, then only used conditionally outside the loop. I say "in theory" because of the way the poison bit theoretically propagates. In SCEV, we basically assume undefined behavior the moment the value is used in a recurrence. That issue is much bigger than this bit of code though. You would really have to strip NSW/NUW from SCEV to change that assumption.

I agree -- I think SCEV should stick with this stronger notion of nsw/nuw until LLVM's semantics gets a consistent description of what poison does.

In D7331#120156, @sanjoy wrote:

You mentioned that you can replace this with a correct version. I would love to see that!

I initially thought that this optimization was valid if the loop had at most one exiting block; but that is not correct. That one exiting block could be an early exit that leaves the loop before computing the pre-incremented induction variable.

Right, you'd need to phrase the condition as Andy suggested:

Something like: L->getExitBlock() == L->getLoopLatch()

(where you also need to make sure that both are not nullptr). If the exit block is also the latch block, then there is no early-exit problem.

(where you also need to make sure that both are not nullptr). If the exit block is also the latch block, then there is no early-exit problem.

It depends on how you interpret poison semantics -- if you pretend that an add nsw "cannot" overflow then the above is sufficient. But if you interpret poison to cause UB when you *use* the poison, then the above condition is not enough: you could exit from the latch block in the first iteration, and not have a use of the overflowed nsw add and hence could not conclude that the addition would not overflow (since having an unused overflowed nsw add is okay). In other words, the early exit could be as late as the latch block if the only use of the add nsw is the phi for the add recurrence:

define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit) {
; CHECK-LABEL: Classifying expressions for: @foo
 entry:
  %postinc.start = add i32 %start, 1
  br label %loop

 loop:
  %idx = phi i32 [ %start, %entry ], [ %idx.inc, %loop ]
  %postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %loop ]
  %postinc.inc = add nsw i32 %postinc, 1
  %postinc.sext = sext i32 %postinc to i64
  %break.early = icmp slt i32 %postinc, %low.limit
  %idx.inc = add nsw i32 %idx, 1
  br i1 %break.early, label %loop, label %early.exit

 early.exit:
  ret i64 %postinc.sext
}

The only way out, AFAICT, is to prove that the backedge is taken at least once, because this means (assuming the pre-inc is {S,+,X}<nsw>) there has been a use of add nsw S, X in the phi node for the recurrence (even if the phi node has itself not been used). But depending on how you interpret poison semantics, that may not be enough either.

All of this reasoning is shaky because currently LLVM does not have a consistent semantics for poison. We may have to revisit this reasoning once it does :)

In D7331#120174, @sanjoy wrote:

(where you also need to make sure that both are not nullptr). If the exit block is also the latch block, then there is no early-exit problem.

It depends on how you interpret poison semantics -- if you pretend that an add nsw "cannot" overflow then the above is sufficient. But if you interpret poison to cause UB when you *use* the poison, then the above condition is not enough: you could exit from the latch block in the first iteration, ...

All of this reasoning is shaky because currently LLVM does not have a consistent semantics for poison. We may have to revisit this reasoning once it does :)

Here's what I'd like to do:

For now, let's stick with your strongest interpretation, and assume the local check is sufficient.
Add a loud WARNING comment explaining that the check is only correct if we assume the overflow cases of nsw are unreachable.
Bring up this example in the llvmdev thread discussing poison semantics

FWIW, I believe that the strong interpretation is consistent with a confirming implementation of C++, for example, which allows a program to trap upon encountering undefined behavior (1.3.24), and overflowing an integer operation is undefined behavior (5p4); there is no requirement that the result be used in order for undefined behavior to result.

For now, let's stick with your strongest interpretation, and assume the local check is sufficient.

Add a loud WARNING comment explaining that the check is only correct if we assume the overflow cases of nsw are unreachable.

I'd consider this approach somewhat risky -- the rest of LLVM is
allowed to do transforms that break this assumption (i.e. they assume
arithmetic is not side-effecting). Consider a slightly modified
fragment of the test case with this patch (differences commented):

define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit, i32* %f) {
entry:

%postinc.start = add i32 %start, 1
br label %loop

loop:

%idx = phi i32 [ %start, %entry ], [ %idx.inc, %continue ]
%postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %continue ]
%postinc.inc = add nsw i32 %postinc, 1
%postinc.sext = sext i32 %postinc to i64
%break.early = icmp slt i32 %postinc, %low.limit
store volatile i32 %idx, i32* %f ;; something to hold %idx in place
br i1 %break.early, label %continue, label %early.exit

continue:

%idx.inc = add nsw i32 %idx, 1
br i1 %break.early, label %loop, label %exit ;; repeated use of %break.early

exit:

ret i64 0

early.exit:

ret i64 %postinc.sext

}

For the exact same situation / reason as in the test case,
SCEV(%postinc)->getStart() is not nsw.

Now if I run this through 'opt -simplifycfg -loop-rotate' I get

define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit, i32* %f) {
entry:

%postinc.start = add i32 %start, 1
br label %loop

loop: ; preds = %loop, %entry

%idx = phi i32 [ %start, %entry ], [ %idx.inc, %loop ]
%postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %loop ]
%postinc.inc = add nsw i32 %postinc, 1
%postinc.sext = sext i32 %postinc to i64
%break.early = icmp slt i32 %postinc, %low.limit
store volatile i32 %idx, i32* %f
%idx.inc = add nsw i32 %idx, 1
br i1 %break.early, label %loop, label %early.exit

early.exit: ; preds = %loop

%postinc.sext.lcssa = phi i64 [ %postinc.sext, %loop ]
ret i64 %postinc.sext.lcssa

}

L->getExitingBlock() == L->getLoopLatch() is true for the loop and
we'll trigger the exact same bug even if we add that additional
constraint. This happens with the optimization removed (i.e. with
this present version of the change applied) so there is no circular
reasoning here.

(Btw, I've assumed Andy really meant `L->getExitingBlock() ==
L->getLoopLatch() and not L->getExitBlock() == L->getLoopLatch()`
since AFAICT the latter is trivially false -- the exit block is always
outside the loop while the latch is always inside).

Actually, pretending that a phi consuming poison is unreachable is
also risky since I'd expect loop-rerolling to break that assumption.
While this means there is no good answer at this time, pretending that
a phi consuming poison is UB has the advantage that scev already
assumes this; and we won't be introducing more tricky corner cases.

Sanjoy

Added back the optimization with a condition on L->getExitingBlock() == L->getLoopLatch().

hfinkel added inline comments.Feb 8 2015, 11:33 AM

lib/Analysis/ScalarEvolution.cpp
1380	I think that you need to make sure that both L->getExitingBlock() and L->getLoopLatch() are not nullptr (which could happen if there is no unique loop latch and also no unique exiting block). Otherwise, LGTM.

sanjoy added inline comments.Feb 8 2015, 12:24 PM

lib/Analysis/ScalarEvolution.cpp
1380	You're right, that was a slip on my part; will fix and check in. Thanks for the review. :)

Closed by commit rL228552: Bugfix: ScalarEvolution incorrectly assumes that the start of certain (authored by sanjoy). · Explain WhyFeb 8 2015, 2:54 PM

This revision was automatically updated to reflect the committed changes.

Thanks and LGTM.

This more aggressive version of this change that was checked in has caused a regression: http://llvm.org/bugs/show_bug.cgi?id=22641

Unless anyone has a better idea, I'll send in a more pessimistic version of this patch for review (something close to the initial version).

sanjoy mentioned this in D7808: Fix bug 22641.Feb 21 2015, 12:00 AM

Revision Contents

Path

Size

lib/

Analysis/

ScalarEvolution.cpp

16 lines

test/

Analysis/

ScalarEvolution/

bad-scev-prestart-nowrap.ll

50 lines

Diff 19133

lib/Analysis/ScalarEvolution.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 1,350 Lines • ▼ Show 20 Lines	static const SCEV getPreStartForSignExtend(const SCEVAddRecExpr AR,
SmallVector<const SCEV *, 4> DiffOps;		SmallVector<const SCEV *, 4> DiffOps;
for (const SCEV *Op : SA->operands())		for (const SCEV *Op : SA->operands())
if (Op != Step)		if (Op != Step)
DiffOps.push_back(Op);		DiffOps.push_back(Op);

if (DiffOps.size() == SA->getNumOperands())		if (DiffOps.size() == SA->getNumOperands())
return nullptr;		return nullptr;

// This is a postinc AR. Check for overflow on the preinc recurrence using the		// This is a postinc AR. Rule out overflow on the preinc recurrence if
// same three conditions that getSignExtendedExpr checks.		// possible.

// 1. NSW flags on the step increment.
const SCEV *PreStart = SE->getAddExpr(DiffOps, SA->getNoWrapFlags());		const SCEV *PreStart = SE->getAddExpr(DiffOps, SA->getNoWrapFlags());
const SCEVAddRecExpr *PreAR = dyn_cast<SCEVAddRecExpr>(		const SCEVAddRecExpr *PreAR = dyn_cast<SCEVAddRecExpr>(
SE->getAddRecExpr(PreStart, Step, L, SCEV::FlagAnyWrap));		SE->getAddRecExpr(PreStart, Step, L, SCEV::FlagAnyWrap));

if (PreAR && PreAR->getNoWrapFlags(SCEV::FlagNSW))		// ScalarEvolution may have concluded that PreAR does not overflow due to a
return PreStart;		// pre-existing add recurrance that contains an `add nsw` at the IR level.
		atrickUnsubmitted Not Done Reply Inline Actions recurrence atrick: recurrence
		// There may be a path out of the loop without executing that `add nsw`, so we
		// cannot assume the increment of PreStart to not overflow even if PreAR does
		// not overflow.
		hfinkelUnsubmitted Not Done Reply Inline Actions Seems like we could still pick up some trivial cases (loops with only one exit block, only one backedge, etc.). Maybe we could do more with a dominance query? hfinkel: Seems like we could still pick up some trivial cases (loops with only one exit block, only one…
		sanjoyAuthorUnsubmitted Not Done Reply Inline Actions I think we can prove overflow if the loop has only one exit. Having only one backedge does not give us anything though (the counterexample in this change also has only one backedge). That said, it is not clear to me how important this optimization is (for instance, removing the optimization does not fail any existing unit tests). I'd like to fix the bug first and check in a correct version of this optimization as a later change. I plan to generalize this part of SCEV to `nuw`s so I'll be touching this area in the future anyway. sanjoy: I think we can prove overflow if the loop has only one exit. Having only one backedge does not…
		hfinkelUnsubmitted Not Done Reply Inline Actions Well, I don't know either. Not affecting regressions tests, unfortunately, is not a good measure of how important it is. Please add the loop-has-one-exit check, and a regression test -- this handles some large fraction of performance-sensitive loops in general -- and otherwise, LGTM. (We should try not to regress trunk, even performance-wise, any more than necessary, even in between official releases, because most of us pull internal releases on offset schedules). hfinkel: Well, I don't know either. Not affecting regressions tests, unfortunately, is not a good…

// 2. Direct overflow check on the step operation's expression.		// 1. Direct overflow check on the step operation's expression.
unsigned BitWidth = SE->getTypeSizeInBits(AR->getType());		unsigned BitWidth = SE->getTypeSizeInBits(AR->getType());
Type WideTy = IntegerType::get(SE->getContext(), BitWidth 2);		Type WideTy = IntegerType::get(SE->getContext(), BitWidth 2);
const SCEV *OperandExtendedStart =		const SCEV *OperandExtendedStart =
SE->getAddExpr(SE->getSignExtendExpr(PreStart, WideTy),		SE->getAddExpr(SE->getSignExtendExpr(PreStart, WideTy),
SE->getSignExtendExpr(Step, WideTy));		SE->getSignExtendExpr(Step, WideTy));
if (SE->getSignExtendExpr(Start, WideTy) == OperandExtendedStart) {		if (SE->getSignExtendExpr(Start, WideTy) == OperandExtendedStart) {
// Cache knowledge of PreAR NSW.		// Cache knowledge of PreAR NSW.
if (PreAR)		if (PreAR)
		hfinkelUnsubmitted Not Done Reply Inline Actions I think that you need to make sure that both L->getExitingBlock() and L->getLoopLatch() are not nullptr (which could happen if there is no unique loop latch and also no unique exiting block). Otherwise, LGTM. hfinkel: I think that you need to make sure that both L->getExitingBlock() and L->getLoopLatch() are not…
		sanjoyAuthorUnsubmitted Not Done Reply Inline Actions You're right, that was a slip on my part; will fix and check in. Thanks for the review. :) sanjoy: You're right, that was a slip on my part; will fix and check in. Thanks for the review. :)
const_cast<SCEVAddRecExpr *>(PreAR)->setNoWrapFlags(SCEV::FlagNSW);		const_cast<SCEVAddRecExpr *>(PreAR)->setNoWrapFlags(SCEV::FlagNSW);
// FIXME: this optimization needs a unit test		// FIXME: this optimization needs a unit test
DEBUG(dbgs() << "SCEV: untested prestart overflow check\n");		DEBUG(dbgs() << "SCEV: untested prestart overflow check\n");
return PreStart;		return PreStart;
}		}

// 3. Loop precondition.		// 2. Loop precondition.
ICmpInst::Predicate Pred;		ICmpInst::Predicate Pred;
const SCEV *OverflowLimit = getOverflowLimitForStep(Step, &Pred, SE);		const SCEV *OverflowLimit = getOverflowLimitForStep(Step, &Pred, SE);

if (OverflowLimit &&		if (OverflowLimit &&
SE->isLoopEntryGuardedByCond(L, Pred, PreStart, OverflowLimit)) {		SE->isLoopEntryGuardedByCond(L, Pred, PreStart, OverflowLimit)) {
return PreStart;		return PreStart;
}		}
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 6,928 Lines • Show Last 20 Lines

test/Analysis/ScalarEvolution/bad-scev-prestart-nowrap.ll

This file was added.

				; RUN: opt -analyze -scalar-evolution < %s \| FileCheck %s

				; An example run where SCEV(%postinc)->getStart() may overflow:
				;
				; %start = INT_SMAX
				; %low.limit = INT_SMIN
				; %high.limit = < not used >
				;
				; >> entry:
				; %postinc.start = INT_SMIN
				;
				; >> loop:
				; %idx = %start
				; %postinc = INT_SMIN
				; %postinc.inc = INT_SMIN + 1
				; %postinc.sext = sext(INT_SMIN) = i64 INT32_SMIN
				; %break.early = INT_SMIN `slt` INT_SMIN = false
				; br i1 false, ___, %early.exit
				;
				; >> early.exit:
				; ret i64 INT32_SMIN


				define i64 @foo(i32 %start, i32 %low.limit, i32 %high.limit) {
				; CHECK-LABEL: Classifying expressions for: @foo
				entry:
				%postinc.start = add i32 %start, 1
				br label %loop

				loop:
				%idx = phi i32 [ %start, %entry ], [ %idx.inc, %continue ]
				%postinc = phi i32 [ %postinc.start, %entry ], [ %postinc.inc, %continue ]
				%postinc.inc = add nsw i32 %postinc, 1
				%postinc.sext = sext i32 %postinc to i64
				; CHECK: %postinc.sext = sext i32 %postinc to i64
				; CHECK-NEXT: --> {(sext i32 (1 + %start) to i64),+,1}<nsw><%loop>
				%break.early = icmp slt i32 %postinc, %low.limit
				br i1 %break.early, label %continue, label %early.exit

				continue:
				%idx.inc = add nsw i32 %idx, 1
				%cmp = icmp slt i32 %idx.inc, %high.limit
				br i1 %cmp, label %loop, label %exit

				exit:
				ret i64 0

				early.exit:
				ret i64 %postinc.sext
				}