This is an archive of the discontinued LLVM Phabricator instance.


Differential D37209

[LSR] Fix Shadow IV in case of integer overflow
Closed · Public

Authored by mkazantsev on Aug 28 2017, 4:20 AM.


Details

Reviewers

wmi
jnspaulsson
qcolombet
evstupac
anna
reames

Commits

rGbb1d01087262: [LSR] Fix Shadow IV in case of integer overflow
rL311986: [LSR] Fix Shadow IV in case of integer overflow

Summary

When LSR processes code like

int accumulator = 0;
for (int i = 0; i < N; i++) {
  for (int j = 0; j < N; j++)
    accumulator += i * j;
  use((double) accumulator);
}

it may decide to replace the integer accumulator with a double Shadow IV to get
rid of the casts. The problem is that the accumulator's value may overflow.
From that moment on, the integer and double accumulators behave differently.

This patch strengthens the conditions under which the Shadow IV mechanism applies.
We only allow it for IVs that are proved to be AddRecs that do not wrap in the
signedness of the cast: nsw for sitofp, nuw for uitofp.
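A minimal C sketch of why the two accumulators diverge, assuming the summary's loop nest. The names `as_signed32` and `diverges` are hypothetical; the integer accumulator is modelled with `uint32_t` because signed overflow is undefined in C, whereas an IR `add` without `nsw` simply wraps modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two's-complement value of a 32-bit pattern, computed without relying on
 * the implementation-defined uint32_t -> int32_t conversion. */
static int64_t as_signed32(uint32_t bits) {
  return bits <= INT32_MAX ? (int64_t)bits : (int64_t)bits - 4294967296LL;
}

/* Replays the summary's loop nest (hypothetical helper). acc_bits models
 * `int accumulator` as a plain i32 `add` that wraps modulo 2^32; shadow is
 * the double Shadow IV. Returns true if the two ever disagree at a
 * use((double) accumulator) point. */
static bool diverges(uint32_t n) {
  assert(n <= 10000u); /* keeps i * j in range and every double exact */
  uint32_t acc_bits = 0;
  double shadow = 0.0;
  for (uint32_t i = 0; i < n; i++) {
    for (uint32_t j = 0; j < n; j++) {
      acc_bits += i * j;               /* wraps modulo 2^32, well defined */
      shadow += (double)i * (double)j; /* exact, never wraps */
    }
    /* sitofp of the integer accumulator vs. the shadow IV */
    if ((double)as_signed32(acc_bits) != shadow)
      return true;
  }
  return false;
}
```

With these definitions `diverges(304)` is false, since the final sum (46056^2 = 2,121,155,136) still fits in a signed 32-bit integer, while `diverges(305)` is true because 46360^2 = 2,149,249,600 exceeds INT32_MAX.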

Diff Detail

Repository: rL LLVM

Event Timeline

mkazantsev created this revision. Aug 28 2017, 4:20 AM

reames requested changes to this revision. Aug 28 2017, 9:31 AM

reames added inline comments.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
Line 2034 (on Diff #112878): I don't think this is the correct check. Specifically, this check allows overflow as long as we can guarantee that the value never reaches its starting value. I think you actually want hasNoUnsignedWrap() && hasNoSignedWrap(). Consider: unsigned i = 1; do { i += INT_MAX; d += i; } while ((signed int)i > 0);
test/Transforms/LoopStrengthReduce/dont_turn_int_overflow_to_fp.ll
Line 10 (on Diff #112878): Can't you greatly reduce this example? Doesn't a simple sum of squares in the 32-bit domain trigger this?

This revision now requires changes to proceed. Aug 28 2017, 9:31 AM
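reames's objection is that the no-self-wrap (`nw`) property only says the IV never comes back around to its start value; it does not rule out crossing the signed or unsigned boundary. A simplified C model makes the difference visible. It assumes an i32 AddRec {start,+,step} taken for n increments and evaluates the endpoints exactly in 64-bit arithmetic, approximating `nw` as |step| * n <= UINT32_MAX, which mirrors how LLVM's ScalarEvolution documentation describes no-self-wrap. `WrapFlags`, `as_signed32` and `model_flags` are hypothetical names, not SCEV APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (NOT SCEV's implementation) of an i32 AddRec
 * {start,+,step} for k = 0..n. The sequence is linear, so checking its
 * endpoints is enough. */
typedef struct {
  bool nsw; /* all values fit in [INT32_MIN, INT32_MAX] (signed view)  */
  bool nuw; /* all values fit in [0, UINT32_MAX] (unsigned view)       */
  bool nw;  /* |step| * n <= UINT32_MAX: never returns to its start    */
} WrapFlags;

static int64_t as_signed32(uint32_t bits) {
  return bits <= INT32_MAX ? (int64_t)bits : (int64_t)bits - 4294967296LL;
}

static WrapFlags model_flags(uint32_t start_bits, uint32_t step_bits,
                             int32_t n) {
  assert(n >= 0); /* n <= INT32_MAX keeps all int64_t math from overflowing */
  WrapFlags f;
  /* Signed view: start and step reinterpreted as two's complement. */
  int64_t s_step = as_signed32(step_bits);
  int64_t s_end = as_signed32(start_bits) + s_step * n;
  f.nsw = s_end >= INT32_MIN && s_end <= INT32_MAX;
  /* Unsigned view: start and step taken as unsigned quantities. */
  int64_t u_end = (int64_t)start_bits + (int64_t)step_bits * n;
  f.nuw = u_end <= UINT32_MAX;
  /* Self-wrap: total distance travelled must stay below 2^32. */
  int64_t dist = (s_step < 0 ? -s_step : s_step) * n;
  f.nw = dist <= UINT32_MAX;
  return f;
}
```

For reames's loop (start 1, step INT_MAX, one increment) the model reports `nw` but not `nsw`. The same holds for the @foobar5 accumulator added by this patch (start -3220, step 9597741, 225 increments): the sequence never returns to its start value, yet it leaves the signed 32-bit range.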

evstupac added a comment.

Hi Max,

I believe the C and C++ standards say that signed integer overflow has undefined behavior.
So in your example we can safely change program behavior in case of overflow.

"If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [...]"
The unsigned case is more complicated.

Thanks,
Evgeny

Replying to @evstupac's comment above (D37209#854175):

Evgeny, what the C and C++ standards say is irrelevant to the semantics of LLVM IR and does not come into the discussion at all. It does not affect the legality of the transform in any way.

In this particular example, Clang would mark the add as being nsw when generating the IR. That encodes the semantics you're concerned about.

Further comments.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
Line 2034 (on Diff #112878): Giving a bit more detail, signed wrap is definitely a problem which needs to be prevented, because double loses precision at the maximum bit width and produces a different answer for smaller bit widths. Consider signed_max<src_type> + 1. This should equal signed_min<src_type>, but may not with double arithmetic. Unsigned wrap is less clear to me. I don't have a counterexample yet, but I'm suspicious that there is one. If we allowed negative steps, we'd clearly have the underflow counterpart of the signed wrap problem just mentioned, but with the restriction to positive steps, I'm not finding one. Actually, here's an obvious one: for (uint16_t i = UINT16_MAX; i != UINT16_MAX - 1; i++) { (double)i; }. UINT16_MAX + 1 should be 0, but since that fits within the signed domain of a double, we'd get ((int)UINT16_MAX) + 1.
test/Transforms/LoopStrengthReduce/X86/2008-08-14-ShadowIV.ll
Line 1 (on Diff #112878): Please convert this test to use FileCheck (the automatically generated check lines should be fine), check that in, and then rebase, please.
Line 66 (on Diff #112878): Is this even valid IR? I don't see a value being assigned here. Unless this is just wrapping really weirdly?
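The 16-bit counterexamples above can be replayed in C. This sketch assumes hypothetical helpers `inc_u16`, `inc_s16` and `inc_shadow` that model one increment of an IR i16 `add` without `nuw`/`nsw` next to the same increment on a double shadow IV. In C itself `int16_t` arithmetic is promoted to `int`, so the wrap is made explicit here rather than relied upon.

```c
#include <stdint.h>

/* i16 add without nuw: wraps modulo 2^16 (UINT16_MAX + 1 -> 0). */
static uint16_t inc_u16(uint16_t x) {
  return (uint16_t)(x + 1u);
}

/* i16 add without nsw: INT16_MAX + 1 -> INT16_MIN. Computed in 32 bits and
 * wrapped explicitly, so no implementation-defined narrowing is involved. */
static int16_t inc_s16(int16_t x) {
  int32_t v = (int32_t)x + 1;
  return (int16_t)(v > INT16_MAX ? v - 65536 : v);
}

/* The double shadow IV never wraps: 65535.0 + 1.0 == 65536.0. */
static double inc_shadow(double d) {
  return d + 1.0;
}
```

So uitofp of the wrapped unsigned IV yields 0.0 where the shadow IV holds 65536.0, and sitofp of the wrapped signed IV yields -32768.0 where the shadow IV holds 32768.0.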

mkazantsev added inline comments. Aug 28 2017, 8:39 PM

lib/Transforms/Scalar/LoopStrengthReduce.cpp
Line 2034 (on Diff #112878): Thanks for pointing it out, it should indeed check nuw/nsw. As for unsigned, the situation is basically the same as with signed, even without negative steps. Double is able to store values that are greater than UINT_MAX, and on conversion from double to int they will be rounded to nearest (UINT_MAX, apparently), while in int they would be wrapped values that are less than it.

mkazantsev added inline comments. Aug 28 2017, 9:44 PM

test/Transforms/LoopStrengthReduce/X86/2008-08-14-ShadowIV.ll
Line 66 (on Diff #112878): I guess these are unnamed values, which are implicitly numbered `%1`, `%2`, etc.

mkazantsev added inline comments. Aug 28 2017, 10:23 PM

test/Transforms/LoopStrengthReduce/X86/2008-08-14-ShadowIV.ll
Line 66 (on Diff #112878): Refactored the test as https://reviews.llvm.org/rL311980

Rebased, moved the tests to the existing test file, and fixed the wrap check.

LGTM

test/Transforms/LoopStrengthReduce/X86/2008-08-14-ShadowIV.ll
Line 141 (on Diff #113029): this -> thus
Line 188 (on Diff #113029): this -> thus

This revision is now accepted and ready to land. Aug 29 2017, 12:17 AM

Closed by commit rL311986: [LSR] Fix Shadow IV in case of integer overflow (authored by mkazantsev). Aug 29 2017, 12:33 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path | Size
llvm/trunk/lib/Transforms/Scalar/LoopStrengthReduce.cpp | 8 lines
llvm/trunk/test/Transforms/LoopStrengthReduce/X86/2008-08-14-ShadowIV.ll | 94 lines

Diff 113034

llvm/trunk/lib/Transforms/Scalar/LoopStrengthReduce.cpp

@@ (first 2,021 lines not shown) @@ for (IVUsers::const_iterator UI = IU.begin(), E = IU.end();
     // If target does not support DestTy natively then do not apply
     // this transformation.
     if (!TTI.isTypeLegal(DestTy)) continue;

     PHINode *PH = dyn_cast<PHINode>(ShadowUse->getOperand(0));
     if (!PH) continue;
     if (PH->getNumIncomingValues() != 2) continue;

+    // If the calculation in integers overflows, the result in FP type will
+    // differ. So we only can do this transformation if we are guaranteed to not
+    // deal with overflowing values
+    const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(SE.getSCEV(PH));
+    if (!AR) continue;
+    if (IsSigned && !AR->hasNoSignedWrap()) continue;
+    if (!IsSigned && !AR->hasNoUnsignedWrap()) continue;
+
     Type *SrcTy = PH->getType();
     int Mantissa = DestTy->getFPMantissaWidth();
     if (Mantissa == -1) continue;
     if ((int)SE.getTypeSizeInBits(SrcTy) > Mantissa)
       continue;

     unsigned Entry, Latch;
     if (PH->getIncomingBlock(0) == L->getLoopPreheader()) {
@@ (remaining 3,456 lines not shown) @@
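The guard added by this patch can be read as a pure predicate. The C sketch below is a hypothetical model, not LLVM code: the booleans stand in for the `dyn_cast<SCEVAddRecExpr>` result, `IsSigned`, and the `hasNoSignedWrap()`/`hasNoUnsignedWrap()` queries, and `mantissa_bits` stands in for `getFPMantissaWidth()` (53 for double, 24 for float, -1 when not well defined). The other checks in the loop, such as `TTI.isTypeLegal` and the two-incoming-values test, are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the applicability checks shown in the diff.
 * is_addrec:       SE.getSCEV(PH) is a SCEVAddRecExpr
 * is_signed:       the cast is sitofp (true) or uitofp (false)
 * has_nsw/has_nuw: AR->hasNoSignedWrap() / AR->hasNoUnsignedWrap()
 * mantissa_bits:   DestTy->getFPMantissaWidth(), -1 if not well defined */
static bool shadow_iv_allowed(bool is_addrec, bool is_signed, bool has_nsw,
                              bool has_nuw, int src_bits, int mantissa_bits) {
  assert(src_bits > 0);
  if (!is_addrec) return false;
  /* New in this patch: the IV must not wrap in the cast's signedness. */
  if (is_signed && !has_nsw) return false;
  if (!is_signed && !has_nuw) return false;
  /* Pre-existing: every integer value must be exactly representable. */
  if (mantissa_bits == -1) return false;
  if (src_bits > mantissa_bits) return false;
  return true;
}
```

Feeding it the flags that the new tests are meant to exercise, an i32 sitofp IV without `nsw` (as in @foobar5) is rejected, while the same IV with `nsw` (as in @foobar6) is accepted for a double destination; an i32 IV is always rejected for float, since 32 > 24.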

llvm/trunk/test/Transforms/LoopStrengthReduce/X86/2008-08-14-ShadowIV.ll

@@ (first 108 lines not shown) @@ bb: ; preds = %bb, %bb.nph
   %tmp = sext i8 %indvar.next to i32
   %exitcond = icmp eq i32 %tmp, 32767 ; <i1> [#uses=1]
   br i1 %exitcond, label %return, label %bb

 return: ; preds = %bb, %entry
   ret void
 }

+; Unable to eliminate cast because the integer IV overflows (accum exceeds
+; SINT_MAX).
+
+define i32 @foobar5() {
+; CHECK-LABEL: foobar5(
+; CHECK-NOT: phi double
+; CHECK-NOT: phi float
+entry:
+  br label %loop
+
+loop:
+  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ]
+  %iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ]
+  %tmp1 = sitofp i32 %accum to double
+  tail call void @foo( double %tmp1 ) nounwind
+  %accum.next = add i32 %accum, 9597741
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp ugt i32 %iv, 235
+  br i1 %exitcond, label %exit, label %loop
+
+exit: ; preds = %loop
+  ret i32 %accum.next
+}
+
+; Can eliminate if we set nsw and, thus, think that we don't overflow SINT_MAX.
+
+define i32 @foobar6() {
+; CHECK-LABEL: foobar6(
+; CHECK: phi double
+
+entry:
+  br label %loop
+
+loop:
+  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ]
+  %iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ]
+  %tmp1 = sitofp i32 %accum to double
+  tail call void @foo( double %tmp1 ) nounwind
+  %accum.next = add nsw i32 %accum, 9597741
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp ugt i32 %iv, 235
+  br i1 %exitcond, label %exit, label %loop
+
+exit: ; preds = %loop
+  ret i32 %accum.next
+}
+
+; Unable to eliminate cast because the integer IV overflows (accum exceeds
+; UINT_MAX).
+
+define i32 @foobar7() {
+; CHECK-LABEL: foobar7(
+; CHECK-NOT: phi double
+; CHECK-NOT: phi float
+entry:
+  br label %loop
+
+loop:
+  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ]
+  %iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ]
+  %tmp1 = uitofp i32 %accum to double
+  tail call void @foo( double %tmp1 ) nounwind
+  %accum.next = add i32 %accum, 9597741
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp ugt i32 %iv, 235
+  br i1 %exitcond, label %exit, label %loop
+
+exit: ; preds = %loop
+  ret i32 %accum.next
+}
+
+; Can eliminate if we set nuw and, thus, think that we don't overflow UINT_MAX.
+
+define i32 @foobar8() {
+; CHECK-LABEL: foobar8(
+; CHECK: phi double
+
+entry:
+  br label %loop
+
+loop:
+  %accum = phi i32 [ -3220, %entry ], [ %accum.next, %loop ]
+  %iv = phi i32 [ 12, %entry ], [ %iv.next, %loop ]
+  %tmp1 = uitofp i32 %accum to double
+  tail call void @foo( double %tmp1 ) nounwind
+  %accum.next = add nuw i32 %accum, 9597741
+  %iv.next = add nuw nsw i32 %iv, 1
+  %exitcond = icmp ugt i32 %iv, 235
+  br i1 %exitcond, label %exit, label %loop
+
+exit: ; preds = %loop
+  ret i32 %accum.next
+}

 declare void @bar(i32)

 declare void @foo(double)

 declare i32 @nn(...)
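The constants in @foobar5 and @foobar7 can be checked by replaying the loop in C. This sketch (the helper name `first_divergence` is hypothetical) evaluates each call to @foo twice: once from the wrapping i32 accumulator, converted as `sitofp` or `uitofp` would convert it, and once from a double shadow IV that starts at the converted initial value. It returns the 0-based iteration of the first mismatch; it models the semantics only and is not LSR's output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Replays the loop of @foobar5 (is_signed = true, sitofp) or @foobar7
 * (is_signed = false, uitofp). Returns the 0-based iteration of the first
 * call to @foo whose argument differs between the wrapping i32 accumulator
 * and a double shadow IV, or -1 if they always agree. */
static int first_divergence(bool is_signed) {
  uint32_t accum = (uint32_t)-3220;  /* i32 bits; `add` without flags wraps */
  double shadow = is_signed ? -3220.0 : 4294964076.0; /* converted start */
  int k = 0;
  for (uint32_t iv = 12;; iv++, k++) {
    /* %tmp1 = sitofp/uitofp i32 %accum to double */
    double converted = (is_signed && accum > INT32_MAX)
                           ? (double)accum - 4294967296.0
                           : (double)accum;
    if (converted != shadow)
      return k;
    accum += 9597741u;   /* %accum.next = add i32 %accum, 9597741 */
    shadow += 9597741.0; /* shadow IV step, never wraps */
    if (iv > 235)        /* %exitcond = icmp ugt i32 %iv, 235 */
      break;
  }
  return -1;
}
```

For the signed variant the first mismatch is iteration 224, where -3220 + 224 * 9597741 = 2,149,890,764 exceeds INT32_MAX. For the unsigned variant it is iteration 1, because -3220 reinterpreted as unsigned is already 4,294,964,076 and the first increment wraps past UINT32_MAX.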