This is an archive of the discontinued LLVM Phabricator instance.

Differential D26059

[IndVars] Change the order to compute WidenAddRec in widenIVUse
Closed, Public

Authored by wmi on Oct 27 2016, 6:29 PM.

Details

Reviewers

atrick
sanjoy

Commits

rGd2948cef7059: [IndVars] Change the order to compute WidenAddRec in widenIVUse.
rL286987: [IndVars] Change the order to compute WidenAddRec in widenIVUse.

Summary

This patch handles cases like the following, which indvars currently cannot widen because of a limitation caused by SCEV folding.

void foo(int nsteps, unsigned char *maxarray, int startx, int V1, int V2, unsigned *lined) {
  int j, k;
  for (j = 0; j < nsteps; j++) {
    startx = V1 + j * V2;
    for (k = 1; k < size - 1; k++)
      ((unsigned char *)lined)[startx + k] = ...
  }
}

We have a two-level loop nest. Inside the inner loop there is an array store, lined[startx + k], whose index is the result of an add expression with the nsw flag. Operand 0, startx, can be represented as the SCEV {V1 ,+, V2}<outerloop> (note: no nsw here). Operand 1, k, can be represented as the SCEV {1 ,+, 1}<nsw><innerloop>.
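To make the notation concrete, the following C++ sketch evaluates the two operand recurrences by hand and compares them with the loop variables. The helper evalAddRec and the function countMismatches are hypothetical names introduced for illustration, not LLVM APIs. The sketch assumes an affine recurrence {start ,+, step} denotes start + step * n on the n-th (zero-based) iteration of its loop, and it uses 64-bit arithmetic so that overflow plays no role here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): value of the affine recurrence
// {start ,+, step} on the given zero-based iteration of its loop.
int64_t evalAddRec(int64_t start, int64_t step, int64_t iteration) {
  return start + step * iteration;
}

// Walks the loop nest from the summary and counts how often the recurrences
// {V1 ,+, V2}<outerloop> and {1 ,+, 1}<innerloop> disagree with startx and k.
int countMismatches(int64_t V1, int64_t V2, int64_t nsteps, int64_t size) {
  int mismatches = 0;
  for (int64_t j = 0; j < nsteps; ++j) {
    int64_t startx = V1 + j * V2;
    for (int64_t k = 1; k < size - 1; ++k) {
      // k starts at 1, so the inner loop's iteration count is k - 1.
      int64_t innerIteration = k - 1;
      if (evalAddRec(V1, V2, j) != startx)
        ++mismatches;
      if (evalAddRec(1, 1, innerIteration) != k)
        ++mismatches;
    }
  }
  return mismatches;
}
```

For any small inputs, countMismatches should return 0, which is just a restatement of what the two SCEVs mean.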

Because of the folding in ScalarEvolution::getAddExpr (which folds a loop-invariant operand into an add recurrence), the SCEV of the add expression is {{(1 + V1)<nsw> ,+, V2}<outerloop> ,+, 1}<nsw><innerloop>. As a result of the folding, IndVars cannot prove that sext({{(1 + V1)<nsw> ,+, V2}<outerloop> ,+, 1}<nsw><innerloop>) == sext({V1 ,+, V2}<outerloop>) +nsw sext({1 ,+, 1}<nsw><innerloop>). The major reason is that there is no nsw on {V1 ,+, V2}<outerloop>. So it cannot widen the add expression and cannot remove the sext of the index.
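The role of the nsw flag can be seen with plain integer arithmetic. The sketch below is an illustration only, with hypothetical helper names: it shows that sign-extending a 32-bit sum equals the 64-bit sum of the sign-extended operands only when the 32-bit add does not wrap. This is the property an analysis has to establish before it can distribute a sext over an add.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit add with two's-complement wraparound, i.e. what an add without
// the nsw flag is allowed to do. Going through uint32_t avoids signed
// overflow, which is undefined behavior in C++.
int32_t wrappingAdd32(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

// Returns true when sext(a + b) == sext(a) + sext(b) for these inputs.
bool sextDistributes(int32_t a, int32_t b) {
  int64_t narrowThenExtend = static_cast<int64_t>(wrappingAdd32(a, b));
  int64_t extendThenAdd = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  return narrowThenExtend == extendThenAdd;
}
```

With small operands the two sides agree. With a = 2147483647 and b = 1 the narrow add wraps and they differ, which is why a missing nsw blocks the proof.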

However, since we know from the IR that the add expression "startx + k" has the nsw flag, we know it is legal to widen it. We already use the nsw flag from the IR to enhance widening via getExtendedOperandRecurrence. Why doesn't that work here?

To answer my own question: in this case both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence return a non-null but different WideAddRec. Because getWideRecurrence is called before getExtendedOperandRecurrence, the result of getExtendedOperandRecurrence is never used. However, if WidenIV::getExtendedOperandRecurrence returns a non-null WideAddRec, we know for sure that it is legal to widen the current instruction, so getExtendedOperandRecurrence should be called before getWideRecurrence. That is what this patch does.
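The effect of the reordering can be modeled in a few lines. This is a simplified sketch, not the LLVM code: WideRec and pickWideRec are hypothetical names, and the two arguments stand in for the results of getExtendedOperandRecurrence and getWideRecurrence, each of which may be absent.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a possibly-null wide recurrence; the string is
// only a label for whichever result was produced.
using WideRec = std::optional<std::string>;

// Models the new order: prefer the result backed by the IR-level nsw/nuw
// flag, and fall back to the SCEV-only result when the first is absent.
WideRec pickWideRec(const WideRec &fromExtendedOperand,
                    const WideRec &fromWideRecurrence) {
  if (fromExtendedOperand)
    return fromExtendedOperand;
  return fromWideRecurrence;
}
```

When both queries succeed but disagree, as in the case described above, the first argument wins. Before the patch the preference was the other way around.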

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 76154. Oct 27 2016, 6:29 PM

wmi retitled this revision from to [IndVars] Change the order to compute WidenAddRec in widenIVUse.

wmi updated this object.

wmi added reviewers: sanjoy, atrick.

wmi set the repository for this revision to rL LLVM.

wmi added subscribers: llvm-commits, davidxl.

lgtm, thanks!

If it isn't too much trouble, can you add a zext equivalent of the test case too, just as a sanity check?

This revision is now accepted and ready to land. Nov 8 2016, 6:33 PM

Add zext to the iv-widen.ll test.

I still need to update iv-widen-elim-ext.ll. SE->isKnownPredicate now returns false for "%add = add nsw i32 %i.02, 2" in foo, because %add feeds into a zext and zext doesn't propagate full poison, so the nsw flag cannot be copied from the IR to SCEV. As a result, NarrowIVDefUse::NeverNegative for %add is false, and my change here generates an extra trunc before the zext. That the existing compiler doesn't generate the extra trunc is, to some extent, luck.

I updated the test to add another use of %add: "udiv 5, %add". This ensures that %add is not poison (otherwise there would be undefined behavior), so the nsw flag on the %add instruction can be copied to its SCEV and NarrowIVDefUse::NeverNegative for %add is true.

After the test update, indvars does not generate an extra trunc for sext/zext for foo in the test case, with or without this change.

Closed by commit rL286987: [IndVars] Change the order to compute WidenAddRec in widenIVUse. (authored by wmi). Nov 15 2016, 9:44 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
lib/Transforms/Scalar/IndVarSimplify.cpp      4 lines
test/Transforms/IndVarSimplify/iv-widen.ll    39 lines

Diff 76154

lib/Transforms/Scalar/IndVarSimplify.cpp

@@ if (IsSigned ? isa<SExtInst>(DU.NarrowUse) : isa<ZExtInst>(DU.NarrowUse)) {
     // new loop phi. If we preserved IVUsers analysis, we would also want to
     // push the uses of WideDef here.

     // No further widening is needed. The deceased [sz]ext had done it for us.
     return nullptr;
   }

   // Does this user itself evaluate to a recurrence after widening?
-  const SCEVAddRecExpr *WideAddRec = getWideRecurrence(DU.NarrowUse);
+  const SCEVAddRecExpr *WideAddRec = getExtendedOperandRecurrence(DU);
   if (!WideAddRec)
-    WideAddRec = getExtendedOperandRecurrence(DU);
+    WideAddRec = getWideRecurrence(DU.NarrowUse);

   if (!WideAddRec) {
     // If use is a loop condition, try to promote the condition instead of
     // truncating the IV first.
     if (widenLoopCompare(DU))
       return nullptr;

     // This user does not evaluate to a recurence after widening, so don't

test/Transforms/IndVarSimplify/iv-widen.ll

@@ ; CHECK: call void @dummy.i64(i64 [[IV_INC]])
   br i1 %be.cond, label %loop, label %leave

 leave:
   ret void
 }

 declare void @dummy(i32)
 declare void @dummy.i64(i64)
+
+
+define void @loop_2(i32 %size, i32 %nsteps, i32 %hsize, i32* %lined, i8 %tmp1) {
+; CHECK-LABEL: @loop_2(
+entry:
+  %cmp215 = icmp sgt i32 %size, 1
+  %tmp0 = bitcast i32* %lined to i8*
+  br label %for.body
+
+for.body:
+  %j = phi i32 [ 0, %entry ], [ %inc6, %for.inc ]
+  %mul = mul nsw i32 %j, %size
+  %add = add nsw i32 %mul, %hsize
+  br i1 %cmp215, label %for.body2, label %for.inc
+
+; check that the induction variable of the inner loop has been widened after indvars.
+; CHECK: [[INNERLOOPINV:%[^ ]+]] = add nsw i64
+; CHECK: for.body2:
+; CHECK: %indvars.iv = phi i64 [ 1, %for.body2.preheader ], [ %indvars.iv.next, %for.body2 ]
+; CHECK: [[WIDENED:%[^ ]+]] = add nsw i64 [[INNERLOOPINV]], %indvars.iv
+; CHECK: %add.ptr = getelementptr inbounds i8, i8* %tmp0, i64 [[WIDENED]]
+for.body2:
+  %k = phi i32 [ %inc, %for.body2 ], [ 1, %for.body ]
+  %add4 = add nsw i32 %add, %k
+  %idx.ext = sext i32 %add4 to i64
+  %add.ptr = getelementptr inbounds i8, i8* %tmp0, i64 %idx.ext
+  store i8 %tmp1, i8* %add.ptr, align 1
+  %inc = add nsw i32 %k, 1
+  %cmp2 = icmp slt i32 %inc, %size
+  br i1 %cmp2, label %for.body2, label %for.inc
+
+for.inc:
+  %inc6 = add nsw i32 %j, 1
+  %cmp = icmp slt i32 %inc6, %nsteps
+  br i1 %cmp, label %for.body, label %for.end.loopexit
+
+for.end.loopexit:
+  ret void
+}