This is an archive of the discontinued LLVM Phabricator instance.

Differential D63618

Exploit a zero LoopExit count to eliminate loop exits
Closed, Public

Authored by reames on Jun 20 2019, 12:29 PM.


Details

Reviewers

nikic
sanjoy
apilipenko

Commits

rG8deb84c8ef85: Exploit a zero LoopExit count to eliminate loop exits
rL364135: Exploit a zero LoopExit count to eliminate loop exits

Summary

This turned out to be surprisingly effective. I was originally doing this just for completeness' sake, but it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than its isKnownPredicate reasoning.

Once this is in, I'm thinking about trying to build on the same infrastructure to eliminate provably untaken checks. There may be something generally interesting here.

Diff Detail

Repository: rL LLVM

Event Timeline

reames created this revision. Jun 20 2019, 12:29 PM

Herald added a project: Restricted Project. Jun 20 2019, 12:29 PM

Herald added subscribers: javed.absar, bollu, mcrosier.

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than its isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

In D63618#1553053, @sanjoy wrote:

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than its isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

I haven't looked into that one specifically, but in general, I've been noticing a *lot* of issues with caching and computation of overflow bits. I've now seen several cases where we form a SCEV, then later figure out one component of it is nsw/nuw, but don't simplify the previously formed SCEV. Since we rely on equality in many cases for equivalence checks, that means we fail to prove things which appear obvious. Usually, a second run on indvars (after flags are set in IR) gets these cases.

Also, there is an argument that exit count reasoning "should" be more powerful than the predicate based form. When we're analysing a compare *knowing it controls an exit* we do have slightly more information than in the general predicate reasoning. (Consider a loop with one unanalysable exit and thus no exact BE count for the loop as a whole.)

In D63618#1553926, @reames wrote:

In D63618#1553053, @sanjoy wrote:

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than its isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

I haven't looked into that one specifically,

SCEV seems to compute the correct range for that specific case:

%narrow.iv = trunc i64 %iv to i32
-->  {2147483647,+,1}<%loop> U: [2147483647,-2147483648) S: [2147483647,-2147483648)          Exits: 2147483647               LoopDispositions: { %loop: Computable }

so the comparison should be folded based on constant range analysis alone. Do you mind spot checking if indvars is even _trying_ to fold the condition correctly?
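To make the range argument concrete, here is a minimal standalone C++ sketch. It is not LLVM code, and the helper names (truncToI32, sltCanBeTrue, ultCanBeTrue, checkTest06Start) are hypothetical. It models the trunc of the i64 start values used in @test_06 and @test_06_unsigned and checks that a compare against the resulting maximum value cannot be true for any right-hand side.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Model of `trunc i64 %iv to i32`: keep the low 32 bits and reinterpret
// them as a signed 32-bit value.
int32_t truncToI32(int64_t v) {
  uint32_t bits = static_cast<uint32_t>(static_cast<uint64_t>(v));
  int32_t out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// `icmp slt i32 x, n` can hold for some n only if x is below INT32_MAX.
bool sltCanBeTrue(int32_t x) {
  return x < std::numeric_limits<int32_t>::max();
}

// `icmp ult i32 x, n` can hold for some n only if x is below UINT32_MAX.
bool ultCanBeTrue(uint32_t x) {
  return x < std::numeric_limits<uint32_t>::max();
}

// Start values taken from the two tests touched by this patch.
void checkTest06Start() {
  assert(truncToI32(-2147483649LL) == std::numeric_limits<int32_t>::max());
  assert(!sltCanBeTrue(truncToI32(-2147483649LL)));
  assert(!ultCanBeTrue(static_cast<uint32_t>(truncToI32(-1))));
}
```

With a backedge-taken count of 0 the truncated IV only ever holds its start value, so under this model the single-element range is enough to decide the compare.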

but in general, I've been noticing a *lot* of issues with caching and computation of overflow bits. I've now seen several cases where we form a SCEV, then later figure out one component of it is nsw/nuw, but don't simplify the previously formed SCEV.

Having proper use lists for SCEV might help. That would let us simplify users of a SCEV in a simplified manner once a SCEV has been proven not to wrap.

Since we rely on equality in many cases for equivalence checks, that means we fail to prove things which appear obvious. Usually, a second run on indvars (after flags are set in IR) gets these cases.

Also, there is an argument that exit count reasoning "should" be more powerful than the predicate based form. When we're analysing a compare *knowing it controls an exit* we do have slightly more information than in the general predicate reasoning. (Consider a loop with one unanalysable exit and thus no exact BE count for the loop as a whole.)

Agreed. I'm mainly curious about the specific test changes in this patch, not with the general idea.

This revision is now accepted and ready to land. Jun 21 2019, 11:09 AM

In D63618#1554009, @sanjoy wrote:
In D63618#1553926, @reames wrote:

In D63618#1553053, @sanjoy wrote:

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than its isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

I haven't looked into that one specifically,

SCEV seems to compute the correct range for that specific case:
%narrow.iv = trunc i64 %iv to i32
-->  {2147483647,+,1}<%loop> U: [2147483647,-2147483648) S: [2147483647,-2147483648)          Exits: 2147483647               LoopDispositions: { %loop: Computable }
so the comparison should be folded based on constant range analysis alone. Do you mind spot checking if indvars is even _trying_ to fold the condition correctly?

SimplifyIndVar generally only works on direct users of the IV, including for icmp simplification. In this case there's a trunc between the IV and the icmp, so we don't try to simplify.

Closed by commit rL364135: Exploit a zero LoopExit count to eliminate loop exits (authored by reames). Jun 22 2019, 10:56 AM

This revision was automatically updated to reflect the committed changes.

In D63618#1554009, @sanjoy wrote:

Agreed. I'm mainly curious about the specific test changes in this patch, not with the general idea.

I dug into this a bit, and it turns out that Nikita is partly right. What's going on is that when we visit a cast (for the widening analysis), we always skip its users (for the simplifying transforms). This doesn't seem to really make any sense, and removing it doesn't appear to break any test. Anyone have any idea why this might be the case? I'm really tempted to just remove the continue.
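The effect of that continue can be illustrated with a toy worklist model. This is a hedged sketch only: ToyInst, test06Graph, simplifiedUsers and checkToyModel are hypothetical names, the graph is a hand-written approximation of @test_06, and nothing here is the actual SimplifyIndVar code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: a name, a cast flag, and the indices of its users.
struct ToyInst {
  std::string name;
  bool isCast;
  std::vector<std::size_t> users;
};

// Hand-written approximation of @test_06: %iv feeds %iv.next and the trunc
// %narrow.iv, and the trunc feeds the compare. The phi back-edge is omitted.
std::vector<ToyInst> test06Graph() {
  return {
      {"iv", false, {1, 2}},
      {"iv.next", false, {}},
      {"narrow.iv", true, {3}},
      {"cmp", false, {}},
  };
}

// Returns the names of the instructions a simplification walk would reach,
// starting from the users of `iv`. When skipCastUsers is true, a cast ends
// the walk along that path, which models the `continue` under discussion.
std::vector<std::string> simplifiedUsers(const std::vector<ToyInst> &insts,
                                         std::size_t iv, bool skipCastUsers) {
  std::vector<std::string> visited;
  std::vector<bool> seen(insts.size(), false);
  std::vector<std::size_t> worklist;
  for (std::size_t u : insts[iv].users) {
    seen[u] = true;
    worklist.push_back(u);
  }
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    const ToyInst &use = insts[worklist[i]];
    if (use.isCast && skipCastUsers)
      continue; // users of the cast are never queued
    visited.push_back(use.name);
    for (std::size_t u : use.users) {
      if (!seen[u]) {
        seen[u] = true;
        worklist.push_back(u);
      }
    }
  }
  return visited;
}

void checkToyModel() {
  // With the skip in place the compare behind the trunc is never reached.
  assert(simplifiedUsers(test06Graph(), 0, true).size() == 1);
  // Without it the walk continues through the trunc to the compare.
  assert(simplifiedUsers(test06Graph(), 0, false).back() == "cmp");
}
```

In this model the compare is only reached once the skip is removed, which matches the observation that the icmp behind the trunc was not being simplified.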

In D63618#1554829, @reames wrote:

In D63618#1554009, @sanjoy wrote:

Agreed. I'm mainly curious about the specific test changes in this patch, not with the general idea.

I dug into this a bit, and it turns out that Nikita is partly right. What's going on is that when we visit a cast (for the widening analysis), we always skip its users (for the simplifying transforms). This doesn't seem to really make any sense, and removing it doesn't appear to break any test. Anyone have any idea why this might be the case? I'm really tempted to just remove the continue.

That sounds reasonable, but here are a couple of things you might want to check before making that change (I assume you're talking about removing this line: https://github.com/llvm-mirror/llvm/blob/5fae5e00b26a33fcdc2c350ddbe5b0bc36b67ea6/lib/Transforms/Utils/SimplifyIndVar.cpp#L919):

Maybe we're skipping sext/zext since we're going to create a wider IV and revisit the widened users anyway (in IndVarSimplify::simplifyAndExtend).
Maybe we're skipping trunc because most likely the trunc was inserted by a previous round of Widener.createWideIV and we do not want to re-do the work of simplifying the narrow IV.
Maybe SimplifyIndvar::simplifyUsers expects all of the IV users it is asked to simplify to have the same bitwidth.

reames mentioned this in D63733: [IndVars] Use exit count reasoning to discharge obviously untaken exits. Jun 24 2019, 12:23 PM

reames mentioned this in rL365920: [IndVars] Use exit count reasoning to discharge obviously untaken exits. Jul 12 2019, 10:05 AM

reames mentioned this in rG34495b553383: [IndVars] Use exit count reasoning to discharge obviously untaken exits.

Revision Contents

Path                                                                 Size
llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp                  16 lines
llvm/trunk/test/Transforms/IndVarSimplify/eliminate-comparison.ll    3 lines
llvm/trunk/test/Transforms/IndVarSimplify/eliminate-trunc.ll         4 lines

Diff 206124

llvm/trunk/lib/Transforms/Scalar/IndVarSimplify.cpp

@@ ... @@ for (BasicBlock *ExitingBB : ExitingBlocks) {

     if (!needsLFTR(L, ExitingBB))
       continue;

     const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
     if (isa<SCEVCouldNotCompute>(ExitCount))
       continue;

-    // Better to fold to true (TODO: do so!)
-    if (ExitCount->isZero())
-      continue;
+    // If we know we'd exit on the first iteration, rewrite the exit to
+    // reflect this. This does not imply the loop must exit through this
+    // exit; there may be an earlier one taken on the first iteration.
+    // TODO: Given we know the backedge can't be taken, we should go ahead
+    // and break it. Or at least, kill all the header phis and simplify.
+    if (ExitCount->isZero()) {
+      auto *BI = cast<BranchInst>(ExitingBB->getTerminator());
+      bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));
+      auto *NewCond = ExitIfTrue ?
+        ConstantInt::getTrue(BI->getCondition()->getType()) :
+        ConstantInt::getFalse(BI->getCondition()->getType());
+      BI->setCondition(NewCond);
+      Changed = true;
+      continue;
+    }

     PHINode *IndVar = FindLoopCounter(L, ExitingBB, ExitCount, SE, DT);
     if (!IndVar)
       continue;

     // Avoid high cost expansions. Note: This heuristic is questionable in
     // that our definition of "high cost" is not exactly principled.
     if (Rewriter.isHighCostExpansion(ExitCount, L))
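The choice of constant in the new block can be sketched in isolation. The following is a minimal standalone C++ model under stated assumptions: ToyBranch, foldedExitCondition, leavesLoop and checkFoldedBranch are hypothetical names, and the struct only records whether each successor of a two-way branch stays in the loop. It mirrors the ExitIfTrue computation from the patch and checks that the folded condition always takes the exiting edge.

```cpp
#include <cassert>

// Toy stand-in for the conditional branch that ends an exiting block.
// Successor 0 is taken when the condition is true, successor 1 when false.
struct ToyBranch {
  bool succ0InLoop; // does the "true" successor stay inside the loop?
  bool succ1InLoop; // does the "false" successor stay inside the loop?
};

// Mirrors `ExitIfTrue = !L->contains(*succ_begin(ExitingBB))`: the constant
// that forces the exit is `true` exactly when successor 0 leaves the loop.
bool foldedExitCondition(const ToyBranch &br) {
  bool exitIfTrue = !br.succ0InLoop;
  return exitIfTrue;
}

// Whether the branch leaves the loop for a given condition value.
bool leavesLoop(const ToyBranch &br, bool cond) {
  return cond ? !br.succ0InLoop : !br.succ1InLoop;
}

void checkFoldedBranch() {
  // Shape of @test_06: `br i1 %cmp, label %loop, label %exit`.
  ToyBranch loopOnTrue{true, false};
  assert(foldedExitCondition(loopOnTrue) == false);
  assert(leavesLoop(loopOnTrue, foldedExitCondition(loopOnTrue)));

  // Opposite shape: the true edge is the exit.
  ToyBranch exitOnTrue{false, true};
  assert(foldedExitCondition(exitOnTrue) == true);
  assert(leavesLoop(exitOnTrue, foldedExitCondition(exitOnTrue)));
}
```

For a branch shaped like the one in @test_06, the model picks false, which agrees with the updated CHECK line `br i1 false, label [[LOOP]], label [[EXIT:%.*]]`.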

llvm/trunk/test/Transforms/IndVarSimplify/eliminate-comparison.ll

@@ ... @@
 ; CHECK-NEXT: [[__KEY8_0:%.*]] = phi i32 [ [[TMP81:%.*]], [[NOASSERT68:%.*]] ], [ 2, [[FORCOND38_PREHEADER]] ]
 ; CHECK-NEXT: br i1 true, label [[NOASSERT68]], label [[UNROLLEDEND:%.*]]
 ; CHECK: noassert68:
 ; CHECK-NEXT: [[TMP57:%.*]] = sdiv i32 -32768, [[__KEY8_0]]
 ; CHECK-NEXT: [[SEXT34:%.*]] = shl i32 [[TMP57]], 16
 ; CHECK-NEXT: [[SEXT21:%.*]] = shl i32 [[TMP57]], 16
 ; CHECK-NEXT: [[TMP76:%.*]] = icmp ne i32 [[SEXT34]], [[SEXT21]]
 ; CHECK-NEXT: [[TMP81]] = add nuw nsw i32 [[__KEY8_0]], 1
-; CHECK-NEXT: br i1 [[TMP76]], label [[FORCOND38]], label [[ASSERT77:%.*]]
+; CHECK-NEXT: br i1 false, label [[FORCOND38]], label [[ASSERT77:%.*]]
 ; CHECK: assert77:
 ; CHECK-NEXT: tail call void @llvm.trap()
 ; CHECK-NEXT: unreachable
 ; CHECK: unrolledend:
 ; CHECK-NEXT: ret i32 0
 ;
 entry:
   br label %forcond
@@ ... @@ be:
   call void @side_effect()
   %be.cond = icmp sgt i32 %iv.dec, 4
   br i1 %be.cond, label %loop, label %leave

 leave:
   ret void
 }


 !0 = !{i32 0, i32 2147483647}

llvm/trunk/test/Transforms/IndVarSimplify/eliminate-trunc.ll

@@ ... @@
 ; CHECK-LABEL: @test_06(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ -2147483649, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
 ; CHECK-NEXT: [[NARROW_IV:%.*]] = trunc i64 [[IV]] to i32
 ; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[NARROW_IV]], [[N:%.*]]
-; CHECK-NEXT: br i1 [[CMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK-NEXT: br i1 false, label [[LOOP]], label [[EXIT:%.*]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %loop
 loop:
   %iv = phi i64 [ -2147483649, %entry ], [ %iv.next, %loop ]
   %iv.next = add i64 %iv, 1
@@ ... @@
 ; CHECK-LABEL: @test_06_unsigned(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ -1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[IV_NEXT]] = add nsw i64 [[IV]], 1
 ; CHECK-NEXT: [[NARROW_IV:%.*]] = trunc i64 [[IV]] to i32
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[NARROW_IV]], [[N:%.*]]
-; CHECK-NEXT: br i1 [[CMP]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK-NEXT: br i1 false, label [[LOOP]], label [[EXIT:%.*]]
 ; CHECK: exit:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %loop
 loop:
   %iv = phi i64 [ -1, %entry ], [ %iv.next, %loop ]
   %iv.next = add i64 %iv, 1