This is an archive of the discontinued LLVM Phabricator instance.

Differential D40945

[ScalarEvolution] Improve high cost heuristic in SCEVExpander.
Needs Review, Public

Authored by bevinh on Dec 7 2017, 1:28 AM.

Details

Reviewers

mzolotukhin
sanjoy

Summary

SCEVExpander previously considered all native integer UDivs
by a power of two to be cheap, regardless of the LHS. This
heuristic is rather permissive; the expression on the left
could be incredibly expensive.

This patch has SCEVExpander verify the cost of the LHS as
well. One regression test changed because of this, as the
cost of expanding a SCEV went from low to high.
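
To make the change in the summary concrete, here is a minimal, self-contained sketch of the before/after rule on a toy expression tree. It does not use the LLVM API: `Expr`, `isHighCost`, `tripCountLike` and the `checkLhs` flag are hypothetical names invented for illustration, and the rules for the other node kinds (an smax is always expensive, a non-power-of-two UDiv is always expensive) are simplifying assumptions, not a description of everything SCEVExpander does.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy stand-in for a SCEV node. NOT the LLVM class hierarchy.
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum Kind { Unknown, Constant, Add, SMax, UDiv };
  Kind kind;
  std::uint64_t value; // only meaningful for Constant
  ExprPtr lhs, rhs;    // only meaningful for Add, SMax, UDiv
};

ExprPtr unknown() {
  return std::make_shared<Expr>(Expr{Expr::Unknown, 0, nullptr, nullptr});
}
ExprPtr constant(std::uint64_t v) {
  return std::make_shared<Expr>(Expr{Expr::Constant, v, nullptr, nullptr});
}
ExprPtr binary(Expr::Kind k, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{k, 0, std::move(l), std::move(r)});
}

bool isPowerOf2(std::uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// checkLhs == false models the old rule (only the divisor and the type width
// matter); checkLhs == true models the rule proposed in this revision.
bool isHighCost(const ExprPtr &e, bool widthIsLegal, bool checkLhs) {
  switch (e->kind) {
  case Expr::Unknown:
  case Expr::Constant:
    return false;
  case Expr::SMax:
    return true; // assumption of this sketch: max expressions are expensive
  case Expr::Add:
    return isHighCost(e->lhs, widthIsLegal, checkLhs) ||
           isHighCost(e->rhs, widthIsLegal, checkLhs);
  case Expr::UDiv:
    if (e->rhs->kind == Expr::Constant && isPowerOf2(e->rhs->value)) {
      if (!widthIsLegal)
        return true;
      // Old rule stops here and reports "cheap"; new rule also asks the LHS.
      return checkLhs && isHighCost(e->lhs, widthIsLegal, checkLhs);
    }
    return true; // assumption of this sketch: other divisions are expensive
  }
  return true;
}

// Loosely mirrors the shape (2 + smax(...) + %start) /u 2 from the new test.
ExprPtr tripCountLike() {
  ExprPtr smax = binary(Expr::SMax, unknown(), unknown());
  ExprPtr sum = binary(Expr::Add, binary(Expr::Add, constant(2), smax), unknown());
  return binary(Expr::UDiv, sum, constant(2));
}
```

With a legal width, `isHighCost(tripCountLike(), true, false)` reports the expression as cheap, while `isHighCost(tripCountLike(), true, true)` reports it as expensive because of the smax on the left-hand side.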

Event Timeline

bevinh created this revision. Dec 7 2017, 1:28 AM

Removed the x86 triple from the tests.

bjope added a subscriber: bjope. Dec 7 2017, 2:14 AM

Ka-Ka added a subscriber: Ka-Ka. Dec 7 2017, 3:36 AM

sanjoy: lgtm to me, but I want Michael to take a look at the test update, as mentioned inline.

test/Analysis/ScalarEvolution/expensive-expansion.ll
40	Is this necessary?
test/Transforms/IndVarSimplify/no-iv-rewrite.ll
228–230	@mzolotukhin - can you please review this bit (that the test update is fine)?

sanjoy added a reviewer: mzolotukhin. Dec 7 2017, 11:19 PM

bevinh added inline comments. Dec 8 2017, 1:08 AM

test/Analysis/ScalarEvolution/expensive-expansion.ll
40	When I was constructing the reproducer, I couldn't get loop-unroll to unroll the loop if it wasn't present. Looking closer now, it seems that removing the target triple and the target specific attributes is preventing the unroll from happening even if I undo the patch. I'll see if I can get it to work again.

Enabled runtime loop unroll and added i32 as a legal integer type to prevent SCEVExpander from considering the trip count expensive due to that.
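
As background for the `n32` change: the `n<size1>:<size2>...` component of an LLVM datalayout string lists the native integer widths, and the width check in the patched code (`DL.isIllegalInteger(Width)`) is true for any width not in that list. The sketch below is a toy model of that lookup, not LLVM's parser; `parseNativeWidths` and the free function `isIllegalInteger` are hypothetical helpers written for illustration, and the long layout string in the usage note is only an example input.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Collect the widths named by "n..." components of a datalayout string.
// Components are separated by '-', widths inside a component by ':'.
std::set<unsigned> parseNativeWidths(const std::string &layout) {
  std::set<unsigned> widths;
  std::istringstream specs(layout);
  std::string spec;
  while (std::getline(specs, spec, '-')) {
    // Only accept 'n' followed by a digit, so that other components that
    // happen to start with 'n' (such as "ni:...") are skipped.
    if (spec.size() < 2 || spec[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(spec[1])))
      continue;
    std::istringstream sizes(spec.substr(1));
    std::string size;
    while (std::getline(sizes, size, ':'))
      if (!size.empty())
        widths.insert(static_cast<unsigned>(std::stoul(size)));
  }
  return widths;
}

// Toy equivalent of the width check: a width is illegal if it is not native.
bool isIllegalInteger(const std::set<unsigned> &nativeWidths, unsigned width) {
  return nativeWidths.count(width) == 0;
}
```

Under this model, `isIllegalInteger(parseNativeWidths("n32"), 32)` is false, so an i32 trip count is no longer rejected on width alone, whereas an empty layout string makes every width illegal. A fuller string such as `"e-m:e-i64:64-n8:16:32:64-S128"` yields the widths 8, 16, 32 and 64.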

bjope added inline comments. Dec 18 2017, 10:23 AM

test/Transforms/IndVarSimplify/no-iv-rewrite.ll

228–230

Here is some more input to the review (I promised Bevin that I would keep an eye on this during his vacation).
With this patch I get:

define i64 @cloneOr(i32 %limit, i64* %base) #0 {
entry:
  %halfLim = ashr i32 %limit, 2
  %0 = sext i32 %halfLim to i64
  br label %loop

loop:                                             ; preds = %loop, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ 0, %entry ]
  %adr = getelementptr i64, i64* %base, i64 %indvars.iv
  %val = load i64, i64* %adr
  %1 = or i64 %indvars.iv, 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp = icmp slt i64 %indvars.iv.next, %0
  br i1 %cmp, label %loop, label %exit

exit:                                             ; preds = %loop
  %val.lcssa = phi i64 [ %val, %loop ]
  %t3.lcssa = phi i64 [ %1, %loop ]
  %result = and i64 %val.lcssa, %t3.lcssa
  ret i64 %result
}

Without the patch:

define i64 @cloneOr(i32 %limit, i64* %base) #0 {
entry:
  %halfLim = ashr i32 %limit, 2
  %0 = sext i32 %halfLim to i64
  %1 = icmp sgt i32 %halfLim, 2
  %smax = select i1 %1, i32 %halfLim, i32 2
  %2 = add i32 %smax, -1
  %3 = lshr i32 %2, 1
  %4 = zext i32 %3 to i64
  %5 = shl i64 %4, 1
  br label %loop

loop:                                             ; preds = %loop, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ 0, %entry ]
  %adr = getelementptr i64, i64* %base, i64 %indvars.iv
  %val = load i64, i64* %adr
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp = icmp slt i64 %indvars.iv.next, %0
  br i1 %cmp, label %loop, label %exit

exit:                                             ; preds = %loop
  %val.lcssa = phi i64 [ %val, %loop ]
  %6 = add i64 %5, 1
  %result = and i64 %val.lcssa, %6
  ret i64 %result
}

It is not completely obvious to me which one is better (at least I find it hard to say that this test case shows that this patch is good).
Not having the `or` inside the loop is probably good in most cases (at least when the iteration count is high). If the iteration count is low, then the calculation of %result is simpler when we leave the `or` inside the loop.
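
The two IR listings above compute the same exit value in different places. As a rough cross-check, the following sketch transcribes only the `or`-related arithmetic of each version into C++ and leaves out the load of `%val`. The function names are hypothetical, and `halfLim` is passed in directly rather than derived from `%limit`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// "With this patch": the `or` stays inside the loop and its last value is
// carried out through an LCSSA phi (%t3.lcssa).
std::int64_t exitValueInLoop(std::int32_t halfLim) {
  const std::int64_t bound = halfLim; // %0 = sext i32 %halfLim to i64
  std::int64_t iv = 0;
  std::int64_t t3 = 0;
  do {
    t3 = iv | 1;                      // %1 = or i64 %indvars.iv, 1
    iv += 2;                          // %indvars.iv.next
  } while (iv < bound);               // icmp slt i64 %indvars.iv.next, %0
  return t3;
}

// "Without the patch": the exit value is expanded outside the loop from an
// smax-based expression, and the `or` becomes an `add`.
std::int64_t exitValueClosedForm(std::int32_t halfLim) {
  const std::int32_t smax = std::max<std::int32_t>(halfLim, 2);      // %smax
  const std::uint32_t halved = static_cast<std::uint32_t>(smax - 1) >> 1; // %2, %3
  const std::int64_t doubled = static_cast<std::int64_t>(halved) << 1;    // %4, %5
  return doubled + 1;                                                 // %6
}
```

Both functions agree for every `halfLim` that `ashr i32 %limit, 2` can produce, for example both return 1 for `halfLim <= 2` and 5 for `halfLim == 5`. The closed form costs a handful of instructions once, while the in-loop form costs one `or` per iteration, which is the trade-off described above.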

sanjoy resigned from this revision. Jan 29 2022, 5:32 PM

Herald added a subscriber: javed.absar. Jan 29 2022, 5:32 PM

Revision Contents

Path                                                   Size
lib/Analysis/ScalarEvolutionExpander.cpp               3 lines
test/Analysis/ScalarEvolution/expensive-expansion.ll   41 lines
test/Transforms/IndVarSimplify/no-iv-rewrite.ll        8 lines

Diff 126100

lib/Analysis/ScalarEvolutionExpander.cpp

   if (auto *UDivExpr = dyn_cast<SCEVUDivExpr>(S)) {
     // If the divisor is a power of two and the SCEV type fits in a native
     // integer, consider the division cheap irrespective of whether it occurs in
     // the user code since it can be lowered into a right shift.
     if (auto *SC = dyn_cast<SCEVConstant>(UDivExpr->getRHS()))
       if (SC->getAPInt().isPowerOf2()) {
         const DataLayout &DL =
             L->getHeader()->getParent()->getParent()->getDataLayout();
         unsigned Width = cast<IntegerType>(UDivExpr->getType())->getBitWidth();
-        return DL.isIllegalInteger(Width);
+        return DL.isIllegalInteger(Width) ||
+               isHighCostExpansionHelper(UDivExpr->getLHS(), L, At, Processed);
       }

     // UDivExpr is very likely a UDiv that ScalarEvolution's HowFarToZero or
     // HowManyLessThans produced to compute a precise expression, rather than a
     // UDiv from the user's code. If we can't find a UDiv in the code with some
     // simple searching, assume the former consider UDivExpr expensive to
     // compute.
     BasicBlock *ExitingBB = L->getExitingBlock();

test/Analysis/ScalarEvolution/expensive-expansion.ll

This file was added.

				; RUN: opt -loop-unroll -unroll-runtime -S < %s | FileCheck %s

				; Add 32 as a legal integer width so that SCEVExpander won't consider the loop
				; trip count as expensive due to type width.
				target datalayout = "n32"

				@g = global i32 0, align 4

				; Function Attrs: nounwind uwtable
				define void @fn(i32 %start) local_unnamed_addr #0 {
				; CHECK-LABEL: fn
				; CHECK-NOT: for.body.prol

				; LoopUnroll should not unroll this loop as the computed trip count is much
				; too expensive. The SCEV under consideration is:
				; ((2 + (-10 smax (-1 + (-1 * %start))) + %start) /u 2)
				; SCEVExpander previously thought this was cheap as it had a legal
				; division-by-power-of-2 on the RHS, but the LHS is quite large already.

				entry:
				%cmp1 = icmp sgt i32 %start, 7
				br i1 %cmp1, label %for.body.preheader, label %for.end

				for.body.preheader: ; preds = %entry
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.body
				%i.02 = phi i32 [ %sub, %for.body ], [ %start, %for.body.preheader ]
				%0 = load volatile i32, i32* @g, align 4
				%inc = add nsw i32 %0, 1
				store volatile i32 %inc, i32* @g, align 4
				%sub = add nsw i32 %i.02, -2
				%cmp = icmp sgt i32 %i.02, 9
				br i1 %cmp, label %for.body, label %for.end.loopexit

				for.end.loopexit: ; preds = %for.body
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				ret void
				}

test/Transforms/IndVarSimplify/no-iv-rewrite.ll

 define i64 @cloneOr(i32 %limit, i64* %base) nounwind {
 entry:
   ; ensure that the loop can't overflow
   %halfLim = ashr i32 %limit, 2
   br label %loop

 ; This test originally checked that the OR instruction was cloned. Now the
 ; ScalarEvolution is able to understand the loop evolution and that '%iv' at the
-; end of the loop is an even value. Thus '%val' is computed at the end of the
-; loop and the OR instruction is replaced by an ADD keeping the result
-; equivalent.
+; end of the loop is an even value. However, the expression is too expensive to
+; compute as the loop is not guarded, so the OR is not cloned (and changed into
+; an add) as it was before.
 ;
 ; CHECK: sext
 ; CHECK: loop:
 ; CHECK: phi i64
 ; CHECK-NOT: sext
 ; CHECK: icmp slt i64
 ; CHECK: exit:
-; CHECK: add i64
+; CHECK-NOT: add i64
 loop:
   %iv = phi i32 [ 0, %entry], [ %iv.next, %loop ]
   %t1 = sext i32 %iv to i64
   %adr = getelementptr i64, i64* %base, i64 %t1
   %val = load i64, i64* %adr
   %t2 = or i32 %iv, 1
   %t3 = sext i32 %t2 to i64
   %iv.next = add i32 %iv, 2
