This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Allow formula containing Reg for SCEVAddRecExpr with loop other than current loop
Closed, Public

Authored by wmi on Nov 8 2016, 3:35 PM.

Details

Reviewers

qcolombet
atrick
sanjoy

Commits

rG8f20e63a20b1: [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with…
rG7ccf7651c0e1: [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
rL294814: [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with…
rL286999: [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.

Summary

In the existing LSR's RateRegister, if a formula contains a Reg that is a SCEVAddRecExpr whose loop is not the current loop (and the AddRec is not an existing phi), the formula is marked as a Loser and dropped.

Suppose we have IR in which %for.body is the outer loop and %for.body2 is the inner loop. LSR currently handles only inner loops, so only %for.body2 will be processed.

Under the logic above, a formula like reg(%array) + reg({1,+,%size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) is always dropped, because reg({1,+,%size}<%for.body>) is a SCEVAddRecExpr Reg that belongs to the outer loop.
Only a formula like reg(%array) + 1*reg({{1,+,%size}<%for.body>,+,1}<nuw><nsw><%for.body2>) is kept, because there the outer-loop SCEVAddRecExpr is folded into the initial value of the SCEVAddRecExpr for the current loop.

But in some cases we do need to share the basic induction variable reg({0,+,1}<%for.body2>) among LSR Uses to reduce the total number of induction variables LSR ends up using, so we don't want to unconditionally drop a formula like reg(%array) + reg({1,+,%size}<%for.body>) + 1*reg({0,+,1}<%for.body2>).

According to the existing comment, the code tries to avoid reasoning about multiple loop levels at the same time. However, existing LSR only handles innermost loops, so any SCEVAddRecExpr whose loop is not the current loop is invariant in the current loop and is simple to handle. That is why the formula doesn't have to be dropped.
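As a rough illustration of the two formula shapes above, the following C++ sketch computes the offsets that each shape would add to %array. It is not part of the patch: `foldedOffsets`, `splitOffsets`, and their parameters are hypothetical names, and the loops are assumed to be simple counted loops. It only shows that both shapes address the same elements, while the split shape leaves a plain {0,+,1} counter that other uses in the inner loop could share.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Offsets = std::vector<std::int64_t>;

// Folded shape: 1*reg({{1,+,size}<outer>,+,1}<inner>).
// The inner loop owns a dedicated IV whose start value changes
// on every outer iteration.
Offsets foldedOffsets(std::int64_t size, int outerTrips, int innerTrips) {
  assert(outerTrips >= 0 && innerTrips >= 0);
  Offsets out;
  std::int64_t start = 1; // models {1,+,size}<outer>
  for (int i = 0; i < outerTrips; ++i, start += size) {
    std::int64_t iv = start; // models {start,+,1}<inner>
    for (int j = 0; j < innerTrips; ++j, ++iv)
      out.push_back(iv);
  }
  return out;
}

// Split shape: reg({1,+,size}<outer>) + 1*reg({0,+,1}<inner>).
// The outer addrec is invariant inside the inner loop, and the inner
// loop keeps a basic {0,+,1} counter.
Offsets splitOffsets(std::int64_t size, int outerTrips, int innerTrips) {
  assert(outerTrips >= 0 && innerTrips >= 0);
  Offsets out;
  std::int64_t outerReg = 1; // models {1,+,size}<outer>
  for (int i = 0; i < outerTrips; ++i, outerReg += size)
    for (std::int64_t j = 0; j < innerTrips; ++j) // models {0,+,1}<inner>
      out.push_back(outerReg + j);
  return out;
}
```

Both functions return the same sequence for the same arguments; the difference is only in which values are kept in registers across inner-loop iterations.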

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 77278. Nov 8 2016, 3:35 PM

wmi retitled this revision to [LSR] Allow formula containing Reg for SCEVAddRecExpr with loop other than current loop.

wmi updated this object.

wmi added reviewers: sanjoy, atrick.

wmi set the repository for this revision to rL LLVM.

wmi added subscribers: llvm-commits, davidxl.

Herald added a subscriber: mzolotukhin. Nov 8 2016, 3:35 PM

Hi Wei,

Looks mostly good to me.
A couple of comments regarding the test case.

Cheers,
-Quentin

test/Transforms/LoopStrengthReduce/nested-loop.ll
2	Could you add a comment on what this loop is supposed to check? I believe we want to stress that the relevant bit is the use of outer loop IV in inner loop.
2	Instead of parsing the debug output, could you expose a test case that takes advantage of the newly kept formulae? Basically, I am asking for check lines over the IR, not the debug output.

Quentin, thanks for the review.

test/Transforms/LoopStrengthReduce/nested-loop.ll
2	Added a comment on what the test is meant to check.
2	Ok. I changed the check to be over IR.

Updated the patch per Quentin's comment.

Thanks Wei.

Looks good.

Cheers,
Quentin

test/Transforms/LoopStrengthReduce/nested-loop.ll
3	Nitpick: Break the line into multiple lines.

This revision is now accepted and ready to land. Nov 14 2016, 1:02 PM

Closed by commit rL286999: [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop. (authored by wmi). Nov 15 2016, 10:45 AM

This revision was automatically updated to reflect the committed changes.

First, sorry to revisit the patch after such a long time.

The patch was reverted after being committed because of a testcase failure: test/CodeGen/AArch64/eliminate-trunc.ll.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/412459.html.

Here is an explanation of how the patch broke the testcase and how it was fixed:

How the patch affects the testcase:

By allowing formulae that contain a SCEVAddRecExpr Reg from an outer loop, LSR has more options to choose from than before, and it chooses to use fewer induction variables. Without the patch, two induction vars are used. With the patch, only one induction var is used.

Without the patch:

Before LSR:
  Only one induction variable is used. It is used both in the array subscript expression and in the compare expression in the loop latch (everywhere except the array subscript expression it is used as a 32-bit value), so it has to be a 64-bit variable.

After LSR:
  Two induction vars are generated. One is used in the array subscript expression. The other is used as the loop iteration counter. They are separate from each other, so the loop iteration counter can be a 32-bit variable. That is what the CHECK in the testcase is looking for.
  Note that in the asm, the increment of the induction var used in the array subscript expression is absorbed by a post-increment load/store.

With the patch:

After LSR:
  Only one induction var is used. Since it is used both in the array subscript expression and as the loop iteration counter, it has to be a 64-bit variable. That is how the patch broke the testcase.
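The two loop shapes described above can be sketched in C++ roughly as follows. This is only an illustration under assumed names (`sumTwoIVs`, `sumOneIV`) and an assumed loop body (a byte sum); it is not taken from eliminate-trunc.ll. The first function keeps a pointer that advances with each access plus a separate 32-bit counter; the second uses a single 64-bit variable for both indexing and the exit test.

```cpp
#include <cassert>
#include <cstdint>

// Two-IV shape: the address advances with the access (a post-increment
// load could absorb this), and a separate 32-bit counter controls the loop.
std::int64_t sumTwoIVs(const std::int8_t *a, std::uint32_t n) {
  assert(a != nullptr || n == 0);
  std::int64_t sum = 0;
  const std::int8_t *p = a;
  for (std::uint32_t left = n; left != 0; --left)
    sum += *p++;
  return sum;
}

// One-IV shape: a single variable is both the array index and the loop
// counter, so it has to be as wide as the index (64 bits here).
std::int64_t sumOneIV(const std::int8_t *a, std::uint32_t n) {
  assert(a != nullptr || n == 0);
  std::int64_t sum = 0;
  for (std::uint64_t i = 0; i != n; ++i)
    sum += a[i];
  return sum;
}
```

Both functions compute the same result; only the number and width of the loop-carried variables differ.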

To keep the testcase valid, I changed it to stop using the outer-loop induction var in the array subscript expression, so there is no formula containing a SCEVAddRecExpr Reg from the outer loop, i.e., the patch no longer kicks in for the testcase.

Is it ok to fix the testcase like this and recommit the patch?

Thanks.

Hi Wei,

SGTM.

Cheers,
-Quentin

Revision Contents

Path	Size
lib/Transforms/Scalar/LoopStrengthReduce.cpp	11 lines
test/Transforms/LoopStrengthReduce/nested-loop.ll	55 lines

Diff 77278

lib/Transforms/Scalar/LoopStrengthReduce.cpp

 }

 /// Tally up interesting quantities from the given register.
 void Cost::RateRegister(const SCEV *Reg,
                         SmallPtrSetImpl<const SCEV *> &Regs,
                         const Loop *L,
                         ScalarEvolution &SE, DominatorTree &DT) {
   if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) {
-    // If this is an addrec for another loop, don't second-guess its addrec phi
-    // nodes. LSR isn't currently smart enough to reason about more than one
-    // loop at a time. LSR has already run on inner loops, will not run on outer
-    // loops, and cannot be expected to change sibling loops.
+    // If this is an addrec for another loop, it should be an invariant
+    // with respect to L since L is the innermost loop (at least
+    // for now LSR only handles innermost loops).
     if (AR->getLoop() != L) {
       // If the AddRec exists, consider it's register free and leave it alone.
       if (isExistingPhi(AR, SE))
         return;

-      // Otherwise, do not consider this formula at all.
-      Lose();
+      // Otherwise, it will be an invariant with respect to Loop L.
+      ++NumRegs;
       return;
     }
     AddRecCost += 1; /// TODO: This should be a function of the stride.

     // Add the step value register, if it needs one.
     // TODO: The non-affine case isn't precisely modeled here.
     if (!AR->isAffine() || !isa<SCEVConstant>(AR->getOperand(1))) {
       if (!Regs.count(AR->getOperand(1))) {
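To make the behavioral change in this hunk easier to see, here is a small self-contained model of the outer-loop AddRec branch. It is a sketch, not LLVM code: `ToyCost`, `rateAddRec`, and the boolean parameters are hypothetical stand-ins for the real `Cost` class, the `AR->getLoop() != L` test, and `isExistingPhi`, and the rest of `RateRegister` is omitted.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for LSR's Cost bookkeeping.
struct ToyCost {
  unsigned NumRegs = 0;
  unsigned AddRecCost = 0;
  bool Loser = false;
};

// Models only the AddRec branch shown in the diff.
//   sameLoop:    the addrec belongs to the loop being optimized
//   existingPhi: the addrec already exists as a phi
//   patched:     true selects the new behavior, false the old one
void rateAddRec(ToyCost &C, bool sameLoop, bool existingPhi, bool patched) {
  assert(!C.Loser && "toy model: formula was already rejected");
  if (!sameLoop) {
    if (existingPhi)
      return;          // treated as free in both versions
    if (patched)
      ++C.NumRegs;     // new: loop-invariant value that needs a register
    else
      C.Loser = true;  // old: drop the whole formula
    return;
  }
  ++C.AddRecCost;      // addrec of the current loop (further costs omitted)
}
```

With `patched == false` an outer-loop addrec that is not an existing phi rejects the formula; with `patched == true` it merely costs one register.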

test/Transforms/LoopStrengthReduce/nested-loop.ll

; RUN: opt -loop-reduce -disable-output -debug-only=loop-reduce < %s 2>&1 | FileCheck %s
; REQUIRES: asserts
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @foo(i32 %size, i32 %nsteps, i32 %hsize, i32* %lined, i8* %maxarray) {
entry:
  %cmp215 = icmp sgt i32 %size, 1
  %t0 = zext i32 %size to i64
  %t1 = sext i32 %nsteps to i64
  %sub2 = sub i64 %t0, 2
  br label %for.body

for.body: ; preds = %for.inc, %entry
  %indvars.iv2 = phi i64 [ %indvars.iv.next3, %for.inc ], [ 0, %entry ]
  %t2 = mul nsw i64 %indvars.iv2, %t0
  br i1 %cmp215, label %for.body2.preheader, label %for.inc

for.body2.preheader: ; preds = %for.body
  br label %for.body2

; CHECK: After filtering out undesirable candidates:
; CHECK: LSR is examining the following uses:
; CHECK: LSR Use: Kind=Address
; For ivuse with induction variables from both inner loop and outer loop, expect that the formula candidates for LSR solving contains the formula with a separate reg used for basic induction variable of inner loop.
; CHECK: reg({1,+,(zext i32 %size to i64)}<%for.body>) + 1*reg({0,+,1}<%for.body2>)

for.body2: ; preds = %for.body2.preheader, %for.body2
  %indvars.iv = phi i64 [ 1, %for.body2.preheader ], [ %indvars.iv.next, %for.body2 ]
  %arrayidx1 = getelementptr inbounds i8, i8* %maxarray, i64 %indvars.iv
  %v1 = load i8, i8* %arrayidx1, align 1
  %idx2 = add nsw i64 %indvars.iv, %sub2
  %arrayidx2 = getelementptr inbounds i8, i8* %maxarray, i64 %idx2
  %v2 = load i8, i8* %arrayidx2, align 1
  %tmpv = xor i8 %v1, %v2
  %t4 = add nsw i64 %t2, %indvars.iv
  %add.ptr = getelementptr inbounds i8, i8* %maxarray, i64 %t4
  store i8 %tmpv, i8* %add.ptr, align 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %wide.trip.count = zext i32 %size to i64
  %exitcond = icmp ne i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond, label %for.body2, label %for.inc.loopexit

for.inc.loopexit: ; preds = %for.body2
  br label %for.inc

for.inc: ; preds = %for.inc.loopexit, %for.body
  %indvars.iv.next3 = add nuw nsw i64 %indvars.iv2, 1
  %cmp = icmp slt i64 %indvars.iv.next3, %t1
  br i1 %cmp, label %for.body, label %for.end.loopexit

for.end.loopexit: ; preds = %for.inc
  ret void
}