This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Allow formula containing Reg for SCEVAddRecExpr with loop other than current loop
ClosedPublic

Authored by wmi on Nov 8 2016, 3:35 PM.

Details

Summary

In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is not current loop, the formula will be marked as Loser and dropped.

Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled.

Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop.
Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop.

But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.

From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle. That is why the formula doesn't have to be dropped.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 77278.Nov 8 2016, 3:35 PM
wmi retitled this revision from to [LSR] Allow formula containing Reg for SCEVAddRecExpr with loop other than current loop.
wmi updated this object.
wmi added reviewers: sanjoy, atrick.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: llvm-commits, davidxl.

Hi Wei,

Looks mostly good to me.
Couple of comments regarding the test case.

Cheers,
-Quentin

test/Transforms/LoopStrengthReduce/nested-loop.ll
2 ↗(On Diff #77278)

Could you add a comment on what this loop is supposed to check?
I believe we want to stress that the relevant bit is the use of outer loop IV in inner loop.

2 ↗(On Diff #77278)

Instead of parsing the debug output, could you expose a test case that takes advantage of the newly kept formulae?
Basically, I am asking for check lines over the IR, not the debug output.

wmi added a comment.Nov 10 2016, 5:49 PM

Quentin, thanks for the review.

test/Transforms/LoopStrengthReduce/nested-loop.ll
2 ↗(On Diff #77278)

Added the comment on what the test is about to check.

2 ↗(On Diff #77278)

Ok. I changed the check to be over IR.

wmi updated this revision to Diff 77580.Nov 10 2016, 5:51 PM

Updated the patch per Quentin's comment.

qcolombet accepted this revision.Nov 14 2016, 1:02 PM
qcolombet added a reviewer: qcolombet.

Thanks Wei.

Looks good.

Cheers,
Quentin

test/Transforms/LoopStrengthReduce/nested-loop.ll
2 ↗(On Diff #77580)

Nitpick: Break the line in multiple lines.

This revision is now accepted and ready to land.Nov 14 2016, 1:02 PM
This revision was automatically updated to reflect the committed changes.
wmi updated this revision to Diff 87563.Feb 7 2017, 5:38 PM
wmi added subscribers: rengolin, t.p.northover, jmolloy.

First, sorry to revisit the patch after long time.

The patch was reverted after being commited because of the testcase failure: test/CodeGen/AArch64/eliminate-trunc.ll.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/412459.html.

Here is some explaination about how the patch broke the testcase and how it was fixed:

How the patch affect the testcase:

By allowing fomula containing SCEVAddRecExpr Reg from outer loop, LSR has more options to choose than before and it chooses to use fewer induction variables. Without the patch, two induction vars will be used. With the patch, only one induction var will be used.

Without the patch:

Before LSR:
  There is only one induction variable being used. The induction var will both be used in array suffix expr and also used in the compare expr in loop latch (other than in the array suffix expr, the induction var is used as a 32bit var), so it should be an 64 bit var.

After LSR:
  Two induction vars will be generated. One is used in array suffix expr. The other is used as loop iteration counter. They are separated from each other so the loop iteration counter can be a 32 bit var. That is what CHECK in the testcase is looking for.
   Note in asm, addrec instruction for the induction var used in array suffix expr is absorbed by post-increment load/store.

With the patch:

After LSR:
  Only one induction var will be used. Since it is both used in array suffix expr and used as loop iteration counter, it should be a 64 bit var. That is how the patch broke the testcase.

To make the testcase remain valid, I change the testcase and stop using outerloop induction var in array suffix expr, so we won't have fomula containing SCEVAddRecExpr Reg from outer loop, i.e., the patch will stop kicking in for the testcase.

Is it ok to fix the testcase like this and recommit the patch?

atrick edited edge metadata.Feb 7 2017, 9:38 PM

Thanks.

Hi Wei,

SGTM.

Cheers,
-Quentin