In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is not current loop, the formula will be marked as Loser and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop.
Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle. That is why the formula doesn't have to be dropped.
Details
- Reviewers
qcolombet atrick sanjoy - Commits
- rG8f20e63a20b1: [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with…
rG7ccf7651c0e1: [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
rL294814: [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with…
rL286999: [LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop.
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Wei,
Looks mostly good to me.
Couple of comments regarding the test case.
Cheers,
-Quentin
test/Transforms/LoopStrengthReduce/nested-loop.ll | ||
---|---|---|
2 | Could you add a comment on what this loop is supposed to check? | |
2 | Instead of parsing the debug output, could you expose a test case that takes advantage of the newly kept formulae? |
Thanks Wei.
Looks good.
Cheers,
Quentin
test/Transforms/LoopStrengthReduce/nested-loop.ll | ||
---|---|---|
3 | Nitpick: Break the line in multiple lines. |
First, sorry to revisit the patch after long time.
The patch was reverted after being commited because of the testcase failure: test/CodeGen/AArch64/eliminate-trunc.ll.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/412459.html.
Here is some explaination about how the patch broke the testcase and how it was fixed:
How the patch affect the testcase:
By allowing fomula containing SCEVAddRecExpr Reg from outer loop, LSR has more options to choose than before and it chooses to use fewer induction variables. Without the patch, two induction vars will be used. With the patch, only one induction var will be used.
Without the patch:
Before LSR: There is only one induction variable being used. The induction var will both be used in array suffix expr and also used in the compare expr in loop latch (other than in the array suffix expr, the induction var is used as a 32bit var), so it should be an 64 bit var. After LSR: Two induction vars will be generated. One is used in array suffix expr. The other is used as loop iteration counter. They are separated from each other so the loop iteration counter can be a 32 bit var. That is what CHECK in the testcase is looking for. Note in asm, addrec instruction for the induction var used in array suffix expr is absorbed by post-increment load/store.
With the patch:
After LSR: Only one induction var will be used. Since it is both used in array suffix expr and used as loop iteration counter, it should be a 64 bit var. That is how the patch broke the testcase.
To make the testcase remain valid, I change the testcase and stop using outerloop induction var in array suffix expr, so we won't have fomula containing SCEVAddRecExpr Reg from outer loop, i.e., the patch will stop kicking in for the testcase.
Is it ok to fix the testcase like this and recommit the patch?
Could you add a comment on what this loop is supposed to check?
I believe we want to stress that the relevant bit is the use of outer loop IV in inner loop.