This is an archive of the discontinued LLVM Phabricator instance.

Fix LSR ImmCost calculation for profitable chains
AbandonedPublic

Authored by evstupac on Dec 1 2016, 2:11 PM.

Details

Reviewers
qcolombet
Summary

For the following case:
for(j = 0; j < n; j++) {

s += *(in++);
s += *(in++);

}
LSR consider "in" as profitable chain and do count ImmCost only for first load.
However when we have a base set before the loop
in += 1024;
ImmCost could be a significant value.

Diff Detail

Repository
rL LLVM

Event Timeline

evstupac updated this revision to Diff 79976.Dec 1 2016, 2:11 PM
evstupac retitled this revision from to Fix LSR ImmCost calculation for profitable chains.
evstupac updated this object.
evstupac added a reviewer: qcolombet.
evstupac set the repository for this revision to rL LLVM.
evstupac added subscribers: llvm-commits, anna, wmi, hfinkel.
qcolombet edited edge metadata.Dec 2 2016, 10:00 AM

Hi,

I am not sure I get the problem you are trying to solve.
You are saying that we want to account for an extra cost if the setting outside of the base is big, right?
Then, why is the average offset relevant to answer that question?

Thanks,
-Quentin

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1025

Use oxygen style: ///

test/Transforms/LoopStrengthReduce/X86/imm-cost.ll
27

Use opt -instnamer to get rid of implicit variables.

Hi,

Thanks for taking a look.

You are saying that we want to account for an extra cost if the setting outside of the base is big, right?

No. I just want correct behavior from current algorithm. LSR already take in account ImmCost, but do this incorrectly for "profitable chains".
When we do "CollectFixupsAndInitialFormulae()":
(line 2969) // Skip IV users that are part of profitable IV Chains.
We do not insert Fixups for memory instructions in the "profitable chain" and therefore do not count ImmCost for them in RateFormula():
(line 1156) for (const LSRFixup &Fixup : LU.Fixups) {

Then, why is the average offset relevant to answer that question?

As there is no exact Offset (see above) we can only estimate.

Thanks,
Evgeny

evstupac updated this revision to Diff 81192.Dec 13 2016, 12:49 AM
evstupac edited edge metadata.

Updated according to inline comments.

Hi,

Okay, I see what you are trying to solve now.

I'll have a closer look hopefully tomorrow.

Cheers,
-Quentin

Hi,

As there is no exact Offset (see above) we can only estimate.

That part bothered me in the first place. I don't see how we can derive good heuristics by injecting guesses in the input data. It sounds to me that we rely even more on the luck factor.

Looking a bit closer, I agree that the main problem is that we assume the fix ups are free for the profitable chain. Maybe we should change the profitability check for the chains altogether or have a different user for them, like one that only sums the immcost?
In particular, the code below that line could use some refinement:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.

That being said, I believe we have another bug in the profitable computation. Namely, I believe this part

// An IV chain with a single increment is handled by LSR's postinc
// uses. However, a chain with multiple increments requires keeping the IV's
// value live longer than it needs to be if chained.
if (NumConstIncrements > 1)
  --cost;

Should be turned into:

// An IV chain with a single increment is handled by LSR's postinc
// uses. However, a chain with multiple increments requires keeping the IV's
// value live longer than it needs to be if chained.
if (NumConstIncrements > 1)
  ++cost;

I.e., having to keep something around should increase the cost of the chain, not decreasing it.

Fixing that seems to fix your test case, but I may have overlooked something admittedly.

Cheers,
-Quentin

Looking a bit closer, I agree that the main problem is that we assume the fix ups are free for the profitable chain. Maybe we should change the profitability check for the chains altogether or have a different user for them, like one that only sums the immcost?

That is exactly what I'm trying to implement in the patch. Basically "avg Offset" will most likely be just Offset, because it is hard to imagine how there could be more than 1 fixup for profitable chain.

In particular, the code below that line could use some refinement:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.

...

Fixing that seems to fix your test case, but I may have overlooked something admittedly.

Yes it help with the test, but do not solve the problem.
Maybe we should this fix this as well as a part of another patch.

The following C example also suffers from incorrect immcost calculation (-fno-tree-vectorize required):
void foo(int n, char *x, char *y) {

char *xx = x - 1024;
char *yy = y - 2;
for (int i = 0; i < 1024; i++) {
  ++*xx++;
  ++*yy++;
}

}

Currently LLVM creates:

movq    $-1024, %rax

.LBB0_1:

incb    (%rsi,%rax)
incb    1022(%rdx,%rax)
incb    1(%rsi,%rax)
incb    1023(%rdx,%rax)
addq    $2, %rax
testl   %eax, %eax
jne     .LBB0_1

Assuming ImmCost(1022) < ImmCost(1024), but it should not as real ImmCost of the solution should be ImmCost(1022) + ImmCost(1023) which is > ImmCost(1024).

So the solution should look like:

movq    $0, %rax
addl    $-2, %rsi
addl    $-1024, %rdx

.LBB0_1:

incb    (%rsi,%rax)
incb    (%rdx,%rax)
incb    1(%rsi,%rax)
incb    1(%rdx,%rax)
addq    $2, %rax
cmpq   $1024, %rax
jne     .LBB0_1
qcolombet requested changes to this revision.Jan 25 2017, 1:36 PM

Hi,

Assuming ImmCost(1022) < ImmCost(1024), but it should not as real ImmCost of the solution should be ImmCost(1022) + ImmCost(1023) which is > ImmCost(1024).

Hold on, the constants that you are using for your cost is not what the LLVM IR has when LSR does the transformation.
Assuming I compiled the test case correctly, we traverse:
getelementptr inbounds i8, i8* %arg1, i64 -1024
getelementptr inbounds i8, i8* %arg2, i64 -2
For 0 to 1024 with a +2 increment (loop partially unrolled).

The IV we get is {-1024,+,2}, that means,
First access of %arg1 is through IV + arg1
Second access of %arg1 is through PrevArg1Access + 1 <-- this is supposed to be free by the chain profitability model
First access of %arg2 is through IV + arg2 + 1022
Second access of %arg2 is through PrevArg2Access + 1
<-- ditto
Then the comparison is against 0

In the end, the only ImmCost that LSR sees is 1022 and 0 for that solution. I understand you'd like to add 1023 (second access of arg2) and 1 (second access of arg1), but I don't think again the averaging the fixups with the number of same base address is the right way to do it.
Instead, if you believe the cost is not free for the second accesses, I would suggest to rework the profitability check for the chain, i.e., isProfitableChain.
More particularly, this snippet:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.
if (isa<SCEVConstant>(Inc.IncExpr)) {
  ++NumConstIncrements;
  continue;
}

Thanks,
-Quentin

This revision now requires changes to proceed.Jan 25 2017, 1:36 PM

Second access of %arg1 is through PrevArg1Access + 1 <-- this is supposed to be free by the chain profitability model

It is free in terms of RegNum or AddRecCost... which has highest priority. But it is not always free in terms of ImmCost.

Narrowing profitable chains hits will lead to more Uses and exceed complexity limit. Which is not ok for RegNum...
So I don't think we should limit profitable chains somehow.

I don't see how I can increase ImmCost here:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.
if (isa<SCEVConstant>(Inc.IncExpr)) {
  ++NumConstIncrements;
  continue;
}

If not count the same base address, I don't see how influence on ImmCost only.

but I don't think again the averaging the fixups with the number of same base address is the right way to do it.

We can add just cost of base offset.

Anyway there is no much profit from this (just some code size). I noticed this unexpected behavior during testing.
And I'm ok to leave the code as is.

evstupac abandoned this revision.May 2 2018, 12:03 PM