The patch is to solve the case like following which current indvar cannot widen because of limitation caused by SCEV folding.
void foo(int nsteps, unsigned char *maxarray, int startx, int V1, int V2, unsigned *lined) {
int j, k;
for (j = 0; j < nsteps; j++)
startx = V1 + j * V2;
for (k = 1; k < size - 1; k++)
((unsigned char *)lined[startx + k]) = ...
We have a two level nested loops. There is a array store lined[startx + k] inside the inner loop. Its index is a result of add expr with nsw flag. The operand0 startx can be represented as a SCEV: {V1 ,+, V2}<outerloop> (Note: no nsw here). The operand1 k can be represented as a SCEV: {1 ,+, 1}<nsw><innerloop>.
Because of folding in ScalarEvolution::getAddExpr (folding loop invariant into add recurrence), the SCEV of the add expr is: {{(1 + V1)<nsw> ,+, V2}<outerloop> ,+, 1}<nsw><innerloop>. As a result of the folding, IndVar cannot prove sext{{(1 + V1)<nsw> ,+, V2}<outerloop> ,+, 1}<nsw><innerloop> == sext( {V1 ,+, V2}<outerloop>) +nsw sext( {1 ,+, 1}<nsw><innerloop>) ("The major reason is there is no nsw for {V1 ,+, V2}<outerloop>"), so it cannot do widening for the add expr and cannot remove the sext of the index.
However, since we know from IR that the add expr "startx + k" has nsw flag, we know it is legal to do widening for the expr. We already utilize the nsw flag from IR to enhance the widening by using getExtendedOperandRecurrence. Why it doesn't work here?
To answer the question by myself, the problem is that in this case both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence return non-null but different WideAddRec. Because getWideRecurrence is called before getExtendedOperandRecurrence, we won't bother to use the result of getExtendedOperandRecurrence. However, if WidenIV::getExtendedOperandRecurrence returns non-null WideAddRec, we know for sure that it is legal to do widening for current instruction, so we should actually put getExtendedOperandRecurrence before getWideRecurrence. This is what the patch is doing.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lgtm, thanks!
If it isn't too much trouble, can you add a zext equivalent of the test case too, just as a sanity check?
Add zext to the iv-widen.ll test.
I still need to update iv-widen-elim-ext.ll. Now SE->isKnownPredicate return false for "%add = add nsw i32 %i.02, 2" in foo because %add feeds into zext and zext doesn't propagate full poison, so the nsw flag cannot be copied from IR to SCEV. As a result, NarrowIVDefUse::NeverNegative for %add is false and my change here will generate an extra trunc before zext. The fact that existing compiler doesn't generate the extra trunc is by luck to some extent.
I update the test to add another use for %add: "udiv 5, %add", which is to ensure %add is not poison otherwise there will be undefine behavior, and ensure that nsw flag on %add instruction can be copied to its SCEV, so that NarrowIVDefUse::NeverNegative for %add will be true.
After the test update, w/wo the change here, indvars will not generate extra trunc for sext/zext for foo in the testcase.