This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] trip count calculation for loops with unknown stride
ClosedPublic

Authored by pankajchawla on Jul 14 2016, 12:09 PM.

Details

Summary

Hello,

The patch teaches ScalarEvolution to compute trip count for loops such as these-

for(int i=0; i<n; i+=s) {
  A[i] = i;
}

The basic idea is that if the IV has a nsw/nuw flag and loop has no side effects then we can assume the direction of the loop and compute the correct backedge taken count.

Diff Detail

Event Timeline

pankajchawla retitled this revision from to [SCEV] trip count calculation for loops with unknown stride.
pankajchawla updated this object.
pankajchawla added a reviewer: sanjoy.
pankajchawla added a subscriber: llvm-commits.

You have to be a bit more careful here... consider:

for (int i=0; i<16; i-=2) {
    if (i < -2000) break;
}

Here, the backedge is taken 1000 times. (There are actually two related cases here: one, the case where there's an explicit break, and the case where some call in the loop throws an exception.)

It might be sufficient to show that the loop only has one exit and loopHasNoAbnormalExits() is true; not 100% sure about that.

llvm/lib/Analysis/ScalarEvolution.cpp
8794

I don't follow why this check is necessary.

Hi

You have to be a bit more careful here... consider:

for (int i=0; i<16; i-=2) {
    if (i < -2000) break;
}

Here, the backedge is taken 1000 times. (There are actually two related cases here: one, the case where there's an explicit break, and the case where some call in the loop throws an exception.)

It might be sufficient to show that the loop only has one exit and loopHasNoAbnormalExits() is true; not 100% sure about that.

Hi Eli,

The single exit condition is taken care of by checking the NoWrap flag as it is only set when ControlsExit is true, on line 8657.

We also need to check for loop entry guard because a single-trip do-while loop can have a negative stride. In this case, we will compute the wrong trip count. Consider this loop-

int i = 0;

do {
A[i] = i;
i += s;
} while (i<n);

This is a single trip loop if n = -10 and s = -1.

pankajchawla added inline comments.Jul 14 2016, 1:24 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8794

Sorry, this is my first time using Phabricator.
I replied to this in my non-inline comment.

Oh, right, missed the ControlsExit. That doesn't help with the exception-throwing case, though.

Another potentially interesting case: what if the stride is zero? Then you have an infinite loop. I have no idea how to resolve this issue.

For the isLoopEntryGuardedByCond bit, I guess the issue is that you basically want "stride < 0 ? 0 : computedcount", and you can't write that in terms of SCEV expressions? That's reasonable, but you should probably call that out specifically in a comment.

Added check for loopHasNoAbnormalExits() and a comment about bailing out for do-while loops.

sanjoy requested changes to this revision.Jul 15 2016, 12:25 AM
sanjoy edited edge metadata.

Hi Pankaj,

There are three issues with the current patch:

  • Even after proving a non-negative stride, you cannot generally assume a non-zero stride. A zero stride is possible if the loop has a side effect in it (i.e. a volatile read or write, or some IO). If the loop does not have a side effect then running forever is undefined behavior, so proving the loop does not have side effects + ControlsExit should be enough to prove that stride is non-zero. A good place to compute it is in a common function that also computes loopHasNoAbnormalExits at the same time (to avoid traversing the loop body twice).
  • It looks like you're saying if Start - Stride < RHS then the loop cannot be a single iteration loop. I think that is incorrect, what if we have:
    • Start = -1, Stride = -2, RHS = 5 for an unsigned compare? Start - Stride = 1 which is u< RHS, but if the backedge is controlled by {-1,+,-2} u< 5 then it won't be taken even once.
    • Start = INT_MAX, Stride = -1, RHS = 5 for a signed compare? Start - Stride = INT_MIN which is s< RHS, but if the backedge is controlled by {INT_MAX,+,-1} s< 5 then it won't be taken even once.

      I think the right precondition is Start < RHS -- if that precondition is true then we know that {Start,+,Stride} is true on the first iteration, and if Stride is negative then we'll keep executing the loop till {Start,+,Stride} wraps.
  • I don't think the logic as written is correct for unsigned compares. I can legitimately have {0,+,-1}<nuw> u< -1 execute one backedge (i.e. with a "negative" step).

Finally, we should also do this in howManyGreaterThans for symmetry,
but that's a separate change and does not need to be addressed in this
revision.

This revision now requires changes to proceed.Jul 15 2016, 12:25 AM

Hi Sanjoy,

Please see my replies inlined.

Thanks,
Pankaj

Hi Pankaj,

There are three issues with the current patch:

  • Even after proving a non-negative stride, you cannot generally assume a non-zero stride. A zero stride is possible if the loop has a side effect in it (i.e. a volatile read or write, or some IO). If the loop does not have a side effect then running forever is undefined behavior, so proving the loop does not have side effects + ControlsExit should be enough to prove that stride is non-zero. A good place to compute it is in a common function that also computes loopHasNoAbnormalExits at the same time (to avoid traversing the loop body twice).

loopHasNoAbonormalExits() calls isGuaranteedToTransferExecutionToSuccessor() which checks for volatile loads/stores. This function also has a comment inside implying that this function checks for side effects. What other checks do we need? I don't know what it takes to prove that the loop has no side effects. Can you please give me some pointers?

  • It looks like you're saying if Start - Stride < RHS then the loop cannot be a single iteration loop. I think that is incorrect, what if we have:
    • Start = -1, Stride = -2, RHS = 5 for an unsigned compare? Start - Stride = 1 which is u< RHS, but if the backedge is controlled by {-1,+,-2} u< 5 then it won't be taken even once.
    • Start = INT_MAX, Stride = -1, RHS = 5 for a signed compare? Start - Stride = INT_MIN which is s< RHS, but if the backedge is controlled by {INT_MAX,+,-1} s< 5 then it won't be taken even once.

      I think the right precondition is Start < RHS -- if that precondition is true then we know that {Start,+,Stride} is true on the first iteration, and if Stride is negative then we'll keep executing the loop till {Start,+,Stride} wraps.
  • I don't think the logic as written is correct for unsigned compares. I can legitimately have {0,+,-1}<nuw> u< -1 execute one backedge (i.e. with a "negative" step).

I think there is some misunderstanding here. The reason we are checking for Start - Stride < RHS rather than Start < RHS is because the AddRec is coming from the incremented value of the IV used in the backedge.

Consider this for loop-

for (i=start; i<end; i+=stride)
  A[i] = i

In LLVM's canonical form, it will look like this-

i = start
if (start < end) {
  do {
    A[i] = i;
    i += stride;    // We pick up AddRec from updated i which at this point is start + stride.
  } while (i < end)
}

For your first example when the IV is {-1,+,-2} it cannot have nuw flag.
Please let me know if I have misunderstood you.

Finally, we should also do this in howManyGreaterThans for symmetry,
but that's a separate change and does not need to be addressed in this
revision.

Yes, we should but I cannot write a positive test for this case. The reason is that we do not propagate the nsw flag to the IV for cases like this-

for (i=n; i>0; i+=s)
  A[i] = i;

I discovered two additional issues with the variants of the original loop mentioned in the description. I brought this up in the llvm-dev mailing list. One is the lack of nsw propagation which I just mentioned. The second issue seems to be related to canonicalization. So if we change the loop control condition from i<n to i<=n, ScalarEvolution is not able to compute the trip count.

The canonical form of this loop looks something like this-

if (0 <= n) {
  i = 0;
  do {
    A[i] = i;
    i += s; // NSW add
  } while (! (i > n));  // sgt compare
}

The 'sgt' compare is inverted to 'sle' for analysis. ScalarEvolution isn't really expecting 'sle' in canonicalized loops so it reverts to brute force exit count computation using computeExitCountExhaustively() which doesn't work.

pankajchawla requested a review of this revision.Jul 20 2016, 12:33 PM
pankajchawla edited edge metadata.

Bumping this up again.

Sanjoy,

Can you please respond to my earlier comments so I know how to proceed?

Thanks,
Pankaj

Re: the side-effect issue: You're trying to prove the loop will eventually either exit or execute undefined behavior. loopHasNoAbnormalExits proves that if the loop exits, it will exit via a visible edge (so it can't throw an exception or kill the program). There's a gap between the two: loopHasNoAbnormalExits doesn't care whether there's a nested infinite loop. In practice, this becomes an issue with atomic operations.

wmi added a subscriber: wmi.Jul 21 2016, 3:30 PM

Hi Pankaj,

loopHasNoAbonormalExits() calls isGuaranteedToTransferExecutionToSuccessor() which checks for volatile loads/stores. This function also has a comment inside implying that this function checks for side effects. What other checks do we need? I don't know what it takes to prove that the loop has no side effects. Can you please give me some pointers?

You need to make sure that there are no volatile loads or stores, and no function calls that may write memory.

It is possible that isGuaranteedToTransferExecutionToSuccessor incidentally checks for the same thing; but I'd prefer an explicit loopHasNoSideEffects() that checks that the only instructions in the loop returning true for mayHaveSideEffects() are non-volatile stores non atomic stores.

  • It looks like you're saying if Start - Stride < RHS then the loop cannot be a single iteration loop. I think that is incorrect, what if we have:
    • Start = -1, Stride = -2, RHS = 5 for an unsigned compare? Start - Stride = 1 which is u< RHS, but if the backedge is controlled by {-1,+,-2} u< 5 then it won't be taken even once.
    • Start = INT_MAX, Stride = -1, RHS = 5 for a signed compare? Start - Stride = INT_MIN which is s< RHS, but if the backedge is controlled by {INT_MAX,+,-1} s< 5 then it won't be taken even once.

      I think the right precondition is Start < RHS -- if that precondition is true then we know that {Start,+,Stride} is true on the first iteration, and if Stride is negative then we'll keep executing the loop till {Start,+,Stride} wraps.
  • I don't think the logic as written is correct for unsigned compares. I can legitimately have {0,+,-1}<nuw> u< -1 execute one backedge (i.e. with a "negative" step).

For your first example when the IV is {-1,+,-2} it cannot have nuw flag.

The example I was thinking of was:

for (unsigned i = 1; i < 5; i += (-2)) {
}

i = 1;
if (i < 5) {
  do {
    i += (-2); // NUW
  } while (i u< 5);
}

then the pre-increment IV is {1,+,-2}. It does not unsigned-overflow on the first increment (1 + (-2) == -1), and then it exits the loop.

But that's besides the point -- generally, whether the increment operation has nuw or nsw is distinct from whether the SCEV expression mapped to the increment operation is nuw or nsw. Specifically, we mark {A,+,B} as nsw (resp nuw) if sext({A,+,B}) (resp. zext({A,+,B})) is equal to {sext A,+,sext B} (resp. {zext A,+,zext B}) for every iteration of the loop that executes. In some cases we can use the no-wrap flags on the increment to prove this, but e.g. for a loop that that does not take the backedge even once, every add recurrence in the loop is nuw and nsw irrespective of the no-wrap flags present in the increment operations. That includes loop like this:

i = INT_MIN
if (i s< 5) {
  do {
    i += (-1);
  } while (i s< 5);
}

The increment operation itself is not NSW, but the add recurrence mapped to the post-incremented value of i can still be marked NSW, since it follows our rule that for every iteration of the loop (there is only one), sext({INT_MIN,+,-1}) == {sext(INT_MIN), +, sext(-1)} (since in the one iteration, both the LHS and RHS are sext(INT_MIN)).

Now it is unlikely that SCEV will mark the add recurrence with NSW in the trip computing code itself, since it does not yet know the trip count; but marking it is NSW is legal, and we must allow for it.

pankajchawla edited edge metadata.

Added check for side effects in the loop.

Hi Sanjoy,

Thanks for explaining how we mark nuw/nsw for the AddRecs!

I have added logic to check for side effects in the loop in loophasNoSideEffects().

Regarding the loop entry guard check:
Please note that I haven't changed this check and I believe the original check is correct. The loop entry guard tells us whether the loop will be executed at least once. It cannot tell us whether the loop will be executed more than once. Please also note that the backedge taken count for the loop mentioned in the description is: ((-1 + %n) /u %s)
A more generic formula with unknown initial value is: ((-1 + %n - %init) /u %s)

Now lets go through your examples:

The example I was thinking of was:

for (unsigned i = 1; i < 5; i += (-2)) {
}

i = 1;
if (i < 5) {
  do {
    i += (-2); // NUW
  } while (i u< 5);
}

then the pre-increment IV is {1,+,-2}. It does not unsigned-overflow on the first increment (1 + (-2) == -1), and then it exits the loop.

The backedge taken count for this loop according to the formula above will be:
(5 -1 -1) /u (-2) ==> 0

So it is correct.

Probably the confusion is because I am using the existing terminology in the code of a positive or negative stride. May be we should be classifying them as count up or count down loops instead.

Here's your second example-

i = INT_MIN
if (i s< 5) {
  do {
    i += (-1);
  } while (i s< 5);
}

I believe this loop has undefined behavior since adding INT_MIN and -1 leads to signed underflow.

I guess my point is that once we have checked the loop entry guard, single trip loops will be handled just as well as multi-trip loops.
Please let me know if I am still missing anything.

Please note that the nsw marking on the IV AddRec is quite conservative. If I initialize the IV to anything else than 0 in the original loop, we lose the marking and the backedge taken count is not computed.

Hi Pankaj,

Regarding the loop entry guard check:

Please note that I haven't changed this check and I believe the
original check is correct. The loop entry guard tells us whether the

Yes, for what it is doing it is correct.

loop will be executed at least once. It cannot tell us whether the
loop will be executed more than once. Please also note that the
backedge taken count for the loop mentioned in the description is:
((-1 + %n) /u %s)

A more generic formula with unknown initial value is: ((-1 + %n -
%init) /u %s)

Now lets go through your examples:

The example I was thinking of was:

for (unsigned i = 1; i < 5; i += (-2)) {
}

i = 1;
if (i < 5) {
  do {
    i += (-2); // NUW
  } while (i u< 5);
}

then the pre-increment IV is {1,+,-2}. It does not unsigned-overflow on the first increment (1 + (-2) == -1), and then it exits the loop.

The backedge taken count for this loop according to the formula above will be:
(5 -1 -1) /u (-2) ==> 0

So it is correct.

I was trying to come up with examples where (based on the change
description):

  • The add recurrence, AR, being compared to in the loop exit check is legitimately nuw or nsw
  • The loop decrement amount (stride) is negative
  • The loop is guarded by "AR->getStart() - AR->getStride()"

If I understand you correctly, you're saying that even if the stride
is negative the expression that computes the backedge taken count is
valid. That may be correct (I'll re-review with that understanding),
but then the change description needs to say that instead of saying
"then we can assume the stride to be positive." Is that a correct
assessment?

Btw, if you think a real-time chat can help and are in PDT, we can
talk about this on the #llvm IRC channel on Monday.

Here's your second example-

i = INT_MIN
if (i s< 5) {
  do {
    i += (-1);
  } while (i s< 5);
}

I believe this loop has undefined behavior since adding INT_MIN
and -1 leads to signed underflow.

I was using that as a shorthand for LLVM IR. :) You could have
something like:

i = INT_MIN
if (i s< 5) {
  do {
    i = wrapping_add(i, -1);
  } while (i s< 5);
}

with

int wrapping_add(int a, int b) {
  return (int)(((unsigned)a) + ((unsigned)b));
}

I guess my point is that once we have checked the loop entry guard, single trip loops will be handled just as well as multi-trip loops.
Please let me know if I am still missing anything.

Please note that the nsw marking on the IV AddRec is quite conservative. If I initialize the IV to anything else than 0 in the original loop, we lose the marking and the backedge taken count is not computed.

Hi Sanjoy,

If I understand you correctly, you're saying that even if the stride
is negative the expression that computes the backedge taken count is
valid. That may be correct (I'll re-review with that understanding),
but then the change description needs to say that instead of saying
"then we can assume the stride to be positive." Is that a correct
assessment?

Your assessment is correct :)
I have updated the description. Hopefully, it makes more sense now. I am not sure what is the correct terminology to use here.

Btw, if you think a real-time chat can help and are in PDT, we can
talk about this on the #llvm IRC channel on Monday.

Yes, I am in PDT but now that (I think) we are on the same page this may not be needed. We can chat if you still think it is needed.

sanjoy added inline comments.Jul 26 2016, 12:33 AM
llvm/lib/Analysis/ScalarEvolution.cpp
4956

No braces needed here. Also, I'd drop the else, and just have:

if (X)
  return Y;
return Z;
4956

Use auto * to denote that SI is a pointer.

8730–8731

As we discussed, it isn't that given the conditions you're testing we can prove that the stride is non-negative, but that the math below is correct even if the stride is negative.

Also, I'd separate out the !NoWrap || !loopHasNoAbnormalExits(L) || !loopHasNoSideEffects(L) check into it's own if and have a small comment describing why they're necessary / sufficient (the example you had in the llvm-dev thread would be nice to add too).

8795

Here and in the comment in the test case, we're not assuming the stride is positive, but that the trip count math is correct even if stride is negative given that the conditions we checked above are correct; so I think the comment and the variable name needs to change.

I think there still is a problem with the unsigned variant of this
patch:

i8 index  = 127;
i8 limit  = 128;
i8 stride = 129;

if (index u< limit) {
  do {
    index = wrapping_add(index, stride);
  } while (index u< limit);
}

The backedge count of the loop above is 1. Given that, the post-inc
can be marked nuw (since it goes from 127 -> 255 only).
However, the formula (limit - start + (stride - 1)) /u stride
computes (128 - 0 + 128) /u 129 == 0.

Hi Sanjoy,

You example seems correct. Ideally, we should check the wrap flags on the phi associated with the IV but I think an alternative is to check that the stride is not known to be either positive or negative. The wrap flags cannot be added by ScalarEvolution for unknown strides. This is the case we were originally trying to fix anyway. If you agree, I will include this check in my next revision.

I will also make the changes you suggested in your inline comments.

Thanks,
Pankaj

pankajchawla added inline comments.Jul 26 2016, 2:52 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8795

I think this check is unnecessary as the computed backedge taken count is correct for single trip do-while loops as well.

For a do-while loop like this-

int i = init;
do {
  A[i] = i;
  i += s;
} while (i<n);

The computed backedge taken count is: ((-1 + (-1 * %init) + ((%init + %s) smax %n)) /u %s)
This seems correct because if %n is less than (%init + %s), it reduces to ((-1 + %s) /u %s) ==> 0.

I will remove this check.

pankajchawla added inline comments.Jul 26 2016, 3:21 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8730–8731

The check for abnormal exits seems redundant now because loopHasNoSideEffects() is more conservative than it. I am removing this check as well.

  • Added check to prove that the stride is truly unknown to ScalarEvolution so that the wrap flags are not propagated in edge cases.
  • Removed unnecessary check for loopHasNoAbnormalExits() and loop entry guard.
  • Fixed comments and the test.
sanjoy requested changes to this revision.Aug 1 2016, 7:13 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
8738

nit: "here -"

8752

I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove. This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ).

I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that.

The other thing that's missing here is some clear justification of why what you're doing is correct. This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of (max(rhs, start) - start + stride - 1) / u stride is correct.

This revision now requires changes to proceed.Aug 1 2016, 7:13 PM
pankajchawla edited edge metadata.

Added bailout condition for predicated IVs.

pankajchawla marked an inline comment as done.Aug 2 2016, 1:08 PM
pankajchawla added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
8752

I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove. This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ).

I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that.

The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of start, end and stride). So this smartness should then get reflected in either isKnownPositive(stride) or isKnownNonPositive(stride) which this change is guarded under. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. I cannot think of a foolproof way to prune the 'edge cases'. Also, just so you know, scalar evolution is successfully able to compute the trip count for the edge case mentioned in the comments by computing the trip count exhaustively before ever reaching this section of the code.

The other thing that's missing here is some clear justification of why what you're doing is correct. This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of (max(rhs, start) - start + stride - 1) / u stride is correct.

I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I have tried to document all the preconditions. I will take a crack at giving the proof in case you are not convinced.
Please bear with me as I am neither a mathematician nor a theoretical CS guy :)
Please also keep in mind that I am excluding all edge cases in the proof.

First lets define the loop structure we are handling-

i = start
do {
  A[i] = i;
  i += s;
} while (i < end);

The proposition is that the backedge taken count of this loop for any combination of start, stride and end is:
(max(end, start + stride) - start - 1) /u stride

Since I haven't changed the original backedge taken count formula, I assume that you agree that the formula is correct when stride is known to be positive (original behavior of the function).

Now, these are our preconditions-

a) Comparator is either ICMP_ULT or ICMP_SLT.
b) IV is either nuw or nsw depending upon the comparator type in a).
c) loop has single exit with no side effects.

We can split all the possible scenarios for this loop into these three cases-

  1. Positive Stride - This case reverts to the original behavior of the function which we assumed to be correct.
  1. Zero stride - Precondition c) implies that stride cannot be zero as that would imply an infinite loop which is undefined behavior.
  1. Negative stride - Precondition a) and b) together imply that this is only possible for single trip loops, which means ((start + stride) >= end) holds true. So the formula reduces like this-

(max(end, start + stride) - start - 1) /u stride ==>
(start + stride - start - 1) /u stride ==>
(stride - 1) /u stride ==> 0.

Hence proved :)

sanjoy added inline comments.Aug 5 2016, 5:53 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8746

Nit: blank before -

8752

The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of start, end and stride).

So this smartness should then get reflected in either isKnownPositive(stride) or isKnownNonPositive(stride) which this change is guarded under.

Ideally we should not rely on things like isKnownPositive or isKnownNonPositive being uniformly precise all of the time -- i.e. an implementation that returns false unconditionally every 10th time these are called should be "okay" (terrible, but correct). This isn't just a thought experiment either -- isKnownPositive (since it internally calls getRange) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove isKnownPositive.

Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted.

The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons. The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares. However, I'm not sure if bailing out on unsigned compares will actually work for the case that motivated this patch.

I am not sure whether you are not convinced that this is correct or you just want this to be documented properly.

I was asking for a short comment documenting on why your line of reasoning is correct.

sanjoy requested changes to this revision.Aug 5 2016, 5:58 PM
sanjoy edited edge metadata.
This revision now requires changes to proceed.Aug 5 2016, 5:58 PM
pankajchawla added inline comments.Aug 8 2016, 3:08 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8752

Ideally we should not rely on things like isKnownPositive or isKnownNonPositive being uniformly precise all of the time i.e. an implementation that returns false unconditionally every 10th time these are called should be "okay" (terrible, but correct).

I am not relying on their preciseness, I am relying on their correctness. They can (and should) conservatively return false which would be fine. Returning false unconditionally every 10th time is not okay, it is clearly incorrect. What if I call it on the same value 10 times? Are you suggesting that it is ok for them to return true the first 9 times and false the 10th time for the same value?

isKnownPositive (since it internally calls getRange) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove isKnownPositive.

If this is the case, I believe this is an issue with ScalarEvolution's implementation. The cache for the stride cannot be invalidated independently of the add recurrence it occurs in. We should invalidate SCEV for all values dependent on stride like what we do in forgetValue() otherwise ScalarEvolution is internally inconsistent. If we do not do this, the wrap flags applied on the add recurrence can be wrong and can lead to miscompilation. If isKnownPositive() cannot be re-proven, then the previous trip count computation which was made on this assumption is now invalid and therefore should be invalidated as well.

The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons.

I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync.

The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares.

The example you provided invalidates both signed and unsigned IVs. This is the IR produced by LLVM for your example-

define void @foo(i8* nocapture %A, i32 %n) local_unnamed_addr #0 {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %i.07 = phi i8 [ 127, %entry ], [ %add, %for.body ]
  %idxprom = zext i8 %i.07 to i64
  %arrayidx = getelementptr inbounds i8, i8* %A, i64 %idxprom
  store i8 %i.07, i8* %arrayidx, align 1
  %add = add i8 %i.07, -127
  %cmp = icmp sgt i8 %add, -1
  br i1 %cmp, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  ret void
}

This is the SCEV form of %add-
{0,+,-127}<nsw><%for.body> U: [-127,1) S: [-127,1)

There is already an nsw flag on it, and an nuw flag can be applied as well. So I cannot restrict it to signed IVs.

I was asking for a short comment documenting on why your line of reasoning is correct.

I will add a short version of the proof in the comments if we can agree that this patch should be committed.

sanjoy accepted this revision.Aug 8 2016, 11:27 PM
sanjoy edited edge metadata.

lgtm as mentioned inline

llvm/lib/Analysis/ScalarEvolution.cpp
8752

I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync.

You're right, I think I'm getting carried away in some excessively rigid chain of reasoning which isn't really applicable here. I'm fine with this change landing as-is (with a short comment informally describing why what you've changed is correct).

This revision is now accepted and ready to land.Aug 8 2016, 11:27 PM
pankajchawla edited edge metadata.

Added more elaborate comments explaining why the change is correct.

Let me know if you want me to check this in on you behalf. Otherwise lgtm as stated.

pankajchawla added inline comments.Aug 9 2016, 12:11 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8752

Thanks!
I have updated the comments.

This revision was automatically updated to reflect the committed changes.
rnk added a subscriber: rnk.Aug 16 2016, 2:08 PM

This change appears to have introduced a miscompile of some opus code in Chromium: http://crbug.com/638314

I'm going to speculatively revert this, and I'll get some reproducer files soon.

rnk added a comment.Aug 16 2016, 2:15 PM

This is the pre-processed source and command line of one of a few TUs in opus that changed due to this patch. While I haven't stared at the IR diff long enough to understand if this is truly a miscompile, it's pretty suspicious to me. PTAL

Hi Sanjoy,

This seems to be an issue with indvars pass.

I have uploaded a smaller version of the reproducer (extracted from the original). The bug can be reproduced using this command-

$ opt -indvars indvars.ll -S

Before indvars, the loop for.body's IV is like this-

for.body:                                         ; preds = %for.body.lr.ph, %for.body
  %index_Q16.0928 = phi i32 [ 0, %for.body.lr.ph ], [ %add187, %for.body ]
  ...
  %add187 = add nsw i32 %index_Q16.0928, %index_increment_Q16
  %cmp = icmp slt i32 %add187, %max_index_Q16
  br i1 %cmp, label %for.body, label %sw.epilog.loopexit

The SCEV form of post-incremented IV %add187 is this:

{%index_increment_Q16,+,%index_increment_Q16}<nsw><%for.body>

The backedge taken count of the loop is like this:

((-1 + %max_index_Q16) /u %index_increment_Q16)

After indvars, the IV is extended to 64 bits but the problem is that the stride is zero-extended instead of getting sign-extended. This is incorrect as the stride can be negative. Here's the IR after modification-

for.body.lr.ph:                                   ; preds = %for.cond.preheader
%0 = zext i32 %index_increment_Q16 to i64
 br label %for.body

for.body:
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.lr.ph ]
  ...
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, %0
  br i1 %cmp, label %for.body, label %sw.epilog.loopexit

I am not familiar with indvars code. Can you please help?

Thanks,
Pankaj

Please ignore my previous email. My analysis was incorrect. The stride cannot be negative here as it will lead to UB.

@Reid
You mentioned that this is one of the TU which is changed due to the patch. Are you sure this one is causing the miscompilation?
Does the issue go away if we disable optimizations on this TU?
Did you run opt-bisect on it? Does it point to any pass? If not, is it possible to run it?

As the change is in an analysis, the issue can be in any transformation using this analysis rather than the analysis itself. Any help from your side in narrowing down the issue is greatly appreciated.

I will keep looking into the generated IR for this file in the meantime.

Thanks,
Pankaj

@rnk

I can't find anything wrong with the IR. I am awaiting your response for more information.

-Pankaj

rnk added a comment.Aug 25 2016, 2:30 PM

Sorry, that was the wrong TU. This is a small, single function TU that, when recompiled with your SCEV change, causes Chromium's tests to fail. Your change does not cause a difference in the optimized LLVM IR, only in the final assembly, so you can debug this with llc.

Thanks for the additional info!

Do I need to specify any options with llc like -O2?

I haven't worked with llc before so any tips would be appreciated.
Would it be possible to use bugpoint to narrow down the issue?

rnk added a comment.Aug 25 2016, 3:15 PM

You don't need any special options to reproduce, it should pick up the target from the .ll file. You could use bugpoint, but the test case is already pretty small, so I wouldn't bother. Run llc with -debug and try to figure out which pass is using SCEV to make bad choices. I would suspect LSR, which I think is a backend pass.

Hi all,

I have a fix for the issue. The max backedge taken count was incorrectly set to 1 for loops with unknown strides which resulted in LSR perofming the wrong transformation. I verified that we generate identical assembly code for the reproducer without my patch and with my patch + additional fix.

It looks like I am not able to upload the diff to this review since it is closed. Do I have to open a new review?

Thanks,
Pankaj

efriedma reopened this revision.Sep 2 2016, 2:23 PM
efriedma added a subscriber: efriedma.

There's a "Reopen" action available at the bottom of the page.

This revision is now accepted and ready to land.Sep 2 2016, 2:23 PM
pankajchawla removed rL LLVM as the repository for this revision.

Fixed max backedge count computation for loops with unknown stride by assuming a min stride of 1.

@Eli
Thanks for the info!

@sanjoy
The additional fix is minimal and should have been part of the original changeset so I updated it here.
Please review.

sanjoy requested changes to this revision.Sep 11 2016, 7:06 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
8785

I just noticed that doesIVOverflowOnLT only works for a positive stride, can you please take a look? For a non-positive Stride, I think we can make doesIVOverflowOnLT just return NoWrap.

8807

I'd feel safer if you short-circuited the entire max becount computation based on PositiveStride, since MinStride is no longer what it says on the tin ("the minimum value of Stride). (To me) it basically looks like the expression below that computes the max be count implicitly assumes a positive stride, and forking the logic more explicitly will be more obvious.

If that's a problem, I'd suggest at least renaming MinStride to something more in tune with what it represents now (perhaps StrideForMaxBECount?).

This revision now requires changes to proceed.Sep 11 2016, 7:06 PM
pankajchawla edited edge metadata.

Changes to address Sanjoy's comments.

pankajchawla added inline comments.Sep 12 2016, 5:36 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8785–8786

The comments for this function imply that it expects a positive stride-
Verify if an linear IV with positive stride can overflow when...

I think this makes sense because the condition in the loop latch is of the form (i < N). I added an assert to the function verifying that the incoming stride is indeed positive.

This check seems to be useless for unknown strides so I skip it.

8808

I renamed MinStride to StrideForMaxBECount. I would have had to duplicate code to completely separate max backedge count computation for the two cases.

sanjoy accepted this revision.Sep 12 2016, 9:53 PM
sanjoy edited edge metadata.

lgtm with a minor nit

llvm/lib/Analysis/ScalarEvolution.cpp
8810

If it's all the same to you, I'd suggest avoiding a nested ternary and using an if instead.

This revision is now accepted and ready to land.Sep 12 2016, 9:53 PM
pankajchawla added inline comments.Sep 12 2016, 11:43 PM
llvm/lib/Analysis/ScalarEvolution.cpp
8810

I will change it to an if.

This revision was automatically updated to reflect the committed changes.
delena added a subscriber: delena.Mar 17 2020, 1:36 AM
delena added inline comments.
llvm/trunk/lib/Analysis/ScalarEvolution.cpp
4965 ↗(On Diff #71645)

This code is incorrect for masked store and scatter. It returns true for simple store, but false for the masked intrinsics.
As a result we can't calculate trip count for the loop "j" after vectorization of the loop "i":

for (int = 0; j < jCount; j+=Step) {

  for (int i = 0; i < 1023; ++i) {
    int ind = j*jStride + k*kStride + i;
    B[ind] = A[ind] + 2;
  }
}