This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
-
ScalarEvolution.h
-
lib/Analysis/
-
Analysis/
1/22
ScalarEvolution.cpp
-
test/Analysis/ScalarEvolution/
-
Analysis/
-
ScalarEvolution/
-
trip-count-unknown-stride.ll

Differential D22377

[SCEV] trip count calculation for loops with unknown stride
ClosedPublic

Authored by pankajchawla on Jul 14 2016, 12:09 PM.

Download Raw Diff

Details

Reviewers

sanjoy

Commits

rG8bbabee21aaf: Reapplying r278731 after fixing the problem that caused it to be reverted.
rG7fe18251a5e0: Enhance SCEV to compute the trip count for some loops with unknown stride.
rL281732: Reapplying r278731 after fixing the problem that caused it to be reverted.
rL278731: Enhance SCEV to compute the trip count for some loops with unknown stride.

Summary

Hello,

The patch teaches ScalarEvolution to compute trip count for loops such as these-

for(int i=0; i<n; i+=s) {
  A[i] = i;
}

The basic idea is that if the IV has a nsw/nuw flag and loop has no side effects then we can assume the direction of the loop and compute the correct backedge taken count.

Diff Detail

Event Timeline

pankajchawla updated this revision to Diff 64021.Jul 14 2016, 12:09 PM

pankajchawla retitled this revision from to [SCEV] trip count calculation for loops with unknown stride.

pankajchawla updated this object.

pankajchawla added a reviewer: sanjoy.

pankajchawla added a subscriber: llvm-commits.

Herald added subscribers: mzolotukhin, sanjoy. · View Herald TranscriptJul 14 2016, 12:09 PM

You have to be a bit more careful here... consider:

for (int i=0; i<16; i-=2) {
    if (i < -2000) break;
}

Here, the backedge is taken 1000 times. (There are actually two related cases here: one, the case where there's an explicit break, and the case where some call in the loop throws an exception.)

It might be sufficient to show that the loop only has one exit and loopHasNoAbnormalExits() is true; not 100% sure about that.

llvm/lib/Analysis/ScalarEvolution.cpp
8794	I don't follow why this check is necessary.

In D22377#484607, @eli.friedman wrote:
You have to be a bit more careful here... consider:
for (int i=0; i<16; i-=2) {
    if (i < -2000) break;
}
Here, the backedge is taken 1000 times. (There are actually two related cases here: one, the case where there's an explicit break, and the case where some call in the loop throws an exception.)

It might be sufficient to show that the loop only has one exit and loopHasNoAbnormalExits() is true; not 100% sure about that.

Hi Eli,

The single exit condition is taken care of by checking the NoWrap flag as it is only set when ControlsExit is true, on line 8657.

We also need to check for loop entry guard because a single-trip do-while loop can have a negative stride. In this case, we will compute the wrong trip count. Consider this loop-

int i = 0;

do {
A[i] = i;
i += s;
} while (i<n);

This is a single trip loop if n = -10 and s = -1.

pankajchawla added inline comments.Jul 14 2016, 1:24 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8794	Sorry, this is my first time using Phabricator. I replied to this in my non-inline comment.

Oh, right, missed the ControlsExit. That doesn't help with the exception-throwing case, though.

Another potentially interesting case: what if the stride is zero? Then you have an infinite loop. I have no idea how to resolve this issue.

For the isLoopEntryGuardedByCond bit, I guess the issue is that you basically want "stride < 0 ? 0 : computedcount", and you can't write that in terms of SCEV expressions? That's reasonable, but you should probably call that out specifically in a comment.

Added check for loopHasNoAbnormalExits() and a comment about bailing out for do-while loops.

Hi Pankaj,

There are three issues with the current patch:

Even after proving a non-negative stride, you cannot generally assume a non-zero stride. A zero stride is possible if the loop has a side effect in it (i.e. a volatile read or write, or some IO). If the loop does not have a side effect then running forever is undefined behavior, so proving the loop does not have side effects + ControlsExit should be enough to prove that stride is non-zero. A good place to compute it is in a common function that also computes loopHasNoAbnormalExits at the same time (to avoid traversing the loop body twice).
It looks like you're saying if Start - Stride < RHS then the loop cannot be a single iteration loop. I think that is incorrect, what if we have:
- Start = -1, Stride = -2, RHS = 5 for an unsigned compare? Start - Stride = 1 which is u< RHS, but if the backedge is controlled by {-1,+,-2} u< 5 then it won't be taken even once.
- Start = INT_MAX, Stride = -1, RHS = 5 for a signed compare? Start - Stride = INT_MIN which is s< RHS, but if the backedge is controlled by {INT_MAX,+,-1} s< 5 then it won't be taken even once.
  
  I think the right precondition is Start < RHS -- if that precondition is true then we know that {Start,+,Stride} is true on the first iteration, and if Stride is negative then we'll keep executing the loop till {Start,+,Stride} wraps.
I don't think the logic as written is correct for unsigned compares. I can legitimately have {0,+,-1}<nuw> u< -1 execute one backedge (i.e. with a "negative" step).

Finally, we should also do this in howManyGreaterThans for symmetry,
but that's a separate change and does not need to be addressed in this
revision.

This revision now requires changes to proceed.Jul 15 2016, 12:25 AM

Hi Sanjoy,

Please see my replies inlined.

Thanks,
Pankaj

In D22377#485160, @sanjoy wrote:

Hi Pankaj,

There are three issues with the current patch:

Even after proving a non-negative stride, you cannot generally assume a non-zero stride. A zero stride is possible if the loop has a side effect in it (i.e. a volatile read or write, or some IO). If the loop does not have a side effect then running forever is undefined behavior, so proving the loop does not have side effects + ControlsExit should be enough to prove that stride is non-zero. A good place to compute it is in a common function that also computes loopHasNoAbnormalExits at the same time (to avoid traversing the loop body twice).

loopHasNoAbonormalExits() calls isGuaranteedToTransferExecutionToSuccessor() which checks for volatile loads/stores. This function also has a comment inside implying that this function checks for side effects. What other checks do we need? I don't know what it takes to prove that the loop has no side effects. Can you please give me some pointers?

It looks like you're saying if Start - Stride < RHS then the loop cannot be a single iteration loop. I think that is incorrect, what if we have:

Start = -1, Stride = -2, RHS = 5 for an unsigned compare? Start - Stride = 1 which is u< RHS, but if the backedge is controlled by {-1,+,-2} u< 5 then it won't be taken even once.

Start = INT_MAX, Stride = -1, RHS = 5 for a signed compare? Start - Stride = INT_MIN which is s< RHS, but if the backedge is controlled by {INT_MAX,+,-1} s< 5 then it won't be taken even once.

I think the right precondition is Start < RHS -- if that precondition is true then we know that {Start,+,Stride} is true on the first iteration, and if Stride is negative then we'll keep executing the loop till {Start,+,Stride} wraps.

I don't think the logic as written is correct for unsigned compares. I can legitimately have {0,+,-1}<nuw> u< -1 execute one backedge (i.e. with a "negative" step).

I think there is some misunderstanding here. The reason we are checking for Start - Stride < RHS rather than Start < RHS is because the AddRec is coming from the incremented value of the IV used in the backedge.

Consider this for loop-

for (i=start; i<end; i+=stride)
  A[i] = i

In LLVM's canonical form, it will look like this-

i = start
if (start < end) {
  do {
    A[i] = i;
    i += stride;    // We pick up AddRec from updated i which at this point is start + stride.
  } while (i < end)
}

For your first example when the IV is {-1,+,-2} it cannot have nuw flag.
Please let me know if I have misunderstood you.

Finally, we should also do this in howManyGreaterThans for symmetry,
but that's a separate change and does not need to be addressed in this
revision.

Yes, we should but I cannot write a positive test for this case. The reason is that we do not propagate the nsw flag to the IV for cases like this-

for (i=n; i>0; i+=s)
  A[i] = i;

I discovered two additional issues with the variants of the original loop mentioned in the description. I brought this up in the llvm-dev mailing list. One is the lack of nsw propagation which I just mentioned. The second issue seems to be related to canonicalization. So if we change the loop control condition from i<n to i<=n, ScalarEvolution is not able to compute the trip count.

The canonical form of this loop looks something like this-

if (0 <= n) {
  i = 0;
  do {
    A[i] = i;
    i += s; // NSW add
  } while (! (i > n));  // sgt compare
}

The 'sgt' compare is inverted to 'sle' for analysis. ScalarEvolution isn't really expecting 'sle' in canonicalized loops so it reverts to brute force exit count computation using computeExitCountExhaustively() which doesn't work.

Bumping this up again.

Sanjoy,

Can you please respond to my earlier comments so I know how to proceed?

Thanks,
Pankaj

Re: the side-effect issue: You're trying to prove the loop will eventually either exit or execute undefined behavior. loopHasNoAbnormalExits proves that if the loop exits, it will exit via a visible edge (so it can't throw an exception or kill the program). There's a gap between the two: loopHasNoAbnormalExits doesn't care whether there's a nested infinite loop. In practice, this becomes an issue with atomic operations.

wmi added a subscriber: wmi.Jul 21 2016, 3:30 PM

Hi Pankaj,

In D22377#485645, @pankajchawla wrote:

loopHasNoAbonormalExits() calls isGuaranteedToTransferExecutionToSuccessor() which checks for volatile loads/stores. This function also has a comment inside implying that this function checks for side effects. What other checks do we need? I don't know what it takes to prove that the loop has no side effects. Can you please give me some pointers?

You need to make sure that there are no volatile loads or stores, and no function calls that may write memory.

It is possible that isGuaranteedToTransferExecutionToSuccessor incidentally checks for the same thing; but I'd prefer an explicit loopHasNoSideEffects() that checks that the only instructions in the loop returning true for mayHaveSideEffects() are non-volatile stores non atomic stores.

It looks like you're saying if Start - Stride < RHS then the loop cannot be a single iteration loop. I think that is incorrect, what if we have:

Start = -1, Stride = -2, RHS = 5 for an unsigned compare? Start - Stride = 1 which is u< RHS, but if the backedge is controlled by {-1,+,-2} u< 5 then it won't be taken even once.

Start = INT_MAX, Stride = -1, RHS = 5 for a signed compare? Start - Stride = INT_MIN which is s< RHS, but if the backedge is controlled by {INT_MAX,+,-1} s< 5 then it won't be taken even once.

I think the right precondition is Start < RHS -- if that precondition is true then we know that {Start,+,Stride} is true on the first iteration, and if Stride is negative then we'll keep executing the loop till {Start,+,Stride} wraps.

I don't think the logic as written is correct for unsigned compares. I can legitimately have {0,+,-1}<nuw> u< -1 execute one backedge (i.e. with a "negative" step).

For your first example when the IV is {-1,+,-2} it cannot have nuw flag.

The example I was thinking of was:

for (unsigned i = 1; i < 5; i += (-2)) {
}

i = 1;
if (i < 5) {
  do {
    i += (-2); // NUW
  } while (i u< 5);
}

then the pre-increment IV is {1,+,-2}. It does not unsigned-overflow on the first increment (1 + (-2) == -1), and then it exits the loop.

But that's besides the point -- generally, whether the increment operation has nuw or nsw is distinct from whether the SCEV expression mapped to the increment operation is nuw or nsw. Specifically, we mark {A,+,B} as nsw (resp nuw) if sext({A,+,B}) (resp. zext({A,+,B})) is equal to {sext A,+,sext B} (resp. {zext A,+,zext B}) for every iteration of the loop that executes. In some cases we can use the no-wrap flags on the increment to prove this, but e.g. for a loop that that does not take the backedge even once, every add recurrence in the loop is nuw and nsw irrespective of the no-wrap flags present in the increment operations. That includes loop like this:

i = INT_MIN
if (i s< 5) {
  do {
    i += (-1);
  } while (i s< 5);
}

The increment operation itself is not NSW, but the add recurrence mapped to the post-incremented value of i can still be marked NSW, since it follows our rule that for every iteration of the loop (there is only one), sext({INT_MIN,+,-1}) == {sext(INT_MIN), +, sext(-1)} (since in the one iteration, both the LHS and RHS are sext(INT_MIN)).

Now it is unlikely that SCEV will mark the add recurrence with NSW in the trip computing code itself, since it does not yet know the trip count; but marking it is NSW is legal, and we must allow for it.

Added check for side effects in the loop.

Hi Sanjoy,

Thanks for explaining how we mark nuw/nsw for the AddRecs!

I have added logic to check for side effects in the loop in loophasNoSideEffects().

Regarding the loop entry guard check:
Please note that I haven't changed this check and I believe the original check is correct. The loop entry guard tells us whether the loop will be executed at least once. It cannot tell us whether the loop will be executed more than once. Please also note that the backedge taken count for the loop mentioned in the description is: ((-1 + %n) /u %s)
A more generic formula with unknown initial value is: ((-1 + %n - %init) /u %s)

Now lets go through your examples:

The example I was thinking of was:
for (unsigned i = 1; i < 5; i += (-2)) {
}
i = 1;
if (i < 5) {
  do {
    i += (-2); // NUW
  } while (i u< 5);
}
then the pre-increment IV is {1,+,-2}. It does not unsigned-overflow on the first increment (1 + (-2) == -1), and then it exits the loop.

The backedge taken count for this loop according to the formula above will be:
(5 -1 -1) /u (-2) ==> 0

So it is correct.

Probably the confusion is because I am using the existing terminology in the code of a positive or negative stride. May be we should be classifying them as count up or count down loops instead.

Here's your second example-

i = INT_MIN
if (i s< 5) {
  do {
    i += (-1);
  } while (i s< 5);
}

I believe this loop has undefined behavior since adding INT_MIN and -1 leads to signed underflow.

I guess my point is that once we have checked the loop entry guard, single trip loops will be handled just as well as multi-trip loops.
Please let me know if I am still missing anything.

Please note that the nsw marking on the IV AddRec is quite conservative. If I initialize the IV to anything else than 0 in the original loop, we lose the marking and the backedge taken count is not computed.

Hi Pankaj,

In D22377#493355, @pankajchawla wrote:

Regarding the loop entry guard check:

Please note that I haven't changed this check and I believe the
original check is correct. The loop entry guard tells us whether the

Yes, for what it is doing it is correct.

loop will be executed at least once. It cannot tell us whether the
loop will be executed more than once. Please also note that the
backedge taken count for the loop mentioned in the description is:
((-1 + %n) /u %s)

A more generic formula with unknown initial value is: ((-1 + %n -
%init) /u %s)

Now lets go through your examples:
The example I was thinking of was:
for (unsigned i = 1; i < 5; i += (-2)) {
}
i = 1;
if (i < 5) {
  do {
    i += (-2); // NUW
  } while (i u< 5);
}
then the pre-increment IV is {1,+,-2}. It does not unsigned-overflow on the first increment (1 + (-2) == -1), and then it exits the loop.
The backedge taken count for this loop according to the formula above will be:
(5 -1 -1) /u (-2) ==> 0

So it is correct.

I was trying to come up with examples where (based on the change
description):

The add recurrence, AR, being compared to in the loop exit check is legitimately nuw or nsw
The loop decrement amount (stride) is negative
The loop is guarded by "AR->getStart() - AR->getStride()"

If I understand you correctly, you're saying that even if the stride
is negative the expression that computes the backedge taken count is
valid. That may be correct (I'll re-review with that understanding),
but then the change description needs to say that instead of saying
"then we can assume the stride to be positive." Is that a correct
assessment?

Btw, if you think a real-time chat can help and are in PDT, we can
talk about this on the #llvm IRC channel on Monday.

Here's your second example-
i = INT_MIN
if (i s< 5) {
  do {
    i += (-1);
  } while (i s< 5);
}
I believe this loop has undefined behavior since adding INT_MIN
and -1 leads to signed underflow.

I was using that as a shorthand for LLVM IR. :) You could have
something like:

i = INT_MIN
if (i s< 5) {
  do {
    i = wrapping_add(i, -1);
  } while (i s< 5);
}

with

int wrapping_add(int a, int b) {
  return (int)(((unsigned)a) + ((unsigned)b));
}

I guess my point is that once we have checked the loop entry guard, single trip loops will be handled just as well as multi-trip loops.
Please let me know if I am still missing anything.

Please note that the nsw marking on the IV AddRec is quite conservative. If I initialize the IV to anything else than 0 in the original loop, we lose the marking and the backedge taken count is not computed.

pankajchawla updated this object.Jul 22 2016, 3:35 PM

Hi Sanjoy,

If I understand you correctly, you're saying that even if the stride
is negative the expression that computes the backedge taken count is
valid. That may be correct (I'll re-review with that understanding),
but then the change description needs to say that instead of saying
"then we can assume the stride to be positive." Is that a correct
assessment?

Your assessment is correct :)
I have updated the description. Hopefully, it makes more sense now. I am not sure what is the correct terminology to use here.

Btw, if you think a real-time chat can help and are in PDT, we can
talk about this on the #llvm IRC channel on Monday.

Yes, I am in PDT but now that (I think) we are on the same page this may not be needed. We can chat if you still think it is needed.

flyingforyou added a subscriber: flyingforyou.Jul 24 2016, 4:48 PM

sanjoy added inline comments.Jul 26 2016, 12:33 AM

llvm/lib/Analysis/ScalarEvolution.cpp
4956	No braces needed here. Also, I'd drop the `else`, and just have: if (X) return Y; return Z;
4956	Use `auto *` to denote that `SI` is a pointer.
8730–8731	As we discussed, it isn't that given the conditions you're testing we can prove that the stride is non-negative, but that the math below is correct even if the stride is negative. Also, I'd separate out the `!NoWrap \|\| !loopHasNoAbnormalExits(L) \|\| !loopHasNoSideEffects(L)` check into it's own `if` and have a small comment describing why they're necessary / sufficient (the example you had in the llvm-dev thread would be nice to add too).
8795	Here and in the comment in the test case, we're not assuming the stride is positive, but that the trip count math is correct even if stride is negative given that the conditions we checked above are correct; so I think the comment and the variable name needs to change.

I think there still is a problem with the unsigned variant of this
patch:

i8 index  = 127;
i8 limit  = 128;
i8 stride = 129;

if (index u< limit) {
  do {
    index = wrapping_add(index, stride);
  } while (index u< limit);
}

The backedge count of the loop above is 1. Given that, the post-inc
can be marked nuw (since it goes from 127 -> 255 only).
However, the formula (limit - start + (stride - 1)) /u stride
computes (128 - 0 + 128) /u 129 == 0.

Hi Sanjoy,

You example seems correct. Ideally, we should check the wrap flags on the phi associated with the IV but I think an alternative is to check that the stride is not known to be either positive or negative. The wrap flags cannot be added by ScalarEvolution for unknown strides. This is the case we were originally trying to fix anyway. If you agree, I will include this check in my next revision.

I will also make the changes you suggested in your inline comments.

Thanks,
Pankaj

pankajchawla added inline comments.Jul 26 2016, 2:52 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8795	I think this check is unnecessary as the computed backedge taken count is correct for single trip do-while loops as well. For a do-while loop like this- int i = init; do { A[i] = i; i += s; } while (i<n); The computed backedge taken count is: ((-1 + (-1 * %init) + ((%init + %s) smax %n)) /u %s) This seems correct because if %n is less than (%init + %s), it reduces to ((-1 + %s) /u %s) ==> 0. I will remove this check.

pankajchawla added inline comments.Jul 26 2016, 3:21 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8730–8731	The check for abnormal exits seems redundant now because loopHasNoSideEffects() is more conservative than it. I am removing this check as well.

Added check to prove that the stride is truly unknown to ScalarEvolution so that the wrap flags are not propagated in edge cases.
Removed unnecessary check for loopHasNoAbnormalExits() and loop entry guard.
Fixed comments and the test.

pankajchawla updated this object.Jul 26 2016, 4:18 PM

sanjoy requested changes to this revision.Aug 1 2016, 7:13 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
8738	nit: "here -"
8752	I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove. This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ). I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that. The other thing that's missing here is some clear justification of why what you're doing is correct. This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of `(max(rhs, start) - start + stride - 1) / u stride` is correct.

This revision now requires changes to proceed.Aug 1 2016, 7:13 PM

Added bailout condition for predicated IVs.

pankajchawla marked an inline comment as done.Aug 2 2016, 1:08 PM

pankajchawla added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
8752	I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove. This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ). I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that. The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of `start`, `end` and `stride`). So this smartness should then get reflected in either `isKnownPositive(stride)` or `isKnownNonPositive(stride)` which this change is guarded under. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. I cannot think of a foolproof way to prune the 'edge cases'. Also, just so you know, scalar evolution is successfully able to compute the trip count for the edge case mentioned in the comments by computing the trip count exhaustively before ever reaching this section of the code. The other thing that's missing here is some clear justification of why what you're doing is correct. This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of (max(rhs, start) - start + stride - 1) / u stride is correct. I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I have tried to document all the preconditions. I will take a crack at giving the proof in case you are not convinced. Please bear with me as I am neither a mathematician nor a theoretical CS guy :) Please also keep in mind that I am excluding all edge cases in the proof. First lets define the loop structure we are handling- i = start do { A[i] = i; i += s; } while (i < end); The proposition is that the backedge taken count of this loop for any combination of start, stride and end is: `(max(end, start + stride) - start - 1) /u stride` Since I haven't changed the original backedge taken count formula, I assume that you agree that the formula is correct when stride is known to be positive (original behavior of the function). Now, these are our preconditions- a) Comparator is either ICMP_ULT or ICMP_SLT. b) IV is either nuw or nsw depending upon the comparator type in a). c) loop has single exit with no side effects. We can split all the possible scenarios for this loop into these three cases- Positive Stride - This case reverts to the original behavior of the function which we assumed to be correct. Zero stride - Precondition c) implies that stride cannot be zero as that would imply an infinite loop which is undefined behavior. Negative stride - Precondition a) and b) together imply that this is only possible for single trip loops, which means ((start + stride) >= end) holds true. So the formula reduces like this- `(max(end, start + stride) - start - 1) /u stride ==>` `(start + stride - start - 1) /u stride ==>` `(stride - 1) /u stride ==> 0.` Hence proved :)

sanjoy added inline comments.Aug 5 2016, 5:53 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8746	Nit: blank before `-`
8752	The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of start, end and stride). So this smartness should then get reflected in either isKnownPositive(stride) or isKnownNonPositive(stride) which this change is guarded under. Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time -- i.e. an implementation that returns `false` unconditionally every 10th time these are called should be "okay" (terrible, but correct). This isn't just a thought experiment either -- `isKnownPositive` (since it internally calls `getRange`) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove `isKnownPositive`. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons. The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares. However, I'm not sure if bailing out on unsigned compares will actually work for the case that motivated this patch. I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I was asking for a short comment documenting on why your line of reasoning is correct.

sanjoy requested changes to this revision.Aug 5 2016, 5:58 PM

sanjoy edited edge metadata.

This revision now requires changes to proceed.Aug 5 2016, 5:58 PM

pankajchawla added inline comments.Aug 8 2016, 3:08 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8752	Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time i.e. an implementation that returns false unconditionally every 10th time these are called should be "okay" (terrible, but correct). I am not relying on their preciseness, I am relying on their correctness. They can (and should) conservatively return false which would be fine. Returning false unconditionally every 10th time is not okay, it is clearly incorrect. What if I call it on the same value 10 times? Are you suggesting that it is ok for them to return `true` the first 9 times and `false` the 10th time for the same value? isKnownPositive (since it internally calls getRange) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove isKnownPositive. If this is the case, I believe this is an issue with ScalarEvolution's implementation. The cache for the stride cannot be invalidated independently of the add recurrence it occurs in. We should invalidate SCEV for all values dependent on stride like what we do in `forgetValue()` otherwise ScalarEvolution is internally inconsistent. If we do not do this, the wrap flags applied on the add recurrence can be wrong and can lead to miscompilation. If `isKnownPositive()` cannot be re-proven, then the previous trip count computation which was made on this assumption is now invalid and therefore should be invalidated as well. The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons. I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync. The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares. The example you provided invalidates both signed and unsigned IVs. This is the IR produced by LLVM for your example- define void @foo(i8* nocapture %A, i32 %n) local_unnamed_addr #0 { entry: br label %for.body for.body: ; preds = %entry, %for.body %i.07 = phi i8 [ 127, %entry ], [ %add, %for.body ] %idxprom = zext i8 %i.07 to i64 %arrayidx = getelementptr inbounds i8, i8* %A, i64 %idxprom store i8 %i.07, i8* %arrayidx, align 1 %add = add i8 %i.07, -127 %cmp = icmp sgt i8 %add, -1 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } This is the SCEV form of %add- {0,+,-127}<nsw><%for.body> U: [-127,1) S: [-127,1) There is already an `nsw` flag on it, and an `nuw` flag can be applied as well. So I cannot restrict it to signed IVs. I was asking for a short comment documenting on why your line of reasoning is correct. I will add a short version of the proof in the comments if we can agree that this patch should be committed.

lgtm as mentioned inline

llvm/lib/Analysis/ScalarEvolution.cpp
8752	I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync. You're right, I think I'm getting carried away in some excessively rigid chain of reasoning which isn't really applicable here. I'm fine with this change landing as-is (with a short comment informally describing why what you've changed is correct).

This revision is now accepted and ready to land.Aug 8 2016, 11:27 PM

Added more elaborate comments explaining why the change is correct.

Let me know if you want me to check this in on you behalf. Otherwise lgtm as stated.

pankajchawla added inline comments.Aug 9 2016, 12:11 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8752	Thanks! I have updated the comments.

Closed by commit rL278731: Enhance SCEV to compute the trip count for some loops with unknown stride. (authored by dlkreitz). · Explain WhyAug 15 2016, 1:29 PM

This revision was automatically updated to reflect the committed changes.

This change appears to have introduced a miscompile of some opus code in Chromium: http://crbug.com/638314

I'm going to speculatively revert this, and I'll get some reproducer files soon.

resampler_private_down_FIR-7ac2a7.sh2 KBDownload

resampler_private_down_FIR-7ac2a7.c711 KBDownload

This is the pre-processed source and command line of one of a few TUs in opus that changed due to this patch. While I haven't stared at the IR diff long enough to understand if this is truly a miscompile, it's pretty suspicious to me. PTAL

indvars.ll29 KBDownload

Hi Sanjoy,

This seems to be an issue with indvars pass.

I have uploaded a smaller version of the reproducer (extracted from the original). The bug can be reproduced using this command-

$ opt -indvars indvars.ll -S

Before indvars, the loop for.body's IV is like this-

for.body:                                         ; preds = %for.body.lr.ph, %for.body
  %index_Q16.0928 = phi i32 [ 0, %for.body.lr.ph ], [ %add187, %for.body ]
  ...
  %add187 = add nsw i32 %index_Q16.0928, %index_increment_Q16
  %cmp = icmp slt i32 %add187, %max_index_Q16
  br i1 %cmp, label %for.body, label %sw.epilog.loopexit

The SCEV form of post-incremented IV %add187 is this:

{%index_increment_Q16,+,%index_increment_Q16}<nsw><%for.body>

The backedge taken count of the loop is like this:

((-1 + %max_index_Q16) /u %index_increment_Q16)

After indvars, the IV is extended to 64 bits but the problem is that the stride is zero-extended instead of getting sign-extended. This is incorrect as the stride can be negative. Here's the IR after modification-

for.body.lr.ph:                                   ; preds = %for.cond.preheader
%0 = zext i32 %index_increment_Q16 to i64
 br label %for.body

for.body:
  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.lr.ph ]
  ...
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, %0
  br i1 %cmp, label %for.body, label %sw.epilog.loopexit

I am not familiar with indvars code. Can you please help?

Thanks,
Pankaj

Please ignore my previous email. My analysis was incorrect. The stride cannot be negative here as it will lead to UB.

@Reid
You mentioned that this is one of the TU which is changed due to the patch. Are you sure this one is causing the miscompilation?
Does the issue go away if we disable optimizations on this TU?
Did you run opt-bisect on it? Does it point to any pass? If not, is it possible to run it?

As the change is in an analysis, the issue can be in any transformation using this analysis rather than the analysis itself. Any help from your side in narrowing down the issue is greatly appreciated.

I will keep looking into the generated IR for this file in the meantime.

Thanks,
Pankaj

@rnk

I can't find anything wrong with the IR. I am awaiting your response for more information.

-Pankaj

resampler_private_IIR_FIR.ll9 KBDownload

Sorry, that was the wrong TU. This is a small, single function TU that, when recompiled with your SCEV change, causes Chromium's tests to fail. Your change does not cause a difference in the optimized LLVM IR, only in the final assembly, so you can debug this with llc.

Thanks for the additional info!

Do I need to specify any options with llc like -O2?

I haven't worked with llc before so any tips would be appreciated.
Would it be possible to use bugpoint to narrow down the issue?

You don't need any special options to reproduce, it should pick up the target from the .ll file. You could use bugpoint, but the test case is already pretty small, so I wouldn't bother. Run llc with -debug and try to figure out which pass is using SCEV to make bad choices. I would suspect LSR, which I think is a backend pass.

Hi all,

I have a fix for the issue. The max backedge taken count was incorrectly set to 1 for loops with unknown strides which resulted in LSR perofming the wrong transformation. I verified that we generate identical assembly code for the reproducer without my patch and with my patch + additional fix.

It looks like I am not able to upload the diff to this review since it is closed. Do I have to open a new review?

Thanks,
Pankaj

There's a "Reopen" action available at the bottom of the page.

This revision is now accepted and ready to land.Sep 2 2016, 2:23 PM

Fixed max backedge count computation for loops with unknown stride by assuming a min stride of 1.

@Eli
Thanks for the info!

@sanjoy
The additional fix is minimal and should have been part of the original changeset so I updated it here.
Please review.

sanjoy requested changes to this revision.Sep 11 2016, 7:06 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
8785	I just noticed that `doesIVOverflowOnLT` only works for a positive stride, can you please take a look? For a non-positive `Stride`, I think we can make `doesIVOverflowOnLT` just return `NoWrap`.
8807	I'd feel safer if you short-circuited the entire max becount computation based on `PositiveStride`, since `MinStride` is no longer what it says on the tin ("the minimum value of `Stride`). (To me) it basically looks like the expression below that computes the max be count implicitly assumes a positive stride, and forking the logic more explicitly will be more obvious. If that's a problem, I'd suggest at least renaming `MinStride` to something more in tune with what it represents now (perhaps `StrideForMaxBECount`?).

This revision now requires changes to proceed.Sep 11 2016, 7:06 PM

Changes to address Sanjoy's comments.

pankajchawla added inline comments.Sep 12 2016, 5:36 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8785–8786	The comments for this function imply that it expects a positive stride- `Verify if an linear IV with positive stride can overflow when...` I think this makes sense because the condition in the loop latch is of the form (i < N). I added an assert to the function verifying that the incoming stride is indeed positive. This check seems to be useless for unknown strides so I skip it.
8808	I renamed `MinStride` to `StrideForMaxBECount`. I would have had to duplicate code to completely separate max backedge count computation for the two cases.

lgtm with a minor nit

llvm/lib/Analysis/ScalarEvolution.cpp
8810	If it's all the same to you, I'd suggest avoiding a nested ternary and using an `if` instead.

This revision is now accepted and ready to land.Sep 12 2016, 9:53 PM

pankajchawla added inline comments.Sep 12 2016, 11:43 PM

llvm/lib/Analysis/ScalarEvolution.cpp
8810	I will change it to an if.

Closed by commit rL281732: Reapplying r278731 after fixing the problem that caused it to be reverted. (authored by dlkreitz). · Explain WhySep 16 2016, 7:46 AM

This revision was automatically updated to reflect the committed changes.

delena added a subscriber: delena.Mar 17 2020, 1:36 AM

delena added inline comments.

llvm/trunk/lib/Analysis/ScalarEvolution.cpp
4965 ↗	(On Diff #71645)	This code is incorrect for masked store and scatter. It returns true for simple store, but false for the masked intrinsics. As a result we can't calculate trip count for the loop "j" after vectorization of the loop "i": for (int = 0; j < jCount; j+=Step) { for (int i = 0; i < 1023; ++i) { int ind = jjStride + kkStride + i; B[ind] = A[ind] + 2; } }

Herald added a subscriber: jfb. · View Herald TranscriptMar 17 2020, 1:36 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

Analysis/

ScalarEvolution.h

7 lines

lib/

Analysis/

ScalarEvolution.cpp

81 lines

test/

Analysis/

ScalarEvolution/

trip-count-unknown-stride.ll

54 lines

Diff 67389

llvm/include/llvm/Analysis/ScalarEvolution.h

Show First 20 Lines • Show All 797 Lines • ▼ Show 20 Lines	private:
/// Memoized computeLoopDisposition results.		/// Memoized computeLoopDisposition results.
DenseMap<const SCEV *,		DenseMap<const SCEV *,
SmallVector<PointerIntPair<const Loop *, 2, LoopDisposition>, 2>>		SmallVector<PointerIntPair<const Loop *, 2, LoopDisposition>, 2>>
LoopDispositions;		LoopDispositions;

/// Cache for \c loopHasNoAbnormalExits.		/// Cache for \c loopHasNoAbnormalExits.
DenseMap<const Loop *, bool> LoopHasNoAbnormalExits;		DenseMap<const Loop *, bool> LoopHasNoAbnormalExits;

		/// Cache for \c loopHasNoSideEffects.
		DenseMap<const Loop *, bool> LoopHasNoSideEffects;

		/// Returns true if \p L contains no instruction that can have side effects
		/// (i.e. via throwing an exception, volatile or atomic access).
		bool loopHasNoSideEffects(const Loop *L);

/// Returns true if \p L contains no instruction that can abnormally exit		/// Returns true if \p L contains no instruction that can abnormally exit
/// the loop (i.e. via throwing an exception, by terminating the thread		/// the loop (i.e. via throwing an exception, by terminating the thread
/// cleanly or by infinite looping in a called function). Strictly		/// cleanly or by infinite looping in a called function). Strictly
/// speaking, the last one is not leaving the loop, but is identical to		/// speaking, the last one is not leaving the loop, but is identical to
/// leaving the loop for reasoning about undefined behavior.		/// leaving the loop for reasoning about undefined behavior.
bool loopHasNoAbnormalExits(const Loop *L);		bool loopHasNoAbnormalExits(const Loop *L);

/// Compute a LoopDisposition value.		/// Compute a LoopDisposition value.
▲ Show 20 Lines • Show All 1,010 Lines • Show Last 20 Lines

llvm/lib/Analysis/ScalarEvolution.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,941 Lines • ▼ Show 20 Lines	for (auto *PoisonUser : Poison->users()) {
}		}
}		}
}		}
}		}

return LatchControlDependentOnPoison && loopHasNoAbnormalExits(L);		return LatchControlDependentOnPoison && loopHasNoAbnormalExits(L);
}		}

		bool ScalarEvolution::loopHasNoSideEffects(const Loop *L) {
		auto Itr = LoopHasNoSideEffects.find(L);
		if (Itr == LoopHasNoSideEffects.end()) {
		auto NoSideEffectsInBB = [&](BasicBlock *BB) {
		return all_of(*BB, [](Instruction &I) {
		// Non-atomic, non-volatile stores are ok.
		if (auto *SI = dyn_cast<StoreInst>(&I))
		sanjoyUnsubmitted Not Done Reply Inline Actions No braces needed here. Also, I'd drop the `else`, and just have: if (X) return Y; return Z; sanjoy: No braces needed here. Also, I'd drop the `else`, and just have: ``` if (X) return Y…
		sanjoyUnsubmitted Not Done Reply Inline Actions Use `auto ` to denote that `SI` is a pointer. sanjoy:* Use `auto *` to denote that `SI` is a pointer.
		return SI->isSimple();

		return !I.mayHaveSideEffects();
		});
		};

		auto InsertPair = LoopHasNoSideEffects.insert(
		{L, all_of(L->getBlocks(), NoSideEffectsInBB)});
		assert(InsertPair.second && "We just checked!");
		Itr = InsertPair.first;
		}

		return Itr->second;
		}

bool ScalarEvolution::loopHasNoAbnormalExits(const Loop *L) {		bool ScalarEvolution::loopHasNoAbnormalExits(const Loop *L) {
auto Itr = LoopHasNoAbnormalExits.find(L);		auto Itr = LoopHasNoAbnormalExits.find(L);
if (Itr == LoopHasNoAbnormalExits.end()) {		if (Itr == LoopHasNoAbnormalExits.end()) {
auto NoAbnormalExitInBB = [&](BasicBlock *BB) {		auto NoAbnormalExitInBB = [&](BasicBlock *BB) {
return all_of(*BB, [](Instruction &I) {		return all_of(*BB, [](Instruction &I) {
return isGuaranteedToTransferExecutionToSuccessor(&I);		return isGuaranteedToTransferExecutionToSuccessor(&I);
});		});
};		};
▲ Show 20 Lines • Show All 571 Lines • ▼ Show 20 Lines	void ScalarEvolution::forgetLoop(const Loop *L) {
}		}

// Forget all contained loops too, to avoid dangling entries in the		// Forget all contained loops too, to avoid dangling entries in the
// ValuesAtScopes map.		// ValuesAtScopes map.
for (Loop I : L)		for (Loop I : L)
forgetLoop(I);		forgetLoop(I);

LoopHasNoAbnormalExits.erase(L);		LoopHasNoAbnormalExits.erase(L);
		LoopHasNoSideEffects.erase(L);
}		}

void ScalarEvolution::forgetValue(Value *V) {		void ScalarEvolution::forgetValue(Value *V) {
Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
if (!I) return;		if (!I) return;

// Drop information about expressions based on loop-header PHIs.		// Drop information about expressions based on loop-header PHIs.
SmallVector<Instruction *, 16> Worklist;		SmallVector<Instruction *, 16> Worklist;
▲ Show 20 Lines • Show All 3,136 Lines • ▼ Show 20 Lines	ScalarEvolution::howManyLessThans(const SCEV LHS, const SCEV RHS,
const Loop *L, bool IsSigned,		const Loop *L, bool IsSigned,
bool ControlsExit, bool AllowPredicates) {		bool ControlsExit, bool AllowPredicates) {
SCEVUnionPredicate P;		SCEVUnionPredicate P;
// We handle only IV < Invariant		// We handle only IV < Invariant
if (!isLoopInvariant(RHS, L))		if (!isLoopInvariant(RHS, L))
return getCouldNotCompute();		return getCouldNotCompute();

const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);		const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);
if (!IV && AllowPredicates)		bool PredicatedIV = false;

		if (!IV && AllowPredicates) {
// Try to make this an AddRec using runtime tests, in the first X		// Try to make this an AddRec using runtime tests, in the first X
// iterations of this loop, where X is the SCEV expression found by the		// iterations of this loop, where X is the SCEV expression found by the
// algorithm below.		// algorithm below.
IV = convertSCEVToAddRecWithPredicates(LHS, L, P);		IV = convertSCEVToAddRecWithPredicates(LHS, L, P);
		PredicatedIV = true;
		}

// Avoid weird loops		// Avoid weird loops
if (!IV \|\| IV->getLoop() != L \|\| !IV->isAffine())		if (!IV \|\| IV->getLoop() != L \|\| !IV->isAffine())
return getCouldNotCompute();		return getCouldNotCompute();

bool NoWrap = ControlsExit &&		bool NoWrap = ControlsExit &&
IV->getNoWrapFlags(IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW);		IV->getNoWrapFlags(IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW);

const SCEV Stride = IV->getStepRecurrence(this);		const SCEV Stride = IV->getStepRecurrence(this);

// Avoid negative or zero stride values		bool PositiveStride = isKnownPositive(Stride);
		sanjoyUnsubmitted Not Done Reply Inline Actions As we discussed, it isn't that given the conditions you're testing we can prove that the stride is non-negative, but that the math below is correct even if the stride is negative. Also, I'd separate out the `!NoWrap \|\| !loopHasNoAbnormalExits(L) \|\| !loopHasNoSideEffects(L)` check into it's own `if` and have a small comment describing why they're necessary / sufficient (the example you had in the llvm-dev thread would be nice to add too). sanjoy: As we discussed, it isn't that given the conditions you're testing we can prove that the…
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions The check for abnormal exits seems redundant now because loopHasNoSideEffects() is more conservative than it. I am removing this check as well. pankajchawla: The check for abnormal exits seems redundant now because loopHasNoSideEffects() is more…
if (!isKnownPositive(Stride))
		// Avoid negative or zero stride values.
		if (!PositiveStride)
		// We can compute the correct backedge taken count for loops with unknown
		// strides if we can prove that the loop is not an infinite loop with side
		// effects. Here's the loop structure we are trying to handle -
		//
		sanjoyUnsubmitted Done Reply Inline Actions nit: "here -" sanjoy: nit: "here -"
		// i = start
		// do {
		// A[i] = i;
		// i += s;
		// } while (i < end);
		//
		// The backedge taken count for such loops is evaluated as -
		// (max(end, start + stride) - start - 1) /u stride
		sanjoyUnsubmitted Not Done Reply Inline Actions Nit: blank before `-` sanjoy: Nit: blank before `-`
		//
		// The additional preconditions that we need to check to prove correctness
		// of the above formula is as follows -
		//
		// a) IV is either nuw or nsw depending upon signedness (indicated by the
		// NoWrap flag).
		sanjoyUnsubmitted Not Done Reply Inline Actions I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove. This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ). I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that. The other thing that's missing here is some clear justification of why what you're doing is correct. This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of `(max(rhs, start) - start + stride - 1) / u stride` is correct. sanjoy: I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride…
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride, there are certain things SCEV cannot prove. This will create a tricky coupling between different parts and e.g. introduces the possibility that we'll get miscompiles if we make an unrelated part of SCEV smarter in some creative way (like you're doing here :) ). I think it is better to assume that the other bits in SCEV are omniscient (i.e. they may (but are not guaranteed to) prove anything that is factually correct at runtime), and base our inferences on that. The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of `start`, `end` and `stride`). So this smartness should then get reflected in either `isKnownPositive(stride)` or `isKnownNonPositive(stride)` which this change is guarded under. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. I cannot think of a foolproof way to prune the 'edge cases'. Also, just so you know, scalar evolution is successfully able to compute the trip count for the edge case mentioned in the comments by computing the trip count exhaustively before ever reaching this section of the code. The other thing that's missing here is some clear justification of why what you're doing is correct. This does not necessarily have to be a formal proof (though that'd make me really happy :) ) but at least a clear statement of preconditions you've assumed, and why that implies that the trip count of (max(rhs, start) - start + stride - 1) / u stride is correct. I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I have tried to document all the preconditions. I will take a crack at giving the proof in case you are not convinced. Please bear with me as I am neither a mathematician nor a theoretical CS guy :) Please also keep in mind that I am excluding all edge cases in the proof. First lets define the loop structure we are handling- i = start do { A[i] = i; i += s; } while (i < end); The proposition is that the backedge taken count of this loop for any combination of start, stride and end is: `(max(end, start + stride) - start - 1) /u stride` Since I haven't changed the original backedge taken count formula, I assume that you agree that the formula is correct when stride is known to be positive (original behavior of the function). Now, these are our preconditions- a) Comparator is either ICMP_ULT or ICMP_SLT. b) IV is either nuw or nsw depending upon the comparator type in a). c) loop has single exit with no side effects. We can split all the possible scenarios for this loop into these three cases- Positive Stride - This case reverts to the original behavior of the function which we assumed to be correct. Zero stride - Precondition c) implies that stride cannot be zero as that would imply an infinite loop which is undefined behavior. Negative stride - Precondition a) and b) together imply that this is only possible for single trip loops, which means ((start + stride) >= end) holds true. So the formula reduces like this- `(max(end, start + stride) - start - 1) /u stride ==>` `(start + stride - start - 1) /u stride ==>` `(stride - 1) /u stride ==> 0.` Hence proved :) pankajchawla: > I'm not super-comfortable with this -- (IIUC) you're assuming that given an unknown stride…
		sanjoyUnsubmitted Not Done Reply Inline Actions The only argument I can give here is that no matter how smart SCEV gets, it can only propagate no wrap flags in 'edge cases' when it knows something about the stride (in fact, it may have to know something about all of start, end and stride). So this smartness should then get reflected in either isKnownPositive(stride) or isKnownNonPositive(stride) which this change is guarded under. Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time -- i.e. an implementation that returns `false` unconditionally every 10th time these are called should be "okay" (terrible, but correct). This isn't just a thought experiment either -- `isKnownPositive` (since it internally calls `getRange`) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove `isKnownPositive`. Please let me know whether this is a satisfactory argument. I would prefer not to spend more time on a patch which is not likely to get accepted. The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons. The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares. However, I'm not sure if bailing out on unsigned compares will actually work for the case that motivated this patch. I am not sure whether you are not convinced that this is correct or you just want this to be documented properly. I was asking for a short comment documenting on why your line of reasoning is correct. sanjoy: > The only argument I can give here is that no matter how smart SCEV gets, it can only…
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being uniformly precise all of the time i.e. an implementation that returns false unconditionally every 10th time these are called should be "okay" (terrible, but correct). I am not relying on their preciseness, I am relying on their correctness. They can (and should) conservatively return false which would be fine. Returning false unconditionally every 10th time is not okay, it is clearly incorrect. What if I call it on the same value 10 times? Are you suggesting that it is ok for them to return `true` the first 9 times and `false` the 10th time for the same value? isKnownPositive (since it internally calls getRange) returns a cached result, and the cache for the stride can be cleared independently of the add recurrence using that stride, and by that time the IR (or the cache, actually) could have been modified in a way that SCEV would not be able to re-prove isKnownPositive. If this is the case, I believe this is an issue with ScalarEvolution's implementation. The cache for the stride cannot be invalidated independently of the add recurrence it occurs in. We should invalidate SCEV for all values dependent on stride like what we do in `forgetValue()` otherwise ScalarEvolution is internally inconsistent. If we do not do this, the wrap flags applied on the add recurrence can be wrong and can lead to miscompilation. If `isKnownPositive()` cannot be re-proven, then the previous trip count computation which was made on this assumption is now invalid and therefore should be invalidated as well. The "assume other parts of SCEV can be omniscient" is an important property, and I'm not okay with losing it without really good reasons. I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync. The only way forward here I can think of is to not do this optimization if the comparison is unsigned, since the only counter-example (such that the trip count formula is broken even with the preconditions satisfied) I've managed to produce is with unsigned compares. The example you provided invalidates both signed and unsigned IVs. This is the IR produced by LLVM for your example- define void @foo(i8* nocapture %A, i32 %n) local_unnamed_addr #0 { entry: br label %for.body for.body: ; preds = %entry, %for.body %i.07 = phi i8 [ 127, %entry ], [ %add, %for.body ] %idxprom = zext i8 %i.07 to i64 %arrayidx = getelementptr inbounds i8, i8* %A, i64 %idxprom store i8 %i.07, i8* %arrayidx, align 1 %add = add i8 %i.07, -127 %cmp = icmp sgt i8 %add, -1 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } This is the SCEV form of %add- {0,+,-127}<nsw><%for.body> U: [-127,1) S: [-127,1) There is already an `nsw` flag on it, and an `nuw` flag can be applied as well. So I cannot restrict it to signed IVs. I was asking for a short comment documenting on why your line of reasoning is correct. I will add a short version of the proof in the comments if we can agree that this patch should be committed. pankajchawla: > Ideally we should not rely on things like `isKnownPositive` or `isKnownNonPositive` being…
		sanjoyUnsubmitted Not Done Reply Inline Actions I can agree on the principle but I think you are applying it incorrectly. As a component, ScalarEvolution's different sub-components have to be in 'sync' with each other otherwise no correctness guarantees can be provided by the component as a whole. I am not limiting the omniscience of other parts of SCEV here, I am merely assuming that when they grow stronger, they grow stronger in sync. You're right, I think I'm getting carried away in some excessively rigid chain of reasoning which isn't really applicable here. I'm fine with this change landing as-is (with a short comment informally describing why what you've changed is correct). sanjoy: > I can agree on the principle but I think you are applying it incorrectly. As a component…
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions Thanks! I have updated the comments. pankajchawla: Thanks! I have updated the comments.
		// b) loop is single exit with no side effects.
		//
		//
		// Precondition a) implies that if the stride is negative, this is a single
		// trip loop. The backedge taken count formula reduces to zero in this case.
		//
		// Precondition b) implies that the unknown stride cannot be zero otherwise
		// we have UB.
		//
		// The positive stride case is the same as isKnownPositive(Stride) returning
		// true (original behavior of the function).
		//
		// We want to make sure that the stride is truly unknown as there are edge
		// cases where ScalarEvolution propagates no wrap flags to the
		// post-increment/decrement IV even though the increment/decrement operation
		// itself is wrapping. The computed backedge taken count may be wrong in
		// such cases. This is prevented by checking that the stride is not known to
		// be either positive or non-positive. For example, no wrap flags are
		// propagated to the post-increment IV of this loop with a trip count of 2 -
		//
		// unsigned char i;
		// for(i=127; i<128; i+=129)
		// A[i] = i;
		//
		if (PredicatedIV \|\| !NoWrap \|\| isKnownNonPositive(Stride) \|\|
		!loopHasNoSideEffects(L))
return getCouldNotCompute();		return getCouldNotCompute();

// Avoid proven overflow cases: this will ensure that the backedge taken count		// Avoid proven overflow cases: this will ensure that the backedge taken count
// will not generate any unsigned overflow. Relaxed no-overflow conditions		// will not generate any unsigned overflow. Relaxed no-overflow conditions
// exploit NoWrapFlags, allowing to optimize in presence of undefined		// exploit NoWrapFlags, allowing to optimize in presence of undefined
// behaviors like the case of C language.		// behaviors like the case of C language.
if (!Stride->isOne() && doesIVOverflowOnLT(RHS, Stride, IsSigned, NoWrap))		if (!Stride->isOne() && doesIVOverflowOnLT(RHS, Stride, IsSigned, NoWrap))
		sanjoyUnsubmitted Not Done Reply Inline Actions I just noticed that `doesIVOverflowOnLT` only works for a positive stride, can you please take a look? For a non-positive `Stride`, I think we can make `doesIVOverflowOnLT` just return `NoWrap`. sanjoy: I just noticed that `doesIVOverflowOnLT` only works for a positive stride, can you please take…
return getCouldNotCompute();		return getCouldNotCompute();
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions The comments for this function imply that it expects a positive stride- `Verify if an linear IV with positive stride can overflow when...` I think this makes sense because the condition in the loop latch is of the form (i < N). I added an assert to the function verifying that the incoming stride is indeed positive. This check seems to be useless for unknown strides so I skip it. pankajchawla: The comments for this function imply that it expects a positive stride- `Verify if an linear IV…

ICmpInst::Predicate Cond = IsSigned ? ICmpInst::ICMP_SLT		ICmpInst::Predicate Cond = IsSigned ? ICmpInst::ICMP_SLT
: ICmpInst::ICMP_ULT;		: ICmpInst::ICMP_ULT;
const SCEV *Start = IV->getStart();		const SCEV *Start = IV->getStart();
const SCEV *End = RHS;		const SCEV *End = RHS;
if (!isLoopEntryGuardedByCond(L, Cond, getMinusSCEV(Start, Stride), RHS))		if (!isLoopEntryGuardedByCond(L, Cond, getMinusSCEV(Start, Stride), RHS))
End = IsSigned ? getSMaxExpr(RHS, Start) : getUMaxExpr(RHS, Start);		End = IsSigned ? getSMaxExpr(RHS, Start) : getUMaxExpr(RHS, Start);

		eli.friedmanUnsubmitted Not Done Reply Inline Actions I don't follow why this check is necessary. eli.friedman: I don't follow why this check is necessary.
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions Sorry, this is my first time using Phabricator. I replied to this in my non-inline comment. pankajchawla: Sorry, this is my first time using Phabricator. I replied to this in my non-inline comment.
const SCEV *BECount = computeBECount(getMinusSCEV(End, Start), Stride, false);		const SCEV *BECount = computeBECount(getMinusSCEV(End, Start), Stride, false);
		sanjoyUnsubmitted Not Done Reply Inline Actions Here and in the comment in the test case, we're not assuming the stride is positive, but that the trip count math is correct even if stride is negative given that the conditions we checked above are correct; so I think the comment and the variable name needs to change. sanjoy: Here and in the comment in the test case, we're not assuming the stride is positive, but that…
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions I think this check is unnecessary as the computed backedge taken count is correct for single trip do-while loops as well. For a do-while loop like this- int i = init; do { A[i] = i; i += s; } while (i<n); The computed backedge taken count is: ((-1 + (-1 * %init) + ((%init + %s) smax %n)) /u %s) This seems correct because if %n is less than (%init + %s), it reduces to ((-1 + %s) /u %s) ==> 0. I will remove this check. pankajchawla: I think this check is unnecessary as the computed backedge taken count is correct for single…

APInt MinStart = IsSigned ? getSignedRange(Start).getSignedMin()		APInt MinStart = IsSigned ? getSignedRange(Start).getSignedMin()
: getUnsignedRange(Start).getUnsignedMin();		: getUnsignedRange(Start).getUnsignedMin();

APInt MinStride = IsSigned ? getSignedRange(Stride).getSignedMin()		APInt MinStride = IsSigned ? getSignedRange(Stride).getSignedMin()
: getUnsignedRange(Stride).getUnsignedMin();		: getUnsignedRange(Stride).getUnsignedMin();

unsigned BitWidth = getTypeSizeInBits(LHS->getType());		unsigned BitWidth = getTypeSizeInBits(LHS->getType());
APInt Limit = IsSigned ? APInt::getSignedMaxValue(BitWidth) - (MinStride - 1)		APInt Limit = IsSigned ? APInt::getSignedMaxValue(BitWidth) - (MinStride - 1)
: APInt::getMaxValue(BitWidth) - (MinStride - 1);		: APInt::getMaxValue(BitWidth) - (MinStride - 1);

// Although End can be a MAX expression we estimate MaxEnd considering only		// Although End can be a MAX expression we estimate MaxEnd considering only
		sanjoyUnsubmitted Not Done Reply Inline Actions I'd feel safer if you short-circuited the entire max becount computation based on `PositiveStride`, since `MinStride` is no longer what it says on the tin ("the minimum value of `Stride`). (To me) it basically looks like the expression below that computes the max be count implicitly assumes a positive stride, and forking the logic more explicitly will be more obvious. If that's a problem, I'd suggest at least renaming `MinStride` to something more in tune with what it represents now (perhaps `StrideForMaxBECount`?). sanjoy: I'd feel safer if you short-circuited the entire max becount computation based on…
// the case End = RHS. This is safe because in the other case (End - Start)		// the case End = RHS. This is safe because in the other case (End - Start)
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions I renamed `MinStride` to `StrideForMaxBECount`. I would have had to duplicate code to completely separate max backedge count computation for the two cases. pankajchawla: I renamed `MinStride` to `StrideForMaxBECount`. I would have had to duplicate code to…
// is zero, leading to a zero maximum backedge taken count.		// is zero, leading to a zero maximum backedge taken count.
APInt MaxEnd =		APInt MaxEnd =
		sanjoyUnsubmitted Not Done Reply Inline Actions If it's all the same to you, I'd suggest avoiding a nested ternary and using an `if` instead. sanjoy: If it's all the same to you, I'd suggest avoiding a nested ternary and using an `if` instead.
		pankajchawlaAuthorUnsubmitted Not Done Reply Inline Actions I will change it to an if. pankajchawla: I will change it to an if.
IsSigned ? APIntOps::smin(getSignedRange(RHS).getSignedMax(), Limit)		IsSigned ? APIntOps::smin(getSignedRange(RHS).getSignedMax(), Limit)
: APIntOps::umin(getUnsignedRange(RHS).getUnsignedMax(), Limit);		: APIntOps::umin(getUnsignedRange(RHS).getUnsignedMax(), Limit);

const SCEV *MaxBECount;		const SCEV *MaxBECount;
if (isa<SCEVConstant>(BECount))		if (isa<SCEVConstant>(BECount))
MaxBECount = BECount;		MaxBECount = BECount;
else		else
MaxBECount = computeBECount(getConstant(MaxEnd - MinStart),		MaxBECount = computeBECount(getConstant(MaxEnd - MinStart),
▲ Show 20 Lines • Show All 1,760 Lines • Show Last 20 Lines

llvm/test/Analysis/ScalarEvolution/trip-count-unknown-stride.ll

				; RUN: opt < %s -analyze -scalar-evolution \| FileCheck %s

				; ScalarEvolution should be able to compute trip count of the loop by proving
				; that this is not an infinite loop with side effects.

				; CHECK: Determining loop execution counts for: @foo1
				; CHECK: backedge-taken count is ((-1 + %n) /u %s)

				target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"

				; Function Attrs: norecurse nounwind
				define void @foo1(i32* nocapture %A, i32 %n, i32 %s) #0 {
				entry:
				%cmp4 = icmp sgt i32 %n, 0
				br i1 %cmp4, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%i.05 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.05
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%add = add nsw i32 %i.05, %s
				%cmp = icmp slt i32 %add, %n
				br i1 %cmp, label %for.body, label %for.end

				for.end: ; preds = %for.body, %entry
				ret void
				}


				; Check that we are able to compute trip count of a loop without an entry guard.
				; CHECK: Determining loop execution counts for: @foo2
				; CHECK: backedge-taken count is ((-1 + (%n smax %s)) /u %s)

				; Function Attrs: norecurse nounwind
				define void @foo2(i32* nocapture %A, i32 %n, i32 %s) #0 {
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%i.05 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.05
				%0 = load i32, i32* %arrayidx, align 4
				%inc = add nsw i32 %0, 1
				store i32 %inc, i32* %arrayidx, align 4
				%add = add nsw i32 %i.05, %s
				%cmp = icmp slt i32 %add, %n
				br i1 %cmp, label %for.body, label %for.end

				for.end: ; preds = %for.body, %entry
				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] trip count calculation for loops with unknown strideClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 67389

llvm/include/llvm/Analysis/ScalarEvolution.h

llvm/lib/Analysis/ScalarEvolution.cpp

llvm/test/Analysis/ScalarEvolution/trip-count-unknown-stride.ll

[SCEV] trip count calculation for loops with unknown stride
ClosedPublic