This is an archive of the discontinued LLVM Phabricator instance.

Differential D105216

[ScalarEvolution] Fix overflow in computeBECount.
Closed, Public

Authored by efriedma on Jun 30 2021, 11:17 AM.

Details

Reviewers

reames
mkazantsev
fhahn
nikic

Commits

rGcbba71bfb50f: [ScalarEvolution] Fix overflow in computeBECount.
rG5b350183cdab: [ScalarEvolution] Fix overflow in computeBECount.

Summary

The current implementation of computeBECount doesn't account for the possibility that adding "Stride - 1" to Delta might overflow. For almost all loops, it doesn't, but it's not actually proven anywhere.

To deal with this, use a variety of tricks to try to prove that the addition doesn't overflow. If the proof is impossible, use an alternate sequence which never overflows.
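The overflow is easiest to see at a narrow bit width. The sketch below models both sequences on 8-bit unsigned integers as a stand-in for SCEV's fixed-width expressions; it is not LLVM code. `naiveCeilDiv` mirrors `(Delta + (Stride - 1)) udiv Stride`, and `safeCeilDiv` mirrors the `umin`-based sequence that `getUDivCeilSCEV` builds in the diff on this page. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative 8-bit model of the two ceiling-division sequences.
// Names are hypothetical; this is not LLVM code.

// (Delta + (Stride - 1)) udiv Stride, evaluated with 8-bit wraparound.
uint8_t naiveCeilDiv(uint8_t Delta, uint8_t Stride) {
  assert(Stride != 0 && "division by zero");
  uint8_t Sum = static_cast<uint8_t>(Delta + (Stride - 1)); // may wrap
  return static_cast<uint8_t>(Sum / Stride);
}

// umin(N, 1) + floor((N - umin(N, 1)) / D); no intermediate value can wrap.
uint8_t safeCeilDiv(uint8_t N, uint8_t D) {
  assert(D != 0 && "division by zero");
  uint8_t MinNOne = N < 1 ? N : 1;
  return static_cast<uint8_t>(MinNOne + (N - MinNOne) / D);
}

// Exact ceiling computed in a wider type, used as the reference.
unsigned exactCeilDiv(unsigned N, unsigned D) { return (N + D - 1) / D; }

// Counts the 8-bit inputs on which formula F disagrees with the reference.
template <typename Fn> unsigned countMismatches(Fn F) {
  unsigned Bad = 0;
  for (unsigned N = 0; N < 256; ++N)
    for (unsigned D = 1; D < 256; ++D)
      if (F(static_cast<uint8_t>(N), static_cast<uint8_t>(D)) !=
          exactCeilDiv(N, D))
        ++Bad;
  return Bad;
}
```

For example, `naiveCeilDiv(255, 2)` wraps the sum to 0 and returns 0, while `safeCeilDiv(255, 2)` returns 128.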

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

efriedma created this revision. Jun 30 2021, 11:17 AM

Herald added a subscriber: hiraditya. Jun 30 2021, 11:17 AM

efriedma requested review of this revision. Jun 30 2021, 11:17 AM

Herald added a project: Restricted Project. Jun 30 2021, 11:17 AM

I'm working on a very similar patch as we speak, so you found a knowledgeable reviewer. Your test diff is a lot smaller than my current one, so maybe you're on to something. Unfortunately, I'm about to leave for the day, so I won't be able to get you detailed feedback until at least tomorrow.

A couple of quick comments.

The code does disallow stride == 0 on the howManyLessThans path. The logic is explained in the !PositiveStride path, but the comments are really confusing, so let me explain again. For a non-positive exit, we require that this be the sole exit and the current condition control that exit. (Indirectly through NoWrap) As a result, if the stride was zero, then the loop must either a) be infinite, or b) exit on the first iteration. The !loopIsFiniteByAssumption check requires a zero stride to exit on the first iteration. (Ah, I think I see the issue you did now. The stride can be zero iff the exit is taken on first iteration. If this was your chain of thought, can you clarify submission comment?)

llvm/lib/Analysis/ScalarEvolution.cpp
11630	Style comment: I'd suggest pulling out a function getUDivCeiling(Delta,Stride,MayOverflow). I found (in my WIP patch) that this really helps readability. Once you have that, you can use early return via tail call to helper for most of the above.

reames added inline comments. Jun 30 2021, 12:10 PM

llvm/lib/Analysis/ScalarEvolution.cpp
11639	While true, this is the same set of facts which allowed you to avoid the max for the start expression. As such, you've already gotten most of the benefit.
11763–11764	This check needs to be removed. The problem is that this isn't doing what the comment says it is. The actual condition to prove the backedge is taken on the first iteration is OrigStart Cond OrigRHS. I have a test case in which this check causes us to generate wrong code because Start-Stride < RHS, but Start < RHS does not hold. (Actually, what happened to that test case? I thought I'd committed that, but I don't see it in my commit history. Need to find that.) When I tried removing this without also fixing the overflow issue, I got bad results. I don't fully remember why atm, will try to reconstruct that and share it.

Harbormaster completed remote builds in B111813: Diff 355634. Jun 30 2021, 12:58 PM

In D105216#2851010, @reames wrote:

Your test diff is a lot smaller than my current one, so maybe you're on to something.

I found that I needed a wide variety of heuristics to handle various cases in the regression tests. This is the source of most of the complexity in this patch.

(Ah, I think I see the issue you did now. The stride can be zero iff the exit is taken on first iteration. If this was your chain of thought, can you clarify submission comment?)

I'll clarify the commit message, sure.

llvm/lib/Analysis/ScalarEvolution.cpp
11630	Hmm... Maybe I'll just make the computation of MayOverflow a lambda.
11763–11764	Okay. We can deal with this separately, I think?

efriedma edited the summary of this revision. Jun 30 2021, 1:06 PM

efriedma edited the summary of this revision. Jul 2 2021, 3:57 PM

Address review comments.

Harbormaster completed remote builds in B112293: Diff 356305. Jul 2 2021, 6:03 PM

How much worse does it become if we simply check that start <= MAX_INT - step and bail if we can't prove it?

llvm/include/llvm/Analysis/ScalarEvolution.h
2053	I feel it a bit confusing to pass end before start. Reverse order maybe?

In D105216#2858946, @mkazantsev wrote:

How much worse does it become if we simply check that start <= MAX_INT - step and bail if we can't prove it?

I assume that isn't the expression you meant to write?

Practically, I expect the most important cases to recognize are:

Power-of-two stride.
Post-increment loops: Start == Stride. (This is a special case of Start >= Stride - 1).

I'm not sure how well isLoopEntryGuardedByCond/isKnownViaNonRecursiveReasoning would do if I just throw Start >= Stride - 1 for unsigned, or the signed equivalent, at it. I can try, I guess.
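Whether `Start >= Stride - 1` is strong enough can be checked exhaustively at a small width. The following is a minimal sketch over 8-bit unsigned values, assuming the precondition `End >=u Start` so that `Delta = End - Start` does not wrap. The function names are hypothetical and the code only models the arithmetic; it does not touch SCEV.

```cpp
#include <cstdint>

// Hypothetical brute-force check over all 8-bit unsigned values; not LLVM code.
// Assumed precondition: End >=u Start, so Delta = End - Start does not wrap.
// Candidate fact: Start >=u Stride - 1.
// Returns true if, whenever both hold, Delta + (Stride - 1) fits in 8 bits.
bool startGeStrideMinusOneSuffices() {
  for (unsigned Stride = 1; Stride < 256; ++Stride)
    for (unsigned Start = Stride - 1; Start < 256; ++Start)
      for (unsigned End = Start; End < 256; ++End) {
        unsigned Delta = End - Start;
        if (Delta + (Stride - 1) > 255)
          return false; // the 8-bit addition would wrap
      }
  return true;
}

// Same enumeration without the candidate fact: counts the
// (Start, End, Stride) triples for which the 8-bit addition wraps.
unsigned countOverflowsWithoutFact() {
  unsigned Overflows = 0;
  for (unsigned Stride = 1; Stride < 256; ++Stride)
    for (unsigned Start = 0; Start < 256; ++Start)
      for (unsigned End = Start; End < 256; ++End)
        if ((End - Start) + (Stride - 1) > 255)
          ++Overflows;
  return Overflows;
}
```

The first function returns true because `Delta + (Stride - 1) <= End` under the two assumptions; the second returns a nonzero count, which is the gap the proof has to close.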

Detailed comments inline. The two common issues running through are:

The use of IsSigned confuses me. We should be computing an unsigned expression at this point, so why does the signedness of the end max expression matter? I think I finally figured it out but a change to the code structure and comments would really help. My current understanding is that IsSigned == true implies (End >=s Start) and !IsSigned implies (End >=u Start) as preconditions of the method. I'd just state that explicitly in comment form.
We can have an exit which is dynamically dead, produces poison, and overflows in the computation. See the long explanation in the first comment. As far as I can tell, this only applies to the first case.

Sorry this took so long.

llvm/include/llvm/Analysis/ScalarEvolution.h
2041	... if this exit is taken
llvm/lib/Analysis/ScalarEvolution.cpp
11548	I'm not sure this holds, let me walk you through why. There is no guarantee that this exit must be taken. Instead, some other exit could be taken. The corner case here is when we have an exit (which won't be taken) whose no-wrap IV does in fact wrap on the last iteration (producing poison), but whose value we don't branch on (e.g. no UB) because an earlier exit is taken instead. I believe this means the minimum BTC we can report for the (untaken) exit is 2^BW/Stride. (e.g. so it must be >= the exit count of the taken exit) I believe that invalidates your inequality, and thus your result.
We can patch this in a couple of ways:
If ControlsExit is true, then we know this condition is branched on by the sole exit. Thus, no other exit, and logic holds.
If we know this condition is branched on by any latch dominating exit, then we know that the execution of iteration 2^BW/Stride must execute UB if no earlier exit is taken. Thus, either this exit is taken on a previous iteration, or another exit is taken up to that iteration. (I got stuck here, but it really seems like we should be able to do case analysis?)
Aside: If we already know the other exit count is small, it's a real shame we can't use that here... Maybe we can do a two step analysis and simplify using min approach? Just a thought.
11563	Aside: None of the logic below appears to rely on no-wrap facts on the IV. Not sure if it's worth it, but should we reflect that in code structure somehow?
11568	Code structure wise, I request you pass in the BaseStart explicitly. This parsing of expressions is confusing.
11582	The last line of this comment confuses me. Why is this relevant? Otherwise, the reasoning in this case does appear to hold.
11586	According to my brute force program, your reasoning does hold. I find the isSigned usage very very confusing. Please write out the implied predicates.
11601	This appears to hold, though I didn't follow the logic TBH. I resorted to writing a small program to brute force the i8 space. :) Please state the !IsSigned precondition in the comment in End >=u Start form.
11607	Again, please state the precondition that IsSigned implies. Your code is correct, but the comment is very hard to follow.

efriedma added inline comments. Jul 7 2021, 3:15 PM

llvm/lib/Analysis/ScalarEvolution.cpp
11548	If ControlsExit is false, and stride isn't one, it's hard to get here. I think the only way is if canIVOverflowOnLT returns false, which I think is enough to prevent any trouble here? Maybe worth going into in the comment, though.
11601	It's basically just proving "end + (stride-1)" doesn't overflow. I'll try to clarify the comment.

Spent some more time on this.

I discovered that some of the checks were not really necessary. I think I must have added a check, then didn't realize it became unnecessary as I added other checks. So it's down to three relatively straightforward checks. We can re-add the other checks if we discover practical cases where they're necessary.

I rewrote the power-of-two proof; hopefully makes it more clear that it's actually correct. And I clarified the signed vs. unsigned on the other checks.

Separated out getUDivCeilSCEV() as a separate function. Makes this patch a little easier to read, and hopefully the general utility is useful for other cases in the future.

Harbormaster completed remote builds in B112909: Diff 357123. Jul 7 2021, 8:56 PM

reames added inline comments. Jul 8 2021, 9:46 AM

llvm/lib/Analysis/ScalarEvolution.cpp
11548	I believe you are correct, though I'll note I really dislike code whose correctness depends on understanding a non-trivial invariant through a large batch of complicated code. Please try to comment this clearly!

LGTM

Thank you for pushing this through.

I'll have a couple of small followups for review to tweak comments and code structure, but what you've got is a huge improvement and I want to get this in. It unblocks several other changes which improve opt quality.

This revision is now accepted and ready to land. Jul 8 2021, 9:55 AM

This revision was landed with ongoing or failed builds. Jul 8 2021, 10:10 AM

Closed by commit rG5b350183cdab: [ScalarEvolution] Fix overflow in computeBECount. (authored by efriedma).

This revision was automatically updated to reflect the committed changes.

efriedma added a commit: rG5b350183cdab: [ScalarEvolution] Fix overflow in computeBECount..

I’ve bisected a miscompilation on aarch64 down to this commit. I’ll follow up with a repro later…

uabelho added a subscriber: uabelho. Jul 9 2021, 1:06 AM

Yeah, we also found an issue on the PowerPC target. A small reduced case:

__attribute__((noinline))int foo(int *arr)
{
  int prime = 0, k = 0;

  for(int i = 0; i < 8191; i++) {
    prime = i + i + 1; 
    k = i + prime;
    for (int j = k; j < 8191; j += prime)
      arr[j] = 0;
  }
  return 0;
}
int main(void)
{
  int arr[8191];

  foo (arr);
  return 0;
}

$ clang -O2 t.cpp ; ./a.out
Segmentation fault (core dumped)

Passing -fno-vectorize makes the above case pass. Initial investigation shows that, after this patch, the execution count computed for the vectorized loop can wrap, which produces the unexpected behaviour.

Hi,

We see a miscompile as well.
It goes wrong for my out of tree target so I don't have a reproducer for an in tree target but maybe some info may help anyway.
I have a function containing the following two nested loops:

void foo()
{
    #define MAX 2048

    static unsigned a[MAX];

    unsigned count = 2;

    for (unsigned i = 0; i < MAX; ++i)
    {
        if (a[i] == 0)
        {
            unsigned p = i + i + 3;
            count++;
            for (unsigned j = i; j < MAX; j += p)
            {
                a[j] = 1;
            }
        }
    }
    assert(count == 565);
}

With some debug printouts from the compiler I see that before we got the following from SCEV for the inner loop:

Exit count: ({2047,+,-1}<%for.body> /u {3,+,2}<nuw><nsw><%for.body>)
Trip count: (1 + ({2047,+,-1}<%for.body> /u {3,+,2}<nuw><nsw><%for.body>))<nuw><nsw>
Trip count bound: 16 bits, [1,684)

and with the patch we get

Exit count: ((((-1 * (1 umin {2045,+,-3}<%for.body>))<nuw><nsw> + {2045,+,-3}<%for.body>) /u {3,+,2}<nuw><nsw><%for.body>) + (1 umin {2045,+,-3}<%for.body>))
Trip count: (1 + (((-1 * (1 umin {2045,+,-3}<%for.body>))<nuw><nsw> + {2045,+,-3}<%for.body>) /u {3,+,2}<nuw><nsw><%for.body>) + (1 umin {2045,+,-3}<%for.body>))
Trip count bound: 16 bits, [1,21848)

With some printouts in the outer and inner loop I also see that when i is 682, then something goes wrong for the inner loop and it executes for the following (way too many) j values:

outer. i: 682
  inner. j: 682
  inner. j: 2049
  inner. j: 3416
  inner. j: 4783
  inner. j: 6150
  [...]
  inner. j: 63564
  inner. j: 64931
  inner. j: 762
outer. i: 683
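The two exit-count expressions can be evaluated by hand for i = 682 to see where the extra iterations come from. The sketch below assumes `unsigned` is 16 bits wide on this target (the dump reports "16 bits") and models the printed expressions with `uint16_t`; the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the two exit-count expressions printed above,
// evaluated with 16-bit wraparound. Not LLVM code.

// Before the patch: {2047,+,-1} /u {3,+,2}
uint16_t oldExitCount(uint16_t I) {
  uint16_t Num = static_cast<uint16_t>(2047 - I);
  uint16_t Stride = static_cast<uint16_t>(3 + 2 * I); // always odd, never 0
  return static_cast<uint16_t>(Num / Stride);
}

// After the patch: ((N - umin(1, N)) /u Stride) + umin(1, N),
// with N = {2045,+,-3}
uint16_t newExitCount(uint16_t I) {
  uint16_t N = static_cast<uint16_t>(2045 - 3 * I); // wraps once 3*I > 2045
  uint16_t Stride = static_cast<uint16_t>(3 + 2 * I);
  uint16_t MinOne = N < 1 ? N : 1;
  return static_cast<uint16_t>((N - MinOne) / Stride + MinOne);
}

// Values of j visited if the inner loop body runs ExitCount + 1 times.
std::vector<uint16_t> innerTrace(uint16_t I, uint16_t ExitCount) {
  std::vector<uint16_t> Js;
  uint16_t J = I;
  uint16_t P = static_cast<uint16_t>(I + I + 3);
  for (unsigned K = 0; K <= ExitCount; ++K) {
    Js.push_back(J);
    J = static_cast<uint16_t>(J + P); // 16-bit wraparound, as on the target
  }
  return Js;
}
```

For i = 682 the numerator `2045 - 3*i` is -1, which is 65535 as a 16-bit unsigned value, so the new expression yields 48 where the old one yields 0. Running the body 49 times from j = 682 with step 1367 gives 682, 2049, ..., 64931, 762, which is consistent with the trace above.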

The case I'm observing is within this translation unit: https://martin.st/temp/gsmdec-preproc.c

There's some difference visible in the assembly generated by clang -target aarch64-linux-gnu -c -O3 gsmdec-preproc.c, I haven't exactly pinpointed what part of it is that breaks.

If you have access to aarch64 linux, the issue can be reproduced at runtime with these commands:

git clone git://source.ffmpeg.org/ffmpeg
cd ffmpeg
./configure --cc=clang --samples=/path/to/empty/dir
make fate-rsync # sync data samples for running tests, into the path specified above
make -j$(nproc) fate-g723_1-dec-1

The error in this case is in libavformat/gsmdec.o. There's also a couple other miscompilations in there, visible in make fate-sub-vplayer and make fate-wmavoice-7k, but I haven't pinpointed which translation unit those errors stem from.

mstorsjo added a reverting change: rGe479777d3c8e: Revert "[ScalarEvolution] Fix overflow in computeBECount.". Jul 9 2021, 4:27 AM

lkail added a subscriber: lkail. Jul 9 2021, 5:12 AM

I think the miscompile is the isLoopEntryGuardedByCond issue noted in https://reviews.llvm.org/D105216#2851010 . Looking.

efriedma reopened this revision. Jul 9 2021, 12:15 PM

This revision is now accepted and ready to land. Jul 9 2021, 12:15 PM

Updated to fix the miscompile. This unfortunately involves yet another complicated proof...

More refactoring. Unbreak howManyGreaterThans.

Harbormaster completed remote builds in B113324: Diff 357674. Jul 9 2021, 7:02 PM

reames mentioned this in D105209: [SCEVExpander] Discount cost of umin(1, x) expressions. Jul 12 2021, 8:44 AM

@reames Please take another look before I merge; I ended up substantially refactoring the patch to deal with the miscompile.

I've glanced at this, and the refactor/new code seems very complicated to me. My LGTM no longer applies. I will try to take a closer look at the reasoning in the newly added parts, but I *strongly* request you look for ways to factor the code so that the reasoning is modular. If we can split this into two or more patches, that would be my strong preference.

but I *strongly* request you look for ways to factor the code so that the reasoning is modular

I really don't see any way to unify the two different proof strategies. The proofs are trying to reason about overflow on completely different operations.

If we can split this into two or more patches, that would be my strong preference.

I can split off the introduction of getUDivCeilSCEV() into a separate patch, and I can introduce the new isLoopEntryGuardedByCond(L, CondGT, Start, StartMinusStride) check in a separate patch. I'm not sure the result is easier to review, but I guess it would make it easier to figure out regressions.

efriedma mentioned this in D105865: [ScalarEvolution][NFC] Refactor howManyLessThans. Jul 12 2021, 7:35 PM

efriedma added a parent revision: D105865: [ScalarEvolution][NFC] Refactor howManyLessThans.

Posted D105865 with just cleanup. Then I'll post a followup with the missing isLoopEntryGuardedByCond() checks. Then I'll rebase this (so at that point, we'll just be adding the MayAddOverflow checks). Let me know if that makes sense.

reames mentioned this in rGe4b43973fbd4: [ScalarEvolution] Fix overflow when computing max trip counts. Jul 13 2021, 10:01 AM

In D105216#2872827, @efriedma wrote:

but I *strongly* request you look for ways to factor the code so that the reasoning is modular

I really don't see any way to unify the two different proof strategies. The proofs are trying to reason about overflow on completely different operations.

So, I think this might be part of our problem. The code and comments to date have not made your last sentence here obvious to me as the reviewer.

To be clear, I have not seen a concise description of what the bug in the original patch *was*. That makes it hard to review a patch which is supposed to fix it.

If we can split this into two or more patches, that would be my strong preference.

I can split off the introduction of getUDivCeilSCEV() into a separate patch, and I can introduce the new isLoopEntryGuardedByCond(L, CondGT, Start, StartMinusStride) check in a separate patch. I'm not sure the result is easier to review, but I guess it would make it easier to figure out regressions.

Splitting patches to isolate regressions is good!

I went ahead and landed e4b43973. This handles only the constant bounds cases using the new ceiling operation. This is an easy sub-case because a) stride is known positive, b) it'll constant fold thus not changing the result at all, and c) it doesn't appear to be influenced by your most recent proof. If you rebase over that, it should help a bit to reduce complexity.

In D105216#2874452, @reames wrote:

In D105216#2872827, @efriedma wrote:

but I *strongly* request you look for ways to factor the code so that the reasoning is modular

I really don't see any way to unify the two different proof strategies. The proofs are trying to reason about overflow on completely different operations.

So, I think this might be part of our problem. The code and comments to date have not made your last sentence here obvious to me as the reviewer.

To be clear, I have not seen a concise description of what the bug in the original patch *was*. That makes it hard to review a patch which is supposed to fix it.

Oh, sorry, I thought you understood the issue since you sort of pointed it out originally. The issue is the case where End < Start.

If we prove RHS >= Start, this is impossible. If we use End = max(RHS, Start), this is also impossible. If we only proved RHS > Start - Stride, it's possible that Start - Stride < End < Start. In this case, the backedge-taken count should be zero. But RHS - Start overflows, so getUDivCeilSCEV treats it as a very large positive number.

So we never want to use getUDivCeilSCEV if we're optimizing an expression by proving RHS > Start - Stride. As I outline in the proof in this patch, in this case, we can show that the backedge-taken count is actually ((RHS - 1) - (Start - Stride)) /u Stride, which is equivalent to the expression we would produce without this patch.
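The difference between the two formulas in the `Start - Stride < RHS < Start` window can be checked exhaustively at 8 bits. This is a minimal sketch under stated assumptions: unsigned comparisons, `Stride >= 1`, `Start >= Stride` (so `Start - Stride` does not wrap), and `RHS > Start - Stride`. The helper names are hypothetical, and the reference count is computed in wider arithmetic.

```cpp
#include <cstdint>

// Hypothetical 8-bit unsigned model; not LLVM code.

// Reference backedge-taken count in wide arithmetic for an IV that has the
// value Start at the first exit test, steps by Stride, and loops while
// IV < RHS.
unsigned referenceBTC(unsigned Start, unsigned Stride, unsigned RHS) {
  if (RHS <= Start)
    return 0;
  return (RHS - Start + Stride - 1) / Stride;
}

// Ceiling division of (RHS - Start), with the subtraction done in 8 bits.
uint8_t ceilOfWrappedDelta(uint8_t Start, uint8_t Stride, uint8_t RHS) {
  uint8_t Delta = static_cast<uint8_t>(RHS - Start); // wraps if RHS < Start
  uint8_t MinOne = Delta < 1 ? Delta : 1;
  return static_cast<uint8_t>(MinOne + (Delta - MinOne) / Stride);
}

// ((RHS - 1) - (Start - Stride)) /u Stride
uint8_t shiftedFormula(uint8_t Start, uint8_t Stride, uint8_t RHS) {
  uint8_t Num = static_cast<uint8_t>((RHS - 1) - (Start - Stride));
  return static_cast<uint8_t>(Num / Stride);
}

struct Tally {
  unsigned ShiftedWrong = 0; // inputs where shiftedFormula != reference
  unsigned CeilWrong = 0;    // inputs where ceilOfWrappedDelta != reference
};

// Enumerates every input that satisfies the assumed preconditions.
Tally checkAll() {
  Tally T;
  for (unsigned Stride = 1; Stride < 256; ++Stride)
    for (unsigned Start = Stride; Start < 256; ++Start)
      for (unsigned RHS = Start - Stride + 1; RHS < 256; ++RHS) {
        unsigned Ref = referenceBTC(Start, Stride, RHS);
        uint8_t S = static_cast<uint8_t>(Start);
        uint8_t D = static_cast<uint8_t>(Stride);
        uint8_t R = static_cast<uint8_t>(RHS);
        if (shiftedFormula(S, D, R) != Ref)
          ++T.ShiftedWrong;
        if (ceilOfWrappedDelta(S, D, R) != Ref)
          ++T.CeilWrong;
      }
  return T;
}
```

With Start = 5, Stride = 3, RHS = 4, the wrapped delta is 255 and the ceiling form returns 85, while the shifted formula returns the expected 0.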

reames mentioned this in D105921: [SCEV] Handle zero stride correctly in howManyLessThans. Jul 13 2021, 11:19 AM

In D105216#2874703, @efriedma wrote:

So we never want to use getUDivCeilSCEV if we're optimizing an expression by proving RHS > Start - Stride. As I outline in the proof in this patch, in this case, we can show that the backedge-taken count is actually ((RHS - 1) - (Start - Stride)) /u Stride, which is equivalent to the expression we would produce without this patch.

It's this last bit which is non-obvious from the patch. You're using an entirely different formula for the BTC in this case, you should say so in the comments!

reames mentioned this in rG087310c71e5c: [SCEV] Strengthen inference of RHS > Start in howManyLessThans. Jul 13 2021, 11:54 AM

reames mentioned this in rG5ca9cf0e6b15: [tests] Precommit a test case from D105216. Jul 13 2021, 12:03 PM

reames mentioned this in rG4df591b5c960: [SCEV] Handle zero stride correctly in howManyLessThans. Jul 13 2021, 1:32 PM

Eli, I think I just had an "aha!" moment. Check my reasoning here, but I think we can greatly simplify the patch structure.

The most recent case you added appears to simply be a specialization of ceiling(max(A,B)-A, C) where max(A,B) > A - C is known. I don't think we actually need to know anything about where A, B, and C come from for your lowering to be valid. I think we can also generalize this slightly into two pieces:

A general ceiling lowering when ceiling(X-A, C) where X > A - C
A pushdown rule for max(X,Y) > Z as X > Z && Y > Z ==> max(X,Y) > Z

This really tempts me to go back to the first version of your patch, and add another specialization of how to generate a faster ceiling.

What do you think?

llvm/lib/Analysis/ScalarEvolution.cpp
11845	I split off this part of the change into 087310c71e with a bit of restructuring.

The general formula for the backedge-taken count can be written, in arbitrary-precision arithmetic, as something like floor(max(End - Start + Stride - 1, 0) / Stride), I think. We could define a function to compute that, I guess, separate from howManyLessThans, and optimize based on that.
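As a sanity check on that formula, here is a minimal sketch that uses `int64_t` as a stand-in for arbitrary precision and compares the closed form against a direct count. It assumes a positive stride and inputs small enough that nothing overflows; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical wide-integer model of
//   floor(max(End - Start + Stride - 1, 0) / Stride)
// int64_t stands in for arbitrary precision. Stride is assumed positive and
// the inputs are assumed small enough that no intermediate value overflows.
int64_t generalBTC(int64_t Start, int64_t End, int64_t Stride) {
  int64_t Num = std::max<int64_t>(End - Start + Stride - 1, 0);
  return Num / Stride; // Num >= 0, so truncating division is floor
}

// Reference: counts how many of Start, Start + Stride, ... are below End
// before the first one that is not.
int64_t simulatedBTC(int64_t Start, int64_t End, int64_t Stride) {
  int64_t Count = 0;
  for (int64_t IV = Start; IV < End; IV += Stride)
    ++Count;
  return Count;
}

// Compares the two over a small box of inputs.
bool formulasAgree(int64_t Lo, int64_t Hi, int64_t MaxStride) {
  for (int64_t Start = Lo; Start <= Hi; ++Start)
    for (int64_t End = Lo; End <= Hi; ++End)
      for (int64_t Stride = 1; Stride <= MaxStride; ++Stride)
        if (generalBTC(Start, End, Stride) != simulatedBTC(Start, End, Stride))
          return false;
  return true;
}
```

The `max(..., 0)` clamp is what handles End < Start: with Start = 5, End = 4, Stride = 3 the numerator is 1 and the result is 0, and for End far below Start the clamp keeps the numerator from going negative.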

Granted, that's basically just taking this patch, and shoving the code into a different function. But it might be easier to reason about future changes.

reames mentioned this in D105942: [SCEV] Fix unsound reasoning in howManyLessThans. Jul 13 2021, 2:56 PM

Eli,

I went ahead and posted https://reviews.llvm.org/D105942 which splits this one step further, but in the process, I also convinced myself of the correctness of your revised patch. I'd be fine either landing the one I posted, then rebasing this into a smaller patch, or simply landing this one after rebase, and then posting a cleanup or two afterwards. Up to you.

I still like the idea of having all of this inside getUDivCeiling if possible, but doing that as cleanup once this lands seems reasonable.

efriedma mentioned this in rG205ed009a44c: [SCEV] Handle zero stride correctly in howManyLessThans. Jul 13 2021, 7:14 PM

reames mentioned this in rGa99d420a937b: [SCEV] Fix unsound reasoning in howManyLessThans. Jul 15 2021, 10:33 AM

reames mentioned this in D106083: [unittest] Exercise SCEV's udiv and udiv ceiling routines. Jul 15 2021, 10:56 AM

reames mentioned this in rGb980d2f54bb6: [unittest] Exercise SCEV's udiv and udiv ceiling routines. Jul 15 2021, 11:59 AM

Eli, I think we're out of things which can be reasonably split here. If you want to rebase your patch - preferably the form close to what you committed before - I think we're ready for the last big piece to go in. We probably should wait a day or two to make sure the previous bits stick, but assuming they do, I don't see any reason to delay this further.

Rebased.

Harbormaster completed remote builds in B114337: Diff 359105. Jul 15 2021, 1:23 PM

LGTM

And thank you!

This revision is now accepted and ready to land. Jul 15 2021, 2:17 PM

reames added a child revision: D104066: [SCEV] Use knowledge of stride to prove loops finite for LT exit count computation. Jul 15 2021, 2:20 PM

reames added a child revision: D104140: [SCEV] Allow negative steps for LT exit count computation for unsigned comparisons.

This revision was landed with ongoing or failed builds. Jul 16 2021, 4:15 PM

Closed by commit rGcbba71bfb50f: [ScalarEvolution] Fix overflow in computeBECount. (authored by efriedma).

This revision was automatically updated to reflect the committed changes.

efriedma added a commit: rGcbba71bfb50f: [ScalarEvolution] Fix overflow in computeBECount..

Revision Contents

Path	Size
llvm/include/llvm/Analysis/ScalarEvolution.h	7 lines
llvm/lib/Analysis/ScalarEvolution.cpp	129 lines
llvm/test/Analysis/ScalarEvolution/2008-11-18-Stride2.ll	2 lines
llvm/test/Analysis/ScalarEvolution/trip-count-unknown-stride.ll	6 lines

Diff 359473

llvm/include/llvm/Analysis/ScalarEvolution.h

[... 2,026 lines not shown ...]
  private:
    /// AddRec and the predicates as a pair, and caches this pair in
    /// PredicatedSCEVRewrites.
    /// If the analysis is not successful, a mapping from the \p SymbolicPHI to
    /// itself (with no predicates) is recorded, and a nullptr with an empty
    /// predicates vector is returned as a pair.
    Optional<std::pair<const SCEV *, SmallVector<const SCEVPredicate *, 3>>>
    createAddRecFromPHIWithCastsImpl(const SCEVUnknown *SymbolicPHI);

-   /// Compute the backedge taken count knowing the interval difference, and
-   /// the stride for an inequality. Result takes the form:
-   /// (Delta + (Stride - 1)) udiv Stride.
-   /// Caller must ensure that this expression either does not overflow or
-   /// that the result is undefined if it does.
-   const SCEV *computeBECount(const SCEV *Delta, const SCEV *Stride);
-
    /// Compute the maximum backedge count based on the range of values
    /// permitted by Start, End, and Stride. This is for loops of the form
    /// {Start, +, Stride} LT End.
    ///
    /// Precondition: the induction variable is known to be positive. We don't
    /// assert these preconditions so please be careful.
    const SCEV *computeMaxBECountForLT(const SCEV *Start, const SCEV *Stride,
                                       const SCEV *End, unsigned BitWidth,
                                       bool IsSigned);

    /// Verify if an linear IV with positive stride can overflow when in a
    /// less-than comparison, knowing the invariant term of the comparison,
    /// the stride.
    bool canIVOverflowOnLT(const SCEV *RHS, const SCEV *Stride, bool IsSigned);

    /// Verify if an linear IV with negative stride can overflow when in a
    /// greater-than comparison, knowing the invariant term of the comparison,
    /// the stride.
    bool canIVOverflowOnGT(const SCEV *RHS, const SCEV *Stride, bool IsSigned);

    /// Get add expr already created or create a new one.
    const SCEV *getOrCreateAddExpr(ArrayRef<const SCEV *> Ops,
                                   SCEV::NoWrapFlags Flags);

    /// Get mul expr already created or create a new one.
    const SCEV *getOrCreateMulExpr(ArrayRef<const SCEV *> Ops,
                                   SCEV::NoWrapFlags Flags);
[... 201 lines not shown ...]

llvm/lib/Analysis/ScalarEvolution.cpp

Show First 20 Lines • Show All 11,522 Lines • ▼ Show 20 Lines	const SCEV ScalarEvolution::getUDivCeilSCEV(const SCEV N, const SCEV *D) {
// umin(N, 1) + floor((N - umin(N, 1)) / D)		// umin(N, 1) + floor((N - umin(N, 1)) / D)
// This is equivalent to "1 + floor((N - 1) / D)" for N != 0. The umin		// This is equivalent to "1 + floor((N - 1) / D)" for N != 0. The umin
// expression fixes the case of N=0.		// expression fixes the case of N=0.
const SCEV *MinNOne = getUMinExpr(N, getOne(N->getType()));		const SCEV *MinNOne = getUMinExpr(N, getOne(N->getType()));
const SCEV *NMinusOne = getMinusSCEV(N, MinNOne);		const SCEV *NMinusOne = getMinusSCEV(N, MinNOne);
return getAddExpr(MinNOne, getUDivExpr(NMinusOne, D));		return getAddExpr(MinNOne, getUDivExpr(NMinusOne, D));
}		}

const SCEV ScalarEvolution::computeBECount(const SCEV Delta,
const SCEV *Step) {
const SCEV *One = getOne(Step->getType());
Delta = getAddExpr(Delta, getMinusSCEV(Step, One));
return getUDivExpr(Delta, Step);
}

const SCEV ScalarEvolution::computeMaxBECountForLT(const SCEV Start,		const SCEV ScalarEvolution::computeMaxBECountForLT(const SCEV Start,
const SCEV *Stride,		const SCEV *Stride,
const SCEV *End,		const SCEV *End,
unsigned BitWidth,		unsigned BitWidth,
bool IsSigned) {		bool IsSigned) {

assert(!isKnownNonPositive(Stride) &&		assert(!isKnownNonPositive(Stride) &&
"Stride is expected strictly positive!");		"Stride is expected strictly positive!");
// Calculate the maximum backedge count based on the range of values		// Calculate the maximum backedge count based on the range of values
// permitted by Start, End, and Stride.		// permitted by Start, End, and Stride.
const SCEV *MaxBECount;		const SCEV *MaxBECount;
APInt MinStart =		APInt MinStart =
IsSigned ? getSignedRangeMin(Start) : getUnsignedRangeMin(Start);		IsSigned ? getSignedRangeMin(Start) : getUnsignedRangeMin(Start);

APInt StrideForMaxBECount =		APInt StrideForMaxBECount =
IsSigned ? getSignedRangeMin(Stride) : getUnsignedRangeMin(Stride);		IsSigned ? getSignedRangeMin(Stride) : getUnsignedRangeMin(Stride);

// We already know that the stride is positive, so we paper over conservatism		// We already know that the stride is positive, so we paper over conservatism
reames: I'm not sure this holds; let me walk you through why. There is no guarantee that this exit must be taken. Instead, some other exit could be taken. The corner case here is when we have an exit (which won't be taken) whose no-wrap IV does in fact wrap on the last iteration (producing poison), but whose value we don't branch on (e.g. no UB) because an earlier exit is taken instead. I believe this means the minimum BTC we can report for the (untaken) exit is 2^BW/Stride (e.g. so it must be >= the exit count of the taken exit). I believe that invalidates your inequality, and thus your result. We can patch this in a couple of ways:
- If ControlsExit is true, then we know this condition is branched on by the sole exit. Thus, there is no other exit, and the logic holds.
- If we know this condition is branched on by any latch-dominating exit, then we know that iteration 2^BW/Stride must execute UB if no earlier exit is taken. Thus, either this exit is taken on a previous iteration, or another exit is taken up to that iteration. (I got stuck here, but it really seems like we should be able to do case analysis?)
Aside: If we already know the other exit count is small, it's a real shame we can't use that here... Maybe we can do a two-step analysis and simplify using the min approach? Just a thought.
efriedma (author): If ControlsExit is false, and stride isn't one, it's hard to get here. I think the only way is if canIVOverflowOnLT returns false, which I think is enough to prevent any trouble here? Maybe worth going into in the comment, though.
reames: I believe you are correct, though I'll note I really dislike code whose correctness depends on understanding a non-trivial invariant through a large batch of complicated code. Please try to comment this clearly!
// in our range computation by forcing StrideForMaxBECount to be at least one.		// in our range computation by forcing StrideForMaxBECount to be at least one.
// In theory this is unnecessary, but we expect MaxBECount to be a		// In theory this is unnecessary, but we expect MaxBECount to be a
// SCEVConstant, and (udiv <constant> 0) is not constant folded by SCEV (there		// SCEVConstant, and (udiv <constant> 0) is not constant folded by SCEV (there
// is nothing to constant fold it to).		// is nothing to constant fold it to).
APInt One(BitWidth, 1, IsSigned);		APInt One(BitWidth, 1, IsSigned);
StrideForMaxBECount = APIntOps::smax(One, StrideForMaxBECount);		StrideForMaxBECount = APIntOps::smax(One, StrideForMaxBECount);

APInt MaxValue = IsSigned ? APInt::getSignedMaxValue(BitWidth)		APInt MaxValue = IsSigned ? APInt::getSignedMaxValue(BitWidth)
: APInt::getMaxValue(BitWidth);		: APInt::getMaxValue(BitWidth);
APInt Limit = MaxValue - (StrideForMaxBECount - 1);		APInt Limit = MaxValue - (StrideForMaxBECount - 1);

// Although End can be a MAX expression we estimate MaxEnd considering only		// Although End can be a MAX expression we estimate MaxEnd considering only
// the case End = RHS of the loop termination condition. This is safe because		// the case End = RHS of the loop termination condition. This is safe because
// in the other case (End - Start) is zero, leading to a zero maximum backedge		// in the other case (End - Start) is zero, leading to a zero maximum backedge
// taken count.		// taken count.
reames: Aside: None of the logic below appears to rely on no-wrap facts on the IV. Not sure if it's worth it, but should we reflect that in the code structure somehow?
APInt MaxEnd = IsSigned ? APIntOps::smin(getSignedRangeMax(End), Limit)		APInt MaxEnd = IsSigned ? APIntOps::smin(getSignedRangeMax(End), Limit)
: APIntOps::umin(getUnsignedRangeMax(End), Limit);		: APIntOps::umin(getUnsignedRangeMax(End), Limit);

MaxBECount = getUDivCeilSCEV(getConstant(MaxEnd - MinStart) /* Delta */,		MaxBECount = getUDivCeilSCEV(getConstant(MaxEnd - MinStart) /* Delta */,
getConstant(StrideForMaxBECount) /* Step */);		getConstant(StrideForMaxBECount) /* Step */);
reames: Code-structure-wise, I request that you pass in the BaseStart explicitly. This parsing of expressions is confusing.

return MaxBECount;		return MaxBECount;
}		}

ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,		ScalarEvolution::howManyLessThans(const SCEV *LHS, const SCEV *RHS,
const Loop *L, bool IsSigned,		const Loop *L, bool IsSigned,
bool ControlsExit, bool AllowPredicates) {		bool ControlsExit, bool AllowPredicates) {
SmallPtrSet<const SCEVPredicate *, 4> Predicates;		SmallPtrSet<const SCEVPredicate *, 4> Predicates;

const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);		const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);
bool PredicatedIV = false;		bool PredicatedIV = false;

if (!IV && AllowPredicates) {		if (!IV && AllowPredicates) {
reames: The last line of this comment confuses me. Why is this relevant? Otherwise, the reasoning in this case does appear to hold.
// Try to make this an AddRec using runtime tests, in the first X		// Try to make this an AddRec using runtime tests, in the first X
// iterations of this loop, where X is the SCEV expression found by the		// iterations of this loop, where X is the SCEV expression found by the
// algorithm below.		// algorithm below.
IV = convertSCEVToAddRecWithPredicates(LHS, L, Predicates);		IV = convertSCEVToAddRecWithPredicates(LHS, L, Predicates);
reames: According to my brute-force program, your reasoning does hold. I find the IsSigned usage very, very confusing. Please write out the implied predicates.
PredicatedIV = true;		PredicatedIV = true;
}		}

// Avoid weird loops		// Avoid weird loops
if (!IV || IV->getLoop() != L || !IV->isAffine())		if (!IV || IV->getLoop() != L || !IV->isAffine())
return getCouldNotCompute();		return getCouldNotCompute();

auto WrapType = IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW;		auto WrapType = IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW;
bool NoWrap = ControlsExit && IV->getNoWrapFlags(WrapType);		bool NoWrap = ControlsExit && IV->getNoWrapFlags(WrapType);
ICmpInst::Predicate Cond = IsSigned ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;		ICmpInst::Predicate Cond = IsSigned ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;

const SCEV *Stride = IV->getStepRecurrence(*this);		const SCEV *Stride = IV->getStepRecurrence(*this);

bool PositiveStride = isKnownPositive(Stride);		bool PositiveStride = isKnownPositive(Stride);

reames: This appears to hold, though I didn't follow the logic, to be honest. I resorted to writing a small program to brute-force the i8 space. :) Please state the !IsSigned precondition in the comment in End >=u Start form.
efriedma (author): It's basically just proving "end + (stride-1)" doesn't overflow. I'll try to clarify the comment.
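A brute-force check of the kind mentioned in this thread might look like the sketch below. It is an illustration only, not the program reames ran. It assumes the thread refers to the `Start == Stride` / `Start == Stride - 1` case of the patch, models all values as 8-bit integers held in a wider type, and uses hypothetical helper names (`sumFitsIn8Bits`, `startNearStrideNeverOverflows`) that do not exist in LLVM.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper (not part of LLVM): computes
// (End - Start) + (Stride - 1) in a wider type and reports whether the
// result still fits in 8 unsigned bits.
bool sumFitsIn8Bits(unsigned End, unsigned Start, unsigned Stride) {
  assert(Start <= End && Stride >= 1 && "preconditions of the case analysis");
  return (End - Start) + (Stride - 1) <= 255u;
}

// Exhaustively checks every 8-bit combination with a positive Stride,
// Start equal to Stride or Stride - 1, and Start <= End. With IsSigned,
// Stride >s 0 forces Start and End to be non-negative, so the values are
// limited to [0, 127]; otherwise they range over [0, 255].
bool startNearStrideNeverOverflows(bool IsSigned) {
  const unsigned MaxVal = IsSigned ? 127u : 255u;
  for (unsigned Stride = 1; Stride <= MaxVal; ++Stride)
    for (unsigned Start : {Stride, Stride - 1})
      for (unsigned End = Start; End <= MaxVal; ++End)
        if (!sumFitsIn8Bits(End, Start, Stride))
          return false;
  return true;
}
```

Calling `startNearStrideNeverOverflows(false)` and `startNearStrideNeverOverflows(true)` covers the `End >=u Start` and `End >=s Start` preconditions respectively; in both cases the sum reduces to `End - 1` or `End`, which is why no overflow is found.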
// Avoid negative or zero stride values.		// Avoid negative or zero stride values.
if (!PositiveStride) {		if (!PositiveStride) {
// We can compute the correct backedge taken count for loops with unknown		// We can compute the correct backedge taken count for loops with unknown
// strides if we can prove that the loop is not an infinite loop with side		// strides if we can prove that the loop is not an infinite loop with side
// effects. Here's the loop structure we are trying to handle -		// effects. Here's the loop structure we are trying to handle -
//		//
reames: Again, please state the precondition that IsSigned implies. Your code is correct, but the comment is very hard to follow.
// i = start		// i = start
// do {		// do {
// A[i] = i;		// A[i] = i;
// i += s;		// i += s;
// } while (i < end);		// } while (i < end);
//		//
// The backedge taken count for such loops is evaluated as -		// The backedge taken count for such loops is evaluated as -
// (max(end, start + stride) - start - 1) /u stride		// (max(end, start + stride) - start - 1) /u stride
//		//
// The additional preconditions that we need to check to prove correctness		// The additional preconditions that we need to check to prove correctness
// of the above formula is as follows -		// of the above formula is as follows -
//		//
// a) IV is either nuw or nsw depending upon signedness (indicated by the		// a) IV is either nuw or nsw depending upon signedness (indicated by the
// NoWrap flag).		// NoWrap flag).
// b) loop is single exit with no side effects.		// b) loop is single exit with no side effects.
//		//
//		//
// Precondition a) implies that if the stride is negative, this is a single		// Precondition a) implies that if the stride is negative, this is a single
// trip loop. The backedge taken count formula reduces to zero in this case.		// trip loop. The backedge taken count formula reduces to zero in this case.
//		//
// Precondition b) implies that the unknown stride cannot be zero otherwise		// Precondition b) implies that the unknown stride cannot be zero otherwise
// we have UB.		// we have UB.
//		//
reames: Style comment: I'd suggest pulling out a function getUDivCeiling(Delta, Stride, MayOverflow). I found (in my WIP patch) that this really helps readability. Once you have that, you can use early return via a tail call to the helper for most of the above.
efriedma (author): Hmm... Maybe I'll just make the computation of MayOverflow a lambda.
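To make the two formulations under discussion concrete, here is a minimal sketch on 8-bit integers. It is not LLVM code: `ceilDivNaive` and `ceilDivNoOverflow` are hypothetical names, and plain `uint8_t` arithmetic stands in for SCEV expressions. The second form mirrors the shape of the updated CHECK lines in this revision, `((X - (1 umin X)) /u S) + (1 umin X)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Naive form: floor((Delta + (Stride - 1)) / Stride). The addition is
// truncated to 8 bits here, so it can wrap and yield a wrong result.
uint8_t ceilDivNaive(uint8_t Delta, uint8_t Stride) {
  assert(Stride != 0 && "division by zero");
  uint8_t Sum = static_cast<uint8_t>(Delta + (Stride - 1));
  return static_cast<uint8_t>(Sum / Stride);
}

// Overflow-free form: umin(Delta, 1) + (Delta - umin(Delta, 1)) / Stride.
// For Delta == 0 this is 0; otherwise it is 1 + (Delta - 1) / Stride,
// and no intermediate value exceeds Delta.
uint8_t ceilDivNoOverflow(uint8_t Delta, uint8_t Stride) {
  assert(Stride != 0 && "division by zero");
  uint8_t NonZero = std::min<uint8_t>(Delta, 1);
  return static_cast<uint8_t>(NonZero + (Delta - NonZero) / Stride);
}
```

For example, with `Delta = 250` and `Stride = 10` the naive sum `250 + 9` wraps to `3`, so `ceilDivNaive` returns `0`, while `ceilDivNoOverflow` returns the expected `25`. The overflow-free form costs extra operations, which is why the patch keeps the shorter form whenever it can prove the addition does not wrap.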
// The positive stride case is the same as isKnownPositive(Stride) returning		// The positive stride case is the same as isKnownPositive(Stride) returning
// true (original behavior of the function).		// true (original behavior of the function).
//		//
// We want to make sure that the stride is truly unknown as there are edge		// We want to make sure that the stride is truly unknown as there are edge
// cases where ScalarEvolution propagates no wrap flags to the		// cases where ScalarEvolution propagates no wrap flags to the
// post-increment/decrement IV even though the increment/decrement operation		// post-increment/decrement IV even though the increment/decrement operation
// itself is wrapping. The computed backedge taken count may be wrong in		// itself is wrapping. The computed backedge taken count may be wrong in
// such cases. This is prevented by checking that the stride is not known to		// such cases. This is prevented by checking that the stride is not known to
// be either positive or non-positive. For example, no wrap flags are		// be either positive or non-positive. For example, no wrap flags are
reames: While true, this is the same set of facts which allowed you to avoid the max for the start expression. As such, you've already gotten most of the benefit.
// propagated to the post-increment IV of this loop with a trip count of 2 -		// propagated to the post-increment IV of this loop with a trip count of 2 -
//		//
// unsigned char i;		// unsigned char i;
// for(i=127; i<128; i+=129)		// for(i=127; i<128; i+=129)
// A[i] = i;		// A[i] = i;
//		//
if (PredicatedIV || !NoWrap || isKnownNonPositive(Stride) ||		if (PredicatedIV || !NoWrap || isKnownNonPositive(Stride) ||
!loopIsFiniteByAssumption(L))		!loopIsFiniteByAssumption(L))
[... 83 lines not shown; context: if (isa<SCEVCouldNotCompute>(Start)) ...]
return Start;		return Start;
}		}
if (RHS->getType()->isPointerTy()) {		if (RHS->getType()->isPointerTy()) {
RHS = getLosslessPtrToIntExpr(RHS);		RHS = getLosslessPtrToIntExpr(RHS);
if (isa<SCEVCouldNotCompute>(RHS))		if (isa<SCEVCouldNotCompute>(RHS))
return RHS;		return RHS;
}		}

const SCEV *End = RHS;
// When the RHS is not invariant, we do not know the end bound of the loop and		// When the RHS is not invariant, we do not know the end bound of the loop and
// cannot calculate the ExactBECount needed by ExitLimit. However, we can		// cannot calculate the ExactBECount needed by ExitLimit. However, we can
// calculate the MaxBECount, given the start, stride and max value for the end		// calculate the MaxBECount, given the start, stride and max value for the end
// bound of the loop (RHS), and the fact that IV does not overflow (which is		// bound of the loop (RHS), and the fact that IV does not overflow (which is
// checked above).		// checked above).
if (!isLoopInvariant(RHS, L)) {		if (!isLoopInvariant(RHS, L)) {
const SCEV *MaxBECount = computeMaxBECountForLT(		const SCEV *MaxBECount = computeMaxBECountForLT(
Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);		Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);
return ExitLimit(getCouldNotCompute() /* ExactNotTaken */, MaxBECount,		return ExitLimit(getCouldNotCompute() /* ExactNotTaken */, MaxBECount,
false /*MaxOrZero*/, Predicates);		false /*MaxOrZero*/, Predicates);
}		}
// If the backedge is taken at least once, then it will be taken
// (End-Start)/Stride times (rounded up to a multiple of Stride), where Start
// is the LHS value of the less-than comparison the first time it is evaluated
// and End is the RHS.
const SCEV *BECountIfBackedgeTaken =
computeBECount(getMinusSCEV(End, Start), Stride);

reames: This check needs to be removed. The problem is that this isn't doing what the comment says it is. The actual condition to prove the backedge is taken on the first iteration is OrigStart Cond OrigRHS. I have a test case for which this check causes us to generate wrong code, because Start-Stride < RHS holds but Start < RHS does not. (Actually, what happened to that test case? I thought I'd committed that, but I don't see it in my commit history. Need to find that.) When I tried removing this without also fixing the overflow issue, I got bad results. I don't fully remember why at the moment; I will try to reconstruct that and share it.
efriedma (author): Okay. We can deal with this separately, I think?
// We use the expression (max(End,Start)-Start)/Stride to describe the		// We use the expression (max(End,Start)-Start)/Stride to describe the
// backedge count, as if the backedge is taken at least once max(End,Start)		// backedge count, as if the backedge is taken at least once max(End,Start)
// is End and so the result is as above, and if not max(End,Start) is Start		// is End and so the result is as above, and if not max(End,Start) is Start
// so we get a backedge count of zero.		// so we get a backedge count of zero.
const SCEV *BECount = nullptr;		const SCEV *BECount = nullptr;
auto *StartMinusStride = getMinusSCEV(OrigStart, Stride);		auto *StartMinusStride = getMinusSCEV(OrigStart, Stride);
// Can we prove (max(RHS,Start) > Start - Stride?		// Can we prove (max(RHS,Start) > Start - Stride?
if (isLoopEntryGuardedByCond(L, Cond, StartMinusStride, Start) &&		if (isLoopEntryGuardedByCond(L, Cond, StartMinusStride, Start) &&
[... 18 lines not shown ...]
// Our preconditions trivially imply no overflow in that form.		// Our preconditions trivially imply no overflow in that form.
const SCEV *MinusOne = getMinusOne(Stride->getType());		const SCEV *MinusOne = getMinusOne(Stride->getType());
const SCEV *Numerator =		const SCEV *Numerator =
getMinusSCEV(getAddExpr(RHS, MinusOne), StartMinusStride);		getMinusSCEV(getAddExpr(RHS, MinusOne), StartMinusStride);
if (!isa<SCEVCouldNotCompute>(Numerator)) {		if (!isa<SCEVCouldNotCompute>(Numerator)) {
BECount = getUDivExpr(Numerator, Stride);		BECount = getUDivExpr(Numerator, Stride);
}		}
}		}

		const SCEV *BECountIfBackedgeTaken = nullptr;
if (!BECount) {		if (!BECount) {
auto canProveRHSGreaterThanEqualStart = [&]() {		auto canProveRHSGreaterThanEqualStart = [&]() {
auto CondGE = IsSigned ? ICmpInst::ICMP_SGE : ICmpInst::ICMP_UGE;		auto CondGE = IsSigned ? ICmpInst::ICMP_SGE : ICmpInst::ICMP_UGE;
if (isLoopEntryGuardedByCond(L, CondGE, OrigRHS, OrigStart))		if (isLoopEntryGuardedByCond(L, CondGE, OrigRHS, OrigStart))
return true;		return true;

// (RHS > Start - 1) implies RHS >= Start.		// (RHS > Start - 1) implies RHS >= Start.
// * "RHS >= Start" is trivially equivalent to "RHS > Start - 1" if		// * "RHS >= Start" is trivially equivalent to "RHS > Start - 1" if
// "Start - 1" doesn't overflow.		// "Start - 1" doesn't overflow.
// * For signed comparison, if Start - 1 does overflow, it's equal		// * For signed comparison, if Start - 1 does overflow, it's equal
// to INT_MAX, and "RHS >s INT_MAX" is trivially false.		// to INT_MAX, and "RHS >s INT_MAX" is trivially false.
// * For unsigned comparison, if Start - 1 does overflow, it's equal		// * For unsigned comparison, if Start - 1 does overflow, it's equal
// to UINT_MAX, and "RHS >u UINT_MAX" is trivially false.		// to UINT_MAX, and "RHS >u UINT_MAX" is trivially false.
//		//
// FIXME: Should isLoopEntryGuardedByCond do this for us?		// FIXME: Should isLoopEntryGuardedByCond do this for us?
auto CondGT = IsSigned ? ICmpInst::ICMP_SGT : ICmpInst::ICMP_UGT;		auto CondGT = IsSigned ? ICmpInst::ICMP_SGT : ICmpInst::ICMP_UGT;
auto *StartMinusOne = getAddExpr(OrigStart,		auto *StartMinusOne = getAddExpr(OrigStart,
getMinusOne(OrigStart->getType()));		getMinusOne(OrigStart->getType()));
return isLoopEntryGuardedByCond(L, CondGT, OrigRHS, StartMinusOne);		return isLoopEntryGuardedByCond(L, CondGT, OrigRHS, StartMinusOne);
};		};

// If we know that RHS >= Start in the context of loop, then we know that		// If we know that RHS >= Start in the context of loop, then we know that
// max(RHS, Start) = RHS at this point.		// max(RHS, Start) = RHS at this point.
if (canProveRHSGreaterThanEqualStart())		const SCEV *End;
		if (canProveRHSGreaterThanEqualStart()) {
End = RHS;		End = RHS;
else		} else {
		// If RHS < Start, the backedge will be taken zero times. So in
		// general, we can write the backedge-taken count as:
		//
		// RHS >= Start ? ceil(RHS - Start) / Stride : 0
		//
		// We convert it to the following to make it more convenient for SCEV:
		//
		// ceil(max(RHS, Start) - Start) / Stride
End = IsSigned ? getSMaxExpr(RHS, Start) : getUMaxExpr(RHS, Start);		End = IsSigned ? getSMaxExpr(RHS, Start) : getUMaxExpr(RHS, Start);
BECount = computeBECount(getMinusSCEV(End, Start), Stride);
		// See what would happen if we assume the backedge is taken. This is
		// used to compute MaxBECount.
		BECountIfBackedgeTaken = getUDivCeilSCEV(getMinusSCEV(RHS, Start), Stride);
		}

		// At this point, we know:
		//
		// 1. If IsSigned, Start <=s End; otherwise, Start <=u End
		// 2. The index variable doesn't overflow.
		//
		// Therefore, we know N exists such that
		// (Start + Stride * N) >= End, and computing "(Start + Stride * N)"
		// doesn't overflow.
		//
		// Using this information, try to prove whether the addition in
		// "(Start - End) + (Stride - 1)" has unsigned overflow.
		const SCEV *One = getOne(Stride->getType());
		bool MayAddOverflow = [&] {
		if (auto *StrideC = dyn_cast<SCEVConstant>(Stride)) {
		if (StrideC->getAPInt().isPowerOf2()) {
		// Suppose Stride is a power of two, and Start/End are unsigned
		// integers. Let UMAX be the largest representable unsigned
reames: I split off this part of the change into 087310c71e with a bit of restructuring.
		// integer.
		//
		// By the preconditions of this function, we know
		// "(Start + Stride * N) >= End", and this doesn't overflow.
		// As a formula:
		//
		// End <= (Start + Stride * N) <= UMAX
		//
		// Subtracting Start from all the terms:
		//
		// End - Start <= Stride * N <= UMAX - Start
		//
		// Since Start is unsigned, UMAX - Start <= UMAX. Therefore:
		//
		// End - Start <= Stride * N <= UMAX
		//
		// Stride * N is a multiple of Stride. Therefore,
		//
		// End - Start <= Stride * N <= UMAX - (UMAX mod Stride)
		//
		// Since Stride is a power of two, UMAX + 1 is divisible by Stride.
		// Therefore, UMAX mod Stride == Stride - 1. So we can write:
		//
		// End - Start <= Stride * N <= UMAX - Stride - 1
		//
		// Dropping the middle term:
		//
		// End - Start <= UMAX - Stride - 1
		//
		// Adding Stride - 1 to both sides:
		//
		// (End - Start) + (Stride - 1) <= UMAX
		//
		// In other words, the addition doesn't have unsigned overflow.
		//
		// A similar proof works if we treat Start/End as signed values.
		// Just rewrite steps before "End - Start <= Stride * N <= UMAX" to
		// use signed max instead of unsigned max. Note that we're trying
		// to prove a lack of unsigned overflow in either case.
		return false;
		}
		}
		if (Start == Stride || Start == getMinusSCEV(Stride, One)) {
		// If Start is equal to Stride, (End - Start) + (Stride - 1) == End - 1.
		// If !IsSigned, 0 <u Stride == Start <=u End; so 0 <u End - 1 <u End.
		// If IsSigned, 0 <s Stride == Start <=s End; so 0 <s End - 1 <s End.
		//
		// If Start is equal to Stride - 1, (End - Start) + Stride - 1 == End.
		return false;
		}
		return true;
		}();

		const SCEV *Delta = getMinusSCEV(End, Start);
		if (!MayAddOverflow) {
		// floor((D + (S - 1)) / S)
		// We prefer this formulation if it's legal because it's fewer operations.
		BECount =
		getUDivExpr(getAddExpr(Delta, getMinusSCEV(Stride, One)), Stride);
		} else {
		BECount = getUDivCeilSCEV(Delta, Stride);
		}
}		}

const SCEV *MaxBECount;		const SCEV *MaxBECount;
bool MaxOrZero = false;		bool MaxOrZero = false;
if (isa<SCEVConstant>(BECount))		if (isa<SCEVConstant>(BECount)) {
MaxBECount = BECount;		MaxBECount = BECount;
else if (isa<SCEVConstant>(BECountIfBackedgeTaken)) {		} else if (BECountIfBackedgeTaken &&
		isa<SCEVConstant>(BECountIfBackedgeTaken)) {
// If we know exactly how many times the backedge will be taken if it's		// If we know exactly how many times the backedge will be taken if it's
// taken at least once, then the backedge count will either be that or		// taken at least once, then the backedge count will either be that or
// zero.		// zero.
MaxBECount = BECountIfBackedgeTaken;		MaxBECount = BECountIfBackedgeTaken;
MaxOrZero = true;		MaxOrZero = true;
} else {		} else {
MaxBECount = computeMaxBECountForLT(		MaxBECount = computeMaxBECountForLT(
Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);		Start, Stride, RHS, getTypeSizeInBits(LHS->getType()), IsSigned);
[... 62 lines not shown; context: if (isa<SCEVCouldNotCompute>(Start)) ...]
return Start;		return Start;
}		}
if (End->getType()->isPointerTy()) {		if (End->getType()->isPointerTy()) {
End = getLosslessPtrToIntExpr(End);		End = getLosslessPtrToIntExpr(End);
if (isa<SCEVCouldNotCompute>(End))		if (isa<SCEVCouldNotCompute>(End))
return End;		return End;
}		}

const SCEV *BECount = computeBECount(getMinusSCEV(Start, End), Stride);		// Compute ((Start - End) + (Stride - 1)) / Stride.
		// FIXME: This can overflow. Holding off on fixing this for now;
		// howManyGreaterThans will hopefully be gone soon.
		const SCEV *One = getOne(Stride->getType());
		const SCEV *BECount = getUDivExpr(
		getAddExpr(getMinusSCEV(Start, End), getMinusSCEV(Stride, One)), Stride);

APInt MaxStart = IsSigned ? getSignedRangeMax(Start)		APInt MaxStart = IsSigned ? getSignedRangeMax(Start)
: getUnsignedRangeMax(Start);		: getUnsignedRangeMax(Start);

APInt MinStride = IsSigned ? getSignedRangeMin(Stride)		APInt MinStride = IsSigned ? getSignedRangeMin(Stride)
: getUnsignedRangeMin(Stride);		: getUnsignedRangeMin(Stride);

unsigned BitWidth = getTypeSizeInBits(LHS->getType());		unsigned BitWidth = getTypeSizeInBits(LHS->getType());
[... 2,076 lines not shown ...]
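The power-of-two argument in the `MayAddOverflow` comment above can be checked exhaustively on a small bit width. The sketch below is an illustration under stated assumptions, not part of the patch: it models values as 8-bit unsigned integers held in `unsigned`, the helper names are hypothetical, and it reads the bound written as `UMAX - Stride - 1` in the comment as `UMAX - (Stride - 1)`, which is what the step "UMAX mod Stride == Stride - 1" yields.

```cpp
#include <cassert>

// Hypothetical helper: true if some N >= 0 makes Start + Stride * N reach
// End while staying within the 8-bit unsigned range (no wrap).
bool reachesEndWithoutWrap(unsigned Start, unsigned End, unsigned Stride) {
  assert(Stride != 0 && Start <= End && End <= 255u);
  for (unsigned V = Start; V <= 255u; V += Stride)
    if (V >= End)
      return true;
  return false;
}

// True if (End - Start) + (Stride - 1) would exceed the 8-bit UMAX.
bool addOverflows(unsigned Start, unsigned End, unsigned Stride) {
  return (End - Start) + (Stride - 1) > 255u;
}

// Exhaustive check for one Stride: whenever the preconditions hold
// (Start <=u End and the IV reaches End without wrapping), the addition
// must not overflow. Returns false if a counterexample exists.
bool claimHoldsForStride(unsigned Stride) {
  for (unsigned Start = 0; Start <= 255u; ++Start)
    for (unsigned End = Start; End <= 255u; ++End)
      if (reachesEndWithoutWrap(Start, End, Stride) &&
          addOverflows(Start, End, Stride))
        return false;
  return true;
}
```

Under these assumptions the claim holds for every power-of-two stride from 1 to 128, but not for a stride such as 3: with `Start = 0` and `End = 255` the IV reaches 255 exactly, yet `255 + 2` exceeds UMAX. This is why the patch falls back to the overflow-free sequence when it cannot prove the addition is safe.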

llvm/test/Analysis/ScalarEvolution/2008-11-18-Stride2.ll

	; RUN: opt < %s -analyze -enable-new-pm=0 -scalar-evolution 2>&1 | FileCheck %s			; RUN: opt < %s -analyze -enable-new-pm=0 -scalar-evolution 2>&1 | FileCheck %s
	; RUN: opt < %s -disable-output "-passes=print<scalar-evolution>" 2>&1 2>&1 | FileCheck %s			; RUN: opt < %s -disable-output "-passes=print<scalar-evolution>" 2>&1 2>&1 | FileCheck %s

	; CHECK: Loop %bb: backedge-taken count is ((-1 + (-1 * %x) + (1000 umax (3 + %x))) /u 3)			; CHECK: Loop %bb: backedge-taken count is (((-3 + (-1 * (1 umin (-3 + (-1 * %x) + (1000 umax (3 + %x)))))<nuw><nsw> + (-1 * %x) + (1000 umax (3 + %x))) /u 3) + (1 umin (-3 + (-1 * %x) + (1000 umax (3 + %x)))))
	; CHECK: Loop %bb: max backedge-taken count is 334			; CHECK: Loop %bb: max backedge-taken count is 334


	; This is a tricky testcase for unsigned wrap detection which ScalarEvolution			; This is a tricky testcase for unsigned wrap detection which ScalarEvolution
	; doesn't yet know how to do.			; doesn't yet know how to do.

	define i32 @f(i32 %x) nounwind readnone {			define i32 @f(i32 %x) nounwind readnone {
	entry:			entry:
[... 26 lines not shown ...]

llvm/test/Analysis/ScalarEvolution/trip-count-unknown-stride.ll

[... 29 lines not shown ...]

for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
ret void		ret void
}		}


; Check that we are able to compute trip count of a loop without an entry guard.		; Check that we are able to compute trip count of a loop without an entry guard.
; CHECK: Determining loop execution counts for: @foo2		; CHECK: Determining loop execution counts for: @foo2
; CHECK: backedge-taken count is ((-1 + (-1 * %s) + (1 umax %s) + (%n smax %s)) /u (1 umax %s))		; CHECK: backedge-taken count is ((((-1 * (1 umin ((-1 * %s) + (%n smax %s))))<nuw><nsw> + (-1 * %s) + (%n smax %s)) /u (1 umax %s)) + (1 umin ((-1 * %s) + (%n smax %s))))

; We should have a conservative estimate for the max backedge taken count for		; We should have a conservative estimate for the max backedge taken count for
; loops with unknown stride.		; loops with unknown stride.
; CHECK: max backedge-taken count is -1		; CHECK: max backedge-taken count is -1

define void @foo2(i32* nocapture %A, i32 %n, i32 %s) mustprogress {		define void @foo2(i32* nocapture %A, i32 %n, i32 %s) mustprogress {
entry:		entry:
br label %for.body		br label %for.body
[... 33 lines not shown; context: for.body: ; preds = %entry, %for.body ...]
br i1 %cmp, label %for.body, label %for.end		br i1 %cmp, label %for.body, label %for.end

for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
ret void		ret void
}		}

; Same as foo2, but with mustprogress on loop, not function		; Same as foo2, but with mustprogress on loop, not function
; CHECK: Determining loop execution counts for: @foo4		; CHECK: Determining loop execution counts for: @foo4
; CHECK: backedge-taken count is ((-1 + (-1 * %s) + (1 umax %s) + (%n smax %s)) /u (1 umax %s))		; CHECK: backedge-taken count is ((((-1 * (1 umin ((-1 * %s) + (%n smax %s))))<nuw><nsw> + (-1 * %s) + (%n smax %s)) /u (1 umax %s)) + (1 umin ((-1 * %s) + (%n smax %s))))
; CHECK: max backedge-taken count is -1		; CHECK: max backedge-taken count is -1

define void @foo4(i32* nocapture %A, i32 %n, i32 %s) {		define void @foo4(i32* nocapture %A, i32 %n, i32 %s) {
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%i.05 = phi i32 [ %add, %for.body ], [ 0, %entry ]		%i.05 = phi i32 [ %add, %for.body ], [ 0, %entry ]
%arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.05		%arrayidx = getelementptr inbounds i32, i32* %A, i32 %i.05
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
%inc = add nsw i32 %0, 1		%inc = add nsw i32 %0, 1
store i32 %inc, i32* %arrayidx, align 4		store i32 %inc, i32* %arrayidx, align 4
%add = add nsw i32 %i.05, %s		%add = add nsw i32 %i.05, %s
%cmp = icmp slt i32 %add, %n		%cmp = icmp slt i32 %add, %n
br i1 %cmp, label %for.body, label %for.end, !llvm.loop !8		br i1 %cmp, label %for.body, label %for.end, !llvm.loop !8

for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
ret void		ret void
}		}

; A more complex case with pre-increment compare instead of post-increment.		; A more complex case with pre-increment compare instead of post-increment.
; CHECK-LABEL: Determining loop execution counts for: @foo5		; CHECK-LABEL: Determining loop execution counts for: @foo5
; CHECK: Loop %for.body: backedge-taken count is ((-1 + (-1 * %start) + (1 umax %s) + (%n smax %start)) /u (1 umax %s))		; CHECK: Loop %for.body: backedge-taken count is ((((-1 * (1 umin ((-1 * %start) + (%n smax %start))))<nuw><nsw> + (-1 * %start) + (%n smax %start)) /u (1 umax %s)) + (1 umin ((-1 * %start) + (%n smax %start))))

; We should have a conservative estimate for the max backedge taken count for		; We should have a conservative estimate for the max backedge taken count for
; loops with unknown stride.		; loops with unknown stride.
; CHECK: max backedge-taken count is -1		; CHECK: max backedge-taken count is -1

define void @foo5(i32* nocapture %A, i32 %n, i32 %s, i32 %start) mustprogress {		define void @foo5(i32* nocapture %A, i32 %n, i32 %s, i32 %start) mustprogress {
entry:		entry:
br label %for.body		br label %for.body
[... 67 lines not shown ...]
