This is an archive of the discontinued LLVM Phabricator instance.

Differential D17201

[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Closed, Public

Authored by sbaranga on Feb 12 2016, 7:22 AM.

Details

Reviewers

anemet
mzolotukhin
sanjoy
hfinkel

Commits

rG6f444dfd5517: Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and…
rG72b4a4a33083: [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
rL265786: Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA…
rL265535: [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV

Summary

When the backedge taken condition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use this new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
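
The situation in the summary can be pictured with a loop whose counter is narrower than its bound. The following is a minimal sketch in plain C++, not LLVM code: `PredicatedCount`, `predictedBodyCount` and `simulateBodyCount` are hypothetical names invented for this illustration, and the 8-bit counter is an assumption chosen to keep the numbers small. It pairs a count with the run-time condition under which that count is valid, and checks the pair against a direct simulation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a "predicated" result: a count plus the
// run-time condition under which that count is valid.
struct PredicatedCount {
  uint32_t Count;      // times the loop body runs, if PredicateHolds
  bool PredicateHolds; // run-time check: the narrow counter never wraps
};

// Models: for (uint8_t i = 0; zext(i) < N; ++i) body();
PredicatedCount predictedBodyCount(uint32_t N) {
  // zext(i) follows the recurrence {0,+,1} only while i has not wrapped,
  // so the count N can be trusted only if the exit is reached first.
  return {N, N <= UINT8_MAX};
}

// Result of actually running the loop, giving up after Limit iterations.
struct SimulationResult {
  bool Terminated;
  uint32_t Count;
};

SimulationResult simulateBodyCount(uint32_t N, uint32_t Limit) {
  uint32_t Executed = 0;
  for (uint8_t I = 0; static_cast<uint32_t>(I) < N; ++I) {
    if (Executed == Limit)
      return {false, Executed}; // treated as "does not terminate"
    ++Executed;
  }
  return {true, Executed};
}
```

When the predicate holds, the predicted and simulated counts agree. When it fails (for example `N = 300`), the counter wraps and the loop never exits, so a client would have to fall back to the original loop.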

Event Timeline

sbaranga updated this revision to Diff 47800. Feb 12 2016, 7:22 AM

sbaranga retitled this revision from to [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV.

sbaranga updated this object.

sbaranga added reviewers: sanjoy, anemet, mzolotukhin, hfinkel.

sbaranga added a subscriber: llvm-commits.

Herald added subscribers: mzolotukhin, sanjoy. Feb 12 2016, 7:22 AM

mssimpso added a subscriber: mssimpso. Feb 12 2016, 7:54 AM

Given that we're now computing predicated trip counts (the bread and butter of SCEV), I think this is a good time to start doing some whitebox testing. What I mean by that is to add a way to get SCEV to produce a predicated trip count and dump both the predicate and the trip count, and do llvm-lit/FileCheck type testing on the output. Unfortunately, this will be more work for you, but I think this is the right thing to do for the system. We don't want to end up in a situation where adding a test for a predicated trip count related bug involves tricking the loop vectorizer to vectorize a certain loop.

In fact, I think there is already some scope for whitebox testing things like PredicatedScalarEvolution::getAsAddRec, maybe that's a good place to start?

Again, I know this is more work for you; but having the infrastructure to easily add tests is important.

include/llvm/Analysis/ScalarEvolution.h
723	"count of the" is repeated
898–900	Describe `UseAssumptions` here.
945	I don't think `Force` is a good name -- how about `CreateAssumptions` or `CreatePredicates`? Same comment below.
1343	Add a comment for this (fine to extend the one on `getBackedgeTakenCount`)
lib/Analysis/ScalarEvolution.cpp
5493	I don't think you need the parens around `ENT->Pred`.
5949	I don't think this is the right layering -- for instance, this forces SCEV clients that don't want speculative / predicated trip counts to pay the cost of computing them. I'd say SCEV users that care about predicated trip counts should do this retry themselves, i.e. something like `const SCEV *TC = getBackedgeTakenCount(L); if (TC is SCEVCouldNotCompute) { SE->forgetLoop(L); TC = getPredicatedBackedgeTakenCount(L); }`
6984	I don't think you need to write `(!A) && B`; you can write `!A && B` instead.

This revision now requires changes to proceed. Feb 19 2016, 6:04 PM

In D17201#357709, @sanjoy wrote:

Given that we're now computing predicated trip counts (the bread and butter of SCEV), I think this is a good time to start doing some whitebox testing. What I mean by that is to add a way to get SCEV to produce a predicated trip count and dump both the predicate and the trip count, and do llvm-lit/FileCheck type testing on the output. Unfortunately, this will be more work for you, but I think this is the right thing to do for the system. We don't want to end up in a situation where adding a test for a predicated trip count related bug involves tricking the loop vectorizer to vectorize a certain loop.

In fact, I think there is already some scope for whitebox testing things like PredicatedScalarEvolution::getAsAddRec, maybe that's a good place to start?

Again, I know this is more work for you; but having the infrastructure to easily add tests is important.

Thanks, Sanjoy! I agree, we should do as much testing as possible for this. I'll add the tests.

lib/Analysis/ScalarEvolution.cpp
5949	Good point. I'll look into how we could compute this more lazily. Why do you think we would need a SE->forgetLoop() here? FWIW I'm generally trying to avoid having to invalidate the analysis for a given loop.

sanjoy added inline comments. Feb 22 2016, 8:01 AM

lib/Analysis/ScalarEvolution.cpp
5949	"Why do you think we would need a SE->forgetLoop() here?" To forget the cached CouldNotCompute trip count. But you're right -- we're probably better off not invalidating the whole analysis -- we can just have the second call to `getPredicatedBackedgeTakenCount` overwrite CouldNotCompute.

sbaranga added inline comments. Feb 22 2016, 9:20 AM

lib/Analysis/ScalarEvolution.cpp
5949	Thanks! We'll take this approach then.

This update addresses the comments from Sanjoy from the previous
review round.

Renamed GuardedBackedgeTakenCount -> PredicatedBackedgeTakenCount, for
consistency with existing code.

Modified ScalarEvolution's print method to also print the predicated
backedge taken count. Used this feature to add regression tests that
directly check SCEV's output.

Cloned the backedge taken info map to store information for the predicated
backedge taken count, and used this to lazily compute the predicated
backedge taken count. This will allow users of SCEV that don't need
the predicated backedge taken count to not pay the cost of computing it.
This approach is also much less bug-prone than the previous one, as
the new information (the predicated backedge taken count) is in another
data structure that must be explicitly accessed.
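
The separate-cache idea can be sketched as follows. This is an illustrative toy, not the code in the patch: `BackedgeCountCache`, `CountInfo` and the rule inside `analyze` (odd loop ids need a predicate) are assumptions made up for the example. The point is only that the predicated map is filled in on demand, so callers of the plain query never trigger a predicated computation.

```cpp
#include <cassert>
#include <map>

struct CountInfo {
  bool Known;        // false models a CouldNotCompute result
  bool HasPredicate; // true if the count is valid only under a run-time check
};

class BackedgeCountCache {
public:
  // Plain query: never creates predicates.
  const CountInfo &get(int LoopId) { return lookup(Plain, LoopId, false); }

  // Predicated query: computed lazily and stored in its own map.
  const CountInfo &getPredicated(int LoopId) {
    return lookup(Predicated, LoopId, true);
  }

  unsigned predicatedComputations() const { return NumPredicated; }

private:
  // Toy analysis: odd loop ids are analyzable only with a predicate.
  static CountInfo analyze(int LoopId, bool AllowPredicates) {
    bool NeedsPredicate = LoopId % 2 != 0;
    if (!NeedsPredicate)
      return CountInfo{true, false};
    return AllowPredicates ? CountInfo{true, true} : CountInfo{false, false};
  }

  const CountInfo &lookup(std::map<int, CountInfo> &Cache, int LoopId,
                          bool AllowPredicates) {
    auto It = Cache.find(LoopId);
    if (It == Cache.end()) {
      if (AllowPredicates)
        ++NumPredicated;
      It = Cache.emplace(LoopId, analyze(LoopId, AllowPredicates)).first;
    }
    return It->second;
  }

  std::map<int, CountInfo> Plain;
  std::map<int, CountInfo> Predicated;
  unsigned NumPredicated = 0;
};
```

A client that wants the retry behaviour discussed earlier would call `get` first and fall back to `getPredicated` only when the plain result is not known.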

Hi Sanjoy,

I've added the test/Analysis/ScalarEvolution/predicated-trip-count.ll test which goes through some use cases of this feature for whitebox testing.
Was this what you had in mind, or do you think we should also be doing something else?

Thanks,
Silviu

test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll
54	I had to change this test because it started to figure out what the backedge taken count was. This was testing the vectorization remark when we cannot find the backedge taken count. I changed the test so that it continues to be unable to get the backedge taken count. What is interesting about this is the way we've managed to get the (exact) backedge taken count. The initial analysis was only able to get the maximum backedge taken count but not the exact one. However, we can use this to get a better SCEV for cmp3. On a following invocation of computeBackedgeTakenCount this information is used to get an exact backedge taken count.

Hi Silviu,

In D17201#371737, @sbaranga wrote:

Was this what you had in mind, or do you think we should also be doing something else?

Yes, this is what I had in mind, so thanks! I'm still not done
reviewing the change, but I've added some minor comments inline. I'll
try to finish the review by Monday (no need to address the inline
comments before then).

Btw, it would be great to have (in a later change) similar llvm-lit
tests for PredicatedScalarEvolution::getAsAddRec as well; but I'll
leave it to you to make a call on that.

include/llvm/Analysis/ScalarEvolution.h
873–876	This is minor, but the `IsGuarded` name is somewhat misleading, since the loop may not already be guarded. Why not call this `AddPredicates` or `AllowPredicates`?
lib/Analysis/ScalarEvolution.cpp
5272	Use `auto` here.
5278	Add a `/* IsGuarded = */` before the `true`.
5826	Please add a comment as `/* IsGuarded = */ true`.
6986	No need to fix this in this change, but a nicer API could be to have `convertSCEVToAddRecWithPredicates` return an add recurrence or null if it failed (and get rid of the `dyn_cast`).
8600	I think you can just fall through the logic below (under `// Avoid weird loops`). Also, the comment on the api change to `convertSCEVToAddRecWithPredicates` also applies here.

sanjoy added inline comments. Mar 10 2016, 6:27 PM

test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll
54	Interesting. Do you mean "However, we can use this to get a better SCEV for `%0`"? I can see how `SimplifyICmpOperands` would be able to use a tighter range on `%0` to simplify the `sle` into an `slt`. It would be great if SCEV could directly compute the backedge count here though. The problem is that we didn't have a way for `computeExitLimitFromCond` to say "BECount is Infinite if `L` is `INT_MAX` else is `L + 1` (say)" so `computeExitLimitFromCond` would have to give up in the face of the dreaded `COULDNOTCOMPUTE`. But with your work, that is changing (ability to represent predicated BE counts); and perhaps one day SCEV will be able to directly compute the backedge taken counts of loops like these. :)

sanjoy requested changes to this revision. Mar 11 2016, 4:56 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

include/llvm/Analysis/ScalarEvolution.h
565	Now that you're embedding `SCEVUnionPredicate` in every `ExitNotTakenInfo`, I'm somewhat worried about the `SmallVector<const SCEVPredicate *, 16> Preds;` :) Firstly, what is the typical number of predicates we see? 16 seems higher than what I would expect. Secondly, a better (in terms of memory consumption) solution would be to have a `std::unique_ptr<SCEVUnionPredicate>` here, where a null value means "always true" so that in the non-predicated case we only use up one word. The most compact solution (I think we should go with this) is to: (1) Extract out a `struct ExitNotTakenExtras` that contains a `SmallVector` of `ExitNotTaken` s and a `SCEVUnionPredicate`. (2) Change `PointerIntPair<ExitNotTakenInfo *, 1> NextExit` to `PointerIntPair<ExitNotTakenExtras *, 1> NextExit` where it is usually null, but if it is non-null you either have multiple exits or have some SCEV predicates. Normally I wouldn't worry so much about space, but `ExitNotTakenInfo` has clearly been optimized for space usage, so we should try not to break that.
702	Add a comment here that `Max` is valid only if the predicates on each of the `ExitNotTakenInfo` instances is true.
708–709	I think the right API here is to promote the `std::pair` into a full struct that carries the block, backedge count and a predicate; and not have a separate vector for `ExitPreds`. You could also use `std::tuple` here (since the types are all different, there isn't much scope for confusion).
725	Can you please make this comment a little more clear? Is it that the caller needs to know in advance, by some other means, that the backedge taken count is predicated and pass in a non-null `Predicates`? If so, what is that other means? The `always correct (independent of any assumptions that should be checked at run-time)` is misleading too -- you do sometimes return exact BE counts that are dependent on a run-time assumption.
759	Here and elsewhere, can you please add a newline after field declarations (i.e. after the declaration for `BackedgeTakenCounts`)?
871	I'd remove the `(the exact backedge taken count will be known)` bit -- it can otherwise mislead people into thinking that this will return a precise BE count, no matter what.
887–888	Nit: the parameter name is `IsGuarded` and not `Guarded`. But, as I said earlier, it is probably better to rename this to `AddPredicates`, `AllowPredicates` or `CreatePredicates` as you have below.
1348	Nit: indentation.
lib/Analysis/ScalarEvolution.cpp
5364	Usually I see lambdas formatted with no space between the `]` and the `(`. I'd say just run the diff through `clang-format` before checkin, and whatever it does is fine. :)
5372	What you have here is fine, but I'd have tried: `for (auto &Map : {BackedgeTakenCounts, PredicatedBackedgeTakenCounts}) { // Remove L from Map }` (not 100% sure if the above will work).
5585–5588	Add an assert here that if `Guarded` is false, then the predicate in `EL` is trivially true (i.e. we didn't add a predicate when we were not supposed to).
5598	This does not look right to me -- won't the `EL` be freed after each iteration, leaving a dangling pointer in `ExitCountPreds`?
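
On the loop suggested in the comment for line 5372: in standard C++ a braced list of the maps themselves is deduced as a `std::initializer_list` whose elements are const copies, so erasing through it would not touch the originals. A sketch that keeps the same shape but lists the maps' addresses instead is shown below. `CountMap` and the free function `forgetLoop` are hypothetical stand-ins for this illustration, using `std::map<int, int>` in place of the real SCEV maps.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>

// Hypothetical stand-in for the two backedge-taken-count caches.
using CountMap = std::map<int, int>;

void forgetLoop(CountMap &BackedgeTakenCounts,
                CountMap &PredicatedBackedgeTakenCounts, int LoopId) {
  // Iterate over pointers so that each erase acts on the original map
  // rather than on a copy stored in the initializer list.
  for (CountMap *Map : {&BackedgeTakenCounts, &PredicatedBackedgeTakenCounts})
    Map->erase(LoopId);
}
```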

This revision now requires changes to proceed. Mar 11 2016, 4:56 PM

Hi Sanjoy,

Thanks for the comments! Some replies inline.

Regarding testing getAsAddRec, we'll most likely have to do it in a pass like LoopAccessAnalysis, since the call takes a Loop parameter. We already do something similar in test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll, although we just check the resulting predicates there (it would be nice to also get the initial and final expressions).

Thanks,
Silviu

include/llvm/Analysis/ScalarEvolution.h
565	Ok, the most common size of the SCEVUnionPredicate there should be 0, so going with your solution makes sense to me. On the other hand, it does seem a bit strange to optimize this for memory consumption. I wouldn't expect it to make that much of a difference.
lib/Analysis/ScalarEvolution.cpp
5598	Of course, thanks for catching this!
test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll
54	Yes, that is basically what is happening (we can now use the range information on %0). You're correct, with these changes we will have a way of doing the sle/ule to slt/ult conversions even without the appropriate range information. In fact that would be something really nice to have (I've even seen people hitting this issue on llvm-dev, so maybe this is not just a corner case).

sanjoy added a subscriber: atrick. Mar 14 2016, 11:21 AM

sanjoy added inline comments.

include/llvm/Analysis/ScalarEvolution.h
565	My guess is that there are enough `ExitNotTakenInfo` instances floating around in a typical compilation that its size becomes important. But maybe @atrick has a more cogent answer?

atrick added inline comments. Mar 14 2016, 11:40 AM

include/llvm/Analysis/ScalarEvolution.h
565	I don't have a better answer but agree that SCEV instances need to be size conscious, just as other IR data types do. I would avoid allocating and initializing 16 words that are unlikely to be used.
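
The size concern can be made concrete with a rough sketch. The structs below are hypothetical simplifications and not the real `ExitNotTakenInfo`: `const void *` stands in for the block and SCEV pointers, and a plain 16-element array stands in for the inline storage of the `SmallVector`. The comparison shows why keeping rarely used predicate storage behind a single, usually null pointer keeps the common case small.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for per-exit data that is rarely needed.
struct ExitExtras {
  const void *Preds[16]; // room for 16 predicates, as in the comment above
  unsigned NumPreds;
};

// Layout A: the predicate storage is embedded in every entry.
struct InlinePredsExitInfo {
  const void *ExitingBlock;
  const void *ExactNotTaken;
  ExitExtras Extras;
};

// Layout B: one pointer that is null in the common, unpredicated case.
struct CompactExitInfo {
  const void *ExitingBlock;
  const void *ExactNotTaken;
  ExitExtras *Extras; // null means "no predicates", i.e. always true

  bool hasAlwaysTruePredicate() const {
    return Extras == nullptr || Extras->NumPreds == 0;
  }
};

static_assert(sizeof(CompactExitInfo) < sizeof(InlinePredsExitInfo),
              "the out-of-line layout should be the smaller one");
```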

Reworked the ExitNotTakenInfo allocation scheme to avoid allocating
un-needed SCEVUnionPredicates.

The new allocation scheme follows Sanjoy's idea:
we have a new optional structure that can hold optional (almost always not
needed) data. The optional data contains a SCEVPredicate and a vector
of ExitNotTakenInfo structs. The first element has the optional info if it
has a SCEVPredicate or there is more than one loop exit. The other
ExitNotTakenInfo structs will contain the extra info only if they have
a SCEVPredicate.

Note that we could potentially modify this scheme such that the first
loop exit has a union of all the SCEV predicates (which doesn't have
the per-exit information, but uses less memory - in the cases where
we do need SCEV predicates).

Since the new structure is not trivial to traverse, we also have a new
iterator for this (which keeps the traversals simple to write).

Also followed up on the rest of the review comments from the last round.

Minor cleanup: use range-based for loops when traversing the ExitNotTakenInfo structs, since we now have iterators.

sbaranga added inline comments. Mar 21 2016, 10:32 AM

lib/Analysis/ScalarEvolution.cpp
5372	Sanjoy, I've left this as is for now. Let me know if you prefer the form above, and I'll try to change the code to use it.

Hi Silviu,

This is looking close to ready to land; I mostly have minor comments except on the ExitNotTakenExtras structure, which is more complex than I had had in mind.

include/llvm/Analysis/ScalarEvolution.h
557	Leave a line between declarations.
575	This is pretty obvious by name, but I'd add a doxygen comment for consistency.
580	I'd remove the braces. Also a more LLVM-idiomatic way of writing this would be `if (auto *Info = ExtraInfo.getPointer()) return &Info->Pred; return nullptr;`
591	I'd mildly prefer `return !getPred() || getPred()->isAlwaysTrue();`, but if you like this better, that's fine too.
595	This is great!
654	The following does not need to be done for this change, but this is just to note some cleanup work one of us should consider taking up in the future: Now that we're being so nice and well factored about this, I think it will be even nicer if we have a good containment relationship. Specifically, it will be nice to have a top level `ExitTakenInfoSet` struct that contains one inlined `ExitTakenInfo` object, and a `PointerIntPair` of `ExitTakenInfoSetExtras` and the `isComplete` bit.
665	I think the code would be cleaner if we changed `Pred` to a `std::vector` or `SmallVector` of predicates, and have the invariant be: (1) `ExtraInfo` in the "root" `ExitNotTaken` instance is `nullptr` means that there is one unpredicated exit count. (2) `ExtraInfo` in the "root" `ExitNotTaken` instance is not `nullptr` means that i'th exit count is in `i == 0 ? Root : Root->Extra.Exits[i - 1]` (exactly as it is today), and the predicate for the i'th exit count is `Root->Extra ? Root->Extra.Exits[i] : AlwaysTrue`. This is less efficient than the layout you have here, but is simpler; and we still have the nice property that unpredicated exit counts for single exit loops can be represented compactly. Does the above sound feasible? If it does, then let's please go ahead with that; else let me know and I'll review the `BackedgeTakenInfo` constructor as is.
666	Again, please leave blank lines between decls.
894	Please consistently use one of `AllowPredicates` or `CreatePredicates`. Also there is some inconsistency around when `CreatePredicates` has a default value and when it does not -- is there a reason for the difference?
lib/Analysis/ScalarEvolution.cpp
5372	sgtm
5519–5545	Can you please do one of the following: (1) Comment which numeric field is which. (2) Cache the three fields in three local variables, with more mnemonic names. Honestly, reading the code, a `struct` with named fields would have been better; but we can fix that later.
5551	Are you unconditionally assuming `!std::get<2>(ExitCounts[0]).isAlwaysTrue()` here (given that you're unconditionally adding `1`)? If so, please add a brief comment on why that is okay.

This revision now requires changes to proceed. Mar 22 2016, 2:07 PM

sbaranga added inline comments. Mar 23 2016, 9:07 AM

include/llvm/Analysis/ScalarEvolution.h
665	The problem with this is we can't get the predicate of an ExitNotTaken without knowing the root, so the information there is not self contained anymore. So things like the isPredicateAlwaysTrue() method would need to change (because we don't know the root), probably by taking the ExitNotTaken root as the argument? Would there be a way around this? If not, I would prefer to keep the current solution.
lib/Analysis/ScalarEvolution.cpp
5551	No, we know the first ExitNotTakenInfo has an ExtraInfo struct because we have more than 1 exit (so there's no need to check the predicate). The other entries only have an ExtraInfo if they have a not always true SCEVPredicate. Perhaps ExtraInfoSize instead of PredsSize would have been better here.

Replaced the <SCEV*, BasicBlock*, SCEVUnionPredicate> tuple with a new "EdgeInfo" struct.
We now consistently use AllowPredicates everywhere (defaults to false).

I've left ExitNotTakenInfo as is for now, until we decide how (and if) we should change it.

mcrosier added a subscriber: mcrosier. Mar 25 2016, 7:11 AM

Mostly nits, except that there may be a correctness issue in SCEVExpander::generateOverflowCheck.

include/llvm/Analysis/ScalarEvolution.h
665	SGTM, let's go with what we have now. I still think the data layout here can be simplified, but we (I'm happy to take care of that) can do that later once this change is in.
887–888	Nit (here and below) "this call will try to"
lib/Analysis/ScalarEvolution.cpp
5557	Can you move this `auto &Exits` out of the loop?
5562	Can this just be `Exits.emplace_back(ExitCounts[i].ExitBlock, ExitCounts[i].taken, Ptr)`?
5565	Why not move this to the statement where you assign to `Ptr`: `if (!ExitCounts[i].Pred.isAlwaysTrue()) { Ptr = &ENT[PredPos++]; Ptr->Pred = ... }`
lib/Analysis/ScalarEvolutionExpander.cpp
2007–2009	This part is a significant semantic change that I unfortunately hadn't (but really should have) noticed in earlier revisions. While I don't yet have a specific example given the current state of the code, I think it is possible to end up with a circular logic fallacy here.
Earlier, the invariant was: given a no-overflow predicate, we would write a loop entry predicate `EP` (a function of the backedge count) that, when `true`, would imply no overflow. Formally, `EP => NoOverflow`. However, now we're doing something different. Now we have: `EP => (Backedge taken count is BE => NoOverflow)` (this part is the same as earlier), but `Backedge taken count is BE` is not axiomatically true -- it is true under the `NoOverflow` condition. In other words, the set of logical equations we have is: (1) `EP => (Backedge taken count is BE => NoOverflow)`; (2) `NoOverflow => Backedge taken count is BE`.
Now we check `EP` at runtime, so it is known to be `true`. Given that, there are two solutions to the above: {`Backedge taken count is BE` is `true`, `NoOverflow` is `true`}; or {`Backedge taken count is BE` is `false`, `NoOverflow` is `false`}. The latter solution is problematic.
One problematic case that won't happen today, but illustrates what I'm talking about: `for (u32 i = 0; ; i++) store volatile (GEP a, i), 0`. The above loop can run forever (assuming `u32`'s overflow is well defined), but let's say we decide to predicate the increment to an NUSW increment. Given that, we know the loop won't run more than `-1` times (since otherwise we will have a side effect that uses poison). However, the predicate that you'll compute in SCEVExpander in such a case is `-1 == -1`, which is trivially true, and now the loop is miscompiled (instead of running forever, it'll just run `-1` times).
Now, I'll note that I have so far not been able to come up with a problematic case that will break in the current version of the patch, so it is possible that there is some deeper invariant here that is obvious.
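
The "two solutions" observation in the comment above can be checked mechanically by enumerating truth values. The sketch below is only an illustration of that propositional argument: `implies` and `consistentAssignments` are hypothetical helper names, and the two booleans stand for "backedge taken count is BE" and "NoOverflow". `EP` is fixed to true, modelling a run-time check that passed.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>
#include <vector>

bool implies(bool P, bool Q) { return !P || Q; }

// Returns every (CountIsBE, NoOverflow) pair consistent with
//   EP => (CountIsBE => NoOverflow)   and   NoOverflow => CountIsBE
// once EP is known to be true.
std::vector<std::pair<bool, bool>> consistentAssignments() {
  const bool EP = true;
  std::vector<std::pair<bool, bool>> Result;
  for (bool CountIsBE : {false, true})
    for (bool NoOverflow : {false, true})
      if (implies(EP, implies(CountIsBE, NoOverflow)) &&
          implies(NoOverflow, CountIsBE))
        Result.emplace_back(CountIsBE, NoOverflow);
  return Result;
}
```

The enumeration yields exactly two consistent assignments, (false, false) and (true, true), which matches the comment: a passing `EP` check alone does not rule out the case where both the count and the no-overflow assumption are wrong.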

This revision now requires changes to proceed. Mar 27 2016, 3:43 PM

flyingforyou added a subscriber: flyingforyou. Mar 27 2016, 5:06 PM

sbaranga added inline comments. Mar 28 2016, 4:15 AM

lib/Analysis/ScalarEvolutionExpander.cpp
2007–2009	Thanks! That is a very good point and we should be really careful here. I think this change is still correct: Let's say that we return a symbolic answer X, and the correct answer is Y. The problem is when X != Y and we pass the predicate check. Because the answer is wrong, some overflow must happen. Since we are passing the NoOverflow check, we need to have X < Y (otherwise we wouldn't be passing the check). Note that for the first X iterations our predicate holds. I think what makes this correct is the fact that we're using the predicates in HowManyGreaterThans/HowManyLessThans, which means that we need to stop at iteration X.

sanjoy added inline comments. Mar 28 2016, 5:42 PM

lib/Analysis/ScalarEvolutionExpander.cpp
2007–2009	So, to rephrase what I understood, by "with the predicate (IsNoWrap AR) the trip count of the loop is N", PSE really means "if the add recurrence AR does not overflow in the first N iterations, then the loop's count is N". In particular, for the loop `for (unsigned i; ; i++) side_effect_use(i);` predicating `i++` as NUSW does not let you conclude that the trip count of the loop is `UINT_MAX`, since just because `i++` does not overflow in the first `UINT_MAX` iterations does not guarantee that the loop will exit or have UB in the `UINT_MAX+1`th iteration. This is a subtle invariant, so at the very least this needs to be documented explicitly; and the places where we add predicates (`HowManyLessThans` etc.) need to be audited carefully (it sounds like you already have done most of that?). Finally, if possible, it will be great if we can structure the code in a way that will make (accidentally) breaking this invariant difficult; though I don't have anything concrete in mind right now.
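
The "no overflow in the first N iterations" reading can be illustrated with a small stand-alone sketch. It assumes an 8-bit unsigned recurrence to keep the numbers small, and `noUnsignedWrapInFirst` and `stepsBeforeWrap` are hypothetical helpers, not functions from SCEV or SCEVExpander.

```cpp
#include <cassert>
#include <cstdint>

// True if the 8-bit recurrence {Start,+,Step} can take X steps without
// unsigned wrap, i.e. Start + Step * X still fits in 8 bits.
bool noUnsignedWrapInFirst(uint8_t Start, uint8_t Step, uint32_t X) {
  uint64_t Last = uint64_t(Start) + uint64_t(Step) * uint64_t(X);
  return Last <= UINT8_MAX;
}

// Reference: count the steps the recurrence really takes before wrapping.
uint32_t stepsBeforeWrap(uint8_t Start, uint8_t Step) {
  if (Step == 0)
    return UINT32_MAX; // a zero step never wraps
  uint32_t Steps = 0;
  uint32_t Value = Start;
  while (Value + Step <= UINT8_MAX) {
    Value += Step;
    ++Steps;
  }
  return Steps;
}
```

For `{0,+,1}` the check passes for `X = 255` and fails for `X = 256`. Passing the check for some `X` says nothing about what a loop without an exit does afterwards, which is the point made above about `side_effect_use(i)`.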

sbaranga added inline comments. Mar 29 2016, 5:31 AM

lib/Analysis/ScalarEvolutionExpander.cpp
2007–2009	Yes, this is a very good description of the logic! Indeed, this is subtle so it needs documenting. I did audit the places where we've added the predicates, but I'd like to have another look.

Updated the definition of the SCEVWrapPredicate so that its semantics are now decoupled from the backedge taken count.
This should address the other review comments as well.

sbaranga marked 4 inline comments as done. Mar 31 2016, 5:59 AM

sbaranga added inline comments.

include/llvm/Analysis/ScalarEvolution.h
654	Makes sense to me.
665	Thanks! Greatly appreciated :).
lib/Analysis/ScalarEvolutionExpander.cpp
2007–2010	I've re-audited the Please see the updated definition of the SCEVWrapPredicate which should make this point clear. Regarding the code restructuring: I don't see a good solution for this either.

sbaranga added inline comments. Apr 1 2016, 9:05 AM

lib/Analysis/ScalarEvolutionExpander.cpp
2007–2010	I meant to write that I've re-audited the places where we were using the predicates, and I think everything is ok.

Hi Silviu,

This lgtm now! Thanks for bearing with the long review process. I have a minor "messaging" comment inline, but otherwise things look great!

include/llvm/Analysis/ScalarEvolution.h
278–279	What I have in mind is slightly different (same content, said differently): 1. If a SCEVPredicate says `AR` is `nssw` then `AR` is `nssw` for the entire iteration space of the loop. No exceptions. 2. The predicated BE taken computation logic now needs to be aware of a possible circular logic issue -- if it adds a predicate forcing an `AR` to be `nssw` then `AR` is `nssw` only for the trip count it itself computes. This means it cannot use the no-overflow property in certain ways the non-predicated BE taken computation can. 3. If (1) is ever incorrect (i.e. the predicate fails to hold) then we've already "lost" (i.e. miscompiled) because `getPredicatedBackedgeTakenCount` gave us the wrong answer. In other words, I think you're downplaying the fact that the "burden" is really on `getPredicatedBackedgeTakenCount`, and not on the general notion of predicate itself. For instance, there is no complication at all if we don't also use predicated backedge taken counts (i.e. we use normal backedge taken counts) with `SCEVPredicate`.

This revision is now accepted and ready to land. Apr 4 2016, 11:47 PM

In D17201#392057, @sanjoy wrote:

Hi Silviu,

This lgtm now! Thanks for bearing with the long review process. I have a minor "messaging" comment inline, but otherwise things look great!

Thanks a lot for reviewing! Some comments inline, but it looks like this shouldn't be too difficult to resolve.

-Silviu

include/llvm/Analysis/ScalarEvolution.h
278–279	Yes, this was what I also had in mind initially. However, I found it easier to reason about things this way. Here is the reason: The new definition also happens to imply the old one for all code that also checks the predicated backedge taken count, and it only makes a difference when computing this predicated backedge taken count. It removes the circular logic issue for our current use case. That is, we don't need to know that the predicated backedge taken count is correct in order to check the predicate, so the reasoning would look like: (1) wrap predicate (+ possibly some other predicates) => predicated backedge taken count is correct; (2) predicated backedge taken count is correct + wrap predicate => wrap predicate is true throughout the loop; which looked like a slightly more elegant way of reasoning about this. The burden is still on the getPredicatedBackedgeTakenCount to correctly use the predicates and not get itself into circular logic. Also, we can still use the backedge taken count here since if we can compute it, the predicated backedge taken count would be equal to it and have no predicates. Given this, what do you think? If you'd still like to go with your version, we can do that.

sanjoy added inline comments. Apr 5 2016, 12:56 PM

include/llvm/Analysis/ScalarEvolution.h
276	Optional suggestion: I'd change this line to "Note that this predicate does not imply ... taken count". The "that X is interpreted as a SCEV expression" sounds confusing on the first reading (since we've already established that X is a SCEV expression returned by getPredicatedBackedgeTakenCount).
278	Optional suggestion: I'd remove the "Wrap" from "nusw Wrap", since we already have "w" for "wrap" in "nusw".
278–279	SGTM, what you have is fine. I added some minor wording suggestions -- all of which are optional to apply.

Improve description for SCEVWrapPredicate.

sbaranga closed this revision. Apr 6 2016, 6:23 AM

Thanks! Committed in r265535.

-Silviu

Reverted because this caused the following failure: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7327

As far as I know this is isolated to Windows. It's not entirely clear why this is happening.

Re-committed in r265786 with a fix. Essentially, we can't use a pointer allocated by new with PointerIntPair, since new doesn't guarantee alignment, which is required by PointerIntPair. So I've broken this up into a pointer and a bool.

We could use a special allocator (SpecificBumpPtrAllocator?) to guarantee alignment in order to work around this, but it doesn't make much sense since the allocation scheme for ExitNotTakenInfo/Extras will probably change soon anyway.
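
The alignment requirement mentioned above comes from storing a flag in the low bit of a pointer, which is only safe when that bit is always zero. The sketch below illustrates the idea with a deliberately simplified class. `PointerAndFlag` is a hypothetical stand-in written for this note and is not the interface of `llvm::PointerIntPair`.

```cpp
#include <cassert>
#include <cstdint>

// Packs a pointer and one boolean into a single word by reusing the
// pointer's lowest bit. This only works if that bit is free, i.e. the
// pointee is at least 2-byte aligned.
template <typename T> class PointerAndFlag {
  std::uintptr_t Bits = 0;

public:
  // True if the low bit of P is zero and can therefore hold the flag.
  static bool canHold(const T *P) {
    return (reinterpret_cast<std::uintptr_t>(P) & 1u) == 0;
  }

  void set(T *P, bool Flag) {
    assert(canHold(P) && "pointer must be at least 2-byte aligned");
    Bits = reinterpret_cast<std::uintptr_t>(P) | (Flag ? 1u : 0u);
  }

  T *getPointer() const {
    return reinterpret_cast<T *>(Bits & ~std::uintptr_t(1));
  }

  bool getFlag() const { return (Bits & 1u) != 0; }
};
```

Splitting the pair into a separate pointer and `bool`, as the re-commit does, removes the alignment assumption at the cost of a larger struct.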

Revision Contents

Path	Size
include/llvm/Analysis/ScalarEvolution.h	247 lines
lib/Analysis/LoopAccessAnalysis.cpp	4 lines
lib/Analysis/ScalarEvolution.cpp	315 lines
lib/Analysis/ScalarEvolutionExpander.cpp	4 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	4 lines
test/Analysis/ScalarEvolution/predicated-trip-count.ll	109 lines
test/Transforms/LoopVectorize/AArch64/backedge-overflow.ll	166 lines
test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll	3 lines

Diff 52784

include/llvm/Analysis/ScalarEvolution.h

Show First 20 Lines • Show All 264 Lines • ▼ Show 20 Lines	public:
const SCEVConstant *getRHS() const { return RHS; }		const SCEVConstant *getRHS() const { return RHS; }

/// Methods for support type inquiry through isa, cast, and dyn_cast:		/// Methods for support type inquiry through isa, cast, and dyn_cast:
static inline bool classof(const SCEVPredicate *P) {		static inline bool classof(const SCEVPredicate *P) {
return P->getKind() == P_Equal;		return P->getKind() == P_Equal;
}		}
};		};

/// SCEVWrapPredicate - This class represents an assumption		/// SCEVWrapPredicate - This class represents an assumption made on an AddRec
/// made on an AddRec expression. Given an affine AddRec expression		/// expression. Given an affine AddRec expression {a,+,b}, we assume that it
/// {a,+,b}, we assume that it has the nssw or nusw flags (defined		/// has the nssw or nusw flags (defined below) in the first X iterations of
/// below).		/// the loop, where X is a SCEV expression returned by
		sanjoy: Optional suggestion: I'd change this line to "Note that this predicate does not imply ... taken count". The "that X is interpreted as a SCEV expression" sounds confusing on the first reading (since we've already established that X is a SCEV expression returned by getPredicatedBackedgeTakenCount).
		/// getPredicatedBackedgeTakenCount).
		///
		sanjoy: Optional suggestion: I'd remove the "Wrap" from "nusw Wrap", since we already have "w" for "wrap" in "nusw".
		/// Note that this does not imply that X is equal to the backedge taken
		sanjoy: What I have in mind is slightly different (same content, said differently):
		1. If a SCEVPredicate says `AR` is `nssw` then `AR` is `nssw` for the entire iteration space of the loop. No exceptions.
		2. The predicated BE taken computation logic now needs to be aware of a possible circular logic issue -- if it adds a predicate forcing an `AR` to be `nssw` then `AR` is `nssw` only for the trip count it itself computes. This means it cannot use the no-overflow property in certain ways the non-predicated BE taken computation can.
		If (1) is ever incorrect (i.e. the predicate fails to hold) then we've already "lost" (i.e. miscompiled) because `getPredicatedBackedgeTakenCount` gave us the wrong answer.
		In other words, I think you're downplaying the fact that the "burden" is really on `getPredicatedBackedgeTakenCount`, and not on the general notion of predicate itself. For instance, there is no complication at all if we don't also use predicated backedge taken counts (i.e. we use normal backedge taken counts) with `SCEVPredicate`.
		sbaranga: Yes, this was what I also had in mind initially. However, I found it easier to reason about things this way. Here is the reason:
		The new definition also happens to imply the old one for all code that also checks the predicated backedge taken count, and it only makes a difference when computing this predicated backedge taken count.
		It removes the circular logic issue for our current use case. That is, we don't need to know that the predicated backedge taken count is correct in order to check the predicate, so the reasoning would look like:
		wrap predicate (+ possibly some other predicates) => predicated backedge taken count is correct
		predicated backedge taken count is correct + wrap predicate => wrap predicate is true throughout the loop
		which looked like a slightly more elegant way of reasoning about this.
		The burden is still on getPredicatedBackedgeTakenCount to correctly use the predicates and not get itself into circular logic. Also, we can still use the backedge taken count here since, if we can compute it, the predicated backedge taken count would be equal to it and have no predicates.
		Given this, what do you think? If you'd still like to go with your version, we can do that.
		sanjoy: SGTM, what you have is fine. I added some minor wording suggestions -- all of which are optional to apply.
		/// count. This means that if we have a nusw predicate for i32 {0,+,1} with a
		/// predicated backedge taken count of X, we only guarantee that {0,+,1} has
		/// nusw in the first X iterations. {0,+,1} may still wrap in the loop if we
		/// have more than X iterations.
class SCEVWrapPredicate final : public SCEVPredicate {		class SCEVWrapPredicate final : public SCEVPredicate {
public:		public:
/// Similar to SCEV::NoWrapFlags, but with slightly different semantics		/// Similar to SCEV::NoWrapFlags, but with slightly different semantics
/// for FlagNUSW. The increment is considered to be signed, and a + b		/// for FlagNUSW. The increment is considered to be signed, and a + b
/// (where b is the increment) is considered to wrap if:		/// (where b is the increment) is considered to wrap if:
/// zext(a + b) != zext(a) + sext(b)		/// zext(a + b) != zext(a) + sext(b)
///		///
/// If Signed is a function that takes an n-bit tuple and maps to the		/// If Signed is a function that takes an n-bit tuple and maps to the
▲ Show 20 Lines • Show All 230 Lines • ▼ Show 20 Lines	private:
/// Information about the number of loop iterations for which a loop exit's		/// Information about the number of loop iterations for which a loop exit's
/// branch condition evaluates to the not-taken path. This is a temporary		/// branch condition evaluates to the not-taken path. This is a temporary
/// pair of exact and max expressions that are eventually summarized in		/// pair of exact and max expressions that are eventually summarized in
/// ExitNotTakenInfo and BackedgeTakenInfo.		/// ExitNotTakenInfo and BackedgeTakenInfo.
struct ExitLimit {		struct ExitLimit {
const SCEV *Exact;		const SCEV *Exact;
const SCEV *Max;		const SCEV *Max;

		/// A predicate union guard for this ExitLimit. The result is only
		/// valid if this predicate evaluates to 'true' at run-time.
		SCEVUnionPredicate Pred;

/implicit/ ExitLimit(const SCEV *E) : Exact(E), Max(E) {}		/implicit/ ExitLimit(const SCEV *E) : Exact(E), Max(E) {}

ExitLimit(const SCEV E, const SCEV M) : Exact(E), Max(M) {		ExitLimit(const SCEV E, const SCEV M, SCEVUnionPredicate &P)
		: Exact(E), Max(M), Pred(P) {
assert((isa<SCEVCouldNotCompute>(Exact) \|\|		assert((isa<SCEVCouldNotCompute>(Exact) \|\|
!isa<SCEVCouldNotCompute>(Max)) &&		!isa<SCEVCouldNotCompute>(Max)) &&
"Exact is not allowed to be less precise than Max");		"Exact is not allowed to be less precise than Max");
}		}

/// Test whether this ExitLimit contains any computed information, or		/// Test whether this ExitLimit contains any computed information, or
/// whether it's all SCEVCouldNotCompute values.		/// whether it's all SCEVCouldNotCompute values.
bool hasAnyInfo() const {		bool hasAnyInfo() const {
return !isa<SCEVCouldNotCompute>(Exact) ||		return !isa<SCEVCouldNotCompute>(Exact) ||
!isa<SCEVCouldNotCompute>(Max);		!isa<SCEVCouldNotCompute>(Max);
}		}

		/// Test whether this ExitLimit contains all information.
		bool hasFullInfo() const { return !isa<SCEVCouldNotCompute>(Exact); }
};		};

		/// Forward declaration of ExitNotTakenExtras
		struct ExitNotTakenExtras;

/// Information about the number of times a particular loop exit may be		/// Information about the number of times a particular loop exit may be
		sanjoy: Leave a line between declarations.
/// reached before exiting the loop.		/// reached before exiting the loop.
struct ExitNotTakenInfo {		struct ExitNotTakenInfo {
AssertingVH<BasicBlock> ExitingBlock;		AssertingVH<BasicBlock> ExitingBlock;
const SCEV *ExactNotTaken;		const SCEV *ExactNotTaken;
PointerIntPair<ExitNotTakenInfo*, 1> NextExit;
		PointerIntPair<ExitNotTakenExtras *, 1> ExtraInfo;

ExitNotTakenInfo() : ExitingBlock(nullptr), ExactNotTaken(nullptr) {}		ExitNotTakenInfo() : ExitingBlock(nullptr), ExactNotTaken(nullptr) {}
		sanjoy: Now that you're embedding `SCEVUnionPredicate` in every `ExitNotTakenInfo`, I'm somewhat worried about the `SmallVector<const SCEVPredicate *, 16> Preds;` :)
		Firstly, what is the typical number of predicates we see? 16 seems higher than what I would expect.
		Secondly, a better (in terms of memory consumption) solution would be to have a `std::unique_ptr<SCEVUnionPredicate>` here, where a null value means "always true" so that in the non-predicated case we only use up one word.
		The most compact solution (I think we should go with this) is to:
		- Extract out a `struct ExitNotTakenExtras` that contains a `SmallVector` of `ExitNotTaken`s and a `SCEVUnionPredicate`
		- Change `PointerIntPair<ExitNotTakenInfo *, 1> NextExit` to `PointerIntPair<ExitNotTakenExtras *, 1> NextExit` where it is usually null, but if it is non-null you either have multiple exits or have some SCEV predicates.
		Normally I wouldn't worry so much about space, but `ExitNotTakenInfo` has clearly been optimized for space usage, so we should try not to break that.
		sbaranga: Ok, the most common size of the SCEVUnionPredicate there should be 0, so going with your solution makes sense to me. On the other hand, it does seem a bit strange to optimize this for memory consumption. I wouldn't expect it to make that much of a difference.
		sanjoy: My guess is that there are enough `ExitNotTakenInfo` instances floating around in a typical compilation that its size becomes important. But maybe @atrick has a more cogent answer?
		atrick: I don't have a better answer but agree that SCEV instances need to be size conscious, just as other IR data types do. I would avoid allocating and initializing 16 words that are unlikely to be used.
		ExitNotTakenInfo(BasicBlock *ExitBlock, const SCEV *Expr,
		ExitNotTakenExtras *Ptr)
		: ExitingBlock(ExitBlock), ExactNotTaken(Expr) {
		ExtraInfo.setPointer(Ptr);
		}

/// Return true if all loop exits are computable.		/// Return true if all loop exits are computable.
bool isCompleteList() const {		bool isCompleteList() const { return ExtraInfo.getInt() == 0; }
return NextExit.getInt() == 0;
		/// Sets the incomplete property, indicating that one of the loop exits
		sanjoy: This is pretty obvious by name, but I'd add a doxygen comment for consistency.
		/// doesn't have a corresponding ExitNotTakenInfo entry.
		void setIncomplete() { ExtraInfo.setInt(1); }

		/// Returns a pointer to the predicate associated with this information,
		/// or nullptr if this doesn't exist (meaning always true).
		sanjoy: I'd remove the braces. Also a more LLVM-idiomatic way of writing this would be `if (auto *Info = ExtraInfo.getPointer()) return &Info->Pred; return nullptr;`
		SCEVUnionPredicate *getPred() const {
		if (auto *Info = ExtraInfo.getPointer())
		return &Info->Pred;

		return nullptr;
}		}

void setIncomplete() { NextExit.setInt(1); }		/// Return true if the SCEV predicate associated with this information
		/// is always true.
		bool hasAlwaysTruePred() const {
		return !getPred() || getPred()->isAlwaysTrue();
		sanjoy: I'd mildly prefer `return !getPred() || getPred()->isAlwaysTrue();`, but if you like this better, that's fine too.
		}

/// Return a pointer to the next exit's not-taken info.		/// Defines a simple forward iterator for ExitNotTakenInfo.
ExitNotTakenInfo *getNextExit() const {		class ExitNotTakenInfoIterator
		sanjoy: This is great!
return NextExit.getPointer();		: public std::iterator<std::forward_iterator_tag, ExitNotTakenInfo> {
		const ExitNotTakenInfo *Start;
		unsigned Position;

		public:
		ExitNotTakenInfoIterator(const ExitNotTakenInfo *Start,
		unsigned Position)
		: Start(Start), Position(Position) {}

		const ExitNotTakenInfo &operator*() const {
		if (Position == 0)
		return *Start;

		return Start->ExtraInfo.getPointer()->Exits[Position - 1];
		}

		const ExitNotTakenInfo *operator->() const {
		if (Position == 0)
		return Start;

		return &Start->ExtraInfo.getPointer()->Exits[Position - 1];
}		}

void setNextExit(ExitNotTakenInfo *ENT) { NextExit.setPointer(ENT); }		bool operator==(const ExitNotTakenInfoIterator &RHS) const {
		return Start == RHS.Start && Position == RHS.Position;
		}

		bool operator!=(const ExitNotTakenInfoIterator &RHS) const {
		return Start != RHS.Start || Position != RHS.Position;
		}

		ExitNotTakenInfoIterator &operator++() { // Preincrement
		if (!Start)
		return *this;

		unsigned Elements =
		Start->ExtraInfo.getPointer()
		? Start->ExtraInfo.getPointer()->Exits.size() + 1
		: 1;

		++Position;

		// We've run out of elements.
		if (Position == Elements) {
		Start = nullptr;
		Position = 0;
		}

		return *this;
		}
		ExitNotTakenInfoIterator operator++(int) { // Postincrement
		ExitNotTakenInfoIterator Tmp = *this;
		++*this;
		return Tmp;
		}
		};

		/// Iterators
		ExitNotTakenInfoIterator begin() const {
		sanjoy: The following does not need to be done for this change, but this is just to note some cleanup work one of us should consider taking up in the future: Now that we're being so nice and well factored about this, I think it will be even nicer if we have a good containment relationship. Specifically, it will be nice to have a top level `ExitTakenInfoSet` struct that contains one inlined `ExitTakenInfo` object, and a `PointerIntPair` of `ExitTakenInfoSetExtras` and the `isComplete` bit.
		sbaranga: Makes sense to me.
		return ExitNotTakenInfoIterator(this, 0);
		}
		ExitNotTakenInfoIterator end() const {
		return ExitNotTakenInfoIterator(nullptr, 0);
		}
		};

		/// Describes the extra information that a ExitNotTakenInfo can have.
		struct ExitNotTakenExtras {
		/// The predicate associated with the ExitNotTakenInfo struct.
		SCEVUnionPredicate Pred;
		sanjoy: I think the code would be cleaner if we changed `Pred` to a `std::vector` or `SmallVector` of predicates, and have the invariant be:
		- `ExtraInfo` in the "root" `ExitNotTaken` instance is `nullptr` means that there is one unpredicated exit count
		- `ExtraInfo` in the "root" `ExitNotTaken` instance is not `nullptr` means that i'th exit count is in `i == 0 ? Root : Root->Extra.Exits[i - 1]` (exactly as it is today), and the predicate for the i'th exit count is `Root->Extra ? Root->Extra.Exits[i] : AlwaysTrue`.
		This is less efficient than the layout you have here, but is simpler; and we still have the nice property that unpredicated exit counts for single exit loops can be represented compactly.
		Does the above sound feasible? If it does, then let's please go ahead with that; else let me know and I'll review the `BackedgeTakenInfo` constructor as is.
		sbaranga: The problem with this is we can't get the predicate of an ExitNotTaken without knowing the root, so the information there is not self-contained anymore. So things like the isPredicateAlwaysTrue() method would need to change (because we don't know the root), probably by taking the ExitNotTaken root as the argument? Would there be a way around this? If not, I would prefer to keep the current solution.
		sanjoy: SGTM, let's go with what we have now. I still think the data layout here can be simplified, but we (I'm happy to take care of that) can do that later once this change is in.
		sbaranga: Thanks! Greatly appreciated :).

		sanjoy: Again, please leave blank lines between decls.
		/// The extra exits in the loop. Only the ExitNotTakenExtras structure
		/// pointed to by the first ExitNotTakenInfo struct (associated with the
		/// first loop exit) will populate this vector to prevent having
		/// redundant information.
		SmallVector<ExitNotTakenInfo, 4> Exits;
		};

		/// A struct containing the information attached to a backedge.
		struct EdgeInfo {
		EdgeInfo(BasicBlock *Block, const SCEV *Taken, SCEVUnionPredicate &P) :
		ExitBlock(Block), Taken(Taken), Pred(std::move(P)) {}

		/// The exit basic block.
		BasicBlock *ExitBlock;

		/// The (exact) number of time we take the edge back.
		const SCEV *Taken;

		/// The SCEV predicated associated with Taken. If Pred doesn't evaluate
		/// to true, the information in Taken is not valid (or equivalent with
		/// a CouldNotCompute.
		SCEVUnionPredicate Pred;
};		};

/// Information about the backedge-taken count of a loop. This currently		/// Information about the backedge-taken count of a loop. This currently
/// includes an exact count and a maximum count.		/// includes an exact count and a maximum count.
///		///
class BackedgeTakenInfo {		class BackedgeTakenInfo {
/// A list of computable exits and their not-taken counts. Loops almost		/// A list of computable exits and their not-taken counts. Loops almost
/// never have more than one computable exit.		/// never have more than one computable exit.
ExitNotTakenInfo ExitNotTaken;		ExitNotTakenInfo ExitNotTaken;

/// An expression indicating the least maximum backedge-taken count of the		/// An expression indicating the least maximum backedge-taken count of the
/// loop that is known, or a SCEVCouldNotCompute.		/// loop that is known, or a SCEVCouldNotCompute. This expression is only
		/// valid if the predicates associated with all loop exits are true.
const SCEV *Max;		const SCEV *Max;
		sanjoy: Add a comment here that `Max` is valid only if the predicates on each of the `ExitNotTakenInfo` instances is true.

public:		public:
BackedgeTakenInfo() : Max(nullptr) {}		BackedgeTakenInfo() : Max(nullptr) {}

/// Initialize BackedgeTakenInfo from a list of exact exit counts.		/// Initialize BackedgeTakenInfo from a list of exact exit counts.
BackedgeTakenInfo(		BackedgeTakenInfo(SmallVectorImpl<EdgeInfo> &ExitCounts, bool Complete,
SmallVectorImpl< std::pair<BasicBlock *, const SCEV *> > &ExitCounts,		const SCEV *MaxCount);
		sanjoy: I think the right API here is to promote the `std::pair` into a full struct that carries the block, backedge count and a predicate; and not have a separate vector for `ExitPreds`. You could also use `std::tuple` here (since the types are all different, there isn't much scope for confusion).
bool Complete, const SCEV *MaxCount);

/// Test whether this BackedgeTakenInfo contains any computed information,		/// Test whether this BackedgeTakenInfo contains any computed information,
/// or whether it's all SCEVCouldNotCompute values.		/// or whether it's all SCEVCouldNotCompute values.
bool hasAnyInfo() const {		bool hasAnyInfo() const {
return ExitNotTaken.ExitingBlock || !isa<SCEVCouldNotCompute>(Max);		return ExitNotTaken.ExitingBlock || !isa<SCEVCouldNotCompute>(Max);
}		}

		/// Test whether this BackedgeTakenInfo contains complete information.
		bool hasFullInfo() const { return ExitNotTaken.isCompleteList(); }

/// Return an expression indicating the exact backedge-taken count of the		/// Return an expression indicating the exact backedge-taken count of the
/// loop if it is known, or SCEVCouldNotCompute otherwise. This is the		/// loop if it is known or SCEVCouldNotCompute otherwise. This is the
/// number of times the loop header can be guaranteed to execute, minus		/// number of times the loop header can be guaranteed to execute, minus
/// one.		/// one.
		sanjoy: "count of the" is repeated
const SCEV *getExact(ScalarEvolution *SE) const;		///
		/// If the SCEV predicate associated with the answer can be different
		sanjoy: Can you please make this comment a little more clear? Is it that the caller needs to know in advance, by some other means, that the backedge taken count is predicated and pass in a non-null `Predicates`? If so, what is that other means? The `always correct (independent of any assumptions that should be checked at run-time)` is misleading too -- you do sometimes return exact BE counts that are dependent on a run-time assumption.
		/// from AlwaysTrue, we must add a (non null) Predicates argument.
		/// The SCEV predicate associated with the answer will be added to
		/// Predicates. A run-time check needs to be emitted for the SCEV
		/// predicate in order for the answer to be valid.
		///
		/// Note that we should always know if we need to pass a predicate
		/// argument or not from the way the ExitCounts vector was computed.
		/// If we allowed SCEV predicates to be generated when populating this
		/// vector, this information can contain them and therefore a
		/// SCEVPredicate argument should be added to getExact.
		const SCEV *getExact(ScalarEvolution *SE,
		SCEVUnionPredicate *Predicates = nullptr) const;

/// Return the number of times this loop exit may fall through to the back		/// Return the number of times this loop exit may fall through to the back
/// edge, or SCEVCouldNotCompute. The loop is guaranteed not to exit via		/// edge, or SCEVCouldNotCompute. The loop is guaranteed not to exit via
/// this block before this number of iterations, but may exit via another		/// this block before this number of iterations, but may exit via another
/// block.		/// block.
const SCEV *getExact(BasicBlock *ExitingBlock, ScalarEvolution *SE) const;		const SCEV *getExact(BasicBlock *ExitingBlock, ScalarEvolution *SE) const;

/// Get the max backedge taken count for the loop.		/// Get the max backedge taken count for the loop.
const SCEV *getMax(ScalarEvolution *SE) const;		const SCEV *getMax(ScalarEvolution *SE) const;

/// Return true if any backedge taken count expressions refer to the given		/// Return true if any backedge taken count expressions refer to the given
/// subexpression.		/// subexpression.
bool hasOperand(const SCEV *S, ScalarEvolution *SE) const;		bool hasOperand(const SCEV *S, ScalarEvolution *SE) const;

/// Invalidate this result and free associated memory.		/// Invalidate this result and free associated memory.
void clear();		void clear();
};		};

/// Cache the backedge-taken count of the loops for this function as they		/// Cache the backedge-taken count of the loops for this function as they
/// are computed.		/// are computed.
DenseMap<const Loop*, BackedgeTakenInfo> BackedgeTakenCounts;		DenseMap<const Loop *, BackedgeTakenInfo> BackedgeTakenCounts;

		sanjoy: Here and elsewhere, can you please add a newline after field declarations (i.e. after the declaration for `BackedgeTakenCounts`)?
		/// Cache the predicated backedge-taken count of the loops for this
		/// function as they are computed.
		DenseMap<const Loop *, BackedgeTakenInfo> PredicatedBackedgeTakenCounts;

/// This map contains entries for all of the PHI instructions that we		/// This map contains entries for all of the PHI instructions that we
/// attempt to compute constant evolutions for. This allows us to avoid		/// attempt to compute constant evolutions for. This allows us to avoid
/// potentially expensive recomputation of these properties. An instruction		/// potentially expensive recomputation of these properties. An instruction
/// maps to null if we are unable to compute its exit value.		/// maps to null if we are unable to compute its exit value.
DenseMap<PHINode*, Constant*> ConstantEvolutionLoopExitValue;		DenseMap<PHINode*, Constant*> ConstantEvolutionLoopExitValue;

/// This map contains entries for all the expressions that we attempt to		/// This map contains entries for all the expressions that we attempt to
/// compute getSCEVAtScope information for, which can be expensive in		/// compute getSCEVAtScope information for, which can be expensive in
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	private:
const SCEV *computeSCEVAtScope(const SCEV *S, const Loop *L);		const SCEV *computeSCEVAtScope(const SCEV *S, const Loop *L);

/// This looks up computed SCEV values for all instructions that depend on		/// This looks up computed SCEV values for all instructions that depend on
/// the given instruction and removes them from the ValueExprMap map if they		/// the given instruction and removes them from the ValueExprMap map if they
/// reference SymName. This is used during PHI resolution.		/// reference SymName. This is used during PHI resolution.
void forgetSymbolicName(Instruction *I, const SCEV *SymName);		void forgetSymbolicName(Instruction *I, const SCEV *SymName);

/// Return the BackedgeTakenInfo for the given loop, lazily computing new		/// Return the BackedgeTakenInfo for the given loop, lazily computing new
/// values if the loop hasn't been analyzed yet.		/// values if the loop hasn't been analyzed yet. The returned result is
		/// guaranteed not to be predicated.
const BackedgeTakenInfo &getBackedgeTakenInfo(const Loop *L);		const BackedgeTakenInfo &getBackedgeTakenInfo(const Loop *L);

		/// Similar to getBackedgeTakenInfo, but will add predicates as required
		/// with the purpose of returning complete information.
		const BackedgeTakenInfo &getPredicatedBackedgeTakenInfo(const Loop *L);

		sanjoy: I'd remove the `(the exact backedge taken count will be known)` bit -- it can otherwise mislead people into thinking that this will return a precise BE count, no matter what.
/// Compute the number of times the specified loop will iterate.		/// Compute the number of times the specified loop will iterate.
BackedgeTakenInfo computeBackedgeTakenCount(const Loop *L);		/// If AllowPredicates is set, we will create new SCEV predicates as
		/// necessary in order to return an exact answer.
		BackedgeTakenInfo computeBackedgeTakenCount(const Loop *L,
		bool AllowPredicates = false);
		sanjoy: This is minor, but the `IsGuarded` name is somewhat misleading, since the loop may not already be guarded. Why not call this `AddPredicates` or `AllowPredicates`?

/// Compute the number of times the backedge of the specified loop will		/// Compute the number of times the backedge of the specified loop will
/// execute if it exits via the specified block.		/// execute if it exits via the specified block. If AllowPredicates is set,
ExitLimit computeExitLimit(const Loop *L, BasicBlock *ExitingBlock);		/// this call will try to use a minimal set of SCEV predicates in order to
		/// return an exact answer.
		ExitLimit computeExitLimit(const Loop *L, BasicBlock *ExitingBlock,
		bool AllowPredicates = false);

/// Compute the number of times the backedge of the specified loop will		/// Compute the number of times the backedge of the specified loop will
/// execute if its exit condition were a conditional branch of ExitCond,		/// execute if its exit condition were a conditional branch of ExitCond,
/// TBB, and FBB.		/// TBB, and FBB. If AllowPredicates is set, this call will try to use a
		/// minimal set of SCEV predicates in order to return an exact answer.
		sanjoy: Nit: the parameter name is `IsGuarded` and not `Guarded`. But, as I said earlier, it is probably better to rename this to `AddPredicates`, `AllowPredicates` or `CreatePredicates` as you have below.
		sanjoy [Done]: Nit (here and below) "this call will try to"
ExitLimit computeExitLimitFromCond(const Loop *L,		ExitLimit computeExitLimitFromCond(const Loop *L,
Value *ExitCond,		Value *ExitCond,
BasicBlock *TBB,		BasicBlock *TBB,
BasicBlock *FBB,		BasicBlock *FBB,
bool IsSubExpr);		bool IsSubExpr,
		bool AllowPredicates = false);
		sanjoy: Please consistently use one of `AllowPredicates` or `CreatePredicates`. Also there is some inconsistency around when `CreatePredicates` has a default value and when it does not -- is there a reason for the difference?

/// Compute the number of times the backedge of the specified loop will		/// Compute the number of times the backedge of the specified loop will
/// execute if its exit condition were a conditional branch of the ICmpInst		/// execute if its exit condition were a conditional branch of the ICmpInst
/// ExitCond, TBB, and FBB.		/// ExitCond, TBB, and FBB. If AllowPredicates is set, this call will try
		/// to use a minimal set of SCEV predicates in order to return an exact
		/// answer.
		sanjoy: Describe `UseAssumptions` here.
ExitLimit computeExitLimitFromICmp(const Loop *L,		ExitLimit computeExitLimitFromICmp(const Loop *L,
ICmpInst *ExitCond,		ICmpInst *ExitCond,
BasicBlock *TBB,		BasicBlock *TBB,
BasicBlock *FBB,		BasicBlock *FBB,
bool IsSubExpr);		bool IsSubExpr,
		bool AllowPredicates = false);

/// Compute the number of times the backedge of the specified loop will		/// Compute the number of times the backedge of the specified loop will
/// execute if its exit condition were a switch with a single exiting case		/// execute if its exit condition were a switch with a single exiting case
/// to ExitingBB.		/// to ExitingBB.
ExitLimit		ExitLimit
computeExitLimitFromSingleExitSwitch(const Loop *L, SwitchInst *Switch,		computeExitLimitFromSingleExitSwitch(const Loop *L, SwitchInst *Switch,
BasicBlock *ExitingBB, bool IsSubExpr);		BasicBlock *ExitingBB, bool IsSubExpr);

Show All 21 Lines	private:
/// (true or false). If we cannot evaluate the exit count of the loop,		/// (true or false). If we cannot evaluate the exit count of the loop,
/// return CouldNotCompute.		/// return CouldNotCompute.
const SCEV *computeExitCountExhaustively(const Loop *L,		const SCEV *computeExitCountExhaustively(const Loop *L,
Value *Cond,		Value *Cond,
bool ExitWhen);		bool ExitWhen);

/// Return the number of times an exit condition comparing the specified		/// Return the number of times an exit condition comparing the specified
/// value to zero will execute. If not computable, return CouldNotCompute.		/// value to zero will execute. If not computable, return CouldNotCompute.
ExitLimit HowFarToZero(const SCEV *V, const Loop *L, bool IsSubExpr);		/// If AllowPredicates is set, this call will try to use a minimal set of
		/// SCEV predicates in order to return an exact answer.
		sanjoy: I don't think `Force` is a good name -- how about `CreateAssumptions` or `CreatePredicates`? Same comment below.
		ExitLimit HowFarToZero(const SCEV *V, const Loop *L, bool IsSubExpr,
		bool AllowPredicates = false);

/// Return the number of times an exit condition checking the specified		/// Return the number of times an exit condition checking the specified
/// value for nonzero will execute. If not computable, return		/// value for nonzero will execute. If not computable, return
/// CouldNotCompute.		/// CouldNotCompute.
ExitLimit HowFarToNonZero(const SCEV *V, const Loop *L);		ExitLimit HowFarToNonZero(const SCEV *V, const Loop *L);

/// Return the number of times an exit condition containing the specified		/// Return the number of times an exit condition containing the specified
/// less-than comparison will execute. If not computable, return		/// less-than comparison will execute. If not computable, return
/// CouldNotCompute. isSigned specifies whether the less-than is signed.		/// CouldNotCompute. isSigned specifies whether the less-than is signed.
ExitLimit HowManyLessThans(const SCEV *LHS, const SCEV *RHS,		/// If AllowPredicates is set, this call will try to use a minimal set of
const Loop *L, bool isSigned, bool IsSubExpr);		/// SCEV predicates in order to return an exact answer.
		ExitLimit HowManyLessThans(const SCEV *LHS, const SCEV *RHS, const Loop *L,
		bool isSigned, bool IsSubExpr,
		bool AllowPredicates = false);

ExitLimit HowManyGreaterThans(const SCEV *LHS, const SCEV *RHS,		ExitLimit HowManyGreaterThans(const SCEV *LHS, const SCEV *RHS,
const Loop *L, bool isSigned, bool IsSubExpr);		const Loop *L, bool isSigned, bool IsSubExpr,
		bool AllowPredicates = false);

/// Return a predecessor of BB (which may not be an immediate predecessor)		/// Return a predecessor of BB (which may not be an immediate predecessor)
/// which has exactly one successor from which BB is reachable, or null if		/// which has exactly one successor from which BB is reachable, or null if
/// no such block is found.		/// no such block is found.
std::pair<BasicBlock *, BasicBlock *>		std::pair<BasicBlock *, BasicBlock *>
getPredecessorWithUniqueSuccessorForBB(BasicBlock *BB);		getPredecessorWithUniqueSuccessorForBB(BasicBlock *BB);

/// Test whether the condition described by Pred, LHS, and RHS is true		/// Test whether the condition described by Pred, LHS, and RHS is true
▲ Show 20 Lines • Show All 361 Lines • ▼ Show 20 Lines	public:
/// outside the loop.		/// outside the loop.
///		///
/// Note that it is not valid to call this method on a loop without a		/// Note that it is not valid to call this method on a loop without a
/// loop-invariant backedge-taken count (see		/// loop-invariant backedge-taken count (see
/// hasLoopInvariantBackedgeTakenCount).		/// hasLoopInvariantBackedgeTakenCount).
///		///
const SCEV *getBackedgeTakenCount(const Loop *L);		const SCEV *getBackedgeTakenCount(const Loop *L);

		/// Similar to getBackedgeTakenCount, except it will add a set of
		sanjoy: Add a comment for this (fine to extend the one on `getBackedgeTakenCount`)
		/// SCEV predicates to Predicates that are required to be true in order for
		/// the answer to be correct. Predicates can be checked with run-time
		/// checks and can be used to perform loop versioning.
		const SCEV *getPredicatedBackedgeTakenCount(const Loop *L,
		SCEVUnionPredicate &Predicates);
		sanjoy: Nit: indentation.

/// Similar to getBackedgeTakenCount, except return the least SCEV value		/// Similar to getBackedgeTakenCount, except return the least SCEV value
/// that is known never to be less than the actual backedge taken count.		/// that is known never to be less than the actual backedge taken count.
const SCEV *getMaxBackedgeTakenCount(const Loop *L);		const SCEV *getMaxBackedgeTakenCount(const Loop *L);

/// Return true if the specified loop has an analyzable loop-invariant		/// Return true if the specified loop has an analyzable loop-invariant
/// backedge-taken count.		/// backedge-taken count.
bool hasLoopInvariantBackedgeTakenCount(const Loop *L);		bool hasLoopInvariantBackedgeTakenCount(const Loop *L);

▲ Show 20 Lines • Show All 309 Lines • ▼ Show 20 Lines	public:
PredicatedScalarEvolution(ScalarEvolution &SE, Loop &L);		PredicatedScalarEvolution(ScalarEvolution &SE, Loop &L);
const SCEVUnionPredicate &getUnionPredicate() const;		const SCEVUnionPredicate &getUnionPredicate() const;
/// \brief Returns the SCEV expression of V, in the context of the current		/// \brief Returns the SCEV expression of V, in the context of the current
/// SCEV predicate.		/// SCEV predicate.
/// The order of transformations applied on the expression of V returned		/// The order of transformations applied on the expression of V returned
/// by ScalarEvolution is guaranteed to be preserved, even when adding new		/// by ScalarEvolution is guaranteed to be preserved, even when adding new
/// predicates.		/// predicates.
const SCEV *getSCEV(Value *V);		const SCEV *getSCEV(Value *V);
		/// Get the (predicated) backedge count for the analyzed loop.
		const SCEV *getBackedgeTakenCount();
/// \brief Adds a new predicate.		/// \brief Adds a new predicate.
void addPredicate(const SCEVPredicate &Pred);		void addPredicate(const SCEVPredicate &Pred);
/// \brief Attempts to produce an AddRecExpr for V by adding additional		/// \brief Attempts to produce an AddRecExpr for V by adding additional
/// SCEV predicates. If we can't transform the expression into an		/// SCEV predicates. If we can't transform the expression into an
/// AddRecExpr we return nullptr and not add additional SCEV predicates		/// AddRecExpr we return nullptr and not add additional SCEV predicates
/// to the current context.		/// to the current context.
const SCEVAddRecExpr *getAsAddRec(Value *V);		const SCEVAddRecExpr *getAsAddRec(Value *V);
/// \brief Proves that V doesn't overflow by adding SCEV predicate.		/// \brief Proves that V doesn't overflow by adding SCEV predicate.
Show All 27 Lines	private:
/// The SCEVPredicate that forms our context. We will rewrite all		/// The SCEVPredicate that forms our context. We will rewrite all
/// expressions assuming that this predicate true.		/// expressions assuming that this predicate true.
SCEVUnionPredicate Preds;		SCEVUnionPredicate Preds;
/// Marks the version of the SCEV predicate used. When rewriting a SCEV		/// Marks the version of the SCEV predicate used. When rewriting a SCEV
/// expression we mark it with the version of the predicate. We use this to		/// expression we mark it with the version of the predicate. We use this to
/// figure out if the predicate has changed from the last rewrite of the		/// figure out if the predicate has changed from the last rewrite of the
/// SCEV. If so, we need to perform a new rewrite.		/// SCEV. If so, we need to perform a new rewrite.
unsigned Generation;		unsigned Generation;
		/// The backedge taken count.
		const SCEV *BackedgeCount;
};		};
}		}

#endif		#endif

lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 134 Lines • ▼ Show 20 Lines	void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, bool WritePtr,
const SCEV *ScStart;		const SCEV *ScStart;
const SCEV *ScEnd;		const SCEV *ScEnd;

if (SE->isLoopInvariant(Sc, Lp))		if (SE->isLoopInvariant(Sc, Lp))
ScStart = ScEnd = Sc;		ScStart = ScEnd = Sc;
else {		else {
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);
assert(AR && "Invalid addrec expression");		assert(AR && "Invalid addrec expression");
const SCEV *Ex = SE->getBackedgeTakenCount(Lp);		const SCEV *Ex = PSE.getBackedgeTakenCount();

ScStart = AR->getStart();		ScStart = AR->getStart();
ScEnd = AR->evaluateAtIteration(Ex, *SE);		ScEnd = AR->evaluateAtIteration(Ex, *SE);
const SCEV *Step = AR->getStepRecurrence(*SE);		const SCEV *Step = AR->getStepRecurrence(*SE);

// For expressions with negative step, the upper bound is ScStart and the		// For expressions with negative step, the upper bound is ScStart and the
// lower bound is ScEnd.		// lower bound is ScEnd.
if (const SCEVConstant *CStep = dyn_cast<const SCEVConstant>(Step)) {		if (const SCEVConstant *CStep = dyn_cast<const SCEVConstant>(Step)) {
▲ Show 20 Lines • Show All 1,303 Lines • ▼ Show 20 Lines	if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
DEBUG(dbgs() << "LAA: loop control flow is not understood by analyzer\n");		DEBUG(dbgs() << "LAA: loop control flow is not understood by analyzer\n");
emitAnalysis(		emitAnalysis(
LoopAccessReport() <<		LoopAccessReport() <<
"loop control flow is not understood by analyzer");		"loop control flow is not understood by analyzer");
return false;		return false;
}		}

// ScalarEvolution needs to be able to find the exit count.		// ScalarEvolution needs to be able to find the exit count.
const SCEV *ExitCount = PSE.getSE()->getBackedgeTakenCount(TheLoop);		const SCEV *ExitCount = PSE.getBackedgeTakenCount();
if (ExitCount == PSE.getSE()->getCouldNotCompute()) {		if (ExitCount == PSE.getSE()->getCouldNotCompute()) {
emitAnalysis(LoopAccessReport()		emitAnalysis(LoopAccessReport()
<< "could not determine number of loop iterations");		<< "could not determine number of loop iterations");
DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n");		DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n");
return false;		return false;
}		}

return true;		return true;
▲ Show 20 Lines • Show All 506 Lines • Show Last 20 Lines

lib/Analysis/ScalarEvolution.cpp


Show First 20 Lines • Show All 5,217 Lines • ▼ Show 20 Lines

// getExitCount - Get the expression for the number of loop iterations for which		// getExitCount - Get the expression for the number of loop iterations for which
// this loop is guaranteed not to exit via ExitingBlock. Otherwise return		// this loop is guaranteed not to exit via ExitingBlock. Otherwise return
// SCEVCouldNotCompute.		// SCEVCouldNotCompute.
const SCEV *ScalarEvolution::getExitCount(Loop *L, BasicBlock *ExitingBlock) {		const SCEV *ScalarEvolution::getExitCount(Loop *L, BasicBlock *ExitingBlock) {
return getBackedgeTakenInfo(L).getExact(ExitingBlock, this);		return getBackedgeTakenInfo(L).getExact(ExitingBlock, this);
}		}

		const SCEV *
		ScalarEvolution::getPredicatedBackedgeTakenCount(const Loop *L,
		SCEVUnionPredicate &Preds) {
		return getPredicatedBackedgeTakenInfo(L).getExact(this, &Preds);
		}

/// getBackedgeTakenCount - If the specified loop has a predictable		/// getBackedgeTakenCount - If the specified loop has a predictable
/// backedge-taken count, return it, otherwise return a SCEVCouldNotCompute		/// backedge-taken count, return it, otherwise return a SCEVCouldNotCompute
/// object. The backedge-taken count is the number of times the loop header		/// object. The backedge-taken count is the number of times the loop header
/// will be branched to from within the loop. This is one less than the		/// will be branched to from within the loop. This is one less than the
/// trip count of the loop, since it doesn't count the first iteration,		/// trip count of the loop, since it doesn't count the first iteration,
/// when the header is branched to from outside the loop.		/// when the header is branched to from outside the loop.
///		///
/// Note that it is not valid to call this method on a loop without a		/// Note that it is not valid to call this method on a loop without a
Show All 19 Lines	PushLoopPHIs(const Loop *L, SmallVectorImpl<Instruction *> &Worklist) {

// Push all Loop-header PHIs onto the Worklist stack.		// Push all Loop-header PHIs onto the Worklist stack.
for (BasicBlock::iterator I = Header->begin();		for (BasicBlock::iterator I = Header->begin();
PHINode *PN = dyn_cast<PHINode>(I); ++I)		PHINode *PN = dyn_cast<PHINode>(I); ++I)
Worklist.push_back(PN);		Worklist.push_back(PN);
}		}

const ScalarEvolution::BackedgeTakenInfo &		const ScalarEvolution::BackedgeTakenInfo &
		ScalarEvolution::getPredicatedBackedgeTakenInfo(const Loop *L) {
		auto &BTI = getBackedgeTakenInfo(L);
		if (BTI.hasFullInfo())
		return BTI;

		auto Pair = PredicatedBackedgeTakenCounts.insert({L, BackedgeTakenInfo()});
		sanjoy: Use `auto` here.

		if (!Pair.second)
		return Pair.first->second;

		BackedgeTakenInfo Result =
		computeBackedgeTakenCount(L, /*AllowPredicates=*/true);
		sanjoy: Add a `/* IsGuarded = */` before the `true`.

		return PredicatedBackedgeTakenCounts.find(L)->second = Result;
		}

		const ScalarEvolution::BackedgeTakenInfo &
ScalarEvolution::getBackedgeTakenInfo(const Loop *L) {		ScalarEvolution::getBackedgeTakenInfo(const Loop *L) {
// Initially insert an invalid entry for this loop. If the insertion		// Initially insert an invalid entry for this loop. If the insertion
// succeeds, proceed to actually compute a backedge-taken count and		// succeeds, proceed to actually compute a backedge-taken count and
// update the value. The temporary CouldNotCompute value tells SCEV		// update the value. The temporary CouldNotCompute value tells SCEV
// code elsewhere that it shouldn't attempt to request a new		// code elsewhere that it shouldn't attempt to request a new
// backedge-taken count, which could result in infinite recursion.		// backedge-taken count, which could result in infinite recursion.
std::pair<DenseMap<const Loop *, BackedgeTakenInfo>::iterator, bool> Pair =		std::pair<DenseMap<const Loop *, BackedgeTakenInfo>::iterator, bool> Pair =
BackedgeTakenCounts.insert({L, BackedgeTakenInfo()});		BackedgeTakenCounts.insert({L, BackedgeTakenInfo()});
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	ScalarEvolution::getBackedgeTakenInfo(const Loop *L) {
return BackedgeTakenCounts.find(L)->second = Result;		return BackedgeTakenCounts.find(L)->second = Result;
}		}

/// forgetLoop - This method should be called by the client when it has		/// forgetLoop - This method should be called by the client when it has
/// changed a loop in a way that may effect ScalarEvolution's ability to		/// changed a loop in a way that may effect ScalarEvolution's ability to
/// compute a trip count, or if the loop is deleted.		/// compute a trip count, or if the loop is deleted.
void ScalarEvolution::forgetLoop(const Loop *L) {		void ScalarEvolution::forgetLoop(const Loop *L) {
// Drop any stored trip count value.		// Drop any stored trip count value.
DenseMap<const Loop*, BackedgeTakenInfo>::iterator BTCPos =		auto RemoveLoopFromBackedgeMap =
BackedgeTakenCounts.find(L);		[L](DenseMap<const Loop *, BackedgeTakenInfo> &Map) {
		sanjoy: Usually I see lambdas formatted with no space between the `]` and the `(`. I'd say just run the diff through `clang-format` before checkin, and whatever it does is fine. :)
if (BTCPos != BackedgeTakenCounts.end()) {		auto BTCPos = Map.find(L);
		if (BTCPos != Map.end()) {
BTCPos->second.clear();		BTCPos->second.clear();
BackedgeTakenCounts.erase(BTCPos);		Map.erase(BTCPos);
}		}
		};

		RemoveLoopFromBackedgeMap(BackedgeTakenCounts);
		sanjoy: What you have here is fine, but I'd have tried: `for (auto &Map : {BackedgeTakenCounts, PredicatedBackedgeTakenCounts}) { // Remove L from Map }` (not 100% sure if the above will work).
		sbaranga: Sanjoy, I've left this as is for now. Let me know if you prefer the form above, and I'll try to change the code to use it.
		sanjoy: sgtm
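		Note on the range-for idea in this thread: a braced list of the maps themselves would deduce a `std::initializer_list` of const copies, so erasing through it would not touch the originals. A minimal sketch of a variant that iterates over pointers instead, assuming `std::map` as a stand-in for `DenseMap` (the names `CountMap` and `forgetKey` are hypothetical and not part of the patch):

```cpp
#include <initializer_list>
#include <map>
#include <string>

// Stand-in for the two caches; std::map replaces llvm::DenseMap here.
using CountMap = std::map<std::string, int>;

// Erase Key from both maps. The braced list holds pointers, so the loop
// mutates the caller's maps rather than temporary copies of them.
inline void forgetKey(const std::string &Key, CountMap &Plain,
                      CountMap &Predicated) {
  for (CountMap *Map : {&Plain, &Predicated}) {
    auto Pos = Map->find(Key);
    if (Pos != Map->end())
      Map->erase(Pos);
  }
}
```

		The lambda in the patch achieves the same effect; the pointer list only trades two explicit calls for one loop.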
		RemoveLoopFromBackedgeMap(PredicatedBackedgeTakenCounts);

// Drop information about expressions based on loop-header PHIs.		// Drop information about expressions based on loop-header PHIs.
SmallVector<Instruction *, 16> Worklist;		SmallVector<Instruction *, 16> Worklist;
PushLoopPHIs(L, Worklist);		PushLoopPHIs(L, Worklist);

SmallPtrSet<Instruction *, 8> Visited;		SmallPtrSet<Instruction *, 8> Visited;
while (!Worklist.empty()) {		while (!Worklist.empty()) {
Instruction *I = Worklist.pop_back_val();		Instruction *I = Worklist.pop_back_val();
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
/// exits. A computable result can only be returned for loops with a single		/// exits. A computable result can only be returned for loops with a single
/// exit. Returning the minimum taken count among all exits is incorrect		/// exit. Returning the minimum taken count among all exits is incorrect
/// because one of the loop's exit limit's may have been skipped. HowFarToZero		/// because one of the loop's exit limit's may have been skipped. HowFarToZero
/// assumes that the limit of each loop test is never skipped. This is a valid		/// assumes that the limit of each loop test is never skipped. This is a valid
/// assumption as long as the loop exits via that test. For precise results, it		/// assumption as long as the loop exits via that test. For precise results, it
/// is the caller's responsibility to specify the relevant loop exit using		/// is the caller's responsibility to specify the relevant loop exit using
/// getExact(ExitingBlock, SE).		/// getExact(ExitingBlock, SE).
const SCEV *		const SCEV *
ScalarEvolution::BackedgeTakenInfo::getExact(ScalarEvolution *SE) const {		ScalarEvolution::BackedgeTakenInfo::getExact(
		ScalarEvolution *SE, SCEVUnionPredicate *Preds) const {
// If any exits were not computable, the loop is not computable.		// If any exits were not computable, the loop is not computable.
if (!ExitNotTaken.isCompleteList()) return SE->getCouldNotCompute();		if (!ExitNotTaken.isCompleteList()) return SE->getCouldNotCompute();

// We need exactly one computable exit.		// We need exactly one computable exit.
if (!ExitNotTaken.ExitingBlock) return SE->getCouldNotCompute();		if (!ExitNotTaken.ExitingBlock) return SE->getCouldNotCompute();
assert(ExitNotTaken.ExactNotTaken && "uninitialized not-taken info");		assert(ExitNotTaken.ExactNotTaken && "uninitialized not-taken info");

const SCEV *BECount = nullptr;		const SCEV *BECount = nullptr;
for (const ExitNotTakenInfo *ENT = &ExitNotTaken;		for (auto &ENT : ExitNotTaken) {
ENT != nullptr; ENT = ENT->getNextExit()) {		assert(ENT.ExactNotTaken != SE->getCouldNotCompute() && "bad exit SCEV");

assert(ENT->ExactNotTaken != SE->getCouldNotCompute() && "bad exit SCEV");

if (!BECount)		if (!BECount)
BECount = ENT->ExactNotTaken;		BECount = ENT.ExactNotTaken;
else if (BECount != ENT->ExactNotTaken)		else if (BECount != ENT.ExactNotTaken)
return SE->getCouldNotCompute();		return SE->getCouldNotCompute();
		if (Preds && ENT.getPred())
		Preds->add(ENT.getPred());

		assert((Preds || ENT.hasAlwaysTruePred()) &&
		"Predicate should be always true!");
}		}

assert(BECount && "Invalid not taken count for loop exit");		assert(BECount && "Invalid not taken count for loop exit");
return BECount;		return BECount;
}		}
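		The new `getExact` loop above requires every exit to agree on the not-taken count and, when a predicate set is supplied, collects each exit's predicate into it. A simplified sketch of that shape, assuming strings as stand-ins for SCEV predicates and `-1` as a stand-in for `CouldNotCompute` (all names here are hypothetical):

```cpp
#include <set>
#include <string>
#include <vector>

constexpr int CouldNotComputeSketch = -1; // stand-in for SCEVCouldNotCompute

struct ExitSketch {
  int ExactNotTaken;
  std::string Pred; // empty string models an always-true predicate
};

// Return the common not-taken count of all exits, or the sentinel if two
// exits disagree. Non-trivial predicates are added to *Preds when provided.
inline int exactCountSketch(const std::vector<ExitSketch> &Exits,
                            std::set<std::string> *Preds) {
  bool HaveCount = false;
  int Count = CouldNotComputeSketch;
  for (const ExitSketch &E : Exits) {
    if (!HaveCount) {
      Count = E.ExactNotTaken;
      HaveCount = true;
    } else if (Count != E.ExactNotTaken) {
      return CouldNotComputeSketch;
    }
    if (Preds && !E.Pred.empty())
      Preds->insert(E.Pred);
  }
  return Count;
}
```

		The sketch omits the patch's assertion that callers passing no predicate set only ever see always-true predicates.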

/// getExact - Get the exact not taken count for this loop exit.		/// getExact - Get the exact not taken count for this loop exit.
const SCEV *		const SCEV *
ScalarEvolution::BackedgeTakenInfo::getExact(BasicBlock *ExitingBlock,		ScalarEvolution::BackedgeTakenInfo::getExact(BasicBlock *ExitingBlock,
ScalarEvolution *SE) const {		ScalarEvolution *SE) const {
for (const ExitNotTakenInfo *ENT = &ExitNotTaken;		for (auto &ENT : ExitNotTaken)
ENT != nullptr; ENT = ENT->getNextExit()) {		if (ENT.ExitingBlock == ExitingBlock && ENT.hasAlwaysTruePred())
		return ENT.ExactNotTaken;

if (ENT->ExitingBlock == ExitingBlock)
return ENT->ExactNotTaken;
}
return SE->getCouldNotCompute();		return SE->getCouldNotCompute();
}		}

/// getMax - Get the max backedge taken count for the loop.		/// getMax - Get the max backedge taken count for the loop.
const SCEV *		const SCEV *
ScalarEvolution::BackedgeTakenInfo::getMax(ScalarEvolution *SE) const {		ScalarEvolution::BackedgeTakenInfo::getMax(ScalarEvolution *SE) const {
		for (auto &ENT : ExitNotTaken)
		if (!ENT.hasAlwaysTruePred())
		return SE->getCouldNotCompute();

return Max ? Max : SE->getCouldNotCompute();		return Max ? Max : SE->getCouldNotCompute();
}		}

bool ScalarEvolution::BackedgeTakenInfo::hasOperand(const SCEV *S,		bool ScalarEvolution::BackedgeTakenInfo::hasOperand(const SCEV *S,
ScalarEvolution *SE) const {		ScalarEvolution *SE) const {
if (Max && Max != SE->getCouldNotCompute() && SE->hasOperand(Max, S))		if (Max && Max != SE->getCouldNotCompute() && SE->hasOperand(Max, S))
		sanjoy: I don't think you need the parens around `ENT->Pred`.
return true;		return true;

if (!ExitNotTaken.ExitingBlock)		if (!ExitNotTaken.ExitingBlock)
return false;		return false;

for (const ExitNotTakenInfo *ENT = &ExitNotTaken;		for (auto &ENT : ExitNotTaken)
ENT != nullptr; ENT = ENT->getNextExit()) {		if (ENT.ExactNotTaken != SE->getCouldNotCompute() &&
		SE->hasOperand(ENT.ExactNotTaken, S))
if (ENT->ExactNotTaken != SE->getCouldNotCompute()
&& SE->hasOperand(ENT->ExactNotTaken, S)) {
return true;		return true;
}
}
return false;		return false;
}		}

/// Allocate memory for BackedgeTakenInfo and copy the not-taken count of each		/// Allocate memory for BackedgeTakenInfo and copy the not-taken count of each
/// computable exit into a persistent ExitNotTakenInfo array.		/// computable exit into a persistent ExitNotTakenInfo array.
ScalarEvolution::BackedgeTakenInfo::BackedgeTakenInfo(		ScalarEvolution::BackedgeTakenInfo::BackedgeTakenInfo(
SmallVectorImpl< std::pair<BasicBlock *, const SCEV *> > &ExitCounts,		SmallVectorImpl<EdgeInfo> &ExitCounts, bool Complete, const SCEV *MaxCount)
bool Complete, const SCEV *MaxCount) : Max(MaxCount) {		: Max(MaxCount) {

if (!Complete)		if (!Complete)
ExitNotTaken.setIncomplete();		ExitNotTaken.setIncomplete();

unsigned NumExits = ExitCounts.size();		unsigned NumExits = ExitCounts.size();
if (NumExits == 0) return;		if (NumExits == 0) return;

ExitNotTaken.ExitingBlock = ExitCounts[0].first;		ExitNotTaken.ExitingBlock = ExitCounts[0].ExitBlock;
ExitNotTaken.ExactNotTaken = ExitCounts[0].second;		ExitNotTaken.ExactNotTaken = ExitCounts[0].Taken;
if (NumExits == 1) return;
		// Determine the number of ExitNotTakenExtras structures that we need.
		unsigned ExtraInfoSize = 0;
		if (NumExits > 1)
		ExtraInfoSize = 1 + std::count_if(std::next(ExitCounts.begin()),
		ExitCounts.end(), [](EdgeInfo &Entry) {
		return !Entry.Pred.isAlwaysTrue();
		});
		else if (!ExitCounts[0].Pred.isAlwaysTrue())
		ExtraInfoSize = 1;

		ExitNotTakenExtras *ENT = nullptr;

		// Allocate the ExitNotTakenExtras structures and initialize the first
		// element (ExitNotTaken).
		if (ExtraInfoSize > 0) {
		ENT = new ExitNotTakenExtras[ExtraInfoSize];
		ExitNotTaken.ExtraInfo.setPointer(&ENT[0]);
		*ExitNotTaken.getPred() = std::move(ExitCounts[0].Pred);
		}

		if (NumExits == 1)
		return;

		auto &Exits = ExitNotTaken.ExtraInfo.getPointer()->Exits;
		sanjoy: Can you please do one of the following:
		- Comment which numeric field is which
		- Cache the three fields in three local variables, with more mnemonic names
		Honestly, reading the code, a `struct` with named fields would have been better; but we can fix that later.

// Handle the rare case of multiple computable exits.		// Handle the rare case of multiple computable exits.
ExitNotTakenInfo *ENT = new ExitNotTakenInfo[NumExits-1];		for (unsigned i = 1, PredPos = 1; i < NumExits; ++i) {
		ExitNotTakenExtras *Ptr = nullptr;
		if (!ExitCounts[i].Pred.isAlwaysTrue()) {
		Ptr = &ENT[PredPos++];
		sanjoy: Are you unconditionally assuming `!std::get<2>(ExitCounts[0]).isAlwaysTrue()` here (given that you're unconditionally adding `1`)? If so, please add a brief comment on why that is okay.
		sbaranga: No, we know the first ExitNotTakenInfo has an ExtraInfo struct because we have more than 1 exit (so there's no need to check the predicate). The other entries only have an ExtraInfo if they have a not always true SCEVPredicate. Perhaps ExtraInfoSize instead of PredsSize would have been better here.
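		The sizing rule described in this reply can be sketched in isolation. This is a simplified model, assuming a hypothetical `EdgeSketch` struct that records only whether an exit's predicate is always true; it mirrors the `ExtraInfoSize` computation in the constructor but is not the patch's code:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

struct EdgeSketch {
  bool PredAlwaysTrue; // models EdgeInfo::Pred.isAlwaysTrue()
};

// Number of extra-info slots to allocate:
//  - no exits: none;
//  - one exit: a slot only if its predicate is not always true;
//  - several exits: one slot for the first exit unconditionally (it owns the
//    list of remaining exits), plus one per later exit whose predicate is
//    not always true.
inline unsigned extraInfoSizeSketch(const std::vector<EdgeSketch> &Exits) {
  if (Exits.empty())
    return 0;
  if (Exits.size() == 1)
    return Exits[0].PredAlwaysTrue ? 0 : 1;
  auto Extra = std::count_if(
      std::next(Exits.begin()), Exits.end(),
      [](const EdgeSketch &E) { return !E.PredAlwaysTrue; });
  return 1 + static_cast<unsigned>(Extra);
}
```

		With several exits the first slot is reserved whatever the first predicate is, which is why the loop's `PredPos` starts at 1.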
		Ptr->Pred = std::move(ExitCounts[i].Pred);
		}

ExitNotTakenInfo *PrevENT = &ExitNotTaken;		Exits.emplace_back(ExitCounts[i].ExitBlock, ExitCounts[i].Taken, Ptr);
for (unsigned i = 1; i < NumExits; ++i, PrevENT = ENT, ++ENT) {
PrevENT->setNextExit(ENT);
ENT->ExitingBlock = ExitCounts[i].first;
ENT->ExactNotTaken = ExitCounts[i].second;
}		}
}		}
		sanjoy (Done): Can you move this `auto &Exits` out of the loop?

/// clear - Invalidate this result and free the ExitNotTakenInfo array.		/// clear - Invalidate this result and free the ExitNotTakenInfo array.
void ScalarEvolution::BackedgeTakenInfo::clear() {		void ScalarEvolution::BackedgeTakenInfo::clear() {
ExitNotTaken.ExitingBlock = nullptr;		ExitNotTaken.ExitingBlock = nullptr;
ExitNotTaken.ExactNotTaken = nullptr;		ExitNotTaken.ExactNotTaken = nullptr;
		sanjoy (Done): Can this just be `Exits.emplace_back(ExitCounts[i].ExitBlock, ExitCounts[i].taken, Ptr)`?
delete[] ExitNotTaken.getNextExit();		delete[] ExitNotTaken.ExtraInfo.getPointer();
}		}

		sanjoy (Done): Why not move this to the statement where you assign to `Ptr`: `if (!ExitCounts[i].Pred.isAlwaysTrue()) { Ptr = &ENT[PredPos++]; Ptr->Pred = ... }`
/// computeBackedgeTakenCount - Compute the number of times the backedge		/// computeBackedgeTakenCount - Compute the number of times the backedge
/// of the specified loop will execute.		/// of the specified loop will execute.
ScalarEvolution::BackedgeTakenInfo		ScalarEvolution::BackedgeTakenInfo
ScalarEvolution::computeBackedgeTakenCount(const Loop *L) {		ScalarEvolution::computeBackedgeTakenCount(const Loop *L,
		bool AllowPredicates) {
SmallVector<BasicBlock *, 8> ExitingBlocks;		SmallVector<BasicBlock *, 8> ExitingBlocks;
L->getExitingBlocks(ExitingBlocks);		L->getExitingBlocks(ExitingBlocks);

SmallVector<std::pair<BasicBlock *, const SCEV *>, 4> ExitCounts;		SmallVector<EdgeInfo, 4> ExitCounts;
bool CouldComputeBECount = true;		bool CouldComputeBECount = true;
BasicBlock *Latch = L->getLoopLatch(); // may be NULL.		BasicBlock *Latch = L->getLoopLatch(); // may be NULL.
const SCEV *MustExitMaxBECount = nullptr;		const SCEV *MustExitMaxBECount = nullptr;
const SCEV *MayExitMaxBECount = nullptr;		const SCEV *MayExitMaxBECount = nullptr;

// Compute the ExitLimit for each loop exit. Use this to populate ExitCounts		// Compute the ExitLimit for each loop exit. Use this to populate ExitCounts
// and compute maxBECount.		// and compute maxBECount.
		// Do a union of all the predicates here.
for (unsigned i = 0, e = ExitingBlocks.size(); i != e; ++i) {		for (unsigned i = 0, e = ExitingBlocks.size(); i != e; ++i) {
BasicBlock *ExitBB = ExitingBlocks[i];		BasicBlock *ExitBB = ExitingBlocks[i];
ExitLimit EL = computeExitLimit(L, ExitBB);		ExitLimit EL = computeExitLimit(L, ExitBB, AllowPredicates);

		assert((AllowPredicates || EL.Pred.isAlwaysTrue()) &&
		"Predicated exit limit when predicates are not allowed!");
		sanjoy: Add an assert here that if `Guarded` is false, then the predicate in `EL` is trivially true (i.e. we didn't add a predicate when we were not supposed to).

// 1. For each exit that can be computed, add an entry to ExitCounts.		// 1. For each exit that can be computed, add an entry to ExitCounts.
// CouldComputeBECount is true only if all exits can be computed.		// CouldComputeBECount is true only if all exits can be computed.
if (EL.Exact == getCouldNotCompute())		if (EL.Exact == getCouldNotCompute())
// We couldn't compute an exact value for this exit, so		// We couldn't compute an exact value for this exit, so
// we won't be able to compute an exact value for the loop.		// we won't be able to compute an exact value for the loop.
CouldComputeBECount = false;		CouldComputeBECount = false;
else		else
ExitCounts.push_back({ExitBB, EL.Exact});		ExitCounts.emplace_back(EdgeInfo(ExitBB, EL.Exact, EL.Pred));

		sanjoy: This does not look right to me -- won't the `EL` be freed after each iteration, leaving a dangling pointer in `ExitCountPreds`?
		sbaranga: Of course, thanks for catching this!
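		The lifetime problem raised here is general: an object declared inside a loop body is destroyed at the end of each iteration, so a stored pointer to it dangles. A minimal sketch of the by-value alternative, assuming hypothetical `LimitSketch` and `EdgeSketch2` types with a vector of strings standing in for a SCEV predicate:

```cpp
#include <string>
#include <utility>
#include <vector>

struct LimitSketch {
  int Exact;
  std::vector<std::string> Pred; // stand-in for a SCEVUnionPredicate
};

struct EdgeSketch2 {
  int Block;
  int Taken;
  std::vector<std::string> Pred; // owned copy, not a pointer into a local
};

inline std::vector<EdgeSketch2> collectEdges(int NumExits) {
  std::vector<EdgeSketch2> Edges;
  for (int i = 0; i < NumExits; ++i) {
    // EL lives only for this iteration; keeping &EL.Pred would dangle.
    LimitSketch EL{i * 10, {"no-overflow#" + std::to_string(i)}};
    // Move the predicate into the stored entry so it outlives EL.
    Edges.push_back(EdgeSketch2{i, EL.Exact, std::move(EL.Pred)});
  }
  return Edges;
}
```

		The revised diff takes the same approach: `EdgeInfo` is constructed from `EL.Pred` and stored by value in `ExitCounts`.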
// 2. Derive the loop's MaxBECount from each exit's max number of		// 2. Derive the loop's MaxBECount from each exit's max number of
// non-exiting iterations. Partition the loop exits into two kinds:		// non-exiting iterations. Partition the loop exits into two kinds:
// LoopMustExits and LoopMayExits.		// LoopMustExits and LoopMayExits.
//		//
// If the exit dominates the loop latch, it is a LoopMustExit otherwise it		// If the exit dominates the loop latch, it is a LoopMustExit otherwise it
// is a LoopMayExit. If any computable LoopMustExit is found, then		// is a LoopMayExit. If any computable LoopMustExit is found, then
// MaxBECount is the minimum EL.Max of computable LoopMustExits. Otherwise,		// MaxBECount is the minimum EL.Max of computable LoopMustExits. Otherwise,
// MaxBECount is conservatively the maximum EL.Max, where CouldNotCompute is		// MaxBECount is conservatively the maximum EL.Max, where CouldNotCompute is
Show All 16 Lines	for (unsigned i = 0, e = ExitingBlocks.size(); i != e; ++i) {
}		}
}		}
const SCEV *MaxBECount = MustExitMaxBECount ? MustExitMaxBECount :		const SCEV *MaxBECount = MustExitMaxBECount ? MustExitMaxBECount :
(MayExitMaxBECount ? MayExitMaxBECount : getCouldNotCompute());		(MayExitMaxBECount ? MayExitMaxBECount : getCouldNotCompute());
return BackedgeTakenInfo(ExitCounts, CouldComputeBECount, MaxBECount);		return BackedgeTakenInfo(ExitCounts, CouldComputeBECount, MaxBECount);
}		}

ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::computeExitLimit(const Loop *L, BasicBlock *ExitingBlock) {		ScalarEvolution::computeExitLimit(const Loop *L, BasicBlock *ExitingBlock,
		bool AllowPredicates) {

// Okay, we've chosen an exiting block. See what condition causes us to exit		// Okay, we've chosen an exiting block. See what condition causes us to exit
// at this block and remember the exit block and whether all other targets		// at this block and remember the exit block and whether all other targets
// lead to the loop header.		// lead to the loop header.
bool MustExecuteLoopHeader = true;		bool MustExecuteLoopHeader = true;
BasicBlock *Exit = nullptr;		BasicBlock *Exit = nullptr;
for (auto *SBB : successors(ExitingBlock))		for (auto *SBB : successors(ExitingBlock))
if (!L->contains(SBB)) {		if (!L->contains(SBB)) {
[48 lines hidden]	if (!Ok)
return getCouldNotCompute();		return getCouldNotCompute();
}		}

bool IsOnlyExit = (L->getExitingBlock() != nullptr);		bool IsOnlyExit = (L->getExitingBlock() != nullptr);
TerminatorInst *Term = ExitingBlock->getTerminator();		TerminatorInst *Term = ExitingBlock->getTerminator();
if (BranchInst *BI = dyn_cast<BranchInst>(Term)) {		if (BranchInst *BI = dyn_cast<BranchInst>(Term)) {
assert(BI->isConditional() && "If unconditional, it can't be in loop!");		assert(BI->isConditional() && "If unconditional, it can't be in loop!");
// Proceed to the next level to examine the exit condition expression.		// Proceed to the next level to examine the exit condition expression.
return computeExitLimitFromCond(L, BI->getCondition(), BI->getSuccessor(0),		return computeExitLimitFromCond(
BI->getSuccessor(1),		L, BI->getCondition(), BI->getSuccessor(0), BI->getSuccessor(1),
/*ControlsExit=*/IsOnlyExit);		/*ControlsExit=*/IsOnlyExit, AllowPredicates);
}		}

if (SwitchInst *SI = dyn_cast<SwitchInst>(Term))		if (SwitchInst *SI = dyn_cast<SwitchInst>(Term))
return computeExitLimitFromSingleExitSwitch(L, SI, Exit,		return computeExitLimitFromSingleExitSwitch(L, SI, Exit,
/*ControlsExit=*/IsOnlyExit);		/*ControlsExit=*/IsOnlyExit);

return getCouldNotCompute();		return getCouldNotCompute();
}		}

/// computeExitLimitFromCond - Compute the number of times the		/// computeExitLimitFromCond - Compute the number of times the
/// backedge of the specified loop will execute if its exit condition		/// backedge of the specified loop will execute if its exit condition
/// were a conditional branch of ExitCond, TBB, and FBB.		/// were a conditional branch of ExitCond, TBB, and FBB.
///		///
/// @param ControlsExit is true if ExitCond directly controls the exit		/// @param ControlsExit is true if ExitCond directly controls the exit
/// branch. In this case, we can assume that the loop exits only if the		/// branch. In this case, we can assume that the loop exits only if the
/// condition is true and can infer that failing to meet the condition prior to		/// condition is true and can infer that failing to meet the condition prior to
/// integer wraparound results in undefined behavior.		/// integer wraparound results in undefined behavior.
ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::computeExitLimitFromCond(const Loop *L,		ScalarEvolution::computeExitLimitFromCond(const Loop *L,
Value *ExitCond,		Value *ExitCond,
BasicBlock *TBB,		BasicBlock *TBB,
BasicBlock *FBB,		BasicBlock *FBB,
bool ControlsExit) {		bool ControlsExit,
		bool AllowPredicates) {
// Check if the controlling expression for this loop is an And or Or.		// Check if the controlling expression for this loop is an And or Or.
if (BinaryOperator *BO = dyn_cast<BinaryOperator>(ExitCond)) {		if (BinaryOperator *BO = dyn_cast<BinaryOperator>(ExitCond)) {
if (BO->getOpcode() == Instruction::And) {		if (BO->getOpcode() == Instruction::And) {
// Recurse on the operands of the and.		// Recurse on the operands of the and.
bool EitherMayExit = L->contains(TBB);		bool EitherMayExit = L->contains(TBB);
ExitLimit EL0 = computeExitLimitFromCond(L, BO->getOperand(0), TBB, FBB,		ExitLimit EL0 = computeExitLimitFromCond(L, BO->getOperand(0), TBB, FBB,
ControlsExit && !EitherMayExit);		ControlsExit && !EitherMayExit,
		AllowPredicates);
ExitLimit EL1 = computeExitLimitFromCond(L, BO->getOperand(1), TBB, FBB,		ExitLimit EL1 = computeExitLimitFromCond(L, BO->getOperand(1), TBB, FBB,
ControlsExit && !EitherMayExit);		ControlsExit && !EitherMayExit,
		AllowPredicates);
const SCEV *BECount = getCouldNotCompute();		const SCEV *BECount = getCouldNotCompute();
const SCEV *MaxBECount = getCouldNotCompute();		const SCEV *MaxBECount = getCouldNotCompute();
if (EitherMayExit) {		if (EitherMayExit) {
// Both conditions must be true for the loop to continue executing.		// Both conditions must be true for the loop to continue executing.
// Choose the less conservative count.		// Choose the less conservative count.
if (EL0.Exact == getCouldNotCompute() ||		if (EL0.Exact == getCouldNotCompute() ||
EL1.Exact == getCouldNotCompute())		EL1.Exact == getCouldNotCompute())
BECount = getCouldNotCompute();		BECount = getCouldNotCompute();
[10 lines hidden]	if (BO->getOpcode() == Instruction::And) {
// For now, be conservative.		// For now, be conservative.
assert(L->contains(FBB) && "Loop block has no successor in loop!");		assert(L->contains(FBB) && "Loop block has no successor in loop!");
if (EL0.Max == EL1.Max)		if (EL0.Max == EL1.Max)
MaxBECount = EL0.Max;		MaxBECount = EL0.Max;
if (EL0.Exact == EL1.Exact)		if (EL0.Exact == EL1.Exact)
BECount = EL0.Exact;		BECount = EL0.Exact;
}		}

		SCEVUnionPredicate NP;
		NP.add(&EL0.Pred);
		NP.add(&EL1.Pred);
// There are cases (e.g. PR26207) where computeExitLimitFromCond is able		// There are cases (e.g. PR26207) where computeExitLimitFromCond is able
// to be more aggressive when computing BECount than when computing		// to be more aggressive when computing BECount than when computing
// MaxBECount. In these cases it is possible for EL0.Exact and EL1.Exact		// MaxBECount. In these cases it is possible for EL0.Exact and EL1.Exact
// to match, but for EL0.Max and EL1.Max to not.		// to match, but for EL0.Max and EL1.Max to not.
if (isa<SCEVCouldNotCompute>(MaxBECount) &&		if (isa<SCEVCouldNotCompute>(MaxBECount) &&
!isa<SCEVCouldNotCompute>(BECount))		!isa<SCEVCouldNotCompute>(BECount))
MaxBECount = BECount;		MaxBECount = BECount;

return ExitLimit(BECount, MaxBECount);		return ExitLimit(BECount, MaxBECount, NP);
}		}
if (BO->getOpcode() == Instruction::Or) {		if (BO->getOpcode() == Instruction::Or) {
// Recurse on the operands of the or.		// Recurse on the operands of the or.
bool EitherMayExit = L->contains(FBB);		bool EitherMayExit = L->contains(FBB);
ExitLimit EL0 = computeExitLimitFromCond(L, BO->getOperand(0), TBB, FBB,		ExitLimit EL0 = computeExitLimitFromCond(L, BO->getOperand(0), TBB, FBB,
ControlsExit && !EitherMayExit);		ControlsExit && !EitherMayExit,
		AllowPredicates);
ExitLimit EL1 = computeExitLimitFromCond(L, BO->getOperand(1), TBB, FBB,		ExitLimit EL1 = computeExitLimitFromCond(L, BO->getOperand(1), TBB, FBB,
ControlsExit && !EitherMayExit);		ControlsExit && !EitherMayExit,
		AllowPredicates);
const SCEV *BECount = getCouldNotCompute();		const SCEV *BECount = getCouldNotCompute();
const SCEV *MaxBECount = getCouldNotCompute();		const SCEV *MaxBECount = getCouldNotCompute();
if (EitherMayExit) {		if (EitherMayExit) {
// Both conditions must be false for the loop to continue executing.		// Both conditions must be false for the loop to continue executing.
// Choose the less conservative count.		// Choose the less conservative count.
if (EL0.Exact == getCouldNotCompute() ||		if (EL0.Exact == getCouldNotCompute() ||
EL1.Exact == getCouldNotCompute())		EL1.Exact == getCouldNotCompute())
BECount = getCouldNotCompute();		BECount = getCouldNotCompute();
[10 lines hidden]	if (BO->getOpcode() == Instruction::Or) {
// For now, be conservative.		// For now, be conservative.
assert(L->contains(TBB) && "Loop block has no successor in loop!");		assert(L->contains(TBB) && "Loop block has no successor in loop!");
if (EL0.Max == EL1.Max)		if (EL0.Max == EL1.Max)
MaxBECount = EL0.Max;		MaxBECount = EL0.Max;
if (EL0.Exact == EL1.Exact)		if (EL0.Exact == EL1.Exact)
BECount = EL0.Exact;		BECount = EL0.Exact;
}		}

return ExitLimit(BECount, MaxBECount);		SCEVUnionPredicate NP;
		NP.add(&EL0.Pred);
		NP.add(&EL1.Pred);
		return ExitLimit(BECount, MaxBECount, NP);
}		}
}		}

// With an icmp, it may be feasible to compute an exact backedge-taken count.		// With an icmp, it may be feasible to compute an exact backedge-taken count.
// Proceed to the next level to examine the icmp.		// Proceed to the next level to examine the icmp.
if (ICmpInst *ExitCondICmp = dyn_cast<ICmpInst>(ExitCond))		if (ICmpInst *ExitCondICmp = dyn_cast<ICmpInst>(ExitCond)) {
return computeExitLimitFromICmp(L, ExitCondICmp, TBB, FBB, ControlsExit);		ExitLimit EL =
		computeExitLimitFromICmp(L, ExitCondICmp, TBB, FBB, ControlsExit);
		if (EL.hasFullInfo() || !AllowPredicates)
		return EL;

		// Try again, but use SCEV predicates this time.
		return computeExitLimitFromICmp(L, ExitCondICmp, TBB, FBB, ControlsExit,
		/*AllowPredicates=*/true);
		sanjoy: Please add a comment as `/* IsGuarded = */ true`.
		}

// Check for a constant condition. These are normally stripped out by		// Check for a constant condition. These are normally stripped out by
// SimplifyCFG, but ScalarEvolution may be used by a pass which wishes to		// SimplifyCFG, but ScalarEvolution may be used by a pass which wishes to
// preserve the CFG and is temporarily leaving constant conditions		// preserve the CFG and is temporarily leaving constant conditions
// in place.		// in place.
if (ConstantInt *CI = dyn_cast<ConstantInt>(ExitCond)) {		if (ConstantInt *CI = dyn_cast<ConstantInt>(ExitCond)) {
if (L->contains(FBB) == !CI->getZExtValue())		if (L->contains(FBB) == !CI->getZExtValue())
// The backedge is always taken.		// The backedge is always taken.
return getCouldNotCompute();		return getCouldNotCompute();
else		else
// The backedge is never taken.		// The backedge is never taken.
return getZero(CI->getType());		return getZero(CI->getType());
}		}

// If it's not an integer or pointer comparison then compute it the hard way.		// If it's not an integer or pointer comparison then compute it the hard way.
return computeExitCountExhaustively(L, ExitCond, !L->contains(TBB));		return computeExitCountExhaustively(L, ExitCond, !L->contains(TBB));
}		}

ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::computeExitLimitFromICmp(const Loop *L,		ScalarEvolution::computeExitLimitFromICmp(const Loop *L,
ICmpInst *ExitCond,		ICmpInst *ExitCond,
BasicBlock *TBB,		BasicBlock *TBB,
BasicBlock *FBB,		BasicBlock *FBB,
bool ControlsExit) {		bool ControlsExit,
		bool AllowPredicates) {

// If the condition was exit on true, convert the condition to exit on false		// If the condition was exit on true, convert the condition to exit on false
ICmpInst::Predicate Cond;		ICmpInst::Predicate Cond;
if (!L->contains(FBB))		if (!L->contains(FBB))
Cond = ExitCond->getPredicate();		Cond = ExitCond->getPredicate();
else		else
Cond = ExitCond->getInversePredicate();		Cond = ExitCond->getInversePredicate();

[40 lines hidden]	if (const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(LHS))

const SCEV *Ret = AddRec->getNumIterationsInRange(CompRange, *this);		const SCEV *Ret = AddRec->getNumIterationsInRange(CompRange, *this);
if (!isa<SCEVCouldNotCompute>(Ret)) return Ret;		if (!isa<SCEVCouldNotCompute>(Ret)) return Ret;
}		}

switch (Cond) {		switch (Cond) {
case ICmpInst::ICMP_NE: { // while (X != Y)		case ICmpInst::ICMP_NE: { // while (X != Y)
// Convert to: while (X-Y != 0)		// Convert to: while (X-Y != 0)
ExitLimit EL = HowFarToZero(getMinusSCEV(LHS, RHS), L, ControlsExit);		ExitLimit EL = HowFarToZero(getMinusSCEV(LHS, RHS), L, ControlsExit,
		AllowPredicates);
if (EL.hasAnyInfo()) return EL;		if (EL.hasAnyInfo()) return EL;
break;		break;
}		}
case ICmpInst::ICMP_EQ: { // while (X == Y)		case ICmpInst::ICMP_EQ: { // while (X == Y)
// Convert to: while (X-Y == 0)		// Convert to: while (X-Y == 0)
ExitLimit EL = HowFarToNonZero(getMinusSCEV(LHS, RHS), L);		ExitLimit EL = HowFarToNonZero(getMinusSCEV(LHS, RHS), L);
if (EL.hasAnyInfo()) return EL;		if (EL.hasAnyInfo()) return EL;
break;		break;
}		}
case ICmpInst::ICMP_SLT:		case ICmpInst::ICMP_SLT:
case ICmpInst::ICMP_ULT: { // while (X < Y)		case ICmpInst::ICMP_ULT: { // while (X < Y)
bool IsSigned = Cond == ICmpInst::ICMP_SLT;		bool IsSigned = Cond == ICmpInst::ICMP_SLT;
ExitLimit EL = HowManyLessThans(LHS, RHS, L, IsSigned, ControlsExit);		ExitLimit EL = HowManyLessThans(LHS, RHS, L, IsSigned, ControlsExit,
		AllowPredicates);
if (EL.hasAnyInfo()) return EL;		if (EL.hasAnyInfo()) return EL;
break;		break;
}		}
case ICmpInst::ICMP_SGT:		case ICmpInst::ICMP_SGT:
case ICmpInst::ICMP_UGT: { // while (X > Y)		case ICmpInst::ICMP_UGT: { // while (X > Y)
bool IsSigned = Cond == ICmpInst::ICMP_SGT;		bool IsSigned = Cond == ICmpInst::ICMP_SGT;
ExitLimit EL = HowManyGreaterThans(LHS, RHS, L, IsSigned, ControlsExit);		ExitLimit EL =
		HowManyGreaterThans(LHS, RHS, L, IsSigned, ControlsExit,
		AllowPredicates);
if (EL.hasAnyInfo()) return EL;		if (EL.hasAnyInfo()) return EL;
break;		break;
}		}
default:		default:
break;		break;
}		}
return computeExitCountExhaustively(L, ExitCond, !L->contains(TBB));		return computeExitCountExhaustively(L, ExitCond, !L->contains(TBB));
}		}

ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::computeExitLimitFromSingleExitSwitch(const Loop *L,		ScalarEvolution::computeExitLimitFromSingleExitSwitch(const Loop *L,
SwitchInst *Switch,		SwitchInst *Switch,
BasicBlock *ExitingBlock,		BasicBlock *ExitingBlock,
bool ControlsExit) {		bool ControlsExit) {
assert(!L->contains(ExitingBlock) && "Not an exiting block!");		assert(!L->contains(ExitingBlock) && "Not an exiting block!");

		sanjoy: I don't think this is the right layering -- for instance, this forces SCEV clients that don't want speculative / predicated trip counts to pay the cost of computing them. I'd say SCEV users that care about predicated trip counts should do this retry themselves, i.e. something like `const SCEV *TC = getBackedgeTakenCount(L); if (TC is SCEVCouldNotCompute) { SE->forgetLoop(L); TC = getPredicatedBackedgeTakenCount(L); }`
		sbaranga (author): Good point. I'll look into how we could compute this more lazily. Why do you think we would need a SE->forgetLoop() here? FWIW I'm generally trying to avoid having to invalidate the analysis for a given loop.
		sanjoy:
		> Why do you think we would need a SE->forgetLoop() here?
		To forget the cached CouldNotCompute trip count. But you're right -- we're probably better off not invalidating the whole analysis -- we can just have the second call to `getPredicatedBackedgeTakenCount` overwrite CouldNotCompute.
		sbaranga (author): Thanks! We'll take this approach then.
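The layering discussed in this thread can be sketched with a toy model. The sketch below assumes a hypothetical `ToyAnalysis` class, integer loop ids, and a made-up rule for which loops are analyzable; none of these are LLVM APIs. It keeps the predicated results in their own cache, in the spirit of the `PredicatedBackedgeTakenCounts` member that appears elsewhere in this diff, so that the plain query never pays for, or returns, a predicated count and no invalidation is needed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type; `Known == false` models CouldNotCompute.
struct TripCount {
  bool Known = false;
  int Count = 0;
  std::vector<std::string> Preds; // runtime checks the count relies on
};

// Hypothetical toy analysis keyed by an integer loop id.
class ToyAnalysis {
  std::map<int, TripCount> Plain;      // unpredicated results
  std::map<int, TripCount> Predicated; // results that may carry predicates
  int PredicatedRuns = 0;

public:
  // Cheap query: never introduces predicates. The result is cached.
  TripCount getBackedgeTakenCount(int Loop) {
    auto It = Plain.find(Loop);
    if (It != Plain.end())
      return It->second;
    TripCount TC;
    // Made-up rule: only even-numbered loops are analyzable as-is.
    if (Loop % 2 == 0) {
      TC.Known = true;
      TC.Count = Loop * 4;
    }
    Plain[Loop] = TC;
    return TC;
  }

  // Opt-in query: retries with predicates only when the plain query failed.
  TripCount getPredicatedBackedgeTakenCount(int Loop) {
    auto It = Predicated.find(Loop);
    if (It != Predicated.end())
      return It->second;
    TripCount TC = getBackedgeTakenCount(Loop);
    if (!TC.Known) {
      ++PredicatedRuns;
      TC.Known = true;
      TC.Count = Loop * 4;
      TC.Preds.push_back("induction variable does not wrap");
    }
    Predicated[Loop] = TC;
    return TC;
  }

  int predicatedRuns() const { return PredicatedRuns; }
};
```

A client that cannot emit runtime checks keeps calling the plain query and still sees CouldNotCompute for the hard loop, while a client that can version the loop calls the predicated query and inspects `Preds`.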
// Give up if the exit is the default dest of a switch.		// Give up if the exit is the default dest of a switch.
if (Switch->getDefaultDest() == ExitingBlock)		if (Switch->getDefaultDest() == ExitingBlock)
return getCouldNotCompute();		return getCouldNotCompute();

assert(L->contains(Switch->getDefaultDest()) &&		assert(L->contains(Switch->getDefaultDest()) &&
"Default case must not exit the loop!");		"Default case must not exit the loop!");
const SCEV *LHS = getSCEVAtScope(Switch->getCondition(), L);		const SCEV *LHS = getSCEVAtScope(Switch->getCondition(), L);
const SCEV *RHS = getConstant(Switch->findCaseDest(ExitingBlock));		const SCEV *RHS = getConstant(Switch->findCaseDest(ExitingBlock));
[229 lines hidden]	auto *Result =
ConstantFoldCompareInstOperands(Pred, StableValue, RHS, DL, &TLI);		ConstantFoldCompareInstOperands(Pred, StableValue, RHS, DL, &TLI);
assert(Result->getType()->isIntegerTy(1) &&		assert(Result->getType()->isIntegerTy(1) &&
"Otherwise cannot be an operand to a branch instruction");		"Otherwise cannot be an operand to a branch instruction");

if (Result->isZeroValue()) {		if (Result->isZeroValue()) {
unsigned BitWidth = getTypeSizeInBits(RHS->getType());		unsigned BitWidth = getTypeSizeInBits(RHS->getType());
const SCEV *UpperBound =		const SCEV *UpperBound =
getConstant(getEffectiveSCEVType(RHS->getType()), BitWidth);		getConstant(getEffectiveSCEVType(RHS->getType()), BitWidth);
return ExitLimit(getCouldNotCompute(), UpperBound);		SCEVUnionPredicate P;
		return ExitLimit(getCouldNotCompute(), UpperBound, P);
}		}

return getCouldNotCompute();		return getCouldNotCompute();
}		}

/// CanConstantFold - Return true if we can constant fold an instruction of the		/// CanConstantFold - Return true if we can constant fold an instruction of the
/// specified type, assuming that all operands were constants.		/// specified type, assuming that all operands were constants.
static bool CanConstantFold(const Instruction *I) {		static bool CanConstantFold(const Instruction *I) {
[760 lines hidden]
/// HowFarToZero - Return the number of times a backedge comparing the specified		/// HowFarToZero - Return the number of times a backedge comparing the specified
/// value to zero will execute. If not computable, return CouldNotCompute.		/// value to zero will execute. If not computable, return CouldNotCompute.
///		///
/// This is only used for loops with a "x != y" exit test. The exit condition is		/// This is only used for loops with a "x != y" exit test. The exit condition is
/// now expressed as a single expression, V = x-y. So the exit test is		/// now expressed as a single expression, V = x-y. So the exit test is
/// effectively V != 0. We know and take advantage of the fact that this		/// effectively V != 0. We know and take advantage of the fact that this
/// expression only being used in a comparison by zero context.		/// expression only being used in a comparison by zero context.
ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::HowFarToZero(const SCEV *V, const Loop *L, bool ControlsExit) {		ScalarEvolution::HowFarToZero(const SCEV *V, const Loop *L, bool ControlsExit,
		bool AllowPredicates) {
		SCEVUnionPredicate P;
// If the value is a constant		// If the value is a constant
if (const SCEVConstant *C = dyn_cast<SCEVConstant>(V)) {		if (const SCEVConstant *C = dyn_cast<SCEVConstant>(V)) {
// If the value is already zero, the branch will execute zero times.		// If the value is already zero, the branch will execute zero times.
if (C->getValue()->isZero()) return C;		if (C->getValue()->isZero()) return C;
return getCouldNotCompute(); // Otherwise it will loop infinitely.		return getCouldNotCompute(); // Otherwise it will loop infinitely.
}		}

const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(V);		const SCEVAddRecExpr *AddRec = dyn_cast<SCEVAddRecExpr>(V);
		if (!AddRec && AllowPredicates)
		sanjoy: I don't think you need to `(!A) && B`, you can `!A && B` instead.
		// Try to make this an AddRec using runtime tests, in the first X
		// iterations of this loop, where X is the SCEV expression found by the
		sanjoy: No need to fix this in this change, but a nicer API could be to have `convertSCEVToAddRecWithPredicates` return an add recurrence or null if it failed (and get rid of the `dyn_cast`).
		// algorithm below.
		AddRec = convertSCEVToAddRecWithPredicates(V, L, P);

if (!AddRec || AddRec->getLoop() != L)		if (!AddRec || AddRec->getLoop() != L)
return getCouldNotCompute();		return getCouldNotCompute();

// If this is a quadratic (3-term) AddRec {L,+,M,+,N}, find the roots of		// If this is a quadratic (3-term) AddRec {L,+,M,+,N}, find the roots of
// the quadratic equation to solve it.		// the quadratic equation to solve it.
if (AddRec->isQuadratic() && AddRec->getType()->isIntegerTy()) {		if (AddRec->isQuadratic() && AddRec->getType()->isIntegerTy()) {
std::pair<const SCEV *,const SCEV *> Roots =		std::pair<const SCEV *,const SCEV *> Roots =
SolveQuadraticEquation(AddRec, *this);		SolveQuadraticEquation(AddRec, *this);
const SCEVConstant *R1 = dyn_cast<SCEVConstant>(Roots.first);		const SCEVConstant *R1 = dyn_cast<SCEVConstant>(Roots.first);
const SCEVConstant *R2 = dyn_cast<SCEVConstant>(Roots.second);		const SCEVConstant *R2 = dyn_cast<SCEVConstant>(Roots.second);
if (R1 && R2) {		if (R1 && R2) {
// Pick the smallest positive root value.		// Pick the smallest positive root value.
if (ConstantInt *CB =		if (ConstantInt *CB =
dyn_cast<ConstantInt>(ConstantExpr::getICmp(CmpInst::ICMP_ULT,		dyn_cast<ConstantInt>(ConstantExpr::getICmp(CmpInst::ICMP_ULT,
R1->getValue(),		R1->getValue(),
R2->getValue()))) {		R2->getValue()))) {
if (!CB->getZExtValue())		if (!CB->getZExtValue())
std::swap(R1, R2); // R1 is the minimum root now.		std::swap(R1, R2); // R1 is the minimum root now.

// We can only use this value if the chrec ends up with an exact zero		// We can only use this value if the chrec ends up with an exact zero
// value at this index. When solving for "X*X != 5", for example, we		// value at this index. When solving for "X*X != 5", for example, we
// should not accept a root of 2.		// should not accept a root of 2.
const SCEV *Val = AddRec->evaluateAtIteration(R1, *this);		const SCEV *Val = AddRec->evaluateAtIteration(R1, *this);
if (Val->isZero())		if (Val->isZero())
return R1; // We found a quadratic root!		return ExitLimit(R1, R1, P); // We found a quadratic root!
}		}
}		}
return getCouldNotCompute();		return getCouldNotCompute();
}		}

// Otherwise we can only handle this if it is affine.		// Otherwise we can only handle this if it is affine.
if (!AddRec->isAffine())		if (!AddRec->isAffine())
return getCouldNotCompute();		return getCouldNotCompute();
[40 lines hidden]	if (StepC->getValue()->equalsInt(1) || StepC->getValue()->isAllOnesValue()) {
if (!CountDown && CR.getUnsignedMin().isMinValue())		if (!CountDown && CR.getUnsignedMin().isMinValue())
// When counting up, the worst starting value is 1, not 0.		// When counting up, the worst starting value is 1, not 0.
MaxBECount = CR.getUnsignedMax().isMinValue()		MaxBECount = CR.getUnsignedMax().isMinValue()
? getConstant(APInt::getMinValue(CR.getBitWidth()))		? getConstant(APInt::getMinValue(CR.getBitWidth()))
: getConstant(APInt::getMaxValue(CR.getBitWidth()));		: getConstant(APInt::getMaxValue(CR.getBitWidth()));
else		else
MaxBECount = getConstant(CountDown ? CR.getUnsignedMax()		MaxBECount = getConstant(CountDown ? CR.getUnsignedMax()
: -CR.getUnsignedMin());		: -CR.getUnsignedMin());
return ExitLimit(Distance, MaxBECount);		return ExitLimit(Distance, MaxBECount, P);
}		}

// As a special case, handle the instance where Step is a positive power of		// As a special case, handle the instance where Step is a positive power of
// two. In this case, determining whether Step divides Distance evenly can be		// two. In this case, determining whether Step divides Distance evenly can be
// done by counting and comparing the number of trailing zeros of Step and		// done by counting and comparing the number of trailing zeros of Step and
// Distance.		// Distance.
if (!CountDown) {		if (!CountDown) {
const APInt &StepV = StepC->getAPInt();		const APInt &StepV = StepC->getAPInt();
[36 lines hidden]	if (StepV.isPowerOf2() &&

// Since SCEV does not have a URem node, we construct one using a truncate		// Since SCEV does not have a URem node, we construct one using a truncate
// and a zero extend.		// and a zero extend.

unsigned NarrowWidth = StepV.getBitWidth() - StepV.countTrailingZeros();		unsigned NarrowWidth = StepV.getBitWidth() - StepV.countTrailingZeros();
auto *NarrowTy = IntegerType::get(getContext(), NarrowWidth);		auto *NarrowTy = IntegerType::get(getContext(), NarrowWidth);
auto *WideTy = Distance->getType();		auto *WideTy = Distance->getType();

return getZeroExtendExpr(getTruncateExpr(ModuloResult, NarrowTy), WideTy);		const SCEV *Limit =
		getZeroExtendExpr(getTruncateExpr(ModuloResult, NarrowTy), WideTy);
		return ExitLimit(Limit, Limit, P);
}		}
}		}
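The code comment above notes that SCEV has no URem node, so the remainder is built from a truncate followed by a zero extend. The identity behind that construction is that, for an unsigned value, the remainder modulo 2^n equals the low n bits, zero-extended back to the original width. The sketch below checks this on plain 32-bit integers; the function name and the fixed 32-bit width are assumptions made for illustration, and it does not reproduce how the patch derives `ModuloResult` or `NarrowWidth`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes X urem 2^NarrowWidth for a 32-bit X by
// keeping only the low NarrowWidth bits. Masking models a truncate to an
// integer of NarrowWidth bits followed by a zero extend back to 32 bits.
uint32_t uremPow2ViaTruncZext(uint32_t X, unsigned NarrowWidth) {
  assert(NarrowWidth >= 1 && NarrowWidth <= 32);
  if (NarrowWidth == 32)
    return X; // truncating to the full width changes nothing
  uint32_t Mask = (uint32_t(1) << NarrowWidth) - 1;
  return X & Mask;
}
```

For example, `uremPow2ViaTruncZext(77, 4)` keeps the low four bits of 77, which matches `77 % 16`.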

// If the condition controls loop exit (the loop exits only if the expression		// If the condition controls loop exit (the loop exits only if the expression
// is true) and the addition is no-wrap we can use unsigned divide to		// is true) and the addition is no-wrap we can use unsigned divide to
// compute the backedge count. In this case, the step may not divide the		// compute the backedge count. In this case, the step may not divide the
// distance, but we don't care because if the condition is "missed" the loop		// distance, but we don't care because if the condition is "missed" the loop
// will have undefined behavior due to wrapping.		// will have undefined behavior due to wrapping.
if (ControlsExit && AddRec->hasNoSelfWrap()) {		if (ControlsExit && AddRec->hasNoSelfWrap()) {
const SCEV *Exact =		const SCEV *Exact =
getUDivExpr(Distance, CountDown ? getNegativeSCEV(Step) : Step);		getUDivExpr(Distance, CountDown ? getNegativeSCEV(Step) : Step);
return ExitLimit(Exact, Exact);		return ExitLimit(Exact, Exact, P);
}		}

// Then, try to solve the above equation provided that Start is constant.		// Then, try to solve the above equation provided that Start is constant.
if (const SCEVConstant *StartC = dyn_cast<SCEVConstant>(Start))		if (const SCEVConstant *StartC = dyn_cast<SCEVConstant>(Start)) {
return SolveLinEquationWithOverflow(StepC->getAPInt(), -StartC->getAPInt(),		const SCEV *E = SolveLinEquationWithOverflow(
*this);		StepC->getValue()->getValue(), -StartC->getValue()->getValue(), *this);
		return ExitLimit(E, E, P);
		}
return getCouldNotCompute();		return getCouldNotCompute();
}		}

/// HowFarToNonZero - Return the number of times a backedge checking the		/// HowFarToNonZero - Return the number of times a backedge checking the
/// specified value for nonzero will execute. If not computable, return		/// specified value for nonzero will execute. If not computable, return
/// CouldNotCompute		/// CouldNotCompute
ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::HowFarToNonZero(const SCEV *V, const Loop *L) {		ScalarEvolution::HowFarToNonZero(const SCEV *V, const Loop *L) {
[1,426 lines hidden]
/// CouldNotCompute.		/// CouldNotCompute.
///		///
/// @param ControlsExit is true when the LHS < RHS condition directly controls		/// @param ControlsExit is true when the LHS < RHS condition directly controls
/// the branch (loops exits only if condition is true). In this case, we can use		/// the branch (loops exits only if condition is true). In this case, we can use
/// NoWrapFlags to skip overflow checks.		/// NoWrapFlags to skip overflow checks.
ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::HowManyLessThans(const SCEV *LHS, const SCEV *RHS,		ScalarEvolution::HowManyLessThans(const SCEV *LHS, const SCEV *RHS,
const Loop *L, bool IsSigned,		const Loop *L, bool IsSigned,
bool ControlsExit) {		bool ControlsExit, bool AllowPredicates) {
		SCEVUnionPredicate P;
// We handle only IV < Invariant		// We handle only IV < Invariant
if (!isLoopInvariant(RHS, L))		if (!isLoopInvariant(RHS, L))
return getCouldNotCompute();		return getCouldNotCompute();

const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);		const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);
		if (!IV && AllowPredicates)
		// Try to make this an AddRec using runtime tests, in the first X
		// iterations of this loop, where X is the SCEV expression found by the
		// algorithm below.
		IV = convertSCEVToAddRecWithPredicates(LHS, L, P);
		sanjoy: I think you can just fall through the logic below (under `// Avoid weird loops`). Also, the comment on the api change to `convertSCEVToAddRecWithPredicates` also applies here.
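The API shape suggested in the review (return an add recurrence, or null on failure, so callers need no `dyn_cast`) can be sketched with a toy expression hierarchy. Everything below is hypothetical: `Expr`, `AddRec`, `ZExt`, and `convertToAddRecWithPredicates` are illustrative stand-ins rather than LLVM classes, and the single rewrite rule is a simplification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical expression hierarchy, loosely modeled on SCEV node kinds.
struct Expr {
  virtual ~Expr() = default;
};

struct AddRec : Expr {
  int Start, Step;
  AddRec(int Start, int Step) : Start(Start), Step(Step) {}
};

struct ZExt : Expr {
  const AddRec *Inner;
  explicit ZExt(const AddRec *Inner) : Inner(Inner) {}
};

// Returns an add recurrence that stands in for `E` provided the predicates
// appended to `Preds` hold at run time, or nullptr when no rewrite applies.
// Callers test the pointer directly instead of casting a generic result.
const AddRec *convertToAddRecWithPredicates(const Expr *E,
                                            std::vector<std::string> &Preds) {
  if (const auto *AR = dynamic_cast<const AddRec *>(E))
    return AR; // already a recurrence, no predicate needed
  if (const auto *Z = dynamic_cast<const ZExt *>(E)) {
    // Simplified rule: look through the extension, guarded by a no-wrap check.
    Preds.push_back("inner recurrence does not wrap");
    return Z->Inner;
  }
  return nullptr; // not convertible
}
```

With this shape, a caller can write `if (const AddRec *IV = convertToAddRecWithPredicates(E, Preds))` and fall through to the existing "weird loop" checks when the result is null.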

// Avoid weird loops		// Avoid weird loops
if (!IV || IV->getLoop() != L || !IV->isAffine())		if (!IV || IV->getLoop() != L || !IV->isAffine())
return getCouldNotCompute();		return getCouldNotCompute();

bool NoWrap = ControlsExit &&		bool NoWrap = ControlsExit &&
IV->getNoWrapFlags(IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW);		IV->getNoWrapFlags(IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW);

[52 lines hidden]	if (isa<SCEVConstant>(BECount))
MaxBECount = BECount;		MaxBECount = BECount;
else		else
MaxBECount = computeBECount(getConstant(MaxEnd - MinStart),		MaxBECount = computeBECount(getConstant(MaxEnd - MinStart),
getConstant(MinStride), false);		getConstant(MinStride), false);

if (isa<SCEVCouldNotCompute>(MaxBECount))		if (isa<SCEVCouldNotCompute>(MaxBECount))
MaxBECount = BECount;		MaxBECount = BECount;

return ExitLimit(BECount, MaxBECount);		return ExitLimit(BECount, MaxBECount, P);
}		}

ScalarEvolution::ExitLimit		ScalarEvolution::ExitLimit
ScalarEvolution::HowManyGreaterThans(const SCEV *LHS, const SCEV *RHS,		ScalarEvolution::HowManyGreaterThans(const SCEV *LHS, const SCEV *RHS,
const Loop *L, bool IsSigned,		const Loop *L, bool IsSigned,
bool ControlsExit) {		bool ControlsExit, bool AllowPredicates) {
		SCEVUnionPredicate P;
// We handle only IV > Invariant		// We handle only IV > Invariant
if (!isLoopInvariant(RHS, L))		if (!isLoopInvariant(RHS, L))
return getCouldNotCompute();		return getCouldNotCompute();

const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);		const SCEVAddRecExpr *IV = dyn_cast<SCEVAddRecExpr>(LHS);
		if (!IV && AllowPredicates)
		// Try to make this an AddRec using runtime tests, in the first X
		// iterations of this loop, where X is the SCEV expression found by the
		// algorithm below.
		IV = convertSCEVToAddRecWithPredicates(LHS, L, P);

// Avoid weird loops		// Avoid weird loops
if (!IV || IV->getLoop() != L || !IV->isAffine())		if (!IV || IV->getLoop() != L || !IV->isAffine())
return getCouldNotCompute();		return getCouldNotCompute();

bool NoWrap = ControlsExit &&		bool NoWrap = ControlsExit &&
IV->getNoWrapFlags(IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW);		IV->getNoWrapFlags(IsSigned ? SCEV::FlagNSW : SCEV::FlagNUW);

[54 lines hidden]	if (isa<SCEVConstant>(BECount))
MaxBECount = BECount;		MaxBECount = BECount;
else		else
MaxBECount = computeBECount(getConstant(MaxStart - MinEnd),		MaxBECount = computeBECount(getConstant(MaxStart - MinEnd),
getConstant(MinStride), false);		getConstant(MinStride), false);

if (isa<SCEVCouldNotCompute>(MaxBECount))		if (isa<SCEVCouldNotCompute>(MaxBECount))
MaxBECount = BECount;		MaxBECount = BECount;

return ExitLimit(BECount, MaxBECount);		return ExitLimit(BECount, MaxBECount, P);
}		}

/// getNumIterationsInRange - Return the number of iterations of this loop that		/// getNumIterationsInRange - Return the number of iterations of this loop that
/// produce values in the specified constant range. Another way of looking at		/// produce values in the specified constant range. Another way of looking at
/// this is that it returns the first iteration number where the value is not in		/// this is that it returns the first iteration number where the value is not in
/// the condition, thus computing the exit count. If the iteration count can't		/// the condition, thus computing the exit count. If the iteration count can't
/// be computed, an instance of SCEVCouldNotCompute is returned.		/// be computed, an instance of SCEVCouldNotCompute is returned.
const SCEV *SCEVAddRecExpr::getNumIterationsInRange(ConstantRange Range,		const SCEV *SCEVAddRecExpr::getNumIterationsInRange(ConstantRange Range,
▲ Show 20 Lines • Show All 687 Lines • ▼ Show 20 Lines	: F(F), TLI(TLI), AC(AC), DT(DT), LI(LI),
FirstUnknown(nullptr) {}		FirstUnknown(nullptr) {}

ScalarEvolution::ScalarEvolution(ScalarEvolution &&Arg)		ScalarEvolution::ScalarEvolution(ScalarEvolution &&Arg)
: F(Arg.F), TLI(Arg.TLI), AC(Arg.AC), DT(Arg.DT), LI(Arg.LI),		: F(Arg.F), TLI(Arg.TLI), AC(Arg.AC), DT(Arg.DT), LI(Arg.LI),
CouldNotCompute(std::move(Arg.CouldNotCompute)),		CouldNotCompute(std::move(Arg.CouldNotCompute)),
ValueExprMap(std::move(Arg.ValueExprMap)),		ValueExprMap(std::move(Arg.ValueExprMap)),
WalkingBEDominatingConds(false), ProvingSplitPredicate(false),		WalkingBEDominatingConds(false), ProvingSplitPredicate(false),
BackedgeTakenCounts(std::move(Arg.BackedgeTakenCounts)),		BackedgeTakenCounts(std::move(Arg.BackedgeTakenCounts)),
		PredicatedBackedgeTakenCounts(
		std::move(Arg.PredicatedBackedgeTakenCounts)),
ConstantEvolutionLoopExitValue(		ConstantEvolutionLoopExitValue(
std::move(Arg.ConstantEvolutionLoopExitValue)),		std::move(Arg.ConstantEvolutionLoopExitValue)),
ValuesAtScopes(std::move(Arg.ValuesAtScopes)),		ValuesAtScopes(std::move(Arg.ValuesAtScopes)),
LoopDispositions(std::move(Arg.LoopDispositions)),		LoopDispositions(std::move(Arg.LoopDispositions)),
BlockDispositions(std::move(Arg.BlockDispositions)),		BlockDispositions(std::move(Arg.BlockDispositions)),
UnsignedRanges(std::move(Arg.UnsignedRanges)),		UnsignedRanges(std::move(Arg.UnsignedRanges)),
SignedRanges(std::move(Arg.SignedRanges)),		SignedRanges(std::move(Arg.SignedRanges)),
UniqueSCEVs(std::move(Arg.UniqueSCEVs)),		UniqueSCEVs(std::move(Arg.UniqueSCEVs)),
Show All 16 Lines	ScalarEvolution::~ScalarEvolution() {
ExprValueMap.clear();		ExprValueMap.clear();
ValueExprMap.clear();		ValueExprMap.clear();
HasRecMap.clear();		HasRecMap.clear();

// Free any extra memory created for ExitNotTakenInfo in the unlikely event		// Free any extra memory created for ExitNotTakenInfo in the unlikely event
// that a loop had multiple computable exits.		// that a loop had multiple computable exits.
for (auto &BTCI : BackedgeTakenCounts)		for (auto &BTCI : BackedgeTakenCounts)
BTCI.second.clear();		BTCI.second.clear();
		for (auto &BTCI : PredicatedBackedgeTakenCounts)
		BTCI.second.clear();

assert(PendingLoopPredicates.empty() && "isImpliedCond garbage");		assert(PendingLoopPredicates.empty() && "isImpliedCond garbage");
assert(!WalkingBEDominatingConds && "isLoopBackedgeGuardedByCond garbage!");		assert(!WalkingBEDominatingConds && "isLoopBackedgeGuardedByCond garbage!");
assert(!ProvingSplitPredicate && "ProvingSplitPredicate garbage!");		assert(!ProvingSplitPredicate && "ProvingSplitPredicate garbage!");
}		}

bool ScalarEvolution::hasLoopInvariantBackedgeTakenCount(const Loop *L) {		bool ScalarEvolution::hasLoopInvariantBackedgeTakenCount(const Loop *L) {
return !isa<SCEVCouldNotCompute>(getBackedgeTakenCount(L));		return !isa<SCEVCouldNotCompute>(getBackedgeTakenCount(L));
Show All 26 Lines	static void PrintLoopInfo(raw_ostream &OS, ScalarEvolution *SE,
OS << ": ";		OS << ": ";

if (!isa<SCEVCouldNotCompute>(SE->getMaxBackedgeTakenCount(L))) {		if (!isa<SCEVCouldNotCompute>(SE->getMaxBackedgeTakenCount(L))) {
OS << "max backedge-taken count is " << *SE->getMaxBackedgeTakenCount(L);		OS << "max backedge-taken count is " << *SE->getMaxBackedgeTakenCount(L);
} else {		} else {
OS << "Unpredictable max backedge-taken count. ";		OS << "Unpredictable max backedge-taken count. ";
}		}

		OS << "\n"
		"Loop ";
		L->getHeader()->printAsOperand(OS, /*PrintType=*/false);
		OS << ": ";

		SCEVUnionPredicate Pred;
		auto PBT = SE->getPredicatedBackedgeTakenCount(L, Pred);
		if (!isa<SCEVCouldNotCompute>(PBT)) {
		OS << "Predicated backedge-taken count is " << *PBT << "\n";
		OS << " Predicates:\n";
		Pred.print(OS, 4);
		} else {
		OS << "Unpredictable predicated backedge-taken count. ";
		}
OS << "\n";		OS << "\n";
}		}

void ScalarEvolution::print(raw_ostream &OS) const {		void ScalarEvolution::print(raw_ostream &OS) const {
// ScalarEvolution's implementation of the print method is to print		// ScalarEvolution's implementation of the print method is to print
// out SCEV values of all instructions that are interesting. Doing		// out SCEV values of all instructions that are interesting. Doing
// this potentially causes it to create new SCEV objects though,		// this potentially causes it to create new SCEV objects though,
// which technically conflicts with the const qualifier. This isn't		// which technically conflicts with the const qualifier. This isn't
▲ Show 20 Lines • Show All 268 Lines • ▼ Show 20 Lines	void ScalarEvolution::forgetMemoizedResults(const SCEV *S) {
ValuesAtScopes.erase(S);		ValuesAtScopes.erase(S);
LoopDispositions.erase(S);		LoopDispositions.erase(S);
BlockDispositions.erase(S);		BlockDispositions.erase(S);
UnsignedRanges.erase(S);		UnsignedRanges.erase(S);
SignedRanges.erase(S);		SignedRanges.erase(S);
ExprValueMap.erase(S);		ExprValueMap.erase(S);
HasRecMap.erase(S);		HasRecMap.erase(S);

for (DenseMap<const Loop*, BackedgeTakenInfo>::iterator I =		auto RemoveSCEVFromBackedgeMap =
BackedgeTakenCounts.begin(), E = BackedgeTakenCounts.end(); I != E; ) {		[S, this](DenseMap<const Loop *, BackedgeTakenInfo> &Map) {
		for (auto I = Map.begin(), E = Map.end(); I != E;) {
BackedgeTakenInfo &BEInfo = I->second;		BackedgeTakenInfo &BEInfo = I->second;
if (BEInfo.hasOperand(S, this)) {		if (BEInfo.hasOperand(S, this)) {
BEInfo.clear();		BEInfo.clear();
BackedgeTakenCounts.erase(I++);		Map.erase(I++);
}		} else
else
++I;		++I;
}		}
		};

		RemoveSCEVFromBackedgeMap(BackedgeTakenCounts);
		RemoveSCEVFromBackedgeMap(PredicatedBackedgeTakenCounts);
}		}

typedef DenseMap<const Loop *, std::string> VerifyMap;		typedef DenseMap<const Loop *, std::string> VerifyMap;

/// replaceSubString - Replaces all occurrences of From in Str with To.		/// replaceSubString - Replaces all occurrences of From in Str with To.
static void replaceSubString(std::string &Str, StringRef From, StringRef To) {		static void replaceSubString(std::string &Str, StringRef From, StringRef To) {
size_t Pos = 0;		size_t Pos = 0;
while ((Pos = Str.find(From, Pos)) != std::string::npos) {		while ((Pos = Str.find(From, Pos)) != std::string::npos) {
▲ Show 20 Lines • Show All 398 Lines • ▼ Show 20 Lines	assert(Key && "Only SCEVUnionPredicate doesn't have an "
" associated expression!");		" associated expression!");

SCEVToPreds[Key].push_back(N);		SCEVToPreds[Key].push_back(N);
Preds.push_back(N);		Preds.push_back(N);
}		}

PredicatedScalarEvolution::PredicatedScalarEvolution(ScalarEvolution &SE,		PredicatedScalarEvolution::PredicatedScalarEvolution(ScalarEvolution &SE,
Loop &L)		Loop &L)
: SE(SE), L(L), Generation(0) {}		: SE(SE), L(L), Generation(0), BackedgeCount(nullptr) {}

const SCEV *PredicatedScalarEvolution::getSCEV(Value *V) {		const SCEV *PredicatedScalarEvolution::getSCEV(Value *V) {
const SCEV *Expr = SE.getSCEV(V);		const SCEV *Expr = SE.getSCEV(V);
RewriteEntry &Entry = RewriteMap[Expr];		RewriteEntry &Entry = RewriteMap[Expr];

// If we already have an entry and the version matches, return it.		// If we already have an entry and the version matches, return it.
if (Entry.second && Generation == Entry.first)		if (Entry.second && Generation == Entry.first)
return Entry.second;		return Entry.second;

// We found an entry but it's stale. Rewrite the stale entry		// We found an entry but it's stale. Rewrite the stale entry
// acording to the current predicate.		// acording to the current predicate.
if (Entry.second)		if (Entry.second)
Expr = Entry.second;		Expr = Entry.second;

const SCEV *NewSCEV = SE.rewriteUsingPredicate(Expr, &L, Preds);		const SCEV *NewSCEV = SE.rewriteUsingPredicate(Expr, &L, Preds);
Entry = {Generation, NewSCEV};		Entry = {Generation, NewSCEV};

return NewSCEV;		return NewSCEV;
}		}

		const SCEV *PredicatedScalarEvolution::getBackedgeTakenCount() {
		if (!BackedgeCount) {
		SCEVUnionPredicate BackedgePred;
		BackedgeCount = SE.getPredicatedBackedgeTakenCount(&L, BackedgePred);
		addPredicate(BackedgePred);
		}
		return BackedgeCount;
		}

void PredicatedScalarEvolution::addPredicate(const SCEVPredicate &Pred) {		void PredicatedScalarEvolution::addPredicate(const SCEVPredicate &Pred) {
if (Preds.implies(&Pred))		if (Preds.implies(&Pred))
return;		return;
Preds.add(&Pred);		Preds.add(&Pred);
updateGeneration();		updateGeneration();
}		}

const SCEVUnionPredicate &PredicatedScalarEvolution::getUnionPredicate() const {		const SCEVUnionPredicate &PredicatedScalarEvolution::getUnionPredicate() const {
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	const SCEVAddRecExpr *PredicatedScalarEvolution::getAsAddRec(Value *V) {
if (!New)		if (!New)
return nullptr;		return nullptr;

updateGeneration();		updateGeneration();
RewriteMap[SE.getSCEV(V)] = {Generation, New};		RewriteMap[SE.getSCEV(V)] = {Generation, New};
return New;		return New;
}		}

PredicatedScalarEvolution::		PredicatedScalarEvolution::PredicatedScalarEvolution(
PredicatedScalarEvolution(const PredicatedScalarEvolution &Init) :		const PredicatedScalarEvolution &Init)
RewriteMap(Init.RewriteMap), SE(Init.SE), L(Init.L), Preds(Init.Preds),		: RewriteMap(Init.RewriteMap), SE(Init.SE), L(Init.L), Preds(Init.Preds),
Generation(Init.Generation) {		Generation(Init.Generation), BackedgeCount(Init.BackedgeCount) {
for (auto I = Init.FlagsMap.begin(), E = Init.FlagsMap.end(); I != E; ++I)		for (auto I = Init.FlagsMap.begin(), E = Init.FlagsMap.end(); I != E; ++I)
FlagsMap.insert(*I);		FlagsMap.insert(*I);
}		}

lib/Analysis/ScalarEvolutionExpander.cpp

Show First 20 Lines • Show All 1,998 Lines • ▼ Show 20 Lines	Value *SCEVExpander::expandEqualPredicate(const SCEVEqualPredicate *Pred,
return I;		return I;
}		}

Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,		Value *SCEVExpander::generateOverflowCheck(const SCEVAddRecExpr *AR,
Instruction *Loc, bool Signed) {		Instruction *Loc, bool Signed) {
assert(AR->isAffine() && "Cannot generate RT check for "		assert(AR->isAffine() && "Cannot generate RT check for "
"non-affine expression");		"non-affine expression");

const SCEV *ExitCount = SE.getBackedgeTakenCount(AR->getLoop());		SCEVUnionPredicate Pred;
		const SCEV *ExitCount =
		SE.getPredicatedBackedgeTakenCount(AR->getLoop(), Pred);
sanjoy: This part is a significant semantic change that I unfortunately hadn't (but really should have) noticed in earlier revisions. While I don't yet have a specific example given the current state of the code, I think it is possible to end up with a circular logic fallacy here.

Earlier, the invariant was: given a no-overflow predicate, we would write a loop entry predicate `EP` (a function of the backedge count) that, when `true`, would imply no overflow. Formally, `EP => NoOverflow`.

However, now we're doing something different. Now we have `EP => (Backedge taken count is BE => NoOverflow)` (this part is the same as earlier), but `Backedge taken count is BE` is not axiomatically true; it is true under the `NoOverflow` condition. In other words, the set of logical equations we have is:

    EP => (Backedge taken count is BE => NoOverflow)
    NoOverflow => Backedge taken count is BE

Now we check `EP` at runtime, so it is known to be `true`. Given that, there are two solutions to the above: {`Backedge taken count is BE` is `true`, `NoOverflow` is `true`}; or {`Backedge taken count is BE` is `false`, `NoOverflow` is `false`}. The latter solution is problematic.

One problematic case that won't happen today, but illustrates what I'm talking about:

    for (u32 i = 0; ; i++)
      store volatile (GEP a, i), 0

The above loop can run forever (assuming `u32`'s overflow is well defined), but let's say we decide to predicate the increment to an NUSW increment. Given that, we know the loop won't run more than `-1` times (since otherwise we will have a side effect that uses poison). However, the predicate that you'll compute in SCEVExpander in such a case is `-1 == -1`, which is trivially true, and now the loop is miscompiled (instead of running forever, it'll just run `-1` times).

Now, I'll note that I have so far not been able to come up with a problematic case that will break in the current version of the patch, so it is possible that there is some deeper invariant here that is obvious.

sbaranga (author): Thanks! That is a very good point and we should be really careful here. I think this change is still correct:

Let's say that we return a symbolic answer X, and the correct answer is Y. The problem is when X != Y and we pass the predicate check. Because the answer is wrong, some overflow must happen. Since we are passing the NoOverflow check, we need to have X < Y (otherwise we wouldn't be passing the check). Note that for the first X iterations our predicate holds.

I think what makes this correct is the fact that we're using the predicates in HowManyGreaterThans/HowManyLessThans, which means that we need to stop at iteration X.

sanjoy: So, to rephrase what I understood, by "with the predicate (IsNoWrap AR) the trip count of the loop is N", PSE really means "if the add recurrence AR does not overflow in the first N iterations, then the loop's count is N". In particular, for the loop

    for (unsigned i; ; i++)
      side_effect_use(i);

predicating `i++` as NUSW does not let you conclude that the trip count of the loop is `UINT_MAX`, since just because `i++` does not overflow in the first `UINT_MAX` iterations does not guarantee that the loop will exit or have UB in the `UINT_MAX+1`th iteration.

This is a subtle invariant, so at the very least this needs to be documented explicitly; and the places where we add predicates (`HowManyLessThans` etc.) need to be audited carefully (it sounds like you have already done most of that?). Finally, if possible, it would be great if we could structure the code in a way that makes (accidentally) breaking this invariant difficult, though I don't have anything concrete in mind right now.

sbaranga (author): Yes, this is a very good description of the logic! Indeed, this is subtle so it needs documenting. I did audit the places where we've added the predicates, but I'd like to have another look.
const SCEV *Step = AR->getStepRecurrence(SE);		const SCEV *Step = AR->getStepRecurrence(SE);
sbaranga (author): I've re-audited the Please see the updated definition of the SCEVWrapPredicate, which should make this point clear. Regarding the code restructuring: I don't see a good solution for this either.

sbaranga (author): I meant to write that I've re-audited the places where we were using the predicates, and I think everything is ok.
const SCEV *Start = AR->getStart();		const SCEV *Start = AR->getStart();

unsigned DstBits = SE.getTypeSizeInBits(AR->getType());		unsigned DstBits = SE.getTypeSizeInBits(AR->getType());
unsigned SrcBits = SE.getTypeSizeInBits(ExitCount->getType());		unsigned SrcBits = SE.getTypeSizeInBits(ExitCount->getType());
unsigned MaxBits = 2 * std::max(DstBits, SrcBits);		unsigned MaxBits = 2 * std::max(DstBits, SrcBits);

auto *TripCount = SE.getTruncateOrZeroExtend(ExitCount, AR->getType());		auto *TripCount = SE.getTruncateOrZeroExtend(ExitCount, AR->getType());
IntegerType *MaxTy = IntegerType::get(Loc->getContext(), MaxBits);		IntegerType *MaxTy = IntegerType::get(Loc->getContext(), MaxBits);
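
The circularity concern raised in the inline discussion on `generateOverflowCheck` can be checked mechanically. The following is a minimal sketch, not part of the patch: it enumerates truth assignments for the two propositions `Backedge taken count is BE` and `NoOverflow` with the entry predicate `EP` fixed to `true`, and keeps those that satisfy both implications from the comment. The names `implies` and `consistentAssignments` are hypothetical helpers introduced only for this illustration.

```cpp
#include <initializer_list>
#include <utility>
#include <vector>

// Material implication: "A => B" is false only when A holds and B does not.
constexpr bool implies(bool A, bool B) { return !A || B; }

// Returns every (BackedgeCountIsBE, NoOverflow) pair that is consistent with
//   EP => (BackedgeCountIsBE => NoOverflow)
//   NoOverflow => BackedgeCountIsBE
// given that EP has been checked at runtime and is therefore true.
std::vector<std::pair<bool, bool>> consistentAssignments() {
  const bool EP = true;
  std::vector<std::pair<bool, bool>> Result;
  for (bool CountIsBE : {false, true})
    for (bool NoOverflow : {false, true})
      if (implies(EP, implies(CountIsBE, NoOverflow)) &&
          implies(NoOverflow, CountIsBE))
        Result.emplace_back(CountIsBE, NoOverflow);
  return Result;
}
```

Both (false, false) and (true, true) survive, which is the point of the comment: the equations alone do not rule out the case where the count is wrong and overflow happens. The argument in the replies relies on the additional fact that the predicate only has to hold for the first BE iterations.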

lib/Transforms/Vectorize/LoopVectorize.cpp


Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) {		Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) {
if (TripCount)		if (TripCount)
return TripCount;		return TripCount;

IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());		IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());
// Find the loop boundaries.		// Find the loop boundaries.
ScalarEvolution *SE = PSE.getSE();		ScalarEvolution *SE = PSE.getSE();
const SCEV *BackedgeTakenCount = SE->getBackedgeTakenCount(OrigLoop);		const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();
assert(BackedgeTakenCount != SE->getCouldNotCompute() &&		assert(BackedgeTakenCount != SE->getCouldNotCompute() &&
"Invalid loop count");		"Invalid loop count");

Type *IdxTy = Legal->getWidestInductionType();		Type *IdxTy = Legal->getWidestInductionType();

// The exit count might have the type of i64 while the phi is i32. This can		// The exit count might have the type of i64 while the phi is i32. This can
// happen if we have an induction variable that is sign extended before the		// happen if we have an induction variable that is sign extended before the
// compare. The only way that we get a backedge taken count is that the		// compare. The only way that we get a backedge taken count is that the
▲ Show 20 Lines • Show All 1,630 Lines • ▼ Show 20 Lines	bool LoopVectorizationLegality::canVectorize() {
// Check if we can if-convert non-single-bb loops.		// Check if we can if-convert non-single-bb loops.
unsigned NumBlocks = TheLoop->getNumBlocks();		unsigned NumBlocks = TheLoop->getNumBlocks();
if (NumBlocks != 1 && !canVectorizeWithIfConvert()) {		if (NumBlocks != 1 && !canVectorizeWithIfConvert()) {
DEBUG(dbgs() << "LV: Can't if-convert the loop.\n");		DEBUG(dbgs() << "LV: Can't if-convert the loop.\n");
return false;		return false;
}		}

// ScalarEvolution needs to be able to find the exit count.		// ScalarEvolution needs to be able to find the exit count.
const SCEV *ExitCount = PSE.getSE()->getBackedgeTakenCount(TheLoop);		const SCEV *ExitCount = PSE.getBackedgeTakenCount();
if (ExitCount == PSE.getSE()->getCouldNotCompute()) {		if (ExitCount == PSE.getSE()->getCouldNotCompute()) {
emitAnalysis(VectorizationReport()		emitAnalysis(VectorizationReport()
<< "could not determine number of loop iterations");		<< "could not determine number of loop iterations");
DEBUG(dbgs() << "LV: SCEV could not compute the loop exit count.\n");		DEBUG(dbgs() << "LV: SCEV could not compute the loop exit count.\n");
return false;		return false;
}		}

// Check if we can vectorize the instructions and CFG in this loop.		// Check if we can vectorize the instructions and CFG in this loop.

test/Analysis/ScalarEvolution/predicated-trip-count.ll

This file was added.

				; RUN: opt < %s -analyze -scalar-evolution \| FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				@A = weak global [1000 x i32] zeroinitializer, align 32

				; The resulting predicate is i16 {0,+,1} <nssw>, meanining
				; that the resulting backedge expression will be valid for:
				; (1 + (-1 smax %M)) <= MAX_INT16
				;
				; At the limit condition for M (MAX_INT16 - 1) we have in the
				; last iteration:
				; i0 <- MAX_INT16
				; i0.ext <- MAX_INT16
				;
				; and therefore no wrapping happend for i0 or i0.ext
				; throughout the execution of the loop. The resulting predicated
				; backedge taken count is correct.

				; CHECK: Classifying expressions for: @test1
				; CHECK: %i.0.ext = sext i16 %i.0 to i32
				; CHECK-NEXT: --> (sext i16 {0,+,1}<%bb3> to i32)
				; CHECK: Loop %bb3: Unpredictable backedge-taken count.
				; CHECK-NEXT: Loop %bb3: Unpredictable max backedge-taken count.
				; CHECK-NEXT: Loop %bb3: Predicated backedge-taken count is (1 + (-1 smax %M))
				; CHECK-NEXT: Predicates:
				; CHECK-NEXT: {0,+,1}<%bb3> Added Flags: <nssw>
				define void @test1(i32 %N, i32 %M) {
				entry:
				br label %bb3

				bb: ; preds = %bb3
				%tmp = getelementptr [1000 x i32], [1000 x i32]* @A, i32 0, i16 %i.0 ; <i32*> [#uses=1]
				store i32 123, i32* %tmp
				%tmp2 = add i16 %i.0, 1 ; <i32> [#uses=1]
				br label %bb3

				bb3: ; preds = %bb, %entry
				%i.0 = phi i16 [ 0, %entry ], [ %tmp2, %bb ] ; <i32> [#uses=3]
				%i.0.ext = sext i16 %i.0 to i32
				%tmp3 = icmp sle i32 %i.0.ext, %M ; <i1> [#uses=1]
				br i1 %tmp3, label %bb, label %bb5

				bb5: ; preds = %bb3
				br label %return

				return: ; preds = %bb5
				ret void
				}

				; The predicated backedge taken count is:
				; (2 + (zext i16 %Start to i32) + ((-2 + (-1 * (sext i16 %Start to i32)))
				; smax (-1 + (-1 * %M)))
				; )

				; -1 + (-1 * %M) <= (-2 + (-1 * (sext i16 %Start to i32))
				; The predicated backedge taken count is 0.
				; From the IR, this is correct since we will bail out at the
				; first iteration.


				; * -1 + (-1 * %M) > (-2 + (-1 * (sext i16 %Start to i32))
				; or: %M < 1 + (sext i16 %Start to i32)
				;
				; The predicated backedge taken count is 1 + (zext i16 %Start to i32) - %M
				;
				; If %M >= MIN_INT + 1, this predicated backedge taken count would be correct (even
				; without predicates). However, for %M < MIN_INT this would be an infinite loop.
				; In these cases, the {%Start,+,-1} <nusw> predicate would be false, as the
				; final value of the expression {%Start,+,-1} expression (%M - 1) would not be
				; representable as an i16.

				; There is also a limit case here where the value of %M is MIN_INT. In this case
				; we still have an infinite loop, since icmp sge %x, MIN_INT will always return
				; true.

				; CHECK: Classifying expressions for: @test2

				; CHECK: %i.0.ext = sext i16 %i.0 to i32
				; CHECK-NEXT: --> (sext i16 {%Start,+,-1}<%bb3> to i32)
				; CHECK: Loop %bb3: Unpredictable backedge-taken count.
				; CHECK-NEXT: Loop %bb3: Unpredictable max backedge-taken count.
				; CHECK-NEXT: Loop %bb3: Predicated backedge-taken count is (2 + (sext i16 %Start to i32) + ((-2 + (-1 * (sext i16 %Start to i32))) smax (-1 + (-1 * %M))))
				; CHECK-NEXT: Predicates:
				; CHECK-NEXT: {%Start,+,-1}<%bb3> Added Flags: <nssw>

				define void @test2(i32 %N, i32 %M, i16 %Start) {
				entry:
				br label %bb3

				bb: ; preds = %bb3
				%tmp = getelementptr [1000 x i32], [1000 x i32]* @A, i32 0, i16 %i.0 ; <i32*> [#uses=1]
				store i32 123, i32* %tmp
				%tmp2 = sub i16 %i.0, 1 ; <i32> [#uses=1]
				br label %bb3

				bb3: ; preds = %bb, %entry
				%i.0 = phi i16 [ %Start, %entry ], [ %tmp2, %bb ] ; <i32> [#uses=3]
				%i.0.ext = sext i16 %i.0 to i32
				%tmp3 = icmp sge i32 %i.0.ext, %M ; <i1> [#uses=1]
				br i1 %tmp3, label %bb, label %bb5

				bb5: ; preds = %bb3
				br label %return

				return: ; preds = %bb5
				ret void
				}
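
The comment on `@test1` in this file can be reproduced outside of LLVM. The following is a rough sketch under stated assumptions, not code from the patch: it executes the `@test1` loop with a wrapping 16-bit counter and compares the observed number of backedges with the closed form `1 + (-1 smax %M)` from the CHECK line. The guard `(1 + (-1 smax %M)) <= MAX_INT16` is taken from the test comment; the check that SCEVExpander actually emits may be shaped differently. All function names and the `Cap` cutoff are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Closed form printed by SCEV for @test1: 1 + (-1 smax M).
int64_t predicatedBackedgeCount(int32_t M) {
  return 1 + std::max<int64_t>(-1, M);
}

// Stand-in for the runtime check behind the <nssw> predicate: the i16
// recurrence {0,+,1} must reach the predicted count without signed wrap.
bool guardHolds(int32_t M) {
  return predicatedBackedgeCount(M) <= INT16_MAX;
}

// Runs the loop from @test1. Returns the number of backedges taken, or
// nullopt if the loop is still running after Cap backedges.
std::optional<int64_t> simulateTest1(int32_t M, int64_t Cap) {
  assert(Cap > 0 && "cutoff must be positive");
  uint16_t Bits = 0; // raw bits of %i.0; increments wrap modulo 2^16
  int64_t Taken = 0;
  // The int16_t cast models sext on two's complement targets (guaranteed
  // since C++20, implementation-defined before that).
  while (static_cast<int32_t>(static_cast<int16_t>(Bits)) <= M) {
    if (Taken == Cap)
      return std::nullopt;
    ++Bits;
    ++Taken;
  }
  return Taken;
}
```

For `M = 32766` the guard holds and the simulated count equals the predicted one; for `M = 32767` the guard fails and the simulated loop never exits, which is the case the predicate is meant to exclude.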

test/Transforms/LoopVectorize/AArch64/backedge-overflow.ll

This file was added.

				; RUN: opt -mtriple=aarch64--linux-gnueabi -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 < %s -S \| FileCheck %s

				; The following tests contain loops for which SCEV cannot determine the backedge
				; taken count. This is because the backedge taken condition is produced by an
				; icmp with one of the sides being a loop varying non-AddRec expression.
				; However, there is a possibility to normalize this to an AddRec expression
				; using SCEV predicates. This allows us to compute a 'guarded' backedge count.
				; The Loop Vectorizer is able to version to loop in order to use this guarded
				; backedge count and vectorize more loops.


				; CHECK-LABEL: test_sge
				; CHECK-LABEL: vector.scevcheck
				; CHECK-LABEL: vector.body
				define void @test_sge(i32* noalias %A,
				i32* noalias %B,
				i32* noalias %C, i32 %N) {
				entry:
				%cmp13 = icmp eq i32 %N, 0
				br i1 %cmp13, label %for.end, label %for.body.preheader

				for.body.preheader:
				br label %for.body

				for.body:
				%indvars.iv = phi i16 [ %indvars.next, %for.body ], [ 0, %for.body.preheader ]
				%indvars.next = add i16 %indvars.iv, 1
				%indvars.ext = zext i16 %indvars.iv to i32

				%arrayidx = getelementptr inbounds i32, i32* %B, i32 %indvars.ext
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %C, i32 %indvars.ext
				%1 = load i32, i32* %arrayidx3, align 4

				%mul4 = mul i32 %1, %0

				%arrayidx7 = getelementptr inbounds i32, i32* %A, i32 %indvars.ext
				store i32 %mul4, i32* %arrayidx7, align 4

				%exitcond = icmp sge i32 %indvars.ext, %N
				br i1 %exitcond, label %for.end.loopexit, label %for.body

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: test_uge
				; CHECK-LABEL: vector.scevcheck
				; CHECK-LABEL: vector.body
				define void @test_uge(i32* noalias %A,
				i32* noalias %B,
				i32* noalias %C, i32 %N, i32 %Offset) {
				entry:
				%cmp13 = icmp eq i32 %N, 0
				br i1 %cmp13, label %for.end, label %for.body.preheader

				for.body.preheader:
				br label %for.body

				for.body:
				%indvars.iv = phi i16 [ %indvars.next, %for.body ], [ 0, %for.body.preheader ]
				%indvars.next = add i16 %indvars.iv, 1

				%indvars.ext = sext i16 %indvars.iv to i32
				%indvars.access = add i32 %Offset, %indvars.ext

				%arrayidx = getelementptr inbounds i32, i32* %B, i32 %indvars.access
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %C, i32 %indvars.access
				%1 = load i32, i32* %arrayidx3, align 4

				%mul4 = add i32 %1, %0

				%arrayidx7 = getelementptr inbounds i32, i32* %A, i32 %indvars.access
				store i32 %mul4, i32* %arrayidx7, align 4

				%exitcond = icmp uge i32 %indvars.ext, %N
				br i1 %exitcond, label %for.end.loopexit, label %for.body

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: test_ule
				; CHECK-LABEL: vector.scevcheck
				; CHECK-LABEL: vector.body
				define void @test_ule(i32* noalias %A,
				i32* noalias %B,
				i32* noalias %C, i32 %N,
				i16 %M) {
				entry:
				%cmp13 = icmp eq i32 %N, 0
				br i1 %cmp13, label %for.end, label %for.body.preheader

				for.body.preheader:
				br label %for.body

				for.body:
				%indvars.iv = phi i16 [ %indvars.next, %for.body ], [ %M, %for.body.preheader ]
				%indvars.next = sub i16 %indvars.iv, 1
				%indvars.ext = zext i16 %indvars.iv to i32

				%arrayidx = getelementptr inbounds i32, i32* %B, i32 %indvars.ext
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %C, i32 %indvars.ext
				%1 = load i32, i32* %arrayidx3, align 4

				%mul4 = mul i32 %1, %0

				%arrayidx7 = getelementptr inbounds i32, i32* %A, i32 %indvars.ext
				store i32 %mul4, i32* %arrayidx7, align 4

				%exitcond = icmp ule i32 %indvars.ext, %N
				br i1 %exitcond, label %for.end.loopexit, label %for.body

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: test_sle
				; CHECK-LABEL: vector.scevcheck
				; CHECK-LABEL: vector.body
				define void @test_sle(i32* noalias %A,
				i32* noalias %B,
				i32* noalias %C, i32 %N,
				i16 %M) {
				entry:
				%cmp13 = icmp eq i32 %N, 0
				br i1 %cmp13, label %for.end, label %for.body.preheader

				for.body.preheader:
				br label %for.body

				for.body:
				%indvars.iv = phi i16 [ %indvars.next, %for.body ], [ %M, %for.body.preheader ]
				%indvars.next = sub i16 %indvars.iv, 1
				%indvars.ext = sext i16 %indvars.iv to i32

				%arrayidx = getelementptr inbounds i32, i32* %B, i32 %indvars.ext
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx3 = getelementptr inbounds i32, i32* %C, i32 %indvars.ext
				%1 = load i32, i32* %arrayidx3, align 4

				%mul4 = mul i32 %1, %0

				%arrayidx7 = getelementptr inbounds i32, i32* %A, i32 %indvars.ext
				store i32 %mul4, i32* %arrayidx7, align 4

				%exitcond = icmp sle i32 %indvars.ext, %N
				br i1 %exitcond, label %for.end.loopexit, label %for.body

				for.end.loopexit:
				br label %for.end

				for.end:
				ret void
				}

test/Transforms/LoopVectorize/X86/vectorization-remarks-missed.ll


	; Function Attrs: nounwind optsize ssp uwtable			; Function Attrs: nounwind optsize ssp uwtable
	define void @_Z4testPii(i32* nocapture %A, i32 %Length) #0 !dbg !4 {			define void @_Z4testPii(i32* nocapture %A, i32 %Length) #0 !dbg !4 {
	entry:			entry:
	%cmp10 = icmp sgt i32 %Length, 0, !dbg !12			%cmp10 = icmp sgt i32 %Length, 0, !dbg !12
	br i1 %cmp10, label %for.body, label %for.end, !dbg !12, !llvm.loop !14			br i1 %cmp10, label %for.body, label %for.end, !dbg !12, !llvm.loop !14

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
sbaranga (author): I had to change this test because SCEV started to figure out what the backedge taken count was. This was testing the vectorization remark emitted when we cannot find the backedge taken count, so I changed the test so that it continues to be unable to get the backedge taken count.

What is interesting about this is the way we've managed to get the (exact) backedge taken count. The initial analysis was only able to get the maximum backedge taken count but not the exact one. However, we can use this to get a better SCEV for cmp3. On a following invocation of computeBackedgeTakenCount this information is used to get an exact backedge taken count.

sanjoy: Interesting. Do you mean "However, we can use this to get a better SCEV for `%0`"? I can see how `SimplifyICmpOperands` would be able to use a tighter range on `%0` to simplify the `sle` into an `slt`.

It would be great if SCEV could directly compute the backedge count here though. The problem is that we didn't have a way for `computeExitLimitFromCond` to say "BECount is Infinite if `L` is `INT_MAX` else is `L + 1` (say)", so `computeExitLimitFromCond` would have to give up in the face of the dreaded `COULDNOTCOMPUTE`. But with your work, that is changing (ability to represent predicated BE counts); and perhaps one day SCEV will be able to directly compute the backedge taken counts of loops like these. :)

sbaranga (author): Yes, that is basically what is happening (we can now use the range information on %0). You're correct, with these changes we will have a way of doing the sle/ule to slt/ult conversions even without the appropriate range information. In fact that would be something really nice to have (I've even seen people hitting this issue on llvm-dev, so maybe this is not just a corner case).
	%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !16			%arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv, !dbg !16
	%0 = trunc i64 %indvars.iv to i32, !dbg !16			%0 = trunc i64 %indvars.iv to i32, !dbg !16
				%ld = load i32, i32* %arrayidx, align 4
	store i32 %0, i32* %arrayidx, align 4, !dbg !16, !tbaa !18			store i32 %0, i32* %arrayidx, align 4, !dbg !16, !tbaa !18
	%cmp3 = icmp sle i32 %0, %Length, !dbg !22			%cmp3 = icmp sle i32 %ld, %Length, !dbg !22
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !12			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !12
	%1 = trunc i64 %indvars.iv.next to i32			%1 = trunc i64 %indvars.iv.next to i32
	%cmp = icmp slt i32 %1, %Length, !dbg !12			%cmp = icmp slt i32 %1, %Length, !dbg !12
	%or.cond = and i1 %cmp3, %cmp, !dbg !22			%or.cond = and i1 %cmp3, %cmp, !dbg !22
	br i1 %or.cond, label %for.body, label %for.end, !dbg !22			br i1 %or.cond, label %for.body, label %for.end, !dbg !22

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	ret void, !dbg !24			ret void, !dbg !24
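
The "Infinite if `L` is `INT_MAX`, else `L + 1`" remark from the inline discussion above can be written down as a small closed form. This is a sketch under assumptions, not SCEV code: it models a plain counted loop `for (i = 0; i <= L; ++i)` with a wrapping 32-bit `i`, rather than the exact IR of this test, and `backedgeCountForSle` is a hypothetical name. An empty optional stands for the "infinite" answer that plain SCEV could only report as could-not-compute.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Backedge-taken count of `for (i = 0; i <= L; ++i)` with a wrapping i32 `i`.
// `i <= INT32_MAX` is always true, so that bound never terminates the loop;
// for every other bound, `i <= L` is the same test as `i < L + 1`.
std::optional<int64_t> backedgeCountForSle(int32_t L) {
  if (L == INT32_MAX)
    return std::nullopt; // infinite loop, no finite count exists
  // Widen before adding so L + 1 cannot overflow; negative bounds mean the
  // exit test fails immediately and the backedge is never taken.
  return std::max<int64_t>(0, static_cast<int64_t>(L) + 1);
}
```

A predicated backedge-taken count lets the analysis return the `L + 1` answer together with the condition `L != INT_MAX`, instead of giving up on the whole loop.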
