Page MenuHomePhabricator

[SCEV][LV] Introduce SCEV Predicates and use them to re-implement stride versioning

Authored by sbaranga on Sep 16 2015, 7:53 AM.



SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.

ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate

We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions

We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.

We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.

The only effect of this change is that now the number of versioned strides
will be limited to 16 (which is should be better than having no limit).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mzolotukhin added inline comments.Sep 28 2015, 12:16 PM

It might return nullptr in certain cases - please document them.



By the way, why is the assumption (that RHS is SCEVUnknown) needed?


Why do we need it as a public member?


Nitpick: here the predicate set is called Preds, in other functions (e.g. RuntimePointerChecking::insert) it's called Pred.


Yes, you're right, I misread the code.

I think it would make sense to factor out common ('default' implementation) part into a separate class. It will also remove some code duplication from SCEVApplyRewriter and SCEVParameterRewriter. That should be a separate NFC patch, but I think we need to do it before landing this one.


I think it's fine, but at least we should explicitly state somewhere that we're doing it this way. It'll prevent future readers of this code from being caught by surprise.


Why are this and 9224 lines different?


From previous iteration: Why do we need to do OR with False?

Thanks, Michael! Answers inline.

  • Silviu

I'm basically using IdPreds for storage. It's not ideal. Would there be some other way of providing storage for different types of predicates while avoiding heap allocation?

Preds would hold a list of references and would be the thing that the algorithms use.


I needed to get access to Preds from the rewriter. Maybe having some interface for this would be better.


Makes sense. I've create another review for this change:


One of these should be removed, yes.


I wrote this code some time ago but if I remember correctly this was the same reason why LoopAccessInfo::addRuntimeChecks does an AND with True:

"We have to do this trickery because the IRBuilder might fold the check to a constant expression in which case there is no Instruction anchored in a the block.".

sbaranga added inline comments.Sep 30 2015, 2:53 AM

The assumption is not technically required. But if we wanted to have something general here we would have to change the visiting algorithm to test if this predicate matches on every sub-expression. This is technically easy to do, but we don't need it for the current uses (and we would just spend more time doing these checks).

sbaranga updated this revision to Diff 36096.Sep 30 2015, 6:26 AM

Renamed SCEVPredicateSet to SCEVUnionPredicate.
Documented the behaviour of the check-generating functions.
We now should be using "Preds" instead of "Pred" everywhere.

Re-based this change on top of
which simplified the implementation of the predicate rewriter.

Also included other small cleanups and ran the patch through clang-format.

Hi Silviu,

Thanks for the changes! Please find my comments inline.



Makes sense. But then we'll need to clarify, that another assumption is that the unknown part is always LHS (while RHS is always a constant?). Maybe it's worth enforcing these assumptions with some assertions too.


Hmm.. I don't think it'll work..

So, IdPreds is a vector of objects, and Preds is a vector of pointers to those objects, right? Are you going to keep different types of predicates in Preds? If so, where the actual objects will be stored? Will you create a separate IdPreds-like array for any new type of objects that can be stored in Preds (because you can't store objects of a type other than SCEVEqualPredicate in IdPreds, right?) ?

I think we'll have to heap allocate the objects to make the scheme versatile. For an example of how similar problem is solved, you could take a look at ScalarEvolution::getConstant and UniqueSCEVs member of class ScalarEvolution.


Thanks! FWIW, I like that patch, but I'd like to leave it up to Andy or Sanjoy for a final approval.

anemet added inline comments.Sep 30 2015, 2:21 PM

I am not sure I understand this interface (it is certainly under-documented and so is getInfo):

What is the point of passing strides and SCEV complexity? Don't we know already that for each symbolic stride in Strides we will need exactly one check? I understand that this will change when we generate predicates for other SCEV assumption, but:

Wouldn't it be better to pass a PredicateSet here initialized with the strides?

sanjoy requested changes to this revision.Sep 30 2015, 3:14 PM
sanjoy added a reviewer: sanjoy.

Reviewed the SCEV bits. I don't know enough about the other bits to comment on those.


Minor: I'd put newlines between method declarations.


Also minor, but why not just call it implies?


Please document Loc, and pass in things that cannot be nullptr, like SE or DL as references.


The restriction that LHS is a SCEVUnknown seems somewhat arbitrary; but if you want to keep it, please change the type of E0 to be SCEVUnknown.

Also, rename E0 and E1 to LHS and RHS to sync with the documentation, or change the documentation to refer to these as E0 or E1.


Same comment here -- if your invariant is that E0 is a SCEVUnknown, then make that obvious in the type.


I haven't read the whole patch yet, but I think the doc should state if a SCEVUnionPredicate represents a logical and or a logical or (or sometimes one and sometimes the other) of the predicates it contains.


I agree with @mzolotukhin -- this won't work once you have different types of predicates. You'll need a "host" to contain the allocations, perhaps using a BumpPtrAllocator or something similar.


Why do you need to return a pair of instructions? Why not just return a Value * computing the result of the predicate? This will also obviate the need to create a "fake or" in the implementation, and you'll just be able to return what IRBuilder gave you.


Use const auto *.


Why not just return (Op->E0 == E0 && Op->E1 == E1) || (Op->E0 == E1 && Op->E1 == E0)?


Why not just return Builder.CreateICmpNE(...)?


As I've mentioned in the declaration of generateGuardCond, I don't see why you need to return a range of instructions. I'd rather have this return a Value *, and do away with getFirstInst and the fake or instruction.


I think this should be std::all_of to be consistent with the interpretation of SCEVUnionPredicate in generateCheck -- to prove (X|Y)->Z you need to prove (X->Z)&&(Y->Z).


Nit: here and elsewhere, prefer using const auto * or auto * when the type is obvious from the RHS.


[Optional] Why not std::any_of?


As mentioned earlier, this allocation scheme is not quite right.

This revision now requires changes to proceed.Sep 30 2015, 3:14 PM

Thanks for the reviews! Replies inline.


My understanding is that the strides in the Strides map can be replaced with 1 if needed. However it is not guaranteed that we will need to replace these strides with 1.

I was thinking that maybe it would be a good idea to have some VersioningParameters struct to hold all the parameters that tell us if we can use versioning? So at the moment it would be the strides map and the SCEV complexity. Do you think that would make sense?


Ok, this makes sense. I'll do the changes.


I agree, the current scheme is not ideal.

Using a ScalarEvolution-like allocation scheme seems reasonable to me for everything except SCEVUnionPredicate.
But we don't to allocate that anyway so it should be fine.


It is mostly for consistency. This is what some of the other versioning checks return - addRuntimeChecks in LoopAccessInfo and InnerLoopVectorizer::addStrideCheck (removed by this patch).


In fact if we know that LHS and RHS are different (one is a SCEVUnknown and the other a SCEVConstant) we can simplify that expression.


The union predicate function is an "and", so I think this makes the current implementation correct?

sbaranga updated this revision to Diff 36248.Oct 1 2015, 8:26 AM
sbaranga edited edge metadata.

Updated the SCEV predicate allocation scheme, so we now use the same allocator as the SCEVs
(and have the same scheme as the one used for SCEVs by ScalarEvolution).

Made explicit the assumption that LHS is a SCEVUnknown and RHS is a SCEVConstant in
the SCEVEqualPredicate.

This update should address all the other review comments (unless I've accidentally missed

anemet added inline comments.Oct 1 2015, 9:44 AM

My understanding is that the strides in the Strides map can be replaced with 1 if needed. However it is not guaranteed that we will need to replace these strides with 1.

Fair enough.

I was thinking that maybe it would be a good idea to have some VersioningParameters struct to hold all the parameters that tell us if we can use versioning? So at the moment it would be the strides map and the SCEV complexity. Do you think that would make sense?

Just to be clear, I am not looking for stylistic improvements here but trying to make sense of the semantics. If I understand correctly SCEVComplexity specifies the cost of the checks that we're allowed to accumulate to make LAA complete its analysis.

I don't understand why that is an incoming parameter. It seems to me that a better formulation would be to have LAA do its thing, accumulate whatever checks it needs and then make that cost value part of the LAI state. Then a client pass can query that and decide whether it's willing to pay that cost in order to get an analyzable loop or not.

To put it another way, it seem to me that with the current interface you can have this scenario:

  1. One client specifies a low complexity value. There is no LAI for the loop so we compute it but fail because we need more checks than allowed. We *cache* the result.
  2. Next client wants to specify a higher value but can't because LAI is already cached plus actually as the code is currently written this will lead to an assert.

So how do you envision this scenario to work?

sbaranga added inline comments.Oct 2 2015, 7:08 AM

Yes, your analysis is correct and that scenario is a problem.

I chose the current solution because it was equivalent with the existing implementation. I agree with your assessment.

Some possible solutions:

a) no bounds in LAA on predicate sizes. This can have a negative impact on compile time.
b) LAA has its own logic for computing the limit, and we can make sure the limit is high enough to cover all the users.
c) we have some initial default bounds, but clients can request an increase by throwing away the cached LAI result and computing a new one (with an increased threshold). Recomputing the LAI would not be ideal.

I like variant b).
b) is orthogonal to c), but we theoretically wouldn't need c) right now.

anemet added inline comments.Oct 3 2015, 9:56 PM

Do you know about any compile-time or algorithmic complexity issues?

I am not sure if we want to approximate compile time with the cost of the checks.

If we know some worse than linear algorithms here, I much rather control it locally rather hoping that the number of checks would catch it (especially once we start having checks for different type of predicates like non-wrapping, etc.)

sbaranga added inline comments.Oct 5 2015, 5:48 AM

I don't think there are any problems that we can't fix. The predicates themselves have linear complexity for the methods, but the problem is the places where we're using them (using them in some algorithm that was already linear is not great).

The problem with the current code is that it was written for a low number of predicates. There are some places where we do linear traversals of the set (for example in SCEVPredicateRewriter::visitUnknown). We can fix this by having a map from SCEVs to predicates (in the case of the SCEVEqualPredicate this would map the SCEVUnknown to the predicate).

I'll make the required changes.

sbaranga updated this revision to Diff 36727.Oct 7 2015, 5:19 AM
sbaranga edited edge metadata.

Add a getExpr() method to the SCEVPredicate interface. This will return the
expression on which the predicate is applied. The return value of this call
is used by SCEVUnionPredicate to effciently lookup the adequate predicate
during expression rewriting. Now the algorithm should be able to scale with
the number of predicates.

Removed the limit to the number of predicates in LAA. The loop vectorizer will
check at the end of the legality phase to see if the number of predicates is

Fixed several instances where a Loop * parameter was passed but not used.

sanjoy added a comment.Oct 7 2015, 2:03 PM

Overall, I think this is looking much better. I have few minor comments inline. The only design issue (according to me) is that (I've said this inline) we're creating a fake instruction solely to satisfy an internal interface. I think that's a code smell, but I can live with it if that's hard to fix and others are okay with having it.


This is optional to fix, but I'd prefer renaming this to getKind, since getType in LLVM has the connotation of returning a Type *.


Please document Loc.


Newline here?


Here, and in getLHS, please start the sentence with uppercase.


Why not DenseMap?


I'd prefer returning an ArrayRef<const SCEVPredicate *>, and returning an empty ArrayRef if there are no predicates for Expr.


Please construct this in the move constructor.


There is now a ScalarEvolution::getOne, can you use that here?


Use auto *I here.


Isn't V always an instruction?


Don't you need to add the type of the predicate to ID?


Use const auto * because the type is obvious from the RHS.


Use three slashes.


If the IRBuilder returned a constant Check, why do you need to generate the check? I'd either

  • change this to assert(isa<Instruction>(Check)), and fix the cases where the assert fails to return true from isAlwaysTrue (i.e. if the IRBuilder can prove the check is redundant, SCEV should be able to as well)
  • change the interface to allow returning something that says "always true" / "always false" in addition to returning a pair of instructions

I think creating a completely redundant instruction to satisfy internal interfaces is a code smell, and we should try to avoid it if reasonable.


In that case, shouldn't you have

AllCheck = CheckBuilder.CreateAnd(AllCheck, CheckResult);

in SCEVUnionPredicate::generateCheck? Or are you checking something different there?


Shouldn't this be CreateAnd?


As mentioned earlier, IIUC this should be a CreateAnd.


Why not just SCEVToPreds[Key].push_back(N);?

There is a huge number of inline comments from earlier revisions that are still popping up in the new version. This makes it pretty hard to read the patch. Can you please check if marking those "done" would make them disappear?


Please expand this comment to explain how they are used for dep-checking.


Explain the Preds parameter


Same here, please expand comment to explain how they are used.


Comment the Preds parameter.


Here too.


Didn't you want to get rid of MaxSCEVPredicates?


Ah, MaxSCEVPredicates is unused.


Comment missing


Here too please expand the comment to explain what these assumptions are used for.


Here too.


Missing comment.


Any reason this whole logic can't be pushed into LVLegality?

sbaranga marked 80 inline comments as done.Oct 8 2015, 3:53 AM
sbaranga added inline comments.

We remove this from the loop vectorizer so we end up with the same number of copies. There's still some code duplication.


Yes, I think you're right. I would go with the first option.


See the comment bellow.


The generated code checks for the negated condtion. So we basically end up with
(or (not predicate1), (not predicate2), ...) which is the same as
(not (and prediacte1, predicate2, ..)

We do this because the current users already use this interface.
This also avoids the need to explicitly create negations (we just create the negated check for each predicate).

I think this at least needs a comment. Do you have a strong preference about this?

There is a huge number of inline comments from earlier revisions that are still popping up in the new version. This makes it pretty hard to read the patch. Can you please check if marking those "done" would make them disappear?

I've marked the comments as done, but phabricator is still showing them. Do you think it would be better to start a new review?


anemet added a comment.Oct 8 2015, 9:33 AM

I've marked the comments as done, but phabricator is still showing them. Do you think it would be better to start a new review?

Wow, silly Phab. Yeah I think, unless other reviewers object, we should move this to a new review. Thanks!

sbaranga abandoned this revision.Oct 9 2015, 8:43 AM

Ok. Moving the review to