This is an archive of the discontinued LLVM Phabricator instance.

Differential D12905

[SCEV][LV] Introduce SCEV Predicates and use them to re-implement stride versioning
AbandonedPublic

Authored by sbaranga on Sep 16 2015, 7:53 AM.

Details

Reviewers

anemet
mzolotukhin
sanjoy

Summary

SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.

ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate
method.

We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions

We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.
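
As a rough illustration of the workflow described in this summary, here is a minimal standalone sketch. It does not use the LLVM API: `Expr`, `PredicateSet`, `addEqual` and `rewriteUsingPredicates` are hypothetical stand-ins for SCEV expressions, `SCEVPredicateSet`, `SCEVEqualPredicate` and `rewriteUsingPredicate`, and the simplification rule (dropping a multiplication by 1) is an assumption chosen to mimic stride versioning.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (hypothetical stand-in for a SCEV expression).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;
struct Expr {
  enum Kind { Constant, Unknown, Add, Mul };
  Kind K;
  std::int64_t Value; // meaningful for Constant
  std::string Name;   // meaningful for Unknown
  ExprPtr LHS, RHS;   // meaningful for Add and Mul
};

ExprPtr constant(std::int64_t V) {
  return std::make_shared<Expr>(Expr{Expr::Constant, V, "", nullptr, nullptr});
}
ExprPtr unknown(const std::string &N) {
  return std::make_shared<Expr>(Expr{Expr::Unknown, 0, N, nullptr, nullptr});
}
ExprPtr binary(Expr::Kind K, ExprPtr L, ExprPtr R) {
  return std::make_shared<Expr>(Expr{K, 0, "", std::move(L), std::move(R)});
}

// Hypothetical predicate set: each entry assumes "unknown == constant".
// All entries are expected to hold at run time (a conjunction).
struct PredicateSet {
  std::map<std::string, std::int64_t> Equal;
  void addEqual(const std::string &Name, std::int64_t V) { Equal[Name] = V; }
};

bool isConstant(const ExprPtr &E, std::int64_t V) {
  return E->K == Expr::Constant && E->Value == V;
}

// Replace every unknown covered by a predicate and drop multiplications by 1.
ExprPtr rewriteUsingPredicates(const ExprPtr &E, const PredicateSet &Preds) {
  switch (E->K) {
  case Expr::Constant:
    return E;
  case Expr::Unknown: {
    auto It = Preds.Equal.find(E->Name);
    if (It == Preds.Equal.end())
      return E;
    return constant(It->second);
  }
  case Expr::Add:
  case Expr::Mul:
    break;
  }
  ExprPtr L = rewriteUsingPredicates(E->LHS, Preds);
  ExprPtr R = rewriteUsingPredicates(E->RHS, Preds);
  if (E->K == Expr::Mul) {
    if (isConstant(L, 1))
      return R;
    if (isConstant(R, 1))
      return L;
  }
  return binary(E->K, L, R);
}

// Render an expression as text so the effect of the rewrite is visible.
std::string print(const ExprPtr &E) {
  switch (E->K) {
  case Expr::Constant:
    return std::to_string(E->Value);
  case Expr::Unknown:
    return E->Name;
  case Expr::Add:
    return "(" + print(E->LHS) + " + " + print(E->RHS) + ")";
  case Expr::Mul:
    return "(" + print(E->LHS) + " * " + print(E->RHS) + ")";
  }
  return "";
}
```

In this model, a client would build `base + i * stride`, call `addEqual("stride", 1)` when it decides to version the stride, and then rewrite the expression after each lookup, which leaves `base + i`.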

We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.

The only effect of this change is that now the number of versioned strides
will be limited to 16 (which should be better than having no limit).

Diff Detail

Event Timeline

sbaranga updated this revision to Diff 34892.Sep 16 2015, 7:53 AM

sbaranga retitled this revision from to [SCEV][LV] Introduce SCEV Predicates and use them to re-implement stride versioning.

sbaranga updated this object.

sbaranga added reviewers: anemet, mzolotukhin.

sbaranga added subscribers: hfinkel, sanjoy, rengolin and 2 others.

hfinkel added inline comments.Sep 22 2015, 4:32 PM

include/llvm/Analysis/ScalarEvolution.h
242	Remove commented-out code.
lib/Analysis/ScalarEvolution.cpp
9498	Can this take a threshold override, or similar, as a parameter to override SCEVCheckThreshold? We had specifically decided that loops decorated with #pragma clang vectorize(enable), which asks for vectorization but does not assert safety, would generate as many checks as necessary to enable vectorization (or be bound by some very large limit). For this case, we'll need to override the limit (or, at least, have a much larger limit). Generically, I'm skeptical of embedding the limit in SCEV at all; I think that the caller should always provide an appropriate limit for whatever happens to be its use case.

Hi Silviu,

Thanks for splitting this out, this is a much easier read now!

Adam

include/llvm/Analysis/LoopAccessAnalysis.h
296	I think I've commented on this too: this should be plural or have set in the name.
546–577	I think I've already asked this before: why is the thing with unique_ptr needed?
include/llvm/Analysis/ScalarEvolution.h
174	I think the enum tags are uppercase. Also this should probably be inside the class.
188–194	Do these really ever happen aside from reaching the check threshold? I think that we should just have an API to query the number of checks.
1245–1248	Stale comment. I also think that the function name should be plural.
lib/Analysis/ScalarEvolution.cpp
114–118	I don't think that this should be a central threshold. It should be up to the client transformation to decide if the benefit of the transformation outweighs the overhead of the necessary checks.
9243–9250	Hmm, I think we're duplicating this now into the third file. Any better idea?
9252	Should probably be a class since not all members are public. Also it needs a comment. This may be a silly question, but why do we need to override all these members?
9431	OF for overflow? ;)
lib/Transforms/Vectorize/LoopVectorize.cpp
321–324	Overflow again.

Hi, Adam, Hal,

Thanks for the reviews! I've replied inline. I should have a new version shortly.

Silviu

include/llvm/Analysis/LoopAccessAnalysis.h
546–577	I'm sure I had some reason at some point of making it a unique_ptr, but I can't remember. I think it should be possible to remove the unique_ptr part.
include/llvm/Analysis/ScalarEvolution.h
188–194	isAlwaysFalse would currently be true only when reaching the threshold. Having an API to return the number of checks (or maybe something that estimates the cost of the checks?), would be more clear.
lib/Analysis/ScalarEvolution.cpp
114–118	Ok, I'll remove this.
9252	I think because SCEVVisitor is a template that uses the visit* methods without defining them, users must define all the methods (or get a compilation error). Not really nice. I'll make the changes (add a comment and convert to a class).
9498	That makes sense to me. There is currently the override in SCEV, but from your argument it looks like it should be passed in from the user. Would that put a complexity bound on our rewriting algorithms / estimates for how expensive using these predicates can be? So far I've worked under the assumption that we would have a limited number of predicates.

anemet added inline comments.Sep 24 2015, 9:51 AM

lib/Analysis/ScalarEvolution.cpp
9252	"I think because SCEVVisitor is a template that uses the visit* methods without defining them, users must define all the methods (or get a compilation error). Not really nice." Can this be derived from SCEVParameterRewriter since we only support the equality predicate right now? Then we can extend it later as we add the other predicates.

Hi Silviu,

Thanks for doing this, I think this could be a nice improvement. As for now, several questions: does it work on any new cases, compared to the original StrideVersioning implementation? Do you plan to add any other types of predicates in future?

Also, some comments inline.

Thanks,
Michael

include/llvm/Analysis/LoopAccessAnalysis.h
575–576	Where is this used?
include/llvm/Analysis/ScalarEvolution.h
205	Please add some description for this class.
237–238	The names `SCEVPredicate` and `SCEVPredicateSet` are a bit confusing: usually, one doesn't assume that a `SomethingSet` is derived from `Something`. Would it make sense to rename them to something like `SCEVPredicateBase`/`SCEVUnionPredicate` (not insisting on these particular names)?
239–240	Should it be called `Invalid` or something like this then?
lib/Analysis/LoopAccessAnalysis.cpp
107–121	Instead of storing `VOne` and `One` in a separate variables, I think we can use `ScalarEvolution::getConstant(Type *Ty, uint64_t V, bool isSigned)` here.
lib/Analysis/ScalarEvolution.cpp
9252	"users must define all the methods (or get a compilation error). Not really nice." No, you don't have to override all of them; you only need to define the ones you want to change. For the generic case, you could override the `visitInstruction` method. As an example, you could look at `class UnrolledInstAnalyzer` in `lib/Transforms/Scalar/LoopUnrollPass.cpp`.
9499	I think it should be `>=` since we haven't added the new predicate yet.

In D12905#252980, @mzolotukhin wrote:

Hi Silviu,

Thanks for doing this, I think this could be a nice improvement. As for now, several questions: does it work on any new cases, compared to the original StrideVersioning implementation? Do you plan to add any other types of predicates in future?

Also, some comments inline.

Thanks,
Michael

Hi Michael,

This won't work on any new cases of StrideVersioning, it should be equivalent to the existing implementation.
The main motivation is to introduce the SCEV predicates. I plan to add predicates for overflow tests (see http://reviews.llvm.org/D10161 for the overall direction).

I've found that range checking predicates could also be useful in some cases (the use case I have for this would be canonicalization of "a <= b" into "a < b + 1" in SCEV), but I would have to think more if this is actually needed.

Thanks,
Silviu

include/llvm/Analysis/LoopAccessAnalysis.h
575–576	This is an artefact of rebasing and should be removed.
include/llvm/Analysis/ScalarEvolution.h
237–238	I'm not opposed to changing the names. SCEVPredicateBase/SCEVUnionPredicate could work.
239–240	"Invalid" would be better, yes.

hfinkel added inline comments.Sep 25 2015, 2:34 PM

lib/Analysis/ScalarEvolution.cpp
9498	"Would that put a complexity bound on our rewriting algorithms / estimates for how expensive using these predicates can be? So far I've worked under the assumption that we would have a limited number of predicates." I'm not too concerned about this. In cases where we want to override the "generally reasonable" limit with a large one, at least currently, this is always in response to a direct request from the user (due to a pragma or similar), and so we get to assume that the user knows what he or she is doing.

Hi Silviu,

Thanks, I see the general direction now. I found several spots that need some clean-up (you could find them inline), but apart from that I have a question - currently PredicateSet only works as OR of several checks. Do we need (or might we need in future) sets of predicates, combined with AND? I tend to think that we'll need both. I suspect that supporting them both might add non-trivial amount of work to this patch, so we can keep it for later, but I think that as a default option we need AND, not OR. What do you think?

Best regards,
Michael

include/llvm/Analysis/ScalarEvolution.h
243	What is `IdPreds` and how is it different from `Preds`?
252–253	This probably should be private/protected.
258–259	Does it return first and last instruction in the generated check? Could we document it?
lib/Analysis/ScalarEvolution.cpp
9371–9372	Maybe just `return E0 == E1;` ?
9386	The predicate is called Equal, should we generate `ICmpEQ` for consistency here?
9402	Please avoid computing `IdPreds.size()` on every iteration. Even better, we could use range loop here.
9410	Another candidate for a range loop.
9411–9412	What does `OA` stand for? Maybe get rid of this var at all? Actually, this entire function could probably be replaced with `std::all_of`.
9429	Could we just return `std::make_pair(nullptr, nullptr)` as we do below?
9442–9444	Why do we need to do or with `False`?
9458	Range loop?
9479–9489	Good candidates for `std::all_of`/`std::any_of`.

This update addresses part of the review comments:

Removed the isAlwaysFalse interface from the predicates and
replaced it with a getComplexity() method. The users will now
use this method to determine if it is acceptable to create
a new predicate.

Moved the SCEV limit from ScalarEvolution to the users
(LoopVectorize in this case).

Other small changes (removed the unique_ptr use, use range loops,
etc).

Hi Michael,

In D12905#254015, @mzolotukhin wrote:

Hi Silviu,

Thanks, I see the general direction now. I found several spots that need some clean-up (you could find them inline), but apart from that I have a question - currently PredicateSet only works as OR of several checks. Do we need (or might we need in future) sets of predicates, combined with AND? I tend to think that we'll need both. I suspect that supporting them both might add non-trivial amount of work to this patch, so we can keep it for later, but I think that as a default option we need AND, not OR. What do you think?

The current implementation of PredicateSet is actually an AND. The checks do (or (not p)), for p in the set. So we're effectively getting the condition that would be used to invalidate the predicates. I don't see a clear case for 'OR' at this moment but I guess it might be required at some point.
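
The relationship described in this reply (the generated guard is the OR of the negated predicates, which is the negation of their AND) can be sketched in isolation. This is a standalone model, not the patch's code: `EqualAssumption`, `anyAssumptionFails` and `allAssumptionsHold` are hypothetical names, and run-time values are passed in directly instead of being emitted as IR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical equality assumption: the run-time value must equal Expected.
struct EqualAssumption {
  std::int64_t RuntimeValue;
  std::int64_t Expected;
  bool holds() const { return RuntimeValue == Expected; }
};

// Guard in the style described above: OR together the negated checks.
// A true result means at least one assumption was violated.
bool anyAssumptionFails(const std::vector<EqualAssumption> &Preds) {
  return std::any_of(Preds.begin(), Preds.end(),
                     [](const EqualAssumption &P) { return !P.holds(); });
}

// The conjunction that the set represents.
bool allAssumptionsHold(const std::vector<EqualAssumption> &Preds) {
  return std::all_of(Preds.begin(), Preds.end(),
                     [](const EqualAssumption &P) { return P.holds(); });
}
```

By De Morgan's law the two functions always return opposite results, including for an empty set, where no assumption can fail.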

Thanks,
Silviu

include/llvm/Analysis/ScalarEvolution.h
243	Right now it's not different, but I was thinking that in the future it could contain different types of predicates.
258–259	It does, yes! This should be consistent with the interface used by the loop vectorizer / loop access analysis to add runtime checks. I'll add documentation for this in the next version.
lib/Analysis/ScalarEvolution.cpp
9252	I think we have a different case here, as we would like to visit SCEV expressions. I think that the LoopUnrollPass example visits instructions?
9386	The convention that I used so far was to generate the negative test (ICmpNE). I think the loop vectorizer / loop access analysis were doing the same thing for their checks? Would keeping it like this be a problem?
9482	Makes sense. Thanks!

Hi Silviu,

Thanks for the changes, the code looks much better now! Please find some remarks inline.

Michael

include/llvm/Analysis/ScalarEvolution.h
243	I still don't understand the purpose of having `IdPreds`. Why can't `Preds` contain different types of predicates?

mzolotukhin added inline comments.Sep 28 2015, 12:16 PM

include/llvm/Analysis/ScalarEvolution.h
197–199	It might return `nullptr` in certain cases - please document them.
204–205	s/assume#/assume/ By the way, why is the assumption (that RHS is SCEVUnknown) needed?
250–251	Why do we need it as a `public` member?
lib/Analysis/LoopAccessAnalysis.cpp
426	Nitpick: here the predicate set is called `Preds`, in other functions (e.g. `RuntimePointerChecking::insert`) it's called `Pred`.
lib/Analysis/ScalarEvolution.cpp
9252	Yes, you're right, I misread the code. I think it would make sense to factor out common ('default' implementation) part into a separate class. It will also remove some code duplication from `SCEVApplyRewriter` and `SCEVParameterRewriter`. That should be a separate NFC patch, but I think we need to do it before landing this one.
9386	I think it's fine, but at least we should explicitly state somewhere that we're doing it this way. It'll prevent future readers of this code from being caught by surprise.
9419	Why are this line and line 9224 different?
9421–9422	From previous iteration: Why do we need to do `OR` with `False`?

Thanks, Michael! Answers inline.

Silviu

include/llvm/Analysis/ScalarEvolution.h
243	I'm basically using IdPreds for storage. It's not ideal. Would there be some other way of providing storage for different types of predicates while avoiding heap allocation? Preds would hold a list of references and would be the thing that the algorithms use.
250–251	I needed to get access to Preds from the rewriter. Maybe having some interface for this would be better.
lib/Analysis/ScalarEvolution.cpp
9252	Makes sense. I've created another review for this change: http://reviews.llvm.org/D13242
9419	One of these should be removed, yes.
9421–9422	I wrote this code some time ago but if I remember correctly this was the same reason why LoopAccessInfo::addRuntimeChecks does an AND with True: "We have to do this trickery because the IRBuilder might fold the check to a constant expression in which case there is no Instruction anchored in a the block.".

sbaranga added inline comments.Sep 30 2015, 2:53 AM

include/llvm/Analysis/ScalarEvolution.h
204–205	The assumption is not technically required. But if we wanted to have something general here we would have to change the visiting algorithm to test if this predicate matches on every sub-expression. This is technically easy to do, but we don't need it for the current uses (and we would just spend more time doing these checks).

Renamed SCEVPredicateSet to SCEVUnionPredicate.
Documented the behaviour of the check-generating functions.
We now should be using "Preds" instead of "Pred" everywhere.

Re-based this change on top of http://reviews.llvm.org/D13242
which simplified the implementation of the predicate rewriter.
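
For readers unfamiliar with the pattern discussed in the line 9252 thread, here is a minimal sketch of a CRTP rewriter base that supplies default visit methods, so that a derived rewriter overrides only what it changes. D13242 itself is not shown on this page, so this is only one plausible shape; `Node`, `RewriterBase` and `UnknownReplacer` are hypothetical names, not LLVM classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (hypothetical stand-in for a SCEV expression).
struct Node;
using NodePtr = std::shared_ptr<Node>;
struct Node {
  enum Kind { Constant, Unknown, Add };
  Kind K;
  long long Value;  // meaningful for Constant
  std::string Name; // meaningful for Unknown
  NodePtr LHS, RHS; // meaningful for Add
};

NodePtr makeNode(Node::Kind K, long long V, std::string Name,
                 NodePtr L = nullptr, NodePtr R = nullptr) {
  return std::make_shared<Node>(
      Node{K, V, std::move(Name), std::move(L), std::move(R)});
}

// CRTP base: dispatches on the node kind and provides default behaviour
// (leaves are returned unchanged, sums are rebuilt from visited operands).
template <typename Derived> class RewriterBase {
public:
  NodePtr visit(const NodePtr &N) {
    Derived &Self = static_cast<Derived &>(*this);
    switch (N->K) {
    case Node::Constant:
      return Self.visitConstant(N);
    case Node::Unknown:
      return Self.visitUnknown(N);
    case Node::Add:
      return Self.visitAdd(N);
    }
    return N;
  }
  NodePtr visitConstant(const NodePtr &N) { return N; }
  NodePtr visitUnknown(const NodePtr &N) { return N; }
  NodePtr visitAdd(const NodePtr &N) {
    return makeNode(Node::Add, 0, "", visit(N->LHS), visit(N->RHS));
  }
};

// A derived rewriter only overrides the case it cares about.
class UnknownReplacer : public RewriterBase<UnknownReplacer> {
  std::string Name;
  long long Value;

public:
  UnknownReplacer(std::string Name, long long Value)
      : Name(std::move(Name)), Value(Value) {}
  NodePtr visitUnknown(const NodePtr &N) {
    if (N->Name == Name)
      return makeNode(Node::Constant, Value, "");
    return N;
  }
};
```

Because dispatch goes through `static_cast<Derived &>`, the override of `visitUnknown` is picked up without virtual functions, while `visitConstant` and `visitAdd` fall back to the base class defaults.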

Also included other small cleanups and ran the patch through clang-format.

Hi Silviu,

Thanks for the changes! Please find my comments inline.

Michael

include/llvm/Analysis/ScalarEvolution.h
204–205	Makes sense. But then we'll need to clarify, that another assumption is that the unknown part is always `LHS` (while `RHS` is always a constant?). Maybe it's worth enforcing these assumptions with some assertions too.
243	Hmm.. I don't think it'll work.. So, `IdPreds` is a vector of objects, and `Preds` is a vector of pointers to those objects, right? Are you going to keep different types of predicates in `Preds`? If so, where will the actual objects be stored? Will you create a separate `IdPreds`-like array for any new type of objects that can be stored in `Preds` (because you can't store objects of a type other than `SCEVEqualPredicate` in `IdPreds`, right?)? I think we'll have to heap allocate the objects to make the scheme versatile. For an example of how a similar problem is solved, you could take a look at `ScalarEvolution::getConstant` and the `UniqueSCEVs` member of class `ScalarEvolution`.
lib/Analysis/ScalarEvolution.cpp
9252	Thanks! FWIW, I like that patch, but I'd like to leave it up to Andy or Sanjoy for a final approval.
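
The storage concern raised for line 243 can be sketched as follows. The comment points at `ScalarEvolution::getConstant` and `UniqueSCEVs` as a model; this standalone sketch deliberately swaps in `std::map` and `std::unique_ptr` for LLVM's own containers and allocator, and `Predicate`, `EqualPredicate` and `PredicateOwner` are hypothetical names. The point it illustrates is only that a single owner can heap-allocate and unique predicates behind a common base class, so that other predicate kinds could later share the same storage.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Common base class so one container can own any kind of predicate.
struct Predicate {
  virtual ~Predicate() = default;
  virtual std::string describe() const = 0;
};

// Hypothetical equality predicate: "LHS == RHS" for a named unknown.
struct EqualPredicate : Predicate {
  std::string LHS;
  long long RHS;
  EqualPredicate(std::string LHS, long long RHS)
      : LHS(std::move(LHS)), RHS(RHS) {}
  std::string describe() const override {
    return LHS + " == " + std::to_string(RHS);
  }
};

// Owns every predicate and hands out stable, uniqued pointers.
class PredicateOwner {
  // Keyed by a textual profile of the predicate (kind plus operands).
  std::map<std::string, std::unique_ptr<Predicate>> Unique;

public:
  const Predicate *getEqual(const std::string &LHS, long long RHS) {
    std::string Key = "eq:" + LHS + ":" + std::to_string(RHS);
    std::unique_ptr<Predicate> &Slot = Unique[Key];
    if (!Slot)
      Slot = std::make_unique<EqualPredicate>(LHS, RHS);
    return Slot.get();
  }
};
```

Requesting the same predicate twice yields the same pointer, so clients can store and compare raw pointers while the owner controls lifetime.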

anemet added inline comments.Sep 30 2015, 2:21 PM

include/llvm/Analysis/LoopAccessAnalysis.h
467–470	I am not sure I understand this interface (it is certainly under-documented and so is getInfo): What is the point of passing strides and SCEV complexity? Don't we know already that for each symbolic stride in Strides we will need exactly one check? I understand that this will change when we generate predicates for other SCEV assumption, but: Wouldn't it be better to pass a PredicateSet here initialized with the strides?

Reviewed the SCEV bits. I don't know enough about the other bits to comment on those.

include/llvm/Analysis/ScalarEvolution.h
187	Minor: I'd put newlines between method declarations.
194	Also minor, but why not just call it `implies`?
200	Please document `Loc`, and pass in things that cannot be `nullptr`, like `SE` or `DL` as references.
211	The restriction that `LHS` is a `SCEVUnknown` seems somewhat arbitrary; but if you want to keep it, please change the type of `E0` to be `SCEVUnknown`. Also, rename `E0` and `E1` to `LHS` and `RHS` to sync with the documentation, or change the documentation to refer to these as `E0` or `E1`.
223	Same comment here -- if your invariant is that `E0` is a `SCEVUnknown`, then make that obvious in the type.
237	I haven't read the whole patch yet, but I think the doc should state if a `SCEVUnionPredicate` represents a logical and or a logical or (or sometimes one and sometimes the other) of the predicates it contains.
242	I agree with @mzolotukhin -- this won't work once you have different types of predicates. You'll need a "host" to contain the allocations, perhaps using a `BumpPtrAllocator` or something similar.
261	Why do you need to return a pair of instructions? Why not just return a `Value *` computing the result of the predicate? This will also obviate the need to create a "fake or" in the implementation, and you'll just be able to return what IRBuilder gave you.
lib/Analysis/ScalarEvolution.cpp
9292	Use `const auto *`.
9297	Why not just `return (Op->E0 == E0 && Op->E1 == E1) || (Op->E0 == E1 && Op->E1 == E0)`?
9313	Why not just `return Builder.CreateICmpNE(...)`?
9336	As I've mentioned in the declaration of `generateGuardCond`, I don't see why you need to return a range of instructions. I'd rather have this return a `Value *`, and do away with `getFirstInst` and the fake `or` instruction.
9385	I think this should be `std::all_of` to be consistent with the interpretation of `SCEVUnionPredicate` in `generateCheck` -- to prove `(X|Y)->Z` you need to prove `(X->Z)&&(Y->Z)`.
9395	Nit: here and elsewhere, prefer using `const auto ` or `auto ` when the type is obvious from the RHS.
9396	[Optional] Why not `std::any_of`?
9399	As mentioned earlier, this allocation scheme is not quite right.

This revision now requires changes to proceed.Sep 30 2015, 3:14 PM

Thanks for the reviews! Replies inline.

include/llvm/Analysis/LoopAccessAnalysis.h
467–470	My understanding is that the strides in the Strides map can be replaced with 1 if needed. However it is not guaranteed that we will need to replace these strides with 1. I was thinking that maybe it would be a good idea to have some VersioningParameters struct to hold all the parameters that tell us if we can use versioning? So at the moment it would be the strides map and the SCEV complexity. Do you think that would make sense?
include/llvm/Analysis/ScalarEvolution.h
242	Ok, this makes sense. I'll do the changes.
243	I agree, the current scheme is not ideal. Using a ScalarEvolution-like allocation scheme seems reasonable to me for everything except SCEVUnionPredicate. But we don't need to allocate that anyway so it should be fine.
261	It is mostly for consistency. This is what some of the other versioning checks return - addRuntimeChecks in LoopAccessInfo and InnerLoopVectorizer::addStrideCheck (removed by this patch).
lib/Analysis/ScalarEvolution.cpp
9297	In fact if we know that LHS and RHS are different (one is a SCEVUnknown and the other a SCEVConstant) we can simplify that expression.
9385	The union predicate function is an "and", so I think this makes the current implementation correct?
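
The two readings in the line 9385 exchange can be checked with a small truth table. This is only a propositional sketch (plain booleans, with a hypothetical helper `implies`); it says nothing about what SCEV can actually prove. It confirms that for an OR-interpreted set `(X|Y)->Z` matches `(X->Z)&&(Y->Z)`, which corresponds to `std::all_of`, while for an AND-interpreted set `(X&Y)->Z` matches `(X->Z)||(Y->Z)`, which corresponds to `std::any_of`.

```cpp
#include <cassert>

// Material implication: A -> B.
constexpr bool implies(bool A, bool B) { return !A || B; }

// Returns true if both identities hold for every truth assignment.
bool checkUnionImplicationIdentities() {
  for (int Bits = 0; Bits < 8; ++Bits) {
    const bool X = (Bits & 1) != 0;
    const bool Y = (Bits & 2) != 0;
    const bool Z = (Bits & 4) != 0;
    // Set interpreted as a disjunction of its members.
    if (implies(X || Y, Z) != (implies(X, Z) && implies(Y, Z)))
      return false;
    // Set interpreted as a conjunction of its members.
    if (implies(X && Y, Z) != (implies(X, Z) || implies(Y, Z)))
      return false;
  }
  return true;
}
```

When implication has to be proven rather than evaluated, finding one member of a conjunction that implies the target is sufficient but may not be necessary, so an `any_of`-style check is a conservative test.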

Updated the SCEV predicate allocation scheme, so we now use the same allocator as the SCEVs
(and have the same scheme as the one used for SCEVs by ScalarEvolution).

Made explicit the assumption that LHS is a SCEVUnknown and RHS is a SCEVConstant in
the SCEVEqualPredicate.

This update should address all the other review comments (unless I've accidentally missed
something).

anemet added inline comments.Oct 1 2015, 9:44 AM

include/llvm/Analysis/LoopAccessAnalysis.h
467–470	"My understanding is that the strides in the Strides map can be replaced with 1 if needed. However it is not guaranteed that we will need to replace these strides with 1." Fair enough.

"I was thinking that maybe it would be a good idea to have some VersioningParameters struct to hold all the parameters that tell us if we can use versioning? So at the moment it would be the strides map and the SCEV complexity. Do you think that would make sense?" Just to be clear, I am not looking for stylistic improvements here but trying to make sense of the semantics. If I understand correctly SCEVComplexity specifies the cost of the checks that we're allowed to accumulate to make LAA complete its analysis. I don't understand why that is an incoming parameter. It seems to me that a better formulation would be to have LAA do its thing, accumulate whatever checks it needs and then make that cost value part of the LAI state. Then a client pass can query that and decide whether it's willing to pay that cost in order to get an analyzable loop or not.

To put it another way, it seems to me that with the current interface you can have this scenario: One client specifies a low complexity value. There is no LAI for the loop so we compute it but fail because we need more checks than allowed. We cache the result. Next client wants to specify a higher value but can't because LAI is already cached; plus, as the code is currently written, this will actually lead to an assert. So how do you envision this scenario to work?
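
The alternative formulation proposed in this comment (the analysis records the cost of the checks it needed, and each client decides whether to pay it) can be sketched as below. This is a standalone model: `LoopAccessResult`, `AnalysisCache` and `clientAccepts` are hypothetical names, and the analysis itself is replaced by a callback that simply reports how many predicates a loop needs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Cached analysis result; the predicate count is part of its state.
struct LoopAccessResult {
  unsigned NumPredicates;
};

// Computes each loop's result once, with no threshold passed in.
class AnalysisCache {
  std::function<unsigned(int)> CountNeededPredicates;
  std::map<int, LoopAccessResult> Cache;

public:
  unsigned NumComputations = 0;

  explicit AnalysisCache(std::function<unsigned(int)> F)
      : CountNeededPredicates(std::move(F)) {}

  const LoopAccessResult &getInfo(int LoopId) {
    auto It = Cache.find(LoopId);
    if (It == Cache.end()) {
      ++NumComputations;
      It = Cache.emplace(LoopId,
                         LoopAccessResult{CountNeededPredicates(LoopId)})
               .first;
    }
    return It->second;
  }
};

// Each client applies its own budget to the cached result.
bool clientAccepts(const LoopAccessResult &R, unsigned Budget) {
  return R.NumPredicates <= Budget;
}
```

With this shape, a client with a small budget and a client with a large budget can share one cached result, which avoids the caching conflict described in the scenario above.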

sbaranga added inline comments.Oct 2 2015, 7:08 AM

include/llvm/Analysis/LoopAccessAnalysis.h

467–470

Yes, your analysis is correct and that scenario is a problem.

I chose the current solution because it was equivalent with the existing implementation. I agree with your assessment.

Some possible solutions:

a) no bounds in LAA on predicate sizes. This can have a negative impact on compile time.
b) LAA has its own logic for computing the limit, and we can make sure the limit is high enough to cover all the users.
c) we have some initial default bounds, but clients can request an increase by throwing away the cached LAI result and computing a new one (with an increased threshold). Recomputing the LAI would not be ideal.

I like variant b).
b) is orthogonal to c), but we theoretically wouldn't need c) right now.

anemet added inline comments.Oct 3 2015, 9:56 PM

include/llvm/Analysis/LoopAccessAnalysis.h
467–470	Do you know about any compile-time or algorithmic complexity issues? I am not sure if we want to approximate compile time with the cost of the checks. If we know of some worse-than-linear algorithms here, I'd much rather control that locally than hope that the number of checks would catch it (especially once we start having checks for different types of predicates like non-wrapping, etc.)

sbaranga added inline comments.Oct 5 2015, 5:48 AM

include/llvm/Analysis/LoopAccessAnalysis.h
467–470	I don't think there are any problems that we can't fix. The predicates themselves have linear complexity for the methods, but the problem is the places where we're using them (using them in some algorithm that was already linear is not great). The problem with the current code is that it was written for a low number of predicates. There are some places where we do linear traversals of the set (for example in SCEVPredicateRewriter::visitUnknown). We can fix this by having a map from SCEVs to predicates (in the case of the SCEVEqualPredicate this would map the SCEVUnknown to the predicate). I'll make the required changes.

Add a getExpr() method to the SCEVPredicate interface. This will return the
expression on which the predicate is applied. The return value of this call
is used by SCEVUnionPredicate to efficiently look up the adequate predicate
during expression rewriting. Now the algorithm should be able to scale with
the number of predicates.
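
A minimal sketch of the lookup idea in this update, assuming a string key in place of a `const SCEV *` and using hypothetical names (`EqualPredicate`, `UnionPredicate`, `getPredicatesForExpr`): predicates are indexed by the expression they apply to, so a rewriter does not have to scan the whole set for every sub-expression.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical predicate: "Expr == Value", where Expr names the expression
// the predicate applies to (the role played by getExpr() in the update).
struct EqualPredicate {
  std::string Expr;
  long long Value;
};

class UnionPredicate {
  std::vector<EqualPredicate> Preds;
  // Maps an expression to the indices of the predicates that apply to it.
  std::unordered_map<std::string, std::vector<std::size_t>> ExprToPreds;

public:
  void add(EqualPredicate P) {
    ExprToPreds[P.Expr].push_back(Preds.size());
    Preds.push_back(std::move(P));
  }

  // Returns copies of the matching predicates, or an empty vector if there
  // are none. Copying keeps the sketch simple and avoids dangling pointers.
  std::vector<EqualPredicate>
  getPredicatesForExpr(const std::string &Expr) const {
    std::vector<EqualPredicate> Result;
    auto It = ExprToPreds.find(Expr);
    if (It != ExprToPreds.end())
      for (std::size_t Index : It->second)
        Result.push_back(Preds[Index]);
    return Result;
  }

  std::size_t size() const { return Preds.size(); }
};
```

The lookup cost depends on the number of predicates attached to one expression rather than on the size of the whole set.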

Removed the limit to the number of predicates in LAA. The loop vectorizer will
check at the end of the legality phase to see if the number of predicates is
acceptable.

Fixed several instances where a Loop * parameter was passed but not used.

Overall, I think this is looking much better. I have a few minor comments inline. The only design issue (according to me) is that (I've said this inline) we're creating a fake instruction solely to satisfy an internal interface. I think that's a code smell, but I can live with it if that's hard to fix and others are okay with having it.

include/llvm/Analysis/ScalarEvolution.h
194	This is optional to fix, but I'd prefer renaming this to `getKind`, since `getType` in LLVM has the connotation of returning a `Type *`.
213	Please document `Loc`.
216	Newline here?
267	Here, and in `getLHS`, please start the sentence with uppercase.
285	Why not `DenseMap`?
312	I'd prefer returning an `ArrayRef<const SCEVPredicate *>`, and returning an empty `ArrayRef` if there are no predicates for `Expr`.
1271	Please construct this in the move constructor.
lib/Analysis/LoopAccessAnalysis.cpp
113	There is now a `ScalarEvolution::getOne`, can you use that here?
lib/Analysis/ScalarEvolution.cpp
9247	Use `auto *I` here.
9247	Isn't `V` always an instruction?
9258	Don't you need to add the type of the predicate to `ID`?
9283	Use `const auto *` because the type is obvious from the RHS.
9301	Use three slashes.
9360	If the `IRBuilder` returned a constant `Check`, why do you need to generate the check? I'd either

a) change this to `assert(isa<Instruction>(Check))`, and fix the cases where the assert fails to return `true` from `isAlwaysTrue` (i.e. if the IRBuilder can prove the check is redundant, SCEV should be able to as well), or
b) change the interface to allow returning something that says "always true" / "always false" in addition to returning a pair of instructions.

I think creating a completely redundant instruction to satisfy internal interfaces is a code smell, and we should try to avoid it if reasonable.
9385	In that case, shouldn't you have AllCheck = CheckBuilder.CreateAnd(AllCheck, CheckResult); in `SCEVUnionPredicate::generateCheck`? Or are you checking something different there?
9394	Shouldn't this be `CreateAnd`?
9394	As mentioned earlier, IIUC this should be a `CreateAnd`.
9437	Why not just `SCEVToPreds[Key].push_back(N);`?

There is a huge number of inline comments from earlier revisions that are still popping up in the new version. This makes it pretty hard to read the patch. Can you please check if marking those "done" would make them disappear?

include/llvm/Analysis/LoopAccessAnalysis.h
295	Please expand this comment to explain how they are used for dep-checking.
337–340	Explain the Preds parameter
546–547	Same here, please expand comment to explain how they are used.
594–603	Comment the Preds parameter.
605–608	Here too.
635–636	Didn't you want to get rid of MaxSCEVPredicates?
lib/Analysis/LoopAccessAnalysis.cpp
1783–1784	Ah, MaxSCEVPredicates is unused.
lib/Transforms/Vectorize/LoopVectorize.cpp
401–402	Comment missing
510–511	Here too please expand the comment to explain what these assumptions are used for.
778–779	Here too.
1351	Missing comment.
1692–1701	Any reason this whole logic can't be pushed into LVLegality?

sbaranga marked 80 inline comments as done.Oct 8 2015, 3:53 AM

sbaranga added inline comments.

lib/Analysis/ScalarEvolution.cpp
9243–9250	We remove this from the loop vectorizer so we end up with the same number of copies. There's still some code duplication.
9360	Yes, I think you're right. I would go with the first option.
9385	See the comment below.
9394	The generated code checks for the negated condition. So we basically end up with (or (not predicate1), (not predicate2), ...) which is the same as (not (and predicate1, predicate2, ...)). We do this because the current users already use this interface. This also avoids the need to explicitly create negations (we just create the negated check for each predicate). I think this at least needs a comment. Do you have a strong preference about this?

In D12905#262511, @anemet wrote:

There is a huge number of inline comments from earlier revisions that are still popping up in the new version. This makes it pretty hard to read the patch. Can you please check if marking those "done" would make them disappear?

I've marked the comments as done, but phabricator is still showing them. Do you think it would be better to start a new review?

Thanks,
Silviu

I've marked the comments as done, but phabricator is still showing them. Do you think it would be better to start a new review?

Wow, silly Phab. Yeah I think, unless other reviewers object, we should move this to a new review. Thanks!

Ok. Moving the review to http://reviews.llvm.org/D13595.

Thanks,
Silviu

Revision Contents

Path

Size

include/

llvm/

Analysis/

LoopAccessAnalysis.h

23 lines

ScalarEvolution.h

173 lines

lib/

Analysis/

LoopAccessAnalysis.cpp

69 lines

ScalarEvolution.cpp

200 lines

Transforms/

Vectorize/

LoopVectorize.cpp

174 lines

Diff 36727

include/llvm/Analysis/LoopAccessAnalysis.h

Show All 26 Lines

namespace llvm {		namespace llvm {

class Value;		class Value;
class DataLayout;		class DataLayout;
class ScalarEvolution;		class ScalarEvolution;
class Loop;		class Loop;
class SCEV;		class SCEV;
		class SCEVUnionPredicate;

/// Optimization analysis message produced during vectorization. Messages inform		/// Optimization analysis message produced during vectorization. Messages inform
/// the user why vectorization did not occur.		/// the user why vectorization did not occur.
class LoopAccessReport {		class LoopAccessReport {
std::string Message;		std::string Message;
const Instruction *Instr;		const Instruction *Instr;

protected:		protected:
▲ Show 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	struct Dependence {
bool isPossiblyBackward() const;		bool isPossiblyBackward() const;

/// \brief Print the dependence. \p Instr is used to map the instruction		/// \brief Print the dependence. \p Instr is used to map the instruction
/// indices to instructions.		/// indices to instructions.
void print(raw_ostream &OS, unsigned Depth,		void print(raw_ostream &OS, unsigned Depth,
const SmallVectorImpl<Instruction *> &Instrs) const;		const SmallVectorImpl<Instruction *> &Instrs) const;
};		};

MemoryDepChecker(ScalarEvolution Se, const Loop L)		MemoryDepChecker(ScalarEvolution Se, const Loop L,
		SCEVUnionPredicate &Preds)
: SE(Se), InnermostLoop(L), AccessIdx(0),		: SE(Se), InnermostLoop(L), AccessIdx(0),
ShouldRetryWithRuntimeCheck(false), SafeForVectorization(true),		ShouldRetryWithRuntimeCheck(false), SafeForVectorization(true),
RecordInterestingDependences(true) {}		RecordInterestingDependences(true), Preds(Preds) {}

/// \brief Register the location (instructions are given increasing numbers)		/// \brief Register the location (instructions are given increasing numbers)
/// of a write access.		/// of a write access.
void addAccess(StoreInst *SI) {		void addAccess(StoreInst *SI) {
Value *Ptr = SI->getPointerOperand();		Value *Ptr = SI->getPointerOperand();
Accesses[MemAccessInfo(Ptr, true)].push_back(AccessIdx);		Accesses[MemAccessInfo(Ptr, true)].push_back(AccessIdx);
InstMap.push_back(SI);		InstMap.push_back(SI);
++AccessIdx;		++AccessIdx;
▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	private:
/// Otherwise, this function returns true signaling a possible dependence.		/// Otherwise, this function returns true signaling a possible dependence.
Dependence::DepType isDependent(const MemAccessInfo &A, unsigned AIdx,		Dependence::DepType isDependent(const MemAccessInfo &A, unsigned AIdx,
const MemAccessInfo &B, unsigned BIdx,		const MemAccessInfo &B, unsigned BIdx,
const ValueToValueMap &Strides);		const ValueToValueMap &Strides);

/// \brief Check whether the data dependence could prevent store-load		/// \brief Check whether the data dependence could prevent store-load
/// forwarding.		/// forwarding.
bool couldPreventStoreLoadForward(unsigned Distance, unsigned TypeByteSize);		bool couldPreventStoreLoadForward(unsigned Distance, unsigned TypeByteSize);

		/// The SCEV predicate containing all the SCEV-related assumptions.
		anemet (Not Done): Please expand this comment to explain how they are used for dep-checking.
		SCEVUnionPredicate &Preds;
		anemet (Done): I think I've commented on this too: this should be plural or have set in the name.
};		};

/// \brief Holds information about the memory runtime legality checks to verify		/// \brief Holds information about the memory runtime legality checks to verify
/// that a group of pointers do not overlap.		/// that a group of pointers do not overlap.
class RuntimePointerChecking {		class RuntimePointerChecking {
public:		public:
struct PointerInfo {		struct PointerInfo {
/// Holds the pointer value that we need to check.		/// Holds the pointer value that we need to check.
Show All 24 Lines	public:

/// Reset the state of the pointer runtime information.		/// Reset the state of the pointer runtime information.
void reset() {		void reset() {
Need = false;		Need = false;
Pointers.clear();		Pointers.clear();
Checks.clear();		Checks.clear();
}		}

/// Insert a pointer and calculate the start and end SCEVs.		/// Insert a pointer and calculate the start and end SCEVs.
void insert(Loop *Lp, Value *Ptr, bool WritePtr, unsigned DepSetId,		void insert(Loop *Lp, Value *Ptr, bool WritePtr, unsigned DepSetId,
unsigned ASId, const ValueToValueMap &Strides);		unsigned ASId, const ValueToValueMap &Strides,
		SCEVUnionPredicate &Preds);
		anemet (Not Done): Explain the Preds parameter.

/// \brief No run-time memory checking is necessary.		/// \brief No run-time memory checking is necessary.
bool empty() const { return Pointers.empty(); }		bool empty() const { return Pointers.empty(); }

/// A grouping of pointers. A single memcheck is required between		/// A grouping of pointers. A single memcheck is required between
/// two groups.		/// two groups.
struct CheckingPtrGroup {		struct CheckingPtrGroup {
/// \brief Create a new pointer checking group containing a single		/// \brief Create a new pointer checking group containing a single
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines
/// is delegated to the MemoryDepChecker class.		/// is delegated to the MemoryDepChecker class.
///		///
/// For memory dependences that cannot be determined at compile time, it		/// For memory dependences that cannot be determined at compile time, it
/// generates run-time checks to prove independence. This is done by		/// generates run-time checks to prove independence. This is done by
/// AccessAnalysis::canCheckPtrAtRT and the checks are maintained by the		/// AccessAnalysis::canCheckPtrAtRT and the checks are maintained by the
/// RuntimePointerCheck class.		/// RuntimePointerCheck class.
class LoopAccessInfo {		class LoopAccessInfo {
public:		public:
LoopAccessInfo(Loop *L, ScalarEvolution *SE, const DataLayout &DL,		LoopAccessInfo(Loop *L, ScalarEvolution *SE, const DataLayout &DL,
const TargetLibraryInfo *TLI, AliasAnalysis *AA,		const TargetLibraryInfo *TLI, AliasAnalysis *AA,
DominatorTree *DT, LoopInfo *LI,		DominatorTree *DT, LoopInfo *LI,
const ValueToValueMap &Strides);		const ValueToValueMap &Strides);
		anemet (Done): I am not sure I understand this interface (it is certainly under-documented and so is getInfo): What is the point of passing strides and SCEV complexity? Don't we know already that for each symbolic stride in Strides we will need exactly one check? I understand that this will change when we generate predicates for other SCEV assumptions, but wouldn't it be better to pass a PredicateSet here initialized with the strides?
		sbaranga (author, Done): My understanding is that the strides in the Strides map can be replaced with 1 if needed. However it is not guaranteed that we will need to replace these strides with 1. I was thinking that maybe it would be a good idea to have some VersioningParameters struct to hold all the parameters that tell us if we can use versioning? So at the moment it would be the strides map and the SCEV complexity. Do you think that would make sense?
		anemet (Done):
		> My understanding is that the strides in the Strides map can be replaced with 1 if needed. However it is not guaranteed that we will need to replace these strides with 1.
		Fair enough.
		> I was thinking that maybe it would be a good idea to have some VersioningParameters struct to hold all the parameters that tell us if we can use versioning? So at the moment it would be the strides map and the SCEV complexity. Do you think that would make sense?
		Just to be clear, I am not looking for stylistic improvements here but trying to make sense of the semantics. If I understand correctly, SCEVComplexity specifies the cost of the checks that we're allowed to accumulate to make LAA complete its analysis. I don't understand why that is an incoming parameter. It seems to me that a better formulation would be to have LAA do its thing, accumulate whatever checks it needs and then make that cost value part of the LAI state. Then a client pass can query that and decide whether it's willing to pay that cost in order to get an analyzable loop or not.
		To put it another way, it seems to me that with the current interface you can have this scenario: one client specifies a low complexity value. There is no LAI for the loop, so we compute it but fail because we need more checks than allowed. We cache the result. The next client wants to specify a higher value but can't because the LAI is already cached; in fact, as the code is currently written, this will lead to an assert. So how do you envision this scenario working?
		sbaranga (author, Done): Yes, your analysis is correct and that scenario is a problem. I chose the current solution because it was equivalent to the existing implementation. I agree with your assessment. Some possible solutions:
		a) No bounds in LAA on predicate sizes. This can have a negative impact on compile time.
		b) LAA has its own logic for computing the limit, and we can make sure the limit is high enough to cover all the users.
		c) We have some initial default bounds, but clients can request an increase by throwing away the cached LAI result and computing a new one (with an increased threshold). Recomputing the LAI would not be ideal.
		I like variant b). b) is orthogonal to c), but we theoretically wouldn't need c) right now.
		anemet (Done): Do you know about any compile-time or algorithmic complexity issues? I am not sure if we want to approximate compile time with the cost of the checks. If we know of some worse-than-linear algorithms here, I'd much rather control it locally than hope that the number of checks would catch it (especially once we start having checks for different types of predicates like non-wrapping, etc.)
		sbaranga (author, Done): I don't think there are any problems that we can't fix. The predicates themselves have linear complexity for the methods, but the problem is the places where we're using them (using them in some algorithm that was already linear is not great). The problem with the current code is that it was written for a low number of predicates. There are some places where we do linear traversals of the set (for example in SCEVPredicateRewriter::visitUnknown). We can fix this by having a map from SCEVs to predicates (in the case of the SCEVEqualPredicate this would map the SCEVUnknown to the predicate). I'll make the required changes.
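
The caching problem raised in this thread, and the alternative of recording the cost in the analysis result, can be illustrated with a small standalone sketch. It does not use the LLVM API: AccessInfoSketch, AccessAnalysisSketch and clientAccepts are hypothetical stand-ins for LoopAccessInfo, the LoopAccessAnalysis cache and a client pass's budget decision, and the ChecksRequired argument simply plays the role of whatever the real analysis would compute.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for LoopAccessInfo: the result records how many
// run-time checks it needed instead of being computed under a caller's cap.
struct AccessInfoSketch {
  bool Analyzable;          // could the loop be analyzed at all?
  unsigned NumChecksNeeded; // cost of the assumptions, in run-time checks
};

// Hypothetical stand-in for the LoopAccessAnalysis cache, keyed by loop name.
class AccessAnalysisSketch {
  std::map<std::string, AccessInfoSketch> Cache;
  unsigned NumComputed = 0;

public:
  // ChecksRequired plays the role of whatever the real analysis would find.
  // No threshold is passed in, so the cached result is valid for any client.
  const AccessInfoSketch &getInfo(const std::string &Loop,
                                  unsigned ChecksRequired) {
    auto It = Cache.find(Loop);
    if (It == Cache.end()) {
      ++NumComputed;
      It = Cache.emplace(Loop, AccessInfoSketch{true, ChecksRequired}).first;
    }
    return It->second;
  }

  unsigned timesComputed() const { return NumComputed; }
};

// Each client applies its own budget to the same cached result.
inline bool clientAccepts(const AccessInfoSketch &Info, unsigned Budget) {
  return Info.Analyzable && Info.NumChecksNeeded <= Budget;
}
```

With this shape, a client with a budget of 4 and a client with a budget of 16 can both query the same cached entry for a loop that needs 10 checks; the first declines and the second accepts, and nothing has to be recomputed.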

/// Return true we can analyze the memory accesses in the loop and there are		/// Return true we can analyze the memory accesses in the loop and there are
/// no memory dependence cycles.		/// no memory dependence cycles.
bool canVectorizeMemory() const { return CanVecMem; }		bool canVectorizeMemory() const { return CanVecMem; }

const RuntimePointerChecking *getRuntimePointerChecking() const {		const RuntimePointerChecking *getRuntimePointerChecking() const {
return &PtrRtChecking;		return &PtrRtChecking;
}		}
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	public:

/// \brief Checks existence of store to invariant address inside loop.		/// \brief Checks existence of store to invariant address inside loop.
/// If the loop has any store to invariant address, then it returns true,		/// If the loop has any store to invariant address, then it returns true,
/// else returns false.		/// else returns false.
bool hasStoreToLoopInvariantAddress() const {		bool hasStoreToLoopInvariantAddress() const {
return StoreToLoopInvariantAddress;		return StoreToLoopInvariantAddress;
}		}

		/// The SCEV predicate containing all the SCEV-related assumptions.
		SCEVUnionPredicate Preds;
		anemet (Not Done): Same here, please expand comment to explain how they are used.

private:		private:
/// \brief Analyze the loop. Substitute symbolic strides using Strides.		/// \brief Analyze the loop. Substitute symbolic strides using Strides.
void analyzeLoop(const ValueToValueMap &Strides);		void analyzeLoop(const ValueToValueMap &Strides);

/// \brief Check if the structure of the loop allows it to be analyzed by this		/// \brief Check if the structure of the loop allows it to be analyzed by this
/// pass.		/// pass.
bool canAnalyzeLoop();		bool canAnalyzeLoop();

void emitAnalysis(LoopAccessReport &Message);		void emitAnalysis(LoopAccessReport &Message);

/// We need to check that all of the pointers in this list are disjoint		/// We need to check that all of the pointers in this list are disjoint
/// at runtime.		/// at runtime.
RuntimePointerChecking PtrRtChecking;		RuntimePointerChecking PtrRtChecking;

/// \brief the Memory Dependence Checker which can determine the		/// \brief the Memory Dependence Checker which can determine the
/// loop-independent and loop-carried dependences between memory accesses.		/// loop-independent and loop-carried dependences between memory accesses.
MemoryDepChecker DepChecker;		MemoryDepChecker DepChecker;

Loop *TheLoop;		Loop *TheLoop;
ScalarEvolution *SE;		ScalarEvolution *SE;
const DataLayout &DL;		const DataLayout &DL;
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
AliasAnalysis *AA;		AliasAnalysis *AA;
DominatorTree *DT;		DominatorTree *DT;
LoopInfo *LI;		LoopInfo *LI;

unsigned NumLoads;		unsigned NumLoads;
unsigned NumStores;		unsigned NumStores;
		mzolotukhin (Done): Where is this used?
		sbaranga (author, Done): This is an artefact of rebasing and should be removed.

		anemet (Done): I think I've already asked this before: why is the thing with unique_ptr needed?
		sbaranga (author, Done): I'm sure I had some reason at some point for making it a unique_ptr, but I can't remember. I think it should be possible to remove the unique_ptr part.
unsigned MaxSafeDepDistBytes;		unsigned MaxSafeDepDistBytes;

/// \brief Cache the result of analyzeLoop.		/// \brief Cache the result of analyzeLoop.
bool CanVecMem;		bool CanVecMem;

/// \brief Indicator for storing to uniform addresses.		/// \brief Indicator for storing to uniform addresses.
/// If a loop has write to a loop invariant address then it should be true.		/// If a loop has write to a loop invariant address then it should be true.
bool StoreToLoopInvariantAddress;		bool StoreToLoopInvariantAddress;

/// \brief The diagnostics report generated for the analysis. E.g. why we		/// \brief The diagnostics report generated for the analysis. E.g. why we
/// couldn't analyze the loop.		/// couldn't analyze the loop.
Optional<LoopAccessReport> Report;		Optional<LoopAccessReport> Report;
};		};

Value *stripIntegerCast(Value *V);		Value *stripIntegerCast(Value *V);

///\brief Return the SCEV corresponding to a pointer with the symbolic stride		///\brief Return the SCEV corresponding to a pointer with the symbolic stride
///replaced with constant one.		///replaced with constant one.
///		///
/// If \p OrigPtr is not null, use it to look up the stride value instead of \p		/// If \p OrigPtr is not null, use it to look up the stride value instead of \p
/// Ptr. \p PtrToStride provides the mapping between the pointer value and its		/// Ptr. \p PtrToStride provides the mapping between the pointer value and its
/// stride as collected by LoopVectorizationLegality::collectStridedAccess.		/// stride as collected by LoopVectorizationLegality::collectStridedAccess.
const SCEV *replaceSymbolicStrideSCEV(ScalarEvolution *SE,		const SCEV *replaceSymbolicStrideSCEV(ScalarEvolution *SE,
const ValueToValueMap &PtrToStride,		const ValueToValueMap &PtrToStride,
Value *Ptr, Value *OrigPtr = nullptr);		SCEVUnionPredicate &Preds, Value *Ptr,
		Value *OrigPtr = nullptr);
		anemet (Not Done): Comment the Preds parameter.

/// \brief Check the stride of the pointer and ensure that it does not wrap in		/// \brief Check the stride of the pointer and ensure that it does not wrap in
/// the address space.		/// the address space.
int isStridedPtr(ScalarEvolution *SE, Value *Ptr, const Loop *Lp,		int isStridedPtr(ScalarEvolution *SE, Value *Ptr, const Loop *Lp,
const ValueToValueMap &StridesMap);		const ValueToValueMap &StridesMap, SCEVUnionPredicate &Preds);
		anemet (Not Done): Here too.

/// \brief This analysis provides dependence information for the memory accesses		/// \brief This analysis provides dependence information for the memory accesses
/// of a loop.		/// of a loop.
///		///
/// It runs the analysis for a loop on demand. This can be initiated by		/// It runs the analysis for a loop on demand. This can be initiated by
/// querying the loop access info via LAA::getInfo. getInfo return a		/// querying the loop access info via LAA::getInfo. getInfo return a
/// LoopAccessInfo object. See this class for the specifics of what information		/// LoopAccessInfo object. See this class for the specifics of what information
/// is provided.		/// is provided.
Show All 10 Lines	public:
void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

/// \brief Query the result of the loop access information for the loop \p L.		/// \brief Query the result of the loop access information for the loop \p L.
///		///
/// If the client speculates (and then issues run-time checks) for the values		/// If the client speculates (and then issues run-time checks) for the values
/// of symbolic strides, \p Strides provides the mapping (see		/// of symbolic strides, \p Strides provides the mapping (see
/// replaceSymbolicStrideSCEV). If there is no cached result available run		/// replaceSymbolicStrideSCEV). If there is no cached result available run
/// the analysis.		/// the analysis.
const LoopAccessInfo &getInfo(Loop *L, const ValueToValueMap &Strides);		const LoopAccessInfo &getInfo(Loop *L, const ValueToValueMap &Strides,
		const unsigned MaxSCEVPredicates = 0);
		anemet (Not Done): Didn't you want to get rid of MaxSCEVPredicates?

void releaseMemory() override {		void releaseMemory() override {
// Invalidate the cache when the pass is freed.		// Invalidate the cache when the pass is freed.
LoopAccessInfoMap.clear();		LoopAccessInfoMap.clear();
}		}

/// \brief Print the result of the analysis when invoked with -analyze.		/// \brief Print the result of the analysis when invoked with -analyze.
void print(raw_ostream &OS, const Module *M = nullptr) const override;		void print(raw_ostream &OS, const Module *M = nullptr) const override;
Show All 15 Lines

include/llvm/Analysis/ScalarEvolution.h

Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	namespace llvm {
class DataLayout;		class DataLayout;
class TargetLibraryInfo;		class TargetLibraryInfo;
class LLVMContext;		class LLVMContext;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class Operator;		class Operator;
class SCEVUnknown;		class SCEVUnknown;
class SCEVAddRecExpr;		class SCEVAddRecExpr;
		class SCEVConstant;
class SCEV;		class SCEV;
		class SCEVExpander;
		class SCEVPredicate;

template<> struct FoldingSetTrait<SCEV>;		template<> struct FoldingSetTrait<SCEV>;
		template<> struct FoldingSetTrait<SCEVPredicate>;

/// This class represents an analyzed expression in the program. These are		/// This class represents an analyzed expression in the program. These are
/// opaque objects that the client is not allowed to do much with directly.		/// opaque objects that the client is not allowed to do much with directly.
///		///
class SCEV : public FoldingSetNode {		class SCEV : public FoldingSetNode {
friend struct FoldingSetTrait<SCEV>;		friend struct FoldingSetTrait<SCEV>;

/// A reference to an Interned FoldingSetNodeID for this node. The		/// A reference to an Interned FoldingSetNodeID for this node. The
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	#endif
/// operations are valid on this class, it is just a marker.		/// operations are valid on this class, it is just a marker.
struct SCEVCouldNotCompute : public SCEV {		struct SCEVCouldNotCompute : public SCEV {
SCEVCouldNotCompute();		SCEVCouldNotCompute();

/// Methods for support type inquiry through isa, cast, and dyn_cast:		/// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const SCEV *S);		static bool classof(const SCEV *S);
};		};

		//===--------------------------------------------------------------------===//
		anemet (Done): I think the enum tags are uppercase. Also this should probably be inside the class.
		/// SCEVPredicate - This class represents an assumption made using SCEV
		/// expressions which can be checked at run-time.
		///
		class SCEVPredicate : public FoldingSetNode {
		friend struct FoldingSetTrait<SCEVPredicate>;

		/// A reference to an Interned FoldingSetNodeID for this node. The
		/// ScalarEvolution's BumpPtrAllocator holds the data.
		FoldingSetNodeIDRef FastID;

		protected:
		unsigned short SCEVPredicateType;
		enum SCEVPredicateTypes { PSET, PEQUAL };
		sanjoy (Done): Minor: I'd put newlines between method declarations.

		public:
		SCEVPredicate(const FoldingSetNodeIDRef ID, unsigned short Type);

		virtual ~SCEVPredicate() {}

		unsigned short getType() const { return SCEVPredicateType; }
		anemet (Done): Do these really ever happen aside from reaching the check threshold? I think that we should just have an API to query the number of checks.
		sbaranga (author, Done): isAlwaysFalse would currently be true only when reaching the threshold. Having an API to return the number of checks (or maybe something that estimates the cost of the checks?) would be clearer.
		sanjoy (Done): Also minor, but why not just call it `implies`?
		sanjoy (Not Done): This is optional to fix, but I'd prefer renaming this to `getKind`, since `getType` in LLVM has the connotation of returning a `Type *`.

		/// \brief Returns the estimated complexity of this predicate.
		/// This is roughly measured in the number of run-time checks required.
		virtual unsigned getComplexity() { return 1; }

		mzolotukhin (Done): It might return `nullptr` in certain cases - please document them.
		/// \brief Returns true if the predicate is always true. This means that no
		sanjoy (Done): Please document `Loc`, and pass in things that cannot be `nullptr`, like `SE` or `DL`, as references.
		/// assumptions were made and nothing needs to be checked at run-time.
		virtual bool isAlwaysTrue() const = 0;

		/// \brief Returns true if this predicate implies \p N.
		virtual bool implies(const SCEVPredicate *N) const = 0;
		mzolotukhin (Done): Please add some description for this class.
		mzolotukhin (Done): s/assume#/assume/ By the way, why is the assumption (that RHS is SCEVUnknown) needed?
		sbaranga (author, Done): The assumption is not technically required. But if we wanted to have something general here we would have to change the visiting algorithm to test if this predicate matches on every sub-expression. This is technically easy to do, but we don't need it for the current uses (and we would just spend more time doing these checks).
		mzolotukhin (Done): Makes sense. But then we'll need to clarify that another assumption is that the unknown part is always `LHS` (while `RHS` is always a constant?). Maybe it's worth enforcing these assumptions with some assertions too.
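
A minimal sketch of the restricted scheme discussed in this thread follows: equality predicates whose left-hand side is an unknown and whose right-hand side is a constant, enforced with assertions, and a rewriter that only needs to look at unknown leaves. It uses a toy expression tree rather than SCEV, and Expr, EqualPredicateMap, rewrite and tryEvaluate are hypothetical names, not part of the patch. The hash-map lookup also mirrors the idea, mentioned in the LoopAccessInfo thread, of mapping each unknown to its predicate instead of scanning the whole set.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Toy expression tree standing in for SCEV. Every name here is hypothetical.
struct Expr {
  enum Kind { Unknown, Constant, Add, Mul };
  Kind K = Constant;
  std::string Name;           // meaningful when K == Unknown
  std::int64_t Value = 0;     // meaningful when K == Constant
  std::shared_ptr<Expr> L, R; // meaningful when K is Add or Mul
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeUnknown(const std::string &N) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Unknown;
  E->Name = N;
  return E;
}

ExprPtr makeConstant(std::int64_t V) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Constant;
  E->Value = V;
  return E;
}

ExprPtr makeBinary(Expr::Kind K, ExprPtr L, ExprPtr R) {
  auto E = std::make_shared<Expr>();
  E->K = K;
  E->L = L;
  E->R = R;
  return E;
}

// Equality assumptions of the form "unknown == constant", indexed by the
// unknown so that a lookup does not scan every predicate.
class EqualPredicateMap {
  std::unordered_map<std::string, std::int64_t> ByUnknown;

public:
  void add(const ExprPtr &LHS, const ExprPtr &RHS) {
    assert(LHS && LHS->K == Expr::Unknown && "LHS must be an unknown");
    assert(RHS && RHS->K == Expr::Constant && "RHS must be a constant");
    ByUnknown[LHS->Name] = RHS->Value;
  }

  const std::int64_t *lookup(const std::string &Name) const {
    auto It = ByUnknown.find(Name);
    return It == ByUnknown.end() ? nullptr : &It->second;
  }
};

// Because every predicate's LHS is a leaf, the rewriter only substitutes at
// unknown leaves; it never has to match a predicate against a sub-expression.
ExprPtr rewrite(const ExprPtr &E, const EqualPredicateMap &P) {
  switch (E->K) {
  case Expr::Unknown: {
    const std::int64_t *V = P.lookup(E->Name);
    return V ? makeConstant(*V) : E;
  }
  case Expr::Constant:
    return E;
  case Expr::Add:
  case Expr::Mul:
    return makeBinary(E->K, rewrite(E->L, P), rewrite(E->R, P));
  }
  return E;
}

// Folds an expression to a constant if it contains no unknowns.
bool tryEvaluate(const ExprPtr &E, std::int64_t &Out) {
  switch (E->K) {
  case Expr::Unknown:
    return false;
  case Expr::Constant:
    Out = E->Value;
    return true;
  case Expr::Add:
  case Expr::Mul: {
    std::int64_t A = 0, B = 0;
    if (!tryEvaluate(E->L, A) || !tryEvaluate(E->R, B))
      return false;
    Out = E->K == Expr::Add ? A + B : A * B;
    return true;
  }
  }
  return false;
}
```

For example, after assuming that a stride unknown equals 1, an expression such as stride * 8 + 4 rewrites to something that folds to a constant, while an expression over an unknown with no predicate is left untouched.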

		/// \brief Prints a textual representation of this predicate.
		virtual void print(raw_ostream &OS, unsigned Depth) const = 0;

		/// \brief Generates a run-time check for this predicate. The return value
		/// of this check indicates if the predicate is not true.
		sanjoy (Done): The restriction that `LHS` is a `SCEVUnknown` seems somewhat arbitrary; but if you want to keep it, please change the type of `E0` to be `SCEVUnknown`. Also, rename `E0` and `E1` to `LHS` and `RHS` to sync with the documentation, or change the documentation to refer to these as `E0` or `E1`.
		/// In case no checks are needed nullptr is returned.
		virtual Value *generateCheck(Instruction *Loc, ScalarEvolution &SE,
		sanjoy (Not Done): Please document `Loc`.
		const DataLayout &DL,
		SCEVExpander &Exp) const = 0;
		/// \brief Returns the SCEV to which this predicate applies, or nullptr
		sanjoy (Not Done): Newline here?
		/// if this is a SCEVUnionPredicate.
		virtual const SCEV *getExpr() const = 0;
		};

		// Specialize FoldingSetTrait for SCEVPredicate to avoid needing to compute
		// temporary FoldingSetNodeID values.
		template <>
		sanjoy (Done): Same comment here -- if your invariant is that `E0` is a `SCEVUnknown`, then make that obvious in the type.
		struct FoldingSetTrait<SCEVPredicate>
		: DefaultFoldingSetTrait<SCEVPredicate> {

		static void Profile(const SCEVPredicate &X, FoldingSetNodeID &ID) {
		ID = X.FastID;
		}

		static bool Equals(const SCEVPredicate &X, const FoldingSetNodeID &ID,
		unsigned IDHash, FoldingSetNodeID &TempID) {
		return ID == X.FastID;
		}
		static unsigned ComputeHash(const SCEVPredicate &X,
		FoldingSetNodeID &TempID) {
		return X.FastID.ComputeHash();
		sanjoy (Done): I haven't read the whole patch yet, but I think the doc should state if a `SCEVUnionPredicate` represents a logical and or a logical or (or sometimes one and sometimes the other) of the predicates it contains.
		}
		mzolotukhin (Done): The names `SCEVPredicate` and `SCEVPredicateSet` are a bit confusing: usually, one doesn't assume that a `SomethingSet` is derived from `Something`. Would it make sense to rename them to something like `SCEVPredicateBase`/`SCEVUnionPredicate` (not insisting on these particular names)?
		sbaranga (author, Done): I'm not opposed to changing the names. SCEVPredicateBase/SCEVUnionPredicate could work.
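
As a rough illustration of why a composite can itself derive from the predicate base class, here is a standalone sketch in which the union is treated as a logical AND of its members. That reading is an assumption made for the example; the reviewers ask for the semantics to be documented, and this page does not state them. PredicateSketch, EqualSketch and UnionSketch are hypothetical names that only mirror the shape of the interface in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of the hierarchy: a base predicate plus a composite
// that is itself a predicate.
class PredicateSketch {
public:
  virtual ~PredicateSketch() {}
  virtual bool isAlwaysTrue() const = 0;
  virtual bool implies(const PredicateSketch *N) const = 0;
};

// "Unknown number Id equals Value".
class EqualSketch : public PredicateSketch {
  unsigned Id;
  long Value;

public:
  EqualSketch(unsigned Id, long Value) : Id(Id), Value(Value) {}

  bool isAlwaysTrue() const override { return false; }

  bool implies(const PredicateSketch *N) const override {
    const auto *Other = dynamic_cast<const EqualSketch *>(N);
    return Other && Other->Id == Id && Other->Value == Value;
  }
};

// Assumed semantics: the union holds only if every member holds (logical AND).
class UnionSketch : public PredicateSketch {
  std::vector<const PredicateSketch *> Preds;

public:
  // An empty conjunction makes no assumptions, so nothing needs checking.
  bool isAlwaysTrue() const override {
    for (const PredicateSketch *P : Preds)
      if (!P->isAlwaysTrue())
        return false;
    return true;
  }

  bool implies(const PredicateSketch *N) const override {
    // A conjunction implies another conjunction if it implies each member.
    if (const auto *Set = dynamic_cast<const UnionSketch *>(N)) {
      for (const PredicateSketch *P : Set->Preds)
        if (!implies(P))
          return false;
      return true;
    }
    // Otherwise it is enough that one member implies N.
    for (const PredicateSketch *P : Preds)
      if (P->implies(N))
        return true;
    return false;
  }

  // Adds N unless it is already implied; unions are flattened into members.
  void add(const PredicateSketch *N) {
    if (const auto *Set = dynamic_cast<const UnionSketch *>(N)) {
      for (const PredicateSketch *P : Set->Preds)
        add(P);
      return;
    }
    if (!implies(N))
      Preds.push_back(N);
  }

  std::size_t size() const { return Preds.size(); }
};
```

Under this reading, adding the same equality twice leaves a single member, and the number of members is a natural proxy for the number of run-time checks.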
		};

		mzolotukhin (Done): Should it be called `Invalid` or something like this then?
		sbaranga (author, Done): "Invalid" would be better, yes.
		//===--------------------------------------------------------------------===//
		/// SCEVEqualPredicate - This class represents an assumption that two SCEV
		hfinkel (Done): Remove commented-out code.
		sanjoy (Done): I agree with @mzolotukhin -- this won't work once you have different types of predicates. You'll need a "host" to contain the allocations, perhaps using a `BumpPtrAllocator` or something similar.
		sbaranga (author, Done): Ok, this makes sense. I'll do the changes.
		/// expressions are equal, and this can be checked at run-time. We assume
		mzolotukhin (Done): What is `IdPreds` and how is it different from `Preds`?
		sbaranga (author, Done): Right now it's not different, but I was thinking that in the future it could contain different types of predicates.
		mzolotukhin (Done): I still don't understand the purpose of having `IdPreds`. Why can't `Preds` contain different types of predicates?
		sbaranga (author, Done): I'm basically using IdPreds for storage. It's not ideal. Would there be some other way of providing storage for different types of predicates while avoiding heap allocation? Preds would hold a list of references and would be the thing that the algorithms use.
		mzolotukhin (Done): Hmm.. I don't think it'll work.. So, `IdPreds` is a vector of objects, and `Preds` is a vector of pointers to those objects, right? Are you going to keep different types of predicates in `Preds`? If so, where will the actual objects be stored? Will you create a separate `IdPreds`-like array for any new type of objects that can be stored in `Preds` (because you can't store objects of a type other than `SCEVEqualPredicate` in `IdPreds`, right?)? I think we'll have to heap allocate the objects to make the scheme versatile. For an example of how a similar problem is solved, you could take a look at `ScalarEvolution::getConstant` and the `UniqueSCEVs` member of class `ScalarEvolution`.
		sbaranga (author, Done): I agree, the current scheme is not ideal. Using a ScalarEvolution-like allocation scheme seems reasonable to me for everything except SCEVUnionPredicate. But we don't need to allocate that anyway so it should be fine.
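
The storage scheme the reviewers converge on, a single "host" that owns and uniques predicates of any kind while clients hold plain pointers, might look roughly like the sketch below. It deliberately avoids the LLVM facilities named in the thread (`FoldingSet`, `BumpPtrAllocator`) and uses standard containers instead, so it is only an analogy. PredBase, EqualPred, NoWrapPred and PredicateHost are hypothetical names, and the non-wrapping kind is included only because the thread mentions it as a possible future predicate type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical predicate hierarchy with two unrelated kinds.
struct PredBase {
  virtual ~PredBase() {}
  virtual std::string describe() const = 0;
};

struct EqualPred : PredBase {
  std::string Unknown;
  long Value;
  EqualPred(std::string U, long V) : Unknown(std::move(U)), Value(V) {}
  std::string describe() const override {
    return Unknown + " == " + std::to_string(Value);
  }
};

struct NoWrapPred : PredBase {
  std::string Expr;
  explicit NoWrapPred(std::string E) : Expr(std::move(E)) {}
  std::string describe() const override { return Expr + " does not wrap"; }
};

// The host owns every predicate and hands out stable, uniqued pointers, in
// the spirit of how ScalarEvolution uniques its expressions. The textual
// description is used as the uniquing key purely to keep the sketch short.
class PredicateHost {
  std::vector<std::unique_ptr<PredBase>> Storage;
  std::map<std::string, const PredBase *> Unique;

  const PredBase *intern(std::unique_ptr<PredBase> P) {
    std::string Key = P->describe();
    auto It = Unique.find(Key);
    if (It != Unique.end())
      return It->second; // an equivalent predicate exists; P is discarded
    const PredBase *Raw = P.get();
    Storage.push_back(std::move(P));
    Unique.emplace(std::move(Key), Raw);
    return Raw;
  }

public:
  const PredBase *getEqual(const std::string &Unknown, long Value) {
    return intern(std::unique_ptr<PredBase>(new EqualPred(Unknown, Value)));
  }

  const PredBase *getNoWrap(const std::string &Expr) {
    return intern(std::unique_ptr<PredBase>(new NoWrapPred(Expr)));
  }

  std::size_t size() const { return Storage.size(); }
};
```

A client-side `Preds` list then becomes a plain vector of `const PredBase *` that can mix kinds freely, and no per-kind storage vector such as `IdPreds` is needed.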
		/// that the left hand side is a SCEVUnknown and the right hand side a
		/// constant.
		///
		class SCEVEqualPredicate : public SCEVPredicate {
		/// We assume that LHS == RHS, where LHS is a SCEVUnknown and RHS a
		/// constant.
		const SCEVUnknown *LHS;
		const SCEVConstant *RHS;
		mzolotukhin (Done): Why do we need it as a `public` member?
		sbaranga (author, Done): I needed to get access to Preds from the rewriter. Maybe having some interface for this would be better.

		public:
		mzolotukhin [Done]: This probably should be private/protected.
		SCEVEqualPredicate(const FoldingSetNodeIDRef ID, const SCEVUnknown *LHS,
		const SCEVConstant *RHS);

		/// Implementation of the SCEVPredicate interface
		bool implies(const SCEVPredicate *N) const override;
		void print(raw_ostream &OS, unsigned Depth) const override;
		mzolotukhin [Done]: Does it return the first and last instruction in the generated check? Could we document it?
		sbaranga (author) [Done]: It does, yes! This should be consistent with the interface used by the loop vectorizer / loop access analysis to add runtime checks. I'll add documentation for this in the next version.
		bool isAlwaysTrue() const override;
		const SCEV *getExpr() const;
		sanjoy [Done]: Why do you need to return a pair of instructions? Why not just return a `Value *` computing the result of the predicate? This will also obviate the need to create a "fake or" in the implementation, and you'll just be able to return what IRBuilder gave you.
		sbaranga (author) [Done]: It is mostly for consistency. This is what some of the other versioning checks return - addRuntimeChecks in LoopAccessInfo and InnerLoopVectorizer::addStrideCheck (removed by this patch).

		Value *generateCheck(Instruction *Loc, ScalarEvolution &SE,
		const DataLayout &DL,
		SCEVExpander &Exp) const override;

		/// \brief returns the left hand side of the equality.
		sanjoy [Not Done]: Here, and in `getLHS`, please start the sentence with uppercase.
		const SCEVUnknown *getLHS() const { return LHS; }

		/// \brief returns the right hand side of the equality.
		const SCEVConstant *getRHS() const { return RHS; }

		/// Methods for support type inquiry through isa, cast, and dyn_cast:
		static inline bool classof(const SCEVPredicate *P) {
		return P->getType() == PEQUAL;
		}
		};
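
Note on the storage discussion above: a minimal sketch of the "unique and own in one place" scheme that mzolotukhin points to, written as standalone C++ rather than against LLVM. `EqualPredicate` and `PredicatePool` are hypothetical stand-ins (they are not the patch's classes), and a `std::map` of `std::unique_ptr` stands in for the `FoldingSet` plus allocator pair used by `ScalarEvolution`. The point is only that one owner hands out stable pointers, so a union can hold plain pointers to predicates of any kind.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a predicate "LHS == RHS"; not the LLVM class.
struct EqualPredicate {
  std::string LHS; // name of the symbolic value
  long RHS;        // constant it is assumed to equal
};

// Hypothetical pool that owns every predicate and returns one stable
// pointer per distinct (LHS, RHS) key.
class PredicatePool {
  std::map<std::pair<std::string, long>, std::unique_ptr<EqualPredicate>>
      Unique;

public:
  const EqualPredicate *getEqualPredicate(const std::string &LHS, long RHS) {
    auto Key = std::make_pair(LHS, RHS);
    auto It = Unique.find(Key);
    if (It != Unique.end())
      return It->second.get(); // already created: reuse the same object

    auto Owned = std::make_unique<EqualPredicate>(EqualPredicate{LHS, RHS});
    const EqualPredicate *Raw = Owned.get();
    Unique.emplace(std::move(Key), std::move(Owned));
    return Raw;
  }

  // Number of distinct predicates created so far.
  std::size_t size() const { return Unique.size(); }
};
```

Asking the pool twice for the same `(LHS, RHS)` pair yields the same pointer, which is what lets pointer equality double as predicate equality in this sketch.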

		//===--------------------------------------------------------------------===//
		/// SCEVUnionPredicate - This class represents a composition of other
		/// SCEV predicates, and is the class that most clients will interact with.
		/// This is equivalent to a logical "AND" of all the predicates in the union.
		class SCEVUnionPredicate : public SCEVPredicate {
		private:
		typedef std::map<const SCEV *, SmallVector<const SCEVPredicate *, 4>>
		sanjoy [Not Done]: Why not `DenseMap`?
		PredicateMap;

		/// Vector with references to all predicates in this union.
		SmallVector<const SCEVPredicate *, 16> Preds;
		/// Maps SCEVs to predicates for quick look-ups.
		PredicateMap SCEVToPreds;

		public:
		SCEVUnionPredicate();

		/// \brief Adds a predicate to this union.
		void add(const SCEVPredicate *N);

		/// \brief Generates a run-time check for all the contained predicates
		/// using \p Loc as the instruction insertion point.
		/// This is a wrapper around generateCheck, and provides an interface
		/// similar to other run-time checks used for versioning. Like the
		/// generateCheck method, the returned values indicate if any of its
		/// sub-predicates are not true. It will return a pair formed by the
		/// first and last instructions in the check. If no instructions are
		/// generated we will return a pair of nullptr.
		std::pair<Instruction *, Instruction *>
		generateGuardCond(Instruction *Loc, ScalarEvolution &SE);

		/// \brief Returns a reference to a vector containing all predicates
		/// which apply to \p Expr.
		SmallVectorImpl<const SCEVPredicate *>
		sanjoy [Not Done]: I'd prefer returning an `ArrayRef<const SCEVPredicate *>`, and returning an empty `ArrayRef` if there are no predicates for `Expr`.
		getPredicatesForExpr(const SCEV *Expr);

		/// Implementation of the SCEVPredicate interface
		bool isAlwaysTrue() const override;
		bool implies(const SCEVPredicate *N) const override;
		void print(raw_ostream &OS, unsigned Depth) const;
		const SCEV *getExpr() const override;

		Value *generateCheck(Instruction *Loc, ScalarEvolution &SE,
		const DataLayout &DL,
		SCEVExpander &Exp) const override;

		/// \brief We estimate the complexity of a union predicate as the
		/// number of predicates in the union.
		unsigned getComplexity() override { return Preds.size(); }

		/// Methods for support type inquiry through isa, cast, and dyn_cast:
		static inline bool classof(const SCEVPredicate *P) {
		return P->getType() == PSET;
		}
		};
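
Note on the class above: a small standalone sketch of the "logical AND" semantics its doc comment describes. `EqualPred` and `UnionPred` are hypothetical toy types, not the patch's classes, and `holds` is an invented helper that evaluates the conjunction against concrete values (the patch emits IR for that instead). Skipping an already-implied predicate in `add` is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy predicate "LHS == RHS"; not the LLVM class.
struct EqualPred {
  std::string LHS;
  long RHS;
};

// Toy union: true only when every contained predicate is true.
class UnionPred {
  std::vector<EqualPred> Preds;

public:
  // The union implies N if an identical predicate is already present.
  bool implies(const EqualPred &N) const {
    for (const EqualPred &P : Preds)
      if (P.LHS == N.LHS && P.RHS == N.RHS)
        return true;
    return false;
  }

  // Assumption of this sketch: do not store what is already implied.
  void add(const EqualPred &N) {
    if (!implies(N))
      Preds.push_back(N);
  }

  // An empty conjunction has nothing to check.
  bool isAlwaysTrue() const { return Preds.empty(); }

  // Complexity is estimated as the number of contained predicates.
  std::size_t getComplexity() const { return Preds.size(); }

  // Hypothetical helper: evaluate the AND against concrete values.
  bool holds(const std::map<std::string, long> &Env) const {
    for (const EqualPred &P : Preds) {
      auto It = Env.find(P.LHS);
      if (It == Env.end() || It->second != P.RHS)
        return false;
    }
    return true;
  }
};
```

A client that wants to cap versioning overhead could compare `getComplexity()` against its own threshold before committing to the guarded version.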

/// The main scalar evolution driver. Because client code (intentionally)		/// The main scalar evolution driver. Because client code (intentionally)
/// can't do much with the SCEV objects directly, they must ask this class		/// can't do much with the SCEV objects directly, they must ask this class
/// for services.		/// for services.
class ScalarEvolution {		class ScalarEvolution {
public:		public:
/// An enum describing the relationship between a SCEV and a loop.		/// An enum describing the relationship between a SCEV and a loop.
enum LoopDisposition {		enum LoopDisposition {
LoopVariant, ///< The SCEV is loop-variant (unknown).		LoopVariant, ///< The SCEV is loop-variant (unknown).
[... 894 lines hidden ...]	public:
/// The subscript of the outermost dimension is the Quotient: [j+k].		/// The subscript of the outermost dimension is the Quotient: [j+k].
///		///
/// Overall, we have: A[][n][m], and the access function: A[j+k][2i][5i].		/// Overall, we have: A[][n][m], and the access function: A[j+k][2i][5i].
void delinearize(const SCEV *Expr,		void delinearize(const SCEV *Expr,
SmallVectorImpl<const SCEV *> &Subscripts,		SmallVectorImpl<const SCEV *> &Subscripts,
SmallVectorImpl<const SCEV *> &Sizes,		SmallVectorImpl<const SCEV *> &Sizes,
const SCEV *ElementSize);		const SCEV *ElementSize);

		const SCEVPredicate *getEqualPredicate(const SCEVUnknown *LHS,
		const SCEVConstant *RHS);

		/// Re-writes the SCEV according to the Predicates in \p Preds.
		anemet [Done]: Stale comment. I also think that the function name should be plural.
		const SCEV *rewriteUsingPredicate(const SCEV *Scev, SCEVUnionPredicate &A);

private:		private:
/// Compute the backedge taken count knowing the interval difference, the		/// Compute the backedge taken count knowing the interval difference, the
/// stride and presence of the equality in the comparison.		/// stride and presence of the equality in the comparison.
const SCEV *computeBECount(const SCEV *Delta, const SCEV *Stride,		const SCEV *computeBECount(const SCEV *Delta, const SCEV *Stride,
bool Equality);		bool Equality);

/// Verify if an linear IV with positive stride can overflow when in a		/// Verify if an linear IV with positive stride can overflow when in a
/// less-than comparison, knowing the invariant term of the comparison,		/// less-than comparison, knowing the invariant term of the comparison,
/// the stride and the knowledge of NSW/NUW flags on the recurrence.		/// the stride and the knowledge of NSW/NUW flags on the recurrence.
bool doesIVOverflowOnLT(const SCEV *RHS, const SCEV *Stride,		bool doesIVOverflowOnLT(const SCEV *RHS, const SCEV *Stride,
bool IsSigned, bool NoWrap);		bool IsSigned, bool NoWrap);

/// Verify if an linear IV with negative stride can overflow when in a		/// Verify if an linear IV with negative stride can overflow when in a
/// greater-than comparison, knowing the invariant term of the comparison,		/// greater-than comparison, knowing the invariant term of the comparison,
/// the stride and the knowledge of NSW/NUW flags on the recurrence.		/// the stride and the knowledge of NSW/NUW flags on the recurrence.
bool doesIVOverflowOnGT(const SCEV *RHS, const SCEV *Stride,		bool doesIVOverflowOnGT(const SCEV *RHS, const SCEV *Stride,
bool IsSigned, bool NoWrap);		bool IsSigned, bool NoWrap);

private:		private:
FoldingSet<SCEV> UniqueSCEVs;		FoldingSet<SCEV> UniqueSCEVs;
		FoldingSet<SCEVPredicate> UniquePreds;
		sanjoy [Not Done]: Please construct this in the move constructor.
BumpPtrAllocator SCEVAllocator;		BumpPtrAllocator SCEVAllocator;

/// The head of a linked list of all SCEVUnknown values that have been		/// The head of a linked list of all SCEVUnknown values that have been
/// allocated. This is used by releaseMemory to locate them all and call		/// allocated. This is used by releaseMemory to locate them all and call
/// their destructors.		/// their destructors.
SCEVUnknown *FirstUnknown;		SCEVUnknown *FirstUnknown;
};		};

[... 47 lines hidden ...]

lib/Analysis/LoopAccessAnalysis.cpp

[... 83 lines hidden ...]	Value *llvm::stripIntegerCast(Value *V) {
if (CastInst *CI = dyn_cast<CastInst>(V))		if (CastInst *CI = dyn_cast<CastInst>(V))
if (CI->getOperand(0)->getType()->isIntegerTy())		if (CI->getOperand(0)->getType()->isIntegerTy())
return CI->getOperand(0);		return CI->getOperand(0);
return V;		return V;
}		}

const SCEV *llvm::replaceSymbolicStrideSCEV(ScalarEvolution *SE,		const SCEV *llvm::replaceSymbolicStrideSCEV(ScalarEvolution *SE,
const ValueToValueMap &PtrToStride,		const ValueToValueMap &PtrToStride,
		SCEVUnionPredicate &Preds,
Value *Ptr, Value *OrigPtr) {		Value *Ptr, Value *OrigPtr) {

const SCEV *OrigSCEV = SE->getSCEV(Ptr);		const SCEV *OrigSCEV = SE->getSCEV(Ptr);

// If there is an entry in the map return the SCEV of the pointer with the		// If there is an entry in the map return the SCEV of the pointer with the
// symbolic stride replaced by one.		// symbolic stride replaced by one.
ValueToValueMap::const_iterator SI =		ValueToValueMap::const_iterator SI =
PtrToStride.find(OrigPtr ? OrigPtr : Ptr);		PtrToStride.find(OrigPtr ? OrigPtr : Ptr);
if (SI != PtrToStride.end()) {		if (SI != PtrToStride.end()) {
Value *StrideVal = SI->second;		Value *StrideVal = SI->second;

// Strip casts.		// Strip casts.
StrideVal = stripIntegerCast(StrideVal);		StrideVal = stripIntegerCast(StrideVal);

// Replace symbolic stride by one.		// Replace symbolic stride by one.
Value *One = ConstantInt::get(StrideVal->getType(), 1);		Value *One = ConstantInt::get(StrideVal->getType(), 1);
ValueToValueMap RewriteMap;		ValueToValueMap RewriteMap;
RewriteMap[StrideVal] = One;		RewriteMap[StrideVal] = One;

const SCEV *ByOne =		const auto *U = static_cast<const SCEVUnknown *>(SE->getSCEV(StrideVal));
SCEVParameterRewriter::rewrite(OrigSCEV, *SE, RewriteMap, true);		const auto *CT = static_cast<const SCEVConstant *>(
		SE->getConstant(StrideVal->getType(), 1, true));
		sanjoy [Not Done]: There is now a `ScalarEvolution::getOne`, can you use that here?

		Preds.add(SE->getEqualPredicate(U, CT));

		const SCEV *ByOne = SE->rewriteUsingPredicate(OrigSCEV, Preds);
DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV << " by: " << *ByOne		DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV << " by: " << *ByOne
<< "\n");		<< "\n");
return ByOne;		return ByOne;
}		}
		mzolotukhin [Done]: Instead of storing `VOne` and `One` in separate variables, I think we can use `ScalarEvolution::getConstant(Type *Ty, uint64_t V, bool isSigned)` here.

// Otherwise, just return the SCEV of the original pointer.		// Otherwise, just return the SCEV of the original pointer.
return SE->getSCEV(Ptr);		return OrigSCEV;
}		}
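
Note on the function above: a toy model of what the new right-hand side does, namely record the assumption "stride == 1" and then rewrite the expression under the recorded assumptions. Everything here is hypothetical and standalone: `AddRec`, `Stride`, `Assumption`, `rewriteUsingAssumptions` and `versionStride` are invented for illustration and only mirror the roles of the pointer SCEV, the `SCEVEqualPredicate`, `rewriteUsingPredicate` and `replaceSymbolicStrideSCEV`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Step of a recurrence: either a known constant or a named symbol.
struct Stride {
  bool IsSymbolic;
  std::string Name; // meaningful only when IsSymbolic is true
  long Value;       // meaningful only when IsSymbolic is false
};

// Toy affine recurrence {Start,+,Step}.
struct AddRec {
  long Start;
  Stride Step;
};

// Recorded assumption "Name == Value", to be checked at run time.
struct Assumption {
  std::string Name;
  long Value;
};

// Replace a symbolic step by the constant it is assumed to equal.
AddRec rewriteUsingAssumptions(const AddRec &E,
                               const std::vector<Assumption> &Preds) {
  if (!E.Step.IsSymbolic)
    return E;
  for (const Assumption &A : Preds)
    if (A.Name == E.Step.Name)
      return AddRec{E.Start, Stride{false, "", A.Value}};
  return E; // no applicable assumption: leave the expression alone
}

// Version a symbolic stride: assume it is one, then rewrite.
AddRec versionStride(const AddRec &E, std::vector<Assumption> &Preds) {
  if (E.Step.IsSymbolic)
    Preds.push_back(Assumption{E.Step.Name, 1});
  return rewriteUsingAssumptions(E, Preds);
}
```

The rewritten expression is only valid in code guarded by a run-time check of every recorded assumption, which is the role `generateGuardCond` plays in the patch.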

void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, bool WritePtr,		void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, bool WritePtr,
unsigned DepSetId, unsigned ASId,		unsigned DepSetId, unsigned ASId,
const ValueToValueMap &Strides) {		const ValueToValueMap &Strides,
		SCEVUnionPredicate &Preds) {
// Get the stride replaced scev.		// Get the stride replaced scev.
const SCEV *Sc = replaceSymbolicStrideSCEV(SE, Strides, Ptr);		const SCEV *Sc = replaceSymbolicStrideSCEV(SE, Strides, Preds, Ptr);
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);
assert(AR && "Invalid addrec expression");		assert(AR && "Invalid addrec expression");
const SCEV *Ex = SE->getBackedgeTakenCount(Lp);		const SCEV *Ex = SE->getBackedgeTakenCount(Lp);

const SCEV *ScStart = AR->getStart();		const SCEV *ScStart = AR->getStart();
const SCEV *ScEnd = AR->evaluateAtIteration(Ex, *SE);		const SCEV *ScEnd = AR->evaluateAtIteration(Ex, *SE);
const SCEV *Step = AR->getStepRecurrence(*SE);		const SCEV *Step = AR->getStepRecurrence(*SE);

// For expressions with negative step, the upper bound is ScStart and the		// For expressions with negative step, the upper bound is ScStart and the
// lower bound is ScEnd.		// lower bound is ScEnd.
if (const SCEVConstant *CStep = dyn_cast<const SCEVConstant>(Step)) {		if (const SCEVConstant *CStep = dyn_cast<const SCEVConstant>(Step)) {
if (CStep->getValue()->isNegative())		if (CStep->getValue()->isNegative())
[... 273 lines hidden ...]
/// dependence checking.		/// dependence checking.
class AccessAnalysis {		class AccessAnalysis {
public:		public:
/// \brief Read or write access location.		/// \brief Read or write access location.
typedef PointerIntPair<Value *, 1, bool> MemAccessInfo;		typedef PointerIntPair<Value *, 1, bool> MemAccessInfo;
typedef SmallPtrSet<MemAccessInfo, 8> MemAccessInfoSet;		typedef SmallPtrSet<MemAccessInfo, 8> MemAccessInfoSet;

AccessAnalysis(const DataLayout &Dl, AliasAnalysis *AA, LoopInfo *LI,		AccessAnalysis(const DataLayout &Dl, AliasAnalysis *AA, LoopInfo *LI,
MemoryDepChecker::DepCandidates &DA)		MemoryDepChecker::DepCandidates &DA, SCEVUnionPredicate &Preds)
: DL(Dl), AST(*AA), LI(LI), DepCands(DA),		: DL(Dl), AST(*AA), LI(LI), DepCands(DA), IsRTCheckAnalysisNeeded(false),
		mzolotukhin [Done]: Nitpick: here the predicate set is called `Preds`, in other functions (e.g. `RuntimePointerChecking::insert`) it's called `Pred`.
IsRTCheckAnalysisNeeded(false) {}		Preds(Preds) {}

/// \brief Register a load and whether it is only read from.		/// \brief Register a load and whether it is only read from.
void addLoad(MemoryLocation &Loc, bool IsReadOnly) {		void addLoad(MemoryLocation &Loc, bool IsReadOnly) {
Value *Ptr = const_cast<Value*>(Loc.Ptr);		Value *Ptr = const_cast<Value*>(Loc.Ptr);
AST.add(Ptr, MemoryLocation::UnknownSize, Loc.AATags);		AST.add(Ptr, MemoryLocation::UnknownSize, Loc.AATags);
Accesses.insert(MemAccessInfo(Ptr, false));		Accesses.insert(MemAccessInfo(Ptr, false));
if (IsReadOnly)		if (IsReadOnly)
ReadOnlyPtr.insert(Ptr);		ReadOnlyPtr.insert(Ptr);
[... 68 lines hidden ...]	private:
/// \brief Initial processing of memory accesses determined that we may need		/// \brief Initial processing of memory accesses determined that we may need
/// to add memchecks. Perform the analysis to determine the necessary checks.		/// to add memchecks. Perform the analysis to determine the necessary checks.
///		///
/// Note that, this is different from isDependencyCheckNeeded. When we retry		/// Note that, this is different from isDependencyCheckNeeded. When we retry
/// memcheck analysis without dependency checking		/// memcheck analysis without dependency checking
/// (i.e. ShouldRetryWithRuntimeCheck), isDependencyCheckNeeded is cleared		/// (i.e. ShouldRetryWithRuntimeCheck), isDependencyCheckNeeded is cleared
/// while this remains set if we have potentially dependent accesses.		/// while this remains set if we have potentially dependent accesses.
bool IsRTCheckAnalysisNeeded;		bool IsRTCheckAnalysisNeeded;

		/// The SCEV predicate containing all the SCEV-related assumptions.
		SCEVUnionPredicate &Preds;
};		};

} // end anonymous namespace		} // end anonymous namespace

/// \brief Check whether a pointer can participate in a runtime bounds check.		/// \brief Check whether a pointer can participate in a runtime bounds check.
static bool hasComputableBounds(ScalarEvolution *SE,		static bool hasComputableBounds(ScalarEvolution *SE,
const ValueToValueMap &Strides, Value *Ptr) {		const ValueToValueMap &Strides, Value *Ptr,
const SCEV *PtrScev = replaceSymbolicStrideSCEV(SE, Strides, Ptr);		Loop *L, SCEVUnionPredicate &Preds) {
		const SCEV *PtrScev = replaceSymbolicStrideSCEV(SE, Strides, Preds, Ptr);
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
if (!AR)		if (!AR)
return false;		return false;

return AR->isAffine();		return AR->isAffine();
}		}

bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,		bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
[... 26 lines hidden ...]	for (auto A : AS) {
bool IsWrite = Accesses.count(MemAccessInfo(Ptr, true));		bool IsWrite = Accesses.count(MemAccessInfo(Ptr, true));
MemAccessInfo Access(Ptr, IsWrite);		MemAccessInfo Access(Ptr, IsWrite);

if (IsWrite)		if (IsWrite)
++NumWritePtrChecks;		++NumWritePtrChecks;
else		else
++NumReadPtrChecks;		++NumReadPtrChecks;

if (hasComputableBounds(SE, StridesMap, Ptr) &&		if (hasComputableBounds(SE, StridesMap, Ptr, TheLoop, Preds) &&
// When we run after a failing dependency check we have to make sure		// When we run after a failing dependency check we have to make sure
// we don't have wrapping pointers.		// we don't have wrapping pointers.
(!ShouldCheckStride ||		(!ShouldCheckStride ||
isStridedPtr(SE, Ptr, TheLoop, StridesMap) == 1)) {		isStridedPtr(SE, Ptr, TheLoop, StridesMap, Preds) == 1)) {
// The id of the dependence set.		// The id of the dependence set.
unsigned DepId;		unsigned DepId;

if (IsDepCheckNeeded) {		if (IsDepCheckNeeded) {
Value *Leader = DepCands.getLeaderValue(Access).getPointer();		Value *Leader = DepCands.getLeaderValue(Access).getPointer();
unsigned &LeaderId = DepSetId[Leader];		unsigned &LeaderId = DepSetId[Leader];
if (!LeaderId)		if (!LeaderId)
LeaderId = RunningDepId++;		LeaderId = RunningDepId++;
DepId = LeaderId;		DepId = LeaderId;
} else		} else
// Each access has its own dependence set.		// Each access has its own dependence set.
DepId = RunningDepId++;		DepId = RunningDepId++;

RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap);		RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap, Preds);

DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');		DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');
} else {		} else {
DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');		DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');
CanDoRT = false;		CanDoRT = false;
}		}
}		}

[... 214 lines hidden ...]	if (OBO->hasNoSignedWrap() &&
return OpAR->getLoop() == L && OpAR->getNoWrapFlags(SCEV::FlagNSW);		return OpAR->getLoop() == L && OpAR->getNoWrapFlags(SCEV::FlagNSW);
}		}

return false;		return false;
}		}

/// \brief Check whether the access through \p Ptr has a constant stride.		/// \brief Check whether the access through \p Ptr has a constant stride.
int llvm::isStridedPtr(ScalarEvolution *SE, Value *Ptr, const Loop *Lp,		int llvm::isStridedPtr(ScalarEvolution *SE, Value *Ptr, const Loop *Lp,
const ValueToValueMap &StridesMap) {		const ValueToValueMap &StridesMap,
		SCEVUnionPredicate &Preds) {
Type *Ty = Ptr->getType();		Type *Ty = Ptr->getType();
assert(Ty->isPointerTy() && "Unexpected non-ptr");		assert(Ty->isPointerTy() && "Unexpected non-ptr");

// Make sure that the pointer does not point to aggregate types.		// Make sure that the pointer does not point to aggregate types.
auto *PtrTy = cast<PointerType>(Ty);		auto *PtrTy = cast<PointerType>(Ty);
if (PtrTy->getElementType()->isAggregateType()) {		if (PtrTy->getElementType()->isAggregateType()) {
DEBUG(dbgs() << "LAA: Bad stride - Not a pointer to a scalar type"		DEBUG(dbgs() << "LAA: Bad stride - Not a pointer to a scalar type"
<< *Ptr << "\n");		<< *Ptr << "\n");
return 0;		return 0;
}		}

const SCEV *PtrScev = replaceSymbolicStrideSCEV(SE, StridesMap, Ptr);		const SCEV *PtrScev = replaceSymbolicStrideSCEV(SE, StridesMap, Preds, Ptr);

const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
if (!AR) {		if (!AR) {
DEBUG(dbgs() << "LAA: Bad stride - Not an AddRecExpr pointer "		DEBUG(dbgs() << "LAA: Bad stride - Not an AddRecExpr pointer "
<< *Ptr << " SCEV: " << *PtrScev << "\n");		<< *Ptr << " SCEV: " << *PtrScev << "\n");
return 0;		return 0;
}		}

[... 194 lines hidden ...]	MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
if (!AIsWrite && !BIsWrite)		if (!AIsWrite && !BIsWrite)
return Dependence::NoDep;		return Dependence::NoDep;

// We cannot check pointers in different address spaces.		// We cannot check pointers in different address spaces.
if (APtr->getType()->getPointerAddressSpace() !=		if (APtr->getType()->getPointerAddressSpace() !=
BPtr->getType()->getPointerAddressSpace())		BPtr->getType()->getPointerAddressSpace())
return Dependence::Unknown;		return Dependence::Unknown;

const SCEV *AScev = replaceSymbolicStrideSCEV(SE, Strides, APtr);		const SCEV *AScev = replaceSymbolicStrideSCEV(SE, Strides, Preds, APtr);
const SCEV *BScev = replaceSymbolicStrideSCEV(SE, Strides, BPtr);		const SCEV *BScev = replaceSymbolicStrideSCEV(SE, Strides, Preds, BPtr);

int StrideAPtr = isStridedPtr(SE, APtr, InnermostLoop, Strides);		int StrideAPtr = isStridedPtr(SE, APtr, InnermostLoop, Strides, Preds);
int StrideBPtr = isStridedPtr(SE, BPtr, InnermostLoop, Strides);		int StrideBPtr = isStridedPtr(SE, BPtr, InnermostLoop, Strides, Preds);

const SCEV *Src = AScev;		const SCEV *Src = AScev;
const SCEV *Sink = BScev;		const SCEV *Sink = BScev;

// If the induction step is negative we have to invert source and sink of the		// If the induction step is negative we have to invert source and sink of the
// dependence.		// dependence.
if (StrideAPtr < 0) {		if (StrideAPtr < 0) {
//Src = BScev;		//Src = BScev;
[... 382 lines hidden ...]	void LoopAccessInfo::analyzeLoop(const ValueToValueMap &Strides) {
if (!Stores.size()) {		if (!Stores.size()) {
DEBUG(dbgs() << "LAA: Found a read-only loop!\n");		DEBUG(dbgs() << "LAA: Found a read-only loop!\n");
CanVecMem = true;		CanVecMem = true;
return;		return;
}		}

MemoryDepChecker::DepCandidates DependentAccesses;		MemoryDepChecker::DepCandidates DependentAccesses;
AccessAnalysis Accesses(TheLoop->getHeader()->getModule()->getDataLayout(),		AccessAnalysis Accesses(TheLoop->getHeader()->getModule()->getDataLayout(),
AA, LI, DependentAccesses);		AA, LI, DependentAccesses, Preds);

// Holds the analyzed pointers. We don't want to call GetUnderlyingObjects		// Holds the analyzed pointers. We don't want to call GetUnderlyingObjects
// multiple times on the same object. If the ptr is accessed twice, once		// multiple times on the same object. If the ptr is accessed twice, once
// for read and once for write, it will only appear once (on the write		// for read and once for write, it will only appear once (on the write
// list). This is okay, since we are going to check for conflicts between		// list). This is okay, since we are going to check for conflicts between
// writes and between reads and writes, but not between reads and reads.		// writes and between reads and writes, but not between reads and reads.
ValueSet Seen;		ValueSet Seen;

[... 34 lines hidden ...]	for (I = Loads.begin(), IE = Loads.end(); I != IE; ++I) {
// read list. If we did see it before, then it is already in		// read list. If we did see it before, then it is already in
// the read-write list. This allows us to vectorize expressions		// the read-write list. This allows us to vectorize expressions
// such as A[i] += x; Because the address of A[i] is a read-write		// such as A[i] += x; Because the address of A[i] is a read-write
// pointer. This only works if the index of A[i] is consecutive.		// pointer. This only works if the index of A[i] is consecutive.
// If the address of i is unknown (for example A[B[i]]) then we may		// If the address of i is unknown (for example A[B[i]]) then we may
// read a few words, modify, and write a few words, and some of the		// read a few words, modify, and write a few words, and some of the
// words may be written to the same address.		// words may be written to the same address.
bool IsReadOnlyPtr = false;		bool IsReadOnlyPtr = false;
if (Seen.insert(Ptr).second || !isStridedPtr(SE, Ptr, TheLoop, Strides)) {		if (Seen.insert(Ptr).second ||
		!isStridedPtr(SE, Ptr, TheLoop, Strides, Preds)) {
++NumReads;		++NumReads;
IsReadOnlyPtr = true;		IsReadOnlyPtr = true;
}		}

MemoryLocation Loc = MemoryLocation::get(LD);		MemoryLocation Loc = MemoryLocation::get(LD);
// The TBAA metadata could have a control dependency on the predication		// The TBAA metadata could have a control dependency on the predication
// condition, so we cannot rely on it when determining whether or not we		// condition, so we cannot rely on it when determining whether or not we
// need runtime pointer checks.		// need runtime pointer checks.
[... 231 lines hidden ...]	LoopAccessInfo::addRuntimeChecks(Instruction *Loc) const {
return addRuntimeChecks(Loc, PtrRtChecking.getChecks());		return addRuntimeChecks(Loc, PtrRtChecking.getChecks());
}		}

LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,		LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
const DataLayout &DL,		const DataLayout &DL,
const TargetLibraryInfo *TLI, AliasAnalysis *AA,		const TargetLibraryInfo *TLI, AliasAnalysis *AA,
DominatorTree *DT, LoopInfo *LI,		DominatorTree *DT, LoopInfo *LI,
const ValueToValueMap &Strides)		const ValueToValueMap &Strides)
: PtrRtChecking(SE), DepChecker(SE, L), TheLoop(L), SE(SE), DL(DL),		: PtrRtChecking(SE), DepChecker(SE, L, Preds), TheLoop(L), SE(SE), DL(DL),
TLI(TLI), AA(AA), DT(DT), LI(LI), NumLoads(0), NumStores(0),		TLI(TLI), AA(AA), DT(DT), LI(LI), NumLoads(0), NumStores(0),
MaxSafeDepDistBytes(-1U), CanVecMem(false),		MaxSafeDepDistBytes(-1U), CanVecMem(false),
StoreToLoopInvariantAddress(false) {		StoreToLoopInvariantAddress(false) {
if (canAnalyzeLoop())		if (canAnalyzeLoop())
analyzeLoop(Strides);		analyzeLoop(Strides);
}		}

void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {		void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
[... 18 lines hidden ...]	void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {

// List the pair of accesses need run-time checks to prove independence.		// List the pair of accesses need run-time checks to prove independence.
PtrRtChecking.print(OS, Depth);		PtrRtChecking.print(OS, Depth);
OS << "\n";		OS << "\n";

OS.indent(Depth) << "Store to invariant address was "		OS.indent(Depth) << "Store to invariant address was "
<< (StoreToLoopInvariantAddress ? "" : "not ")		<< (StoreToLoopInvariantAddress ? "" : "not ")
<< "found in loop.\n";		<< "found in loop.\n";

		OS.indent(Depth) << "SCEV assumptions:\n";
		Preds.print(OS, Depth);
}		}

const LoopAccessInfo &		const LoopAccessInfo &
LoopAccessAnalysis::getInfo(Loop *L, const ValueToValueMap &Strides) {		LoopAccessAnalysis::getInfo(Loop *L, const ValueToValueMap &Strides,
		const unsigned MaxSCEVPredicates) {
		anemet [Not Done]: Ah, MaxSCEVPredicates is unused.
auto &LAI = LoopAccessInfoMap[L];		auto &LAI = LoopAccessInfoMap[L];

#ifndef NDEBUG		#ifndef NDEBUG
assert((!LAI || LAI->NumSymbolicStrides == Strides.size()) &&		assert((!LAI || LAI->NumSymbolicStrides == Strides.size()) &&
"Symbolic strides changed for loop");		"Symbolic strides changed for loop");
#endif		#endif

if (!LAI) {		if (!LAI) {
const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();		const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();
LAI = llvm::make_unique<LoopAccessInfo>(L, SE, DL, TLI, AA, DT, LI,		LAI =
Strides);		llvm::make_unique<LoopAccessInfo>(L, SE, DL, TLI, AA, DT, LI, Strides);
#ifndef NDEBUG		#ifndef NDEBUG
LAI->NumSymbolicStrides = Strides.size();		LAI->NumSymbolicStrides = Strides.size();
#endif		#endif
}		}
return *LAI.get();		return *LAI.get();
}		}

void LoopAccessAnalysis::print(raw_ostream &OS, const Module *M) const {		void LoopAccessAnalysis::print(raw_ostream &OS, const Module *M) const {
[... 48 lines hidden ...]

lib/Analysis/ScalarEvolution.cpp


	[... 61 lines hidden ...]
	#include "llvm/ADT/Optional.h"			#include "llvm/ADT/Optional.h"
	#include "llvm/ADT/STLExtras.h"			#include "llvm/ADT/STLExtras.h"
	#include "llvm/ADT/SmallPtrSet.h"			#include "llvm/ADT/SmallPtrSet.h"
	#include "llvm/ADT/Statistic.h"			#include "llvm/ADT/Statistic.h"
	#include "llvm/Analysis/AssumptionCache.h"			#include "llvm/Analysis/AssumptionCache.h"
	#include "llvm/Analysis/ConstantFolding.h"			#include "llvm/Analysis/ConstantFolding.h"
	#include "llvm/Analysis/InstructionSimplify.h"			#include "llvm/Analysis/InstructionSimplify.h"
	#include "llvm/Analysis/LoopInfo.h"			#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
	#include "llvm/Analysis/ScalarEvolutionExpressions.h"			#include "llvm/Analysis/ScalarEvolutionExpressions.h"
	#include "llvm/Analysis/TargetLibraryInfo.h"			#include "llvm/Analysis/TargetLibraryInfo.h"
	#include "llvm/Analysis/ValueTracking.h"			#include "llvm/Analysis/ValueTracking.h"
	#include "llvm/IR/ConstantRange.h"			#include "llvm/IR/ConstantRange.h"
	#include "llvm/IR/Constants.h"			#include "llvm/IR/Constants.h"
	#include "llvm/IR/DataLayout.h"			#include "llvm/IR/DataLayout.h"
	#include "llvm/IR/DerivedTypes.h"			#include "llvm/IR/DerivedTypes.h"
	#include "llvm/IR/Dominators.h"			#include "llvm/IR/Dominators.h"
	[... 27 lines hidden ...]

	static cl::opt<unsigned>			static cl::opt<unsigned>
	MaxBruteForceIterations("scalar-evolution-max-iterations", cl::ReallyHidden,			MaxBruteForceIterations("scalar-evolution-max-iterations", cl::ReallyHidden,
	cl::desc("Maximum number of iterations SCEV will "			cl::desc("Maximum number of iterations SCEV will "
	"symbolically execute a constant "			"symbolically execute a constant "
	"derived loop"),			"derived loop"),
	cl::init(100));			cl::init(100));

	// FIXME: Enable this with XDEBUG when the test suite is clean.			// FIXME: Enable this with XDEBUG when the test suite is clean.
	static cl::opt<bool>			static cl::opt<bool>
	VerifySCEV("verify-scev",			VerifySCEV("verify-scev",
	cl::desc("Verify ScalarEvolution's backedge taken counts (slow)"));			cl::desc("Verify ScalarEvolution's backedge taken counts (slow)"));

				anemet [Done]: I don't think that this should be a central threshold. It should be up to the client transformation to decide if the benefit of the transformation outweighs the overhead of the necessary checks.
				sbaranga (author) [Done]: Ok, I'll remove this.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SCEV class definitions			// SCEV class definitions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Implementation of the SCEV class.			// Implementation of the SCEV class.
	//			//

	▲ Show 20 Lines • Show All 9,107 Lines • ▼ Show 20 Lines

	void ScalarEvolutionWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {			void ScalarEvolutionWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
	AU.setPreservesAll();			AU.setPreservesAll();
	AU.addRequiredTransitive<AssumptionCacheTracker>();			AU.addRequiredTransitive<AssumptionCacheTracker>();
	AU.addRequiredTransitive<LoopInfoWrapperPass>();			AU.addRequiredTransitive<LoopInfoWrapperPass>();
	AU.addRequiredTransitive<DominatorTreeWrapperPass>();			AU.addRequiredTransitive<DominatorTreeWrapperPass>();
	AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();			AU.addRequiredTransitive<TargetLibraryInfoWrapperPass>();
	}			}

				static Instruction *getFirstInst(Instruction *FirstInst, Value *V,
				Instruction *Loc) {
				if (FirstInst)
				return FirstInst;
				if (auto I = dyn_cast<Instruction>(V))
				sanjoy (Not Done): Use `auto *I` here.
				sanjoy (Not Done): Isn't `V` always an instruction?
				return I->getParent() == Loc->getParent() ? I : nullptr;
				return nullptr;
				}
				anemet (Not Done): Hmm, I think we're duplicating this now into the third file. Any better idea?
				sbaranga (Author, Not Done): We remove this from the loop vectorizer so we end up with the same number of copies. There's still some code duplication.

				const SCEVPredicate *
				anemet (Not Done): Should probably be a class since not all members are public. Also it needs a comment. This may be a silly question, but why do we need to override all these members?
				sbaranga (Author, Not Done): I think because SCEVVisitor is a template that uses the visit* methods without defining them, users must define all the methods (or get a compilation error). Not really nice. I'll make the changes (add a comment and convert to a class).
				anemet (Not Done): "I think because SCEVVisitor is a template that uses the visit* methods without defining them, users must define all the methods (or get a compilation error). Not really nice." Can this be derived from SCEVParameterRewriter since we only support the equality predicate right now? Then we can extend it later as we add the other predicates.
				mzolotukhin (Not Done): "users must define all the methods (or get a compilation error). Not really nice." No, you don't have to override all of them, you only need to define the ones you want to change. For the generic case, you could override the `visitInstruction` method. As an example, you could look at `class UnrolledInstAnalyzer` in `lib/Transforms/Scalar/LoopUnrollPass.cpp`.
				sbaranga (Author, Not Done): I think we have a different case here, as we would like to visit SCEV expressions. I think that the LoopUnrollPass example visits instructions?
				mzolotukhin (Not Done): Yes, you're right, I misread the code. I think it would make sense to factor out the common ('default' implementation) part into a separate class. It will also remove some code duplication from `SCEVApplyRewriter` and `SCEVParameterRewriter`. That should be a separate NFC patch, but I think we need to do it before landing this one.
				sbaranga (Author, Not Done): Makes sense. I've created another review for this change: http://reviews.llvm.org/D13242
				mzolotukhin (Not Done): Thanks! FWIW, I like that patch, but I'd like to leave it up to Andy or Sanjoy for a final approval.
				ScalarEvolution::getEqualPredicate(const SCEVUnknown *LHS,
				const SCEVConstant *RHS) {
				FoldingSetNodeID ID;
				// Unique this node based on the arguments
				ID.AddPointer(LHS);
				ID.AddPointer(RHS);
				sanjoy (Not Done): Don't you need to add the type of the predicate to `ID`?
				void *IP = nullptr;
				if (const auto *S = UniquePreds.FindNodeOrInsertPos(ID, IP))
				return S;
				SCEVEqualPredicate *Eq = new (SCEVAllocator)
				SCEVEqualPredicate(ID.Intern(SCEVAllocator), LHS, RHS);
				UniquePreds.InsertNode(Eq, IP);
				return Eq;
				}

				class SCEVPredicateRewriter : public SCEVRewriteVisitor<SCEVPredicateRewriter> {
				public:
				static const SCEV *rewrite(const SCEV *Scev, ScalarEvolution &SE,
				SCEVUnionPredicate &A) {
				SCEVPredicateRewriter Rewriter(SE, A);
				return Rewriter.visit(Scev);
				}

				SCEVPredicateRewriter(ScalarEvolution &SE, SCEVUnionPredicate &P)
				: SCEVRewriteVisitor(SE), P(P) {}

				const SCEV *visitUnknown(const SCEVUnknown *Expr) {
				auto ExprPreds = P.getPredicatesForExpr(Expr);
				if (ExprPreds)
				for (auto *Pred : *ExprPreds)
				if (const SCEVEqualPredicate *IPred =
				sanjoy (Not Done): Use `const auto *` because the type is obvious from the RHS.
				dyn_cast<const SCEVEqualPredicate>(Pred)) {
				if (IPred->getLHS() == Expr)
				return IPred->getRHS();
				}

				return Expr;
				}

				private:
				sanjoy (Done): Use `const auto *`.
				SCEVUnionPredicate &P;
				};

				const SCEV *ScalarEvolution::rewriteUsingPredicate(const SCEV *Scev,
				SCEVUnionPredicate &Preds) {
				sanjoy (Done): Why not just `return (Op->E0 == E0 && Op->E1 == E1) || (Op->E0 == E1 && Op->E1 == E0)`?
				sbaranga (Author, Done): In fact if we know that LHS and RHS are different (one is a SCEVUnknown and the other a SCEVConstant) we can simplify that expression.
				return SCEVPredicateRewriter::rewrite(Scev, *this, Preds);
				}
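
The rewriting step above replaces an unknown with the constant named by a matching equality predicate. A minimal standalone sketch of that idea follows. It does not use the LLVM API: `Expr`, `constant`, `unknown`, `binary`, and `rewrite` are hypothetical names, and the predicate set is modelled as a plain map from unknown name to constant. The multiply-by-one simplification is included only to show why assuming `Stride == 1` yields a simpler expression.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy expression tree (hypothetical, not the SCEV classes).
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  enum Kind { Constant, Unknown, Add, Mul } K;
  long Value = 0;   // used by Constant
  std::string Name; // used by Unknown
  ExprPtr LHS, RHS; // used by Add and Mul
};

ExprPtr constant(long V) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Constant;
  E->Value = V;
  return E;
}

ExprPtr unknown(const std::string &Name) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Unknown;
  E->Name = Name;
  return E;
}

ExprPtr binary(Expr::Kind K, ExprPtr L, ExprPtr R) {
  auto E = std::make_shared<Expr>();
  E->K = K;
  E->LHS = L;
  E->RHS = R;
  return E;
}

// Rewrite E under the assumption that every unknown listed in EqualPreds
// equals the mapped constant. Unlisted unknowns are returned unchanged.
ExprPtr rewrite(const ExprPtr &E, const std::map<std::string, long> &EqualPreds) {
  switch (E->K) {
  case Expr::Constant:
    return E;
  case Expr::Unknown: {
    auto It = EqualPreds.find(E->Name);
    return It == EqualPreds.end() ? E : constant(It->second);
  }
  case Expr::Add:
  case Expr::Mul: {
    ExprPtr L = rewrite(E->LHS, EqualPreds);
    ExprPtr R = rewrite(E->RHS, EqualPreds);
    // Fold when both operands became constants.
    if (L->K == Expr::Constant && R->K == Expr::Constant)
      return constant(E->K == Expr::Add ? L->Value + R->Value
                                        : L->Value * R->Value);
    // Multiplying by 1 is the identity: this is the stride-versioning case.
    if (E->K == Expr::Mul && L->K == Expr::Constant && L->Value == 1)
      return R;
    if (E->K == Expr::Mul && R->K == Expr::Constant && R->Value == 1)
      return L;
    return binary(E->K, L, R);
  }
  }
  return E;
}
```

In this sketch, rewriting `i * Stride` under the predicate `Stride == 1` yields just `i`, while an empty predicate set leaves the multiplication in place.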

				//// SCEV predicates
				sanjoy (Not Done): Use three slashes.
				SCEVPredicate::SCEVPredicate(const FoldingSetNodeIDRef ID, unsigned short Type)
				: FastID(ID), SCEVPredicateType(Type) {}

				SCEVEqualPredicate::SCEVEqualPredicate(const FoldingSetNodeIDRef ID,
				const SCEVUnknown *LHS,
				const SCEVConstant *RHS)
				: SCEVPredicate(ID, PEQUAL), LHS(LHS), RHS(RHS) {}

				bool SCEVEqualPredicate::implies(const SCEVPredicate *N) const {
				const auto *Op = dyn_cast<const SCEVEqualPredicate>(N);

				if (!Op)
				sanjoy (Done): Why not just `return Builder.CreateICmpNE(...)`?
				return false;

				return Op->LHS == LHS && Op->RHS == RHS;
				}

				bool SCEVEqualPredicate::isAlwaysTrue() const { return false; }

				Value *SCEVEqualPredicate::generateCheck(Instruction *Loc, ScalarEvolution &SE,
				const DataLayout &DL,
				SCEVExpander &Exp) const {
				IRBuilder<> Builder(Loc);

				Value *Expr0 = Exp.expandCodeFor(LHS, LHS->getType(), Loc);
				Value *Expr1 = Exp.expandCodeFor(RHS, RHS->getType(), Loc);

				return Builder.CreateICmpNE(Expr0, Expr1, "ident.check");
				}

				const SCEV *SCEVEqualPredicate::getExpr() const { return LHS; }

				void SCEVEqualPredicate::print(raw_ostream &OS, unsigned Depth) const {
				OS.indent(Depth) << "Equal predicate: " << LHS << " == " << RHS << "\n";
				}
				sanjoy (Done): As I've mentioned in the declaration of `generateGuardCond`, I don't see why you need to return a range of instructions. I'd rather have this return a `Value *`, and do away with `getFirstInst` and the fake `or` instruction.

				/// Union predicates don't get cached so create a dummy set ID for it.
				SCEVUnionPredicate::SCEVUnionPredicate()
				: SCEVPredicate(FoldingSetNodeIDRef(nullptr, 0), PSET) {}

				bool SCEVUnionPredicate::isAlwaysTrue() const {
				return std::all_of(Preds.begin(), Preds.end(),
				[](const SCEVPredicate *I) { return I->isAlwaysTrue(); });
				}

				std::pair<Instruction *, Instruction *>
				SCEVUnionPredicate::generateGuardCond(Instruction *Loc, ScalarEvolution &SE) {
				IRBuilder<> GuardBuilder(Loc);
				Instruction *FirstInst = nullptr;
				Module *M = Loc->getParent()->getParent()->getParent();
				const DataLayout &DL = M->getDataLayout();
				SCEVExpander Exp(SE, DL, "start");

				Value *Check = generateCheck(Loc, SE, DL, Exp);

				if (!Check)
				return std::make_pair(nullptr, nullptr);

				// IRBuilder might fold the checks to constant expressions. Make sure we have
				sanjoy (Not Done): If the `IRBuilder` returned a constant `Check`, why do you need to generate the check? I'd either (1) change this to `assert(isa<Instruction>(Check))`, and fix the cases where the assert fails to return `true` from `isAlwaysTrue` (i.e. if the IRBuilder can prove the check is redundant, SCEV should be able to as well), or (2) change the interface to allow returning something that says "always true" / "always false" in addition to returning a pair of instructions. I think creating a completely redundant instruction to satisfy internal interfaces is a code smell, and we should try to avoid it if reasonable.
				sbaranga (Author, Not Done): Yes, I think you're right. I would go with the first option.
				// at least one instruction by performing an OR with False.
				Instruction *CheckInst =
				BinaryOperator::CreateOr(Check, ConstantInt::getFalse(M->getContext()));
				GuardBuilder.Insert(CheckInst, "scev.check");

				FirstInst = getFirstInst(FirstInst, CheckInst, Loc);
				return std::make_pair(FirstInst, CheckInst);
				}

				SmallVectorImpl<const SCEVPredicate *> *
				SCEVUnionPredicate::getPredicatesForExpr(const SCEV *Expr) {
				auto I = SCEVToPreds.find(Expr);
				mzolotukhin (Done): Maybe just `return E0 == E1;`?
				if (I == SCEVToPreds.end())
				return nullptr;
				return &(I->second);
				}

				Value *SCEVUnionPredicate::generateCheck(Instruction *Loc, ScalarEvolution &SE,
				const DataLayout &DL,
				SCEVExpander &Exp) const {
				IRBuilder<> CheckBuilder(Loc);
				Value *AllCheck = nullptr;

				// Loop over all checks in this set.
				for (auto Pred : Preds) {
				sanjoy (Not Done): I think this should be `std::all_of` to be consistent with the interpretation of `SCEVUnionPredicate` in `generateCheck` -- to prove `(X|Y)->Z` you need to prove `(X->Z)&&(Y->Z)`.
				sbaranga (Author, Not Done): The union predicate function is an "and", so I think this makes the current implementation correct?
				sanjoy (Not Done): In that case, shouldn't you have `AllCheck = CheckBuilder.CreateAnd(AllCheck, CheckResult);` in `SCEVUnionPredicate::generateCheck`? Or are you checking something different there?
				sbaranga (Author, Not Done): See the comment below.
				if (Pred->isAlwaysTrue())
				mzolotukhin (Done): The predicate is called Equal, should we generate `ICmpEQ` for consistency here?
				sbaranga (Author, Done): The convention that I used so far was to generate the negative test (ICmpNE). I think the loop vectorizer / loop access analysis were doing the same thing for their checks? Would keeping it like this be a problem?
				mzolotukhin (Done): I think it's fine, but at least we should explicitly state somewhere that we're doing it this way. It'll prevent future readers of this code from being caught by surprise.
				continue;

				Value *CheckResult = Pred->generateCheck(Loc, SE, DL, Exp);

				if (!AllCheck)
				AllCheck = CheckResult;
				else
				AllCheck = CheckBuilder.CreateOr(AllCheck, CheckResult);
				sanjoy (Not Done): Shouldn't this be `CreateAnd`?
				sanjoy (Not Done): As mentioned earlier, IIUC this should be a `CreateAnd`.
				sbaranga (Author, Not Done): The generated code checks for the negated condition. So we basically end up with (or (not predicate1), (not predicate2), ...), which is the same as (not (and predicate1, predicate2, ...)). We do this because the current users already use this interface. This also avoids the need to explicitly create negations (we just create the negated check for each predicate). I think this at least needs a comment. Do you have a strong preference about this?
				}
				sanjoy (Done): Nit: here and elsewhere, prefer using `const auto *` or `auto *` when the type is obvious from the RHS.

				sanjoy (Done): [Optional] Why not `std::any_of`?
				return AllCheck;
				}
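
The negated-check convention debated in the comments above rests on De Morgan's law: OR-ing the per-predicate "not equal" checks is the same as negating the AND of the predicates. A minimal standalone sketch follows, assuming a hypothetical `ToyPredicate` type that stands in for a runtime predicate. It is not LLVM code and only illustrates the boolean identity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one runtime assumption, e.g. "Stride == 1".
struct ToyPredicate {
  bool Holds; // whether the assumption is true on this execution
  // Mirrors the ICmpNE convention: the check is true when the assumption FAILS.
  bool negatedCheck() const { return !Holds; }
};

// OR of the negated checks. True means at least one assumption failed,
// so the caller would branch to the bypass (unversioned) loop.
bool mustBypass(const std::vector<ToyPredicate> &Preds) {
  bool AnyFailed = false;
  for (const ToyPredicate &P : Preds)
    AnyFailed = AnyFailed || P.negatedCheck();
  return AnyFailed;
}

// AND of the positive predicates, shown only for comparison.
bool allHold(const std::vector<ToyPredicate> &Preds) {
  bool All = true;
  for (const ToyPredicate &P : Preds)
    All = All && P.Holds;
  return All;
}
```

For any input, `mustBypass(Preds)` equals `!allHold(Preds)`, so combining negated checks with OR and combining positive checks with AND describe the same guard from opposite sides.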

				sanjoy (Done): As mentioned earlier, this allocation scheme is not quite right.
				bool SCEVUnionPredicate::implies(const SCEVPredicate *N) const {
				if (const auto *Set = dyn_cast<const SCEVUnionPredicate>(N))
				return std::all_of(
				mzolotukhin (Done): Please avoid computing `IdPreds.size()` on every iteration. Even better, we could use a range loop here.
				Set->Preds.begin(), Set->Preds.end(),
				[this](const SCEVPredicate *I) { return this->implies(I); });

				auto ScevPredsIt = SCEVToPreds.find(N->getExpr());
				if (ScevPredsIt == SCEVToPreds.end())
				return false;
				auto &SCEVPreds = ScevPredsIt->second;

				mzolotukhin (Done): Another candidate for a range loop.
				return std::any_of(SCEVPreds.begin(), SCEVPreds.end(),
				[N](const SCEVPredicate *I) { return I->implies(N); });
				mzolotukhin (Done): What does `OA` stand for? Maybe get rid of this variable altogether? Actually, this entire function could probably be replaced with `std::all_of`.
				}

				const SCEV *SCEVUnionPredicate::getExpr() const { return nullptr; }

				void SCEVUnionPredicate::print(raw_ostream &OS, unsigned Depth) const {
				for (auto Pred : Preds)
				Pred->print(OS, Depth);
				mzolotukhin (Done): Why are this line and line 9224 different?
				sbaranga (Author, Done): One of these should be removed, yes.
				}

				void SCEVUnionPredicate::add(const SCEVPredicate *N) {
				mzolotukhin (Done): From previous iteration: Why do we need to do `OR` with `False`?
				sbaranga (Author, Done): I wrote this code some time ago but if I remember correctly this was the same reason why LoopAccessInfo::addRuntimeChecks does an AND with True: "We have to do this trickery because the IRBuilder might fold the check to a constant expression in which case there is no Instruction anchored in a the block.".
				if (const auto *Set = dyn_cast<const SCEVUnionPredicate>(N)) {
				for (auto Pred : Set->Preds)
				add(Pred);
				return;
				}

				if (implies(N))
				mzolotukhin (Done): Could we just return `std::make_pair(nullptr, nullptr)` as we do below?
				return;

				anemet (Done): OF for overflow? ;)
				const SCEV *Key = N->getExpr();
				assert(Key && "Only SCEVUnionPredicate doesn't have an "
				" associated expression!");

				auto &Predicates = SCEVToPreds[Key];
				Predicates.push_back(N);
				sanjoy (Not Done): Why not just `SCEVToPreds[Key].push_back(N);`?

				Preds.push_back(N);
				}
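
The `add`/`implies` pair above keeps the union free of redundant predicates by indexing them per expression. A minimal standalone sketch of that bookkeeping follows. It is a simplified model, not the LLVM classes: `ToyEqual` and `ToyUnion` are hypothetical names, expressions are identified by a string, and the constant by a `long`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical equality predicate "Name == Value".
struct ToyEqual {
  std::string Name; // stands in for the SCEVUnknown
  long Value;       // stands in for the SCEVConstant
  bool implies(const ToyEqual &O) const {
    return Name == O.Name && Value == O.Value;
  }
};

// Hypothetical union of predicates; all members must hold together.
class ToyUnion {
  std::vector<ToyEqual> Preds;                         // insertion order
  std::map<std::string, std::vector<ToyEqual>> ByExpr; // index by expression

public:
  // True if some already-recorded predicate on the same expression implies N.
  bool implies(const ToyEqual &N) const {
    auto It = ByExpr.find(N.Name);
    if (It == ByExpr.end())
      return false;
    for (const ToyEqual &P : It->second)
      if (P.implies(N))
        return true;
    return false;
  }

  // Record N unless the union already implies it.
  void add(const ToyEqual &N) {
    if (implies(N))
      return;
    ByExpr[N.Name].push_back(N);
    Preds.push_back(N);
  }

  std::size_t size() const { return Preds.size(); }
};
```

Adding the same predicate twice leaves the size unchanged, which is what lets a client compare the predicate count against whatever limit it chooses.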
				hfinkel (Done): Can this take a threshold override, or similar, as a parameter to override SCEVCheckThreshold? We had specifically decided that loops decorated with #pragma clang vectorize(enable), which asks for vectorization but does not assert safety, would generate as many checks as necessary to enable vectorization (or be bound by some very large limit). For this case, we'll need to override the limit (or, at least, have a much larger limit). Generically, I'm skeptical of embedding the limit in SCEV at all; I think that the caller should always provide an appropriate limit for whatever happens to be its use case.
				sbaranga (Author, Done): That makes sense to me. There is currently the override in SCEV, but from your argument it looks like it should be passed in from the user. Would that put a complexity bound on our rewriting algorithms / estimates for how expensive using these predicates can be? So far I've worked under the assumption that we would have a limited number of predicates.
				hfinkel (Done): "Would that put a complexity bound on our rewriting algorithms / estimates for how expensive using these predicates can be? So far I've worked under the assumption that we would have a limited number of predicates." I'm not too concerned about this. In cases where we want to override the "generally reasonable" limit with a large one, at least currently, this is always in response to a direct request from the user (due to a pragma or similar), and so we get to assume that the user knows what he or she is doing.
				sbaranga (Author, Done): Makes sense. Thanks!
				mzolotukhin (Done): I think it should be `>=` since we haven't added the new predicate yet.
				mzolotukhin (Done): Why do we need to do an `or` with `False`?
				mzolotukhin (Done): Range loop?
				mzolotukhin (Done): Good candidates for `std::all_of`/`std::any_of`.

lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 215 Lines • ▼ Show 20 Lines	static cl::opt<unsigned> MaxNestedScalarReductionIC(
cl::desc("The maximum interleave count to use when interleaving a scalar "		cl::desc("The maximum interleave count to use when interleaving a scalar "
"reduction in a nested loop."));		"reduction in a nested loop."));

static cl::opt<unsigned> PragmaVectorizeMemoryCheckThreshold(		static cl::opt<unsigned> PragmaVectorizeMemoryCheckThreshold(
"pragma-vectorize-memory-check-threshold", cl::init(128), cl::Hidden,		"pragma-vectorize-memory-check-threshold", cl::init(128), cl::Hidden,
cl::desc("The maximum allowed number of runtime memory checks with a "		cl::desc("The maximum allowed number of runtime memory checks with a "
"vectorize(enable) pragma."));		"vectorize(enable) pragma."));

		static cl::opt<unsigned> VectorizeSCEVCheckThreshold(
		"vectorize-scev-check-threshold", cl::init(16), cl::Hidden,
		cl::desc("The maximum number of SCEV checks allowed."));

		static cl::opt<unsigned> PragmaVectorizeSCEVCheckThreshold(
		"pragma-vectorize-scev-check-threshold", cl::init(128), cl::Hidden,
		cl::desc("The maximum number of SCEV checks allowed with a "
		"vectorize(enable) pragma"));

namespace {		namespace {

// Forward declarations.		// Forward declarations.
class LoopVectorizeHints;		class LoopVectorizeHints;
class LoopVectorizationLegality;		class LoopVectorizationLegality;
class LoopVectorizationCostModel;		class LoopVectorizationCostModel;
class LoopVectorizationRequirements;		class LoopVectorizationRequirements;

Show All 35 Lines
/// aspects. The InnerLoopVectorizer relies on the		/// aspects. The InnerLoopVectorizer relies on the
/// LoopVectorizationLegality class to provide information about the induction		/// LoopVectorizationLegality class to provide information about the induction
/// and reduction variables that were found to a given vectorization factor.		/// and reduction variables that were found to a given vectorization factor.
class InnerLoopVectorizer {		class InnerLoopVectorizer {
public:		public:
InnerLoopVectorizer(Loop *OrigLoop, ScalarEvolution *SE, LoopInfo *LI,		InnerLoopVectorizer(Loop *OrigLoop, ScalarEvolution *SE, LoopInfo *LI,
DominatorTree *DT, const TargetLibraryInfo *TLI,		DominatorTree *DT, const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, unsigned VecWidth,		const TargetTransformInfo *TTI, unsigned VecWidth,
unsigned UnrollFactor)		unsigned UnrollFactor, SCEVUnionPredicate &Preds)
: OrigLoop(OrigLoop), SE(SE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),		: OrigLoop(OrigLoop), SE(SE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
VF(VecWidth), UF(UnrollFactor), Builder(SE->getContext()),		VF(VecWidth), UF(UnrollFactor), Builder(SE->getContext()),
Induction(nullptr), OldInduction(nullptr), WidenMap(UnrollFactor),		Induction(nullptr), OldInduction(nullptr), WidenMap(UnrollFactor),
TripCount(nullptr), VectorTripCount(nullptr), Legal(nullptr),		TripCount(nullptr), VectorTripCount(nullptr), Legal(nullptr),
AddedSafetyChecks(false) {}		AddedSafetyChecks(false), Preds(Preds) {}

// Perform the actual loop widening (vectorization).		// Perform the actual loop widening (vectorization).
void vectorize(LoopVectorizationLegality *L) {		void vectorize(LoopVectorizationLegality *L) {
Legal = L;		Legal = L;
// Create a new empty loop. Unlink the old loop and connect the new one.		// Create a new empty loop. Unlink the old loop and connect the new one.
createEmptyLoop();		createEmptyLoop();
// Widen each instruction in the old loop to a new one in the new loop.		// Widen each instruction in the old loop to a new one in the new loop.
// Use the Legality module to find the induction and reduction variables.		// Use the Legality module to find the induction and reduction variables.
Show All 15 Lines	protected:
/// originated from one scalar instruction.		/// originated from one scalar instruction.
typedef SmallVector<Value*, 2> VectorParts;		typedef SmallVector<Value*, 2> VectorParts;

// When we if-convert we need to create edge masks. We have to cache values		// When we if-convert we need to create edge masks. We have to cache values
// so that we don't end up with exponential recursion/IR.		// so that we don't end up with exponential recursion/IR.
typedef DenseMap<std::pair<BasicBlock*, BasicBlock*>,		typedef DenseMap<std::pair<BasicBlock*, BasicBlock*>,
VectorParts> EdgeMaskCache;		VectorParts> EdgeMaskCache;

/// \brief Add checks for strides that were assumed to be 1.
///
/// Returns the last check instruction and the first check instruction in the
/// pair as (first, last).
std::pair<Instruction *, Instruction *> addStrideCheck(Instruction *Loc);

/// Create an empty loop, based on the loop ranges of the old loop.		/// Create an empty loop, based on the loop ranges of the old loop.
void createEmptyLoop();		void createEmptyLoop();
/// Create a new induction variable inside L.		/// Create a new induction variable inside L.
PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,		PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,
		anemet (Done): Overflow again.
Value *Step, Instruction *DL);		Value *Step, Instruction *DL);
/// Copy and widen the instructions from the old loop.		/// Copy and widen the instructions from the old loop.
virtual void vectorizeLoop();		virtual void vectorizeLoop();

/// \brief The Loop exit block may have single value PHI nodes where the		/// \brief The Loop exit block may have single value PHI nodes where the
/// incoming value is 'Undef'. While vectorizing we only handled real values		/// incoming value is 'Undef'. While vectorizing we only handled real values
/// that were defined inside the loop. Here we fix the 'undef case'.		/// that were defined inside the loop. Here we fix the 'undef case'.
/// See PR14725.		/// See PR14725.
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	protected:
/// Returns (and creates if needed) the trip count of the widened loop.		/// Returns (and creates if needed) the trip count of the widened loop.
Value *getOrCreateVectorTripCount(Loop *NewLoop);		Value *getOrCreateVectorTripCount(Loop *NewLoop);

/// Emit a bypass check to see if the trip count would overflow, or we		/// Emit a bypass check to see if the trip count would overflow, or we
/// wouldn't have enough iterations to execute one vector loop.		/// wouldn't have enough iterations to execute one vector loop.
void emitMinimumIterationCountCheck(Loop *L, BasicBlock *Bypass);		void emitMinimumIterationCountCheck(Loop *L, BasicBlock *Bypass);
/// Emit a bypass check to see if the vector trip count is nonzero.		/// Emit a bypass check to see if the vector trip count is nonzero.
void emitVectorLoopEnteredCheck(Loop *L, BasicBlock *Bypass);		void emitVectorLoopEnteredCheck(Loop *L, BasicBlock *Bypass);
/// Emit bypass checks to check if strides we've assumed to be one really are.
void emitStrideChecks(Loop *L, BasicBlock *Bypass);		void emitSCEVChecks(Loop *L, BasicBlock *Bypass);
		anemet (Not Done): Comment missing
/// Emit bypass checks to check any memory assumptions we may have made.		/// Emit bypass checks to check any memory assumptions we may have made.
void emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass);		void emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass);

/// This is a helper class that holds the vectorizer state. It maps scalar		/// This is a helper class that holds the vectorizer state. It maps scalar
/// instructions to vector instructions. When the code is 'unrolled' then		/// instructions to vector instructions. When the code is 'unrolled' then
/// then a single scalar value is mapped to multiple vector parts. The parts		/// then a single scalar value is mapped to multiple vector parts. The parts
/// are stored in the VectorPart type.		/// are stored in the VectorPart type.
struct ValueMap {		struct ValueMap {
/// C'tor. UnrollFactor controls the number of vectors ('parts') that		/// C'tor. UnrollFactor controls the number of vectors ('parts') that
/// are mapped.		/// are mapped.
ValueMap(unsigned UnrollFactor) : UF(UnrollFactor) {}		ValueMap(unsigned UnrollFactor) : UF(UnrollFactor) {}
▲ Show 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	protected:
Value *TripCount;		Value *TripCount;
/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))		/// Trip count of the widened loop (TripCount - TripCount % (VF*UF))
Value *VectorTripCount;		Value *VectorTripCount;

LoopVectorizationLegality *Legal;		LoopVectorizationLegality *Legal;

// Record whether runtime check is added.		// Record whether runtime check is added.
bool AddedSafetyChecks;		bool AddedSafetyChecks;

		/// The SCEV predicate containing all the SCEV-related assumptions.
		SCEVUnionPredicate &Preds;
		anemet (Not Done): Here too please expand the comment to explain what these assumptions are used for.
};		};

class InnerLoopUnroller : public InnerLoopVectorizer {		class InnerLoopUnroller : public InnerLoopVectorizer {
public:		public:
InnerLoopUnroller(Loop *OrigLoop, ScalarEvolution *SE, LoopInfo *LI,		InnerLoopUnroller(Loop *OrigLoop, ScalarEvolution *SE, LoopInfo *LI,
DominatorTree *DT, const TargetLibraryInfo *TLI,		DominatorTree *DT, const TargetLibraryInfo *TLI,
const TargetTransformInfo *TTI, unsigned UnrollFactor)		const TargetTransformInfo *TTI, unsigned UnrollFactor,
: InnerLoopVectorizer(OrigLoop, SE, LI, DT, TLI, TTI, 1, UnrollFactor) {}		SCEVUnionPredicate &Preds)
		: InnerLoopVectorizer(OrigLoop, SE, LI, DT, TLI, TTI, 1, UnrollFactor,
		Preds) {}

private:		private:
void scalarizeInstruction(Instruction *Instr,		void scalarizeInstruction(Instruction *Instr,
bool IfPredicateStore = false) override;		bool IfPredicateStore = false) override;
void vectorizeMemoryInstruction(Instruction *Instr) override;		void vectorizeMemoryInstruction(Instruction *Instr) override;
Value *getBroadcastInstrs(Value *V) override;		Value *getBroadcastInstrs(Value *V) override;
Value *getStepVector(Value *Val, int StartIdx, Value *Step) override;		Value *getStepVector(Value *Val, int StartIdx, Value *Step) override;
Value *reverseVector(Value *Vec) override;		Value *reverseVector(Value *Vec) override;
[204 lines not shown]
/// Use this class to analyze interleaved accesses only when we can vectorize		/// Use this class to analyze interleaved accesses only when we can vectorize
/// a loop. Otherwise it's meaningless to do analysis as the vectorization		/// a loop. Otherwise it's meaningless to do analysis as the vectorization
/// on interleaved accesses is unsafe.		/// on interleaved accesses is unsafe.
///		///
/// The analysis collects interleave groups and records the relationships		/// The analysis collects interleave groups and records the relationships
/// between the member and the group in a map.		/// between the member and the group in a map.
class InterleavedAccessInfo {		class InterleavedAccessInfo {
public:		public:
InterleavedAccessInfo(ScalarEvolution *SE, Loop *L, DominatorTree *DT)		InterleavedAccessInfo(ScalarEvolution *SE, Loop *L, DominatorTree *DT,
: SE(SE), TheLoop(L), DT(DT) {}		SCEVUnionPredicate &Preds)
		: SE(SE), TheLoop(L), DT(DT), Preds(Preds) {}

~InterleavedAccessInfo() {		~InterleavedAccessInfo() {
SmallSet<InterleaveGroup *, 4> DelSet;		SmallSet<InterleaveGroup *, 4> DelSet;
// Avoid releasing a pointer twice.		// Avoid releasing a pointer twice.
for (auto &I : InterleaveGroupMap)		for (auto &I : InterleaveGroupMap)
DelSet.insert(I.second);		DelSet.insert(I.second);
for (auto *Ptr : DelSet)		for (auto *Ptr : DelSet)
delete Ptr;		delete Ptr;
[17 lines not shown]	InterleaveGroup *getInterleaveGroup(Instruction *Instr) const {
return nullptr;		return nullptr;
}		}

private:		private:
ScalarEvolution *SE;		ScalarEvolution *SE;
Loop *TheLoop;		Loop *TheLoop;
DominatorTree *DT;		DominatorTree *DT;

		/// The SCEV predicate containing all the SCEV-related assumptions.
		SCEVUnionPredicate &Preds;
		anemet: Here too.

/// Holds the relationships between the members and the interleave group.		/// Holds the relationships between the members and the interleave group.
DenseMap<Instruction *, InterleaveGroup *> InterleaveGroupMap;		DenseMap<Instruction *, InterleaveGroup *> InterleaveGroupMap;

/// \brief The descriptor for a strided memory access.		/// \brief The descriptor for a strided memory access.
struct StrideDescriptor {		struct StrideDescriptor {
StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size,		StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size,
unsigned Align)		unsigned Align)
: Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}		: Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}
[346 lines not shown]
/// induction variable and the different reduction variables.		/// induction variable and the different reduction variables.
class LoopVectorizationLegality {		class LoopVectorizationLegality {
public:		public:
LoopVectorizationLegality(Loop *L, ScalarEvolution *SE, DominatorTree *DT,		LoopVectorizationLegality(Loop *L, ScalarEvolution *SE, DominatorTree *DT,
TargetLibraryInfo *TLI, AliasAnalysis *AA,		TargetLibraryInfo *TLI, AliasAnalysis *AA,
Function *F, const TargetTransformInfo *TTI,		Function *F, const TargetTransformInfo *TTI,
LoopAccessAnalysis *LAA,		LoopAccessAnalysis *LAA,
LoopVectorizationRequirements *R,		LoopVectorizationRequirements *R,
const LoopVectorizeHints *H)		const LoopVectorizeHints *H,
		unsigned SCEVCheckThreshold,
		SCEVUnionPredicate &Preds)
: NumPredStores(0), TheLoop(L), SE(SE), TLI(TLI), TheFunction(F),		: NumPredStores(0), TheLoop(L), SE(SE), TLI(TLI), TheFunction(F),
TTI(TTI), DT(DT), LAA(LAA), LAI(nullptr), InterleaveInfo(SE, L, DT),		TTI(TTI), DT(DT), LAA(LAA), LAI(nullptr),
Induction(nullptr), WidestIndTy(nullptr), HasFunNoNaNAttr(false),		InterleaveInfo(SE, L, DT, Preds), Induction(nullptr),
Requirements(R), Hints(H) {}		WidestIndTy(nullptr), HasFunNoNaNAttr(false), Requirements(R), Hints(H),
		SCEVCheckThreshold(SCEVCheckThreshold), Preds(Preds) {}

/// ReductionList contains the reduction descriptors for all		/// ReductionList contains the reduction descriptors for all
/// of the reductions that were found in the loop.		/// of the reductions that were found in the loop.
typedef DenseMap<PHINode *, RecurrenceDescriptor> ReductionList;		typedef DenseMap<PHINode *, RecurrenceDescriptor> ReductionList;

/// InductionList saves induction variables and maps them to the		/// InductionList saves induction variables and maps them to the
/// induction descriptor.		/// induction descriptor.
typedef MapVector<PHINode*, InductionDescriptor> InductionList;		typedef MapVector<PHINode*, InductionDescriptor> InductionList;
[182 lines not shown]	private:
/// Used to emit an analysis of any legality issues.		/// Used to emit an analysis of any legality issues.
const LoopVectorizeHints *Hints;		const LoopVectorizeHints *Hints;

ValueToValueMap Strides;		ValueToValueMap Strides;
SmallPtrSet<Value *, 8> StrideSet;		SmallPtrSet<Value *, 8> StrideSet;

/// While vectorizing these instructions we have to generate a		/// While vectorizing these instructions we have to generate a
/// call to the appropriate masked intrinsic		/// call to the appropriate masked intrinsic
SmallPtrSet<const Instruction*, 8> MaskedOp;		SmallPtrSet<const Instruction *, 8> MaskedOp;

		unsigned SCEVCheckThreshold;
		anemet: Missing comment.
		/// The SCEV predicate containing all the SCEV-related assumptions.
		SCEVUnionPredicate &Preds;
};		};

/// LoopVectorizationCostModel - estimates the expected speedups due to		/// LoopVectorizationCostModel - estimates the expected speedups due to
/// vectorization.		/// vectorization.
/// In many cases vectorization is not profitable. This can happen because of		/// In many cases vectorization is not profitable. This can happen because of
/// a number of reasons. In this class we mainly attempt to predict the		/// a number of reasons. In this class we mainly attempt to predict the
/// expected speedup/slowdowns due to the supported instruction set. We use the		/// expected speedup/slowdowns due to the supported instruction set. We use the
/// TargetTransformInfo to query the different backends for the cost of		/// TargetTransformInfo to query the different backends for the cost of
/// different operations.		/// different operations.
class LoopVectorizationCostModel {		class LoopVectorizationCostModel {
public:		public:
LoopVectorizationCostModel(Loop *L, ScalarEvolution *SE, LoopInfo *LI,		LoopVectorizationCostModel(Loop *L, ScalarEvolution *SE, LoopInfo *LI,
LoopVectorizationLegality *Legal,		LoopVectorizationLegality *Legal,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
const TargetLibraryInfo *TLI, AssumptionCache *AC,		const TargetLibraryInfo *TLI, AssumptionCache *AC,
const Function *F, const LoopVectorizeHints *Hints,		const Function *F, const LoopVectorizeHints *Hints,
SmallPtrSetImpl<const Value *> &ValuesToIgnore)		SmallPtrSetImpl<const Value *> &ValuesToIgnore,
		SCEVUnionPredicate &Preds)
: TheLoop(L), SE(SE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI),		: TheLoop(L), SE(SE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI),
TheFunction(F), Hints(Hints), ValuesToIgnore(ValuesToIgnore) {}		TheFunction(F), Hints(Hints), ValuesToIgnore(ValuesToIgnore),
		Preds(Preds) {}

/// Information about vectorization costs		/// Information about vectorization costs
struct VectorizationFactor {		struct VectorizationFactor {
unsigned Width; // Vector width with best cost		unsigned Width; // Vector width with best cost
unsigned Cost; // Cost of the loop with that width		unsigned Cost; // Cost of the loop with that width
};		};
/// \return The most profitable vectorization factor and the cost of that VF.		/// \return The most profitable vectorization factor and the cost of that VF.
/// This method checks every power of two up to VF. If UserVF is not ZERO		/// This method checks every power of two up to VF. If UserVF is not ZERO
[69 lines not shown]	private:
const TargetTransformInfo &TTI;		const TargetTransformInfo &TTI;
/// Target Library Info.		/// Target Library Info.
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
const Function *TheFunction;		const Function *TheFunction;
// Loop Vectorize Hint.		// Loop Vectorize Hint.
const LoopVectorizeHints *Hints;		const LoopVectorizeHints *Hints;
// Values to ignore in the cost model.		// Values to ignore in the cost model.
const SmallPtrSetImpl<const Value *> &ValuesToIgnore;		const SmallPtrSetImpl<const Value *> &ValuesToIgnore;
		/// The SCEV predicate containing all the SCEV-related assumptions.
		SCEVUnionPredicate &Preds;
};		};

/// \brief This holds vectorization requirements that must be verified late in		/// \brief This holds vectorization requirements that must be verified late in
/// the process. The requirements are set by legalize and costmodel. Once		/// the process. The requirements are set by legalize and costmodel. Once
/// vectorization has been determined to be possible and profitable the		/// vectorization has been determined to be possible and profitable the
/// requirements can be verified by looking for metadata or compiler options.		/// requirements can be verified by looking for metadata or compiler options.
/// For example, some loops require FP commutativity which is only allowed if		/// For example, some loops require FP commutativity which is only allowed if
/// vectorization is explicitly specified or if the fast-math compiler option		/// vectorization is explicitly specified or if the fast-math compiler option
[214 lines not shown]	if (TC > 0u && TC < TinyTripCountVectorThreshold) {
DEBUG(dbgs() << "\n");		DEBUG(dbgs() << "\n");
emitAnalysisDiag(F, L, Hints, VectorizationReport()		emitAnalysisDiag(F, L, Hints, VectorizationReport()
<< "vectorization is not beneficial "		<< "vectorization is not beneficial "
"and is not explicitly forced");		"and is not explicitly forced");
return false;		return false;
}		}
}		}

		unsigned SCEVThreshold = VectorizeSCEVCheckThreshold;
		if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
		SCEVThreshold = PragmaVectorizeSCEVCheckThreshold;

		SCEVUnionPredicate Preds;

// Check if it is legal to vectorize the loop.		// Check if it is legal to vectorize the loop.
LoopVectorizationRequirements Requirements;		LoopVectorizationRequirements Requirements;
LoopVectorizationLegality LVL(L, SE, DT, TLI, AA, F, TTI, LAA,		LoopVectorizationLegality LVL(L, SE, DT, TLI, AA, F, TTI, LAA,
&Requirements, &Hints);		&Requirements, &Hints, SCEVThreshold, Preds);
		anemet: Any reason this whole logic can't be pushed into LVLegality?
if (!LVL.canVectorize()) {		if (!LVL.canVectorize()) {
DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");		DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");
emitMissedWarning(F, L, Hints);		emitMissedWarning(F, L, Hints);
return false;		return false;
}		}

// Collect values we want to ignore in the cost model. This includes		// Collect values we want to ignore in the cost model. This includes
// type-promoting instructions we identified during reduction detection.		// type-promoting instructions we identified during reduction detection.
SmallPtrSet<const Value *, 32> ValuesToIgnore;		SmallPtrSet<const Value *, 32> ValuesToIgnore;
CodeMetrics::collectEphemeralValues(L, AC, ValuesToIgnore);		CodeMetrics::collectEphemeralValues(L, AC, ValuesToIgnore);
for (auto &Reduction : *LVL.getReductionVars()) {		for (auto &Reduction : *LVL.getReductionVars()) {
RecurrenceDescriptor &RedDes = Reduction.second;		RecurrenceDescriptor &RedDes = Reduction.second;
SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();		SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();
ValuesToIgnore.insert(Casts.begin(), Casts.end());		ValuesToIgnore.insert(Casts.begin(), Casts.end());
}		}

// Use the cost model.		// Use the cost model.
LoopVectorizationCostModel CM(L, SE, LI, &LVL, *TTI, TLI, AC, F, &Hints,		LoopVectorizationCostModel CM(L, SE, LI, &LVL, *TTI, TLI, AC, F, &Hints,
ValuesToIgnore);		ValuesToIgnore, Preds);

// Check the function attributes to find out if this function should be		// Check the function attributes to find out if this function should be
// optimized for size.		// optimized for size.
bool OptForSize = Hints.getForce() != LoopVectorizeHints::FK_Enabled &&		bool OptForSize = Hints.getForce() != LoopVectorizeHints::FK_Enabled &&
F->optForSize();		F->optForSize();

// Compute the weighted frequency of this loop being executed and see if it		// Compute the weighted frequency of this loop being executed and see if it
// is less than 20% of the function entry baseline frequency. Note that we		// is less than 20% of the function entry baseline frequency. Note that we
[94 lines not shown]	if (!VectorizeLoop && !InterleaveLoop) {
<< DebugLocStr << '\n');		<< DebugLocStr << '\n');
DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');		DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n');
}		}

if (!VectorizeLoop) {		if (!VectorizeLoop) {
assert(IC > 1 && "interleave count should not be 1 or 0");		assert(IC > 1 && "interleave count should not be 1 or 0");
// If we decided that it is not legal to vectorize the loop then		// If we decided that it is not legal to vectorize the loop then
// interleave it.		// interleave it.
InnerLoopUnroller Unroller(L, SE, LI, DT, TLI, TTI, IC);		InnerLoopUnroller Unroller(L, SE, LI, DT, TLI, TTI, IC, Preds);
Unroller.vectorize(&LVL);		Unroller.vectorize(&LVL);

emitOptimizationRemark(F->getContext(), LV_NAME, *F, L->getStartLoc(),		emitOptimizationRemark(F->getContext(), LV_NAME, *F, L->getStartLoc(),
Twine("interleaved loop (interleaved count: ") +		Twine("interleaved loop (interleaved count: ") +
Twine(IC) + ")");		Twine(IC) + ")");
} else {		} else {
// If we decided that it is legal to vectorize the loop then do it.		// If we decided that it is legal to vectorize the loop then do it.
InnerLoopVectorizer LB(L, SE, LI, DT, TLI, TTI, VF.Width, IC);		InnerLoopVectorizer LB(L, SE, LI, DT, TLI, TTI, VF.Width, IC, Preds);
LB.vectorize(&LVL);		LB.vectorize(&LVL);
++LoopsVectorized;		++LoopsVectorized;

// Add metadata to disable runtime unrolling scalar loop when there's no		// Add metadata to disable runtime unrolling scalar loop when there's no
// runtime check about strides and memory. Because at this situation,		// runtime check about strides and memory. Because at this situation,
// scalar loop is rarely used not worthy to be unrolled.		// scalar loop is rarely used not worthy to be unrolled.
if (!LB.IsSafetyChecksAdded())		if (!LB.IsSafetyChecksAdded())
AddRuntimeUnrollDisableMetaData(L);		AddRuntimeUnrollDisableMetaData(L);
[143 lines not shown]	else {
// We are going to replace this stride by 1 so the cast is safe to ignore.		// We are going to replace this stride by 1 so the cast is safe to ignore.
//		//
// %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		// %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
// %0 = trunc i64 %indvars.iv to i32		// %0 = trunc i64 %indvars.iv to i32
// %mul = mul i32 %0, %Stride1		// %mul = mul i32 %0, %Stride1
// %idxprom = zext i32 %mul to i64 << Safe cast.		// %idxprom = zext i32 %mul to i64 << Safe cast.
// %arrayidx = getelementptr inbounds i32* %B, i64 %idxprom		// %arrayidx = getelementptr inbounds i32* %B, i64 %idxprom
//		//
Last = replaceSymbolicStrideSCEV(SE, Strides,		Last = replaceSymbolicStrideSCEV(SE, Strides, Preds,
Gep->getOperand(InductionOperand), Gep);		Gep->getOperand(InductionOperand), Gep);
if (const SCEVCastExpr *C = dyn_cast<SCEVCastExpr>(Last))		if (const SCEVCastExpr *C = dyn_cast<SCEVCastExpr>(Last))
Last =		Last =
(C->getSCEVType() == scSignExtend || C->getSCEVType() == scZeroExtend)		(C->getSCEVType() == scSignExtend || C->getSCEVType() == scZeroExtend)
? C->getOperand()		? C->getOperand()
: Last;		: Last;
}		}
if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Last)) {		if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Last)) {
[541 lines not shown]	for (unsigned Width = 0; Width < VF; ++Width) {
// End if-block.		// End if-block.
if (IfPredicateStore)		if (IfPredicateStore)
PredicatedStores.push_back(std::make_pair(cast<StoreInst>(Cloned),		PredicatedStores.push_back(std::make_pair(cast<StoreInst>(Cloned),
Cmp));		Cmp));
}		}
}		}
}		}

static Instruction *getFirstInst(Instruction *FirstInst, Value *V,		PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
Instruction *Loc) {		Value *End, Value *Step,
if (FirstInst)
return FirstInst;
if (Instruction *I = dyn_cast<Instruction>(V))
return I->getParent() == Loc->getParent() ? I : nullptr;
return nullptr;
}

std::pair<Instruction *, Instruction *>
InnerLoopVectorizer::addStrideCheck(Instruction *Loc) {
Instruction *tnullptr = nullptr;
if (!Legal->mustCheckStrides())
return std::pair<Instruction *, Instruction *>(tnullptr, tnullptr);

IRBuilder<> ChkBuilder(Loc);

// Emit checks.
Value *Check = nullptr;
Instruction *FirstInst = nullptr;
for (SmallPtrSet<Value *, 8>::iterator SI = Legal->strides_begin(),
SE = Legal->strides_end();
SI != SE; ++SI) {
Value *Ptr = stripIntegerCast(*SI);
Value *C = ChkBuilder.CreateICmpNE(Ptr, ConstantInt::get(Ptr->getType(), 1),
"stride.chk");
// Store the first instruction we create.
FirstInst = getFirstInst(FirstInst, C, Loc);
if (Check)
Check = ChkBuilder.CreateOr(Check, C);
else
Check = C;
}

// We have to do this trickery because the IRBuilder might fold the check to a
// constant expression in which case there is no Instruction anchored in a
// the block.
LLVMContext &Ctx = Loc->getContext();
Instruction *TheCheck =
BinaryOperator::CreateAnd(Check, ConstantInt::getTrue(Ctx));
ChkBuilder.Insert(TheCheck, "stride.not.one");
FirstInst = getFirstInst(FirstInst, TheCheck, Loc);

return std::make_pair(FirstInst, TheCheck);
}

PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L,
Value *Start,
Value *End,
Value *Step,
Instruction *DL) {		Instruction *DL) {
BasicBlock *Header = L->getHeader();		BasicBlock *Header = L->getHeader();
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
// As we're just creating this loop, it's possible no latch exists		// As we're just creating this loop, it's possible no latch exists
// yet. If so, use the header as this will be a single block loop.		// yet. If so, use the header as this will be a single block loop.
if (!Latch)		if (!Latch)
Latch = Header;		Latch = Header;

[118 lines not shown]	BasicBlock *NewBB = BB->splitBasicBlock(BB->getTerminator(),
"vector.ph");		"vector.ph");
if (L->getParentLoop())		if (L->getParentLoop())
L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);		L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);
ReplaceInstWithInst(BB->getTerminator(),		ReplaceInstWithInst(BB->getTerminator(),
BranchInst::Create(Bypass, NewBB, Cmp));		BranchInst::Create(Bypass, NewBB, Cmp));
LoopBypassBlocks.push_back(BB);		LoopBypassBlocks.push_back(BB);
}		}

void InnerLoopVectorizer::emitStrideChecks(Loop *L,		void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
BasicBlock *Bypass) {
BasicBlock *BB = L->getLoopPreheader();		BasicBlock *BB = L->getLoopPreheader();

// Generate the code to check that the strides we assumed to be one are really		// Generate the code to check that the strides we assumed to be one are really
// one. We want the new basic block to start at the first instruction in a		// one. We want the new basic block to start at the first instruction in a
// sequence of instructions that form a check.		// sequence of instructions that form a check.
Instruction *StrideCheck;		Instruction *SCEVCheck;
Instruction *FirstCheckInst;		Instruction *FirstCheckInst;
std::tie(FirstCheckInst, StrideCheck) = addStrideCheck(BB->getTerminator());		std::tie(FirstCheckInst, SCEVCheck) =
if (!StrideCheck)		Preds.generateGuardCond(BB->getTerminator(), *SE);
		if (!SCEVCheck)
return;		return;

// Create a new block containing the stride check.		// Create a new block containing the stride check.
BB->setName("vector.stridecheck");		BB->setName("vector.scevcheck");
auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");		auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
if (L->getParentLoop())		if (L->getParentLoop())
L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);		L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);
ReplaceInstWithInst(BB->getTerminator(),		ReplaceInstWithInst(BB->getTerminator(),
BranchInst::Create(Bypass, NewBB, StrideCheck));		BranchInst::Create(Bypass, NewBB, SCEVCheck));
LoopBypassBlocks.push_back(BB);		LoopBypassBlocks.push_back(BB);
AddedSafetyChecks = true;		AddedSafetyChecks = true;
}		}
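
The runtime guard emitted here can be pictured with a small model outside LLVM. The sketch below is only an illustration under stated assumptions: `EqualAssumption` and `AssumptionSet` are hypothetical stand-ins for the patch's SCEVEqualPredicate and SCEVUnionPredicate, and plain integers stand in for SCEV expressions. As in the removed addStrideCheck, the guard is the OR of the individual "not equal" tests, and a true result means the vector loop is bypassed.

```cpp
#include <vector>

// Hypothetical stand-in for one equality predicate: a runtime value that the
// analysis assumed to be equal to Expected (1 in the stride-versioning case).
struct EqualAssumption {
  long Actual;   // value seen at run time, e.g. a symbolic stride
  long Expected; // value assumed while analysing the loop
};

// Hypothetical stand-in for a union of predicates.
struct AssumptionSet {
  std::vector<EqualAssumption> Items;

  void add(long Actual, long Expected) { Items.push_back({Actual, Expected}); }

  // Number of recorded assumptions; plays the role of a complexity measure.
  unsigned complexity() const { return static_cast<unsigned>(Items.size()); }

  // OR together the "Actual != Expected" tests. A true result means at least
  // one assumption failed, so the scalar (unversioned) loop must run.
  bool mustBypass() const {
    bool Fail = false;
    for (const EqualAssumption &A : Items)
      Fail = Fail || (A.Actual != A.Expected);
    return Fail;
  }
};
```

With a single assumption whose stride really is 1, `mustBypass()` is false and the vector loop would be entered; adding an assumption whose runtime stride is 4 makes it true.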

void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L,		void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L,
BasicBlock *Bypass) {		BasicBlock *Bypass) {
BasicBlock *BB = L->getLoopPreheader();		BasicBlock *BB = L->getLoopPreheader();

[103 lines not shown]	void InnerLoopVectorizer::createEmptyLoop() {
// We need to test whether the backedge-taken count is uint##_max. Adding one		// We need to test whether the backedge-taken count is uint##_max. Adding one
// to it will cause overflow and an incorrect loop trip count in the vector		// to it will cause overflow and an incorrect loop trip count in the vector
// body. In case of overflow we want to directly jump to the scalar remainder		// body. In case of overflow we want to directly jump to the scalar remainder
// loop.		// loop.
emitMinimumIterationCountCheck(Lp, ScalarPH);		emitMinimumIterationCountCheck(Lp, ScalarPH);
// Now, compare the new count to zero. If it is zero skip the vector loop and		// Now, compare the new count to zero. If it is zero skip the vector loop and
// jump to the scalar loop.		// jump to the scalar loop.
emitVectorLoopEnteredCheck(Lp, ScalarPH);		emitVectorLoopEnteredCheck(Lp, ScalarPH);
// Generate the code to check that the strides we assumed to be one are really		// Generate the code to check any assumptions that we've made for SCEV
// one. We want the new basic block to start at the first instruction in a		// expressions.
// sequence of instructions that form a check.		emitSCEVChecks(Lp, ScalarPH);
emitStrideChecks(Lp, ScalarPH);
// Generate the code that checks in runtime if arrays overlap. We put the		// Generate the code that checks in runtime if arrays overlap. We put the
// checks into a separate block to make the more common case of few elements		// checks into a separate block to make the more common case of few elements
// faster.		// faster.
emitMemRuntimeChecks(Lp, ScalarPH);		emitMemRuntimeChecks(Lp, ScalarPH);

// Generate the induction variable.		// Generate the induction variable.
// The loop step is equal to the vectorization factor (num of SIMD elements)		// The loop step is equal to the vectorization factor (num of SIMD elements)
// times the unroll factor (num of SIMD instructions).		// times the unroll factor (num of SIMD instructions).
[1,119 lines not shown]	bool LoopVectorizationLegality::canVectorize() {
bool UseInterleaved = TTI->enableInterleavedAccessVectorization();		bool UseInterleaved = TTI->enableInterleavedAccessVectorization();

// If an override option has been passed in for interleaved accesses, use it.		// If an override option has been passed in for interleaved accesses, use it.
if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)		if (EnableInterleavedMemAccesses.getNumOccurrences() > 0)
UseInterleaved = EnableInterleavedMemAccesses;		UseInterleaved = EnableInterleavedMemAccesses;

// Analyze interleaved memory accesses.		// Analyze interleaved memory accesses.
if (UseInterleaved)		if (UseInterleaved)
InterleaveInfo.analyzeInterleaving(Strides);		InterleaveInfo.analyzeInterleaving(Strides);

		if (Preds.getComplexity() > SCEVCheckThreshold) {
		emitAnalysis(VectorizationReport()
		<< "Too many SCEV assuptions need to be made and checked "
		<< "at runtime");
		DEBUG(dbgs() << "LV: Too many SCEV checks needed.\n");
		return false;
		}

// Okay! We can vectorize. At this point we don't have any other mem analysis		// Okay! We can vectorize. At this point we don't have any other mem analysis
// which may limit our maximum vectorization factor, so just return true with		// which may limit our maximum vectorization factor, so just return true with
// no restrictions.		// no restrictions.
return true;		return true;
}		}
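
The threshold logic in this diff (a default limit, a separate limit when vectorization is forced by pragma, and rejection once the predicate complexity exceeds the limit) can be sketched in isolation. This is a minimal sketch, not the patch's code: the function names are hypothetical, the default of 16 follows the limit mentioned in the summary, and the pragma value of 128 is purely an assumed placeholder.

```cpp
// Assumed values: 16 mirrors the limit quoted in the summary; 128 is a
// made-up placeholder for the pragma-forced case.
constexpr unsigned DefaultSCEVCheckLimit = 16;
constexpr unsigned PragmaSCEVCheckLimit = 128;

// Pick the limit the same way the diff picks SCEVThreshold: a separate limit
// applies when the user forced vectorization.
unsigned pickSCEVCheckLimit(bool ForcedByPragma) {
  return ForcedByPragma ? PragmaSCEVCheckLimit : DefaultSCEVCheckLimit;
}

// Mirrors the "complexity > threshold means reject" test in canVectorize():
// the loop stays a candidate only while the number of checks is within limit.
bool withinSCEVCheckBudget(unsigned NumChecks, bool ForcedByPragma) {
  return NumChecks <= pickSCEVCheckLimit(ForcedByPragma);
}
```

Under these assumed values, 16 checks are accepted by default and 17 are rejected, while a pragma-forced loop still accepts 17.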

static Type *convertPointerToIntegerType(const DataLayout &DL, Type *Ty) {		static Type *convertPointerToIntegerType(const DataLayout &DL, Type *Ty) {
if (Ty->isPointerTy())		if (Ty->isPointerTy())
[288 lines not shown]	if (LAI->hasStoreToLoopInvariantAddress()) {
emitAnalysis(		emitAnalysis(
VectorizationReport()		VectorizationReport()
<< "write to a loop invariant address could not be vectorized");		<< "write to a loop invariant address could not be vectorized");
DEBUG(dbgs() << "LV: We don't allow storing to uniform addresses\n");		DEBUG(dbgs() << "LV: We don't allow storing to uniform addresses\n");
return false;		return false;
}		}

Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());		Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());
		Preds.add(&LAI->Preds);

return true;		return true;
}		}

bool LoopVectorizationLegality::isInductionVariable(const Value *V) {		bool LoopVectorizationLegality::isInductionVariable(const Value *V) {
Value *In0 = const_cast<Value*>(V);		Value *In0 = const_cast<Value*>(V);
PHINode *PN = dyn_cast_or_null<PHINode>(In0);		PHINode *PN = dyn_cast_or_null<PHINode>(In0);
if (!PN)		if (!PN)
[98 lines not shown]	if (AccessList.empty())
return;		return;

auto &DL = TheLoop->getHeader()->getModule()->getDataLayout();		auto &DL = TheLoop->getHeader()->getModule()->getDataLayout();
for (auto I : AccessList) {		for (auto I : AccessList) {
LoadInst *LI = dyn_cast<LoadInst>(I);		LoadInst *LI = dyn_cast<LoadInst>(I);
StoreInst *SI = dyn_cast<StoreInst>(I);		StoreInst *SI = dyn_cast<StoreInst>(I);

Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();		Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();
int Stride = isStridedPtr(SE, Ptr, TheLoop, Strides);		int Stride = isStridedPtr(SE, Ptr, TheLoop, Strides, Preds);

// The factor of the corresponding interleave group.		// The factor of the corresponding interleave group.
unsigned Factor = std::abs(Stride);		unsigned Factor = std::abs(Stride);

// Ignore the access if the factor is too small or too large.		// Ignore the access if the factor is too small or too large.
if (Factor < 2 || Factor > MaxInterleaveGroupFactor)		if (Factor < 2 || Factor > MaxInterleaveGroupFactor)
continue;		continue;

const SCEV *Scev = replaceSymbolicStrideSCEV(SE, Strides, Ptr);		const SCEV *Scev = replaceSymbolicStrideSCEV(SE, Strides, Preds, Ptr);
PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());		PointerType *PtrTy = dyn_cast<PointerType>(Ptr->getType());
unsigned Size = DL.getTypeAllocSize(PtrTy->getElementType());		unsigned Size = DL.getTypeAllocSize(PtrTy->getElementType());

// An alignment of 0 means target ABI alignment.		// An alignment of 0 means target ABI alignment.
unsigned Align = LI ? LI->getAlignment() : SI->getAlignment();		unsigned Align = LI ? LI->getAlignment() : SI->getAlignment();
if (!Align)		if (!Align)
Align = DL.getABITypeAlignment(PtrTy->getElementType());		Align = DL.getABITypeAlignment(PtrTy->getElementType());

[1,053 lines not shown]

Revision Contents

Diff 36727
