This is an archive of the discontinued LLVM Phabricator instance.

[LAA] Merge memchecks for accesses separated by a constant offset
ClosedPublic

Authored by sbaranga on Jun 11 2015, 5:55 AM.

Details

Reviewers

Commits

rG1b6b50a92130: [LAA] Merge memchecks for accesses separated by a constant offset
rL241673: [LAA] Merge memchecks for accesses separated by a constant offset

Summary

Often filter-like loops will do memory accesses that are
separated by constant offsets. In these cases it is
common that we will exceed the threshold for the
allowable number of checks.

However, it should be possible to merge such checks,
since a check of any interval against two other intervals separated
by a constant offset (a,b), (a+c, b+c) will be equivalent to
a check against (a, b+c), as long as (a,b) and (a+c, b+c) overlap.
Assuming the loop will be executed for a sufficient number of
iterations, this will be true. If not true, checking against
(a, b+c) is still safe (although not equivalent).
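To make the merging argument concrete, here is a minimal sketch. The Interval type and the helper names are hypothetical and not taken from the patch; ranges are modelled as half-open byte ranges, which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open byte range [Low, High) touched by one access over the whole loop.
struct Interval {
  std::uint64_t Low;
  std::uint64_t High;
};

// True if the two ranges share at least one byte.
inline bool overlaps(Interval X, Interval Y) {
  return X.Low < Y.High && Y.Low < X.High;
}

// Smallest single range that covers both inputs, i.e. (a, b+c) in the summary.
inline Interval mergeIntervals(Interval X, Interval Y) {
  assert(X.Low <= X.High && Y.Low <= Y.High && "ranges must be well formed");
  return {std::min(X.Low, Y.Low), std::max(X.High, Y.High)};
}
```

When the two inputs overlap, checking a third range against the merged range gives the same answer as checking it against both inputs. When they do not overlap, the merged check can report a conflict that neither individual check would, which is conservative but still safe.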

As long as there are no dependencies between two accesses,
we can merge their checks into a single one. We use this
technique to construct groups of accesses, and then check
the intervals associated with the groups instead of
checking the accesses directly.

Diff Detail

Repository: rL LLVM

Event Timeline

sbaranga updated this revision to Diff 27499. Jun 11 2015, 5:55 AM

sbaranga retitled this revision from to [LAA] Merge memchecks for accesses separated by a constant offset.

sbaranga updated this object.

sbaranga edited the test plan for this revision.

sbaranga added a reviewer: anemet. Jun 11 2015, 5:56 AM

sbaranga added a subscriber: Unknown Object (MLST).

Added llvm-commits as a subscriber (I'm adding this comment just to get phabricator to send out notice emails).

anemet added inline comments. Jun 16 2015, 11:16 PM

include/llvm/Analysis/LoopAccessAnalysis.h
350–352 ↗	(On Diff #27499)	How about calling it "needsChecking" then (but on Groups now)? "Dependent" does not seem specific enough in this context. Three slashes and \brief in the comment. There is more of the same later.
351 ↗	(On Diff #27499)	You must mean "needsChecking" here.
357–358 ↗	(On Diff #27499)	If GroupedChecks is the return value here, it should be returned by value rather than passed-by-reference.
lib/Analysis/LoopAccessAnalysis.cpp
181 ↗	(On Diff #27499)	Why isn't this just insert and Leader == I?
190–191 ↗	(On Diff #27499)	Why is inbounds relevant here?
209–212 ↗	(On Diff #27499)	Please run this through clang-format-diff.py
214 ↗	(On Diff #27499)	Can we already have Leader at this point in the EC? I also think that Value and Leader are not really good names here. Perhaps "Set" and "Pointer"? I read this far but I am not sure I understand this algorithm. It probably needs a big comment at the beginning explaining what's going on. Also it looks quadratic. Could we do something better by perhaps analyzing the start values of the ARs and then only trying to match those that share the same base pointer? We may actually already have these "related" pointers in AccessAnalysis::DepCands. (The name is not great. These are candidates for dependence analysis, i.e. pointers that share the same underlying object.)
test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll
1 ↗	(On Diff #27499)	I don't understand this change. Now instead of unit-testing a pass, you run the entire -O1 pipeline?!
61–79 ↗	(On Diff #27499)	It would be good to have a C version of these loops in a comment as well.

Ran through clang format to improve formatting.
Added text to explain algorithm and did some renames that
should improve readability.

Thanks! I've uploaded a new version.

-Silviu

include/llvm/Analysis/LoopAccessAnalysis.h
350–352 ↗	(On Diff #27499)	Good idea! Should be changed now.
lib/Analysis/LoopAccessAnalysis.cpp
181 ↗	(On Diff #27499)	It is. I also renamed I to Pointer for clarity in the newest version.
190–191 ↗	(On Diff #27499)	I was concerned that possible pointer wrapping might affect the result of some comparisons. For example, if we have a pointer a and a positive constant c, a + c might be smaller than a if we wrap. However, I see that we are already excluding this case somewhere else for all accesses, so it should be OK to remove this.
214 ↗	(On Diff #27499)	We do have Leader (currently renamed) in the EC (the last one being added). I've renamed to FirstIndexInSet and Pointer in the latest revision. The algorithm is quadratic. However, the number of pointers is currently bounded (maximum 100 I believe). Also, the existing needsAnyChecking() etc. are also quadratic, and we only run this when we would normally run something that is also quadratic, which made me think that it might not be that bad. I had a look at the DepCands, and it looks like they would need to be processed further, as we want to partition into sets of pointers which don't need checks within each set (and I don't think that's the case with DepCands). I think this would bring us back to a quadratic algorithm. One thing that we could do is iterate within DepCands for each pointer instead of iterating over the entire set of pointers. Matching only pointers with the same underlying object is also an option.
test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll
1 ↗	(On Diff #27499)	Removed the -O1. However we now don't see some functions as being 'safe', but that should be ok for testing this change.
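The pointer wrapping concern raised in the reply above can be illustrated with plain unsigned arithmetic. This is only a sketch: addresses are modelled as 64-bit unsigned integers and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models "a + c" on a 64-bit address; unsigned arithmetic wraps modulo 2^64.
inline std::uint64_t addOffset(std::uint64_t A, std::uint64_t C) {
  return A + C;
}

// True when adding the positive constant C to A wraps around, in which case
// "a + c" compares smaller than "a" and an ordering based on it is wrong.
inline bool wraps(std::uint64_t A, std::uint64_t C) {
  assert(C > 0 && "the offset is assumed to be a positive constant");
  return addOffset(A, C) < A;
}
```

For example, an address four bytes below the top of the address space plus an offset of 8 lands at address 4, so the "larger" pointer compares as the smaller one.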

lib/Analysis/LoopAccessAnalysis.cpp
214 ↗	(On Diff #27499)	A DepCands equivalence class does not need runtime checks between its elements. (We analyze the accesses within such set with the MemoryDependenceChecker.) I think that all we may need to do is to find the min start bound and the max end bound of a class within DepCands. This should cover the range for the entire class. What do you think?

sbaranga added inline comments. Jun 18 2015, 8:13 AM

lib/Analysis/LoopAccessAnalysis.cpp
214 ↗	(On Diff #27499)	Ah, yes. That makes sense. I think it would be possible to check the value in DependencySetId instead of getting access to DepCands? I agree, the idea behind this is to get the min and max of these objects and use that for the whole class. I think we are essentially doing the same thing here, except we only know how to compare some pairs of pointers (the very simple case where the difference is a constant). The biggest problem as far as I can see is that we could in theory have pointers to the same underlying object which we can't order (and it might be impossible to determine the full order in all cases) I think we have two options in this case: we can create separate groups, where we can order all elements within a group (we do this in the current solution) we can create umin/umax SCEV expressions - this leads to huge SCEV expressions and is essentially a way to hide the cost of the checks. I will refactor the code so that we will have some comparison function for pointers and no mentions to constants outside it, and that can be easily extended to support more cases for pointer comparisons. Does that sound like a good idea?

[Moved this from an inline comment so that it's easier to reply inline]

Ah, yes. That makes sense. I think it would be possible to check the value in
DependencySetId instead of getting access to DepCands?

Yes, I believe so. There is a fallback case where we disable the MemoryDependenceChecker. In this case we could disable merging too.

I agree, the idea behind this is to get the min and max of these objects and
use that for the whole class. I think we are essentially doing the same thing
here, except we only know how to compare some pairs of pointers (the very
simple case where the difference is a constant).

The biggest problem as far as I can see is that we could in theory have
pointers to the same underlying object which we can't order (and it might be
impossible to determine the full order in all cases)

I think we already have this distinction available as well. You're right that this could happen within a DepCands set. These I believe would be marked Dependence::Unknown.

So perhaps the approach should be to build the groups from the InterestingDependences. Any known dependence would put the two participating pointers in the same group.

(You may need to start recording forward dependences as well which we don't do currently.)

I think we have two options in this case:

we can create separate groups, where we can order all elements within a group (we do this in the current solution)

I'd go for this initially.

we can create umin/umax SCEV expressions - this leads to huge SCEV expressions and is essentially a way to hide the cost of the checks.

I will refactor the code so that we will have some comparison function for
pointers and no mentions to constants outside it, and that can be easily
extended to support more cases for pointer comparisons. Does that sound like a
good idea?

I think so but I am not completely sure I understand this last paragraph.

Adam

lib/Analysis/LoopAccessAnalysis.cpp
214 ↗	(On Diff #27499)	I can't quote the lines in reply to an inline comment so moving this to the main comment area.

Hi Adam,

In D10386#190514, @anemet wrote:

The biggest problem as far as I can see is that we could in theory have
pointers to the same underlying object which we can't order (and it might be
impossible to determine the full order in all cases)

I think we already have this distinction available as well. You're right that this could happen within a DepCands set. These I believe would be marked Dependence::Unknown.

So perhaps the approach should be to build the groups from the InterestingDependences. Any known dependence would put the two participating pointers in the same group.

(You may need to start recording forward dependences as well which we don't do currently.)

I've looked at this in more detail, and there is a problem with NoDep dependencies since they are not considered to be either forward or backward. We need to consider these as well, since otherwise we would not cover common cases (and also wouldn't solve the problem for interleaved accesses).

We could add some new types of NoDeps to get that information. I hope that adding these as well (plus the forward ones) won't exceed the maximum number of interesting dependencies too often.

Thanks,
Silviu

In D10386#191741, @sbaranga wrote:

I've looked at this in more detail, and there is a problem with NoDep dependencies since they are not considered to be either forward or backward. We need to consider these as well, since otherwise we would not cover common cases (and also wouldn't solve the problem for interleaved accesses).

Good point but let's see first if we can do this without adding "fake" dependence types for these. Sounds like we should be able to work this out with using *both* DepCands and InterestingDependences:

If the pointers fall in the same set in DepCands and don't have an unknown dependence between them, then we should be able to add them to the same checking pointer group. Do you agree?

In D10386#191851, @anemet wrote:

In D10386#191741, @sbaranga wrote:

I've looked at this in more detail, and there is a problem with NoDep dependencies since they are not considered to be either forward or backward. We need to consider these as well, since otherwise we would not cover common cases (and also wouldn't solve the problem for interleaved accesses).

Good point but let's see first if we can do this without adding "fake" dependence types for these. Sounds like we should be able to work this out with using *both* DepCands and InterestingDependences:

If the pointers fall in the same set in DepCands and don't have an unknown dependence between them, then we should be able to add them to the same checking pointer group. Do you agree?

The problem is that for each checking pointer group we need to get the min/max pointers (otherwise we wouldn't know how to emit memchecks). If we don't have dependencies, then we wouldn't be able to figure out what the bounds are. This might be easier with an example that shows all the issues:

void test(char *in, char *out, int n, int s) {
  for (int i = 0; i < n; ++i)
    out[i] = in[i] + in[i + 1] + in[i + 2] + in[i + 3] + in[i + s];
}

All the accesses to "in" are in the same equivalence class in DepCands, but we get no interesting dependencies (we get only NoDep because there are only reads). If we said that all these accesses belong to the same pointer checking group, we would end up having to compare in[i + s] with in[i], in[i + 1], etc., which we don't know how to do. Since we don't get any dependencies, we also don't know how to compare in[i + 1] with in[i + 2], etc.

With respect to the pointer checking partitioning algorithm, I think because we need to get the min/max pointers for each group, we can't join two groups when we only see a dependency between them:

Let's say we have 3 pointers, a, b and c and two dependencies for which we can say: a > b and a > c. However we don't know anything about how b and c compare. This can happen when recording too many dependencies, or ScalarEvolution might not return a constant for b - c, or some other condition. If we merge two group checks whenever seeing an edge between them, then we end up with (a, b, c) as a pointer group check. But we don't know what the lower bound of this group is (it's either b or c), so we would end up in some invalid state.

One solution would be to track the min/max elements of each pointer checking group and only add other pointers to it as long as we can keep this property (we know what the min/max pointers are).
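The idea of tracking the min/max of each group and only admitting pointers that can be ordered against both bounds could be sketched as follows. The Ptr model (a symbolic part plus a constant offset) and all names here are hypothetical simplifications, not the SCEV-based code in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pointer model: two pointers have a known constant difference
// only when they share the same symbolic part (Sym).
struct Ptr {
  int Sym;
  std::int64_t Offset;
};

// Writes A - B to Diff and returns true if the difference is a known constant.
inline bool constantDifference(const Ptr &A, const Ptr &B, std::int64_t &Diff) {
  if (A.Sym != B.Sym)
    return false;
  Diff = A.Offset - B.Offset;
  return true;
}

struct PtrGroup {
  Ptr Low;                       // known lower bound of the group
  Ptr High;                      // known upper bound of the group
  std::vector<unsigned> Members; // indices of the pointers in the group

  PtrGroup(unsigned Index, const Ptr &P) : Low(P), High(P), Members(1, Index) {}

  // Adds P only if it can be ordered against both bounds, so the group always
  // knows its min and max. Returns false (group unchanged) otherwise.
  bool tryAdd(unsigned Index, const Ptr &P) {
    assert(!Members.empty() && "a group always has at least one member");
    std::int64_t ToLow = 0, ToHigh = 0;
    if (!constantDifference(P, Low, ToLow) ||
        !constantDifference(P, High, ToHigh))
      return false;
    if (ToLow < 0)
      Low = P;
    if (ToHigh > 0)
      High = P;
    Members.push_back(Index);
    return true;
  }
};
```

In the test() loop above, in[i] through in[i + 3] would share one symbolic part and join the same group, while in[i + s] would be rejected and start a group of its own.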

Regarding dependencies: maybe it would be somewhat simpler to compare the pointers on the fly (the comparison is simple I think) - and limit the number of iterations that we do when grouping the pointers? Dependencies are closely related to ordering (we need to order pointers when computing dependency types), but different enough to cause problems.

Thanks,
Silviu

OK, so I guess we're back to the original idea of using DepCands to limit the search space.

I.e. rather than:

for each p in Pointers:
  for each r in Pointers after p:

we could do:

for each set S in DepCands:
  for each pointer p in S:
    for each pointer r in S after p:
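As a rough illustration of how much the second loop nest shrinks the search space, the sketch below counts the pointer pairs each variant visits. The helper names are hypothetical, and the split into equally sized sets used in the example is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of pairs (p, r) with r after p among N pointers: N * (N - 1) / 2.
inline std::size_t pairsAmong(std::size_t N) {
  assert(N < 65536 && "keeps N * (N - 1) far from overflow in this sketch");
  return N < 2 ? 0 : N * (N - 1) / 2;
}

// Number of pairs visited when only pointers in the same set are compared.
inline std::size_t pairsWithinSets(const std::vector<std::size_t> &SetSizes) {
  std::size_t Total = 0;
  for (std::size_t Size : SetSizes)
    Total += pairsAmong(Size);
  return Total;
}
```

With 100 pointers, the flat loop visits 4950 pairs. If those pointers happened to fall into 10 sets of 10, the per-set loop would visit 450. The worst case (a single set) is still quadratic.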

We now use DepCands to speed up the construction of pointer check groups.
We cache the results of the algorithm, and compute the result at canCheckPtrAtRT
only when it is needed.

We now also have a limit on the number of comparisons that the grouping algorithm will perform.

The longish comment further down is my main question. We should probably resolve that first.

include/llvm/Analysis/LoopAccessAnalysis.h
335 ↗	(On Diff #28359)	A grouping of pointers
337 ↗	(On Diff #28359)	CheckingGroup or CheckingPtrGroup probably sounds better.
338 ↗	(On Diff #28359)	s/0/nullptr
339 ↗	(On Diff #28359)	s/wich/which later too
lib/Analysis/LoopAccessAnalysis.cpp
166–171 ↗	(On Diff #28359)	I don't get this last point -- seems pretty recursive.
194 ↗	(On Diff #28359)	s/indeces/indices
196–199 ↗	(On Diff #28359)	Why do we need to compare against the same element?
202–213 ↗	(On Diff #28359)	Shouldn't we only look through the members if DI is a leader?
208–211 ↗	(On Diff #28359)	I don't understand what you gain by using an EC for this. It seems to me that you just want a vector of CheckGroups for each DepCands set. For each pointer then you go through this vector and try to merge to an existing group. If nothing found you add a new group at the end of the vector. As another general comment, it would be good to push some of the low-level mechanics into the CheckingGroup class so that the outline of algorithm is separated from the details.
242–243 ↗	(On Diff #28359)	Nit: fold the increment in the comparison?

anemet added inline comments. Jun 25 2015, 1:52 PM

lib/Analysis/LoopAccessAnalysis.cpp
208–211 ↗	(On Diff #28359)	I just want to clarify one more thing about the above algorithm using a local vector of CheckingGroup instead of an EC. When we're done merging for a DepCands set, then you'd merge the "local" vector of CheckingGroups to the main one.
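The suggestion above (a local vector of groups per DepCands set, appended to the main vector once the set is done) might look roughly like this. It is a sketch under simplifying assumptions: Ptr, Group, PtrSet and buildGroups are hypothetical names, and a plain vector of vectors stands in for the DepCands equivalence classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical pointer model: comparable only when Sym matches.
struct Ptr {
  int Sym;
  std::int64_t Offset;
};

struct Group {
  Ptr Low, High;
  std::vector<Ptr> Members;
  explicit Group(const Ptr &P) : Low(P), High(P), Members(1, P) {}

  // Joins the group only if P can be ordered against the current bounds.
  bool tryAdd(const Ptr &P) {
    assert(Low.Sym == High.Sym && "bounds must stay comparable");
    if (P.Sym != Low.Sym)
      return false;
    if (P.Offset < Low.Offset)
      Low = P;
    if (P.Offset > High.Offset)
      High = P;
    Members.push_back(P);
    return true;
  }
};

using PtrSet = std::vector<Ptr>; // stands in for one DepCands class

inline std::vector<Group> buildGroups(const std::vector<PtrSet> &Sets) {
  std::vector<Group> All;
  for (const PtrSet &S : Sets) {
    std::vector<Group> Local; // groups for this set only
    for (const Ptr &P : S) {
      bool Merged = false;
      for (Group &G : Local) {
        if (G.tryAdd(P)) {
          Merged = true;
          break;
        }
      }
      if (!Merged)
        Local.emplace_back(P); // nothing matched: start a new group
    }
    // Append the finished local groups to the global result.
    std::copy(Local.begin(), Local.end(), std::back_inserter(All));
  }
  return All;
}
```

Because groups are never merged across sets, a pointer is only ever compared with groups built from its own set.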

Renamed CheckGroup to CheckingPtrGroup.

Remove the use of EquivalenceClasses and all the other
vectors used in the merging algorithm. We now use a single
SmallVector of CheckingPtrGroups instead (which seems to
be enough). This should make everything more readable.

Fixed the iteration of DepCands (was missing a isLeader check).

Moved the code that adds a pointer to a CheckingPtrGroup
(and checks that this can be done) to CheckingPtrGroup.

Change the constructor of CheckingPtrGroup to take the index
of the first element that makes up the group. We only need
to construct groups with at least one element and the previous
constructor was only in a single place.

Various comment changes / spelling fixes according to comments.

Hi Adam,

Thanks for the comments! I've uploaded a new version, which should (hopefully) deal with all the issues raised.

Thanks,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
196–199 ↗	(On Diff #28359)	Sorry, the last sentence of that comment was stale. Removed in the new version.
202–213 ↗	(On Diff #28359)	We should! Nice catch!
208–211 ↗	(On Diff #28359)	I've updated the algorithm. By using a SmallVector of CheckingPtrGroup we got rid of not only the EquivalenceClass, but also the other vectors (Mins, Maxs, etc).

Wow, looks great, IMO. Thanks for your work, Silviu!

Some minor things:

include/llvm/Analysis/LoopAccessAnalysis.h
346 ↗	(On Diff #28564)	s/recored/recorded
357–362 ↗	(On Diff #28564)	These are not indices but SCEVs. For the record, I agree that they should be SCEVs but the comment is wrong.
420–421 ↗	(On Diff #28564)	How about CheckingGroups?
lib/Analysis/LoopAccessAnalysis.cpp
150 ↗	(On Diff #28564)	Put \p before I and J
231–242 ↗	(On Diff #28564)	Looks like you could formulate this with a range-based loop.
235–236 ↗	(On Diff #28564)	I think that you also want to break out from the entire loop-nest here. (I'd probably go with a goto in this case.)
253–254 ↗	(On Diff #28564)	Can this be written with std::copy? I.e.: std::copy(Groups.begin(), Groups.end(), std::back_inserter(CheckGroups))
test/Analysis/LoopAccessAnalysis/unsafe-and-rt-checks.ll
16–19 ↗	(On Diff #28564)	Can we make it clearer where one group ends and the other one starts when printing a check?

Fixed a number of spelling errors, missed doxygen tags and out of date comments
(according to the review). Also added further comments to explain why we don't
want to break from the loop nest when reaching the maximum number of comparisons
(we want to break only from the inner loop).

Renamed CheckGroups to CheckingGroups.

We are now using std::copy to add the newly computed CheckingPtrGroups to the
global solution. Also replaced the inner loop with a range-based loop.

Modified the printing of the memchecks to separate the groups within each
memcheck. The new format looks like:

Check 0:
   Comparing group 0:
     ....
   Against group 1:
     ....
Check 1:

etc

Updated the tests to use this new format.

Hi Adam,

Thanks! Please see the updated version. It should contain all of the requested changes except one. I've added a comment for that one in the review and further comments in the code to explain it.

Thanks,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
235–236 ↗	(On Diff #28564)	I think it is correct. When we reach the comparisons limit we want to default to creating a new group for each pointer. Otherwise if we would break from the loop nest we would end up ignoring the pointers that we didn't process. I've also remembered why I didn't previously fold the increment to TotalComparisons into the if statement. Once we reach the threshold we don't want to increment it further.
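The behaviour described in the reply above (stop comparing once the budget is reached, but keep creating a group for every remaining pointer) can be sketched like this. Ptr, Group and groupWithBudget are hypothetical names, and pointers are again modelled as a symbolic part plus a constant offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Ptr {
  int Sym;
  std::int64_t Offset;
};

struct Group {
  int Sym;
  std::int64_t Low, High;
};

inline std::vector<Group> groupWithBudget(const std::vector<Ptr> &Ptrs,
                                          unsigned MaxComparisons) {
  std::vector<Group> Groups;
  unsigned TotalComparisons = 0;
  for (const Ptr &P : Ptrs) {
    bool Merged = false;
    for (Group &G : Groups) {
      // Budget exhausted: leave only the inner loop, so that P and every
      // later pointer still end up in a group of their own.
      if (TotalComparisons >= MaxComparisons)
        break;
      // Incremented only below the threshold, so it never grows past it.
      ++TotalComparisons;
      if (G.Sym == P.Sym) {
        assert(G.Low <= G.High && "group bounds must be ordered");
        G.Low = std::min(G.Low, P.Offset);
        G.High = std::max(G.High, P.Offset);
        Merged = true;
        break;
      }
    }
    if (!Merged)
      Groups.push_back({P.Sym, P.Offset, P.Offset});
  }
  return Groups;
}
```

Breaking out of the whole loop nest instead would drop the unprocessed pointers, and they would then have no runtime checks at all.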

LGTM, thank you!

lib/Analysis/LoopAccessAnalysis.cpp
247–252 ↗	(On Diff #28657)	No {}
292–295 ↗	(On Diff #28657)	I would drop this. Also moving the def of N right before the loop would be a good thing.
552–553 ↗	(On Diff #28657)	For the record, this is not where we want to call this in the long run, but I am OK with this for now. DepCands is not exposed right now, but moving forward we want to initiate the grouping from the client so that things like PtrPartitions could be passed in. I have some plans for how to do this, so this will change anyway pretty soon.
test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll
76–89 ↗	(On Diff #28657)	I'd have a slight preference to fully match these lines rather than the number of checks/groups.

This revision is now accepted and ready to land. Jun 29 2015, 5:21 PM

anemet added inline comments. Jun 30 2015, 10:52 AM

lib/Analysis/LoopAccessAnalysis.cpp
222 ↗	(On Diff #28657)	Same no struct here either.
231 ↗	(On Diff #28657)	Sorry didn't notice these the first time around: no need for struct in C++ since this is no longer an iterator, we should probably call it Group or G or whatever

Constrain the grouping algorithm to only merge pointers with a maximum distance of 512.
This stops a (potential) 20% regression in TSVC/Symbolics-flt. Added a regression test for this.

Remove redundant 'struct' keyword, renamed group iterator to 'Group'.
Update tests to fully match lines with CHECK-NEXT.

Hi Adam,

Sorry for delaying the commit, but I have a last minute change for this. It seems that this change would cause a regression in the TSVC/Symbolics-flt benchmark.

The regression happens when executing a code that looks like this:

struct {
  int a[LEN];
  int b[LEN];
} Data;

void test(int m) {
  for (int i = 0; i < m; ++i)
    Data.a[i + m] = Data.a[i] + Data.b[i];
}

a[i] and b[i] get merged according to the algorithm. However, the resulting interval covers a[i + m], and the runtime check fails.
I've limited the merging algorithm to only merge when the distance to the max/min is less than a constant (I've chosen 512 for this), which should prevent this issue from happening (at least for loops with more than 512 iterations). Does this sound reasonable?
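To see why the merged check fails here, the byte ranges can be worked out by hand. The sketch below does so under assumptions that are not in the thread: 4-byte int, LEN = 1000, m = 100, and the 512 limit measured in bytes between the starts of two ranges. All helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open byte range [Low, High), relative to the start of Data.
struct Interval {
  std::int64_t Low, High;
};

inline bool overlaps(Interval X, Interval Y) {
  return X.Low < Y.High && Y.Low < X.High;
}

inline Interval mergeIntervals(Interval X, Interval Y) {
  return {std::min(X.Low, Y.Low), std::max(X.High, Y.High)};
}

// Bytes touched by Base[i + Shift] for i in [0, Trip), assuming 4-byte ints.
inline Interval accessRange(std::int64_t BaseByte, std::int64_t Shift,
                            std::int64_t Trip) {
  assert(Trip > 0 && "the loop is assumed to run at least once");
  return {BaseByte + 4 * Shift, BaseByte + 4 * (Shift + Trip)};
}

// Allow a merge only when the two ranges start close to each other.
inline bool withinMergeLimit(Interval X, Interval Y, std::int64_t MaxDistance) {
  std::int64_t D = X.Low - Y.Low;
  if (D < 0)
    D = -D;
  return D < MaxDistance;
}
```

With these numbers a[i] covers [0, 400), b[i] covers [4000, 4400) and a[i + m] covers [400, 800). Neither read overlaps the write, but the merged read range [0, 4400) does, so the runtime check fails. The two reads start 4000 bytes apart, so a 512 limit keeps them in separate groups.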

That benchmark was the only regression that I've seen (lnt, spec2k,spec2k), but I've not seen any improvements either - so either further tuning is required or we would need to turn on the interleaved accesses to see any gains.

Thanks,
Silviu

Hi Silviu,

In D10386#197861, @sbaranga wrote:
The regression happens when executing a code that looks like this:

struct {
  int a[LEN];
  int b[LEN];
} Data;

void test(int m) {
for (int i = 0; i < m; ++i)
  Data.a[i + m] = Data.a[i] + Data.b[i];
}

a[i] and b[i] get merged according to the algorithm. However the resulting interval covers a[i + m], and the runtime check fails.

This is the way I look at this problem, please correct me if I am missing something. We have three pointers all pointing to the same underlying object (Data). Let's refer to them as P_am, P_a and P_b respectively.

We were able to determine a constant distance between P_a and P_b; however, the distance to P_am is not constant. In this case it seems to me that when analyzing P_am, we want to veto the merge of P_a and P_b because, as in this case, P_am may reside somewhere between the two objects.

This does not seem like a major change to your approach and it does not seem to affect the main use-case. What do you think?

Adam

Hi Adam,

In D10386#197967, @anemet wrote:

This is the way I look at this problem, please correct me if I am missing something. We have three pointers all pointing to the same underlying object (Data). Let's refer to them as P_am, P_a and P_b respectively.

We were able to determine const distance between P_a and P_b however P_am is not constant. In this case it seems to me that when analyzing P_am, we want to veto the merge of P_a and P_b because as in this case P_am may reside somewhere between the two objects.

This does not seem like a major change to your approach and it does not seem to affect the main use-case. What do you think?

Yes, that would also solve the cases I've seen so far and seems sensible. At this point there isn't any obvious best solution, so I have no preference.

FWIW we always want to group pointers separated by a small constant (if we get a false positive it means that the trip count was small), so veto-ing groups with such pointers would pessimize this case. I'm not sure how frequent the case is though.

Thanks,
Silviu

Updated the memcheck grouping algorithm to only use DepCands
if DepCands have been computed. If not, make a separate group
for each pointer. Previously we were using DepCands even when
DepCands wasn't available, and this could have produced
incorrect results.

This should also fix the algorithm for the case of unknown
dependencies: in this case, we will retry to group pointers
without using dependencies, and create a separate group for
each pointer.

Hi Adam,

Further digging into this has revealed that we should only get into the case where we get the regression if we have an unknown dependency - and we should retry without dependencies. In this case we don't have DepCands so we can't use the current algorithm to group pointers. Do you agree with this?

I've fixed the algorithm to not group pointers in this case (this should be in line with previous comments).

Thanks,
Silviu

sbaranga closed this revision. Jul 8 2015, 2:16 AM

Closed by commit rL241673: [LAA] Merge memchecks for accesses separated by a constant offset (authored by sbaranga). Jul 8 2015, 2:16 AM

This revision was automatically updated to reflect the committed changes.

Committed in r241673. Thanks for all the help!

-Silviu

In D10386#199603, @sbaranga wrote:

Further digging into this has revealed that we should only get into the case where we get the regression if we have an unknown dependency - and we should retry without dependencies. In this case we don't have DepCands so we can't use the current algorithm to group pointers. Do you agree with this?

No, I don't think that's quite correct.

If I understand the situation correctly, we have two accesses of the same underlying object with non-const distance. So we give up dependence analysis with "LAA: Retrying with memory checks".

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

I would think that you should be able to create a testcase with three pointers on the same objects two with known stride and one without. In this case too we would merge all three but the one with unknown stride to cause the same issue you discovered.

If this is all correct, I still think the best way is the veto-ing algorithm I described originally.

I've fixed the algorithm to not group pointers in this case (this should be inline with previous comments).

Please don't forget to add the testcase that prompted the new version.

Hi Adam,

In D10386#201172, @anemet wrote:

In D10386#199603, @sbaranga wrote:

Further digging into this has revealed that we should only get into the case where we get the regression if we have an unknown dependency - and we should retry without dependencies. In this case we don't have DepCands so we can't use the current algorithm to group pointers. Do you agree with this?

No, I don't think that's quite correct.

If I understand the situation correctly, we have two accesses of the same underlying object with non-const distance. So we give up dependence analysis with "LAA: Retrying with memory checks".

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

I would think that you should be able to create a testcase with three pointers on the same objects two with known stride and one without. In this case too we would merge all three but the one with unknown stride to cause the same issue you discovered.

If this is all correct, I still think the best way is the veto-ing algorithm I described originally.

Ok, that makes sense. I already have the updated veto-ing algorithm, but I'd like to add it under a different review instead of rolling everything back. Is that ok with you?

I've fixed the algorithm to not group pointers in this case (this should be inline with previous comments).

Please don't forget to add the testcase that prompted the new version.

I'll commit a test case for this.

Thanks,
Silviu

In D10386#201889, @sbaranga wrote:

Ok, that makes sense. I already have the updated veto-ing algorithm, but I'd like to add it under a different review instead of rolling everything back. Is that ok with you?

Sure, thanks.

Hi Adam,

In D10386#202281, @anemet wrote:

In D10386#201889, @sbaranga wrote:

Ok, that makes sense. I already have the updated veto-ing algorithm, but I'd like to add it under a different review instead of rolling everything back. Is that ok with you?

Sure, thanks.

I'm still having trouble with producing a testcase for the 'veto' approach.

Looking at the logic in analyzeLoops:

In order to need memory checks, first we need Accesses.isDependencyCheckNeeded() to evaluate to true.
Since we need an unknown dependence between two objects to get into the 'veto' case, it follows that areDepsSafe will return false (we have at least one unknown dependence).

This causes us to take the "!CanVecMem && DepChecker.shouldRetryWithRuntimeCheck()" branch and reset the dependence checks, which in turn will cause us to not group pointers.

I'm sure we could end up in the 'veto' case by tweaking the logic in 'analyzeLoops', but not sure how to do that without this. Can you see any issues with the reasoning above?

Thanks,
Silviu

Hi Silviu,

In D10386#205474, @sbaranga wrote:

Looking at the logic in analyzeLoops:

In order to need memory checks, first we need Accesses.isDependencyCheckNeeded() to evaluate to true.
Since we need an unknown dependence between two objects to get into the 'veto' case, it follows that areDepsSafe will return false (we have at least one unknown dependence).

This causes us to take the "!CanVecMem && DepChecker.shouldRetryWithRuntimeCheck()" branch and reset the dependence checks, which in turn will cause us to not group pointers.

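shouldRetry... should only be set if we didn't find a constant distance between the accesses. This is why I said earlier that:

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.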

Was something unclear about this? I can help to come up with a testcase for this if that helps.

Adam

In D10386#206174, @anemet wrote:

Hi Silviu,

In D10386#205474, @sbaranga wrote:

Looking at the logic in analyzeLoops:

In order to need memory checks, first we need Accesses.isDependencyCheckNeeded() to evaluate to true.
Since we need an unknown dependence between two objects to get into the 'veto' case, it follows that areDepsSafe will return false (we have at least one unknown dependence).

This causes us to take the "!CanVecMem && DepChecker.shouldRetryWithRuntimeCheck()" branch and reset the dependence checks, which in turn will cause us to not group pointers.

shouldRetry... should only be set if we didn't find a constant distance between the accesses. This is why I said earlier that:

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

Was something unclear about this? I can help to come up with a testcase for this if that helps.

Adam

I think I'm getting tunnel vision on this.. I would be really grateful if you could help to come up with a testcase.

Thanks,
Silviu

Ah, OK, I think I understand what's going on.

We simply can't have memchecks between pointers of the same underlying objects *unless* shouldRetryWithRuntimeChecks. I don't think that this is what you were trying to explain, but if yes, sorry for not understanding.

Anyhow, I think in this case the code is good, we just have to explain the situation in the comment before the UseDependencies check (was this the only reason for the check?). I.e. why the offending check can only occur with shouldRetryWithRuntimeChecks.

Please also mention shouldRetryWithRuntimeChecks in the testcase you added earlier. It took me a while to figure out why we wanted to compare pointers with the same underlying object. (That is kinda the point.)

Thanks for looking into this!

In D10386#208783, @anemet wrote:

Ah, OK, I think I understand what's going on.

We simply can't have memchecks between pointers of the same underlying object *unless* shouldRetryWithRuntimeCheck is set. I don't think that this is what you were trying to explain, but if it is, sorry for not understanding.

Anyhow, I think the code is good in this case; we just have to explain the situation in the comment before the UseDependencies check (was this the only reason for the check?), i.e. why the offending check can only occur with shouldRetryWithRuntimeCheck.

We also need this for correctness. The algorithm assumes that pointers in the same equivalence class don't need checking against each other, which would be false in this case.

Please also mention shouldRetryWithRuntimeCheck in the testcase you added earlier. It took me a while to figure out why we wanted to compare pointers with the same underlying object. (That is kinda the point.)

I've added comments to explain this in r243416.

Thanks,
Silviu

In D10386#213466, @sbaranga wrote:

Thanks for looking into this!

In D10386#208783, @anemet wrote:

Ah, OK, I think I understand what's going on.

We simply can't have memchecks between pointers of the same underlying object *unless* shouldRetryWithRuntimeCheck is set. I don't think that this is what you were trying to explain, but if it is, sorry for not understanding.

Anyhow, I think the code is good in this case; we just have to explain the situation in the comment before the UseDependencies check (was this the only reason for the check?), i.e. why the offending check can only occur with shouldRetryWithRuntimeCheck.

We also need this for correctness. The algorithm assumes that pointers in the same equivalence class don't need checking against each other, which would be false in this case.

Yep.

Please also mention shouldRetryWithRuntimeCheck in the testcase you added earlier. It took me a while to figure out why we wanted to compare pointers with the same underlying object. (That is kinda the point.)

I've added comments to explain this in r243416.

Excellent, thank you!

Thanks,
Silviu
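
The correctness point settled in this exchange can be illustrated with a small model. The sketch below is not LLVM code: `Range`, `overlaps` and `hull` are hypothetical names, and addresses are plain integers rather than SCEV expressions. It shows that once two accesses are merged into one interval, a conflict between those two accesses can no longer be seen by any check against the merged interval, which is why grouping has to be limited to pointers that never need a check against each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model: a closed interval of byte addresses touched by one
// access over the whole loop.
struct Range {
  std::int64_t Low;
  std::int64_t High;
};

// True if the two closed intervals share at least one address.
bool overlaps(const Range &A, const Range &B) {
  return A.Low <= B.High && B.Low <= A.High;
}

// Smallest interval covering both inputs. A merged group is checked using
// this interval instead of the individual members.
Range hull(const Range &A, const Range &B) {
  assert(A.Low <= A.High && B.Low <= B.High && "malformed range");
  return {std::min(A.Low, B.Low), std::max(A.High, B.High)};
}
```

With two writes covering [0, 40] and [20, 60] and a read covering [100, 140], the merged interval is [0, 60]. It does not overlap the read, so the single remaining check passes even though the two writes overlap each other. In the other direction the merge is conservative: if any member overlaps an outside range, the merged interval overlaps it too.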

Revision Contents

Path	Size
llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h	59 lines
llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp	253 lines
llvm/trunk/test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll	172 lines
llvm/trunk/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll	2 lines
llvm/trunk/test/Analysis/LoopAccessAnalysis/unsafe-and-rt-checks.ll	6 lines
llvm/trunk/test/Transforms/LoopDistribute/basic-with-memchecks.ll	8 lines

Diff 29252

llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h

[305 lines hidden]
/// generates run-time checks to prove independence. This is done by		/// generates run-time checks to prove independence. This is done by
/// AccessAnalysis::canCheckPtrAtRT and the checks are maintained by the		/// AccessAnalysis::canCheckPtrAtRT and the checks are maintained by the
/// RuntimePointerCheck class.		/// RuntimePointerCheck class.
class LoopAccessInfo {		class LoopAccessInfo {
public:		public:
/// This struct holds information about the memory runtime legality check that		/// This struct holds information about the memory runtime legality check that
/// a group of pointers do not overlap.		/// a group of pointers do not overlap.
struct RuntimePointerCheck {		struct RuntimePointerCheck {
RuntimePointerCheck() : Need(false) {}		RuntimePointerCheck(ScalarEvolution *SE) : Need(false), SE(SE) {}

/// Reset the state of the pointer runtime information.		/// Reset the state of the pointer runtime information.
void reset() {		void reset() {
Need = false;		Need = false;
Pointers.clear();		Pointers.clear();
Starts.clear();		Starts.clear();
Ends.clear();		Ends.clear();
IsWritePtr.clear();		IsWritePtr.clear();
DependencySetId.clear();		DependencySetId.clear();
AliasSetId.clear();		AliasSetId.clear();
		Exprs.clear();
}		}

/// Insert a pointer and calculate the start and end SCEVs.		/// Insert a pointer and calculate the start and end SCEVs.
void insert(ScalarEvolution *SE, Loop *Lp, Value *Ptr, bool WritePtr,		void insert(Loop *Lp, Value *Ptr, bool WritePtr, unsigned DepSetId,
unsigned DepSetId, unsigned ASId,		unsigned ASId, const ValueToValueMap &Strides);
const ValueToValueMap &Strides);

/// \brief No run-time memory checking is necessary.		/// \brief No run-time memory checking is necessary.
bool empty() const { return Pointers.empty(); }		bool empty() const { return Pointers.empty(); }

		/// A grouping of pointers. A single memcheck is required between
		/// two groups.
		struct CheckingPtrGroup {
		/// \brief Create a new pointer checking group containing a single
		/// pointer, with index \p Index in RtCheck.
		CheckingPtrGroup(unsigned Index, RuntimePointerCheck &RtCheck)
		: RtCheck(RtCheck), High(RtCheck.Ends[Index]),
		Low(RtCheck.Starts[Index]) {
		Members.push_back(Index);
		}

		/// \brief Tries to add the pointer recorded in RtCheck at index
		/// \p Index to this pointer checking group. We can only add a pointer
		/// to a checking group if we will still be able to get
		/// the upper and lower bounds of the check. Returns true in case
		/// of success, false otherwise.
		bool addPointer(unsigned Index);

		/// Constitutes the context of this pointer checking group. For each
		/// pointer that is a member of this group we will retain the index
		/// at which it appears in RtCheck.
		RuntimePointerCheck &RtCheck;
		/// The SCEV expression which represents the upper bound of all the
		/// pointers in this group.
		const SCEV *High;
		/// The SCEV expression which represents the lower bound of all the
		/// pointers in this group.
		const SCEV *Low;
		/// Indices of all the pointers that constitute this grouping.
		SmallVector<unsigned, 2> Members;
		};

		/// \brief Groups pointers such that a single memcheck is required
		/// between two different groups. This will clear the CheckingGroups vector
		/// and re-compute it. We will only group dependencies if \p UseDependencies
		/// is true, otherwise we will create a separate group for each pointer.
		void groupChecks(MemoryDepChecker::DepCandidates &DepCands,
		bool UseDependencies);

/// \brief Decide whether we need to issue a run-time check for pointer at		/// \brief Decide whether we need to issue a run-time check for pointer at
/// index \p I and \p J to prove their independence.		/// index \p I and \p J to prove their independence.
///		///
/// If \p PtrPartition is set, it contains the partition number for		/// If \p PtrPartition is set, it contains the partition number for
/// pointers (-1 if the pointer belongs to multiple partitions). In this		/// pointers (-1 if the pointer belongs to multiple partitions). In this
/// case omit checks between pointers belonging to the same partition.		/// case omit checks between pointers belonging to the same partition.
bool needsChecking(unsigned I, unsigned J,		bool needsChecking(unsigned I, unsigned J,
const SmallVectorImpl<int> *PtrPartition) const;		const SmallVectorImpl<int> *PtrPartition) const;

		/// \brief Decide if we need to add a check between two groups of pointers,
		/// according to needsChecking.
		bool needsChecking(const CheckingPtrGroup &M,
		const CheckingPtrGroup &N,
		const SmallVectorImpl<int> *PtrPartition) const;

/// \brief Return true if any pointer requires run-time checking according		/// \brief Return true if any pointer requires run-time checking according
/// to needsChecking.		/// to needsChecking.
bool needsAnyChecking(const SmallVectorImpl<int> *PtrPartition) const;		bool needsAnyChecking(const SmallVectorImpl<int> *PtrPartition) const;

/// \brief Returns the number of run-time checks required according to		/// \brief Returns the number of run-time checks required according to
/// needsChecking.		/// needsChecking.
unsigned getNumberOfChecks(const SmallVectorImpl<int> *PtrPartition) const;		unsigned getNumberOfChecks(const SmallVectorImpl<int> *PtrPartition) const;

[15 lines hidden]	struct RuntimePointerCheck {
SmallVector<const SCEV*, 2> Ends;		SmallVector<const SCEV*, 2> Ends;
/// Holds the information if this pointer is used for writing to memory.		/// Holds the information if this pointer is used for writing to memory.
SmallVector<bool, 2> IsWritePtr;		SmallVector<bool, 2> IsWritePtr;
/// Holds the id of the set of pointers that could be dependent because of a		/// Holds the id of the set of pointers that could be dependent because of a
/// shared underlying object.		/// shared underlying object.
SmallVector<unsigned, 2> DependencySetId;		SmallVector<unsigned, 2> DependencySetId;
/// Holds the id of the disjoint alias set to which this pointer belongs.		/// Holds the id of the disjoint alias set to which this pointer belongs.
SmallVector<unsigned, 2> AliasSetId;		SmallVector<unsigned, 2> AliasSetId;
		/// Holds at position i the SCEV for the access i
		SmallVector<const SCEV *, 2> Exprs;
		/// Holds a partitioning of pointers into "check groups".
		SmallVector<CheckingPtrGroup, 2> CheckingGroups;
		/// Holds a pointer to the ScalarEvolution analysis.
		ScalarEvolution *SE;
};		};

LoopAccessInfo(Loop *L, ScalarEvolution *SE, const DataLayout &DL,		LoopAccessInfo(Loop *L, ScalarEvolution *SE, const DataLayout &DL,
const TargetLibraryInfo *TLI, AliasAnalysis *AA,		const TargetLibraryInfo *TLI, AliasAnalysis *AA,
DominatorTree *DT, LoopInfo *LI,		DominatorTree *DT, LoopInfo *LI,
const ValueToValueMap &Strides);		const ValueToValueMap &Strides);

/// Return true we can analyze the memory accesses in the loop and there are		/// Return true we can analyze the memory accesses in the loop and there are
[178 lines hidden]
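
The bookkeeping that CheckingPtrGroup adds to the header can be sketched with constant offsets in place of SCEV expressions. Everything below is a hypothetical model, not the LLVM implementation: `Bound`, `minOf` and `Group` are illustrative names, and "the difference is a known constant" is approximated by "both bounds share the same symbolic base".

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a SCEV bound: a symbolic base plus a constant
// byte offset. Two bounds can be ordered only when they share a base.
struct Bound {
  int Base;
  long Offset;
};

// Models getMinFromExprs: returns the smaller of I and J, or nullptr when
// their difference is not a known constant (here: different bases).
const Bound *minOf(const Bound *I, const Bound *J) {
  assert(I && J && "bounds must be non-null");
  if (I->Base != J->Base)
    return nullptr;
  return (J->Offset - I->Offset < 0) ? J : I;
}

// Models CheckingPtrGroup: one [Low, High] interval covering all members.
struct Group {
  Bound Low, High;
  std::vector<unsigned> Members;

  Group(unsigned Index, Bound Start, Bound End)
      : Low(Start), High(End), Members{Index} {}

  // Tries to widen the interval so that it covers [Start, End]. Fails if
  // either bound cannot be ordered against the current interval.
  bool addPointer(unsigned Index, Bound Start, Bound End) {
    const Bound *Min0 = minOf(&Start, &Low);
    const Bound *Min1 = minOf(&End, &High);
    if (!Min0 || !Min1)
      return false;
    if (Min0 == &Start) // Start is the new minimum.
      Low = Start;
    if (Min1 != &End) // End is larger than the current High.
      High = End;
    Members.push_back(Index);
    return true;
  }
};
```

Starting a group from an access spanning offsets [2, 40] and adding one spanning [0, 38] on the same base yields Low = 0 and High = 40. An access on a different base is rejected and would need its own group.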

llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp

[42 lines hidden]

static cl::opt<unsigned, true> RuntimeMemoryCheckThreshold(		static cl::opt<unsigned, true> RuntimeMemoryCheckThreshold(
"runtime-memory-check-threshold", cl::Hidden,		"runtime-memory-check-threshold", cl::Hidden,
cl::desc("When performing memory disambiguation checks at runtime do not "		cl::desc("When performing memory disambiguation checks at runtime do not "
"generate more than this number of comparisons (default = 8)."),		"generate more than this number of comparisons (default = 8)."),
cl::location(VectorizerParams::RuntimeMemoryCheckThreshold), cl::init(8));		cl::location(VectorizerParams::RuntimeMemoryCheckThreshold), cl::init(8));
unsigned VectorizerParams::RuntimeMemoryCheckThreshold;		unsigned VectorizerParams::RuntimeMemoryCheckThreshold;

		/// \brief The maximum iterations used to merge memory checks
		static cl::opt<unsigned> MemoryCheckMergeThreshold(
		"memory-check-merge-threshold", cl::Hidden,
		cl::desc("Maximum number of comparisons done when trying to merge "
		"runtime memory checks. (default = 100)"),
		cl::init(100));

/// Maximum SIMD width.		/// Maximum SIMD width.
const unsigned VectorizerParams::MaxVectorWidth = 64;		const unsigned VectorizerParams::MaxVectorWidth = 64;

/// \brief We collect interesting dependences up to this threshold.		/// \brief We collect interesting dependences up to this threshold.
static cl::opt<unsigned> MaxInterestingDependence(		static cl::opt<unsigned> MaxInterestingDependence(
"max-interesting-dependences", cl::Hidden,		"max-interesting-dependences", cl::Hidden,
cl::desc("Maximum number of interesting dependences collected by "		cl::desc("Maximum number of interesting dependences collected by "
"loop-access analysis (default = 100)"),		"loop-access analysis (default = 100)"),
[49 lines hidden]	if (SI != PtrToStride.end()) {
return ByOne;		return ByOne;
}		}

// Otherwise, just return the SCEV of the original pointer.		// Otherwise, just return the SCEV of the original pointer.
return SE->getSCEV(Ptr);		return SE->getSCEV(Ptr);
}		}

void LoopAccessInfo::RuntimePointerCheck::insert(		void LoopAccessInfo::RuntimePointerCheck::insert(
ScalarEvolution *SE, Loop *Lp, Value *Ptr, bool WritePtr, unsigned DepSetId,		Loop *Lp, Value *Ptr, bool WritePtr, unsigned DepSetId, unsigned ASId,
unsigned ASId, const ValueToValueMap &Strides) {		const ValueToValueMap &Strides) {
// Get the stride replaced scev.		// Get the stride replaced scev.
const SCEV *Sc = replaceSymbolicStrideSCEV(SE, Strides, Ptr);		const SCEV *Sc = replaceSymbolicStrideSCEV(SE, Strides, Ptr);
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);
assert(AR && "Invalid addrec expression");		assert(AR && "Invalid addrec expression");
const SCEV *Ex = SE->getBackedgeTakenCount(Lp);		const SCEV *Ex = SE->getBackedgeTakenCount(Lp);
const SCEV *ScEnd = AR->evaluateAtIteration(Ex, *SE);		const SCEV *ScEnd = AR->evaluateAtIteration(Ex, *SE);
Pointers.push_back(Ptr);		Pointers.push_back(Ptr);
Starts.push_back(AR->getStart());		Starts.push_back(AR->getStart());
Ends.push_back(ScEnd);		Ends.push_back(ScEnd);
IsWritePtr.push_back(WritePtr);		IsWritePtr.push_back(WritePtr);
DependencySetId.push_back(DepSetId);		DependencySetId.push_back(DepSetId);
AliasSetId.push_back(ASId);		AliasSetId.push_back(ASId);
		Exprs.push_back(Sc);
		}

		bool LoopAccessInfo::RuntimePointerCheck::needsChecking(
		const CheckingPtrGroup &M, const CheckingPtrGroup &N,
		const SmallVectorImpl<int> *PtrPartition) const {
		for (unsigned I = 0, EI = M.Members.size(); EI != I; ++I)
		for (unsigned J = 0, EJ = N.Members.size(); EJ != J; ++J)
		if (needsChecking(M.Members[I], N.Members[J], PtrPartition))
		return true;
		return false;
		}

		/// Compare \p I and \p J and return the minimum.
		/// Return nullptr in case we couldn't find an answer.
		static const SCEV *getMinFromExprs(const SCEV *I, const SCEV *J,
		ScalarEvolution *SE) {
		const SCEV *Diff = SE->getMinusSCEV(J, I);
		const SCEVConstant *C = dyn_cast<const SCEVConstant>(Diff);

		if (!C)
		return nullptr;
		if (C->getValue()->isNegative())
		return J;
		return I;
		}

		bool LoopAccessInfo::RuntimePointerCheck::CheckingPtrGroup::addPointer(
		unsigned Index) {
		// Compare the starts and ends with the known minimum and maximum
		// of this set. We need to know how we compare against the min/max
		// of the set in order to be able to emit memchecks.
		const SCEV *Min0 = getMinFromExprs(RtCheck.Starts[Index], Low, RtCheck.SE);
		if (!Min0)
		return false;

		const SCEV *Min1 = getMinFromExprs(RtCheck.Ends[Index], High, RtCheck.SE);
		if (!Min1)
		return false;

		// Update the low bound expression if we've found a new min value.
		if (Min0 == RtCheck.Starts[Index])
		Low = RtCheck.Starts[Index];

		// Update the high bound expression if we've found a new max value.
		if (Min1 != RtCheck.Ends[Index])
		High = RtCheck.Ends[Index];

		Members.push_back(Index);
		return true;
		}

		void LoopAccessInfo::RuntimePointerCheck::groupChecks(
		MemoryDepChecker::DepCandidates &DepCands,
		bool UseDependencies) {
		// We build the groups from dependency candidates equivalence classes
		// because:
		// - We know that pointers in the same equivalence class share
		// the same underlying object and therefore there is a chance
		// that we can compare pointers
		// - We wouldn't be able to merge two pointers for which we need
		// to emit a memcheck. The classes in DepCands are already
		// conveniently built such that no two pointers in the same
		// class need checking against each other.

		// We use the following (greedy) algorithm to construct the groups
		// For every pointer in the equivalence class:
		// For each existing group:
		// - if the difference between this pointer and the min/max bounds
		// of the group is a constant, then make the pointer part of the
		// group and update the min/max bounds of that group as required.

		CheckingGroups.clear();

		// If we don't have the dependency partitions, construct a new
		// checking pointer group for each pointer.
		if (!UseDependencies) {
		for (unsigned I = 0; I < Pointers.size(); ++I)
		CheckingGroups.push_back(CheckingPtrGroup(I, *this));
		return;
		}

		unsigned TotalComparisons = 0;

		DenseMap<Value *, unsigned> PositionMap;
		for (unsigned Pointer = 0; Pointer < Pointers.size(); ++Pointer)
		PositionMap[Pointers[Pointer]] = Pointer;

		// Go through all equivalence classes, get the "pointer check groups"
		// and add them to the overall solution.
		for (auto DI = DepCands.begin(), DE = DepCands.end(); DI != DE; ++DI) {
		if (!DI->isLeader())
		continue;

		SmallVector<CheckingPtrGroup, 2> Groups;

		for (auto MI = DepCands.member_begin(DI), ME = DepCands.member_end();
		MI != ME; ++MI) {
		unsigned Pointer = PositionMap[MI->getPointer()];
		bool Merged = false;

		// Go through all the existing sets and see if we can find one
		// which can include this pointer.
		for (CheckingPtrGroup &Group : Groups) {
		// Don't perform more than a certain amount of comparisons.
		// This should limit the cost of grouping the pointers to something
		// reasonable. If we do end up hitting this threshold, the algorithm
		// will create separate groups for all remaining pointers.
		if (TotalComparisons > MemoryCheckMergeThreshold)
		break;

		TotalComparisons++;

		if (Group.addPointer(Pointer)) {
		Merged = true;
		break;
		}
		}

		if (!Merged)
		// We couldn't add this pointer to any existing set or the threshold
		// for the number of comparisons has been reached. Create a new group
		// to hold the current pointer.
		Groups.push_back(CheckingPtrGroup(Pointer, *this));
		}

		// We've computed the grouped checks for this partition.
		// Save the results and continue with the next one.
		std::copy(Groups.begin(), Groups.end(), std::back_inserter(CheckingGroups));
		}
}		}

bool LoopAccessInfo::RuntimePointerCheck::needsChecking(		bool LoopAccessInfo::RuntimePointerCheck::needsChecking(
unsigned I, unsigned J, const SmallVectorImpl<int> *PtrPartition) const {		unsigned I, unsigned J, const SmallVectorImpl<int> *PtrPartition) const {
// No need to check if two readonly pointers intersect.		// No need to check if two readonly pointers intersect.
if (!IsWritePtr[I] && !IsWritePtr[J])		if (!IsWritePtr[I] && !IsWritePtr[J])
return false;		return false;

[13 lines hidden]	if (PtrPartition && (*PtrPartition)[I] != -1 &&
return false;		return false;

return true;		return true;
}		}

void LoopAccessInfo::RuntimePointerCheck::print(		void LoopAccessInfo::RuntimePointerCheck::print(
raw_ostream &OS, unsigned Depth,		raw_ostream &OS, unsigned Depth,
const SmallVectorImpl<int> *PtrPartition) const {		const SmallVectorImpl<int> *PtrPartition) const {
unsigned NumPointers = Pointers.size();
if (NumPointers == 0)
return;

OS.indent(Depth) << "Run-time memory checks:\n";		OS.indent(Depth) << "Run-time memory checks:\n";

unsigned N = 0;		unsigned N = 0;
for (unsigned I = 0; I < NumPointers; ++I)		for (unsigned I = 0; I < CheckingGroups.size(); ++I)
for (unsigned J = I + 1; J < NumPointers; ++J)		for (unsigned J = I + 1; J < CheckingGroups.size(); ++J)
if (needsChecking(I, J, PtrPartition)) {		if (needsChecking(CheckingGroups[I], CheckingGroups[J], PtrPartition)) {
OS.indent(Depth) << N++ << ":\n";		OS.indent(Depth) << "Check " << N++ << ":\n";
OS.indent(Depth + 2) << *Pointers[I];		OS.indent(Depth + 2) << "Comparing group " << I << ":\n";

		for (unsigned K = 0; K < CheckingGroups[I].Members.size(); ++K) {
		OS.indent(Depth + 2) << *Pointers[CheckingGroups[I].Members[K]]
		<< "\n";
if (PtrPartition)		if (PtrPartition)
OS << " (Partition: " << (*PtrPartition)[I] << ")";		OS << " (Partition: "
OS << "\n";		<< (*PtrPartition)[CheckingGroups[I].Members[K]] << ")"
OS.indent(Depth + 2) << *Pointers[J];		<< "\n";
		}

		OS.indent(Depth + 2) << "Against group " << J << ":\n";

		for (unsigned K = 0; K < CheckingGroups[J].Members.size(); ++K) {
		OS.indent(Depth + 2) << *Pointers[CheckingGroups[J].Members[K]]
		<< "\n";
if (PtrPartition)		if (PtrPartition)
OS << " (Partition: " << (*PtrPartition)[J] << ")";		OS << " (Partition: "
OS << "\n";		<< (*PtrPartition)[CheckingGroups[J].Members[K]] << ")"
		<< "\n";
		}
		}

		OS.indent(Depth) << "Grouped accesses:\n";
		for (unsigned I = 0; I < CheckingGroups.size(); ++I) {
		OS.indent(Depth + 2) << "Group " << I << ":\n";
		OS.indent(Depth + 4) << "(Low: " << *CheckingGroups[I].Low
		<< " High: " << *CheckingGroups[I].High << ")\n";
		for (unsigned J = 0; J < CheckingGroups[I].Members.size(); ++J) {
		OS.indent(Depth + 6) << "Member: " << *Exprs[CheckingGroups[I].Members[J]]
		<< "\n";
		}
}		}
}		}

unsigned LoopAccessInfo::RuntimePointerCheck::getNumberOfChecks(		unsigned LoopAccessInfo::RuntimePointerCheck::getNumberOfChecks(
const SmallVectorImpl<int> *PtrPartition) const {		const SmallVectorImpl<int> *PtrPartition) const {
unsigned NumPointers = Pointers.size();
		unsigned NumPartitions = CheckingGroups.size();
unsigned CheckCount = 0;		unsigned CheckCount = 0;

for (unsigned I = 0; I < NumPointers; ++I)		for (unsigned I = 0; I < NumPartitions; ++I)
for (unsigned J = I + 1; J < NumPointers; ++J)		for (unsigned J = I + 1; J < NumPartitions; ++J)
if (needsChecking(I, J, PtrPartition))		if (needsChecking(CheckingGroups[I], CheckingGroups[J], PtrPartition))
CheckCount++;		CheckCount++;
return CheckCount;		return CheckCount;
}		}

bool LoopAccessInfo::RuntimePointerCheck::needsAnyChecking(		bool LoopAccessInfo::RuntimePointerCheck::needsAnyChecking(
const SmallVectorImpl<int> *PtrPartition) const {		const SmallVectorImpl<int> *PtrPartition) const {
return getNumberOfChecks(PtrPartition) != 0;		unsigned NumPointers = Pointers.size();

		for (unsigned I = 0; I < NumPointers; ++I)
		for (unsigned J = I + 1; J < NumPointers; ++J)
		if (needsChecking(I, J, PtrPartition))
		return true;
		return false;
}		}

namespace {		namespace {
/// \brief Analyses memory accesses in a loop.		/// \brief Analyses memory accesses in a loop.
///		///
/// Checks whether run time pointer checks are needed and builds sets for data		/// Checks whether run time pointer checks are needed and builds sets for data
/// dependence checking.		/// dependence checking.
class AccessAnalysis {		class AccessAnalysis {
[133 lines hidden]	for (auto A : AS) {
unsigned &LeaderId = DepSetId[Leader];		unsigned &LeaderId = DepSetId[Leader];
if (!LeaderId)		if (!LeaderId)
LeaderId = RunningDepId++;		LeaderId = RunningDepId++;
DepId = LeaderId;		DepId = LeaderId;
} else		} else
// Each access has its own dependence set.		// Each access has its own dependence set.
DepId = RunningDepId++;		DepId = RunningDepId++;

RtCheck.insert(SE, TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap);		RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap);

DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');		DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');
} else {		} else {
DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');		DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');
CanDoRT = false;		CanDoRT = false;
}		}
}		}

[29 lines hidden]	for (unsigned j = i + 1; j < NumPointers; ++j) {
if (ASi != ASj) {		if (ASi != ASj) {
DEBUG(dbgs() << "LAA: Runtime check would require comparison between"		DEBUG(dbgs() << "LAA: Runtime check would require comparison between"
" different address spaces\n");		" different address spaces\n");
return false;		return false;
}		}
}		}
}		}

		if (NeedRTCheck && CanDoRT)
		RtCheck.groupChecks(DepCands, IsDepCheckNeeded);

return CanDoRT;		return CanDoRT;
}		}

void AccessAnalysis::processMemAccesses() {		void AccessAnalysis::processMemAccesses() {
// We process the set twice: first we process read-write pointers, last we		// We process the set twice: first we process read-write pointers, last we
// process read-only pointers. This allows us to skip dependence tests for		// process read-only pointers. This allows us to skip dependence tests for
// read-only pointers.		// read-only pointers.

[957 lines hidden]	static Instruction *getFirstInst(Instruction *FirstInst, Value *V,
return nullptr;		return nullptr;
}		}

std::pair<Instruction *, Instruction *> LoopAccessInfo::addRuntimeCheck(		std::pair<Instruction *, Instruction *> LoopAccessInfo::addRuntimeCheck(
Instruction *Loc, const SmallVectorImpl<int> *PtrPartition) const {		Instruction *Loc, const SmallVectorImpl<int> *PtrPartition) const {
if (!PtrRtCheck.Need)		if (!PtrRtCheck.Need)
return std::make_pair(nullptr, nullptr);		return std::make_pair(nullptr, nullptr);

unsigned NumPointers = PtrRtCheck.Pointers.size();
SmallVector<TrackingVH<Value> , 2> Starts;		SmallVector<TrackingVH<Value>, 2> Starts;
SmallVector<TrackingVH<Value> , 2> Ends;		SmallVector<TrackingVH<Value>, 2> Ends;

LLVMContext &Ctx = Loc->getContext();		LLVMContext &Ctx = Loc->getContext();
SCEVExpander Exp(*SE, DL, "induction");		SCEVExpander Exp(*SE, DL, "induction");
Instruction *FirstInst = nullptr;		Instruction *FirstInst = nullptr;

for (unsigned i = 0; i < NumPointers; ++i) {		for (unsigned i = 0; i < PtrRtCheck.CheckingGroups.size(); ++i) {
Value *Ptr = PtrRtCheck.Pointers[i];		const RuntimePointerCheck::CheckingPtrGroup &CG =
		PtrRtCheck.CheckingGroups[i];
		Value *Ptr = PtrRtCheck.Pointers[CG.Members[0]];
const SCEV *Sc = SE->getSCEV(Ptr);		const SCEV *Sc = SE->getSCEV(Ptr);

if (SE->isLoopInvariant(Sc, TheLoop)) {		if (SE->isLoopInvariant(Sc, TheLoop)) {
DEBUG(dbgs() << "LAA: Adding RT check for a loop invariant ptr:" <<		DEBUG(dbgs() << "LAA: Adding RT check for a loop invariant ptr:" << *Ptr
*Ptr <<"\n");		<< "\n");
Starts.push_back(Ptr);		Starts.push_back(Ptr);
Ends.push_back(Ptr);		Ends.push_back(Ptr);
} else {		} else {
DEBUG(dbgs() << "LAA: Adding RT check for range:" << *Ptr << '\n');
unsigned AS = Ptr->getType()->getPointerAddressSpace();		unsigned AS = Ptr->getType()->getPointerAddressSpace();

// Use this type for pointer arithmetic.		// Use this type for pointer arithmetic.
Type *PtrArithTy = Type::getInt8PtrTy(Ctx, AS);		Type *PtrArithTy = Type::getInt8PtrTy(Ctx, AS);
		Value *Start = nullptr, *End = nullptr;

Value *Start = Exp.expandCodeFor(PtrRtCheck.Starts[i], PtrArithTy, Loc);		DEBUG(dbgs() << "LAA: Adding RT check for range:\n");
Value *End = Exp.expandCodeFor(PtrRtCheck.Ends[i], PtrArithTy, Loc);		Start = Exp.expandCodeFor(CG.Low, PtrArithTy, Loc);
		End = Exp.expandCodeFor(CG.High, PtrArithTy, Loc);
		DEBUG(dbgs() << "Start: " << *CG.Low << " End: " << *CG.High << "\n");
Starts.push_back(Start);		Starts.push_back(Start);
Ends.push_back(End);		Ends.push_back(End);
}		}
}		}

IRBuilder<> ChkBuilder(Loc);		IRBuilder<> ChkBuilder(Loc);
// Our instructions might fold to a constant.		// Our instructions might fold to a constant.
Value *MemoryRuntimeCheck = nullptr;		Value *MemoryRuntimeCheck = nullptr;
for (unsigned i = 0; i < NumPointers; ++i) {		for (unsigned i = 0; i < PtrRtCheck.CheckingGroups.size(); ++i) {
for (unsigned j = i+1; j < NumPointers; ++j) {		for (unsigned j = i + 1; j < PtrRtCheck.CheckingGroups.size(); ++j) {
if (!PtrRtCheck.needsChecking(i, j, PtrPartition))		const RuntimePointerCheck::CheckingPtrGroup &CGI =
		PtrRtCheck.CheckingGroups[i];
		const RuntimePointerCheck::CheckingPtrGroup &CGJ =
		PtrRtCheck.CheckingGroups[j];

		if (!PtrRtCheck.needsChecking(CGI, CGJ, PtrPartition))
continue;		continue;

unsigned AS0 = Starts[i]->getType()->getPointerAddressSpace();		unsigned AS0 = Starts[i]->getType()->getPointerAddressSpace();
unsigned AS1 = Starts[j]->getType()->getPointerAddressSpace();		unsigned AS1 = Starts[j]->getType()->getPointerAddressSpace();

assert((AS0 == Ends[j]->getType()->getPointerAddressSpace()) &&		assert((AS0 == Ends[j]->getType()->getPointerAddressSpace()) &&
(AS1 == Ends[i]->getType()->getPointerAddressSpace()) &&		(AS1 == Ends[i]->getType()->getPointerAddressSpace()) &&
"Trying to bounds check pointers with different address spaces");		"Trying to bounds check pointers with different address spaces");
[34 lines hidden]	std::pair<Instruction *, Instruction *> LoopAccessInfo::addRuntimeCheck(
return std::make_pair(FirstInst, Check);		return std::make_pair(FirstInst, Check);
}		}

LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,		LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
const DataLayout &DL,		const DataLayout &DL,
const TargetLibraryInfo *TLI, AliasAnalysis *AA,		const TargetLibraryInfo *TLI, AliasAnalysis *AA,
DominatorTree *DT, LoopInfo *LI,		DominatorTree *DT, LoopInfo *LI,
const ValueToValueMap &Strides)		const ValueToValueMap &Strides)
: DepChecker(SE, L), TheLoop(L), SE(SE), DL(DL),		: PtrRtCheck(SE), DepChecker(SE, L), TheLoop(L), SE(SE), DL(DL), TLI(TLI),
TLI(TLI), AA(AA), DT(DT), LI(LI), NumLoads(0), NumStores(0),		AA(AA), DT(DT), LI(LI), NumLoads(0), NumStores(0),
MaxSafeDepDistBytes(-1U), CanVecMem(false),		MaxSafeDepDistBytes(-1U), CanVecMem(false),
StoreToLoopInvariantAddress(false) {		StoreToLoopInvariantAddress(false) {
if (canAnalyzeLoop())		if (canAnalyzeLoop())
analyzeLoop(Strides);		analyzeLoop(Strides);
}		}

void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {		void LoopAccessInfo::print(raw_ostream &OS, unsigned Depth) const {
if (CanVecMem) {		if (CanVecMem) {
[96 lines hidden]
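
The greedy loop in groupChecks, including its comparison budget, can be sketched as follows. This is a simplified model under stated assumptions rather than the patch itself: `Access`, `CheckGroup` and `groupAccesses` are hypothetical names, bounds are constant offsets from a symbolic base, and an access is treated as mergeable into a group exactly when the two share that base.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one access: bounds are constant byte offsets from
// a symbolic base identified by an integer.
struct Access {
  int Base;
  long Low, High;
};

struct CheckGroup {
  int Base;
  long Low, High;
  std::vector<unsigned> Members; // indices into the input class
};

// Greedily groups the accesses of one equivalence class. Each access joins
// the first group it can be compared against. Once more than MaxComparisons
// comparisons have been made, every remaining access gets its own group.
std::vector<CheckGroup> groupAccesses(const std::vector<Access> &Class,
                                      unsigned MaxComparisons) {
  std::vector<CheckGroup> Groups;
  unsigned TotalComparisons = 0;
  for (unsigned I = 0; I < Class.size(); ++I) {
    const Access &A = Class[I];
    assert(A.Low <= A.High && "malformed access");
    bool Merged = false;
    for (CheckGroup &G : Groups) {
      if (TotalComparisons > MaxComparisons)
        break;
      ++TotalComparisons;
      if (G.Base != A.Base)
        continue; // Distance to this group is not a known constant.
      G.Low = std::min(G.Low, A.Low);
      G.High = std::max(G.High, A.High);
      G.Members.push_back(I);
      Merged = true;
      break;
    }
    if (!Merged)
      Groups.push_back(CheckGroup{A.Base, A.Low, A.High, {I}});
  }
  return Groups;
}
```

The budget bounds the cost of grouping at the price of precision: with a budget of 0, four accesses on the same base end up in three groups instead of one, because only the first comparison is allowed.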

llvm/trunk/test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll

; RUN: opt -loop-accesses -analyze < %s | FileCheck %s		; RUN: opt -loop-accesses -analyze < %s | FileCheck %s

; 3 reads and 3 writes should need 12 memchecks

target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnueabi"		target triple = "aarch64--linux-gnueabi"

		; 3 reads and 3 writes should need 12 memchecks
		; CHECK: function 'testf':
; CHECK: Memory dependences are safe with run-time checks		; CHECK: Memory dependences are safe with run-time checks
; Memory dependecies have labels starting from 0, so in
		; Memory dependencies have labels starting from 0, so in
; order to verify that we have n checks, we look for		; order to verify that we have n checks, we look for
; (n-1): and not n:.		; (n-1): and not n:.

; CHECK: Run-time memory checks:		; CHECK: Run-time memory checks:
; CHECK-NEXT: 0:		; CHECK-NEXT: Check 0:
; CHECK: 11:		; CHECK: Check 11:
; CHECK-NOT: 12:		; CHECK-NOT: Check 12:

define void @testf(i16* %a,		define void @testf(i16* %a,
i16* %b,		i16* %b,
i16* %c,		i16* %c,
i16* %d,		i16* %d,
i16* %e,		i16* %e,
i16* %f) {		i16* %f) {
entry:		entry:
[26 lines hidden]	for.body: ; preds = %for.body, %entry
store i16 %mul1, i16* %arrayidxF, align 2		store i16 %mul1, i16* %arrayidxF, align 2

%exitcond = icmp eq i64 %add, 20		%exitcond = icmp eq i64 %add, 20
br i1 %exitcond, label %for.end, label %for.body		br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret void		ret void
}		}
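
The figure of 12 memchecks expected for testf follows from the pairwise rule in needsChecking: two read-only pointers never need a check. A minimal sketch of that count is shown below. `countChecks` is a hypothetical helper, and it assumes every pointer sits in its own dependence set and all pointers share one alias set, so only the read/write rule applies.

```cpp
#include <cstddef>
#include <vector>

// Counts the unordered pairs that need a run-time check, assuming a pair
// needs one whenever at least one of its two pointers is written through.
unsigned countChecks(const std::vector<bool> &IsWrite) {
  unsigned Count = 0;
  for (std::size_t I = 0; I < IsWrite.size(); ++I)
    for (std::size_t J = I + 1; J < IsWrite.size(); ++J)
      if (IsWrite[I] || IsWrite[J])
        ++Count;
  return Count;
}
```

For three reads and three writes there are 15 pairs in total, 3 of which are read-read, leaving 12. Applied to groups rather than pointers, two read groups and one write group give 2 checks.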

		; The following (testg and testh) check that we can group
		; memory checks of accesses which differ by a constant value.
		; Both tests are based on the following C code:
		;
		; void testh(short *a, short *b, short *c) {
		; unsigned long ind = 0;
		; for (unsigned long ind = 0; ind < 20; ++ind) {
		; c[2 * ind] = a[ind] * a[ind + 1];
		; c[2 * ind + 1] = a[ind] * a[ind + 1] * b[ind];
		; }
		; }
		;
		; It is sufficient to check the intervals
		; [a, a + 21], [b, b + 20] against [c, c + 41].

		; 3 reads and 2 writes - two of the reads can be merged,
		; and the writes can be merged as well. This gives us a
		; total of 2 memory checks.

		; CHECK: function 'testg':

		; CHECK: Run-time memory checks:
		; CHECK-NEXT: Check 0:
		; CHECK-NEXT: Comparing group 0:
		; CHECK-NEXT: %arrayidxA1 = getelementptr inbounds i16, i16* %a, i64 %add
		; CHECK-NEXT: %arrayidxA = getelementptr inbounds i16, i16* %a, i64 %ind
		; CHECK-NEXT: Against group 2:
		; CHECK-NEXT: %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
		; CHECK-NEXT: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
		; CHECK-NEXT: Check 1:
		; CHECK-NEXT: Comparing group 1:
		; CHECK-NEXT: %arrayidxB = getelementptr inbounds i16, i16* %b, i64 %ind
		; CHECK-NEXT: Against group 2:
		; CHECK-NEXT: %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
		; CHECK-NEXT: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
		; CHECK-NEXT: Grouped accesses:
		; CHECK-NEXT: Group 0:
		; CHECK-NEXT: (Low: %a High: (40 + %a))
		; CHECK-NEXT: Member: {(2 + %a),+,2}
		; CHECK-NEXT: Member: {%a,+,2}
		; CHECK-NEXT: Group 1:
		; CHECK-NEXT: (Low: %b High: (38 + %b))
		; CHECK-NEXT: Member: {%b,+,2}
		; CHECK-NEXT: Group 2:
		; CHECK-NEXT: (Low: %c High: (78 + %c))
		; CHECK-NEXT: Member: {(2 + %c),+,4}
		; CHECK-NEXT: Member: {%c,+,4}

		define void @testg(i16* %a,
		i16* %b,
		i16* %c) {
		entry:
		br label %for.body

		for.body: ; preds = %for.body, %entry
		%ind = phi i64 [ 0, %entry ], [ %add, %for.body ]
		%store_ind = phi i64 [ 0, %entry ], [ %store_ind_next, %for.body ]

		%add = add nuw nsw i64 %ind, 1
		%store_ind_inc = add nuw nsw i64 %store_ind, 1
		%store_ind_next = add nuw nsw i64 %store_ind_inc, 1

		%arrayidxA = getelementptr inbounds i16, i16* %a, i64 %ind
		%loadA = load i16, i16* %arrayidxA, align 2

		%arrayidxA1 = getelementptr inbounds i16, i16* %a, i64 %add
		%loadA1 = load i16, i16* %arrayidxA1, align 2

		%arrayidxB = getelementptr inbounds i16, i16* %b, i64 %ind
		%loadB = load i16, i16* %arrayidxB, align 2

		%mul = mul i16 %loadA, %loadA1
		%mul1 = mul i16 %mul, %loadB

		%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
		store i16 %mul1, i16* %arrayidxC, align 2

		%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
		store i16 %mul, i16* %arrayidxC1, align 2

		%exitcond = icmp eq i64 %add, 20
		br i1 %exitcond, label %for.end, label %for.body

		for.end: ; preds = %for.body
		ret void
		}
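
Editorial note: the Low/High values expected for testg can be recomputed by hand. The sketch below is a simplified model, not the LoopAccessAnalysis implementation. It assumes each group member is a pair (start offset in bytes, step in bytes) relative to the group's base pointer, and that the backedge is taken 19 times (the body runs for ind = 0..19). The names `group_bounds`, `groups` and `written` are hypothetical.

```python
from itertools import combinations


def group_bounds(members, backedge_taken_count):
    """Return (low, high) byte offsets covered by all members of one group.

    Each member is (start, step); its last access starts at
    start + step * backedge_taken_count.
    """
    lows, highs = [], []
    for start, step in members:
        first = start
        last = start + step * backedge_taken_count
        lows.append(min(first, last))
        highs.append(max(first, last))
    return min(lows), max(highs)


BACKEDGE_TAKEN_COUNT = 19  # assumption: 20 iterations, ind = 0..19

# Members as printed in the expected output, e.g. {(2 + %a),+,2} -> (2, 2).
groups = {
    "a": [(2, 2), (0, 2)],
    "b": [(0, 2)],
    "c": [(2, 4), (0, 4)],
}
written = {"c"}  # only the accesses to %c are stores

bounds = {name: group_bounds(m, BACKEDGE_TAKEN_COUNT) for name, m in groups.items()}
# A pair of groups needs a check when at least one of them is written.
checks = [(x, y) for x, y in combinations(groups, 2) if x in written or y in written]
print(bounds)  # {'a': (0, 40), 'b': (0, 38), 'c': (0, 78)}
print(checks)  # [('a', 'c'), ('b', 'c')]
```

The bounds match the (Low, High) pairs in the CHECK lines, and only two group pairs remain to be checked.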

		; 3 reads and 2 writes - the writes can be merged into a single
		; group, but the GEPs used for the reads are not marked as inbounds.
		; We can still merge them because we are using a unit stride for
		; accesses, so we cannot overflow the GEPs.

		; CHECK: function 'testh':
		; CHECK: Run-time memory checks:
		; CHECK-NEXT: Check 0:
		; CHECK-NEXT: Comparing group 0:
		; CHECK-NEXT: %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
		; CHECK-NEXT: %arrayidxA = getelementptr i16, i16* %a, i64 %ind
		; CHECK-NEXT: Against group 2:
		; CHECK-NEXT: %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
		; CHECK-NEXT: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
		; CHECK-NEXT: Check 1:
		; CHECK-NEXT: Comparing group 1:
		; CHECK-NEXT: %arrayidxB = getelementptr i16, i16* %b, i64 %ind
		; CHECK-NEXT: Against group 2:
		; CHECK-NEXT: %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
		; CHECK-NEXT: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
		; CHECK-NEXT: Grouped accesses:
		; CHECK-NEXT: Group 0:
		; CHECK-NEXT: (Low: %a High: (40 + %a))
		; CHECK-NEXT: Member: {(2 + %a),+,2}
		; CHECK-NEXT: Member: {%a,+,2}
		; CHECK-NEXT: Group 1:
		; CHECK-NEXT: (Low: %b High: (38 + %b))
		; CHECK-NEXT: Member: {%b,+,2}
		; CHECK-NEXT: Group 2:
		; CHECK-NEXT: (Low: %c High: (78 + %c))
		; CHECK-NEXT: Member: {(2 + %c),+,4}
		; CHECK-NEXT: Member: {%c,+,4}

		define void @testh(i16* %a,
		i16* %b,
		i16* %c) {
		entry:
		br label %for.body

		for.body: ; preds = %for.body, %entry
		%ind = phi i64 [ 0, %entry ], [ %add, %for.body ]
		%store_ind = phi i64 [ 0, %entry ], [ %store_ind_next, %for.body ]

		%add = add nuw nsw i64 %ind, 1
		%store_ind_inc = add nuw nsw i64 %store_ind, 1
		%store_ind_next = add nuw nsw i64 %store_ind_inc, 1

		%arrayidxA = getelementptr i16, i16* %a, i64 %ind
		%loadA = load i16, i16* %arrayidxA, align 2

		%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
		%loadA1 = load i16, i16* %arrayidxA1, align 2

		%arrayidxB = getelementptr i16, i16* %b, i64 %ind
		%loadB = load i16, i16* %arrayidxB, align 2

		%mul = mul i16 %loadA, %loadA1
		%mul1 = mul i16 %mul, %loadB

		%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
		store i16 %mul1, i16* %arrayidxC, align 2

		%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
		store i16 %mul, i16* %arrayidxC1, align 2

		%exitcond = icmp eq i64 %add, 20
		br i1 %exitcond, label %for.end, label %for.body

		for.end: ; preds = %for.body
		ret void
		}

llvm/trunk/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll


	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.10.0"			target triple = "x86_64-apple-macosx10.10.0"

	; CHECK: Memory dependences are safe with run-time checks			; CHECK: Memory dependences are safe with run-time checks
	; CHECK-NEXT: Interesting Dependences:			; CHECK-NEXT: Interesting Dependences:
	; CHECK-NEXT: Run-time memory checks:			; CHECK-NEXT: Run-time memory checks:
	; CHECK-NEXT: 0:			; CHECK-NEXT: 0:
				; CHECK-NEXT: Comparing group
	; CHECK-NEXT: %arrayidxA2 = getelementptr inbounds i16, i16* %a, i64 %idx			; CHECK-NEXT: %arrayidxA2 = getelementptr inbounds i16, i16* %a, i64 %idx
				; CHECK-NEXT: Against group
	; CHECK-NEXT: %arrayidxA = getelementptr inbounds i16, i16* %a, i64 %indvar			; CHECK-NEXT: %arrayidxA = getelementptr inbounds i16, i16* %a, i64 %indvar

	@B = common global i16* null, align 8			@B = common global i16* null, align 8
	@A = common global i16* null, align 8			@A = common global i16* null, align 8
	@C = common global i16* null, align 8			@C = common global i16* null, align 8

	define void @f(i64 %offset) {			define void @f(i64 %offset) {
	entry:			entry:

llvm/trunk/test/Analysis/LoopAccessAnalysis/unsafe-and-rt-checks.ll

	; RUN: opt -loop-accesses -analyze < %s | FileCheck %s			; RUN: opt -loop-accesses -analyze < %s | FileCheck %s

	; Analyze this loop:			; Analyze this loop:
	; for (i = 0; i < n; i++)			; for (i = 0; i < n; i++)
	; A[i + 1] = A[i] * B[i] * C[i];			; A[i + 1] = A[i] * B[i] * C[i];

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.10.0"			target triple = "x86_64-apple-macosx10.10.0"

	; CHECK: Report: unsafe dependent memory operations in loop			; CHECK: Report: unsafe dependent memory operations in loop
	; CHECK-NEXT: Interesting Dependences:			; CHECK-NEXT: Interesting Dependences:
	; CHECK-NEXT: Backward:			; CHECK-NEXT: Backward:
	; CHECK-NEXT: %loadA = load i16, i16* %arrayidxA, align 2 ->			; CHECK-NEXT: %loadA = load i16, i16* %arrayidxA, align 2 ->
	; CHECK-NEXT: store i16 %mul1, i16* %arrayidxA_plus_2, align 2			; CHECK-NEXT: store i16 %mul1, i16* %arrayidxA_plus_2, align 2
	; CHECK: Run-time memory checks:			; CHECK: Run-time memory checks:
	; CHECK-NEXT: 0:			; CHECK-NEXT: 0:
				; CHECK-NEXT: Comparing group
				; CHECK-NEXT: %arrayidxA = getelementptr inbounds i16, i16* %a, i64 %storemerge3
	; CHECK-NEXT: %arrayidxA_plus_2 = getelementptr inbounds i16, i16* %a, i64 %add			; CHECK-NEXT: %arrayidxA_plus_2 = getelementptr inbounds i16, i16* %a, i64 %add
				; CHECK-NEXT: Against group
	; CHECK-NEXT: %arrayidxB = getelementptr inbounds i16, i16* %b, i64 %storemerge3			; CHECK-NEXT: %arrayidxB = getelementptr inbounds i16, i16* %b, i64 %storemerge3
	; CHECK-NEXT: 1:			; CHECK-NEXT: 1:
				; CHECK-NEXT: Comparing group
				; CHECK-NEXT: %arrayidxA = getelementptr inbounds i16, i16* %a, i64 %storemerge3
	; CHECK-NEXT: %arrayidxA_plus_2 = getelementptr inbounds i16, i16* %a, i64 %add			; CHECK-NEXT: %arrayidxA_plus_2 = getelementptr inbounds i16, i16* %a, i64 %add
				; CHECK-NEXT: Against group
	; CHECK-NEXT: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %storemerge3			; CHECK-NEXT: %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %storemerge3

	@B = common global i16* null, align 8			@B = common global i16* null, align 8
	@A = common global i16* null, align 8			@A = common global i16* null, align 8
	@C = common global i16* null, align 8			@C = common global i16* null, align 8

	define void @f() {			define void @f() {
	entry:			entry:

llvm/trunk/test/Transforms/LoopDistribute/basic-with-memchecks.ll

	entry:			entry:
	%a = load i32, i32* @A, align 8			%a = load i32, i32* @A, align 8
	%b = load i32, i32* @B, align 8			%b = load i32, i32* @B, align 8
	%c = load i32, i32* @C, align 8			%c = load i32, i32* @C, align 8
	%d = load i32, i32* @D, align 8			%d = load i32, i32* @D, align 8
	%e = load i32, i32* @E, align 8			%e = load i32, i32* @E, align 8
	br label %for.body			br label %for.body

	; We have two compares for each array overlap check which is a total of 10			; We have two compares for each array overlap check.
	; compares.			; Since the checks to A and A + 4 get merged, this will give us a
				; total of 8 compares.
	;			;
	; CHECK: for.body.lver.memcheck:			; CHECK: for.body.lver.memcheck:
	; CHECK: = icmp			; CHECK: = icmp
	; CHECK: = icmp			; CHECK: = icmp

	; CHECK: = icmp			; CHECK: = icmp
	; CHECK: = icmp			; CHECK: = icmp

	; CHECK: = icmp			; CHECK: = icmp
	; CHECK: = icmp			; CHECK: = icmp

	; CHECK: = icmp			; CHECK: = icmp
	; CHECK: = icmp			; CHECK: = icmp

	; CHECK: = icmp
	; CHECK: = icmp

	; CHECK-NOT: = icmp			; CHECK-NOT: = icmp
	; CHECK: br i1 %memcheck.conflict, label %for.body.ph.lver.orig, label %for.body.ph.ldist1			; CHECK: br i1 %memcheck.conflict, label %for.body.ph.lver.orig, label %for.body.ph.ldist1

	; The non-distributed loop that the memchecks fall back on.			; The non-distributed loop that the memchecks fall back on.

	; CHECK: for.body.ph.lver.orig:			; CHECK: for.body.ph.lver.orig:
	; CHECK: br label %for.body.lver.orig			; CHECK: br label %for.body.lver.orig
	; CHECK: for.body.lver.orig:			; CHECK: for.body.lver.orig:
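
Editorial note: the change from 10 to 8 compares follows from the two compares per overlap check mentioned in the test comment. The sketch below assumes a check of the form "two intervals conflict when each one starts no later than the other ends", using inclusive bounds. The function names are hypothetical and the actual emitted IR may differ in detail.

```python
COMPARES_PER_CHECK = 2  # stated in the test comment


def intervals_conflict(low_a, high_a, low_b, high_b):
    """Overlap test for two inclusive intervals, using exactly two compares."""
    bound0 = low_a <= high_b  # first compare
    bound1 = low_b <= high_a  # second compare
    return bound0 and bound1


def total_compares(num_checks):
    """Number of icmp instructions needed for the given number of checks."""
    return COMPARES_PER_CHECK * num_checks


print(total_compares(5))  # prints: 10 (before merging)
print(total_compares(4))  # prints: 8 (after the checks on A and A + 4 merge)
```

Merging two accesses into one group removes one check, which removes two compares from the memcheck block.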


Revision Contents

Diff 29252

llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h

llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp

llvm/trunk/test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll

llvm/trunk/test/Analysis/LoopAccessAnalysis/resort-to-memchecks-only.ll

llvm/trunk/test/Analysis/LoopAccessAnalysis/unsafe-and-rt-checks.ll

llvm/trunk/test/Transforms/LoopDistribute/basic-with-memchecks.ll
