This is an archive of the discontinued LLVM Phabricator instance.

[LAA] Merge memchecks for accesses separated by a constant offset
ClosedPublic

Authored by sbaranga on Jun 11 2015, 5:55 AM.

Details

Summary

Often filter-like loops will do memory accesses that are
separated by constant offsets. In these cases it is
common that we will exceed the threshold for the
allowable number of checks.

However, it should be possible to merge such checks,
sice a check of any interval againt two other intervals separated
by a constant offset (a,b), (a+c, b+c) will be equivalent with
a check againt (a, b+c), as long as (a,b) and (a+c, b+c) overlap.
Assuming the loop will be executed for a sufficient number of
iterations, this will be true. If not true, checking against
(a, b+c) is still safe (although not equivalent).

As long as there are no dependencies between two accesses,
we can merge their checks into a single one. We use this
technique to construct groups of accesses, and then check
the intervals associated with the groups instead of
checking the accesses directly.

Diff Detail

Repository
rL LLVM

Event Timeline

sbaranga updated this revision to Diff 27499.Jun 11 2015, 5:55 AM
sbaranga retitled this revision from to [LAA] Merge memchecks for accesses separated by a constant offset.
sbaranga updated this object.
sbaranga edited the test plan for this revision. (Show Details)
sbaranga added a subscriber: Unknown Object (MLST).

Added llvm-commits as a subscriber (I'm adding this comment just to get phabricator to send out notice emails).

anemet added inline comments.Jun 16 2015, 11:16 PM
include/llvm/Analysis/LoopAccessAnalysis.h
350–352 ↗(On Diff #27499)

How about calling it "needsChecking" then (but on Groups now)? "Dependent" does not seem specific enough in this context.

Three slashes and \brief in the comment. There is more of the same later.

351 ↗(On Diff #27499)

You must mean "needsChecking" here.

357–358 ↗(On Diff #27499)

If GroupedChecks is the return value here, it should be returned by value rather than passed-by-reference.

lib/Analysis/LoopAccessAnalysis.cpp
181 ↗(On Diff #27499)

Why isn't this just insert and Leader == I?

190–191 ↗(On Diff #27499)

Why is inbounds relevant here?

209–212 ↗(On Diff #27499)

Please run this through clang-format-diff.py

214 ↗(On Diff #27499)

Can we already have Leader at this point in the EC?

I also think that Value and Leader are not really good names here. Perhaps "Set" and "Pointer"?

I read this far but I am not sure I understand this algorithm. It probably needs a big comment at the beginning explaining what's going on.

Also it looks quadratic. Could we do something better by perhaps analyzing the start values of the ARs and then only trying to match those that share the same base pointer? We may actually already have these "related" pointers in AccessAnalysis::DepCands. (The name is not great. This are candidates for dependence analysis, i.e. pointers that share the same underlying object.)

test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll
1 ↗(On Diff #27499)

I don't understand this change. Now instead of unit-testing a pass, you run the entire -O1 pipeline?!

61–79 ↗(On Diff #27499)

It would be good to have a C version of these loop in comment as well.

sbaranga updated this revision to Diff 27840.Jun 17 2015, 9:26 AM

Ran through clang format to improve formatting.
Added text to explain algorithm and did some renames that
should improve readability.

Thanks! I've uploaded a new version.

-Silviu

include/llvm/Analysis/LoopAccessAnalysis.h
350–352 ↗(On Diff #27499)

Good idea! Should be changed now.

lib/Analysis/LoopAccessAnalysis.cpp
181 ↗(On Diff #27499)

It is. I also renamed I to Pointer for clarity in the newest version.

190–191 ↗(On Diff #27499)

I was concerned that possible pointer wrapping might affect the result of some comparisons.
For example if we have a (a pointer ) and c a positive constant, a + c might be smaller than a if we wrap.

However I see that we are already excluding this case somewhere else for all accesses, so it should be ok to remove this.

214 ↗(On Diff #27499)

We do have Leader (currently renamed) in the EC (the last one being added).

I've renamed to FirstIndexInSet and Pointer in the latest revision.

The algorithm is quadratic. However, the number of pointers is currently bounded (maximum 100 I believe). Also, the existing needsAnyChecking() etc are also quadratic, and we only run this when we would normally run something that is also quadratic, which made me think that it might not be that bad.

I had a look at the DepCands, and it looks like they would need to be processed further as we want to partition into a sets of pointers which don't need checks within each set (and I don't think that's the case with DepCands). I think this would bring us back at a quadratic algorithm.

One thing that we could do is iterate within DepCands for each pointer instead of iterating over the entire set of pointers. Matching only pointers with the same underlying object is also an option.

test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll
1 ↗(On Diff #27499)

Removed the -O1. However we now don't see some functions as being 'safe', but that should be ok for testing this change.

anemet edited edge metadata.Jun 17 2015, 5:24 PM
lib/Analysis/LoopAccessAnalysis.cpp
214 ↗(On Diff #27499)

A DepCands equivalence class does not need *runtime* checks between its elements. (We analyze the accesses within such set with the MemoryDependenceChecker.)

I think that all we may need to do is to find the min start bound and the max end bound of a class within DepCands. This should cover the range for the entire class.

What do you think?

sbaranga added inline comments.Jun 18 2015, 8:13 AM
lib/Analysis/LoopAccessAnalysis.cpp
214 ↗(On Diff #27499)

Ah, yes. That makes sense. I think it would be possible to check the value in DependencySetId instead of getting access to DepCands?

I agree, the idea behind this is to get the min and max of these objects and use that for the whole class. I think we are essentially doing the same thing here, except we only know how to compare some pairs of pointers (the very simple case where the difference is a constant).

The biggest problem as far as I can see is that we could in theory have pointers to the same underlying object which
we can't order (and it might be impossible to determine the full order in all cases) I think we have two options in this case:

  • we can create separate groups, where we can order all elements within a group (we do this in the current solution)
  • we can create umin/umax SCEV expressions - this leads to huge SCEV expressions and is essentially a way to hide the cost of the checks.

I will refactor the code so that we will have some comparison function for pointers and no mentions to constants outside it, and that can be easily extended to support more cases for pointer comparisons. Does that sound like a good idea?

[Moved this from an inline comment so that it's easier to reply inline]

Ah, yes. That makes sense. I think it would be possible to check the value in
DependencySetId instead of getting access to DepCands?

Yes, I believe so. There is a fallback case where we disable the MemoryDependenceChecker. In this case we could disable merging too.

I agree, the idea behind this is to get the min and max of these objects and
use that for the whole class. I think we are essentially doing the same thing
here, except we only know how to compare some pairs of pointers (the very
simple case where the difference is a constant).

The biggest problem as far as I can see is that we could in theory have
pointers to the same underlying object which we can't order (and it might be
impossible to determine the full order in all cases)

I think we already have this distinction available as well. You're right that this could happen within a DepCands set. These I believe would be marked Dependence::Unknown.

So perhaps the approach should be to build the groups from the InterestingDependences. Any known dependence would put the two participating pointers in the same group.

(You may need to start recording forward dependences as well which we don't do currently.)

I think we have two
options in this case:

  • we can create separate groups, where we can order all elements within a

group (we do this in the current solution)

I'd go for this initially.

  • we can create umin/umax SCEV expressions - this leads to huge SCEV

expressions and is essentially a way to hide the cost of the checks.

I will refactor the code so that we will have some comparison function for
pointers and no mentions to constants outside it, and that can be easily
extended to support more cases for pointer comparisons. Does that sound like a
good idea?

I think so but I am not completely sure I understand this last paragraph.

Adam

lib/Analysis/LoopAccessAnalysis.cpp
214 ↗(On Diff #27499)

I can't quote the lines in reply to an inline comment so moving this to the main comment area.

Hi Adam,

The biggest problem as far as I can see is that we could in theory have
pointers to the same underlying object which we can't order (and it might be
impossible to determine the full order in all cases)

I think we already have this distinction available as well. You're right that this could happen within a DepCands set. These I believe would be marked Dependence::Unknown.

So perhaps the approach should be to build the groups from the InterestingDependences. Any known dependence would put the two participating pointers in the same group.

(You may need to start recording forward dependences as well which we don't do currently.)

I've looked at this in more detail, and there is a problem with NoDep dependencies since they are not considered to be either forward or backward. We need to consider these as well, since otherwise we would not cover common cases (and also wouldn't solve the problem for interleaved accesses).

We could add some new types of NoDeps to get that information. I hope that adding these as well (plus the forward ones) won't exceed the maximum number of interesting dependencies too often.

Thanks,
Silviu

I've looked at this in more detail, and there is a problem with NoDep dependencies since they are not considered to be either forward or backward. We need to consider these as well, since otherwise we would not cover common cases (and also wouldn't solve the problem for interleaved accesses).

Good point but let's see first if we can do this without adding "fake" dependence types for these. Sounds like we should be able to work this out with using *both* DepCands and InterestingDependences:

If the pointers fall in the same set in DepCands and don't have unknown dependence between then we should be able to add them in the same checking pointer group. Do you agree?

I've looked at this in more detail, and there is a problem with NoDep dependencies since they are not considered to be either forward or backward. We need to consider these as well, since otherwise we would not cover common cases (and also wouldn't solve the problem for interleaved accesses).

Good point but let's see first if we can do this without adding "fake" dependence types for these. Sounds like we should be able to work this out with using *both* DepCands and InterestingDependences:

If the pointers fall in the same set in DepCands and don't have unknown dependence between then we should be able to add them in the same checking pointer group. Do you agree?

The problem is that for each checking pointer group we need to get the min/max pointers (otherwise we wouldn't know how to emit memchecks). If we don't have dependencies, then we wouldn't be able to figure out what the bounds are. This might be easier with an example that shows all the issues:

void test(char *in, char * out, int n, int s) {

for (int i = 0; i < n; ++i)
  out[i] = in[i] + in[i + 1] + in[i + 2] + in[i + 3] + in[i + s];

}

All the accesses to "in" are in the same equivalence class in DepCands, but we get no interesting dependencies (we get only NoDep because there are only reads). If we would say that all these accesses belong to the same pointer checking group, then we would end up having to compare in + s with i and i + 1, etc which we don't know how to do. Since we don't get any dependencies we also don't know how to compare i + 1 with i + 2 etc.

wrt to the pointer checking partitioning algorithm, I think because we need to get the min/max pointers for each group, we can't join two groups when we only see a dependency between them:

Let's say we have 3 pointers, a, b and c and two dependencies for which we can say: a > b and a > c. However we don't know anything about how b and c compare. This can happen when recording too many dependencies, or ScalarEvolution might not return a constant for b - c, or some other condition. If we merge two group checks whenever seeing an edge between them, then we end up with (a, b, c) as a pointer group check. But we don't know what the lower bound of this group is (it's either b or c), so we would end up in some invalid state.

One solution would be to track the min/max elements of each pointer checking group and only add other pointers to it as long as we can keep this property (we know what the min/max pointers are).

Regarding dependencies: maybe it would be somewhat simpler to compare the pointers on the fly (the comparison is simple I think) - and limit the number of iterations that we do when grouping the pointers? Dependencies are closely related to ordering (we need to order pointers when computing dependency types), but different enough to cause problems.

Thanks,
Silviu

OK, so I guess we're back to the original idea of using DepCands to limit the search space.

I.e. rather than:

for each p in Pointers:
  for each r in Pointers after p:

we could do:

for each set S in DepCands:
  for each pointer p in S:
    for each pointer r in S after p:
sbaranga updated this revision to Diff 28359.Jun 24 2015, 8:50 AM
sbaranga edited edge metadata.

We now use DepCands to speed up the construction of pointer check groups.
We cache the results of the algorithm, and compute the result at canCheckPtrAtRT
only when it is needed.

We now also have a limit on the number of comparisons that the grouping algorithm will perform.

Longish comment down is my main question. We should probably resolve that first.

include/llvm/Analysis/LoopAccessAnalysis.h
335 ↗(On Diff #28359)

A grouping *of* pointers

337 ↗(On Diff #28359)

CheckingGroup or CheckingPtrGroup probably sounds better.

338 ↗(On Diff #28359)

s/0/nullptr

339 ↗(On Diff #28359)

s/wich/which

later too

lib/Analysis/LoopAccessAnalysis.cpp
166–171 ↗(On Diff #28359)

I don't get this last point -- seems pretty recursive.

194 ↗(On Diff #28359)

s/indeces/indices

196–199 ↗(On Diff #28359)

Why do we need to compare against the same element?

202–213 ↗(On Diff #28359)

Shouldn't we only look through the members if DI is a leader?

208–211 ↗(On Diff #28359)

I don't understand what you gain by using an EC for this.

It seems to me that you just want a vector of CheckGroups for each DepCands set.

For each pointer then you go through this vector and try to merge to an existing group.

If nothing found you add a new group at the end of the vector.

As another general comment, it would be good to push some of the low-level mechanics into the CheckingGroup class so that the outline of algorithm is separated from the details.

242–243 ↗(On Diff #28359)

Nit: fold the increment in the comparison?

anemet added inline comments.Jun 25 2015, 1:52 PM
lib/Analysis/LoopAccessAnalysis.cpp
208–211 ↗(On Diff #28359)

I just want to clarify one more thing about the above algorithm above using a local vector of CheckingGroup instead of an EC.

When we're done merging for a DepCands set, then you'd merge the "local" vector of CheckingGroups to the main one.

sbaranga updated this revision to Diff 28564.Jun 26 2015, 8:55 AM

Renamed CheckGroup to CheckingPtrGroup.

Remove the use of EquivalenceClasses and all the other
vectors used in the merging algorithm. We now use a single
SmallVector of CheckingPtrGroups instead (which seems to
be enough). This should make everything more readable.

Fixed the iteration of DepCands (was missing a isLeader check).

Moved the code that adds a pointer to a CheckingPtrGroup
(and checks that this can be done) to CheckingPtrGroup.

Change the constructor of CheckingPtrGroup to take the index
of the first element that makes up the group. We only need
to construct groups with at least one element and the previous
constructor was only in a single place.

Various comment changes / spelling fixes according to comments.

Hi Adam,

Thanks for the comments! I've uploaded a new version, which should (hopefully) deal with all the issues raised.

Thanks,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
196–199 ↗(On Diff #28359)

Sorry, the last sentence of that comment was stale. Removed in the new version.

202–213 ↗(On Diff #28359)

We should! Nice catch!

208–211 ↗(On Diff #28359)

I've updated the algorithm. By using a SmallVector of CheckingPtrGroup we got rid of the not only the EquivalenceClass, but also the other vectors (Mins, Maxs, etc).

Wow, looks great, IMO. Thanks for your work, Silviu!

Some minor things:

include/llvm/Analysis/LoopAccessAnalysis.h
346 ↗(On Diff #28564)

s/recored/recorded

357–362 ↗(On Diff #28564)

These are not indices but SCEVs.

For the record, I agree that they should be SCEVs but the comment is wrong.

420–421 ↗(On Diff #28564)

How about CheckingGroups?

lib/Analysis/LoopAccessAnalysis.cpp
150 ↗(On Diff #28564)

Put \p before I and J

231–242 ↗(On Diff #28564)

Looks like you could formulate this with a range-based loop.

235–236 ↗(On Diff #28564)

I think that you also want to break out from the entire loop-nest here. (I'd probably go with a goto in this case.)

253–254 ↗(On Diff #28564)

Can this be written with std::copy? I.e.:

std::copy(Groups.begin(), Groups.end(), std::back_inserter(CheckGroups))
test/Analysis/LoopAccessAnalysis/unsafe-and-rt-checks.ll
16–19 ↗(On Diff #28564)

Can we make it clearer where one group ends and the other one starts when printing a check?

sbaranga updated this revision to Diff 28657.Jun 29 2015, 4:29 AM

Fixed a number of spelling errors, missed doxygen tags and out of date comments
(according to the review). Also added further comments to explain why we don't
want to break from the loop nest when reaching the maximum number of comparisons
(we want to break only from the inner loop).

Renamed CheckGroups to CheckingGroups.

We are now using std::copy to add the newly computed CheckingPtrGroups to the
global solution. Also replaced the inner loop with a range-based loop.

Modified the printing of the memchecks to separate the groups withing each
memcheck. The new format looks like:

Check 0:
   Comparing group 0:
     ....
   Against group 1:
     ....
Check 1:

etc

Updated the tests to use this new format.

Hi Adam,

Thanks! Please see the updated version. It should contain all of the requested changes except one. I've added a comment for that one in the review and further comments in the code to explain it.

Thanks,
Silviu

lib/Analysis/LoopAccessAnalysis.cpp
235–236 ↗(On Diff #28564)

I think it is correct. When we reach the comparisons limit we want to default to creating a new group for each pointer.
Otherwise if we would break from the loop nest we would end up ignoring the pointers that we didn't process.

I've also remembered why I didn't previously fold the increment to TotalComparisons into the if statement. Once we reach the threshold we don't want to increment it further.

anemet accepted this revision.Jun 29 2015, 5:21 PM
anemet edited edge metadata.

LGTM, thank you!

lib/Analysis/LoopAccessAnalysis.cpp
247–252 ↗(On Diff #28657)

No {}

292–295 ↗(On Diff #28657)

I would drop this. Also moving the def of N right before the loop would be a good thing.

552–553 ↗(On Diff #28657)

For the record, this is not where want to call this in the long run but I am OK with this for now.

DepCands is not exposed right now but moving forward we want to initiate the grouping from the client so that things like PtrPartitions could be passed in.

I have some plans for how to do this so this will change anyway pretty soon.

test/Analysis/LoopAccessAnalysis/number-of-memchecks.ll
76–89 ↗(On Diff #28657)

I'd have a slight preference to fully match these lines rather than the number of checks/groups.

This revision is now accepted and ready to land.Jun 29 2015, 5:21 PM
anemet added inline comments.Jun 30 2015, 10:52 AM
lib/Analysis/LoopAccessAnalysis.cpp
222 ↗(On Diff #28657)

Same no struct here either.

231 ↗(On Diff #28657)

Sorry didn't notice these the first time around:

  • no need for struct in C++
  • since this is no longer an iterator, we should probably call it Group or G or whatever
sbaranga updated this revision to Diff 28874.Jul 1 2015, 9:00 AM
sbaranga edited edge metadata.

Constrain grouping algorithm to only merge pointers with maximum distance of 512.
This stops a (pottential) 20% regression in TSVC/Symbolics-flt. Added a regression test for this.

Remove redundant 'struct' keyword, renamed group iterator to 'Group'.
Update tests to fully match lines with CHECK-NEXT.

Hi Adam,

Sorry for delaying the commit, but I have a last minute change for this. It seems that this change would cause a regression in the TSVC/Symbolics-flt benchmark.

The regression happens when executing a code that looks like this:

struct Data{

int a[LEN];
int B[LEN];

};

void test(int m) {

for (int i = 0; i < m; ++i)
  Data.a[i + m] = Data.a[i] + Data.b[i];

}

a[i] and b[i] get merged according to the algorithm. However the resulting interval covers a[i + m], and the runtime check fails.
I've limited the merging algorithm to only merge when the distance to the max/min is less then a constant (I've chosen 512 for this), which should prevent this issues from happening (at least for loops with more than 512 iterations). Does this sound reasonable?

That benchmark was the only regression that I've seen (lnt, spec2k,spec2k), but I've not seen any improvements either - so either further tuning is required or we would need to turn on the interleaved accesses to see any gains.

Thanks,
Silviu

Hi Silviu,

The regression happens when executing a code that looks like this:

struct Data{

int a[LEN];
int B[LEN];

};

void test(int m) {

for (int i = 0; i < m; ++i)
  Data.a[i + m] = Data.a[i] + Data.b[i];

}

a[i] and b[i] get merged according to the algorithm. However the resulting interval covers a[i + m], and the runtime check fails.

This is the way I look at this problem, please correct me if I am missing something. We have three pointers all pointing to the same underlying object (Data). Let's refer to them as P_am, P_a and P_b respectively.

We were able to determine const distance between P_a and P_b however P_am is not constant. In this case it seems to me that when analyzing P_am, we want to veto the merge of P_a and P_b because as in this case P_am may reside somewhere between the two objects.

This does not seem like a major change to your approach and it does not seem to affect the main use-case. What do you think?

Adam

Hi Adam,

This is the way I look at this problem, please correct me if I am missing something. We have three pointers all pointing to the same underlying object (Data). Let's refer to them as P_am, P_a and P_b respectively.

We were able to determine const distance between P_a and P_b however P_am is not constant. In this case it seems to me that when analyzing P_am, we want to veto the merge of P_a and P_b because as in this case P_am may reside somewhere between the two objects.

This does not seem like a major change to your approach and it does not seem to affect the main use-case. What do you think?

Yes, that would also solve the cases I've seen so far and seems sensible. At this point there isn't any obvious best solution, so I have no preference.

FWIW we always want to group pointers separated by a small constant (if we get a false positive it means that the trip count was small), so veto-ing groups with such pointers would pessimize this case. I'm not sure how frequent the case is though.

Thanks,
Silviu

sbaranga updated this revision to Diff 29085.Jul 6 2015, 7:19 AM

Updated the memcheck grouping algorithm to only use DepCands
if DepCands have been computed. If not, make a separate group
for each pointer. Previously we were using DepCands even when
DepCands wasn't available, and this could have produced
incorrect results.

This should also fixes the algorithm for the case of unknown
dependencies: in this case, we will retry to group pointers
without using dependencies, and create a separate group for
each pointer.

Hi Adam,

Further digging into this has revealed that we should only get into the case where we get the regression if we have an unknown dependency - and we should retry without dependencies. In this case we don't have DepCands so we can't use the current algorithm to group pointers. Do you agree with this?

I've fixed the algorithm to not group pointers in this case (this should be inline with previous comments).

Thanks,
Silviu

sbaranga closed this revision.Jul 8 2015, 2:16 AM
This revision was automatically updated to reflect the committed changes.

Committed in r241673. Thanks for all the help!

-Silviu

anemet added a comment.Jul 8 2015, 9:49 AM

Further digging into this has revealed that we should only get into the case where we get the regression if we have an unknown dependency - and we should retry without dependencies. In this case we don't have DepCands so we can't use the current algorithm to group pointers. Do you agree with this?

No, I don't think that's quite correct.

If I understand the situation correctly, we have two accesses of the same underlying object with non-const distance. So we give up dependence analysis with "LAA: Retrying with memory checks".

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

I would think that you should be able to create a testcase with three pointers on the same objects two with known stride and one without. In this case too we would merge all three but the one with unknown stride to cause the same issue you discovered.

If this is all correct, I still think the best way is the veto-ing algorithm I described originally.

I've fixed the algorithm to not group pointers in this case (this should be inline with previous comments).

Please don't forget to add the testcase that prompted the new version.

Hi Adam,

Further digging into this has revealed that we should only get into the case where we get the regression if we have an unknown dependency - and we should retry without dependencies. In this case we don't have DepCands so we can't use the current algorithm to group pointers. Do you agree with this?

No, I don't think that's quite correct.

If I understand the situation correctly, we have two accesses of the same underlying object with non-const distance. So we give up dependence analysis with "LAA: Retrying with memory checks".

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

I would think that you should be able to create a testcase with three pointers on the same objects two with known stride and one without. In this case too we would merge all three but the one with unknown stride to cause the same issue you discovered.

If this is all correct, I still think the best way is the veto-ing algorithm I described originally.

Ok, that makes sense. I already have the updated veto-ing algorithm, but I'd like to add it under a different review instead of rolling everything back. Is that ok with you?

I've fixed the algorithm to not group pointers in this case (this should be inline with previous comments).

Please don't forget to add the testcase that prompted the new version.

I'll commit a test case for this.

Thanks,
Silviu

anemet added a comment.Jul 9 2015, 1:49 PM

Ok, that makes sense. I already have the updated veto-ing algorithm, but I'd like to add it under a different review instead of rolling everything back. Is that ok with you?

Sure, thanks.

Hi Adam,

Ok, that makes sense. I already have the updated veto-ing algorithm, but I'd like to add it under a different review instead of rolling everything back. Is that ok with you?

Sure, thanks.

I'm still having trouble with producing a testcase for the 'veto' approach.

Looking at the logic in analyzeLoops:

In order to need memory checks, first we need Accesses.isDependencyCheckNeeded() to evaluate to true.
Since we need an unknown dependence between two objects to get into the 'veto' case, it follows that areDepsSafe will return false (we have al least one unknown dependence).

This causes us to take the "!CanVecMem && DepChecker.shouldRetryWithRuntimeCheck()" branch and reset the dependence checks, which in turn will cause us to not group pointers.

I'm sure we could end up in the 'veto' case by tweaking the logic in 'analyzeLoops', but not sure how to do that without this. Can you see any issues with the reasoning above?

Thanks,
Silviu

Hi Silviu,

Looking at the logic in analyzeLoops:

In order to need memory checks, first we need Accesses.isDependencyCheckNeeded() to evaluate to true.
Since we need an unknown dependence between two objects to get into the 'veto' case, it follows that areDepsSafe will return false (we have al least one unknown dependence).

This causes us to take the "!CanVecMem && DepChecker.shouldRetryWithRuntimeCheck()" branch and reset the dependence checks, which in turn will cause us to not group pointers.

shouldRetry... should only be set if we didn't find a constant distance between the accesses. This is why I said earlier that:

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

Was something unclear about this? I can help to come up with a testcase for this if that helps.

Adam

Hi Silviu,

Looking at the logic in analyzeLoops:

In order to need memory checks, first we need Accesses.isDependencyCheckNeeded() to evaluate to true.
Since we need an unknown dependence between two objects to get into the 'veto' case, it follows that areDepsSafe will return false (we have al least one unknown dependence).

This causes us to take the "!CanVecMem && DepChecker.shouldRetryWithRuntimeCheck()" branch and reset the dependence checks, which in turn will cause us to not group pointers.

shouldRetry... should only be set if we didn't find a constant distance between the accesses. This is why I said earlier that:

The problem is that we can have unknown distance between pointers of the same underlying object also if the stride is unknown (look for the return Dependence::Unknown before the ShouldRetryWithRuntimeCheck in isDependent). In this case we don't give up on dependence analysis so your flag would still be true.

Was something unclear about this? I can help to come up with a testcase for this if that helps.

Adam

I think I'm getting tunnel vision on this.. I would be really grateful if you could help to come up with a testcase.

Thanks,
Silviu

Ah, OK, I think I understand what's going on.

We simply can't have memchecks between pointers of the same underlying objects *unless* shouldRetryWithRuntimeChecks. I don't think that this what you were trying to explain but if yes, sorry for not understanding.

Anyhow, I think in this case the code is good, we just have to explain the situation in the comment before the UseDependenies check (was this the only reason for the check?). I.e. why the offending check can only occur with shouldRetryWithRuntimeChecks.

Please also mention shouldRetryWithRuntimeChecks in the testcase you added earlier. It took me a while to figure out why we wanted to compare pointers with the same underlying object. (That is kinda the point.)

Thanks for looking into this!

Ah, OK, I think I understand what's going on.

We simply can't have memchecks between pointers of the same underlying objects *unless* shouldRetryWithRuntimeChecks. I don't think that this what you were trying to explain but if yes, sorry for not understanding.

Anyhow, I think in this case the code is good, we just have to explain the situation in the comment before the UseDependenies check (was this the only reason for the check?). I.e. why the offending check can only occur with shouldRetryWithRuntimeChecks.

We also need this for correctness. The algorithm assumes that pointers in the same equivalence class don't need checking against each other, which would be false in this case.

Please also mention shouldRetryWithRuntimeChecks in the testcase you added earlier. It took me a while to figure out why we wanted to compare pointers with the same underlying object. (That is kinda the point.)

I've added comments to explain this in r243416.

Thanks,
Silviu

Thanks for looking into this!

Ah, OK, I think I understand what's going on.

We simply can't have memchecks between pointers of the same underlying objects *unless* shouldRetryWithRuntimeChecks. I don't think that this what you were trying to explain but if yes, sorry for not understanding.

Anyhow, I think in this case the code is good, we just have to explain the situation in the comment before the UseDependenies check (was this the only reason for the check?). I.e. why the offending check can only occur with shouldRetryWithRuntimeChecks.

We also need this for correctness. The algorithm assumes that pointers in the same equivalence class don't need checking against each other, which would be false in this case.

Yep.

Please also mention shouldRetryWithRuntimeChecks in the testcase you added earlier. It took me a while to figure out why we wanted to compare pointers with the same underlying object. (That is kinda the point.)

I've added comments to explain this in r243416.

Excellent, thank you!

Thanks,
Silviu