This is an archive of the discontinued LLVM Phabricator instance.

[Polly] [DependenceInfo] Change reduction building to use May-writes.
Needs ReviewPublic

Authored by bollu on Apr 5 2017, 3:24 AM.

Details

Summary

Change the way reductions are built to use may-writes and not use
the transitive closure. This simplifies the process of creating
reduction dependences. It also eliminates the need for StrictWAW.

Reduction dependences (RED) are widened. This is different
from previous behaviour where the reduction dependences were "exact"
while TC_RED was the widened dependences.

This also affected the matrix multiplication pattern match. It earlier
used to expect reduction dependences to look like:

S(i0, i1, ..., ik, ... in) -> S(i0, i1, ..., ik + 1, ...in)

Now, that is no longer possible since the reduction dependences will be
wider, and will look like so:

S(i0, i1, ..., ik, ... in) -> S(i0, i1, ..., ok, ...in) where
ik < ok < N.

So, we choose to allow anything along the "ok" dimension, since the other
checks make sure that illegal memory access cannot happen
in the first place.

Event Timeline

bollu created this revision.Apr 5 2017, 3:24 AM
bollu added a comment.Apr 5 2017, 3:25 AM

@gareevroman: Could you take a look at containsOnlyMatMulDep and affirm that the assumption made is correct?

bollu added a comment.Apr 7 2017, 12:57 AM

@gareevroman , @Meinersbur: ping, a review would be appreciated.

Meinersbur added inline comments.Apr 7 2017, 8:27 AM
include/polly/DependenceInfo.h
165

Leftover?

lib/Analysis/DependenceInfo.cpp
176

If you do not isl_map_copy() the both cases above, you don't need to free accdom.

Effectively, you are only appending || MA->isReductionLike() to the second conditional.

It seems to have the side-effect that all reduction candidates are treated as May-Writes from here on. A candidate doesn't necessarily need to turn out to be a reduction. I don't like the idea that we pessimize the dependences/schedule optimizer based on some in-between heuristic of a detection algorithm.

To see what I mean: Why not just add all writes to MayWrite and leave MustWrite empty, since every access is potentially a reduction.

417–418

Could you add a comment to this line what it is doing?

lib/Transform/ScheduleOptimizer.cpp
771

<anything> doesn't match the implementation which checks for a constant delta.

gareevroman edited edge metadata.Apr 7 2017, 11:39 PM

@gareevroman: Could you take a look at containsOnlyMatMulDep and affirm that the assumption made is correct?

@bollu: I think it's correct. However, what should we do if we need "exact" reduction dependences in future? How can we compute them in case that patch is applied?

lib/Transform/ScheduleOptimizer.cpp
788

Why do you define a separate variable DomainDep for isl_union_map_extract_map(Dep, Space)?

793

We can use the following code to avoid using the foundNonZeroDimension:

  ...
  for (int i = 0; i < DeltasDimNum; i++) {
    auto *Val = isl_set_plain_get_val_if_fixed(Deltas, isl_dim_set, i);
    Pos = Pos < 0 && !isl_val_is_zero(Val) ? i : Pos;
    if (!isl_val_is_zero(Val) && i != Pos) {
      isl_val_free(Val);
      isl_set_free(Deltas);
      return false;
    }
    isl_val_free(Val);
  }
  isl_set_free(Deltas);
  if (DeltasDimNum == 0 || Pos < 0)
    return false;
  return true;
}

@gareevroman I don't believe there is a way to get back the exact reduction dependences that we had access to previously. (Unless we change the dependence generation to be quite different from what we have now).

lib/Analysis/DependenceInfo.cpp
176

for the code structure: I'll change it.

Yes, you're right, I group all May-writes + ReductionLike into the MayWrite now.

Why not just add all writes to MayWrite and leave MustWrite empty, since every access is potentially a reduction.

This is too pessimistic, since from what I understand, we know which accesses _cannot_ be treated as reductions. This is a decent compromise IMO between exactness and simplicity.

However, I do agree that the change does reduce the quality of information we produce (in terms of RED dependences)

bollu updated this revision to Diff 94956.Apr 12 2017, 5:36 AM
  • [NFC] Simplify creation of MayWrite and MustWrite
  • [NFC] comment on why we take the union with the reverse of TC_RED
bollu updated this revision to Diff 95122.Apr 13 2017, 7:33 AM
  • [Polly] [DependenceInfo] Change reduction building to use May-writes.
  • [NFC] Simplify creation of MayWrite and MustWrite
  • [NFC] comment on why we take the union with the reverse of TC_RED
  • [NFC] remove forgotten comment in DependenceInfo.h, remove temporary variable from lib/Transform/ScheduleOptimizer.cpp
bollu added a comment.Apr 15 2017, 6:49 AM

@Meinersbur , @gareevroman : I've made changes I was comfortable with. Could you take another look?

lib/Analysis/DependenceInfo.cpp
417–418

@Meinersbur: I wrote that to keep it consistent with that the original reduction implementation does. I believe we need the backwards dependences to make sure that dependences are tracked even when optimisations like loop reversal takes place.

lib/Transform/ScheduleOptimizer.cpp
771

Yes. However, I believe that containsOnlyMatrMultAcc also checks for the correct stride.

In the RED information that is generated from this change onwards, we lose access to fine-grained dependence information about reductions. as @gareevroman mentioned, perhaps this is an issue.

@Meinersbur: if you can think of an example that will break on this particular check, then I will be glad to know of it. I tried a little bit and couldn't come up with anything.

788

Ah, good catch. I'll eliminate the frivolous temporary variable.

793

Yes, I initially did keep it that way. However, I felt that we are overloading Pos as both a "state store" of whether we have found the dimension or not (based on it being < 0) and as a "value store" based on the nonzero value it takes.

I personally find it clearer with the extra boolean. However, if this impacts performance or something along those lines, I do not mind modifying it.

Meinersbur edited edge metadata.Apr 19 2017, 8:43 AM

Is it possible to put the changes to containsOnlyMatMulDep into a separate patch? Then we can discuss the changes to the reduction detection separately.

lib/Analysis/DependenceInfo.cpp
176

Yes, you're right, I group all May-writes + ReductionLike into the MayWrite now.

So could you rename it such that it reflects its purpose?

This is a decent compromise IMO between exactness and simplicity. However, I do agree that the change does reduce the quality of information we produce (in terms of RED dependences)

I don't agree on simplicity. Merging MayWrite and reduction candidates overcharges a variable with two meanings that are subtly different. If the purpose is to have less lines of code, put the common code in the separate function and call it once for MayWrite and ReductionLike.

The reason I don't like this is that switching reduction detection has an influence on something unrelated to reductions. If something does not work as expected it will take a long time until one realizes that the reduction detection was the cause, even though there are no reductions in the code. IMHO this is not worth any perceived simplicity of the code.

If your goal is to save computation time, you should show how much is saved to justify the regression and provide an option for the non-heuristic behaviour (this could already be switching off reductions altogether, though I'd prefer being able to do both at the same time).

417–418

Looks ok.

lib/Transform/ScheduleOptimizer.cpp
771

My remark war about the comment being wrong, not the code. S(..., k, ...) -> S(..., <anything>, ...) is the same as S(...) -> S(...).

However, from the code I read that it checks for S(..., k, ...) -> S(..., k+c, ...) with a constant c.

793

I think this is more about changing fewer lines so the review is easier and avoids non-intended changes.

Since this is already reviewed, no need to change it back (if Roman is fine with it).

bollu added a comment.Apr 20 2017, 6:08 AM

Is it possible to put the changes to containsOnlyMatMulDep into a separate patch? Then we can discuss the changes to the reduction detection separately.

Unfortunately, no. This will cause a test case to break (since we now have coarser RED dependences).

lib/Analysis/DependenceInfo.cpp
176

I agree with what you are saying. We are overloading the "May write" semantics.

What if I create a separate union_map called as Reduction, add all the reductions to that in collectInfo, and then have a function in the dependence computation called taintWritesWithReduction which takes a union of the two? That way, we can document the rationale for fusing them at the site of the function. And, if something goes wrong, it will be easy to switch from the tainted-writes back to the non-tainted writes while experimenting.

Meinersbur added inline comments.Apr 20 2017, 6:27 AM
lib/Analysis/DependenceInfo.cpp
176

That sounds better.

However, wouldn't switching off taintWritesWithReduction have the same effect as -polly-detect-reductions=false? And what is the compile-time speedup?

lib/Transform/ScheduleOptimizer.cpp
771

It looks fine as done in D32157.

bollu added inline comments.Apr 24 2017, 12:54 AM
lib/Analysis/DependenceInfo.cpp
176

Yes, the two would be equivalent.

I haven't benchmarked the speedup, but there should be one since this eliminates a call to buildFlow from the StrictWAW computation. It also removes the (potentially) expensive transitive closure operation.

bollu added inline comments.Apr 24 2017, 3:48 PM
lib/Transform/ScheduleOptimizer.cpp
771

Actually, thinking about this, if it were only k+c, then this is pointless. This implementation is correct, in that it allows for ranges of values with the non-zero dimension.

Precisely, we wish to look for both lower and upper bounds along a single dimension. So, it should be something along the lines of:

matrix-dependences
S[..., k, ...] -> S[..., k', ...] where k < k' <= N

Since we will have widened dependences.

I had implemented the precise check [as a proof of concept here: https://github.com/bollu/polly/blob/remove-privatization-with-may-reduction-fix-matmul/lib/Transform/ScheduleOptimizer.cpp#L875](https://github.com/bollu/polly/blob/remove-privatization-with-may-reduction-fix-matmul/lib/Transform/ScheduleOptimizer.cpp#L875).

However, I decided that the check did not make sense - I was being overly cautious to check for exactly what I expected with widened matrix multiplication dependences in the constraints, because I did not understand the problem well enough. It is also brittle, since I look at the shape of the constraints, which can change under representations from what I understand.

I would like to hear your thoughts on this.

Meinersbur added inline comments.Apr 25 2017, 7:50 AM
lib/Transform/ScheduleOptimizer.cpp
771

I see that I misunderstood one thing. It uses isl_set_plain_get_val_if_fixed, but it is used only to check that it is zero. If it is nonzero, its value is ignored.

That means, it checks for:
S(i_0, ..., i_{n-1}) -> S(i_0, i_{k-1},i'_k,i_{k+1},..,i_{n-1}) where i'_k != i_k.

Is this correct?

gareevroman added inline comments.Apr 25 2017, 11:47 PM
lib/Transform/ScheduleOptimizer.cpp
793

Yes, I initially did keep it that way. However, I felt that we are overloading Pos as both a "state store" of whether we have found the dimension or not (based on it being < 0) and as a "value store" based on the nonzero value it takes.

AFAIK, the overloading doesn't deviate from the LLVM Coding Standards.

I personally find it clearer with the extra boolean. However, if this impacts performance or something along those lines, I do not mind modifying it.

Okay. However, it seems that it doesn't remove the overloading of 'Pos'. It still used as a "state store" and as a "value store".

Could you factor out the check of the form of the dependency to a separate function (I find it harder to read now)?

jdoerfert resigned from this revision.Jun 4 2019, 10:06 PM