This is an archive of the discontinued LLVM Phabricator instance.

[DA] Enable -da-delinearize by default
ClosedPublic

Authored by dmgreen on Apr 20 2018, 4:15 AM.

Details

Summary

We already use this flag for the tests we run. This should make it safe to use by
checking that the subscript we use are within bounds of the array dimensions
we compute, preventing subscripts from wrapping into the next dimension.

Enabling it by default can help increase the power of DA for multi dimensional
array accesses.

Diff Detail

Event Timeline

dmgreen created this revision.Apr 20 2018, 4:15 AM

Thanks Dave! This should be helpful for LoopInterchange after D35430. Did you measure any the compile-time impact of this?

dmgreen updated this revision to Diff 143293.Apr 20 2018, 5:03 AM

Re add a missing 'a'

Ah, yes. I will try to get some compile time numbers. I expect it is in the noise for all the rest that DA is doing (but may be being optimistic). I will check.

AFAIU this change would affect other targets. Probably best to add more reviewers (owners for other targets).

We probably need to also add code that ensures that the arrays that result from delinearization remain within bounds. Without this code, the resulting checks are likely incorrect.

We probably need to also add code that ensures that the arrays that result from delinearization remain within bounds. Without this code, the resulting checks are likely incorrect.

I agree. To be clear, I think that this means addressing the FIXME in the code:

// The delinearization transforms a single-subscript MIV dependence test into
// a multi-subscript SIV dependence test that is easier to compute. So we
// resize Pair to contain as many pairs of subscripts as the delinearization
// has found, and then initialize the pairs following the delinearization.
Pair.resize(size);
for (int i = 0; i < size; ++i) {
  Pair[i].Src = SrcSubscripts[i];
  Pair[i].Dst = DstSubscripts[i];
  unifySubscriptType(&Pair[i]);

  // FIXME: we should record the bounds SrcSizes[i] and DstSizes[i] that the
  // delinearization has found, and add these constraints to the dependence
  // check to avoid memory accesses overflow from one dimension into another.
  // This is related to the problem of determining the existence of data
  // dependences in array accesses using a different number of subscripts: in
  // C one can access an array A[100][100]; as A[0][9999], *A[9999], etc.
}

I'd assumed that is why delinearization is disabled by default.

Exactly. The reason this was not done before is that we did not really had a user. Historically the LLVM's DA was implemented without users, then we added delinearization with Polly being the first (and only) user. The classical loop transformation passes came only later.

dmgreen planned changes to this revision.Apr 23 2018, 7:00 AM

Ah, I see. Thanks for the information. I had tried some example loops, but apparently hadn't tried the right thing. I'll hopefully look into it.

dmgreen updated this revision to Diff 144683.May 1 2018, 1:48 AM

This attempts to statically prove that the subscripts found by delinearization will always be within bounds of the array dimensions calculated. It does not feel super powerful, and the idea of storing the bounds and using them in later calculations may be better. It also doesn't attempt to do anything like runtime-checks. It will hopefully make da-delin correct though.

fhahn added inline comments.May 1 2018, 2:05 AM
lib/Analysis/DependenceAnalysis.cpp
3266

How about using SCEVAddRecExpr::evaluateAtIteration here?

dmgreen updated this revision to Diff 146331.May 11 2018, 8:23 AM

Thanks for the suggestion. You learn something new every day.

This adds some type checking that is needed. During the testing I found this delinearization doesn't work as well on 64bit systems due to all the extra extends being added. Presumably polly has a better way of dealing with this.

Some inline nits. What worries me slightly is that the test oracle changed in GCD.ll. Why is that? Where the tests plainly wrong before?

lib/Analysis/DependenceAnalysis.cpp
3257

I'd prefer a continue over a structured block here.

3258

tryDeliniarize is a 300LOC function. Instead of the lambda, why not make this a static function?

3283

Double 'using'.

dmgreen updated this revision to Diff 149789.Jun 4 2018, 10:05 AM
dmgreen marked 4 inline comments as done.
dmgreen edited the summary of this revision. (Show Details)
dmgreen added a subscriber: tvvikram.

Thanks for the comments. I've moved things around as a result of your suggestions.

I managed to create https://rise4fun.com/Z3/oCi2 for the change in GCD.ll, which seems to be able come up with valid solutions for both < and > in i (Providing N < 0 is valid). The ones in SymbolicSIV.ll seem wrong too.

Ping.

https://rise4fun.com/Z3/hP6p is a slightly cleaner version of the proof for the change in GCD.ll.

Hi,

This patch looks good to me. I applied it to our code and ran some of our internal correctness and performance tests (for Hexagon), and the results came back clean.

This new delinearization code misses some cases that the old version caught. For example, something like the following, which cannot be delinearized because of the isKnownNonNegative check, so the dependence distance is not computed exactly.

unsigned foo(unsigned *restrict a, unsigned *restrict b, unsigned h, unsigned w, unsigned ss, unsigned ds) {
  unsigned sum = 0;
  for ( unsigned y = 1; y < h - 1; ++y) {
    for ( unsigned x = 1; x < w - 1; ++x ) {
      unsigned char sum = 0;
      sum += a[x + ss];
      sum += a[x + ss + 1];
      b[x] = sum;
    }
    a += ss;
    b += ds;
  }
  return sum;
}

Thanks,
Brendon

Hello. Thanks for running the tests. I did not know that there were uses outside of loop interchange and now unroll and jam. That's good to know it will get some more use.

By the old delinearise code, do you mean the one that was based on geps, or this one without these extra checks in it? I believe unfortunately that both could produce wrong dependencies in some cases.

For your example, what would you expect the dependency between the two loads to be? I think it is difficult to prove anything other than [* *] (Because ss could be 0, or a value lower than w). Delinearizing using scevs seems to be a tricky business, and there may be cases that are tough to prove. I think this is better than what is in tree at the moment though, and brings the code in line with what we test, which I think is important.

Hi David,

We have a (loop) scalar replacement implementation that uses the dependence analysis information. We don't really need/use it much anymore for various reasons, including that there are other LLVM passes that get much of the same benefits.

You're correct about the problems with delinearizing. I think it was Sebastian who added a patch a while back that removed some of the delinearization code in DA that was incorrect (using the GEPs). My example was a bad one, though. Sorry about that. I was trying to show an example where we could possibly delinearize correctly but still doesn't work with your patch. I still think your patch is good as-is, and enabling the DA with delinearization is great, but I'm not sure if/what other opportunities it's still missing.

Thanks,
Brendon

Yes, I agree. There are hopefully multiple improvements we can make in the future. I remember seeing one patch recently (40369) that should help delinearise when sext's are involved (which I think happens a lot on 64bit machines). One thing that would probably be good to add somehow would be versioning based on needs of the loop. For example in your case it could check that ss is more than w, or in a more general case check that arrays do not alias. I know polly is very good at that, but it's not something we have for DA yet.

Hopefully we can improve things as we go.

sebpop accepted this revision.Jun 20 2018, 9:01 AM

lgtm thanks!

This revision is now accepted and ready to land.Jun 20 2018, 9:01 AM
This revision was automatically updated to reflect the committed changes.