This is an archive of the discontinued LLVM Phabricator instance.

[LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count
ClosedPublic

Authored by dorit on Dec 21 2016, 11:48 PM.

Details

Summary

This fixes PR31098: Try to resolve statically data-dependences whose compile-time-unknown distance can be proven larger than the loop-count, instead of resorting to runtime dependence checking (which are not always possible).

(A couple existing testcases assumed that runtime tests will be generated, but with this patch the runtime tests are no longer generated because a dependence with an unknown distance is resolved statically; I changed the loop-count in these tests so that we will not be able to resolve the dependence statically; This way these tests can continue to verify that the runtime tests can be created for the accesses at hand.)

Diff Detail

Event Timeline

dorit updated this revision to Diff 82308.Dec 21 2016, 11:48 PM
dorit retitled this revision from to [LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count.
dorit updated this object.
dorit added reviewers: mkuper, silviu.baranga.
dorit added a subscriber: llvm-commits.
delena added a subscriber: delena.Dec 25 2016, 11:27 PM
dorit updated this revision to Diff 82799.Jan 2 2017, 5:52 AM

discovered I wasn't careful about not losing bits when matching the types of the MinusExpr operands... Now fixed.

mkuper edited edge metadata.Jan 6 2017, 5:40 PM

Hi Dorit, sorry for the delay - I don't know SCEV and LAA well enough yet, and I spent some time happily ignoring this, without noticing nobody else reviewed it either. :)

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1369

Can this be an early exit?

1376

Can you use getMaxBackedgeTakenCount()?
If the count is precisely known, then they should be equivalent. If it's not, then since by definition MaxBackedgeTakenCount >= BackedgeTakenCount, the equation above also holds.

(I'm not sure if ever vectorize when we don't have a precise getBackedgeTakenCount(), but even if we don't, no need for extra constraints here)

1391

(Sorry for possibly rehashing the discussion you had with Silviu)
The code looks fine, but I'm not entirely sure I understand why this is the best way to approach this. I've read through PR31098, and it probably contains the information I'm looking for, but it's spread over separate messages and it's a bit hard for me to follow.

  1. Why can't you offload this into SCEV? Is the problem representing |dist| as a SCEV, or that SCEV can't prove |dist| - product is positive?
  2. While this may be a good idea independently, wouldn't cases like PR31098, in general be better served by improving alias analysis, and then being able to prove that 8D >= 0?
dorit added a comment.Jan 8 2017, 7:23 PM

Thanks Michael!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1369

yes, I'll change that

1376

Ok; I gave this a try, but getMaxBackedgeTakenCount is giving me -1… maybe I'm using it wrong?:
Const SCEV *MaxBackedgeTakenCount = PSE.getSE()->getMaxBackedgeTakenCount(InnermostLoop);

I don't see any uses of this API in the vectorizer/LoopAccesses; is it supposed to work also when predicates are added to compute the BackedgeCount?

(BTW, we indeed never get here if getBackedgeTakenCount is UnknownSCEV. it' a requirement in canAnalyzeLoop).

1391
  1. AFAICS, SCEV can't answer the questions isKnownPositive(dist)/isKnownNonNegative(dist)/… for the expression in question. It looks like SCEV can tell that D = (%size /u 2) is positive, but it can't tell that 8*D is positive... (where in fact it is even strictly positive if we know we entered the loop). So I think there's room for improvement in that respect. But in any case, even if/when that is improved, there still may be scenarios in which we can't answer the isKnownPositive question about dist, but we can successfully prove that either (dist - product > 0) or (-dist - product > 0) simply because things get canceled out... (After all, the goal is not to compute |dist|, but to prove the inequality...).
  1. By improving alias analysis you refer to my suggestion to be able to anti-alias structure fields? If so then yes, sure, that would help prune irrelevant dependencies. (But if we had a regular array strided access like so: a[2*i+1], a[2*i+D] then alias analysis would not come to the rescue… we'd have to deal with dist = 8*D-4 for which 8*D>=0 is not sufficient)
dorit added inline comments.Jan 9 2017, 12:35 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
1376

getMaxBackedgeTakenCount is giving me -1

probably this related to the PR you opened (PR31412)? (although this is not in the remainder loop...?)

mkuper added inline comments.Jan 9 2017, 4:51 PM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
1376

Err, yes, you're right, this is currently broken. Sorry for the confusion.

1391
  1. What I mean is that this holds unconditionally:
(X - Y > 0 || -X - Y > 0) <=> |X| - Y > 0

Would it make sense to sink this inference into SCEV itself, instead of only using it here? If not, is it because the SCEV representation of |X| is too hairy?

  1. Yes, that's what I meant. And you're right, we probably want both.
dorit added inline comments.Jan 10 2017, 2:14 PM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
1391

about 1: I don't know that we have a canonical way for abs representation in SCEV, so I guess I would classify this as "hairy" :-) but better if the SCEV experts would comment on this.

dorit updated this revision to Diff 84090.Jan 12 2017, 1:49 AM
dorit edited edge metadata.

changed one condition to early-exit, as requested.

anemet added inline comments.Jan 12 2017, 9:09 AM
../llvm/lib/Analysis/LoopAccessAnalysis.cpp
1303–1344 ↗(On Diff #84090)

Can this be separated out into its own function? isDependent is already a largish function.

dorit updated this revision to Diff 84749.Jan 17 2017, 2:45 PM

separated out into a new function. (good suggestion! thanks).

anemet requested changes to this revision.Jan 23 2017, 9:11 AM

Dorit, I am really sorry that it took me sooo much time to look at this for real. The only thing that is missing is an LAA-specific test.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1224–1229

Use the same variable names as in the formula or the other way?

1241–1245

The comment is actually slightly misleading since you perform subtraction to detect this.

@sanjoy, are these steps reasonably efficient to prove the property at line 1224 with SCEV?

llvm/test/Transforms/LoopVectorize/pr31098.ll
2–3

Please add an LAA-only test in addition to this under Test/Analysis/LoopAccessAnalysis.

This revision now requires changes to proceed.Jan 23 2017, 9:11 AM
dorit updated this revision to Diff 85741.Jan 25 2017, 6:20 AM
dorit edited edge metadata.

added testcase and addressed other comments.

I did not understand why this works (I've added some questions inline). Some "constructive" reasoning (not necessarily a proof) for why isSafeDependenceDistance is correct may help (i.e. we know X and therefore we know Y etc.). If this is already documented / discussed elsewhere then a reference to that discussion / documentation will be helpful too.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1208

Nit: "non-constant"

1225

How does this work with a negative BackedgeTakenCount * Step?

For instance, in:

i = 0;
do {
  r0 = out[i + 1];
  out[i] = r1;
} while (--i != -2);

D is 1 (or -1), BackedgeTakenCount is 2 and Step is -1,
so |Dist| > BackedgeTakenCount * Step is true (I assume you intend
to use a signed > ?).

However, on the first iteration we write to out[0], and we read from
it on the second iteration.

Or is this somehow not the kind of dependence you're looking for?

1237

Can you add a comment for why you're zero extending Product but sign extending Dist?

1241

How do you know that Dist - (BackedgeTakenCount * Step) won't
overflow?

I.e. why can't Dist be i8 128 and BackedgeTakenCount * Step be
i8 1?

Skimming through LAA, it looks like it guards against overflow in the
addressing expressions themselves, but I think the situation above can
happen even if addressing expressions themselves do not overflow.

for (i8 i = 0; i < 2; i++) {
  r0 = out[i - 64];
  out[i + 64] = r1;
}
dorit added a comment.Jan 31 2017, 2:30 AM

I did not understand why this works (I've added some questions inline). Some "constructive" reasoning (not necessarily a proof) for why isSafeDependenceDistance is correct may help (i.e. we know X and therefore we know Y etc.). If this is already documented / discussed elsewhere then a reference to that discussion / documentation will be helpful too.

Thanks for looking closely into this.

I guess clarifying in the documentation that "Step" is the *absolute* stride would help… And I can also add the following:

"
We check here if the absolute distance (|Dist/Step|) is <= the loop iteration count.
This is equivalent to the Strong SIV Test (Practical Dependence Testing , Section 4.2.1);

Note, that for vectorization it is sufficient to prove that the dependence distance is >= VF; This is checked elsewhere.
But in some cases we can prune unknown dependence distances early, and even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving that the distance >= LoopCount also guarantees that distance >= VF."

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1225

Step cannot be negative. It is the absolute Stride in bytes. I will clarify that in the function description.

1237

Is this clear enough?:
"The dependence distance is signed (can be positive/negative), so we sign extend Dist; The (unsigned) product is the multiplication of the absolute stride in bytes, and the backdgeTakenCount, so we zero extend Product."

1241

I'm not sure I understand the overflow concern... First of all, just to make sure it's clear: we reach this code only when the distance is unknown, so the specific example you gave with a known distance of 128 will not reach this code. So say the example was the following:

void foo (char *out, signed char Two, signed char SixtyFour) {

for (signed char i = 0; i < Two; i++) {
  char r0 = out[i - SixtyFour];
  out[i + SixtyFour] = r0 * 2;
}

}

The Dist that we will have is: (2 * (sext i8 %SixtyFour to i64))
The Minus expressions that we will compute are:
Minus = (0 + (2 * (sext i8 %SixtyFour to i64)) + (-1 * (zext i8 %Two to i64))<nsw>)
Minus = (0 + (-2 * (sext i8 %SixtyFour to i64)) + (-1 * (zext i8 %Two to i64))<nsw>)

(And of course in this case we will not be able to prove statically that this is positive, so we will return that the distance is not safe).

dorit updated this revision to Diff 86777.Feb 2 2017, 1:11 AM

Addressing Sanjoy's comments. Thanks.

dorit added a comment.Feb 6 2017, 11:09 PM

Any remaining/further concerns?

Sorry for the delay, I'll make sure to review this by the end of (California time) day today. If I don't get to it by then please feel free to check this in, and I'll do a post-commit review.

This revision was automatically updated to reflect the committed changes.