This is an archive of the discontinued LLVM Phabricator instance.

[LV] Vectorize header phis that feed from if-convertable latch phis
AbandonedPublic

Authored by anna on Aug 8 2018, 1:53 PM.

Details

Summary

This patch teaches loop vectorizer to vectorize phi nodes that have the
following characteristics:

  1. header phis that are not identified as induction/reduction/first order

recurrences.

  1. feeds in from a phi in the latch block and that phi can be if-converted
  2. unused outside the loop.

Condition #3 will be avoided in a follow on change.
This is to teach the vectorizer about general recurrences and cross iteration
dependencies.
The key point here is that if the header phi feeds from an if-convertable phi,
then the header phi can be vectorized. The 'resume' value extracted for this
header phi in the scalar post loop is the last element of the vectorized phi in
the latch block.

Please see added test cases. A current TODO item is we do not want to vectorize
'dead' loops, i.e. a read only loop whose values computed in the loop have no
outside users.

Please note: I will add more test cases, this is a proof of concept patch just
to make sure I've not missed some obvious legality constraint.

Diff Detail

Event Timeline

anna created this revision.Aug 8 2018, 1:53 PM
anna added inline comments.Aug 8 2018, 1:57 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
3393

This is just a refactoring of the code from fixFirstOrderRecurrence. It is used by both fixFirstOrderRecurrence and fixCrossIterationPhiInHeader.

anna updated this revision to Diff 159793.Aug 8 2018, 2:10 PM

clang formatted the change and added some comments. NFC wrt previous diff.

fhahn added a subscriber: fhahn.Aug 8 2018, 2:21 PM
anna added a reviewer: hfinkel.Aug 8 2018, 5:47 PM
anna added a comment.Aug 9 2018, 10:24 AM

fuzz testing this patch through our internal fuzzer shows all tests passed.

Your patch miscompiles the following:

void f(int n, int *p, int *z) {
int x = 100;
#pragma clang loop vectorize(enable)
for (int i = 0; i < n; ++i) {
if (p[i] > 0) x *= p[i];
else x /= p[i];
z[i]=x;
}
}
typedef void fty(int,int*,int*);
fty*volatile ff=f;
int main() {
  int p[] = {1,-2,3,-4};
  int z[] = {1,2,3,4};
  ff(4, p, z);
  printf("%d %d %d %d\n", z[0], z[1], z[2], z[3]);
}
anna added a comment.Aug 9 2018, 3:05 PM

Your patch miscompiles the following:

void f(int n, int *p, int *z) {
int x = 100;
#pragma clang loop vectorize(enable)
for (int i = 0; i < n; ++i) {
if (p[i] > 0) x *= p[i];
else x /= p[i];
z[i]=x;
}
}
typedef void fty(int,int*,int*);
fty*volatile ff=f;
int main() {
  int p[] = {1,-2,3,-4};
  int z[] = {1,2,3,4};
  ff(4, p, z);
  printf("%d %d %d %d\n", z[0], z[1], z[2], z[3]);
}

yeah, that's right. The header phi is a predicated reduction and we'll miscompile with a splatted preheader value.
I think it would be okay if the iteration dependence distance > VF.
However, the case I'm interested in is a "predicated reduction". The header phi is just not recognized as a reduction.

Ayal added inline comments.Aug 9 2018, 3:27 PM
lib/Transforms/Utils/LoopUtils.cpp
673

Looks like one thing being asked here is: why bail out if isa<PHINode>(Previous)? @mssimpso answers that in r265983, in response to PR27246. There, Previous was a header PHINode, (specifically an Induction), which effectively means the original header PHI is a 2nd (or greater) order recurrence. It may be interesting to see if that case could be vectorized correctly.
In any case, the motivation for this patch involves Previous which isa PHINode, but not of the header block. Such Previous's get vectorized into blends, so presumably this may be safe and helpful:

   auto *Previous = dyn_cast<Instruction>(Phi->getIncomingValueForBlock(Latch));
-  if (!Previous || !TheLoop->contains(Previous) || isa<PHINode>(Previous) ||
+  if (!Previous || !TheLoop->contains(Previous) ||
+      (isa<PHINode>(Previous) &&
+       Previous->getParent() == TheLoop->getHeader()) ||
       SinkAfter.count(Previous)) // Cannot rely on dominance due to motion.

Note however that for 1st order recurrence to work, Previous (including if it's a PHI of a non-header block) must dominate all users of the original header PHI. Or be made to dominate them by Sinking the users After Previous. In particular, Previous cannot be data-dependent on the original header PHI, which will close a dependence cycle.

In the tests below, %phiuseout2 which is the candidate Previous of %hdrphi1, also depends on it (e.g., via %tmp50), disqualifying it from being a 1st order recurrence. Every such dependence cycle must be broken, if possible, or rather expanded into a cycle of distance VF, as done for Inductions and Reductions, in order for vectorization to succeed. Efforts to handle more cyclic dependencies include D49168 and D22341. Not sure I follow how the tests below with their cyclic dependencies are expected to be vectorized, nor the "over vectorized" argument - perhaps related to D50480(?).

anna added inline comments.Aug 10 2018, 6:33 AM
lib/Transforms/Utils/LoopUtils.cpp
673

hi Ayal,
Thanks for the comments about the first order recurrence.

For the test below, as you mentioned, it is not a first order recurrence. However, I think of it as a "predicated reduction".
The computation I'm interested in vectorizing:

hdrphi1 = boolarr[i+c] == 1 ? intarr[i+c] + (hdrphi2 == 0 ? hdrphi1 : 0) : (!(hdrphi2 == 0) ? hdrphi1 : -1)

if boolarr[i+c] is 1 (and hdrphi2 is 0), then this computation reduces to a reduction:

hdrphi1 = intarr[i+c] +hdrphi1

if boolarr[i+c] != 1, then hdrphi1 is not dependent on previous iteration (it's either hdrphi1 itself or -1).
if boolarr[i+c] is 1 and hdrphi2 != 0, then
hdrphi1 = intarr[i+c]

Some forms of predicated reduction is already supported today (when tested on clang), such as:

if (a[i] > 0)
 x *= a[i];

I think it's because we have the identity vector for the else case? i.e. this becomes:

if (a[i] > 0)
 x *= a[i];
else
 x *= 1;
anna added inline comments.Aug 10 2018, 6:55 AM
lib/Transforms/Utils/LoopUtils.cpp
673

I have come to the conclusion that the case I have simply cannot be treated as a predicated reduction, because the else part of the predicate is not a reduction.
Specifically, cases such as these aren't vectorizable:

for (i = 0; i < n; ++i) {
    if (p[i] > 0) x += p[i];
    else x =0; <-- this bites us because it cannot be expressed as a reduction since the dependence distance cannot be widened to VF.
}
Ayal added inline comments.Aug 11 2018, 2:11 AM
lib/Transforms/Utils/LoopUtils.cpp
673

Agreed. If a conditional update can be folded into

x *= (a[i] > 0 ? a[i] : 1);

then we're good. See, e.g., D49168 as referenced earlier. But resetting the accumulator while reduction is in progress is indeed a bummer. Reductions are vectorized based on their ability to reassociate; which does not hold for floating-point computations (w/o fast-math), nor for such resets.

anna abandoned this revision.Aug 13 2018, 9:05 AM

We cannot vectorize predicated reductions where the else part of the predicate is a reset of the reduction val.

lib/Transforms/Utils/LoopUtils.cpp
673

yes, unfortunately, here we are resetting the accumulator.