This is an archive of the discontinued LLVM Phabricator instance.

[LV] Teach about non header phis that have uses outside the loop
ClosedPublic

Authored by anna on Aug 10 2018, 12:45 PM.

Details

Summary

This patch identifies non header phis that have no cyclic dependencies with
header phis (reduction/induction/first order recurrence phis).
If those phis have outside uses, we can still vectorize the loop and extract the
last element. This is because the iteration dependence distance for these phis
can be widened upto VF (similar to how we do for induction/reduction) since they do not
have a cyclic dependence with header phis.

The key point is to extract the last element from the vectorized phi and update
the scalar loop exit block phi to contain this extracted element from the vector
loop.

Diff Detail

Event Timeline

anna created this revision.Aug 10 2018, 12:45 PM
Ayal added inline comments.Aug 12 2018, 3:03 PM
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
608

Only 2 incoming values: only header phi's are required to have two incoming values; non-header phi's get if-converted for any number of predecessors.

Only 1 use: in general, it shouldn't really matter how many uses there are, nor whether I is a PHINode; if I is defined inside the loop, and the loop gets vectorized, it should be possible to obtain the value of I associated with the last iteration, either inside or outside the loop. Regarding cyclic dependencies, these are taken care of by analyzing each header phi, and they in turn take care of their own live-outs, by pre-computing (Inductions) or post-processing (Reductions); so any cross-iteration dependence issues will be detected, and need to be addressed, there (cf. D22341).

Note that some Instructions may require special attention when obtaining last-lane value, e.g., isUniformAfterVectorization(). So starting with non-header PHI's sounds like a good first step.

lib/Transforms/Vectorize/LoopVectorize.cpp
3723–3729

Would it suffice to do here

Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
Value *LastIncomingValue = getOrCreateScalarValue(IncomingValue, {UF-1, VF-1});
LCSSAPhi.addIncoming(LastIncomingValue, LoopMiddleBlock);

instead of the below? Note that in general, if IncomingValue isUniformAfterVectorization(), we'll need to ask for lane 0 instead of VF-1.

test/Transforms/LoopVectorize/no_outside_user.ll
16–25

Good to also check that the live-out is extracted correctly.

53

This vectorizes just fine, right?

80

This cannot be vectorized, because reductions are allowed to have only the bump as a single inside user, plus outside users. The header %sum.02 phi flags this. See RecurrenceDescriptor::AddReductionVar

anna added inline comments.Aug 13 2018, 6:55 AM
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
608

I'll update the code. For now, I'm going to start with non-header PHIs as the instruction defined inside the loop and having outside use.
The "single use" check was a very conservative way to avoid cyclic dependency -will check how to address this separately during the header phi analysis for reductions/inductions/first order recurrences.

lib/Transforms/Vectorize/LoopVectorize.cpp
3723–3729

From the code in getOrCreateScalarValue, it looks like it handles scalarization versus "unroll without vectorization" versus "unrolled and vectorized".
So, I think what you've written above is a succinct and complete version for what I need, without having to repeat the code down below. Thanks!

test/Transforms/LoopVectorize/no_outside_user.ll
16–25

yes.

53

vectorizes only with the patch.

Without the patch, Vectorization fails with:

LV: Found an outside user for :   %.lcssa = phi i32 [ %tmp17, %bb16 ]
remark: <unknown>:0:0: loop not vectorized: value could not be identified as an induction or reduction variable
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.
anna added a comment.Aug 13 2018, 6:55 AM

thanks for the comments Ayal. Have added inline comments. Will address those in upcoming patch.

anna updated this revision to Diff 160366.Aug 13 2018, 8:55 AM
anna marked an inline comment as done.

addressed review comments. added more tests.

Ayal added a comment.Aug 13 2018, 2:12 PM

Overall looks good to me, though it could be cleaned up a bit more?

lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
438–440

Worth updating the above comment, to also include non-header Phi's.

Perhaps leave behind a TODO somewhere, that other (non-Phi) instructions should also be able to work with the new extract-scalar-of-last-iteration code.

601–602

Checking if Phi has outside users here becomes redundant, when AllowNonHdrExits becomes redundant (see below).

lib/Transforms/Vectorize/LoopVectorize.cpp
3724

Distinguishing between different AllowedExits here seem redundant: if IncomingValue isLoopInvariant(), as asserted below, getOrCreateScalarValue({UF-1,VF-1}) will simply return it. So the 'else' case below can also take care of the 'then' case. In general, whatever the allowed exit may be, grab the scalar value associated with the last iteration.

This will make AllowedNonHdrExits useless.

In the most general form, one may need to do

unsigned LastLane = Cost->isUniformAfterVectorization(IncomingValue, VF) ? 0 : VF - 1;
Value *lastIncomingValue =
          getOrCreateScalarValue(IncomingValue, {UF - 1, LastLane});

but this is not needed for this patch.

test/Transforms/LoopVectorize/no_outside_user.ll
12–15

Suffice to say "tmp17" is a non-header phi, which is an allowed exit.

90

(Check that) %predphi1 has %predphi as one of its operands, right? Thereby merging the more than 2 original incoming values together.

205

In cyclic_dep_with_indvar() below, %iv is conditionally reset inside the loop, disqualifying it from being an induction variable; so the loop isn't vectorizable, regardless of live-outs. Right?

anna marked 2 inline comments as done.Aug 14 2018, 6:31 AM
anna added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
3724

good point. will reduce complexity here.

test/Transforms/LoopVectorize/no_outside_user.ll
205

yes, I had added the test case to show that cyclic dependencies that do not make the loop vectorizable still remain non-vectorizable. Same behaviour with not allowable cyclic dependencies for reductions.

I don't think there's a test where the loop was vectorizable in absence of live-outs and now is no longer vectorizable because of the live-outs. That's what this patch fixes :)

anna marked 4 inline comments as done.Aug 14 2018, 7:20 AM
anna updated this revision to Diff 160579.Aug 14 2018, 7:23 AM

addressed review comments and simplified code.

Ayal accepted this revision.Aug 14 2018, 7:52 AM
This revision is now accepted and ready to land.Aug 14 2018, 7:52 AM
This revision was automatically updated to reflect the committed changes.