Page MenuHomePhabricator

[LV] Vectorize loops where non-phi instructions used outside loop
ClosedPublic

Authored by anna on Aug 15 2018, 7:01 AM.

Details

Summary

Follow up change to rL339703, where we now vectorize loops with non-phi
instructions used outside the loop. Note that the cyclic dependency
identification occurs when identifying reduction/induction vars.

We also need to identify that we do not allow users where the PSCEV information
within and outside the loop are different. This was the fix added in rL307837
for PR33706.

Diff Detail

Repository
rL LLVM

Event Timeline

anna created this revision.Aug 15 2018, 7:01 AM
hiraditya added inline comments.
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
438 ↗(On Diff #160802)

This would be trivial if we realize that outside user must be in a loop-closed phi node for a valid LCSSA, IIUC @sebpop ?

anna added inline comments.Aug 15 2018, 11:19 AM
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
438 ↗(On Diff #160802)

We will still need to check if the phi node use is contained in the loop or not. So, at best, it will reduce the number of cases where we will need to do the contains operation.

if (isa<PHINode>(UI) && !TheLoop->contains(UI)) {
   LLVM_DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n');
   return true;
}

Feel free to update this if you'd like, outside of this patch.

Ayal added a comment.Aug 15 2018, 3:15 PM

Suggest to update InnerLoopVectorizer::fixLCSSAPHIs() as follows, now that arbitrary values are allowed to be live-out:

unsigned LastLane = Cost->isUniformAfterVectorization(IncomingValue, VF) ? 0 : VF - 1;
Value *lastIncomingValue =
    getOrCreateScalarValue(IncomingValue, {UF - 1, LastLane});
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
582 ↗(On Diff #160802)

Use PredicateValidOutsideLoop below?

728 ↗(On Diff #160802)

It would now be good to reason about and mark all exits that are not allowed to be live-out, rather than those that are :-). Specifically, Reduction Phi's are on this "not allowed" list, as they represent the one-before-last value, which is not available when vectorized (could possibly be computed by "subtracting" from the final reduced value), unlike the Reduction bump instructions. Induction Phi's and their bumps are, as pointed out here, also on this "not allowed" list when the SCEV predicates cannot be used outside the loop, because the latter are used to pre-compute the live-out values. But such cases could possibly be handled alternatively by extracting from within the loop, similar to internal Instructions, rather than pre-computing. FirstOrderRecurrence Phi's are also currently one the "not allowed" list, unlike their "Previous" value, and could also possibly be handled using extraction. Would be good to confirm that this enumeration is exhaustive.

anna added inline comments.Aug 16 2018, 7:39 AM
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
582 ↗(On Diff #160802)

that doesn't work. I'd started off with it and the pr33706.ll test case failed. Specifically, the predicates were added as part of IsInductionPhi within the loop over instructions below. So, we need to check for the predicates at the time of adding valid exits that can be used outside the loop.

I'll remove the dead variable declaration here.

728 ↗(On Diff #160802)

I prefer this being a follow-on NFC. By having it as a follow-on "intended" NFC, it also confirms the enumeration for the "not allowed list" is exhaustive :)

anna updated this revision to Diff 161047.Aug 16 2018, 9:35 AM

addressed review comments - added support for uniform instructions.

anna added inline comments.Aug 16 2018, 9:37 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3727 ↗(On Diff #161047)

I have trouble getting a test case that exercises this path. Tried couple of different uniform instructions (GEPs with constant operands). In all cases, we were vectorizing these and then extracting the VF -1 th element.
Any ideas on what will be a good test case for this? Thanks.

Ayal added inline comments.Aug 19 2018, 1:02 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3727 ↗(On Diff #161047)

Yes, the GEPs themselves are immune, because they are checked by UsersAreMemAccesses() to see if all their users are loads/stores, so if they have a loop-closed phi as a user they will not be regarded as uniform. The following should work:

; RUN: opt < %s -loop-vectorize -S

target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"

@tab = common global [32 x i8] zeroinitializer, align 1

define i32 @uniform_live_out() {
entry:
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %i.09 = add i32 %i.08, 7
  %arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.09
  %0 = load i8, i8* %arrayidx, align 1
  %bump = add i8 %0, 1
  store i8 %bump, i8* %arrayidx, align 1
  %inc = add nsw i32 %i.08, 1
  %exitcond = icmp eq i32 %i.08, 20000
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body
  %lcssa = phi i32 [%i.09, %for.body]
  %arrayidx.out = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %lcssa
  store i8 42, i8* %arrayidx.out, align 1
  ret i32 0
}

Sorry about this. It's really due to the way uniform values are stored. Would be better to have an API for obtaining the scalar value corresponding to the last VF-1 lane, regardless of how it's stored, instead of having users check if VF-1 or 0 should be passed.

But wait, this test shows that LoopVectorizationCostModel::collectLoopUniforms() must be updated too, to refrain from recognizing instructions with live-outs as being uniform; except for instructions that are inherently uniform/invariant (i.e., all VF values will be the same), and induction variables with their bumps - which are pre-computed rather than have their last-iteration value taken.

anna added inline comments.Aug 20 2018, 7:26 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3727 ↗(On Diff #161047)

thanks Ayal. I've made the change in collectLoopUniforms to avoid incorrectly recognizing instructions with outside uses as uniform (unless those uses themselves were uniform). So, %i.09 = add i32 %i.08, 7 is no longer scalarized and we extract the VF-1th element. Without that change, we were recognizing i.09 as uniform.
I will add this test case and find another one where the live-out is of a really uniform instruction.

anna updated this revision to Diff 161491.Aug 20 2018, 8:51 AM

Updated LoopVectorizationCostModel::collectLoopUniforms to correctly identify uniforms

anna added inline comments.Aug 20 2018, 9:11 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3727 ↗(On Diff #161047)

But wait, this test shows that LoopVectorizationCostModel::collectLoopUniforms() must be updated too, to refrain from recognizing instructions with live-outs as being uniform; except for instructions that are inherently uniform/invariant (i.e., all VF values will be the same), and induction variables with their bumps - which are pre-computed rather than have their last-iteration value taken.

IIUC, the change in the latest diff I have will recognize the following as uniform instructions:

  1. the IVs and bumps which satisfy certain conditions as uniforms (no change there),
  2. uses of operands of uniform instruction which satisfy certain conditions - already marked uniform or is a uniform pointer operand of load/store (topologically identifying uniform instructions)

Instructions that are used outside and don't satisfy either of above are not marked uniform: For example "%i.09 = add i32 %i.08, 7".

But then this patch will not have any incoming non-induction phis that need to be updated through fixLCSSA and are uniformAfterVectorization.

I think I might be missing something in what you're stating?

Ayal accepted this revision.Aug 20 2018, 1:16 PM
Ayal added inline comments.
lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
728 ↗(On Diff #160802)

OK, sure. May be worth leaving a TODO behind.

lib/Transforms/Vectorize/LoopVectorize.cpp
3727 ↗(On Diff #161047)

This looks fine to me now! - no Uniforms are expected by fixLCSSA, so it should indeed continue to always ask for lane VF-1. I.e., w/o the LastLane change I suggested above.

4516 ↗(On Diff #161491)

Above comment deserves an update. The original "out of scope" term was somewhat misleading, as isOutOfScope() refers to irrelevant operands rather than live-out users.

This revision is now accepted and ready to land.Aug 20 2018, 1:16 PM
anna added a comment.Aug 21 2018, 6:16 AM

thanks for reviewing these changes Ayal!

This revision was automatically updated to reflect the committed changes.
anna marked 2 inline comments as done.Aug 21 2018, 7:42 AM