This is an archive of the discontinued LLVM Phabricator instance.

[LV] Support efficient vectorization of an induction with redundant casts
ClosedPublic

Authored by dorit on Oct 16 2017, 5:29 AM.

Details

Summary

D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose update chain involves casts; PSCEV can now build an AddRecurrence for some forms of such phi nodes, under the proper runtime overflow test. This means that we can identify such phi nodes as an induction, and the loop-vectorizer can now vectorize such inductions, however inefficiently. The vectorizer doesn't know that it can ignore the casts, and so it vectorizes them.

This patch records the casts in the InductionDescriptor, so that they could be marked to be ignored for cost calculation (we use vecValuesToIgnore for that) and ignored for vectorization/widening/scalarization (i.e. treated as TriviallyDead).

In addition to marking all these casts to be ignored, we also need to make sure that each cast is mapped to the right vector value in the vector loop body (be it a widened, vectorized, or scalarized induction). So whenever an induction phi is mapped to a vector value (during vectorization/widening/scalarization), we also map the respective cast instruction (if exists) to that vector value.

Note: if the phi-update sequence of an induction involves more than one cast, then the above mapping to vector value is relevant only for the last cast of the sequence. (We also allow only the "last cast" to be used outside the induction update chain itself).

Lastly, we also record all these "last casts" in InductionCastsToIgnore; this is because some utilities, which need to query whether a Value is an Induction variable, may wish to be able to (quickly) identify that also a casted induction value behave like an Induction (e.g. for cost calculation, such as in getAddressAccessCost).

This is the last step in addressing PR30654.

Diff Detail

Event Timeline

dorit created this revision.Oct 16 2017, 5:29 AM
Ayal added inline comments.Nov 2 2017, 9:33 AM
../llvm/include/llvm/Transforms/Utils/LoopUtils.h
311 ↗(On Diff #119140)

"\Casts" >> "\p Casts". Or better yet "\p CastsToIgnore"?

359 ↗(On Diff #119140)

Better have "const SmallVectorImpl<Instruction *>" as the return type?

367 ↗(On Diff #119140)

Use "Casts" instead of "CI" for consistency.

379 ↗(On Diff #119140)

"CastInsts" >> "RedundantCasts"?

../llvm/lib/Analysis/ScalarEvolution.cpp
4389 ↗(On Diff #119140)

"casts" >> "cast", or "and".

4405 ↗(On Diff #119140)

Can you check if (!Mask || !Mask.isMask()) instead?

4424 ↗(On Diff #119140)

Forgot %x to And with 255?

4428 ↗(On Diff #119140)

"%sext, m" >> "%tmp, m"?

fhahn added a subscriber: fhahn.Nov 3 2017, 2:47 AM
dorit updated this revision to Diff 121630.Nov 5 2017, 10:23 AM

Incorporated Ayal's comments. Thanks!

dorit added a comment.Nov 16 2017, 1:58 AM

ping^2

thanks,
Dorit

Ayal edited edge metadata.Nov 16 2017, 2:38 AM

A few additional minor comments..

lib/Analysis/ScalarEvolution.cpp
4378

Perhaps a clearer way of stating this: "This last instruction (is the only one which) may be used outside of the ExtTrunc cast sequence, so it is placed in position 0 of CastInsts, for special handing later on."

Worth asserting that CastInsts is empty before pushing back, to make sure that the above statement holds?

4404

assert empty before push_back?

test/Transforms/LoopVectorize/vect-phiscev-sext-trunc.ll
26

doit3 >> doit1

44

If we first check NOT to have the shl which defines TEST, seems useless to then check that ashr doesn't use TEST.

Better check instead that the induction phi in vector.body (the one being stored into a[i]) feeds and consumes its bump directly?

94

doit3 >> doit2

dorit updated this revision to Diff 123203.Nov 16 2017, 9:58 AM

Thanks Ayal. Incorporated your suggestions.

Sorry for not having a look earlier.

Maybe I'm missing something, but why not rewrite the induction variable using the the SCEV that we get from PSE (something like what IndVarsSimpify does)?
If we would take this approach then the casts would end up being dead code and removed.

lib/Analysis/ScalarEvolution.cpp
4470

(1) might not hold in the future, so this might end up being a problem?

dorit added a comment.Nov 19 2017, 5:24 AM

Hi Silviu,

Maybe I'm missing something, but why not rewrite the induction variable using the the SCEV that we get from PSE (something like what IndVarsSimpify does)? If we would take this approach then the casts would end up being dead code and removed.

We can't do that on the original loop, which remains scalar and untouched, and will not be guarded by the scev predicates.
This means that the cast instructions will be there when we start vectorizing, and we need a way to tell that they will cost us nothing, and that we don't need to vectorize them but rather "look through them" as if they weren't there.

IIUC, what IndVarSimply does is call the rewriter and then fix the users of the induction; This is similar in effect to what we do: We don't need to call the PSCEV rewriter again, we already have the nice AddRec for the induction phi (isInductionPhi() had already obtained it); The vectorization of the induction phi proceeds unchanged; The only thing we are adding is the def-use wiring so that in the vectorized loop, any users of the cast instructions will be users of the vectorized phi. We never vectorize the casts, and we never actively remove them -- they will end up dead code in the vectorized loop because they will not be used.

Comment at: lib/Analysis/ScalarEvolution.cpp:4470
+/ (1) The value of the phi at iteration i is:
+
/ (Ext ix (Trunc iy ( Start + i*Step ) to ix) to iy)

+/// (2) Under the runtime predicate, the above expression is equal to:

(1) might not hold in the future, so this might end up being a problem?

Do you mean this will not hold once additional forms of "casted-inductions" are supported by createAddRecFromPhiWithCasts()?
In such a case this function will not find the IR sequence that it expects, and we will have a missed optimization (namely, the vectorizer will vectorize the cast instructions instead of ignoring them). But we won't have a correctness problem.

I think I should probably add a comment to the documentation of createAddRecFromPhiWithCasts() to remind us that if/when it is extended to support more forms of "casted-induction" SECV-Exprs, then it is recommended to also extend getCastsForInductionPHI() to look for the additional IR patterns that can result in these SCEV-Exprs. Would that address your concern?

Thanks,
Dorit

Ayal added inline comments.Nov 19 2017, 10:31 AM
lib/Transforms/Utils/LoopUtils.cpp
880

Set CastInsts to nullptr or the address of a SmallVector created earlier depending on the conditions below, or invoke isInductionPHI() once using the default nullptr for its last parameter, or else with such an address.

933

Suggest to keep the 'nullptr' parameter as the last argument and continue to use its default, or document /* what-this-nullptr-is */

lib/Transforms/Vectorize/LoopVectorize.cpp
619

Placing 'Part' next to 'Lane' seems more logical.

1592

The name "isCastedInductionVariable()" may be confused with "isInductionVariable()" - the latter deals with a header Phi, whereas the former deals with a redundant Cast. (The Phi feeding an isCastedInductionVariable() answers true to isInductionVariable(), right?)

Perhaps have "isInductionPhi()", and change "isInductionVariable()" to answer true for both Phi and Cast Instructions of an IV?

5273

In other words, all casts could be recorded here for ignoring, but suffices to record only the first.

6999

VecValuesToIgnore holds Instructions whose cost should be ignored only if widened, and for VF's where they are actually widened, IIUC their use in calculateRegisterUsage(). Should the redundant IV casts be added to ValuesToIgnore instead, being redundant in either scalar and vector types?

In any case, having expectedCost() neglect the cost of VecValuesToIgnore may affect cases unrelated to this patch's casted Inductions (related to Reductions), so probably better done in a separate patch.

test/Transforms/LoopVectorize/vect-phiscev-sext-trunc.ll
52

TEST is set but not used.

120

TEST is set but not used.

IIUC, what IndVarSimply does is call the rewriter and then fix the users of the induction; This is similar in effect to what we do: We don't need to call the PSCEV rewriter again, we already have the nice AddRec for the induction phi (isInductionPhi() had already obtained it); The vectorization of the induction phi proceeds unchanged; The only thing we are adding is the def-use wiring so that in the vectorized loop, any users of the cast instructions will be users of the vectorized phi. We never vectorize the casts, and we never actively remove them -- they will end up dead code in the vectorized loop because they will not be used.

Yes, IndVarSimpify wouldn't fix this issue, but I was thinking more of using the techniques there that use the SCEV expressions to find these cases instead of doing the pattern matching (see the inline comment).

Comment at: lib/Analysis/ScalarEvolution.cpp:4470
+/ (1) The value of the phi at iteration i is:
+
/ (Ext ix (Trunc iy ( Start + i*Step ) to ix) to iy)

+/// (2) Under the runtime predicate, the above expression is equal to:

(1) might not hold in the future, so this might end up being a problem?

Do you mean this will not hold once additional forms of "casted-inductions" are supported by createAddRecFromPhiWithCasts()?

Yes, but even if it's not a correctness issue I think we should avoid pattern matching here if possible to avoid doing more work in the future.

Thanks,
Silviu

lib/Analysis/ScalarEvolution.cpp
4482

If I understand correctly the SCEV expression returned by PSE for x2 should be the same as for x1?

In that case, wouldn't it be possible to search for values in the loop with the same SCEV as an induction PHI, go through the def-use chain until you've reached the PHI and mark all instructions in between as a (part of) a cast of that PHI? I think this wouldn't need any pattern matching and should be straight-forward to do.

Also I think ideally this should all be in an utility function (LoopUtils?). It seems a strange to have this in Scalar Evolution. It's very specific to the vectorizer.

4491

Technically this would be part of a cast operation, and not a cast itself? (shl changes the value).

Yes, IndVarSimpify wouldn't fix this issue, but I was thinking more of using the techniques there that use the SCEV expressions to find these cases instead of doing the pattern matching (see the inline comment).

Oh, so your comment was referring to how we retrieve the IR sequence, not to how we vectorize the phi... Ok, I think I understand your point now.

Do you mean this will not hold once additional forms of "casted-inductions" are supported by createAddRecFromPhiWithCasts()?

Yes, but even if it's not a correctness issue I think we should avoid pattern matching here if possible to avoid doing more work in the future.

Yes, I agree it's preferable.
The current pattern-matching is actually an overkill, just being overly cautious checking again things we don't need to re-check. It's enough to walk the def-use chain starting from the value that defines the phi via the back-edge, and, as you suggest, once we encounter a Value on the way whose SCEV is the same as the PHI SCEV to mark instructions as part of a casting sequence.

I agree that it's nicer. I would hesitate about implementing here something much more generic than what our SCEV-rewriter is currently able to support. But it can certainly be made more generic than it currently is, so it will be easier to extend it even further, in the future, if/when the need arises. I will give it try.

lib/Analysis/ScalarEvolution.cpp
4482

If I understand correctly the SCEV expression returned by PSE for x2 should be the same as for x1?

Yes, that should be the case

In that case, wouldn't it be possible to search for values in the loop with the same SCEV as an induction PHI

Is there a ScalarEvolution map that holds this information (all the Values that have the same SCEV)? There's the ExprValueMap which looks close, but I'm not sure... or do you suggest to really obtain "from scratch" all the values defined in the loop and call getSCEV for each? Would it make more sense to start from the Value that defines the phi from the backedge (the way we start now), walk the def-use chain from there? (calling getSCEV on each value we find on the way, and from the point where we encounter the same SCEV as the induction phi continue the def-use walk while marking instructions on the way as part of a cast)?

Also I think ideally this should all be in an utility function (LoopUtils?). It seems a strange to have this in Scalar Evolution. It's very specific to the vectorizer.

Ok, sure, will move it.

4491

Yes, sure. It's a "Cast-Sequence Member". I'll drop the "Cast:" from the comment.

dorit updated this revision to Diff 123517.Nov 19 2017, 2:49 PM

Addressed Ayal's comments.
Have yet to address Silviu's comments.

Thanks very much Silviu and Ayal!

dorit marked 12 inline comments as done.Nov 20 2017, 1:28 AM
dorit added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
6999

VecValuesToIgnore holds Instructions whose cost should be ignored only if widened, and for VF's where they are actually widened, IIUC their use in calculateRegisterUsage(). Should the redundant IV casts be added to ValuesToIgnore instead, being redundant in either scalar and vector types?

No, because then we would also ignore the cost of the casts when we calculate the baseline cost (of the scalar loop), and we don't want to do that (that loop will not be guarded by the predicate, and the casts will remain there).

In any case, having expectedCost() neglect the cost of VecValuesToIgnore may affect cases unrelated to this patch's casted Inductions (related to Reductions), so probably better done in a separate patch.

Yes. I thought it's a small enough fix to be included in this patch. I can submit a separate patch for this.

Hi Silviu,
I started to try out the approach you suggested, and I realized that our assumption doesn't hold... (see response to inlined comment).
Thanks,
Dorit

lib/Analysis/ScalarEvolution.cpp
4482

If I understand correctly the SCEV expression returned by PSE for x2 should be the same as for x1?

Yes, that should be the case

Turns out that's not the case…. For example, for this def-use chain:
V0: %w_ix.011 = phi (0, %add)
V1: %sext = shl i64 %w_ix.011, 32
V2: %idxprom = ashr exact i64 %sext, 32
V3: %add = add i64 %idxprom, %step

The scev expr of the Phi is {0,+,%step}<%for.body>
The scev expr of V2 is (sext i32 (trunc i64 %w_ix.011 to i32) to i64)
(this is what PSCEV.getSCEV() returns)....

sbaranga added inline comments.Nov 20 2017, 1:15 PM
lib/Analysis/ScalarEvolution.cpp
4482

Right... I think the SCEV rewriter isn't replacing w_ix.011 with {0, +, step} (not taking into account the equals predicate).

Ideally we would be making the following transformations:

(sext i32 (trunc i64 %w_ix.011 to i32) to i64) -> (equals predicate, replace %w_ix)
(sext i32 (trunc i64 {0, +, step} to i32) to i64) -> (fold trunc)
(sext i32 {0, + trunc i64 step to i32} to i64) -> (no overflow predicate)
{0, +, sext i32 (trunc i64 step to i32)} -> (equals predicate, replace sext i32 (trunc i64 step to i32) with step)
{0, +, step}

dorit added a comment.Nov 22 2017, 2:03 PM

Hi Silviu,

Right... I think the SCEV rewriter isn't replacing w_ix.011 with {0, +, step} (not taking into account the equals predicate).

a couple updates:

  1. I think mixed up things a bit in what I wrote above; the situation is as follows:

Ideally we would be making the following transformations:
(sext i32 (trunc i64 %w_ix.011 to i32) to i64) -> (equals predicate, replace %w_ix)
(sext i32 (trunc i64 {0, +, step} to i32) to i64) -> (fold trunc)
(sext i32 {0, + trunc i64 step to i32} to i64) -> (no overflow predicate)
{0, +, sext i32 (trunc i64 step to i32)}

Up to here everything works as expected, and the expression returned for PSE.getSCEV(V2) is {0, +, sext i32 (trunc i64 step to i32)}.
But now, when we check if this SCEV is equal to {0, +, step}, we failed because my equality check was not looking at the predicates... When considering the equality predicates all is well.

BTW, I didn't find a PSCEV utility that checks for equality of SCEV expressions taking equality predicates into account… did I miss anything? (here we have two AddRecs to compare, so a Start1,Start2 to compare and a Step1,Step2 to compare, so in what I wrote I'm looking for any EqualPredicates whose LHS=Step1 and RHS=Step2, or LHS=Step2 and RHS=Step1, and the same for the Start exprs… makes sense?? (just feels like a lot of work…))

  1. There was another bit of a hurdle in the unsigned version of this test (doit3 in the testcase):

The IR pattern is the following:

V0: %p.09 = phi (0, %add)
V1: %conv = and i32 %p.09, 255
V2: %add = add nsw i32 %conv, %step

And we have:
PSCEV.getScev(V0) = {0,+,%step}
PSCEV.getScev(V2) = {0,+,(sext i8 (trunc i32 %step to i8) to i32)}

And we are not able to deduce equality of the above because the Equal Predicate that we add in createAddRecFromPHIWithCasts() is:
%step == (zext i8 (trunc i32 %step to i8) to i32)

I guess that's a bug in the Predicate creation (?) in createAddRecFromPHIWithCastsImpl():
"AccumExtended = GetExtendedExpr(Accum)" creates the extended expression with zext, whereas probably the step part should be extended using sext because the overflow check that we add is IncrementNUSW…? Does that make sense?

(when I fix that, then the entire test passes).

Thanks,
Dorit

dorit updated this revision to Diff 124530.Nov 28 2017, 1:55 AM

Hi Silviu,

The new version I uploaded has two main changes:

  1. It fixes (what I think may be) a bug in createAddRecFromPHIWithCastsImpl(), where we add the equality predicate for the unsigned case: The IncrementNUSW overflow predicate that we add in this function allows the rewriter to rewrite this:

(zext i8 {0, + , (trunc i32 step to i8)} to i32)
into
{0, +, (sext i8 (trunc i32 step to i8) to i32)}
But the Equal predicate that we add is:
%step == (zext i8 (trunc i32 %step to i8) to i32).
So the fix changes the Equal predicate to: %step == (sext i8 (trunc i32 %step to i8) to i32)
(even for the unsigned case).

  1. It changes the search for the IR cast-sequence in the spirit of what you proposed: It is moved to LoopUtils.cpp, and it relies on the SCEV of an instruction to be equal (***) to the SCEV of the phi.

I think it may not be entirely as general as you may have envisioned, but generalizing the implementation even further comes with some cost (complexity) which I am not sure that the current limited support justifies. Even the other SCEV patterns that I've seen (which I listed under the TODO of createAddRecFromPHIWithCastsImpl()) would be covered by the current implementatin, so I wouldn't like to over generalize at this point. In any case, this implementation is much more easily extendable than the previous pattern-based approach,
I hope this is close enough to what you had in mind…

(***) For the scev equality, I added the "areAddRecsEqualWithPreds()" utility, which considers the Equal predicates, because the rewriter did not rewrite this:
{0, +, (sext i8 (trunc i32 to i8) to i32)}
into {0, +, %step}.
We could instead extend the rewriter: in the "Sext/Zext" case it would have to check if the SCEV expr at hand confirms to the pattern (ext ix (trunc iy %step to ix) to iy),
and if so, to look for any equality predicates of the form:
(ext ix (trunc iy %step to ix) to iy) == %step.
(basically to call "areAddRecsEqualWithPreds()" there, instead of in the LoopUtils utility.
Do you think we should do that?

Many thanks,
Dorit

dorit updated this revision to Diff 124555.Nov 28 2017, 5:54 AM

(uploaded a fix to LoopUtils:getCastsForInductionPHI())

Hi Dorit,

Thanks for doing the changes! I just have a few comments/questions (see inline).

Silviu

lib/Analysis/ScalarEvolution.cpp
4751

Nice catch, I think you can do a separate commit for this fix.

4815

Could you add a FIXME here?

This shouldn't be required, and PSE should return the same expression for both.

4823

Any idea what the complexity of isKnownPredicate() is?

lib/Transforms/Utils/LoopUtils.cpp
889

I guess this is just an optimization for speed?

Maybe it's enough to check that SE returns a SCEVUnknown and PSE returns an AddRec?

dorit marked 3 inline comments as done.Nov 30 2017, 2:00 AM
dorit added inline comments.
lib/Analysis/ScalarEvolution.cpp
4751
4815

Yes, I'll do that;
"FIXME: This is currently required because the rewriter currently does not rewrite this:
{0, +, (sext ix (trunc iy to ix) to iy)}
into {0, +, %step},
although there exists an Equal predicate "%step == (sext ix (trunc iy to ix) to iy)".

4823

I guess it can be potentially more complex than looking up Preds.implies... I can move it to be last check (or drop it...?).

lib/Transforms/Utils/LoopUtils.cpp
889

Yes, to make sure we only look at relevant phis...

The check you suggest is exactly what the caller to this utility checks before calling this routine...

dorit marked 3 inline comments as done.Dec 6 2017, 12:50 AM

Hi Ayal,

lib/Transforms/Vectorize/LoopVectorize.cpp
6999

Yes. I thought it's a small enough fix to be included in this patch. I can submit a separate patch for this.

This is https://reviews.llvm.org/D40883.

thanks,
Dorit

dorit updated this revision to Diff 126038.Dec 7 2017, 2:00 PM

Dropped the parts that are uploaded for review separately (D40641, D40883), and hopefully addressed Silviu's last comments.

Ayal added a comment.Dec 10 2017, 10:25 AM

This looks good to me, but please wait for Silviu to approve as well before committing.

include/llvm/Analysis/ScalarEvolution.h
1027

clang-format?

lib/Transforms/Vectorize/LoopVectorize.cpp
2615

Early exit instead if (Casts.empty())?

5921

Can be folded into a single return (Inst && InductionCastsToIgnore.count(Inst));

6999

Great, thanks.

sbaranga added inline comments.Dec 10 2017, 11:03 AM
lib/Transforms/Utils/LoopUtils.cpp
889

In that case would we ever bail out here? If not, and this doesn't affect correctness, maybe we shouldn't do this check?

The problem I have with havePredicatedSCEVRewriteForPHI is that it is looking into the SCEV internal structures while SCEV is doing the analysis lazily. So the answer that it's getting is that either we don't have a better value under some predicates or we haven't performed the analysis at all. So maybe there is an assumption here that createAddRecFromPHIWithCasts was called at some point.

From an interface perspective, I think it's better to call createAddRecForPHIWithCasts if we really require doing this check.

943

nit: formatting, there should be a space before after AR

dorit updated this revision to Diff 126607.Dec 12 2017, 12:40 PM

Addressed Ayal's and Silviu's comments.

dorit marked 5 inline comments as done.Dec 12 2017, 12:46 PM
dorit added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
889

Ok, I see your point. I dropped the (redundant/overly-cautious) call to havePredicatedSCEVRewirtesForPHI.

dorit marked an inline comment as done.Dec 14 2017, 6:21 AM

Thanks so much for all your help with this work!

Committed as:
r320672 | dorit | 2017-12-14 09:56:31 +0200 (Thu, 14 Dec 2017) | 27 lines

[LV] Support efficient vectorization of an induction with redundant casts

(I forgot to include the Differential revision number in the commit...)

Ayal accepted this revision.Dec 14 2017, 11:33 AM

Just to formally close this review, as it wasn't closed by the commit.

This revision is now accepted and ready to land.Dec 14 2017, 11:33 AM
dorit closed this revision.Dec 14 2017, 10:54 PM