This is an archive of the discontinued LLVM Phabricator instance.

Loop vectorization with induction variable with non-constant step.
ClosedPublic

Authored by delena on Apr 19 2016, 4:09 AM.

Details

Summary

One of vectorization limitation today is non-const step of induction variable.
In this patch, I allow vectorization when the step is a loop-invariant variable.
This is the loop example that was vectorized after my patch:
int int_inc;
int bar(int init, int *restrict A, int N) {

int x = init;
 for (int i=0;i<N;i++){
   A[i] = x;
   x += int_inc;
 }
 return x;

}
In this case "x" is an induction variable with"init_inc" step. Loop exit count is calculated from another induction variable "i".
The following loop remains scalar right now:

for (int i=0; i<N; i+=int_inc) {
  A[i] = i;
}

In this case the vectorizer can't calculate the exit count.

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 54172.Apr 19 2016, 4:09 AM
delena retitled this revision from to Loop vectorization with induction variable with non-constant step..
delena updated this object.
delena added reviewers: Ayal, anemet, hfinkel.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
hfinkel added inline comments.Apr 26 2016, 3:38 PM
../lib/Transforms/Utils/LoopUtils.cpp
754–755

I don't think this is a good way to handle this, because we'll run into all kinds of non-trivial situations where SCEV has computed an expression for the step that does not exactly correspond to a pre-existing IR value. We won't handle those cases, and it will be nearly impossible for a user to understand why.

Instead, convert StepValue in InductionDescriptor to a const SCEV *, and then when we need the Value* when generating IR, get one from the SCEVBuilder. It will reuse an existing Value* should one exist.

delena added inline comments.Apr 27 2016, 11:17 AM
../lib/Transforms/Utils/LoopUtils.cpp
754–755

I can't find a way to build a Value from SCEV. If the Step is "SCEVAddRecExpr", how do I get Value* from it?

mssimpso added inline comments.
../lib/Transforms/Utils/LoopUtils.cpp
754–755

Elena,

I think Hal was suggesting you use a SCEVExpander to build a Value from a SCEV expression. SCEVExpander's are used elsewhere in the loop vectorizer.

hfinkel added inline comments.Apr 28 2016, 7:41 AM
../lib/Transforms/Utils/LoopUtils.cpp
754–755

Yes, exactly. SCEVExpander will reuse an existing value if an appropriate one is known, or otherwise materialize some appropriate IR value.

delena updated this revision to Diff 55587.Apr 29 2016, 5:49 AM

Following Hal's and Matthew comments, I keep Step as a SCEV inside InductionDescriptor. It allows to vectorize loop with induction variable with any *loop-invariant* step.
Again, we are talking about a "secondary" induction variable, which is being updated inside loop, but does not participate in trip count calculation.

I added one more example, where the step is a SCEV from outer loop:

for (int j = 0; j < M; j++) {
  for (int i=0; i<N; i++){
    A[i] = x;
    x += j; // loop-invariant step
  }
}
hfinkel added inline comments.Apr 29 2016, 8:09 AM
../lib/Transforms/Utils/LoopUtils.cpp
661

Comments should be complete sentences and end with a period.

668

See above.

703

The logic here is now repeating things that SCEVExpander should already know how to do (plus SCEV might know about even more simplification rules). Try calculating the expression using SCEV and then expand that. This seems to be computing Start + Index*Step, so we can generate:

const SCEV *S = SE->getAddExpr(SE->getSCEV(Start), SE->getMulExpr(SE->getSCEV(Index), Step);
return Exp.expandCodeFor(S, StartValue->getType(), &*B.GetInsertPoint());

If you use StartValue->getType() as I did above, this might just work for the pointer inductions too.

delena added inline comments.May 3 2016, 6:41 AM
../lib/Transforms/Utils/LoopUtils.cpp
703

I tried to apply your proposal. The generated code is right but not always match the original. Some tests fail due to this replacement. It happens because getAddExpr() tries to combine all addends recursively. In some cases the result is better. But I saw one case at least, where the code became less optimal.

delena updated this revision to Diff 55988.May 3 2016, 7:10 AM

I added SCEV calculation where it was possible without changing the existing tests.
I also changed some comments.

delena added a comment.May 8 2016, 5:26 AM

Hi Hal,

I checked again. SE->getAddExpr(SE->getSCEV(A), SE->getSCEV(B)) does not always give better results than just createAdd(A, B).
It happens when we mix getAddExpr on SCEVs and ADD/SUB operations. As a result, we receive redundant intermediate values being calculated in different ways and Instcombine is unable to reduce them all.

Could you, please, continue reviewing this patch. Thank you.

hfinkel accepted this revision.May 9 2016, 3:47 PM
hfinkel edited edge metadata.
hfinkel added inline comments.
../lib/Transforms/Utils/LoopUtils.cpp
703–708

Okay. That's unfortunate. Please add a FIXME comment here explaining the situation. Otherwise, this LGTM.

This revision is now accepted and ready to land.May 9 2016, 3:47 PM

Thanks a lot for the review! I added a comment about SCEV expressions.

This revision was automatically updated to reflect the committed changes.