This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Fix for PR25748: reduction vectorization after loop unrolling.
ClosedPublic

Authored by ABataev on Sep 21 2016, 6:41 AM.

Details

Summary

The next code is not vectorized by the SLPVectorizer:

int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
 }

During optimization this loop is fully unrolled and SLPVectorizer is unable to vectorize it. Patch tries to fix this problem.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev updated this revision to Diff 72035.Sep 21 2016, 6:41 AM
ABataev retitled this revision from to [SLPVectorizer] Fix for PR25748: reduction vectorization after loop unrolling..
ABataev updated this object.
ABataev added reviewers: spatel, RKSimon, mkuper.
suyog added a subscriber: suyog.Sep 21 2016, 7:31 AM
ABataev updated this revision to Diff 72055.Sep 21 2016, 8:43 AM

Reworked handling of not BinaryOps/SelectInst instructions

mkuper added inline comments.Sep 25 2016, 4:40 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

Why do you need this change?

4171 ↗(On Diff #72055)

This comment looks wrong now. Could you replace it with a comment that explains the current situation?

4586 ↗(On Diff #72055)

I actually started coding a similar patch at some point, but decided against it, because the "return" here seems really fairly accidental.
You could just as easily have something like:

int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum + 5;
}

It may be the case that having it feed a return is an important special case, but maybe we can work out a full solution for this.

ABataev added inline comments.Sep 29 2016, 11:58 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

It is required to handle the next kind of hor reduction:

sum = a + b + c + ...

It comes from:

int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
}
4171 ↗(On Diff #72055)

Ok, will do it

4586 ↗(On Diff #72055)

Michael,
I'm working on a general patch, that is able to vectorize your example (I have it already but need some time to improve it). But this patch is still required as a first step for the second patch. Later we can rework this code for better compatibility

ABataev updated this revision to Diff 73001.Sep 30 2016, 1:09 AM

Address Michael's comments

mkuper edited edge metadata.Oct 2 2016, 12:57 AM

Hi Alexey,
Sorry for the slow response time, I'm intermittently on vacation.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

What I'm asking is, how do you even get trees of height 1?

4171 ↗(On Diff #72055)

The code below is really not entirely obvious (at least to me - admittedly, I'm not an expert on the SLP vectorizer...).
Could you actually update the comment to explain what it does - that is, why you have that specific set of conditions - instead of just removing it?

4586 ↗(On Diff #72055)

Ok, that sounds reasonable.

mssimpso added inline comments.Oct 3 2016, 1:15 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

I could be wrong about this, but I think the horizontal reduction matching logic looks through the actual reduction step, since it will generate customized vector operations for the reduce. So it will build trees starting from the reduction operands, not the reduced value. Looking at the test case, this means the adds won't be in the tree. Only the loads will be, so the height will be one.

ABataev added inline comments.Oct 4 2016, 4:44 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

Yes, that's right

4171 ↗(On Diff #72055)

Ok

ABataev updated this revision to Diff 73458.Oct 4 2016, 5:01 AM
ABataev edited edge metadata.

Added a comment + updated

mssimpso added inline comments.Oct 4 2016, 7:29 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

I think it would be a good idea to rework this function so that it checks the fully vectorizable property up to max(MinTreeSize, VectorizeableTree.size()). The current state of things is a bit confusing. MinTreeSize defines what is "tiny" yet here we assume tiny means a height of two. What do you think?

mssimpso added inline comments.Oct 4 2016, 7:31 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

That should be min(MinTreeSize, VectorizeableTree.size()), not max.

ABataev added inline comments.Oct 4 2016, 8:30 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

Yes, I agree. But I believe it must done in a different patch

mssimpso added inline comments.Oct 4 2016, 8:57 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1797 ↗(On Diff #72055)

Sounds good to me.

Other comments?

mkuper accepted this revision.Oct 6 2016, 12:21 PM
mkuper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 6 2016, 12:21 PM
This revision was automatically updated to reflect the committed changes.