This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions.
ClosedPublic

Authored by ABataev on Jan 20 2017, 11:05 AM.

Details

Summary

Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like:

float f(float x[], int a, int b) {
  float p = a % b;
  p += x[0] + 3;
  for (int i = 1; i < 32; i++)
    p += x[i];
  return p;
}

Patch allows vectorization of this kind of horizontal reductions.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev created this revision.Jan 20 2017, 11:05 AM
mkuper added inline comments.Jan 20 2017, 6:31 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4256 ↗(On Diff #85161)

I don't understand the logic here.
IIUC, if EdgeToVist is 0 it just means that we're currently going "left", but this could be at any stage in the reduction. Why does EdgeToVist == 0 means we can do this for a non-associative FAdd?

4329 ↗(On Diff #85161)

Don't you want the debugloc for the original op (Pair.first?), not for its operand?

test/Transforms/SLPVectorizer/X86/horizontal-list.ll
748 ↗(On Diff #85161)

Could you please add a test that has several "extra" values?
It doesn't have to have so many vector lanes, that's just noise.

ABataev added inline comments.Jan 23 2017, 5:17 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4256 ↗(On Diff #85161)

Removed it, now there is only check for associative operation

4329 ↗(On Diff #85161)

Good catch, thanks.

test/Transforms/SLPVectorizer/X86/horizontal-list.ll
748 ↗(On Diff #85161)

Will do

ABataev updated this revision to Diff 85580.Jan 24 2017, 6:35 AM

Address comments from Michael + stable codegen + couple testcases added with several extra arguments.

ABataev retitled this revision from [SLP] Fix for PR31690: Allow using of non-zero initial values in horizontal instructions. to [SLP] Fix for PR31690: Allow using of extra values in horizontal instructions..Jan 24 2017, 6:37 AM
mkuper added inline comments.Jan 24 2017, 10:54 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4249 ↗(On Diff #85580)

Let me make sure I understand what's going on here.
What happens is that we've added an operand to the tree, but it turns out that we shouldn't have, and we should actually be using it as an "extra value". So we're undoing it. Right?
Why don't we catch it before we add it to the tree? Is it because it happens to have the same opcode as the reduction, and we only realize we're "stuck" once we looks at its operands?

In any case, regardless of whether that's entirely correct, this needs to be documented more clearly.
Also, is this checked by any of the tests you added?

ABataev added inline comments.Jan 25 2017, 2:03 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4249 ↗(On Diff #85580)

I'll try to explain what's going on here.
Imagine we have something

%add1 = fadd fast float %extra_val1, %extra_val2
%add2 = fadd fast float %red_val1, %add1
%add3 = fadd fast float %red_val2, %add2

Here we have reduced values %red_val1 and %red_val2 and the reduction opcode is fadd.
Analysis of the horizontal reduction is from bottom to top and from left to right. So, we starting building the tree from the %add3 operation. It is marked as a root. %red_val2 is marked as reduced value. Then %add2 op is marked as the next node of the tree. %red_val1 is marked as next reduced value. Then %add1 is marked as the next possible reduction operation. But during analysis of its argument %extra_val1 it is detected that it is neither reduction operation, not reduced value. We mark %extra_val1 as an extra value for the operation %add1. Then we start analysis of %extra_val2. Again, it is neither reductiomn operation, nor reduced value. But %add1 already marked as (possible) reduction operation with an extra value. In this case we understand, that the whole %add1 is an extra value itself for %add2 reduction operation. We unmark %add1 operation as a reduction operation with an extra value and mark %add1 as an extra value for %add2 reduction operation.
The main problem here that currently we're able to catch only 1-level operations with extra values (because of if in line 4251). I'll try to make it more universal.

Yes, this functionality is tested. See the test test/Transforms/SLPVectorizer/X86/horizontal-list.ll, function @extra_args. Analysis for the instruction [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00 works this way.

Actually, I had to add this functionality on Monday, because something changed in codegen/optimization passes after I published this patch on Friday. Before Monday clang at -O2/-O3 generated a bit different code and this piece of code was not required. But then some changes were made (maybe in InstCombiner) and the IR code has changed, so I just had to make this changes to make it detect such kind of constructs.

ABataev updated this revision to Diff 85900.Jan 26 2017, 7:07 AM
ABataev retitled this revision from [SLP] Fix for PR31690: Allow using of extra values in horizontal instructions. to [SLP] Fix for PR31690: Allow using of extra values in horizontal reductions..

Address Michael's comments + improvements in detection and processing of extra values.

mkuper edited edge metadata.Jan 26 2017, 1:58 PM

Could you please rebase this on top of D29175?

ABataev updated this revision to Diff 86045.Jan 27 2017, 6:41 AM

Added some extra comments + outlined MarkAsExtraArg lambda to a member function markExtraArg

Could you please rebase this on top of D29175?

Michael, it was done already.

mkuper added inline comments.Jan 30 2017, 5:16 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4152 ↗(On Diff #86045)

Isn't "ParentStackElim.first" in this case just "<ExtraArg_i> + ExtraArg" itself? What you're trying to say is that ParentStackElem.first has two "extra" arguments, so it is itself an extra argument, right?

4155 ↗(On Diff #86045)

What this basically does is avoid recursing above this node, right? Please document this, since it's non-obvious looking from this code ("second" here isn't meaningful, unless you know what the pair you were called with means - which is not documented above.)

4166 ↗(On Diff #86045)

This looks out of place.

4219 ↗(On Diff #86045)

You can use ExtraArgs.lookup(TreeN) to replace the "if (map.count(k) v = map[k]" pattern. This saves a lookup.

4229 ↗(On Diff #86045)

Does "Stack.size() - 2" always point to the parent? Please document this.

ABataev marked 5 inline comments as done.Jan 31 2017, 12:48 AM
ABataev added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
4152 ↗(On Diff #86045)

Yes, I just provided some general description in C/C++ like format rather than in LLVM IR format.

4155 ↗(On Diff #86045)

Ok, will do

4166 ↗(On Diff #86045)

Ooops, will remove it.

4219 ↗(On Diff #86045)

Ok, will do, thanks.

4229 ↗(On Diff #86045)

Yes, will add some additional comment

ABataev updated this revision to Diff 86408.Jan 31 2017, 3:58 AM
ABataev marked 5 inline comments as done.

Update after review

Some more comments?

mkuper accepted this revision.Feb 2 2017, 12:03 PM

Argh, sorry, I didn't notice I didn't submit.
LGTM, but could you try to make the comment in line 4216 clearer? No need for another round of review.

lib/Transforms/Vectorize/SLPVectorizer.cpp
4152 ↗(On Diff #86045)

Sure, what I meant is that I think this is a somewhat confusing description.

This revision is now accepted and ready to land.Feb 2 2017, 12:03 PM
This revision was automatically updated to reflect the committed changes.