This is an archive of the discontinued LLVM Phabricator instance.

[LV] Vectorize first-order recurrences
ClosedPublic

Authored by mssimpso on Jan 14 2016, 10:02 AM.

Details

Summary

This patch enables the vectorization of first-order (non-reduction) recurrences. For example:

for (int i = 0; i < n; ++i)
  b[i] = a[i] - a[i - 1];

In the example above, the load PRE of the GVN pass can often hoist a[i - 1] into the loop preheader. This leaves a phi node inside the loop containing values for the hoisted load and a[i]. Although GVN can create these phi nodes, they can also occur naturally.

In this patch, we add a new recurrence kind for these phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction.

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 44897.Jan 14 2016, 10:02 AM
mssimpso retitled this revision from to [LV] Vectorize pre-load recurrences.
mssimpso updated this object.
mssimpso added a subscriber: llvm-commits.
bmakam added a subscriber: bmakam.Jan 14 2016, 10:21 AM
gberry added a subscriber: gberry.Jan 14 2016, 10:25 AM

Matt, I've added a few comments below.

lib/Transforms/Vectorize/LoopVectorize.cpp
4573

I don't think checking for store instructions is enough here. Isn't I->mayWriteToMemory() more appropriate to catch e.g. calls?

4579

Same as above. Plus, I'm not sure this is a sufficient check. What if there is a store after the sink point in iteration 'i' that may alias the load from iteration 'i+1'? Perhaps you can piggy back on the check for vectorizability of the original load in the loop? That may not be enough either since the load in question might be known to only write the 0th element (i.e. the value loaded by the sunk load).

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

anemet edited edge metadata.Jan 14 2016, 11:04 AM

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

+1

I just talked with Geoff in person, and I like the shuffle/insert approach. This will avoid a lot of the memory issues and should also expand the types of phi's we can handle beyond loads. Thanks for the quick feedback! I'll post an updated version soon.

sbaranga added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
562

It might be worth taking a PredicatedScalarEvolution for this function which I think might help in some cases.

At the moment it is possible that the loop was versioned such that AR's step is known to be 1. In this case doing PSE->getSCEV() would give you the more accurate expression.

I think you also need to check that AR's is an AddRecExpr for TheLoop?

mssimpso updated this revision to Diff 45423.Jan 20 2016, 11:35 AM
mssimpso retitled this revision from [LV] Vectorize pre-load recurrences to [LV] Vectorize first-order recurrences.
mssimpso updated this object.
mssimpso edited edge metadata.

Addressed comments from Geoff and Adam.

Code generation is now done by shuffling the vectors from the current and previous iterations instead of trying to sink loads into the loop. The new approach is simpler and I don't think requires as much legality checking.

mssimpso added inline comments.Jan 20 2016, 11:38 AM
lib/Transforms/Utils/LoopUtils.cpp
587

Hi Silviu. Thanks very much for the comments. After changing the code generation approach, I don't think scalar evolution is required any longer.

Hi Matt,

I've started reviewing this but I found that I could use an example of this transformation fully spelled out. Particularly, I am not sure I understand why the initial vector value is a splat vector.

Thanks,
Adam

lib/Transforms/Vectorize/LoopVectorize.cpp
367

I guess a definition of first-order recurrence should be added here.

3333–3334

I actually think we need to this before this patch ;). This is a 200-line loop and we shouldn't pile on more strangeness.

I think that all we need to do is to rename RdxPHIsToFix to PHIsToFix in a prequel NFC patch.

3336

I know that we use fix... in other places but that's pretty general. How about vectorize... or something like that.

3574–3577

assert that Previous is loop-invariant at this point?

Hi Adam,

Thanks very much for the feedback. Yes, the initial value doesn't have to be broadcast to all lanes since just an insert will do. I will upload a new version with your suggestions. Let me walk through the simple example I mentioned in the summary. For this loop, the (shorthand) scalar IR looks something like this:

scalar.ph:
  s_init = a[-1]
  br scalar.body

scalar.body:
  s1 = phi [s_init, scalar.ph], [s2, scalar.body]
  i = phi [0, scalar.ph], [i+1, scalar.body]
  s2 = a[i]
  b[i] = s2 - s1
  br cond, scalar.body, ...

Here, s1 is a non-reduction recurrence that we currently give up on. This patch calls it a first-order recurrence (because it's value depends on the previous iteration) and we try to vectorize it. The check in isFirstOrderRecurrence basically ensures that s_init is loop-invariant, that s2 is in the loop header (and thus loop-varying), and that every use of s1 is dominated by the definition of s2 (see below). The vectorized IR looks something like this for VF=4:

vector.ph:
  v_init = vector(..., ..., ..., a[-1])
  br vector.body

vector.body
  v1 = phi [v_init, vector.ph], [v2, vector.body]
  i = phi [0, vector.ph], [i+4, vector.body]
  v2 = a[i, i+1, i+2, i+3];
  v3 = vector(v1(3), v2(0, 1, 2))
  b[i, i+1, i+2, i+3] = v2 - v3
  br cond, vector.body, middle.block

middle.block:
  x = v2(3)
  br scalar.ph

scalar.ph:
  s_init = phi [x, middle.block], [a[-1], otherwise]
  br scalar.body

The dominance requirement is there so that we can shuffle v1 with v2 before this value is needed for the assignment to b. After we leave the vector loop, we extract the next value of the recurrence (x) to use as the initial value when jumping to the scalar portion. I hope that helps.

lib/Transforms/Vectorize/LoopVectorize.cpp
367

Agreed.

3333–3334

Sure, I'll push a patch that renames RdxPHIsToFix to PHIsToFix first.

3336

vectorizeFirstOrderRecurrence sounds good to me.

3574–3577

Good idea.

mssimpso marked 4 inline comments as done.Feb 1 2016, 9:52 AM
mssimpso added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
3574–3577

We actually want the previous value to be loop-varying here. The initial value should be loop-invariant. I've added this assertion as well as a few others.

mssimpso updated this revision to Diff 46551.Feb 1 2016, 9:53 AM

Addressed Adam's comments

anemet added a comment.Feb 9 2016, 6:20 PM

Matt, that's a nice example. Can you please add it as a comment to the code where it's appropriate.

lib/Transforms/Utils/LoopUtils.cpp
532–541

Thinking more about this, isn't it true that whether the phi operand is initial or previous depends on their edge assignment rather than whether they loop-variant/invariant.

543–550

How can user of the current value be loop-invariant?

lib/Transforms/Vectorize/LoopVectorize.cpp
368–381

We don't add empty /// like that

I actually think that this definition of this should be before isFirstOrderRecurrence.

3566

Sorry but I forgot that "fix" is an actual "phase" here. We first do the default widening then we fix up the phis. Can you please rename it back, sorry :(

3581–3593

This is duplicated between here and isFirstOrderRecurrence, would be nice to somehow remove this.

3604–3605

"... when initially vectorizing ..."

3609–3610

Outdated comment.

3633–3644

I think that CurrentParts should be a single value, this is the value that's incoming into the iteration (whether the real loop iteration or an unrolled loop iteration). We may also want to use the name VecPhi rather than "current" and add comment clarifying the above.

What do you think?

3665–3667

A newline before this block?

3749–3750

It's generally not a good practice to mix a multi-line formatting change with a single line (?) functionality change .

mssimpso added inline comments.Feb 11 2016, 9:12 AM
lib/Transforms/Utils/LoopUtils.cpp
532–541

When I think more about this, yes, I agree. I'll make the necessary updates. Thanks for catching this.

543–550

Current (the Phi) could be used outside the loop. Reductions are an example where a Phi is used externally. I actually don't think we need this restriction though, because we can handle this case in code generation. The use would just be of the last value of the recurrence, which we already extract in middle.block.

lib/Transforms/Vectorize/LoopVectorize.cpp
368–381

Sure, I'll move it.

3566

No problem!

3581–3593

Sure, I will delete these assertions here.

3633–3644

Yes, I don't think we need to keep track of CurrentParts as a SmallVector since a single value should do. Changing the name is probably a good idea to avoid confusion.

3749–3750

Sure, we can fix the formatting in a separate patch.

anemet added inline comments.Feb 11 2016, 9:35 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3581–3593

I didn't mean this only for the asserts. I think it would make sense to factor out the way we get the previous and init for a first-order recurrence somewhere (e.g. RecurrenceDescriptor?).

There may also be some common functionality between this new function and RecurrenceDescriptor::isFirstOrderRecurrence that could be shared.

mssimpso added inline comments.Feb 11 2016, 10:45 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
3581–3593

I see. Thanks for clarifying!

mssimpso updated this revision to Diff 47718.Feb 11 2016, 1:17 PM
mssimpso marked 10 inline comments as done.

Addressed Adam's comments. Thanks, Adam!

anemet added inline comments.Feb 17 2016, 11:30 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
3617–3636

Don't we only enter this function if the Phi has passed isFirstOrderRecurrence? Why are we asserting the same properties that were already checked there? This is almost like saying assert(isFirstOrderRecurrence(Phi)), no? I don't think we want these unless I am misunderstanding something.

test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
1 ↗(On Diff #47718)

Please also add a UF>1 test.

1–4 ↗(On Diff #47718)

I think that it's better practice to avoid the triple and use -force-vector-width to make the test robust.

18 ↗(On Diff #47718)

Is there any reason we can't fully spell out most of these instructions, e.g. shuffle mask?

mssimpso updated this revision to Diff 48342.Feb 18 2016, 9:28 AM
mssimpso marked 4 inline comments as done.

Addressed Adam's comments. Thanks again, Adam!

lib/Transforms/Vectorize/LoopVectorize.cpp
3617–3636

Yes, this is my mistake. I thought you had requested these asserts in an earlier review. I've removed the duplication.

test/Transforms/LoopVectorize/AArch64/first-order-recurrence.ll
18 ↗(On Diff #47718)

We can definitely do this.

anemet accepted this revision.Feb 18 2016, 2:47 PM
anemet edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Feb 18 2016, 2:47 PM

Thanks, Adam! I really appreciate the multiple rounds of feedback you gave on this one.

You're welcome and thanks for your patience. I also had the EuroLLVM reviews happening in parallel.

This revision was automatically updated to reflect the committed changes.