This is an archive of the discontinued LLVM Phabricator instance.

[LV] Vectorize first-order recurrences

Authored by mssimpso on Jan 14 2016, 10:02 AM.



This patch enables the vectorization of first-order (non-reduction) recurrences. For example:

for (int i = 0; i < n; ++i)
  b[i] = a[i] - a[i - 1];

In the example above, the load PRE of the GVN pass can often hoist a[i - 1] into the loop preheader. This leaves a phi node inside the loop containing values for the hoisted load and a[i]. Although GVN can create these phi nodes, they can also occur naturally.

In this patch, we add a new recurrence kind for these phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction.

Contributed-by: Matthew Simpson and Chad Rosier <>

Diff Detail


Event Timeline

mssimpso updated this revision to Diff 44897.Jan 14 2016, 10:02 AM
mssimpso retitled this revision from to [LV] Vectorize pre-load recurrences.
mssimpso updated this object.
mssimpso added a subscriber: llvm-commits.
bmakam added a subscriber: bmakam.Jan 14 2016, 10:21 AM
gberry added a subscriber: gberry.Jan 14 2016, 10:25 AM

Matt, I've added a few comments below.

4573 ↗(On Diff #44897)

I don't think checking for store instructions is enough here. Isn't I->mayWriteToMemory() more appropriate to catch e.g. calls?

4579 ↗(On Diff #44897)

Same as above. Plus, I'm not sure this is a sufficient check. What if there is a store after the sink point in iteration 'i' that may alias the load from iteration 'i+1'? Perhaps you can piggy back on the check for vectorizability of the original load in the loop? That may not be enough either since the load in question might be known to only write the 0th element (i.e. the value loaded by the sunk load).

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.

anemet edited edge metadata.Jan 14 2016, 11:04 AM

One other question: have you explored vectorizing this recurrence as a shuffle+insertelement instead? That would avoid the need for any extra memory dependency checking, and would avoid introducing more loads in the loop.


I just talked with Geoff in person, and I like the shuffle/insert approach. This will avoid a lot of the memory issues and should also expand the types of phi's we can handle beyond loads. Thanks for the quick feedback! I'll post an updated version soon.

sbaranga added inline comments.
562 ↗(On Diff #44897)

It might be worth taking a PredicatedScalarEvolution for this function which I think might help in some cases.

At the moment it is possible that the loop was versioned such that AR's step is known to be 1. In this case doing PSE->getSCEV() would give you the more accurate expression.

I think you also need to check that AR's is an AddRecExpr for TheLoop?

mssimpso updated this revision to Diff 45423.Jan 20 2016, 11:35 AM
mssimpso retitled this revision from [LV] Vectorize pre-load recurrences to [LV] Vectorize first-order recurrences.
mssimpso updated this object.
mssimpso edited edge metadata.

Addressed comments from Geoff and Adam.

Code generation is now done by shuffling the vectors from the current and previous iterations instead of trying to sink loads into the loop. The new approach is simpler and I don't think requires as much legality checking.

mssimpso added inline comments.Jan 20 2016, 11:38 AM
562 ↗(On Diff #45423)

Hi Silviu. Thanks very much for the comments. After changing the code generation approach, I don't think scalar evolution is required any longer.

Hi Matt,

I've started reviewing this but I found that I could use an example of this transformation fully spelled out. Particularly, I am not sure I understand why the initial vector value is a splat vector.


367 ↗(On Diff #45423)

I guess a definition of first-order recurrence should be added here.

3323–3324 ↗(On Diff #45423)

I actually think we need to this before this patch ;). This is a 200-line loop and we shouldn't pile on more strangeness.

I think that all we need to do is to rename RdxPHIsToFix to PHIsToFix in a prequel NFC patch.

3326 ↗(On Diff #45423)

I know that we use fix... in other places but that's pretty general. How about vectorize... or something like that.

3564–3567 ↗(On Diff #45423)

assert that Previous is loop-invariant at this point?

Hi Adam,

Thanks very much for the feedback. Yes, the initial value doesn't have to be broadcast to all lanes since just an insert will do. I will upload a new version with your suggestions. Let me walk through the simple example I mentioned in the summary. For this loop, the (shorthand) scalar IR looks something like this:
  s_init = a[-1]
  br scalar.body

  s1 = phi [s_init,], [s2, scalar.body]
  i = phi [0,], [i+1, scalar.body]
  s2 = a[i]
  b[i] = s2 - s1
  br cond, scalar.body, ...

Here, s1 is a non-reduction recurrence that we currently give up on. This patch calls it a first-order recurrence (because it's value depends on the previous iteration) and we try to vectorize it. The check in isFirstOrderRecurrence basically ensures that s_init is loop-invariant, that s2 is in the loop header (and thus loop-varying), and that every use of s1 is dominated by the definition of s2 (see below). The vectorized IR looks something like this for VF=4:
  v_init = vector(..., ..., ..., a[-1])
  br vector.body

  v1 = phi [v_init,], [v2, vector.body]
  i = phi [0,], [i+4, vector.body]
  v2 = a[i, i+1, i+2, i+3];
  v3 = vector(v1(3), v2(0, 1, 2))
  b[i, i+1, i+2, i+3] = v2 - v3
  br cond, vector.body, middle.block

  x = v2(3)
  s_init = phi [x, middle.block], [a[-1], otherwise]
  br scalar.body

The dominance requirement is there so that we can shuffle v1 with v2 before this value is needed for the assignment to b. After we leave the vector loop, we extract the next value of the recurrence (x) to use as the initial value when jumping to the scalar portion. I hope that helps.

367 ↗(On Diff #45423)


3323–3324 ↗(On Diff #45423)

Sure, I'll push a patch that renames RdxPHIsToFix to PHIsToFix first.

3326 ↗(On Diff #45423)

vectorizeFirstOrderRecurrence sounds good to me.

3564–3567 ↗(On Diff #45423)

Good idea.

mssimpso marked 4 inline comments as done.Feb 1 2016, 9:52 AM
mssimpso added inline comments.
3564–3567 ↗(On Diff #45423)

We actually want the previous value to be loop-varying here. The initial value should be loop-invariant. I've added this assertion as well as a few others.

mssimpso updated this revision to Diff 46551.Feb 1 2016, 9:53 AM

Addressed Adam's comments

anemet added a comment.Feb 9 2016, 6:20 PM

Matt, that's a nice example. Can you please add it as a comment to the code where it's appropriate.

532–541 ↗(On Diff #46551)

Thinking more about this, isn't it true that whether the phi operand is initial or previous depends on their edge assignment rather than whether they loop-variant/invariant.

543–550 ↗(On Diff #46551)

How can user of the current value be loop-invariant?

368–381 ↗(On Diff #46551)

We don't add empty /// like that

I actually think that this definition of this should be before isFirstOrderRecurrence.

3558 ↗(On Diff #46551)

Sorry but I forgot that "fix" is an actual "phase" here. We first do the default widening then we fix up the phis. Can you please rename it back, sorry :(

3573–3585 ↗(On Diff #46551)

This is duplicated between here and isFirstOrderRecurrence, would be nice to somehow remove this.

3596–3597 ↗(On Diff #46551)

"... when initially vectorizing ..."

3601–3602 ↗(On Diff #46551)

Outdated comment.

3625–3636 ↗(On Diff #46551)

I think that CurrentParts should be a single value, this is the value that's incoming into the iteration (whether the real loop iteration or an unrolled loop iteration). We may also want to use the name VecPhi rather than "current" and add comment clarifying the above.

What do you think?

3657–3659 ↗(On Diff #46551)

A newline before this block?

3735–3742 ↗(On Diff #46551)

It's generally not a good practice to mix a multi-line formatting change with a single line (?) functionality change .

mssimpso added inline comments.Feb 11 2016, 9:12 AM
532–541 ↗(On Diff #46551)

When I think more about this, yes, I agree. I'll make the necessary updates. Thanks for catching this.

543–550 ↗(On Diff #46551)

Current (the Phi) could be used outside the loop. Reductions are an example where a Phi is used externally. I actually don't think we need this restriction though, because we can handle this case in code generation. The use would just be of the last value of the recurrence, which we already extract in middle.block.

368–381 ↗(On Diff #46551)

Sure, I'll move it.

3558 ↗(On Diff #46551)

No problem!

3573–3585 ↗(On Diff #46551)

Sure, I will delete these assertions here.

3625–3636 ↗(On Diff #46551)

Yes, I don't think we need to keep track of CurrentParts as a SmallVector since a single value should do. Changing the name is probably a good idea to avoid confusion.

3735–3742 ↗(On Diff #46551)

Sure, we can fix the formatting in a separate patch.

anemet added inline comments.Feb 11 2016, 9:35 AM
3573–3585 ↗(On Diff #46551)

I didn't mean this only for the asserts. I think it would make sense to factor out the way we get the previous and init for a first-order recurrence somewhere (e.g. RecurrenceDescriptor?).

There may also be some common functionality between this new function and RecurrenceDescriptor::isFirstOrderRecurrence that could be shared.

mssimpso added inline comments.Feb 11 2016, 10:45 AM
3573–3585 ↗(On Diff #46551)

I see. Thanks for clarifying!

mssimpso updated this revision to Diff 47718.Feb 11 2016, 1:17 PM
mssimpso marked 10 inline comments as done.

Addressed Adam's comments. Thanks, Adam!

anemet added inline comments.Feb 17 2016, 11:30 PM
3613–3632 ↗(On Diff #47718)

Don't we only enter this function if the Phi has passed isFirstOrderRecurrence? Why are we asserting the same properties that were already checked there? This is almost like saying assert(isFirstOrderRecurrence(Phi)), no? I don't think we want these unless I am misunderstanding something.

1 ↗(On Diff #47718)

Please also add a UF>1 test.

1–4 ↗(On Diff #47718)

I think that it's better practice to avoid the triple and use -force-vector-width to make the test robust.

18 ↗(On Diff #47718)

Is there any reason we can't fully spell out most of these instructions, e.g. shuffle mask?

mssimpso updated this revision to Diff 48342.Feb 18 2016, 9:28 AM
mssimpso marked 4 inline comments as done.

Addressed Adam's comments. Thanks again, Adam!

3613–3632 ↗(On Diff #47718)

Yes, this is my mistake. I thought you had requested these asserts in an earlier review. I've removed the duplication.

18 ↗(On Diff #47718)

We can definitely do this.

anemet accepted this revision.Feb 18 2016, 2:47 PM
anemet edited edge metadata.


This revision is now accepted and ready to land.Feb 18 2016, 2:47 PM

Thanks, Adam! I really appreciate the multiple rounds of feedback you gave on this one.

You're welcome and thanks for your patience. I also had the EuroLLVM reviews happening in parallel.

This revision was automatically updated to reflect the committed changes.