This is an archive of the discontinued LLVM Phabricator instance.

[LV] Fix the vector code generation for first order recurrence
ClosedPublic

Authored by anna on Apr 12 2017, 7:48 AM.

Details

Summary

In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.

Fixes PR32396.

Event Timeline

anna created this revision.Apr 12 2017, 7:48 AM
anna added a comment.Apr 12 2017, 8:00 AM

Hi Matt,

Just FYI: The logic differs slightly from what you suggested in https://reviews.llvm.org/D31910#723721.
We need to extract from the correct index of the vectorized phi update. Extracting from index 3 of the vectorized phi (vector.recur), will give you the element at vector.update(3) - VF, which won't work.

middle.block:
  %vector.recur.extract = extractelement <4 x i32> %vector.update, i32 3
  %new.extract = extractelement <4 x i32> %vector.recur, i32 3

What we need is the the vectorized loop element equivalent to the second last element in the scalar loop (since the phi is the second last and the phi update is the last).

So, the logic used in this patch is:

middle.block:
  %vector.recur.extract = extractelement <4 x i32> %vector.update, i32 3
  %vector.recur.extract.for.phi = extractelement <4 x i32> %vector.update, i32 2
scalar.ph:
  %scalar.recur.init = phi i32 [ %vector.recur.extract, %middle.block ], ...
for.end:
  %r.lcssa = phi i32 [ %scalar.recur, %for.body ], [ %vector.recur.extract.for.phi, %middle.block ]
mssimpso edited edge metadata.Apr 12 2017, 9:09 AM
In D31979#724852, @anna wrote:

Hi Matt,

Just FYI: The logic differs slightly from what you suggested in https://reviews.llvm.org/D31910#723721.
We need to extract from the correct index of the vectorized phi update. Extracting from index 3 of the vectorized phi (vector.recur), will give you the element at vector.update(3) - VF, which won't work.

OK, this makes sense to me.

lib/Transforms/Vectorize/LoopVectorize.cpp
4092–4093

We might as well create this extract unconditionally, regardless of whether the phi is used outside the loop. It would simplify the code here. So you could move it up to where "Extract" is created and not have to worry about setting "PhiHasUsersOutsideLoop". If there are no users, the extract is trivially dead and will be cleaned up by a later pass.

Unlike "Extract", I think we will need to initialize "ExtractForPhiUsedOutsideLoop" to something other than "Incoming" or nullptr. I think we'll need it when we are just unrolling and not vectorizing. Should it be PreviousParts[UF - 2]? See my comment in the test case.

It's probably a good idea to also rename "Extract" to something more descriptive since we now have two of them.

4111–4122

This looks a bit strange to me now. The LCSSA phi for "Phi" should always have an incoming value for "ExtractForPhiUsedOutsideLoop" and not "Extract". "Extract" will only be used to initialize the scalar recurrence. I just checked that we don't allow the update instruction to be used externally (I thought we could handle this), so we don't have to worry about that here.

test/Transforms/LoopVectorize/first-order-recurrence.ll
372

Can you add a check for the VF = 1, UF > 1 case? I want to make sure the external use is correct when there is no extract.

anna planned changes to this revision.Apr 12 2017, 9:52 AM

Need change based on Matt's comment to handle code gen when just unrolling and not vectorizing.

lib/Transforms/Vectorize/LoopVectorize.cpp
4092–4093

Agreed, will make the ExtractForPhiUsedOutsideLoop unconditional, and allow dce to eliminate it. I was under the impression these extracts would contribute to the cost model, but looking at the code, we only consider the instructions within the loop.

I'll take a look at the UF>1 and VF = 1 case. Thanks!

4111–4122

Ah! We can always use ExtractForPhiUsedOutsideLoop as the incoming value, since the loop is in LCSSA form. So, this Phi will be used only in LCSSA phis outside the loop, and it's occurrence will be only at the loop exit block.
Is that right?

How about these 2 checks under DEBUG:

  1. Use the PhiHasUsersOutsideLoop to count number of users. It should be equal to one when we hit LCSSAPhi->getIncomingValue(0) == Phi. Or is this just checking LCSSA property? :)
  2. update instruction has no external uses. If that's false, then we would get incorrect code gen here.
mssimpso added inline comments.Apr 12 2017, 10:40 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4111–4122

So, this Phi will be used only in LCSSA phis outside the loop, and it's occurrence will be only at the loop exit block. Is that right?

Yes, if any value in a loop is used externally, LCSSA will create a phi node for it in the exit block. The external uses will then be of the new phi.

How about these 2 checks under DEBUG:

For counting the external users, yes that would just be checking the LCSSA property, so you don't need to do that.

For the update instruction, you could do something like "assert(!hasOutsideLoopUser(Previous))". See hasOutsideLoopUser for more details.

You could also just go ahead and enable external uses of the update instruction (or do so in another patch if you prefer). The changes here to allow this would be to add "Extract" as the incoming value for the LCSSA phi for "Previous". You'll also need to add the update to AllowedExit over in canVectorizeInstrs.

mcrosier resigned from this revision.Apr 12 2017, 11:40 AM
anna updated this revision to Diff 95017.Apr 12 2017, 12:16 PM
anna marked 3 inline comments as done.

Handle code gen when loop unrolled but not vectorized.
Unconditionally generate the second extract, and other minor changes based on review.

mssimpso added inline comments.Apr 12 2017, 1:24 PM
test/Transforms/LoopVectorize/first-order-recurrence.ll
361

Does the vector test actually pass? It doesn't look like it's well-formed. You need to define the regex for L1 and L2 when they're first used. Like %[[L1:.+]]. Also please indent the instructions under their corresponding block labels to make reading the test a little easier.

anna updated this revision to Diff 95025.Apr 12 2017, 1:53 PM
anna marked an inline comment as done.

Updated tests with correct prefix, fixed indentation.

anna added inline comments.Apr 12 2017, 1:54 PM
test/Transforms/LoopVectorize/first-order-recurrence.ll
361

yeah, it actually did. These are silently ignored. We just need UNROLL-NO-IC without the CHECK suffix.

This revision is now accepted and ready to land.Apr 13 2017, 10:59 AM
This revision was automatically updated to reflect the committed changes.