This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Support epilogue vectorisation of loops with reductions
ClosedPublic

Authored by kmclaughlin on Jan 10 2022, 3:54 AM.

Details

Summary

isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:

  • fixReduction: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop.
  • createEpilogueVectorizedLoopSkeleton: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue.
  • processLoop: The start value of any VPReductionPHIRecipes are updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor.

Diff Detail

Event Timeline

kmclaughlin created this revision.Jan 10 2022, 3:54 AM
kmclaughlin requested review of this revision.Jan 10 2022, 3:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2022, 3:54 AM

Hi @kmclaughlin thanks for sharing this patch. Overall the approach looks sensible, i.e. updating the VPlan for the epilogue to use a new start value for the VPReductionPHIRecipe, and moving the bc.merge.rdx to the correct block. The implementation itself still seems a bit fuzzy, so I've given some suggestions to make the remapping of values/edges more explicit.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1235

Instead of inferring this information from the PHI node in the original loop, I'd prefer to record the resume values separately in the ILV and pass the information through the EpilogueLoopVectorizationInfo.

4317

It's not necessarily safe to assume ResumePhi is a resume value just because it is a PHINode, so I'd recommend looking this up in the map that I mentioned earlier.

4324–4325

This case seems a bit odd, does it matter that ResumeValue is an instruction?

For the preheader after epilogue vectorization, I'd expect there to only be three cases to consider:

  1. Original start value (when not having executed any vector code at all).
  2. Intermediate reduction value after the main vector body (i.e. no epilogue vectorization because of the iter checks)
  3. Intermediate reduction value after the vectorized epilogue. (i.e. the VPReductionPHI's new start value).

If you know that ResumePhi is actually a resume PHI node, you can write the loop as:

for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
  if (Incoming == LoopMiddleBlock)
    BCBlockPhi->addIncoming(ReducedPartRdx, Incoming);
  else if (ResumePhi &&              // case 2 from above.
                                     // note that ResumePhi is only non-zero for epilogue vectorization.
           llvm::is_contained(ResumePhi->blocks(), Incoming))
    BCBlockPhi->addIncoming(ResumePhi->getIncomingValueForBlock(Incoming),
                            Incoming);
  else
    BCBlockPhi->addIncoming((Value *)ReductionStartValue, Incoming);
}

// Now, we need to fix the users of the reduction variable
// inside and outside of the scalar remainder loop.

[...]
8297

Instead of going through all the basic blocks like this, you can simply remove the incoming values from the EPI.SCEVSafetyCheck, EPI.MemSafetyCheck and EPI.EpilogueIterationCountCheck.

The incoming edge from the middle block can be updated to now come from VecEpilogueIterationCountCheck.

bmahjour added inline comments.Jan 10 2022, 8:24 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8289

"backedge-taken check block"?
do you mean the latch block?

llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll
108

Would be more meaningful to have [ 1, %entry ] or [ 0xFFFFFFFF, %entry ], instead of [ 0, %entry ]

llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
500

Please add a test case for when there are more than one reductions, eg:

float foo(float *arr, int n) {
  float sum = 0.0;
  float prod = 1.0;
  for (int i = 0; i < n; i++) {
    sum += arr[i];
    prod *= arr[i];
  }
  return prod/sum;
}
kmclaughlin marked 6 inline comments as done.
kmclaughlin edited the summary of this revision. (Show Details)
  • Added a map to associate resume values with RecurrenceDescriptors in the loop
  • Changed the tests to use more meaningful start values
  • Added a test case where there is more than one reduction in the loop
  • Use the start value of the VPReductionPHIRecipe in fixReduction to check if the resume value is a Phi
  • Restructured the changes to fixReduction & createEpilogueVectorizedLoopSkeleton

Thank you for the comments on this patch, @sdesmalen & @bmahjour!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4317

I've changed this to look up the start value from the VPReductionPHIRecipe as this should be set correctly for the epilogue.

4324–4325

I don't think it matters if ResumeValue is an instruction, so I've rewritten the loop as suggested above.

llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
500

Hi @bmahjour, I have added a test case using the above example here (@multiple_fp_rdx)

Thanks for the changes, this looks better! I've left a few more comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
561

nit:

s/resume value for a reduction if set, otherwise returns nullptr./
  resume value (bc.merge.rdx) for a reduction as generated by fixReduction./

In the implementation of getResumeValue, I'd also add an assert that the record exists to avoid it being queried when the information is not yet available, and also to ensure the record isn't missing for whatever reason.

563

This should return a PHI node, because it can only ever be a PHINode.

834

PHINode

1232

nit: maybe just pass the RecurrenceDescriptor in as const reference? (because in most places that calls this function the recurrence descriptor is a reference, not a pointer).

1233–1235

You could resume the iterator from find to return the result, rather than having to search through ResumeValues twice.

If you assert that the value must be available, you can do:

auto It = ResumeValues.find(RdxDesc);
assert(It != ResumeValues.end() && "Message");
return *It;
4358

nit: maybe move to line 4317 so that it's together with the creation of BCBlockPhi?

4359–4360

does this ever happen? I'd expect ResumeValues not to contain a PHINode for the resume value of this reduction descriptor yet.

4362

a PHINode is a Value

8290–8293

nit: the highlighted bit seems redundant, because the code below says as much.

kmclaughlin marked 9 inline comments as done.
  • Changed getResumeValue to return a PHINode instead of a Value
  • Pass in the RecurrenceDescriptor as a const reference to getResumeValue
  • Assert that the resume value must be found in getResumeValue

This looks like a really nice change, thanks @kmclaughlin!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8295

nit: Is it possible to merge these two for loops together into one, i.e.

for (PHINode &Phi : VecEpilogueIterationCountCheck->phis()) {
  • Added a test to epilog-vectorization-reductions.ll where the start value of the reduction is also Phi node
kmclaughlin added inline comments.Jan 13 2022, 7:05 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8295

Hi @david-arm, I found that when I tried to use one loop here as you've suggested, that the bc.merge.rdx phis for loops with more than one reduction were not all updated correctly. When I added the test with a loop containing two reductions, the second reduction was not updated and moved to the preheader.

david-arm added inline comments.Jan 13 2022, 7:48 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8295

Ah ok, I suspect that VecEpilogueIterationCountCheck->phis() is perhaps being changed by code in this loop, which requires you to take a snapshot.

sdesmalen added inline comments.Jan 14 2022, 8:59 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8291–8293

nit: SmallVector has a constructor that takes an iterator range, so I think you can just write:

SmallVector<PhiNode*> PhisInBlock(VecEpilogueIterationCountCheck->phis());
10638

nit: please add braces for the outer for-loop too.

llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll
121

What's the value in testing this? (We already know that for SVE we don't vectorize a mul reduction?)

llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll
243

Is there anything specifically different about how reductions are handled in epilogue vectorization, that requires this to be tested explicitly? (same for a max-reduction above)

llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
201

Again, I'm not entirely sure of the value of testing this case? (as in, how it relates to this patch)

bmahjour added inline comments.Jan 14 2022, 10:35 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
563

suggestion: rename to getReductionResumeValue (to distinguish it from "induction resume" values).

834

suggestion: rename to ReductionResumeValues

kmclaughlin marked 5 inline comments as done.
  • Renamed ResumeValues -> ReductionResumeValues
  • Removed several tests which were not covering anything new from this patch compared to other tests
llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll
243

I was trying to add tests for many different scenarios involving reductions, though I think you're right that some of these aren't testing anything different related to this patch that isn't already covered elsewhere. I've removed the tests that you've highlighted here & in the other test files.

david-arm added inline comments.Jan 17 2022, 6:39 AM
llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll
201

I think this might be a test that I asked @kmclaughlin to add because I wanted to make sure we used the correct PHI type in the epilogue loop too. This exposes a case where we detect reductions using the lookThroughAnd vectoriser function. The actual reduction`llvm.vector.reduce.or.v4i16` uses a different type to the PHI.

This revision is now accepted and ready to land.Jan 20 2022, 7:08 AM
fhahn added inline comments.Jan 20 2022, 7:17 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10642

I think you might have to add this as external def to the VPlan (using addExternalDef), otherwise the allocated memory won't get freed.

llvm/test/Transforms/LoopVectorize/X86/pr35432.ll
92

lots of spurrious changes here due to auto generating the check lines. Can you just reorder the incoming values here without auto-generating the check lines?

kmclaughlin marked 2 inline comments as done.
  • Added StartVal as an external def to BestEpiPlan
  • Updated pr35432.ll without auto-generating the check lines