This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Support conditional in-loop vector reductions
ClosedPublic

Authored by kmclaughlin on Jan 18 2022, 10:03 AM.

Details

Summary

Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.

Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.

Diff Detail

Event Timeline

kmclaughlin created this revision.Jan 18 2022, 10:03 AM
kmclaughlin requested review of this revision.Jan 18 2022, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 18 2022, 10:03 AM
david-arm added a comment.EditedJan 19 2022, 8:28 AM

This looks like a great addition to the vectoriser @kmclaughlin and nice to see us vectorising conditional in-loop reductions! I'm just a little concerned about the extra complexity we're adding when I'm not sure the patch currently supports chained reductions where at least one of them is conditional. If it's trivial to add support for chained reductions then that's great! Otherwise, perhaps it might be simpler initially to say if there is a conditional reduction we don't support them being chained? It might allow you to rewrite the code in a simpler way.

llvm/lib/Analysis/IVDescriptors.cpp
1070

Hi @kmclaughlin, I'm not sure if I can see tests that exercise the case where ExpectedUses != ExpectedPhiUses and ExpectedUses is actually used, i.e. in the while loop below:

while (Cur != RdxInstr) {
  ...

Is it possible to add test cases for this? Alternatively, as a first implementation we could decide not to support chained reductions where one of them is conditional? In that way, it might be possible to simplify the code.

1134

I'm a bit surprised that getNextInstruction(Phi) returns LoopExitInstr here because there are instructions in-between the reduction phi and the loop exit phi, i.e. the reduction operation in the if block. Although I do believe this is what you're seeing and the code seems to work!!

1138

Is there ever a case where this check fails despite LoopExitInstr being a PHI node?

  • Changed getNextInstruction to iterate over Cur->users() and handle Phi nodes found by moving to the next user, similar to ICmp/FCmp.
  • Removed the dyn_cast<PHINode>(Cur) == LoopExitInstr block as Phis are now handled by getNextInstruction.
  • Added tests for various scenarios involving chained reductions where we should not vectorise with in-loop reductions.

Thank you for reviewing these changes, @david-arm!

llvm/lib/Analysis/IVDescriptors.cpp
1070

Hi @david-arm, I think it's simpler to not support a chain of conditional reductions here to begin with. I've added some tests to reduction-inloop-cond.ll for various loops containing chained conditional reductions, including @uncond_cond where Phi does not have the expected number of uses.

1134

This worked for some cases, but you're right that it was incorrect for the LoopExitInstr to be returned first. I've rewritten getNextInstruction as you suggested, so that we look to the next user if a Phi node is found and added a test case (@simple_chained_rdx) with two reduction operations between the Phi & LoopExitInst.

1138

If there is a phi node in the chain between Phi and the LoopExitInstr then this check could fail, though isCorrectOpcode(Cur) will return false. I've added a test for this to reduction-inloop-cond.ll.
I've removed this check though as it isn't necessary after the changes to getNextInstruction.

Hi @kmclaughlin, this looks a lot better, thanks for taking the time to address my previous comments and add the exhaustive set of tests! I just had some minor comments, mostly about the tests.

llvm/lib/Analysis/IVDescriptors.cpp
1070

nit: If you agree, it might be worth moving this variable down to the place it's used? i.e. just above

if (auto ExitPhi = dyn_cast<PHINode>(LoopExitInstr)) {
1110

nit: I think you can probably simplify this a little with something like this:

if (Inc0 == Phi)
  Chain = Inc1;
else if (Inc1 == Phi)
  Chain = Inc0;
else
  return {};
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll
27

I think this code looks right because the inactive lanes will be zero, which matches the identity value for an fadd. However, it might be clearer if you remove the -dce -instcombine flags so we can see the select? I assume the select has been folded away.

llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll
235

Hi @kmclaughlin, it doesn't look there are multiple conditional and instructions in the loop?

519

nit: Maybe a better name for this is something like @nested_cond_and to distinguish from the other cond_and test?

527

Maybe for this negative test you don't need the CHECK lines here and perhaps something like this is sufficient?

; CHECK: vector.body
; CHECK-NOT: llvm.vector.reduce.and.v4i64
; CHECK: middle.block
; CHECK: llvm.vector.reduce.and.v4i64
; CHECK: scalar.ph
690

Perhaps worth adding a bit more here, i.e.

; Chain of conditional & unconditional reductions. We currently only support conditional reductions
; if they are the last in the chain, i.e. the loop exit instruction is a PHI node. Therefore, we reject the
; PHI (%rdx1) as it has more than one use.

Do you think that makes it a bit clearer?

697

SImilar to the comment in @cond_and I don't think you need all the CHECK lines here.

827

Same comment as @cond_and about the CHECK lines. :)

1008

Same comment as @cond_and for the CHECK lines.

1138

Same comment as @cond_and for the CHECK lines.

kmclaughlin marked 11 inline comments as done.
  • Renamed the @multiple_cond_ands test to @unconditional_and.
  • Removed the -instcombine & -dce flags from scalable-reduction-inloop-cond.ll.
  • Simplified the CHECK lines for the negative tests in reduction-inloop-cond.ll.
kmclaughlin added inline comments.Feb 4 2022, 9:46 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll
27

The select has been folded away, I've removed the flags from this test so that it's hopefully a bit clearer.

llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll
235

Hi @david-arm, I've renamed this to @unconditional_and since there's only one and in the loop.

690

I think that's clearer, thanks!

Hi @kmclaughlin, this looks great now, thanks for making the changes! I just had one comment about a test, but apart from that looks good. :)

llvm/test/Transforms/LoopVectorize/reduction-inloop-cond.ll
342

Hi @kmclaughlin, sorry to be a pain, but I just realised the and here is not actually conditional because it's always executed. I think this would also vectorise without your patch? Maybe it's worth moving the and instruction into the if.then block?

kmclaughlin marked an inline comment as done.
  • Moved the and reduction in the @unconditional_and test into the if.then block.

This looks really good now @kmclaughlin! Thanks for strengthening the getReductionOpChain code and adding a plethora of negative chained reduction tests. :) I just have a few more minor comments, but I think it's almost good to go!

llvm/lib/Analysis/IVDescriptors.cpp
1111

nit: Before merging could you simplify this to something like

if (Inc0 == Phi)
  Chain = Inc1;
else if (Inc1 == Phi)
  Chain = Inc0;
else
  return {};

Thanks!

1129–1131

nit: I think for conditional min/max reductions there would be three uses. Perhaps you can reword as something like:

// Check that the Phi has one (or two for min/max) uses for unconditional reductions, plus
// an extra use for conditional reductions.

What do you think?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8597

It might be worth adding more asserts here, i.e. assert that if any operand is an in-loop reduction that we only have two incoming values, as well as asserting that there is only one in-loop reduction operand?

8619–8620

nit: If you move this higher up I think you can reuse it, i.e.

unsigned NumIncoming = Phi->getNumIncomingValues();

// For in-loop reductions, we do not need to create an additional select.
if (Phi->getNumIncomingValues() == 2) {
kmclaughlin marked 4 inline comments as done.
  • Changes to tryToBlend() so that all incoming values of the Phi are checked for in-loop reductions.
  • Added an assert to tryToBlend() that the number of incoming values to the Phi is 2 if an in-loop reduction is found. Also added an assert that only one of the incoming values is an in-loop reduction.
  • Reworded the comment describing the number of uses of Phi in getReductionOpChain().
llvm/lib/Analysis/IVDescriptors.cpp
1129–1131

I think that's more accurate, thanks!

david-arm accepted this revision.Feb 21 2022, 8:43 AM

LGTM! Thanks for the changes. :)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8597

I think this is ok for now because we don't support chained reductions. If at some point we do want to support them, then we may have to do more work to look through the chain.

This revision is now accepted and ready to land.Feb 21 2022, 8:43 AM