This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Fix in-loop reduction chains using VPlan def-use chains (NFCI)
ClosedPublic

Authored by fhahn on Jul 20 2023, 7:41 AM.

Details

Summary

Update adjustRecipesForReductions to directly use the VPlan def-use
chains for in-loop reductions to collect the reduction operations that
need adjusting.

This allows the removal of

  • ReductionChainMap
  • recording of recipes for instruction in the reduction chain
  • removes late uses of getVPValue
  • removes to need for removeVPValueFor.

Diff Detail

Event Timeline

fhahn created this revision.Jul 20 2023, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 7:41 AM
fhahn requested review of this revision.Jul 20 2023, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 7:41 AM
Herald added subscribers: wangpc, vkmr. · View Herald Transcript
Ayal added a comment.Jul 22 2023, 3:46 PM

Thanks for following-up on this nice cleanup!
Looking over the code raises various initial thoughts, which could be pursued in several steps/patches.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1611
1774–1775
1779–1780
9149–9156

?
Can also filter out non-isOrdered phi's when VF=1, to then iterate only over phi's of interest.

9153

Bypass this for loop if MinVF.isScalar()?

9162–9163
9162–9163

Hoist Kind and its assert for being a select-compare kind next to setting RdxDesc above?

9162–9163

IndexOfFirstOperand?
Initialize? Comment what this variable holds? (Admittedly missing from current code base.)

9163–9164

Would one SetVector work?

This represents the Chain - should it hold Recipes as Links, each residing inside the loop and having a single user (except last with live-out), rather than VPValues?

9164

Above initialization of Worklist tolerates recipes with multiple defined values, but here only single valued recipes are expected?

9167

Worth a comment explaining what's going to happen next.

9201–9202

Dead assert.

Worth a comment why we continue here - to skip over the compare of the select?
What about Fminimum/Fmaximum reductions which are isMinMaxRecurrenceKind but have a call to an intrinsic as their WidenRecipe, rather than a compare-select?

9208–9209

comment appears out of date.

9222
9227–9228
9231

Is this setting of Chain needed?

9249

Can this setting of Chain continue to be placed at the end of the loop below?

Ayal added inline comments.Jul 22 2023, 3:46 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9162–9163
9163
9163

Could this for loop be replaced by a simpler for (auto Link : Chain) { loop?

9163–9195

Comment deserves update - FMulAdd is also supported.
Potentially renamed isAnyOfRecurrenceKind() in D155786.

9164
9165
9219

It's somewhat alarming that we're indifferent to which operand is VecOp, in non commutative cases such as compare-select. The semantics of the compare/select are presumably captured by the min/max kind, rather than the condition itself (or its negation). Worth a comment, and ensuring test coverage of having VecOp in both positions.

9222

Or rather BasicBlock *CurrentLinkBB = CurrentLinkI->getParent();

Can also define and reuse: VPBasicBlock *CurrentLinkVPBB = CurrentLink->getParent();

9234

Should this breaking of fmuladd into an fadd recurrence fed by an fmul take place separately, as a preparatory step?
Should compare-select pairs be replaced by Min/Max recipes?
In order to convert an (initially widened) reduction recurrence into an in-loop one, it seems that isolating the reduction operation into a single, dedicated operation would be useful. It could also help canonicalize where to find the reduced value operands, aka vecOps.

9241–9245
9261

Why/Is this needed?

Ayal added inline comments.Jul 22 2023, 11:07 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9153

Bypass this for loop if MinVF.isScalar()?

Ah, isOrdered reductions are considered inLoop even for VF=1 (but only them).

fhahn updated this revision to Diff 544541.Jul 26 2023, 3:30 PM
fhahn marked 27 inline comments as done.

Address latest comments, thanks! I tried to include most adjustments directly in this patch to keep things together.

fhahn added inline comments.Jul 26 2023, 3:32 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1611

Adjusted, thanks!

1774–1775

Updated, thanks!

1779–1780

Updated, thanks!

9149–9156

Adjusted and collect only VPReductionPHIRecipe

9162–9163

Updated, Thanks!

9162–9163

Hoisted out of loop, thanks!

9162–9163

Renamed & added comment. It should be guaranteed to be initialized before use.

9163

Updated to use iterator based loop which skips the first element (aka PreviousLink)

9163

Updated , thanks!

9163–9164

Updated to use SetVector and use VPRecipeBase as element type, thanks!

9163–9195

Updated the message; any reduction should be allowed, besides select/cmp ones.

9164

Updated, thanks!

Above initialization of Worklist tolerates recipes with multiple defined values, but here only single valued recipes are expected?

Added an assert in the collection loop

9165

Updated, thanks!

9167

added a comment, thanks!

9201–9202

Will update to have a check for the cmp only. In-loop reductions won't be used for those intrinsics at the moment. Will make sure there's a test. Added tests with min-max intrinsics in cc3986643696, for which in-loop reductions aren't applied yet

9208–9209

Updated to also include FMulAdd

9219

Yep, that looks right, added a comment, thanks!

Added test coverage with flipped operands in cc3986643696.

9222

Updated, thanks!

9227–9228

Updated, thanks!

9231

No, removed ,thanks!

9234

Might be good as follow up?

9249

Yep, moved, thanks!

9261

Not needed in the latest version, removed!

Ayal added a comment.Jul 31 2023, 3:32 PM

Added some more comments to help clean this up more, can be applied now or later if preferred to keep the diff closer to a VPlan refactoring alone.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161

nit: Should this filtering be applied above when inserting into InLoopReductionPhis?
Should unordered reductions be considered non-isInLoop in VPlans of VF=1?

9172

note: may assert that UserRecipe resides inside the vector loop, as the outside user is a LiveOut, at-least for now.

9181

nit: assert better placed above, asap, independent of filling Worklist.

9194

nit: would always considering the last two indices be (correct and) easier than considering either first two indices or indices 1 and 2? Excluding FMulAdd which deserves a separate treatment.

9203
9206

assert CurrentLink is a select recipe (whose first operand index is 1)?

9210

Treat FMulAdd separately from the binary operators - the operand along the chain is in index 2, go ahead and generate the FMul of indices 0 and 1 to produce VecOp? (Keeping the assert message intact ;-)

9220

assert that VecOp != PreviousLinkV and that the other candidate does equal PreviousLinkV?
(CurrentLink->getOperand(IndexOfFirstOperand+IndexOfFirstOperand+1-VecOpId))

9225

Wonder if this information may already be recorded in VPlan - perhaps BlockInMask already exists?

9233
9234

Sure, if not two.

9235

assert VecOpId == 0?

9244

Could we use PreviousLink serve as/instead of CompareRecipe?

fhahn updated this revision to Diff 546239.Aug 1 2023, 3:11 PM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn added inline comments.Aug 1 2023, 3:12 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161

Moved the filtering. Left scalar handling as is for now to keep with original code.

9172

Updated, thanks!

9181

Moved up, thanks!

9194

For the initial version, I think it would be good retain the original logic as much as possible. This also makes it easier to retain the asserts.

9203

Updated, thanks!

9206

added assert, thanks!

9210

Moved, thanks!

9220

Added assert, thanks!

9225

Not easily yet, but I think as follow-up this could be handled by checking the predecessors of the block to find the predecessor with branch-on-mask.

9235

Not needed after code movement.

9244

Unfortunately I don't think so, but we can let dead-recipe removal take care of this removal. I removed the code.

Ayal added inline comments.Aug 1 2023, 4:17 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9167–9174
9183

?

9189
9191
9205

If VF=1 CurrentLink should be a Replicate instead of a Widen(Call) recipe?

Better place the above assert(RecurrenceDescriptor::isFMulAddIntrinsic(CurrentLinkI)) here?

9244

If PreviousLink is set to Cmp before early-continuing above?

OTOH, dead-recipe removal could also reclaim CurrentLink.

fhahn updated this revision to Diff 546406.Aug 2 2023, 4:26 AM
fhahn marked 9 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9167–9174

Reordered, thanks!

9183

Removed, thanks!

9189

Fixed, thanks!

9191

added, thanks!

9205

Moved assert and refined condition as suggested, thanks!

9233

Adjusted, thanks!

9244

Yep, updated, thanks!

Ayal accepted this revision.Aug 2 2023, 7:23 AM

This looks good to me, thanks!
Adding last minor nits.

Would be good to see how adjustRecipesForReductions() may become a VPlan2VPlan pass, and apply fmuladd decomposition / minmax composition as separate, preparatory steps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9213

Note: in this case PreviousLink stays the recipe before the compare, feeding one of the operands of the select.

9234
9244

nit: may be worth noting somewhere that this transformation may leave dead recipes for a subsequent pass to clean up.

This revision is now accepted and ready to land.Aug 2 2023, 7:23 AM
This revision was landed with ongoing or failed builds.Aug 2 2023, 9:05 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 3 inline comments as done.
fhahn added inline comments.Aug 2 2023, 9:25 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9234

Sounds better, thanks!

9244

Added a note in the committed version, thanks!