This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Fix in-loop reduction chains using VPlan def-use chains (NFCI)
ClosedPublic

Authored by fhahn on Jul 20 2023, 7:41 AM.

Details

Summary

Update adjustRecipesForReductions to directly use the VPlan def-use
chains for in-loop reductions to collect the reduction operations that
need adjusting.

This allows the removal of

  • ReductionChainMap
  • recording of recipes for instruction in the reduction chain
  • removes late uses of getVPValue
  • removes to need for removeVPValueFor.

Diff Detail

Event Timeline

fhahn created this revision.Jul 20 2023, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 7:41 AM
fhahn requested review of this revision.Jul 20 2023, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 7:41 AM
Herald added subscribers: wangpc, vkmr. · View Herald Transcript
Ayal added a comment.Jul 22 2023, 3:46 PM

Thanks for following-up on this nice cleanup!
Looking over the code raises various initial thoughts, which could be pursued in several steps/patches.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1611
1774–1775
1779–1780
9153–9154

?
Can also filter out non-isOrdered phi's when VF=1, to then iterate only over phi's of interest.

9157

Bypass this for loop if MinVF.isScalar()?

9194–9195
9194–9195

Hoist Kind and its assert for being a select-compare kind next to setting RdxDesc above?

9194–9195

IndexOfFirstOperand?
Initialize? Comment what this variable holds? (Admittedly missing from current code base.)

9195–9196

Would one SetVector work?

This represents the Chain - should it hold Recipes as Links, each residing inside the loop and having a single user (except last with live-out), rather than VPValues?

9196

Above initialization of Worklist tolerates recipes with multiple defined values, but here only single valued recipes are expected?

9199

Worth a comment explaining what's going to happen next.

9203–9204

Dead assert.

Worth a comment why we continue here - to skip over the compare of the select?
What about Fminimum/Fmaximum reductions which are isMinMaxRecurrenceKind but have a call to an intrinsic as their WidenRecipe, rather than a compare-select?

9220–9221

comment appears out of date.

9249
9253–9254
9257

Is this setting of Chain needed?

9261

Can this setting of Chain continue to be placed at the end of the loop below?

Ayal added inline comments.Jul 22 2023, 3:46 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9194–9195
9195
9195

Could this for loop be replaced by a simpler for (auto Link : Chain) { loop?

9195–9198

Comment deserves update - FMulAdd is also supported.
Potentially renamed isAnyOfRecurrenceKind() in D155786.

9196
9197
9249

It's somewhat alarming that we're indifferent to which operand is VecOp, in non commutative cases such as compare-select. The semantics of the compare/select are presumably captured by the min/max kind, rather than the condition itself (or its negation). Worth a comment, and ensuring test coverage of having VecOp in both positions.

9249

Or rather BasicBlock *CurrentLinkBB = CurrentLinkI->getParent();

Can also define and reuse: VPBasicBlock *CurrentLinkVPBB = CurrentLink->getParent();

9256–9257

Should this breaking of fmuladd into an fadd recurrence fed by an fmul take place separately, as a preparatory step?
Should compare-select pairs be replaced by Min/Max recipes?
In order to convert an (initially widened) reduction recurrence into an in-loop one, it seems that isolating the reduction operation into a single, dedicated operation would be useful. It could also help canonicalize where to find the reduced value operands, aka vecOps.

9257–9258
9263

Why/Is this needed?

Ayal added inline comments.Jul 22 2023, 11:07 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9157

Bypass this for loop if MinVF.isScalar()?

Ah, isOrdered reductions are considered inLoop even for VF=1 (but only them).

fhahn updated this revision to Diff 544541.Jul 26 2023, 3:30 PM
fhahn marked 27 inline comments as done.

Address latest comments, thanks! I tried to include most adjustments directly in this patch to keep things together.

fhahn added inline comments.Jul 26 2023, 3:32 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1611

Adjusted, thanks!

1774–1775

Updated, thanks!

1779–1780

Updated, thanks!

9153–9154

Adjusted and collect only VPReductionPHIRecipe

9194–9195

Updated, Thanks!

9194–9195

Hoisted out of loop, thanks!

9194–9195

Renamed & added comment. It should be guaranteed to be initialized before use.

9195

Updated to use iterator based loop which skips the first element (aka PreviousLink)

9195

Updated , thanks!

9195–9196

Updated to use SetVector and use VPRecipeBase as element type, thanks!

9195–9198

Updated the message; any reduction should be allowed, besides select/cmp ones.

9196

Updated, thanks!

Above initialization of Worklist tolerates recipes with multiple defined values, but here only single valued recipes are expected?

Added an assert in the collection loop

9197

Updated, thanks!

9199

added a comment, thanks!

9203–9204

Will update to have a check for the cmp only. In-loop reductions won't be used for those intrinsics at the moment. Will make sure there's a test. Added tests with min-max intrinsics in cc3986643696, for which in-loop reductions aren't applied yet

9220–9221

Updated to also include FMulAdd

9249

Updated, thanks!

9249

Yep, that looks right, added a comment, thanks!

Added test coverage with flipped operands in cc3986643696.

9253–9254

Updated, thanks!

9256–9257

Might be good as follow up?

9257

No, removed ,thanks!

9261

Yep, moved, thanks!

9263

Not needed in the latest version, removed!

Ayal added a comment.Jul 31 2023, 3:32 PM

Added some more comments to help clean this up more, can be applied now or later if preferred to keep the diff closer to a VPlan refactoring alone.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9159–9193

nit: Should this filtering be applied above when inserting into InLoopReductionPhis?
Should unordered reductions be considered non-isInLoop in VPlans of VF=1?

9204

note: may assert that UserRecipe resides inside the vector loop, as the outside user is a LiveOut, at-least for now.

9205
9208

assert CurrentLink is a select recipe (whose first operand index is 1)?

9213

nit: assert better placed above, asap, independent of filling Worklist.

9222

Treat FMulAdd separately from the binary operators - the operand along the chain is in index 2, go ahead and generate the FMul of indices 0 and 1 to produce VecOp? (Keeping the assert message intact ;-)

9226

nit: would always considering the last two indices be (correct and) easier than considering either first two indices or indices 1 and 2? Excluding FMulAdd which deserves a separate treatment.

9250

assert that VecOp != PreviousLinkV and that the other candidate does equal PreviousLinkV?
(CurrentLink->getOperand(IndexOfFirstOperand+IndexOfFirstOperand+1-VecOpId))

9251

Wonder if this information may already be recorded in VPlan - perhaps BlockInMask already exists?

9256–9257

Sure, if not two.

9256–9257
9256–9257

assert VecOpId == 0?

9260

Could we use PreviousLink serve as/instead of CompareRecipe?

fhahn updated this revision to Diff 546239.Aug 1 2023, 3:11 PM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn added inline comments.Aug 1 2023, 3:12 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9159–9193

Moved the filtering. Left scalar handling as is for now to keep with original code.

9204

Updated, thanks!

9205

Updated, thanks!

9208

added assert, thanks!

9213

Moved up, thanks!

9222

Moved, thanks!

9226

For the initial version, I think it would be good retain the original logic as much as possible. This also makes it easier to retain the asserts.

9250

Added assert, thanks!

9251

Not easily yet, but I think as follow-up this could be handled by checking the predecessors of the block to find the predecessor with branch-on-mask.

9256–9257

Not needed after code movement.

9260

Unfortunately I don't think so, but we can let dead-recipe removal take care of this removal. I removed the code.

Ayal added inline comments.Aug 1 2023, 4:17 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9165–9172
9181

?

9187
9189
9207

If VF=1 CurrentLink should be a Replicate instead of a Widen(Call) recipe?

Better place the above assert(RecurrenceDescriptor::isFMulAddIntrinsic(CurrentLinkI)) here?

9260

If PreviousLink is set to Cmp before early-continuing above?

OTOH, dead-recipe removal could also reclaim CurrentLink.

fhahn updated this revision to Diff 546406.Aug 2 2023, 4:26 AM
fhahn marked 9 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9165–9172

Reordered, thanks!

9181

Removed, thanks!

9187

Fixed, thanks!

9189

added, thanks!

9207

Moved assert and refined condition as suggested, thanks!

9256–9257

Adjusted, thanks!

9260

Yep, updated, thanks!

Ayal accepted this revision.Aug 2 2023, 7:23 AM

This looks good to me, thanks!
Adding last minor nits.

Would be good to see how adjustRecipesForReductions() may become a VPlan2VPlan pass, and apply fmuladd decomposition / minmax composition as separate, preparatory steps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9225

Note: in this case PreviousLink stays the recipe before the compare, feeding one of the operands of the select.

9246
9260

nit: may be worth noting somewhere that this transformation may leave dead recipes for a subsequent pass to clean up.

This revision is now accepted and ready to land.Aug 2 2023, 7:23 AM
This revision was landed with ongoing or failed builds.Aug 2 2023, 9:05 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 3 inline comments as done.
fhahn added inline comments.Aug 2 2023, 9:25 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9246

Sounds better, thanks!

9260

Added a note in the committed version, thanks!