This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Fix in-loop reduction chains using VPlan def-use chains (NFCI)
ClosedPublic

Authored by fhahn on Jul 20 2023, 7:41 AM.

Details

Summary

Update adjustRecipesForReductions to directly use the VPlan def-use
chains for in-loop reductions to collect the reduction operations that
need adjusting.

This allows the removal of

  • ReductionChainMap
  • recording of recipes for instruction in the reduction chain
  • removes late uses of getVPValue
  • removes to need for removeVPValueFor.

Diff Detail

Unit TestsFailed

Event Timeline

fhahn created this revision.Jul 20 2023, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 7:41 AM
fhahn requested review of this revision.Jul 20 2023, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 7:41 AM
Herald added subscribers: wangpc, vkmr. · View Herald Transcript
Ayal added a comment.Jul 22 2023, 3:46 PM

Thanks for following-up on this nice cleanup!
Looking over the code raises various initial thoughts, which could be pursued in several steps/patches.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1611
1774–1775
1779–1780
9149–9150

?
Can also filter out non-isOrdered phi's when VF=1, to then iterate only over phi's of interest.

9153

Bypass this for loop if MinVF.isScalar()?

9188–9189
9188–9189

Hoist Kind and its assert for being a select-compare kind next to setting RdxDesc above?

9188–9189

IndexOfFirstOperand?
Initialize? Comment what this variable holds? (Admittedly missing from current code base.)

9189–9190

Would one SetVector work?

This represents the Chain - should it hold Recipes as Links, each residing inside the loop and having a single user (except last with live-out), rather than VPValues?

9190

Above initialization of Worklist tolerates recipes with multiple defined values, but here only single valued recipes are expected?

9193

Worth a comment explaining what's going to happen next.

9194–9195

Dead assert.

Worth a comment why we continue here - to skip over the compare of the select?
What about Fminimum/Fmaximum reductions which are isMinMaxRecurrenceKind but have a call to an intrinsic as their WidenRecipe, rather than a compare-select?

9216–9217

comment appears out of date.

9245
9249–9250
9253

Is this setting of Chain needed?

9257

Can this setting of Chain continue to be placed at the end of the loop below?

Ayal added inline comments.Jul 22 2023, 3:46 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9188–9189
9189
9189

Could this for loop be replaced by a simpler for (auto Link : Chain) { loop?

9189–9192

Comment deserves update - FMulAdd is also supported.
Potentially renamed isAnyOfRecurrenceKind() in D155786.

9190
9191
9245

It's somewhat alarming that we're indifferent to which operand is VecOp, in non commutative cases such as compare-select. The semantics of the compare/select are presumably captured by the min/max kind, rather than the condition itself (or its negation). Worth a comment, and ensuring test coverage of having VecOp in both positions.

9245

Or rather BasicBlock *CurrentLinkBB = CurrentLinkI->getParent();

Can also define and reuse: VPBasicBlock *CurrentLinkVPBB = CurrentLink->getParent();

9252–9253

Should this breaking of fmuladd into an fadd recurrence fed by an fmul take place separately, as a preparatory step?
Should compare-select pairs be replaced by Min/Max recipes?
In order to convert an (initially widened) reduction recurrence into an in-loop one, it seems that isolating the reduction operation into a single, dedicated operation would be useful. It could also help canonicalize where to find the reduced value operands, aka vecOps.

9253–9254
9259

Why/Is this needed?

Ayal added inline comments.Jul 22 2023, 11:07 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9153

Bypass this for loop if MinVF.isScalar()?

Ah, isOrdered reductions are considered inLoop even for VF=1 (but only them).

fhahn updated this revision to Diff 544541.Jul 26 2023, 3:30 PM
fhahn marked 27 inline comments as done.

Address latest comments, thanks! I tried to include most adjustments directly in this patch to keep things together.

fhahn added inline comments.Jul 26 2023, 3:32 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1611

Adjusted, thanks!

1774–1775

Updated, thanks!

1779–1780

Updated, thanks!

9149–9150

Adjusted and collect only VPReductionPHIRecipe

9188–9189

Updated, Thanks!

9188–9189

Hoisted out of loop, thanks!

9188–9189

Renamed & added comment. It should be guaranteed to be initialized before use.

9189

Updated to use iterator based loop which skips the first element (aka PreviousLink)

9189

Updated , thanks!

9189–9190

Updated to use SetVector and use VPRecipeBase as element type, thanks!

9189–9192

Updated the message; any reduction should be allowed, besides select/cmp ones.

9190

Updated, thanks!

Above initialization of Worklist tolerates recipes with multiple defined values, but here only single valued recipes are expected?

Added an assert in the collection loop

9191

Updated, thanks!

9193

added a comment, thanks!

9194–9195

Will update to have a check for the cmp only. In-loop reductions won't be used for those intrinsics at the moment. Will make sure there's a test. Added tests with min-max intrinsics in cc3986643696, for which in-loop reductions aren't applied yet

9216–9217

Updated to also include FMulAdd

9245

Updated, thanks!

9245

Yep, that looks right, added a comment, thanks!

Added test coverage with flipped operands in cc3986643696.

9249–9250

Updated, thanks!

9252–9253

Might be good as follow up?

9253

No, removed ,thanks!

9257

Yep, moved, thanks!

9259

Not needed in the latest version, removed!

Ayal added a comment.Jul 31 2023, 3:32 PM

Added some more comments to help clean this up more, can be applied now or later if preferred to keep the diff closer to a VPlan refactoring alone.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9155–9187

nit: Should this filtering be applied above when inserting into InLoopReductionPhis?
Should unordered reductions be considered non-isInLoop in VPlans of VF=1?

9196
9198

note: may assert that UserRecipe resides inside the vector loop, as the outside user is a LiveOut, at-least for now.

9199

assert CurrentLink is a select recipe (whose first operand index is 1)?

9207

nit: assert better placed above, asap, independent of filling Worklist.

9218

Treat FMulAdd separately from the binary operators - the operand along the chain is in index 2, go ahead and generate the FMul of indices 0 and 1 to produce VecOp? (Keeping the assert message intact ;-)

9220

nit: would always considering the last two indices be (correct and) easier than considering either first two indices or indices 1 and 2? Excluding FMulAdd which deserves a separate treatment.

9246

assert that VecOp != PreviousLinkV and that the other candidate does equal PreviousLinkV?
(CurrentLink->getOperand(IndexOfFirstOperand+IndexOfFirstOperand+1-VecOpId))

9247

Wonder if this information may already be recorded in VPlan - perhaps BlockInMask already exists?

9252–9253

Sure, if not two.

9252–9253
9252–9253

assert VecOpId == 0?

9256

Could we use PreviousLink serve as/instead of CompareRecipe?

fhahn updated this revision to Diff 546239.Aug 1 2023, 3:11 PM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn added inline comments.Aug 1 2023, 3:12 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9155–9187

Moved the filtering. Left scalar handling as is for now to keep with original code.

9196

Updated, thanks!

9198

Updated, thanks!

9199

added assert, thanks!

9207

Moved up, thanks!

9218

Moved, thanks!

9220

For the initial version, I think it would be good retain the original logic as much as possible. This also makes it easier to retain the asserts.

9246

Added assert, thanks!

9247

Not easily yet, but I think as follow-up this could be handled by checking the predecessors of the block to find the predecessor with branch-on-mask.

9252–9253

Not needed after code movement.

9256

Unfortunately I don't think so, but we can let dead-recipe removal take care of this removal. I removed the code.

Ayal added inline comments.Aug 1 2023, 4:17 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161–9168
9177

?

9183
9185
9198

If VF=1 CurrentLink should be a Replicate instead of a Widen(Call) recipe?

Better place the above assert(RecurrenceDescriptor::isFMulAddIntrinsic(CurrentLinkI)) here?

9256

If PreviousLink is set to Cmp before early-continuing above?

OTOH, dead-recipe removal could also reclaim CurrentLink.

fhahn updated this revision to Diff 546406.Aug 2 2023, 4:26 AM
fhahn marked 9 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161–9168

Reordered, thanks!

9177

Removed, thanks!

9183

Fixed, thanks!

9185

added, thanks!

9198

Moved assert and refined condition as suggested, thanks!

9252–9253

Adjusted, thanks!

9256

Yep, updated, thanks!

Ayal accepted this revision.Aug 2 2023, 7:23 AM

This looks good to me, thanks!
Adding last minor nits.

Would be good to see how adjustRecipesForReductions() may become a VPlan2VPlan pass, and apply fmuladd decomposition / minmax composition as separate, preparatory steps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9221

Note: in this case PreviousLink stays the recipe before the compare, feeding one of the operands of the select.

9242
9256

nit: may be worth noting somewhere that this transformation may leave dead recipes for a subsequent pass to clean up.

This revision is now accepted and ready to land.Aug 2 2023, 7:23 AM
This revision was landed with ongoing or failed builds.Aug 2 2023, 9:05 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 3 inline comments as done.
fhahn added inline comments.Aug 2 2023, 9:25 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9242

Sounds better, thanks!

9256

Added a note in the committed version, thanks!