Page MenuHomePhabricator

[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
ClosedPublic

Authored by fhahn on Dec 21 2021, 11:55 AM.

Details

Summary

This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

  1. First step to avoid setting both vector and scalar values for the same induction def.
  2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan
  3. Only need to splat the vector IV for block in masks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
390

WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

fhahn updated this revision to Diff 399590.Jan 13 2022, 1:43 AM
fhahn marked 12 inline comments as done.

Address latest comments. Pulled in onlyFirstLaneDemanded from D116554. Unfortunately D116554 itself cannot be applied before this patch, because the stepvector transition is required for the use in buildScalarSteps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2577

Landed the change separately in 7ce48be0fd83 . StartIdx depends on Part though, so I left it in the loop for now.

2584

Thanks, I applied the folding and added an assert in 7ce48be0fd83.

8393

Good idea, split off to D117140

llvm/lib/Transforms/Vectorize/VPlan.h
873

Added documentation and also added it for other recipes where needed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
336

Updated in the split-off review.

356

It's sufficient to check whether the start is scalar. There's no need to clamp the end.

373

Moved!

390

Updated to check if StepVector is needed and sink replace & erase code.

396

This is not needed in the latest version, removed.

fhahn updated this revision to Diff 400380.Jan 16 2022, 7:34 AM

Introduce VPRecipeBase::onlyFirstLaneUsed & VPRecipeBase::onlyScalarsUsed as suggested by @Ayal in D116554.

fhahn updated this revision to Diff 400951.Jan 18 2022, 12:32 PM

Rebased after latest changes to D117140.

Ayal added inline comments.Jan 19 2022, 8:35 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
1686

Demanded == Used?

llvm/lib/Transforms/Vectorize/VPlan.h
76

Place above next to getRuntimeVF()?

81

(Indep. of this patch, but while we're here:)
"The sequence starts at StartIndex." - redundant?
\p Opcode >> \p BinOp

772

Default is to return false conservatively.

779

ditto

927

The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.

1379

Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?

1761

Already confirmed above that Op is the address?

1764

... of their address.

return Op == getAddr() && isConsecutive(); ?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342

Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

351–366

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

fhahn updated this revision to Diff 401290.Jan 19 2022, 9:38 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn updated this revision to Diff 402314.Jan 23 2022, 4:24 AM

Rebase & ping :)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1686

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
76

Moved up, thanks!

81

Adjusted, thanks!

772

extended comment.

779

extended comment.

927

Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount opcodes.

1379

Added the comment, thanks!

1764

Simplified as suggested, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342

OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

I don't think so, as we should widen using the canonical IVs type, which would be the wider type.

351–366

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Ayal added inline comments.Jan 25 2022, 12:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4794

Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?

llvm/lib/Transforms/Vectorize/VPlan.cpp
769

Assert is redundant given than Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?

llvm/lib/Transforms/Vectorize/VPlan.h
927

Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?

1383

Could a user be live-out / not a recipe?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
351–366

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Follow-up TODO would be fine.

Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves

Range includes only positive VF's &&
onlyFirstLaneUsed(WidenOriginalIV) &&
onlyScalarsUsed(WidenOriginalIV) &&
NewIV feeds some VPInstruction that isn't ActiveLaneMask
?

fhahn updated this revision to Diff 402978.Jan 25 2022, 11:16 AM

A major update & simplification: the latest version doesn't need a separate step-vector recipe; if a 'step-vector' is needed, the VPWidenCanonicalInduction recipe is retained. It also removes the VPlan-based analysis of the uses in favor of looking up the recorded information added in D118167.

Note that it still checks if all users of the induction are active-lane-masks. The legacy cost-model doesn't know about them, but if it all users are active-lane-masks, no widened canonical IV is needed, only the scalar steps.

fhahn marked 4 inline comments as done.Jan 25 2022, 11:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4794

I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether the scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.

llvm/lib/Transforms/Vectorize/VPlan.cpp
769

This code is now gone in the latest version.

llvm/lib/Transforms/Vectorize/VPlan.h
927

Will update, but the code has moved to a separate patch again.

1383

at the moment there's not possible, all users must be a recipe (also the code is gone from this patch)

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
351–366

The logic here has been simplified with help of D118167

fhahn retitled this revision from [VPlan] Handle IV vector splat as VPInstruction. to [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Jan 26 2022, 4:30 AM
fhahn edited the summary of this revision. (Show Details)
Ayal added inline comments.Jan 26 2022, 11:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2572

OK, this is synonymous with NeedsVectorIV(), capturing it more accurately.

Follows the above comment:

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all it's users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask?

Then check

if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) ||
    (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) {
    WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
    WidenNewIV->eraseFromParent();
}

?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176

Should vp<%2> be vp<[[CAN_IV]]> ?

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153 ↗(On Diff #402978)

Hmm, this enables sink scalar operands to handle scalar steps.

fhahn updated this revision to Diff 403549.Jan 27 2022, 1:54 AM
fhahn marked 8 inline comments as done.

Addressed latest comments, thanks!

Clarified comment for replacement check for WidenNewIV and WidenOriginalIV, update test to use CAN_IV variable.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2572

Updated D118167 to use needsVectorIV

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Yes, that's a better summary! I updated the comment.

I am not sure about adding needsScalarIVOnly/needsScalarIV to VPWidenCanonicalIVRecipe, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with onlyFirstLaneUsed(WidenNewIV).

I think to wrap things up, it is worth to keep the current order of patches and have the temporary check of WidenNewIV's users here, to simplify the patch ordering.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176

Yep, updated, thanks!

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153 ↗(On Diff #402978)

I think this is caused by the drop of the check for the scalar VF; we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe and we have some extra steps. Given that the VF=1,UF>1 is somewhat of an edge case, it seems cleaner to not check for the scalar VF, as suggested earlier.

Ayal accepted this revision.Jan 27 2022, 2:31 AM

This is fine, with a last nit

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356

Very well.
How about just setting here

bool WidenNewIVOnlyFirstLaneUsed = all_of(...);
if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
  WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
  WidenNewIV->eraseFromParent();
}

?

This revision is now accepted and ready to land.Jan 27 2022, 2:31 AM
fhahn updated this revision to Diff 403567.Jan 27 2022, 3:02 AM

Thanks Ayal! Moved the all_of check outside the if and assigned to variable.

This change now depends on D118167, which is NFC and adds the needsVectorIV/needsScalarIV required.

fhahn updated this revision to Diff 404274.Jan 29 2022, 8:17 AM

Rebased on current main, I am planning on landing this soon.

This revision was landed with ongoing or failed builds.Jan 29 2022, 8:25 AM
This revision was automatically updated to reflect the committed changes.