Page MenuHomePhabricator

[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
ClosedPublic

Authored by fhahn on Dec 21 2021, 11:55 AM.

Details

Summary

This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

  1. First step to avoid setting both vector and scalar values for the same induction def.
  2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan
  3. Only need to splat the vector IV for block in masks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
391

WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

fhahn updated this revision to Diff 399590.Jan 13 2022, 1:43 AM
fhahn marked 12 inline comments as done.

Address latest comments. Pulled in onlyFirstLaneDemanded from D116554. Unfortunately D116554 itself cannot be applied before this patch, because the stepvector transition is required for the use in buildScalarSteps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2570

Landed the change separately in 7ce48be0fd83 . StartIdx depends on Part though, so I left it in the loop for now.

2577

Thanks, I applied the folding and added an assert in 7ce48be0fd83.

8417

Good idea, split off to D117140

llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗(On Diff #395732)

Added documentation and also added it for other recipes where needed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
337

Updated in the split-off review.

357

It's sufficient to check whether the start is scalar. There's no need to clamp the end.

374

Moved!

391

Updated to check if StepVector is needed and sink replace & erase code.

397

This is not needed in the latest version, removed.

fhahn updated this revision to Diff 400380.Jan 16 2022, 7:34 AM

Introduce VPRecipeBase::onlyFirstLaneUsed & VPRecipeBase::onlyScalarsUsed as suggested by @Ayal in D116554.

fhahn updated this revision to Diff 400951.Jan 18 2022, 12:32 PM

Rebased after latest changes to D117140.

Ayal added inline comments.Jan 19 2022, 8:35 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
1674 ↗(On Diff #400951)

Demanded == Used?

llvm/lib/Transforms/Vectorize/VPlan.h
75 ↗(On Diff #400951)

Place above next to getRuntimeVF()?

80 ↗(On Diff #400951)

(Indep. of this patch, but while we're here:)
"The sequence starts at StartIndex." - redundant?
\p Opcode >> \p BinOp

773 ↗(On Diff #400951)

Default is to return false conservatively.

780 ↗(On Diff #400951)

ditto

926 ↗(On Diff #400951)

The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.

1364 ↗(On Diff #400951)

Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?

1744 ↗(On Diff #400951)

Already confirmed above that Op is the address?

1747 ↗(On Diff #400951)

... of their address.

return Op == getAddr() && isConsecutive(); ?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342

Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

362

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

fhahn updated this revision to Diff 401290.Jan 19 2022, 9:38 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn updated this revision to Diff 402314.Jan 23 2022, 4:24 AM

Rebase & ping :)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1674 ↗(On Diff #400951)

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
75 ↗(On Diff #400951)

Moved up, thanks!

80 ↗(On Diff #400951)

Adjusted, thanks!

773 ↗(On Diff #400951)

extended comment.

780 ↗(On Diff #400951)

extended comment.

926 ↗(On Diff #400951)

Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount opcodes.

1364 ↗(On Diff #400951)

Added the comment, thanks!

1747 ↗(On Diff #400951)

Simplified as suggested, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342

OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

I don't think so, as we should widen using the canonical IVs type, which would be the wider type.

362

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Ayal added inline comments.Jan 25 2022, 12:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4801

Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?

llvm/lib/Transforms/Vectorize/VPlan.cpp
769 ↗(On Diff #402314)

Assert is redundant given than Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?

llvm/lib/Transforms/Vectorize/VPlan.h
926 ↗(On Diff #400951)

Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?

1383 ↗(On Diff #402314)

Could a user be live-out / not a recipe?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
362

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Follow-up TODO would be fine.

Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves

Range includes only positive VF's &&
onlyFirstLaneUsed(WidenOriginalIV) &&
onlyScalarsUsed(WidenOriginalIV) &&
NewIV feeds some VPInstruction that isn't ActiveLaneMask
?

fhahn updated this revision to Diff 402978.Jan 25 2022, 11:16 AM

A major update & simplification: the latest version doesn't need a separate step-vector recipe; if a 'step-vector' is needed, the VPWidenCanonicalInduction recipe is retained. It also removes the VPlan-based analysis of the uses in favor of looking up the recorded information added in D118167.

Note that it still checks if all users of the induction are active-lane-masks. The legacy cost-model doesn't know about them, but if it all users are active-lane-masks, no widened canonical IV is needed, only the scalar steps.

fhahn marked 4 inline comments as done.Jan 25 2022, 11:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4801

I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether the scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.

llvm/lib/Transforms/Vectorize/VPlan.cpp
769 ↗(On Diff #402314)

This code is now gone in the latest version.

llvm/lib/Transforms/Vectorize/VPlan.h
926 ↗(On Diff #400951)

Will update, but the code has moved to a separate patch again.

1383 ↗(On Diff #402314)

at the moment there's not possible, all users must be a recipe (also the code is gone from this patch)

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
362

The logic here has been simplified with help of D118167

fhahn retitled this revision from [VPlan] Handle IV vector splat as VPInstruction. to [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Jan 26 2022, 4:30 AM
fhahn edited the summary of this revision. (Show Details)
Ayal added inline comments.Jan 26 2022, 11:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2566

OK, this is synonymous with NeedsVectorIV(), capturing it more accurately.

Follows the above comment:

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all it's users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask?

Then check

if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) ||
    (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) {
    WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
    WidenNewIV->eraseFromParent();
}

?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176

Should vp<%2> be vp<[[CAN_IV]]> ?

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153–154

Hmm, this enables sink scalar operands to handle scalar steps.

fhahn updated this revision to Diff 403549.Jan 27 2022, 1:54 AM
fhahn marked 8 inline comments as done.

Addressed latest comments, thanks!

Clarified comment for replacement check for WidenNewIV and WidenOriginalIV, update test to use CAN_IV variable.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2566

Updated D118167 to use needsVectorIV

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Yes, that's a better summary! I updated the comment.

I am not sure about adding needsScalarIVOnly/needsScalarIV to VPWidenCanonicalIVRecipe, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with onlyFirstLaneUsed(WidenNewIV).

I think to wrap things up, it is worth to keep the current order of patches and have the temporary check of WidenNewIV's users here, to simplify the patch ordering.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176

Yep, updated, thanks!

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153–154

I think this is caused by the drop of the check for the scalar VF; we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe and we have some extra steps. Given that the VF=1,UF>1 is somewhat of an edge case, it seems cleaner to not check for the scalar VF, as suggested earlier.

Ayal accepted this revision.Jan 27 2022, 2:31 AM

This is fine, with a last nit

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357

Very well.
How about just setting here

bool WidenNewIVOnlyFirstLaneUsed = all_of(...);
if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
  WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
  WidenNewIV->eraseFromParent();
}

?

This revision is now accepted and ready to land.Jan 27 2022, 2:31 AM
fhahn updated this revision to Diff 403567.Jan 27 2022, 3:02 AM

Thanks Ayal! Moved the all_of check outside the if and assigned to variable.

This change now depends on D118167, which is NFC and adds the needsVectorIV/needsScalarIV required.

fhahn updated this revision to Diff 404274.Jan 29 2022, 8:17 AM

Rebased on current main, I am planning on landing this soon.

This revision was landed with ongoing or failed builds.Jan 29 2022, 8:25 AM
This revision was automatically updated to reflect the committed changes.