This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
ClosedPublic

Authored by fhahn on Dec 21 2021, 11:55 AM.

Details

Summary

This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

  1. First step to avoid setting both vector and scalar values for the same induction def.
  2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan
  3. Only need to splat the vector IV for block in masks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Dec 26 2021, 9:51 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2790

This last case which only emits a scalar IV can now be folded with the end of the previous case which emits both?

8419

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).

8432

TailFolded should always be true? (While we're here, irrespective of this patch)

8442

No need for {}

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).

8454

StepVector needs to hold One as its second operand, mainly to convey the desired type?

8849

NeedsScalarInduction(VF) will always early-return true given that ShouldScalarizeInstruction(Instr, VF) is true?

9175

As a redundancy elimination optimization?
This buildVPlanWithVPRecipes() method deserves outlining anything from it.

llvm/lib/Transforms/Vectorize/VPlan.h
856

Where are these new classof's needed?

fhahn updated this revision to Diff 396313.Dec 27 2021, 8:00 AM
fhahn marked 13 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2729

I inlined the lambda, as it is a single use now. I also inlined the scalar path from getStepVector, which seems like it should have never been there in the first place.

2774

removed

2788

removed

2790

Done!

8419

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

8432

Yes, I noticed that as well. I'll push a commit, replacing it with an assert.

8442

No need for {}

I think if the body spans multiple lines due to a comment, braces should not be omitted
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.

BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).

done separately in 2e630eabd329

8454

At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

8849

I think it is required for the case where Instr is a truncate.

9175

Sounds good, moved to VPlanTransforms::removeRedundantStepVector

llvm/lib/Transforms/Vectorize/VPlan.cpp
741

Changed to get VPIteration(Part, 0). I kept accessing Val outside the loop, to keep a single broadcast. But there's now an assert.

746

Move out of the loop and will push a fix separately.

llvm/lib/Transforms/Vectorize/VPlan.h
856

Unfortunately this is needed to dyn_cast directly from VPUser -> VPInstruction

fhahn added inline comments.Dec 27 2021, 8:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8442

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.

I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I;ll push one and update the patch.

fhahn added inline comments.Dec 27 2021, 1:26 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8442

I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right?

The issue with that is that in some cases ActiveLaneMask will use the step-vector instead of the scalar steps, so I left it as is for now.

I just double checked and we may still need to go the route with the widened recipe if there's no primary induction.

The current patch still creates the vector phi when necessary and the existing tests cover that already.

Ayal added inline comments.Dec 27 2021, 2:10 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8442

Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.

8849

Doesn't ShouldScalarizeInstruction(Instr, VF) && NeedsScalarInduction(VF) always equals ShouldScalarizeInstruction(Instr, VF) given that NeedsScalarInduction(VF) returns true if ShouldScalarizeInstruction(Instr, VF) returns true for the same Instr and VF?

fhahn updated this revision to Diff 396422.Dec 28 2021, 12:40 PM

Simplify onlyScalarStepsNeeded a bit.

fhahn added inline comments.Dec 28 2021, 12:46 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8442

Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());

Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

8849

I think to preserve the existing behaviour the PHI needs to be scalarized and any of its users. I updated the code to make the check a bit more direct.

Ayal added inline comments.Dec 29 2021, 1:57 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2757

(assert !State.VF.isZero()? Indep. of this patch.)

2760

assert outside loop.
This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

2762

Clarifying the three options for VF>1:

  1. Create a vector IV (createVectorIntOrFpInductionPHI)
  2. Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps)
  3. both 1&2

which complements the above option for VF=1:
0. Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined)

It may be clearer to have

auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal);
if (NeedsVectorIV)
  createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);

auto NeedsScalarIV = needsScalarInduction(EntryVal);
if (NeedsScalarIV) {
  Value *ScalarIV = CreateScalarIV(Step);
  buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
}

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

2764

State.VF == 1?
Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?

2771

ditto

8419

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

sure, can also do so separately first. It may help simplify this patch, among other things.

8442

Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?

8445

(If a VPWidenCanonicalIVRecipe is always constructed then IV will always be set to it here and this construction is not needed?)

8448

Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.
Sounds like Plan->getOrCreateCanonicalStepVector()?

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

8454

At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

Have that future patch extend the support from the initial unit step needed here?

8836

If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.

9175

place comment next to call.

9264

Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178

document

llvm/lib/Transforms/Vectorize/VPlan.cpp
741

Val >> a more informative name: CanonicalIV? ScalarFirstLanesOfIV?

The single scalar value of the scalar canonical IV chain can be recorded per-part (0 or all) throughout:
Value *Val = State.get(getOperand(0), 0)?

746

assert(Val == State.get(getOperand(0), Part) && "Val changed for Part"); ?

748

Value *Step = State.get(getOperand(1), Part); ?

758

Does Instruction::Add also work for FP inductions?

llvm/lib/Transforms/Vectorize/VPlan.h
856

document with said need?

2695

There's only a single VPCanonicalIVPHIRecipe per Plan to be passed in, may as well have getCanonicalStepVector() obtain it from Plan?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
344 ↗(On Diff #396422)

Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV).
Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV).
But perhaps better to check

if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV())

as mentioned above.

llvm/lib/Transforms/Vectorize/VPlanValue.h
333

Lex order

fhahn updated this revision to Diff 397079.Jan 3 2022, 9:02 AM
fhahn marked 22 inline comments as done.

Address latest comments, thanks! The most notable changes are always generating VPWidenCanonicalIVRecipes and then try to replace them by either a vector induction or a step-vector in a separate VPlan-to-VPlan transform, if possible.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2757

Done in e2f1c4c7066b

2760

Moved!

This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

Not sure why the assert is here, but I think it should be retained in this patch.

2762

Clarifying the three options for VF>1:

Exactly! I restructured the code as suggested, thanks.

2764

Replaced call.

2771

Replaced call

8419

Update to always generate the widened canonical induction.

8442

I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.

8448

Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.

Yes that was the original goal. The code is now gone, the stepvector may be replace the widened canonical IV created unconditional here.

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

I think modeling the stepvector separately allows us to re-use it for vector induction increments and optionally defining an additional value would make it a bit harder to first add the stepvector and then optimize it away; with a separate recipe for the stepvector we can simplify replace the uses and remove the step recipe. With multi-defs we would need to either undef one of the multi-defs or replace the multi-def recipe with a single def version.

8454

A stepvector is also needed for incrementing vector inductions and I plan to use the recipe in a later patch, but unfortunately there's nothing to share yet.

8836

The issue here is that for VF == 1 there's no need for a step-vector. We could define step-vector to just produce a scalar for VF = 1 though? Later patches with a separate recipe for scalar-steps would handle this as well.

9175

the comment is stale now, removed

9264

All information to take the decision is already available in the VPlan. We only need to know whether the tail will be folded and check check the users of the induction recipe in the plan.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178

Moved definition out of VPRecipeBuilder and removed this here.

llvm/lib/Transforms/Vectorize/VPlan.cpp
746

Unfortunately this won't work as expected in this case, because the operand is an constant wrapped in a VPValue and State is not explicitly set. Currently State.get will broadcast the constant in a vector when using State.get(getOperand(1), Part); and return scalar when using VPIteration(Part, 0).

748

see above.

753

At the moment there's no need for float support here. I added an assert.

758

stepvector is not used for floats yet, added an assert

llvm/lib/Transforms/Vectorize/VPlan.h
2695

This is not needed any longer in the current version of the patch. Removed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
344 ↗(On Diff #396422)

This has been completely removed and replaced by a different transform.

llvm/lib/Transforms/Vectorize/VPlanValue.h
333

This is a leftover from an earlier iteration and has been removed.

fhahn updated this revision to Diff 398643.Jan 10 2022, 7:49 AM

Rebase & ping :)

I also removed the explicit step argument from the patch, as suggested. I think now all outstanding comments should be addressed.

Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2768

Set StartIdx to int or fp from its beginning? Indep. of this patch.

2775

Would be good to fold int and fp handlings, indep. of this patch:
Set and use MulOp here as well?
Use CreateBinOp here as well?
Name the instruction "induction" above for fp as well?
OTOH, hanlding Trunc should apply to int only?

8442

Better have the transform in a separate patch, to retain exiting behavior here?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
336 ↗(On Diff #398643)

Phi: VPWidenIntOrFpInductionRecipe represents a phi (assumed to exist, corresponding to OriginalCanonicalIV) but VPWidenCanonicalIVRecipe represents a non-phi and thus appears after the former. Ordering their handling accordingly is more logical? Break as soon as (if) the latter is found?

WidenCan >> WidenNewIV?
WidenInduction >> WidenOriginalIV?
(both are canonical...)

356 ↗(On Diff #398643)

Is this optimization expected to clamp Range.End for the current VPlan during VPlans-for-ranges construction?
Is there a need to (only) know if VPlan's range is scalar? Specifically, if Start is Scalar then End can be asserted to be Start*2, if needed? Suffice the check if all recipes provide scalars only, although faster to check if range is scalar?

358 ↗(On Diff #398643)

vputils::onlyScalarsDemanded()?

365 ↗(On Diff #398643)

vputils::onlyFirstLaneDemanded()?

373 ↗(On Diff #398643)

Place next to above early-exit if !WidenCan?

396 ↗(On Diff #398643)

Scalar Start handled above (w/o retaining WidenCan)?

Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
390 ↗(On Diff #398643)

WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

fhahn updated this revision to Diff 399590.Jan 13 2022, 1:43 AM
fhahn marked 12 inline comments as done.

Address latest comments. Pulled in onlyFirstLaneDemanded from D116554. Unfortunately D116554 itself cannot be applied before this patch, because the stepvector transition is required for the use in buildScalarSteps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2768

Landed the change separately in 7ce48be0fd83 . StartIdx depends on Part though, so I left it in the loop for now.

2775

Thanks, I applied the folding and added an assert in 7ce48be0fd83.

8442

Good idea, split off to D117140

llvm/lib/Transforms/Vectorize/VPlan.h
856

Added documentation and also added it for other recipes where needed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
336 ↗(On Diff #398643)

Updated in the split-off review.

356 ↗(On Diff #398643)

It's sufficient to check whether the start is scalar. There's no need to clamp the end.

373 ↗(On Diff #398643)

Moved!

390 ↗(On Diff #398643)

Updated to check if StepVector is needed and sink replace & erase code.

396 ↗(On Diff #398643)

This is not needed in the latest version, removed.

fhahn updated this revision to Diff 400380.Jan 16 2022, 7:34 AM

Introduce VPRecipeBase::onlyFirstLaneUsed & VPRecipeBase::onlyScalarsUsed as suggested by @Ayal in D116554.

fhahn updated this revision to Diff 400951.Jan 18 2022, 12:32 PM

Rebased after latest changes to D117140.

Ayal added inline comments.Jan 19 2022, 8:35 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
1627

Demanded == Used?

llvm/lib/Transforms/Vectorize/VPlan.h
72

Place above next to getRuntimeVF()?

77

(Indep. of this patch, but while we're here:)
"The sequence starts at StartIndex." - redundant?
\p Opcode >> \p BinOp

772

Default is to return false conservatively.

779

ditto

908

The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.

1294

Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?

1631

Already confirmed above that Op is the address?

1634

... of their address.

return Op == getAddr() && isConsecutive(); ?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342 ↗(On Diff #400951)

Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

354 ↗(On Diff #400951)

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

fhahn updated this revision to Diff 401290.Jan 19 2022, 9:38 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn updated this revision to Diff 402314.Jan 23 2022, 4:24 AM

Rebase & ping :)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1627

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
72

Moved up, thanks!

77

Adjusted, thanks!

772

extended comment.

779

extended comment.

908

Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount opcodes.

1294

Added the comment, thanks!

1634

Simplified as suggested, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342 ↗(On Diff #400951)

OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

I don't think so, as we should widen using the canonical IVs type, which would be the wider type.

354 ↗(On Diff #400951)

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Ayal added inline comments.Jan 25 2022, 12:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4856

Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?

llvm/lib/Transforms/Vectorize/VPlan.cpp
749

Assert is redundant given than Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?

llvm/lib/Transforms/Vectorize/VPlan.h
908

Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?

1298

Could a user be live-out / not a recipe?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
354 ↗(On Diff #400951)

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Follow-up TODO would be fine.

Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves

Range includes only positive VF's &&
onlyFirstLaneUsed(WidenOriginalIV) &&
onlyScalarsUsed(WidenOriginalIV) &&
NewIV feeds some VPInstruction that isn't ActiveLaneMask
?

fhahn updated this revision to Diff 402978.Jan 25 2022, 11:16 AM

A major update & simplification: the latest version doesn't need a separate step-vector recipe; if a 'step-vector' is needed, the VPWidenCanonicalInduction recipe is retained. It also removes the VPlan-based analysis of the uses in favor of looking up the recorded information added in D118167.

Note that it still checks if all users of the induction are active-lane-masks. The legacy cost-model doesn't know about them, but if it all users are active-lane-masks, no widened canonical IV is needed, only the scalar steps.

fhahn marked 4 inline comments as done.Jan 25 2022, 11:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4856

I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether the scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.

llvm/lib/Transforms/Vectorize/VPlan.cpp
749

This code is now gone in the latest version.

llvm/lib/Transforms/Vectorize/VPlan.h
908

Will update, but the code has moved to a separate patch again.

1298

at the moment there's not possible, all users must be a recipe (also the code is gone from this patch)

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
354 ↗(On Diff #400951)

The logic here has been simplified with help of D118167

fhahn retitled this revision from [VPlan] Handle IV vector splat as VPInstruction. to [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Jan 26 2022, 4:30 AM
fhahn edited the summary of this revision. (Show Details)
Ayal added inline comments.Jan 26 2022, 11:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2767

OK, this is synonymous with NeedsVectorIV(), capturing it more accurately.

Follows the above comment:

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356 ↗(On Diff #398643)

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all it's users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask?

Then check

if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) ||
    (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) {
    WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
    WidenNewIV->eraseFromParent();
}

?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
167–168

Should vp<%2> be vp<[[CAN_IV]]> ?

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153 ↗(On Diff #402978)

Hmm, this enables sink scalar operands to handle scalar steps.

fhahn updated this revision to Diff 403549.Jan 27 2022, 1:54 AM
fhahn marked 8 inline comments as done.

Addressed latest comments, thanks!

Clarified comment for replacement check for WidenNewIV and WidenOriginalIV, update test to use CAN_IV variable.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2767

Updated D118167 to use needsVectorIV

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356 ↗(On Diff #398643)

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Yes, that's a better summary! I updated the comment.

I am not sure about adding needsScalarIVOnly/needsScalarIV to VPWidenCanonicalIVRecipe, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with onlyFirstLaneUsed(WidenNewIV).

I think to wrap things up, it is worth to keep the current order of patches and have the temporary check of WidenNewIV's users here, to simplify the patch ordering.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
167–168

Yep, updated, thanks!

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153 ↗(On Diff #402978)

I think this is caused by the drop of the check for the scalar VF; we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe and we have some extra steps. Given that the VF=1,UF>1 is somewhat of an edge case, it seems cleaner to not check for the scalar VF, as suggested earlier.

Ayal accepted this revision.Jan 27 2022, 2:31 AM

This is fine, with a last nit

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356 ↗(On Diff #398643)

Very well.
How about just setting here

bool WidenNewIVOnlyFirstLaneUsed = all_of(...);
if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
  WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
  WidenNewIV->eraseFromParent();
}

?

This revision is now accepted and ready to land.Jan 27 2022, 2:31 AM
fhahn updated this revision to Diff 403567.Jan 27 2022, 3:02 AM

Thanks Ayal! Moved the all_of check outside the if and assigned to variable.

This change now depends on D118167, which is NFC and adds the needsVectorIV/needsScalarIV required.

fhahn updated this revision to Diff 404274.Jan 29 2022, 8:17 AM

Rebased on current main, I am planning on landing this soon.

This revision was landed with ongoing or failed builds.Jan 29 2022, 8:25 AM
This revision was automatically updated to reflect the committed changes.