This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
ClosedPublic

Authored by fhahn on Dec 21 2021, 11:55 AM.

Details

Summary

This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

  1. First step to avoid setting both vector and scalar values for the same induction def.
  2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan
  3. Only need to splat the vector IV for block in masks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Dec 26 2021, 9:51 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2576–2577

This last case which only emits a scalar IV can now be folded with the end of the previous case which emits both?

8403–8404

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).

8413–8414

TailFolded should always be true? (While we're here, irrespective of this patch)

8417

No need for {}

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).

8428

StepVector needs to hold One as its second operand, mainly to convey the desired type?

8849

NeedsScalarInduction(VF) will always early-return true given that ShouldScalarizeInstruction(Instr, VF) is true?

9159

As a redundancy elimination optimization?
This buildVPlanWithVPRecipes() method deserves outlining anything from it.

llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗(On Diff #395732)

Where are these new classof's needed?

fhahn updated this revision to Diff 396313.Dec 27 2021, 8:00 AM
fhahn marked 13 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2529–2530

I inlined the lambda, as it is a single use now. I also inlined the scalar path from getStepVector, which seems like it should have never been there in the first place.

2568–2569

removed

2576–2577

Done!

2576–2577

removed

8403–8404

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

8413–8414

Yes, I noticed that as well. I'll push a commit, replacing it with an assert.

8417

No need for {}

I think if the body spans multiple lines due to a comment, braces should not be omitted
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.

BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).

done separately in 2e630eabd329

8428

At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

8849

I think it is required for the case where Instr is a truncate.

9159

Sounds good, moved to VPlanTransforms::removeRedundantStepVector

llvm/lib/Transforms/Vectorize/VPlan.cpp
741 ↗(On Diff #395732)

Changed to get VPIteration(Part, 0). I kept accessing Val outside the loop, to keep a single broadcast. But there's now an assert.

746 ↗(On Diff #395732)

Move out of the loop and will push a fix separately.

llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗(On Diff #395732)

Unfortunately this is needed to dyn_cast directly from VPUser -> VPInstruction

fhahn added inline comments.Dec 27 2021, 8:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.

I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I;ll push one and update the patch.

fhahn added inline comments.Dec 27 2021, 1:26 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417

I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right?

The issue with that is that in some cases ActiveLaneMask will use the step-vector instead of the scalar steps, so I left it as is for now.

I just double checked and we may still need to go the route with the widened recipe if there's no primary induction.

The current patch still creates the vector phi when necessary and the existing tests cover that already.

Ayal added inline comments.Dec 27 2021, 2:10 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417

Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.

8849

Doesn't ShouldScalarizeInstruction(Instr, VF) && NeedsScalarInduction(VF) always equals ShouldScalarizeInstruction(Instr, VF) given that NeedsScalarInduction(VF) returns true if ShouldScalarizeInstruction(Instr, VF) returns true for the same Instr and VF?

fhahn updated this revision to Diff 396422.Dec 28 2021, 12:40 PM

Simplify onlyScalarStepsNeeded a bit.

fhahn added inline comments.Dec 28 2021, 12:46 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8417

Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());

Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

8849

I think to preserve the existing behaviour the PHI needs to be scalarized and any of its users. I updated the code to make the check a bit more direct.

Ayal added inline comments.Dec 29 2021, 1:57 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2537

(assert !State.VF.isZero()? Indep. of this patch.)

2540

assert outside loop.
This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

2544

State.VF == 1?
Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?

2551

ditto

2564

Clarifying the three options for VF>1:

  1. Create a vector IV (createVectorIntOrFpInductionPHI)
  2. Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps)
  3. both 1&2

which complements the above option for VF=1:
0. Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined)

It may be clearer to have

auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal);
if (NeedsVectorIV)
  createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);

auto NeedsScalarIV = needsScalarInduction(EntryVal);
if (NeedsScalarIV) {
  Value *ScalarIV = CreateScalarIV(Step);
  buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
}

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

8403–8404

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

sure, can also do so separately first. It may help simplify this patch, among other things.

8417

Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?

8420

(If a VPWidenCanonicalIVRecipe is always constructed then IV will always be set to it here and this construction is not needed?)

8423

Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.
Sounds like Plan->getOrCreateCanonicalStepVector()?

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

8428

At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

Have that future patch extend the support from the initial unit step needed here?

8836

If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.

9159

place comment next to call.

9228

Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178 ↗(On Diff #396422)

document

llvm/lib/Transforms/Vectorize/VPlan.cpp
741 ↗(On Diff #395732)

Val >> a more informative name: CanonicalIV? ScalarFirstLanesOfIV?

The single scalar value of the scalar canonical IV chain can be recorded per-part (0 or all) throughout:
Value *Val = State.get(getOperand(0), 0)?

746 ↗(On Diff #395732)

assert(Val == State.get(getOperand(0), Part) && "Val changed for Part"); ?

741 ↗(On Diff #396422)

Value *Step = State.get(getOperand(1), Part); ?

751 ↗(On Diff #396422)

Does Instruction::Add also work for FP inductions?

llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗(On Diff #395732)

document with said need?

2746 ↗(On Diff #396422)

There's only a single VPCanonicalIVPHIRecipe per Plan to be passed in, may as well have getCanonicalStepVector() obtain it from Plan?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
345

Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV).
Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV).
But perhaps better to check

if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV())

as mentioned above.

llvm/lib/Transforms/Vectorize/VPlanValue.h
333 ↗(On Diff #396422)

Lex order

fhahn updated this revision to Diff 397079.Jan 3 2022, 9:02 AM
fhahn marked 22 inline comments as done.

Address latest comments, thanks! The most notable changes are always generating VPWidenCanonicalIVRecipes and then try to replace them by either a vector induction or a step-vector in a separate VPlan-to-VPlan transform, if possible.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2537

Done in e2f1c4c7066b

2540

Moved!

This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

Not sure why the assert is here, but I think it should be retained in this patch.

2544

Replaced call.

2551

Replaced call

2564

Clarifying the three options for VF>1:

Exactly! I restructured the code as suggested, thanks.

8403–8404

Update to always generate the widened canonical induction.

8417

I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.

8423

Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.

Yes that was the original goal. The code is now gone, the stepvector may be replace the widened canonical IV created unconditional here.

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

I think modeling the stepvector separately allows us to re-use it for vector induction increments and optionally defining an additional value would make it a bit harder to first add the stepvector and then optimize it away; with a separate recipe for the stepvector we can simplify replace the uses and remove the step recipe. With multi-defs we would need to either undef one of the multi-defs or replace the multi-def recipe with a single def version.

8428

A stepvector is also needed for incrementing vector inductions and I plan to use the recipe in a later patch, but unfortunately there's nothing to share yet.

8836

The issue here is that for VF == 1 there's no need for a step-vector. We could define step-vector to just produce a scalar for VF = 1 though? Later patches with a separate recipe for scalar-steps would handle this as well.

9159

the comment is stale now, removed

9228

All information to take the decision is already available in the VPlan. We only need to know whether the tail will be folded and check check the users of the induction recipe in the plan.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178 ↗(On Diff #396422)

Moved definition out of VPRecipeBuilder and removed this here.

llvm/lib/Transforms/Vectorize/VPlan.cpp
746 ↗(On Diff #395732)

Unfortunately this won't work as expected in this case, because the operand is an constant wrapped in a VPValue and State is not explicitly set. Currently State.get will broadcast the constant in a vector when using State.get(getOperand(1), Part); and return scalar when using VPIteration(Part, 0).

753 ↗(On Diff #395732)

At the moment there's no need for float support here. I added an assert.

741 ↗(On Diff #396422)

see above.

751 ↗(On Diff #396422)

stepvector is not used for floats yet, added an assert

llvm/lib/Transforms/Vectorize/VPlan.h
2746 ↗(On Diff #396422)

This is not needed any longer in the current version of the patch. Removed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
345

This has been completely removed and replaced by a different transform.

llvm/lib/Transforms/Vectorize/VPlanValue.h
333 ↗(On Diff #396422)

This is a leftover from an earlier iteration and has been removed.

fhahn updated this revision to Diff 398643.Jan 10 2022, 7:49 AM

Rebase & ping :)

I also removed the explicit step argument from the patch, as suggested. I think now all outstanding comments should be addressed.

Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2570

Set StartIdx to int or fp from its beginning? Indep. of this patch.

2577

Would be good to fold int and fp handlings, indep. of this patch:
Set and use MulOp here as well?
Use CreateBinOp here as well?
Name the instruction "induction" above for fp as well?
OTOH, hanlding Trunc should apply to int only?

8417

Better have the transform in a separate patch, to retain exiting behavior here?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
337

Phi: VPWidenIntOrFpInductionRecipe represents a phi (assumed to exist, corresponding to OriginalCanonicalIV) but VPWidenCanonicalIVRecipe represents a non-phi and thus appears after the former. Ordering their handling accordingly is more logical? Break as soon as (if) the latter is found?

WidenCan >> WidenNewIV?
WidenInduction >> WidenOriginalIV?
(both are canonical...)

357

Is this optimization expected to clamp Range.End for the current VPlan during VPlans-for-ranges construction?
Is there a need to (only) know if VPlan's range is scalar? Specifically, if Start is Scalar then End can be asserted to be Start*2, if needed? Suffice the check if all recipes provide scalars only, although faster to check if range is scalar?

359

vputils::onlyScalarsDemanded()?

366

vputils::onlyFirstLaneDemanded()?

374

Place next to above early-exit if !WidenCan?

397

Scalar Start handled above (w/o retaining WidenCan)?

Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
391

WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

fhahn updated this revision to Diff 399590.Jan 13 2022, 1:43 AM
fhahn marked 12 inline comments as done.

Address latest comments. Pulled in onlyFirstLaneDemanded from D116554. Unfortunately D116554 itself cannot be applied before this patch, because the stepvector transition is required for the use in buildScalarSteps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2570

Landed the change separately in 7ce48be0fd83 . StartIdx depends on Part though, so I left it in the loop for now.

2577

Thanks, I applied the folding and added an assert in 7ce48be0fd83.

8417

Good idea, split off to D117140

llvm/lib/Transforms/Vectorize/VPlan.h
856 ↗(On Diff #395732)

Added documentation and also added it for other recipes where needed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
337

Updated in the split-off review.

357

It's sufficient to check whether the start is scalar. There's no need to clamp the end.

374

Moved!

391

Updated to check if StepVector is needed and sink replace & erase code.

397

This is not needed in the latest version, removed.

fhahn updated this revision to Diff 400380.Jan 16 2022, 7:34 AM

Introduce VPRecipeBase::onlyFirstLaneUsed & VPRecipeBase::onlyScalarsUsed as suggested by @Ayal in D116554.

fhahn updated this revision to Diff 400951.Jan 18 2022, 12:32 PM

Rebased after latest changes to D117140.

Ayal added inline comments.Jan 19 2022, 8:35 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
1674 ↗(On Diff #400951)

Demanded == Used?

llvm/lib/Transforms/Vectorize/VPlan.h
75 ↗(On Diff #400951)

Place above next to getRuntimeVF()?

80 ↗(On Diff #400951)

(Indep. of this patch, but while we're here:)
"The sequence starts at StartIndex." - redundant?
\p Opcode >> \p BinOp

773 ↗(On Diff #400951)

Default is to return false conservatively.

780 ↗(On Diff #400951)

ditto

926 ↗(On Diff #400951)

The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.

1364 ↗(On Diff #400951)

Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?

1744 ↗(On Diff #400951)

Already confirmed above that Op is the address?

1747 ↗(On Diff #400951)

... of their address.

return Op == getAddr() && isConsecutive(); ?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342

Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

362

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

fhahn updated this revision to Diff 401290.Jan 19 2022, 9:38 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn updated this revision to Diff 402314.Jan 23 2022, 4:24 AM

Rebase & ping :)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1674 ↗(On Diff #400951)

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
75 ↗(On Diff #400951)

Moved up, thanks!

80 ↗(On Diff #400951)

Adjusted, thanks!

773 ↗(On Diff #400951)

extended comment.

780 ↗(On Diff #400951)

extended comment.

926 ↗(On Diff #400951)

Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount opcodes.

1364 ↗(On Diff #400951)

Added the comment, thanks!

1747 ↗(On Diff #400951)

Simplified as suggested, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
342

OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

I don't think so, as we should widen using the canonical IVs type, which would be the wider type.

362

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Ayal added inline comments.Jan 25 2022, 12:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4801

Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?

llvm/lib/Transforms/Vectorize/VPlan.cpp
769 ↗(On Diff #402314)

Assert is redundant given than Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?

llvm/lib/Transforms/Vectorize/VPlan.h
926 ↗(On Diff #400951)

Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?

1383 ↗(On Diff #402314)

Could a user be live-out / not a recipe?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
362

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Follow-up TODO would be fine.

Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves

Range includes only positive VF's &&
onlyFirstLaneUsed(WidenOriginalIV) &&
onlyScalarsUsed(WidenOriginalIV) &&
NewIV feeds some VPInstruction that isn't ActiveLaneMask
?

fhahn updated this revision to Diff 402978.Jan 25 2022, 11:16 AM

A major update & simplification: the latest version doesn't need a separate step-vector recipe; if a 'step-vector' is needed, the VPWidenCanonicalInduction recipe is retained. It also removes the VPlan-based analysis of the uses in favor of looking up the recorded information added in D118167.

Note that it still checks if all users of the induction are active-lane-masks. The legacy cost-model doesn't know about them, but if it all users are active-lane-masks, no widened canonical IV is needed, only the scalar steps.

fhahn marked 4 inline comments as done.Jan 25 2022, 11:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4801

I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether the scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.

llvm/lib/Transforms/Vectorize/VPlan.cpp
769 ↗(On Diff #402314)

This code is now gone in the latest version.

llvm/lib/Transforms/Vectorize/VPlan.h
926 ↗(On Diff #400951)

Will update, but the code has moved to a separate patch again.

1383 ↗(On Diff #402314)

at the moment there's not possible, all users must be a recipe (also the code is gone from this patch)

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
362

The logic here has been simplified with help of D118167

fhahn retitled this revision from [VPlan] Handle IV vector splat as VPInstruction. to [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Jan 26 2022, 4:30 AM
fhahn edited the summary of this revision. (Show Details)
Ayal added inline comments.Jan 26 2022, 11:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2566

OK, this is synonymous with NeedsVectorIV(), capturing it more accurately.

Follows the above comment:

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all it's users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask?

Then check

if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) ||
    (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) {
    WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
    WidenNewIV->eraseFromParent();
}

?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176

Should vp<%2> be vp<[[CAN_IV]]> ?

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153–154

Hmm, this enables sink scalar operands to handle scalar steps.

fhahn updated this revision to Diff 403549.Jan 27 2022, 1:54 AM
fhahn marked 8 inline comments as done.

Addressed latest comments, thanks!

Clarified comment for replacement check for WidenNewIV and WidenOriginalIV, update test to use CAN_IV variable.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2566

Updated D118167 to use needsVectorIV

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Yes, that's a better summary! I updated the comment.

I am not sure about adding needsScalarIVOnly/needsScalarIV to VPWidenCanonicalIVRecipe, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with onlyFirstLaneUsed(WidenNewIV).

I think to wrap things up, it is worth to keep the current order of patches and have the temporary check of WidenNewIV's users here, to simplify the patch ordering.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
175–176

Yep, updated, thanks!

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153–154

I think this is caused by the drop of the check for the scalar VF; we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe and we have some extra steps. Given that the VF=1,UF>1 is somewhat of an edge case, it seems cleaner to not check for the scalar VF, as suggested earlier.

Ayal accepted this revision.Jan 27 2022, 2:31 AM

This is fine, with a last nit

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
357

Very well.
How about just setting here

bool WidenNewIVOnlyFirstLaneUsed = all_of(...);
if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
  WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
  WidenNewIV->eraseFromParent();
}

?

This revision is now accepted and ready to land.Jan 27 2022, 2:31 AM
fhahn updated this revision to Diff 403567.Jan 27 2022, 3:02 AM

Thanks Ayal! Moved the all_of check outside the if and assigned to variable.

This change now depends on D118167, which is NFC and adds the needsVectorIV/needsScalarIV required.

fhahn updated this revision to Diff 404274.Jan 29 2022, 8:17 AM

Rebased on current main, I am planning on landing this soon.

This revision was landed with ongoing or failed builds.Jan 29 2022, 8:25 AM
This revision was automatically updated to reflect the committed changes.