This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
ClosedPublic

Authored by fhahn on Dec 21 2021, 11:55 AM.

Details

Summary

This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

  1. First step to avoid setting both vector and scalar values for the same induction def.
  2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan
  3. Only need to splat the vector IV for block in masks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Ayal added inline comments.Dec 26 2021, 9:51 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2587

This last case which only emits a scalar IV can now be folded with the end of the previous case which emits both?

8377

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

At first FoldTail relied solely on PrimaryInduction, until D77635 extended it to work w/o PrimaryInduction; but it should probably have removed the dependence on PrimaryInduction altogether. This may also simplify/revert D92017. A subsequent optimization could try to fold multiple IV recipes, in case both the PrimaryInduction was widened and WidenCanonicalIV was created (or leave it to LSR).

8389

TailFolded should always be true? (While we're here, irrespective of this patch)

8394

No need for {}

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).

8408

StepVector needs to hold One as its second operand, mainly to convey the desired type?

8804

NeedsScalarInduction(VF) will always early-return true given that ShouldScalarizeInstruction(Instr, VF) is true?

9125

As a redundancy elimination optimization?
This buildVPlanWithVPRecipes() method deserves outlining anything from it.

llvm/lib/Transforms/Vectorize/VPlan.h
857

Where are these new classof's needed?

fhahn updated this revision to Diff 396313.Dec 27 2021, 8:00 AM
fhahn marked 13 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2535–2536

I inlined the lambda, as it is a single use now. I also inlined the scalar path from getStepVector, which seems like it should have never been there in the first place.

2577–2578

removed

2582–2585

removed

2587

Done!

8377

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

8389

Yes, I noticed that as well. I'll push a commit, replacing it with an assert.

8394

No need for {}

I think if the body spans multiple lines due to a comment, braces should not be omitted
https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.

BTC is no longer used by ActiveLaneMask, so setting it should sink below to where ICmpULE needs it (While we're here, irrespective of this patch).

done separately in 2e630eabd329

8408

At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

8804

I think it is required for the case where Instr is a truncate.

9125

Sounds good, moved to VPlanTransforms::removeRedundantStepVector

llvm/lib/Transforms/Vectorize/VPlan.cpp
734

Changed to get VPIteration(Part, 0). I kept accessing Val outside the loop, to keep a single broadcast. But there's now an assert.

739

Move out of the loop and will push a fix separately.

llvm/lib/Transforms/Vectorize/VPlan.h
857

Unfortunately this is needed to dyn_cast directly from VPUser -> VPInstruction

fhahn added inline comments.Dec 27 2021, 8:37 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8394

So a StepVector is needed only to feed ICmpULE, not ActiveLaneMask which depends on first lane only? Seems like an optimization best applied separately?

Yes, let me try to split that off. ActiveLaneMask only needs the first scalar step of each Part.

I just double checked and we may still need to go the route with the widened recipe if there's no primary induction. But there's no test case without primary inductions that use active.lane.mask. I;ll push one and update the patch.

fhahn added inline comments.Dec 27 2021, 1:26 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8394

I missed something here and IIUC your comment was regarding creating a step vector in both cases at first, right?

The issue with that is that in some cases ActiveLaneMask will use the step-vector instead of the scalar steps, so I left it as is for now.

I just double checked and we may still need to go the route with the widened recipe if there's no primary induction.

The current patch still creates the vector phi when necessary and the existing tests cover that already.

Ayal added inline comments.Dec 27 2021, 2:10 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8394

Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());
This patch continues to feed ActiveLaneMask with this IV, but feeds ICmpULE with a StepVector.

8804

Doesn't ShouldScalarizeInstruction(Instr, VF) && NeedsScalarInduction(VF) always equals ShouldScalarizeInstruction(Instr, VF) given that NeedsScalarInduction(VF) returns true if ShouldScalarizeInstruction(Instr, VF) returns true for the same Instr and VF?

fhahn updated this revision to Diff 396422.Dec 28 2021, 12:40 PM

Simplify onlyScalarStepsNeeded a bit.

fhahn added inline comments.Dec 28 2021, 12:46 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8394

Yes, my comment refers to the fact that currently both ActiveLaneMask and ICmpULE are treated the same when Legal has a PrimaryInduction() - both use the same IV = Plan->getOrAddVPValue(Legal->getPrimaryInduction());

Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

8804

I think to preserve the existing behaviour the PHI needs to be scalarized and any of its users. I updated the code to make the check a bit more direct.

Ayal added inline comments.Dec 29 2021, 1:57 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2543

(assert !State.VF.isZero()? Indep. of this patch.)

2546

assert outside loop.
This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

2550

State.VF == 1?
Calling getRuntimeVF* seems redundant, StartIdx == (Value*...)Part?

2557

ditto

2568

Clarifying the three options for VF>1:

  1. Create a vector IV (createVectorIntOrFpInductionPHI)
  2. Create a scalar IV (CreateScalarIV) with per-lane steps (buildScalarSteps)
  3. both 1&2

which complements the above option for VF=1:
0. Create a scalar IV (CreateScalarIV) with per-part steps (CreateSplatIV updated and inlined)

It may be clearer to have

auto NeedsVectorIV = !shouldScalarizeInstruction(EntryVal);
if (NeedsVectorIV)
  createVectorIntOrFpInductionPHI(ID, Step, Start, EntryVal, Def, State);

auto NeedsScalarIV = needsScalarInduction(EntryVal);
if (NeedsScalarIV) {
  Value *ScalarIV = CreateScalarIV(Step);
  buildScalarSteps(ScalarIV, Step, EntryVal, ID, Def, State);
}

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

8377

Could we simplify by always creating a VPWidenCanonicalIVRecipe, w/o trying to use Legal->getPrimaryInduction() even when it exists?

We could do, but it would probably mean a bigger test diff and a bigger functional change. I'd prefer to do it separately, to avoid any unexpected knock-on effects.

sure, can also do so separately first. It may help simplify this patch, among other things.

8394

Ah I see. Yes, currently they both use the same VPValue, because the single recipe creates both the scalar steps and the step vector. But ActiveLaneMask only needs the steps and ICmpULE only needs the step vector. This is explicit now in the code as a result of breaking this part of the phi handling into distinct pieces.

ok, that confirms 1st Q above, what about the 2nd: Seems like an optimization best applied separately?

8400

(If a VPWidenCanonicalIVRecipe is always constructed then IV will always be set to it here and this construction is not needed?)

8403

Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.
Sounds like Plan->getOrCreateCanonicalStepVector()?

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

8408

At the moment only a step of one is used, but it will be used with different steps in the future (build scalar steps for inductions with arbitrary steps.

Have that future patch extend the support from the initial unit step needed here?

8791

If applied as a VPlan2VPlan optimization after VPlans were built according to ranges that have been clamped, suffice to check properties of any VF in Range; can assert they are the same across Range; best prevent clamping Range.

9125

place comment next to call.

9196

Have removeRedundantStepVector() take care of whatever it needs to check internally, rather than passing in this lambda?

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178

document

llvm/lib/Transforms/Vectorize/VPlan.cpp
734

Val >> a more informative name: CanonicalIV? ScalarFirstLanesOfIV?

The single scalar value of the scalar canonical IV chain can be recorded per-part (0 or all) throughout:
Value *Val = State.get(getOperand(0), 0)?

739

assert(Val == State.get(getOperand(0), Part) && "Val changed for Part"); ?

741

Value *Step = State.get(getOperand(1), Part); ?

751

Does Instruction::Add also work for FP inductions?

llvm/lib/Transforms/Vectorize/VPlan.h
857

document with said need?

2746

There's only a single VPCanonicalIVPHIRecipe per Plan to be passed in, may as well have getCanonicalStepVector() obtain it from Plan?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
344

Can early-exit before the loop if OnlyScalarStepsNeeded(OriginalCanonicalIV).
Then here check if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV).
But perhaps better to check

if (IV && IV->getUnderlyingValue() == OriginalCanonicalIV && !IV->NeedsVectorIV())

as mentioned above.

llvm/lib/Transforms/Vectorize/VPlanValue.h
333

Lex order

fhahn updated this revision to Diff 397079.Jan 3 2022, 9:02 AM
fhahn marked 22 inline comments as done.

Address latest comments, thanks! The most notable changes are always generating VPWidenCanonicalIVRecipes and then try to replace them by either a vector induction or a step-vector in a separate VPlan-to-VPlan transform, if possible.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2543

Done in e2f1c4c7066b

2546

Moved!

This is handling the unrolling-only case, does it matter whether vectors are scalable or not?

Not sure why the assert is here, but I think it should be retained in this patch.

2550

Replaced call.

2557

Replaced call

2568

Clarifying the three options for VF>1:

Exactly! I restructured the code as suggested, thanks.

8377

Update to always generate the widened canonical induction.

8394

I updated the patch to always generate widened induction here, and added a transform to replace it if possible later.

8403

Is anything but CanIV expected to feed getCanonicalStepVector?
Does it aim to cope with multiple attempts to obtain CanIV's StepVector? It only feeds a single ICmpULE.

Yes that was the original goal. The code is now gone, the stepvector may be replace the widened canonical IV created unconditional here.

Can alternatively have VPCanonicalIVPhiRecipe define two VPValues: one being the scalar-per-part providing lane 0, and the other (optionally optional) providing its vector steps?

I think modeling the stepvector separately allows us to re-use it for vector induction increments and optionally defining an additional value would make it a bit harder to first add the stepvector and then optimize it away; with a separate recipe for the stepvector we can simplify replace the uses and remove the step recipe. With multi-defs we would need to either undef one of the multi-defs or replace the multi-def recipe with a single def version.

8408

A stepvector is also needed for incrementing vector inductions and I plan to use the recipe in a later patch, but unfortunately there's nothing to share yet.

8791

The issue here is that for VF == 1 there's no need for a step-vector. We could define step-vector to just produce a scalar for VF = 1 though? Later patches with a separate recipe for scalar-steps would handle this as well.

9125

the comment is stale now, removed

9196

All information to take the decision is already available in the VPlan. We only need to know whether the tail will be folded and check check the users of the induction recipe in the plan.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
178

Moved definition out of VPRecipeBuilder and removed this here.

llvm/lib/Transforms/Vectorize/VPlan.cpp
739

Unfortunately this won't work as expected in this case, because the operand is an constant wrapped in a VPValue and State is not explicitly set. Currently State.get will broadcast the constant in a vector when using State.get(getOperand(1), Part); and return scalar when using VPIteration(Part, 0).

741

see above.

746

At the moment there's no need for float support here. I added an assert.

751

stepvector is not used for floats yet, added an assert

llvm/lib/Transforms/Vectorize/VPlan.h
2746

This is not needed any longer in the current version of the patch. Removed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
344

This has been completely removed and replaced by a different transform.

llvm/lib/Transforms/Vectorize/VPlanValue.h
333

This is a leftover from an earlier iteration and has been removed.

fhahn updated this revision to Diff 398643.Jan 10 2022, 7:49 AM

Rebase & ping :)

I also removed the explicit step argument from the patch, as suggested. I think now all outstanding comments should be addressed.

Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2574

Set StartIdx to int or fp from its beginning? Indep. of this patch.

2581

Would be good to fold int and fp handlings, indep. of this patch:
Set and use MulOp here as well?
Use CreateBinOp here as well?
Name the instruction "induction" above for fp as well?
OTOH, hanlding Trunc should apply to int only?

8394

Better have the transform in a separate patch, to retain exiting behavior here?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
336

Phi: VPWidenIntOrFpInductionRecipe represents a phi (assumed to exist, corresponding to OriginalCanonicalIV) but VPWidenCanonicalIVRecipe represents a non-phi and thus appears after the former. Ordering their handling accordingly is more logical? Break as soon as (if) the latter is found?

WidenCan >> WidenNewIV?
WidenInduction >> WidenOriginalIV?
(both are canonical...)

356

Is this optimization expected to clamp Range.End for the current VPlan during VPlans-for-ranges construction?
Is there a need to (only) know if VPlan's range is scalar? Specifically, if Start is Scalar then End can be asserted to be Start*2, if needed? Suffice the check if all recipes provide scalars only, although faster to check if range is scalar?

358

vputils::onlyScalarsDemanded()?

365

vputils::onlyFirstLaneDemanded()?

373

Place next to above early-exit if !WidenCan?

396

Scalar Start handled above (w/o retaining WidenCan)?

Ayal added inline comments.Jan 12 2022, 5:09 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
390

WidenCan is going to be replaced, question is by what; check instead if StepVector is needed (one recipe producing only scalars the other vectors (deserves more review thoughts...)) and if so create it, followed by a single replacing and erasing of WidenCan?

fhahn updated this revision to Diff 399590.Jan 13 2022, 1:43 AM
fhahn marked 12 inline comments as done.

Address latest comments. Pulled in onlyFirstLaneDemanded from D116554. Unfortunately D116554 itself cannot be applied before this patch, because the stepvector transition is required for the use in buildScalarSteps.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2574

Landed the change separately in 7ce48be0fd83 . StartIdx depends on Part though, so I left it in the loop for now.

2581

Thanks, I applied the folding and added an assert in 7ce48be0fd83.

8394

Good idea, split off to D117140

llvm/lib/Transforms/Vectorize/VPlan.h
857

Added documentation and also added it for other recipes where needed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
336

Updated in the split-off review.

356

It's sufficient to check whether the start is scalar. There's no need to clamp the end.

373

Moved!

390

Updated to check if StepVector is needed and sink replace & erase code.

396

This is not needed in the latest version, removed.

fhahn updated this revision to Diff 400380.Jan 16 2022, 7:34 AM

Introduce VPRecipeBase::onlyFirstLaneUsed & VPRecipeBase::onlyScalarsUsed as suggested by @Ayal in D116554.

fhahn updated this revision to Diff 400951.Jan 18 2022, 12:32 PM

Rebased after latest changes to D117140.

Ayal added inline comments.Jan 19 2022, 8:35 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
1643

Demanded == Used?

llvm/lib/Transforms/Vectorize/VPlan.h
75

Place above next to getRuntimeVF()?

80

(Indep. of this patch, but while we're here:)
"The sequence starts at StartIndex." - redundant?
\p Opcode >> \p BinOp

773

Default is to return false conservatively.

780

ditto

909

The recipes along the canonical IV chain also use first lane only, namely: CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount and VPCanonicalIVPHIRecipe? They are also "onlyFirstPartUsed()", but are unique in that.

1317

Comment that "// Recursing through Blend recipes only, must terminate at header phi's the latest." ?

1654

Already confirmed above that Op is the address?

1657

... of their address.

return Op == getAddr() && isConsecutive(); ?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
326–350

Ahh, removeRedundantInductionCasts() is called before hoisting VPWidenIntOrFpInductionRecipes-built-from-Truncs, so the latter will be excluded if we only visit phis() here? OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

326–350

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

fhahn updated this revision to Diff 401290.Jan 19 2022, 9:38 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

fhahn updated this revision to Diff 402314.Jan 23 2022, 4:24 AM

Rebase & ping :)

llvm/lib/Transforms/Vectorize/VPlan.cpp
1643

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
75

Moved up, thanks!

80

Adjusted, thanks!

773

extended comment.

780

extended comment.

909

Updated to include CanonicalIVIncrement, CanonicalIVIncrementNUW, BranchOnCount opcodes.

1317

Added the comment, thanks!

1657

Simplified as suggested, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
326–350

OTOH, are such truncated IV's candidates for folding with WidenCanonicalIV in terms of matching types?

I don't think so, as we should widen using the canonical IVs type, which would be the wider type.

326–350

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Ayal added inline comments.Jan 25 2022, 12:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4821

Wonder if this is still the case now that NewIV is always created to feed the vector compare, until replaced?

llvm/lib/Transforms/Vectorize/VPlan.cpp
742

Assert is redundant given than Step was created above as a ConstantInt - assert instead that Val is integer, before the loop?

llvm/lib/Transforms/Vectorize/VPlan.h
909

Also have VPCanonicalIVPHIRecipe indicate it uses only first lane?

1321

Could a user be live-out / not a recipe?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
326–350

The VPWidenCanonicalIVRecipe WidenNewIV is created only to feed the ICmpULE or ActiveLane, i.e., when vectorizing with fold tail, so can assert(!Range.Start.isScalar && FoldTail) if needed below?

I might be missing something, but I think It is still possible to fold the tail with VF=1 and UF>1, so I left the check (e.g. in llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll)

Ah, you're right, I forgot foldTail may operate when unrolling only; it should probably stop doing that, but in a separate patch.

Scalar values are demanded from WidenNewIV only if it feeds ActiveLane, which demands only the first lane, so broadcasting and adding building scalar steps for each lane seems redundant here? Suffice to feed ActiveLane with UF parts each holding first lane only, either constructed via a simplified StepVector from canonicalIV or perhaps have the canonicalIV supply them directly instead of broadcasting its Part 0 value across all parts (which now seems potentially misleading)?

Yes, for ActiveLane we only need the first scalar part of each lane. I think it might be better to have a special version of build-scalar-steps that only generates the first lane as follow-up to the build-scalar-steps patches.

Follow-up TODO would be fine.

Still would be good to try and simplify the logic here that checks if OriginalIV cannot replace NewIV (but instead a StepVector'd CanonicalIV replaces NewIV): FoldTail must be true given that NewIV exists, so can be asserted or not passed-in; which leaves

Range includes only positive VF's &&
onlyFirstLaneUsed(WidenOriginalIV) &&
onlyScalarsUsed(WidenOriginalIV) &&
NewIV feeds some VPInstruction that isn't ActiveLaneMask
?

fhahn updated this revision to Diff 402978.Jan 25 2022, 11:16 AM

A major update & simplification: the latest version doesn't need a separate step-vector recipe; if a 'step-vector' is needed, the VPWidenCanonicalInduction recipe is retained. It also removes the VPlan-based analysis of the uses in favor of looking up the recorded information added in D118167.

Note that it still checks if all users of the induction are active-lane-masks. The legacy cost-model doesn't know about them, but if it all users are active-lane-masks, no widened canonical IV is needed, only the scalar steps.

fhahn marked 4 inline comments as done.Jan 25 2022, 11:21 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4821

I think for compatibility with existing behavior we still need to retain it, as it impacts the decision whether the scalarize the induction or not. Could be investigated in a follow-up change, once all related patches are through.

llvm/lib/Transforms/Vectorize/VPlan.cpp
742

This code is now gone in the latest version.

llvm/lib/Transforms/Vectorize/VPlan.h
909

Will update, but the code has moved to a separate patch again.

1321

at the moment there's not possible, all users must be a recipe (also the code is gone from this patch)

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
326–350

The logic here has been simplified with help of D118167

fhahn retitled this revision from [VPlan] Handle IV vector splat as VPInstruction. to [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Jan 26 2022, 4:30 AM
fhahn edited the summary of this revision. (Show Details)
Ayal added inline comments.Jan 26 2022, 11:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2573

OK, this is synonymous with NeedsVectorIV(), capturing it more accurately.

Follows the above comment:

Perhaps VPWidenIntOrFpInductionRecipe should know of NeedsScalarIV and NeedsVectorIV (names to be improved), pass them to ILV->widenIntOrFpInduction() and to others - see removeRedundantStepVector below?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Have VPWidenCanonicalIVRecipe also support needsScalarIVOnly() - by checking if all it's users are ActiveLaneMask VPInstructions, and needsScalarIV() - by checking if any user is ActiveLaneMask?

Then check

if ((WidenNewIV->needsScalarIV() && WidenOriginalIV->needsScalarIV()) ||
    (!WidenNewIV->needsScalarIVOnly() && !WidenOriginalIV->needsScalarIVOnly())) {
    WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
    WidenNewIV->eraseFromParent();
}

?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
167–168

Should vp<%2> be vp<[[CAN_IV]]> ?

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153 ↗(On Diff #402978)

Hmm, this enables sink scalar operands to handle scalar steps.

fhahn updated this revision to Diff 403549.Jan 27 2022, 1:54 AM
fhahn marked 8 inline comments as done.

Addressed latest comments, thanks!

Clarified comment for replacement check for WidenNewIV and WidenOriginalIV, update test to use CAN_IV variable.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2573

Updated D118167 to use needsVectorIV

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356

Does the following summarize the logic here: Original and New IV's can each provide either scalar only, vector only, or both. Original can replace New iff it provides whatever New needs to provide.

Yes, that's a better summary! I updated the comment.

I am not sure about adding needsScalarIVOnly/needsScalarIV to VPWidenCanonicalIVRecipe, as this distinction seems only relevant here and might muddy the waters a bit, as the recipe will always generate a vector value (and I don't think we should change that). I left the check here as is for now, but once D116554 lands this check can be replaced with onlyFirstLaneUsed(WidenNewIV).

I think to wrap things up, it is worth to keep the current order of patches and have the temporary check of WidenNewIV's users here, to simplify the patch ordering.

llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll
167–168

Yep, updated, thanks!

llvm/test/Transforms/LoopVectorize/pr45679-fold-tail-by-masking.ll
153 ↗(On Diff #402978)

I think this is caused by the drop of the check for the scalar VF; we add the VPWidenCanonicalInduction, but the cost model knows that the induction recipe will be scalar only, so we cannot replace the VPWidenCanonicalInduction with the other induction recipe and we have some extra steps. Given that the VF=1,UF>1 is somewhat of an edge case, it seems cleaner to not check for the scalar VF, as suggested earlier.

Ayal accepted this revision.Jan 27 2022, 2:31 AM

This is fine, with a last nit

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
356

Very well.
How about just setting here

bool WidenNewIVOnlyFirstLaneUsed = all_of(...);
if (WidenOriginalIV->needsVectorIV() || WidenNewIVOnlyFirstLaneUsed) {
  WidenNewIV->replaceAllUsesWith(WidenOriginalIV);
  WidenNewIV->eraseFromParent();
}

?

This revision is now accepted and ready to land.Jan 27 2022, 2:31 AM
fhahn updated this revision to Diff 403567.Jan 27 2022, 3:02 AM

Thanks Ayal! Moved the all_of check outside the if and assigned to variable.

This change now depends on D118167, which is NFC and adds the needsVectorIV/needsScalarIV required.

fhahn updated this revision to Diff 404274.Jan 29 2022, 8:17 AM

Rebased on current main, I am planning on landing this soon.

This revision was landed with ongoing or failed builds.Jan 29 2022, 8:25 AM
This revision was automatically updated to reflect the committed changes.