This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
ClosedPublic

Authored by fhahn on Sep 13 2022, 1:20 AM.

Details

Summary

This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Diff Detail

Event Timeline

fhahn created this revision.Sep 13 2022, 1:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 1:20 AM
fhahn requested review of this revision.Sep 13 2022, 1:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2022, 1:20 AM
Ayal added a comment.Sep 19 2022, 2:45 PM

Nice refactoring - potentially closing a gap between VPlan and original post-vectorization IR sink scalar operands?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8702

Can alternatively provide the desired constant 1 VPValue when asked to retrieve getStep()?
(Such constant operands are analogous to Attributes in MLIR.)
(Unlike the constant Start which VPlan::prepareToExecute() may replace later with a nonzero value, something worth fixing...)

9533

assert(!State.Instance && "VPTransformedIVRecipe being replicated."); ?

9543

Some things seem a bit confusing here, looking at the existing code:

VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?

The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?

Now TransformedIV is also "Scalar" (as in non-Vector) similar to ScalarIV.

Perhaps instead of
Value *ScalarIV = State.get(getCanonicalIV(), VPIteration(0, 0));
we should have
Value *CanonicalIV = State.get(getCanonicalIV(), VPIteration(0, 0));
?

Perhaps instead of TransformedIV have NonCanonicalIV, AffineIV or DerivedIV - considering that the canonical IV is aka a "BasicIV"?

Then rename VPTransformedIVRecipe accordingly?

Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.

9556

assert(TransformedIV != ScalarIV && "..."); ?

9570–9571

ditto: everything here is scalar IV. Perhaps BaseIV, FirstLaneScalarIV, or Start - reviving getStartValue() to wrap getOperand(0)?

9573

Is this the source for introducing a unit step as operand to Can[onical]IV?

9575

Have both producing recipes feed VPScalarIVStepsRecipe() with their step value as another operand, reviving getStepValue() to retrieve it?

llvm/lib/Transforms/Vectorize/VPlan.cpp
650–651

Suggest to add a FIXME to find a better way than this for VPlan to represent epilogue loop.

llvm/lib/Transforms/Vectorize/VPlan.h
1131

second >> last

2044

Worth clarifying which scalarized versions are actually generated.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
405

Should this be a method of VPCanonicalIVPHIRecipe, checking if a given Start and Step match those of its own?

414

Suggest to first set VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV(); (or auto) to avoid casting below. Then set VPValue *Start, Step to be either those of CanonicalIV or those of the new VPTransformedIVRecipe, to feed the new VPScalarIVStepsRecipe?

fhahn updated this revision to Diff 468629.Oct 18 2022, 11:17 AM
fhahn marked 12 inline comments as done.

Address latest comments, thanks!

fhahn retitled this revision from [VPlan] Add VPTransformedIVRecipe, use for VPScalarIVStepsRecipe. to [VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe..Oct 18 2022, 11:20 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8702

Updated, thanks!

9533

Added, thanks!

9543

VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?

Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?

The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type?

Yep, that should be cleaner in the new code.

Perhaps instead of

Thanks, I updated the naming to use CanonicalIV and DerivedIV. I also renamed VPTransformedIVRecipe -> VPDerivedIVRecipe

Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.

Good idea, I'll see about that separately.

9556

Thanks, added the assert.

9570–9571

Updated to use BaseIV. I kept getOperand(0) for now, as it seems like it may be confused with the different behavior of the existing getStartValue().

9573

Yes but that's refactored now.

9575

I left things as they are for now, after refactoring getSTepValue in VPCanonicalIVPHIrecipe.

llvm/lib/Transforms/Vectorize/VPlan.cpp
650–651

Added, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1131

That's not needed any longer.

1174

Should be gone now.

2044

Added an explanation, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
405

Added a helper.

414

Updated to have a separate CanIV variable, thanks!

Ayal added a comment.Oct 23 2022, 8:19 AM

Ahh, this brings up some further thoughts...

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9550

nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()

TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?

9571

Ahh, operand 0 of VPScalarIVStepsRecipe is providing both the start value (directly - BaseIV) and the step (indirectly - by casting operand 0 into a recipe and asking it for its step). Better represent all def-use relations explicitly by passing VPValues (only) between recipes.

This can be done by propagating Step instead of delegating it, e.g.: if SCEV is needed then have a common VPExpandSCEVRecipe (placed in the preheader) take care of generating the VPValue *Step = vputils::getOrCreateVPValueForSCEVExpr(Plan, ID.getStep(), SE); and feed it to both VPDerivedIVRecipe and VPScalarIVStepsRecipe. If SCEV is not used, have a Plan.getOrAddVPValue(1)) of the desired type feed these recipes?

Sounds reasonable?

9579

This truncation of Step is needed only if fed directly from CanonicalIV, because the Step produced by DeriveIV should have the desired (value and) type, including a truncation at the end if needed, right?

Alternatively, introduce a DeriveIV recipe also if only truncation of Step is needed, so as not to sink it into triangles? (Or have ScalarIV recipe take care of TruncToTy always, also for derived steps, relieving DeriveIV of doing so, though this would sink into triangles.)

llvm/lib/Transforms/Vectorize/VPlan.cpp
667

... or DerivedIV ...

llvm/lib/Transforms/Vectorize/VPlan.h
1174

nit: inlining can be applied separately.

1862

Check if this method can remain const if this recipe no longer needs to delegate its Step.

1867

Consider removing and feeding users directly with a constant 1 VPValue when needed upon their construction, to keep this recipe with a single value (Start) VPDef. Otherwise it can provide both Start and Step as parts of its multi-valued VPDef...

(nit: const?)

1874

nit: extra space "and the"

step is also the same (1), as type and start; or does the original canonical unit step become UF * VF?

2018–2019

Worth clarifying the use of both Ty (type to cast to before conversion) and TruncToTy (type to cast to after conversion)?

2028

nit: 2nd, 1st, 3rd? Also clarify involved type casts?

2037

Return operand 0 inline, or outlined to avoid cast?

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1061

Also check for same type, as claimed at the interface?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
400–401

nit: define ID and TruncI below slightly closer to first use?

404

IVR may be a confusing name, defined as a VPValue* rather than a Recipe. IV also stands for a recipe, conflicting with IVR.

How about renaming IVR to something like BaseIV, and define it as a VPRecipeBase*? It stands for the recipe providing a single scalar value per iteration of vectorized & unrolled loop with the desired type/start/step values, as a Base on which to build scalar steps - a scalar value per lane and part. This is either the canonical IV recipe if suitable or a newly introduced derived IV recipe which transforms it.

Perhaps also rename IV to WidenIV, and spell out CanIV to CanonicalV.

407

TruncTy here is the desired type of the resulting scalar steps; should it be supplied to isCanonical() along with ID in order to check if CanonicalIV is providing the suitable start/step/type for scalar users of Phi, or else a derived recipe is needed?

410

nit: (re)use IVTy
nit: use Can[onical]IV directly instead of IVR.

413

nit: was there some helper to get Def as RecipeBase?

llvm/lib/Transforms/Vectorize/VPlanValue.h
103

Transformed or Derived? Lex order

366

Lex order

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
22

This does appear less descriptive than having SCALAR-STEPS depict the step it uses?

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

While we're here, ir<false>, ir<true> seem odd (and even ;-))

fhahn updated this revision to Diff 473523.Nov 6 2022, 4:49 PM
fhahn marked 13 inline comments as done.
fhahn added a comment.Nov 6 2022, 4:49 PM

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9550

nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()

Done!

TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?

It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?

9571

Thanks, updated to keep adding the step to VPScalarIVStepsRecipe as well.

9579

at the moment, a derived IV recipe is created if only a truncate is needed, but the derived IV will only be truncated at the end, so it uses the wide step whereas for the steps recipe we need the truncated step.

I think to unify this we would have to compute the derived IV with truncated steps, but that would be a bigger, unrelated change,

llvm/lib/Transforms/Vectorize/VPlan.cpp
667

added, thanks

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1061

updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
400–401

done, thanks!

404

Updated, thanks! I kept the type as VPVale as this requires using getDef just once.

407

done, thanks!

410

done, thanks!

413

the patch has not landed yet (D136068)

llvm/lib/Transforms/Vectorize/VPlanValue.h
103

done thanks!

366

done thanks!

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

I guess that's because they are boolean values and the IR get printed as boolean literals by the IR printer. Do you think this is something that should be changed?

Ayal added a comment.Nov 15 2022, 2:03 PM

pong :-)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9543

VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?

Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.

Good idea, I'll see about that separately.

Found a good place?

9550

TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?

It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?

Hmm, on the contrary - it appears all users cast the 2nd operand to match the type of the 4th operand before calling emitTransformedIndex(): VTC/"cast.vtc" casts VectorTripCount and AdditionalBypass.second to StepType, CMO/"cast.cmo" casts CountMinusOne to Step Type, PtrInd casts CanonicalIV to Step Type in order for Idx and GlobalIdx to be the desired type.

Here Step is obtained from an operand via State, Ty is recorded in the recipe, and we eventually assert that the type of Step matches Ty. Seems overly complicated? Would it be simpler, here and also at the other callers, to feed emitTransformedIndex() or emitDerivedIndex() directly with the original IV and Step, and let it do the necessary casting of the former to match the type of the latter?

9570–9571

Another type reconciliation worth taking care of by the callee instead of having it assert it? buildScalarSteps() in this case.

9570–9571

I kept getOperand(0) for now, as it seems like it may be confused with the different behavior of the existing getStartValue().

Not sure what the source of confusion is, but it may appear clearer to have either

Value *BaseIV = State.get(getOperand(0), VPIteration(0, 0));
Value *Step = State.get(getOperand(1), VPIteration(0, 0));

or

Value *BaseIV = State.get(getBaseIV(), VPIteration(0, 0));
Value *Step = State.get(getStepValue(), VPIteration(0, 0));
9579

Perhaps worth leaving a note behind about said bigger unrelated change.

llvm/lib/Transforms/Vectorize/VPlan.h
1862

Cannot return a const Type because ConstantInt::get() expects a non-const type, in VPCanonicalIVPHIRecipe::getStepValue()?

1867

Is VPCanonicalIVPHIRecipe::getStepValue() still needed?

1874

I mean, would the following rephrasing be better:
/// i.e., has the same start, step (of 1), and type as the canonical IV.
?

2018–2019

Worth removing Ty altogether from VPDerivedIVRecipe - Step provides the desired type.

2024

nit: Derived IV stands for producing Start + CanonicalIV * Step, so seems more natural to order its operands and dump them in this order? Possibly with + and * instead of separating commas, possibly along with type cast information.

2048

Reordering the operands will also simplify this documentation.

2057

Suffice to have

VPValue *getBasicIVValue() const { return getOperand(0); }

instead?

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1068

nit: can check ConstantInt *Step = ID.getConstIntStepValue() as in Loop::isCanonical().

nit: can assert ID.getInductionOpcode() == Instruction::Add.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
396

nit: TruncI is used for its type only, TruncTy may better be called ResultTy, IVTy is hopefully unneeded if VPDerivedRecipe can be constructed w/o it; consider setting:

Type *IVTy = WideIV->getPHINode()->getType();
Type *ResultTy = IVTy;
Type *TruncInstTy = nullptr;
if (auto *TruncI = WideIV->getTruncInst()) {
  TruncInstTy = TruncI->getType();
  ResultTy = TruncInstTy;
}
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Oh well, the input IR is adding and subtracting true and false...

The reason for having DERIVED-IV with canonical start ir<false> ==0 and step ir<true> ==1 is because of type expansion and/or truncation?

188

The reason for having DERIVED-IV with canonical start ir<0> and step ir<1> is because of type expansion and/or truncation? Perhaps worth dumping the distinct types, as this operation is effectively a cast.

189

nit: check the second (ir<1>) operand of SCALAR-STEPS, for completeness?

It is in practice either +1 or -1 ...

fhahn updated this revision to Diff 478071.Nov 26 2022, 3:51 PM
fhahn marked 19 inline comments as done.

Thank you very much Ayal! Comments should be addressed, hopefully I didn't miss any.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9543

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

I think emitTransformedIndex needs the canonical IV.

Found a good place?

I put up D138748 to add it to the VPHeaderPHIRecipe documentation.

9550

Simplified in 12bb5535d270, thanks!

9570–9571

Refactored in bf0bd85f9d82 and removed here, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1862

Unfortunately yes!

1867

It's not needed in the latest version, removed, thanks!

2018–2019

Removed in the latest version, thanks!

2024

Updated, thanks!

2048

Updated, thanks!

2057

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1068

Updated, thanks! I turns out the assertion triggered in some tests, added as an extra condition.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
396

Simplified, thanks!

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Here the inputs are already truncated to i1 before recipe construction so the recipe doesn't truncated itself.

188

Updated, thanks!

189

Done, thanks!

Ayal added inline comments.Nov 27 2022, 8:15 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2348–2349

Above assert is now redundant.

Can hoist the comment above and rephrase it, e.g., "Ensure scalar IV and step have the same integer type", or rather "Ensure step has the same type as that of scalar IV"?

9543

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

I think emitTransformedIndex needs the canonical IV.

Hmm, emitTransformedIndex just needs an "Index", i.e., an "IV" or "BasicIV", regardless if it is "the canonical" IV.

9549

nit: suffice to ask if TruncToTy != DerivedIV->getType() as the latter is never null?

9550

assert(Step->getType()->isIntegerTy()) belongs earlier if still needed here at all?

9570–9571

Refactored in bf0bd85f9d82 and removed here, thanks!

Good! Thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1874

above nit worth addressing?

1960

nit: it seems TruncToTy is always non-null, can simplify later check?

1968

CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV?

TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr?

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1056

turns out the assertion triggered in some tests, added as an extra condition.

Good! Worth a comment?

1058

steps-recipe?

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
22

Ah, the step it uses is explicit - it's ir<1>! What has been omitted is "start" (ir<0>), which is moved from ScalarIVSteps to DerivedIV, if needed.

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Trying to clarify why DERIVED-IV is needed at all here (too), given that it starts at 0 (false) and bumps with a step of 1 (true)?

fhahn updated this revision to Diff 478116.Nov 27 2022, 2:03 PM
fhahn marked 18 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2348–2349

Updated, thanks!

9543

Right, but VPDerivedIVRecipe will always use the canonical IV, at least to start with IIUC.

9549

Adjusted, thanks!

9550

I might be missing something, but I think before the patch we also only had this assert for the case we need to truncate, as this can only be done for integer types. The induction could also be a floating point IV in general here I think.

9579

Added to buildScalarSteps.

llvm/lib/Transforms/Vectorize/VPlan.h
1174

I think it was already inlined in current main, it just got re-formatted here. Undid the formatting change.

1874

Missed those earlier, should be adjusted, thanks!

1960

Adjusted, thanks!

1968

CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV?

I kept it as CanonicalIV for now, as this is what all current clients use (also updated the constructor to require VPCanonicalIVPHIRecipe. I can change it if you would prefer, but can also do that if we ever lift the restriction.

TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr?

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1056

Moved the check to the return and move the part about incrementing it by one there.

1058

Adjusted, thanks!

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
22

Ah I missed this comment earlier, Yes, DerivedIV will now take care of adjusting the start value, ScalarIVSteps just generate steps.

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

I think DERIVED-IV here is for %d = phi i1 ..., which has a different type than the canonical IV, but doesn't itself need truncating because the operands are already i1.

Ayal accepted this revision.Nov 27 2022, 3:58 PM

This looks good to me, ship it!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2343

Is this TODO an NFC to simplify the code, w/o affecting the generated code?

9543

Agreed, VPDerivedIVRecipe is (currently) always fed the canonical IV, but it can compute a derived IV given any BaseIV, i.e., does not rely on it being The Canonical IV.

9550

Ahh, sorry, agreed. (Confused by same assert-guarding-trunc in buildScalarSteps() and thought one could check if ResultTy differs from Step->getType() instead of DerivedIV->getType() thereby asserting earlier before emitTransformedIndex(). But current code is fine.)

llvm/lib/Transforms/Vectorize/VPlan.cpp
667

worth adding in the error message as well

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Ah, right; DERIVED-IV truncates CAN_IV to i1 before Mul & Add, which is not dumped-out like the truncation to ResultTy.

This revision is now accepted and ready to land.Nov 27 2022, 3:58 PM
This revision was landed with ongoing or failed builds.Nov 28 2022, 8:32 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.Nov 29 2022, 2:12 PM

@fhahn one of our internal tests also hit the same assertion failure and I have a reduced testcase for it if it helps. I was going to file a bug for it, but since you have already reverted the change, I'll hold off for now. Let me know if you would like the testcase we found.