Page MenuHomePhabricator

[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
ClosedPublic

Authored by fhahn on Sep 13 2022, 1:20 AM.

Details

Summary

This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added a comment.Sun, Nov 6, 4:49 PM

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9550

nit: can also rename this original emitTransformedIndex() - emitDerivedIndex()

Done!

TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?

It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?

9571

Thanks, updated to keep adding the step to VPScalarIVStepsRecipe as well.

9579

at the moment, a derived IV recipe is created if only a truncate is needed, but the derived IV will only be truncated at the end, so it uses the wide step whereas for the steps recipe we need the truncated step.

I think to unify this we would have to compute the derived IV with truncated steps, but that would be a bigger, unrelated change,

llvm/lib/Transforms/Vectorize/VPlan.cpp
667

added, thanks

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1061

updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
400–401

done, thanks!

404

Updated, thanks! I kept the type as VPVale as this requires using getDef just once.

407

done, thanks!

410

done, thanks!

413

the patch has not landed yet (D136068)

llvm/lib/Transforms/Vectorize/VPlanValue.h
103

done thanks!

366

done thanks!

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

I guess that's because they are boolean values and the IR get printed as boolean literals by the IR printer. Do you think this is something that should be changed?

Ayal added a comment.Tue, Nov 15, 2:03 PM

pong :-)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9543

VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()?

Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane.

Good idea, I'll see about that separately.

Found a good place?

9550

TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter?

It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here?

Hmm, on the contrary - it appears all users cast the 2nd operand to match the type of the 4th operand before calling emitTransformedIndex(): VTC/"cast.vtc" casts VectorTripCount and AdditionalBypass.second to StepType, CMO/"cast.cmo" casts CountMinusOne to Step Type, PtrInd casts CanonicalIV to Step Type in order for Idx and GlobalIdx to be the desired type.

Here Step is obtained from an operand via State, Ty is recorded in the recipe, and we eventually assert that the type of Step matches Ty. Seems overly complicated? Would it be simpler, here and also at the other callers, to feed emitTransformedIndex() or emitDerivedIndex() directly with the original IV and Step, and let it do the necessary casting of the former to match the type of the latter?

9570–9571

Another type reconciliation worth taking care of by the callee instead of having it assert it? buildScalarSteps() in this case.

9570–9571

I kept getOperand(0) for now, as it seems like it may be confused with the different behavior of the existing getStartValue().

Not sure what the source of confusion is, but it may appear clearer to have either

Value *BaseIV = State.get(getOperand(0), VPIteration(0, 0));
Value *Step = State.get(getOperand(1), VPIteration(0, 0));

or

Value *BaseIV = State.get(getBaseIV(), VPIteration(0, 0));
Value *Step = State.get(getStepValue(), VPIteration(0, 0));
9579

Perhaps worth leaving a note behind about said bigger unrelated change.

llvm/lib/Transforms/Vectorize/VPlan.h
1862

Cannot return a const Type because ConstantInt::get() expects a non-const type, in VPCanonicalIVPHIRecipe::getStepValue()?

1867

Is VPCanonicalIVPHIRecipe::getStepValue() still needed?

1874

I mean, would the following rephrasing be better:
/// i.e., has the same start, step (of 1), and type as the canonical IV.
?

2018–2019

Worth removing Ty altogether from VPDerivedIVRecipe - Step provides the desired type.

2024

nit: Derived IV stands for producing Start + CanonicalIV * Step, so seems more natural to order its operands and dump them in this order? Possibly with + and * instead of separating commas, possibly along with type cast information.

2048

Reordering the operands will also simplify this documentation.

2057

Suffice to have

VPValue *getBasicIVValue() const { return getOperand(0); }

instead?

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1068

nit: can check ConstantInt *Step = ID.getConstIntStepValue() as in Loop::isCanonical().

nit: can assert ID.getInductionOpcode() == Instruction::Add.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
396

nit: TruncI is used for its type only, TruncTy may better be called ResultTy, IVTy is hopefully unneeded if VPDerivedRecipe can be constructed w/o it; consider setting:

Type *IVTy = WideIV->getPHINode()->getType();
Type *ResultTy = IVTy;
Type *TruncInstTy = nullptr;
if (auto *TruncI = WideIV->getTruncInst()) {
  TruncInstTy = TruncI->getType();
  ResultTy = TruncInstTy;
}
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Oh well, the input IR is adding and subtracting true and false...

The reason for having DERIVED-IV with canonical start ir<false> ==0 and step ir<true> ==1 is because of type expansion and/or truncation?

188

The reason for having DERIVED-IV with canonical start ir<0> and step ir<1> is because of type expansion and/or truncation? Perhaps worth dumping the distinct types, as this operation is effectively a cast.

189

nit: check the second (ir<1>) operand of SCALAR-STEPS, for completeness?

It is in practice either +1 or -1 ...

fhahn updated this revision to Diff 478071.Sat, Nov 26, 3:51 PM
fhahn marked 19 inline comments as done.

Thank you very much Ayal! Comments should be addressed, hopefully I didn't miss any.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9543

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

I think emitTransformedIndex needs the canonical IV.

Found a good place?

I put up D138748 to add it to the VPHeaderPHIRecipe documentation.

9550

Simplified in 12bb5535d270, thanks!

9570–9571

Refactored in bf0bd85f9d82 and removed here, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1862

Unfortunately yes!

1867

It's not needed in the latest version, removed, thanks!

2018–2019

Removed in the latest version, thanks!

2024

Updated, thanks!

2048

Updated, thanks!

2057

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1068

Updated, thanks! I turns out the assertion triggered in some tests, added as an extra condition.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
396

Simplified, thanks!

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Here the inputs are already truncated to i1 before recipe construction so the recipe doesn't truncated itself.

188

Updated, thanks!

189

Done, thanks!

Ayal added inline comments.Sun, Nov 27, 8:15 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2348–2349

Above assert is now redundant.

Can hoist the comment above and rephrase it, e.g., "Ensure scalar IV and step have the same integer type", or rather "Ensure step has the same type as that of scalar IV"?

9543

Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)

I think emitTransformedIndex needs the canonical IV.

Hmm, emitTransformedIndex just needs an "Index", i.e., an "IV" or "BasicIV", regardless if it is "the canonical" IV.

9549

nit: suffice to ask if TruncToTy != DerivedIV->getType() as the latter is never null?

9550

assert(Step->getType()->isIntegerTy()) belongs earlier if still needed here at all?

9570–9571

Refactored in bf0bd85f9d82 and removed here, thanks!

Good! Thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
1874

above nit worth addressing?

1960

nit: it seems TruncToTy is always non-null, can simplify later check?

1968

CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV?

TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr?

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1056

turns out the assertion triggered in some tests, added as an extra condition.

Good! Worth a comment?

1058

steps-recipe?

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
22

Ah, the step it uses is explicit - it's ir<1>! What has been omitted is "start" (ir<0>), which is moved from ScalarIVSteps to DerivedIV, if needed.

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Trying to clarify why DERIVED-IV is needed at all here (too), given that it starts at 0 (false) and bumps with a step of 1 (true)?

fhahn updated this revision to Diff 478116.Sun, Nov 27, 2:03 PM
fhahn marked 18 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2348–2349

Updated, thanks!

9543

Right, but VPDerivedIVRecipe will always use the canonical IV, at least to start with IIUC.

9549

Adjusted, thanks!

9550

I might be missing something, but I think before the patch we also only had this assert for the case we need to truncate, as this can only be done for integer types. The induction could also be a floating point IV in general here I think.

9579

Added to buildScalarSteps.

llvm/lib/Transforms/Vectorize/VPlan.h
1174

I think it was already inlined in current main, it just got re-formatted here. Undid the formatting change.

1874

Missed those earlier, should be adjusted, thanks!

1960

Adjusted, thanks!

1968

CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV?

I kept it as CanonicalIV for now, as this is what all current clients use (also updated the constructor to require VPCanonicalIVPHIRecipe. I can change it if you would prefer, but can also do that if we ever lift the restriction.

TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr?

Updated, thanks!

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
1056

Moved the check to the return and move the part about incrementing it by one there.

1058

Adjusted, thanks!

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
22

Ah I missed this comment earlier, Yes, DerivedIV will now take care of adjusting the start value, ScalarIVSteps just generate steps.

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

I think DERIVED-IV here is for %d = phi i1 ..., which has a different type than the canonical IV, but doesn't itself need truncating because the operands are already i1.

Ayal accepted this revision.Sun, Nov 27, 3:58 PM

This looks good to me, ship it!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2343

Is this TODO an NFC to simplify the code, w/o affecting the generated code?

9543

Agreed, VPDerivedIVRecipe is (currently) always fed the canonical IV, but it can compute a derived IV given any BaseIV, i.e., does not rely on it being The Canonical IV.

9550

Ahh, sorry, agreed. (Confused by same assert-guarding-trunc in buildScalarSteps() and thought one could check if ResultTy differs from Step->getType() instead of DerivedIV->getType() thereby asserting earlier before emitTransformedIndex(). But current code is fine.)

llvm/lib/Transforms/Vectorize/VPlan.cpp
667

worth adding in the error message as well

llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll
73–75

Ah, right; DERIVED-IV truncates CAN_IV to i1 before Mul & Add, which is not dumped-out like the truncation to ResultTy.

This revision is now accepted and ready to land.Sun, Nov 27, 3:58 PM
This revision was landed with ongoing or failed builds.Mon, Nov 28, 8:32 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.Tue, Nov 29, 2:12 PM

@fhahn one of our internal tests also hit the same assertion failure and I have a reduced testcase for it if it helps. I was going to file a bug for it, but since you have already reverted the change, I'll hold off for now. Let me know if you would like the testcase we found.