This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Nice refactoring - potentially closing a gap between VPlan and original post-vectorization IR sink scalar operands?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8696 | Can alternatively provide the desired constant 1 VPValue when asked to retrieve getStep()? | |
9527 | assert(!State.Instance && "VPTransformedIVRecipe being replicated."); ? | |
9537 | Some things seem a bit confusing here, looking at the existing code: VPScalarIVStepsRecipe::getCanonicalIV() presumably retrieves the same Recipe/VPValue as does the enclosing VPlan->getCanonicalIV()? The original code has both ScalarIV and CanonicalIV - are they not the same - one retrieves a Value per lane (0,0) and the other per part (0) - used only to check its type? Now TransformedIV is also "Scalar" (as in non-Vector) similar to ScalarIV. Perhaps instead of Perhaps instead of TransformedIV have NonCanonicalIV, AffineIV or DerivedIV - considering that the canonical IV is aka a "BasicIV"? Then rename VPTransformedIVRecipe accordingly? Would be good to explain somewhere all the IV recipes together: those representing a single scalar (canonical or not) across VF&UF, a single vector per part, a single scalar per lane. | |
9550 | assert(TransformedIV != ScalarIV && "..."); ? | |
9569–9570 | ditto: everything here is scalar IV. Perhaps BaseIV, FirstLaneScalarIV, or Start - reviving getStartValue() to wrap getOperand(0)? | |
9572 | Is this the source for introducing a unit step as operand to Can[onical]IV? | |
9574 | Have both producing recipes feed VPScalarIVStepsRecipe() with their step value as another operand, reviving getStepValue() to retrieve it? | |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
642–643 | Suggest to add a FIXME to find a better way than this for VPlan to represent epilogue loop. | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1131 | second >> last | |
1992 | Worth clarifying which scalarized versions are actually generated. | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
405 | Should this be a method of VPCanonicalIVPHIRecipe, checking if a given Start and Step match those of its own? | |
414 | Suggest to first set VPCanonicalIVPHIRecipe *CanonicalIV = Plan.getCanonicalIV(); (or auto) to avoid casting below. Then set VPValue *Start, Step to be either those of CanonicalIV or those of the new VPTransformedIVRecipe, to feed the new VPScalarIVStepsRecipe? |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
8696 | Updated, thanks! | |
9527 | Added, thanks! | |
9537 |
Yes we could also use VPlan->getCanonicalIV(), but it might be easier to follow if modeled explicitly?
Yep, that should be cleaner in the new code.
Thanks, I updated the naming to use CanonicalIV and DerivedIV. I also renamed VPTransformedIVRecipe -> VPDerivedIVRecipe
Good idea, I'll see about that separately. | |
9550 | Thanks, added the assert. | |
9569–9570 | Updated to use BaseIV. I kept getOperand(0) for now, as it seems like it may be confused with the different behavior of the existing getStartValue(). | |
9572 | Yes but that's refactored now. | |
9574 | I left things as they are for now, after refactoring getSTepValue in VPCanonicalIVPHIrecipe. | |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
642–643 | Added, thanks! | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1131 | That's not needed any longer. | |
1172 | Should be gone now. | |
1992 | Added an explanation, thanks! | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
405 | Added a helper. | |
414 | Updated to have a separate CanIV variable, thanks! |
Ahh, this brings up some further thoughts...
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9544 | nit: can also rename this original emitTransformedIndex() - emitDerivedIndex() TODO: have emitTransformedIndex() also take care of casting its 2nd "Index" parameter to Ty instead of asserting it's the same as that of its 4th "Step" parameter? | |
9570 | Ahh, operand 0 of VPScalarIVStepsRecipe is providing both the start value (directly - BaseIV) and the step (indirectly - by casting operand 0 into a recipe and asking it for its step). Better represent all def-use relations explicitly by passing VPValues (only) between recipes. This can be done by propagating Step instead of delegating it, e.g.: if SCEV is needed then have a common VPExpandSCEVRecipe (placed in the preheader) take care of generating the VPValue *Step = vputils::getOrCreateVPValueForSCEVExpr(Plan, ID.getStep(), SE); and feed it to both VPDerivedIVRecipe and VPScalarIVStepsRecipe. If SCEV is not used, have a Plan.getOrAddVPValue(1)) of the desired type feed these recipes? Sounds reasonable? | |
9578 | This truncation of Step is needed only if fed directly from CanonicalIV, because the Step produced by DeriveIV should have the desired (value and) type, including a truncation at the end if needed, right? Alternatively, introduce a DeriveIV recipe also if only truncation of Step is needed, so as not to sink it into triangles? (Or have ScalarIV recipe take care of TruncToTy always, also for derived steps, relieving DeriveIV of doing so, though this would sink into triangles.) | |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
659 | ... or DerivedIV ... | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1172 | nit: inlining can be applied separately. | |
1860 | Check if this method can remain const if this recipe no longer needs to delegate its Step. | |
1865 | Consider removing and feeding users directly with a constant 1 VPValue when needed upon their construction, to keep this recipe with a single value (Start) VPDef. Otherwise it can provide both Start and Step as parts of its multi-valued VPDef... (nit: const?) | |
1875 | nit: extra space "and the" step is also the same (1), as type and start; or does the original canonical unit step become UF * VF? | |
1961 | Worth clarifying the use of both Ty (type to cast to before conversion) and TruncToTy (type to cast to after conversion)? | |
2020 | nit: 2nd, 1st, 3rd? Also clarify involved type casts? | |
2029 | Return operand 0 inline, or outlined to avoid cast? | |
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | ||
1058 | Also check for same type, as claimed at the interface? | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
400–401 | nit: define ID and TruncI below slightly closer to first use? | |
404 | IVR may be a confusing name, defined as a VPValue* rather than a Recipe. IV also stands for a recipe, conflicting with IVR. How about renaming IVR to something like BaseIV, and define it as a VPRecipeBase*? It stands for the recipe providing a single scalar value per iteration of vectorized & unrolled loop with the desired type/start/step values, as a Base on which to build scalar steps - a scalar value per lane and part. This is either the canonical IV recipe if suitable or a newly introduced derived IV recipe which transforms it. Perhaps also rename IV to WidenIV, and spell out CanIV to CanonicalV. | |
407 | TruncTy here is the desired type of the resulting scalar steps; should it be supplied to isCanonical() along with ID in order to check if CanonicalIV is providing the suitable start/step/type for scalar users of Phi, or else a derived recipe is needed? | |
410 | nit: (re)use IVTy | |
413 | nit: was there some helper to get Def as RecipeBase? | |
llvm/lib/Transforms/Vectorize/VPlanValue.h | ||
103 | Transformed or Derived? Lex order | |
362 | Lex order | |
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll | ||
22 | This does appear less descriptive than having SCALAR-STEPS depict the step it uses? | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | While we're here, ir<false>, ir<true> seem odd (and even ;-)) |
Address latest comments, thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9544 |
Done!
It looks like this is only needed for the use here and not the other uses, so it seems simpler to keep to code here? | |
9570 | Thanks, updated to keep adding the step to VPScalarIVStepsRecipe as well. | |
9578 | at the moment, a derived IV recipe is created if only a truncate is needed, but the derived IV will only be truncated at the end, so it uses the wide step whereas for the steps recipe we need the truncated step. I think to unify this we would have to compute the derived IV with truncated steps, but that would be a bigger, unrelated change, | |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
659 | added, thanks | |
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | ||
1058 | updated, thanks! | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
400–401 | done, thanks! | |
404 | Updated, thanks! I kept the type as VPVale as this requires using getDef just once. | |
407 | done, thanks! | |
410 | done, thanks! | |
413 | the patch has not landed yet (D136068) | |
llvm/lib/Transforms/Vectorize/VPlanValue.h | ||
103 | done thanks! | |
362 | done thanks! | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | I guess that's because they are boolean values and the IR get printed as boolean literals by the IR printer. Do you think this is something that should be changed? |
pong :-)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9537 |
Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.)
Found a good place? | |
9544 |
Hmm, on the contrary - it appears all users cast the 2nd operand to match the type of the 4th operand before calling emitTransformedIndex(): VTC/"cast.vtc" casts VectorTripCount and AdditionalBypass.second to StepType, CMO/"cast.cmo" casts CountMinusOne to Step Type, PtrInd casts CanonicalIV to Step Type in order for Idx and GlobalIdx to be the desired type. Here Step is obtained from an operand via State, Ty is recorded in the recipe, and we eventually assert that the type of Step matches Ty. Seems overly complicated? Would it be simpler, here and also at the other callers, to feed emitTransformedIndex() or emitDerivedIndex() directly with the original IV and Step, and let it do the necessary casting of the former to match the type of the latter? | |
9569 | Another type reconciliation worth taking care of by the callee instead of having it assert it? buildScalarSteps() in this case. | |
9569–9570 |
Not sure what the source of confusion is, but it may appear clearer to have either Value *BaseIV = State.get(getOperand(0), VPIteration(0, 0)); Value *Step = State.get(getOperand(1), VPIteration(0, 0)); or Value *BaseIV = State.get(getBaseIV(), VPIteration(0, 0)); Value *Step = State.get(getStepValue(), VPIteration(0, 0)); | |
9578 | Perhaps worth leaving a note behind about said bigger unrelated change. | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1860 | Cannot return a const Type because ConstantInt::get() expects a non-const type, in VPCanonicalIVPHIRecipe::getStepValue()? | |
1865 | Is VPCanonicalIVPHIRecipe::getStepValue() still needed? | |
1875 | I mean, would the following rephrasing be better: | |
1961 | Worth removing Ty altogether from VPDerivedIVRecipe - Step provides the desired type. | |
1972 | nit: Derived IV stands for producing Start + CanonicalIV * Step, so seems more natural to order its operands and dump them in this order? Possibly with + and * instead of separating commas, possibly along with type cast information. | |
1996 | Reordering the operands will also simplify this documentation. | |
2005 | Suffice to have VPValue *getBasicIVValue() const { return getOperand(0); } instead? | |
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | ||
1065 | nit: can check ConstantInt *Step = ID.getConstIntStepValue() as in Loop::isCanonical(). nit: can assert ID.getInductionOpcode() == Instruction::Add. | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
396 | nit: TruncI is used for its type only, TruncTy may better be called ResultTy, IVTy is hopefully unneeded if VPDerivedRecipe can be constructed w/o it; consider setting: Type *IVTy = WideIV->getPHINode()->getType(); Type *ResultTy = IVTy; Type *TruncInstTy = nullptr; if (auto *TruncI = WideIV->getTruncInst()) { TruncInstTy = TruncI->getType(); ResultTy = TruncInstTy; } | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | Oh well, the input IR is adding and subtracting true and false... The reason for having DERIVED-IV with canonical start ir<false> ==0 and step ir<true> ==1 is because of type expansion and/or truncation? | |
188 | The reason for having DERIVED-IV with canonical start ir<0> and step ir<1> is because of type expansion and/or truncation? Perhaps worth dumping the distinct types, as this operation is effectively a cast. | |
189 | nit: check the second (ir<1>) operand of SCALAR-STEPS, for completeness? It is in practice either +1 or -1 ... |
Thank you very much Ayal! Comments should be addressed, hopefully I didn't miss any.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9537 | Modeling the canonical IV as an explicit operand is fine. In fact, it then seems irrelevant if its Canonical or not - can simply refer to it as getBasicIV() or getIV()? (When there's a need to call isCanonical() then the CanonicalIV is needed, but that is the case in optimizeInductions() rather than here in DerivedIV.) I think emitTransformedIndex needs the canonical IV.
I put up D138748 to add it to the VPHeaderPHIRecipe documentation. | |
9544 | Simplified in 12bb5535d270, thanks! | |
9569 | Refactored in bf0bd85f9d82 and removed here, thanks! | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1860 | Unfortunately yes! | |
1865 | It's not needed in the latest version, removed, thanks! | |
1961 | Removed in the latest version, thanks! | |
1972 | Updated, thanks! | |
1996 | Updated, thanks! | |
2005 | Updated, thanks! | |
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | ||
1065 | Updated, thanks! I turns out the assertion triggered in some tests, added as an extra condition. | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
396 | Simplified, thanks! | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | Here the inputs are already truncated to i1 before recipe construction so the recipe doesn't truncated itself. | |
188 | Updated, thanks! | |
189 | Done, thanks! |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2343 | Above assert is now redundant. Can hoist the comment above and rephrase it, e.g., "Ensure scalar IV and step have the same integer type", or rather "Ensure step has the same type as that of scalar IV"? | |
9537 |
Hmm, emitTransformedIndex just needs an "Index", i.e., an "IV" or "BasicIV", regardless if it is "the canonical" IV. | |
9543 | nit: suffice to ask if TruncToTy != DerivedIV->getType() as the latter is never null? | |
9544 | assert(Step->getType()->isIntegerTy()) belongs earlier if still needed here at all? | |
9569 |
Good! Thanks! | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1875 | above nit worth addressing? | |
1961 | nit: it seems TruncToTy is always non-null, can simplify later check? | |
1969 | CanonicalIV >> BaseIV or IV? I.e., any Index that has a value per iteration rather than per-part/per-lane, regardless if it is "the canonical" IV? TruncToTy >> ResultTy? I.e., always specifies the type of the result, never nullptr? | |
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | ||
1053 |
Good! Worth a comment? | |
1055 | steps-recipe? | |
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll | ||
22 | Ah, the step it uses is explicit - it's ir<1>! What has been omitted is "start" (ir<0>), which is moved from ScalarIVSteps to DerivedIV, if needed. | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | Trying to clarify why DERIVED-IV is needed at all here (too), given that it starts at 0 (false) and bumps with a step of 1 (true)? |
Address latest comments, thanks!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2343 | Updated, thanks! | |
9537 | Right, but VPDerivedIVRecipe will always use the canonical IV, at least to start with IIUC. | |
9543 | Adjusted, thanks! | |
9544 | I might be missing something, but I think before the patch we also only had this assert for the case we need to truncate, as this can only be done for integer types. The induction could also be a floating point IV in general here I think. | |
9578 | Added to buildScalarSteps. | |
llvm/lib/Transforms/Vectorize/VPlan.h | ||
1172 | I think it was already inlined in current main, it just got re-formatted here. Undid the formatting change. | |
1875 | Missed those earlier, should be adjusted, thanks! | |
1961 | Adjusted, thanks! | |
1969 |
I kept it as CanonicalIV for now, as this is what all current clients use (also updated the constructor to require VPCanonicalIVPHIRecipe. I can change it if you would prefer, but can also do that if we ever lift the restriction.
Updated, thanks! | |
llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp | ||
1053 | Moved the check to the return and move the part about incrementing it by one there. | |
1055 | Adjusted, thanks! | |
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll | ||
22 | Ah I missed this comment earlier, Yes, DerivedIV will now take care of adjusting the start value, ScalarIVSteps just generate steps. | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | I think DERIVED-IV here is for %d = phi i1 ..., which has a different type than the canonical IV, but doesn't itself need truncating because the operands are already i1. |
This looks good to me, ship it!
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2340 | Is this TODO an NFC to simplify the code, w/o affecting the generated code? | |
9537 | Agreed, VPDerivedIVRecipe is (currently) always fed the canonical IV, but it can compute a derived IV given any BaseIV, i.e., does not rely on it being The Canonical IV. | |
9544 | Ahh, sorry, agreed. (Confused by same assert-guarding-trunc in buildScalarSteps() and thought one could check if ResultTy differs from Step->getType() instead of DerivedIV->getType() thereby asserting earlier before emitTransformedIndex(). But current code is fine.) | |
llvm/lib/Transforms/Vectorize/VPlan.cpp | ||
659 | worth adding in the error message as well | |
llvm/test/Transforms/LoopVectorize/interleave-and-scalarize-only.ll | ||
73–75 | Ah, right; DERIVED-IV truncates CAN_IV to i1 before Mul & Add, which is not dumped-out like the truncation to ResultTy. |
@fhahn one of our internal tests also hit the same assertion failure and I have a reduced testcase for it if it helps. I was going to file a bug for it, but since you have already reverted the change, I'll hold off for now. Let me know if you would like the testcase we found.
Is this TODO an NFC to simplify the code, w/o affecting the generated code?