This is an archive of the discontinued LLVM Phabricator instance.

[LV] Move code to place induction increment to VPlan post-processing.
ClosedPublic

Authored by fhahn on Mar 14 2022, 9:54 AM.

Details

Summary

This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.

When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.

This change removes the requirement to create the latch block up-front
for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.

As an alternative, the increment could be modeled as separate recipe,
but that would require more work and a bit of redundant code, as we need
to create the step-vector during VPWidenIntOrFpInductionRecipe::execute
anyways, to create the values for different parts.

Diff Detail

Event Timeline

fhahn created this revision.Mar 14 2022, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 9:54 AM
fhahn requested review of this revision.Mar 14 2022, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 9:54 AM
Herald added a subscriber: vkmr. · View Herald Transcript
Ayal added inline comments.Mar 20 2022, 2:32 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9540

This goes back to D22416.
Simply adding this last step here with its current block, as done prior to D22416, may be incorrect - when header and latch are distinct due to replicated regions? (May be good to (have a) test)

Would be good to explain that the last induction step, which feeds PHI, cannot be added yet to PHI properly, because that requires the latch block - which may not have been formed yet. An alternative to an incorrect addIncoming() may be to record LastInduction temporarily inside its VPWidenIntOrFpInductionRecipe (as it's operand(?) - see below), until it can be added to PHI properly. Given that LastInduction is eventually placed in the latch, another alternative may be to designate a recipe for generating LastInduction, placing it in the latch, so that when the recipe executes it could add LastInduction properly to PHI. Listing these alternatives mainly for the sake of completeness...

llvm/lib/Transforms/Vectorize/VPlan.cpp
988

Could the incoming value to an induction be added similar to adding such values for reductions and FORs below, which have distinct values across the backedge? I.e., another case of SinglePartNeeded? Could PhiR->getBackedgeValue() work - if the induction PhiR is added as an (second) operand of itself?

Moving the last induction step to the end of the latch could then be done separately.

fhahn updated this revision to Diff 416958.Mar 21 2022, 8:22 AM

Add note on why the latch cannot be used.

fhahn marked 2 inline comments as done.Mar 21 2022, 8:36 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9540

This goes back to D22416.
Simply adding this last step here with its current block, as done prior to D22416, may be incorrect - when header and latch are distinct due to replicated regions? (May be good to (have a) test)

I think there are 2 distinct things going on. 1) moving the induction to the latch block and 2) adding the incoming value using the latch block. We could stop doing 1), without impacting correctness, but 2) is needed to generate valid IR in the presence of replicate regions. This should be handled by multiple existing tests.

Would be good to explain that the last induction step, which feeds PHI, cannot be added yet to PHI properly, because that requires the latch block - which may not have been formed yet.

Done!

An alternative to an incorrect addIncoming() may be to record LastInduction temporarily inside its VPWidenIntOrFpInductionRecipe (as it's operand(?) - see below), until it can be added to PHI properly

Yes, but I think we would need to introduce a new VPValue for the increment, so we can map it properly. Adding the PHI as operand to itself wouldn't work I think, because in State we can only store values for each part, but we need the value of the last part + VF. Alternatively we could record the IR value in the recipe. But both of those seem like similar workarounds, but seems slightly more complicated. Another thing to note is that the CFG/IR LV generates during codegen is already temporarily invalid.

Given that LastInduction is eventually placed in the latch, another alternative may be to designate a recipe for generating LastInduction, placing it in the latch, so that when the recipe executes it could add LastInduction properly to PHI. Listing these alternatives mainly for the sake of completeness...

Agreed, but it would add extra clutter to the plan and also the code needed. Please let me know if any of the alternative is strictly preferable to the current approach.

llvm/lib/Transforms/Vectorize/VPlan.cpp
988

Could the incoming value to an induction be added similar to adding such values for reductions and FORs below, which have distinct values across the backedge? I.e., another case of SinglePartNeeded? Could PhiR->getBackedgeValue() work - if the induction PhiR is added as an (second) operand of itself?

Unfortunately I don't think so, without further changes. The recipe for the phi is currently mapped to the vector IV values for each part. The increment is the last part + VF. So we either would have to turn VPWidenIntOrFpInductionRecipe into a multi-def or store the LLVM IR value in the recipe directly. Both seem like bigger workarounds in my opinion.

fhahn marked 2 inline comments as done.Mar 27 2022, 4:09 AM

Ping :)

Ayal added inline comments.Mar 27 2022, 5:23 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
988

(That depends on one's definition of "workaround" ;-)

It would be good if the modeling in VPlan of inductions and header phi's is more consistent. This could perhaps be done later, in subsequent patches. Trying to capture the issue and potential resolutions:

VPHeaderPHIRecipe offers an interface for recording the backedge value as an operand, used by the Canonical IV and other header phi recipes which (must) have distinct backedge VPValues. Their modeling in VPlan thus matches the IR they generate. During VPlan execution, phi's are first generated with start values only, when traversing from header to latch, and are later revisited to add their backedge incoming values after the latch was generated.

VPWidenIntOrFpInductionRecipe is currently not a VPHeaderPHIRecipe, it does not support an explicit backedge value. VPWidenIntOrFpInductionRecipe currently represents both the header phi - explicitly (with UF per-part bumps) and its bump - implicitly; it is conceptually a multi-def recipe, but uses its bump only to feed itself, w/o exposing it to any other user. This implicit bump must wait until the latch is generated in order to be fully rewired to the header phi, but can be partially/incorrectly rewired to the header phi temporarily.

VPCanonicalIVPHIRecipe is currently a VPHeaderPHIRecipe; it represents only the header phi, having a separate CanonicalIVIncrement recipe designated to represent its bump, which feeds both VPCanonicalIVPHIRecipe and BranchOnCount; the canonical IV admittedly has an easier bump to generate than int/fp/pointer inductions. Both header phi and bump can alternatively be represented by a single multi-def recipe, as noted for VPWidenIntOrFpInductionRecipe.

Should VPWidenIntOrFpInductionRecipe expose the bump that feeds it across the backedge, for other potential users, and for transparency/consistency?
Should VPWidenIntOrFpInductionRecipe become a VPHeaderPHIRecipe, by supporting a backedge VPValue?
Should the rewiring of header phi's be delegated to VPHeaderPHIRecipe, say by adding a second 'execute' method to it, to be called after the latch is constructed, here or at the end of Region::execute?

fhahn added inline comments.Mar 27 2022, 11:51 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
988

It would be good if the modeling in VPlan of inductions and header phi's is more consistent. This could perhaps be done later, in subsequent patches. Trying to capture the issue and potential resolutions:

Agreed, we should work towards making those recipes as consistent as we can!

Should VPWidenIntOrFpInductionRecipe expose the bump that feeds it across the backedge, for other potential users, and for transparency/consistency?

IMO I think it would be better the other way around, fold the increment of VPCanonicalIVPHIRecipe back into the recipe. The PHI recipes could model the bump as additional VPValue (i.e. turn them into multi-defs). This would allow us to keep the induction concept encapsulated in single recipes and the plans themselves more concise.

Should VPWidenIntOrFpInductionRecipe become a VPHeaderPHIRecipe, by supporting a backedge VPValue?

Yes I think so. Should be easy now that` VPWidenIntOrFpInductionRecipe` always creates a phi.

Should the rewiring of header phi's be delegated to VPHeaderPHIRecipe, say by adding a second 'execute' method to it, to be called after the latch is constructed, here or at the end of Region::execute?

Yes that would be a good option to move that logic to something like VPHeaderPHIRecipe::fixupPhi.

Updating this patch to turn VPWidenIntOrFpInductionRecipe into a multi-def should be relatively straight forward. Please let me know if you'd prefer to do this in the current patch or as follow up.

Ayal accepted this revision.Mar 27 2022, 1:25 PM

A good step forward in refactoring and simplifying induction variable recipes, raising thoughts for follow-up steps. Ship it!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9540

This goes back to D22416.
Simply adding this last step here with its current block, as done prior to D22416, may be incorrect - when header and latch are distinct due to replicated regions? (May be good to (have a) test)

I think there are 2 distinct things going on. 1) moving the induction to the latch block and 2) adding the incoming value using the latch block. We could stop doing 1), without impacting correctness, but 2) is needed to generate valid IR in the presence of replicate regions. This should be handled by multiple existing tests.

Would be good to explain that the last induction step, which feeds PHI, cannot be added yet to PHI properly, because that requires the latch block - which may not have been formed yet.

Done!

An alternative to an incorrect addIncoming() may be to record LastInduction temporarily inside its VPWidenIntOrFpInductionRecipe (as it's operand(?) - see below), until it can be added to PHI properly

Yes, but I think we would need to introduce a new VPValue for the increment, so we can map it properly. Adding the PHI as operand to itself wouldn't work I think, because in State we can only store values for each part, but we need the value of the last part + VF. Alternatively we could record the IR value in the recipe. But both of those seem like similar workarounds, but seems slightly more complicated. Another thing to note is that the CFG/IR LV generates during codegen is already temporarily invalid.

Given that LastInduction is eventually placed in the latch, another alternative may be to designate a recipe for generating LastInduction, placing it in the latch, so that when the recipe executes it could add LastInduction properly to PHI. Listing these alternatives mainly for the sake of completeness...

Agreed, but it would add extra clutter to the plan and also the code needed. Please let me know if any of the alternative is strictly preferable to the current approach.

9542

VectorPreHeader is definitely an incorrect BB for the backedge incoming value, but it is the correct BB for the start value.
Perhaps the current/header BB would be a safer temporary substitute (which must be non-null), to avoid confusion?

llvm/lib/Transforms/Vectorize/VPlan.cpp
988

Should VPWidenIntOrFpInductionRecipe expose the bump that feeds it across the backedge, for other potential users, and for transparency/consistency?

IMO I think it would be better the other way around, fold the increment of VPCanonicalIVPHIRecipe back into the recipe. The PHI recipes could model the bump as additional VPValue (i.e. turn them into multi-defs). This would allow us to keep the induction concept encapsulated in single recipes and the plans themselves more concise.

Agreed. It's the same way - turning VPWidenIntOrFpInductionRecipe into a multi-def is one way of exposing its bump.
Note that enabling other potential users of the exposed bump deserves attention when sinking it to the latch.

Updating this patch to turn VPWidenIntOrFpInductionRecipe into a multi-def should be relatively straight forward. Please let me know if you'd prefer to do this in the current patch or as follow up.

Follow up is fine, preferably with some TODO/FIXME left behind to indicate expected changes.

This revision is now accepted and ready to land.Mar 27 2022, 1:25 PM
fhahn updated this revision to Diff 418593.Mar 28 2022, 8:13 AM

Thanks Ayal! I rebased the patch and added the TODO. I'll land this one soon.

This revision was landed with ongoing or failed builds.Mar 28 2022, 8:22 AM
This revision was automatically updated to reflect the committed changes.