Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
ClosedPublic

Authored by fhahn on Aug 24 2023, 2:09 PM.

Details

Summary

This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added a comment.Sep 19 2023, 8:22 AM

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
147

Updated for the new one, can adjust the others as follow-up cleanup, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8619–8620

Should the comment be clarified separately with more details here, after landing the change now that it is much simpler to spell out what is going on?

8726

Can do separately, thanks!

8732–8736

Adjusted and added a comment, thanks!

8948–8952

Commented and use variable to attach name, thanks!

llvm/lib/Transforms/Vectorize/VPlan.h
2086

Not needed in the current version. Removed, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
518

Removed , thanks!

867

Adjusted, thanks! Also update to use getBackedgeValue below.

869

Moved, thanks!

871

Adjusted, thanks!

876

I've a set of patches that introduce explicit unrolling, will share that soon. To keep the current patch as simple as possible, it would be great to keep the increment recipe as is for now.

879

Adjusted, thanks!

880

Added, thanks!

898

reordered, thanks!

914

Expanded comment and also clarified the code by using EB->getTerminator().

915

Renamed, thanks!

926

Maybe it would be better to just have BranchOnCond to use the blocks Successor to set destinations for the branch, then this can be controlled by adjusting the order of successors? In any case, would be good as follow up.

954

Adjusted, thanks!

958

Renamed, thanks!

970

Adjusted, thanks! Trade-off is the need to use getVPSingleValue. Really need a base class for recipes defining a single VPValue....

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
72

Expanded comment, thanks! Also added that DataAndControlFlowWithoutRuntimeCheck implies UseActiveLaneMaskForControlFlow ( and added an assert)

Ayal added inline comments.Sep 19 2023, 11:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8951
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
870

nit: worth commenting why poison generating flags are dropped from CanonicalIVIncrement.

957

Instead of setting to nullptr, create an ALM instruction following canonical IV widened/incrementedForPart placing both before first non-phi of loop header, to make sure the former can serve all potential users of ICMP_ULE's?

962

Only immediate users of BTC are sought, suffice to traverse them via
for (VPUser &U : BTC->users()) {
rather than
vp_for_all_users() { ?

969

nit: can assert operand 1 is BTC, given that operand 0 is not and the 2-operand compare uses BTC.

977

The first operand of ALM when addVPLaneMaskPhiAndUpdateExitBranch() generates it in

  • preheader is a CanonicalIVIncrementForPart(StartV)
  • latch is CanonicalIVIncrementForPart(CanonicalIVPHI/CanonicalIVIncrement)

When generating it here in header it is VPWidenCanonicalIVRecipe(CanonicalIVPHI).

Can ALM use CanonicalIVIncrementForPart(CanonicalIVPHI) here too, for consistency?

Are VPWidenCanonicalIVRecipe and CanonicalIVIncrementForPart interchangeable, are both needed?

fhahn updated this revision to Diff 557067.Sep 19 2023, 12:08 PM
fhahn marked 6 inline comments as done.

Address latest comments, thanks

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8951

Argh, yes, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
870

IIUC we should only need to drop them for DataAndControlFlowWithoutRuntimeCheck? Might be good to move the code as follow-up and document then.

957

I adjusted the code to first look for the widened IV. If it exists and !UseActiveLaneMaskForControlFlow, create ALM up front.

962

Adjusted the code to first look for the wide canonical IV and then iterate here over just its users. Overall that makes the structure of the code easier to follow IMO.

969

added assert, thanks!

977

ALM needs scalar arguments, so it uses lane 0. This means here it would be better to use the canonical IV instead of the widened recipe. There are some small codegen changes, so I think that would be better as follow up.

Ayal added inline comments.Sep 19 2023, 3:07 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8909

To move all this from tryToBuildVPlanWithVPRecipes() into LVP::optimize() would require moving getTailFoldingStyle(), useActiveLaneMask() et al. from LoopVectorizer.cpp?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
870

Worth leaving some note behind for now.

941

Look for a user U that is a VPWidenCanonicalIVRecipe and resides in the header?
(There could also be multiple such U's....)
As we're considering multiple ICMP_ULE's, they could be using multiple R's, and both could spread out of the header.

Or simply create a WideCanonicalIV before the first non-phi of the header.

947

This should be an assert, before checking if UseActiveLaneMaskForControlFlow, because addActiveLaneMask() is called only when tail is folded, which implies an ICMP_ULE pattern.

970

OK to erase a user from parent during traversal over users?

Really need a base class for recipes defining a single VPValue....

+1

fhahn updated this revision to Diff 557117.Sep 20 2023, 8:48 AM
fhahn marked 5 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8909

Yes, it would require plugging all this through to optimize(). Would need to think about how to consolidate passing the required info through in a general way.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
870

Thanks, added a comment + TODO

941

Added an assert that it is in the header. All compares are users of the WideCanonicalIV, the created ALM should be placed to be valid for all of them. Might be good to ensure there's a single VPWidenCanonicalIVRecipe in the future as part of the verifier?

947

Converted to assert, thanks!

970

Not necessarily, created a temporary vector, thanks!

Ayal added inline comments.Sep 21 2023, 2:47 PM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
147

All callers of createOverflowingOp() below pass {false, false} as their WrapFlags, i.e., both HasNUW and HasNSW are set to false, which is the default. Suffice to call createNaryOp() instead?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8909

Worth a TODO to wrap this all within a VPlanTransforms::addActiveLaneMask(*Plan)?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
870

TODO to revisit poison droppings still needs to be added?

914

nit: should be simply Builder.setInsertPoint(OriginalTerminator); by introducing VPBuilder::setInsertPoint(VPRecipeBase*);

939

?

941
947

assert still needs to be added, above?

970

Can alternatively avoid erasing from parent here and leave it to dead recipe elimination.

fhahn updated this revision to Diff 557273.Sep 23 2023, 2:20 PM
fhahn marked 10 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
147

The current flag-handling code assumes that flags are only queried when the correct OperationType is set. Using createNaryOp would not set the operation type, so during recipe executing an assert would be triggered when hasNUW is called. The main motivation for strict checking is to ensure consistent handling. It effectively enforces that hasNUW is only used on recipes/opcodes that have associated flags, which we would have to weaken, either by having hasNUW return false on mismatch or adding extra checks for the operation type. The current handling is similar to how it is handled in LLVM IR.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8909

added, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
860

Thanks, expanded comment!

870

added, thanks!

914

Done, thanks!

939

Done, thanks!

941

done ,thanks!

947

Removed, thanks!

970

Currently we only run dead recipe removal after induction optimizations, so leaving the compares around keeps the wide induction live. Could be addressed by running dead recipe removal at the beginning of the VPlan optimizations as well separately.

Ayal accepted this revision.Sep 24 2023, 3:51 AM

This looks good to me, adding some final thoughts and comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
147

Yes, OperationType needs to be set correctly for any VPRecipeWithIRFlags, and to OverflowingBinOp for any VPInstruction with CanonicalIVIncrementForPart as its opcode. This could presumably be handled by createNaryOp/createInstruction/VPInstruction-constructor, possibly with a protected setOperationType if needed.
OTOH, if non-default wrap flags are desired, such a designated creation method would be needed.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
862

(note that replacing branch-on-count with branch-on-cond(!ALM) effectively turns a countable loop into an uncountable one, yet VectorTripCount remains intact.)

863

(plus some poison droppings, to be percise.)

866
873

// The function adds the following recipes:

874

// %TripCount = calculate-trip-count-minus-VF (original TC) [if DataWithControlFlowWithoutRuntimeCheck]

896
941

Might be good to ensure there's a single VPWidenCanonicalIVRecipe in the future as part of the verifier?

+1

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
418

nit (independent of this patch): suffice to generate this saturating subtraction of N - vscale*32 once instead of replicating it four times.

This revision is now accepted and ready to land.Sep 24 2023, 3:51 AM
fhahn marked 8 inline comments as done.Sep 25 2023, 5:32 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
862

added a note for now/

863

Added, thanks !

866

Adjusted and moved the add part below.

873

Added, thanks!

874

Added, thanks!

896

Fixed, thanks!

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
418

Yes, hopefully should be cleaned up by explicit unrolling, but I could check if we can adjust this during execute to start with ,like done in other places.

This revision was landed with ongoing or failed builds.Sep 25 2023, 5:35 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 7 inline comments as done.