This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Proactively create mask for tail-folding up-front (NFCI).
ClosedPublic

Authored by fhahn on Aug 3 2023, 2:21 PM.

Details

Summary

Split off mask creation for tail folding and proactively create the mask for
the header block.

This simplifies createBlockInMask.

Diff Detail

Event Timeline

fhahn created this revision.Aug 3 2023, 2:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 2:21 PM
fhahn requested review of this revision.Aug 3 2023, 2:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 2:21 PM
Herald added subscribers: wangpc, vkmr. · View Herald Transcript
ssinad added a subscriber: ssinad.Aug 4 2023, 3:56 AM
Ayal added a comment.Aug 5 2023, 4:57 AM

Split off mask creation for tail folding and always create the mask for
the header block.

"always" >> "proactively"?

Down the road tail folding should be considered as a VPlan2VPlan transformation along with masking/predication.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8052

nit (independent of this patch): more accurately prefixed getOrCreate.

8065–8066

nit: indent

The comment about "Loop incoming mask" is now obsolete, it refers to BB being the header.

8065–8066

add assert message; a header mask is handled earlier by createTailFoldHeaderMask(), and now must have been found in cache.

8083

Have it return void?

Note (independent of this patch): this is truly create, except for useActiveLaneMaskForControlFlow case...

8087

assert that we create it only once?

8091

nit: here a null BlockMask will not be used. Better define it later closer to where it's actually set and used.

8094

nit: best assert earlier, first thing.

8103

nit (independent of this patch): some explanation of using active lane mask (below, not for control flow - above) which does use IV and TC, is missing here.

8776–8780

Worth a comment to explain that the mask of the header block is created here proactively, if needed - when folding tail, leaving the masks of other blocks to be created on-demand, using the header block mask.

8836–8837

Why/Is this change needed/related to tail folding? Any (header) phi recipe should indeed be placed along with other phi's rather than be appended after non-phi recipes. But are there now other non-phi, non-trunc Instr's that generate non-widen-int-or-fp-induction header phi recipes?

If needed, above comment needs update.

Perhaps better optimize truncating IV's later as a VPlan2VPlan transformation, as in optimizeInductions().

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
139–140

(nit: this assumption will now hold ;-)

fhahn updated this revision to Diff 547482.Aug 5 2023, 6:49 AM
fhahn marked 9 inline comments as done.

Address latest comments, thanks!

Down the road tail folding should be considered as a VPlan2VPlan transformation along with masking/predication.

Agreed! This would effectively require adjusting all existing masks with the loop mask.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8065–8066

Fixed indent and moved this code up, also more accurately checking for BB == OrigLoop->getHeader() which should be the only relevant case.

8065–8066

Assert removed, as we now handle OrigLoop->getHeader() == BB directly.

8083

Updated to return void

8087

This isn't needed as createTailFoldHeaderMask is called once and nothing will be cached then. Remove the code.

8091

Moved, also dropped the comment; there always will be a non-all-one mask.

8094

Moved, thanks!

8776–8780

Thanks, added a comment!

8836–8837

I updated the comment for now. Doing as a VPlan2VPlan transform should be feasible in general, but it would require TTI to check if the truncate would be free.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
139–140

Clarified comment that the header mask will either be set to true or the loop mask when tail-folding.

fhahn retitled this revision from [VPlan] Create mask for tail-folding up-front (NFCI). to [VPlan] Proactively create mask for tail-folding up-front (NFCI)..Aug 5 2023, 6:50 AM
fhahn edited the summary of this revision. (Show Details)
Allen added a subscriber: Allen.Aug 5 2023, 7:10 AM
fhahn updated this revision to Diff 549063.Aug 10 2023, 8:56 AM

Rebase and add missing early return. Ping :)

fhahn added a comment.Aug 11 2023, 7:07 AM

I shared a patch to perform tail-folding as VPlan2VPlan transform in D157713. It requires a bit more work to get the right APIs and things into place, but separating out the header mask creation code as in the current patch should be good first step in that direction.

ABataev added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8057

It may create a new item in BlockMaskCache with nullptr value. Do you need to check that the user actually expects it?

Ayal added inline comments.Aug 20 2023, 12:13 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8057

Agreed. Better explicitly always set BlockMaskCache[] for header BB, and assert BB is not the header after looking for its mask in the cache below.

8064

nit: setting BlockMask to null here mostly represents "uninitialized" - to be overwritten by the first edge mask; and to serve as the (all-true) mask for blocks free of any predecessors.

8103

nit (independent of this patch).

8777–8780

and have createHeaderMask() set the mask to null if foldTail is false?

8836–8845

nit (independent of this patch): can the check that first non phi isn't end() be dropped? I.e., inserting before end is fine and equivalent to append.
nit (given longer explanation): consider dealing with the simpler appending case first.

8846–8848

ahh, thanks for the clarification!

8848

nit: "before" here means both in time and space. Perhaps "non-phi recipes setting header mask are introduced earlier than regular header phi recipes, and should appear after them," would be clearer. (Related to all header phi's, not only inductions).

fhahn updated this revision to Diff 552023.Aug 21 2023, 8:12 AM
fhahn marked 7 inline comments as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8057

Thanks, updated to use .find() and added an assert. Clients already should handle the case the mask is nullptr, as this is used to indicate no-mask (all-true mask) in the existing code.

8103
8777–8780

Updated, thanks!

8836–8845

I dropped the check from the condition in 57a6f6579c97 and moved it to the assert, thanks!

8846–8848

Adjusted, thanks!

8848

Reworded, thanks!

Ayal accepted this revision.Aug 22 2023, 7:01 AM

This looks good to me, adding some thoughts that could be addresses now or later.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8055–8065

?

8094

Would it be better to have one switch (TFStyle) whose cases compute BlockMask, followed by placing it in the cache, where None refers to not folding the tail?

8117

This else case of not using active lane mask is aka DataWithoutLaneMask.

ActiveLaneMask is semantically equivalent to an ICmpULE (except for wrapping TC?), so down the road we could start canonically with one and if desired replace it with the other, or with an ActiveLaneMaskPhi, or None, as VPlan2VPlan transforms.

8773

Sh/Could addCanonicalIVRecipes() move to RecipeBuilder?

8836–8837

Would something like the following be better?

auto InsertionPoint = isa<VPHeaderPHIRecipe>(Recipe) ? VPBB->getFirstNonPhi() : VBPP->end();
Recipe->insertBefore(*VPBB, InsertionPoint);

with an

assert ((!isa<VPHeaderPHIRecipe>(Recipe) ||
         VPBB->getFirstNonPhi() == VPBB->end() ||
         CM.foldTailByMasking() || isa<TruncInst>(Instr)) &&
               "unexpected recipe needs moving");

?

Note that phi recipes in VPBB's other than header VPBB also need to be placed in the phi section of their VPBB, in general. However, (a) this currently includes only VPPredInstPHIRecipe, which is introduced into VPlan only later, and (b) all current cases of out-of-place recipes occur in the header only - its mask and a trunc of an induction phi.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
146

nit: seems slightly more logical to place createHeaderMask() first, before createBlockInMask(), which in turn continues to be followed by createEdgeMask.

This revision is now accepted and ready to land.Aug 22 2023, 7:01 AM
fhahn updated this revision to Diff 552650.Aug 23 2023, 3:52 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks! I am planning on landing this soon.

fhahn added inline comments.Aug 23 2023, 4:21 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8055–8065

Simplified as suggested, thanks!

8094

I'll share a patch soon to add active-lane-mask as optimization.

8117

I'll share a patch soon to add active-lane-mask as optimization.

8773

Probably, let me check.

8836–8837

Sounds good but I think it would be better to do that separately from the current change.

llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
146

Yep, reordered.

Ayal added a comment.Aug 23 2023, 4:30 AM

Couple of minor nits.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8083

nit: reorder implementations of createHeaderMask() and createBlockInMask(), matching the reordering of their declarations.

8777
This revision was landed with ongoing or failed builds.Aug 23 2023, 1:36 PM
This revision was automatically updated to reflect the committed changes.
fhahn marked 2 inline comments as done.Aug 23 2023, 2:25 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8083

Fixed in the committed version, thanks!

8777

Fixed in the committed version, thanks!