This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add first VPlan version of sinkScalarOperands.
ClosedPublic

Authored by fhahn on Apr 11 2021, 4:12 AM.

Details

Summary

This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverse a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Continue with processing the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Diff Detail

Event Timeline

fhahn created this revision.Apr 11 2021, 4:12 AM
fhahn requested review of this revision.Apr 11 2021, 4:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2021, 4:12 AM
Herald added a subscriber: vkmr. · View Herald Transcript
gilr added inline comments.Apr 20 2021, 2:36 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
129

use VPBB instead?

fhahn updated this revision to Diff 338869.Apr 20 2021, 7:45 AM

Updated to use VPBB in the lambda, thanks!

I think the original code also had a bug that meant we would sink instructions that may read from memory, which would be incorrect if we move it across a instruction that may modify the read memory. I added a new test case and I think the original sinkScalarOperands gets this wrong as well.

I also put up D100857, which adjusts the logic in the legacy version of sinkScalarOperands to only handle the scalar parts of recipes that cannot be sunk by the Vplan-to-Vplan sinkScalarOperands. This requires a bit more code, so I put it up as a separate patch, but it could also be folded into this one.

Kazhuu added a subscriber: Kazhuu.Apr 29 2021, 2:20 PM
gilr added a comment.May 3 2021, 8:22 AM

Very nice step forward!!

llvm/lib/Transforms/Vectorize/VPlan.h
1266

Allowing to reset IsPredicated seems a bit fragile, as alsoPack is computed on contruction based on it. Since this patch only requires setting IsPredicated, perhaps better to have setIsPredicated() { IsPredicated = true; }. Might be worth adding a comment here that setting IsPredicated late is consistent with alsoPack being initialized to false as all users of the sunk recipes reside within the replicated region so no packing is required.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

Traversing the VPlan in this case doesn't require RPOT, right? can be done with a more efficient recursive scan?
Simpler to first collect all seeds into a worklist, saving the need to make early increments? If so, can alternatively be done by RecipeBuilder during VPlan construction.

128

Should be return !UI || UI->getParent() != VPBB?

133

Would be good to leave a comment here explaining that setting IsPredicated is done in favor of the legacy sinkScalarOperands() which still follows.

fhahn updated this revision to Diff 343186.May 5 2021, 2:00 PM

use depth-first instead of RPO, add comment for setting IsPredicated and assertion.

fhahn marked 2 inline comments as done.May 5 2021, 2:03 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.h
1266

Agreed! I added a comment and an assertion.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

That's a good point, the traversal order should not matter. I updated it to use depth-first.

Simpler to first collect all seeds into a worklist, saving the need to make early increments? If so, can alternatively be done by RecipeBuilder during VPlan construction.

It may be simpler, but would introduce coupling with the recipe builder and I am not sure that's desired at this point? It seems like it may limit composability of VPlan transforms (e.g. if another transform introduces new predicated recipes, it may have to update the list of seed instructions in the builder or somewhere else)

128

Yes I think so, great catch!

fhahn updated this revision to Diff 345576.May 14 2021, 3:30 PM

Remove need for setIsPredicate() by instead considering the operands of sink candidates in PredBB in the worklist as well.

Ayal added inline comments.May 22 2021, 1:51 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4557

Perhaps worth noting that VPlan's sinkScalarOperands() was responsible for sinking the instruction into PredBB in this case, but may have failed to sink its (direct or indirect) operands, which we try again here.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

Simpler to first collect all seeds into a worklist, saving the need to make early increments?

I.e., instead of operating in-place using make_early_inc_range(), and somewhat similar to the original sinkScalarOperand()'s use of PredicatedInstructions, would it be better to first enlist all candidates into WorkList, and then loop over it? VPBB can be set to the parent of the first user, which all other users must agree with via the any_of(). In any case, better reuse VPBB instead of RepR->getParent(); and choose more meaningful names than Current, RepR and VPBB to deal with sinking a source Def into the destination basic block where all its Users reside.

Interesting to see that InstsToReanalyze is not needed, presumably because the last user to sink would always trigger the potential sinking of its def? Probably worth a note.

127

Note that in general, insertion point should be after all phi's, but none are expected in VPBB; right?

fhahn updated this revision to Diff 347244.May 23 2021, 5:32 AM
fhahn marked an inline comment as done.

Address comments, thanks!

fhahn marked an inline comment as done.May 23 2021, 5:37 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4557

I updated the comment to reference VPlanTransforms::sinkScalarOperands and the interaction here.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

I updated the code to first collect all instructions in the initial worklist and then process them separately.

I also adjusted the variable names.

It looks like InstsToReanalyze is not required for any existing test cases, even for the implementation in LV. I tried to construct cases, but if a recipe is used by multiple recipes that are to be sunk, the one that is used multiple times does not get scalarized. I think it's not needed, because if we sink a recipe, its operands get added to the worklist again and the only way it enables further sinking is that there must be a def-use relation between to newly sunk recipe and the one we need to re-try.

127

Yes, there cannot be any PHI recipes in the block, but nevertheless it won't hurt to get the first non-phi recipe, in case that changes in the future. Updated the code.

Ayal added a comment.May 23 2021, 1:41 PM

Thanks for the changes!
Few remaining nits.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4559

To avoid confusion may help to state more accurately. VPlan's sinkScalarOperands() succeeded in sinking the scalar instruction I, hence it appears in PredBB; but it may have failed to sink I's operands (recursively), which we try (again) here.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
128

Therefor[e].

Note that SinkCandidate must have at-least one user.
Should cast<recipe>(first user) be dyn_cast (as below), or else drop checking if it's null?
(Note that sinkScalarOperands should not encounter 'native's block users, so it should be ok to cast the first user into a recipe.)

132

Is the cast<VPBasicBlock> needed for SinkTo? (due to const(?))

fhahn updated this revision to Diff 347325.May 24 2021, 1:51 AM
fhahn marked an inline comment as done.

Address latest comments, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4559

updated the comment, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
128

Updated to use a dyn_cast, so the code does not need to rely on what kinds of plans are passed in.

132

Removed, it's not needed.

Ayal accepted this revision.May 24 2021, 2:28 AM

Looks good to me, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

Interesting; might be worth removing the presumably redundant InstsToReanalyze, in a separate patch.
In any case, VPlan's sinkScalarOperands() should be enhanced to retire the original sinkScalarOperands() altogether.

127

"Therefor the" >> "Therefore the"

This revision is now accepted and ready to land.May 24 2021, 2:28 AM
This revision was landed with ongoing or failed builds.May 24 2021, 7:32 AM
This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.May 24 2021, 7:33 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
127

Ah yes! Fixed in the committed version, thanks!

Hello!

The following starts failing with htis patch:

opt -loop-vectorize -force-vector-width=2 -S -o - bbi-56568.ll

Result:

Instruction does not dominate all uses!
  %2 = getelementptr inbounds [9 x i16], [9 x i16]* @v_102, i16 0, i16 undef
  %7 = load i16, i16* %2, align 1
opt: ../lib/Transforms/Vectorize/LoopVectorize.cpp:10200: bool llvm::LoopVectorizePass::processLoop(llvm::Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.

fhahn added a comment.May 27 2021, 6:10 AM

Hello!

The following starts failing with htis patch:

opt -loop-vectorize -force-vector-width=2 -S -o - bbi-56568.ll

Result:

Instruction does not dominate all uses!
  %2 = getelementptr inbounds [9 x i16], [9 x i16]* @v_102, i16 0, i16 undef
  %7 = load i16, i16* %2, align 1
opt: ../lib/Transforms/Vectorize/LoopVectorize.cpp:10200: bool llvm::LoopVectorizePass::processLoop(llvm::Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.

Thanks for the report. The issue appears to be that we sink a uniform replicate recipe. We only generate lane 0 for such recipes, so sinking should not be beneficial and it causes the crash because all users use lane 0, which will be generated in a conditional block after sinking. Should be fixed in 38641ddf3e56

Hello!

The following starts failing with htis patch:

opt -loop-vectorize -force-vector-width=2 -S -o - bbi-56568.ll

Result:

Instruction does not dominate all uses!
  %2 = getelementptr inbounds [9 x i16], [9 x i16]* @v_102, i16 0, i16 undef
  %7 = load i16, i16* %2, align 1
opt: ../lib/Transforms/Vectorize/LoopVectorize.cpp:10200: bool llvm::LoopVectorizePass::processLoop(llvm::Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.

Thanks for the report. The issue appears to be that we sink a uniform replicate recipe. We only generate lane 0 for such recipes, so sinking should not be beneficial and it causes the crash because all users use lane 0, which will be generated in a conditional block after sinking. Should be fixed in 38641ddf3e56

Thanks!