This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Merge predicated-triangle regions, after sinking.
ClosedPublic

Authored by fhahn on Apr 11 2021, 4:26 AM.

Details

Summary

Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Diff Detail

Event Timeline

fhahn created this revision.Apr 11 2021, 4:26 AM
fhahn requested review of this revision.Apr 11 2021, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2021, 4:26 AM
Herald added a subscriber: vkmr. · View Herald Transcript
fhahn updated this revision to Diff 346406.May 19 2021, 4:41 AM

ping :)

I rebased the patch and updated it to run merging unconditionally (even if no sinking happened), as there may already be opportunities for merging that do not require sinking. Adjusted a few improved tests.

Ayal added a comment.May 23 2021, 1:26 AM

Nice optimization exercising VPlan capabilities!
Should always be a win, so ok to avoid cost-model considerations, although cost improvement is left unaccounted for.
An alternative may be to try and build larger replicating regions to begin with, but sinkScalarOperands may contribute as well.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
164

rely on order of successors?

169

Can check || ThenBB->getSingleSuccessor() != MergeBB

176

Is RPOT order vital, or would a simple DFS do, given that the pairs of regions handled are dom-postdom pairs?

192

dyn_cast_or_null would also take care of checking if Region has a single successor?

196

dyn_cast_or_null would also take care of checking if Succ has a single successor?

197

This may be somewhat misleading, and currently untestable: replicated regions are (currently) formed for unpredicatable instructions, by 'legacy' vectorizer which does not model block predicates at all. Assert the predicates are null?

205

because ... it may conflict with sink-after?

comment here should probably state what it is we can move.
Could there potentially be fusion-preventing dependencies between the two regions (packing/unpacking?), or does the lack of recipes in the 'middle' block guarantee their absence?

214

May be good to name this lambda, e.g., IsImmovableRecipe.

Check first-order recurrence (more) directly, by

if (PhiR && !PhiR->getRecurrenceDescriptor())
  return true

?
Would be good to have a clear API for such a query, if not a dedicated isa<> recipe.

232

Note: this takes care of fusion-admitting, lane-by-lane dependencies, right?

242

I could appear as multiple operands of U, so avoid early-exit on first match, right? Perhaps worth a test.

Assert that at-least one such operand was found and (re)set?

262

delete DeletedBlocks?

Ayal added inline comments.May 26 2021, 1:21 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
197

OTOH, need to check that both VPBranchOnMaskRecipes use the same mask. Potentially have getPredicate() retrieve this mask? Test that two regions having distinct masks are not merged.

214

ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying.

May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB

233

Is the reverse required? Could repeatedly take (r)begin until empty.

Initially each MergeRegion has a single PredInstPHIRecipe, but multiple recipes would appear after merging, right? Worth testing merging three regions into one, if not done so already.

fhahn updated this revision to Diff 351687.Jun 12 2021, 2:32 PM
fhahn marked 6 inline comments as done.

Thanks for the initial comments! Sorry for the delay with following up.

Most comments should be addressed now, with responses inline.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
164

Yes. I updated the code to check if the single successor of ThenBB is MergeBB. If that's not the case, they are swapped and must pass the check below.

176

No that's not needed. Updated to collect the relevant blocks using VPBlockUtils::blocksOnly<VPRegionBlock>

197

getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up?

214

ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying.

Added the assertion.

May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB

Great idea, the suggested names are much clearer!

232

Yes, it updates the uses outside the merged region.

233

I added a test merge_3_replicate_region, which exposed a problem with multiple VPPredInstPHIRecipes in the same block. I put up D104188 to fix that.

242

Yes, there could be multiple uses of ToMove in U. I still need to add a test case for that.

262

moved the deletion outside to clean up all blocks in DeletedBlocks.

fhahn updated this revision to Diff 351724.Jun 13 2021, 8:28 AM

Address remaining comments, added additional tests, clarify comment on why we need to skip recipes used by first-order recurrences.

fhahn marked 3 inline comments as done.Jun 13 2021, 8:39 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
205

The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed.

Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence.

232

Note: this takes care of fusion-admitting, lane-by-lane dependencies, right?

Other thing to consider are memory dependencies. But I don't think at the moment this would be an issue, because the case where we read from a location that is overwritten by a merged store should be rejected by the earlier legality checks. Otherwise we would also not be able to re-order the accesses for widening.

233

Is the reverse required? Could repeatedly take (r)begin until empty.

moving them in reverse order seem natural (dominance between defs and uses is not broken temporarily) and reverse() should just reverse the iterators and not have any additional runtime cost.

242

Test is @update_2_uses_in_same_recipe_in_merged_block

Ayal added a comment.Jun 15 2021, 12:12 AM

This looks good to me, adding final minor nits.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
197

getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up?

+1!

204

Error message?

205

The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed.

Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence.

D100260 >> D104197, right?

232

moveBefore Then2->begin() should be fine, because Then2 has no phi's; worth a comment

253

moveBefore Merge2->begin() is right, because we're moving phi's here; worth a comment.

261

Alternatively (and more consistently) use make_early_inc_range?

fhahn updated this revision to Diff 353236.Jun 20 2021, 11:15 AM

Address comments, return condition directly from isPredicateRegion.

fhahn marked 3 inline comments as done.Jun 20 2021, 11:19 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
204

I moved the assert (with added message) to later and also updated isPredicatedTriangle to return the condition from the VPBranchOnMask recipe directly.

205

Yes, D104197 is the correct one. I also addressed the latest comments there.

232

Yes, but I think it is probably better to just use getFirstNonPhi(). Alternatively I can undo the change and add a comment.

253

Yes. I updated the variable name to PhiToMove, so it should be clearer. I did not add a comment because it seems a bit redundant to me after the renaming. But I'd be happy to add one if you think it is still valuable after the renaming.

261

done, thanks!

Ayal added inline comments.Jun 20 2021, 12:48 PM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
111

Why make_early_inc_range here, we're traversing all VPBB's only to populate WorkList with VPValues.

153

Then perhaps "getPredicatedThenBlock(R)" would be a more accurate name than isPredicatedTriange()?

The first part checking if R's entry is a basic block with a single recipe being a Branch on Mask recipe, and returning its condition if so, can be done in a separate getPredicatedCondition(R)?

This would require later doing something like

Cond1 = getPredicatedCondition(Region1);
Cond2 = getPredicatedCondition(Region2);
if (!Cond1 || Cond1 != Cond2)
  continue;
Then1 = getPredicatedThenBlock(Region1);
Then2 = getPredicatedThenBlock(Region2);
if (!Then1 || !Then2)
  continue;
199

"Succ" >> "MiddleBasicBlock"?

218

Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all.
A word on not moving multi-def'd recipes, e.g., they are not expected, and to simplify looking after all users?

253

+1

Another option may be "Phi1", as it is a Phi of Merge1, of Region1. All Phi1's are moved to appear right before Phi2's. Or "Phi1ToMove".

Perhaps "IncV" >> "PredInst1", given that it is the recipe in Then1 feeding [PredInst]Phi1/Phi1ToMove.

262

Region1 itself should also be deleted, after its basic blocks are.
But suffice to do DeletedRegions.insert(Region1); i.e., deleting every Region1 takes care to destroy all its basic blocks.

fhahn updated this revision to Diff 354285.Jun 24 2021, 9:51 AM
fhahn marked 7 inline comments as done.

Addressed latest comments, thanks! The patch depends on D104197 (latest comments there have been also addressed)

fhahn added inline comments.Jun 24 2021, 9:52 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
111

I think that's a leftover from a previous rebase.

153

Updated, thanks!

218

Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all.

Thanks, I extended the comment!

A word on not moving multi-def'd recipes, e.g., they are not expected, and to simplify looking after all users?

There should be no multi-defs in a region, I turned it into an assert.

253

Added 1 to both names.

262

Good point, updated to just add Region1.

Ayal accepted this revision.Jun 27 2021, 7:29 AM

This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
154

"the condition" >> "the mask" ?

174

Could alternatively do

if (Succ0->getNumSuccessors() + Succ1->getNumSuccessors() != 1)
  return nullptr;
if (Succ0->getSingleSuccessor() == Succ1)
  return Succ0;
if (Succ1->getSingleSuccessor() == Succ0)
  return Succ1;
return nullptr;
185

SetVector<VPRegionBlock *> DeletedRegions ?

205

could use dyn_cast_or_null<VPBasicBlock>(Region1->getSingleSuccessor()); here instead of filtering regions with non single successors earlier (which probably do not exist).

259

worth introducing VPPredInstPHIRecipe::getPredInst() { return getOperand(0); } ?

275

Perhaps looks somewhat better if Region1 is inserted into DeletedBlocks (DeletedRegions) after disconnecting it.

This revision is now accepted and ready to land.Jun 27 2021, 7:29 AM
fhahn updated this revision to Diff 354816.Jun 28 2021, 2:26 AM
fhahn marked 3 inline comments as done.

Rebased, will commit soon.

This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.

Thanks, I added a note about fusion-preventing dependencies.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
205

I kept the code as it is for now, because collecting the candidates up-front is required to avoid iterator invalidation problems caused by removing a region and adjusting predecessors/successors.

Ayal added inline comments.Jun 28 2021, 2:37 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
205

Sure, early collection is still needed, can just simply have it collect all regions.

This revision was landed with ongoing or failed builds.Jun 28 2021, 3:14 AM
This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.Jun 28 2021, 3:15 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
205

Sounds good, I included that in the committed version!