This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Merge predicated-triangle regions, after sinking.
ClosedPublic

Authored by fhahn on Apr 11 2021, 4:26 AM.

Details

Summary

Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Diff Detail

Event Timeline

fhahn created this revision.Apr 11 2021, 4:26 AM
fhahn requested review of this revision.Apr 11 2021, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2021, 4:26 AM
Herald added a subscriber: vkmr. · View Herald Transcript
fhahn updated this revision to Diff 346406.May 19 2021, 4:41 AM

ping :)

I rebased the patch and updated it to run merging unconditionally (even if no sinking happened), as there may already be opportunities for merging that do not require sinking. Adjusted a few improved tests.

Ayal added a comment.May 23 2021, 1:26 AM

Nice optimization exercising VPlan capabilities!
Should always be a win, so ok to avoid cost-model considerations, although cost improvement is left unaccounted for.
An alternative may be to try and build larger replicating regions to begin with, but sinkScalarOperands may contribute as well.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
163

rely on order of successors?

168

Can check || ThenBB->getSingleSuccessor() != MergeBB

175

Is RPOT order vital, or would a simple DFS do, given that the pairs of regions handled are dom-postdom pairs?

191

dyn_cast_or_null would also take care of checking if Region has a single successor?

195

dyn_cast_or_null would also take care of checking if Succ has a single successor?

196

This may be somewhat misleading, and currently untestable: replicated regions are (currently) formed for unpredicatable instructions, by 'legacy' vectorizer which does not model block predicates at all. Assert the predicates are null?

204

because ... it may conflict with sink-after?

comment here should probably state what it is we can move.
Could there potentially be fusion-preventing dependencies between the two regions (packing/unpacking?), or does the lack of recipes in the 'middle' block guarantee their absence?

213

May be good to name this lambda, e.g., IsImmovableRecipe.

Check first-order recurrence (more) directly, by

if (PhiR && !PhiR->getRecurrenceDescriptor())
  return true

?
Would be good to have a clear API for such a query, if not a dedicated isa<> recipe.

231

Note: this takes care of fusion-admitting, lane-by-lane dependencies, right?

241

I could appear as multiple operands of U, so avoid early-exit on first match, right? Perhaps worth a test.

Assert that at-least one such operand was found and (re)set?

261

delete DeletedBlocks?

Ayal added inline comments.May 26 2021, 1:21 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
196

OTOH, need to check that both VPBranchOnMaskRecipes use the same mask. Potentially have getPredicate() retrieve this mask? Test that two regions having distinct masks are not merged.

213

ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying.

May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB

232

Is the reverse required? Could repeatedly take (r)begin until empty.

Initially each MergeRegion has a single PredInstPHIRecipe, but multiple recipes would appear after merging, right? Worth testing merging three regions into one, if not done so already.

fhahn updated this revision to Diff 351687.Jun 12 2021, 2:32 PM
fhahn marked 6 inline comments as done.

Thanks for the initial comments! Sorry for the delay with following up.

Most comments should be addressed now, with responses inline.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
163

Yes. I updated the code to check if the single successor of ThenBB is MergeBB. If that's not the case, they are swapped and must pass the check below.

175

No that's not needed. Updated to collect the relevant blocks using VPBlockUtils::blocksOnly<VPRegionBlock>

196

getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up?

213

ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying.

Added the assertion.

May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB

Great idea, the suggested names are much clearer!

231

Yes, it updates the uses outside the merged region.

232

I added a test merge_3_replicate_region, which exposed a problem with multiple VPPredInstPHIRecipes in the same block. I put up D104188 to fix that.

241

Yes, there could be multiple uses of ToMove in U. I still need to add a test case for that.

261

moved the deletion outside to clean up all blocks in DeletedBlocks.

fhahn updated this revision to Diff 351724.Jun 13 2021, 8:28 AM

Address remaining comments, added additional tests, clarify comment on why we need to skip recipes used by first-order recurrences.

fhahn marked 3 inline comments as done.Jun 13 2021, 8:39 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
204

The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed.

Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence.

231

Note: this takes care of fusion-admitting, lane-by-lane dependencies, right?

Other thing to consider are memory dependencies. But I don't think at the moment this would be an issue, because the case where we read from a location that is overwritten by a merged store should be rejected by the earlier legality checks. Otherwise we would also not be able to re-order the accesses for widening.

232

Is the reverse required? Could repeatedly take (r)begin until empty.

moving them in reverse order seem natural (dominance between defs and uses is not broken temporarily) and reverse() should just reverse the iterators and not have any additional runtime cost.

241

Test is @update_2_uses_in_same_recipe_in_merged_block

Ayal added a comment.Jun 15 2021, 12:12 AM

This looks good to me, adding final minor nits.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
196

getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up?

+1!

203

Error message?

204

The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed.

Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence.

D100260 >> D104197, right?

231

moveBefore Then2->begin() should be fine, because Then2 has no phi's; worth a comment

252

moveBefore Merge2->begin() is right, because we're moving phi's here; worth a comment.

260

Alternatively (and more consistently) use make_early_inc_range?

fhahn updated this revision to Diff 353236.Jun 20 2021, 11:15 AM

Address comments, return condition directly from isPredicateRegion.

fhahn marked 3 inline comments as done.Jun 20 2021, 11:19 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
203

I moved the assert (with added message) to later and also updated isPredicatedTriangle to return the condition from the VPBranchOnMask recipe directly.

204

Yes, D104197 is the correct one. I also addressed the latest comments there.

231

Yes, but I think it is probably better to just use getFirstNonPhi(). Alternatively I can undo the change and add a comment.

252

Yes. I updated the variable name to PhiToMove, so it should be clearer. I did not add a comment because it seems a bit redundant to me after the renaming. But I'd be happy to add one if you think it is still valuable after the renaming.

260

done, thanks!

Ayal added inline comments.Jun 20 2021, 12:48 PM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
111

Why make_early_inc_range here, we're traversing all VPBB's only to populate WorkList with VPValues.

152

Then perhaps "getPredicatedThenBlock(R)" would be a more accurate name than isPredicatedTriange()?

The first part checking if R's entry is a basic block with a single recipe being a Branch on Mask recipe, and returning its condition if so, can be done in a separate getPredicatedCondition(R)?

This would require later doing something like

Cond1 = getPredicatedCondition(Region1);
Cond2 = getPredicatedCondition(Region2);
if (!Cond1 || Cond1 != Cond2)
  continue;
Then1 = getPredicatedThenBlock(Region1);
Then2 = getPredicatedThenBlock(Region2);
if (!Then1 || !Then2)
  continue;
198

"Succ" >> "MiddleBasicBlock"?

217

Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all.
A word on not moving multi-def'd recipes, e.g., they are not expected, and to simplify looking after all users?

252

+1

Another option may be "Phi1", as it is a Phi of Merge1, of Region1. All Phi1's are moved to appear right before Phi2's. Or "Phi1ToMove".

Perhaps "IncV" >> "PredInst1", given that it is the recipe in Then1 feeding [PredInst]Phi1/Phi1ToMove.

261

Region1 itself should also be deleted, after its basic blocks are.
But suffice to do DeletedRegions.insert(Region1); i.e., deleting every Region1 takes care to destroy all its basic blocks.

fhahn updated this revision to Diff 354285.Jun 24 2021, 9:51 AM
fhahn marked 7 inline comments as done.

Addressed latest comments, thanks! The patch depends on D104197 (latest comments there have been also addressed)

fhahn added inline comments.Jun 24 2021, 9:52 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
111

I think that's a leftover from a previous rebase.

152

Updated, thanks!

217

Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all.

Thanks, I extended the comment!

A word on not moving multi-def'd recipes, e.g., they are not expected, and to simplify looking after all users?

There should be no multi-defs in a region, I turned it into an assert.

252

Added 1 to both names.

261

Good point, updated to just add Region1.

Ayal accepted this revision.Jun 27 2021, 7:29 AM

This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
153

"the condition" >> "the mask" ?

173

Could alternatively do

if (Succ0->getNumSuccessors() + Succ1->getNumSuccessors() != 1)
  return nullptr;
if (Succ0->getSingleSuccessor() == Succ1)
  return Succ0;
if (Succ1->getSingleSuccessor() == Succ0)
  return Succ1;
return nullptr;
184

SetVector<VPRegionBlock *> DeletedRegions ?

204

could use dyn_cast_or_null<VPBasicBlock>(Region1->getSingleSuccessor()); here instead of filtering regions with non single successors earlier (which probably do not exist).

258

worth introducing VPPredInstPHIRecipe::getPredInst() { return getOperand(0); } ?

274

Perhaps looks somewhat better if Region1 is inserted into DeletedBlocks (DeletedRegions) after disconnecting it.

This revision is now accepted and ready to land.Jun 27 2021, 7:29 AM
fhahn updated this revision to Diff 354816.Jun 28 2021, 2:26 AM
fhahn marked 3 inline comments as done.

Rebased, will commit soon.

This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.

Thanks, I added a note about fusion-preventing dependencies.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
204

I kept the code as it is for now, because collecting the candidates up-front is required to avoid iterator invalidation problems caused by removing a region and adjusting predecessors/successors.

Ayal added inline comments.Jun 28 2021, 2:37 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
204

Sure, early collection is still needed, can just simply have it collect all regions.

This revision was landed with ongoing or failed builds.Jun 28 2021, 3:14 AM
This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.Jun 28 2021, 3:15 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
204

Sounds good, I included that in the committed version!