This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Merge predicated-triangle regions, after sinking.
ClosedPublic

Authored by fhahn on Apr 11 2021, 4:26 AM.

Details

Summary

Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Diff Detail

Event Timeline

fhahn created this revision.Apr 11 2021, 4:26 AM
fhahn requested review of this revision.Apr 11 2021, 4:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2021, 4:26 AM
Herald added a subscriber: vkmr. · View Herald Transcript
fhahn updated this revision to Diff 346406.May 19 2021, 4:41 AM

ping :)

I rebased the patch and updated it to run merging unconditionally (even if no sinking happened), as there may already be opportunities for merging that do not require sinking. Adjusted a few improved tests.

Ayal added a comment.May 23 2021, 1:26 AM

Nice optimization exercising VPlan capabilities!
Should always be a win, so ok to avoid cost-model considerations, although cost improvement is left unaccounted for.
An alternative may be to try and build larger replicating regions to begin with, but sinkScalarOperands may contribute as well.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
153

rely on order of successors?

158

Can check || ThenBB->getSingleSuccessor() != MergeBB

165

Is RPOT order vital, or would a simple DFS do, given that the pairs of regions handled are dom-postdom pairs?

181

dyn_cast_or_null would also take care of checking if Region has a single successor?

185

dyn_cast_or_null would also take care of checking if Succ has a single successor?

186

This may be somewhat misleading, and currently untestable: replicated regions are (currently) formed for unpredicatable instructions, by 'legacy' vectorizer which does not model block predicates at all. Assert the predicates are null?

194

because ... it may conflict with sink-after?

comment here should probably state what it is we can move.
Could there potentially be fusion-preventing dependencies between the two regions (packing/unpacking?), or does the lack of recipes in the 'middle' block guarantee their absence?

203

May be good to name this lambda, e.g., IsImmovableRecipe.

Check first-order recurrence (more) directly, by

if (PhiR && !PhiR->getRecurrenceDescriptor())
  return true

?
Would be good to have a clear API for such a query, if not a dedicated isa<> recipe.

221

Note: this takes care of fusion-admitting, lane-by-lane dependencies, right?

231

I could appear as multiple operands of U, so avoid early-exit on first match, right? Perhaps worth a test.

Assert that at-least one such operand was found and (re)set?

251

delete DeletedBlocks?

Ayal added inline comments.May 26 2021, 1:21 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
186

OTOH, need to check that both VPBranchOnMaskRecipes use the same mask. Potentially have getPredicate() retrieve this mask? Test that two regions having distinct masks are not merged.

203

ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying.

May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB

222

Is the reverse required? Could repeatedly take (r)begin until empty.

Initially each MergeRegion has a single PredInstPHIRecipe, but multiple recipes would appear after merging, right? Worth testing merging three regions into one, if not done so already.

fhahn updated this revision to Diff 351687.Jun 12 2021, 2:32 PM
fhahn marked 6 inline comments as done.

Thanks for the initial comments! Sorry for the delay with following up.

Most comments should be addressed now, with responses inline.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
153

Yes. I updated the code to check if the single successor of ThenBB is MergeBB. If that's not the case, they are swapped and must pass the check below.

165

No that's not needed. Updated to collect the relevant blocks using VPBlockUtils::blocksOnly<VPRegionBlock>

186

getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up?

203

ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying.

Added the assertion.

May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB

Great idea, the suggested names are much clearer!

221

Yes, it updates the uses outside the merged region.

222

I added a test merge_3_replicate_region, which exposed a problem with multiple VPPredInstPHIRecipes in the same block. I put up D104188 to fix that.

231

Yes, there could be multiple uses of ToMove in U. I still need to add a test case for that.

251

moved the deletion outside to clean up all blocks in DeletedBlocks.

fhahn updated this revision to Diff 351724.Jun 13 2021, 8:28 AM

Address remaining comments, added additional tests, clarify comment on why we need to skip recipes used by first-order recurrences.

fhahn marked 3 inline comments as done.Jun 13 2021, 8:39 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
194

The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed.

Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence.

221

Note: this takes care of fusion-admitting, lane-by-lane dependencies, right?

Other thing to consider are memory dependencies. But I don't think at the moment this would be an issue, because the case where we read from a location that is overwritten by a merged store should be rejected by the earlier legality checks. Otherwise we would also not be able to re-order the accesses for widening.

222

Is the reverse required? Could repeatedly take (r)begin until empty.

moving them in reverse order seem natural (dominance between defs and uses is not broken temporarily) and reverse() should just reverse the iterators and not have any additional runtime cost.

231

Test is @update_2_uses_in_same_recipe_in_merged_block

Ayal added a comment.Jun 15 2021, 12:12 AM

This looks good to me, adding final minor nits.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
186

getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up?

+1!

193

Error message?

194

The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed.

Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence.

D100260 >> D104197, right?

221

moveBefore Then2->begin() should be fine, because Then2 has no phi's; worth a comment

242

moveBefore Merge2->begin() is right, because we're moving phi's here; worth a comment.

250

Alternatively (and more consistently) use make_early_inc_range?

fhahn updated this revision to Diff 353236.Jun 20 2021, 11:15 AM

Address comments, return condition directly from isPredicateRegion.

fhahn marked 3 inline comments as done.Jun 20 2021, 11:19 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
193

I moved the assert (with added message) to later and also updated isPredicatedTriangle to return the condition from the VPBranchOnMask recipe directly.

194

Yes, D104197 is the correct one. I also addressed the latest comments there.

221

Yes, but I think it is probably better to just use getFirstNonPhi(). Alternatively I can undo the change and add a comment.

242

Yes. I updated the variable name to PhiToMove, so it should be clearer. I did not add a comment because it seems a bit redundant to me after the renaming. But I'd be happy to add one if you think it is still valuable after the renaming.

250

done, thanks!

Ayal added inline comments.Jun 20 2021, 12:48 PM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

Why make_early_inc_range here, we're traversing all VPBB's only to populate WorkList with VPValues.

142

Then perhaps "getPredicatedThenBlock(R)" would be a more accurate name than isPredicatedTriange()?

The first part checking if R's entry is a basic block with a single recipe being a Branch on Mask recipe, and returning its condition if so, can be done in a separate getPredicatedCondition(R)?

This would require later doing something like

Cond1 = getPredicatedCondition(Region1);
Cond2 = getPredicatedCondition(Region2);
if (!Cond1 || Cond1 != Cond2)
  continue;
Then1 = getPredicatedThenBlock(Region1);
Then2 = getPredicatedThenBlock(Region2);
if (!Then1 || !Then2)
  continue;
188

"Succ" >> "MiddleBasicBlock"?

207

Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all.
A word on not moving multi-def'd recipes, e.g., they are not expected, and to simplify looking after all users?

242

+1

Another option may be "Phi1", as it is a Phi of Merge1, of Region1. All Phi1's are moved to appear right before Phi2's. Or "Phi1ToMove".

Perhaps "IncV" >> "PredInst1", given that it is the recipe in Then1 feeding [PredInst]Phi1/Phi1ToMove.

251

Region1 itself should also be deleted, after its basic blocks are.
But suffice to do DeletedRegions.insert(Region1); i.e., deleting every Region1 takes care to destroy all its basic blocks.

fhahn updated this revision to Diff 354285.Jun 24 2021, 9:51 AM
fhahn marked 7 inline comments as done.

Addressed latest comments, thanks! The patch depends on D104197 (latest comments there have been also addressed)

fhahn added inline comments.Jun 24 2021, 9:52 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
108

I think that's a leftover from a previous rebase.

142

Updated, thanks!

207

Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all.

Thanks, I extended the comment!

A word on not moving multi-def'd recipes, e.g., they are not expected, and to simplify looking after all users?

There should be no multi-defs in a region, I turned it into an assert.

242

Added 1 to both names.

251

Good point, updated to just add Region1.

Ayal accepted this revision.Jun 27 2021, 7:29 AM

This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
143

"the condition" >> "the mask" ?

163

Could alternatively do

if (Succ0->getNumSuccessors() + Succ1->getNumSuccessors() != 1)
  return nullptr;
if (Succ0->getSingleSuccessor() == Succ1)
  return Succ0;
if (Succ1->getSingleSuccessor() == Succ0)
  return Succ1;
return nullptr;
174

SetVector<VPRegionBlock *> DeletedRegions ?

194

could use dyn_cast_or_null<VPBasicBlock>(Region1->getSingleSuccessor()); here instead of filtering regions with non single successors earlier (which probably do not exist).

248

worth introducing VPPredInstPHIRecipe::getPredInst() { return getOperand(0); } ?

264

Perhaps looks somewhat better if Region1 is inserted into DeletedBlocks (DeletedRegions) after disconnecting it.

This revision is now accepted and ready to land.Jun 27 2021, 7:29 AM
fhahn updated this revision to Diff 354816.Jun 28 2021, 2:26 AM
fhahn marked 3 inline comments as done.

Rebased, will commit soon.

This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.

Thanks, I added a note about fusion-preventing dependencies.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
194

I kept the code as it is for now, because collecting the candidates up-front is required to avoid iterator invalidation problems caused by removing a region and adjusting predecessors/successors.

Ayal added inline comments.Jun 28 2021, 2:37 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
194

Sure, early collection is still needed, can just simply have it collect all regions.

This revision was landed with ongoing or failed builds.Jun 28 2021, 3:14 AM
This revision was automatically updated to reflect the committed changes.
fhahn added inline comments.Jun 28 2021, 3:15 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
194

Sounds good, I included that in the committed version!