Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
ping :)
I rebased the patch and updated it to run merging unconditionally (even if no sinking happened), as there may already be opportunities for merging that do not require sinking. Adjusted a few improved tests.
Nice optimization exercising VPlan capabilities!
Should always be a win, so ok to avoid cost-model considerations, although cost improvement is left unaccounted for.
An alternative may be to try and build larger replicating regions to begin with, but sinkScalarOperands may contribute as well.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
164 | rely on order of successors? | |
169 | Can check || ThenBB->getSingleSuccessor() != MergeBB | |
176 | Is RPOT order vital, or would a simple DFS do, given that the pairs of regions handled are dom-postdom pairs? | |
192 | dyn_cast_or_null would also take care of checking if Region has a single successor? | |
196 | dyn_cast_or_null would also take care of checking if Succ has a single successor? | |
197 | This may be somewhat misleading, and currently untestable: replicated regions are (currently) formed for unpredicatable instructions, by 'legacy' vectorizer which does not model block predicates at all. Assert the predicates are null? | |
205 | because ... it may conflict with sink-after? comment here should probably state what it is we can move. | |
214 | May be good to name this lambda, e.g., IsImmovableRecipe. Check first-order recurrence (more) directly, by if (PhiR && !PhiR->getRecurrenceDescriptor()) return true ? | |
232 | Note: this takes care of fusion-admitting, lane-by-lane dependencies, right? | |
242 | I could appear as multiple operands of U, so avoid early-exit on first match, right? Perhaps worth a test. Assert that at-least one such operand was found and (re)set? | |
262 | delete DeletedBlocks? |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
197 | OTOH, need to check that both VPBranchOnMaskRecipes use the same mask. Potentially have getPredicate() retrieve this mask? Test that two regions having distinct masks are not merged. | |
214 | ThenRegion is expected to have no phi recipes, let alone header phi recipes, but only replicate recipes? Perhaps something potentially worth Verifying. May be clearer to rename variables Region1/ThenBB1/MergeBB1, Region2/ThenBB2/MergeBB2, MiddleBB | |
233 | Is the reverse required? Could repeatedly take (r)begin until empty. Initially each MergeRegion has a single PredInstPHIRecipe, but multiple recipes would appear after merging, right? Worth testing merging three regions into one, if not done so already. |
Thanks for the initial comments! Sorry for the delay with following up.
Most comments should be addressed now, with responses inline.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
164 | Yes. I updated the code to check if the single successor of ThenBB is MergeBB. If that's not the case, they are swapped and must pass the check below. | |
176 | No that's not needed. Updated to collect the relevant blocks using VPBlockUtils::blocksOnly<VPRegionBlock> | |
197 | getPredicate was wrong. Currently CondBit is kept up-to-date. I updated the code to use it. Perhaps we should remove the CondBit field and implement getCondBit with a lookup of branch-on-mask recipes as follow-up? | |
214 |
Added the assertion.
Great idea, the suggested names are much clearer! | |
232 | Yes, it updates the uses outside the merged region. | |
233 | I added a test merge_3_replicate_region, which exposed a problem with multiple VPPredInstPHIRecipes in the same block. I put up D104188 to fix that. | |
242 | Yes, there could be multiple uses of ToMove in U. I still need to add a test case for that. | |
262 | moved the deletion outside to clean up all blocks in DeletedBlocks. |
Address remaining comments, added additional tests, clarify comment on why we need to skip recipes used by first-order recurrences.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
205 | The problem with first-order recurrences is that the during code-gen the recurrence vector for the current iteration must be generated *before* all FOR users and *after* all parts of the recurrence value have been computed. Note that this depends on D100260 and I added an additional test to cover the problematic case recipe_in_merge_candidate_used_by_first_order_recurrence. | |
232 |
Other thing to consider are memory dependencies. But I don't think at the moment this would be an issue, because the case where we read from a location that is overwritten by a merged store should be rejected by the earlier legality checks. Otherwise we would also not be able to re-order the accesses for widening. | |
233 |
moving them in reverse order seem natural (dominance between defs and uses is not broken temporarily) and reverse() should just reverse the iterators and not have any additional runtime cost. | |
242 | Test is @update_2_uses_in_same_recipe_in_merged_block |
This looks good to me, adding final minor nits.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
197 |
+1! | |
204 | Error message? | |
205 |
| |
232 | moveBefore Then2->begin() should be fine, because Then2 has no phi's; worth a comment | |
253 | moveBefore Merge2->begin() is right, because we're moving phi's here; worth a comment. | |
261 | Alternatively (and more consistently) use make_early_inc_range? |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
204 | I moved the assert (with added message) to later and also updated isPredicatedTriangle to return the condition from the VPBranchOnMask recipe directly. | |
205 | Yes, D104197 is the correct one. I also addressed the latest comments there. | |
232 | Yes, but I think it is probably better to just use getFirstNonPhi(). Alternatively I can undo the change and add a comment. | |
253 | Yes. I updated the variable name to PhiToMove, so it should be clearer. I did not add a comment because it seems a bit redundant to me after the renaming. But I'd be happy to add one if you think it is still valuable after the renaming. | |
261 | done, thanks! |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
111 | Why make_early_inc_range here, we're traversing all VPBB's only to populate WorkList with VPValues. | |
153 | Then perhaps "getPredicatedThenBlock(R)" would be a more accurate name than isPredicatedTriange()? The first part checking if R's entry is a basic block with a single recipe being a Branch on Mask recipe, and returning its condition if so, can be done in a separate getPredicatedCondition(R)? This would require later doing something like Cond1 = getPredicatedCondition(Region1); Cond2 = getPredicatedCondition(Region2); if (!Cond1 || Cond1 != Cond2) continue; Then1 = getPredicatedThenBlock(Region1); Then2 = getPredicatedThenBlock(Region2); if (!Then1 || !Then2) continue; | |
199 | "Succ" >> "MiddleBasicBlock"? | |
218 | Perhaps worth further clarifying: a recipe R feeding a first order recurrence phi must allow for a *vector* shuffle to be inserted immediately after it, and therefore if R is *scalarized and predicated* it must appear last in its basic block. In addition, other recipes may need to "sink after" R, so best if R not be moved at all. | |
253 | +1 Another option may be "Phi1", as it is a Phi of Merge1, of Region1. All Phi1's are moved to appear right before Phi2's. Or "Phi1ToMove". Perhaps "IncV" >> "PredInst1", given that it is the recipe in Then1 feeding [PredInst]Phi1/Phi1ToMove. | |
262 | Region1 itself should also be deleted, after its basic blocks are. |
Addressed latest comments, thanks! The patch depends on D104197 (latest comments there have been also addressed)
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
111 | I think that's a leftover from a previous rebase. | |
153 | Updated, thanks! | |
218 |
Thanks, I extended the comment!
There should be no multi-defs in a region, I turned it into an assert. | |
253 | Added 1 to both names. | |
262 | Good point, updated to just add Region1. |
This looks good to me, thanks!
Adding some minor optional nits.
May be worth noting why no fusion-preventing dependencies are expected.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
154 | "the condition" >> "the mask" ? | |
174 | Could alternatively do if (Succ0->getNumSuccessors() + Succ1->getNumSuccessors() != 1) return nullptr; if (Succ0->getSingleSuccessor() == Succ1) return Succ0; if (Succ1->getSingleSuccessor() == Succ0) return Succ1; return nullptr; | |
185 | SetVector<VPRegionBlock *> DeletedRegions ? | |
205 | could use dyn_cast_or_null<VPBasicBlock>(Region1->getSingleSuccessor()); here instead of filtering regions with non single successors earlier (which probably do not exist). | |
259 | worth introducing VPPredInstPHIRecipe::getPredInst() { return getOperand(0); } ? | |
275 | Perhaps looks somewhat better if Region1 is inserted into DeletedBlocks (DeletedRegions) after disconnecting it. |
Rebased, will commit soon.
Thanks, I added a note about fusion-preventing dependencies.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
205 | I kept the code as it is for now, because collecting the candidates up-front is required to avoid iterator invalidation problems caused by removing a region and adjusting predecessors/successors. |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
205 | Sure, early collection is still needed, can just simply have it collect all regions. |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
205 | Sounds good, I included that in the committed version! |
Why make_early_inc_range here, we're traversing all VPBB's only to populate WorkList with VPValues.