Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This remove redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9158 | nit: not sure what is meant here by "earlier" or improper maintenance of VPlan transforms? | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
313 | Is RPOT needed (due to destructive / make_early_inc_range) or would depth_first suffice? Suffice to iterate over blocksOnly<VPBasicBlock> and tryToMerge[Basic]BlockIntoPredecessor(VPBasicBlock)? | |
315 | Is tryToMergeBlockIntoPredecessor() still needed (by buildVPlanWithVPRecipes()), or can this pass absorb it? | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.h | ||
43 | Missing documentation. (Here are in few other transforms, independently of this patch) |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
313 | Unforunately depth_first doesn't support make_early_inc_range at the moment, I am not sure if that's because of VPBlockRecursiveTraversalWrapper or something else.
Done, thanks! | |
315 | Done, it's not needed any longer. | |
llvm/lib/Transforms/Vectorize/VPlanTransforms.h | ||
43 | Added, thanks! |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
9158 | I think this mainly referred to doing mergeBlocksIntoPredecessors on 2 individual blocks here and immediately after constructing the initial recipes. The latter has been removed as well now |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
328 | Q1: Can VPBB be a successor of itself? If it can, this is probably fine, as we (1) shouldn't reach here (unless VPBB is unreachable) and (2) reconnecting it to itself should do no harm as it is going to be deleted anyway. Worth asserting? |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
313 | Another option may be to scan depth_first and push candidates VPBB's onto a stack that is later traversed (succ before pred) to merge each candidate into its predecessor and delete it. This could potentially minimize RPOT copying overhead, although VPlan sizes are currently small(?) |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
313 | RPOT is not needed, but if we want to use existing graph iterators I think we need to do the traversal first and then process it. I updated the code to cache depth-first traversal, to make clear RPOT is not needed. Alternatively we could do the traversal manually, but this would need more work if we want to traverse through regions. | |
328 |
In theory there's nothing preventing that, but in practice all cycles should be represented as regions without explicit back-edge at the moment.
VPBB previously could have other users, but those have been removed AFAICT. VPBasicBlock::dropAllReferences removes references to all recipes in the block. |
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
313 | Ahh, right, candidates can also be visited in pred-before-succ order, or any other. Original thought was to collect all VPBB's having a single BB predecessor which in turn has a single successor, as this collection is expected to be smaller than all blocks/basic-blocks, if not empty; plus one could return !empty() instead of Changed. But this is mostly a matter of style, i.e., separating the logic into inspection and execution. Current approach is also fine, what would you prefer? |
Update to only collect merge-able blocks.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
313 | Oh right, updated to only collect merge-able blocks, thanks! |
Thanks!
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp | ||
---|---|---|
318 | nit: seems clearer w/o early-exit if (PredVPBB && PredVPBB->getNumSuccessors() == 1) WorkList.push_back(VPBB); ? |
nit: not sure what is meant here by "earlier" or improper maintenance of VPlan transforms?