This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Model branch cond to enter scalar epilogue in VPlan.
Needs ReviewPublic

Authored by fhahn on May 11 2023, 2:18 PM.

Details

Reviewers
Ayal
gilr
rengolin
Summary

This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. It introduces a new ICmpEQ opcode to VPInstruction and a
new live-out that just updates the branch condition with a given value.
This requires introducing subclassess for VPLiveOut.

For now this is still WIP as the ICmpEQ codegen isn't general yet and
most VPlan printing tests need updating. I'll do that once we converge
on the overall design. Also currently we keep all live-outs in a map
with phi nodes, which doesn't make sense for the new live-out.

Diff Detail

Event Timeline

fhahn created this revision.May 11 2023, 2:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 2:18 PM
fhahn requested review of this revision.May 11 2023, 2:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 2:18 PM
Ayal added inline comments.May 15 2023, 4:42 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3686

(Independent of this patch): builder insertion point should better be used for recipe execution only (and then unset), rather than passing MiddleBlock with it for live-out fixing, which should arguably take care of themselves.

3688

LiveOuts "fix" existing IR Instructions, rather than "execute" - generate new IR Instructions.

3830

(nit: LiveOuts should be a small vector of VPLiveOutPhi's in this case.)

(Independent of this patch): Above should be handled by recipe(s) and below by fixPhi().

8812

While we're here, MiddleVPBB is unused (relates to patch D123537 rather than this patch).

9076

This compare is mandatory (where needed) - should it be introduced as part of building the initial VPlan rather than here as a VPlan2VPlan transformation? In any case, probably better to outline out of the excessive tryToBuildVPlanWithVPRecipes().

9092

This is a first population of MiddleVPBB with recipes, following its introduction in D123457, right?

Presumably, all instructions of MiddleBlock should eventually be generated by VPlan recipes/VPInstructions.

The conditional branch whose condition will be backpatched to Cmp is generated by createVectorLoopSkeleton() as part of the vectorization process, rather than being an original Instruction that can be wrapped with a VPLiveOut - like the LCSSA Phi's of the Exit block.
Could this conditional branch terminating MiddleVPBB be generated as a recipe too (w/o any further live-out users), along with moving the generation of its compare from skeleton to VPlan, rather than fixing it via LiveOutBr? I.e., extend/have a forward BranchOnCond as mentioned when introduced in D126618 to represent loop/backward branches, and similar to BranchOnMask. Following VPlan's approach of rewiring branches when visiting their successors, the two successors of MiddleBlock (Exit block and ScalarPreHeader/VectorEpilogTripCountCheck) could be wrapped in VPBB's, similar to the current wrapping of MiddleVPBB (only) to rewire the vector loop latch branch. These two successors should reuse pre-built IRBB's, similar to MiddleVPBB, which in turn can have VPlan generate its IRBB instead of createVectorLoopSkeleton().

Other skeletal parts, including bypass basic blocks between Preheader and VecPreheader, will also need similar forward out-of-loop conditional branch recipes, when migrated to VPlan.

WDYT?

9093

Note that this accommodates only a single VPLiveOutBr due to insertion with null key into a unique map. Would be good to get rid of that map anyhow.

llvm/lib/Transforms/Vectorize/VPlan.cpp
461

Retain comment?

463

Worth a comment, setting builder to execute recipes at (the end of) "this" MiddleVPBB?

809

(Independent of this patch:) Note that this may be a confusing choice of names: CFG.ExitBB is set to "MiddleBlock", which is the new Exit-from-loop block, as opposed to the original Exit-from-loop-block(s), aka "Exit".

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
165

(Independent of this patch) Better to retrieve
BasicBlock *FromBB = State->CFG.VPBB2IRBB[FromVPBB];
than to rely on State.Builder to be set to MiddleBB,
where the LiveOut is initialized with FromVPBB = MiddleVPBB - conceptually as another operand, VPBB2IRBB[] analogous to State.get().

270

This ICmpEQ is designed to be placed and executed out-of-loop, assert Part is zero?

275

Builder used to indicate insertion block rather than point? Builder.CreateICmpEQ()?

fhahn updated this revision to Diff 551753.Aug 19 2023, 7:49 AM
fhahn marked 13 inline comments as done.

Address latest comments, thanks!

I reworked the patch to use a BranchOnCond VPInstruction to handle the branch in MiddleVPBB and undid the changes to live-outs. At the moment, this requires updating the branch recipe's operands after the skeleton (and the exit and scalar preheader blocks) have been created. There's FIXMEs to model creation of those blocks in VPlan directly as well.

This should compose nicely with tail-folding as VPlan-to-VPlan transform (D157713); once the transformation has been applied, the conditional branch can be replaced by an unconditional one.

fhahn added inline comments.Aug 19 2023, 7:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3686

This can be removed after this patch lands (more specifically after the insert point is set in VPBasicBlock::execute), but reorders some extracts, so would be better as a follow-up I think.

3688

(nit: LiveOuts should be a small vector of VPLiveOutPhi's in this case.)

3830

(nit: LiveOuts should be a small vector of VPLiveOutPhi's in this case.)

Code got removed.

8812

Removed in 9ee4a740e326, thanks!

9092

Updated to use BranchOnCond; as BasicBlock is a subclass of Value it is possible to wrap them in live-in VPValues

9093

Code got removed.

llvm/lib/Transforms/Vectorize/VPlan.cpp
461

Retained, thanks!

463

We need to set the insert point in both cases, sunk outside if.

809

Yes, should be updated to MiddleBlock for now? I think once the current patch is in, we could also handle exit block and scalar.ph creation in VPlan, to remove the need to create branches up front.

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
165

Updated here ada2a455fca4 to use the middle VPBB from VPlan for now, as this is what should be used implicitly

270

The modeling here will need future refinement. At the moment all recipes are executed UF times I think. As discussed at a different patch, I am working on patches to allow explicit cloning of recipes to eliminate the implicit dependence on UF.

275

Split off refactoring to use Builder in old skeleton construction to avoid unrelated changes in the diff: the builder will simplify ICmps with constant operands D158332. Should be able to perform this simplification in VPlan directly eventually.

fhahn added inline comments.Aug 19 2023, 7:52 AM
llvm/test/Transforms/LoopVectorize/AArch64/fixed-order-recurrence.ll
54

Note that the final branch is now created later than before, which means the block use list order is slightly different, resulting in different order in the incoming block here (and in many other tests)

Ayal added a comment.Aug 25 2023, 4:26 AM

Further thoughts how to best stage this.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3067

nit: LoopMiddleBlock will end with a branch to LoopScalarPreHeader, following how the latter is constructed here, so it's redundant to replace this branch below with BrInst if a scalar epilog is not required.

3081

Can this FIXME be addressed so that VPlan execution will take care of setting the successors of this branch, naturally?

In addition to moving the generation of ICmpEQ from ILV::completeLoopSkeleton() to a VPInstruction introduced in VPlan::createInitialVPlan(), can the generation of LoopMiddleBlock and LoopScalarPreHeader block move from ILV::createVectorLoopSkeleton() to VPlan? Can both be applied as a VPlan transform, perhaps alongside optimizeForVFAndUF() - knowing VF*UF may help determine if it divides TC and thus middle block should jump to exit unconditionally (or have loop latch jump to exit directly). ILV will then detect the middle block and scalar loop pre-header built by VPlan and set its LoopMiddleBlock and LoopScalarPreHeader variables.

7659

A VPBB which is invariantly associated with an existing IRBB should perhaps record this association when created, rather than via State.CFG.PrevBB/ExitBB. This includes the (pre-)pre-header, exit block, and potentially original loop blocks from header to latch. Such VPBB's facilitate hooking VPlan's boundary to existing IR, injecting recipes into existing blocks, reusing original loop for default and/or remainder loops, and as a basis to clone and optimize it. Such VPBB's may contain an "in-place" recipe that represents the existing IR Instructions (w/o generating them/anything during execute()), to model their live-ins and live-outs, and as a placeholder for injecting other recipes before/after. The existence of such a recipe implies that its VPBB is "in-place", so the associated IRBB could be retrieved from it.

8876

Here 2) conceptually includes both "tail is to be folded" and "there is no tail", i.e., when VF*UF is known to divide TC.
Can start with one "canonical" form, for arbitrary VF,UF and un/folded tail, and later revise VPlan when fixing VF,UF and (un)folding tail.

The three cases of (1) jump to scalar preheader, (2) jump to exit, (3) jump to either scalar preheader or exit, are compressed here into a single boolean RequiresScalarEpilogueCheck?

9092

Better take care of branches as recipes along with taking care of their destinations as VPBB's, consistently, even if it's possible to smuggle IRBB destinations as underlying values of live-in VPValues.

llvm/lib/Transforms/Vectorize/VPlan.cpp
405

Could the following logic of hooking up forwards branches when executing their target basic blocks be employed to backpatch the successors of the Middle block's branch targeting the exit block and/or scalar preheader? Or rather the complementary logic in VPBB::execute for IR BB's that are reused (and rewire branches to them there) instead of created here.

595

The term Exiting block typically means the first case only, i.e., a block inside a loop which can exit the loop (to an Exit block). The proposed change effectively checks if this VPBB has no successors?

llvm/lib/Transforms/Vectorize/VPlan.h
1000

nit: try to keep in lex order?

Should the various condition bit Predicates be provided by VPRecipeWithIRFlags as yet another set of flags, having a derived VPInstructions with ICmp and FCmp opcodes?

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
273

nit: ICmpEQ, and ICmpULE above, should name their operands A and B as in BinOp, rather than IV and TC.

fhahn marked 5 inline comments as done.Aug 25 2023, 1:27 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3081

Yes I think so, but it will possibly require some extra logic to update the dominator tree. In terms of ordering of changes, I think we would need to first move out the branch to be modeled in VPlan, before moving the block creating itself.

7659

PrevBB is used during execution to figure out if blocks can be re-used (mostly around replicate regions now we have VPlan based merging).

For the others, recording the association explicitly sounds good, but maybe even better to try to replace their uses by moving the logic to VPlan explicitly (e.g how to connect them).

An in-place recipe could also help gradually moving parts of the skeleton into VPlan. IIUC this is mostly a. suggestion for future improvements correct?

8876

This initially tries to just follow the existing logic, but can be broken down further in future patches.

llvm/lib/Transforms/Vectorize/VPlan.cpp
405

Hmm, one issue is that we would need to rely on the IR to figure that out, which means we need to rely on the branch in the middle block implicitly here. The back-patching shouldn't be needed once the successors are also modeled in VPlan (and middle-block is created directly in VPlan). WDYT?

595

I think this check is currently only used to check if the block is supposed to have a conditional terminator, so maybe the function should be renamed to reflect that?

llvm/lib/Transforms/Vectorize/VPlan.h
1000

Yes I have been thinking something along the same lines. It might stretch the IRFlags a bit but should work.

Ayal added inline comments.Aug 27 2023, 2:56 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3081

Moving out the branch first, requires a distinct way of hooking up the successors. Moving out the blocks first - even if disconnected at first - as in the pre-pre-header case, could potentially be done initially while keeping the branch creation outside VPlan - as a post-processing, followed by moving out the branch to a recipe - which would be simpler having prepared its successors?

7659

PrevBB is used during execution to figure out if blocks can be re-used (mostly around replicate regions now we have VPlan based merging).

Agreed, and once replicate regions get unrolled as a VPlan-to-VPlan transform, their entry and exit VPBB's should be merged in VPlan as well.

For the others, recording the association explicitly sounds good, but maybe even better to try to replace their uses by moving the logic to VPlan explicitly (e.g how to connect them).

Hmm, intrigued by an even better option - when a VPBB executes it needs to figure out whether to create a new IRBB for its recipes or whether the latter are to populate an existing IRBB?

An in-place recipe could also help gradually moving parts of the skeleton into VPlan. IIUC this is mostly a. suggestion for future improvements correct?

Yes, possibly. Raised here to consider various pieces in order to help converge on an appropriate order of introducing them.

8876

Following existing logic first is fine. Complement the above explanation about the three options with how they relate to the boolean flag below, for now?

llvm/lib/Transforms/Vectorize/VPlan.cpp
405

By backpatching I meant VPlan's existing mechanism of first generating the forward IR branch (before its IR successors are generated/visited) and then revisit the branch to set its destinations once the IR successors are generated/visited. Suggestion is indeed to model the successors in VPlan so that this mechanism could be employed.

595

If this is the desired semantics, naming should indeed reflect it, but BranchOnCond implies a conditional terminator. Question is if a destination goes backwards - in which case it is hooked up when the branch is generated, or goes forwards - in which case it is hooked up when the destination is processed (after processing the branch). Currently BranchOnCond's that have backwards destinations are loop control branches having a backwards destination targeting the header block and a forwards destination targeting the exit/middle block.

809

Or perhaps handle these blocks as VPBB's first, as discussed in other comments here.