This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Create header & latch blocks for plan skeleton up front (NFC).
ClosedPublic

Authored by fhahn on Dec 15 2021, 3:41 AM.

Details

Summary

By creating the header and latch blocks up front and adding blocks and
recipes in between those 2 blocks we ensure that the entry and exits of
the plan remain valid throughout construction.

In order to avoid test changes and keep printing of the plans the same,
we use the new header block instead of creating a new block on the first
iteration of the loop traversing the original loop.

We also fold the latch into its predecessor.

This is a follow up to a post-commit suggestion in D114586.

Diff Detail

Event Timeline

fhahn created this revision.Dec 15 2021, 3:41 AM
fhahn requested review of this revision.Dec 15 2021, 3:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2021, 3:41 AM
Herald added a subscriber: vkmr. · View Herald Transcript
fhahn updated this revision to Diff 394807.Dec 16 2021, 3:10 AM

Use early_inc_range when moving recipes.

Ayal added inline comments.Dec 16 2021, 5:05 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8808

Perhaps something like VPBlockUtils::insertBlockOnEdge() would help take care of ensuring single successor, disconnecting, reconnecting? Admittedly two blocks are being inserted here.

9031

Feed Entry and Exit to constructor of VPRegionBlock instead of setting them explicitly?
Feed Entry to constructor of VPlan instead of setting it explicitly?

9047

Can be simplified into:

if (FillHeaderVPBB)
  FillHeaderVPBB = false;
else {
  auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
  VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB);
  VPBB = FirstVPBBForBB;
}

?

(Always generating a new VPBB and fusing the empty "dummy" header later, along with fusing latch below, would be reverting D111299?)

llvm/lib/Transforms/Vectorize/VPlan.h
2362

Something early_inc_range could handle?

fhahn updated this revision to Diff 394851.Dec 16 2021, 6:32 AM

Address comments, thanks!

fhahn marked 4 inline comments as done.Dec 16 2021, 6:38 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8808

I left it as is for now, because multiple blocks are added and connected here.

9031

Thanks, updated!

9047

Simplified,thanks!

(Always generating a new VPBB and fusing the empty "dummy" header later, along with fusing latch below, would be reverting D111299?)

Perhaps partly reverting it. But in the current patch the header VPBB is actually used, whereas pre-D111299 this 'dummy pre-entry' was not used for anything.

llvm/lib/Transforms/Vectorize/VPlan.h
2362

Unfortunately I don't think early_inc_range will work here, because the successors are managed in a SmallVector, so the early incremented iterator may be invalid after disconnectBlocks removes an entry from the vector. (It *might* work when reversing the iteration order and early increment, but it seems to me that using a SMallVector here is safer for now.

I added a VPBlockBase::successors() helper that returns an iterator range, to avoid the ugly SmallVector construction. This makes some existing use also nicer.

Ayal added inline comments.Dec 16 2021, 11:12 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9047

(Always generating a new VPBB and ...

On second thought, header phi recipes should be placed in the header VPBB, which may be awkward if current "VPBB" moved on.
Another alternative is mentioned below, which always generates the next new VPBB.

9113–9116

Another alternative to FillHeaderVPBB above, is to prepare here for the next iteration by doing:

auto *NextVPBB = new VPBasicBlock();
VPBlockUtils::insertBlockAfter(NextVPBB, VPBB);
VPBB = NextVPBB;

The start of each iteration needs to only VPBB->setName(BB->getName()) which also takes care of the header block; after exiting the loop we can get rid of the last empty VPBB, possibly as part of fusing the latch.

9297

Would be good to avoid further bloating this excessive method. Could this folding be outlined, or should it be applied only to selected VPlan prior to code-gen rather than every VPlan upon construction? Doesn't VPBasicBlock::execute() effectively fold Exit into its predecessor if possible?

llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358

Would be good to update above documentation

  • condition bit is propagated? to NewBlock?
  • disconnects BlockPtr from all its successors and connects it with NewBlock as its successor
2360–2361

Also assert NewBlock has no predecessors, as documented above?

2361–2371

While we're here, connectBlocks(BlockPtr, NewBlock); ?

2362

Very well.

fhahn updated this revision to Diff 394952.Dec 16 2021, 12:09 PM
fhahn marked 4 inline comments as done.

addressed latest comments, thanks!

fhahn marked 4 inline comments as done.Dec 16 2021, 12:15 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9113–9116

Thanks, updated to use this approach, but with a small extra to avoid generating a redundant empty block in the last iteration.

Alternatively VPBlockUtils::tryToMergeBlockIntoPredecessor could be used, if you prefer.

9297

I added a new helper tryToMergeBlockIntoPredecessor.

. Could this folding be outlined, or should it be applied only to selected VPlan prior to code-gen rather than every VPlan upon construction? Doesn't VPBasicBlock::execute() effectively fold Exit into its predecessor if possible?

The only reason to do this here is to avoid polluting the VPlan printing with additional redundant blocks. We could keep the extra block, but it would require updating all tests that check a printed VPlan and also bloats the plans we need to check in general. WDYT?

llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358

Should be updated, cond bits are now propagated (and removed from BlockPtr ) and moving successors is mentioned.

2360–2361

added, thanks

Ayal added inline comments.Dec 16 2021, 1:57 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9030

Note that setting the name of HeaderVPBB here became redundant. So is setting next.vpbb below.

9113–9116

Perhaps early-exit, combine with the bump, and count backwards:

if (--NumBBsToProcess == 0)
  continue/break;
...

It seemed simpler to avoid the counting and checking by paying for a redundant empty block at the end; it indeed could be cleaned up using tryToMergeBlockIntoPredecessor (thanks for outlining!), although it needs only a disconnect and delete.
Pick whichever version you prefer.

9297

Ah, sure, let's clean up VPlans upon construction then.

llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358

Thanks!
"If \p BlockPtr has more than one successor ..."?

2370

(These condition bits seem to be poorly tested...)

2419

dyn_cast_or_null, or must Block have a single predecessor?

2429

Update Block->getParent()'s Exit if it exists and is equal to Block?

fhahn updated this revision to Diff 395145.Dec 17 2021, 8:53 AM
fhahn marked 6 inline comments as done.

Address comments and properly transfer successors from folded block.

fhahn marked an inline comment as done.Dec 17 2021, 8:55 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9030

I removed it for HeaderVPBB both.

9113–9116

updated to just fold the block afterwards.

llvm/lib/Transforms/Vectorize/VPlan.h
2370

yes, they are only used in the native path and it looks like the function is not used on blocks with condbits. Not sure if we can really improve that.

2419

updated to dyn_cast_or_null.

2429

updated, thanks!

Ayal added inline comments.Dec 18 2021, 3:03 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9288

Should this folding of the latch be done right after folding the empty last VPBB above, or better do so here after dumping Plan.

llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358

Drop "If \p BlockPtr has more than one successor ..."? This method moves all successors of BlockPtr to be successors of NewBlock, also when this involves a single successor.

2421

!PredVPBB->getSingleSuccessor() - suffice to check if PredVPBB->getNumSuccessors() != 1

2427

Either cast (non-dyn) if Block must be in a Region, or check if (ParentRegion && ParentRegion->getExit() == Block)

fhahn updated this revision to Diff 395443.Dec 20 2021, 7:29 AM
fhahn marked 5 inline comments as done.

Address latest comments, thanks!

fhahn added inline comments.Dec 20 2021, 7:31 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9288

I don't think so. I think main motivation is to have the latch/exit block separately so sink-after & other transforms have do not have to explicitly updated the exit block.

llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358

Removed, thanks!

2427

For now, it should be safe to use cast directly, Updated, thanks!

Ayal added a comment.Dec 21 2021, 12:57 AM

Patch is ready to go in, with a comment explaining why the empty pre-latch VPBB is merged early and the latch VPBB is merged late.

Suggestions to merge both blocks early, and other comments added inline, may be further discussed in follow-up patch(es) if preferred, or in this one.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9117

Can alternatively do:

VPBlockUtils::insertBlockAfter(new VPBasicBlock(), VPBB);
VPBB = VPBB->getSingleSuccessor();

or have insertBlockAfter() return the block it inserted and have
VPBB = VPBlockUtils::insertBlockAfter(new VPBasicBlock(), VPBB);
(in this or separate patch)

9124

Perhaps tryToMergeBlockIntoPredecessor() should return the predecessor it merged into if successful (null otherwise), to support VPBB = VPBlockUtils::tryToMergeBlockIntoPredecessor(VPBB);

But probably better to avoid updating VPBB at all (tryToMerge already takes care of updating Exit if needed) - see below.

9193–9194

This updating of VPBB is essentially trying to maintain the last BB, i.e., the Latch, i.e., Exit. Would be good to use VPBB only during initial VPlan construction, and refer to Exit instead of VPBB afterwards when seeking the latch.

9205–9206

The use of VPBB here suggests that Exit be used instead, but does it use/rely on a BB preceding the latch?

9288

There are two motivations: (1) set the Exit when its region is created, (2) designate a unique BB for the Exit.
The first provides a stable region, allowing recipes to be placed in the latch during construction if needed, and resolves the issue of when to set Exit; we know a region must have an Exit, may as well have one at the outset.
The second facilitates introducing blocks internal to the loop, between header and latch, requiring only a disconnect and reconnects rather than splitting a block. This is most useful during initial VPlan construction, but seems useless for later transformations - which in general may need to split blocks.

Note that sink-after (splits blocks and) already explicitly updates the "last" BB, see comment above, so may as well have it update the Exit block instead. Would be good to have splitAt() update the enclosing Region's Exit if needed, as well, similar to tryToMergeBlockIntoPredecessor().

fhahn updated this revision to Diff 395681.Dec 21 2021, 7:52 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.Dec 21 2021, 7:53 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9117

Updated to use the first alternative, thanks!

9124

But probably better to avoid updating VPBB at all (tryToMerge already takes care of updating Exit if needed) - see below.

Sounds good! The only real later use was adjustRecipesForReductions. It expects the latch block, so LatchVPBB can be passed instead. This required a fix to ensure that VPWidenCanonicalIVRecipe is always inserted in the header: 1a54889f48fa

9193–9194

Removed!

9205–9206

adjustRecipesForReductions expects the latch block (the argument is named accordingly), so LatchVPBB can be used directly..

9288

Note that sink-after (splits blocks and) already explicitly updates the "last" BB, see comment above, so may as well have it update the Exit block instead. Would be good to have splitAt() update the enclosing Region's Exit if needed, as well, similar to tryToMergeBlockIntoPredecessor().

Sounds like a good follow-up. I'd prefer to keep the folding late for now, as there is a verifier error when it is folded earlier and I'd prefer to fix that separately.

Ayal accepted this revision.Dec 21 2021, 1:26 PM

Looks good to me!
Would be good to add some (TODO?) comment explaining why Latch is merged with its predecessor late.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9124

Good to have spotted and fixed always inserting VPWidenCanonicalIVRecipe in the header! Posted a couple of post-commit comments there. This fix led to the couple of test changes below reordering ule compares?

This revision is now accepted and ready to land.Dec 21 2021, 1:26 PM
This revision was landed with ongoing or failed builds.Dec 22 2021, 4:45 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 2 inline comments as done.