This is an archive of the discontinued LLVM Phabricator instance.

Differential D115793

[VPlan] Create header & latch blocks for plan skeleton up front (NFC).
ClosedPublic

Authored by fhahn on Dec 15 2021, 3:41 AM.

Details

Reviewers

rengolin
Ayal
gilr

Commits

rGede7c2438f39: [VPlan] Create header & latch blocks for skeleton up front (NFC).

Summary

By creating the header and latch blocks up front and adding blocks and
recipes in between those two blocks, we ensure that the entry and exit of
the plan remain valid throughout construction.

In order to avoid test changes and keep printing of the plans the same,
we use the new header block instead of creating a new block on the first
iteration of the loop traversing the original loop.

We also fold the latch into its predecessor.

This is a follow up to a post-commit suggestion in D114586.
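
The idea in the summary can be illustrated with a toy model. The sketch below is not LLVM code: `ToyBlock`, `ToyRegion`, `create`, `insertAfter`, and `order` are hypothetical stand-ins (loosely modelled on `VPBasicBlock`, `VPRegionBlock`, and `VPBlockUtils::insertBlockAfter`), and it assumes a purely linear chain of blocks. It only shows that, once header and latch exist and are linked, inserting blocks between them never invalidates the region's entry or exit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPBasicBlock: a name and one successor link.
struct ToyBlock {
  std::string Name;
  ToyBlock *Succ = nullptr;
};

// Hypothetical stand-in for the top-level region. Entry (header) and Exit
// (latch) are created and linked in the constructor, before any other block.
struct ToyRegion {
  std::vector<std::unique_ptr<ToyBlock>> Storage; // owns every block
  ToyBlock *Entry = nullptr;
  ToyBlock *Exit = nullptr;

  ToyRegion() {
    Entry = create("header");
    Exit = create("latch");
    Entry->Succ = Exit;
  }

  ToyBlock *create(std::string Name) {
    Storage.push_back(std::make_unique<ToyBlock>());
    Storage.back()->Name = std::move(Name);
    return Storage.back().get();
  }

  // Insert a new block directly after Pos. Pos's old successor becomes the
  // new block's successor, so the latch stays last in the chain.
  ToyBlock *insertAfter(ToyBlock *Pos, std::string Name) {
    ToyBlock *New = create(std::move(Name));
    New->Succ = Pos->Succ;
    Pos->Succ = New;
    return New;
  }

  // Block names in chain order, starting at Entry.
  std::vector<std::string> order() const {
    std::vector<std::string> Names;
    for (const ToyBlock *B = Entry; B; B = B->Succ)
      Names.push_back(B->Name);
    return Names;
  }
};
```

A caller would keep a cursor starting at `Entry` and call `insertAfter` repeatedly. `Exit` never needs to be set after the fact.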

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision.Dec 15 2021, 3:41 AM

Herald added subscribers: tschuett, psnobl, rogfer01 and 2 others.Dec 15 2021, 3:41 AM

fhahn requested review of this revision.Dec 15 2021, 3:41 AM

Herald added a project: Restricted Project.Dec 15 2021, 3:41 AM

Herald added a subscriber: vkmr.

Harbormaster completed remote builds in B139404: Diff 394516.Dec 15 2021, 4:23 AM

Use early_inc_range when moving recipes.

fhahn mentioned this in D113183: [LV] Patch up induction phis after VPlan execution..Dec 16 2021, 3:13 AM

Harbormaster completed remote builds in B139622: Diff 394807.Dec 16 2021, 3:14 AM

Ayal added inline comments.Dec 16 2021, 5:05 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8808	Perhaps something like VPBlockUtils::insertBlockOnEdge() would help take care of ensuring single successor, disconnecting, reconnecting? Admittedly two blocks are being inserted here.
9031	Feed Entry and Exit to constructor of VPRegionBlock instead of setting them explicitly? Feed Entry to constructor of VPlan instead of setting it explicitly?
9047	Can be simplified into: if (FillHeaderVPBB) FillHeaderVPBB = false; else { auto *FirstVPBBForBB = new VPBasicBlock(BB->getName()); VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB); VPBB = FirstVPBBForBB; } ? (Always generating a new VPBB and fusing the empty "dummy" header later, along with fusing latch below, would be reverting D111299?)
llvm/lib/Transforms/Vectorize/VPlan.h
2362	Something early_inc_range could handle?

fhahn mentioned this in rG3b35113ff096: [VPlan] Add VPBlockBase::successors() returning an iterator_range (NFC)..Dec 16 2021, 6:31 AM

Address comments, thanks!

fhahn marked 4 inline comments as done.Dec 16 2021, 6:38 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8808	I left it as is for now, because multiple blocks are added and connected here.
9031	Thanks, updated!
9047	Simplified, thanks! Regarding "(Always generating a new VPBB and fusing the empty "dummy" header later, along with fusing latch below, would be reverting D111299?)": perhaps partly reverting it. But in the current patch the header VPBB is actually used, whereas pre-D111299 this 'dummy pre-entry' was not used for anything.
llvm/lib/Transforms/Vectorize/VPlan.h
2362	Unfortunately I don't think `early_inc_range` will work here, because the successors are managed in a SmallVector, so the early-incremented iterator may be invalid after `disconnectBlocks` removes an entry from the vector. (It might work when reversing the iteration order and incrementing early, but it seems to me that using a SmallVector here is safer for now.) I added a `VPBlockBase::successors()` helper that returns an iterator range, to avoid the ugly SmallVector construction. This also makes some existing uses nicer.
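
The invalidation concern in the reply above can be shown with a small stand-alone sketch. It is not the VPlan code: `Node`, `connectNodes`, `disconnectNodes`, and `moveSuccessors` are hypothetical names, and `std::vector` stands in for `SmallVector`. The point is that removing an element from the successor vector while iterating over that same vector invalidates the iterator, so the loop walks a snapshot copy instead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical graph node with successor and predecessor lists.
struct Node {
  std::vector<Node *> Succs;
  std::vector<Node *> Preds;
};

void connectNodes(Node *From, Node *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Erases To from From's successors (and From from To's predecessors).
// The erase invalidates iterators into From->Succs at or after the erased slot.
void disconnectNodes(Node *From, Node *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Moves every successor of Old over to New. Iterating Old->Succs directly
// while disconnecting would be undefined behaviour, so a copy is walked.
void moveSuccessors(Node *Old, Node *New) {
  std::vector<Node *> Snapshot = Old->Succs; // copy taken before mutation
  for (Node *Succ : Snapshot) {
    disconnectNodes(Old, Succ);
    connectNodes(New, Succ);
  }
}
```

The copy costs one small allocation, in exchange for a loop whose correctness does not depend on how the container handles erasure.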

Harbormaster completed remote builds in B139651: Diff 394851.Dec 16 2021, 6:42 AM

Ayal added inline comments.Dec 16 2021, 11:12 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9047	Regarding "(Always generating a new VPBB and ...": on second thought, header phi recipes should be placed in the header VPBB, which may be awkward if the current "VPBB" has moved on. Another alternative, mentioned below, always generates the next new VPBB.
9113–9116	Another alternative to FillHeaderVPBB above, is to prepare here for the next iteration by doing: auto *NextVPBB = new VPBasicBlock(); VPBlockUtils::insertBlockAfter(NextVPBB, VPBB); VPBB = NextVPBB; The start of each iteration needs to only `VPBB->setName(BB->getName())` which also takes care of the header block; after exiting the loop we can get rid of the last empty VPBB, possibly as part of fusing the latch.
9297	Would be good to avoid further bloating this excessive method. Could this folding be outlined, or should it be applied only to selected VPlan prior to code-gen rather than every VPlan upon construction? Doesn't VPBasicBlock::execute() effectively fold Exit into its predecessor if possible?
llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358	Would be good to update the documentation above: is the condition bit propagated to NewBlock? The method disconnects BlockPtr from all its successors and connects it with NewBlock as its successor.
2360–2361	Also assert NewBlock has no predecessors, as documented above?
2361–2371	While we're here, connectBlocks(BlockPtr, NewBlock); ?
2362	Very well.
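
The "prepare the next block at the end of each iteration" alternative suggested for lines 9113–9116 can be sketched as follows. This is a toy model under stated assumptions, not the LoopVectorize code: `Blk` and `buildSkeleton` are hypothetical, the chain is strictly linear, and the trailing empty block is simply unlinked rather than merged through a helper.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical doubly linked block; stands in for a VPBasicBlock.
struct Blk {
  std::string Name;
  Blk *Pred = nullptr;
  Blk *Succ = nullptr;
};

// Builds header -> ... -> latch for the given original block names and
// returns the resulting block names in chain order.
std::vector<std::string> buildSkeleton(const std::vector<std::string> &BBNames) {
  std::vector<std::unique_ptr<Blk>> Storage; // owns all blocks
  auto Create = [&Storage](const std::string &Name) {
    Storage.push_back(std::make_unique<Blk>());
    Storage.back()->Name = Name;
    return Storage.back().get();
  };
  auto InsertAfter = [](Blk *New, Blk *Pos) {
    New->Pred = Pos;
    New->Succ = Pos->Succ;
    if (Pos->Succ)
      Pos->Succ->Pred = New;
    Pos->Succ = New;
  };

  // Skeleton first: unnamed header followed by the latch.
  Blk *Header = Create("");
  Blk *Latch = Create("vector.latch");
  InsertAfter(Latch, Header);

  Blk *Cur = Header;
  for (const std::string &Name : BBNames) {
    // Start of iteration: only a rename is needed, also for the header.
    Cur->Name = Name;
    // (Recipes for this block would be created here.)
    // End of iteration: prepare the block for the next iteration.
    Blk *Next = Create("");
    InsertAfter(Next, Cur);
    Cur = Next;
  }

  // The block prepared by the last iteration is empty; drop it from the chain.
  if (Cur != Header) {
    Cur->Pred->Succ = Cur->Succ;
    Cur->Succ->Pred = Cur->Pred;
  }

  std::vector<std::string> Names;
  for (const Blk *B = Header; B; B = B->Succ)
    Names.push_back(B->Name);
  return Names;
}
```

For the input `{"for.body", "for.inc"}` the chain ends up as `for.body`, `for.inc`, `vector.latch`: the header takes the first name and no special first-iteration flag is needed.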

addressed latest comments, thanks!

fhahn marked 4 inline comments as done.Dec 16 2021, 12:15 PM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9113–9116	Thanks, updated to use this approach, but with a small extra to avoid generating a redundant empty block in the last iteration. Alternatively `VPBlockUtils::tryToMergeBlockIntoPredecessor` could be used, if you prefer.
9297	I added a new helper, `tryToMergeBlockIntoPredecessor`. Regarding "Could this folding be outlined, or should it be applied only to selected VPlan prior to code-gen rather than every VPlan upon construction? Doesn't VPBasicBlock::execute() effectively fold Exit into its predecessor if possible?": the only reason to do this here is to avoid polluting the VPlan printing with additional redundant blocks. We could keep the extra block, but it would require updating all tests that check a printed VPlan and would also bloat the plans we need to check in general. WDYT?
llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358	Should be updated now: cond bits are now propagated (and removed from BlockPtr), and moving successors is mentioned.
2360–2361	added, thanks

Harbormaster completed remote builds in B139715: Diff 394952.Dec 16 2021, 1:44 PM

Ayal added inline comments.Dec 16 2021, 1:57 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9030	Note that setting the name of HeaderVPBB here became redundant. So is setting next.vpbb below.
9113–9116	Perhaps early-exit, combine with the bump, and count backwards: if (--NumBBsToProcess == 0) continue/break; ... It seemed simpler to avoid the counting and checking by paying for a redundant empty block at the end; it indeed could be cleaned up using tryToMergeBlockIntoPredecessor (thanks for outlining!), although it needs only a disconnect and delete. Pick whichever version you prefer.
9297	Ah, sure, let's clean up VPlans upon construction then.
llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358	Thanks! "If \p BlockPtr has more than one successor ..."?
2370	(These condition bits seem to be poorly tested...)
2419	dyn_cast_or_null, or must Block have a single predecessor?
2429	Update Block->getParent()'s Exit if it exists and is equal to Block?

Address comments and properly transfer successors from folded block.

fhahn marked an inline comment as done.Dec 17 2021, 8:55 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9030	I removed it for both HeaderVPBB and next.vpbb.
9113–9116	updated to just fold the block afterwards.
llvm/lib/Transforms/Vectorize/VPlan.h
2370	yes, they are only used in the native path and it looks like the function is not used on blocks with condbits. Not sure if we can really improve that.
2419	updated to dyn_cast_or_null.
2429	updated, thanks!

Harbormaster completed remote builds in B139857: Diff 395145.Dec 17 2021, 9:52 AM

Ayal added inline comments.Dec 18 2021, 3:03 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9288	Should this folding of the latch be done right after folding the empty last VPBB above, or better do so here after dumping Plan.
llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358	Drop "If \p BlockPtr has more than one successor ..."? This method moves all successors of BlockPtr to be successors of NewBlock, also when this involves a single successor.
2421	`!PredVPBB->getSingleSuccessor()` - it suffices to check if `PredVPBB->getNumSuccessors() != 1`
2427	Either cast (non-dyn) if Block must be in a Region, or check `if (ParentRegion && ParentRegion->getExit() == Block)`
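
The checks discussed for this helper (single predecessor, predecessor with exactly one successor, and updating the enclosing region's Exit) can be sketched on a toy model. This is not the actual `VPBlockUtils::tryToMergeBlockIntoPredecessor`: `MergeBlock`, `MergeRegion`, and `tryToMergeIntoPredecessor` are hypothetical, recipes are plain strings, and the merged block is left for the caller to free.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct MergeRegion;

// Hypothetical block: recipes as strings, explicit predecessor/successor lists.
struct MergeBlock {
  std::vector<std::string> Recipes;
  std::vector<MergeBlock *> Preds;
  std::vector<MergeBlock *> Succs;
  MergeRegion *Parent = nullptr;
};

struct MergeRegion {
  MergeBlock *Entry = nullptr;
  MergeBlock *Exit = nullptr;
};

// Tries to fold Block into its single predecessor. Returns the predecessor
// on success and nullptr if the merge is not possible.
MergeBlock *tryToMergeIntoPredecessor(MergeBlock *Block) {
  if (Block->Preds.size() != 1)
    return nullptr;
  MergeBlock *Pred = Block->Preds.front();
  if (Pred->Succs.size() != 1)
    return nullptr;

  // Append Block's recipes to the predecessor, preserving their order.
  Pred->Recipes.insert(Pred->Recipes.end(), Block->Recipes.begin(),
                       Block->Recipes.end());
  Block->Recipes.clear();

  // Transfer Block's successors to the predecessor.
  Pred->Succs = Block->Succs;
  for (MergeBlock *Succ : Pred->Succs)
    std::replace(Succ->Preds.begin(), Succ->Preds.end(), Block, Pred);
  Block->Succs.clear();
  Block->Preds.clear();

  // Keep the enclosing region's Exit valid if Block was the exit.
  if (Block->Parent && Block->Parent->Exit == Block)
    Block->Parent->Exit = Pred;
  return Pred;
}
```

Guarding the region update with a null check corresponds to the second option in the comment on line 2427. Using an unconditional cast instead would assume every block has a parent region.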

Address latest comments, thanks!

fhahn added inline comments.Dec 20 2021, 7:31 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9288	I don't think so. I think the main motivation is to have the latch/exit block separate, so that sink-after and other transforms do not have to explicitly update the exit block.
llvm/lib/Transforms/Vectorize/VPlan.h
2355–2358	Removed, thanks!
2427	For now, it should be safe to use `cast` directly. Updated, thanks!

Harbormaster completed remote builds in B140088: Diff 395443.Dec 20 2021, 8:33 AM

Patch is ready to go in, with a comment explaining why the empty pre-latch VPBB is merged early and the latch VPBB is merged late.

Suggestions to merge both blocks early, and other comments added inline, may be further discussed in follow-up patch(es) if preferred, or in this one.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9117	Can alternatively do: VPBlockUtils::insertBlockAfter(new VPBasicBlock(), VPBB); VPBB = VPBB->getSingleSuccessor(); or have insertBlockAfter() return the block it inserted and have `VPBB = VPBlockUtils::insertBlockAfter(new VPBasicBlock(), VPBB);` (in this or separate patch)
9124	Perhaps tryToMergeBlockIntoPredecessor() should return the predecessor it merged into if successful (null otherwise), to support VPBB = VPBlockUtils::tryToMergeBlockIntoPredecessor(VPBB); But probably better to avoid updating VPBB at all (tryToMerge already takes care of updating Exit if needed) - see below.
9193–9194	This updating of VPBB is essentially trying to maintain the last BB, i.e., the Latch, i.e., Exit. Would be good to use VPBB only during initial VPlan construction, and refer to Exit instead of VPBB afterwards when seeking the latch.
9205–9206	The use of VPBB here suggests that Exit be used instead, but does it use/rely on a BB preceding the latch?
9288	There are two motivations: (1) set the Exit when its region is created, (2) designate a unique BB for the Exit. The first provides a stable region, allowing recipes to be placed in the latch during construction if needed, and resolves the issue of when to set Exit; we know a region must have an Exit, may as well have one at the outset. The second facilitates introducing blocks internal to the loop, between header and latch, requiring only a disconnect and reconnects rather than splitting a block. This is most useful during initial VPlan construction, but seems useless for later transformations - which in general may need to split blocks. Note that sink-after (splits blocks and) already explicitly updates the "last" BB, see comment above, so may as well have it update the Exit block instead. Would be good to have splitAt() update the enclosing Region's Exit if needed, as well, similar to tryToMergeBlockIntoPredecessor().

Address latest comments, thanks!

fhahn marked an inline comment as done.Dec 21 2021, 7:53 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9117	Updated to use the first alternative, thanks!
9124	Regarding "But probably better to avoid updating VPBB at all (tryToMerge already takes care of updating Exit if needed) - see below.": sounds good! The only real later use was `adjustRecipesForReductions`. It expects the latch block, so `LatchVPBB` can be passed instead. This required a fix to ensure that `VPWidenCanonicalIVRecipe` is always inserted in the header: 1a54889f48fa
9193–9194	Removed!
9205–9206	`adjustRecipesForReductions` expects the latch block (the argument is named accordingly), so `LatchVPBB` can be used directly.
9288	Regarding "Note that sink-after (splits blocks and) already explicitly updates the "last" BB, see comment above, so may as well have it update the Exit block instead. Would be good to have splitAt() update the enclosing Region's Exit if needed, as well, similar to tryToMergeBlockIntoPredecessor().": sounds like a good follow-up. I'd prefer to keep the folding late for now, as there is a verifier error when it is folded earlier and I'd prefer to fix that separately.

fhahn mentioned this in D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable..Dec 21 2021, 8:10 AM

Harbormaster completed remote builds in B140259: Diff 395681.Dec 21 2021, 9:05 AM

fhahn added a child revision: D116123: [VPlan] Handle IV vector splat using VPWidenCanonicalIV..Dec 21 2021, 12:07 PM

fhahn added a child revision: D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable..

Ayal mentioned this in rG1a54889f48fa: [LV] Ensure WidenCanonicalIVRecipe is always created in header (NFC)..Dec 21 2021, 1:19 PM

Looks good to me!
Would be good to add some (TODO?) comment explaining why Latch is merged with its predecessor late.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9124	Good to have spotted and fixed always inserting VPWidenCanonicalIVRecipe in the header! Posted a couple of post-commit comments there. This fix led to the couple of test changes below reordering ule compares?

This revision is now accepted and ready to land.Dec 21 2021, 1:26 PM

This revision was landed with ongoing or failed builds.Dec 22 2021, 4:45 AM

Closed by commit rGede7c2438f39: [VPlan] Create header & latch blocks for skeleton up front (NFC). (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn marked 2 inline comments as done.

fhahn added a commit: rGede7c2438f39: [VPlan] Create header & latch blocks for skeleton up front (NFC)..

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	50 lines
llvm/lib/Transforms/Vectorize/VPlan.h	48 lines
llvm/test/Transforms/LoopVectorize/reduction-order.ll	4 lines
llvm/test/Transforms/LoopVectorize/select-reduction.ll	2 lines

Diff 395842

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 8,789 lines hidden @@ VPBasicBlock *VPRecipeBuilder::handleReplication(

   // Finalize the recipe for Instr, first if it is not predicated.
   if (!IsPredicated) {
     LLVM_DEBUG(dbgs() << "LV: Scalarizing:" << *I << "\n");
     VPBB->appendRecipe(Recipe);
     return VPBB;
   }
   LLVM_DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");
-  assert(VPBB->getSuccessors().empty() &&
-         "VPBB has successors when handling predicated replication.");
+  VPBlockBase *SingleSucc = VPBB->getSingleSuccessor();
+  assert(SingleSucc && "VPBB must have a single successor when handling "
+                       "predicated replication.");
+  VPBlockUtils::disconnectBlocks(VPBB, SingleSucc);
   // Record predicated instructions for above packing optimizations.
   VPBlockBase *Region = createReplicateRegion(I, Recipe, Plan);
   VPBlockUtils::insertBlockAfter(Region, VPBB);
   auto *RegSucc = new VPBasicBlock();
   VPBlockUtils::insertBlockAfter(RegSucc, Region);
+  VPBlockUtils::connectBlocks(RegSucc, SingleSucc);
   return RegSucc;
 }

 VPRegionBlock *VPRecipeBuilder::createReplicateRegion(Instruction *Instr,
                                                       VPRecipeBase *PredRecipe,
                                                       VPlanPtr &Plan) {
   // Instructions marked for predication are replicated and placed under an
   // if-then construct to prevent side-effects.
@@ 204 lines hidden @@ for (unsigned i = 0; i < IG->getFactor(); i++)
       RecipeBuilder.recordRecipeOf(Member);
   };

   // ---------------------------------------------------------------------------
   // Build initial VPlan: Scan the body of the loop in a topological order to
   // visit each basic block after having visited its predecessor basic blocks.
   // ---------------------------------------------------------------------------

-  auto Plan = std::make_unique<VPlan>();
+  // Create initial VPlan skeleton, with separate header and latch blocks.
+  VPBasicBlock *HeaderVPBB = new VPBasicBlock();
+  VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");
+  VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
+  auto *TopRegion = new VPRegionBlock(HeaderVPBB, LatchVPBB, "vector loop");
+  auto Plan = std::make_unique<VPlan>(TopRegion);

   // Scan the body of the loop in a topological order to visit each basic block
   // after having visited its predecessor basic blocks.
   LoopBlocksDFS DFS(OrigLoop);
   DFS.perform(LI);

-  VPBasicBlock *VPBB = nullptr;
-  VPBasicBlock *HeaderVPBB = nullptr;
+  VPBasicBlock *VPBB = HeaderVPBB;
   SmallVector<VPWidenIntOrFpInductionRecipe *> InductionsToMove;
   for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
     // Relevant instructions from basic block BB will be grouped into VPRecipe
     // ingredients and fill a new VPBasicBlock.
     unsigned VPBBsForBB = 0;
-    auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
-    if (VPBB)
-      VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB);
-    else {
-      auto *TopRegion = new VPRegionBlock("vector loop");
-      TopRegion->setEntry(FirstVPBBForBB);
-      Plan->setEntry(TopRegion);
-      HeaderVPBB = FirstVPBBForBB;
-    }
-    VPBB = FirstVPBBForBB;
+    VPBB->setName(BB->getName());
     Builder.setInsertPoint(VPBB);

     // Introduce each ingredient into VPlan.
     // TODO: Model and preserve debug instrinsics in VPlan.
     for (Instruction &I : BB->instructionsWithoutDebug()) {
       Instruction *Instr = &I;

       // First filter out irrelevant instructions, to ensure no recipes are
@@ 49 lines hidden @@ for (Instruction &I : BB->instructionsWithoutDebug()) {
       // replicated. This may create a successor for VPBB.
       VPBasicBlock *NextVPBB =
           RecipeBuilder.handleReplication(Instr, Range, VPBB, Plan);
       if (NextVPBB != VPBB) {
         VPBB = NextVPBB;
         VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++)
                                     : "");
       }
     }

+    VPBlockUtils::insertBlockAfter(new VPBasicBlock(), VPBB);
+    VPBB = cast<VPBasicBlock>(VPBB->getSingleSuccessor());
   }

+  // Fold the last, empty block into its predecessor.
+  VPBB = VPBlockUtils::tryToMergeBlockIntoPredecessor(VPBB);
+  assert(VPBB && "expected to fold last (empty) block");
+  // After here, VPBB should not be used.
+  VPBB = nullptr;

   assert(isa<VPRegionBlock>(Plan->getEntry()) &&
          !Plan->getEntry()->getEntryBasicBlock()->empty() &&
          "entry block must be set to a VPRegionBlock having a non-empty entry "
          "VPBasicBlock");
   RecipeBuilder.fixHeaderPhis();

   // ---------------------------------------------------------------------------
   // Transform initial VPlan: Apply previously taken decisions, in order, to
@@ 52 lines hidden @@ if (TargetRegion) {
       // main block.
       auto *SplitBlock =
           Target->getParent()->splitAt(std::next(Target->getIterator()));

       auto *SplitPred = SplitBlock->getSinglePredecessor();

       VPBlockUtils::disconnectBlocks(SplitPred, SplitBlock);
       VPBlockUtils::connectBlocks(SplitPred, SinkRegion);
       VPBlockUtils::connectBlocks(SinkRegion, SplitBlock);
-      if (VPBB == SplitPred)
-        VPBB = SplitBlock;
     }
   }

-  cast<VPRegionBlock>(Plan->getEntry())->setExit(VPBB);

   VPlanTransforms::removeRedundantInductionCasts(*Plan);

   // Now that sink-after is done, move induction recipes for optimized truncates
   // to the phi section of the header block.
   for (VPWidenIntOrFpInductionRecipe *Ind : InductionsToMove)
     Ind->moveBefore(*HeaderVPBB, HeaderVPBB->getFirstNonPhi());

   // Adjust the recipes for any inloop reductions.
-  adjustRecipesForReductions(VPBB, Plan, RecipeBuilder, Range.Start);
+  adjustRecipesForReductions(cast<VPBasicBlock>(TopRegion->getExit()), Plan,
+                             RecipeBuilder, Range.Start);

// Introduce a recipe to combine the incoming and previous values of a		// Introduce a recipe to combine the incoming and previous values of a
// first-order recurrence.		// first-order recurrence.
for (VPRecipeBase &R : Plan->getEntry()->getEntryBasicBlock()->phis()) {		for (VPRecipeBase &R : Plan->getEntry()->getEntryBasicBlock()->phis()) {
auto *RecurPhi = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R);		auto *RecurPhi = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&R);
if (!RecurPhi)		if (!RecurPhi)
continue;		continue;

▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes(
for (VF = 2; ElementCount::isKnownLT(VF, Range.End); VF = 2) {		for (VF = 2; ElementCount::isKnownLT(VF, Range.End); VF = 2) {
Plan->addVF(VF);		Plan->addVF(VF);
RSO << "," << VF;		RSO << "," << VF;
}		}
RSO << "},UF>=1";		RSO << "},UF>=1";
RSO.flush();		RSO.flush();
Plan->setName(PlanName);		Plan->setName(PlanName);

		// Fold Exit block into its predecessor if possible.
		// TODO: Fold block earlier once all VPlan transforms properly maintain a
		// VPBasicBlock as exit.
		Ayal: Should this folding of the latch be done right after folding the empty last VPBB above, or is it better to do so here, after dumping Plan?
		fhahn: I don't think so. I think the main motivation is to have the latch/exit block separate, so sink-after and other transforms do not have to explicitly update the exit block.
		Ayal: There are two motivations: (1) set the Exit when its region is created, (2) designate a unique BB for the Exit. The first provides a stable region, allowing recipes to be placed in the latch during construction if needed, and resolves the issue of when to set Exit; we know a region must have an Exit, may as well have one at the outset. The second facilitates introducing blocks internal to the loop, between header and latch, requiring only a disconnect and reconnects rather than splitting a block. This is most useful during initial VPlan construction, but seems useless for later transformations, which in general may need to split blocks. Note that sink-after (splits blocks and) already explicitly updates the "last" BB, see comment above, so may as well have it update the Exit block instead. Would be good to have splitAt() update the enclosing Region's Exit if needed, as well, similar to tryToMergeBlockIntoPredecessor().
		fhahn: Re "Note that sink-after (splits blocks and) already explicitly updates the "last" BB, see comment above, so may as well have it update the Exit block instead. Would be good to have splitAt() update the enclosing Region's Exit if needed, as well, similar to tryToMergeBlockIntoPredecessor().": Sounds like a good follow-up. I'd prefer to keep the folding late for now, as there is a verifier error when it is folded earlier and I'd prefer to fix that separately.
		VPBlockUtils::tryToMergeBlockIntoPredecessor(TopRegion->getExit());

assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");		assert(VPlanVerifier::verifyPlanIsValid(*Plan) && "VPlan is invalid");
return Plan;		return Plan;
}		}

VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {		VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
		Ayal: Would be good to avoid further bloating this excessive method. Could this folding be outlined, or should it be applied only to selected VPlan prior to code-gen rather than every VPlan upon construction? Doesn't VPBasicBlock::execute() effectively fold Exit into its predecessor if possible?
		fhahn: I added a new helper `tryToMergeBlockIntoPredecessor`. Re "Could this folding be outlined, or should it be applied only to selected VPlan prior to code-gen rather than every VPlan upon construction? Doesn't VPBasicBlock::execute() effectively fold Exit into its predecessor if possible?": The only reason to do this here is to avoid polluting the VPlan printing with additional redundant blocks. We could keep the extra block, but it would require updating all tests that check a printed VPlan and also bloats the plans we need to check in general. WDYT?
		Ayal: Ah, sure, let's clean up VPlans upon construction then.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
assert(!OrigLoop->isInnermost());		assert(!OrigLoop->isInnermost());
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan		// Create new empty VPlan
auto Plan = std::make_unique<VPlan>();		auto Plan = std::make_unique<VPlan>();

[...]

llvm/lib/Transforms/Vectorize/VPlan.h

[...]

/// Class that provides utilities for VPBlockBases in VPlan.		/// Class that provides utilities for VPBlockBases in VPlan.
class VPBlockUtils {		class VPBlockUtils {
public:		public:
VPBlockUtils() = delete;		VPBlockUtils() = delete;

/// Insert disconnected VPBlockBase \p NewBlock after \p BlockPtr. Add \p		/// Insert disconnected VPBlockBase \p NewBlock after \p BlockPtr. Add \p
/// NewBlock as successor of \p BlockPtr and \p BlockPtr as predecessor of \p		/// NewBlock as successor of \p BlockPtr and \p BlockPtr as predecessor of \p
/// NewBlock, and propagate \p BlockPtr parent to \p NewBlock. If \p BlockPtr		/// NewBlock, and propagate \p BlockPtr parent to \p NewBlock. \p BlockPtr's
/// has more than one successor, its conditional bit is propagated to \p		/// successors are moved from \p BlockPtr to \p NewBlock and \p BlockPtr's
/// NewBlock. \p NewBlock must have neither successors nor predecessors.		/// conditional bit is propagated to \p NewBlock. \p NewBlock must have
		/// neither successors nor predecessors.
		Ayal: Would be good to update above documentation: condition bit is propagated? to NewBlock? disconnects BlockPtr from all its successors and connects it with NewBlock as its successor.
		fhahn: Should be updated, cond bits are now propagated (and removed from BlockPtr) and moving successors is mentioned.
		Ayal: Thanks! "If \p BlockPtr has more than one successor ..."?
		Ayal: Drop "If \p BlockPtr has more than one successor ..."? This method moves all successors of BlockPtr to be successors of NewBlock, also when this involves a single successor.
		fhahn: Removed, thanks!
static void insertBlockAfter(VPBlockBase *NewBlock, VPBlockBase *BlockPtr) {		static void insertBlockAfter(VPBlockBase *NewBlock, VPBlockBase *BlockPtr) {
assert(NewBlock->getSuccessors().empty() &&		assert(NewBlock->getSuccessors().empty() &&
"Can't insert new block with successors.");		NewBlock->getPredecessors().empty() &&
		Ayal: Also assert NewBlock has no predecessors, as documented above?
		fhahn: added, thanks
// TODO: move successors from BlockPtr to NewBlock when this functionality		"Can't insert new block with predecessors or successors.");
		Ayal: Something early_inc_range could handle?
		fhahn: Unfortunately I don't think `early_inc_range` will work here, because the successors are managed in a SmallVector, so the early incremented iterator may be invalid after `disconnectBlocks` removes an entry from the vector. (It might work when reversing the iteration order and early increment, but it seems to me that using a SmallVector here is safer for now.) I added a `VPBlockBase::successors()` helper that returns an iterator range, to avoid the ugly SmallVector construction. This also makes some existing uses nicer.
		Ayal: Very well.
// is necessary. For now, setBlockSingleSuccessor will assert if BlockPtr
// already has successors.
BlockPtr->setOneSuccessor(NewBlock);
NewBlock->setPredecessors({BlockPtr});
NewBlock->setParent(BlockPtr->getParent());		NewBlock->setParent(BlockPtr->getParent());
		SmallVector<VPBlockBase *> Succs(BlockPtr->successors());
		for (VPBlockBase *Succ : Succs) {
		disconnectBlocks(BlockPtr, Succ);
		connectBlocks(NewBlock, Succ);
		}
		NewBlock->setCondBit(BlockPtr->getCondBit());
		BlockPtr->setCondBit(nullptr);
		Ayal: (These condition bits seem to be poorly tested...)
		fhahn: Yes, they are only used in the native path and it looks like the function is not used on blocks with condbits. Not sure if we can really improve that.
		connectBlocks(BlockPtr, NewBlock);
		Ayal: While we're here, connectBlocks(BlockPtr, NewBlock); ?
}		}
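
As a rough illustration of the successor handling discussed in the thread above, here is a minimal standalone sketch. It does not use the LLVM classes: `Block`, `connect`, `disconnect` and `insertAfter` are hypothetical stand-ins, and condition bits and parent regions are left out. It shows why the loop iterates over a copy of the successor list: `disconnect` erases from the vector being walked, so iterating the original in place would invalidate the iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for VPBlockBase; only the CFG edges are modeled.
struct Block {
  std::vector<Block *> Succs;
  std::vector<Block *> Preds;
};

// Add the edge From -> To in both directions.
void connect(Block *From, Block *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

// Remove the edge From -> To in both directions.
void disconnect(Block *From, Block *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Insert the disconnected NewBlock after BlockPtr, moving all successors of
// BlockPtr over to NewBlock.
void insertAfter(Block *NewBlock, Block *BlockPtr) {
  assert(NewBlock->Succs.empty() && NewBlock->Preds.empty() &&
         "Can't insert new block with predecessors or successors.");
  // Copy first: disconnect() mutates BlockPtr->Succs while we iterate.
  std::vector<Block *> Snapshot(BlockPtr->Succs);
  for (Block *Succ : Snapshot) {
    disconnect(BlockPtr, Succ);
    connect(NewBlock, Succ);
  }
  connect(BlockPtr, NewBlock);
}
```

With blocks `A -> {B, C}`, calling `insertAfter(&N, &A)` should leave `A -> N -> {B, C}`, with the original successor order preserved on `N`.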

/// Insert disconnected VPBlockBases \p IfTrue and \p IfFalse after \p		/// Insert disconnected VPBlockBases \p IfTrue and \p IfFalse after \p
/// BlockPtr. Add \p IfTrue and \p IfFalse as succesors of \p BlockPtr and \p		/// BlockPtr. Add \p IfTrue and \p IfFalse as succesors of \p BlockPtr and \p
/// BlockPtr as predecessor of \p IfTrue and \p IfFalse. Propagate \p BlockPtr		/// BlockPtr as predecessor of \p IfTrue and \p IfFalse. Propagate \p BlockPtr
/// parent to \p IfTrue and \p IfFalse. \p Condition is set as the successor		/// parent to \p IfTrue and \p IfFalse. \p Condition is set as the successor
/// selector. \p BlockPtr must have no successors and \p IfTrue and \p IfFalse		/// selector. \p BlockPtr must have no successors and \p IfTrue and \p IfFalse
/// must have neither successors nor predecessors.		/// must have neither successors nor predecessors.
[...]	public:
/// Disconnect VPBlockBases \p From and \p To bi-directionally. Remove \p To		/// Disconnect VPBlockBases \p From and \p To bi-directionally. Remove \p To
/// from the successors of \p From and \p From from the predecessors of \p To.		/// from the successors of \p From and \p From from the predecessors of \p To.
static void disconnectBlocks(VPBlockBase *From, VPBlockBase *To) {		static void disconnectBlocks(VPBlockBase *From, VPBlockBase *To) {
assert(To && "Successor to disconnect is null.");		assert(To && "Successor to disconnect is null.");
From->removeSuccessor(To);		From->removeSuccessor(To);
To->removePredecessor(From);		To->removePredecessor(From);
}		}

		/// Try to merge \p Block into its single predecessor, if \p Block is a
		/// VPBasicBlock and its predecessor has a single successor. Returns a pointer
		/// to the predecessor \p Block was merged into or nullptr otherwise.
		static VPBasicBlock *tryToMergeBlockIntoPredecessor(VPBlockBase *Block) {
		auto *VPBB = dyn_cast<VPBasicBlock>(Block);
		auto *PredVPBB =
		Ayal: dyn_cast_or_null, or must Block have a single predecessor?
		fhahn: updated to dyn_cast_or_null.
		dyn_cast_or_null<VPBasicBlock>(Block->getSinglePredecessor());
		if (!VPBB || !PredVPBB || PredVPBB->getNumSuccessors() != 1)
		Ayal: `!PredVPBB->getSingleSuccessor()` - suffices to check if `PredVPBB->getNumSuccessors() != 1`
		return nullptr;

		for (VPRecipeBase &R : make_early_inc_range(*VPBB))
		R.moveBefore(*PredVPBB, PredVPBB->end());
		VPBlockUtils::disconnectBlocks(PredVPBB, VPBB);
		auto *ParentRegion = cast<VPRegionBlock>(Block->getParent());
		Ayal: Either cast (non-dyn) if Block must be in a Region, or check `if (ParentRegion && ParentRegion->getExit() == Block)`
		fhahn: For now, it should be safe to use `cast` directly. Updated, thanks!
		if (ParentRegion->getExit() == Block)
		ParentRegion->setExit(PredVPBB);
		Ayal: Update Block->getParent()'s Exit if it exists and is equal to Block?
		fhahn: updated, thanks!
		SmallVector<VPBlockBase *> Successors(Block->successors());
		for (auto *Succ : Successors) {
		VPBlockUtils::disconnectBlocks(Block, Succ);
		VPBlockUtils::connectBlocks(PredVPBB, Succ);
		}
		delete Block;
		return PredVPBB;
		}
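
To make the merge steps above easier to follow, here is a small standalone sketch under simplifying assumptions. `BasicBlock`, `Region`, `addEdge`, `removeEdge` and `tryMergeIntoPredecessor` are hypothetical names, recipes are modeled as strings in a `std::list`, and, unlike the helper in the patch, the sketch does not delete the merged block; ownership stays with the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct Region;

// Hypothetical stand-in for VPBasicBlock; recipes are modeled as strings.
struct BasicBlock {
  std::list<std::string> Recipes;
  std::vector<BasicBlock *> Succs;
  std::vector<BasicBlock *> Preds;
  Region *Parent = nullptr;
};

// Hypothetical stand-in for VPRegionBlock; only the exit block is tracked.
struct Region {
  BasicBlock *Exit = nullptr;
};

void addEdge(BasicBlock *From, BasicBlock *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

void removeEdge(BasicBlock *From, BasicBlock *To) {
  auto &S = From->Succs;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Preds;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}

// Merge B into its single predecessor if that predecessor has exactly one
// successor. Returns the predecessor, or nullptr if nothing was merged.
BasicBlock *tryMergeIntoPredecessor(BasicBlock *B) {
  assert(B && "block must not be null");
  if (B->Preds.size() != 1)
    return nullptr;
  BasicBlock *Pred = B->Preds.front();
  if (Pred->Succs.size() != 1)
    return nullptr;

  // Move all recipes to the end of the predecessor, preserving their order.
  Pred->Recipes.splice(Pred->Recipes.end(), B->Recipes);
  removeEdge(Pred, B);

  // Keep the enclosing region's exit valid if B used to be the exit.
  if (B->Parent && B->Parent->Exit == B)
    B->Parent->Exit = Pred;

  // Iterate over a copy, since removeEdge() mutates B->Succs.
  std::vector<BasicBlock *> Snapshot(B->Succs);
  for (BasicBlock *Succ : Snapshot) {
    removeEdge(B, Succ);
    addEdge(Pred, Succ);
  }
  return Pred;
}
```

In this toy model, merging a latch that is the region's exit into the header moves the latch's recipes to the end of the header and makes the header the new exit, which mirrors the folding of the latch described in the summary.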

/// Returns true if the edge \p FromBlock -> \p ToBlock is a back-edge.		/// Returns true if the edge \p FromBlock -> \p ToBlock is a back-edge.
static bool isBackEdge(const VPBlockBase *FromBlock,		static bool isBackEdge(const VPBlockBase *FromBlock,
const VPBlockBase *ToBlock, const VPLoopInfo *VPLI) {		const VPBlockBase *ToBlock, const VPLoopInfo *VPLI) {
assert(FromBlock->getParent() == ToBlock->getParent() &&		assert(FromBlock->getParent() == ToBlock->getParent() &&
FromBlock->getParent() && "Must be in same region");		FromBlock->getParent() && "Must be in same region");
const VPLoop *FromLoop = VPLI->getLoopFor(FromBlock);		const VPLoop *FromLoop = VPLI->getLoopFor(FromBlock);
const VPLoop *ToLoop = VPLI->getLoopFor(ToBlock);		const VPLoop *ToLoop = VPLI->getLoopFor(ToBlock);
if (!FromLoop || !ToLoop || FromLoop != ToLoop)		if (!FromLoop || !ToLoop || FromLoop != ToLoop)
[...]

llvm/test/Transforms/LoopVectorize/reduction-order.ll

	; RUN: opt -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S < %s 2>&1 | FileCheck %s			; RUN: opt -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -S < %s 2>&1 | FileCheck %s
	; RUN: opt -passes='loop-vectorize' -force-vector-width=4 -force-vector-interleave=1 -S < %s 2>&1 | FileCheck %s			; RUN: opt -passes='loop-vectorize' -force-vector-width=4 -force-vector-interleave=1 -S < %s 2>&1 | FileCheck %s

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

	; Make sure the selects generated from reduction are always emitted			; Make sure the selects generated from reduction are always emitted
	; in deterministic order.			; in deterministic order.
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: icmp ule <4 x i64>			; CHECK: %[[VAR1:.*]] = add <4 x i32> <i32 3, i32 3, i32 3, i32 3>, %vec.phi1
	; CHECK-NEXT: %[[VAR1:.*]] = add <4 x i32> <i32 3, i32 3, i32 3, i32 3>, %vec.phi1
	; CHECK-NEXT: %[[VAR2:.*]] = add <4 x i32> %vec.phi, <i32 5, i32 5, i32 5, i32 5>			; CHECK-NEXT: %[[VAR2:.*]] = add <4 x i32> %vec.phi, <i32 5, i32 5, i32 5, i32 5>
				; CHECK-NEXT: icmp ule <4 x i64>
	; CHECK-NEXT: select <4 x i1> {{.*}}, <4 x i32> %[[VAR2]], <4 x i32>			; CHECK-NEXT: select <4 x i1> {{.*}}, <4 x i32> %[[VAR2]], <4 x i32>
	; CHECK-NEXT: select <4 x i1> {{.*}}, <4 x i32> %[[VAR1]], <4 x i32>			; CHECK-NEXT: select <4 x i1> {{.*}}, <4 x i32> %[[VAR1]], <4 x i32>
	; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body			; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
	;			;
	define internal i64 @foo(i32* %t0) !prof !1 {			define internal i64 @foo(i32* %t0) !prof !1 {
	t16:			t16:
	br label %t20			br label %t20

	[...]

llvm/test/Transforms/LoopVectorize/select-reduction.ll

	[...]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 [[EXTRA_ITER]], [[INDEX]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 [[EXTRA_ITER]], [[INDEX]]
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i64> poison, i64 [[OFFSET_IDX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <4 x i64> poison, i64 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT1]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 -1, i64 -2, i64 -3>			; CHECK-NEXT: [[INDUCTION:%.*]] = add <4 x i64> [[BROADCAST_SPLAT2]], <i64 0, i64 -1, i64 -2, i64 -3>
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[OFFSET_IDX]], 0
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX]], i32 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <4 x i64> poison, i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT3]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT3]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[VEC_IV:%.*]] = add <4 x i64> [[BROADCAST_SPLAT4]], <i64 0, i64 1, i64 2, i64 3>			; CHECK-NEXT: [[VEC_IV:%.*]] = add <4 x i64> [[BROADCAST_SPLAT4]], <i64 0, i64 1, i64 2, i64 3>
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[VEC_PHI]], <i32 10, i32 10, i32 10, i32 10>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[VEC_PHI]], <i32 10, i32 10, i32 10, i32 10>
	; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> <i32 10, i32 10, i32 10, i32 10>			; CHECK-NEXT: [[TMP3]] = select <4 x i1> [[TMP2]], <4 x i32> [[VEC_PHI]], <4 x i32> <i32 10, i32 10, i32 10, i32 10>
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ule <4 x i64> [[VEC_IV]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> [[VEC_PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP3]], <4 x i32> [[VEC_PHI]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP5]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP0:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: br i1 true, label [[EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br i1 true, label [[EXIT_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	[...]

Revision Contents

Diff 395842
