This is an archive of the discontinued LLVM Phabricator instance.

Differential D131015

[LV] Track all IR blocks corresponding to VPBasicBlock
Abandoned · Public

Authored by reames on Aug 2 2022, 1:25 PM.

Details

Reviewers

david-arm
fhahn
Ayal
gilr

Summary

When working with edges entering or leaving a VPBasicBlock, we need to use either the first or last IR block corresponding to the VPBasicBlock. The existing code is only correct when a VPBasicBlock in the header or latch position corresponds to exactly one BasicBlock. This happens to be true as the only VPBasicBlock which corresponds to more than one BasicBlock today is a VPReplicateRecipe which (as an implementation detail) can't be either a loop header or latch.

I decided to track the whole set for simplicity. It feels odd to not have a way to map IR blocks to VP blocks and vice versa, so since I was changing it anyways, figured I'd just track the whole set.
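The idea in the summary can be sketched independently of LLVM as a map from each VP block to the ordered list of IR blocks emitted for it, where incoming edges use the first block and outgoing edges use the last. The sketch below is an illustration only: `IRBlock`, `VPBlock`, `BlockMap`, `entryBlock`, and `exitBlock` are hypothetical names, and `std::map`/`std::vector` stand in for the `SmallDenseMap`/`SmallVector` used in the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::BasicBlock and VPBasicBlock.
struct IRBlock { std::string Name; };
struct VPBlock { std::string Name; };

// Each VP block keeps every IR block emitted for it, in creation order.
struct BlockMap {
  std::map<const VPBlock *, std::vector<IRBlock *>> VPBB2IRBB;

  // Append rather than overwrite, mirroring push_back in the patch.
  void record(const VPBlock *VPBB, IRBlock *BB) {
    VPBB2IRBB[VPBB].push_back(BB);
  }

  // Edges entering the VP block should target the first IR block.
  IRBlock *entryBlock(const VPBlock *VPBB) const {
    auto It = VPBB2IRBB.find(VPBB);
    assert(It != VPBB2IRBB.end() && !It->second.empty() &&
           "VP block has not been emitted yet");
    return It->second.front();
  }

  // Edges leaving the VP block should originate from the last IR block.
  IRBlock *exitBlock(const VPBlock *VPBB) const {
    auto It = VPBB2IRBB.find(VPBB);
    assert(It != VPBB2IRBB.end() && !It->second.empty() &&
           "VP block has not been emitted yet");
    return It->second.back();
  }
};
```

When a VP block maps to exactly one IR block, `entryBlock` and `exitBlock` return the same pointer, which is why the single-valued map was sufficient for header and latch blocks before.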

Diff Detail

Unit Tests: Failed

Time	Test
60,100 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
60,080 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp

Event Timeline

reames created this revision. Aug 2 2022, 1:25 PM

Herald added a project: Restricted Project. Aug 2 2022, 1:25 PM

Herald added subscribers: rogfer01, bollu, hiraditya, mcrosier.

reames requested review of this revision. Aug 2 2022, 1:25 PM

Herald added a project: Restricted Project. Aug 2 2022, 1:25 PM

Herald added a subscriber: vkmr.

Harbormaster completed remote builds in B178840: Diff 449401. Aug 2 2022, 2:04 PM

reames added a child revision: D131118: [LV] Add generic scalarization support for unpredicated scalable vectors. Aug 3 2022, 2:54 PM

ping

The patch seems fine to me, but I think Florian probably understands this part of the codebase better than I.

In D131015#3726217, @david-arm wrote:

The patch seems fine to me, but I think Florian probably understands this part of the codebase better than I.

@fhahn ping

@fhahn ping x2

Herald added a subscriber: pcwang-thead. Sep 6 2022, 8:28 AM

The existing code is only correct when a VPBasicBlock in the header or latch position corresponds to exactly one BasicBlock. This happens to be true as the only VPBasicBlock which corresponds to more than one BasicBlock today is a VPReplicateRecipe which (as an implementation detail) can't be either a loop header or latch.

I am not sure this is accurate, at the moment a non-predicated VPReplicateRecipe can be in any block I think. Predicated VPReplicateRecipes must be in a VPBasicBlock in a VPRegionBlock. IIUC this may become an issue after D131118 if regular VPReplicateRecipes could be expanded to multiple basic blocks?

If that's the issue, it would probably be clearer if this is modeled explicitly, by putting such recipes into their own region, more faithfully representing the fact that a loop will be generated for them.

In D131015#3772523, @fhahn wrote:

The existing code is only correct when a VPBasicBlock in the header or latch position corresponds to exactly one BasicBlock. This happens to be true as the only VPBasicBlock which corresponds to more than one BasicBlock today is a VPReplicateRecipe which (as an implementation detail) can't be either a loop header or latch.

I am not sure this is accurate, at the moment a non-predicated VPReplicateRecipe can be in any block I think. Predicated VPReplicateRecipes must be in a VPBasicBlock in a VPRegionBlock. IIUC this may become an issue after D131118 if regular VPReplicateRecipes could be expanded to multiple basic blocks?

If that's the issue, it would probably be clearer if this is modeled explicitly, by putting such recipes into their own region, more faithfully representing the fact that a loop will be generated for them.

The original comment was correct. The code as written would be incorrect if a VPReplicateRecipe corresponding to two or more BasicBlocks were in either the header or the exiting block of the loop.

However, as my comment said, this is impossible in the current code. A predicated replicate recipe can't be either of those positions (since it's contained by the VPRegionBlock), and a non-predicated one always corresponds to one basic block.

Your suggestion is to essentially add an invariant that a replicate region only contains more than one BB if it's contained in a VPRegionBlock. I am not opposed to that, but I'm also not motivated to make that change. It's significantly more invasive, and I just don't care that much.

Ayal added inline comments. Sep 7 2022, 1:47 PM

llvm/lib/Transforms/Vectorize/VPlan.cpp
373	Hmm, VPBasicBlock::execute() is called once per VPBB when generating IR, so how could VPBB2IRBB record multiple/overwriting NewBB's here per same `this`? If there are such cases, perhaps it's better to simplify them and retain a single IRBB per VPBB, adding an assert that no overwriting takes place. VPBB's contain only recipes and are free of control-flow (which is modeled using multiple VPBB's and VPRegions), so each should fill a single IRBB.

A related issue is the A-B-C cases above, which try to reuse the same IRBB for pairs of back-to-back VPBB's (which may best be avoided for clarity and left to a subsequent simplifyCFG to fold instead), but these are multiple VPBB's filling one IRBB, rather than the converse.

Another related issue is the correspondence between original IRBB's and VPBB's during VPlan construction rather than execution. There, multiple replicate-and-predicate recipes that stem from the same IRBB are each assigned separate VPBB's (of a replicating region), as can be seen by trying to assign each such VPBB a unique name associated with its original IRBB (`VPBBsForBB`). Again, this yields multiple VPBB's per one IRBB.
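The alternative suggested in this comment, one IR block per VP block guarded by an assertion, might look roughly like the sketch below. It is not LLVM code: `IRBlock`, `VPBlock`, `SingleBlockMap`, and `recordUnique` are hypothetical stand-ins, and `std::map` is used in place of LLVM's `SmallDenseMap` so the example compiles on its own.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for llvm::BasicBlock and VPBasicBlock.
struct IRBlock { std::string Name; };
struct VPBlock { std::string Name; };

// One IR block per VP block, as in the pre-patch mapping.
using SingleBlockMap = std::map<const VPBlock *, IRBlock *>;

// Record the IR block emitted for VPBB. In builds with assertions enabled,
// a second record for the same VP block aborts instead of overwriting.
inline bool recordUnique(SingleBlockMap &Map, const VPBlock *VPBB,
                         IRBlock *BB) {
  bool Inserted = Map.emplace(VPBB, BB).second;
  assert(Inserted && "VPBasicBlock already mapped to an IR block");
  return Inserted;
}
```

Under this scheme a replicating region that executes the same VP block more than once would trip the assertion, which is the case the patch instead handles by storing every block.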

Ayal added inline comments. Sep 11 2022, 7:31 AM

llvm/lib/Transforms/Vectorize/VPlan.cpp
373	"Hmm, VPBasicBlock::execute() is called once per VPBB when generating IR, so how could VPBB2IRBB record multiple/overwriting NewBB's here per same `this`?"

Ahh, a Replicating Region executes each of its VPBB's VF*UF times, effectively performing complete unrolling at VPlan execution time. It may indeed be clearer to either fully unroll in VPlan (when setting VF and UF?) or emit a loop instead of unrolling it.
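To make the VF*UF point concrete, here is a small model of a replicating region executing one VP block once per (part, lane) pair and recording every resulting IR block. It is only an illustration under stated assumptions: `IRBlock`, `VPBlock`, `EmitState`, and `executeReplicated` are hypothetical names, the block naming scheme is invented, and a fixed (non-scalable) VF is assumed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::BasicBlock and VPBasicBlock.
struct IRBlock { std::string Name; };
struct VPBlock { std::string Name; };

struct EmitState {
  // Owns every emitted IR block.
  std::vector<std::unique_ptr<IRBlock>> Blocks;
  // All IR blocks emitted for each VP block, in creation order.
  std::map<const VPBlock *, std::vector<IRBlock *>> VPBB2IRBB;
};

// Model of a replicating region: the VP block is executed once per
// (Part, Lane) pair, i.e. VF * UF times, and each execution emits a block.
inline void executeReplicated(EmitState &State, const VPBlock &VPBB,
                              unsigned VF, unsigned UF) {
  for (unsigned Part = 0; Part < UF; ++Part) {
    for (unsigned Lane = 0; Lane < VF; ++Lane) {
      auto BB = std::make_unique<IRBlock>();
      BB->Name = VPBB.Name + "." + std::to_string(Part) + "." +
                 std::to_string(Lane);
      State.VPBB2IRBB[&VPBB].push_back(BB.get());
      State.Blocks.push_back(std::move(BB));
    }
  }
}
```

With VF = 4 and UF = 2 the vector for the block holds eight entries; a single-valued map would retain only the last of them.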

Diff 449401

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 3,660 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
// nodes are currently empty because we did not want to introduce cycles.		// nodes are currently empty because we did not want to introduce cycles.
// This is the second stage of vectorizing recurrences.		// This is the second stage of vectorizing recurrences.
fixCrossIterationPHIs(State);		fixCrossIterationPHIs(State);

// Forget the original basic block.		// Forget the original basic block.
PSE.getSE()->forgetLoop(OrigLoop);		PSE.getSE()->forgetLoop(OrigLoop);

VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();		VPBasicBlock *LatchVPBB = Plan.getVectorLoopRegion()->getExitingBasicBlock();
Loop *VectorLoop = LI->getLoopFor(State.CFG.VPBB2IRBB[LatchVPBB]);		Loop *VectorLoop = LI->getLoopFor(State.CFG.VPBB2IRBB[LatchVPBB].back());
if (Cost->requiresScalarEpilogue(VF)) {		if (Cost->requiresScalarEpilogue(VF)) {
// No edge from the middle block to the unique exit block has been inserted		// No edge from the middle block to the unique exit block has been inserted
// and there is nothing to fix from vector loop; phis should have incoming		// and there is nothing to fix from vector loop; phis should have incoming
// from scalar loop only.		// from scalar loop only.
Plan.clearLiveOuts();		Plan.clearLiveOuts();
} else {		} else {
// If we inserted an edge from the middle block to the unique exit block,		// If we inserted an edge from the middle block to the unique exit block,
// update uses outside the loop (phis) to account for the newly inserted		// update uses outside the loop (phis) to account for the newly inserted
▲ Show 20 Lines • Show All 194 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixReduction(VPReductionPHIRecipe *PhiR,
Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt());		Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt());

State.setDebugLocFromInst(LoopExitInst);		State.setDebugLocFromInst(LoopExitInst);

Type *PhiTy = OrigPhi->getType();		Type *PhiTy = OrigPhi->getType();

VPBasicBlock *LatchVPBB =		VPBasicBlock *LatchVPBB =
PhiR->getParent()->getEnclosingLoopRegion()->getExitingBasicBlock();		PhiR->getParent()->getEnclosingLoopRegion()->getExitingBasicBlock();
BasicBlock *VectorLoopLatch = State.CFG.VPBB2IRBB[LatchVPBB];		BasicBlock *VectorLoopLatch = State.CFG.VPBB2IRBB[LatchVPBB].back();
// If tail is folded by masking, the vector value to leave the loop should be		// If tail is folded by masking, the vector value to leave the loop should be
// a Select choosing between the vectorized LoopExitInst and vectorized Phi,		// a Select choosing between the vectorized LoopExitInst and vectorized Phi,
// instead of the former. For an inloop reduction the reduction will already		// instead of the former. For an inloop reduction the reduction will already
// be predicated, and does not need to be handled here.		// be predicated, and does not need to be handled here.
if (Cost->foldTailByMasking() && !PhiR->isInLoop()) {		if (Cost->foldTailByMasking() && !PhiR->isInLoop()) {
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *VecLoopExitInst = State.get(LoopExitInstDef, Part);		Value *VecLoopExitInst = State.get(LoopExitInstDef, Part);
SelectInst *Sel = nullptr;		SelectInst *Sel = nullptr;
▲ Show 20 Lines • Show All 271 Lines • ▼ Show 20 Lines	for (VPRecipeBase &P : VPBB->phis()) {
if (!VPPhi)		if (!VPPhi)
continue;		continue;
PHINode *NewPhi = cast<PHINode>(State.get(VPPhi, 0));		PHINode *NewPhi = cast<PHINode>(State.get(VPPhi, 0));
// Make sure the builder has a valid insert point.		// Make sure the builder has a valid insert point.
Builder.SetInsertPoint(NewPhi);		Builder.SetInsertPoint(NewPhi);
for (unsigned i = 0; i < VPPhi->getNumOperands(); ++i) {		for (unsigned i = 0; i < VPPhi->getNumOperands(); ++i) {
VPValue *Inc = VPPhi->getIncomingValue(i);		VPValue *Inc = VPPhi->getIncomingValue(i);
VPBasicBlock *VPBB = VPPhi->getIncomingBlock(i);		VPBasicBlock *VPBB = VPPhi->getIncomingBlock(i);
NewPhi->addIncoming(State.get(Inc, 0), State.CFG.VPBB2IRBB[VPBB]);		NewPhi->addIncoming(State.get(Inc, 0), State.CFG.VPBB2IRBB[VPBB].back());
}		}
}		}
}		}
}		}

bool InnerLoopVectorizer::useOrderedReductions(		bool InnerLoopVectorizer::useOrderedReductions(
const RecurrenceDescriptor &RdxDesc) {		const RecurrenceDescriptor &RdxDesc) {
return Cost->useOrderedReductions(RdxDesc);		return Cost->useOrderedReductions(RdxDesc);
▲ Show 20 Lines • Show All 3,436 Lines • ▼ Show 20 Lines	void LoopVectorizationPlanner::executePlan(ElementCount BestVF, unsigned BestUF,
MDNode *OrigLoopID = OrigLoop->getLoopID();		MDNode *OrigLoopID = OrigLoop->getLoopID();

Optional<MDNode *> VectorizedLoopID =		Optional<MDNode *> VectorizedLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,		makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupVectorized});		LLVMLoopVectorizeFollowupVectorized});

VPBasicBlock *HeaderVPBB =		VPBasicBlock *HeaderVPBB =
BestVPlan.getVectorLoopRegion()->getEntryBasicBlock();		BestVPlan.getVectorLoopRegion()->getEntryBasicBlock();
Loop *L = LI->getLoopFor(State.CFG.VPBB2IRBB[HeaderVPBB]);		Loop *L = LI->getLoopFor(State.CFG.VPBB2IRBB[HeaderVPBB].back());
if (VectorizedLoopID)		if (VectorizedLoopID)
L->setLoopID(VectorizedLoopID.value());		L->setLoopID(VectorizedLoopID.value());
else {		else {
// Keep all loop hints from the original loop on the vector loop (we'll		// Keep all loop hints from the original loop on the vector loop (we'll
// replace the vectorizer-specific hints below).		// replace the vectorizer-specific hints below).
if (MDNode *LID = OrigLoop->getLoopID())		if (MDNode *LID = OrigLoop->getLoopID())
L->setLoopID(LID);		L->setLoopID(LID);

▲ Show 20 Lines • Show All 2,937 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlan.h

Show First 20 Lines • Show All 328 Lines • ▼ Show 20 Lines	struct CFGState {
/// The previous IR BasicBlock created or used. Initially set to the new		/// The previous IR BasicBlock created or used. Initially set to the new
/// header BasicBlock.		/// header BasicBlock.
BasicBlock *PrevBB = nullptr;		BasicBlock *PrevBB = nullptr;

/// The last IR BasicBlock in the output IR. Set to the exit block of the		/// The last IR BasicBlock in the output IR. Set to the exit block of the
/// vector loop.		/// vector loop.
BasicBlock *ExitBB = nullptr;		BasicBlock *ExitBB = nullptr;

/// A mapping of each VPBasicBlock to the corresponding BasicBlock. In case		/// A mapping of each VPBasicBlock to the corresponding BasicBlocks.
/// of replication, maps the BasicBlock of the last replica created.		SmallDenseMap<VPBasicBlock *, SmallVector<BasicBlock *, 1> > VPBB2IRBB;
SmallDenseMap<VPBasicBlock *, BasicBlock *> VPBB2IRBB;

CFGState() = default;		CFGState() = default;

/// Returns the BasicBlock* mapped to the pre-header of the loop region		/// Returns the BasicBlock* mapped to the pre-header of the loop region
/// containing \p R.		/// containing \p R.
BasicBlock *getPreheaderBBFor(VPRecipeBase *R);		BasicBlock *getPreheaderBBFor(VPRecipeBase *R);
} CFG;		} CFG;

▲ Show 20 Lines • Show All 2,721 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlan.cpp

Show First 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	Value *VPTransformState::get(VPValue *Def, const VPIteration &Instance) {
// TODO: Cache created scalar values.		// TODO: Cache created scalar values.
Value *Lane = Instance.Lane.getAsRuntimeExpr(Builder, VF);		Value *Lane = Instance.Lane.getAsRuntimeExpr(Builder, VF);
auto *Extract = Builder.CreateExtractElement(VecPart, Lane);		auto *Extract = Builder.CreateExtractElement(VecPart, Lane);
// set(Def, Extract, Instance);		// set(Def, Extract, Instance);
return Extract;		return Extract;
}		}
BasicBlock *VPTransformState::CFGState::getPreheaderBBFor(VPRecipeBase *R) {		BasicBlock *VPTransformState::CFGState::getPreheaderBBFor(VPRecipeBase *R) {
VPRegionBlock *LoopRegion = R->getParent()->getEnclosingLoopRegion();		VPRegionBlock *LoopRegion = R->getParent()->getEnclosingLoopRegion();
return VPBB2IRBB[LoopRegion->getPreheaderVPBB()];		return VPBB2IRBB[LoopRegion->getPreheaderVPBB()].back();
}		}

void VPTransformState::addNewMetadata(Instruction *To,		void VPTransformState::addNewMetadata(Instruction *To,
const Instruction *Orig) {		const Instruction *Orig) {
// If the loop was versioned with memchecks, add the corresponding no-alias		// If the loop was versioned with memchecks, add the corresponding no-alias
// metadata.		// metadata.
if (LVer && (isa<LoadInst>(Orig) || isa<StoreInst>(Orig)))		if (LVer && (isa<LoadInst>(Orig) || isa<StoreInst>(Orig)))
LVer->annotateInstWithNoAlias(To, Orig);		LVer->annotateInstWithNoAlias(To, Orig);
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
BasicBlock *NewBB = BasicBlock::Create(PrevBB->getContext(), getName(),		BasicBlock *NewBB = BasicBlock::Create(PrevBB->getContext(), getName(),
PrevBB->getParent(), CFG.ExitBB);		PrevBB->getParent(), CFG.ExitBB);
LLVM_DEBUG(dbgs() << "LV: created " << NewBB->getName() << '\n');		LLVM_DEBUG(dbgs() << "LV: created " << NewBB->getName() << '\n');

// Hook up the new basic block to its predecessors.		// Hook up the new basic block to its predecessors.
for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {		for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();		VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();		auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB];		BasicBlock *PredBB = CFG.VPBB2IRBB[PredVPBB].back();

assert(PredBB && "Predecessor basic-block not found building successor.");		assert(PredBB && "Predecessor basic-block not found building successor.");
auto *PredBBTerminator = PredBB->getTerminator();		auto *PredBBTerminator = PredBB->getTerminator();
LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');		LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');

auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);		auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);
if (isa<UnreachableInst>(PredBBTerminator)) {		if (isa<UnreachableInst>(PredBBTerminator)) {
assert(PredVPSuccessors.size() == 1 &&		assert(PredVPSuccessors.size() == 1 &&
Show All 33 Lines	if (getPlan()->getVectorLoopRegion()->getSingleSuccessor() == this) {
NewBB = State->CFG.ExitBB;		NewBB = State->CFG.ExitBB;
State->CFG.PrevBB = NewBB;		State->CFG.PrevBB = NewBB;

// Update the branch instruction in the predecessor to branch to ExitBB.		// Update the branch instruction in the predecessor to branch to ExitBB.
VPBlockBase *PredVPB = getSingleHierarchicalPredecessor();		VPBlockBase *PredVPB = getSingleHierarchicalPredecessor();
VPBasicBlock *ExitingVPBB = PredVPB->getExitingBasicBlock();		VPBasicBlock *ExitingVPBB = PredVPB->getExitingBasicBlock();
assert(PredVPB->getSingleSuccessor() == this &&		assert(PredVPB->getSingleSuccessor() == this &&
"predecessor must have the current block as only successor");		"predecessor must have the current block as only successor");
BasicBlock *ExitingBB = State->CFG.VPBB2IRBB[ExitingVPBB];		BasicBlock *ExitingBB = State->CFG.VPBB2IRBB[ExitingVPBB].back();
// The Exit block of a loop is always set to be successor 0 of the Exiting		// The Exit block of a loop is always set to be successor 0 of the Exiting
// block.		// block.
cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);		cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);
} else if (PrevVPBB && /* A */		} else if (PrevVPBB && /* A */
!((SingleHPred = getSingleHierarchicalPredecessor()) &&		!((SingleHPred = getSingleHierarchicalPredecessor()) &&
SingleHPred->getExitingBasicBlock() == PrevVPBB &&		SingleHPred->getExitingBasicBlock() == PrevVPBB &&
PrevVPBB->getSingleHierarchicalSuccessor() &&		PrevVPBB->getSingleHierarchicalSuccessor() &&
(SingleHPred->getParent() == getEnclosingLoopRegion() &&		(SingleHPred->getParent() == getEnclosingLoopRegion() &&
Show All 19 Lines	if (getPlan()->getVectorLoopRegion()->getSingleSuccessor() == this) {
State->Builder.SetInsertPoint(Terminator);		State->Builder.SetInsertPoint(Terminator);
State->CFG.PrevBB = NewBB;		State->CFG.PrevBB = NewBB;
}		}

// 2. Fill the IR basic block with IR instructions.		// 2. Fill the IR basic block with IR instructions.
LLVM_DEBUG(dbgs() << "LV: vectorizing VPBB:" << getName()		LLVM_DEBUG(dbgs() << "LV: vectorizing VPBB:" << getName()
<< " in BB:" << NewBB->getName() << '\n');		<< " in BB:" << NewBB->getName() << '\n');

State->CFG.VPBB2IRBB[this] = NewBB;		State->CFG.VPBB2IRBB[this].push_back(NewBB);
State->CFG.PrevVPBB = this;		State->CFG.PrevVPBB = this;

for (VPRecipeBase &Recipe : Recipes)		for (VPRecipeBase &Recipe : Recipes)
Recipe.execute(*State);		Recipe.execute(*State);

LLVM_DEBUG(dbgs() << "LV: filled BB:" << *NewBB);		LLVM_DEBUG(dbgs() << "LV: filled BB:" << *NewBB);
}		}

▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines

void VPRegionBlock::execute(VPTransformState *State) {		void VPRegionBlock::execute(VPTransformState *State) {
ReversePostOrderTraversal<VPBlockBase *> RPOT(Entry);		ReversePostOrderTraversal<VPBlockBase *> RPOT(Entry);

if (!isReplicator()) {		if (!isReplicator()) {
// Create and register the new vector loop.		// Create and register the new vector loop.
Loop *PrevLoop = State->CurrentVectorLoop;		Loop *PrevLoop = State->CurrentVectorLoop;
State->CurrentVectorLoop = State->LI->AllocateLoop();		State->CurrentVectorLoop = State->LI->AllocateLoop();
BasicBlock *VectorPH = State->CFG.VPBB2IRBB[getPreheaderVPBB()];		BasicBlock *VectorPH = State->CFG.VPBB2IRBB[getPreheaderVPBB()].back();
Loop *ParentLoop = State->LI->getLoopFor(VectorPH);		Loop *ParentLoop = State->LI->getLoopFor(VectorPH);

// Insert the new loop into the loop nest and register the new basic blocks		// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.		// before calling any utilities such as SCEV that require valid LoopInfo.
if (ParentLoop)		if (ParentLoop)
ParentLoop->addChildLoop(State->CurrentVectorLoop);		ParentLoop->addChildLoop(State->CurrentVectorLoop);
else		else
State->LI->addTopLevelLoop(State->CurrentVectorLoop);		State->LI->addTopLevelLoop(State->CurrentVectorLoop);
▲ Show 20 Lines • Show All 152 Lines • ▼ Show 20 Lines	void VPlan::execute(VPTransformState *State) {
BasicBlock *VectorPreHeader = State->CFG.PrevBB;		BasicBlock *VectorPreHeader = State->CFG.PrevBB;
State->Builder.SetInsertPoint(VectorPreHeader->getTerminator());		State->Builder.SetInsertPoint(VectorPreHeader->getTerminator());

// Generate code in the loop pre-header and body.		// Generate code in the loop pre-header and body.
for (VPBlockBase *Block : depth_first(Entry))		for (VPBlockBase *Block : depth_first(Entry))
Block->execute(State);		Block->execute(State);

VPBasicBlock *LatchVPBB = getVectorLoopRegion()->getExitingBasicBlock();		VPBasicBlock *LatchVPBB = getVectorLoopRegion()->getExitingBasicBlock();
BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB];		BasicBlock *VectorLatchBB = State->CFG.VPBB2IRBB[LatchVPBB].back();

// Fix the latch value of canonical, reduction and first-order recurrences		// Fix the latch value of canonical, reduction and first-order recurrences
// phis in the vector loop.		// phis in the vector loop.
VPBasicBlock *Header = getVectorLoopRegion()->getEntryBasicBlock();		VPBasicBlock *Header = getVectorLoopRegion()->getEntryBasicBlock();
for (VPRecipeBase &R : Header->phis()) {		for (VPRecipeBase &R : Header->phis()) {
// Skip phi-like recipes that generate their backedege values themselves.		// Skip phi-like recipes that generate their backedege values themselves.
if (isa<VPWidenPHIRecipe>(&R))		if (isa<VPWidenPHIRecipe>(&R))
continue;		continue;
Show All 39 Lines	for (unsigned Part = 0; Part < LastPartForNewPhi; ++Part) {
Value *Val = State->get(PhiR->getBackedgeValue(),		Value *Val = State->get(PhiR->getBackedgeValue(),
SinglePartNeeded ? State->UF - 1 : Part);		SinglePartNeeded ? State->UF - 1 : Part);
cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);		cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
}		}
}		}

// We do not attempt to preserve DT for outer loop vectorization currently.		// We do not attempt to preserve DT for outer loop vectorization currently.
if (!EnableVPlanNativePath) {		if (!EnableVPlanNativePath) {
BasicBlock *VectorHeaderBB = State->CFG.VPBB2IRBB[Header];		BasicBlock *VectorHeaderBB = State->CFG.VPBB2IRBB[Header].front();
State->DT->addNewBlock(VectorHeaderBB, VectorPreHeader);		State->DT->addNewBlock(VectorHeaderBB, VectorPreHeader);
updateDominatorTree(State->DT, VectorHeaderBB, VectorLatchBB,		updateDominatorTree(State->DT, VectorHeaderBB, VectorLatchBB,
State->CFG.ExitBB);		State->CFG.ExitBB);
}		}
}		}

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD		LLVM_DUMP_METHOD
▲ Show 20 Lines • Show All 363 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Show First 20 Lines • Show All 305 Lines • ▼ Show 20 Lines	case VPInstruction::BranchOnCond: {

// Replace the temporary unreachable terminator with a new conditional		// Replace the temporary unreachable terminator with a new conditional
// branch, hooking it up to backward destination for exiting blocks now and		// branch, hooking it up to backward destination for exiting blocks now and
// to forward destination(s) later when they are created.		// to forward destination(s) later when they are created.
BranchInst *CondBr =		BranchInst *CondBr =
Builder.CreateCondBr(Cond, Builder.GetInsertBlock(), nullptr);		Builder.CreateCondBr(Cond, Builder.GetInsertBlock(), nullptr);

if (getParent()->isExiting())		if (getParent()->isExiting())
CondBr->setSuccessor(1, State.CFG.VPBB2IRBB[Header]);		CondBr->setSuccessor(1, State.CFG.VPBB2IRBB[Header].front());

CondBr->setSuccessor(0, nullptr);		CondBr->setSuccessor(0, nullptr);
Builder.GetInsertBlock()->getTerminator()->eraseFromParent();		Builder.GetInsertBlock()->getTerminator()->eraseFromParent();
break;		break;
}		}
case VPInstruction::BranchOnCount: {		case VPInstruction::BranchOnCount: {
if (Part != 0)		if (Part != 0)
break;		break;
// First create the compare.		// First create the compare.
Value *IV = State.get(getOperand(0), Part);		Value *IV = State.get(getOperand(0), Part);
Value *TC = State.get(getOperand(1), Part);		Value *TC = State.get(getOperand(1), Part);
Value *Cond = Builder.CreateICmpEQ(IV, TC);		Value *Cond = Builder.CreateICmpEQ(IV, TC);

// Now create the branch.		// Now create the branch.
auto *Plan = getParent()->getPlan();		auto *Plan = getParent()->getPlan();
VPRegionBlock *TopRegion = Plan->getVectorLoopRegion();		VPRegionBlock *TopRegion = Plan->getVectorLoopRegion();
VPBasicBlock *Header = TopRegion->getEntry()->getEntryBasicBlock();		VPBasicBlock *Header = TopRegion->getEntry()->getEntryBasicBlock();

// Replace the temporary unreachable terminator with a new conditional		// Replace the temporary unreachable terminator with a new conditional
// branch, hooking it up to backward destination (the header) now and to the		// branch, hooking it up to backward destination (the header) now and to the
// forward destination (the exit/middle block) later when it is created.		// forward destination (the exit/middle block) later when it is created.
// Note that CreateCondBr expects a valid BB as first argument, so we need		// Note that CreateCondBr expects a valid BB as first argument, so we need
// to set it to nullptr later.		// to set it to nullptr later.
BranchInst *CondBr = Builder.CreateCondBr(Cond, Builder.GetInsertBlock(),		BranchInst *CondBr = Builder.CreateCondBr(Cond, Builder.GetInsertBlock(),
State.CFG.VPBB2IRBB[Header]);		State.CFG.VPBB2IRBB[Header].front());
CondBr->setSuccessor(0, nullptr);		CondBr->setSuccessor(0, nullptr);
Builder.GetInsertBlock()->getTerminator()->eraseFromParent();		Builder.GetInsertBlock()->getTerminator()->eraseFromParent();
break;		break;
}		}
default:		default:
llvm_unreachable("Unsupported opcode for instruction");		llvm_unreachable("Unsupported opcode for instruction");
}		}
}		}
▲ Show 20 Lines • Show All 892 Lines • Show Last 20 Lines