This is an archive of the discontinued LLVM Phabricator instance.

Differential D100663

[LV] Add undef incoming value to loop-exit phis for the middle-block.
Abandoned · Public

Authored by fhahn on Apr 16 2021, 10:19 AM.

Details

Reviewers

reames
Ayal
rengolin
gilr

Summary

LV temporarily creates invalid IR, which can trip over SCEV. In
particular, LV adds a new branch to the exit block of the scalar loop.
This means the PHIs in the loop exit block now are invalid. To avoid
issues with SCEV, add an undef incoming value for the middle-block. This
will later be replaced by the concrete value after vectorization.

Fixes PR49538, PR49900.
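
The invariant at stake can be illustrated with a toy model. The sketch below is not LLVM code: `ToyBlock`, `ToyPhi`, `ToyIncoming`, and the helper names are hypothetical stand-ins, assuming only that a phi must carry exactly one incoming entry per predecessor of its block. It shows how adding a predecessor breaks that invariant and how a placeholder entry restores it until the concrete value is known.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts; none of these are LLVM types.
struct ToyIncoming {
  int Value;          // ignored while IsPlaceholder is true
  bool IsPlaceholder; // plays the role of an undef incoming value
  std::string Pred;   // name of the predecessor block
};

struct ToyPhi {
  std::vector<ToyIncoming> Incoming;
};

struct ToyBlock {
  std::vector<std::string> Preds;
  std::vector<ToyPhi> Phis;
};

// A phi is well formed only if it has exactly one entry per predecessor.
bool isWellFormed(const ToyBlock &BB) {
  for (const ToyPhi &P : BB.Phis) {
    if (P.Incoming.size() != BB.Preds.size())
      return false;
    for (const std::string &Pred : BB.Preds) {
      auto Matches = std::count_if(
          P.Incoming.begin(), P.Incoming.end(),
          [&Pred](const ToyIncoming &In) { return In.Pred == Pred; });
      if (Matches != 1)
        return false;
    }
  }
  return true;
}

// Add a predecessor and give every phi a placeholder entry for it.
void addPredWithPlaceholder(ToyBlock &BB, const std::string &NewPred) {
  BB.Preds.push_back(NewPred);
  for (ToyPhi &P : BB.Phis)
    P.Incoming.push_back(ToyIncoming{0, true, NewPred});
}

// Later fix-up: replace the placeholder with the concrete value.
void setIncomingForPred(ToyPhi &P, const std::string &Pred, int V) {
  bool Found = false;
  for (ToyIncoming &In : P.Incoming)
    if (In.Pred == Pred) {
      In.Value = V;
      In.IsPlaceholder = false;
      Found = true;
    }
  assert(Found && "phi has no entry for this predecessor");
}
```

In this model, pushing a new name onto `Preds` without touching the phis makes `isWellFormed` return false, while `addPredWithPlaceholder` keeps it true. `setIncomingForPred` then plays the role of the later fix-up.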

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Apr 16 2021, 10:19 AM

Herald added subscribers: javed.absar, hiraditya. Apr 16 2021, 10:19 AM

fhahn requested review of this revision. Apr 16 2021, 10:19 AM

Herald added a project: Restricted Project. Apr 16 2021, 10:19 AM

This is incorrect and will result in subtle miscompiles.

Why? Because SCEV is queried over the IR before point of undef insertion and replacement. SCEV will - correctly - assume that it can substitute any valid concrete value for the undef. However, the actual value later inserted is more constrained than undef. As such, the intermediate SCEV results - both as used immediately and as cached - will be incorrect for the final IR.
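
The caching hazard described in this comment can be sketched with a toy model, assuming a memoizing analysis that is allowed to refine a placeholder to whatever value is convenient. `PlaceholderPhi` and `ToyCachingAnalysis` are hypothetical names and do not correspond to any SCEV API; the `forget` method is likewise only an illustration of explicit invalidation.

```cpp
#include <cassert>
#include <map>

// Toy phi with two incoming edges; not an LLVM type.
struct PlaceholderPhi {
  int FromScalarLoop; // concrete value on the scalar-loop edge
  bool MiddleIsUndef; // true while the middle-block entry is a placeholder
  int FromMiddle;     // meaningful only when MiddleIsUndef is false
};

// Toy memoizing analysis. It answers "does this phi always produce a single
// value?" and caches the answer per phi.
class ToyCachingAnalysis {
  std::map<const PlaceholderPhi *, bool> Cache;

public:
  bool isSingleValue(const PlaceholderPhi &P) {
    auto It = Cache.find(&P);
    if (It != Cache.end())
      return It->second; // may be stale if P changed since the first query
    // A placeholder may be refined to any value, so [C, undef] is treated
    // as if it were just C.
    bool Result = P.MiddleIsUndef || P.FromMiddle == P.FromScalarLoop;
    Cache[&P] = Result;
    return Result;
  }

  // Explicit invalidation; every mutation would need a matching call.
  void forget(const PlaceholderPhi &P) { Cache.erase(&P); }
};
```

If the phi is queried while `MiddleIsUndef` is true and the placeholder is later replaced by a different concrete value, the cached `true` is wrong for the final state until `forget` is called. Invalidation only repairs the cache: any decision already made from the earlier answer stays wrong.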

reames requested changes to this revision. Apr 16 2021, 10:51 AM

This revision now requires changes to proceed. Apr 16 2021, 10:51 AM

Harbormaster completed remote builds in B99212: Diff 338165. Apr 16 2021, 11:58 AM

In D100663#2695304, @reames wrote:

This is incorrect and will result in subtle miscompiles.

Why? Because SCEV is queried over the IR before point of undef insertion and replacement. SCEV will - correctly - assume that it can substitute any valid concrete value for the undef. However, the actual value later inserted is more constrained than undef. As such, the intermediate SCEV results - both as used immediately and as cached - will be incorrect for the final IR.

Thanks for taking a look and apologies for not doing a good job of conveying my thinking. My rationale was that at the point we add the undef incoming value, it is yet undetermined which edge will be taken, so SCEV will constrain the undef value, but only to the incoming value of the scalar loop. In the final IR, both incoming values should result in the same value at runtime. Does that make sense or am I still missing something?

After writing this up, I went back and checked what we use as the temporary condition and realized that we are using true, which means that the correct incoming value from the scalar loop is dead until we update the branch condition. This means SCEV would indeed be able to assume an arbitrary value for the PHI, which would be incorrect. But I think we may be able to avoid this problem by making the condition false initially, so that the undef incoming value is dead until we set the final branch condition. In this patch, I moved the code for setting the condition to happen after we are done with vector-code generation. The main uses of SCEV (runtime check generation) should happen before completeLoopSkeleton, so it may be OK to keep the code there as well, but it seems safer to wait until the phis are actually fixed.

Note that this patch also splits up the code to create and set the condition, so there are no changes in the generated IR (creating the condition at the point it is set would result in a few instructions getting reordered). I am more than happy to discuss the best place, if the direction looks good in general.
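
The ordering argument in this comment can be sketched as a small state model. This is a hedged illustration only: `ToySkeleton` and its members are hypothetical and model just two facts taken from the discussion, namely that the middle block branches to the exit block when its condition is true, and that the exit phi holds a placeholder until it is fixed up.

```cpp
#include <cassert>

// Toy model of the loop skeleton state; not LLVM code.
struct ToySkeleton {
  enum class Cond { AlwaysFalse, AlwaysTrue, Runtime };

  // Condition of the middle-block branch: true goes to the exit block,
  // false goes to the scalar preheader.
  Cond MiddleCond = Cond::AlwaysFalse;
  // True while the exit phi's middle-block entry is still a placeholder.
  bool ExitPhiIsPlaceholder = true;

  // The placeholder can only be observed if the middle->exit edge may be
  // taken while the placeholder is still in place.
  bool placeholderMayBeObserved() const {
    return ExitPhiIsPlaceholder && MiddleCond != Cond::AlwaysFalse;
  }

  // Replace the placeholder with the concrete value.
  void fixExitPhi() { ExitPhiIsPlaceholder = false; }

  // Install the real runtime check, only after the phi has been fixed.
  void setFinalCondition() {
    assert(!ExitPhiIsPlaceholder && "fix the phi before enabling the edge");
    MiddleCond = Cond::Runtime;
  }
};
```

Starting from `AlwaysFalse` and calling `fixExitPhi()` before `setFinalCondition()` keeps `placeholderMayBeObserved()` false at every step, whereas a temporary `AlwaysTrue` condition makes it true immediately.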

Harbormaster completed remote builds in B99375: Diff 338374. Apr 18 2021, 7:14 AM

uabelho added a subscriber: uabelho. Apr 19 2021, 1:55 AM

In D100663#2697223, @fhahn wrote:

Thanks for taking a look and apologies for not doing a good job of conveying my thinking. My rationale was that at the point we add the undef incoming value, it is yet undetermined which edge will be taken, so SCEV will constrain the undef value, but only to the incoming value of the scalar loop. In the final IR, both incoming values should result in the same value at runtime. Does that make sense or am I still missing something?

I see where you're going here (assuming the fix you mention after this in your long comment), and this could maybe be made to work, but I'm still nervous.

I see two classes of potential issues.

SCEV could cache a result which was true before we rewrote the branch conditions, but not true afterwards. (e.g. treated the phi as-if it were single entry) We could invalidate to resolve this, but that means our placement of invalidates after *every* condition modification has to be perfect or we get nasty bugs.

The IR might not actually allow us to assume the scalar path and the vector path always produce the same values. I don't know the existing code structure well enough, do we ever mutate the scalar loop based on the existence of the prechecks or vector bodies? In particular, maybe when tail folding? If we do, there is no safe value to use.

I think what we need here is an actual "unknown until later" value. I'd be tempted to add an intrinsic for that, except I'm not quite sure if we need such a thing in a constant expression anywhere. This was one of the things I'd been hoping to chat through with you offline when you had a moment. :)

I'll also note that at the end of the day, I'm willing to defer to you if you think the approach here is good enough after reading the above. I don't want to block "better" on "perfect" here. If you do want to move forward with this approach, let me know and I'll do a review of the code (as opposed to the idea).

In D100663#2698977, @reames wrote:

In D100663#2697223, @fhahn wrote:

Thanks for taking a look and apologies for not doing a good job of conveying my thinking. My rationale was that at the point we add the undef incoming value, it is yet undetermined which edge will be taken, so SCEV will constrain the undef value, but only to the incoming value of the scalar loop. In the final IR, both incoming values should result in the same value at runtime. Does that make sense or am I still missing something?

I see where you're going here (assuming the fix you mention after this in your long comment), and this could maybe be made to work, but I'm still nervous.

I see two classes of potential issues.

SCEV could cache a result which was true before we rewrote the branch conditions, but not true afterwards. (e.g. treated the phi as-if it were single entry) We could invalidate to resolve this, but that means our placement of invalidates after *every* condition modification has to be perfect or we get nasty bugs.

The IR might not actually allow us to assume the scalar path and the vector path always produce the same values. I don't know the existing code structure well enough, do we ever mutate the scalar loop based on the existence of the prechecks or vector bodies? In particular, maybe when tail folding? If we do, there is no safe value to use.

I think what we need here is an actual "unknown until later" value. I'd be tempted to add an intrinsic for that, except I'm not quite sure if we need such a thing in a constant expression anywhere. This was one of the things I'd been hoping to chat through with you offline when you had a moment. :)

I think having an easy way to generate an 'unknown' value would be ideal in this case, and the current patch tries to work around the fact that I don't think we currently have any way to express such a value. Agreed that discussing this offline would probably be much easier!

I'll also note that at the end of the day, I'm willing to defer to you if you think the approach here is good enough after reading the above. I don't want to block "better" on "perfect" here. If you do want to move forward with this approach, let me know and I'll do a review of the code (as opposed to the idea).

I don't think there's an urgent need to rush in a fix. While I think we should be fairly safe with respect to the 2 potential issues you mentioned given the way the code is generated at the moment, IMO it would be worth it to explore the alternative first.

See https://reviews.llvm.org/D101487 for near term workaround.

Let's go with the simpler workaround in D101487 for now.

reames mentioned this in rG80e802508398: [LV] Workaround PR49900 (a crash due to analyzing partially mutated IR). May 5 2021, 10:00 AM

Revision Contents

Path                                                    Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp         92 lines
llvm/test/Transforms/LoopVectorize/scev-verify-ir.ll    77 lines

Diff 338374

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[...] public:
/// Widen a single call instruction within the innermost loop.		/// Widen a single call instruction within the innermost loop.
void widenCallInstruction(CallInst &I, VPValue *Def, VPUser &ArgOperands,		void widenCallInstruction(CallInst &I, VPValue *Def, VPUser &ArgOperands,
VPTransformState &State);		VPTransformState &State);

/// Widen a single select instruction within the innermost loop.		/// Widen a single select instruction within the innermost loop.
void widenSelectInstruction(SelectInst &I, VPValue *VPDef, VPUser &Operands,		void widenSelectInstruction(SelectInst &I, VPValue *VPDef, VPUser &Operands,
bool InvariantCond, VPTransformState &State);		bool InvariantCond, VPTransformState &State);

/// Fix the vectorized code, taking care of header phi's, live-outs, and more.		/// Fix the vectorized code, taking care of updating the branch condition in
		/// the middle block, setting header phi's, live-outs, and more.
void fixVectorizedLoop(VPTransformState &State);		void fixVectorizedLoop(VPTransformState &State);

// Return true if any runtime check is added.		// Return true if any runtime check is added.
bool areSafetyChecksAdded() { return AddedSafetyChecks; }		bool areSafetyChecksAdded() { return AddedSafetyChecks; }

/// A type for vectorized values in the new loop. Each value from the		/// A type for vectorized values in the new loop. Each value from the
/// original loop, when vectorized, is represented by UF vector values in the		/// original loop, when vectorized, is represented by UF vector values in the
/// new unrolled loop, where UF is the unroll factor.		/// new unrolled loop, where UF is the unroll factor.
[...] protected:
/// In cases where the loop skeleton is more complicated (eg. epilogue		/// In cases where the loop skeleton is more complicated (eg. epilogue
/// vectorization) and the resume values can come from an additional bypass		/// vectorization) and the resume values can come from an additional bypass
/// block, the \p AdditionalBypass pair provides information about the bypass		/// block, the \p AdditionalBypass pair provides information about the bypass
/// block and the end value on the edge from bypass to this loop.		/// block and the end value on the edge from bypass to this loop.
void createInductionResumeValues(		void createInductionResumeValues(
Loop *L, Value *VectorTripCount,		Loop *L, Value *VectorTripCount,
std::pair<BasicBlock *, Value *> AdditionalBypass = {nullptr, nullptr});		std::pair<BasicBlock *, Value *> AdditionalBypass = {nullptr, nullptr});

/// Complete the loop skeleton by adding debug MDs, creating appropriate		/// Complete the loop skeleton by adding debug MDs, preparing the builder and
/// conditional branches in the middle block, preparing the builder and
/// running the verifier. Take in the vector loop \p L as argument, and return		/// running the verifier. Take in the vector loop \p L as argument, and return
/// the preheader of the completed vector loop.		/// the preheader of the completed vector loop.
BasicBlock *completeLoopSkeleton(Loop *L, MDNode *OrigLoopID);		BasicBlock *completeLoopSkeleton(Loop *L, MDNode *OrigLoopID);

/// Add additional metadata to \p To that was not present on \p Orig.		/// Add additional metadata to \p To that was not present on \p Orig.
///		///
/// Currently this is used to add the noalias annotations based on the		/// Currently this is used to add the noalias annotations based on the
/// inserted memchecks. Use this for instructions that are cloned into the		/// inserted memchecks. Use this for instructions that are cloned into the
[...] Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopMiddleBlock =		LoopMiddleBlock =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,		SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
LI, nullptr, Twine(Prefix) + "middle.block");		LI, nullptr, Twine(Prefix) + "middle.block");
LoopScalarPreHeader =		LoopScalarPreHeader =
SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,		SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
nullptr, Twine(Prefix) + "scalar.ph");		nullptr, Twine(Prefix) + "scalar.ph");

// Set up branch from middle block to the exit and scalar preheader blocks.		// Set up branch from middle block to the exit and scalar preheader blocks.
// completeLoopSkeleton will update the condition to use an iteration check,		// fixVectorizedLoop will update the condition to use an iteration check,
// if required to decide whether to execute the remainder.		// if required to decide whether to execute the remainder.
BranchInst *BrInst =		BranchInst *BrInst = BranchInst::Create(LoopExitBlock, LoopScalarPreHeader,
BranchInst::Create(LoopExitBlock, LoopScalarPreHeader, Builder.getTrue());		Builder.getFalse());
auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();		auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());		BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);		ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);
		// LoopExitBlock now has LoopMiddleBlock as new predecessor. Add a dummy
		// incoming value, to keep the IR valid until it gets replaced with the
		// concrete value after vectorization. The edge from LoopMiddleBlock to
		// LoopExitBlock is guaranteed to be dead at least until the phi is updated
		// with the concrete incoming value.
		for (PHINode &P : LoopExitBlock->phis())
		P.addIncoming(UndefValue::get(P.getType()), LoopMiddleBlock);

// We intentionally don't let SplitBlock to update LoopInfo since		// We intentionally don't let SplitBlock to update LoopInfo since
// LoopVectorBody should belong to another loop than LoopVectorPreHeader.		// LoopVectorBody should belong to another loop than LoopVectorPreHeader.
// LoopVectorBody is explicitly added to the correct place few lines later.		// LoopVectorBody is explicitly added to the correct place few lines later.
LoopVectorBody =		LoopVectorBody =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,		SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
nullptr, nullptr, Twine(Prefix) + "vector.body");		nullptr, nullptr, Twine(Prefix) + "vector.body");

[...] if (AdditionalBypass.first)
EndValueFromAdditionalBypass);		EndValueFromAdditionalBypass);

OrigPhi->setIncomingValueForBlock(LoopScalarPreHeader, BCResumeVal);		OrigPhi->setIncomingValueForBlock(LoopScalarPreHeader, BCResumeVal);
}		}
}		}

BasicBlock *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,		BasicBlock *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,
MDNode *OrigLoopID) {		MDNode *OrigLoopID) {
assert(L && "Expected valid loop.");

// The trip counts should be cached by now.
Value *Count = getOrCreateTripCount(L);
Value *VectorTripCount = getOrCreateVectorTripCount(L);

auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();

// Add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop.
// If (N - N%VF) == N, then we don't need to run the remainder.
// If tail is to be folded, we know we don't need to run the remainder.
if (!Cost->foldTailByMasking()) {
Instruction *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ,
Count, VectorTripCount, "cmp.n",
LoopMiddleBlock->getTerminator());

// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
CmpN->setDebugLoc(ScalarLatchTerm->getDebugLoc());
cast<BranchInst>(LoopMiddleBlock->getTerminator())->setCondition(CmpN);
}

// Get ready to start creating new instructions into the vectorized body.		// Get ready to start creating new instructions into the vectorized body.
assert(LoopVectorPreHeader == L->getLoopPreheader() &&		assert(LoopVectorPreHeader == L->getLoopPreheader() &&
"Inconsistent vector loop preheader");		"Inconsistent vector loop preheader");
Builder.SetInsertPoint(&*LoopVectorBody->getFirstInsertionPt());		Builder.SetInsertPoint(&*LoopVectorBody->getFirstInsertionPt());

Optional<MDNode *> VectorizedLoopID =		Optional<MDNode *> VectorizedLoopID =
makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,		makeFollowupLoopID(OrigLoopID, {LLVMLoopVectorizeFollowupAll,
LLVMLoopVectorizeFollowupVectorized});		LLVMLoopVectorizeFollowupVectorized});
[...] void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,

for (auto &I : MissingVals) {		for (auto &I : MissingVals) {
PHINode *PHI = cast<PHINode>(I.first);		PHINode *PHI = cast<PHINode>(I.first);
// One corner case we have to handle is two IVs "chasing" each-other,		// One corner case we have to handle is two IVs "chasing" each-other,
// that is %IV2 = phi [...], [ %IV1, %latch ]		// that is %IV2 = phi [...], [ %IV1, %latch ]
// In this case, if IV1 has an external use, we need to avoid adding both		// In this case, if IV1 has an external use, we need to avoid adding both
// "last value of IV1" and "penultimate value of IV2". So, verify that we		// "last value of IV1" and "penultimate value of IV2". So, verify that we
// don't already have an incoming value for the middle block.		// don't already have an incoming value for the middle block.
if (PHI->getBasicBlockIndex(MiddleBlock) == -1)		if (isa<UndefValue>(PHI->getIncomingValueForBlock(MiddleBlock)))
PHI->addIncoming(I.second, MiddleBlock);		PHI->setIncomingValueForBlock(MiddleBlock, I.second);
}		}
}		}

namespace {		namespace {

struct CSEDenseMapInfo {		struct CSEDenseMapInfo {
static bool canHandle(const Instruction *I) {		static bool canHandle(const Instruction *I) {
return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||		return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||
[...] for (unsigned Part = 0; Part < UF; ++Part) {
Inst->eraseFromParent();		Inst->eraseFromParent();
State.reset(Def, NewI, Part);		State.reset(Def, NewI, Part);
}		}
}		}
}		}
}		}

void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {		void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
		auto *L = LI->getLoopFor(LoopVectorBody);
		// The trip counts should be cached by now.
		Value *Count = getOrCreateTripCount(L);
		Value *VectorTripCount = getOrCreateVectorTripCount(L);
		auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
		Value *MiddleBlockCond = Builder.getTrue();
		// Create a check in the middle block to see if we have completed
		// all of the iterations in the first vector loop.
		// If (N - N%VF) == N, then we don't need to run the remainder.
		// If tail is to be folded, we know we don't need to run the remainder.
		if (!Cost->foldTailByMasking()) {
		Instruction *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ,
		Count, VectorTripCount, "cmp.n",
		LoopMiddleBlock->getTerminator());

		// Here we use the same DebugLoc as the scalar loop latch terminator instead
		// of the corresponding compare because they may have ended up with
		// different line numbers and we want to avoid awkward line stepping while
		// debugging. Eg. if the compare has got a line number inside the loop.
		CmpN->setDebugLoc(ScalarLatchTerm->getDebugLoc());
		MiddleBlockCond = CmpN;
		}

// Insert truncates and extends for any truncated instructions as hints to		// Insert truncates and extends for any truncated instructions as hints to
// InstCombine.		// InstCombine.
if (VF.isVector())		if (VF.isVector())
truncateToMinimalBitwidths(State);		truncateToMinimalBitwidths(State);

// Fix widened non-induction PHIs by setting up the PHI operands.		// Fix widened non-induction PHIs by setting up the PHI operands.
if (OrigPHIsToFix.size()) {		if (OrigPHIsToFix.size()) {
assert(EnableVPlanNativePath &&		assert(EnableVPlanNativePath &&
"Unexpected non-induction PHIs for fixup in non VPlan-native path");		"Unexpected non-induction PHIs for fixup in non VPlan-native path");
fixNonInductionPHIs(State);		fixNonInductionPHIs(State);
}		}

// At this point every instruction in the original loop is widened to a		// At this point every instruction in the original loop is widened to a
// vector form. Now we need to fix the recurrences in the loop. These PHI		// vector form. Now we need to fix the recurrences in the loop. These PHI
// nodes are currently empty because we did not want to introduce cycles.		// nodes are currently empty because we did not want to introduce cycles.
// This is the second stage of vectorizing recurrences.		// This is the second stage of vectorizing recurrences.
fixCrossIterationPHIs(State);		fixCrossIterationPHIs(State);

// Forget the original basic block.
PSE.getSE()->forgetLoop(OrigLoop);

// Fix-up external users of the induction variables.		// Fix-up external users of the induction variables.
for (auto &Entry : Legal->getInductionVars())		for (auto &Entry : Legal->getInductionVars())
fixupIVUsers(Entry.first, Entry.second,		fixupIVUsers(Entry.first, Entry.second,
getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),		getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
IVEndValues[Entry.first], LoopMiddleBlock);		IVEndValues[Entry.first], LoopMiddleBlock);

fixLCSSAPHIs(State);		fixLCSSAPHIs(State);
for (Instruction *PI : PredicatedInstructions)		for (Instruction *PI : PredicatedInstructions)
sinkScalarOperands(&*PI);		sinkScalarOperands(&*PI);

		// Now all PHIs have been updated with the correct incoming values. Set the
		// correct condition for the branch from the middle block to the scalar loop
		// and exit block.
		cast<BranchInst>(LoopMiddleBlock->getTerminator())
		->setCondition(MiddleBlockCond);

		// Forget the original basic block.
		PSE.getSE()->forgetLoop(OrigLoop);

// Remove redundant induction instructions.		// Remove redundant induction instructions.
cse(LoopVectorBody);		cse(LoopVectorBody);

// Set/update profile weights for the vector and remainder loops as original		// Set/update profile weights for the vector and remainder loops as original
// loop iterations are now distributed among them. Note that original loop		// loop iterations are now distributed among them. Note that original loop
// represented by LoopScalarBody becomes remainder loop after vectorization.		// represented by LoopScalarBody becomes remainder loop after vectorization.
//		//
// For cases like foldTailByMasking() and requiresScalarEpiloque() we may		// For cases like foldTailByMasking() and requiresScalarEpiloque() we may
[...] void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi,
// recurrence in the exit block, and then add an edge for the middle block.		// recurrence in the exit block, and then add an edge for the middle block.
// Note that LCSSA does not imply single entry when the original scalar loop		// Note that LCSSA does not imply single entry when the original scalar loop
// had multiple exiting edges (as we always run the last iteration in the		// had multiple exiting edges (as we always run the last iteration in the
// scalar epilogue); in that case, the exiting path through middle will be		// scalar epilogue); in that case, the exiting path through middle will be
// dynamically dead and the value picked for the phi doesn't matter.		// dynamically dead and the value picked for the phi doesn't matter.
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
if (any_of(LCSSAPhi.incoming_values(),		if (any_of(LCSSAPhi.incoming_values(),
[Phi](Value *V) { return V == Phi; }))		[Phi](Value *V) { return V == Phi; }))
LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop, LoopMiddleBlock);		LCSSAPhi.setIncomingValueForBlock(LoopMiddleBlock,
		ExtractForPhiUsedOutsideLoop);
}		}

static bool useOrderedReductions(RecurrenceDescriptor &RdxDesc) {		static bool useOrderedReductions(RecurrenceDescriptor &RdxDesc) {
return EnableStrictReductions && RdxDesc.isOrdered();		return EnableStrictReductions && RdxDesc.isOrdered();
}		}

void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {		void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {
// Get it's reduction variable descriptor.		// Get it's reduction variable descriptor.
[...] void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {
// inside and outside of the scalar remainder loop.		// inside and outside of the scalar remainder loop.

// We know that the loop is in LCSSA form. We need to update the PHI nodes		// We know that the loop is in LCSSA form. We need to update the PHI nodes
// in the exit blocks. See comment on analogous loop in		// in the exit blocks. See comment on analogous loop in
// fixFirstOrderRecurrence for a more complete explaination of the logic.		// fixFirstOrderRecurrence for a more complete explaination of the logic.
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
if (any_of(LCSSAPhi.incoming_values(),		if (any_of(LCSSAPhi.incoming_values(),
[LoopExitInst](Value *V) { return V == LoopExitInst; }))		[LoopExitInst](Value *V) { return V == LoopExitInst; }))
LCSSAPhi.addIncoming(ReducedPartRdx, LoopMiddleBlock);		LCSSAPhi.setIncomingValueForBlock(LoopMiddleBlock, ReducedPartRdx);

// Fix the scalar loop reduction variable with the incoming reduction sum		// Fix the scalar loop reduction variable with the incoming reduction sum
// from the vector body and from the backedge value.		// from the vector body and from the backedge value.
int IncomingEdgeBlockIdx =		int IncomingEdgeBlockIdx =
Phi->getBasicBlockIndex(OrigLoop->getLoopLatch());		Phi->getBasicBlockIndex(OrigLoop->getLoopLatch());
assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");		assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");
// Pick the other block.		// Pick the other block.
int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);		int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);
[...] for (User *U : Cur->users()) {
Visited.insert(UI).second)		Visited.insert(UI).second)
Worklist.push_back(UI);		Worklist.push_back(UI);
}		}
}		}
}		}

void InnerLoopVectorizer::fixLCSSAPHIs(VPTransformState &State) {		void InnerLoopVectorizer::fixLCSSAPHIs(VPTransformState &State) {
for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {		for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {
if (LCSSAPhi.getBasicBlockIndex(LoopMiddleBlock) != -1)		if (!isa<UndefValue>(LCSSAPhi.getIncomingValueForBlock(LoopMiddleBlock)))
// Some phis were already hand updated by the reduction and recurrence		// Some phis were already hand updated by the reduction and recurrence
// code above, leave them alone.		// code above, leave them alone.
continue;		continue;

auto *IncomingValue = LCSSAPhi.getIncomingValue(0);		auto *IncomingValue = LCSSAPhi.getIncomingValue(0);
// Non-instruction incoming values will have only one value.		// Non-instruction incoming values will have only one value.

VPLane Lane = VPLane::getFirstLane();		VPLane Lane = VPLane::getFirstLane();
if (isa<Instruction>(IncomingValue) &&		if (isa<Instruction>(IncomingValue) &&
!Cost->isUniformAfterVectorization(cast<Instruction>(IncomingValue),		!Cost->isUniformAfterVectorization(cast<Instruction>(IncomingValue),
VF))		VF))
Lane = VPLane::getLastLaneForVF(VF);		Lane = VPLane::getLastLaneForVF(VF);

// Can be a loop invariant incoming value or the last scalar value to be		// Can be a loop invariant incoming value or the last scalar value to be
// extracted from the vectorized loop.		// extracted from the vectorized loop.
Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());		Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
Value *lastIncomingValue =		Value *lastIncomingValue =
OrigLoop->isLoopInvariant(IncomingValue)		OrigLoop->isLoopInvariant(IncomingValue)
? IncomingValue		? IncomingValue
: State.get(State.Plan->getVPValue(IncomingValue),		: State.get(State.Plan->getVPValue(IncomingValue),
VPIteration(UF - 1, Lane));		VPIteration(UF - 1, Lane));
LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);		LCSSAPhi.setIncomingValueForBlock(LoopMiddleBlock, lastIncomingValue);
}		}
}		}

void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {		void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {
// The basic block and loop containing the predicated instruction.		// The basic block and loop containing the predicated instruction.
auto *PredBB = PredInst->getParent();		auto *PredBB = PredInst->getParent();
auto *VectorLoop = LI->getLoopFor(PredBB);		auto *VectorLoop = LI->getLoopFor(PredBB);

[...]

llvm/test/Transforms/LoopVectorize/scev-verify-ir.ll

This file was added.

; RUN: opt -loop-vectorize -force-vector-width=2 -force-vector-interleave=2 -scev-verify-ir -S %s | FileCheck %s

; Make sure SCEV is not queried while the IR is temporarily invalid. The tests
; deliberately do not check for details of the vectorized IR, because that's
; not the focus of the test.

define void @pr49538() {
; CHECK-LABEL: @pr49538
; CHECK: vector.body:
;
entry:
  br label %loop.0

loop.0:
  %iv.0 = phi i16 [ -1, %entry ], [ %iv.0.next, %loop.0.latch ]
  br label %loop.1

loop.1:
  %iv.1 = phi i16 [ -1, %loop.0 ], [ %iv.1.next, %loop.1 ]
  %iv.1.next = add nsw i16 %iv.1, 1
  %i6 = icmp eq i16 %iv.1.next, %iv.0
  br i1 %i6, label %loop.0.latch, label %loop.1

loop.0.latch:
  %i8 = phi i16 [ 1, %loop.1 ]
  %iv.0.next = add nsw i16 %iv.0, 1
  %ec.0 = icmp eq i16 %iv.0.next, %i8
  br i1 %ec.0, label %exit, label %loop.0

exit:
  ret void
}

define void @pr49900(i32 %x, i64* %ptr) {
; CHECK-LABEL: @pr49900
; CHECK: vector.body{{.*}}:
; CHECK: vector.body{{.*}}:
;
entry:
  br label %loop.0

loop.0: ; preds = %bb2, %bb
  %ec.0 = icmp slt i32 %x, 0
  br i1 %ec.0, label %loop.0, label %loop.1.ph

loop.1.ph: ; preds = %bb2
  br label %loop.1

loop.1: ; preds = %bb33, %bb5
  %iv.1 = phi i32 [ 0, %loop.1.ph ], [ %iv.3.next, %loop.1.latch ]
  br label %loop.2

loop.2:
  %iv.2 = phi i32 [ %iv.1, %loop.1 ], [ %iv.2.next, %loop.2 ]
  %tmp54 = add i32 %iv.2, 12
  %iv.2.next = add i32 %iv.2, 13
  %ext = zext i32 %iv.2.next to i64
  %tmp56 = add nuw nsw i64 %ext, 1
  %C6 = icmp sle i32 %tmp54, 65536
  br i1 %C6, label %loop.2, label %loop.3.ph

loop.3.ph:
  br label %loop.3

loop.3:
  %iv.3 = phi i32 [ %iv.2.next, %loop.3.ph ], [ %iv.3.next, %loop.3 ]
  %iv.3.next = add i32 %iv.3 , 13
  %C1 = icmp ult i32 %iv.3.next, 65536
  br i1 %C1, label %loop.3, label %loop.1.latch

loop.1.latch:
  %ec = icmp ne i32 %iv.1, 9999
  br i1 %ec, label %loop.1, label %exit

exit:
  ret void
}