This is an archive of the discontinued LLVM Phabricator instance.

Differential D94892

[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
ClosedPublic

Authored by reames on Jan 17 2021, 8:18 PM.

Details

Reviewers

Ayal
fhahn
anna
bollu

Commits

rG723144665b7f: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…
rG6d3e3ae8a9ca: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…
rGc23ce54b36b1: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…
rG3e5ce49e5371: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…

Summary

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch to in the middle terminator. For the multiple exit case, where we know a scalar epilogue is required, these questions are ill-formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.
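
As a rough illustration of the first paragraph of the summary, the toy C++ model below (not LLVM code) shows the two shapes the middle-block terminator can take. `ToyTerminator`, `makeMiddleTerminator`, and the block labels are hypothetical names invented for this sketch; the patch itself builds the terminator with `BranchInst::Create`, as shown in the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a block terminator: just a list of successor labels.
// Hypothetical type for illustration only; this is not an LLVM class.
struct ToyTerminator {
  std::vector<std::string> Successors;
  bool isConditional() const { return Successors.size() == 2; }
};

// Models the choice described in the summary. The labels "exit" and
// "scalar.ph" are illustrative placeholders.
inline ToyTerminator makeMiddleTerminator(bool RequiresScalarEpilogue,
                                          bool HasUniqueExit) {
  // Same invariant the patch asserts: a loop without a unique exit block
  // must be one that requires the scalar epilogue.
  assert((HasUniqueExit || RequiresScalarEpilogue) &&
         "multiple exit loop without required epilogue?");
  ToyTerminator T;
  if (RequiresScalarEpilogue) {
    // Unconditional branch: the scalar preheader is the only successor.
    T.Successors.push_back("scalar.ph");
  } else {
    // Conditional branch: either leave through the exit block or fall
    // into the scalar preheader to run the remainder.
    T.Successors.push_back("exit");
    T.Successors.push_back("scalar.ph");
  }
  return T;
}
```

With the unconditional form there is no middle-to-exit edge, so the model never has to name an exit block for that case; that is the simplification the summary is after.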

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision. Jan 17 2021, 8:18 PM

Herald added subscribers: dantrushin, bollu, hiraditya, mcrosier. Jan 17 2021, 8:18 PM

reames requested review of this revision. Jan 17 2021, 8:18 PM

Herald added a project: Restricted Project. Jan 17 2021, 8:18 PM

Update ascii art to reflect change and clarify a confusing bit in the original.

rebase over a landed set of tests

Harbormaster completed remote builds in B85551: Diff 317260. Jan 17 2021, 9:03 PM

This makes sense, but the fact that we need to scatter tests throughout various parts of the code seems unfortunate. I'm planning to take a closer look over the next few days to see if I can provide any ideas on how this could be improved.

Florian, have you had a chance to give thought to alternatives? Given it's been two weeks, unless you have an actionable suggestion, I'd like to move forward with the current patch. I'll emphasize that I'm happy to iterate on the design here, either during review, or after submission if you think of something cleaner.

Herald added a reviewer: bollu. Feb 1 2021, 11:24 AM

In D94892#2534445, @reames wrote:

Florian, have you had a chance to give thought to alternatives? Given it's been two weeks, unless you have an actionable suggestion, I'd like to move forward with the current patch. I'll emphasize that I'm happy to iterate on the design here, either during review, or after submission if you think of something cleaner.

Unfortunately I did not really have time to think much about better alternatives so far (and it's unlikely I'll get time until end of next week). But I agree, it's not a really big deal and it works well enough for now. Let's tweak it post-commit, should a better alternative present itself. So for now I just have a few small suggestions, mostly wording related.

LGTM, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3153	This applies for all loops that require scalar epilogues, not just ones with multiple exits, right? If so, it might be better to word it in terms of requiring a scalar epilogue, rather than multiple exits. (perhaps something like `An edge from the middle block to the exit block is only added if the scalar epilogue may not be executed. Thus only update the immediate dominators if the scalar epilogue is not required`.)
3402	nit: set up `a` conditional branch from `the` middle block to the scalar loop pre-header?
3410	Do we need this assert? There's a similar assert just a few lines above. If it's not needed, getting rid of the lambda would make things a bit easier to read IMO (perhaps it could be just `BranchInst *BrInst = Cost->requiresScalarEpilogue() ? BranchInst::Create(LoopScalarPreHeader) : BranchInst::Create(LoopExitBlock, LoopScalarPreHeader, Builder.getTrue())`)
3524	nit: if we require a scalar epilogue .... as we unconditionally branch to the scalar preheader?
3667	nit: unrelated whitespace change
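
The comment at line 3524 concerns the three-way case analysis that the patch documents in `completeLoopSkeleton`. A minimal sketch of that analysis follows, written as a standalone toy rather than LLVM code. The names `MiddleExit`, `classifyMiddleBlock`, and `remainderCanBeSkipped` are hypothetical, and modelling the vector trip count as `N - N % VF` follows the code comment in the diff rather than the real `getOrCreateVectorTripCount`.

```cpp
#include <cassert>
#include <cstdint>

// What the middle block does once the vector loop has finished.
// Hypothetical enum for illustration only.
enum class MiddleExit {
  AlwaysScalar, // case 1: unconditional branch to the scalar preheader
  NeverScalar,  // case 2: tail folded, branch condition stays 'true'
  RuntimeCheck  // case 3: compare the trip counts at run time
};

inline MiddleExit classifyMiddleBlock(bool RequiresScalarEpilogue,
                                      bool FoldTailByMasking) {
  if (RequiresScalarEpilogue)
    return MiddleExit::AlwaysScalar;
  if (FoldTailByMasking)
    return MiddleExit::NeverScalar;
  return MiddleExit::RuntimeCheck;
}

// Toy version of the case 3 check: the remainder can be skipped exactly
// when (N - N % VF) == N, i.e. when VF divides N.
inline bool remainderCanBeSkipped(std::uint64_t N, std::uint64_t VF) {
  assert(VF != 0 && "vectorization factor must be nonzero");
  return (N - N % VF) == N;
}
```

Only the third case emits a compare in the patch (`cmp.n`); the first two leave the terminator created by `createVectorLoopSkeleton` untouched.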

This revision is now accepted and ready to land. Feb 4 2021, 2:24 PM

Closed by commit rG3e5ce49e5371: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop… (authored by reames). Feb 4 2021, 5:28 PM

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG3e5ce49e5371: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop….

akuegel added a reverting change: rG7fe41ac3dff2: Revert "[LV] Unconditionally branch from middle to scalar preheader if the…. Feb 5 2021, 3:53 AM

reames added a commit: rGc23ce54b36b1: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…. May 17 2021, 4:37 PM

reames added a reverting change: rGd16da7343d40: Revert "[LV] Unconditionally branch from middle to scalar preheader if the…. May 17 2021, 4:49 PM

reames added a commit: rG6d3e3ae8a9ca: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…. May 17 2021, 4:59 PM

reames added a reverting change: rGed9d70781bbd: Revert "[LV] Unconditionally branch from middle to scalar preheader if the…. May 17 2021, 8:53 PM

Ayal added inline comments. May 24 2021, 7:27 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5919	Not sure if this may help with the reported failures, but if vectorizing a loop having multiple exits is not (yet) intended to work for epilog vectorization, perhaps return false here if OrigLoop has multiple exit blocks. (Otherwise the CFG of https://llvm.org/docs/Vectorizers.html#epilogue-vectorization may be updated). Another thought to try and reduce the effect of the patch temporarily, is to check `if (!LoopExitBlock)` instead of `if (!Cost->requiresScalarEpilogue())` leaving the suboptimal but currently working code for single exit loops that require scalar epilogue.
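
To make the second suggestion concrete, here is a toy comparison of the two guards, read as "when does the middle block branch unconditionally". This is one interpretation of the comment, not code from the patch: `ToyLoopFacts` and the three helper functions are hypothetical names, and the two booleans stand in for `LoopExitBlock != nullptr` and `Cost->requiresScalarEpilogue()`.

```cpp
// Toy summary of the two facts the guards depend on.
// Hypothetical type for illustration only.
struct ToyLoopFacts {
  bool HasUniqueExitBlock;     // stands in for LoopExitBlock != nullptr
  bool RequiresScalarEpilogue; // stands in for Cost->requiresScalarEpilogue()
};

// Guard used by the patch: any loop that requires the scalar epilogue
// gets the unconditional branch.
inline bool unconditionalAsPatched(const ToyLoopFacts &F) {
  return F.RequiresScalarEpilogue;
}

// Guard suggested in the comment: only loops without a unique exit block
// get the unconditional branch.
inline bool unconditionalAsSuggested(const ToyLoopFacts &F) {
  return !F.HasUniqueExitBlock;
}

// True when the two guards would produce different CFG shapes.
inline bool guardsDisagree(const ToyLoopFacts &F) {
  return unconditionalAsPatched(F) != unconditionalAsSuggested(F);
}
```

Under the patch's invariant that a loop without a unique exit block always requires the scalar epilogue, the guards differ only for single-exit loops that require a scalar epilogue, which is the case the comment proposes leaving on the old conditional-branch path.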

Ayal mentioned this in D103700: [LV] Fix bug when unrolling (only) a loop with non-latch exit. Jun 6 2021, 11:55 AM

reames added inline comments. Jun 7 2021, 9:31 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5919	Ayal wrote: Not sure if this may help with the reported failures, but if vectorizing a loop having multiple exits is not (yet) intended to work for epilog vectorization, perhaps return false here if OrigLoop has multiple exit blocks. (Otherwise the CFG of https://llvm.org/docs/Vectorizers.html#epilogue-vectorization may be updated).

This seems to be a comment which applies to the future patch which enables multiple exit vectorization, not this one. Unless I'm missing something? Also, I would really expect a flag called requires scalar epilogue to override the epilogue vectorization setting, but I haven't stared at the code enough to know if that's really true.

reames added a commit: rG723144665b7f: [LV] Unconditionally branch from middle to scalar preheader if the scalar loop…. Jul 7 2021, 7:45 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Utils/LoopVersioning.cpp	2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	129 lines
llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll	14 lines
llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll	30 lines
llvm/test/Transforms/LoopVectorize/loop-form.ll	38 lines

Diff 321622

llvm/lib/Transforms/Utils/LoopVersioning.cpp

	Show All 38 Lines
	LoopVersioning::LoopVersioning(const LoopAccessInfo &LAI,			LoopVersioning::LoopVersioning(const LoopAccessInfo &LAI,
	ArrayRef<RuntimePointerCheck> Checks, Loop *L,			ArrayRef<RuntimePointerCheck> Checks, Loop *L,
	LoopInfo *LI, DominatorTree *DT,			LoopInfo *LI, DominatorTree *DT,
	ScalarEvolution *SE)			ScalarEvolution *SE)
	: VersionedLoop(L), NonVersionedLoop(nullptr),			: VersionedLoop(L), NonVersionedLoop(nullptr),
	AliasChecks(Checks.begin(), Checks.end()),			AliasChecks(Checks.begin(), Checks.end()),
	Preds(LAI.getPSE().getUnionPredicate()), LAI(LAI), LI(LI), DT(DT),			Preds(LAI.getPSE().getUnionPredicate()), LAI(LAI), LI(LI), DT(DT),
	SE(SE) {			SE(SE) {
	assert(L->getUniqueExitBlock() && "No single exit block");
	}			}

	void LoopVersioning::versionLoop(			void LoopVersioning::versionLoop(
	const SmallVectorImpl<Instruction *> &DefsUsedOutside) {			const SmallVectorImpl<Instruction *> &DefsUsedOutside) {
				assert(VersionedLoop->getUniqueExitBlock() && "No single exit block");
	assert(VersionedLoop->isLoopSimplifyForm() &&			assert(VersionedLoop->isLoopSimplifyForm() &&
	"Loop is not in loop-simplify form");			"Loop is not in loop-simplify form");

	Instruction *FirstCheckInst;			Instruction *FirstCheckInst;
	Instruction *MemRuntimeCheck;			Instruction *MemRuntimeCheck;
	Value *SCEVRuntimeCheck;			Value *SCEVRuntimeCheck;
	Value *RuntimeCheck = nullptr;			Value *RuntimeCheck = nullptr;

	▲ Show 20 Lines • Show All 313 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 846 Lines • ▼ Show 20 Lines	protected:
BasicBlock *LoopVectorPreHeader;		BasicBlock *LoopVectorPreHeader;

/// The scalar-loop preheader.		/// The scalar-loop preheader.
BasicBlock *LoopScalarPreHeader;		BasicBlock *LoopScalarPreHeader;

/// Middle Block between the vector and the scalar.		/// Middle Block between the vector and the scalar.
BasicBlock *LoopMiddleBlock;		BasicBlock *LoopMiddleBlock;

/// The (unique) ExitBlock of the scalar loop. Note that		/// The unique ExitBlock of the scalar loop if one exists. Note that
/// there can be multiple exiting edges reaching this block.		/// there can be multiple exiting edges reaching this block.
BasicBlock *LoopExitBlock;		BasicBlock *LoopExitBlock;

/// The vector loop body.		/// The vector loop body.
BasicBlock *LoopVectorBody;		BasicBlock *LoopVectorBody;

/// The scalar loop body.		/// The scalar loop body.
BasicBlock *LoopScalarBody;		BasicBlock *LoopScalarBody;
▲ Show 20 Lines • Show All 2,278 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
LoopVectorPreHeader =		LoopVectorPreHeader =
SplitBlock(TCCheckBlock, TCCheckBlock->getTerminator(), DT, LI, nullptr,		SplitBlock(TCCheckBlock, TCCheckBlock->getTerminator(), DT, LI, nullptr,
"vector.ph");		"vector.ph");

assert(DT->properlyDominates(DT->getNode(TCCheckBlock),		assert(DT->properlyDominates(DT->getNode(TCCheckBlock),
DT->getNode(Bypass)->getIDom()) &&		DT->getNode(Bypass)->getIDom()) &&
"TC check is expected to dominate Bypass");		"TC check is expected to dominate Bypass");

// Update dominator for Bypass & LoopExit.		// Update dominator for Bypass & LoopExit (if needed).
DT->changeImmediateDominator(Bypass, TCCheckBlock);		DT->changeImmediateDominator(Bypass, TCCheckBlock);
		if (!Cost->requiresScalarEpilogue())
		// If there is an epilogue which must run, there's no edge from the
		// middle block to exit blocks and thus no need to update the immediate
		// dominator of the exit blocks.
DT->changeImmediateDominator(LoopExitBlock, TCCheckBlock);		DT->changeImmediateDominator(LoopExitBlock, TCCheckBlock);

ReplaceInstWithInst(		ReplaceInstWithInst(
TCCheckBlock->getTerminator(),		TCCheckBlock->getTerminator(),
BranchInst::Create(Bypass, LoopVectorPreHeader, CheckMinIters));		BranchInst::Create(Bypass, LoopVectorPreHeader, CheckMinIters));
LoopBypassBlocks.push_back(TCCheckBlock);		LoopBypassBlocks.push_back(TCCheckBlock);
}		}

void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {		void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
Show All 22 Lines	void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
// Create new preheader for vector loop.		// Create new preheader for vector loop.
LoopVectorPreHeader =		LoopVectorPreHeader =
SplitBlock(SCEVCheckBlock, SCEVCheckBlock->getTerminator(), DT, LI,		SplitBlock(SCEVCheckBlock, SCEVCheckBlock->getTerminator(), DT, LI,
nullptr, "vector.ph");		nullptr, "vector.ph");

// Update dominator only if this is first RT check.		// Update dominator only if this is first RT check.
if (LoopBypassBlocks.empty()) {		if (LoopBypassBlocks.empty()) {
DT->changeImmediateDominator(Bypass, SCEVCheckBlock);		DT->changeImmediateDominator(Bypass, SCEVCheckBlock);
		if (!Cost->requiresScalarEpilogue())
		// If there is an epilogue which must run, there's no edge from the
		// middle block to exit blocks and thus no need to update the immediate
		// dominator of the exit blocks.
DT->changeImmediateDominator(LoopExitBlock, SCEVCheckBlock);		DT->changeImmediateDominator(LoopExitBlock, SCEVCheckBlock);
}		}

ReplaceInstWithInst(		ReplaceInstWithInst(
SCEVCheckBlock->getTerminator(),		SCEVCheckBlock->getTerminator(),
BranchInst::Create(Bypass, LoopVectorPreHeader, SCEVCheck));		BranchInst::Create(Bypass, LoopVectorPreHeader, SCEVCheck));
LoopBypassBlocks.push_back(SCEVCheckBlock);		LoopBypassBlocks.push_back(SCEVCheckBlock);
AddedSafetyChecks = true;		AddedSafetyChecks = true;
}		}
Show All 39 Lines	auto *CondBranch = cast<BranchInst>(
Builder.CreateCondBr(Builder.getTrue(), Bypass, LoopVectorPreHeader));		Builder.CreateCondBr(Builder.getTrue(), Bypass, LoopVectorPreHeader));
ReplaceInstWithInst(MemCheckBlock->getTerminator(), CondBranch);		ReplaceInstWithInst(MemCheckBlock->getTerminator(), CondBranch);
LoopBypassBlocks.push_back(MemCheckBlock);		LoopBypassBlocks.push_back(MemCheckBlock);
AddedSafetyChecks = true;		AddedSafetyChecks = true;

// Update dominator only if this is first RT check.		// Update dominator only if this is first RT check.
if (LoopBypassBlocks.empty()) {		if (LoopBypassBlocks.empty()) {
DT->changeImmediateDominator(Bypass, MemCheckBlock);		DT->changeImmediateDominator(Bypass, MemCheckBlock);
		if (!Cost->requiresScalarEpilogue())
		// If there is an epilogue which must run, there's no edge from the
		// middle block to exit blocks and thus no need to update the immediate
		// dominator of the exit blocks.
DT->changeImmediateDominator(LoopExitBlock, MemCheckBlock);		DT->changeImmediateDominator(LoopExitBlock, MemCheckBlock);
}		}

Instruction *FirstCheckInst;		Instruction *FirstCheckInst;
Instruction *MemRuntimeCheck;		Instruction *MemRuntimeCheck;
SCEVExpander Exp(*PSE.getSE(), MemCheckBlock->getModule()->getDataLayout(),		SCEVExpander Exp(*PSE.getSE(), MemCheckBlock->getModule()->getDataLayout(),
"induction");		"induction");
std::tie(FirstCheckInst, MemRuntimeCheck) = addRuntimeChecks(		std::tie(FirstCheckInst, MemRuntimeCheck) = addRuntimeChecks(
MemCheckBlock->getTerminator(), OrigLoop, RtPtrChecking.getChecks(), Exp);		MemCheckBlock->getTerminator(), OrigLoop, RtPtrChecking.getChecks(), Exp);
▲ Show 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	case InductionDescriptor::IK_NoInduction:
return nullptr;		return nullptr;
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {		Loop *InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopScalarBody = OrigLoop->getHeader();		LoopScalarBody = OrigLoop->getHeader();
LoopVectorPreHeader = OrigLoop->getLoopPreheader();		LoopVectorPreHeader = OrigLoop->getLoopPreheader();
LoopExitBlock = OrigLoop->getUniqueExitBlock();
assert(LoopExitBlock && "Must have an exit block");
assert(LoopVectorPreHeader && "Invalid loop structure");		assert(LoopVectorPreHeader && "Invalid loop structure");
		LoopExitBlock = OrigLoop->getUniqueExitBlock(); // may be nullptr
		assert((LoopExitBlock || Cost->requiresScalarEpilogue()) &&
		"multiple exit loop without required epilogue?");

LoopMiddleBlock =		LoopMiddleBlock =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,		SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
LI, nullptr, Twine(Prefix) + "middle.block");		LI, nullptr, Twine(Prefix) + "middle.block");
LoopScalarPreHeader =		LoopScalarPreHeader =
SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,		SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
nullptr, Twine(Prefix) + "scalar.ph");		nullptr, Twine(Prefix) + "scalar.ph");

// Set up branch from middle block to the exit and scalar preheader blocks.
// completeLoopSkeleton will update the condition to use an iteration check,
// if required to decide whether to execute the remainder.
BranchInst *BrInst =
BranchInst::Create(LoopExitBlock, LoopScalarPreHeader, Builder.getTrue());
auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();		auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();

		// Set up the middle block terminator. Two cases:
		// 1) If we know that we must execute the scalar epilogue, emit an
		// unconditional branch.
		// 2) Otherwise, we must have a single unique exit block (due to how we
		// implement the multiple exit case). In this case, set up a conditonal
		// branch from the middle block to the loop scalar preheader, and the
		// exit block. completeLoopSkeleton will update the condition to use an
		// iteration check, if required to decide whether to execute the remainder.
		BranchInst *BrInst = Cost->requiresScalarEpilogue() ?
		BranchInst::Create(LoopScalarPreHeader) :
		BranchInst::Create(LoopExitBlock, LoopScalarPreHeader,
		Builder.getTrue());
BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());		BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);		ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);

// We intentionally don't let SplitBlock to update LoopInfo since		// We intentionally don't let SplitBlock to update LoopInfo since
// LoopVectorBody should belong to another loop than LoopVectorPreHeader.		// LoopVectorBody should belong to another loop than LoopVectorPreHeader.
// LoopVectorBody is explicitly added to the correct place few lines later.		// LoopVectorBody is explicitly added to the correct place few lines later.
LoopVectorBody =		LoopVectorBody =
SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,		SplitBlock(LoopVectorPreHeader, LoopVectorPreHeader->getTerminator(), DT,
nullptr, nullptr, Twine(Prefix) + "vector.body");		nullptr, nullptr, Twine(Prefix) + "vector.body");

// Update dominator for loop exit.		// Update dominator for loop exit.
		if (!Cost->requiresScalarEpilogue())
		// If there is an epilogue which must run, there's no edge from the
		// middle block to exit blocks and thus no need to update the immediate
		// dominator of the exit blocks.
DT->changeImmediateDominator(LoopExitBlock, LoopMiddleBlock);		DT->changeImmediateDominator(LoopExitBlock, LoopMiddleBlock);

// Create and register the new vector loop.		// Create and register the new vector loop.
Loop *Lp = LI->AllocateLoop();		Loop *Lp = LI->AllocateLoop();
Loop *ParentLoop = OrigLoop->getParentLoop();		Loop *ParentLoop = OrigLoop->getParentLoop();

// Insert the new loop into the loop nest and register the new basic blocks		// Insert the new loop into the loop nest and register the new basic blocks
// before calling any utilities such as SCEV that require valid LoopInfo.		// before calling any utilities such as SCEV that require valid LoopInfo.
if (ParentLoop) {		if (ParentLoop) {
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	BasicBlock *InnerLoopVectorizer::completeLoopSkeleton(Loop *L,

// The trip counts should be cached by now.		// The trip counts should be cached by now.
Value *Count = getOrCreateTripCount(L);		Value *Count = getOrCreateTripCount(L);
Value *VectorTripCount = getOrCreateVectorTripCount(L);		Value *VectorTripCount = getOrCreateVectorTripCount(L);

auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();		auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();

// Add a check in the middle block to see if we have completed		// Add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop.		// all of the iterations in the first vector loop. Three cases:
// If (N - N%VF) == N, then we don't need to run the remainder.		// 1) If we require a scalar epilogue, there is no conditional branch as
// If tail is to be folded, we know we don't need to run the remainder.		// we unconditionally branch to the scalar preheader. Do nothing.
if (!Cost->foldTailByMasking()) {		// 2) If (N - N%VF) == N, then we don't need to run the remainder.
		// Thus if tail is to be folded, we know we don't need to run the
		// remainder and we can use the previous value for the condition (true).
		// 3) Otherwise, construct a runtime check.
		if (!Cost->requiresScalarEpilogue() && !Cost->foldTailByMasking()) {
Instruction *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ,		Instruction *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ,
Count, VectorTripCount, "cmp.n",		Count, VectorTripCount, "cmp.n",
LoopMiddleBlock->getTerminator());		LoopMiddleBlock->getTerminator());

// Here we use the same DebugLoc as the scalar loop latch terminator instead		// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with		// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while		// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.		// debugging. Eg. if the compare has got a line number inside the loop.
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
| / v		| / v
|| [ ] <-- vector pre header.		|| [ ] <-- vector pre header.
|/ |		|/ |
| v		| v
| [ ] \		| [ ] \
| [ ]_| <-- vector loop.		| [ ]_| <-- vector loop.
| |		| |
| v		| v
| -[ ] <--- middle-block.		\ -[ ] <--- middle-block.
| / |		\/ |
| / v		/\ v
-|- >[ ] <--- new preheader.		| ->[ ] <--- new preheader.
| |		| |
| v		(opt) v <-- edge from middle to exit iff epilogue is not required.
| [ ] \		| [ ] \
| [ ]_| <-- old scalar loop to handle remainder.		| [ ]_| <-- old scalar loop to handle remainder (scalar epilogue).
\ |		\ |
\ v		\ v
>[ ] <-- exit block.		>[ ] <-- exit block(s).
...		...
*/		*/

// Get the metadata of the original loop before it gets modified.		// Get the metadata of the original loop before it gets modified.
MDNode *OrigLoopID = OrigLoop->getLoopID();		MDNode *OrigLoopID = OrigLoop->getLoopID();

// Create an empty vector loop, and prepare basic blocks for the runtime		// Create an empty vector loop, and prepare basic blocks for the runtime
// checks.		// checks.
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
// Fix up external users of the induction variable. At this point, we are		// Fix up external users of the induction variable. At this point, we are
// in LCSSA form, with all external PHIs that use the IV having one input value,		// in LCSSA form, with all external PHIs that use the IV having one input value,
// coming from the remainder loop. We need those PHIs to also have a correct		// coming from the remainder loop. We need those PHIs to also have a correct
// value for the IV when arriving directly from the middle block.		// value for the IV when arriving directly from the middle block.
void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,		void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi,
const InductionDescriptor &II,		const InductionDescriptor &II,
Value *CountRoundDown, Value *EndValue,		Value *CountRoundDown, Value *EndValue,
BasicBlock *MiddleBlock) {		BasicBlock *MiddleBlock) {
// There are two kinds of external IV usages - those that use the value		// There are two kinds of external IV usages - those that use the value
// computed in the last iteration (the PHI) and those that use the penultimate		// computed in the last iteration (the PHI) and those that use the penultimate
// value (the value that feeds into the phi from the loop latch).		// value (the value that feeds into the phi from the loop latch).
// We allow both, but they, obviously, have different values.		// We allow both, but they, obviously, have different values.

assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");		assert(OrigLoop->getUniqueExitBlock() && "Expected a single exit block");

DenseMap<Value *, Value *> MissingVals;		DenseMap<Value *, Value *> MissingVals;

▲ Show 20 Lines • Show All 323 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixVectorizedLoop() {
// vector form. Now we need to fix the recurrences in the loop. These PHI		// vector form. Now we need to fix the recurrences in the loop. These PHI
// nodes are currently empty because we did not want to introduce cycles.		// nodes are currently empty because we did not want to introduce cycles.
// This is the second stage of vectorizing recurrences.		// This is the second stage of vectorizing recurrences.
fixCrossIterationPHIs();		fixCrossIterationPHIs();

// Forget the original basic block.		// Forget the original basic block.
PSE.getSE()->forgetLoop(OrigLoop);		PSE.getSE()->forgetLoop(OrigLoop);

		// If we inserted an edge from the middle block to the unique exit block,
		// update uses outside the loop (phis) to account for the newly inserted
		// edge.
		if (!Cost->requiresScalarEpilogue()) {
// Fix-up external users of the induction variables.		// Fix-up external users of the induction variables.
for (auto &Entry : Legal->getInductionVars())		for (auto &Entry : Legal->getInductionVars())
fixupIVUsers(Entry.first, Entry.second,		fixupIVUsers(Entry.first, Entry.second,
getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),		getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)),
IVEndValues[Entry.first], LoopMiddleBlock);		IVEndValues[Entry.first], LoopMiddleBlock);

fixLCSSAPHIs();		fixLCSSAPHIs();
		}
for (Instruction *PI : PredicatedInstructions)		for (Instruction *PI : PredicatedInstructions)
sinkScalarOperands(&*PI);		sinkScalarOperands(&*PI);

// Remove redundant induction instructions.		// Remove redundant induction instructions.
cse(LoopVectorBody);		cse(LoopVectorBody);

// Set/update profile weights for the vector and remainder loops as original		// Set/update profile weights for the vector and remainder loops as original
// loop iterations are now distributed among them. Note that original loop		// loop iterations are now distributed among them. Note that original loop
▲ Show 20 Lines • Show All 201 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {

// Finally, fix users of the recurrence outside the loop. The users will need		// Finally, fix users of the recurrence outside the loop. The users will need
// either the last value of the scalar recurrence or the last value of the		// either the last value of the scalar recurrence or the last value of the
// vector recurrence we extracted in the middle block. Since the loop is in		// vector recurrence we extracted in the middle block. Since the loop is in
// LCSSA form, we just need to find all the phi nodes for the original scalar		// LCSSA form, we just need to find all the phi nodes for the original scalar
// recurrence in the exit block, and then add an edge for the middle block.		// recurrence in the exit block, and then add an edge for the middle block.
// Note that LCSSA does not imply single entry when the original scalar loop		// Note that LCSSA does not imply single entry when the original scalar loop
// had multiple exiting edges (as we always run the last iteration in the		// had multiple exiting edges (as we always run the last iteration in the
// scalar epilogue); in that case, the exiting path through middle will be		// scalar epilogue); in that case, there is no edge from middle to exit and
// dynamically dead and the value picked for the phi doesn't matter.		// and thus no phis which needed updated.
		if (!Cost->requiresScalarEpilogue())
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
if (any_of(LCSSAPhi.incoming_values(),		if (any_of(LCSSAPhi.incoming_values(),
[Phi](Value *V) { return V == Phi; }))		[Phi](Value *V) { return V == Phi; }))
LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop, LoopMiddleBlock);		LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop, LoopMiddleBlock);
}		}

void InnerLoopVectorizer::fixReduction(PHINode *Phi) {		void InnerLoopVectorizer::fixReduction(PHINode *Phi) {
// Get it's reduction variable descriptor.		// Get it's reduction variable descriptor.
assert(Legal->isReductionVariable(Phi) &&		assert(Legal->isReductionVariable(Phi) &&
"Unable to find the reduction variable");		"Unable to find the reduction variable");
RecurrenceDescriptor RdxDesc = Legal->getReductionVars()[Phi];		RecurrenceDescriptor RdxDesc = Legal->getReductionVars()[Phi];

▲ Show 20 Lines • Show All 148 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixReduction(PHINode *Phi) {
BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock);		BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock);

// Now, we need to fix the users of the reduction variable		// Now, we need to fix the users of the reduction variable
// inside and outside of the scalar remainder loop.		// inside and outside of the scalar remainder loop.

// We know that the loop is in LCSSA form. We need to update the PHI nodes		// We know that the loop is in LCSSA form. We need to update the PHI nodes
// in the exit blocks. See comment on analogous loop in		// in the exit blocks. See comment on analogous loop in
// fixFirstOrderRecurrence for a more complete explaination of the logic.		// fixFirstOrderRecurrence for a more complete explaination of the logic.
		if (!Cost->requiresScalarEpilogue())
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
if (any_of(LCSSAPhi.incoming_values(),		if (any_of(LCSSAPhi.incoming_values(),
[LoopExitInst](Value *V) { return V == LoopExitInst; }))		[LoopExitInst](Value *V) { return V == LoopExitInst; }))
LCSSAPhi.addIncoming(ReducedPartRdx, LoopMiddleBlock);		LCSSAPhi.addIncoming(ReducedPartRdx, LoopMiddleBlock);

// Fix the scalar loop reduction variable with the incoming reduction sum		// Fix the scalar loop reduction variable with the incoming reduction sum
// from the vector body and from the backedge value.		// from the vector body and from the backedge value.
int IncomingEdgeBlockIdx =		int IncomingEdgeBlockIdx =
Phi->getBasicBlockIndex(OrigLoop->getLoopLatch());		Phi->getBasicBlockIndex(OrigLoop->getLoopLatch());
assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");		assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index");
// Pick the other block.		// Pick the other block.
int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);		int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1);
[1,491 lines not shown]	LLVM_DEBUG(if (ForceVectorization && Width > 1 && Cost >= ScalarCost) dbgs()
<< "LV: Vectorization seems to be not beneficial, "		<< "LV: Vectorization seems to be not beneficial, "
<< "but was forced by a user.\n");		<< "but was forced by a user.\n");
LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");		LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");
VectorizationFactor Factor = {ElementCount::getFixed(Width),		VectorizationFactor Factor = {ElementCount::getFixed(Width),
(unsigned)(Width * Cost)};		(unsigned)(Width * Cost)};
return Factor;		return Factor;
}		}

bool LoopVectorizationCostModel::isCandidateForEpilogueVectorization(		bool LoopVectorizationCostModel::isCandidateForEpilogueVectorization(
		Inline comment (Ayal): Not sure if this may help with the reported failures, but if vectorizing a loop having multiple exits is not (yet) intended to work for epilog vectorization, perhaps return false here if OrigLoop has multiple exit blocks. (Otherwise the CFG of https://llvm.org/docs/Vectorizers.html#epilogue-vectorization may be updated.) Another thought, to try and reduce the effect of the patch temporarily, is to check `if (!LoopExitBlock)` instead of `if (!Cost->requiresScalarEpilogue())`, leaving the suboptimal but currently working code for single exit loops that require scalar epilogue.
		Inline comment (reames, author): > Not sure if this may help with the reported failures, but if vectorizing a loop having multiple exits is not (yet) intended to work for epilog vectorization, perhaps return false here if OrigLoop has multiple exit blocks. (Otherwise the CFG of https://llvm.org/docs/Vectorizers.html#epilogue-vectorization may be updated.) This seems to be a comment which applies to the future patch which enables multiple exit vectorization, not this one. Unless I'm missing something? Also, I would really expect a flag called requires scalar epilogue to override the epilogue vectorization setting, but I haven't stared at the code enough to know if that's really true.
const Loop &L, ElementCount VF) const {		const Loop &L, ElementCount VF) const {
// Cross iteration phis such as reductions need special handling and are		// Cross iteration phis such as reductions need special handling and are
// currently unsupported.		// currently unsupported.
if (any_of(L.getHeader()->phis(), [&](PHINode &Phi) {		if (any_of(L.getHeader()->phis(), [&](PHINode &Phi) {
return Legal->isFirstOrderRecurrence(&Phi) ||		return Legal->isFirstOrderRecurrence(&Phi) ||
Legal->isReductionVariable(&Phi);		Legal->isReductionVariable(&Phi);
}))		}))
return false;		return false;
[2,124 lines not shown]	BasicBlock *EpilogueVectorizerMainLoop::emitMinimumIterationCountCheck(

if (ForEpilogue) {		if (ForEpilogue) {
assert(DT->properlyDominates(DT->getNode(TCCheckBlock),		assert(DT->properlyDominates(DT->getNode(TCCheckBlock),
DT->getNode(Bypass)->getIDom()) &&		DT->getNode(Bypass)->getIDom()) &&
"TC check is expected to dominate Bypass");		"TC check is expected to dominate Bypass");

// Update dominator for Bypass & LoopExit.		// Update dominator for Bypass & LoopExit.
DT->changeImmediateDominator(Bypass, TCCheckBlock);		DT->changeImmediateDominator(Bypass, TCCheckBlock);
		if (!Cost->requiresScalarEpilogue())
		// For loops with multiple exits, there's no edge from the middle block
		// to exit blocks (as the epilogue must run) and thus no need to update
		// the immediate dominator of the exit blocks.
DT->changeImmediateDominator(LoopExitBlock, TCCheckBlock);		DT->changeImmediateDominator(LoopExitBlock, TCCheckBlock);

LoopBypassBlocks.push_back(TCCheckBlock);		LoopBypassBlocks.push_back(TCCheckBlock);

// Save the trip count so we don't have to regenerate it in the		// Save the trip count so we don't have to regenerate it in the
// vec.epilog.iter.check. This is safe to do because the trip count		// vec.epilog.iter.check. This is safe to do because the trip count
// generated here dominates the vector epilog iter check.		// generated here dominates the vector epilog iter check.
EPI.TripCount = Count;		EPI.TripCount = Count;
}		}
[47 lines not shown]	EPI.MemSafetyCheck->getTerminator()->replaceUsesOfWith(
VecEpilogueIterationCountCheck, LoopScalarPreHeader);		VecEpilogueIterationCountCheck, LoopScalarPreHeader);

DT->changeImmediateDominator(		DT->changeImmediateDominator(
VecEpilogueIterationCountCheck,		VecEpilogueIterationCountCheck,
VecEpilogueIterationCountCheck->getSinglePredecessor());		VecEpilogueIterationCountCheck->getSinglePredecessor());

DT->changeImmediateDominator(LoopScalarPreHeader,		DT->changeImmediateDominator(LoopScalarPreHeader,
EPI.EpilogueIterationCountCheck);		EPI.EpilogueIterationCountCheck);
DT->changeImmediateDominator(LoopExitBlock, EPI.EpilogueIterationCountCheck);		if (!Cost->requiresScalarEpilogue())
		// If there is an epilogue which must run, there's no edge from the
		// middle block to exit blocks and thus no need to update the immediate
		// dominator of the exit blocks.
		DT->changeImmediateDominator(LoopExitBlock,
		EPI.EpilogueIterationCountCheck);

// Keep track of bypass blocks, as they feed start values to the induction		// Keep track of bypass blocks, as they feed start values to the induction
// phis in the scalar loop preheader.		// phis in the scalar loop preheader.
if (EPI.SCEVSafetyCheck)		if (EPI.SCEVSafetyCheck)
LoopBypassBlocks.push_back(EPI.SCEVSafetyCheck);		LoopBypassBlocks.push_back(EPI.SCEVSafetyCheck);
if (EPI.MemSafetyCheck)		if (EPI.MemSafetyCheck)
LoopBypassBlocks.push_back(EPI.MemSafetyCheck);		LoopBypassBlocks.push_back(EPI.MemSafetyCheck);
LoopBypassBlocks.push_back(EPI.EpilogueIterationCountCheck);		LoopBypassBlocks.push_back(EPI.EpilogueIterationCountCheck);
[1,737 lines not shown]

llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll

	[465 lines not shown]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*
	; CHECK-NEXT: store <4 x i16> [[TMP13]], <4 x i16>* [[TMP14]], align 4			; CHECK-NEXT: store <4 x i16> [[TMP13]], <4 x i16>* [[TMP14]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
	; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
	; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[REC_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[REC_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: [[REC_NEXT]] = load i16, i16* [[B]], align 2			; CHECK-NEXT: [[REC_NEXT]] = load i16, i16* [[B]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: store i16 [[SCALAR_RECUR]], i16* [[B]], align 4			; CHECK-NEXT: store i16 [[SCALAR_RECUR]], i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP7:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP7:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [ [[SCALAR_RECUR]], [[FOR_BODY]] ], [ [[SCALAR_RECUR]], [[FOR_COND]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [ [[SCALAR_RECUR]], [[FOR_BODY]] ], [ [[SCALAR_RECUR]], [[FOR_COND]] ]
	; CHECK-NEXT: ret i16 [[REC_LCSSA]]			; CHECK-NEXT: ret i16 [[REC_LCSSA]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%rec = phi i16 [0, %entry], [ %rec.next, %for.body ]			%rec = phi i16 [0, %entry], [ %rec.next, %for.body ]
	[48 lines not shown]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x i16> [[VECTOR_RECUR]], <4 x i16> [[WIDE_LOAD]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i16* [[TMP11]] to <4 x i16>*
	; CHECK-NEXT: store <4 x i16> [[TMP13]], <4 x i16>* [[TMP14]], align 4			; CHECK-NEXT: store <4 x i16> [[TMP13]], <4 x i16>* [[TMP14]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 4
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i32> [[VEC_IND]], <i32 4, i32 4, i32 4, i32 4>
	; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP15:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 3
	; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2			; CHECK-NEXT: [[VECTOR_RECUR_EXTRACT_FOR_PHI:%.*]] = extractelement <4 x i16> [[WIDE_LOAD]], i32 2
	; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[SCALAR_RECUR_INIT:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[VECTOR_RECUR_EXTRACT]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[REC_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SCALAR_RECUR:%.*]] = phi i16 [ [[SCALAR_RECUR_INIT]], [[SCALAR_PH]] ], [ [[REC_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: [[REC_NEXT]] = load i16, i16* [[B]], align 2			; CHECK-NEXT: [[REC_NEXT]] = load i16, i16* [[B]], align 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: store i16 [[SCALAR_RECUR]], i16* [[B]], align 4			; CHECK-NEXT: store i16 [[SCALAR_RECUR]], i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP9:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP9:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [ [[SCALAR_RECUR]], [[FOR_COND]] ], [ 10, [[FOR_BODY]] ], [ [[VECTOR_RECUR_EXTRACT_FOR_PHI]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[REC_LCSSA:%.*]] = phi i16 [ [[SCALAR_RECUR]], [[FOR_COND]] ], [ 10, [[FOR_BODY]] ]
	; CHECK-NEXT: ret i16 [[REC_LCSSA]]			; CHECK-NEXT: ret i16 [[REC_LCSSA]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%rec = phi i16 [0, %entry], [ %rec.next, %for.body ]			%rec = phi i16 [0, %entry], [ %rec.next, %for.body ]
	[16 lines not shown]

llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll

	[441 lines not shown]
	; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[INDEX]], 9223372036854775804			; CHECK-NEXT: [[TMP3:%.*]] = and i64 [[INDEX]], 9223372036854775804
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[TMP4]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP5]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 508			; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 508
	; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 false, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1016, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1016, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[MUL:%.*]] = shl nsw i32 [[TMP]], 1			; CHECK-NEXT: [[MUL:%.*]] = shl nsw i32 [[TMP]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = lshr exact i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP1:%.*]] = lshr exact i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]
	; CHECK-NEXT: store i32 [[MUL]], i32* [[ARRAYIDX2]], align 4			; CHECK-NEXT: store i32 [[MUL]], i32* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022			; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP]], [[LOOP13:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP:%.*]], [[LOOP13:!llvm.loop !.*]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body			for.cond.cleanup: ; preds = %for.body
	ret void			ret void

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	[48 lines not shown]
	; CHECK-NEXT: [[TMP9:%.*]] = and i64 [[INDEX]], 9223372036854775804			; CHECK-NEXT: [[TMP9:%.*]] = and i64 [[INDEX]], 9223372036854775804
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[B:%.*]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP10]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP11]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 false, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[MUL:%.*]] = shl nsw i32 [[TMP]], 1			; CHECK-NEXT: [[MUL:%.*]] = shl nsw i32 [[TMP]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = lshr exact i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP1:%.*]] = lshr exact i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, i32* [[B]], i64 [[TMP1]]
	; CHECK-NEXT: store i32 [[MUL]], i32* [[ARRAYIDX2]], align 4			; CHECK-NEXT: store i32 [[MUL]], i32* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV_NEXT]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP]], [[LOOP15:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP:%.*]], [[LOOP15:!llvm.loop !.*]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body			for.cond.cleanup: ; preds = %for.body
	ret void			ret void

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	[412 lines not shown]
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 4			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 4
	; CHECK-NEXT: store i32 [[TMP17]], i32* [[TMP12]], align 4			; CHECK-NEXT: store i32 [[TMP17]], i32* [[TMP12]], align 4
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 6			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 6
	; CHECK-NEXT: store i32 [[TMP18]], i32* [[TMP13]], align 4			; CHECK-NEXT: store i32 [[TMP18]], i32* [[TMP13]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP24:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP19]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP24:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: store i32 [[Z]], i32* [[P_I_X]], align 4			; CHECK-NEXT: store i32 [[Z]], i32* [[P_I_X]], align 4
	; CHECK-NEXT: store i32 [[Z]], i32* [[P_I_Y]], align 4			; CHECK-NEXT: store i32 [[Z]], i32* [[P_I_Y]], align 4
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], [[LOOP25:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], [[LOOP25:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	[64 lines not shown]
	; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP26:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP26:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP17]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP17]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP17]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP17]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <4 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP19]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP19]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP21:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP21:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[P_I_X]], align 4			; CHECK-NEXT: [[TMP20:%.*]] = load i32, i32* [[P_I_X]], align 4
	; CHECK-NEXT: store i32 [[TMP20]], i32* [[P_I_Y]], align 4			; CHECK-NEXT: store i32 [[TMP20]], i32* [[P_I_Y]], align 4
	; CHECK-NEXT: [[TMP21]] = add nsw i32 [[TMP20]], [[S]]			; CHECK-NEXT: [[TMP21]] = add nsw i32 [[TMP20]], [[S]]
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], [[LOOP27:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], [[LOOP27:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP22:%.*]] = phi i32 [ [[TMP21]], [[FOR_BODY]] ], [ [[TMP19]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: ret i32 [[TMP21]]
	; CHECK-NEXT: ret i32 [[TMP22]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	%s = phi i32 [ %2, %for.body ], [ 0, %entry ]			%s = phi i32 [ %2, %for.body ], [ 0, %entry ]
	%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0			%p_i.x = getelementptr inbounds %pair.i32, %pair.i32* %p, i64 %i, i32 0
	[62 lines not shown]
	; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 4			; CHECK-NEXT: [[TMP18:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 4
	; CHECK-NEXT: store i32 [[TMP18]], i32* [[TMP13]], align 4			; CHECK-NEXT: store i32 [[TMP18]], i32* [[TMP13]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 6			; CHECK-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[WIDE_VEC]], i32 6
	; CHECK-NEXT: store i32 [[TMP19]], i32* [[TMP14]], align 4			; CHECK-NEXT: store i32 [[TMP19]], i32* [[TMP14]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP28:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP28:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_MINUS_1_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 -1, i32 0			; CHECK-NEXT: [[P_I_MINUS_1_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 -1, i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: store i32 [[Z]], i32* [[P_I_X]], align 4			; CHECK-NEXT: store i32 [[Z]], i32* [[P_I_X]], align 4
	; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* [[P_I_MINUS_1_X]], align 4			; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* [[P_I_MINUS_1_X]], align 4
	; CHECK-NEXT: store i32 [[TMP21]], i32* [[P_I_Y]], align 4			; CHECK-NEXT: store i32 [[TMP21]], i32* [[P_I_Y]], align 4
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], [[LOOP29:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], [[LOOP29:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	(70 lines not shown)
	; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP30:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP30:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP20]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP20]], <4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP20]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP20]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP22]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP22]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[FOR_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP25:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[S:%.*]] = phi i32 [ [[TMP25:%.*]], [[FOR_BODY]] ], [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[I_PLUS_1:%.*]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_PLUS_1:%.*]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0			; CHECK-NEXT: [[P_I_X:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 0
	; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1			; CHECK-NEXT: [[P_I_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I]], i32 1
	; CHECK-NEXT: [[P_I_PLUS_1_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I_PLUS_1]], i32 1			; CHECK-NEXT: [[P_I_PLUS_1_Y:%.*]] = getelementptr inbounds [[PAIR_I32]], %pair.i32* [[P]], i64 [[I_PLUS_1]], i32 1
	; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* [[P_I_X]], align 4			; CHECK-NEXT: [[TMP23:%.*]] = load i32, i32* [[P_I_X]], align 4
	; CHECK-NEXT: store i32 [[TMP23]], i32* [[P_I_PLUS_1_Y]], align 4			; CHECK-NEXT: store i32 [[TMP23]], i32* [[P_I_PLUS_1_Y]], align 4
	; CHECK-NEXT: [[TMP24:%.*]] = load i32, i32* [[P_I_Y]], align 4			; CHECK-NEXT: [[TMP24:%.*]] = load i32, i32* [[P_I_Y]], align 4
	; CHECK-NEXT: [[TMP25]] = add nsw i32 [[TMP24]], [[S]]			; CHECK-NEXT: [[TMP25]] = add nsw i32 [[TMP24]], [[S]]
	; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1			; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
	; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]			; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N]]
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END]], [[LOOP31:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_END:%.*]], [[LOOP31:!llvm.loop !.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP26:%.*]] = phi i32 [ [[TMP25]], [[FOR_BODY]] ], [ [[TMP22]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: ret i32 [[TMP25]]
	; CHECK-NEXT: ret i32 [[TMP26]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]			%i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
	%s = phi i32 [ %2, %for.body ], [ 0, %entry ]			%s = phi i32 [ %2, %for.body ], [ 0, %entry ]
	%i_plus_1 = add nuw nsw i64 %i, 1			%i_plus_1 = add nuw nsw i64 %i, 1
	(345 lines not shown)

llvm/test/Transforms/LoopVectorize/loop-form.ll

	(140 lines not shown)
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i16, i16* [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP8]] to <2 x i16>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i16* [[TMP8]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP9]], align 4			; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP9]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP4:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP1]], [[N_VEC]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: br label [[FOR_COND]], [[LOOP5:!llvm.loop !.*]]			; CHECK-NEXT: br label [[FOR_COND]], [[LOOP5:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	(115 lines not shown)
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4			; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP6:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP7:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP7:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	(63 lines not shown)
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4			; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP8:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: [[IND_ESCAPE:%.*]] = sub i32 [[N_VEC]], 1
	; CHECK-NEXT: [[IND_ESCAPE1:%.*]] = sub i32 [[N_VEC]], 1
	; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP9:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP9:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ], [ [[I]], [[FOR_COND]] ], [ [[IND_ESCAPE1]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[I]], [[FOR_BODY]] ], [ [[I]], [[FOR_COND]] ]
	; CHECK-NEXT: ret i32 [[I_LCSSA]]			; CHECK-NEXT: ret i32 [[I_LCSSA]]
	;			;
	; TAILFOLD-LABEL: @multiple_unique_exit2(			; TAILFOLD-LABEL: @multiple_unique_exit2(
	; TAILFOLD-NEXT: entry:			; TAILFOLD-NEXT: entry:
	; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]			; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
	; TAILFOLD: for.cond:			; TAILFOLD: for.cond:
	; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]			; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
	(56 lines not shown)
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i16, i16* [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast i16* [[TMP9]] to <2 x i16>*
	; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4			; CHECK-NEXT: store <2 x i16> zeroinitializer, <2 x i16>* [[TMP10]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i32 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i32> [[VEC_IND]], <i32 2, i32 2>
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP10:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP10:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[IF_END:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[IF_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64			; CHECK-NEXT: [[IPROM:%.*]] = sext i32 [[I]] to i64
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds i16, i16* [[P]], i64 [[IPROM]]
	; CHECK-NEXT: store i16 0, i16* [[B]], align 4			; CHECK-NEXT: store i16 0, i16* [[B]], align 4
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I]], 1
	; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[I]], 2096
	; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP11:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[CMP2]], label [[FOR_COND]], label [[IF_END]], [[LOOP11:!llvm.loop !.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0, [[FOR_COND]] ], [ 1, [[FOR_BODY]] ], [ 0, [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[EXIT:%.*]] = phi i32 [ 0, [[FOR_COND]] ], [ 1, [[FOR_BODY]] ]
	; CHECK-NEXT: ret i32 [[EXIT]]			; CHECK-NEXT: ret i32 [[EXIT]]
	;			;
	; TAILFOLD-LABEL: @multiple_unique_exit3(			; TAILFOLD-LABEL: @multiple_unique_exit3(
	; TAILFOLD-NEXT: entry:			; TAILFOLD-NEXT: entry:
	; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]			; TAILFOLD-NEXT: br label [[FOR_COND:%.*]]
	; TAILFOLD: for.cond:			; TAILFOLD: for.cond:
	; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]			; TAILFOLD-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY:%.*]] ]
	; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]			; TAILFOLD-NEXT: [[CMP:%.*]] = icmp slt i32 [[I]], [[N:%.*]]
	(500 lines not shown)
	; CHECK-NEXT: store float 1.000000e+01, float* [[TMP9]], align 4			; CHECK-NEXT: store float 1.000000e+01, float* [[TMP9]], align 4
	; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]			; CHECK-NEXT: br label [[PRED_STORE_CONTINUE2]]
	; CHECK: pred.store.continue2:			; CHECK: pred.store.continue2:
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200
	; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP12:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 201, 200			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr float, float* [[ADDR]], i64 [[IV]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr float, float* [[ADDR]], i64 [[IV]]
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP_BODY:%.*]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_BODY:%.*]]
	; CHECK: loop.body:			; CHECK: loop.body:
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[GEP]], align 4			; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[GEP]], align 4
	; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq float [[TMP11]], 0.000000e+00			; CHECK-NEXT: [[PRED:%.*]] = fcmp oeq float [[TMP11]], 0.000000e+00
	; CHECK-NEXT: br i1 [[PRED]], label [[LOOP_LATCH]], label [[THEN:%.*]]			; CHECK-NEXT: br i1 [[PRED]], label [[LOOP_LATCH]], label [[THEN:%.*]]
	; CHECK: then:			; CHECK: then:
	; CHECK-NEXT: store float 1.000000e+01, float* [[GEP]], align 4			; CHECK-NEXT: store float 1.000000e+01, float* [[GEP]], align 4
	; CHECK-NEXT: br label [[LOOP_LATCH]]			; CHECK-NEXT: br label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	(69 lines not shown)
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>
	; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200			; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 200
	; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[TMP6]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], [[LOOP14:!llvm.loop !.*]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i32> [[TMP5]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i32> [[TMP5]], [[RDX_SHUF]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[BIN_RDX]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[BIN_RDX]], i32 0
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 201, 200			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 200, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]			; CHECK-NEXT: br label [[LOOP_HEADER:%.*]]
	; CHECK: loop.header:			; CHECK: loop.header:
	; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]			; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i64 [[IV]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, i32* [[ADDR]], i64 [[IV]]
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV]], 200
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT]], label [[LOOP_LATCH]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[EXIT:%.*]], label [[LOOP_LATCH]]
	; CHECK: loop.latch:			; CHECK: loop.latch:
	; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[GEP]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = load i32, i32* [[GEP]], align 4
	; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[TMP8]]			; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[TMP8]]
	; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
	; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] = icmp eq i64 [[IV]], 400			; CHECK-NEXT: [[EXITCOND2_NOT:%.*]] = icmp eq i64 [[IV]], 400
	; CHECK-NEXT: br i1 [[EXITCOND2_NOT]], label [[EXIT]], label [[LOOP_HEADER]], [[LOOP15:!llvm.loop !.*]]			; CHECK-NEXT: br i1 [[EXITCOND2_NOT]], label [[EXIT]], label [[LOOP_HEADER]], [[LOOP15:!llvm.loop !.*]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [ 0, [[LOOP_HEADER]] ], [ [[ACCUM_NEXT]], [[LOOP_LATCH]] ], [ [[TMP7]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[LCSSA:%.*]] = phi i32 [ 0, [[LOOP_HEADER]] ], [ [[ACCUM_NEXT]], [[LOOP_LATCH]] ]
	; CHECK-NEXT: ret i32 [[LCSSA]]			; CHECK-NEXT: ret i32 [[LCSSA]]
	;			;
	; TAILFOLD-LABEL: @me_reduction(			; TAILFOLD-LABEL: @me_reduction(
	; TAILFOLD-NEXT: entry:			; TAILFOLD-NEXT: entry:
	; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]			; TAILFOLD-NEXT: br label [[LOOP_HEADER:%.*]]
	; TAILFOLD: loop.header:			; TAILFOLD: loop.header:
	; TAILFOLD-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]			; TAILFOLD-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
	; TAILFOLD-NEXT: [[ACCUM:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]			; TAILFOLD-NEXT: [[ACCUM:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP_LATCH]] ]
	(96 lines not shown)