This is an archive of the discontinued LLVM Phabricator instance.

[LoopPeeling] Support peeling loops with non-latch exits
ClosedPublic

Authored by nikic on Sep 28 2022, 5:23 AM.

Details

Summary

Loop peeling currently requires that a) the latch is exiting, b) the latch terminator is a branch, and c) all other exits are unreachable/deopt. This patch removes all of these limitations and adds the necessary branch weight updating support. It essentially works the same way as before, with "latch" replaced by "exiting terminator" and "loop trip count" replaced by "per-exit trip count".
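
For illustration, here is a minimal hypothetical loop (not taken from the patch or its tests) that was previously rejected because its latch is not exiting, but is now eligible for peeling:

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %latch ]
  %done = icmp eq i64 %i, %n
  br i1 %done, label %exit, label %latch   ; the only exit is in the header
latch:
  %i.next = add i64 %i, 1
  br label %loop                           ; the latch itself is not exiting
exit: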

It's worth noting that there are still other limitations in the profitability heuristics: this patch enables peeling of loops to make conditions invariant (which is pretty much always highly profitable if possible), while peeling to make loads dereferenceable still checks that non-latch exits are unreachable, and PGO-based peeling has even more conditions. Those checks could be relaxed later if we consider those cases profitable.

The motivation for this change is that loops using iterator adaptors in Rust often optimize very badly, and end up with a loop phi of the form phi(true, false) in the final result. Peeling eliminates that phi and conditions based on it, which enables a lot of follow-on simplification.
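
For illustration, a minimal hypothetical sketch (not taken from an actual Rust test case) of that pattern, a "first iteration" flag that peeling turns into a constant:

loop:
  %first = phi i1 [ true, %preheader ], [ false, %latch ]
  br i1 %first, label %first.iter, label %later.iter   ; condition based on the phi
first.iter:                                            ; only reachable on the first iteration
  br label %latch
later.iter:
  br label %latch
latch:
  br i1 %cc, label %loop, label %exit
exit:

After one iteration is peeled off, %loop is only ever entered with %first == false (both the edge from the peeled iteration and the backedge feed in false), so the phi folds to a constant and the branch on it goes away.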

Diff Detail

Event Timeline

nikic created this revision.Sep 28 2022, 5:23 AM
nikic requested review of this revision.Sep 28 2022, 5:23 AM
Herald added a project: Restricted Project.Sep 28 2022, 5:23 AM
reames accepted this revision.Oct 6 2022, 1:23 PM

LGTM

This revision is now accepted and ready to land.Oct 6 2022, 1:23 PM
This revision was landed with ongoing or failed builds.Oct 7 2022, 3:36 AM
This revision was automatically updated to reflect the committed changes.
bgraur added a subscriber: bgraur.Oct 17 2022, 1:10 PM

Heads-up: root-causing is pointing to this patch for a miscompile that leads to an infinite loop in one of our tests.
Working on a reproducer.

tra added a subscriber: tra.Oct 17 2022, 2:49 PM

This change may have caused https://github.com/llvm/llvm-project/issues/28112 to show up again in a few of our tests (the timeout just mentioned by @bgraur).
Unfortunately, it's likely going to be hard to create a reduced reproducer, as the issue tends to show up in large code and often disappears in an unpredictable manner when the code gets reduced.

Is there a way to disable this heuristic?

Another possible source of this problem is that the loop transformation didn't take the 'convergent' attribute into account and may have duplicated code that should not have been duplicated. E.g. here's what we had to do in the past: https://reviews.llvm.org/D17518

nikic added a subscriber: nhaehnle.Oct 19 2022, 1:20 AM

It's plausible that the issue is related to convergence. As far as I can tell, loop peeling currently does not do any checks related to convergent operations in the loop. However, it's not really clear to me whether loop peeling of convergent operations is actually illegal -- it looks like (non-runtime) unrolling of convergent operations is allowed, so I would expect loop peeling to be allowed as well. I have a very hard time working out from the LangRef definition what requirements it actually imposes here.

Maybe @nhaehnle, who seems to be the expert on this topic, can chime in.

The common issue with duplication of convergent instructions is when two threads that previously would execute the same static convergent instruction end up executing different static convergent instructions after a transform. This is clearly not the case with loop peeling, so it really ought to be okay.

That said, there's the underlying meta-problem that we haven't yet adopted a good definition of convergent, meaning that given a piece of LLVM IR, it is not possible to tell which threads of a group are supposed to communicate in a given convergent instruction. It is entirely possible that there is some implicit assumption in the code about how convergent instructions behave that happens to get broken by a downstream transform now that the post-loop-peel CFG is different.

One interesting observation is that if we take the current definition of convergent as gospel (which we really shouldn't, but for the sake of the argument), then loop peeling with convergent ops is actually forbidden, and peeling loops with additional exits may make the problem worse. Specifically:

preheader:
  br loop
loop:
  convergentop
  br i1 cc1, exit, latch
latch:
  br i1 cc2, loop, exit
exit:

Peeling once results in:

loop.peel:
  convergentop'
  br i1 cc1', exit, latch.peel     <-- X
latch.peel:
  br i1 cc2, preheader, exit    <-- X
preheader:
  br loop
loop:
  convergentop
  br i1 cc1, exit, latch
latch:
  br i1 cc2, loop, exit
exit:

The convergentop inside the loop now has the two branches marked X as additional control dependencies, which is forbidden by current LangRef. It is possible that this wasn't an issue in the past because the peeled latch branch is most likely constant anyway and gets optimized away, and now there's some new loop that gets peeled with a secondary exit whose condition isn't constant.

Keep in mind that this is all speculation, and personally, I find it bizarre. The definition of convergent that I would prefer doesn't have this issue, but it may well be that something in the downstream PTX flow is sensitive to this somehow today. It would be good to properly root-cause the issue.

P.S.: It's also possible that the issue only manifests with nested loops. Either because perhaps an outer loop is peeled (does that ever happen?) or because an inner loop is peeled and one of the additional exits breaks out of or continues an outer loop.
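
To make the second case concrete, here is a purely hypothetical sketch (I'm not claiming this is the shape in the failing test) of an inner loop with a convergent op and a secondary exit that continues an outer loop:

outer.header:
  br label %inner.header
inner.header:
  call void @convergentop()
  br i1 %cc1, label %outer.latch, label %inner.latch   ; secondary exit that continues the outer loop
inner.latch:
  br i1 %cc2, label %inner.header, label %exit
outer.latch:
  br i1 %cc3, label %outer.header, label %exit
exit:

Before this patch, the secondary exit to %outer.latch would have blocked peeling of the inner loop; now the inner loop can be peeled, and the peeled copies of both exit branches become additional control dependencies of the convergent op remaining in the loop.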

tra added a comment.Oct 19 2022, 10:55 AM

It would be good to properly root-cause the issue.

I'll try extracting the IR/PTX from the test we have failing, though I'm not optimistic that it will be small enough to be usable.

tra added a comment.Oct 19 2022, 6:49 PM

As I suspected, the hanging code is ~2.5MB of unoptimized IR, reduced to ~1MB after optimizations. The IR is thread-sync heavy, and it's hard to tell if anything is obviously wrong after it gets further massaged by ptxas into GPU instructions. :-(

Would it be possible to introduce a cc1 option disabling this kind of peeling, so we have some sort of workaround until we figure out a better way to deal with the issue?

tra added a comment.Oct 24 2022, 10:52 AM

@nikic: This blocks our internal compiler release, and we need some sort of workaround until we can properly deal with the issue.

Would it be possible to introduce a cc1 option disabling this kind of peeling,...?

I do not understand the code well enough to tell where the best point to control this would be. If you could suggest a minimally invasive way to do it, I'd appreciate it.

nikic added a comment.Oct 25 2022, 1:03 AM

As I suspected, the hanging code is ~2.5MB of unoptimized IR, reduced to ~1MB after optimizations. The IR is thread-sync heavy, and it's hard to tell if anything is obviously wrong after it gets further massaged by ptxas into GPU instructions. :-(

In this test case, is there only a single additional loop being peeled? If there are multiple, is it possible to localize which one is the problematic one? Can you identify which of the removed conditions in canPeel() are the ones that are relevant? Does the problematic loop contain convergent instructions as speculated before?

Just some suggestions on how the problem could be narrowed a bit further.

Would it be possible to introduce a cc1 option disabling this kind of peeling, so we have some sort of workaround until we figure out a better way to deal with the issue?

For the record, D136643 has the opt option (I assume that's what you meant; a cc1 option would be unusual). I think that patch is too complicated as-is, but generally having a temporary option is fine.

tra added a comment.Oct 25 2022, 9:47 AM

As I suspected, the hanging code is ~2.5MB of unoptimized IR, reduced to ~1MB after optimizations. The IR is thread-sync heavy, and it's hard to tell if anything is obviously wrong after it gets further massaged by ptxas into GPU instructions. :-(

In this test case, is there only a single additional loop being peeled? If there are multiple, is it possible to localize which one is the problematic one? Can you identify which of the removed conditions in canPeel() are the ones that are relevant? Does the problematic loop contain convergent instructions as speculated before?

The bug triggers in IR generated at runtime for a fairly convoluted computation on sparse matrices. Attempts to reduce the test case result in substantially different generated IR and make the issue disappear.

To answer your questions:

  • There are multiple loops.
  • It's hard to impossible to tell which loop causes the problem, because the nature of the test makes it hard to precisely control the generated IR. It's also not feasible to detect the issue in the generated SASS by eyeballing it. I did manage to do that on a few occasions in the past, but that was with much smaller code.
  • Hence I can't tell where things went wrong.
  • The code does have a lot of calls to the @llvm.nvvm.barrier0 intrinsic, which does have the convergent attribute and does need control flow to remain 'structured' so that the GPU code can be correctly instrumented to re-converge after divergent conditional branches (see the sketch right below).
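
For reference, a minimal hypothetical IR fragment (not extracted from our failing test) of the pattern in question, a convergent barrier inside a loop with more than one exit:

declare void @llvm.nvvm.barrier0() convergent

loop:
  call void @llvm.nvvm.barrier0()
  br i1 %cc1, label %exit, label %latch   ; non-latch exit
latch:
  br i1 %cc2, label %loop, label %exit
exit: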

With D136643 we should be able to observe what exactly changes in the IR in that test, which may narrow the scope down to a subset of the loops involved.

Just some suggestions on how the problem could be narrowed a bit further.

Would it be possible to introduce a cc1 option disabling this kind of peeling, so we have some sort of workaround until we figure out a better way to deal with the issue?

For the record, D136643 has the opt option (I assume that's what you meant; a cc1 option would be unusual). I think that patch is too complicated as-is, but generally having a temporary option is fine.

This option is a workaround, not intended for general use. It needs to be a CC1 option because we need to pass it to the GPU cc1 sub-compilation, while we do want your change to keep working as-is during the host compilation. And yes, the patch is probably one of the more complicated on/off knobs I've seen. :-) If there's an easier way to control things, I'd be all for it, but I'll take whatever works without disrupting other LLVM users.