This is an archive of the discontinued LLVM Phabricator instance.

[LoopPred/WC] Use a dominating widenable condition to remove analyze loop exits
ClosedPublic

Authored by reames on Nov 4 2019, 3:34 PM.

Details

Reviewers

apilipenko
skatkov
fedor.sergeev
ebrevnov

Commits

rGad5a84c88335: [LoopPred/WC] Use a dominating widenable condition to remove analyze loop exits

Summary

This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.

The core notions of the transform are as follows:

1. If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
2. To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable, or else widen them in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep.
3. Since none of the above proves that we actually exit through our analyzeable exits (we might exit through something else entirely), we limit ourselves to cases where a) the latch is analyzeable, b) the latch is predicted taken, and c) the exit being removed is statically cold.
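
As a rough illustration of the selection rule described above, the following standalone C++ sketch models each exit by a concrete exit count and evaluates the check that would be folded into the widenable branch. The `ExitSummary` type and both function names are hypothetical and exist only for this example; they are not part of LLVM, and the real transform builds the comparison from SCEV expressions rather than from known integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary of one exiting block; not an LLVM type.
struct ExitSummary {
  std::optional<uint64_t> exitCount; // nullopt models an unanalyzable exit
  bool leadsToDeopt;                 // exit block ends in a deoptimize call
};

// Unsigned minimum over all analyzable exit counts. Mirrors the rule in the
// patch that fewer than two analyzable exits means "could not compute".
std::optional<uint64_t>
minAnalyzableExitCount(const std::vector<ExitSummary> &exits) {
  std::optional<uint64_t> minEC;
  unsigned known = 0;
  for (const ExitSummary &e : exits) {
    if (!e.exitCount)
      continue;
    ++known;
    minEC = minEC ? std::min(*minEC, *e.exitCount) : *e.exitCount;
  }
  if (known < 2)
    return std::nullopt;
  return minEC;
}

// Value the extra entry check would take for these concrete exit counts:
// every analyzable deopt exit must have a count strictly greater than the
// minimum, i.e. some other exit ends the loop first.
bool widenedCheckPasses(const std::vector<ExitSummary> &exits) {
  std::optional<uint64_t> minEC = minAnalyzableExitCount(exits);
  if (!minEC)
    return true; // transform does not apply; nothing is added to the guard
  for (const ExitSummary &e : exits)
    if (e.exitCount && e.leadsToDeopt && !(*e.exitCount > *minEC))
      return false;
  return true;
}
```

When the check fails at run time, the widenable branch simply takes its deoptimizing successor before the loop starts, which is the case the legality argument in point 1 permits.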

Diff Detail

Event Timeline

reames created this revision. · Nov 4 2019, 3:34 PM

Herald added a project: Restricted Project. · Nov 4 2019, 3:34 PM

Herald added subscribers: bollu, hiraditya, mcrosier.

Make the code a bit more explicit about checking property 3 in the review description.

reames mentioned this in D69454: [WIP] a potential approach to widenable condition in IndVar's loop pred. · Nov 4 2019, 3:39 PM

reames mentioned this in D69465: [WIP] An alternate approach to widening exit conditions using widenable conditions.

simoll added a subscriber: simoll. · Nov 5 2019, 12:25 AM

ping

In general the code looks fine to me. What is not clear is how it is intended to work in real-world scenarios. On the one hand we rely on the WC having an explicit form in the IR (ExitBB->getTerminatingDeoptimizeCall()); on the other hand we skip optimizing exits over a WC since they are not analyzable. Are you going to introduce that support in the next patch, or am I missing something here?

llvm/lib/Transforms/Scalar/LoopPredication.cpp
1025	is closer is->is closer in
1083	Ill formatted comment...not trivial to understand either...I would suggest to remove it....or improve...
llvm/test/Transforms/LoopPredication/predicate-exits.ll
685	I believe you meant this is something to handle in future, right?

fedor.sergeev added inline comments. · Nov 12 2019, 5:45 AM

llvm/lib/Transforms/Scalar/LoopPredication.cpp
965	typo - WideNable
1127–1128	I have not been following this 'freeze' story, but freeze instruction is already in: commit 58acbce3def63a207b8f5a69318a99666a4aac53 Author: aqjune <aqjune@gmail.com> Date: Tue Nov 5 15:53:22 2019 +0900 [IR] Add Freeze instruction Perhaps you want to rephrase this TODO ...
1200–1206	IMO it would look more natural if you removed this if() check, leaving the two guard loops on the same level as the predicateLoopExits call. There should be no extra efficiency gained from this (superfluous) check.

BTW, please run clang-format. I noticed some trailing spaces and misformatting.

ebrevnov added inline comments. · Nov 13 2019, 4:29 AM

llvm/lib/Transforms/Scalar/LoopPredication.cpp
1108	Would it be better to check profitability in a more general way using BPI or similar?

fedor.sergeev added inline comments. · Nov 13 2019, 5:44 AM

llvm/lib/Transforms/Scalar/LoopPredication.cpp
1108	Since widenable branch jumping to deoptimize is the main intended usecase for widenable branch itself (we even show it in LangRef as a prime example of widenable.condition usage) it seems to be a very good special case to start with. BPI is tricky to use (and tricky to preserve), so it can not be the main source of truth for profitability purposes.

ebrevnov added inline comments. · Nov 13 2019, 9:34 AM

llvm/lib/Transforms/Scalar/LoopPredication.cpp
1108	First, I don't see anything tricky in using BPI. It is widely used across the optimizer, and if BPI is "incorrect" that may affect a lot of places. Moreover, LoopPredication has an explicit dependence on BPI and won't work as expected with "broken" BPI anyway. Also note that the current implementation doesn't handle widenable branches because such branches are not analyzable.

fedor.sergeev added inline comments. · Nov 13 2019, 1:39 PM

llvm/lib/Transforms/Scalar/LoopPredication.cpp
1108	> LoopPredication has explicit dependence on BPI and won't work as expected with "broken" BPI anyway.
I'm not talking about "broken BPI". In the new PassManager, BPI is not always preserved through the loop passes, so it can be just "absent". And yes, the current profitability checks do depend on BPI, yet LoopPredication will just consider the loop unconditionally profitable in the absence of BPI.

Patch refresh coming to incorporate comments.

In D69830#1742086, @ebrevnov wrote:

In general the code looks fine to me. What is not clear is how it is intended to work in real-world scenarios. On the one hand we rely on the WC having an explicit form in the IR (ExitBB->getTerminatingDeoptimizeCall()); on the other hand we skip optimizing exits over a WC since they are not analyzable. Are you going to introduce that support in the next patch, or am I missing something here?

I'm confused by your confusion.

This patch widens checks in the loop (not involving widenable conditions) into a check before the loop (involving a widenable condition). It does not widen widenable conditions *within* a loop.

We will hoist and unswitch branches within a loop based on widenable conditions. As such, it seems reasonable to believe widenable conditions can be usually found outside of loops.
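
To make the distinction concrete, here is a small C++ model of a counted loop with an in-loop range check, before and after the check is folded into the guard ahead of the loop. It is only a sketch under stated assumptions: `runBefore`, `runAfter`, and `Outcome` are hypothetical names, the loop shape follows the `test1` case in the test file, and "deoptimize" is modelled as an early return rather than a real transfer to an interpreter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model only; not LLVM code.
enum class Outcome { Completed, Deoptimized };

// Before: the guard ahead of the loop tests only cond0, and the range check
// i < length sits inside the loop with a deoptimizing exit.
Outcome runBefore(uint32_t length, uint32_t n, bool cond0) {
  if (!cond0)
    return Outcome::Deoptimized; // widenable guard before the loop
  uint32_t i = 0;
  do {
    if (!(i < length))
      return Outcome::Deoptimized; // in-loop range check
    ++i;
  } while (i < n); // latch exit
  return Outcome::Completed;
}

// After: the range check is folded into the guard as
// "length > umin(length, latch exit count)", and the in-loop branch is
// treated as never taken.
Outcome runAfter(uint32_t length, uint32_t n, bool cond0) {
  uint32_t latchExitCount = std::max<uint32_t>(n, 1) - 1;
  uint32_t minExitCount = std::min(length, latchExitCount);
  bool widened = (length > minExitCount) && cond0;
  if (!widened)
    return Outcome::Deoptimized; // deoptimize before entering the loop
  uint32_t i = 0;
  do {
    ++i; // range check removed from the loop body
  } while (i < n);
  return Outcome::Completed;
}
```

In this model, whenever `runAfter` completes, `runBefore` completes as well; the widened form only moves the deoptimization earlier, which is what the widenable condition allows.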

llvm/lib/Transforms/Scalar/LoopPredication.cpp
1108	I would strongly prefer not to introduce BPI here. Having a deoptimize call is a very strong signal of cold code. BPI doesn't really have a corresponding notion of "almost definitely cold code". (I'm not opposed to trying to generalize, I just don't want to do it now. There are other parts I consider more important.)
1127–1128	It hadn't at the point I posted the patch. I will add support when I refresh patch.
1200–1206	Hm, you're right. Will do.

Address review comments.

LGTM

This revision is now accepted and ready to land. · Nov 18 2019, 9:04 AM

Closed by commit rGad5a84c88335: [LoopPred/WC] Use a dominating widenable condition to remove analyze loop exits (authored by reames). · Nov 18 2019, 11:27 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/LoopPredication.cpp	215 lines
llvm/test/Transforms/LoopPredication/predicate-exits.ll	757 lines

Diff 227793

llvm/lib/Transforms/Scalar/LoopPredication.cpp

[242 lines not shown]	struct LoopICmp {
void dump() {		void dump() {
dbgs() << "LoopICmp Pred = " << Pred << ", IV = " << *IV		dbgs() << "LoopICmp Pred = " << Pred << ", IV = " << *IV
<< ", Limit = " << *Limit << "\n";		<< ", Limit = " << *Limit << "\n";
}		}
};		};

class LoopPredication {		class LoopPredication {
AliasAnalysis *AA;		AliasAnalysis *AA;
		DominatorTree *DT;
ScalarEvolution *SE;		ScalarEvolution *SE;
		LoopInfo *LI;
BranchProbabilityInfo *BPI;		BranchProbabilityInfo *BPI;

Loop *L;		Loop *L;
const DataLayout *DL;		const DataLayout *DL;
BasicBlock *Preheader;		BasicBlock *Preheader;
LoopICmp LatchCheck;		LoopICmp LatchCheck;

bool isSupportedStep(const SCEV* Step);		bool isSupportedStep(const SCEV* Step);
[35 lines not shown]	class LoopPredication {
bool widenGuardConditions(IntrinsicInst *II, SCEVExpander &Expander);		bool widenGuardConditions(IntrinsicInst *II, SCEVExpander &Expander);
bool widenWidenableBranchGuardConditions(BranchInst *Guard, SCEVExpander &Expander);		bool widenWidenableBranchGuardConditions(BranchInst *Guard, SCEVExpander &Expander);
// If the loop always exits through another block in the loop, we should not		// If the loop always exits through another block in the loop, we should not
// predicate based on the latch check. For example, the latch check can be a		// predicate based on the latch check. For example, the latch check can be a
// very coarse grained check and there can be more fine grained exit checks		// very coarse grained check and there can be more fine grained exit checks
// within the loop. We identify such unprofitable loops through BPI.		// within the loop. We identify such unprofitable loops through BPI.
bool isLoopProfitableToPredicate();		bool isLoopProfitableToPredicate();

		bool predicateLoopExits(Loop *L, SCEVExpander &Rewriter);

public:		public:
LoopPredication(AliasAnalysis *AA, ScalarEvolution *SE,		LoopPredication(AliasAnalysis *AA, DominatorTree *DT,
		ScalarEvolution *SE, LoopInfo *LI,
BranchProbabilityInfo *BPI)		BranchProbabilityInfo *BPI)
: AA(AA), SE(SE), BPI(BPI){};		: AA(AA), DT(DT), SE(SE), LI(LI), BPI(BPI) {};
bool runOnLoop(Loop *L);		bool runOnLoop(Loop *L);
};		};

class LoopPredicationLegacyPass : public LoopPass {		class LoopPredicationLegacyPass : public LoopPass {
public:		public:
static char ID;		static char ID;
LoopPredicationLegacyPass() : LoopPass(ID) {		LoopPredicationLegacyPass() : LoopPass(ID) {
initializeLoopPredicationLegacyPassPass(*PassRegistry::getPassRegistry());		initializeLoopPredicationLegacyPassPass(*PassRegistry::getPassRegistry());
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<BranchProbabilityInfoWrapperPass>();		AU.addRequired<BranchProbabilityInfoWrapperPass>();
getLoopAnalysisUsage(AU);		getLoopAnalysisUsage(AU);
}		}

bool runOnLoop(Loop *L, LPPassManager &LPM) override {		bool runOnLoop(Loop *L, LPPassManager &LPM) override {
if (skipLoop(L))		if (skipLoop(L))
return false;		return false;
auto *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		auto *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
		auto *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
		auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
BranchProbabilityInfo &BPI =		BranchProbabilityInfo &BPI =
getAnalysis<BranchProbabilityInfoWrapperPass>().getBPI();		getAnalysis<BranchProbabilityInfoWrapperPass>().getBPI();
auto *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		auto *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
LoopPredication LP(AA, SE, &BPI);		LoopPredication LP(AA, DT, SE, LI, &BPI);
return LP.runOnLoop(L);		return LP.runOnLoop(L);
}		}
};		};

char LoopPredicationLegacyPass::ID = 0;		char LoopPredicationLegacyPass::ID = 0;
} // end namespace llvm		} // end namespace llvm

INITIALIZE_PASS_BEGIN(LoopPredicationLegacyPass, "loop-predication",		INITIALIZE_PASS_BEGIN(LoopPredicationLegacyPass, "loop-predication",
[9 lines not shown]

PreservedAnalyses LoopPredicationPass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses LoopPredicationPass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
LPMUpdater &U) {		LPMUpdater &U) {
const auto &FAM =		const auto &FAM =
AM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();		AM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
Function *F = L.getHeader()->getParent();		Function *F = L.getHeader()->getParent();
auto BPI = FAM.getCachedResult<BranchProbabilityAnalysis>(F);		auto BPI = FAM.getCachedResult<BranchProbabilityAnalysis>(F);
LoopPredication LP(&AR.AA, &AR.SE, BPI);		LoopPredication LP(&AR.AA, &AR.DT, &AR.SE, &AR.LI, BPI);
if (!LP.runOnLoop(&L))		if (!LP.runOnLoop(&L))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

return getLoopPassPreservedAnalyses();		return getLoopPassPreservedAnalyses();
}		}

Optional<LoopICmp>		Optional<LoopICmp>
LoopPredication::parseLoopICmp(ICmpInst *ICI) {		LoopPredication::parseLoopICmp(ICmpInst *ICI) {
[584 lines not shown]	if (ExitingBlockProbability > LatchProbabilityThreshold)
return false;		return false;
}		}
// Using BPI, we have concluded that the most probable way to exit from the		// Using BPI, we have concluded that the most probable way to exit from the
// loop is through the latch (or there's no profile information and all		// loop is through the latch (or there's no profile information and all
// exits are equally likely).		// exits are equally likely).
return true;		return true;
}		}

		/// If we can (cheaply) find a widenable branch which controls entry into the
		/// loop, return it.
		static BranchInst *FindWideableTerminatorAboveLoop(Loop *L, LoopInfo &LI) {
		// Walk back through any unconditional executed blocks and see if we can find
		// a widenable condition which seems to control execution of this loop. Note
		// that we predict that maythrow calls are likely untaken and thus that it's
		// profitable to widen a branch before a maythrow call with a condition
		// afterwards even though that may cause the slow path to run in a case where
		// it wouldn't have otherwise.
		BasicBlock *BB = L->getLoopPreheader();
		if (!BB)
		return nullptr;
		do {
		if (BasicBlock *Pred = BB->getSinglePredecessor())
		if (BB == Pred->getSingleSuccessor()) {
		BB = Pred;
		continue;
		}
		break;
		} while (true);

		if (BasicBlock *Pred = BB->getSinglePredecessor()) {
		auto *Term = Pred->getTerminator();

		Value *Cond, *WC;
		BasicBlock *IfTrueBB, *IfFalseBB;
		if (parseWidenableBranch(Term, Cond, WC,
		IfTrueBB, IfFalseBB) &&
		IfTrueBB == BB)
		return cast<BranchInst>(Term);
		}
		return nullptr;
		}

		/// Return the minimum of all analyzeable exit counts. This is an upper bound
		/// on the actual exit count. If there are not at least two analyzeable exits,
		/// returns SCEVCouldNotCompute.
		static const SCEV*
		getMinAnalyzeableBackedgeTakenCount(ScalarEvolution &SE,
		DominatorTree &DT, Loop *L) {
		SmallVector<BasicBlock*, 16> ExitingBlocks;
		L->getExitingBlocks(ExitingBlocks);

		SmallVector<const SCEV*, 4> ExitCounts;
		for (BasicBlock *ExitingBB : ExitingBlocks) {
		const SCEV *ExitCount = SE.getExitCount(L, ExitingBB);
		if (isa<SCEVCouldNotCompute>(ExitCount))
		continue;
		assert(DT.dominates(ExitingBB, L->getLoopLatch()) &&
		"We should only have known counts for exiting blocks that "
		"dominate latch!");
		ExitCounts.push_back(ExitCount);
		}
		if (ExitCounts.size() < 2)
		return SE.getCouldNotCompute();
		return SE.getUMinFromMismatchedTypes(ExitCounts);
		}


		/// This implements an analogous, but entirely distinct transform from the main
		/// loop predication transform. This one is phrased in terms of using a
		/// widenable branch outside the loop to allow us to simplify loop exits in a
		/// following loop. This is closer is spirit to the IndVarSimplify transform
		/// of the same name, but is materially different widening loosens legality
		/// sharply.
		bool LoopPredication::predicateLoopExits(Loop *L, SCEVExpander &Rewriter) {
		// The transformation performed here aims to widen a widenable condition
		// above the loop such that all analyzeable exit leading to deopt are dead.
		// It assumes that the latch is the dominant exit for profitability and that
		// exits branching to deoptimizing blocks are rarely taken. It relies on the
		// semantics of widenable expressions for legality. (i.e. being able to fall
		// down the widenable path spuriously allows us to ignore exit order,
		// unanalyzeable exits, side effects, exceptional exits, and other challenges
		// which restrict the applicability of the non-WC based version of this
		// transform in IndVarSimplify.)
		//
		// NOTE ON POISON/UNDEF - There's a latent problem around undef/poison here.
		// We're hoisting an expression above guards which may imply flags on the
		// expression being hoisted and inserting new uses (flags are only correct
		// for current uses). The result is that we may be inserting a branch on the
		// value which can be either poison or undef. This is exactly the same
		// problem which exists in unswitch. The long term solution is the same: the
		// new freeze instruction. In the meantime, this relies on the same
		// conservatism in the optimizer that unswitch does for correctness.

		SmallVector<BasicBlock*, 16> ExitingBlocks;
		L->getExitingBlocks(ExitingBlocks);

		if (ExitingBlocks.empty())
		return false; // Nothing to do.

		/*nullable*/ auto *Latch = L->getLoopLatch();

		auto *WidenableBR = FindWideableTerminatorAboveLoop(L, *LI);
		if (!WidenableBR)
		return false;

		// The use of umin(all analyzeable exits) instead of latch is subtle, but
		// important for profitability. We may have a loop which hasn't been fully
		// canonicalized just yet. If the exit we chose to widen is provably never
		// taken, we want the widened form to also be provably never taken. We
		// can't guarantee this as a current unanalyzeable exit may later become
		// analyzeable, but we can at least avoid the obvious cases.
		const SCEV *MinEC = getMinAnalyzeableBackedgeTakenCount(*SE, *DT, L);
		if (isa<SCEVCouldNotCompute>(MinEC) ||
		MinEC->getType()->isPointerTy() ||
		!SE->isLoopInvariant(MinEC, L) ||
		!isSafeToExpandAt(MinEC, WidenableBR, *SE))
		return false;

		Rewriter.setInsertPoint(WidenableBR);
		IRBuilder<> B(WidenableBR);

		bool Changed = false;
		Value *MinECV = nullptr; //lazy generated if needed
		for (BasicBlock *ExitingBB : ExitingBlocks) {
		// If our exiting block exits multiple loops, we can only rewrite the
		// innermost one. Otherwise, we're changing how many times the innermost
		// loop runs before it exits.
		if (LI->getLoopFor(ExitingBB) != L)
		continue;

		// Can't rewrite non-branch yet.
		auto *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
		if (!BI)
		continue;

		// If already constant, nothing to do.
		if (isa<Constant>(BI->getCondition()))
		continue;

		const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
		if (isa<SCEVCouldNotCompute>(ExitCount) ||
		ExitCount->getType()->isPointerTy() ||
		!isSafeToExpandAt(ExitCount, WidenableBR, *SE))
		continue;

		const bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));
		BasicBlock *ExitBB = BI->getSuccessor(ExitIfTrue ? 0 : 1);
		if (!ExitBB->getTerminatingDeoptimizeCall())
		// Profitability: indicator of rarely/never taken exit
		continue;

		// If we found a widenable exit condition, do two things:
		// 1) fold the widened exit test into the widenable condition
		// 2) fold the branch to untaken - avoids infinite looping

		Value *ECV = Rewriter.expandCodeFor(ExitCount);
		if (!MinECV)
		MinECV = Rewriter.expandCodeFor(MinEC);
		Value *RHS = MinECV;
		if (ECV->getType() != RHS->getType()) {
		Type *WiderTy = SE->getWiderType(ECV->getType(), RHS->getType());
		ECV = B.CreateZExt(ECV, WiderTy);
		RHS = B.CreateZExt(RHS, WiderTy);
		}
		assert(!Latch || DT->dominates(ExitingBB, Latch));
		Value *NewCond = B.CreateICmp(ICmpInst::ICMP_UGT, ECV, RHS);
		// TODO: NewCond = B.CreateFreeze(NewCond) once freeze inst submitted.
		// See NOTE ON POISON/UNDEF above for context.

		Value *Cond, *WC;
		BasicBlock *IfTrueBB, *IfFalseBB;
		bool Success = parseWidenableBranch(WidenableBR, Cond, WC,
		IfTrueBB, IfFalseBB);
		assert(Success && "implied from above");
		NewCond = B.CreateAnd(B.CreateAnd(NewCond, Cond), WC);
		WidenableBR->setCondition(NewCond);

		Value *OldCond = BI->getCondition();
		BI->setCondition(ConstantInt::get(OldCond->getType(), !ExitIfTrue));
		Changed = true;
		}

		if (Changed)
		// We just mutated a bunch of loop exits changing their exit counts
		// widely. We need to force recomputation of the exit counts given these
		// changes. Note that all of the inserted exits are never taken, and
		// should be removed next time the CFG is modified.
		SE->forgetLoop(L);
		return Changed;
		}


bool LoopPredication::runOnLoop(Loop *Loop) {		bool LoopPredication::runOnLoop(Loop *Loop) {
L = Loop;		L = Loop;

LLVM_DEBUG(dbgs() << "Analyzing ");		LLVM_DEBUG(dbgs() << "Analyzing ");
LLVM_DEBUG(L->dump());		LLVM_DEBUG(L->dump());

Module *M = L->getHeader()->getModule();		Module *M = L->getHeader()->getModule();

[35 lines not shown]	for (auto &I : *BB)
if (isGuard(&I))		if (isGuard(&I))
Guards.push_back(cast<IntrinsicInst>(&I));		Guards.push_back(cast<IntrinsicInst>(&I));
if (PredicateWidenableBranchGuards &&		if (PredicateWidenableBranchGuards &&
isGuardAsWidenableBranch(BB->getTerminator()))		isGuardAsWidenableBranch(BB->getTerminator()))
GuardsAsWidenableBranches.push_back(		GuardsAsWidenableBranches.push_back(
cast<BranchInst>(BB->getTerminator()));		cast<BranchInst>(BB->getTerminator()));
}		}

if (Guards.empty() && GuardsAsWidenableBranches.empty())
return false;

SCEVExpander Expander(SE, DL, "loop-predication");		SCEVExpander Expander(SE, DL, "loop-predication");

bool Changed = false;		bool Changed = false;
		if (!Guards.empty() || !GuardsAsWidenableBranches.empty()) {
for (auto *Guard : Guards)		for (auto *Guard : Guards)
Changed |= widenGuardConditions(Guard, Expander);		Changed |= widenGuardConditions(Guard, Expander);
for (auto *Guard : GuardsAsWidenableBranches)		for (auto *Guard : GuardsAsWidenableBranches)
Changed |= widenWidenableBranchGuardConditions(Guard, Expander);		Changed |= widenWidenableBranchGuardConditions(Guard, Expander);
		}
		Changed |= predicateLoopExits(L, Expander);
return Changed;		return Changed;
}		}

llvm/test/Transforms/LoopPredication/predicate-exits.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -loop-predication -S | FileCheck %s

				declare void @prevent_merging()

				; Base case - with side effects in loop
				define i32 @test1(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
				; CHECK-LABEL: @test1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
				; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
				; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]
				; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]
				; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0]]
				; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[TMP5]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], [[LENGTH]]
				; CHECK-NEXT: br i1 true, label [[GUARDED]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, %length
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, %n
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded ]
				ret i32 %result
				}



				define i32 @test_non_canonical(i32* %array, i32 %length, i1 %cond_0) {
				; CHECK-LABEL: @test_non_canonical(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[LENGTH:%.*]], 1
				; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[LENGTH]], i32 1
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
				; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH]], [[TMP1]]
				; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]
				; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0]]
				; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[TMP5]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], [[LENGTH]]
				; CHECK-NEXT: br i1 true, label [[GUARDED]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], [[LENGTH]]
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, %length
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, %length
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded ]
				ret i32 %result
				}


				define i32 @test_two_range_checks(i32* %array, i32 %length.1, i32 %length.2, i32 %n, i1 %cond_0) {
				; CHECK-LABEL: @test_two_range_checks(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i32 [[LENGTH_2:%.*]], [[LENGTH_1:%.*]]
				; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP0]], i32 [[LENGTH_2]], i32 [[LENGTH_1]]
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i32 [[N:%.*]], 1
				; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP1]], i32 [[N]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = add i32 [[UMAX]], -1
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ult i32 [[UMIN]], [[TMP2]]
				; CHECK-NEXT: [[UMIN1:%.*]] = select i1 [[TMP3]], i32 [[UMIN]], i32 [[TMP2]]
				; CHECK-NEXT: [[TMP4:%.*]] = icmp ugt i32 [[LENGTH_1]], [[UMIN1]]
				; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[COND_0]]
				; CHECK-NEXT: [[TMP6:%.*]] = and i1 [[TMP5]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP7:%.*]] = icmp ugt i32 [[LENGTH_2]], [[UMIN1]]
				; CHECK-NEXT: [[TMP8:%.*]] = and i1 [[TMP7]], [[TMP5]]
				; CHECK-NEXT: [[TMP9:%.*]] = and i1 [[TMP8]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[TMP9]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED2:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED2]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], [[LENGTH_1]]
				; CHECK-NEXT: br i1 true, label [[GUARDED:%.*]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[WITHIN_BOUNDS2:%.*]] = icmp ult i32 [[I]], [[LENGTH_2]]
				; CHECK-NEXT: br i1 true, label [[GUARDED2]], label [[DEOPT3:%.*]], !prof !0
				; CHECK: deopt3:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET3:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET3]]
				; CHECK: guarded2:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED2]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded2 ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded2 ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, %length.1
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%within.bounds2 = icmp ult i32 %i, %length.2
				br i1 %within.bounds2, label %guarded2, label %deopt3, !prof !0

				deopt3:
				call void @unknown()
				%deoptret3 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret3

				guarded2:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, %n
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded2 ]
				ret i32 %result
				}

				@G = external global i32

				define i32 @test_unanalyzeable_exit(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
				; CHECK-LABEL: @test_unanalyzeable_exit(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED2:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED2]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[VOL:%.*]] = load volatile i32, i32* @G
				; CHECK-NEXT: [[UNKNOWN:%.*]] = icmp eq i32 [[VOL]], 0
				; CHECK-NEXT: br i1 [[UNKNOWN]], label [[GUARDED2]], label [[DEOPT3:%.*]], !prof !0
				; CHECK: deopt3:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET3:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET3]]
				; CHECK: guarded2:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED2]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded2 ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded2 ], [ 0, %loop.preheader ]
				call void @unknown()
				%vol = load volatile i32, i32* @G
				%unknown = icmp eq i32 %vol, 0
				br i1 %unknown, label %guarded2, label %deopt3, !prof !0

				deopt3:
				call void @unknown()
				%deoptret3 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret3

				guarded2:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, %n
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded2 ]
				ret i32 %result
				}

				define i32 @test_unanalyzeable_exit2(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
				; CHECK-LABEL: @test_unanalyzeable_exit2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
				; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
				; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
				; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]
				; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]
				; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0]]
				; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[TMP5]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED2:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED2]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], [[LENGTH]]
				; CHECK-NEXT: br i1 true, label [[GUARDED:%.*]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[VOL:%.*]] = load volatile i32, i32* @G
				; CHECK-NEXT: [[UNKNOWN:%.*]] = icmp eq i32 [[VOL]], 0
				; CHECK-NEXT: br i1 [[UNKNOWN]], label [[GUARDED2]], label [[DEOPT3:%.*]], !prof !0
				; CHECK: deopt3:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET3:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET3]]
				; CHECK: guarded2:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], [[N]]
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED2]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded2 ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded2 ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, %length
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%vol = load volatile i32, i32* @G
				%unknown = icmp eq i32 %vol, 0
				br i1 %unknown, label %guarded2, label %deopt3, !prof !0

				deopt3:
				call void @unknown()
				%deoptret3 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret3

				guarded2:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, %n
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded2 ]
				ret i32 %result
				}


				define i32 @test_unanalyzeable_latch(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
				; CHECK-LABEL: @test_unanalyzeable_latch(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], [[LENGTH:%.*]]
				; CHECK-NEXT: br i1 [[WITHIN_BOUNDS]], label [[GUARDED]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[VOL:%.*]] = load volatile i32, i32* @G
				; CHECK-NEXT: [[UNKNOWN:%.*]] = icmp eq i32 [[VOL]], 0
				; CHECK-NEXT: br i1 [[UNKNOWN]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, %length
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%vol = load volatile i32, i32* @G
				%unknown = icmp eq i32 %vol, 0
				br i1 %unknown, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded ]
				ret i32 %result
				}


				define i32 @provably_taken(i32* %array, i1 %cond_0) {
				; CHECK-LABEL: @provably_taken(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP0:%.*]] = and i1 false, [[COND_0]]
				; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[TMP1]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], 198
				; CHECK-NEXT: br i1 true, label [[GUARDED]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], 200
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, 198
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, 200
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded ]
				ret i32 %result
				}

				define i32 @provably_not_taken(i32* %array, i1 %cond_0) {
				; CHECK-LABEL: @provably_not_taken(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: [[TMP0:%.*]] = and i1 true, [[COND_0]]
				; CHECK-NEXT: [[TMP1:%.*]] = and i1 [[TMP0]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[TMP1]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], 205
				; CHECK-NEXT: br i1 true, label [[GUARDED]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ult i32 [[I_NEXT]], 200
				; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[EXIT:%.*]]
				; CHECK: exit:
				; CHECK-NEXT: [[RESULT:%.*]] = phi i32 [ [[LOOP_ACC_NEXT]], [[GUARDED]] ]
				; CHECK-NEXT: ret i32 [[RESULT]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, 205
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				%continue = icmp ult i32 %i.next, 200
				br i1 %continue, label %loop, label %exit

				exit:
				%result = phi i32 [ %loop.acc.next, %guarded ]
				ret i32 %result
				}








				; Non-latch exits can still be predicated
				define i32 @unconditional_latch(i32* %array, i32 %length, i1 %cond_0) {
				ebrevnov: I believe you meant this is something to handle in future, right?
				; CHECK-LABEL: @unconditional_latch(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
				; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[COND_0:%.*]], [[WIDENABLE_COND]]
				; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
				; CHECK: deopt:
				; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET]]
				; CHECK: loop.preheader:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[WITHIN_BOUNDS:%.*]] = icmp ult i32 [[I]], [[LENGTH:%.*]]
				; CHECK-NEXT: br i1 [[WITHIN_BOUNDS]], label [[GUARDED]], label [[DEOPT2:%.*]], !prof !0
				; CHECK: deopt2:
				; CHECK-NEXT: call void @unknown()
				; CHECK-NEXT: [[DEOPTRET2:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				; CHECK-NEXT: ret i32 [[DEOPTRET2]]
				; CHECK: guarded:
				; CHECK-NEXT: [[I_I64:%.*]] = zext i32 [[I]] to i64
				; CHECK-NEXT: [[ARRAY_I_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARRAY:%.*]], i64 [[I_I64]]
				; CHECK-NEXT: [[ARRAY_I:%.*]] = load i32, i32* [[ARRAY_I_PTR]], align 4
				; CHECK-NEXT: store i32 0, i32* [[ARRAY_I_PTR]]
				; CHECK-NEXT: [[LOOP_ACC_NEXT]] = add i32 [[LOOP_ACC]], [[ARRAY_I]]
				; CHECK-NEXT: [[I_NEXT]] = add nuw i32 [[I]], 1
				; CHECK-NEXT: br label [[LOOP]]
				;
				entry:
				%widenable_cond = call i1 @llvm.experimental.widenable.condition()
				%exiplicit_guard_cond = and i1 %cond_0, %widenable_cond
				br i1 %exiplicit_guard_cond, label %loop.preheader, label %deopt, !prof !0

				deopt:
				%deoptret = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret

				loop.preheader:
				br label %loop

				loop:
				%loop.acc = phi i32 [ %loop.acc.next, %guarded ], [ 0, %loop.preheader ]
				%i = phi i32 [ %i.next, %guarded ], [ 0, %loop.preheader ]
				call void @unknown()
				%within.bounds = icmp ult i32 %i, %length
				br i1 %within.bounds, label %guarded, label %deopt2, !prof !0

				deopt2:
				call void @unknown()
				%deoptret2 = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
				ret i32 %deoptret2

				guarded:
				%i.i64 = zext i32 %i to i64
				%array.i.ptr = getelementptr inbounds i32, i32* %array, i64 %i.i64
				%array.i = load i32, i32* %array.i.ptr, align 4
				store i32 0, i32* %array.i.ptr
				%loop.acc.next = add i32 %loop.acc, %array.i
				%i.next = add nuw i32 %i, 1
				br label %loop
				}


				declare void @unknown()

				declare i1 @llvm.experimental.widenable.condition()
				declare i32 @llvm.experimental.deoptimize.i32(...)

				!0 = !{!"branch_weights", i32 1048576, i32 1}
				!1 = !{i32 1, i32 -2147483648}
				!2 = !{i32 0, i32 50}