This is an archive of the discontinued LLVM Phabricator instance.

[LoopPredication] Optimize two exits case.
Abandoned · Public

Authored by ebrevnov on Jan 30 2020, 5:29 AM.

Details

Reviewers

reames
apilipenko
fedor.sergeev

Summary

The current implementation of the loop exits optimization in LoopPredication uses the minimum over all analyzable exits. For the two-exit case we don't actually need to take a minimum and can compare the exit counts directly.

Legality: Currently, for the two-exit case we generate if (P1) deopt, where P1 := (ec1 <= min(ec1, ec2)). Since min(ec1, ec2) <= ec2 always holds, P1 implies P2, where P2 := (ec1 <= ec2). So if (P2) deopt fires whenever the original condition fires. In other words, the transformed condition always deopts if the original one deopted.

Profitability: If ec1 <= ec2, then both P1 and P2 are true. If ec1 > ec2, then both P1 and P2 are false. So the two forms are equivalent.
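The legality and profitability argument above can be checked by brute force. The following is a minimal sketch, not code from the patch: the function names are hypothetical, and exit counts are modeled as 8-bit unsigned integers purely so that every pair can be enumerated (real exit counts are SCEV expressions of wider types).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: P1, the form built from the minimum of both exit counts.
bool predicateViaMin(std::uint8_t ec1, std::uint8_t ec2) {
  return ec1 <= std::min(ec1, ec2);
}

// Hypothetical helper: P2, the direct comparison proposed for two exits.
bool predicateDirect(std::uint8_t ec1, std::uint8_t ec2) {
  return ec1 <= ec2;
}

// Compares P1 and P2 on every pair of 8-bit unsigned values.
// The 8-bit width is an assumption made only to keep the search small.
bool formsAgreeOnAllInputs() {
  for (unsigned a = 0; a <= 0xFF; ++a) {
    for (unsigned b = 0; b <= 0xFF; ++b) {
      const auto ec1 = static_cast<std::uint8_t>(a);
      const auto ec2 = static_cast<std::uint8_t>(b);
      if (predicateViaMin(ec1, ec2) != predicateDirect(ec1, ec2))
        return false;
    }
  }
  return true;
}

// Convenience entry point: aborts if the two forms ever disagree.
inline void checkTwoExitForms() { assert(formsAgreeOnAllInputs()); }
```

Calling checkTwoExitForms() from a test or from main exercises all 65536 pairs. This only illustrates the unsigned comparison identity; it says nothing about which form later passes simplify better.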

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 45548
Build 47465: arc lint + arc unit

Event Timeline

ebrevnov created this revision. Jan 30 2020, 5:29 AM

Herald added a project: Restricted Project. Jan 30 2020, 5:29 AM

Herald added subscribers: llvm-commits, hiraditya.

Harbormaster completed remote builds in B45340: Diff 241422. Jan 30 2020, 5:36 AM

Update

Harbormaster completed remote builds in B45415: Diff 241639. Jan 30 2020, 11:11 PM

Another update

Yet another update

Harbormaster completed remote builds in B45547: Diff 241969. Feb 2 2020, 10:08 PM

Harbormaster completed remote builds in B45548: Diff 241970.

ebrevnov retitled this revision from "[WIP][LoopPredication] Optimize two exits case." to "[LoopPredication] Optimize two exits case." Feb 2 2020, 10:37 PM

ebrevnov edited the summary of this revision.

ebrevnov added reviewers: reames, apilipenko, fedor.sergeev.

If I'm reading the description and code correctly, you're basically trying to avoid generating the construct "a == umin(a, b)" right? If so, what's the problem with generating that? I would expect other transform passes (such as instcombine) to very happily rip that apart into the component pieces. In fact, I see in instcombine the transform "foldICmpWithMinMax" which appears to do exactly that. Why isn't that sufficient?

In fact, I took the predicate-exits.ll test file, ran it through loop-pred and then instcombine, and I see exactly the simplification expected. This seems to result in the same form as your change, so why do we need the complexity here?

This revision now requires changes to proceed. Feb 25 2020, 10:04 AM

It turned out that in the targeted case there is the following expression: min(a,b) = min(min(a,c), b), where the right-hand min is reassociated. Because of this, foldICmpWithMinMax can't handle it. I've uploaded a series of three patches for NaryReassociate (https://reviews.llvm.org/D88285, https://reviews.llvm.org/D88286, https://reviews.llvm.org/D88287) which convert that to min(a,b) = min(min(a,b), c). With that in place, foldICmpWithMinMax does its job.
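To illustrate why the regrouping matters, here is a small sketch (not taken from any of the patches) that models the unsigned min with a hypothetical umin helper over 8-bit values. It checks that the two groupings compute the same comparison, and that once min(a,b) appears verbatim on both sides the comparison is equivalent to min(a,b) <= c. The helper names and the restricted value range are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cstdint>

using U8 = std::uint8_t;

// Hypothetical stand-in for an unsigned-min operation.
U8 umin(U8 x, U8 y) { return std::min(x, y); }

// Shape from the comment: the right-hand side groups a with c first,
// so min(a, b) does not appear as an operand on the right.
bool cmpOriginalShape(U8 a, U8 b, U8 c) {
  return umin(a, b) == umin(umin(a, c), b);
}

// Shape after reassociation: min(a, b) is an operand on both sides.
bool cmpReassociated(U8 a, U8 b, U8 c) {
  return umin(a, b) == umin(umin(a, b), c);
}

// With m = min(a, b), the comparison m == min(m, c) holds exactly when m <= c.
bool cmpFolded(U8 a, U8 b, U8 c) { return umin(a, b) <= c; }

// Checks all three shapes against each other for every triple of values
// below 32 (range chosen only to keep the enumeration small).
bool allShapesAgree() {
  for (unsigned a = 0; a < 32; ++a)
    for (unsigned b = 0; b < 32; ++b)
      for (unsigned c = 0; c < 32; ++c) {
        const U8 x = static_cast<U8>(a);
        const U8 y = static_cast<U8>(b);
        const U8 z = static_cast<U8>(c);
        const bool folded = cmpFolded(x, y, z);
        if (cmpOriginalShape(x, y, z) != folded ||
            cmpReassociated(x, y, z) != folded)
          return false;
      }
  return true;
}
```

All three shapes are semantically the same; the difference is purely syntactic. A pattern-based fold that looks for a min whose operand is the other side of the comparison can only match the reassociated shape.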

Abandoned.

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/LoopPredication.cpp	76 lines
llvm/test/Transforms/LoopPredication/predicate-exits.ll	60 lines

Diff 241970

llvm/lib/Transforms/Scalar/LoopPredication.cpp

[991 lines not shown]	if (BasicBlock *Pred = BB->getSinglePredecessor()) {
if (parseWidenableBranch(Term, Cond, WC, IfTrueBB, IfFalseBB) &&		if (parseWidenableBranch(Term, Cond, WC, IfTrueBB, IfFalseBB) &&
IfTrueBB == BB)		IfTrueBB == BB)
return cast<BranchInst>(Term);		return cast<BranchInst>(Term);
}		}
return nullptr;		return nullptr;
}		}

/// Return the minimum of all analyzeable exit counts. This is an upper bound		/// Return the minimum of all analyzeable exit counts. This is an upper bound
/// on the actual exit count. If there are not at least two analyzeable exits,		/// on the actual exit count.
/// returns SCEVCouldNotCompute.		static const SCEV *getMinAnalyzeableBackedgeTakenCount(
static const SCEV *getMinAnalyzeableBackedgeTakenCount(ScalarEvolution &SE,		ScalarEvolution &SE, Loop *L,
DominatorTree &DT,		SmallVectorImpl<BasicBlock *> &AnalyzableExitingBlocks) {
Loop *L) {
SmallVector<BasicBlock *, 16> ExitingBlocks;
L->getExitingBlocks(ExitingBlocks);

SmallVector<const SCEV *, 4> ExitCounts;		SmallVector<const SCEV *, 4> ExitCounts;
for (BasicBlock *ExitingBB : ExitingBlocks) {		for (BasicBlock *ExitingBB : AnalyzableExitingBlocks) {
const SCEV *ExitCount = SE.getExitCount(L, ExitingBB);		const SCEV *ExitCount = SE.getExitCount(L, ExitingBB);
if (isa<SCEVCouldNotCompute>(ExitCount))		assert(!isa<SCEVCouldNotCompute>(ExitCount) &&
continue;		"Expected analyzable exits only");
assert(DT.dominates(ExitingBB, L->getLoopLatch()) &&
"We should only have known counts for exiting blocks that "
"dominate latch!");
ExitCounts.push_back(ExitCount);		ExitCounts.push_back(ExitCount);
}		}
if (ExitCounts.size() < 2)		assert(!ExitCounts.empty() && "No analyzable exits?");
return SE.getCouldNotCompute();
return SE.getUMinFromMismatchedTypes(ExitCounts);		return SE.getUMinFromMismatchedTypes(ExitCounts);
}		}

/// This implements an analogous, but entirely distinct transform from the main		/// This implements an analogous, but entirely distinct transform from the main
/// loop predication transform. This one is phrased in terms of using a		/// loop predication transform. This one is phrased in terms of using a
/// widenable branch outside the loop to allow us to simplify loop exits in a		/// widenable branch outside the loop to allow us to simplify loop exits in a
/// following loop. This is close in spirit to the IndVarSimplify transform		/// following loop. This is close in spirit to the IndVarSimplify transform
/// of the same name, but is materially different widening loosens legality		/// of the same name, but is materially different widening loosens legality
[13 lines not shown]	bool LoopPredication::predicateLoopExits(Loop *L, SCEVExpander &Rewriter) {
// imply flags on the expression being hoisted and inserting new uses (flags		// imply flags on the expression being hoisted and inserting new uses (flags
// are only correct for current uses). The result is that we may be		// are only correct for current uses). The result is that we may be
// inserting a branch on the value which can be either poison or undef. In		// inserting a branch on the value which can be either poison or undef. In
// this case, the branch can legally go either way; we just need to avoid		// this case, the branch can legally go either way; we just need to avoid
// introducing UB. This is achieved through the use of the freeze		// introducing UB. This is achieved through the use of the freeze
// instruction.		// instruction.

SmallVector<BasicBlock *, 16> ExitingBlocks;		SmallVector<BasicBlock *, 16> ExitingBlocks;
		SmallVector<BasicBlock *, 4> AnalyzableExitingBlocks;
L->getExitingBlocks(ExitingBlocks);		L->getExitingBlocks(ExitingBlocks);

if (ExitingBlocks.empty())		if (ExitingBlocks.empty())
return false; // Nothing to do.		return false; // Nothing to do.

auto *Latch = L->getLoopLatch();		auto *Latch = L->getLoopLatch();
if (!Latch)		if (!Latch)
return false;		return false;
[9 lines not shown]	bool LoopPredication::predicateLoopExits(Loop *L, SCEVExpander &Rewriter) {
// At this point, we have found an analyzeable latch, and a widenable		// At this point, we have found an analyzeable latch, and a widenable
// condition above the loop. If we have a widenable exit within the loop		// condition above the loop. If we have a widenable exit within the loop
// (for which we can't compute exit counts), drop the ability to further		// (for which we can't compute exit counts), drop the ability to further
// widen so that we gain ability to analyze it's exit count and perform this		// widen so that we gain ability to analyze it's exit count and perform this
// transform. TODO: It'd be nice to know for sure the exit became		// transform. TODO: It'd be nice to know for sure the exit became
// analyzeable after dropping widenability.		// analyzeable after dropping widenability.
{		{
bool Invalidate = false;		bool Invalidate = false;

for (auto *ExitingBB : ExitingBlocks) {		for (auto *ExitingBB : ExitingBlocks) {
if (LI->getLoopFor(ExitingBB) != L)		if (LI->getLoopFor(ExitingBB) != L)
continue;		continue;

auto *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());		auto *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
if (!BI)		if (!BI)
continue;		continue;

Use *Cond, *WC;		Use *Cond, *WC;
BasicBlock *IfTrueBB, *IfFalseBB;		BasicBlock *IfTrueBB, *IfFalseBB;
if (parseWidenableBranch(BI, Cond, WC, IfTrueBB, IfFalseBB) &&		if (parseWidenableBranch(BI, Cond, WC, IfTrueBB, IfFalseBB) &&
L->contains(IfTrueBB)) {		L->contains(IfTrueBB)) {
WC->set(ConstantInt::getTrue(IfTrueBB->getContext()));		WC->set(ConstantInt::getTrue(IfTrueBB->getContext()));
Invalidate = true;		Invalidate = true;
}		}
}		}
if (Invalidate)		if (Invalidate)
SE->forgetLoop(L);		SE->forgetLoop(L);
}		}

		// Collect all analyzable exits.
		for (auto *ExitingBB : ExitingBlocks) {
		const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
		if (isa<SCEVCouldNotCompute>(ExitCount))
		continue;
		assert(DT->dominates(ExitingBB, Latch) &&
		"We should only have known counts for exiting blocks that "
		"dominate latch!");
		AnalyzableExitingBlocks.push_back(ExitingBB);
		}

		// Bail out if there are less than two analyzable exits.
		if (AnalyzableExitingBlocks.size() < 2)
		return false;

		const SCEV *MinEC = nullptr;
		bool IsTwoAnalyzableExits = false;
		if (AnalyzableExitingBlocks.size() == 2) {
		MinEC = SE->getExitCount(L, Latch);
		IsTwoAnalyzableExits = true;
		} else
// The use of umin(all analyzeable exits) instead of latch is subtle, but		// The use of umin(all analyzeable exits) instead of latch is subtle, but
// important for profitability. We may have a loop which hasn't been fully		// important for profitability. We may have a loop which hasn't been fully
// canonicalized just yet. If the exit we chose to widen is provably never		// canonicalized just yet. If the exit we chose to widen is provably never
// taken, we want the widened form to also be provably never taken. We		// taken, we want the widened form to also be provably never taken. We
// can't guarantee this as a current unanalyzeable exit may later become		// can't guarantee this as a current unanalyzeable exit may later become
// analyzeable, but we can at least avoid the obvious cases.		// analyzeable, but we can at least avoid the obvious cases.
const SCEV *MinEC = getMinAnalyzeableBackedgeTakenCount(*SE, *DT, L);		MinEC =
		getMinAnalyzeableBackedgeTakenCount(*SE, L, AnalyzableExitingBlocks);

if (isa<SCEVCouldNotCompute>(MinEC) || MinEC->getType()->isPointerTy() ||		if (isa<SCEVCouldNotCompute>(MinEC) || MinEC->getType()->isPointerTy() ||
!SE->isLoopInvariant(MinEC, L) ||		!SE->isLoopInvariant(MinEC, L) ||
!isSafeToExpandAt(MinEC, WidenableBR, *SE))		!isSafeToExpandAt(MinEC, WidenableBR, *SE))
return false;		return false;

// Subtlety: We need to avoid inserting additional uses of the WC. We know		// Subtlety: We need to avoid inserting additional uses of the WC. We know
// that it can only have one transitive use at the moment, and thus moving		// that it can only have one transitive use at the moment, and thus moving
// that use to just before the branch and inserting code before it and then		// that use to just before the branch and inserting code before it and then
// modifying the operand is legal.		// modifying the operand is legal.
auto *IP = cast<Instruction>(WidenableBR->getCondition());		auto *IP = cast<Instruction>(WidenableBR->getCondition());
IP->moveBefore(WidenableBR);		IP->moveBefore(WidenableBR);
Rewriter.setInsertPoint(IP);		Rewriter.setInsertPoint(IP);
IRBuilder<> B(IP);		IRBuilder<> B(IP);

bool Changed = false;		bool Changed = false;
Value *MinECV = nullptr; // lazily generated if needed		Value *MinECV = nullptr; // lazily generated if needed
for (BasicBlock *ExitingBB : ExitingBlocks) {		for (BasicBlock *ExitingBB : AnalyzableExitingBlocks) {
		// There is no point in optimizing the latch in case of two exits. If we
		// do we will be deopting unconditionally.
		if (IsTwoAnalyzableExits && ExitingBB == Latch)
		continue;

// If our exiting block exits multiple loops, we can only rewrite the		// If our exiting block exits multiple loops, we can only rewrite the
// innermost one. Otherwise, we're changing how many times the innermost		// innermost one. Otherwise, we're changing how many times the innermost
// loop runs before it exits.		// loop runs before it exits.
if (LI->getLoopFor(ExitingBB) != L)		if (LI->getLoopFor(ExitingBB) != L)
continue;		continue;

// Can't rewrite non-branch yet.		// Can't rewrite non-branch yet.
auto *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());		auto *BI = dyn_cast<BranchInst>(ExitingBB->getTerminator());
if (!BI)		if (!BI)
continue;		continue;

// If already constant, nothing to do.		// If already constant, nothing to do.
if (isa<Constant>(BI->getCondition()))		if (isa<Constant>(BI->getCondition()))
continue;		continue;

const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);		const SCEV *ExitCount = SE->getExitCount(L, ExitingBB);
if (isa<SCEVCouldNotCompute>(ExitCount) ||		assert(!isa<SCEVCouldNotCompute>(ExitCount) &&
ExitCount->getType()->isPointerTy() ||		"Only analyzable exits are expected");
		if (ExitCount->getType()->isPointerTy() ||
!isSafeToExpandAt(ExitCount, WidenableBR, *SE))		!isSafeToExpandAt(ExitCount, WidenableBR, *SE))
continue;		continue;

const bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));		const bool ExitIfTrue = !L->contains(*succ_begin(ExitingBB));
BasicBlock *ExitBB = BI->getSuccessor(ExitIfTrue ? 0 : 1);		BasicBlock *ExitBB = BI->getSuccessor(ExitIfTrue ? 0 : 1);
if (!ExitBB->getPostdominatingDeoptimizeCall())		if (!ExitBB->getPostdominatingDeoptimizeCall())
continue;		continue;

[100 lines not shown]

llvm/test/Transforms/LoopPredication/predicate-exits.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -loop-predication -S | FileCheck %s			; RUN: opt < %s -loop-predication -S | FileCheck %s

	declare void @prevent_merging()			declare void @prevent_merging()

	; Base case - with side effects in loop			; Base case - with side effects in loop
	define i32 @test1(i32* %array, i32 %length, i32 %n, i1 %cond_0) {			define i32 @test1(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()			; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
	; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1			; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i32 [[LENGTH:%.*]], [[TMP1]]
	; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]			; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = freeze i1 [[TMP3]]			; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
	; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[COND_0:%.*]]
	; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP5]], [[WIDENABLE_COND]]
	; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0			; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
	; CHECK: deopt:			; CHECK: deopt:
	; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]			; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
	; CHECK-NEXT: ret i32 [[DEOPTRET]]			; CHECK-NEXT: ret i32 [[DEOPTRET]]
	; CHECK: loop.preheader:			; CHECK: loop.preheader:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
	[61 lines not shown]

	define i32 @test_non_canonical(i32* %array, i32 %length, i1 %cond_0) {			define i32 @test_non_canonical(i32* %array, i32 %length, i1 %cond_0) {
	; CHECK-LABEL: @test_non_canonical(			; CHECK-LABEL: @test_non_canonical(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()			; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[LENGTH:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[LENGTH:%.*]], 1
	; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[LENGTH]], i32 1			; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[LENGTH]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i32 [[LENGTH]], [[TMP1]]
	; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]			; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = freeze i1 [[TMP3]]			; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
	; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[COND_0:%.*]]
	; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP5]], [[WIDENABLE_COND]]
	; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0			; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
	; CHECK: deopt:			; CHECK: deopt:
	; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]			; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
	; CHECK-NEXT: ret i32 [[DEOPTRET]]			; CHECK-NEXT: ret i32 [[DEOPTRET]]
	; CHECK: loop.preheader:			; CHECK: loop.preheader:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
	[239 lines not shown]

	define i32 @test_unanalyzeable_exit2(i32* %array, i32 %length, i32 %n, i1 %cond_0) {			define i32 @test_unanalyzeable_exit2(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
	; CHECK-LABEL: @test_unanalyzeable_exit2(			; CHECK-LABEL: @test_unanalyzeable_exit2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()			; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
	; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1			; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i32 [[LENGTH:%.*]], [[TMP1]]
	; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]			; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = freeze i1 [[TMP3]]			; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
	; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[COND_0:%.*]]
	; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP5]], [[WIDENABLE_COND]]
	; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0			; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
	; CHECK: deopt:			; CHECK: deopt:
	; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]			; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
	; CHECK-NEXT: ret i32 [[DEOPTRET]]			; CHECK-NEXT: ret i32 [[DEOPTRET]]
	; CHECK: loop.preheader:			; CHECK: loop.preheader:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED2:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED2:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
	[307 lines not shown]
	;; being applied. Make sure we can handle that form.			;; being applied. Make sure we can handle that form.
	define i32 @unswitch_exit_form(i32* %array, i32 %length, i32 %n, i1 %cond_0) {			define i32 @unswitch_exit_form(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
	; CHECK-LABEL: @unswitch_exit_form(			; CHECK-LABEL: @unswitch_exit_form(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()			; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
	; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1			; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i32 [[LENGTH:%.*]], [[TMP1]]
	; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]			; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = freeze i1 [[TMP3]]			; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]
	; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[COND_0:%.*]]
	; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[TMP5]], [[WIDENABLE_COND]]
	; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0			; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
	; CHECK: deopt.loopexit:			; CHECK: deopt.loopexit:
	; CHECK-NEXT: br label [[DEOPT]]			; CHECK-NEXT: br label [[DEOPT]]
	; CHECK: deopt:			; CHECK: deopt:
	; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[DEOPT_LOOPEXIT:%.*]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[DEOPT_LOOPEXIT:%.*]] ]
	; CHECK-NEXT: call void @unknown()			; CHECK-NEXT: call void @unknown()
	; CHECK-NEXT: br label [[ACTUAL_DEOPT:%.*]]			; CHECK-NEXT: br label [[ACTUAL_DEOPT:%.*]]
	; CHECK: actual_deopt:			; CHECK: actual_deopt:
	[63 lines not shown]

	define i32 @swapped_wb(i32* %array, i32 %length, i32 %n, i1 %cond_0) {			define i32 @swapped_wb(i32* %array, i32 %length, i32 %n, i1 %cond_0) {
	; CHECK-LABEL: @swapped_wb(			; CHECK-LABEL: @swapped_wb(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()			; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
	; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1			; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i32 [[LENGTH:%.*]], [[TMP1]]
	; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]			; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[COND_0:%.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = freeze i1 [[TMP3]]			; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[WIDENABLE_COND]], [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[COND_0:%.*]]
	; CHECK-NEXT: [[EXIPLICIT_GUARD_COND:%.*]] = and i1 [[WIDENABLE_COND]], [[TMP5]]
	; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0			; CHECK-NEXT: br i1 [[EXIPLICIT_GUARD_COND]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
	; CHECK: deopt:			; CHECK: deopt:
	; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]			; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
	; CHECK-NEXT: ret i32 [[DEOPTRET]]			; CHECK-NEXT: ret i32 [[DEOPTRET]]
	; CHECK: loop.preheader:			; CHECK: loop.preheader:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
	[58 lines not shown]
	}			}

	define i32 @trivial_wb(i32* %array, i32 %length, i32 %n) {			define i32 @trivial_wb(i32* %array, i32 %length, i32 %n) {
	; CHECK-LABEL: @trivial_wb(			; CHECK-LABEL: @trivial_wb(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1			; CHECK-NEXT: [[TMP0:%.*]] = icmp ugt i32 [[N:%.*]], 1
	; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1			; CHECK-NEXT: [[UMAX:%.*]] = select i1 [[TMP0]], i32 [[N]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1			; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[UMAX]], -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[LENGTH:%.*]], [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt i32 [[LENGTH:%.*]], [[TMP1]]
	; CHECK-NEXT: [[UMIN:%.*]] = select i1 [[TMP2]], i32 [[LENGTH]], i32 [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = freeze i1 [[TMP2]]
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i32 [[LENGTH]], [[UMIN]]
	; CHECK-NEXT: [[TMP4:%.*]] = freeze i1 [[TMP3]]
	; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()			; CHECK-NEXT: [[WIDENABLE_COND:%.*]] = call i1 @llvm.experimental.widenable.condition()
	; CHECK-NEXT: [[TMP5:%.*]] = and i1 [[TMP4]], [[WIDENABLE_COND]]			; CHECK-NEXT: [[TMP4:%.*]] = and i1 [[TMP3]], [[WIDENABLE_COND]]
	; CHECK-NEXT: br i1 [[TMP5]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0			; CHECK-NEXT: br i1 [[TMP4]], label [[LOOP_PREHEADER:%.*]], label [[DEOPT:%.*]], !prof !0
	; CHECK: deopt:			; CHECK: deopt:
	; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]			; CHECK-NEXT: [[DEOPTRET:%.*]] = call i32 (...) @llvm.experimental.deoptimize.i32() [ "deopt"() ]
	; CHECK-NEXT: ret i32 [[DEOPTRET]]			; CHECK-NEXT: ret i32 [[DEOPTRET]]
	; CHECK: loop.preheader:			; CHECK: loop.preheader:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[LOOP_ACC:%.*]] = phi i32 [ [[LOOP_ACC_NEXT:%.*]], [[GUARDED:%.*]] ], [ 0, [[LOOP_PREHEADER]] ]
	; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]			; CHECK-NEXT: [[I:%.*]] = phi i32 [ [[I_NEXT:%.*]], [[GUARDED]] ], [ 0, [[LOOP_PREHEADER]] ]
	[244 lines not shown]