This is an archive of the discontinued LLVM Phabricator instance.


Differential D17553

[SimplifyCFG] Speculatively flatten CFG based on profiling metadata (try 2)
Abandoned · Public

Authored by reames on Feb 23 2016, 3:00 PM.

Details

Reviewers

sanjoy

Summary

Note: This is a corrected version of an approach previously reviewed in http://reviews.llvm.org/D13070, landed, and reverted due to a problem with control dependence and nsw flags.

The main change in this patch over the previous approach is to explicitly check for possible control dependent flags in the BB we're considering flattening which might influence the condition of the second branch. I believe this is sufficient to prevent the problematic case, but it does end up making the transform less powerful.
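
To illustrate why a control-dependent `nsw` flag is a problem, here is a minimal standalone C++ sketch (not LLVM code). The names `wrapping_inc`, `original`, and `naive_flattened` are hypothetical, and the add is modeled with explicit wraparound, assuming two's-complement conversion to `int32_t`. It shows that `i + 1 < length` implies `i < length` only when the add cannot overflow, which in the original CFG is guaranteed only by the first branch.

```cpp
#include <cassert>
#include <cstdint>

// i + 1 with wraparound, i.e. the add with its nsw flag dropped.
// Assumes two's-complement conversion (guaranteed since C++20).
int32_t wrapping_inc(int32_t i) {
  return static_cast<int32_t>(static_cast<uint32_t>(i) + 1u);
}

// Original control flow: 0 = first check failed, 1 = second check failed,
// 2 = fast path.
int original(int32_t i, int32_t length) {
  if (!(i < length)) return 0;
  // Here i < length <= INT32_MAX, so i + 1 cannot overflow (nsw holds).
  if (!(wrapping_inc(i) < length)) return 1;
  return 2;
}

// Unsound flattening: trusts "i + 1 < length implies i < length" even though
// the add now executes before the first check has been made.
int naive_flattened(int32_t i, int32_t length) {
  if (wrapping_inc(i) < length) return 2;
  return (i < length) ? 1 : 0;
}
```

With `i = INT32_MAX` and `length = 0`, `original` reports a failed first check while `naive_flattened` takes the fast path, which is the kind of miscompile the flag check in this patch is meant to rule out.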

I also had to add a bit of logic to the implies reasoning to handle the simpler test cases I want to introduce. This logic could arguably be separated, but I kept it together to show the motivation.

Original Review Comments --

If we have a series of branches which are all unlikely to fail, we can possibly combine them into a single check on the fastpath combined with a bit of dispatch logic on the slowpath. We don't want to do this unconditionally since it requires speculating instructions past a branch, but if the profiling metadata on the branch indicates profitability, this can reduce the number of checks needed along the fast path.
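
The profitability test in the patch compares the two branch weights against a bias factor (`speculative-flatten-bias`, default 100). A minimal standalone sketch of that comparison follows; the free function and its parameter names are illustrative only, and it ignores possible overflow of the multiplication for very large weights.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the shape of the patch's check: the taken weight must exceed the
// not-taken weight scaled by the bias. Parameter names are hypothetical.
bool isRarelyUntaken(uint64_t TakenWeight, uint64_t NotTakenWeight,
                     uint64_t Bias = 100) {
  return TakenWeight > NotTakenWeight * Bias;
}
```

With the weights used in the new tests (`1000` and `0`) the check passes; with weights `1000` and `10` it does not, since `1000 > 10 * 100` is false.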

The canonical example this is trying to handle is removing the second bounds check implied by the Java code: a[i] + a[i+1]. Note that it can currently only do so for really simple conditions and the values of a[i] can't be used anywhere except in the addition. (i.e. the load has to have been sunk already and not prevent speculation.)
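
As a source-level picture of the intended rewrite for `a[i] + a[i+1]`, the sketch below contrasts the two-check form with a flattened form. It is a standalone C++ illustration, not what the pass emits: `Result`, `checked`, and `flattened` are hypothetical names, and the index is widened to 64 bits so that `i + 1` cannot wrap, which sidesteps the flag problem discussed in the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Result { Ok, FailFirst, FailSecond };

// Two checks on the fast path, one per array access.
Result checked(const std::vector<long> &a, uint32_t i, long &out) {
  const uint64_t n = a.size();
  if (!(i < n)) return Result::FailFirst;
  if (!(uint64_t{i} + 1 < n)) return Result::FailSecond;
  out = a[i] + a[i + 1];
  return Result::Ok;
}

// Flattened: one check on the fast path, dispatch on the slow path.
Result flattened(const std::vector<long> &a, uint32_t i, long &out) {
  const uint64_t n = a.size();
  const uint64_t next = uint64_t{i} + 1;  // cannot wrap in 64 bits
  if (next < n) {                         // implies i < n as well
    out = a[i] + a[next];
    return Result::Ok;
  }
  // Slow path: work out which of the original checks would have failed.
  return (i < n) ? Result::FailSecond : Result::FailFirst;
}
```

Both functions return the same result for every input; the flattened one simply pays for the second comparison only when something has already gone wrong.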

Event Timeline

reames updated this revision to Diff 48852. Feb 23 2016, 3:00 PM

reames retitled this revision to [SimplifyCFG] Speculatively flatten CFG based on profiling metadata (try 2).

reames updated this object.

reames added a reviewer: sanjoy.

reames added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. Feb 23 2016, 3:00 PM

sanjoy requested changes to this revision. Feb 23 2016, 5:04 PM

sanjoy edited edge metadata.

sanjoy added inline comments.

lib/Transforms/Utils/SimplifyCFG.cpp
2708	Nit: should be `IsRarelyUntaken`.
2734	Minor: might do this check before the check on `isSafeToSpeculativelyExecute`, since it will be a cheaper way out, if it fails.
2741	Spelling nit: "computation"
2759	I'd say this is a little risky. I think you're okay now since `isImpliedCondition` only looks at overflow flags, but if it is changed to consider, e.g., the `exact` flag on `udiv` or `inbounds` on `getelementptr` at some time in the future, this might subtly break. I'd rather split out a helper routine, `GetCombinedCond` that takes two predicates, `P` and `Q` and a location, `Loc`, and returns a predicate that is equivalent to `P && Q` at `Loc`, if it can be cheaply computed. Predicates like `5 u< L` and `6 u< L` can be combined into `6 u< L`; and predicates like `I u< L` and `(I +nuw 1) u< L` can be combined into `(I +nuw 1) u< L` if you can prove that the `nuw` holds at `Loc`, etc.
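
As a rough illustration of the `GetCombinedCond` idea for the constant case only, the sketch below models a predicate `C u< L` by its constant and combines two such predicates on the same `L`. `ConstULT` and `combineConstULT` are hypothetical names for this example, not LLVM APIs, and the `Loc` parameter is omitted because constants carry no flags whose validity depends on location.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models the predicate "C u< L" for a fixed, shared L.
struct ConstULT {
  uint32_t C;
  bool holds(uint32_t L) const { return C < L; }
};

// (A u< L) && (B u< L) is equivalent to max(A, B) u< L.
ConstULT combineConstULT(ConstULT P, ConstULT Q) {
  return ConstULT{std::max(P.C, Q.C)};
}
```

Combining `5 u< L` with `6 u< L` this way yields `6 u< L`, matching the example in the comment above.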

This revision now requires changes to proceed. Feb 23 2016, 5:04 PM

Responding only to the material design comment for the moment. Will address nits once the design question is resolved.

lib/Transforms/Utils/SimplifyCFG.cpp
2759	To be clear, I'm not proposing that we just check nsw/nuw. As you point out, that would be utterly unsound. Assuming I extend the logic to include exact and inbounds (which I had completely forgotten about), what do you think of the approach? (If I've forgotten other cases, please suggest them.) I'm a bit leery of introducing the separate predicate. I think that unless we're really careful here, we could end up duplicating a lot of code. Now, having said that, I'll give it a bit more thought tomorrow and see what I think after writing some prototype code.
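
A sketch of what the extended check could look like, using a toy instruction type rather than LLVM's classes. `Inst` and `usesControlDependentFlags` are hypothetical; the walk mirrors the worklist in the patch, restricted to instructions in the block being flattened, and the flag set (`nsw`, `nuw`, `exact`, `inbounds`) is just the one named in this thread and may be incomplete.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR instruction; not an LLVM type.
struct Inst {
  int Block = 0;                  // id of the containing basic block
  bool NSW = false, NUW = false;  // no-wrap flags
  bool Exact = false;             // e.g. on udiv
  bool InBounds = false;          // on getelementptr
  std::vector<const Inst *> Operands;
};

// Returns true if Cond, or any instruction in Block that feeds it, carries a
// flag that might only be valid under the guarding branch.
bool usesControlDependentFlags(const Inst *Cond, int Block) {
  std::vector<const Inst *> Worklist;
  std::unordered_set<const Inst *> Visited;
  if (Cond->Block == Block) Worklist.push_back(Cond);
  while (!Worklist.empty()) {
    const Inst *I = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(I).second) continue;  // already visited
    if (I->NSW || I->NUW || I->Exact || I->InBounds) return true;
    for (const Inst *Op : I->Operands)
      if (Op->Block == Block) Worklist.push_back(Op);
  }
  return false;
}
```

Flags on instructions outside the block are ignored here, as in the patch, because those instructions already execute before the first branch.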

It is probably worthwhile to have a more general transformation to reduce control dependence height by using logical/bitwise and|or operation:

br (cond1) fail1
check2:
br (cond2) fail2
label:

..

if cond2's computation can be safely control speculated, it becomes
br ( (cond1 == 0) & (cond2 == 0)) label;
dispatch:
br (cond1) fail1
br fail2
label:

Note that ((cond1 == 0) & (cond2 == 0)) can be simplified in many cases (e.g., in the example test cases).

lib/Transforms/Utils/SimplifyCFG.cpp
2689	Should these two conditions be swapped in this example, or should > be changed to <?
2708	How about IsLikelyTaken?
2721	nit: similar.

sanjoy added inline comments. Feb 26 2016, 2:07 PM

lib/Transforms/Utils/SimplifyCFG.cpp
2759	I don't think we'll duplicate much code here; `GetCombinedCond` will just be a helper to simplify `X && Y` into something cheaper. `X` implies `Y` (without exploiting invalid no-wrap) will then just be one case where `X && Y` can be simplified to `X`. It should also be a more natural extension point -- for instance, if we want to combine `0 < I < 20` and `10 < I < 50`, neither implies the other but their logical-and can be expressed as `10 < I < 20`.
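
The range example can be sketched as an interval intersection. This is a standalone illustration with hypothetical names (`OpenRange`, `intersect`), using plain signed integers with both bounds exclusive; it says nothing about how such a helper would actually be structured inside LLVM.

```cpp
#include <algorithm>
#include <cassert>

// Models the predicate "Lo < I < Hi" (both bounds exclusive).
struct OpenRange {
  int Lo, Hi;
  bool contains(int I) const { return Lo < I && I < Hi; }
};

// The conjunction of two range checks on the same I is their intersection.
OpenRange intersect(OpenRange A, OpenRange B) {
  return OpenRange{std::max(A.Lo, B.Lo), std::min(A.Hi, B.Hi)};
}
```

Intersecting `0 < I < 20` with `10 < I < 50` gives `10 < I < 20`, even though neither input implies the other.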

reames abandoned this revision. Sep 24 2016, 11:41 AM

Revision Contents

Path	Size
lib/Analysis/ValueTracking.cpp	14 lines
lib/Transforms/Utils/SimplifyCFG.cpp	140 lines
test/Transforms/InstSimplify/implies.ll	47 lines
test/Transforms/SimplifyCFG/fast-fallthrough.ll	113 lines

Diff 48852

lib/Analysis/ValueTracking.cpp

[... 4,126 lines not shown ...]
 static bool isTruePredicate(CmpInst::Predicate Pred, Value *LHS, Value *RHS,
                             const DataLayout &DL, unsigned Depth,
                             AssumptionCache *AC, const Instruction *CxtI,
                             const DominatorTree *DT) {
   assert(!LHS->getType()->isVectorTy() && "TODO: extend to handle vectors!");
   if (ICmpInst::isTrueWhenEqual(Pred) && LHS == RHS)
     return true;

+  if (Constant *CLHS = dyn_cast<Constant>(LHS))
+    if (Constant *CRHS = dyn_cast<Constant>(RHS)) {
+      Constant *Res = ConstantFoldCompareInstOperands(Pred, CLHS, CRHS, DL);
+      return Res->isOneValue();
+    }
+
   switch (Pred) {
   default:
     return false;

   case CmpInst::ICMP_SLE: {
     const APInt *C;

     // LHS s<= LHS +_{nsw} C if C >= 0
[... 59 lines not shown ...] return isTruePredicate(CmpInst::ICMP_SLE, BLHS, ALHS, DL, Depth, AC, CxtI,
                            DT);

   case CmpInst::ICMP_ULT:
   case CmpInst::ICMP_ULE:
     return isTruePredicate(CmpInst::ICMP_ULE, BLHS, ALHS, DL, Depth, AC, CxtI,
                            DT) &&
            isTruePredicate(CmpInst::ICMP_ULE, ARHS, BRHS, DL, Depth, AC, CxtI,
                            DT);
+
+  case CmpInst::ICMP_UGT:
+  case CmpInst::ICMP_UGE:
+    return isTruePredicate(CmpInst::ICMP_UGE, BLHS, ALHS, DL, Depth, AC, CxtI,
+                           DT) &&
+           isTruePredicate(CmpInst::ICMP_UGE, ARHS, BRHS, DL, Depth, AC, CxtI,
+                           DT);
   }

 }

 bool llvm::isImpliedCondition(Value *LHS, Value *RHS, const DataLayout &DL,
                               unsigned Depth, AssumptionCache *AC,
                               const Instruction *CxtI,
                               const DominatorTree *DT) {
   assert(LHS->getType() == RHS->getType() && "mismatched type");
   Type *OpTy = LHS->getType();
[... 24 lines not shown ...]

lib/Transforms/Utils/SimplifyCFG.cpp

[... 89 lines not shown ...] static cl::opt<bool> SpeculateOneExpensiveInst(
     cl::desc("Allow exactly one expensive instruction to be speculatively "
              "executed"));

 static cl::opt<unsigned> MaxSpeculationDepth(
     "max-speculation-depth", cl::Hidden, cl::init(10),
     cl::desc("Limit maximum recursion depth when calculating costs of "
              "speculatively executed instructions"));

+static cl::opt<unsigned> SpeculativeFlattenBias(
+    "speculative-flatten-bias", cl::Hidden, cl::init(100),
+    cl::desc("Control how biased a branch needs to be to be considered rarely"
+             " failing for speculative flattening (default = 100)"));
+
+static cl::opt<unsigned> SpeculativeFlattenThreshold(
+    "speculative-flatten-threshold", cl::Hidden, cl::init(10),
+    cl::desc("Control how much speculation happens due to speculative"
+             " flattening (default = 10)"));
+
 STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
 STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");
 STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
 STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");
 STATISTIC(NumTableCmpReuses, "Number of reused switch table lookup compares");
 STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
 STATISTIC(NumSpeculations, "Number of speculative executed instructions");

[... 2,554 lines not shown ...] static bool mergeConditionalStores(BranchInst *PBI, BranchInst *QBI) {

   bool Changed = false;
   for (auto *Address : CommonAddresses)
     Changed |= mergeConditionalStoreToAddress(
         PTB, PFB, QTB, QFB, PostBB, Address, InvertPCond, InvertQCond);
   return Changed;
 }

+/// If we have a series of tests leading to a frequently executed fallthrough
+/// path and we can test all the conditions at once, we can execute a single
+/// test on the fast path and figure out which condition failed on the slow
+/// path. This transformation considers a pair of branches at a time since
+/// recursively considering branches pairwise will cause an entire chain to
+/// collapse. This transformation is code size neutral, but makes it
+/// dynamically more expensive to fail either check. As such, we only want to
+/// do this if both checks are expected to essentially never fail.
+/// The key motivating examples are cases like:
+/// br (5 > Length), in_bounds, fail1
+/// in_bounds:
+/// br (6 > Length), in_bounds2, fail2
[inline comment] davidxl: Should these two conditions be swapped in this example, or should > be changed to <?
+/// in_bounds2:
+/// ...
+///
+/// We can rewrite this as:
+/// br (6 > Length), in_bounds2, dispatch
+/// in_bounds2:
+/// ...
+/// dispatch:
+/// br (5 > Length), fail2, fail1
+///
+/// TODO: we could consider duplicating some (non-speculatable) instructions
+/// from BI->getParent() down both paths
+static bool SpeculativelyFlattenCondBranchToCondBranch(BranchInst *PBI, BranchInst *BI,
+                                                       const DataLayout &DL) {
+  auto *PredBB = PBI->getParent();
+  auto *BB = BI->getParent();
+
+  /// Is the failing path of this branch taken rarely if at all?
+  auto isRarelyUntaken = [](BranchInst *BI) {
[inline comment] sanjoy: Nit: should be `IsRarelyUntaken`.
[inline comment] davidxl: How about IsLikelyTaken?
+    uint64_t ProbTrue;
+    uint64_t ProbFalse;
+    if (!ExtractBranchMetadata(BI, ProbTrue, ProbFalse))
+      return false;
+    return ProbTrue > ProbFalse * SpeculativeFlattenBias;
+  };
+
+  if (PBI->getSuccessor(0) != BB ||
+      !isRarelyUntaken(PBI) || !isRarelyUntaken(BI) ||
+      !isImpliedCondition(BI->getCondition(), PBI->getCondition(), DL))
+    return false;
+
+  // TODO: The following code performs a similiar, but slightly distinct
[inline comment] davidxl: nit: similar.
+  // transformation to that done by SpeculativelyExecuteBB. We should consider
+  // combining them at some point.
+
+  // Can we speculate everything in the given block (except for the terminator
+  // instruction) at the instruction boundary before 'At'?
+  unsigned SpeculationCost = 0;
+  for (Instruction &I : *BB) {
+    if (isa<TerminatorInst>(I)) break;
+    if (!isSafeToSpeculativelyExecute(&I, PBI))
+      return false;
+    SpeculationCost++;
+    // Only flatten relatively small BBs to avoid making the bad case terrible
+    if (SpeculationCost > SpeculativeFlattenThreshold || isa<CallInst>(I))
[inline comment] sanjoy: Minor: might do this check before the check on `isSafeToSpeculativelyExecute`, since it will be a cheaper way out, if it fails.
+      return false;
+  }
+
+  // We need to check whether any of the instructions we may have examined when
+  // deciding the BI implies PBI have flags which might be control dependent on
+  // PBI. This can happen if (for instance), PBI ensures no overflow in the
+  // computaiton leading to BI's condition. We can't use the overflow bits to
[inline comment] sanjoy: Spelling nit: "computation"
+  // prove the original condition checking for overflow. This only considers
+  // the instructions within BB which actually contribute to BI's condition.
+  SmallVector<Instruction *, 8> Worklist;
+  SmallSet<Instruction *, 8> Visited;
+  Value *BICond = BI->getCondition();
+  if (auto *CondI = dyn_cast<Instruction>(BICond))
+    if (CondI->getParent() == BB)
+      Worklist.push_back(CondI);
+  while (!Worklist.empty()) {
+    Instruction *I = Worklist.pop_back_val();
+    assert(I->getParent() == BB && "broken invariant");
+
+    if (!Visited.insert(I).second)
+      // already visited
+      continue;
+
+    if (auto *OBO = dyn_cast<OverflowingBinaryOperator>(I))
+      if (OBO->hasNoUnsignedWrap() || OBO->hasNoSignedWrap())
[inline comment] sanjoy: I'd say this is a little risky. I think you're okay now since `isImpliedCondition` only looks…
[inline comment] reames (author): To be clear, I'm not proposing that we just check nsw/nuw. As you point out, that would be…
[inline comment] sanjoy: I don't think we'll duplicate much code here, `GetCombinedCond` will just be a helper to…
+        // TODO: can we prove the flags are valid before PBI?
+        return false;
+
+    for (Value *Op : I->operands())
+      if (auto *OpI = dyn_cast<Instruction>(Op))
+        if (OpI->getParent() == BB)
+          Worklist.push_back(OpI);
+  }
+
+
+  DEBUG(dbgs() << "Outlining slow path comparison: "
+               << *PBI->getCondition() << " implied by "
+               << *BI->getCondition() << "\n");
+  // See the example in the function comment.
+  Value *WhichCond = PBI->getCondition();
+  auto *Success = BI->getSuccessor(0);
+  auto *FailPBI = PBI->getSuccessor(1);
+  auto *FailBI = BI->getSuccessor(1);
+  // Have PBI branch directly to the fast path using BI's condition, branch
+  // to this BI's block for the slow path dispatch
+  PBI->setSuccessor(0, Success);
+  PBI->setSuccessor(1, BB);
+  PBI->setCondition(BI->getCondition());
+  // Rewrite BI to distinguish between the two failing cases
+  BI->setSuccessor(0, FailBI);
+  BI->setSuccessor(1, FailPBI);
+  BI->setCondition(WhichCond);
+  // Move all of the instructions from BI->getParent other than
+  // the terminator into the fastpath. This requires speculating them past
+  // the original PBI branch, but we checked for that legality above.
+  // TODO: This doesn't handle dependent loads. We could duplicate those
+  // down both paths, but that involves further code growth. We need to
+  // figure out a good cost model here.
+  PredBB->getInstList().splice(BasicBlock::iterator(PBI), BB->getInstList(),
+                               BB->begin(), std::prev(BB->end()));
+
+  // To be conservatively correct, drop all metadata on the rewritten
+  // branches. TODO: update metadata
+  PBI->dropUnknownNonDebugMetadata();
+  BI->dropUnknownNonDebugMetadata();
+  return true;
+}
 /// If we have a conditional branch as a predecessor of another block,
 /// this function tries to simplify it. We know
 /// that PBI and BI are both conditional branches, and BI is in one of the
 /// successor blocks of PBI - PBI branches to BI.
 static bool SimplifyCondBranchToCondBranch(BranchInst *PBI, BranchInst *BI,
                                            const DataLayout &DL) {
   assert(PBI->isConditional() && BI->isConditional());
   BasicBlock *BB = BI->getParent();
[... 48 lines not shown ...] if (CE->canTrap())
       return false;

   // If BI is reached from the true path of PBI and PBI's condition implies
   // BI's condition, we know the direction of the BI branch.
   if (PBI->getSuccessor(0) == BI->getParent() &&
       isImpliedCondition(PBI->getCondition(), BI->getCondition(), DL) &&
       PBI->getSuccessor(0) != PBI->getSuccessor(1) &&
       BB->getSinglePredecessor()) {
+    DEBUG(dbgs() << "Branch (" << *BI << ") implied by predecessor ("
+                 << *PBI << ")\n");
     // Turn this into a branch on constant.
     auto *OldCond = BI->getCondition();
     BI->setCondition(ConstantInt::getTrue(BB->getContext()));
     RecursivelyDeleteTriviallyDeadInstructions(OldCond);
     return true; // Nuke the branch on constant.
   }

+  // If BI could imply PBI, try to remove one check from the fast-path
+  if (SpeculativelyFlattenCondBranchToCondBranch(PBI, BI, DL))
+    return true;
+
   // If both branches are conditional and both contain stores to the same
   // address, remove the stores from the conditionals and create a conditional
   // merged store at the end.
   if (MergeCondStores && mergeConditionalStores(PBI, BI))
     return true;

   // If this is a conditional branch in an empty block, and if any
   // predecessors are a conditional branch to one of our destinations,
[... 2,560 lines not shown ...]

test/Transforms/InstSimplify/implies.ll

[... 209 lines not shown ...]
 ; CHECK-LABEL: @test_sge
 ; CHECK: ret i1 true
   %iplus1 = add nsw nuw i32 %i, 1
   %var29 = icmp ult i32 %i, %length.i
   %var30 = icmp ult i32 %iplus1, %length.i
   %res = icmp sge i1 %var30, %var29
   ret i1 %res
 }

+define i1 @test16(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test16
+; CHECK: ret i1 true
+  %var29 = icmp ugt i32 %length.i, 5
+  %var30 = icmp ugt i32 %length.i, 6
+  %res = icmp ule i1 %var30, %var29
+  ret i1 %res
+}
+
+define i1 @test17(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test17
+; CHECK: ret i1 true
+  %var29 = icmp ugt i32 %length.i, 5
+  %var30 = icmp ugt i32 %length.i, 5
+  %res = icmp ule i1 %var30, %var29
+  ret i1 %res
+}
+
+; negative test (smaller range)
+define i1 @test18(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test18
+; CHECK: ret i1 %res
+  %var29 = icmp ugt i32 %length.i, 5
+  %var30 = icmp ugt i32 %length.i, 4
+  %res = icmp ule i1 %var30, %var29
+  ret i1 %res
+}
+
+; negative test
+define i1 @test19(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test19
+; CHECK: ret i1 %res
+  %var29 = icmp ugt i32 5, %length.i
+  %var30 = icmp ugt i32 6, %length.i
+  %res = icmp ule i1 %var30, %var29
+  ret i1 %res
+}
+
+define i1 @test20(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test20
+; CHECK: ret i1 true
+  %var29 = icmp ugt i32 5, %length.i
+  %var30 = icmp ugt i32 4, %length.i
+  %res = icmp ule i1 %var30, %var29
+  ret i1 %res
+}

test/Transforms/SimplifyCFG/fast-fallthrough.ll

+; RUN: opt -S %s -simplifycfg | FileCheck %s
+
+define void @test(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test
+  %iplus1 = add nsw i32 %i, 1
+  %var29 = icmp ugt i32 %length.i, 5
+  %var30 = icmp ugt i32 %length.i, 6
+; CHECK: br i1 %var30, label %in_bounds, label %next
+  br i1 %var29, label %next, label %out_of_bounds, !prof !{!"branch_weights", i32 1000, i32 0}
+
+next:
+; CHECK-LABEL: next:
+; CHECK: br i1 %var29, label %out_of_bounds2, label %out_of_bounds
+  br i1 %var30, label %in_bounds, label %out_of_bounds2, !prof !{!"branch_weights", i32 1000, i32 0}
+
+in_bounds:
+  ret void
+
+out_of_bounds:
+  call void @foo(i64 0)
+  unreachable
+
+out_of_bounds2:
+  call void @foo(i64 1)
+  unreachable
+}
+
+define void @test2(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test2
+  %iplus1 = add nsw i32 %i, 1
+  %var29 = icmp slt i32 %i, %length.i
+  %var30 = icmp slt i32 %iplus1, %length.i
+; CHECK: br i1 %var30, label %in_bounds, label %next
+  br i1 %var29, label %next, label %out_of_bounds, !prof !{!"branch_weights", i32 1000, i32 0}
+
+next:
+; CHECK-LABEL: next:
+; CHECK: br i1 %var29, label %out_of_bounds2, label %out_of_bounds
+  br i1 %var30, label %in_bounds, label %out_of_bounds2, !prof !{!"branch_weights", i32 1000, i32 0}
+
+in_bounds:
+  ret void
+
+out_of_bounds:
+  call void @foo(i64 0)
+  unreachable
+
+out_of_bounds2:
+  call void @foo(i64 1)
+  unreachable
+}
+
+; This is a negative test. The nsw on the add might be control
+; dependent on the first check, so we can't hoist it above.
+define void @test3(i32 %length.i, i32 %i) {
+; CHECK-LABEL: @test3
+  %var29 = icmp slt i32 %i, %length.i
+; CHECK: br i1 %var29, label %next, label %out_of_bounds
+  br i1 %var29, label %next, label %out_of_bounds, !prof !{!"branch_weights", i32 1000, i32 0}
+
+next:
+; CHECK-LABEL: next:
+; CHECK: br i1 %var30, label %in_bounds, label %out_of_bounds2
+  %iplus1 = add nsw i32 %i, 1
+  %var30 = icmp slt i32 %iplus1, %length.i
+  br i1 %var30, label %in_bounds, label %out_of_bounds2, !prof !{!"branch_weights", i32 1000, i32 0}
+
+in_bounds:
+  ret void
+
+out_of_bounds:
+  call void @foo(i64 0)
+  unreachable
+
+out_of_bounds2:
+  call void @foo(i64 1)
+  unreachable
+}
+
+; As written, this one can't trigger today. It would require us to duplicate
+; the %val1 load down two paths and that's not implemented yet.
+define i64 @test4(i32 %length.i, i32 %i, i64* %base) {
+; CHECK-LABEL: @test4
+  %var29 = icmp slt i32 %i, %length.i
+; CHECK: br i1 %var29, label %next, label %out_of_bounds
+  br i1 %var29, label %next, label %out_of_bounds, !prof !{!"branch_weights", i32 1000, i32 0}
+
+next:
+; CHECK-LABEL: next:
+  %addr1 = getelementptr i64, i64* %base, i32 %i
+  %val1 = load i64, i64* %addr1
+  %iplus1 = add nsw i32 %i, 1
+  %var30 = icmp slt i32 %iplus1, %length.i
+; CHECK: br i1 %var30, label %in_bounds, label %out_of_bounds2
+  br i1 %var30, label %in_bounds, label %out_of_bounds2, !prof !{!"branch_weights", i32 1000, i32 0}
+
+in_bounds:
+  %addr2 = getelementptr i64, i64* %base, i32 %iplus1
+  %val2 = load i64, i64* %addr2
+  %res = sub i64 %val1, %val2
+  ret i64 %res
+
+out_of_bounds:
+  call void @foo(i64 0)
+  unreachable
+
+out_of_bounds2:
+  call void @foo(i64 %val1)
+  unreachable
+}
+
+declare void @foo(i64)