This is an archive of the discontinued LLVM Phabricator instance.

Differential D54223

[SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with
ClosedPublic

Authored by fedor.sergeev on Nov 7 2018, 1:05 PM.

Details

Reviewers

chandlerc
asbirlea
mkazantsev
skatkov

Commits

rG2e3e224e715e: [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with
rL347097: [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with

Summary

We need to control exponential behavior of loop-unswitch so we do not get
run-away compilation.

Suggested solution is to introduce a multiplier for an unswitch cost that
makes cost prohibitive as soon as there are too many candidates and too
many sibling loops (meaning we have already started duplicating loops
by unswitching).

It does solve the currently known problem with compile-time degradation
(PR 39544).
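The multiplier described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `sketchCostMultiplier` and `log2Floor` are hypothetical names, the inputs are plain integers rather than LLVM loop structures, and the default values (threshold 50, 8 unscaled candidates, top-level divisor 2) are taken from the option defaults shown in the committed diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: floor(log2(V)) for V >= 1.
static int log2Floor(unsigned V) {
  int R = 0;
  while (V > 1) {
    V >>= 1;
    ++R;
  }
  return R;
}

// Hypothetical standalone model of the cost multiplier. It is not the
// patch's API; it only mirrors the shape of the formula.
int sketchCostMultiplier(int UnswitchedClones, int SiblingsCount,
                         bool IsTopLevel, int Threshold = 50,
                         int UnscaledCandidates = 8, int ToplevelDiv = 2) {
  assert(Threshold >= 1 && ToplevelDiv >= 1);
  // The first few candidates are not scaled at all.
  int ClonesPower = std::max(UnswitchedClones - UnscaledCandidates, 0);
  // Top-level loops are allowed to spread a bit more than nested ones.
  int SiblingsMultiplier =
      std::max(IsTopLevel ? SiblingsCount / ToplevelDiv : SiblingsCount, 1);
  // Saturate before shifting so that the product below cannot overflow.
  if (ClonesPower > log2Floor(Threshold) || SiblingsMultiplier > Threshold)
    return Threshold;
  return std::min(SiblingsMultiplier * (1 << ClonesPower), Threshold);
}
```

With these defaults, 10 estimated clones and 6 sibling loops in a nested loop give a multiplier of 6 * 2^2 = 24, while 14 estimated clones saturate at the threshold regardless of the sibling count.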

Diff Detail

Repository: rL LLVM

Event Timeline

fedor.sergeev created this revision.Nov 7 2018, 1:05 PM

Harbormaster completed remote builds in B24688: Diff 173011.Nov 7 2018, 1:06 PM

Honestly, this seems like a really reasonable approach in practice.

I think the sibling threshold and the candidate threshold together give highly stable results as well as pruning unreasonable search spaces.

I think this does in practice prevent the explosion, not just make it less likely.

One thing that might be interesting to consider is whether unswitching that typically creates siblings could, through some surprise, create non-sibling loops due to restructuring of the nest? If not, then this seems pretty solid. If so, we could consider looking at the sibling loop nest sizes rather than just the sibling loop nest counts.

Can a contrived test case hit this quickly enough to make sense to add to the tests and show the limit being applied? Potentially by lowering these thresholds on the commandline?

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2486–2487 ↗	(On Diff #173011)	s/since it a/since it is a/ or maybe.... "since it is designed to limit the worst case behavior and not an attempt to provide a nuanced heuristic size prediction"
2493–2494 ↗	(On Diff #173011)	Maybe: "When the number of unswitch candidates is below our "bottom cap" we disable the scaling of the cost and use the directly computed cost."
2496 ↗	(On Diff #173011)	I would just cast the size to an int rather than the larger function style cast?
2503–2504 ↗	(On Diff #173011)	Maybe: "Handle possible overlow." -> "Compute the cost multiplier in a way that won't overflow by saturating at an upper bound."

In general I think it is what we can go/start with.
This formula seems to be a bit pessimistic. Specifically, not every candidate creates sibling loops: that happens only if all successors are in the loop or there are sub-loops before the unswitch candidate. So applying the power multiplier to these candidates seems unreasonable.
Also, it is possible that several unswitch candidates can be handled in one shot (the same condition); in this case the number of siblings will not grow so fast.
But these improvements can be done as a follow-up if needed.

General comment: please consider extracting a helper function findBestUnswitchCandidate to keep this computation in one place, because it is becoming non-trivial.

mkazantsev added inline comments.Nov 7 2018, 10:34 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2489 ↗	(On Diff #173011)	The check here should be more accurate; at least it should consider conditional branches that have a destination outside the loop, because unswitching such branches won't produce a sibling loop.
2513 ↗	(On Diff #173011)	Space before `(`
2518 ↗	(On Diff #173011)	How can `CostMultiplier` be less than 1? Integer overflow is undefined, so you won't catch it, otherwise it's just impossible. :)
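To illustrate the overflow point: since signed overflow is undefined, a check after the multiplication cannot be relied upon, so the bound has to be tested before each step. The sketch below is one way to do that and is not the approach taken in the patch (which compares the power against a log2 of the threshold up front). `saturatingScale` is a hypothetical name introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical helper: returns min(Base * 2^Power, Cap) without ever
// evaluating an expression that could overflow.
int saturatingScale(int Base, unsigned Power, int Cap) {
  assert(Base >= 1 && Cap >= 1);
  int Result = Base;
  for (unsigned I = 0; I < Power; ++I) {
    // Doubling would exceed Cap exactly when Result > Cap / 2,
    // so test that first and saturate instead of multiplying.
    if (Result > Cap / 2)
      return Cap;
    Result *= 2;
  }
  return std::min(Result, Cap);
}
```

Because the guard runs before every doubling, the value never exceeds `Cap`, so the result is always in the range 1..`Cap`, even when `Cap` is `INT_MAX`.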

introducing two tests, no comments addressed yet

fedor.sergeev edited the summary of this revision.Nov 8 2018, 1:12 PM

Harbormaster completed remote builds in B24744: Diff 173215.Nov 8 2018, 1:12 PM

I like your suggestion of writing FileCheck against print<loops> output.

The 96 loop case .... is painful. Is it possible to get something smaller? Or possible to add something like CHECK-COUNT-<N>: ... ?

Not trying to make this too much yak shaving / work invention so if adding CHECK-COUNT is ... too painful and you can get the large ones closer to 32, I can tolerate that. Still seems better than grep.

test/Transforms/SimpleLoopUnswitch/exponential-nontrivial-unswitch.ll
20–27 ↗	(On Diff #173215)	Maybe use cap=4 or something here so that you still see the cap applying compared to the entire thing disabled below?

addressing some of the comments, will work on improving tests next

Harbormaster completed remote builds in B24749: Diff 173227.Nov 8 2018, 2:24 PM

fedor.sergeev retitled this revision from WIP... [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with to [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with.Nov 8 2018, 2:50 PM

fedor.sergeev edited the summary of this revision.

fedor.sergeev marked 6 inline comments as done.

updating tests, using not-yet-integrated CHECK-COUNT- FileCheck directives

Harbormaster completed remote builds in B24799: Diff 173377.Nov 9 2018, 10:20 AM

introducing cap=4 tests in nested case; use sort -b

Harbormaster completed remote builds in B24803: Diff 173389.Nov 9 2018, 11:10 AM

fedor.sergeev mentioned this in D54336: [FileCheck] introduce CHECK-COUNT-<num> repetition directive.Nov 9 2018, 11:17 AM

fedor.sergeev added a parent revision: D54336: [FileCheck] introduce CHECK-COUNT-<num> repetition directive.Nov 11 2018, 4:11 AM

mkazantsev added inline comments.Nov 12 2018, 3:12 AM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
75 ↗	(On Diff #173389)	That name is not super-informative. Maybe something like `LimitUnswitchingFromExponentExplosion` or something of this variety?
2286 ↗	(On Diff #173389)	I'd suggest to make this check outside `calculateUnswitchCostMultiplier` and only multiply cost by this factor if needed.
2546 ↗	(On Diff #173389)	Maybe `1..UnswitchThreshold`?

fedor.sergeev added inline comments.Nov 12 2018, 3:39 AM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
75 ↗	(On Diff #173389)	There might be other means of controlling this explosion. I would prefer naming by actual semantics and not by the most interesting side-effect. This is definitely not an option that anybody should use lightly.

update as per Max' comments

fedor.sergeev marked 2 inline comments as done.Nov 12 2018, 3:41 AM

I think this is looking pretty good as an initial cut. I'm still very interested in follow-ups to use a more precise heuristic of course, but I think those can be follow-ups.

Make sure to sync with Max before landing to make sure he's OK with the current state.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
82–85 ↗	(On Diff #173648)	Maybe `UnswitchNumInitialUnscaledCandidates` for a name?
2300–2301 ↗	(On Diff #173648)	Just return here?

This revision is now accepted and ready to land.Nov 12 2018, 7:43 AM

more extensive checking for exiting branches; adding tests with exiting branches

fedor.sergeev added inline comments.Nov 12 2018, 12:48 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2300–2301 ↗	(On Diff #173648)	Nope, power 0 means 1 for this part of a multiplier. Still can get nontrivial siblings multiplier part.

renamed bottom-cap option, updated comments, tests

Harbormaster completed remote builds in B24904: Diff 173741.Nov 12 2018, 12:59 PM

fedor.sergeev mentioned this in rL346722: [FileCheck] introduce CHECK-COUNT-<num> repetition directive.Nov 12 2018, 4:48 PM

LGTM

Discovered a tricky testcase with a 16-way switch and nested exiting branches which manages to skip the multiplier introduced here and still leads to exponential explosion.
We definitely need to check whether an exiting branch dominates the latch before skipping its multiplier calculation.
Also, the testcase shows that we really need to go further: calculating costs per candidate and using them in the calculation of the exponential-explosion threshold, just as Chandler asked.

Will do an update to a multiplier skipping check here.
Other changes will go as a follow up (hopefully, soon :-/ ).

handle switch candidates; exponential switch case added

This revision is now accepted and ready to land.Nov 14 2018, 1:51 PM

Harbormaster completed remote builds in B25011: Diff 174094.Nov 14 2018, 1:51 PM

fedor.sergeev marked 6 inline comments as done.Nov 14 2018, 1:55 PM

Makes sense to me generally. Some questions below really.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
87 ↗	(On Diff #174094)	nit: s/Amount/Number/
2281–2283 ↗	(On Diff #174094)	Not sure how simple this is now?
2307–2308 ↗	(On Diff #174094)	I'd change the name here. You're not counting candidates, you're counting unswitched clones I think. To that end, should you do the same filtering here that you do above for guards and exiting conditions?

fedor.sergeev added inline comments.Nov 14 2018, 2:56 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2281–2283 ↗	(On Diff #174094)	Err... intent was for it to be simple :)
2307–2308 ↗	(On Diff #174094)	Hmm... yes, filtering them here makes sense. Speaking of which, I don't believe my filtering is right for the switch. Even if a switch has one exiting case it still can have many other cases remaining for the unswitch (and thus duplication of loops). Will try to do something here, though for now I'm still going to keep it "simple" ... at least to the extent possible.
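The clone counting under discussion can be modelled roughly as follows. This is a sketch loosely following the committed diff, with assumptions: `TerminatorModel`, `floorLog2` and `estimateUnswitchedClones` are hypothetical names, each candidate is reduced to three precomputed facts, and a count of zero is treated as contributing nothing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an unswitch candidate; real code would query
// DominatorTree and Loop instead of storing precomputed flags.
struct TerminatorModel {
  bool DominatesLatch;               // candidate block dominates the latch
  bool IsGuard;                      // candidate is a guard intrinsic
  std::vector<bool> SuccessorInLoop; // one entry per successor
};

// floor(log2(V)); returns 0 for V <= 1 (an assumption of this sketch).
static int floorLog2(unsigned V) {
  int R = 0;
  while (V > 1) {
    V >>= 1;
    ++R;
  }
  return R;
}

// A branch or guard counts as at most one clone; a switch counts as the
// log2 of the successors that still matter after filtering.
int estimateUnswitchedClones(const std::vector<TerminatorModel> &Candidates) {
  int Clones = 0;
  for (const TerminatorModel &C : Candidates) {
    assert(C.IsGuard || !C.SuccessorInLoop.empty());
    if (C.IsGuard) {
      if (!C.DominatesLatch)
        ++Clones;
      continue;
    }
    unsigned Counted = 0;
    for (bool InLoop : C.SuccessorInLoop)
      // Exiting successors are ignored only when the block dominates the latch.
      if (!C.DominatesLatch || InLoop)
        ++Counted;
    Clones += floorLog2(Counted);
  }
  return Clones;
}
```

In this model a 16-way switch with every case inside the loop contributes 4, which is why a single exiting case on such a switch should not exempt it from the multiplier.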

updating names, comments etc

chandlerc added inline comments.Nov 14 2018, 3:22 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2307–2308 ↗	(On Diff #174094)	I'd change here and above to skip when: `DT.dominates(CondBlock, Latch) && (isGuard(&TI) || CondBlock->getNumSuccessors() - count_if(CondBlock->successors(), [](BasicBlock *SuccBB) { return !L.contains(SuccBB); }) <= 1)`.

Harbormaster completed remote builds in B25016: Diff 174111.Nov 14 2018, 3:22 PM

chandlerc added inline comments.Nov 14 2018, 3:24 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2307–2308 ↗	(On Diff #174094)	Er, that should somewhat obviously be: `DT.dominates(CondBlock, Latch) && (isGuard(&TI) || count_if(CondBlock->successors(), [](BasicBlock *SuccBB) { return L.contains(SuccBB); }) <= 1)`
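The corrected predicate can be checked in isolation with a toy model. This is a minimal sketch, assuming the dominance and loop-membership facts are already known; `CandidateModel` and `skipsCostMultiplier` are hypothetical names and do not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the facts the predicate needs about a candidate.
struct CandidateModel {
  bool DominatesLatch;               // DT.dominates(CondBlock, Latch)
  bool IsGuard;                      // isGuard(&TI)
  std::vector<bool> SuccessorInLoop; // L.contains(SuccBB) per successor
};

// The multiplier is skipped when the candidate dominates the latch and is
// either a guard or has at most one successor left inside the loop.
bool skipsCostMultiplier(const CandidateModel &C) {
  assert(C.IsGuard || !C.SuccessorInLoop.empty());
  auto InLoop =
      std::count(C.SuccessorInLoop.begin(), C.SuccessorInLoop.end(), true);
  return C.DominatesLatch && (C.IsGuard || InLoop <= 1);
}
```

An exiting branch that does not dominate the latch is not skipped here, which matches the concern raised earlier in the thread about nested exiting branches.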

getting clones calculation more precise

Harbormaster completed remote builds in B25020: Diff 174120.Nov 14 2018, 4:25 PM

fedor.sergeev marked 7 inline comments as done.Nov 14 2018, 4:26 PM

still LGTM.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2303 ↗	(On Diff #174120)	I think the yoda-comparison makes this harder to read. I'd rather say `count_if(...) <= 1`.

fedor.sergeev added inline comments.Nov 14 2018, 10:54 PM

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2303 ↗	(On Diff #174120)	Specifically wanted to try this and check how people react. :) I find both variants somewhat uneasy to glance over. Standard one because of the multi-line nature of a lambda and many closing brackets/parens :(

Closed by commit rL347097: [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with (authored by fedor.sergeev).Nov 16 2018, 1:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

lib/

Transforms/

Scalar/

SimpleLoopUnswitch.cpp

118 lines

test/

Transforms/

SimpleLoopUnswitch/

exponential-nontrivial-unswitch-nested.ll

139 lines

exponential-nontrivial-unswitch-nested2.ll

149 lines

exponential-nontrivial-unswitch.ll

80 lines

exponential-nontrivial-unswitch2.ll

56 lines

exponential-switch-unswitch.ll

118 lines

Diff 174441

llvm/trunk/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp

[56 lines not shown]
#define DEBUG_TYPE "simple-loop-unswitch"		#define DEBUG_TYPE "simple-loop-unswitch"

using namespace llvm;		using namespace llvm;

STATISTIC(NumBranches, "Number of branches unswitched");		STATISTIC(NumBranches, "Number of branches unswitched");
STATISTIC(NumSwitches, "Number of switches unswitched");		STATISTIC(NumSwitches, "Number of switches unswitched");
STATISTIC(NumGuards, "Number of guards turned into branches for unswitching");		STATISTIC(NumGuards, "Number of guards turned into branches for unswitching");
STATISTIC(NumTrivial, "Number of unswitches that are trivial");		STATISTIC(NumTrivial, "Number of unswitches that are trivial");
		STATISTIC(
		NumCostMultiplierSkipped,
		"Number of unswitch candidates that had their cost multiplier skipped");

static cl::opt<bool> EnableNonTrivialUnswitch(		static cl::opt<bool> EnableNonTrivialUnswitch(
"enable-nontrivial-unswitch", cl::init(false), cl::Hidden,		"enable-nontrivial-unswitch", cl::init(false), cl::Hidden,
cl::desc("Forcibly enables non-trivial loop unswitching rather than "		cl::desc("Forcibly enables non-trivial loop unswitching rather than "
"following the configuration passed into the pass."));		"following the configuration passed into the pass."));

static cl::opt<int>		static cl::opt<int>
UnswitchThreshold("unswitch-threshold", cl::init(50), cl::Hidden,		UnswitchThreshold("unswitch-threshold", cl::init(50), cl::Hidden,
cl::desc("The cost threshold for unswitching a loop."));		cl::desc("The cost threshold for unswitching a loop."));

		static cl::opt<bool> EnableUnswitchCostMultiplier(
		"enable-unswitch-cost-multiplier", cl::init(true), cl::Hidden,
		cl::desc("Enable unswitch cost multiplier that prohibits exponential "
		"explosion in nontrivial unswitch."));
		static cl::opt<int> UnswitchSiblingsToplevelDiv(
		"unswitch-siblings-toplevel-div", cl::init(2), cl::Hidden,
		cl::desc("Toplevel siblings divisor for cost multiplier."));
		static cl::opt<int> UnswitchNumInitialUnscaledCandidates(
		"unswitch-num-initial-unscaled-candidates", cl::init(8), cl::Hidden,
		cl::desc("Number of unswitch candidates that are ignored when calculating "
		"cost multiplier."));
static cl::opt<bool> UnswitchGuards(		static cl::opt<bool> UnswitchGuards(
"simple-loop-unswitch-guards", cl::init(true), cl::Hidden,		"simple-loop-unswitch-guards", cl::init(true), cl::Hidden,
cl::desc("If enabled, simple loop unswitching will also consider "		cl::desc("If enabled, simple loop unswitching will also consider "
"llvm.experimental.guard intrinsics as unswitch candidates."));		"llvm.experimental.guard intrinsics as unswitch candidates."));

/// Collect all of the loop invariant input values transitively used by the		/// Collect all of the loop invariant input values transitively used by the
/// homogeneous instruction graph from a given root.		/// homogeneous instruction graph from a given root.
///		///
[2,172 lines not shown]	turnGuardIntoBranch(IntrinsicInst *GI, Loop &L,
DT.applyUpdates(DTUpdates);		DT.applyUpdates(DTUpdates);
// Inform LI of a new loop block.		// Inform LI of a new loop block.
L.addBasicBlockToLoop(GuardedBlock, LI);		L.addBasicBlockToLoop(GuardedBlock, LI);

++NumGuards;		++NumGuards;
return CheckBI;		return CheckBI;
}		}

		/// Cost multiplier is a way to limit potentially exponential behavior
		/// of loop-unswitch. Cost is multiplied in proportion of 2^number of unswitch
		/// candidates available. Also accounting for the number of "sibling" loops with
		/// the idea to account for previous unswitches that already happened on this
		/// cluster of loops. There was an attempt to keep this formula simple,
		/// just enough to limit the worst case behavior. Even if it is not that simple
		/// now it is still not an attempt to provide a detailed heuristic size
		/// prediction.
		///
		/// TODO: Make a proper accounting of "explosion" effect for all kinds of
		/// unswitch candidates, making adequate predictions instead of wild guesses.
		/// That requires knowing not just the number of "remaining" candidates but
		/// also costs of unswitching for each of these candidates.
		static int calculateUnswitchCostMultiplier(
		Instruction &TI, Loop &L, LoopInfo &LI, DominatorTree &DT,
		ArrayRef<std::pair<Instruction *, TinyPtrVector<Value *>>>
		UnswitchCandidates) {

		// Guards and other exiting conditions do not contribute to exponential
		// explosion as soon as they dominate the latch (otherwise there might be
		// another path to the latch remaining that does not allow to eliminate the
		// loop copy on unswitch).
		BasicBlock *Latch = L.getLoopLatch();
		BasicBlock *CondBlock = TI.getParent();
		if (DT.dominates(CondBlock, Latch) &&
		(isGuard(&TI) ||
		llvm::count_if(successors(&TI), [&L](BasicBlock *SuccBB) {
		return L.contains(SuccBB);
		}) <= 1)) {
		NumCostMultiplierSkipped++;
		return 1;
		}

		auto *ParentL = L.getParentLoop();
		int SiblingsCount = (ParentL ? ParentL->getSubLoopsVector().size()
		: std::distance(LI.begin(), LI.end()));
		// Count amount of clones that all the candidates might cause during
		// unswitching. Branch/guard counts as 1, switch counts as log2 of its cases.
		int UnswitchedClones = 0;
		for (auto Candidate : UnswitchCandidates) {
		Instruction *CI = Candidate.first;
		BasicBlock *CondBlock = CI->getParent();
		bool SkipExitingSuccessors = DT.dominates(CondBlock, Latch);
		if (isGuard(CI)) {
		if (!SkipExitingSuccessors)
		UnswitchedClones++;
		continue;
		}
		int NonExitingSuccessors = llvm::count_if(
		successors(CondBlock), [SkipExitingSuccessors, &L](BasicBlock *SuccBB) {
		return !SkipExitingSuccessors || L.contains(SuccBB);
		});
		UnswitchedClones += Log2_32(NonExitingSuccessors);
		}

		// Ignore up to the "unscaled candidates" number of unswitch candidates
		// when calculating the power-of-two scaling of the cost. The main idea
		// with this control is to allow a small number of unswitches to happen
		// and rely more on siblings multiplier (see below) when the number
		// of candidates is small.
		unsigned ClonesPower =
		std::max(UnswitchedClones - (int)UnswitchNumInitialUnscaledCandidates, 0);

		// Allowing top-level loops to spread a bit more than nested ones.
		int SiblingsMultiplier =
		std::max((ParentL ? SiblingsCount
		: SiblingsCount / (int)UnswitchSiblingsToplevelDiv),
		1);
		// Compute the cost multiplier in a way that won't overflow by saturating
		// at an upper bound.
		int CostMultiplier;
		if (ClonesPower > Log2_32(UnswitchThreshold) ||
		SiblingsMultiplier > UnswitchThreshold)
		CostMultiplier = UnswitchThreshold;
		else
		CostMultiplier = std::min(SiblingsMultiplier * (1 << ClonesPower),
		(int)UnswitchThreshold);

		LLVM_DEBUG(dbgs() << " Computed multiplier " << CostMultiplier
		<< " (siblings " << SiblingsMultiplier << " * clones "
		<< (1 << ClonesPower) << ")"
		<< " for unswitch candidate: " << TI << "\n");
		return CostMultiplier;
		}

static bool		static bool
unswitchBestCondition(Loop &L, DominatorTree &DT, LoopInfo &LI,		unswitchBestCondition(Loop &L, DominatorTree &DT, LoopInfo &LI,
AssumptionCache &AC, TargetTransformInfo &TTI,		AssumptionCache &AC, TargetTransformInfo &TTI,
function_ref<void(bool, ArrayRef<Loop *>)> UnswitchCB,		function_ref<void(bool, ArrayRef<Loop *>)> UnswitchCB,
ScalarEvolution *SE) {		ScalarEvolution *SE) {
// Collect all invariant conditions within this loop (as opposed to an inner		// Collect all invariant conditions within this loop (as opposed to an inner
// loop which would be handled when visiting that inner loop).		// loop which would be handled when visiting that inner loop).
SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>		SmallVector<std::pair<Instruction *, TinyPtrVector<Value *>>, 4>
[197 lines not shown]	unswitchBestCondition(Loop &L, DominatorTree &DT, LoopInfo &LI,
ArrayRef<Value *> BestUnswitchInvariants;		ArrayRef<Value *> BestUnswitchInvariants;
for (auto &TerminatorAndInvariants : UnswitchCandidates) {		for (auto &TerminatorAndInvariants : UnswitchCandidates) {
Instruction &TI = *TerminatorAndInvariants.first;		Instruction &TI = *TerminatorAndInvariants.first;
ArrayRef<Value *> Invariants = TerminatorAndInvariants.second;		ArrayRef<Value *> Invariants = TerminatorAndInvariants.second;
BranchInst *BI = dyn_cast<BranchInst>(&TI);		BranchInst *BI = dyn_cast<BranchInst>(&TI);
int CandidateCost = ComputeUnswitchedCost(		int CandidateCost = ComputeUnswitchedCost(
TI, /*FullUnswitch*/ !BI || (Invariants.size() == 1 &&		TI, /*FullUnswitch*/ !BI || (Invariants.size() == 1 &&
Invariants[0] == BI->getCondition()));		Invariants[0] == BI->getCondition()));
		// Calculate cost multiplier which is a tool to limit potentially
		// exponential behavior of loop-unswitch.
		if (EnableUnswitchCostMultiplier) {
		int CostMultiplier =
		calculateUnswitchCostMultiplier(TI, L, LI, DT, UnswitchCandidates);
		assert(
		(CostMultiplier > 0 && CostMultiplier <= UnswitchThreshold) &&
		"cost multiplier needs to be in the range of 1..UnswitchThreshold");
		CandidateCost *= CostMultiplier;
LLVM_DEBUG(dbgs() << " Computed cost of " << CandidateCost		LLVM_DEBUG(dbgs() << " Computed cost of " << CandidateCost
		<< " (multiplier: " << CostMultiplier << ")"
<< " for unswitch candidate: " << TI << "\n");		<< " for unswitch candidate: " << TI << "\n");
		} else {
		LLVM_DEBUG(dbgs() << " Computed cost of " << CandidateCost
		<< " for unswitch candidate: " << TI << "\n");
		}

if (!BestUnswitchTI || CandidateCost < BestUnswitchCost) {		if (!BestUnswitchTI || CandidateCost < BestUnswitchCost) {
BestUnswitchTI = &TI;		BestUnswitchTI = &TI;
BestUnswitchCost = CandidateCost;		BestUnswitchCost = CandidateCost;
BestUnswitchInvariants = Invariants;		BestUnswitchInvariants = Invariants;
}		}
}		}

if (BestUnswitchCost >= UnswitchThreshold) {		if (BestUnswitchCost >= UnswitchThreshold) {
[201 lines not shown]

llvm/trunk/test/Transforms/SimpleLoopUnswitch/exponential-nontrivial-unswitch-nested.ll

				;
				; There should be just a single copy of each loop when strictest multiplier
				; candidates formula (unscaled candidates == 0) is enforced:

				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=16 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				;
				; When we relax the candidates part of a multiplier formula
				; (unscaled candidates == 4) we start getting some unswitches,
				; which leads to siblings multiplier kicking in.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=4 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-UNSCALE4-DIV1
				;
				; NB: sort -b is essential here and below, otherwise blanks might lead to different
				; order depending on locale.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=4 -unswitch-siblings-toplevel-div=2 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-UNSCALE4-DIV2
				;
				;
				; Get
				; 2^(num conds) == 2^5 = 32
				; loop nests when cost multiplier is disabled:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=false \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP32
				;
				; Single loop nest, not unswitched
				; LOOP1: Loop at depth 1 containing:
				; LOOP1: Loop at depth 2 containing:
				; LOOP1: Loop at depth 3 containing:
				; LOOP1-NOT: Loop at depth {{[0-9]+}} containing:
				;
				; Half unswitched loop nests, with unscaled4 and div1 it gets less depth1 loops unswitched
				; since they have more cost.
				; LOOP-UNSCALE4-DIV1-COUNT-6: Loop at depth 1 containing:
				; LOOP-UNSCALE4-DIV1-COUNT-19: Loop at depth 2 containing:
				; LOOP-UNSCALE4-DIV1-COUNT-29: Loop at depth 3 containing:
				; LOOP-UNSCALE4-DIV1-NOT: Loop at depth {{[0-9]+}} containing:
				;
				; Half unswitched loop nests, with unscaled4 and div2 it gets more depth1 loops unswitched
				; as div2 kicks in.
				; LOOP-UNSCALE4-DIV2-COUNT-11: Loop at depth 1 containing:
				; LOOP-UNSCALE4-DIV2-COUNT-22: Loop at depth 2 containing:
				; LOOP-UNSCALE4-DIV2-COUNT-29: Loop at depth 3 containing:
				; LOOP-UNSCALE4-DIV2-NOT: Loop at depth {{[0-9]+}} containing:
				;
				; 32 loop nests, fully unswitched
				; LOOP32-COUNT-32: Loop at depth 1 containing:
				; LOOP32-COUNT-32: Loop at depth 2 containing:
				; LOOP32-COUNT-32: Loop at depth 3 containing:
				; LOOP32-NOT: Loop at depth {{[0-9]+}} containing:

				declare void @bar()

				define void @loop_nested3_conds5(i32* %addr, i1 %c1, i1 %c2, i1 %c3, i1 %c4, i1 %c5) {
				entry:
				%addr1 = getelementptr i32, i32* %addr, i64 0
				%addr2 = getelementptr i32, i32* %addr, i64 1
				%addr3 = getelementptr i32, i32* %addr, i64 2
				br label %outer
				outer:
				%iv1 = phi i32 [0, %entry], [%iv1.next, %outer_latch]
				%iv1.next = add i32 %iv1, 1
				;; skip nontrivial unswitch
				call void @bar()
				br label %middle
				middle:
				%iv2 = phi i32 [0, %outer], [%iv2.next, %middle_latch]
				%iv2.next = add i32 %iv2, 1
				;; skip nontrivial unswitch
				call void @bar()
				br label %loop
				loop:
				%iv3 = phi i32 [0, %middle], [%iv3.next, %loop_latch]
				%iv3.next = add i32 %iv3, 1
				;; skip nontrivial unswitch
				call void @bar()
				br i1 %c1, label %loop_next1_left, label %loop_next1_right
				loop_next1_left:
				br label %loop_next1
				loop_next1_right:
				br label %loop_next1

				loop_next1:
				br i1 %c2, label %loop_next2_left, label %loop_next2_right
				loop_next2_left:
				br label %loop_next2
				loop_next2_right:
				br label %loop_next2

				loop_next2:
				br i1 %c3, label %loop_next3_left, label %loop_next3_right
				loop_next3_left:
				br label %loop_next3
				loop_next3_right:
				br label %loop_next3

				loop_next3:
				br i1 %c4, label %loop_next4_left, label %loop_next4_right
				loop_next4_left:
				br label %loop_next4
				loop_next4_right:
				br label %loop_next4

				loop_next4:
				br i1 %c5, label %loop_latch_left, label %loop_latch_right
				loop_latch_left:
				br label %loop_latch
				loop_latch_right:
				br label %loop_latch

				loop_latch:
				store volatile i32 0, i32* %addr1
				%test_loop = icmp slt i32 %iv3, 50
				br i1 %test_loop, label %loop, label %middle_latch
				middle_latch:
				store volatile i32 0, i32* %addr2
				%test_middle = icmp slt i32 %iv2, 50
				br i1 %test_middle, label %middle, label %outer_latch
				outer_latch:
				store volatile i32 0, i32* %addr3
				%test_outer = icmp slt i32 %iv1, 50
				br i1 %test_outer, label %outer, label %exit
				exit:
				ret void
				}

llvm/trunk/test/Transforms/SimpleLoopUnswitch/exponential-nontrivial-unswitch-nested2.ll

				;
				; Here all the branches we unswitch are exiting from the inner loop.
				; That means we should not be getting exponential behavior on inner-loop
				; unswitch. In fact there should be just a single version of inner-loop,
				; with possibly some outer loop copies.
				;
				; There should be just a single copy of each loop when strictest multiplier
				; candidates formula (unscaled candidates == 0) is enforced:

				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=16 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				;
				; When we relax the candidates part of a multiplier formula
				; (unscaled candidates == 3) we start getting some unswitches in outer loops,
				; which leads to siblings multiplier kicking in.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=3 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-UNSCALE3-DIV1
				;
				; NB: sort -b is essential here and below, otherwise blanks might lead to different
				; order depending on locale.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=3 -unswitch-siblings-toplevel-div=2 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-UNSCALE3-DIV2
				;
				; With disabled cost-multiplier we get maximal possible amount of unswitches.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=false \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-MAX
				;
				; Single loop nest, not unswitched
				; LOOP1: Loop at depth 1 containing:
				; LOOP1-NOT: Loop at depth 1 containing:
				; LOOP1: Loop at depth 2 containing:
				; LOOP1-NOT: Loop at depth 2 containing:
				; LOOP1: Loop at depth 3 containing:
				; LOOP1-NOT: Loop at depth 3 containing:
				;
				; Half unswitched loop nests, with unscaled3 and div1 it gets less depth1 loops unswitched
				; since they have more cost.
				; LOOP-UNSCALE3-DIV1-COUNT-4: Loop at depth 1 containing:
				; LOOP-UNSCALE3-DIV1-NOT: Loop at depth 1 containing:
				; LOOP-UNSCALE3-DIV1-COUNT-1: Loop at depth 2 containing:
				; LOOP-UNSCALE3-DIV1-NOT: Loop at depth 2 containing:
				; LOOP-UNSCALE3-DIV1-COUNT-1: Loop at depth 3 containing:
				; LOOP-UNSCALE3-DIV1-NOT: Loop at depth 3 containing:
				;
				; Half unswitched loop nests, with unscaled3 and div2 it gets more depth1 loops unswitched
				; as div2 kicks in.
				; LOOP-UNSCALE3-DIV2-COUNT-6: Loop at depth 1 containing:
				; LOOP-UNSCALE3-DIV2-NOT: Loop at depth 1 containing:
				; LOOP-UNSCALE3-DIV2-COUNT-1: Loop at depth 2 containing:
				; LOOP-UNSCALE3-DIV2-NOT: Loop at depth 2 containing:
				; LOOP-UNSCALE3-DIV2-COUNT-1: Loop at depth 3 containing:
				; LOOP-UNSCALE3-DIV2-NOT: Loop at depth 3 containing:
				;
				; Maximally unswitched (copy of the outer loop per each condition)
				; LOOP-MAX-COUNT-6: Loop at depth 1 containing:
				; LOOP-MAX-NOT: Loop at depth 1 containing:
				; LOOP-MAX-COUNT-1: Loop at depth 2 containing:
				; LOOP-MAX-NOT: Loop at depth 2 containing:
				; LOOP-MAX-COUNT-1: Loop at depth 3 containing:
				; LOOP-MAX-NOT: Loop at depth 3 containing:

				declare void @bar()

				define void @loop_nested3_conds5(i32* %addr, i1 %c1, i1 %c2, i1 %c3, i1 %c4, i1 %c5) {
				entry:
				%addr1 = getelementptr i32, i32* %addr, i64 0
				%addr2 = getelementptr i32, i32* %addr, i64 1
				%addr3 = getelementptr i32, i32* %addr, i64 2
				br label %outer
				outer:
				%iv1 = phi i32 [0, %entry], [%iv1.next, %outer_latch]
				%iv1.next = add i32 %iv1, 1
				;; skip nontrivial unswitch
				call void @bar()
				br label %middle
				middle:
				%iv2 = phi i32 [0, %outer], [%iv2.next, %middle_latch]
				%iv2.next = add i32 %iv2, 1
				;; skip nontrivial unswitch
				call void @bar()
				br label %loop
				loop:
				%iv3 = phi i32 [0, %middle], [%iv3.next, %loop_latch]
				%iv3.next = add i32 %iv3, 1
				;; skip nontrivial unswitch
				call void @bar()
				br i1 %c1, label %loop_next1_left, label %outer_latch
				loop_next1_left:
				br label %loop_next1
				loop_next1_right:
				br label %loop_next1

				loop_next1:
				br i1 %c2, label %loop_next2_left, label %outer_latch
				loop_next2_left:
				br label %loop_next2
				loop_next2_right:
				br label %loop_next2

				loop_next2:
				br i1 %c3, label %loop_next3_left, label %outer_latch
				loop_next3_left:
				br label %loop_next3
				loop_next3_right:
				br label %loop_next3

				loop_next3:
				br i1 %c4, label %loop_next4_left, label %outer_latch
				loop_next4_left:
				br label %loop_next4
				loop_next4_right:
				br label %loop_next4

				loop_next4:
				br i1 %c5, label %loop_latch_left, label %outer_latch
				loop_latch_left:
				br label %loop_latch
				loop_latch_right:
				br label %loop_latch

				loop_latch:
				store volatile i32 0, i32* %addr1
				%test_loop = icmp slt i32 %iv3, 50
				br i1 %test_loop, label %loop, label %middle_latch
				middle_latch:
				store volatile i32 0, i32* %addr2
				%test_middle = icmp slt i32 %iv2, 50
				br i1 %test_middle, label %middle, label %outer_latch
				outer_latch:
				store volatile i32 0, i32* %addr3
				%test_outer = icmp slt i32 %iv1, 50
				br i1 %test_outer, label %outer, label %exit
				exit:
				ret void
				}

llvm/trunk/test/Transforms/SimpleLoopUnswitch/exponential-nontrivial-unswitch.ll

				;
				; There should be just a single copy of the loop when the strictest multiplier candidates
				; formula (unscaled candidates == 0) is enforced:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=8 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; With relaxed candidates multiplier (unscaled candidates == 8) we should allow
				; some unswitches to happen until siblings multiplier starts kicking in:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP5
				;
				; With relaxed candidates multiplier (unscaled candidates == 8) and with relaxed
				; siblings multiplier for top-level loops (toplevel-div == 8) we should get
				; 2^(num conds) == 2^5 == 32
				; copies of the loop:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=8 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP32
				;
				; Similarly get
				; 2^(num conds) == 2^5 == 32
				; copies of the loop when cost multiplier is disabled:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=false \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP32
				;
				;
				; Single loop, not unswitched
				; LOOP1: Loop at depth 1 containing:
				; LOOP1-NOT: Loop at depth 1 containing:

				; 5 loops, unswitched 4 times
				; LOOP5-COUNT-5: Loop at depth 1 containing:
				; LOOP5-NOT: Loop at depth 1 containing:

				; 32 loops, fully unswitched
				; LOOP32-COUNT-32: Loop at depth 1 containing:
				; LOOP32-NOT: Loop at depth 1 containing:

				define void @loop_simple5(i32* %addr, i1 %c1, i1 %c2, i1 %c3, i1 %c4, i1 %c5) {
				entry:
				br label %loop
				loop:
				%iv = phi i32 [0, %entry], [%iv.next, %loop_latch]
				%iv.next = add i32 %iv, 1
				br i1 %c1, label %loop_next1, label %loop_next1_right
				loop_next1_right:
				br label %loop_next1
				loop_next1:
				br i1 %c2, label %loop_next2, label %loop_next2_right
				loop_next2_right:
				br label %loop_next2
				loop_next2:
				br i1 %c3, label %loop_next3, label %loop_next3_right
				loop_next3_right:
				br label %loop_next3
				loop_next3:
				br i1 %c4, label %loop_next4, label %loop_next4_right
				loop_next4_right:
				br label %loop_next4
				loop_next4:
				br i1 %c5, label %loop_latch, label %loop_latch_right
				loop_latch_right:
				br label %loop_latch
				loop_latch:
				store volatile i32 0, i32* %addr
				%test_loop = icmp slt i32 %iv, 50
				br i1 %test_loop, label %loop, label %exit
				exit:
				ret void
				}
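
The LOOP32 expectation in the test above follows from simple counting: each of the five invariant branches keeps both successors inside the loop, so every unswitch doubles the number of loop copies. A minimal C++ sketch of that arithmetic follows; the function name `copiesAfterFullUnswitch` is hypothetical and is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: number of loop copies after fully unswitching `conds`
// independent loop-invariant branches whose successors both stay in the loop.
// Each unswitch doubles the copies, giving 2^conds.
std::uint64_t copiesAfterFullUnswitch(unsigned conds) {
  assert(conds < 64 && "shift would overflow a 64-bit counter");
  return std::uint64_t{1} << conds;
}
```

For the five conditions `%c1` through `%c5` this gives 32, which is the count the LOOP32 prefix checks for.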

llvm/trunk/test/Transforms/SimpleLoopUnswitch/exponential-nontrivial-unswitch2.ll

				;
				; Here all the branches are exiting ones. Check that we don't get
				; exponential behavior under any setting of the controlling heuristics.
				;
				; We should end up with just a single loop.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=8 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=8 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=false \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				;
				; Single loop, not unswitched
				; LOOP1: Loop at depth 1 containing:
				; LOOP1-NOT: Loop at depth 1 containing:

				declare void @bar()

				define void @loop_simple5(i32* %addr, i1 %c1, i1 %c2, i1 %c3, i1 %c4, i1 %c5) {
				entry:
				br label %loop
				loop:
				%iv = phi i32 [0, %entry], [%iv.next, %loop_latch]
				%iv.next = add i32 %iv, 1
				;; disabling trivial unswitch
				call void @bar()
				br i1 %c1, label %loop_next1, label %exit
				loop_next1:
				br i1 %c2, label %loop_next2, label %exit
				loop_next2:
				br i1 %c3, label %loop_next3, label %exit
				loop_next3:
				br i1 %c4, label %loop_next4, label %exit
				loop_next4:
				br i1 %c5, label %loop_latch, label %exit
				loop_latch:
				store volatile i32 0, i32* %addr
				%test_loop = icmp slt i32 %iv, 50
				br i1 %test_loop, label %loop, label %exit
				exit:
				ret void
				}

llvm/trunk/test/Transforms/SimpleLoopUnswitch/exponential-switch-unswitch.ll

				;
				; Here we have a 5-way unswitchable switch with each successor also having an unswitchable
				; exiting branch in it. If we start unswitching those branches we start duplicating the
				; whole switch. This can easily lead to exponential behavior without proper control.
				; On a real-life testcase there was a 16-way switch and it took forever to compile without
				; cost control.
				;
				;
				; When we use the strictest multiplier candidates formula (unscaled candidates == 0)
				; we should be getting just a single loop.
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=0 -unswitch-siblings-toplevel-div=16 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | FileCheck %s --check-prefixes=LOOP1
				;
				;
				; With relaxed candidates multiplier (unscaled candidates == 8) we should allow
				; some unswitches to happen until siblings multiplier starts kicking in:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=1 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-RELAX
				;
				; With relaxed candidates multiplier (unscaled candidates == 8) and with relaxed
				; siblings multiplier for top-level loops (toplevel-div == 8) we should get
				; considerably more copies of the loop (especially top-level ones).
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=true \
				; RUN: -unswitch-num-initial-unscaled-candidates=8 -unswitch-siblings-toplevel-div=8 \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-RELAX2
				;
				; We get hundreds of copies of the loop when cost multiplier is disabled:
				;
				; RUN: opt < %s -enable-nontrivial-unswitch -enable-unswitch-cost-multiplier=false \
				; RUN: -passes='loop(unswitch),print<loops>' -disable-output 2>&1 | \
				; RUN: sort -b | FileCheck %s --check-prefixes=LOOP-MAX
				;

				; Single loop nest, not unswitched
				; LOOP1: Loop at depth 1 containing:
				; LOOP1-NOT: Loop at depth 1 containing:
				; LOOP1: Loop at depth 2 containing:
				; LOOP1-NOT: Loop at depth 2 containing:
				;
				; Somewhat relaxed restrictions on candidates:
				; LOOP-RELAX-COUNT-5: Loop at depth 1 containing:
				; LOOP-RELAX-NOT: Loop at depth 1 containing:
				; LOOP-RELAX-COUNT-32: Loop at depth 2 containing:
				; LOOP-RELAX-NOT: Loop at depth 2 containing:
				;
				; Even more relaxed restrictions on candidates and siblings.
				; LOOP-RELAX2-COUNT-11: Loop at depth 1 containing:
				; LOOP-RELAX2-NOT: Loop at depth 1 containing:
				; LOOP-RELAX2-COUNT-40: Loop at depth 2 containing:
				; LOOP-RELAX2-NOT: Loop at depth 2 containing:
				;
				; Unswitched as much as it could (with multiplier disabled).
				; LOOP-MAX-COUNT-56: Loop at depth 1 containing:
				; LOOP-MAX-NOT: Loop at depth 1 containing:
				; LOOP-MAX-COUNT-111: Loop at depth 2 containing:
				; LOOP-MAX-NOT: Loop at depth 2 containing:

				define i32 @loop_switch(i32* %addr, i32 %c1, i32 %c2) {
				entry:
				%addr1 = getelementptr i32, i32* %addr, i64 0
				%addr2 = getelementptr i32, i32* %addr, i64 1
				%check0 = icmp eq i32 %c2, 0
				%check1 = icmp eq i32 %c2, 31
				%check2 = icmp eq i32 %c2, 32
				%check3 = icmp eq i32 %c2, 33
				%check4 = icmp eq i32 %c2, 34
				br label %outer_loop

				outer_loop:
				%iv1 = phi i32 [0, %entry], [%iv1.next, %outer_latch]
				%iv1.next = add i32 %iv1, 1
				br label %inner_loop
				inner_loop:
				%iv2 = phi i32 [0, %outer_loop], [%iv2.next, %inner_latch]
				%iv2.next = add i32 %iv2, 1
				switch i32 %c1, label %inner_latch [
				i32 0, label %case0
				i32 1, label %case1
				i32 2, label %case2
				i32 3, label %case3
				i32 4, label %case4
				]

				case4:
				br i1 %check4, label %exit, label %inner_latch
				case3:
				br i1 %check3, label %exit, label %inner_latch
				case2:
				br i1 %check2, label %exit, label %inner_latch
				case1:
				br i1 %check1, label %exit, label %inner_latch
				case0:
				br i1 %check0, label %exit, label %inner_latch

				inner_latch:
				store volatile i32 0, i32* %addr1
				%test_inner = icmp slt i32 %iv2, 50
				br i1 %test_inner, label %inner_loop, label %outer_latch

				outer_latch:
				store volatile i32 0, i32* %addr2
				%test_outer = icmp slt i32 %iv1, 50
				br i1 %test_outer, label %outer_loop, label %exit

				exit:
				ret i32 1
				}
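
The RUN lines in these tests vary two knobs, `-unswitch-num-initial-unscaled-candidates` and `-unswitch-siblings-toplevel-div`. The C++ sketch below is one plausible model of how such a multiplier could combine them. It is an assumption made for illustration, not the code from SimpleLoopUnswitch.cpp: the function name, the parameter names, the `isTopLevel` flag, and the `cap` argument are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of a cost multiplier; NOT the patch's implementation.
// unscaledCandidates stands in for -unswitch-num-initial-unscaled-candidates,
// toplevelDiv stands in for -unswitch-siblings-toplevel-div.
std::uint64_t modelCostMultiplier(unsigned candidates,
                                  unsigned unscaledCandidates,
                                  unsigned siblings, unsigned toplevelDiv,
                                  bool isTopLevel, std::uint64_t cap) {
  // Candidates beyond the unscaled allowance scale the cost exponentially.
  unsigned clonesPower =
      candidates > unscaledCandidates ? candidates - unscaledCandidates : 0;
  // Assumed: only top-level loops get their sibling count divided.
  unsigned scaledSiblings =
      isTopLevel ? siblings / std::max(toplevelDiv, 1u) : siblings;
  std::uint64_t siblingsMultiplier =
      std::max<std::uint64_t>(scaledSiblings, 1);
  // Saturate instead of overflowing the shift.
  if (clonesPower >= 32)
    return cap;
  return std::min(siblingsMultiplier << clonesPower, cap);
}
```

Under this model, five candidates with an allowance of 0 give a multiplier of 32 even for a single loop, while an allowance of 8 leaves the multiplier driven by the sibling count alone.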