This is an archive of the discontinued LLVM Phabricator instance.


Differential D55467

[WebAssembly] Optimize Irreducible Control Flow
Closed · Public

Authored by kripken on Dec 7 2018, 5:38 PM.

Details

Reviewers

sunfish
aheejin

Commits

rG777d01c756de: [WebAssembly] Optimize Irreducible Control Flow
rL350367: [WebAssembly] Optimize Irreducible Control Flow

Summary

Irreducible control flow is not that rare: for example, it occurs in malloc and 3 other places in the libc portions linked into a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output.

Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other).
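
As a rough illustration of what "only the minimal necessary number of blocks" means, here is a minimal sketch in plain C++ of the idea described in the pass's header comment: find the blocks that can reach themselves ("loopers"), then keep only those that are reachable from a non-looper. The `Graph` type and both function names are hypothetical, and the sketch ignores natural loops, which the real pass treats as single units.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical CFG stand-in: block I has successors Succs[I]; block 0 is entry.
using Graph = std::vector<std::vector<int>>;

// Reach[B] holds every block reachable from B by following one or more edges.
std::vector<std::set<int>> computeReachable(const Graph &Succs) {
  std::vector<std::set<int>> Reach(Succs.size());
  bool Changed = true;
  while (Changed) { // naive fixed point; fine for a small example
    Changed = false;
    for (std::size_t B = 0; B < Succs.size(); ++B) {
      for (int S : Succs[B]) {
        assert(S >= 0 && static_cast<std::size_t>(S) < Succs.size());
        const std::set<int> ViaS = Reach[S]; // copy: Reach[B] may alias it
        Changed |= Reach[B].insert(S).second;
        for (int T : ViaS)
          Changed |= Reach[B].insert(T).second;
      }
    }
  }
  return Reach;
}

// Entries are "loopers" (blocks that can reach themselves) that have at least
// one predecessor that is not a looper. Returned in ascending block order.
std::vector<int> findLoopEntries(const Graph &Succs) {
  const auto Reach = computeReachable(Succs);
  std::set<int> Entries;
  for (std::size_t P = 0; P < Succs.size(); ++P) {
    if (Reach[P].count(static_cast<int>(P)))
      continue; // P is itself a looper, so it cannot introduce an entry
    for (int S : Succs[P])
      if (Reach[S].count(S))
        Entries.insert(S);
  }
  return std::vector<int>(Entries.begin(), Entries.end());
}
```

With edges 0→1, 0→2, 1→2, 2→1 the cycle {1, 2} has two entries, so a dispatch block would need two targets; dropping the 0→2 edge leaves a single entry and nothing to fix.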

Diff Detail

Repository: rL LLVM

Event Timeline

kripken created this revision. Dec 7 2018, 5:38 PM

Herald added subscribers: llvm-commits, mgrang, jgravelle-google and 2 others. Dec 7 2018, 5:38 PM

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

mgrang added inline comments. Dec 10 2018, 2:22 PM

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
264 ↗	(On Diff #177363)	Please use the range-based llvm::sort instead of std::sort. llvm::sort(SortedEntries, comparator); See https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

In D55467#1326177, @sunfish wrote:

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Not for malloc - I think @jgravelle-google may have been looking into that though?

Some other musl libc elements with irreducible control flow that show up in a hello world are actually irreducible in the source (!), I verified.

Use llvm::sort following @mgrang's feedback. Thanks!

In D55467#1326244, @kripken wrote:

In D55467#1326177, @sunfish wrote:

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Not for malloc - I think @jgravelle-google may have been looking into that though?

Discussing offline, there is some investigation but the details are not yet clear.

I agree figuring that out is very important - if we can avoid unnecessary irreducible control flow that's great. I think it's a separate issue from this patch, though, since as mentioned above even libc has a few places with source-code level irreducible control flow. We could only fix that by rewriting the source, but even that would just be for libc, and not other user code. So regardless I think we should emit compact code when irreducible code occurs, which is what this patch focuses on.

First brief look at the pass.. mostly low-level nits

Nit: Could you wrap comments to 80 cols? Some lines end prematurely now. (clang-format does not do that for us.. In vim you can use gq, but not sure about other editors)
There are many lambda functions that can be made into plain static functions or class member functions. I think that is more conventional and readable unless they are very short or are meant to be passed as arguments to another function. I find it hard to read when several other lambda definitions are interleaved with the function body.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
85 ↗	(On Diff #177606)	I don't think we need dominator tree anymore, so I think we can delete these two lines.
87 ↗	(On Diff #177606)	Do we preserve loop info?
109 ↗	(On Diff #177606)	Nit: LLVM coding standards say function names should start with a lowercase letter.
159 ↗	(On Diff #177606)	Nit: `using` instead of `typedef`
160 ↗	(On Diff #177606)	LLVM coding standards warn against uses of non-deterministic containers. Also, the LLVM programming manual discourages the use of `unordered_set` or `unordered_map`. (ref1, ref2) Maybe we can choose from other set-like containers and map-like containers?
164 ↗	(On Diff #177606)	Nit: `using` instead of `typedef`
179 ↗	(On Diff #177606)	Nit: You can do WorkList.emplace_back(MBB, Succ);
214 ↗	(On Diff #177606)	Nit: You can merge these lines maybe with auto MBB, Succ; std::tie(MBB, Succ) = WorkList.pop_back_val()
382 ↗	(On Diff #177606)	Should we recompute these three every time we make a change? This pass does not use liveness info, so I think we can do this at the end of the whole pass if any change was made. I don't think we need to do renumbering? We only use BB numbers at CFGSort / CFGStackify and CFGSort recomputes all numbers itself before doing anything. This pass does not seem to use dominator tree, right? Then deleting `AU.addPreserved<MachineDominatorTree>();` in `getAnalysisUsage()` should be sufficient. I don't think we even need to require it.
383 ↗	(On Diff #177606)	Should we recompute loop info for the whole function every time we make any change? `MachineLoopInfo` class seems to have various modifier functions. Can we use these on-the-fly during the pass?
405 ↗	(On Diff #177606)	How about flattening `Iteration` and `DoVisitLoop` into this while loop, like this? I find nested lambdas a bit hard to read. while (...) { SmallVector<MachineLoop *, 8> Worklist; // Visit the function body, which is identified as a null loop. Worklist.push_back(nullptr); // Visit all the loops. Worklist.append(MLI.begin(), MLI.end()); while (!Worklist.empty()) { MachineLoop *Loop = Worklist.pop_back_val(); Worklist.append(Loop->begin(), Loop->end()); if (VisitLoop(MF, MLI, Loop)) { ... } } }
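
The worklist traversal suggested in the last comment can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `LoopNode` and `visitAllScopes` are hypothetical stand-ins (not `MachineLoop` or any LLVM API), and the "visit" step just records a name. The sketch checks for the null entry that represents the function body before reading sub-loops, since a null pointer has none to read.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for MachineLoop: a named loop with nested sub-loops.
struct LoopNode {
  std::string Name;
  std::vector<LoopNode *> SubLoops;
};

// Visits the function body (represented by nullptr) and every loop, nested
// ones included, using an explicit worklist instead of nested lambdas.
std::vector<std::string>
visitAllScopes(const std::vector<LoopNode *> &TopLevel) {
  std::vector<std::string> Order;
  std::vector<LoopNode *> Worklist;
  Worklist.push_back(nullptr); // the function body
  Worklist.insert(Worklist.end(), TopLevel.begin(), TopLevel.end());
  while (!Worklist.empty()) {
    LoopNode *Loop = Worklist.back();
    Worklist.pop_back();
    if (!Loop) {
      Order.push_back("<function body>");
      continue; // no sub-loops to enqueue for the null entry
    }
    Order.push_back(Loop->Name);
    Worklist.insert(Worklist.end(), Loop->SubLoops.begin(),
                    Loop->SubLoops.end());
  }
  return Order;
}
```

Because the worklist is used as a stack, the function body is visited last here; the order does not matter for a pass that simply repeats until no scope reports a change.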

Is it possible to reduce test cases more while they exhibit the same behavior? We usually try to avoid big generated test cases to our LLVM regression tests, so..

In D55467#1326244, @kripken wrote:

In D55467#1326177, @sunfish wrote:

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Not for malloc - I think @jgravelle-google may have been looking into that though?

@jgravelle-google Have you had a chance to look at this?

Some other musl libc elements with irreducible control flow that show up in a hello world are actually irreducible in the source (!), I verified.

Do you by chance remember which functions this was?

Thanks for the comments @aheejin!

All should be addressed in the patch I'm updating now, or if not then I replied to those comments.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
85 ↗	(On Diff #177606)	You're right that we don't need it, but it turns out recomputing MLI crashes if I don't keep this code there. I guess the dominator tree is used in MLI, and MLI does not know to recompute it itself? (I don't know anything about LLVM internals...)
87 ↗	(On Diff #177606)	Yes, we preserve it by recomputing it if we invalidate it.
382 ↗	(On Diff #177606)	We need to recompute the loop info each time since we use it, which I think means we need to recompute the others (like dominator trees, see comment above). In practice, often there is no irreducible control flow, so no recomputing is done. Or it can be resolved in one iteration, in which case we recompute once as needed. It's rare to need more work than that.
383 ↗	(On Diff #177606)	I'd prefer not to as it seems more complex than worthwhile, given the comment above on how much work is done in the common cases here. But if you feel strongly I can do that.

kripken updated this revision to Diff 178143. Dec 13 2018, 3:01 PM

In D55467#1330401, @aheejin wrote:

Is it possible to reduce test cases more while they exhibit the same behavior? We usually try to avoid big generated test cases to our LLVM regression tests, so..

Not a lot more - perhaps I can tidy up the names and remove an instruction or two, but the CFG they represent is necessary to reproduce the issue (no simple CFG can represent irreducible control flow that requires multiple passes to reduce...).

If they are too big for the LLVM test suite, I can just remove them - they will still be run in the emscripten test suite.

In D55467#1330455, @sunfish wrote:

In D55467#1326244, @kripken wrote:

In D55467#1326177, @sunfish wrote:

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Some other musl libc elements with irreducible control flow that show up in a hello world are actually irreducible in the source (!), I verified.

Do you by chance remember which functions this was?

One is

https://github.com/kripken/emscripten/blob/incoming/system/lib/libc/musl/src/multibyte/mbsrtowcs.c

That one looks clearly irreducible. Musl has a bunch more goto cases, which end up irreducible in the backend, but it can be kind of hard to inspect visually, like

https://github.com/kripken/emscripten/blob/incoming/system/lib/libc/musl/src/locale/iconv.c
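
For readers unfamiliar with how source code can be irreducible, here is a small hypothetical example (not taken from musl) of the general shape: a `goto` that jumps into the middle of a cycle, so the cycle can be entered at two different labels. Whether a given compiler keeps this shape all the way to the backend is not guaranteed.

```cpp
#include <cassert>

// The cycle {top, middle} can be entered at `top` (normal path) or at
// `middle` (via the goto), which gives it two entries.
int sumWithTwoEntries(int n, bool startInMiddle) {
  int i = 0;
  int total = 0;
  if (startInMiddle)
    goto middle; // second entry into the cycle
top:
  total += 1;
middle:
  total += 10;
  ++i;
  if (i < n)
    goto top; // back edge
  return total;
}
```

Neither label dominates the other, so no single block can serve as the loop header; that is the situation a dispatch block resolves.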

In the CL/commit description, could you add some explanation on what has changed, i.e., what unnecessary cases of transformation this patch reduces?

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
24 ↗	(On Diff #178143)	assign to -> assign?
76 ↗	(On Diff #178143)	clang-format
98 ↗	(On Diff #178143)	a loop -> an inner loop, just to be clear
183 ↗	(On Diff #178143)	Just a suggestion on the workflow... Currently `WorkList` is a class variable modified in multiple functions (inserted into in `maybeInsert` and popped in `run`). Also, `maybeInsert` takes `MBB` and `Succ`, where `MBB` is already canonicalized at that point, but `Succ` is canonicalized inside `maybeInsert`, and we bail out if it cannot be canonicalized. This makes the workflow a little hard to follow. I think making `WorkList` just a local variable within `run`, calling both `canonicalize` and `canonicalizeSuccessor` in `run`, and inlining the remaining contents of `maybeInsert` into `run` would make it a bit simpler.
222 ↗	(On Diff #178143)	Do we need two separate containers? If you want a vector with a O(1) search, maybe you can use `SetVector` and merge these two?
235 ↗	(On Diff #178143)	How about `if (LLVM_LIKELY(Entries.size() <= 1))` to signal we wouldn't find an irreducible control flow in usual cases?
280 ↗	(On Diff #178143)	Is there a word missing between 'every' and 'that'?
397 ↗	(On Diff #178143)	How about `while (LLVM_UNLIKELY(runIteration(MF, MLI)))` to signal that this is unlikely to be true?
85 ↗	(On Diff #177606)	Oh, you're right, sorry.
test/CodeGen/WebAssembly/irreducible-cfg-exceptions.ll
4 ↗	(On Diff #178143)	Now we use just "wasm32-unknown-unknown". In other test cases too.
6 ↗	(On Diff #178143)	Can we delete this?
17 ↗	(On Diff #178143)	Can we delete all attributes (like #0, #1, ...) from this and other test cases if they are not necessary to show the bug? Also `hidden` and `unnamed_addr` can be deleted as well.
113 ↗	(On Diff #178143)	We can delete these if not necessary (for other test cases too)
test/CodeGen/WebAssembly/irreducible-cfg-nested.ll
11 ↗	(On Diff #178143)	I think we can delete `dso_local`, `fastcc`, and `unnamed_addr` here

Thanks, submitting a fixed patch now.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
24 ↗	(On Diff #178143)	We assign to the same one in all paths. I'll reword it.
183 ↗	(On Diff #178143)	To make sure I understand, inlining the remaining contents would mean doing the same work 3 times (for each inlined location) to do the insert, check if it actually added, and add to the work list if so? That feels less good to me, but happy to do it either way.
222 ↗	(On Diff #178143)	Hmm, can I sort a SetVector, though? It says the order of iteration is the order of insertion (which is fixed).

kripken updated this revision to Diff 178257. Dec 14 2018, 11:12 AM

kripken marked an inline comment as done.

aheejin added inline comments. Dec 14 2018, 3:00 PM

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
183 ↗	(On Diff #178143)	I thought, after taking out calling `canonicalizeSuccessor` out of `maybeInsert`, `maybeInsert` would be like 2-3 lines, so it would be OK to inline, but maybe it's less desirable. I was mainly concerned about calling `canonicalize` and `canonicalizeSuccessor` in two different places, but maybe it's ok this way. :)
222 ↗	(On Diff #178143)	If the insertion / iteration order is fixed, do we need sorting at all?
test/CodeGen/WebAssembly/irreducible-cfg-exceptions.ll
49 ↗	(On Diff #178257)	I understand this test is complex, but maybe for these couple extremely long basic block names, we can simplify them a bit..?

Thanks for the comments, uploading an updated patch now.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
183 ↗	(On Diff #178143)	I'll try to improve the comments to clarify that. There should be just one location that calls canonicalizeSuccessor (maybeInsert), aside from an assertion. I think the assertions help, but if you feel they make the code less clear, then that's fine with me of course, and it would be shorter with fewer of them. I'll also add some comments on where canonicalize/canonicalizeSuccessor should be called, so hopefully they look unsurprising in those locations.
222 ↗	(On Diff #178143)	Yeah, good point - I was using an unsorted set earlier on (for LoopBlocks), but if I replace it with a SetVector too then everything indeed ends up deterministic. I do have some vague worry about using SetVectors everywhere adding some overhead, but probably I'm overthinking it?

Use SetVector for determinism
Simplify names in exceptions testcase
Improve comments about canonicalization

LGTM to me modulo a nit.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
222 ↗	(On Diff #178143)	Yeah maybe we are unnecessarily using too many `SetVector`s... how about using not `SetVector` but another set such as `SmallPtrSet` for all other usages, reviving `SortedEntries`, and sorting it once and for all based on the MBB number, as you did before? But maybe not `unordered_set`, because its use is discouraged because it is expensive. Sorry for going back and forth :(

This revision is now accepted and ready to land. Dec 17 2018, 4:34 PM

kripken marked an inline comment as done. Dec 18 2018, 12:59 PM

kripken added inline comments.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
222 ↗	(On Diff #178143)	Sounds good, thanks, updating patch.

Use SmallPtrSet for our sets, and return to using a vector for SortedEntries which we sort, avoiding SetVector.
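
The pattern settled on here (a pointer set for membership tests, plus a vector sorted by block number wherever iteration order affects output) can be sketched with standard containers. This is an illustration only: `Block` and `sortedEntries` are hypothetical names, `std::unordered_set` stands in for `SmallPtrSet`, and `std::sort` stands in for `llvm::sort`.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for MachineBasicBlock; Number plays the role of the
// unique, stable block number.
struct Block {
  int Number;
};

// Membership lives in an unordered set; anything that affects output is
// iterated from a vector sorted by a key that is stable across runs.
std::vector<Block *> sortedEntries(const std::unordered_set<Block *> &Entries) {
  std::vector<Block *> Sorted(Entries.begin(), Entries.end());
  std::sort(Sorted.begin(), Sorted.end(), [](const Block *A, const Block *B) {
    assert(A && B);
    return A->Number < B->Number; // never compare the pointer values
  });
  return Sorted;
}
```

Sorting by a unique number rather than by address keeps the result independent of where the allocator happened to place each block, which is the determinism concern raised earlier in the review.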

Closed by commit rL350367: [WebAssembly] Optimize Irreducible Control Flow (authored by aheejin). Jan 3 2019, 3:13 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp	429 lines
llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-exceptions.ll	108 lines
llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-nested.ll	63 lines
llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-nested2.ll	39 lines
llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg.ll	129 lines

Diff 180155

llvm/trunk/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp

//=- WebAssemblyFixIrreducibleControlFlow.cpp - Fix irreducible control flow -//		//=- WebAssemblyFixIrreducibleControlFlow.cpp - Fix irreducible control flow -//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
///		///
/// \file		/// \file
/// This file implements a pass that transforms irreducible control flow		/// This file implements a pass that transforms irreducible control flow into
/// into reducible control flow. Irreducible control flow means multiple-entry		/// reducible control flow. Irreducible control flow means multiple-entry
/// loops; they appear as CFG cycles that are not recorded in MachineLoopInfo		/// loops; they appear as CFG cycles that are not recorded in MachineLoopInfo
/// due to being unnatural.		/// due to being unnatural.
///		///
/// Note that LLVM has a generic pass that lowers irreducible control flow, but		/// Note that LLVM has a generic pass that lowers irreducible control flow, but
/// it linearizes control flow, turning diamonds into two triangles, which is		/// it linearizes control flow, turning diamonds into two triangles, which is
/// both unnecessary and undesirable for WebAssembly.		/// both unnecessary and undesirable for WebAssembly.
///		///
/// TODO: The transformation implemented here handles all irreducible control		/// The big picture: Ignoring natural loops (seeing them monolithically), we
/// flow, without exponential code-size expansion, though it does so by creating		/// find all the blocks which can return to themselves ("loopers"). Loopers
/// inefficient code in many cases. Ideally, we should add other		/// reachable from the non-loopers are loop entries: if there are 2 or more,
/// transformations, including code-duplicating cases, which can be more		/// then we have irreducible control flow. We fix that as follows: a new block
/// efficient in common cases, and they can fall back to this conservative		/// is created that can dispatch to each of the loop entries, based on the
/// implementation as needed.		/// value of a label "helper" variable, and we replace direct branches to the
		/// entries with assignments to the label variable and a branch to the dispatch
		/// block. Then the dispatch block is the single entry in a new natural loop.
		///
		/// This is similar to what the Relooper [1] does, both identify looping code
		/// that requires multiple entries, and resolve it in a similar way. In
		/// Relooper terminology, we implement a Multiple shape in a Loop shape. Note
		/// also that like the Relooper, we implement a "minimal" intervention: we only
		/// use the "label" helper for the blocks we absolutely must and no others. We
		/// also prioritize code size and do not perform node splitting (i.e. we don't
		/// duplicate code in order to resolve irreducibility).
		///
		/// The difference between this code and the Relooper is that the Relooper also
		/// generates ifs and loops and works in a recursive manner, knowing at each
		/// point what the entries are, and recursively breaks down the problem. Here
		/// we just want to resolve irreducible control flow, and we also want to use
		/// as much LLVM infrastructure as possible. So we use the MachineLoopInfo to
		/// identify natural loops, etc., and we start with the whole CFG and must
		/// identify both the looping code and its entries.
		///
		/// [1] Alon Zakai. 2011. Emscripten: an LLVM-to-JavaScript compiler. In
		/// Proceedings of the ACM international conference companion on Object oriented
		/// programming systems languages and applications companion (SPLASH '11). ACM,
		/// New York, NY, USA, 301-312. DOI=10.1145/2048147.2048224
		/// http://doi.acm.org/10.1145/2048147.2048224
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MCTargetDesc/WebAssemblyMCTargetDesc.h"		#include "MCTargetDesc/WebAssemblyMCTargetDesc.h"
#include "WebAssembly.h"		#include "WebAssembly.h"
#include "WebAssemblyMachineFunctionInfo.h"		#include "WebAssemblyMachineFunctionInfo.h"
#include "WebAssemblySubtarget.h"		#include "WebAssemblySubtarget.h"
#include "llvm/ADT/PriorityQueue.h"		#include "llvm/ADT/PriorityQueue.h"
#include "llvm/ADT/SCCIterator.h"		#include "llvm/ADT/SCCIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "wasm-fix-irreducible-control-flow"		#define DEBUG_TYPE "wasm-fix-irreducible-control-flow"

namespace {		namespace {
class WebAssemblyFixIrreducibleControlFlow final : public MachineFunctionPass {
StringRef getPassName() const override {
return "WebAssembly Fix Irreducible Control Flow";
}

void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();
AU.addRequired<MachineDominatorTree>();
AU.addPreserved<MachineDominatorTree>();
AU.addRequired<MachineLoopInfo>();
AU.addPreserved<MachineLoopInfo>();
MachineFunctionPass::getAnalysisUsage(AU);
}

bool runOnMachineFunction(MachineFunction &MF) override;

bool VisitLoop(MachineFunction &MF, MachineLoopInfo &MLI, MachineLoop *Loop);

		class LoopFixer {
public:		public:
static char ID; // Pass identification, replacement for typeid		LoopFixer(MachineFunction &MF, MachineLoopInfo &MLI, MachineLoop *Loop)
WebAssemblyFixIrreducibleControlFlow() : MachineFunctionPass(ID) {}		: MF(MF), MLI(MLI), Loop(Loop) {}
};
} // end anonymous namespace

char WebAssemblyFixIrreducibleControlFlow::ID = 0;		// Run the fixer on the given inputs. Returns whether changes were made.
INITIALIZE_PASS(WebAssemblyFixIrreducibleControlFlow, DEBUG_TYPE,		bool run();
"Removes irreducible control flow", false, false)

FunctionPass *llvm::createWebAssemblyFixIrreducibleControlFlow() {		private:
return new WebAssemblyFixIrreducibleControlFlow();		MachineFunction &MF;
		MachineLoopInfo &MLI;
		MachineLoop *Loop;

		MachineBasicBlock *Header;
		SmallPtrSet<MachineBasicBlock *, 4> LoopBlocks;

		using BlockSet = SmallPtrSet<MachineBasicBlock *, 4>;
		DenseMap<MachineBasicBlock *, BlockSet> Reachable;

		// The worklist contains pairs of recent additions, (a, b), where we just
		// added a link a => b.
		using BlockPair = std::pair<MachineBasicBlock *, MachineBasicBlock *>;
		SmallVector<BlockPair, 4> WorkList;

		// Get a canonical block to represent a block or a loop: the block, or if in
		// an inner loop, the loop header, of it in an outer loop scope, we can
		// ignore it. We need to call this on all blocks we work on.
		MachineBasicBlock *canonicalize(MachineBasicBlock *MBB) {
		MachineLoop *InnerLoop = MLI.getLoopFor(MBB);
		if (InnerLoop == Loop) {
		return MBB;
		} else {
		// This is either in an outer or an inner loop, and not in ours.
		if (!LoopBlocks.count(MBB)) {
		// It's in outer code, ignore it.
		return nullptr;
		}
		assert(InnerLoop);
		// It's in an inner loop, canonicalize it to the header of that loop.
		return InnerLoop->getHeader();
}		}

namespace {

/// A utility for walking the blocks of a loop, handling a nested inner
/// loop as a monolithic conceptual block.
class MetaBlock {
MachineBasicBlock *Block;
SmallVector<MachineBasicBlock *, 2> Preds;
SmallVector<MachineBasicBlock *, 2> Succs;

public:
explicit MetaBlock(MachineBasicBlock *MBB)
: Block(MBB), Preds(MBB->pred_begin(), MBB->pred_end()),
Succs(MBB->succ_begin(), MBB->succ_end()) {}

explicit MetaBlock(MachineLoop *Loop) : Block(Loop->getHeader()) {
Loop->getExitBlocks(Succs);
for (MachineBasicBlock *Pred : Block->predecessors())
if (!Loop->contains(Pred))
Preds.push_back(Pred);
}		}

MachineBasicBlock *getBlock() const { return Block; }		// For a successor we can additionally ignore it if it's a branch back to a
		// natural loop top, as when we are in the scope of a loop, we just care
const SmallVectorImpl<MachineBasicBlock *> &predecessors() const {		// about internal irreducibility, and can ignore the loop we are in. We need
return Preds;		// to call this on all blocks in a context where they are a successor.
		MachineBasicBlock *canonicalizeSuccessor(MachineBasicBlock *MBB) {
		if (Loop && MBB == Loop->getHeader()) {
		// Ignore branches going to the loop's natural header.
		return nullptr;
		}
		return canonicalize(MBB);
		}

		// Potentially insert a new reachable edge, and if so, note it as further
		// work.
		void maybeInsert(MachineBasicBlock *MBB, MachineBasicBlock *Succ) {
		assert(MBB == canonicalize(MBB));
		assert(Succ);
		// Succ may not be interesting as a sucessor.
		Succ = canonicalizeSuccessor(Succ);
		if (!Succ)
		return;
		if (Reachable[MBB].insert(Succ).second) {
		// For there to be further work, it means that we have
		// X => MBB => Succ
		// for some other X, and in that case X => Succ would be a new edge for
		// us to discover later. However, if we don't care about MBB as a
		// successor, then we don't care about that anyhow.
		if (canonicalizeSuccessor(MBB)) {
		WorkList.emplace_back(MBB, Succ);
		}
}		}
const SmallVectorImpl<MachineBasicBlock *> &successors() const {
return Succs;
}		}

bool operator==(const MetaBlock &MBB) { return Block == MBB.Block; }
bool operator!=(const MetaBlock &MBB) { return Block != MBB.Block; }
};		};

class SuccessorList final : public MetaBlock {		bool LoopFixer::run() {
size_t Index;		Header = Loop ? Loop->getHeader() : &*MF.begin();
size_t Num;

public:		// Identify all the blocks in this loop scope.
explicit SuccessorList(MachineBasicBlock *MBB)		if (Loop) {
: MetaBlock(MBB), Index(0), Num(successors().size()) {}		for (auto *MBB : Loop->getBlocks()) {
		LoopBlocks.insert(MBB);
		}
		} else {
		for (auto &MBB : MF) {
		LoopBlocks.insert(&MBB);
		}
		}

explicit SuccessorList(MachineLoop *Loop)		// Compute which (canonicalized) blocks each block can reach.
: MetaBlock(Loop), Index(0), Num(successors().size()) {}

bool HasNext() const { return Index != Num; }		// Add all the initial work.
		for (auto *MBB : LoopBlocks) {
		MachineLoop *InnerLoop = MLI.getLoopFor(MBB);

MachineBasicBlock *Next() {		if (InnerLoop == Loop) {
assert(HasNext());		for (auto *Succ : MBB->successors()) {
return successors()[Index++];		maybeInsert(MBB, Succ);
}		}
};		} else {
		// It can't be in an outer loop - we loop on LoopBlocks - and so it must
} // end anonymous namespace		// be an inner loop.
		assert(InnerLoop);
bool WebAssemblyFixIrreducibleControlFlow::VisitLoop(MachineFunction &MF,		// Check if we are the canonical block for this loop.
MachineLoopInfo &MLI,		if (canonicalize(MBB) != MBB) {
MachineLoop *Loop) {
MachineBasicBlock *Header = Loop ? Loop->getHeader() : &*MF.begin();
SetVector<MachineBasicBlock *> RewriteSuccs;

// DFS through Loop's body, looking for irreducible control flow. Loop is
// natural, and we stay in its body, and we treat any nested loops
// monolithically, so any cycles we encounter indicate irreducibility.
SmallPtrSet<MachineBasicBlock *, 8> OnStack;
SmallPtrSet<MachineBasicBlock *, 8> Visited;
SmallVector<SuccessorList, 4> LoopWorklist;
LoopWorklist.push_back(SuccessorList(Header));
OnStack.insert(Header);
Visited.insert(Header);
while (!LoopWorklist.empty()) {
SuccessorList &Top = LoopWorklist.back();
if (Top.HasNext()) {
MachineBasicBlock *Next = Top.Next();
if (Next == Header || (Loop && !Loop->contains(Next)))
continue;
if (LLVM_LIKELY(OnStack.insert(Next).second)) {
if (!Visited.insert(Next).second) {
OnStack.erase(Next);
continue;		continue;
}		}
MachineLoop *InnerLoop = MLI.getLoopFor(Next);		// The successors are those of the loop.
if (InnerLoop != Loop)		SmallVector<MachineBasicBlock *, 2> ExitBlocks;
LoopWorklist.push_back(SuccessorList(InnerLoop));		InnerLoop->getExitBlocks(ExitBlocks);
else		for (auto *Succ : ExitBlocks) {
LoopWorklist.push_back(SuccessorList(Next));		maybeInsert(MBB, Succ);
} else {
RewriteSuccs.insert(Top.getBlock());
}		}
continue;
}		}
OnStack.erase(Top.getBlock());
LoopWorklist.pop_back();
}		}

// Most likely, we didn't find any irreducible control flow.		// Do work until we are all done.
if (LLVM_LIKELY(RewriteSuccs.empty()))		while (!WorkList.empty()) {
		MachineBasicBlock *MBB;
		MachineBasicBlock *Succ;
		std::tie(MBB, Succ) = WorkList.pop_back_val();
		// The worklist item is an edge we just added, so it must have valid blocks
		// (and not something canonicalized to nullptr).
		assert(MBB);
		assert(Succ);
		// The successor in that pair must also be a valid successor.
		assert(MBB == canonicalizeSuccessor(MBB));
		// We recently added MBB => Succ, and that means we may have enabled
		// Pred => MBB => Succ. Check all the predecessors. Note that our loop here
		// is correct for both a block and a block representing a loop, as the loop
		// is natural and so the predecessors are all predecessors of the loop
		// header, which is the block we have here.
		for (auto *Pred : MBB->predecessors()) {
		// Canonicalize, make sure it's relevant, and check it's not the same
		// block (an update to the block itself doesn't help compute that same
		// block).
		Pred = canonicalize(Pred);
		if (Pred && Pred != MBB) {
		maybeInsert(Pred, Succ);
		}
		}
		}

		// It's now trivial to identify the loopers.
		SmallPtrSet<MachineBasicBlock *, 4> Loopers;
		for (auto MBB : LoopBlocks) {
		if (Reachable[MBB].count(MBB)) {
		Loopers.insert(MBB);
		}
		}
		// The header cannot be a looper. At the toplevel, LLVM does not allow the
		// entry to be in a loop, and in a natural loop we should ignore the header.
		assert(Loopers.count(Header) == 0);

		// Find the entries, loopers reachable from non-loopers.
		SmallPtrSet<MachineBasicBlock *, 4> Entries;
		SmallVector<MachineBasicBlock *, 4> SortedEntries;
		for (auto *Looper : Loopers) {
		for (auto *Pred : Looper->predecessors()) {
		Pred = canonicalize(Pred);
		if (Pred && !Loopers.count(Pred)) {
		Entries.insert(Looper);
		SortedEntries.push_back(Looper);
		break;
		}
		}
		}

		// Check if we found irreducible control flow.
		if (LLVM_LIKELY(Entries.size() <= 1))
return false;		return false;

LLVM_DEBUG(dbgs() << "Irreducible control flow detected!\n");		// Sort the entries to ensure a deterministic build.
		llvm::sort(SortedEntries,
		[&](const MachineBasicBlock *A, const MachineBasicBlock *B) {
		auto ANum = A->getNumber();
		auto BNum = B->getNumber();
		assert(ANum != -1 && BNum != -1);
		assert(ANum != BNum);
		return ANum < BNum;
		});

// Ok. We have irreducible control flow! Create a dispatch block which will		// Create a dispatch block which will contain a jump table to the entries.
// contains a jump table to any block in the problematic set of blocks.
MachineBasicBlock *Dispatch = MF.CreateMachineBasicBlock();		MachineBasicBlock *Dispatch = MF.CreateMachineBasicBlock();
MF.insert(MF.end(), Dispatch);		MF.insert(MF.end(), Dispatch);
MLI.changeLoopFor(Dispatch, Loop);		MLI.changeLoopFor(Dispatch, Loop);

// Add the jump table.		// Add the jump table.
const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();		const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
MachineInstrBuilder MIB = BuildMI(*Dispatch, Dispatch->end(), DebugLoc(),		MachineInstrBuilder MIB = BuildMI(*Dispatch, Dispatch->end(), DebugLoc(),
TII.get(WebAssembly::BR_TABLE_I32));		TII.get(WebAssembly::BR_TABLE_I32));

// Add the register which will be used to tell the jump table which block to		// Add the register which will be used to tell the jump table which block to
// jump to.		// jump to.
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
unsigned Reg = MRI.createVirtualRegister(&WebAssembly::I32RegClass);		unsigned Reg = MRI.createVirtualRegister(&WebAssembly::I32RegClass);
MIB.addReg(Reg);		MIB.addReg(Reg);

// Collect all the blocks which need to have their successors rewritten,		// Compute the indices in the superheader, one for each bad block, and
// add the successors to the jump table, and remember their index.		// add them as successors.
DenseMap<MachineBasicBlock *, unsigned> Indices;		DenseMap<MachineBasicBlock *, unsigned> Indices;
SmallVector<MachineBasicBlock *, 4> SuccWorklist(RewriteSuccs.begin(),		for (auto *MBB : SortedEntries) {
RewriteSuccs.end());
while (!SuccWorklist.empty()) {
MachineBasicBlock *MBB = SuccWorklist.pop_back_val();
auto Pair = Indices.insert(std::make_pair(MBB, 0));		auto Pair = Indices.insert(std::make_pair(MBB, 0));
if (!Pair.second)		if (!Pair.second) {
continue;		continue;
		}

unsigned Index = MIB.getInstr()->getNumExplicitOperands() - 1;		unsigned Index = MIB.getInstr()->getNumExplicitOperands() - 1;
LLVM_DEBUG(dbgs() << printMBBReference(*MBB) << " has index " << Index
<< "\n");

Pair.first->second = Index;		Pair.first->second = Index;
for (auto Pred : MBB->predecessors())
RewriteSuccs.insert(Pred);

MIB.addMBB(MBB);		MIB.addMBB(MBB);
Dispatch->addSuccessor(MBB);		Dispatch->addSuccessor(MBB);
		}

MetaBlock Meta(MBB);		// Rewrite the problematic successors for every block that wants to reach the
for (auto *Succ : Meta.successors())		// bad blocks. For simplicity, we just introduce a new block for every edge
if (Succ != Header && (!Loop || Loop->contains(Succ)))		// we need to rewrite. (Fancier things are possible.)
SuccWorklist.push_back(Succ);
		SmallVector<MachineBasicBlock *, 4> AllPreds;
		for (auto *MBB : SortedEntries) {
		for (auto *Pred : MBB->predecessors()) {
		if (Pred != Dispatch) {
		AllPreds.push_back(Pred);
		}
		}
}		}

// Rewrite the problematic successors for every block in RewriteSuccs.		for (MachineBasicBlock *MBB : AllPreds) {
// For simplicity, we just introduce a new block for every edge we need to
// rewrite. Fancier things are possible.
for (MachineBasicBlock *MBB : RewriteSuccs) {
DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;		DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
for (auto *Succ : MBB->successors()) {		for (auto *Succ : MBB->successors()) {
if (!Indices.count(Succ))		if (!Entries.count(Succ)) {
continue;		continue;
		}

		// This is a successor we need to rewrite.
MachineBasicBlock *Split = MF.CreateMachineBasicBlock();		MachineBasicBlock *Split = MF.CreateMachineBasicBlock();
MF.insert(MBB->isLayoutSuccessor(Succ) ? MachineFunction::iterator(Succ)		MF.insert(MBB->isLayoutSuccessor(Succ) ? MachineFunction::iterator(Succ)
: MF.end(),		: MF.end(),
Split);		Split);
MLI.changeLoopFor(Split, Loop);		MLI.changeLoopFor(Split, Loop);

// Set the jump table's register to the index of the block we wish to		// Set the jump table's register to the index of the block we wish to
// jump to, and jump to the jump table.		// jump to, and jump to the jump table.
Show All 17 Lines	bool LoopFixer::run() {
// Create a fake default label, because br_table requires one.		// Create a fake default label, because br_table requires one.
MIB.addMBB(MIB.getInstr()		MIB.addMBB(MIB.getInstr()
->getOperand(MIB.getInstr()->getNumExplicitOperands() - 1)		->getOperand(MIB.getInstr()->getNumExplicitOperands() - 1)
.getMBB());		.getMBB());

return true;		return true;
}		}

bool WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction(		class WebAssemblyFixIrreducibleControlFlow final : public MachineFunctionPass {
MachineFunction &MF) {		StringRef getPassName() const override {
LLVM_DEBUG(dbgs() << "******** Fixing Irreducible Control Flow ********\n"		return "WebAssembly Fix Irreducible Control Flow";
"********** Function: "		}
<< MF.getName() << '\n');

bool Changed = false;		void getAnalysisUsage(AnalysisUsage &AU) const override {
auto &MLI = getAnalysis<MachineLoopInfo>();		AU.setPreservesCFG();
		AU.addRequired<MachineDominatorTree>();
		AU.addPreserved<MachineDominatorTree>();
		AU.addRequired<MachineLoopInfo>();
		AU.addPreserved<MachineLoopInfo>();
		MachineFunctionPass::getAnalysisUsage(AU);
		}

		bool runOnMachineFunction(MachineFunction &MF) override;

		bool runIteration(MachineFunction &MF, MachineLoopInfo &MLI) {
// Visit the function body, which is identified as a null loop.		// Visit the function body, which is identified as a null loop.
Changed |= VisitLoop(MF, MLI, nullptr);		if (LoopFixer(MF, MLI, nullptr).run()) {
		return true;
		}

// Visit all the loops.		// Visit all the loops.
SmallVector<MachineLoop *, 8> Worklist(MLI.begin(), MLI.end());		SmallVector<MachineLoop *, 8> Worklist(MLI.begin(), MLI.end());
while (!Worklist.empty()) {		while (!Worklist.empty()) {
MachineLoop *CurLoop = Worklist.pop_back_val();		MachineLoop *Loop = Worklist.pop_back_val();
Worklist.append(CurLoop->begin(), CurLoop->end());		Worklist.append(Loop->begin(), Loop->end());
Changed |= VisitLoop(MF, MLI, CurLoop);		if (LoopFixer(MF, MLI, Loop).run()) {
		return true;
		}
		}

		return false;
}		}

// If we made any changes, completely recompute everything.		public:
if (LLVM_UNLIKELY(Changed)) {		static char ID; // Pass identification, replacement for typeid
LLVM_DEBUG(dbgs() << "Recomputing dominators and loops.\n");		WebAssemblyFixIrreducibleControlFlow() : MachineFunctionPass(ID) {}
		};
		} // end anonymous namespace

		char WebAssemblyFixIrreducibleControlFlow::ID = 0;
		INITIALIZE_PASS(WebAssemblyFixIrreducibleControlFlow, DEBUG_TYPE,
		"Removes irreducible control flow", false, false)

		FunctionPass *llvm::createWebAssemblyFixIrreducibleControlFlow() {
		return new WebAssemblyFixIrreducibleControlFlow();
		}

		bool WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction(
		MachineFunction &MF) {
		LLVM_DEBUG(dbgs() << "******** Fixing Irreducible Control Flow ********\n"
		"********** Function: "
		<< MF.getName() << '\n');

		bool Changed = false;
		auto &MLI = getAnalysis<MachineLoopInfo>();

		// When we modify something, bail out and recompute MLI, then start again, as
		// we create a new natural loop when we resolve irreducible control flow, and
		// other loops may become nested in it, etc. In practice this is not an issue
		// because irreducible control flow is rare, so only very few cycles are
		// needed here.
		while (LLVM_UNLIKELY(runIteration(MF, MLI))) {
		// We rewrote part of the function; recompute MLI and start again.
		LLVM_DEBUG(dbgs() << "Recomputing loops.\n");
MF.getRegInfo().invalidateLiveness();		MF.getRegInfo().invalidateLiveness();
MF.RenumberBlocks();		MF.RenumberBlocks();
getAnalysis<MachineDominatorTree>().runOnMachineFunction(MF);		getAnalysis<MachineDominatorTree>().runOnMachineFunction(MF);
MLI.runOnMachineFunction(MF);		MLI.runOnMachineFunction(MF);
		Changed = true;
}		}

return Changed;		return Changed;
}		}
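
The entry detection at the top of this hunk can be sketched outside of LLVM. The following is a minimal toy version, assuming a hypothetical CFG represented as a map from block id to successor ids; Graph, findEntries and isIrreducible are illustrative names, not part of the patch. It only mirrors the idea that a looping block with a predecessor outside the looping set is an entry, and that more than one entry signals irreducible control flow. The real pass also canonicalizes predecessors, which this sketch leaves out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy CFG: block id -> successor ids.
using Graph = std::map<int, std::vector<int>>;

// Return the blocks in Loopers that have at least one predecessor outside
// Loopers. std::set keeps the result sorted, so the order is deterministic,
// which is the property the patch gets by sorting on block number.
std::set<int> findEntries(const Graph &G, const std::set<int> &Loopers) {
  std::set<int> Entries;
  for (const auto &KV : G) {
    if (Loopers.count(KV.first))
      continue; // Only edges coming from outside the looping set matter.
    for (int Succ : KV.second)
      if (Loopers.count(Succ))
        Entries.insert(Succ);
  }
  return Entries;
}

// More than one entry means the looping set is irreducible.
bool isIrreducible(const Graph &G, const std::set<int> &Loopers) {
  return findEntries(G, Loopers).size() > 1;
}
```

With the shape of test2 from irreducible-cfg.ll (entry -> A0, A0 -> {A1, A2}, and A1 and A2 both branching to A1 or A2), the looping set {A1, A2} has two entries, so the sketch reports it as irreducible.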

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-exceptions.ll

				; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers -enable-emscripten-cxx-exceptions | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"

				declare i32 @__gxx_personality_v0(...)

				; Check an interesting case of complex control flow due to exceptions CFG rewriting.
				; There should not be any irreducible control flow here.

				; CHECK-LABEL: crashy:
				; CHECK-NOT: br_table

				; Function Attrs: minsize noinline optsize
				define void @crashy() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				invoke void undef()
				to label %invoke.cont unwind label %lpad

				invoke.cont: ; preds = %entry
				invoke void undef()
				to label %invoke.cont4 unwind label %lpad3

				invoke.cont4: ; preds = %invoke.cont
				%call.i82 = invoke i8* undef()
				to label %invoke.cont6 unwind label %lpad3

				invoke.cont6: ; preds = %invoke.cont4
				invoke void undef()
				to label %invoke.cont13 unwind label %lpad12

				invoke.cont13: ; preds = %invoke.cont6
				br label %for.cond

				for.cond: ; preds = %for.cond.backedge, %invoke.cont13
				br i1 undef, label %exit2, label %land.lhs

				land.lhs: ; preds = %for.cond
				%call.i.i.i.i92 = invoke i32 undef()
				to label %exit1 unwind label %lpad16.loopexit

				exit1: ; preds = %land.lhs
				br label %exit2

				exit2: ; preds = %exit1, %for.cond
				%call.i.i12.i.i93 = invoke i32 undef()
				to label %exit3 unwind label %lpad16.loopexit

				exit3: ; preds = %exit2
				invoke void undef()
				to label %invoke.cont23 unwind label %lpad22

				invoke.cont23: ; preds = %exit3
				invoke void undef()
				to label %invoke.cont25 unwind label %lpad22

				invoke.cont25: ; preds = %invoke.cont23
				%call.i.i137 = invoke i32 undef()
				to label %invoke.cont29 unwind label %lpad16.loopexit

				lpad: ; preds = %entry
				%0 = landingpad { i8*, i32 }
				cleanup
				unreachable

				lpad3: ; preds = %invoke.cont4, %invoke.cont
				%1 = landingpad { i8*, i32 }
				cleanup
				unreachable

				lpad12: ; preds = %invoke.cont6
				%2 = landingpad { i8*, i32 }
				cleanup
				resume { i8*, i32 } undef

				lpad16.loopexit: ; preds = %if.then, %invoke.cont29, %invoke.cont25, %exit2, %land.lhs
				%lpad.loopexit = landingpad { i8*, i32 }
				cleanup
				unreachable

				lpad22: ; preds = %invoke.cont23, %exit3
				%3 = landingpad { i8*, i32 }
				cleanup
				unreachable

				invoke.cont29: ; preds = %invoke.cont25
				invoke void undef()
				to label %invoke.cont33 unwind label %lpad16.loopexit

				invoke.cont33: ; preds = %invoke.cont29
				br label %for.inc

				for.inc: ; preds = %invoke.cont33
				%cmp.i.i141 = icmp eq i8* undef, undef
				br i1 %cmp.i.i141, label %if.then, label %if.end.i.i146

				if.then: ; preds = %for.inc
				%call.i.i148 = invoke i32 undef()
				to label %for.cond.backedge unwind label %lpad16.loopexit

				for.cond.backedge: ; preds = %if.end.i.i146, %if.then
				br label %for.cond

				if.end.i.i146: ; preds = %for.inc
				call void undef()
				br label %for.cond.backedge
				}

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-nested.ll

				; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"


				; Test an interesting pattern of nested irreducibility.
				; Just check we resolve all the irreducibility here (if not we'd crash).

				; CHECK-LABEL: tre_parse:

				define void @tre_parse() {
				entry:
				br label %for.cond.outer

				for.cond.outer: ; preds = %do.body14, %entry
				br label %for.cond

				for.cond: ; preds = %for.cond.backedge, %for.cond.outer
				%nbranch.0 = phi i32* [ null, %for.cond.outer ], [ %call188, %for.cond.backedge ]
				switch i8 undef, label %if.else [
				i8 40, label %do.body14
				i8 41, label %if.then63
				]

				do.body14: ; preds = %for.cond
				br label %for.cond.outer

				if.then63: ; preds = %for.cond
				unreachable

				if.else: ; preds = %for.cond
				switch i8 undef, label %if.then84 [
				i8 92, label %if.end101
				i8 42, label %if.end101
				]

				if.then84: ; preds = %if.else
				switch i8 undef, label %cleanup.thread [
				i8 43, label %if.end101
				i8 63, label %if.end101
				i8 123, label %if.end101
				]

				if.end101: ; preds = %if.then84, %if.then84, %if.then84, %if.else, %if.else
				unreachable

				cleanup.thread: ; preds = %if.then84
				%call188 = tail call i32* undef(i32* %nbranch.0)
				switch i8 undef, label %for.cond.backedge [
				i8 92, label %land.lhs.true208
				i8 0, label %if.else252
				]

				land.lhs.true208: ; preds = %cleanup.thread
				unreachable

				for.cond.backedge: ; preds = %cleanup.thread
				br label %for.cond

				if.else252: ; preds = %cleanup.thread
				unreachable
				}

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-nested2.ll

				; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"

				; Test an interesting pattern of nested irreducibility.
				; Just check we resolve all the irreducibility here (if not we'd crash).

				; CHECK-LABEL: func_2:

				; Function Attrs: noinline nounwind optnone
				define void @func_2() {
				entry:
				br i1 undef, label %lbl_937, label %if.else787

				lbl_937: ; preds = %for.body978, %entry
				br label %if.end965

				if.else787: ; preds = %entry
				br label %if.end965

				if.end965: ; preds = %if.else787, %lbl_937
				br label %for.cond967

				for.cond967: ; preds = %for.end1035, %if.end965
				br label %for.cond975

				for.cond975: ; preds = %if.end984, %for.cond967
				br i1 undef, label %for.body978, label %for.end1035

				for.body978: ; preds = %for.cond975
				br i1 undef, label %lbl_937, label %if.end984

				if.end984: ; preds = %for.body978
				br label %for.cond975

				for.end1035: ; preds = %for.cond975
				br label %for.cond967
				}

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg.ll

; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s		; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s

; Test irreducible CFG handling.		; Test irreducible CFG handling.

target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"		target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
target triple = "wasm32-unknown-unknown"		target triple = "wasm32-unknown-unknown"

; A simple loop with two entries.		; A simple loop with two entries.

; CHECK-LABEL: test0:		; CHECK-LABEL: test0:
; CHECK: f64.load		; CHECK: f64.load
; CHECK: i32.const $[[REG:[^,]+]]=, 0{{$}}		; CHECK: i32.const $[[REG:[^,]+]]=
; CHECK: br_table $[[REG]],		; CHECK: br_table $[[REG]],
define void @test0(double* %arg, i32 %arg1, i32 %arg2, i32 %arg3) {		define void @test0(double* %arg, i32 %arg1, i32 %arg2, i32 %arg3) {
bb:		bb:
%tmp = icmp eq i32 %arg2, 0		%tmp = icmp eq i32 %arg2, 0
br i1 %tmp, label %bb6, label %bb3		br i1 %tmp, label %bb6, label %bb3

bb3:		bb3:
%tmp4 = getelementptr double, double* %arg, i32 %arg3		%tmp4 = getelementptr double, double* %arg, i32 %arg3
Show All 24 Lines
bb19:		bb19:
ret void		ret void
}		}

; A simple loop with two entries and an inner natural loop.		; A simple loop with two entries and an inner natural loop.

; CHECK-LABEL: test1:		; CHECK-LABEL: test1:
; CHECK: f64.load		; CHECK: f64.load
; CHECK: i32.const $[[REG:[^,]+]]=, 0{{$}}		; CHECK: i32.const $[[REG:[^,]+]]=
; CHECK: br_table $[[REG]],		; CHECK: br_table $[[REG]],
define void @test1(double* %arg, i32 %arg1, i32 %arg2, i32 %arg3) {		define void @test1(double* %arg, i32 %arg1, i32 %arg2, i32 %arg3) {
bb:		bb:
%tmp = icmp eq i32 %arg2, 0		%tmp = icmp eq i32 %arg2, 0
br i1 %tmp, label %bb6, label %bb3		br i1 %tmp, label %bb6, label %bb3

bb3:		bb3:
%tmp4 = getelementptr double, double* %arg, i32 %arg3		%tmp4 = getelementptr double, double* %arg, i32 %arg3
Show All 25 Lines	bb13:
%tmp17 = fadd double %tmp14, 1.300000e+00		%tmp17 = fadd double %tmp14, 1.300000e+00
store double %tmp17, double* %tmp16, align 4		store double %tmp17, double* %tmp16, align 4
%tmp18 = add nsw i32 %tmp15, 1		%tmp18 = add nsw i32 %tmp15, 1
br label %bb6		br label %bb6

bb19:		bb19:
ret void		ret void
}		}

		; A simple loop with 2 blocks that are both entries.

		; CHECK-LABEL: test2:
		; CHECK: br_if
		; CHECK: i32.const $[[REG:[^,]+]]=
		; CHECK: br_table $[[REG]],
		define internal i32 @test2(i32) noinline {
		entry:
		br label %A0

		A0:
		%a0a = tail call i32 @test2(i32 1)
		%a0b = icmp eq i32 %a0a, 0
		br i1 %a0b, label %A1, label %A2

		A1:
		%a1a = tail call i32 @test2(i32 2)
		%a1b = icmp eq i32 %a1a, 0
		br i1 %a1b, label %A1, label %A2

		A2:
		%a2a = tail call i32 @test2(i32 3)
		%a2b = icmp eq i32 %a2a, 0
		br i1 %a2b, label %A1, label %A2
		}

		; An interesting loop with inner loop and if-else structure too.

		; CHECK-LABEL: test3:
		; CHECK: br_if
		define void @test3(i32 %ws) {
		entry:
		%ws.addr = alloca i32, align 4
		store volatile i32 %ws, i32* %ws.addr, align 4
		%0 = load volatile i32, i32* %ws.addr, align 4
		%tobool = icmp ne i32 %0, 0
		br i1 %tobool, label %if.then, label %if.end

		if.then: ; preds = %entry
		br label %wynn

		if.end: ; preds = %entry
		%1 = load volatile i32, i32* %ws.addr, align 4
		%tobool1 = icmp ne i32 %1, 0
		br i1 %tobool1, label %if.end9, label %if.then2

		if.then2: ; preds = %if.end
		br label %for.cond

		for.cond: ; preds = %wynn, %if.then7, %if.then2
		%2 = load volatile i32, i32* %ws.addr, align 4
		%tobool3 = icmp ne i32 %2, 0
		br i1 %tobool3, label %if.then4, label %if.end5

		if.then4: ; preds = %for.cond
		br label %if.end5

		if.end5: ; preds = %if.then4, %for.cond
		%3 = load volatile i32, i32* %ws.addr, align 4
		%tobool6 = icmp ne i32 %3, 0
		br i1 %tobool6, label %if.then7, label %if.end8

		if.then7: ; preds = %if.end5
		br label %for.cond

		if.end8: ; preds = %if.end5
		br label %wynn

		wynn: ; preds = %if.end8, %if.then
		br label %for.cond

		if.end9: ; preds = %if.end
		ret void
		}

		; Multi-level irreducibility, after reducing in the main scope we must then
		; reduce in the inner loop that we just created.
		; CHECK: br_table
		; CHECK: br_table
		define void @pi_next() {
		entry:
		br i1 undef, label %sw.bb5, label %return

		sw.bb5: ; preds = %entry
		br i1 undef, label %if.then.i49, label %if.else.i52

		if.then.i49: ; preds = %sw.bb5
		br label %for.inc197.i

		if.else.i52: ; preds = %sw.bb5
		br label %for.cond57.i

		for.cond57.i: ; preds = %for.inc205.i, %if.else.i52
		store i32 0, i32* undef, align 4
		br label %for.cond65.i

		for.cond65.i: ; preds = %for.inc201.i, %for.cond57.i
		br i1 undef, label %for.body70.i, label %for.inc205.i

		for.body70.i: ; preds = %for.cond65.i
		br label %for.cond76.i

		for.cond76.i: ; preds = %for.inc197.i, %for.body70.i
		%0 = phi i32 [ %inc199.i, %for.inc197.i ], [ 0, %for.body70.i ]
		%cmp81.i = icmp slt i32 %0, 0
		br i1 %cmp81.i, label %for.body82.i, label %for.inc201.i

		for.body82.i: ; preds = %for.cond76.i
		br label %for.inc197.i

		for.inc197.i: ; preds = %for.body82.i, %if.then.i49
		%inc199.i = add nsw i32 undef, 1
		br label %for.cond76.i

		for.inc201.i: ; preds = %for.cond76.i
		br label %for.cond65.i

		for.inc205.i: ; preds = %for.cond65.i
		br label %for.cond57.i

		return: ; preds = %entry
		ret void
		}
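
The effect of the dispatch block that these tests check for (an i32.const feeding a br_table) can be illustrated with a hand-written analogue. This is only a sketch under stated assumptions: runDispatched is a hypothetical function, the two loop blocks are modelled after A1 and A2 in test2, and the Conds vector stands in for the branch conditions so that the example terminates. It is not what the pass emits; it shows why routing every edge into the loop through one switch leaves the loop with a single entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simulation of a two-entry loop {A1, A2} after rewriting.
// FirstLabel selects which block is entered first (0 = A1, 1 = A2).
// Each element of Conds is the branch condition evaluated at the end of the
// visited block: true means "go to A1", false means "go to A2".
std::vector<int> runDispatched(int FirstLabel, const std::vector<bool> &Conds) {
  std::vector<int> Visited;
  int Label = FirstLabel; // Plays the role of the i32 register fed to br_table.
  for (bool Cond : Conds) {
    // Single dispatch point: the only way into either loop block.
    switch (Label) {
    case 0: // Block A1.
      Visited.push_back(1);
      break;
    default: // Block A2; a br_table likewise needs a default target.
      Visited.push_back(2);
      break;
    }
    // Instead of branching to A1 or A2 directly, store the target index and
    // return to the dispatch point.
    Label = Cond ? 0 : 1;
  }
  return Visited;
}
```

Entering at A2 and then taking the "go to A1" edge twice visits A2, A1, A1, and every transfer passes through the switch, which is the property the rewritten control flow needs in order to be expressible with structured WebAssembly loops.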