This is an archive of the discontinued LLVM Phabricator instance.

Differential D58919

[WebAssembly] Irreducible control flow rewrite
Closed, Public

Authored by kripken on Mar 4 2019, 12:31 PM.


Details

Reviewers

aheejin

Commits

rGa41250c7be56: [WebAssembly] Irreducible control flow rewrite
rL356313: [WebAssembly] Irreducible control flow rewrite

Summary

Rewrite WebAssemblyFixIrreducibleControlFlow to a simpler and cleaner design,
which directly computes reachability and other properties itself. This avoids
previous complexity and bugs. (The new graph analyses are very similar to how
the Relooper algorithm would find loop entries and so forth.)

This fixes a few bugs, including where we had a false positive and thought
fannkuch was irreducible when it was not, which made us much larger and
slower there, and a reverse bug where we missed irreducibility. On fannkuch,
we used to be 44% slower than asm2wasm and are now 4% faster.
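
As a rough illustration of "directly computes reachability", here is a minimal standalone sketch on a toy graph. It is not the code from this patch: blocks are plain integers, `computeReachable` and `findLoopers` are hypothetical names, and reachability is found with a simple depth-first walk from every block rather than the worklist the pass uses.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG: blocks are integer ids, Succs[B] lists the successors of block B.
using Graph = std::vector<std::vector<int>>;

// For every block, collect the blocks reachable by following one or more
// edges. Edges that target Entry are skipped, mirroring how a region ignores
// branches back to its own entry.
std::vector<std::set<int>> computeReachable(const Graph &Succs, int Entry) {
  std::vector<std::set<int>> Reachable(Succs.size());
  for (int Start = 0; Start < static_cast<int>(Succs.size()); ++Start) {
    std::vector<int> Stack{Start};
    while (!Stack.empty()) {
      int B = Stack.back();
      Stack.pop_back();
      for (int S : Succs[B]) {
        // Only continue from S the first time Start is seen to reach it.
        if (S != Entry && Reachable[Start].insert(S).second)
          Stack.push_back(S);
      }
    }
  }
  return Reachable;
}

// A "looper" is a block that can reach itself.
std::set<int> findLoopers(const std::vector<std::set<int>> &Reachable) {
  std::set<int> Loopers;
  for (int B = 0; B < static_cast<int>(Reachable.size()); ++B)
    if (Reachable[B].count(B))
      Loopers.insert(B);
  return Loopers;
}
```

For a graph with edges 0->1, 0->2, 1->2 and 2->1, blocks 1 and 2 are loopers, and both are entered directly from block 0, which is the classic two-entry (irreducible) shape.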

Diff Detail

Repository: rL LLVM

Event Timeline

kripken created this revision. Mar 4 2019, 12:31 PM

Herald added a project: Restricted Project. Mar 4 2019, 12:31 PM

Herald added subscribers: llvm-commits, jdoerfert, mgrang and 4 others.

Sorry for the delayed reply!

What changes fixed the previous bug?

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
30 ↗	(On Diff #189189)	Why is this?
144 ↗	(On Diff #189189)	Real nit: We can merge these into a single line: MachineBasicBlock *MBB, *Succ;
158 ↗	(On Diff #189189)	We used to have the description on the definitions of 'loopers' and 'non-loopers' in the comment at the beginning of the file, but not anymore. Can we have that again here maybe?
167 ↗	(On Diff #189189)	Does this definition cover the case where `Entry` is an entry (or header) of a loop? I'm wondering because the reachability computation here does not consider backedges to `Entry`, so it will not be discovered as a looper. Also, does it cover this case?
		  bb0:
		    br bb1
		  bb1:
		    br_if bb0
		    br bb2
		  bb2:
		    br bb3
		  bb3:
		    br bb2
		Here bb0 and bb1 comprise one loop, and bb2 and bb3 comprise another. All four blocks are loopers, but only bb0 and bb2 are loop entries. They are not reachable from any non-looper, though, because there are no non-loopers in this CFG. (Also, bb0 is the header of a loop here, so it belongs to the first case, where `Entry` is a header.)
201 ↗	(On Diff #189189)	Why is this worklist a set?
222 ↗	(On Diff #189189)	Do we need to make this a separate class now? Unlike the previous version, we create the `LoopFixer` object only once, in `WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction`. I think we can paste the body of `LoopFixer::run` into `WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction` and make `LoopFixer::processRegion` a member function of `WebAssemblyFixIrreducibleControlFlow` that returns a bool.
270 ↗	(On Diff #189189)	I might be mistaken, but the algorithm seems to not care about whether loop entries found are within the same level loop. For example, the case where there are two-level nested loops and block A is one of the entries to the outer loop and B is one of the entries for the inner loop. Even in this case, we don't distinguish them and resolve these with a single pass with a single dispatch block. Is this OK and intended? And for the 'mutual entries' above, can't they also be not in the same loop but each in an outer loop and an inner loop?
275 ↗	(On Diff #189189)	How about reducing the depth of the subloop processing routine below by wrapping up this loop here early, like
		  while (FoundIrreducibility) {
		    ReachabilityGraph Graph(Entry, Blocks);
		    bool FoundIrreducibility = false;
		    for (auto *LoopEntry : Graph.getLoopEntries()) {
		      ...
		    }
		  }
		  for (auto *LoopEntry : Graph.getLoopEntries()) {
		    ...
		  }
		Btw, is there any case where we can find irreducibility again in the same region before we recurse into the inner loops below? If there is, how is that case different from the case that's processed below on the inner loops?
286 ↗	(On Diff #189189)	In which case do we need to recurse into inner loops? (Given that we didn't distinguish inner loops and outer loops above) Could you add some comments?
303 ↗	(On Diff #189189)	Nit: We can do in a single line BlockVector SortedEntries(Entries.begin(), Entries.end());
347 ↗	(On Diff #189189)	Is there any case `!Pair.second` is not set? Aren't all blocks in `SortedEntries` distinct?
371 ↗	(On Diff #189189)	This loop does not seem to check if a corresponding `Split` block for `Entry` has been already created. In case `Entry` has more than one predecessors, `Split` for `Entry` is gonna be created multiple times and `Map[Entry]` will be overwritten. We should do something like this inside the loop: if (Map.count(Entry)) continue;
378 ↗	(On Diff #189189)	Did you mean Entry->isLayoutSuccessor(Pred) ? In the current state all new blocks are gonna be appended at the end, because this condition is hardly likely to be satisfied.
390 ↗	(On Diff #189189)	Nit: There is a `BuildMI` function (https://github.com/llvm/llvm-project/blob/a946997c2482e4386549ee38b4bb154eb58efbb6/llvm/include/llvm/CodeGen/MachineInstrBuilder.h#L405-L410) that defaults to appending the instruction at the end, so you can just do
		  BuildMI(Split, DebugLoc(), TII.get(WebAssembly::CONST_I32), Reg)
		      .addImm(Indices[Entry]);
		  BuildMI(Split, DebugLoc(), TII.get(WebAssembly::BR)).addMBB(Dispatch);
		(Note that `Split` does not have `*` anymore)
test/CodeGen/WebAssembly/irreducible-cfg.ll
224 ↗	(On Diff #189189)	If this is without irreducibility, shouldn't this be in `non-irreducible-cfg.ll` below? Can we remove `#0`?
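
The `Split` blocks discussed above store an index into a register and then branch to the dispatch block. The effect of that rewrite can be sketched in plain C++, outside of LLVM entirely. Everything here is hypothetical and for illustration only: a toy CFG where block A enters a loop at either B or C, B branches to C, and C either branches back to B or exits. `Label` stands in for the helper variable and the `switch` roughly plays the role of the dispatch block.

```cpp
#include <cassert>
#include <string>

// Runs the toy CFG after the rewrite and returns the sequence of blocks
// visited. Every branch to B or C has become "set Label, go to the dispatch".
std::string runWithDispatch(bool EnterAtC, int Iterations) {
  std::string Trace = "A";
  // Block A picks which former loop entry to go to by setting the label.
  int Label = EnterAtC ? 1 : 0;
  int Remaining = Iterations;
  // The dispatch is now the single entry of the loop.
  while (true) {
    switch (Label) {
    case 0: // Block B.
      Trace += "B";
      Label = 1; // B always continues to C, via the dispatch.
      break;
    case 1: // Block C.
      Trace += "C";
      if (--Remaining <= 0)
        return Trace; // Leave the loop.
      Label = 0; // C branches back to B, via the dispatch.
      break;
    }
  }
}
```

Entering at B with two iterations visits A, B, C, B, C; entering at C with one iteration visits A, C. In both cases the loop is only ever entered through the dispatch.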

What changes fixed the previous bug?

Basically the rewrite avoids the entire previous approach of "canonicalization", of representing an inner loop by its entry. When we got that wrong, we could reach very wrong results.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
30 ↗	(On Diff #189189)	The NumLoops^2 part? To find mutual loop entries, that is, loop entries that can reach each other, we compare each one to the others. This is actually just the square of the number of loops in a single scope, so nested loops don't count. Anyhow, even in the "worst case" which is a loop after a loop after a loop, this factor is going to be quite small, I'd guess.
167 ↗	(On Diff #189189)	When we look at a region, the entry block cannot be a looper, which simplifies things. LLVM doesn't let a function entry be in a loop (maybe for similar reasons as why it's convenient here?), and when we recurse into a loop we ignore backedges to Entry intentionally (so we see the inner part of the loop, and not the looping). In other words, if a block is a looper, it will be seen as such in the outer scope it is in (where it is not the entry). When it is the entry into a region, and we are looking at just that region, it cannot be a looper there. For the second issue (with that example): yeah, the comment was not clear enough, I'll update it. The key is that a loop enterer is a block not in the same loop, that enters it. So here bb2 is a looper, and bb1 is a predecessor of it, and not in the same loop (bb2 can't reach bb1), so we identify bb2 as a loop entry and bb1 as a loop enterer for it.
201 ↗	(On Diff #189189)	As a set, it won't have duplicate entries - so if we visit A that causes us to add C to the future work, and then visit B that also causes us to add C to the future work, then when we get to C we'll visit it once, and avoid visiting it again later (and not having anything to do the second time). I haven't benchmarked this recently, but I remember it was beneficial in the Relooper.
222 ↗	(On Diff #189189)	Good idea, thanks!
270 ↗	(On Diff #189189)	Yeah, it's somewhat debatable (is an inner loop an "inner loop" if there's a branch going into it from several levels up?), but yes, when we see irreducibility we handle it now, instead of leaving it for the inner loop. We have to, I think - if we can branch to it now, we need to fix up that branch now - we wouldn't see it later when we recurse and ignore the outside. At least that would require a very different approach I suppose (which might be more optimal in some cases).
275 ↗	(On Diff #189189)	The problem is that the Graph is used outside of the loop in your code. I agree it's a little awkward as it is... another option is to use an std::unique_ptr to allow using it from the outside. But that's less clear I think. Yes, we might find irreducibility again if we have two irreducible loops one after the other (i.e. not nested). It's possible we don't need to look at the irreducible one we've already fixed, however, I don't have a proof of that. And we do need to recompute the Graph again after fixing one loop (since it can affect others; and trying to update it is fairly complex an optimization), so starting another loop iteration is the simple thing. The cost is another iteration per irreducible loop, which is not that bad since they are rare. I'll try to improve the comment here.
286 ↗	(On Diff #189189)	Good point, this is important to explain, adding.
347 ↗	(On Diff #189189)	This is code not changed in this patch (just some variable renaming), but I think you're right, fixing.
371 ↗	(On Diff #189189)	This is code that did not change in this patch (aside from names and clang-format), and I'm not sure I fully understand it (I think Dan wrote it?). But I don't think `Map[Entry]` can be overwritten? The shape of the code is
		  DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
		  for (auto *Entry : Pred->successors()) {
		    [..]
		    Map[Entry] = Split;
		  }
378 ↗	(On Diff #189189)	As before, this is Dan's code that I didn't change here (aside from variable names). I'm not sure what it does...
test/CodeGen/WebAssembly/irreducible-cfg.ll
224 ↗	(On Diff #189189)	The other file has -O0, and this one doesn't specify an opt level (so it's -O2 I guess?), so I kept them separate. Removed `#0`.
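
The "loop entry" and "loop enterer" definitions from the reply above can be tried out on the bb0..bb3 example with a small standalone sketch. This is an illustration under assumptions, not the pass itself: blocks are integers (bb0 = 0 and so on), `analyzeRegion` and `RegionLoopSketch` are hypothetical names, and reachability uses a Warshall-style closure over a boolean matrix instead of the worklist in the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

struct RegionLoopSketch {
  std::set<int> Loopers;
  std::set<int> LoopEntries;
  std::map<int, std::set<int>> LoopEnterers;
};

// Edges holds (from, to) pairs over blocks 0..NumBlocks-1. Edges that target
// Entry are ignored, as in the region view described above.
RegionLoopSketch analyzeRegion(int NumBlocks, int Entry,
                               const std::vector<std::pair<int, int>> &Edges) {
  // Reach[A][B] is true when B is reachable from A over one or more edges.
  std::vector<std::vector<bool>> Reach(NumBlocks,
                                       std::vector<bool>(NumBlocks, false));
  for (const auto &E : Edges)
    if (E.second != Entry)
      Reach[E.first][E.second] = true;
  // Warshall-style transitive closure.
  for (int K = 0; K < NumBlocks; ++K)
    for (int A = 0; A < NumBlocks; ++A)
      for (int B = 0; B < NumBlocks; ++B)
        if (Reach[A][K] && Reach[K][B])
          Reach[A][B] = true;

  RegionLoopSketch Info;
  // Blocks that can reach themselves are loopers.
  for (int B = 0; B < NumBlocks; ++B)
    if (Reach[B][B])
      Info.Loopers.insert(B);

  // A looper with a predecessor it cannot reach back to is a loop entry, and
  // that predecessor is a loop enterer for it.
  for (const auto &E : Edges) {
    int Pred = E.first, Looper = E.second;
    if (Looper == Entry || !Info.Loopers.count(Looper))
      continue;
    if (!Reach[Looper][Pred]) {
      Info.LoopEntries.insert(Looper);
      Info.LoopEnterers[Looper].insert(Pred);
    }
  }
  return Info;
}
```

With bb0 as the region entry, the backedge bb1 -> bb0 is dropped, so only bb2 and bb3 are loopers, bb2 is the single loop entry, and bb1 is its enterer. One entry means nothing needs fixing in this region.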

Review feedback changes, in particular:

Remove LoopFixer class.
Various comment additions and improvements.
Various nits.

Real nit: We usually start CL/commit headers with [WebAssembly]

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
64 ↗	(On Diff #190328)	Not sure what's happened, but it says "Context not available" on the diff on these lines. Could you rebase your CL onto the tip of the repo? We usually include `WebAssembly.h` in every pass, and I guess we need to include (Maybe it's not shown here because of the diff problem?)

There are several places in the diff that say "Context not available". I think the diff has some problems. (I thought it needed a rebase, but maybe that's not relevant.) Could you check?
In this new algorithm, all tests in irreducible-cfg-nested.ll, irreducible-cfg-nested2.ll, and irreducible-cfg-exceptions.ll are reducible now, i.e. this pass does not do anything on them. But some of them are irreducible if we pass -O0. (I only checked func_2 in irreducible-cfg-nested2.ll.) I guess we should check these tests and either delete the ones that are reducible or change the llc flags to -O0 or something to make them irreducible again. Also, test3 in irreducible-cfg.ll is now considered reducible too.
On second thought, wouldn't it be better to just specify -O0 for all irreducibility tests? -O2 basically means we don't know what would happen to our carefully created irreducible CFG.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
167 ↗	(On Diff #189189)	Oh I see. Thanks. I didn't know that function entries cannot be loop headers.
201 ↗	(On Diff #189189)	But isn't it possible that by the time you try to add C for the second time, the first C has been already erased from the set so you do the unnecessary work again? Usually what I've seen was making worklist as a vector and have a separate set called something like `Visited`, to which visited blocks are added and not erased.
270 ↗	(On Diff #189189)	Is there a condition in the code that we only consider inner loops that have entries from several levels up? I don't think I found one. We just treat all blocks within the region the same way. So, for example, an outer loop consists of blocks A, B, C, D, E, and F, and among them E and F comprises an inner loop. The outer loop has two entries: A and D from the blocks outside, and the inner loop also has two entries: E and F. E's enterer is C and F's enterer is D. In this case, can we solve this case in one pass with a single dispatch block that branches into all of A, D, E, and F? I guess I don't understand the algorithm :( Sorry for that. Please correct me if I'm mistaken.
275 ↗	(On Diff #189189)	I see. I missed the graph computation was there too.
371 ↗	(On Diff #189189)	Ah right, it's not gonna be overwritten. So the code structure is
		  for (MachineBasicBlock *Pred : AllPreds) {
		    DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
		    for (auto *Entry : Pred->successors()) {
		      ...
		      Map[Entry] = Split;
		    }
		Say we have entries E0 and E1 and predecessors P0 and P1, with edges P0->E0, P0->E1, P1->E0, and P1->E1, so both preds point to both entries. In this case, each entry (E0 or E1) is processed twice in the inner for loop. Because `Map` is created within the outer loop, it's not gonna be overwritten, but `Split` for each of E0 and E1 is gonna be created twice unnecessarily. (I know it's not part of the code that's changed in this CL, but I thought that while we're rewriting the major part of the pass, maybe we can fix other things too. If you'd rather not, maybe I can submit that as a separate patch. Let me know which one you prefer.)
378 ↗	(On Diff #189189)	(Because code parts have moved, this comment is not where it was anymore, so) The code is
		  // This is a successor we need to rewrite.
		  MachineBasicBlock *Split = MF.CreateMachineBasicBlock();
		  MF.insert(Pred->isLayoutSuccessor(Entry) ? MachineFunction::iterator(Entry)
		                                           : MF.end(),
		            Split);
		I think the intention of the original code was that, when we insert the `Split` which is to serve as a stepping stone between `Pred` and `Entry`, if `Pred` is currently right before `Entry` in the BB order, we want to insert `Split` between them, which makes a natural order. If not, we just append the new `Split` at the end of the function. I think this bug was there all along for years, and this was why all newly created blocks were appended at the end. I know this is not related to this CL, but if you prefer you can fix it maybe, or I can submit that as a separate patch.
390 ↗	(On Diff #189189)	Ping
230 ↗	(On Diff #190328)	Nit: Can we take out this and `makeSingleEntryLoop` from the class definition? I don't think we want to inline these long functions and we can have one less level of indentation outside. I mean, outside the class, bool WebAssemblyFixIrreducibleControlFlow::processRegion(...) { ... }
test/CodeGen/WebAssembly/irreducible-cfg.ll
224 ↗	(On Diff #189189)	You can use two different commands and check them separately within a single file.
		  ; RUN: llc < %s -O0 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s
		  ; RUN: llc < %s -O2 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck -check-prefix=OPT %s

		  ; OPT-NOT: br_table
		  define hidden void @ps_hints_apply() {
		    ...
		  }

		  ; CHECK-NOT: br_table
		  define hidden i32 @_Z15fannkuch_workerPv(i8* %_arg) {
		    ...
		  }
		Here is the manual for this `-check-prefix` option. Nit: Can we remove `hidden` too?
test/CodeGen/WebAssembly/non-irreducible-cfg.ll
12 ↗	(On Diff #190328)	Nit: Can we remove `hidden` and `#0`?
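
The "vector worklist plus a separate `Visited` set" pattern suggested above can be sketched as follows. This is a generic illustration rather than the code in the patch: blocks are integers and `visitOnce` is a hypothetical name. Because blocks are never erased from `Visited`, a block that several predecessors lead to is still queued only once.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Visits every block reachable from Start exactly once and returns the order
// in which blocks were processed. Succs[B] lists the successors of block B.
std::vector<int> visitOnce(const std::vector<std::vector<int>> &Succs,
                           int Start) {
  std::vector<int> WorkList{Start};
  std::set<int> Visited{Start};
  std::vector<int> Order;
  while (!WorkList.empty()) {
    int B = WorkList.back();
    WorkList.pop_back();
    Order.push_back(B);
    for (int S : Succs[B]) {
      // insert().second is false if S was seen before, even if it has
      // already been popped from the worklist.
      if (Visited.insert(S).second)
        WorkList.push_back(S);
    }
  }
  return Order;
}
```

In the A/B/C scenario from the thread (A and B both lead to C), C ends up in the result once, regardless of whether it was processed before or after B.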

aheejin added inline comments. Mar 13 2019, 4:18 AM

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
270 ↗	(On Diff #189189)	Oh, nevermind. I think I got it. :) Inner loop entries that are not reachable from outside will not be identified as loop entries when we examine the outer region, because those inner loop entries can also reach their preds, as long as the preds are not from outside and in an outer loop.
259 ↗	(On Diff #190328)	Nit: How about renaming this to `EntriesForSingleLoop` or something and starting by inserting `LoopEntry` first? (I don't really like the name `EntriesForSingleLoop`, but can't think of a better alternative now... I guess there can be better ones)
		  BlockSet EntriesForSingleLoop;
		  EntriesForSingleLoop.insert(LoopEntry);
		  for (auto *OtherLoopEntry : Graph.getLoopEntries()) {
		    if (Graph.canReach(LoopEntry, OtherLoopEntry) &&
		        Graph.canReach(OtherLoopEntry, LoopEntry))
		      EntriesForSingleLoop.insert(OtherLoopEntry);
		  }
		  if (EntriesForSingleLoop.size() > 1) {
		    makeSingleEntryLoop(EntriesForSingleLoop, Blocks, MF);
		    ...
		  }
		  ...
		That way we don't need to `std::move` this later and don't need to check `if (OtherLoopEntry != LoopEntry)`. And in my opinion `if (EntriesForSingleLoop.size() > 1)` makes it clear that we take a multiple entry loop and transform it into a single entry loop. This is just a matter of style preference, so if you don't like it, please ignore it.
292 ↗	(On Diff #190328)	we may only alter branches 'in' existing..?
test/CodeGen/WebAssembly/irreducible-cfg.ll
224 ↗	(On Diff #189189)	On second thought, wouldn't it be better to just specify -O0 for all irreducibility tests? -O2 basically means we don't know what would happen to our carefully created irreducible CFG.
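
The grouping of mutual loop entries suggested above (start from `LoopEntry`, add every other entry that can reach it and be reached from it) can be sketched on plain integers. This is a hedged illustration: `mutualEntriesFor` and `ReachMatrix` are hypothetical names, and the reachability matrix is assumed to have been computed already.

```cpp
#include <cassert>
#include <set>
#include <vector>

// CanReach[A][B] is true when block B is reachable from block A.
using ReachMatrix = std::vector<std::vector<bool>>;

// Collects LoopEntry together with every other loop entry that is mutually
// reachable with it. A result of size > 1 indicates a multi-entry loop.
std::set<int> mutualEntriesFor(int LoopEntry, const std::set<int> &LoopEntries,
                               const ReachMatrix &CanReach) {
  std::set<int> Mutual{LoopEntry};
  for (int Other : LoopEntries)
    if (CanReach[LoopEntry][Other] && CanReach[Other][LoopEntry])
      Mutual.insert(Other);
  return Mutual;
}
```

For example, if blocks 1 and 2 form a loop entered at both of them, and block 3 is a separate self-loop after it, then the group for entry 1 is {1, 2} (needs fixing) while the group for entry 3 is just {3}.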

kripken marked 11 inline comments as done. Mar 13 2019, 1:40 PM

kripken added inline comments.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
201 ↗	(On Diff #189189)	The way you suggest is probably faster overall, I agree, I'll change it to that.
371 ↗	(On Diff #189189)	I'd prefer we do it separately since this patch already rewrites all the other code in this file, aside from that one unchanged function. Seems safer to change it later.
390 ↗	(On Diff #189189)	Same issue as before - I'd prefer to not modify this one function in this patch (which changes everything aside from that one function), for safety. (I did rename a bunch of variables, because I wanted to keep names consistent throughout the file - that seemed necessary in this patch.)
64 ↗	(On Diff #190328)	Rebasing - hopefully that fixes the context issue. Yeah, WebAssembly.h was included already, so might be the diff context problem.
259 ↗	(On Diff #190328)	I agree the `move` is not nice. I think we can just add LoopEntry to MutualLoopEntries and simplify things that way, updating the patch with that, let me know what you think.
292 ↗	(On Diff #190328)	Thanks, I'll fix that.
test/CodeGen/WebAssembly/irreducible-cfg.ll
224 ↗	(On Diff #189189)	Removed `hidden`, and changed to -O0 in irreducible-cfg. Yeah, that seems like the right thing to do for irreducibility tests. I didn't want to change `irreducible-cfg-exceptions.ll`'s optimization level though because I'm not sure that wouldn't invalidate it.

Various changes from the feedback:

Change how the LoopBlocks work list works, using a set+vector instead of just a set.
Move large functions out of the class definition.
Avoid a move in MutualEntries, instead have them always contain all the mutual entries (which includes the original entry, as it is mutual to itself).
Test cleanup; use -O0 for the main irreducibility tests; remove old irreducibility tests which were invalid.
Comment improvements.

Sorry, but at least some of the tests in irreducible-cfg-nested.ll and irreducible-cfg-nested2.ll were still irreducible if we passed -O0. Do we want to delete all of them? Shouldn't we keep the ones that are still irreducible with -O0?
Real nit: We usually start CL/commit headers with [WebAssembly]

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
259 ↗	(On Diff #190328)	I think this is better, thanks.

aheejin added inline comments. Mar 13 2019, 6:53 PM

test/CodeGen/WebAssembly/irreducible-cfg.ll
288 ↗	(On Diff #190491)	Nit: newline at the end

kripken retitled this revision from "WebAssembly: Irreducible control flow rewrite" to "[WebAssembly] Irreducible control flow rewrite". Mar 14 2019, 2:05 PM

Updates:

Thanks, indeed one of the tests removed before (func_2) was still irreducible at -O0, restored it into the main irreducible file test, which is now at -O0. The other (tre_parse) was actually not irreducible, so we mis-identified it, and I left it out.
Fixed extra newline at the end of a test.
Renamed the title to have [WebAssembly]

LGTM, thank you. Do you want me to commit this for you? Or are you gonna get a commit access?

This revision is now accepted and ready to land. Mar 15 2019, 5:36 PM

Great, and thanks for the comments!

I should probably get commit access, yeah :) Can you please commit this one, though, so it doesn't wait on that?

Closed by commit rL356313: [WebAssembly] Irreducible control flow rewrite (authored by aheejin). Mar 15 2019, 7:59 PM

This revision was automatically updated to reflect the committed changes.

aheejin added inline comments. Mar 18 2019, 6:50 PM

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
378 ↗	(On Diff #189189)	Just letting you know, I was incorrect on this part; the code was correct. I realized that while I was working on D59462..

Revision Contents

Path

Size

llvm/

trunk/

lib/

Target/

WebAssembly/

WebAssemblyFixIrreducibleControlFlow.cpp

529 lines

test/

CodeGen/

WebAssembly/

irreducible-cfg-nested.ll

63 lines

irreducible-cfg-nested2.ll

39 lines

irreducible-cfg.ll

103 lines

Diff 190941

llvm/trunk/lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp

	//=- WebAssemblyFixIrreducibleControlFlow.cpp - Fix irreducible control flow -//			//=- WebAssemblyFixIrreducibleControlFlow.cpp - Fix irreducible control flow -//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	///			///
	/// \file			/// \file
	/// This file implements a pass that transforms irreducible control flow into			/// This file implements a pass that removes irreducible control flow.
	/// reducible control flow. Irreducible control flow means multiple-entry			/// Irreducible control flow means multiple-entry loops, which this pass
	/// loops; they appear as CFG cycles that are not recorded in MachineLoopInfo			/// transforms to have a single entry.
	/// due to being unnatural.
	///			///
	/// Note that LLVM has a generic pass that lowers irreducible control flow, but			/// Note that LLVM has a generic pass that lowers irreducible control flow, but
	/// it linearizes control flow, turning diamonds into two triangles, which is			/// it linearizes control flow, turning diamonds into two triangles, which is
	/// both unnecessary and undesirable for WebAssembly.			/// both unnecessary and undesirable for WebAssembly.
	///			///
	/// The big picture: Ignoring natural loops (seeing them monolithically), we			/// The big picture: We recursively process each "region", defined as a group
	/// find all the blocks which can return to themselves ("loopers"). Loopers			/// of blocks with a single entry and no branches back to that entry. A region
	/// reachable from the non-loopers are loop entries: if there are 2 or more,			/// may be the entire function body, or the inner part of a loop, i.e., the
	/// then we have irreducible control flow. We fix that as follows: a new block			/// loop's body without branches back to the loop entry. In each region we fix
	/// is created that can dispatch to each of the loop entries, based on the			/// up multi-entry loops by adding a new block that can dispatch to each of the
	/// value of a label "helper" variable, and we replace direct branches to the			/// loop entries, based on the value of a label "helper" variable, and we
	/// entries with assignments to the label variable and a branch to the dispatch			/// replace direct branches to the entries with assignments to the label
	/// block. Then the dispatch block is the single entry in a new natural loop.			/// variable and a branch to the dispatch block. Then the dispatch block is the
				/// single entry in the loop containing the previous multiple entries. After
				/// ensuring all the loops in a region are reducible, we recurse into them. The
				/// total time complexity of this pass is:
				/// O(NumBlocks * NumNestedLoops * NumIrreducibleLoops +
				/// NumLoops * NumLoops)
	///			///
	/// This is similar to what the Relooper [1] does, both identify looping code			/// This pass is similar to what the Relooper [1] does. Both identify looping
	/// that requires multiple entries, and resolve it in a similar way. In			/// code that requires multiple entries, and resolve it in a similar way (in
	/// Relooper terminology, we implement a Multiple shape in a Loop shape. Note			/// Relooper terminology, we implement a Multiple shape in a Loop shape). Note
	/// also that like the Relooper, we implement a "minimal" intervention: we only			/// also that like the Relooper, we implement a "minimal" intervention: we only
	/// use the "label" helper for the blocks we absolutely must and no others. We			/// use the "label" helper for the blocks we absolutely must and no others. We
	/// also prioritize code size and do not perform node splitting (i.e. we don't			/// also prioritize code size and do not duplicate code in order to resolve
	/// duplicate code in order to resolve irreducibility).			/// irreducibility. The graph algorithms for finding loops and entries and so
	///			/// forth are also similar to the Relooper. The main differences between this
	/// The difference between this code and the Relooper is that the Relooper also			/// pass and the Relooper are:
	/// generates ifs and loops and works in a recursive manner, knowing at each			/// * We just care about irreducibility, so we just look at loops.
	/// point what the entries are, and recursively breaks down the problem. Here			/// * The Relooper emits structured control flow (with ifs etc.), while we
	/// we just want to resolve irreducible control flow, and we also want to use			/// emit a CFG.
	/// as much LLVM infrastructure as possible. So we use the MachineLoopInfo to
	/// identify natural loops, etc., and we start with the whole CFG and must
	/// identify both the looping code and its entries.
	///			///
	/// [1] Alon Zakai. 2011. Emscripten: an LLVM-to-JavaScript compiler. In			/// [1] Alon Zakai. 2011. Emscripten: an LLVM-to-JavaScript compiler. In
	/// Proceedings of the ACM international conference companion on Object oriented			/// Proceedings of the ACM international conference companion on Object oriented
	/// programming systems languages and applications companion (SPLASH '11). ACM,			/// programming systems languages and applications companion (SPLASH '11). ACM,
	/// New York, NY, USA, 301-312. DOI=10.1145/2048147.2048224			/// New York, NY, USA, 301-312. DOI=10.1145/2048147.2048224
	/// http://doi.acm.org/10.1145/2048147.2048224			/// http://doi.acm.org/10.1145/2048147.2048224
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	Show All 14 Lines
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "wasm-fix-irreducible-control-flow"			#define DEBUG_TYPE "wasm-fix-irreducible-control-flow"

	namespace {			namespace {

	class LoopFixer {			using BlockVector = SmallVector<MachineBasicBlock *, 4>;
				using BlockSet = SmallPtrSet<MachineBasicBlock *, 4>;

				// Calculates reachability in a region. Ignores branches to blocks outside of
				// the region, and ignores branches to the region entry (for the case where
				// the region is the inner part of a loop).
				class ReachabilityGraph {
	public:			public:
	LoopFixer(MachineFunction &MF, MachineLoopInfo &MLI, MachineLoop *Loop)			ReachabilityGraph(MachineBasicBlock *Entry, const BlockSet &Blocks)
	: MF(MF), MLI(MLI), Loop(Loop) {}			: Entry(Entry), Blocks(Blocks) {
				#ifndef NDEBUG
				// The region must have a single entry.
				for (auto *MBB : Blocks) {
				if (MBB != Entry) {
				for (auto *Pred : MBB->predecessors()) {
				assert(inRegion(Pred));
				}
				}
				}
				#endif
				calculate();
				}

				bool canReach(MachineBasicBlock *From, MachineBasicBlock *To) {
				assert(inRegion(From) && inRegion(To));
				return Reachable[From].count(To);
				}

	// Run the fixer on the given inputs. Returns whether changes were made.			// "Loopers" are blocks that are in a loop. We detect these by finding blocks
	bool run();			// that can reach themselves.
				const BlockSet &getLoopers() { return Loopers; }

				// Get all blocks that are loop entries.
				const BlockSet &getLoopEntries() { return LoopEntries; }

				// Get all blocks that enter a particular loop from outside.
				const BlockSet &getLoopEnterers(MachineBasicBlock *LoopEntry) {
				assert(inRegion(LoopEntry));
				return LoopEnterers[LoopEntry];
				}

	private:			private:
	MachineFunction &MF;			MachineBasicBlock *Entry;
	MachineLoopInfo &MLI;			const BlockSet &Blocks;
	MachineLoop *Loop;

	MachineBasicBlock *Header;			BlockSet Loopers, LoopEntries;
	SmallPtrSet<MachineBasicBlock *, 4> LoopBlocks;			DenseMap<MachineBasicBlock *, BlockSet> LoopEnterers;

	using BlockSet = SmallPtrSet<MachineBasicBlock *, 4>;			bool inRegion(MachineBasicBlock *MBB) { return Blocks.count(MBB); }

				// Maps a block to all the other blocks it can reach.
	DenseMap<MachineBasicBlock *, BlockSet> Reachable;			DenseMap<MachineBasicBlock *, BlockSet> Reachable;

	// The worklist contains pairs of recent additions, (a, b), where we just			void calculate() {
	// added a link a => b.			// Reachability computation work list. Contains pairs of recent additions
				// (A, B) where we just added a link A => B.
	using BlockPair = std::pair<MachineBasicBlock *, MachineBasicBlock *>;			using BlockPair = std::pair<MachineBasicBlock *, MachineBasicBlock *>;
	SmallVector<BlockPair, 4> WorkList;			SmallVector<BlockPair, 4> WorkList;

	// Get a canonical block to represent a block or a loop: the block, or if in			// Add all relevant direct branches.
	// an inner loop, the loop header, of it in an outer loop scope, we can			for (auto *MBB : Blocks) {
	// ignore it. We need to call this on all blocks we work on.			for (auto *Succ : MBB->successors()) {
	MachineBasicBlock *canonicalize(MachineBasicBlock *MBB) {			if (Succ != Entry && inRegion(Succ)) {
	MachineLoop *InnerLoop = MLI.getLoopFor(MBB);			Reachable[MBB].insert(Succ);
	if (InnerLoop == Loop) {			WorkList.emplace_back(MBB, Succ);
	return MBB;			}
	} else {
	// This is either in an outer or an inner loop, and not in ours.
	if (!LoopBlocks.count(MBB)) {
	// It's in outer code, ignore it.
	return nullptr;
	}
	assert(InnerLoop);
	// It's in an inner loop, canonicalize it to the header of that loop.
	return InnerLoop->getHeader();
	}			}
	}			}

	// For a successor we can additionally ignore it if it's a branch back to a			while (!WorkList.empty()) {
	// natural loop top, as when we are in the scope of a loop, we just care			MachineBasicBlock *MBB, *Succ;
	// about internal irreducibility, and can ignore the loop we are in. We need			std::tie(MBB, Succ) = WorkList.pop_back_val();
	// to call this on all blocks in a context where they are a successor.			assert(inRegion(MBB) && Succ != Entry && inRegion(Succ));
	MachineBasicBlock *canonicalizeSuccessor(MachineBasicBlock *MBB) {			if (MBB != Entry) {
	if (Loop && MBB == Loop->getHeader()) {			// We recently added MBB => Succ, and that means we may have enabled
	// Ignore branches going to the loop's natural header.			// Pred => MBB => Succ.
	return nullptr;			for (auto *Pred : MBB->predecessors()) {
	}			if (Reachable[Pred].insert(Succ).second) {
	return canonicalize(MBB);			WorkList.emplace_back(Pred, Succ);
	}			}

	// Potentially insert a new reachable edge, and if so, note it as further
	// work.
	void maybeInsert(MachineBasicBlock *MBB, MachineBasicBlock *Succ) {
	assert(MBB == canonicalize(MBB));
	assert(Succ);
	// Succ may not be interesting as a successor.
	Succ = canonicalizeSuccessor(Succ);
	if (!Succ)
	return;
	if (Reachable[MBB].insert(Succ).second) {
	// For there to be further work, it means that we have
	// X => MBB => Succ
	// for some other X, and in that case X => Succ would be a new edge for
	// us to discover later. However, if we don't care about MBB as a
	// successor, then we don't care about that anyhow.
	if (canonicalizeSuccessor(MBB)) {
	WorkList.emplace_back(MBB, Succ);
	}			}
	}			}
	}			}
	};

	bool LoopFixer::run() {			// Blocks that can return to themselves are in a loop.
	Header = Loop ? Loop->getHeader() : &*MF.begin();			for (auto *MBB : Blocks) {
				if (canReach(MBB, MBB)) {
				Loopers.insert(MBB);
				}
				}
				assert(!Loopers.count(Entry));

	// Identify all the blocks in this loop scope.			// Find the loop entries - loopers reachable from blocks not in that loop -
	if (Loop) {			// and those outside blocks that reach them, the "loop enterers".
	for (auto *MBB : Loop->getBlocks()) {			for (auto *Looper : Loopers) {
	LoopBlocks.insert(MBB);			for (auto *Pred : Looper->predecessors()) {
				// Pred can reach Looper. If Looper can reach Pred, it is in the loop;
				// otherwise, it is a block that enters into the loop.
				if (!canReach(Looper, Pred)) {
				LoopEntries.insert(Looper);
				LoopEnterers[Looper].insert(Pred);
				}
	}			}
	} else {
	for (auto &MBB : MF) {
	LoopBlocks.insert(&MBB);
	}			}
	}			}
				};

	// Compute which (canonicalized) blocks each block can reach.			// Finds the blocks in a single-entry loop, given the loop entry and the
				// list of blocks that enter the loop.
				class LoopBlocks {
				public:
				LoopBlocks(MachineBasicBlock *Entry, const BlockSet &Enterers)
				: Entry(Entry), Enterers(Enterers) {
				calculate();
				}

	// Add all the initial work.			BlockSet &getBlocks() { return Blocks; }
	for (auto *MBB : LoopBlocks) {
	MachineLoop *InnerLoop = MLI.getLoopFor(MBB);

	if (InnerLoop == Loop) {			private:
	for (auto *Succ : MBB->successors()) {			MachineBasicBlock *Entry;
	maybeInsert(MBB, Succ);			const BlockSet &Enterers;
	}
	} else {			BlockSet Blocks;
	// It can't be in an outer loop - we loop on LoopBlocks - and so it must
	// be an inner loop.			void calculate() {
	assert(InnerLoop);			// Going backwards from the loop entry, if we ignore the blocks entering
	// Check if we are the canonical block for this loop.			// from outside, we will traverse all the blocks in the loop.
	if (canonicalize(MBB) != MBB) {			BlockVector WorkList;
	continue;			BlockSet AddedToWorkList;
	}			Blocks.insert(Entry);
	// The successors are those of the loop.			for (auto *Pred : Entry->predecessors()) {
	SmallVector<MachineBasicBlock *, 2> ExitBlocks;			if (!Enterers.count(Pred)) {
	InnerLoop->getExitBlocks(ExitBlocks);			WorkList.push_back(Pred);
	for (auto *Succ : ExitBlocks) {			AddedToWorkList.insert(Pred);
	maybeInsert(MBB, Succ);
	}
	}			}
	}			}

	// Do work until we are all done.
	while (!WorkList.empty()) {			while (!WorkList.empty()) {
	MachineBasicBlock *MBB;			auto *MBB = WorkList.pop_back_val();
	MachineBasicBlock *Succ;			assert(!Enterers.count(MBB));
	std::tie(MBB, Succ) = WorkList.pop_back_val();			if (Blocks.insert(MBB).second) {
	// The worklist item is an edge we just added, so it must have valid blocks
	// (and not something canonicalized to nullptr).
	assert(MBB);
	assert(Succ);
	// The successor in that pair must also be a valid successor.
	assert(MBB == canonicalizeSuccessor(MBB));
	// We recently added MBB => Succ, and that means we may have enabled
	// Pred => MBB => Succ. Check all the predecessors. Note that our loop here
	// is correct for both a block and a block representing a loop, as the loop
	// is natural and so the predecessors are all predecessors of the loop
	// header, which is the block we have here.
	for (auto *Pred : MBB->predecessors()) {			for (auto *Pred : MBB->predecessors()) {
	// Canonicalize, make sure it's relevant, and check it's not the same			if (!AddedToWorkList.count(Pred)) {
	// block (an update to the block itself doesn't help compute that same			WorkList.push_back(Pred);
	// block).			AddedToWorkList.insert(Pred);
	Pred = canonicalize(Pred);			}
	if (Pred && Pred != MBB) {			}
	maybeInsert(Pred, Succ);
	}			}
	}			}
	}			}
				};

	// It's now trivial to identify the loopers.			class WebAssemblyFixIrreducibleControlFlow final : public MachineFunctionPass {
	SmallPtrSet<MachineBasicBlock *, 4> Loopers;			StringRef getPassName() const override {
	for (auto MBB : LoopBlocks) {			return "WebAssembly Fix Irreducible Control Flow";
	if (Reachable[MBB].count(MBB)) {
	Loopers.insert(MBB);
	}			}

				bool runOnMachineFunction(MachineFunction &MF) override;

				bool processRegion(MachineBasicBlock *Entry, BlockSet &Blocks,
				MachineFunction &MF);

				void makeSingleEntryLoop(BlockSet &Entries, BlockSet &Blocks,
				MachineFunction &MF);

				public:
				static char ID; // Pass identification, replacement for typeid
				WebAssemblyFixIrreducibleControlFlow() : MachineFunctionPass(ID) {}
				};

				bool WebAssemblyFixIrreducibleControlFlow::processRegion(
				MachineBasicBlock *Entry, BlockSet &Blocks, MachineFunction &MF) {
				bool Changed = false;

				// Remove irreducibility before processing child loops, which may take
				// multiple iterations.
				while (true) {
				ReachabilityGraph Graph(Entry, Blocks);

				bool FoundIrreducibility = false;

				for (auto *LoopEntry : Graph.getLoopEntries()) {
				// Find mutual entries - all entries which can reach this one, and
				// are reached by it (that always includes LoopEntry itself). All mutual
				// entries must be in the same loop, so if we have more than one, then we
				// have irreducible control flow.
				//
				// Note that irreducibility may involve inner loops, e.g. imagine A
				// starts one loop, and it has B inside it which starts an inner loop.
				// If we add a branch from all the way on the outside to B, then in a
				// sense B is no longer an "inner" loop, semantically speaking. We will
				// fix that irreducibility by adding a block that dispatches to either
				// A or B, so B will no longer be an inner loop in our output.
				// (A fancier approach might try to keep it as such.)
				//
				// Note that we still need to recurse into inner loops later, to handle
				// the case where the irreducibility is entirely nested - we would not
				// be able to identify that at this point, since the enclosing loop is
				// a group of blocks all of whom can reach each other. (We'll see the
				// irreducibility after removing branches to the top of that enclosing
				// loop.)
				BlockSet MutualLoopEntries;
				MutualLoopEntries.insert(LoopEntry);
				for (auto *OtherLoopEntry : Graph.getLoopEntries()) {
				if (OtherLoopEntry != LoopEntry &&
				Graph.canReach(LoopEntry, OtherLoopEntry) &&
				Graph.canReach(OtherLoopEntry, LoopEntry)) {
				MutualLoopEntries.insert(OtherLoopEntry);
	}			}
	// The header cannot be a looper. At the toplevel, LLVM does not allow the			}
	// entry to be in a loop, and in a natural loop we should ignore the header.
	assert(Loopers.count(Header) == 0);			if (MutualLoopEntries.size() > 1) {
				makeSingleEntryLoop(MutualLoopEntries, Blocks, MF);
	// Find the entries, loopers reachable from non-loopers.			FoundIrreducibility = true;
	SmallPtrSet<MachineBasicBlock *, 4> Entries;			Changed = true;
	SmallVector<MachineBasicBlock *, 4> SortedEntries;
	for (auto *Looper : Loopers) {
	for (auto *Pred : Looper->predecessors()) {
	Pred = canonicalize(Pred);
	if (Pred && !Loopers.count(Pred)) {
	Entries.insert(Looper);
	SortedEntries.push_back(Looper);
	break;			break;
	}			}
	}			}
				// Only go on to actually process the inner loops when we are done
				// removing irreducible control flow and changing the graph. Modifying
				// the graph as we go is possible, and that might let us avoid looking at
				// the already-fixed loops again if we are careful, but all that is
				// complex and bug-prone. Since irreducible loops are rare, just starting
				// another iteration is best.
				if (FoundIrreducibility) {
				continue;
	}			}

	// Check if we found irreducible control flow.			for (auto *LoopEntry : Graph.getLoopEntries()) {
	if (LLVM_LIKELY(Entries.size() <= 1))			LoopBlocks InnerBlocks(LoopEntry, Graph.getLoopEnterers(LoopEntry));
	return false;			// Each of these calls to processRegion may change the graph, but are
				// guaranteed not to interfere with each other. The only changes we make
				// to the graph are to add blocks on the way to a loop entry. As the
				// loops are disjoint, that means we may only alter branches that exit
				// another loop, which are ignored when recursing into that other loop
				// anyhow.
				if (processRegion(LoopEntry, InnerBlocks.getBlocks(), MF)) {
				Changed = true;
				}
				}

				return Changed;
				}
				}

				// Given a set of entries to a single loop, create a single entry for that
				// loop by creating a dispatch block for them, routing control flow using
				// a helper variable. Also updates Blocks with any new blocks created, so
				// that we properly track all the blocks in the region.
				void WebAssemblyFixIrreducibleControlFlow::makeSingleEntryLoop(
				BlockSet &Entries, BlockSet &Blocks, MachineFunction &MF) {
				assert(Entries.size() >= 2);

	// Sort the entries to ensure a deterministic build.			// Sort the entries to ensure a deterministic build.
				BlockVector SortedEntries(Entries.begin(), Entries.end());
	llvm::sort(SortedEntries,			llvm::sort(SortedEntries,
	[&](const MachineBasicBlock *A, const MachineBasicBlock *B) {			[&](const MachineBasicBlock *A, const MachineBasicBlock *B) {
	auto ANum = A->getNumber();			auto ANum = A->getNumber();
	auto BNum = B->getNumber();			auto BNum = B->getNumber();
	return ANum < BNum;			return ANum < BNum;
	});			});

	#ifndef NDEBUG			#ifndef NDEBUG
	for (auto Block : SortedEntries)			for (auto Block : SortedEntries)
	assert(Block->getNumber() != -1);			assert(Block->getNumber() != -1);
	if (SortedEntries.size() > 1) {			if (SortedEntries.size() > 1) {
	for (auto I = SortedEntries.begin(), E = SortedEntries.end() - 1;			for (auto I = SortedEntries.begin(), E = SortedEntries.end() - 1; I != E;
	I != E; ++I) {			++I) {
	auto ANum = (*I)->getNumber();			auto ANum = (*I)->getNumber();
	auto BNum = (*(std::next(I)))->getNumber();			auto BNum = (*(std::next(I)))->getNumber();
	assert(ANum != BNum);			assert(ANum != BNum);
	}			}
	}			}
	#endif			#endif

	// Create a dispatch block which will contain a jump table to the entries.			// Create a dispatch block which will contain a jump table to the entries.
	MachineBasicBlock *Dispatch = MF.CreateMachineBasicBlock();			MachineBasicBlock *Dispatch = MF.CreateMachineBasicBlock();
	MF.insert(MF.end(), Dispatch);			MF.insert(MF.end(), Dispatch);
	MLI.changeLoopFor(Dispatch, Loop);			Blocks.insert(Dispatch);

	// Add the jump table.			// Add the jump table.
	const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();			const auto &TII = *MF.getSubtarget<WebAssemblySubtarget>().getInstrInfo();
	MachineInstrBuilder MIB = BuildMI(*Dispatch, Dispatch->end(), DebugLoc(),			MachineInstrBuilder MIB = BuildMI(*Dispatch, Dispatch->end(), DebugLoc(),
	TII.get(WebAssembly::BR_TABLE_I32));			TII.get(WebAssembly::BR_TABLE_I32));

	// Add the register which will be used to tell the jump table which block to			// Add the register which will be used to tell the jump table which block to
	// jump to.			// jump to.
	MachineRegisterInfo &MRI = MF.getRegInfo();			MachineRegisterInfo &MRI = MF.getRegInfo();
	unsigned Reg = MRI.createVirtualRegister(&WebAssembly::I32RegClass);			unsigned Reg = MRI.createVirtualRegister(&WebAssembly::I32RegClass);
	MIB.addReg(Reg);			MIB.addReg(Reg);

	// Compute the indices in the superheader, one for each bad block, and			// Compute the indices in the superheader, one for each bad block, and
	// add them as successors.			// add them as successors.
	DenseMap<MachineBasicBlock *, unsigned> Indices;			DenseMap<MachineBasicBlock *, unsigned> Indices;
	for (auto *MBB : SortedEntries) {			for (auto *Entry : SortedEntries) {
	auto Pair = Indices.insert(std::make_pair(MBB, 0));			auto Pair = Indices.insert(std::make_pair(Entry, 0));
	if (!Pair.second) {			assert(Pair.second);
	continue;
	}

	unsigned Index = MIB.getInstr()->getNumExplicitOperands() - 1;			unsigned Index = MIB.getInstr()->getNumExplicitOperands() - 1;
	Pair.first->second = Index;			Pair.first->second = Index;

	MIB.addMBB(MBB);			MIB.addMBB(Entry);
	Dispatch->addSuccessor(MBB);			Dispatch->addSuccessor(Entry);
	}			}

	// Rewrite the problematic successors for every block that wants to reach the			// Rewrite the problematic successors for every block that wants to reach
	// bad blocks. For simplicity, we just introduce a new block for every edge			// the bad blocks. For simplicity, we just introduce a new block for every
	// we need to rewrite. (Fancier things are possible.)			// edge we need to rewrite. (Fancier things are possible.)

	SmallVector<MachineBasicBlock *, 4> AllPreds;			BlockVector AllPreds;
	for (auto *MBB : SortedEntries) {			for (auto *Entry : SortedEntries) {
	for (auto *Pred : MBB->predecessors()) {			for (auto *Pred : Entry->predecessors()) {
	if (Pred != Dispatch) {			if (Pred != Dispatch) {
	AllPreds.push_back(Pred);			AllPreds.push_back(Pred);
	}			}
	}			}
	}			}

	for (MachineBasicBlock *MBB : AllPreds) {			for (MachineBasicBlock *Pred : AllPreds) {
	DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;			DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
	for (auto *Succ : MBB->successors()) {			for (auto *Entry : Pred->successors()) {
	if (!Entries.count(Succ)) {			if (!Entries.count(Entry)) {
	continue;			continue;
	}			}

	// This is a successor we need to rewrite.			// This is a successor we need to rewrite.
	MachineBasicBlock *Split = MF.CreateMachineBasicBlock();			MachineBasicBlock *Split = MF.CreateMachineBasicBlock();
	MF.insert(MBB->isLayoutSuccessor(Succ) ? Succ->getIterator() : MF.end(),			MF.insert(Pred->isLayoutSuccessor(Entry)
				? MachineFunction::iterator(Entry)
				: MF.end(),
	Split);			Split);
	MLI.changeLoopFor(Split, Loop);			Blocks.insert(Split);

	// Set the jump table's register of the index of the block we wish to			// Set the jump table's register of the index of the block we wish to
	// jump to, and jump to the jump table.			// jump to, and jump to the jump table.
	BuildMI(*Split, Split->end(), DebugLoc(), TII.get(WebAssembly::CONST_I32),			BuildMI(*Split, Split->end(), DebugLoc(), TII.get(WebAssembly::CONST_I32),
	Reg)			Reg)
	.addImm(Indices[Succ]);			.addImm(Indices[Entry]);
	BuildMI(*Split, Split->end(), DebugLoc(), TII.get(WebAssembly::BR))			BuildMI(*Split, Split->end(), DebugLoc(), TII.get(WebAssembly::BR))
	.addMBB(Dispatch);			.addMBB(Dispatch);
	Split->addSuccessor(Dispatch);			Split->addSuccessor(Dispatch);
	Map[Succ] = Split;			Map[Entry] = Split;
	}			}
	// Remap the terminator operands and the successor list.			// Remap the terminator operands and the successor list.
	for (MachineInstr &Term : MBB->terminators())			for (MachineInstr &Term : Pred->terminators())
	for (auto &Op : Term.explicit_uses())			for (auto &Op : Term.explicit_uses())
	if (Op.isMBB() && Indices.count(Op.getMBB()))			if (Op.isMBB() && Indices.count(Op.getMBB()))
	Op.setMBB(Map[Op.getMBB()]);			Op.setMBB(Map[Op.getMBB()]);
	for (auto Rewrite : Map)			for (auto Rewrite : Map)
	MBB->replaceSuccessor(Rewrite.first, Rewrite.second);			Pred->replaceSuccessor(Rewrite.first, Rewrite.second);
	}			}

	// Create a fake default label, because br_table requires one.			// Create a fake default label, because br_table requires one.
	MIB.addMBB(MIB.getInstr()			MIB.addMBB(MIB.getInstr()
	->getOperand(MIB.getInstr()->getNumExplicitOperands() - 1)			->getOperand(MIB.getInstr()->getNumExplicitOperands() - 1)
	.getMBB());			.getMBB());

	return true;
	}			}

	class WebAssemblyFixIrreducibleControlFlow final : public MachineFunctionPass {
	StringRef getPassName() const override {
	return "WebAssembly Fix Irreducible Control Flow";
	}

	void getAnalysisUsage(AnalysisUsage &AU) const override {
	AU.setPreservesCFG();
	AU.addRequired<MachineDominatorTree>();
	AU.addPreserved<MachineDominatorTree>();
	AU.addRequired<MachineLoopInfo>();
	AU.addPreserved<MachineLoopInfo>();
	MachineFunctionPass::getAnalysisUsage(AU);
	}

	bool runOnMachineFunction(MachineFunction &MF) override;

	bool runIteration(MachineFunction &MF, MachineLoopInfo &MLI) {
	// Visit the function body, which is identified as a null loop.
	if (LoopFixer(MF, MLI, nullptr).run()) {
	return true;
	}

	// Visit all the loops.
	SmallVector<MachineLoop *, 8> Worklist(MLI.begin(), MLI.end());
	while (!Worklist.empty()) {
	MachineLoop *Loop = Worklist.pop_back_val();
	Worklist.append(Loop->begin(), Loop->end());
	if (LoopFixer(MF, MLI, Loop).run()) {
	return true;
	}
	}

	return false;
	}

	public:
	static char ID; // Pass identification, replacement for typeid
	WebAssemblyFixIrreducibleControlFlow() : MachineFunctionPass(ID) {}
	};
	} // end anonymous namespace			} // end anonymous namespace

	char WebAssemblyFixIrreducibleControlFlow::ID = 0;			char WebAssemblyFixIrreducibleControlFlow::ID = 0;
	INITIALIZE_PASS(WebAssemblyFixIrreducibleControlFlow, DEBUG_TYPE,			INITIALIZE_PASS(WebAssemblyFixIrreducibleControlFlow, DEBUG_TYPE,
	"Removes irreducible control flow", false, false)			"Removes irreducible control flow", false, false)

	FunctionPass *llvm::createWebAssemblyFixIrreducibleControlFlow() {			FunctionPass *llvm::createWebAssemblyFixIrreducibleControlFlow() {
	return new WebAssemblyFixIrreducibleControlFlow();			return new WebAssemblyFixIrreducibleControlFlow();
	}			}

	bool WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction(			bool WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction(
	MachineFunction &MF) {			MachineFunction &MF) {
	LLVM_DEBUG(dbgs() << "******** Fixing Irreducible Control Flow ********\n"			LLVM_DEBUG(dbgs() << "******** Fixing Irreducible Control Flow ********\n"
	"********** Function: "			"********** Function: "
	<< MF.getName() << '\n');			<< MF.getName() << '\n');

	bool Changed = false;			// Start the recursive process on the entire function body.
	auto &MLI = getAnalysis<MachineLoopInfo>();			BlockSet AllBlocks;
				for (auto &MBB : MF) {
				AllBlocks.insert(&MBB);
				}

	// When we modify something, bail out and recompute MLI, then start again, as			if (LLVM_UNLIKELY(processRegion(&*MF.begin(), AllBlocks, MF))) {
	// we create a new natural loop when we resolve irreducible control flow, and			// We rewrote part of the function; recompute relevant things.
	// other loops may become nested in it, etc. In practice this is not an issue
	// because irreducible control flow is rare, only very few cycles are needed
	// here.
	while (LLVM_UNLIKELY(runIteration(MF, MLI))) {
	// We rewrote part of the function; recompute MLI and start again.
	LLVM_DEBUG(dbgs() << "Recomputing loops.\n");
	MF.getRegInfo().invalidateLiveness();			MF.getRegInfo().invalidateLiveness();
	MF.RenumberBlocks();			MF.RenumberBlocks();
	getAnalysis<MachineDominatorTree>().runOnMachineFunction(MF);			return true;
	MLI.runOnMachineFunction(MF);
	Changed = true;
	}			}

	return Changed;			return false;
	}			}
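
For readers following the new ReachabilityGraph logic, the sketch below replays its three steps on a toy graph: a worklist-based reachability closure that ignores branches back to the entry, loopers as blocks that can reach themselves, and loop entries as loopers with a predecessor they cannot reach back to. It is a minimal standalone illustration under stated assumptions, not the pass itself: `Block`, `Graph`, `ToyReachability`, and `hasIrreducibleControlFlow` are hypothetical names, blocks are plain integers rather than MachineBasicBlocks, and the whole graph is treated as a single region.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for MachineBasicBlock and the CFG (not LLVM types).
using Block = int;
using Graph = std::map<Block, std::vector<Block>>; // block -> successors

struct ToyReachability {
  std::map<Block, std::set<Block>> Reachable;
  std::set<Block> Loopers, LoopEntries;

  bool canReach(Block From, Block To) const {
    auto It = Reachable.find(From);
    return It != Reachable.end() && It->second.count(To) != 0;
  }

  ToyReachability(const Graph &Succs, Block Entry) {
    std::map<Block, std::vector<Block>> Preds;
    std::vector<std::pair<Block, Block>> WorkList;

    // Add all direct branches, ignoring branches back to the entry.
    for (const auto &KV : Succs) {
      for (Block Succ : KV.second) {
        Preds[Succ].push_back(KV.first);
        if (Succ != Entry && Reachable[KV.first].insert(Succ).second)
          WorkList.emplace_back(KV.first, Succ);
      }
    }

    // We just added MBB => Succ, so every Pred => MBB implies Pred => Succ.
    while (!WorkList.empty()) {
      std::pair<Block, Block> Item = WorkList.back();
      WorkList.pop_back();
      if (Item.first == Entry)
        continue;
      for (Block Pred : Preds[Item.first])
        if (Reachable[Pred].insert(Item.second).second)
          WorkList.emplace_back(Pred, Item.second);
    }

    // Blocks that can return to themselves are in a loop.
    for (const auto &KV : Reachable)
      if (KV.second.count(KV.first))
        Loopers.insert(KV.first);
    assert(!Loopers.count(Entry));

    // A looper is a loop entry if one of its predecessors is outside its loop,
    // that is, the looper cannot reach that predecessor.
    for (Block Looper : Loopers)
      for (Block Pred : Preds[Looper])
        if (!canReach(Looper, Pred))
          LoopEntries.insert(Looper);
  }
};

// Two distinct loop entries that reach each other belong to the same loop,
// which therefore has more than one entry.
bool hasIrreducibleControlFlow(const Graph &Succs, Block Entry) {
  ToyReachability R(Succs, Entry);
  for (Block A : R.LoopEntries)
    for (Block B : R.LoopEntries)
      if (A != B && R.canReach(A, B) && R.canReach(B, A))
        return true;
  return false;
}
```

This sketch only inspects the outermost region. The pass additionally recurses into each single-entry loop to catch irreducibility that is entirely nested, which the toy version does not attempt.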

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-nested.ll

	; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"


	; Test an interesting pattern of nested irreducibility.
	; Just check we resolve all the irreducibility here (if not we'd crash).

	; CHECK-LABEL: tre_parse:

	define void @tre_parse() {
	entry:
	br label %for.cond.outer

	for.cond.outer: ; preds = %do.body14, %entry
	br label %for.cond

	for.cond: ; preds = %for.cond.backedge, %for.cond.outer
	%nbranch.0 = phi i32* [ null, %for.cond.outer ], [ %call188, %for.cond.backedge ]
	switch i8 undef, label %if.else [
	i8 40, label %do.body14
	i8 41, label %if.then63
	]

	do.body14: ; preds = %for.cond
	br label %for.cond.outer

	if.then63: ; preds = %for.cond
	unreachable

	if.else: ; preds = %for.cond
	switch i8 undef, label %if.then84 [
	i8 92, label %if.end101
	i8 42, label %if.end101
	]

	if.then84: ; preds = %if.else
	switch i8 undef, label %cleanup.thread [
	i8 43, label %if.end101
	i8 63, label %if.end101
	i8 123, label %if.end101
	]

	if.end101: ; preds = %if.then84, %if.then84, %if.then84, %if.else, %if.else
	unreachable

	cleanup.thread: ; preds = %if.then84
	%call188 = tail call i32* undef(i32* %nbranch.0)
	switch i8 undef, label %for.cond.backedge [
	i8 92, label %land.lhs.true208
	i8 0, label %if.else252
	]

	land.lhs.true208: ; preds = %cleanup.thread
	unreachable

	for.cond.backedge: ; preds = %cleanup.thread
	br label %for.cond

	if.else252: ; preds = %cleanup.thread
	unreachable
	}

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg-nested2.ll

	; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"

	; Test an interesting pattern of nested irreducibility.
	; Just check we resolve all the irreducibility here (if not we'd crash).

	; CHECK-LABEL: func_2:

	; Function Attrs: noinline nounwind optnone
	define void @func_2() {
	entry:
	br i1 undef, label %lbl_937, label %if.else787

	lbl_937: ; preds = %for.body978, %entry
	br label %if.end965

	if.else787: ; preds = %entry
	br label %if.end965

	if.end965: ; preds = %if.else787, %lbl_937
	br label %for.cond967

	for.cond967: ; preds = %for.end1035, %if.end965
	br label %for.cond975

	for.cond975: ; preds = %if.end984, %for.cond967
	br i1 undef, label %for.body978, label %for.end1035

	for.body978: ; preds = %for.cond975
	br i1 undef, label %lbl_937, label %if.end984

	if.end984: ; preds = %for.body978
	br label %for.cond975

	for.end1035: ; preds = %for.cond975
	br label %for.cond967
	}

llvm/trunk/test/CodeGen/WebAssembly/irreducible-cfg.ll

	; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s			; RUN: llc < %s -O0 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s

	; Test irreducible CFG handling.			; Test irreducible CFG handling.

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"			target triple = "wasm32-unknown-unknown"

	; A simple loop with two entries.			; A simple loop with two entries.

	; (202 unchanged lines not shown in this diff view)

	for.inc205.i: ; preds = %for.cond65.i			for.inc205.i: ; preds = %for.cond65.i
	br label %for.cond57.i			br label %for.cond57.i

	return: ; preds = %entry			return: ; preds = %entry
	ret void			ret void
	}			}

				; A more complex case of irreducible control flow, two interacting loops.
				; CHECK: ps_hints_apply
				; CHECK: br_table
				define void @ps_hints_apply() {
				entry:
				br label %psh

				psh: ; preds = %entry
				br i1 undef, label %for.cond, label %for.body

				for.body: ; preds = %psh
				br label %do.body

				do.body: ; preds = %do.cond, %for.body
				%cmp118 = icmp eq i32* undef, undef
				br i1 %cmp118, label %Skip, label %do.cond

				do.cond: ; preds = %do.body
				br label %do.body

				for.cond: ; preds = %Skip, %psh
				br label %for.body39

				for.body39: ; preds = %for.cond
				br i1 undef, label %Skip, label %do.body45

				do.body45: ; preds = %for.body39
				unreachable

				Skip: ; preds = %for.body39, %do.body
				br label %for.cond
				}

				; A simple sequence of loops with blocks in between, that should not be
				; misinterpreted as irreducible control flow.
				; CHECK: fannkuch_worker
				; CHECK-NOT: br_table
				define i32 @fannkuch_worker(i8* %_arg) {
				for.cond: ; preds = %entry
				br label %do.body

				do.body: ; preds = %do.cond, %for.cond
				br label %for.cond1

				for.cond1: ; preds = %for.body, %do.body
				br i1 1, label %for.cond1, label %for.end

				for.end: ; preds = %for.cond1
				br label %do.cond

				do.cond: ; preds = %for.end
				br i1 1, label %do.body, label %do.end

				do.end: ; preds = %do.cond
				br label %for.cond2

				for.cond2: ; preds = %for.end6, %do.end
				br label %for.cond3

				for.cond3: ; preds = %for.body5, %for.cond2
				br i1 1, label %for.cond3, label %for.end6

				for.end6: ; preds = %for.cond3
				br label %for.cond2

				return: ; No predecessors!
				ret i32 1
				}

				; Test an interesting pattern of nested irreducibility.

				; CHECK: func_2:
				; CHECK: br_table
				define void @func_2() {
				entry:
				br i1 undef, label %lbl_937, label %if.else787

				lbl_937: ; preds = %for.body978, %entry
				br label %if.end965

				if.else787: ; preds = %entry
				br label %if.end965

				if.end965: ; preds = %if.else787, %lbl_937
				br label %for.cond967

				for.cond967: ; preds = %for.end1035, %if.end965
				br label %for.cond975

				for.cond975: ; preds = %if.end984, %for.cond967
				br i1 undef, label %for.body978, label %for.end1035

				for.body978: ; preds = %for.cond975
				br i1 undef, label %lbl_937, label %if.end984

				if.end984: ; preds = %for.body978
				br label %for.cond975

				for.end1035: ; preds = %for.cond975
				br label %for.cond967
				}
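
The `CHECK: br_table` lines in these tests look for the dispatch block that makeSingleEntryLoop creates. As a rough picture of that rewrite on a toy graph, the sketch below redirects every edge into a loop entry through a fresh split block that leads to a single dispatch block, which in turn branches to all the entries. The names `Block`, `Graph`, and `makeSingleEntry` are hypothetical, blocks are plain integers, and the helper register that selects the br_table target is only described in comments rather than modeled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins for MachineBasicBlock and the CFG (not LLVM types).
using Block = int;
using Graph = std::map<Block, std::vector<Block>>; // block -> successors

// Routes all branches into Entries through one new dispatch block.
// NextId supplies ids for newly created blocks. Returns the dispatch block.
Block makeSingleEntry(Graph &Succs, const std::set<Block> &Entries,
                      Block &NextId) {
  assert(Entries.size() >= 2);
  Block Dispatch = NextId++;

  // Snapshot the existing blocks so that blocks created below are not
  // themselves rewritten.
  std::vector<Block> Existing;
  for (const auto &KV : Succs)
    Existing.push_back(KV.first);

  for (Block Pred : Existing) {
    std::map<Block, Block> Map; // entry -> split block for this predecessor
    for (Block &Succ : Succs[Pred]) {
      if (!Entries.count(Succ))
        continue;
      auto It = Map.find(Succ);
      if (It == Map.end()) {
        // In the real pass the split block would also set the helper
        // register to the index of the entry it wants to reach.
        Block Split = NextId++;
        Succs[Split] = {Dispatch};
        It = Map.emplace(Succ, Split).first;
      }
      Succ = It->second;
    }
  }

  // The dispatch block is now the only predecessor of every entry.
  Succs[Dispatch].assign(Entries.begin(), Entries.end());
  return Dispatch;
}
```

After the rewrite the loop has one entry, the dispatch block, so the region is reducible at the cost of one extra block per rewritten edge.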