This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Irreducible control flow rewrite
ClosedPublic

Authored by kripken on Mar 4 2019, 12:31 PM.

Details

Summary
Rewrite WebAssemblyFixIrreducibleControlFlow to a simpler and cleaner design,
which directly computes reachability and other properties itself. This avoids
previous complexity and bugs. (The new graph analyses are very similar to how
the Relooper algorithm would find loop entries and so forth.)

This fixes a few bugs, including where we had a false positive and thought
fannkuch was irreducible when it was not, which made us much larger and
slower there, and a reverse bug where we missed irreducibility. On fannkuch,
we used to be 44% slower than asm2wasm and are now 4% faster.

Diff Detail

Event Timeline

kripken created this revision.Mar 4 2019, 12:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2019, 12:31 PM
aheejin added a comment.EditedMar 12 2019, 5:32 AM

Sorry for the delayed reply!

What changes fixed the previous bug?

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
30

Why is this?

144

Real nit: We can merge these into a single line:

MachineBasicBlock *MBB, *Succ;
158

We used to have the description on the definitions of 'loopers' and 'non-loopers' in the comment at the beginning of the file, but not anymore. Can we have that again here maybe?

167

Does this definition cover when Entry is an entry (or header) of a loop? I'm wondering because the reachability computation here does not consider backedges to Entry, so it will not be discovered as a looper.

Also does it cover this case?

bb0:
  br bb1
bb1:
  br_if bb0
  br bb2
bb2:
  br bb3
bb3:
  br bb2

So here bb0 and bb1 comprise a loop and bb2 and bb3 comprise another loop. And all four blocks are loopers, but bb0 and bb2 are loop entries. But they are not reachable from any of non-loopers, because there are no non-loopers in this CFG.
(Also here bb0 is the header of a loop so it belongs to the first case where Entry is a header)

201

Why is this worklist a set?

222

Do we need to make this as a separate class now? Unlike the previous version, we create LoopFixer object only once in WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction. I think we can add paste the body of LoopFixer::run into WebAssemblyFixIrreducibleControlFlow::runOnMachineFunction and make LoopFixer:: processRegion as a member function of WebAssemblyFixIrreducibleControlFlow that returns a bool.

270

I might be mistaken, but the algorithm seems to not care about whether loop entries found are within the same level loop. For example, the case where there are two-level nested loops and block A is one of the entries to the outer loop and B is one of the entries for the inner loop. Even in this case, we don't distinguish them and resolve these with a single pass with a single dispatch block. Is this OK and intended? And for the 'mutual entries' above, can't they also be not in the same loop but each in an outer loop and an inner loop?

275
  • How about reducing depth of the subloop processing routine below by wrapping this loop here early, like
while (FoundIrreducibility) {
  ReachabilityGraph Graph(Entry, Blocks);

  bool FoundIrreducibility = false;
  for (auto *LoopEntry : Graph.getLoopEntries()) {
    ...
  }
}

for (auto *LoopEntry : Graph.getLoopEntries()) {
  ...
}
  • Btw, is there any case we can find irreducibility again with the same region before we recurse into inner loops below? If there is, how is that case different from the case that's processed below on the inner loops?
286

In which case do we need to recurse into inner loops? (Given that we didn't distinguish inner loops and outer loops above) Could you add some comments?

303

Nit: We can do in a single line

BlockVector SortedEntries(Entries.begin(), Entries.end());
347

Is there any case !Pair.second is not set? Aren't all blocks in SortedEntries distinct?

371

This loop does not seem to check if a corresponding Split block for Entry has been already created. In case Entry has more than one predecessors, Split for Entry is gonna be created multiple times and Map[Entry] will be overwritten. We should do something like this inside the loop:

if (Map.count(Entry))
  continue;
378

Did you mean

Entry->isLayoutSuccessor(Pred)

? In the current state all new blocks are gonna be appended at the end, because this condition is hardly likely to be satisfied.

390

Nit: There is a [[ https://github.com/llvm/llvm-project/blob/a946997c2482e4386549ee38b4bb154eb58efbb6/llvm/include/llvm/CodeGen/MachineInstrBuilder.h#L405-L410 | BuildMI function ]] that defaults to append instruction at the end, so you can just do

BuildMI(Split, DebugLoc(), TII.get(WebAssembly::CONST_I32), Reg)
    .addImm(Indices[Entry]);                                              
BuildMI(Split, DebugLoc(), TII.get(WebAssembly::BR)).addMBB(Dispatch);

(Note that Split does not have * anymore)

test/CodeGen/WebAssembly/irreducible-cfg.ll
224
  • If this is without irreducibility, shouldn't this be in non-irreducible-cfg.ll below?
  • Can we remove #0?
kripken marked 17 inline comments as done.Mar 12 2019, 1:52 PM

What changes fixed the previous bug?

Basically the rewrite avoids the entire previous approach of "canonicalization", of representing an inner loop by its entry. When we got that wrong, we could reach very wrong results.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
30

The NumLoops^2 part? To find mutual loop entries, that is, loop entries that can reach each other, we compare each one to the others.

This is actually just the square of the number of loops in a single scope, so nested loops don't count. Anyhow, even in the "worst case" which is a loop after a loop after a loop, this factor is going to be quite small, I'd guess.

167

When we look at a region, the entry block cannot be a looper, which simplifies things. LLVM doesn't let a function entry be in a loop (maybe for similar reasons as why it's convenient here?), and when we recurse into a loop we ignore backedges to Entry intentionally (so we see the inner part of the loop, and not the looping).

In other words, if a block is a looper, it will be seen as such in the outer scope it is in (where it is not the entry). When it is the entry into a region, and we are looking at just that region, it cannot be a looper there.

For the second issue (with that example): yeah, the comment was not clear enough, I'll update it. They key is that a loop enterer is a block not in the same loop, that enters it. So here bb2 is a looper, and bb1 is a predecessor of it, and not in the same loop (bb2 can't reach bb1), so we identify bb2 as a loop entry and bb1 as a loop enterer for it.

201

As a set, it won't have duplicate entries - so if we visit A that causes us to add C to the future work, and then visit B that also causes us to add C to the future work, then when we get to C we'll visit it once, and avoid visiting it again later (and not having anything to do the second time).

I haven't benchmarked this recently, but I remember it was beneficial in the Relooper.

222

Good idea, thanks!

270

Yeah, it's somewhat debatable (is an inner loop an "inner loop" if there's a branch going into it from several levels up?), but yes, when we see irreducibility we handle it now, instead of leaving it for the inner loop. We have to, I think - if we can branch to it now, we need to fix up that branch now - we wouldn't see it later when we recurse and ignore the outside. At least that would require a very different approach I suppose (which might be more optimal in some cases).

275

The problem is that the Graph is used outside of the loop in your code. I agree it's a little awkward as it is... another option is to use an std::unique_ptr to allow using it from the outside. But that's less clear I think.

Yes, we might find irreducibility again if we have two irreducible loops one after the other (i.e. not nested). It's *possible* we don't need to look at the irreducible one we've already fixed, however, I don't have a proof of that. And we do need to recompute the Graph again after fixing one loop (since it can affect others; and trying to update it is fairly complex an optimization), so starting another loop iteration is the simple thing. The cost is another iteration per irreducible loop, which is not that bad since they are rare.

I'll try to improve the comment here.

286

Good point, this is important to explain, adding.

347

This is code not changed in this patch (just some variable renaming), but I think you're right, fixing.

371

This is code that did not change in this patch (aside from names and clang-format), and I'm not sure I fully understand it (I think Dan wrote it?). But I don't think Map[Entry] can be overwritten? The shape of the code is

DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
for (auto *Entry : Pred->successors()) {
  [..]
  Map[Entry] = Split;
}
378

As before, this is Dan's code that I didn't change here (aside from variable names). I'm not sure what it does...

test/CodeGen/WebAssembly/irreducible-cfg.ll
224
  • The other file has -O0, and this one doesn't specify an opt level (so it's -O2 I guess?), so I kept them separate.
  • Removed #0.
kripken updated this revision to Diff 190328.Mar 12 2019, 1:57 PM
kripken marked 3 inline comments as done.

Review feedback changes, in particular:

  • Remove LoopFixer class.
  • Various comment additions and improvements.
  • Various nits.

Real nit: We usually start CL/commit headers with [WebAssembly]

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
64
  • Not sure what's happened, but it says "Context not available" on the diff on these lines. Could you rebase your CL onto the tip of the repo?
  • We usually include WebAssembly.h in every pass, and I guess we need to include (Maybe it's not shown here because of the diff problem?)
aheejin added a comment.EditedMar 13 2019, 1:58 AM
  • There are several places in the diff that says "Context not available". I think the diff has some problems. (I thought it needed rebase but maybe it's not relevant) Could you check?
  • In this new algorithm, all tests in irreducible-cfg-nested.ll, irreducible-cfg-nested2.ll, and irreducible-cfg-exceptions.ll are reducible now, i.e. this pass does not do anything on them. But some of them are irreducible if we pass -O0. (I only checked func_2 in irreducible-cfg-nested2.ll) I guess we should check these tests and either delete the ones that are reducible or change the flags to llc to -O0 or something to make them irreducible again. Also test3 in irredducible-cfg.ll is now considered as reducible too.
  • On second thought, wouldn't it better to just specify -O0 for all irreducibility tests? -O2 basically means we don't now what would happen to our carefully created irreducible CFG.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
167

Oh I see. Thanks. I didn't know that function entries cannot be loop headers.

201

But isn't it possible that by the time you try to add C for the second time, the first C has been already erased from the set so you do the unnecessary work again? Usually what I've seen was making worklist as a vector and have a separate set called something like Visited, to which visited blocks are added and not erased.

230

Nit: Can we take out this and makeSingleEntryLoop from the class definition? I don't think we want to inline these long functions and we can have one less level of indentation outside. I mean, outside the class,

bool WebAssemblyFixIrreducibleControlFlow::processRegion(...) {
  ...
}
270

Is there a condition in the code that we only consider inner loops that have entries from several levels up? I don't think I found one. We just treat all blocks within the region the same way.

So, for example, an outer loop consists of blocks A, B, C, D, E, and F, and among them E and F comprises an inner loop. The outer loop has two entries: A and D from the blocks outside, and the inner loop also has two entries: E and F. E's enterer is C and F's enterer is D. In this case, can we solve this case in one pass with a single dispatch block that branches into all of A, D, E, and F?

I guess I don't understand the algorithm :( Sorry for that. Please correct me if I'm mistaken.

275

I see. I missed the graph computation was there too.

371

Ah right, it's not gonna be overwritten. So the code structure is

for (MachineBasicBlock *Pred : AllPreds) {
  DenseMap<MachineBasicBlock *, MachineBasicBlock *> Map;
  for (auto *Entry : Pred->successors()) {
    ...
    Map[Entry] = Split;
  }

And in case we have entries E0 and E1, and there are predecessors P0 and P1, and the edges are like P0->E0, P0->E1, P1->E0, and P1->E1. So the both preds point to the both entries. In this case, each entry (E0 or E1) is processed twice in the inner for loop. Because Map is created within the outer loop, it's not gonna be overwritten, but Split for each E0 and E1 is gonna be created twice unnecessarily.

(I know it's not the part of the code that's changed in this CL; but I thought while we're writing the major part of the pass, maybe we can fix other things too. If you don't prefer that maybe I can submit that as a separate patch. Let me know which one you prefer.)

378

(Because code parts have moved, this comment is not where it was anymore, so)
The code is

// This is a successor we need to rewrite.                               
MachineBasicBlock *Split = MF.CreateMachineBasicBlock();                 
MF.insert(Pred->isLayoutSuccessor(Entry)                                 
              ? MachineFunction::iterator(Entry)                         
              : MF.end(),                                                
          Split);

I think the intention of the original code was, when we insert the Split which is to serve as a stepping stone between Pred and Entry, if currently Pred is right before Entry in the BB order, we want to insert Split between them, which makes a natural order. If not, we just append the new Split at the end of the function. I think this bug was there all along for years, and this was all newly created blocks are appended at the end.

I know this is not related to this CL, but if you prefer you can fix it maybe, or I can submit that as a separate patch.

390

Ping

test/CodeGen/WebAssembly/irreducible-cfg.ll
224
  • You can use two different commands and check them separately within a single file.
; RUN: llc < %s -O0 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s
; RUN: llc < %s -O2 -asm-verbose=false -verify-machineinstrs -disable-block-placement -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck -check-prefix=OPT %s

; OPT-NOT: br_table
define hidden void @ps_hints_apply() {
  ...
}

; CHECK-NOT: br_table
define hidden i32 @_Z15fannkuch_workerPv(i8* %_arg) {
  ...
}

Here is the manual for this -check-prefix option.

  • Nit: Can we remove hidden too?
test/CodeGen/WebAssembly/non-irreducible-cfg.ll
13

Nit: Can we remove hidden and #0?

aheejin added inline comments.Mar 13 2019, 4:18 AM
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
259

Nit: How about renaming this to EntriesForSingleLoop or something and start by inserting LoopEntry first? (I don't really like the name EntriesForSingleLoop, but can't think of a better alternative now... I guess there can be better ones)

BlockSet EntriesForSingleLoop;
EntriesForSingleLoop.insert(LoopEntry);
for (auto *OtherLoopEntry : Graph.getLoopEntries()) {
  if (Graph.canReach(LoopEntry, OtherLoopEntry) &&
      Graph.canReach(OtherLoopEntry, LoopEntry))
    EntriesForSingleLoop.insert(OhterLoopEntry);
}

if (EntriesForSingleLoop.size() > 1) {
  makeSingleEntryLoop(EntriesForSingleLoop, Blocks, MF);
  ...
}
...

That way we don't need to std::move this later and don't need to check if (OtherLoopEntry != LoopEntry). And in my opinion if (EntriesForSingleLoop.size() > 1) makes it clear that we take a multiple entry loop and transform it into a single entry loop.

This is just a matter of style preference, so if you don't like it, please ignore it.

270

Oh, nevermind. I think I got it. :) Inner loop entries that are not reachable from outside will not be identified as loop entries when we examine the outer region, because those inner loop entries can also reach their preds, as long as the preds are not from outside and in an outer loop.

292

we may only alter branches 'in' existing..?

test/CodeGen/WebAssembly/irreducible-cfg.ll
224

On second thought, wouldn't it better to just specify -O0 for all irreducibility tests? -O2 basically means we don't now what would happen to our carefully created irreducible CFG.

kripken marked 11 inline comments as done.Mar 13 2019, 1:40 PM
kripken added inline comments.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
64

Rebasing - hopefully that fixes the context issue.

Yeah, WebAssembly.h was included already, so might be the diff context problem.

201

The way you suggest is probably faster overall, I agree, I'll change it to that.

259

I agree the move is not nice. I think we can just add LoopEntry to MutualLoopEntries and simplify things that way, updating the patch with that, let me know what you think.

292

Thanks, I'll fix that.

371

I'd prefer we do it separately since this patch already rewrites all the other code in this file, aside from that one unchanged function. Seems safer to change it later.

390

Same issue as before - I'd prefer to not modify this one function in this patch (which changes everything aside from that one function), for safety.

(I did rename a bunch of variables, because I wanted to keep names consistent throughout the file - that seemed necessary in this patch.)

test/CodeGen/WebAssembly/irreducible-cfg.ll
224

Removed hidden, and changed to -O0 in irreducible-cfg. Yeah, that seems like the right thing to do for irreducibility tests.

I didn't want to change irreducible-cfg-exceptions.ll's optimization level though because I'm not sure that wouldn't invalidate it.

kripken updated this revision to Diff 190491.Mar 13 2019, 1:43 PM
kripken marked 2 inline comments as done.

Various changes from the feedback:

  • Change how the LoopBlocks work list works, using a set+vector instead of just a set.
  • Move large functions out of the class definition.
  • Avoid a move in MutualEntries, instead have them always contain all the mutual entries (which includes the original entry, as it is mutual to itself).
  • Test cleanup; use -O0 for the main irreducibility tests; remove old irreducibility tests which were invalid.
  • Comment improvements.
  • Sorry, but at least some of the tests in irreducible-cfg-nested.ll and irreducible-cfg-nested2.ll were still irreducible if we passed -O0. Do we want to delete all of them? Shouldn't we keep the ones that are still irreducible with -O0?
  • Real nit: We usually start CL/commit headers with [WebAssembly]
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
259

I think this is better, thanks.

aheejin added inline comments.Mar 13 2019, 6:53 PM
test/CodeGen/WebAssembly/irreducible-cfg.ll
288

Nit: newline at the end

kripken retitled this revision from WebAssembly: Irreducible control flow rewrite to [WebAssembly] Irreducible control flow rewrite.Mar 14 2019, 2:05 PM
kripken updated this revision to Diff 190722.Mar 14 2019, 2:10 PM

Updates:

  • Thanks, indeed one of the tests removed before (func_2) was still irreducible at -O0, restored it into the main irreducible file test, which is now at -O0. The other (tre_parse) was actually not irreducible, so we mis-identified it, and I left it out.
  • Fixed extra newline at the end of a test.
  • Renamed the title to have [WebAssembly]
aheejin accepted this revision.Mar 15 2019, 5:36 PM

LGTM, thank you. Do you want me to commit this for you? Or are you gonna get a commit access?

This revision is now accepted and ready to land.Mar 15 2019, 5:36 PM

Great, and thanks for the comments!

I should probably get commit access, yeah :) Can you please commit this one, though, so it doesn't wait on that?

This revision was automatically updated to reflect the committed changes.
aheejin added inline comments.Mar 18 2019, 6:50 PM
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
378

Just letting you know, I was incorrect on this part; the code was correct. I realized that while I was working on D59462..