This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Optimize the number of routing blocks in FixIrreducibleCFG
ClosedPublic

Authored by aheejin on Mar 16 2019, 10:44 AM.

Details

Summary

Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for it is outside the loop. (We
can't merge these into one because this will creates another loop cycle
between blocks inside and blocks outside) This patch fixes this and
creates at most 2 routing blocks per entry.

This also renames variable Split to Routing, which I think is a bit
clearer.

Diff Detail

Repository
rL LLVM

Event Timeline

aheejin created this revision.Mar 16 2019, 10:44 AM
aheejin added a comment.EditedMar 16 2019, 10:49 AM

@kripken I wonder how much this will reduce in code size... How can I test with malloc or fannkuch (or other benchmarks)? (Where is the source / and what are the command line options you use?)

aheejin marked an inline comment as done.Mar 16 2019, 10:50 AM
aheejin added inline comments.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
406 ↗(On Diff #190978)

This was previously DenseMap, but I changed this to std::map because to use std::pair as a key to DenseMap I need to write DenseMapInfo routines and I don't think it's really worth it because this map will be quite small anyway.

aheejin updated this revision to Diff 190980.Mar 16 2019, 10:53 AM

Newline in test

riccibruno added inline comments.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
406 ↗(On Diff #190978)

Note that DenseMapInfo is already implemented for std::pair

aheejin marked an inline comment as done.Mar 16 2019, 6:23 PM
aheejin added inline comments.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
406 ↗(On Diff #190978)

Thank you. But I think I was not correct by saying std::pair is not supported; the real problem looks like we don't have DenseMapInfo for bool. For example, this does not compile (without extra implementation of DenseMapInfo for bool)

DenseMap<bool, int> Map;
aheejin updated this revision to Diff 191004.Mar 16 2019, 7:13 PM

Comment grammar fix

aheejin planned changes to this revision.Mar 16 2019, 8:48 PM
aheejin updated this revision to Diff 191011.Mar 17 2019, 1:42 AM

Fix fallthrough condition check

Nice idea, and overall looks good to me. I didn't quite understand the TII part though.

Btw, are there no backend passes that would do this optimization anyhow (merge identical branches)?

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
390 ↗(On Diff #191011)

What is the advantage of a map to bool over a set?

aheejin updated this revision to Diff 191229.Mar 18 2019, 6:40 PM
aheejin marked an inline comment as done.
  • Use a set instead for InLoop
aheejin marked an inline comment as done.EditedMar 18 2019, 7:10 PM

Nice idea, and overall looks good to me. I didn't quite understand the TII part though.

TargetInstrInfo::analyzeBranch function's output is a bit complicated; we have to check the return value and three parameters that are passed (TBB, FBB, and Cond). The detailed description on the meaning of its output is here. The default return value true means (not intuitively) the branch is not analyzable, and each target is supposed to override this function. Our overridden version is here.

Before, we added a routing block for each predecessor / entry combination, so if in the original code a predecessor fell through to an entry, we placed a routing block right there where the entry block was, maintaining the fall-through relationship, so that in the new code the predecessor fell through to the new routing block. But after this patch we don't create a routing block per pred/entry conbination; there can be only one routing block for multiple predecessors, and one of those predecessor might have fell through to the entry in the original code, i.e., it was the layout predecessor of the entry. But we create a routing block when we first encounter any predecessor, so it's possible the new routing block is not created right after the layout predecessor anymore. For example, when the original code looks like this:

pred0:
  ...
  br label %entry
pred1:
  ...
  (falls through to %entry)
entry:
  ...

Before, we created two routing blocks, each for pred0 and pred1, so the result will be like this:

pred0:
  ...
  br label %routing0
pred1:
  ...
  (falls through to %routing1)
routing1:   // This `routing1` is created right after `pred1`, because `pred1` was the layout predecessor of `entry` 
  i32.const 0
  br %dispatch
entry:
  ...
routing0:
  i32.const 0
  br %dispatch
dispatch:
 br_table ...

But now we create only one routing block in this case, so we end up with

pred0:
  ...
  br label %routing
pred1:
  ...
  br label %routing            <-- We need this new branch
entry:
  ...
routing:
  i32.const 0
  br %dispatch
dispatch:
 br_table ...

We need the new branch, because we already created routing when we first encountered pred0, and for pred1 we need a branch. Ideally we can do some more optimization here by first checking if there is any layout predecessor among all predecessors and create the routing block right after that, but all of these can change anyway in CFGSort, so I'm not sure how much benefit it will give.

Anyway, so for the code:

// In case the predecessor fell through to the successor in the original
// CFG and and the new routing block is not the predecessor's layout
// successor, we need to add a branch to the new routing block.
MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
SmallVector<MachineOperand, 4> Cond;
bool Analyzable = !TII.analyzeBranch(*Pred, TBB, FBB, Cond);
if (Analyzable &&
    ((Cond.empty() && !TBB && !FBB) ||
     (!Cond.empty() && !FBB && Pred->isLayoutSuccessor(Succ))))
  BuildMI(Pred, DebugLoc(), TII.get(WebAssembly::BR)).addMBB(Routing);
  • Cond.empty() && !TBB && !FBB is for when we don't have any branches and just fall through to the next block.
  • !Cond.empty() && !FBB && Pred->isLayoutSuccessor(Succ) is for when we only have a conditional branch in the block and if that condition is not taken we fall through to the next block.

Btw, are there no backend passes that would do this optimization anyhow (merge identical branches)?

There are, but all of them run before this pass, because this pass is the last pass in the wasm compilation pipeline that modifies CFG. (which is not strictly true, because we have to modify CFG in CFGStackify to resolve things for exception handling, but I mean in general.)

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
390 ↗(On Diff #191011)

None. It should've been a set. I think I was changing the implementation this way and that and somehow ended up with that weird map. :( Thanks for the catch.

Thanks for the explanation! I think I understand.

So this patch makes us only create a branch there if we don't happen to fall through to the right place. Does that mean that if we ran some CFG-related pass after this, it would not see a valid CFG? (I guess we don't run such a pass now, but I wonder if we may reorder passes in the future at some point.)

aheejin added a comment.EditedMar 20 2019, 6:47 AM

Thanks for the explanation! I think I understand.

So this patch makes us only create a branch there if we don't happen to fall through to the right place. Does that mean that if we ran some CFG-related pass after this, it would not see a valid CFG? (I guess we don't run such a pass now, but I wonder if we may reorder passes in the future at some point.)

I don't think so.

  1. This pass should be the last pass that modifies CFG anyway, considering what this pass does. (There's an exception for exception handling in D48345 though)
  2. Even if for some reason we have to change CFG after this pass, that pass is responsible for making everything valid, as in other CFG-changing passes. That does not change whether the pass is before or after this pass.

But to think about it, because CFGSort tried to preserve the BB order if possible, actually it may be worthwhile to figure out if there is a predecessor that is the layout predecessor of the entry and try to place the routing block after that, which can save a branch. Let me try that.

aheejin planned changes to this revision.Mar 20 2019, 8:09 AM
aheejin updated this revision to Diff 191708.EditedMar 21 2019, 8:35 AM

Layout predecessor optimization. Now if there's a layout predecessor to the entry, we generate the routing block right after the layout predecessor. So now there's no case we need to analyze branch and generate extra brs.

aheejin updated this revision to Diff 191720.Mar 21 2019, 9:29 AM
  • Comment grammar fix
kripken accepted this revision.Mar 25 2019, 2:48 PM

Nice!

This revision is now accepted and ready to land.Mar 25 2019, 2:48 PM
This revision was automatically updated to reflect the committed changes.