This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Optimize Irreducible Control Flow
ClosedPublic

Authored by kripken on Dec 7 2018, 5:38 PM.

Details

Summary

Irreducible control flow is not that rare, e.g. it happens in malloc and 3 other places in the libc portions linked in to a hello world program. This patch improves how we handle that code: it emits a br_table to dispatch to only the minimal necessary number of blocks. This reduces the size of malloc by 33%, and makes it comparable in size to asm2wasm's malloc output.

Added some tests, and verified this passes the emscripten-wasm tests run on the waterfall (binaryen2, wasmobj2, other).

Diff Detail

Repository
rL LLVM

Event Timeline

kripken created this revision.Dec 7 2018, 5:38 PM

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

mgrang added inline comments.Dec 10 2018, 2:22 PM
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
264 ↗(On Diff #177363)

Please use the range-based llvm::sort instead of std::sort.

llvm::sort(SortedEntries, comparator);

See https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Not for malloc - I think @jgravelle-google may have been looking into that though?

Some other musl libc elements with irreducible control flow that show up in a hello world are actually irreducible in the source (!), I verified.

kripken updated this revision to Diff 177606.Dec 10 2018, 3:13 PM

Use llvm::sort following @mgrang's feedback. Thanks!

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Not for malloc - I think @jgravelle-google may have been looking into that though?

Discussing offline, there is some investigation but the details are not yet clear.

I agree figuring that out is very important - if we can avoid unnecessary irreducible control flow that's great. I think it's a separate issue from this patch, though, since as mentioned above even libc has a few places with source-code level irreducible control flow. We could only fix that by rewriting the source, but even that would just be for libc, and not other user code. So regardless I think we should emit compact code when irreducible code occurs, which is what this patch focuses on.

First brief look at the pass.. mostly low-level nits

  • Nit: Could you wrap comments to 80 cols? Some lines end prematurely now. (clang-format does not do that for us.. In vim you can use gq, but not sure about other editors)
  • There are many lambda functions that can be made into plain static functions or class member functions. I think it is more conventional and readable that way unless they are very short or are meant to be passed into arguments to another function.. I feel it's not very easy to read when other several lambda function definitions are interleaved with the function body.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
85 ↗(On Diff #177606)

I don't think we need dominator tree anymore, so I think we can delete these two lines.

87 ↗(On Diff #177606)

Do we preserve loop info?

109 ↗(On Diff #177606)

Nit: LLVM coding standards say function names should start with a lowercase letter.

159 ↗(On Diff #177606)

Nit: using instead of typedef

160 ↗(On Diff #177606)

LLVM coding standards warn against uses of non-deterministic containers. Also LLVM progamming manual prevents the uses of unordered_set or unordered_map. (ref1, ref2)

Maybe we can choose from other set-like containers and map-like containers?

164 ↗(On Diff #177606)

Nit: using instead of typedef

179 ↗(On Diff #177606)

Nit: You can do

WorkList.emplace_back(MBB, Succ);
214 ↗(On Diff #177606)

Nit: You can merge these lines maybe with

auto *MBB, *Succ;
std::tie(MBB, Succ) = WorkList.pop_back_val()
382 ↗(On Diff #177606)

Should we recompute these three every time we make a change?

  • This pass does not use liveness info, so I think we can do this at the end of the whole pass if any change was made.
  • I don't think we need to do renumbering? We only use BB numbers at CFGSort / CFGStackify and CFGSort recomputes all numbers itself before doing anything.
  • This pass does not seem to use dominator tree, right? Then deleting AU.addPreserved<MachineDominatorTree>(); in getAnalysisUsage() should be sufficient. I don't think we even need to require it.
383 ↗(On Diff #177606)

Should we recompute loop info for the whole function every time we make any change? MachineLoopInfo class seems to have various modifier functions. Can we use these on-the-fly during the pass?

405 ↗(On Diff #177606)

How about flattening Iteration and DoVisitLoop into this while loop, like this? I find nested lambdas a bit hard to read.

while (...) {                                                                 
  SmallVector<MachineLoop *, 8> Worklist;                                                    
  // Visit the function body, which is identified as a null loop.              
  Worklist.push_back(nullptr);                                                 
  // Visit all the loops.                                                      
  Worklist.append(MLI.begin(), MLI.end());      
                             
  while (!Worklist.empty()) {                                                  
    MachineLoop *Loop = Worklist.pop_back_val();                               
    Worklist.append(Loop->begin(), Loop->end());                                                          
    if (VisitLoop(MF, MLI, Loop)) {           
      ...                                 
    }                                                                     
  }                                                                            
}

Is it possible to reduce test cases more while they exhibit the same behavior? We usually try to avoid big generated test cases to our LLVM regression tests, so..

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Not for malloc - I think @jgravelle-google may have been looking into that though?

@jgravelle-google Have you had a chance to look at this?

Some other musl libc elements with irreducible control flow that show up in a hello world are actually irreducible in the source (!), I verified.

Do you by chance remember which functions this was?

kripken marked 12 inline comments as done.Dec 13 2018, 3:01 PM

Thanks for the comments @aheejin!

All should be addressed in the patch I'm updating now, or if not then I replied to those comments.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
85 ↗(On Diff #177606)

You're right that we don't need it, but it turns out recomputing MLI crashes if I don't keep this code there. I guess the dominator tree is used in MLI, and MLI does not know to recompute it itself? (I don't know anything about LLVM internals...)

87 ↗(On Diff #177606)

Yes, we preserve it by recomputing it if we invalidate it.

382 ↗(On Diff #177606)

We need to recompute the loop info each time since we use it, which I think means we need to recompute the others (like dominator trees, see comment above).

In practice, often there is no irreducible control flow, so no recomputing is done. Or it can be resolved in one iteration, in which case we recompute once as needed. It's rare to need more work than that.

383 ↗(On Diff #177606)

I'd prefer not to as it seems more complex than worthwhile, given the comment above on how much work is done in the common cases here. But if you feel strongly I can do that.

kripken updated this revision to Diff 178143.Dec 13 2018, 3:01 PM

Is it possible to reduce test cases more while they exhibit the same behavior? We usually try to avoid big generated test cases to our LLVM regression tests, so..

Not a lot more - perhaps I can tidy up the names and remove an instruction or two, but the CFG they represent is necessary to reproduce the issue (no simple CFG can represent irreducible control flow that requires multiple passes to reduce...).

If they are too big for the LLVM test suite, I can just remove them - they will still be run in the emscripten test suite.

Do we know what part of the optimizer or codegen is introducing the irreducible control flow in malloc?

Some other musl libc elements with irreducible control flow that show up in a hello world are actually irreducible in the source (!), I verified.

Do you by chance remember which functions this was?

One is

https://github.com/kripken/emscripten/blob/incoming/system/lib/libc/musl/src/multibyte/mbsrtowcs.c

That one looks clearly irreducible. Musl has a bunch more goto cases, which end up irreducible in the backend, but it can be kind of hard to inspect visually, like

https://github.com/kripken/emscripten/blob/incoming/system/lib/libc/musl/src/locale/iconv.c

In the CL/commit description, could you add some explanation on what has changed, i.e., what unnecessary cases of transformation this patch reduces?

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
24 ↗(On Diff #178143)

assign to -> assign?

76 ↗(On Diff #178143)

clang-format

98 ↗(On Diff #178143)

a loop -> an inner loop, just to be clear

183 ↗(On Diff #178143)

Just a suggestion on the workflow... Currently WorkList is a class variable modified in multiple functions (inserted in maybeInsert and popped in run), and maybeInsert takes MBB and Succ but MBB is already canonicalized at that point but it canonicalize Succ within maybeInsert and if it is not canonicalized we bail out, which makes the workflow little hard to follow.

I think making WorkList as just a local variable within run and call both canonicalize and canonicalizeSuccessor in run, and inlining the remaining contents of maybeInsert into the run function would make it a bit simpler.

222 ↗(On Diff #178143)

Do we need two separate containers? If you want a vector with a O(1) search, maybe you can use SetVector and merge these two?

235 ↗(On Diff #178143)

How about if (LLVM_LIKELY(Entries.size() <= 1)) to signal we wouldn't find an irreducible control flow in usual cases?

280 ↗(On Diff #178143)

Is there a word missing between 'every' and 'that'?

397 ↗(On Diff #178143)

How about while (LLVM_UNLIKELY(runIteration(MF, MLI)) to signal that this is unlikely to be true?

85 ↗(On Diff #177606)

Oh, you're right, sorry.

test/CodeGen/WebAssembly/irreducible-cfg-exceptions.ll
4 ↗(On Diff #178143)

Now we use just "wasm32-unknown-unknown". In other test cases too.

6 ↗(On Diff #178143)

Can we delete this?

17 ↗(On Diff #178143)

Can we delete all attributes (like #0, #1, ...) from this and other test cases if they are not necessary to show the bug? Also hidden and unnamed_addr can be deleted as well.

113 ↗(On Diff #178143)

We can delete these if not necessary (for other test cases too)

test/CodeGen/WebAssembly/irreducible-cfg-nested.ll
11 ↗(On Diff #178143)

I think we can delete dso_local, fastcc, and unnamed_addr here

kripken marked 17 inline comments as done.Dec 14 2018, 11:11 AM

Thanks, submitting a fixed patch now.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
24 ↗(On Diff #178143)

We assign to the same one in all paths. I'll reword it.

183 ↗(On Diff #178143)

To make sure I understand, inlining the remaining contents would mean doing the same work 3 times (for each inlined location) to do the insert, check if it actually added, and add to the work list if so? That feels less good to me, but happy to do it either way.

222 ↗(On Diff #178143)

Hmm, can I sort a SetVector, though? It says the order of iteration is the order of insertion (which is fixed).

kripken updated this revision to Diff 178257.Dec 14 2018, 11:12 AM
kripken marked an inline comment as done.
aheejin added inline comments.Dec 14 2018, 3:00 PM
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
183 ↗(On Diff #178143)

I thought, after taking out calling canonicalizeSuccessor out of maybeInsert, maybeInsert would be like 2-3 lines, so it would be OK to inline, but maybe it's less desirable. I was mainly concerned about calling canonicalize and canonicalizeSuccessor in two different places, but maybe it's ok this way. :)

222 ↗(On Diff #178143)

If the insertion / iteration order is fixed, do we need sorting at all?

test/CodeGen/WebAssembly/irreducible-cfg-exceptions.ll
49 ↗(On Diff #178257)

I understand this test is complex, but maybe for these couple extremely long basic block names, we can simplify them a bit..?

kripken marked 5 inline comments as done.Dec 17 2018, 3:41 PM

Thanks for the comments, uploading an updated patch now.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
183 ↗(On Diff #178143)

I'll try to improve the comments to clarify that. There should be just one location that calls canonicalizeSuccessor (maybeInsert), aside from an assertion. I think the assertions help, but if you feel they make the code less clear, then that's fine with me of course, and it would be shorter with fewer of them.

I'll also add some comments on where canonicalize/canonicalizeSuccessor should be called, so hopefully they look unsurprising in those locations.

222 ↗(On Diff #178143)

Yeah, good point - I was using an unsorted set earlier on (for LoopBlocks), but if I replace it with a SetVector too then everything indeed ends up deterministic.

I do have some vague worry about using SetVectors everywhere adding some overhead, but probably I'm overthinking it?

kripken updated this revision to Diff 178547.Dec 17 2018, 3:43 PM
kripken marked an inline comment as done.
  • Use SetVector for determinism
  • Simplify names in exceptions testcase
  • Improve comments about canonicalization
aheejin accepted this revision.Dec 17 2018, 4:34 PM

LGTM to me modulo a nit.

lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
222 ↗(On Diff #178143)

Yeah maybe we are unnecessarily using too many SetVectors... how about using not SetVector but other set such as SmallPtrSet or something for all other usages and revive SortedEntires, and sort them once and for all based on the MBB number, as you did before? But maybe not unordered_set, because its use is discouraged because it is expensive. Sorry for going back and forth :(

This revision is now accepted and ready to land.Dec 17 2018, 4:34 PM
kripken marked an inline comment as done.Dec 18 2018, 12:59 PM
kripken added inline comments.
lib/Target/WebAssembly/WebAssemblyFixIrreducibleControlFlow.cpp
222 ↗(On Diff #178143)

Sounds good, thanks, updating patch.

kripken updated this revision to Diff 178769.Dec 18 2018, 1:00 PM

Use SmallPtrSet for our sets, and return to using a vector for SortedEntries which we sort, avoiding SetVector.

This revision was automatically updated to reflect the committed changes.