This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add WebAssemblyException information analysis
ClosedPublic

Authored by aheejin on Mar 5 2018, 9:43 PM.

Details

Summary

A WebAssemblyException object contains BBs that belong to a 'catch' part
of the try-catch-end structure. Because CFGSort requires all the BBs
within a catch part to be sorted together as it does for loops, this
pass calculates the nesting structure of catch part of exceptions in a
function. Now this assumes the use of Windows EH instructions.

Diff Detail

Repository
rL LLVM

Event Timeline

aheejin created this revision.Mar 5 2018, 9:43 PM
majnemer added inline comments.
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
25–26 ↗(On Diff #137127)

What happens if some BBs belong to two different catch blocks?

WIP: Will add a unit test soon.

aheejin added inline comments.Mar 6 2018, 12:34 AM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
25–26 ↗(On Diff #137127)

Like loop nesting structure, catch block structure should form a nested structure, so a BB belongs to both catch1 and catch2, catch1 should contain catch2 or vice versa.

aheejin planned changes to this revision.Mar 6 2018, 5:18 PM

To handle if-else structure as well, I will create WebAssemblySortingBlock or something that can calculate the nesting structure of all exceptions, loops and if-elses.

majnemer added inline comments.Mar 6 2018, 9:19 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
25–26 ↗(On Diff #137127)

This is not so in LLVM IR. For example:

define dso_local void @f() local_unnamed_addr #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
  invoke void @x0()
          to label %5 unwind label %1

; <label>:1:                                      ; preds = %0
  %2 = landingpad { i8*, i32 }
          catch i8* null
  %3 = extractvalue { i8*, i32 } %2, 0
  %4 = tail call i8* @__cxa_begin_catch(i8* %3) #3
  br label %11

; <label>:5:                                      ; preds = %0
  invoke void @x1()
          to label %10 unwind label %6

; <label>:6:                                      ; preds = %5
  %7 = landingpad { i8*, i32 }
          catch i8* null
  %8 = extractvalue { i8*, i32 } %7, 0
  %9 = tail call i8* @__cxa_begin_catch(i8* %8) #3
  br label %11

; <label>:10:                                     ; preds = %5
  ret void

; <label>:11:                                     ; preds = %6, %1
  tail call void @abort() #4
  unreachable
}

declare dso_local void @x0() local_unnamed_addr #1

declare i32 @__gxx_personality_v0(...)

declare i8* @__cxa_begin_catch(i8*) local_unnamed_addr

declare dso_local void @x1() local_unnamed_addr #1

; Function Attrs: noreturn nounwind
declare dso_local void @abort() local_unnamed_addr #2

In this example, BB#11 belongs to two different catch blocks.

aheejin added inline comments.Mar 6 2018, 10:06 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
25–26 ↗(On Diff #137127)
An exception catch part is defined as a BB with catch instruction and all other BBs dominated by this BB.

In this case BB#11 is neither in the two catch blocks. Catch#1 contains BB#6 and BB#11, and Catch#2 contains BB#2 only.

majnemer added inline comments.Mar 6 2018, 10:36 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
25–26 ↗(On Diff #137127)

Why doesn't Catch#2 contain BB#11? The two catches are identical, they just jump to a shared common block after executing __cxa_begin_catch.

aheejin added inline comments.Mar 8 2018, 1:02 AM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
25–26 ↗(On Diff #137127)

What I meant by BB#11 belongs to neither of the catches was more about the real wasm code structure. Yes, it belongs to the catches semantically. But the wasm code will be like

(block
 (try
  ...
  catch
  ...
  br 1
 )
)
 ...
 (try
  ...
  catch
  ...
  br 1
 )
)
call @abort

This code is not specific to your code example; this is trying to show the simplest case that two catch blocks jump to the common code. This text format is called the folded text format in wasm, and the real wasm code does not have parentheses of course, but this format is easier to show scope information.

br is an unconditional branch instruction in wasm. The argument 1 means it breaks out of 2 scopes. (br 0 means it breaks out of 1 scope.) So br 1 within a try breaks out both the try scope and the outer block scope to get to the call @abort instruction.

So, unless you want to duplicate call @abort part, in should be located outside of the two catch blocks, and there should be a branch within each of the catches to jump to the shared code. An instance of WebAssemblyExceptionInfo represents BBs within a single catch in wasm code (try is not there yet, so just BBs starting from catch). This is the reason why it is defined as 'the landing pad BB and all other BBs dominated by it`.

aheejin updated this revision to Diff 144470.Apr 28 2018, 10:39 PM

Updated the patch to use Windows EH instructions

aheejin updated this revision to Diff 144684.May 1 2018, 2:10 AM
  • Added getHeader method for CFGSort
aheejin updated this revision to Diff 144852.May 2 2018, 4:33 AM
  • Fixed triples to 'wasm32-unknown-unknown'
aheejin updated this revision to Diff 145128.May 3 2018, 6:42 PM
  • Add blocks() method for later use
aheejin updated this revision to Diff 145164.May 4 2018, 2:23 AM
  • Add more methods to make the interface consistent w/ Loop
aheejin updated this revision to Diff 145610.May 7 2018, 6:15 PM
  • clang-tidy
aheejin updated this revision to Diff 146919.May 15 2018, 2:30 PM
  • Group two terminate pads (catch and catch_all) together
aheejin updated this revision to Diff 147700.May 19 2018, 9:48 PM
  • clang-format
majnemer added inline comments.May 20 2018, 12:00 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
60–72 ↗(On Diff #147700)

Similar concern here.

79–90 ↗(On Diff #147700)

Is it not possible for the MBB to branch to block which then calls std::terminate?

lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
38–39 ↗(On Diff #147700)

DeleteContainerPointers(SubExceptions);

Even better would be to use a std::vector<unique_ptr<WebAssemblyException>>

aheejin added inline comments.May 20 2018, 9:55 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

Hmm, good point... I assumed a terminatepad would be always a single BB but it may not hold. Do you think I should DFS all descendants from here?

lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
38–39 ↗(On Diff #147700)

To change this to std::vector<unique_ptr<WebAssemblyException>>, should I RAII WebAssemblyException object at the first creation time and use std::move whenever I pass it to a function until I figure out which container to insert this?

majnemer added inline comments.May 20 2018, 10:22 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

We handled problems like this one by using getFuncletMembership on the MachineFunction. This does a DFS to let you know which MBBs are part of which catch scopes. I think I'd find a way to reuse it for this.

aheejin added inline comments.May 20 2018, 10:23 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

Hmm, no, I guess I shouldn't.. doing DFS from all catchpads or cleanuppads sounds like unnecessarily expensive. Should I search for a function call to __clang_call_terminate and search upward from there..?

majnemer added inline comments.May 20 2018, 10:38 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

IIRC, getFuncletMembership is O(# MBBs) which is no worse than the fundamental amount of work you are doing here. It shouldn't be a considerable amount of overhead.

aheejin added inline comments.May 20 2018, 11:20 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

Oh, I added the last comment before seeing your new comments about getFuncletMembership, sorry.

You mean, I can replace the whole analysis (not only terminate pad thing) with getFuncletMembership? Or are you talking only about this terminatepad detection?

If you meant the former, I actually thought about not doing this DFS myself but just use the result from getFuncletMembership, but I couldn't find a way to create parent-child relationship between exception objects with getFuncletMembership. That function just returns a map of <BB, funclet number>, which does not say anything about whether a funclet is included in another funclet or is a top-level funclet. Can I reconstruct that information using getFuncletMembership?

And by the way I changed the function name to getEHScopeMembership in D47005 after you accepted it, per @dschuff's request. PTAL if it looks OK.

aheejin added inline comments.May 21 2018, 4:09 AM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

Oh, maybe I can just search for a call to __clang_call_terminate() and figure out which WebAssemblyException object it belongs to, and search for BBs that are contained in that exception object...

majnemer added inline comments.May 21 2018, 8:39 AM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

Wouldn't finding the other BBs be equivalent to inverting the map given by getEHScopeMembership?

lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
38–39 ↗(On Diff #147700)

Yes, I think that makes sense.

aheejin added inline comments.May 21 2018, 2:04 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

As I said above, can I reconstruct LoopInfo-like parent-child relationship with getFuncletMembership?

aheejin added inline comments.May 21 2018, 5:15 PM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
38–39 ↗(On Diff #147700)

Come to think of it, this can be complicated. WebAssemblyException object is created in recalculate, which calls discoverAndMapException. Within discoverAndMapException, if a given exception is found to be a subexception of some other exception, it is inserted into a SubExceptions vector in the parent exception. If not, it is not inserted into anywhere in discoverAndMapException, but it is inserted into WebAssemblyExceptionInfo::TopLevelExceptions after returning from discoverAndMapException. So the code should look something like

void WebAssembly::discoverAndMapException(std::unique_ptr<WebAssemblyException> WE) {
  ...
  if (WE is a subexception of a parent exception P)
    add WE to P.SubExceptions;
  ...
}

void addTopLeveException(std::unique_ptr<WebAssemblyException> WE);

void WebAssemblyExceptinInfo::recalculate() {
  ...
  std::unique_ptr<WebAssemblyException> o(args);
  for (...) {
    auto *WE = new WeAssemblyException(EHPad);
    discoverAndMapException(std::move(WE), MDT);
    ...
  }
  ...
  if (top level exception)
    addTopLevelException(std::move(WE));
  ...
}

But the problem is, once we move the object to discoverAndMapException, we can't dereference it or move it again after returning from it. So it is complicated because

  1. Where you create an object and insert it to a container are different
  2. There are multiple places and multiple containers in which an object can be inserted.

I think DeleteContainerPointers(SubExceptions); is a simple and sane solution. What do you think?

majnemer added inline comments.May 22 2018, 8:47 AM
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

I don't think you can do it with getEHScopeMembership, it just tells you which MBBs are in the same EH scope... However, testing if something is a top-level scope should be possible I think. You should be able to go from the MBB which starts the scope back to the llvm::BasicBlock and inspect its pad to see if it has a parent.

lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
38–39 ↗(On Diff #147700)

SGTM

aheejin updated this revision to Diff 148114.May 22 2018, 3:13 PM
aheejin marked an inline comment as done.
  • Use DeleteContainerPointers
lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
79–90 ↗(On Diff #147700)

Actually we should reconsider the terminatepad-related transformations that happen in WebAssemblyExceptionPrepare pass, before this pass runs. WebAssemblyExceptionPrepare is not WasmEHPrepare that runs before instruction selection; it is a MachineFunctionPass that runs before we run WebAssemblyExceptionInfo. WebAssemblyExceptionInfo is in D46803 and I added you as a reviewer there just now. I'll continue this discussion there.

aheejin updated this revision to Diff 148335.May 23 2018, 10:22 PM
  • One more use of DeleteContainerPointers
aheejin updated this revision to Diff 148585.May 25 2018, 4:42 AM

Updated exception grouping algorithm: now it simply gather all the BBs
dominated by an EH pad, rather than stopping at catchret/cleanupret/unreachable.
CFGStackify seems to work either way, and this makes things simpler:

  1. We don't need to store frontier BBs ourselves and be able to just use existing MachineDominanceFrontier
  2. We can replace catchret/cleanupret earlier in ExceptionPrepare, so we need a single pass rather than two passes.
aheejin planned changes to this revision.Jun 2 2018, 2:37 AM
aheejin updated this revision to Diff 151641.Jun 17 2018, 2:15 AM
  • Rebase: now using utility functions in WebAssemblyUtilities.cpp

Looks pretty reasonable otherwise. I like the MIR unittest 👌

lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp
14 ↗(On Diff #151641)

Ah, this is the explanation I was looking for before. I would put it (at least the part that describes the class) in the header where the class is defined.

18 ↗(On Diff #151641)

I would reverse it and say something like "A WebAssemblyException represents/contains/consists of these blocks/this region"

lib/Target/WebAssembly/WebAssemblyExceptionInfo.h
27 ↗(On Diff #151641)

Per our discussion on the previous CL, a big comment here explaining what this class represents, and how it is used.

aheejin updated this revision to Diff 151997.Jun 19 2018, 4:50 PM
aheejin marked 3 inline comments as done.
  • Address comments

@majnemer I inserted a routine to ensure a terminate pad should be a single BB in this previous pass (D46803). Do you have any other concerns here?

dschuff accepted this revision.Jun 22 2018, 11:14 AM
This revision is now accepted and ready to land.Jun 22 2018, 11:14 AM
This revision was automatically updated to reflect the committed changes.