This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Use Windows EH instructions for Wasm EH
ClosedPublic

Authored by aheejin on Mar 27 2018, 7:15 AM.

Details

Summary

Because wasm control flow needs to be structured, using WinEH
instructions to support wasm EH brings several benefits. This patch
makes wasm EH uses Windows EH instructions, with some changes:

  1. Because wasm uses a single catch block to catch all C++ exceptions, this merges all catch clauses into a single catchpad, within which we test the EH selector as in Itanium EH.
  2. Generates a call to __clang_call_terminate in case a cleanup throws. Wasm does not have a runtime to handle this.
  3. In case there is no catch-all clause, inserts a call to __cxa_rethrow at the end of a catchpad in order to unwind to an enclosing EH scope.

Diff Detail

Repository
rL LLVM

Event Timeline

aheejin created this revision.Mar 27 2018, 7:15 AM
aheejin updated this revision to Diff 139920.Mar 27 2018, 7:17 AM
  • Comment fix

Some quick first pass comments.

lib/CodeGen/CGCleanup.cpp
985 ↗(On Diff #139920)

I think this condition can be simplified to !isMSVCPersonality() with a slight tweak of the comment.

test/CodeGenCXX/wasm-eh.cpp
33 ↗(On Diff #139920)

I'd expect a funclet bundle operand here..

dschuff added inline comments.Mar 27 2018, 5:16 PM
lib/CodeGen/CGCXXABI.h
610 ↗(On Diff #139920)

Should be public?

lib/CodeGen/CGCleanup.h
630 ↗(On Diff #139920)

We might consider having 2 personalities: one for use with builtin exceptions, and other for emulated exceptions? I'm imagining that with this style of IR we might be able to do emulated exceptions better than we have for emscripten today. We don't have to think about that now, but maybe the name of the personality might reflect it: e.g. GNU_Wasm_CPlusPlus_Native (or builtin) vs GNU_Wasm_CPlusPlus_Emulated (or external).

lib/CodeGen/CGException.cpp
1236 ↗(On Diff #139920)

Why do we need to call __cxa_rethrow instead of just emitting a rethrow instruction intrinsic to unwind?

1534 ↗(On Diff #139920)

Why?

aheejin updated this revision to Diff 140144.EditedMar 28 2018, 3:15 PM
aheejin marked an inline comment as done.
  • Rebase & Simplified the if condition
lib/CodeGen/CGCXXABI.h
610 ↗(On Diff #139920)

This code was moved from here in lib/CodeGen/MicrosoftCXXABI.cpp. There are dozens of classes inheriting from EHScopeStack::Cleanup and they all use private inheritance. Examples 1 2 3 4

The comment from EHScopeStack::Cleanup says all subclasses must be POD-like, which I guess is the reason, but I'm not very sure.

lib/CodeGen/CGCleanup.h
630 ↗(On Diff #139920)

(We talked in person :) ) I'll think about it. But I guess we can change the name once we start implementing that feature?

lib/CodeGen/CGException.cpp
1236 ↗(On Diff #139920)

Because we have to run the library code as well. As well as in other functions in libcxxabi, [[ https://github.com/llvm-mirror/libcxxabi/blob/565ba0415b6b17bbca46820a0fcfe4b6ab5abce2/src/cxa_exception.cpp#L571-L607 | __cxa_rethrow ]] has some bookeeping code like incrementing the handler count or something. After it is re-caught by an enclosing scope, it is considered as a newly thrown exception and the enclosing scope is run functions like __cxa_begin_catch and __cxa_end_catch. So we also have to run __cxa_rethrow when rethrowing something to make sure everything is matched.

The actual rethrow instruction will be added next to __cxa_rethrow in the backend, or can be embedded within __cxa_rethrow function later if we decide how to pass an exception value to a function. (which might be one reason why we want to keep the first-class exception thing)

1534 ↗(On Diff #139920)

Not strictly necessarily, because we can modify libcxxabi to our liking. I was trying to keep the same behavior as Itanium-style libcxxabi. The __clang_call_terminate function that's called when an EH cleanup throws is as follows:

; Function Attrs: noinline noreturn nounwind                                     
define linkonce_odr hidden void @__clang_call_terminate(i8*) #6 comdat {         
  %2 = call i8* @__cxa_begin_catch(i8* %0) #2                                    
  call void @_ZSt9terminatev() #8                                                
  unreachable                                                                    
}

So it calls __cxa_begin_catch on the exception value before calling std::terminate. We can change this behavior for wasm if we want, and I guess we need some proxy object in case of a foreign exception, but anyway I was trying to keep the behavior the same unless there's any reason not to.

test/CodeGenCXX/wasm-eh.cpp
33 ↗(On Diff #139920)

Even if it cannot throw? Looks like even CodeGenFunction::getBundlesForFunclet function does not add funclet bundle operand in case the callee cannot throw: code

Otherwise it looks good to me, although @majnemer would know more about the subtleties of what IR actually gets generated.

lib/CodeGen/CGCleanup.h
630 ↗(On Diff #139920)

Sounds good. BTW this should actually be GNU_Wasm_CPlusPlus instead of GNU_Wasm_CPlusCPlus

lib/CodeGen/CGException.cpp
1541 ↗(On Diff #140144)

Should this be in an else block? No need to emit it after we emit the call to __clang_call_terminate

1534 ↗(On Diff #139920)

Oh I see, we are deviating from MSVC behavior to be more like itanium here. Makes sense.

aheejin updated this revision to Diff 140156.Mar 28 2018, 4:13 PM
aheejin marked an inline comment as done.
  • GNU_CPlusCPlus -> GNU_CPlusPlus
lib/CodeGen/CGException.cpp
1541 ↗(On Diff #140144)

I don't understand? The call emitted within the if block is not a call to __clang_call_terminate but to wasm.get.exception intrinsic.

dschuff added inline comments.Mar 28 2018, 4:20 PM
lib/CodeGen/CGException.cpp
1541 ↗(On Diff #140144)

Oh you're right, I misread that, nevermind.

aheejin updated this revision to Diff 141759.Apr 9 2018, 3:44 PM
  • Add a test in which try-catches are nested within a catch
  • Rebase
aheejin updated this revision to Diff 141979.Apr 11 2018, 5:38 AM
  • Change personality function name to a unique one
rnk added a subscriber: rnk.Apr 11 2018, 1:41 PM
aheejin updated this revision to Diff 146076.May 10 2018, 12:07 AM
  • getMSVCDispatchBlock -> getFuncletEHDispatchBlock
majnemer added inline comments.May 16 2018, 3:20 PM
lib/CodeGen/CGException.cpp
1164 ↗(On Diff #146076)

Maybe this should be called WasmCatchStartBlock?

1173 ↗(On Diff #146076)

Hmm, why is this done? Won't RestoreCurrentFuncletPad undo this?

1241–1245 ↗(On Diff #146076)

This seems pretty fragile, why is this guaranteed to work? Could we maintain a map from CatchSwitchInst to catch-all block?

aheejin updated this revision to Diff 147287.May 17 2018, 4:14 AM
aheejin marked an inline comment as done.

CatchStartBlock -> WasmCatchStartBlock

Thank you for the reviews!

lib/CodeGen/CGException.cpp
1173 ↗(On Diff #146076)

Isn't RestoreCurrentFuncletPad outside of if (EHPersonality::get(*this).isWasmPersonality())? Isn't this supposed to stay until this function finishes?

1241–1245 ↗(On Diff #146076)

The function call sequence here is CodeGenFunction::ExitCXXTryStmt -> emitCatchDispatchBlock (static) -> emitWasmCatchDispatchBlock (static) and emitCatchDispatchBlock also has other callers, so it is a little cumbersome to pass a map to those functions to be filled in. (We have to make a parameter that's only gonna be used for wasm to both emitCatchDispatchBlock and emitWasmCatchDispatchBlock)

The other way is also change those static emit functions into CodeGenFunction class's member functions and make the map as a member variable.

But first, in which case do you think this will be fragile? emitWasmCatchDispatchBlock follows the structure of the landingpad model, so for a C++ code like this

try {
  ...
} catch (int) {
  ...
} catch (float) {
  ...
}

the BB structure that starts from wasm's catch.start block will look like

catch.dispatch:                                   ; preds = %entry
  %0 = catchswitch within none [label %catch.start] unwind to caller

catch.start:                                      ; preds = %catch.dispatch
  %1 = catchpad within %0 [i8* bitcast (i8** @_ZTIi to i8*), i8* bitcast (i8** @_ZTIf to i8*)]
  %2 = call i8* @llvm.wasm.get.exception()
  %3 = call i32 @llvm.wasm.get.ehselector()
  %4 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*)) #2
  %matches = icmp eq i32 %3, %4
  br i1 %matches, label %catch12, label %catch.fallthrough

catch12:                                          ; preds = %catch.start
  body of catch (int)

catch.fallthrough:                                ; preds = %catch.start
  %8 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIf to i8*)) #2
  %matches1 = icmp eq i32 %3, %8
  br i1 %matches1, label %catch, label %rethrow

catch:                                            ; preds = %catch.fallthrough
  body of catch (float)

rethrow:                                          ; preds = %catch.fallthrough
  call void @__cxa_rethrow() #5 [ "funclet"(token %1) ]
  unreachable

So to me it looks like, no matter how the bodies of catch (int) or catch (float) are complicated, there should always be blocks like catch.start and catch.fallthrough, which compares typeids and divide control flow depending on the typeid comparison. I could very well be mistaken, so please let me know if so.

aheejin added inline comments.May 17 2018, 4:28 AM
lib/CodeGen/CGException.cpp
1241–1245 ↗(On Diff #146076)

Oh and the RethrowBlock in the code is not the same as the catch_all block... cleanuppads will be catch_all blocks in wasm, and catchpads will be catch <C++>. That RethrowBlock belongs to catch <C++> block, and is entered when the current exception caught is a C++ exception but does not match any of the catch clauses, so it can be rethrown to the enclosing scope.

aheejin updated this revision to Diff 147695.May 19 2018, 9:05 PM
  • Make wasm.get.exception/selector intrinsics take a token argument
aheejin updated this revision to Diff 147696.May 19 2018, 9:13 PM

Test case fix was missing in the last commit

majnemer added inline comments.May 20 2018, 11:50 AM
lib/CodeGen/CGException.cpp
1173 ↗(On Diff #146076)

Ah, true! Nevermind!

1241–1245 ↗(On Diff #146076)

I guess I'm worried that we could have emitted statements inside the catch(int) and catch(float) blocks and we'd either run into a terminator which isn't a BranchInst.
If we could not emit any statements yet, then I think this is OK...

aheejin added inline comments.May 20 2018, 6:36 PM
lib/CodeGen/CGException.cpp
1241–1245 ↗(On Diff #146076)

Actually it's after we emit all catch handlers, but I think this routine is not gonna touch the contents of those handlers. So it's like

Start --> int? (N) --> float? (N) --> rethrow
          (Y)           (Y)
        handler       handler

So however much control flow each handler contains, if we follow only (N)s, we will end up in the rethrow block.

dschuff accepted this revision.May 29 2018, 4:42 PM

LGTM assuming we are convinced for now that finding the rethrow block from the CatchSwitch will work. Does this need to wait until after https://reviews.llvm.org/D43746 or other CLs from that chain, or does it matter?

This revision is now accepted and ready to land.May 29 2018, 4:42 PM

Thanks! No, it does not depend on the other CL chain on the llvm side.

aheejin added a comment.EditedMay 29 2018, 8:14 PM

Oh no, sorry, it actually depends on D43746; but not others on the chain. I changed the signatures of wasm.get.exception and wasm.get.ehselector intrinsics to take a token argument, and the intrinsic signature changes were added in D43746. By the way I added one more line (setting the personality function wrapper as nothrow) to D43746 just now, so if you are ok with that, I'll land that and then this.

This revision was automatically updated to reflect the committed changes.