This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Rename Emscripten EH functions
ClosedPublic

Authored by aheejin on Oct 1 2020, 1:28 PM.

Details

Summary

[WebAssembly] Rename Emscripten EH functions

Renaming for some Emscripten EH functions has so far been done in
wasm-emscripten-finalize tool in Binaryen. But recently we decided to
make a compilation/linking path that does not rely on
wasm-emscripten-finalize for modifications, so here we move that
functionality to LLVM.

Invoke wrappers are generated in LowerEmscriptenEHSjLj pass, but final
wasm types are not available in the IR pass, we need to rename them at
the end of the pipeline.

This patch also removes uses of emscripten_longjmp_jmpbuf in
LowerEmscriptenEHSjLj pass, replacing that with emscripten_longjmp.
emscripten_longjmp_jmpbuf is lowered to emscripten_longjmp, but
previously we generated calls to emscripten_longjmp_jmpbuf in
LowerEmscriptenEHSjLj pass because it takes jmp_buf* instead of i32.
But we were able use ptrtoint to make it use emscripten_longjmp
directly here.

Addresses:
https://github.com/WebAssembly/binaryen/issues/3043
https://github.com/WebAssembly/binaryen/issues/3081

Companions:
https://github.com/WebAssembly/binaryen/pull/3191
https://github.com/emscripten-core/emscripten/pull/12399

Diff Detail

Event Timeline

aheejin created this revision.Oct 1 2020, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2020, 1:28 PM
aheejin requested review of this revision.Oct 1 2020, 1:28 PM
aheejin added inline comments.Oct 1 2020, 1:36 PM
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
118

This comment doesn't sound relevant now. This was added in D52580 by @dschuff, and then the line below this comment was this:

getTargetStreamer()->emitIndirectFunctionType(Sym);

I don't know the full history, but the comment seemed to be somehow related to indirect function types, but it seems we are using this for all functions types, so we don't need this FIXME anymore..?

121

I guess now isOSBinFormatWasm() is always true? So removed it

aheejin edited the summary of this revision. (Show Details)Oct 1 2020, 1:41 PM
aheejin updated this revision to Diff 295672.Oct 1 2020, 1:59 PM
  • Update test expectations
dschuff added a subscriber: aardappel.EditedOct 1 2020, 4:51 PM

So, just to be sure I understand the architecture here:
Previously we use separate longjup_jmpbuf/longjmp and __invoke_{$LLVMSIG} because the lowering pass runs on IR and the IR type system's rules apply (and we don't have wasm signatures there). Now we can get wasm function signatures at AsmPrinter::emitEndOfAsmFile and the only thing that needs to be done is rename them?

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
118

In the meantime @aardappel did rethink how import declarations work in asm files (and that's why we got the .functype directive in https://reviews.llvm.org/D54652). And he also fixed the huge pile of tests that this comment refers to. so removing this FIXME LGTM.

121

LGTM, we don't support wasm-in-ELF anymore.

155

I don' think this check is necessary, empty params just won't trip the loop.

207

I think you can just call insert unconditionally and ignore its return value in this case.

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
65

this isn't an override, right? should it go after the functions that are part of the AsmPrinter implementation?

aheejin updated this revision to Diff 295693.Oct 1 2020, 4:54 PM
  • Comment fix
tlively added inline comments.Oct 1 2020, 5:10 PM
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
151–152

This naming scheme does not support multivalue functions. For example functions with signatures (i32) -> (i32, i32) and (i32, i32) -> (i32) both become invoke_iii and are indistinguishable. Would it make sense to error out if Sig->Returns has multiple elements?

194–210

"have created" => "have been created"

llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp
52

Should this be a non-dyn cast now?

aheejin edited the summary of this revision. (Show Details)Oct 1 2020, 6:02 PM
aheejin marked 4 inline comments as done.Oct 1 2020, 8:51 PM

So, just to be sure I understand the architecture here:
Previously we use separate longjup_jmpbuf/longjmp and __invoke_{$LLVMSIG} because the lowering pass runs on IR and the IR type system's rules apply (and we don't have wasm signatures there). Now we can get wasm function signatures at AsmPrinter::emitEndOfAsmFile and the only thing that needs to be done is rename them?

Yes. For invokes the problem is, as you pointed out, we don't have final signatures there. For emscripten_longjmp_jmpbuf / emscripten_longjmp, the problem is not that we don't have the final signature there, but rather, we cannot have two functions with different signatures with the same name emscripten_longjmp. There are two occasions that we add a call to emscripten_longjmp, but one of them is calling it with an argument type jmp_buf. So we need two different signatures, and thus two different functions there. But jmp_buf is lowered to i32 at this point so we can unify them.

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
151–152

I thought about this and initially tried to separate returns and params with _, but didn't end up doing it because

  1. I'm not even sure if Emscripten EH supports multiple returns now.
  2. Even if it does (or it will in future), I think making it a separate patch is better, because changing this will change the invoke interface currently shared with Emscripten.

Yeah, in the meantime, I think erroring out when people try to use multivalue functions with Emscripten EH/SjLj sounds like a good idea. Added the error message, and added 'lower-em-ehsjlj-multi-return.ll' to check this erroring out behavior.

207

We still need to continue in case the symbol is already in the map. I changed the code to this. This does not ignore the return value but uses it to figure out whether we should continue or not. Did you mean this?

if (!EmSymbols.insert(Sym).second)                                       
  continue;
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
65

It is used by emitEndOfAsmFile so I put it there, but yeah, I think moving its declaration down is better. It is still placed right before in emitEndOfAsmFile in cpp file; let me know if you think we should change that too.

aheejin marked 2 inline comments as done.Oct 1 2020, 8:52 PM
aheejin updated this revision to Diff 295726.Oct 1 2020, 8:53 PM
  • Address comments

There are two occasions that we add a call to emscripten_longjmp, but one of them is calling it with an argument type jmp_buf. So we need two different signatures, and thus two different functions there.

Would it also work to emit a bitcast of some sort from jmp_buf to i32 where that second kind of emscripten_longjmp call is generated? If so, would that be simpler overall?

dschuff accepted this revision.Oct 2 2020, 11:10 AM

LGTM modulo Thomas's idea if you think that will work and is an improvement.

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
65

Oh, I think WebAssemblyAsmPrinter (the class and the .cpp file) is the right place to put it, I just meant that the comment says "AsmPrinter Implementation" which I took to mean "these functions are overrides of the AsmPrinter class's virtual functions". so getMCSymbolForFunction can just go after those rather than in between the comment and the AsmPrinter overrides. What you have now is fine.

This revision is now accepted and ready to land.Oct 2 2020, 11:10 AM

There are two occasions that we add a call to emscripten_longjmp, but one of them is calling it with an argument type jmp_buf. So we need two different signatures, and thus two different functions there.

Would it also work to emit a bitcast of some sort from jmp_buf to i32 where that second kind of emscripten_longjmp call is generated? If so, would that be simpler overall?

That's a good idea! bitcast doesn't work here, but I was able to use ptrtoint. Yeah, it looks simpler, and we only need to rename invokes at the end. Changed LowerEmscriptenEHSjLj pass to do that and deleted all jmp_buf related stuff.

aheejin updated this revision to Diff 295885.Oct 2 2020, 11:55 AM

Use emscripten_longjmp directly in LowerEmscriptenEHSjLj pass

aheejin edited the summary of this revision. (Show Details)Oct 2 2020, 11:56 AM
aheejin updated this revision to Diff 295890.Oct 2 2020, 12:02 PM
  • Revert accidental .clang-tidy change + variable renaming
aheejin added inline comments.Oct 2 2020, 12:08 PM
llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
1–5

This file shows diff from lower-em-ehsjlj-options.ll so it looks weird, but this is actually a new file (copied from lower-em-ehsjlj-options.ll just to quickly create a test file) and this file checks the erroring-out behavior for multivalue returns.

aheejin updated this revision to Diff 295893.Oct 2 2020, 12:19 PM
  • more comment
tlively accepted this revision.Oct 2 2020, 1:33 PM
tlively added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
640–641

Do we emit an error if EmscriptenEHSjLj is used with wasm64?

sbc100 added a comment.Oct 2 2020, 2:27 PM

In PR description: final wasm times -> but final wasm types

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

What happens here if there is a user symbol that starts with invoke_? Perhaps getMCSymbolForFunction could also return bool if it detected an __invoke_ wrapper?

Might be worth adding a function called invoke_ignoreme as a normal symbol in one of the tests?

232–234

Again, this check seem fragile or at least maybe worthy of a helper like isEmscriptenInvokeWrapperName?

aheejin edited the summary of this revision. (Show Details)Oct 2 2020, 3:41 PM
aheejin updated this revision to Diff 295968.Oct 3 2020, 1:41 AM
  • Error out for wasm64
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

What happens here if there is a user symbol that starts with invoke_?

We haven't handled this situation so far, and this patch does not change this. Invokes' format has set in Emscripten, so any user code that uses function name starting with invoke_ has a problem now. I think we should consider prepending Emscripten's invokes with two underscores as in the names generated by LowerEmscriptenEHSjLj pass, given that functions with two leading underscores are mostly considered compiler-specific functions. But I'd like to do it as a separate patch, as that changes the interface between LLVM, Binaryen, and Emscripten. Do you think this is a good idea?

Perhaps getMCSymbolForFunction could also return bool if it detected an __invoke_ wrapper?

What will it be used for?

llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
640–641

No. Supporting wasm64 in Emscripten EHSjLj will be a simple task, but I guess no one needs it now and supporting it should be a separate CL. Erroring out sounds good to me for an interim measure. Did that.

The condition here is little tricky: Emscripten always enables SjLj, so we can't error out whenever SjLj is enabled. But we can check by whether setjmp or longjmp is actually used, so we error out only when one of those are used.
Whether we really use EH is hard to check unless we scan every single instruction in all functions. Anyway Emscripten EH is not enabled by default, so we error out whenever EH is enabled. (EH does not use emscripten_longjmp, but apparently other global variables used in this pass also needs adjusting to i64, which is not hard if we need to do that)

aheejin updated this revision to Diff 296351.Oct 5 2020, 7:47 PM

Revise invoke format

aheejin added inline comments.Oct 5 2020, 7:49 PM
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

I ended up renaming invokes to use two leading underscores in this patch. I wanted to it as a separate patch, but that will again need three different patches each from LLVM, Binaryen, and Emscripten, and landing it through the roller using a double roll twice is also a pain, so I ended up doing it here. Will do the same for the pending Binaryen and Emscripten paches.

aheejin edited the summary of this revision. (Show Details)Oct 5 2020, 7:56 PM
tlively accepted this revision.Oct 5 2020, 7:59 PM
sbc100 added inline comments.Oct 5 2020, 8:02 PM
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

I was actually thinking of the case where there a *not* a collision.

e.g. invoke_foo

This cannot possibly collide today because foo is not a valid signature. The code today seems to allow this and pass through such symbols. wasm-emscripten-finalize includes this function in the normal exports section not in the invokeFuncs section of the metadata.

By renaming the functions to use two underscores we solve this issue but I still think it might be nicer for getMCSymbolForFunction to return an additional bool if it created a wrapper.. rather then trying to detect this post-hoc with a call to isEmscriptenInvokeWrapperName on the result.

aheejin added a comment.EditedOct 6 2020, 4:00 AM

Sorry for changing things again; I reverted the invoke format changing part (two leading underscore thing). Renaming this turns out to change a lot of things in Emscripten side, which I think is really better be a separate patch.

aheejin updated this revision to Diff 296415.EditedOct 6 2020, 4:43 AM
  • Add invoke_ignoreme test
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

I added invoke_ignoreme test. This test passes without a problem in the current code.

Adding InvokeDetected output parameter to getMCSymbolForFunction seems tricky. getMCSymbolForFunction can either create a new invoke symbol or retrieve an existing invoke symbol.

Note that this function is used in two different places. When used in MCInstLower, it always creates a symbol. But when used in AsmPrinter, it can either create a symbol or retrieve an existing symbol. (It creates a symbol in AsmPrinter when the symbol has not been used in any instructions, such as calls, so it didn't have a chance to be created in MCInstLower.)

So when retrieving an already created invoke symbol, getMCSymbolForFunction just returns the existing name, so it cannot distinguish invoke_vi from invoke_foo. So if we only count newly created invoke symbols, we are gonna miss already created invoke symbols in calls from emitEndOfAsmFile in AsmPrinter.

But I'm not sure if this is a problem. Without this we can use invoke_ignoreme just fine. It will be put into EmSymbols set under the incorrect assumption that it is an invoke symbol, but that does not cause any problem here. We can actually put all symbols in the set; I didn't do that because I didn't want to grow a large set every time.

aheejin edited the summary of this revision. (Show Details)Oct 6 2020, 4:58 AM
aheejin updated this revision to Diff 296417.Oct 6 2020, 5:09 AM
  • EmSymbols -> InvokeSymbols
sbc100 accepted this revision.Oct 6 2020, 8:07 AM
sbc100 added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

I think I still don't really understand.. isn't this condition basically trying to say saying "is this a symbol that possibly got remapped by getMCSymbolForFunction"?

if that is true then the more precise condition would be the same as the one in line of getMCSymbolForFunction wouldn't it: if (EnableEmEH && isEmscriptenInvokeWrapperName(F->getName())) ?

It seems like this line is trying to infer this condition based on the name of the resulting MC symbol.

But as you say maybe it doesn't matter if this condition is conservative?

208

Perhaps merge these two lines into single condition.

aheejin updated this revision to Diff 296508.Oct 6 2020, 11:32 AM
aheejin marked 4 inline comments as done.
  • Add InvokeDetected output param to getMCSymbolForFunction
  • Rename InvokeWrapper to just Invoke
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
207

Oh you're right. I was confused and forgot that getMCSymbolForFunction uses and always has an access to the original function name. Nevermind what I said ;( Changed getMCSymbolForFunction to take an additional output parameter InvokeDetected and made the caller to use it.

aheejin marked an inline comment as done.Oct 6 2020, 11:36 AM
aheejin added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
232–234

Sorry I missed this. Now this is using InvokeDetected as well, so I think it's OK now.

This revision was automatically updated to reflect the committed changes.
aheejin marked an inline comment as done.