This is an archive of the discontinued LLVM Phabricator instance.

Add asm.js-style setjmp/longjmp handling for wasm
ClosedPublic

Authored by aheejin on Aug 26 2016, 9:05 AM.

Details

Summary

This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.

Diff Detail

Event Timeline

aheejin updated this revision to Diff 69386.Aug 26 2016, 9:05 AM
aheejin retitled this revision from to Add asm.js-style setjmp/longjmp handling for wasm.
aheejin updated this object.
aheejin added reviewers: dschuff, jpp.

WIll add a test case soon.

dschuff edited edge metadata.Aug 26 2016, 5:32 PM

Haven't quite finished reviewing below my last comment, but wanted to at least get these to you.

lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
42 ↗(On Diff #69386)

Isn't longjmp() a noreturn function which is always followed by unreachable? What gets put into the post-longjmp side of the split block?

165 ↗(On Diff #69386)

It's still a little bit unclear about which calls this applies to. I guess it's any call in the function with the setjmp call, that appears after the call to setjmp (or really, that is reachable from the callsite), that could eventually result in a call to longjmp (i.e. not an intrinsic)?

177 ↗(On Diff #69386)

Not sure about the indentation/braces here. Is just line 177 the else clause? (If so then then it's of course correct as written but let's put braces around it anyway to make it more clear).

180 ↗(On Diff #69386)

There is one of these labels for each call to setjmp?

404 ↗(On Diff #69386)

sad that we have to duplicate this whole function with a template :(
Is getCalledValue() the only method we need that is common between InvokeInst and CallInst but not in Instruction?

Maybe we could change the factoring to something like

Value *WebAssemblyLowerEmscriptenEHSjLj::doWrapInvoke(Instruction *CI, Value* CalledValue) {
  // all the code
}

template<typename CallOrInvoke> 
Value *WebAssemblyLowerEmscriptenEHSjLj::wrapInvoke(CallOrInvoke *CI) {
  return doWrapInvoke(CI, CI->getCalledValue());
}

and then only the thin wrapper would be duplicated.

480 ↗(On Diff #69386)

We always expect this cast to succeed right (we use it without a null-check below)? So this should be cast instead of dyn_cast, so it will assert instead of returning nullptr.

508 ↗(On Diff #69386)

does getFunction() just return nullptr if there's no "malloc" in the module?

514 ↗(On Diff #69386)

"these" are calls?

544 ↗(On Diff #69386)

let's get put this else clause in braces too.

565 ↗(On Diff #69386)

can we just make this a bitwise & as written in the JS code rather than 2 BBs?

583 ↗(On Diff #69386)

could move this definition of Args down next to its use (or even just use {LoadedThrew, SetjmpTable, SetjmpTableSize} inline in the call to CreateCall and not even have to declare it).

592 ↗(On Diff #69386)

isn't emscripten_longjmp a noreturn function? If so this should be unreachable instead of a branch.

658 ↗(On Diff #69386)

could this (and I guess createSetThrewFunction too) just be static free functions instead of methods?

746 ↗(On Diff #69386)

rather than nesting everything below inside this conditional, can we just do an early return false if this condition is false?

749 ↗(On Diff #69386)

This is going to be problematic, actually. With emscripten currently, by the time we generate code, linking has already been done. So if malloc or free are missing, we can't just create a declaration. In the end we will have an undefined reference. I think here we may just have to report_fatal_error if they are not in the module. On the emscripten side we need to ensure that they are included in the link before codegen. Your previous emscripten patch caused them to be exported, but I can't remember whether it also ensures that they are linked in?

1003 ↗(On Diff #69386)

This should be report_fatal_error instead of assert, so that it doesn't get compiled away in release builds (i.e. there could be input code that does this and works on other platforms).

1047 ↗(On Diff #69386)

Could we just use std::copy(F.begin(), F.end(), std::back_inserter(BBs)) or some such?

(There's a lot of comments btw, partly because obviously this is a big CL, but also because I'm still trying to completely grok the way it works).

aheejin updated this revision to Diff 69512.Aug 28 2016, 6:52 AM
aheejin edited edge metadata.
  • Add a test case for SjLj handling
  • Modify the type of THREW from i1 to i32 (sjlj requires it to be i32)
aheejin updated this revision to Diff 69515.Aug 28 2016, 11:07 AM
aheejin marked 11 inline comments as done.

Address comments

lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
42 ↗(On Diff #69386)

That should have been "a function call that might longjmp". Fixed.

165 ↗(On Diff #69386)

Currently the condition is specified in WebAssemgly::canLongjmp(). It excludes all intrinsics and all other functions this pass is creating or using, but otherwise function calls are considered longjmp-able. This can be improved by only considering function calls that are reachable from any setjmp, but currently all function calls are candidates.

180 ↗(On Diff #69386)

For each callsite to setjmp.

404 ↗(On Diff #69386)

There are other shared methods between CallInst and InvokeInst that are used in this function:
arg_begin(), arg_end(), getCallingConv(), getAttributes(), getNumArgOperands(), doesNotReturn(), and removeAttribute().

508 ↗(On Diff #69386)

Yes. And it's OK because then Callee would not be equal to nullptr anyway.

514 ↗(On Diff #69386)

It should have been 'There are functions in JS code". Fixed.

658 ↗(On Diff #69386)

They can, but they currently use other member variables (SetThrewFName, SetTempRet0FName, ThrewGV, ThrewValueGV, and TempRet0GV). Do you think it's better to make the two functions as static functions and pass these as arguments?

746 ↗(On Diff #69386)

At the end of runOnModule function, we have some cleanups to do, and we also need to run createSetTempRet0Function and createSetThrewFunction. I made these two functions run at the very end of runOnModule method because I wanted to add those functions only there is some change made from either EH or SjLj handling. This is the reason why I couldn't do the early return. Do you have suggestions?

if (!Changed) {                                                                
  // Delete unused global variables and functions                              
  ThrewGV->eraseFromParent();                                                  
  ThrewValueGV->eraseFromParent();                                             
  TempRet0GV->eraseFromParent();                                               
  if (ResumeF)                                                                 
    ResumeF->eraseFromParent();                                                
  if (EHTypeIDF)                                                               
    EHTypeIDF->eraseFromParent();                                              
  if (EmLongjmpF)                                                              
    EmLongjmpF->eraseFromParent();                                             
  if (SaveSetjmpF)                                                             
    SaveSetjmpF->eraseFromParent();                                            
  if (TestSetjmpF)                                                             
    TestSetjmpF->eraseFromParent();                                            
  return false;                                                                
}                                                                              
                                                                               
// If we have made any changes while doing exception handling or               
// setjmp/longjmp handling, we have to create these functions for JavaScript   
// to call.                                                                    
createSetThrewFunction(M);                                                     
createSetTempRet0Function(M);
749 ↗(On Diff #69386)

The patch that ensured malloc and free to be exported was not emscripten but binaryen patch. So it didn't do anything about the linking phase. It just ensures them to be exported if they exist. But it looks like dlmalloc.bc is always included in linking now in emscripten anyway:
https://github.com/kripken/emscripten/blob/incoming/tools/system_libs.py#L341

dschuff added inline comments.Aug 29 2016, 5:14 PM
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
737 ↗(On Diff #69515)

Could you:

  1. move the calculation of SetjmpUsed up to the top, and maybe factor DoSjLj && (SetjmpUsed || LongjmpUsed into a named bool. (maybe call this bool DoSjLj and change the current DoSjLj to EnableSjLj)
  2. Only create ThrewGV/ThrewValueGV if DoSjLj || DoEH
  3. Only create TestSetjmpF et al. if DoSjLj is true.
  4. Move the deletion of ResumeF into the if (DoEH) clause. Alternatively you could put that into a getOrCreateResumeF-type function and replace direct uses of ResumeF with that.

Would that work? I realize that's some churn but I think it would improve readability.

743 ↗(On Diff #69515)

How about "malloc and free must be linked into the module if setjmp is used" so it's a little more actionable and clear?

967 ↗(On Diff #69515)

If it's simpler or less code, you can just directly allocate an add rather than creating a whole new IRB if you want.

1036 ↗(On Diff #69515)

this can be more idiomatic as assert(!isa<InvokeInst>(&I))

514 ↗(On Diff #69386)

Oh, i see, these callees are functions defined in JS code. I think maybe "These are functions in JS glue code" is better still.

658 ↗(On Diff #69386)

Oh; no, I think it's fine as it is.

1047 ↗(On Diff #69386)

Oh, I guess std::copy doesn't work because we want pointers and we have have iterators/refs? I actually think the simple for-each loop is more readable than std::transform in this case. So, I guess change it back? (sorry!)

lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
168 ↗(On Diff #69515)

Is this moved from somewhere else or is it new? If it's new, can we put it into a different CL, so we can better track possible breakage? (and in any case we should also have a test for it).

test/CodeGen/WebAssembly/lower-em-sjlj.ll
2

could add a test that calls in other functions (which do not call setjmp) are not modified, and a test which calls only longjmp but not setjmp

aheejin updated this revision to Diff 69848.Aug 31 2016, 6:58 AM
aheejin marked 5 inline comments as done.

Address comments & fix some typos

aheejin marked an inline comment as done.Aug 31 2016, 6:58 AM
aheejin added inline comments.
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
737 ↗(On Diff #69515)

I have some questions:

  1. Done
  1. The semantics of DoEH is little vague. Currently DoSjLj means we really need to do some transformation, but DoEH not necessarily so (if DoEH = EnableEH). If we want to figure out if we really need to do some transformation for EH, we have to traverse all instructions in the module first to check if there is any invoke or landingpads, and I guess that's too much.

Because DoEH does not necessarily mean we are going to do some transformation, even if we only create those two (actually there's one more, TempRet0GV, and I guess you mean all these three?) only if (DoSjLj || DOEH), it's still possible they may not be used so we should delete them at the end. I guess your intention was to create those variables only if (DoEH || DoSjLj) and obviate the need to do this part, right?

if (!Changed) {
  ThrewGV->eraseFromParent();
  ThrewValueGV->eraseFromParent();
  TempRet0GV->eraseFromParent();
  ...
}

But we still need to this because DoEH does not mean we need to EH. What do you think?

  1. TestSetjmpF is already being created only if DoSjLj is true.
  1. Move only ResumeF? There are many other functions. What do I do about others?
if (!Changed) {
  // Delete unused global variables and functions
  ThrewGV->eraseFromParent();
  ThrewValueGV->eraseFromParent();
  TempRet0GV->eraseFromParent();
  if (ResumeF)
    ResumeF->eraseFromParent();
  if (EHTypeIDF)
    EHTypeIDF->eraseFromParent();
  if (EmLongjmpF)
    EmLongjmpF->eraseFromParent();
  if (SaveSetjmpF)
    SaveSetjmpF->eraseFromParent();
  if (TestSetjmpF)
    TestSetjmpF->eraseFromParent();
  return false;
}

And if I am to put getOrCreateResumeF kind of function, it would be consistent to create that kind of function for every function I create in this pass. Do you prefer that?

lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
168 ↗(On Diff #69515)

It's new. But I'm not sure how I can add a test for this. Because currently this EH and SjLj options are in the backend so we can pass those options to the EHSjLj pass, and EHSjLj pass has default parameters 'true' for each of these options, as follows:

WebAssemblyLowerEmscriptenEHSjLj(bool EnableEH = true, bool EnableSjLj = true)

We need to set these variables to true by default to enable opt testing in *.ll files. If we want to delete these default parameters and add two separate 'enable' kind of options also in this EHSjLj pass (separately from the options that are in the backend), we can't initialize this pass with INITIALIZE_PASS macro anymore. There is something called INITIALIZE_PASS_WITH_OPTIONS or something, and I had a look at it briefly, but it looks like it's for more detailed options. (Correct me if I'm mistaken)

So what I mean is, currently there is no way to enable only SjLj and not EH using only 'opt'. (We can do that in the backend of course, but what I'm talking about is the opt test).

dschuff added inline comments.Aug 31 2016, 11:14 AM
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
744 ↗(On Diff #69848)

OK; so EmLongjmpF, SaveSetJmpF, and TestSetjmpF are only needed by setjmp and created if needed, ResumeF and EHTypeIDF are created speculatively to avoid an extra pass, and all 3 GVs are shared, and so may also be needed or not.

I think it's probably OK as it is rather than converting everything to getOrCreate.

lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp
168–175 ↗(On Diff #69848)

Sorry; I meant the change that runs lowerInvoke. We can have a test that runs llc on a ll file that has invokes, and doesn't use the -enable-emscripten-cxx-exceptions flag, and check that the invokes are lowered away.

aheejin updated this revision to Diff 69909.Aug 31 2016, 3:03 PM

Address comments

aheejin marked an inline comment as done.Aug 31 2016, 3:03 PM
aheejin added inline comments.
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
744 ↗(On Diff #69848)

So do I leave it as it is?

dschuff accepted this revision.Aug 31 2016, 3:10 PM
dschuff edited edge metadata.
dschuff added inline comments.
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp
744 ↗(On Diff #69909)

Yeah, as-is is fine.

This revision is now accepted and ready to land.Aug 31 2016, 3:10 PM
aheejin updated this revision to Diff 69911.Aug 31 2016, 3:14 PM
aheejin edited edge metadata.

Delete a blank line

aheejin closed this revision.Aug 31 2016, 3:48 PM

Looks like this breaks some of the existing tests now: https://wasm-stat.us/builders/linux/builds/10379/steps/LLVM/logs/stdio

Probably should revert it and then land a fixed version.