This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly. It also uses JavaScript's try and catch mechanism.
Details
Diff Detail
Event Timeline
Haven't quite finished reviewing below my last comment, but wanted to at least get these to you.
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
42 | Isn't longjmp() a noreturn function which is always followed by unreachable? What gets put into the post-longjmp side of the split block? | |
166 | It's still a little bit unclear about which calls this applies to. I guess it's any call in the function with the setjmp call, that appears after the call to setjmp (or really, that is reachable from the callsite), that could eventually result in a call to longjmp (i.e. not an intrinsic)? | |
178 | Not sure about the indentation/braces here. Is just line 177 the else clause? (If so then then it's of course correct as written but let's put braces around it anyway to make it more clear). | |
181 | There is one of these labels for each call to setjmp? | |
402 | sad that we have to duplicate this whole function with a template :( Maybe we could change the factoring to something like Value *WebAssemblyLowerEmscriptenEHSjLj::doWrapInvoke(Instruction *CI, Value* CalledValue) { // all the code } template<typename CallOrInvoke> Value *WebAssemblyLowerEmscriptenEHSjLj::wrapInvoke(CallOrInvoke *CI) { return doWrapInvoke(CI, CI->getCalledValue()); } and then only the thin wrapper would be duplicated. | |
475 | We always expect this cast to succeed right (we use it without a null-check below)? So this should be cast instead of dyn_cast, so it will assert instead of returning nullptr. | |
503 | does getFunction() just return nullptr if there's no "malloc" in the module? | |
509 | "these" are calls? | |
539 | let's get put this else clause in braces too. | |
560 | can we just make this a bitwise & as written in the JS code rather than 2 BBs? | |
578 | could move this definition of Args down next to its use (or even just use {LoadedThrew, SetjmpTable, SetjmpTableSize} inline in the call to CreateCall and not even have to declare it). | |
587 | isn't emscripten_longjmp a noreturn function? If so this should be unreachable instead of a branch. | |
653 | could this (and I guess createSetThrewFunction too) just be static free functions instead of methods? | |
744 | rather than nesting everything below inside this conditional, can we just do an early return false if this condition is false? | |
747 | This is going to be problematic, actually. With emscripten currently, by the time we generate code, linking has already been done. So if malloc or free are missing, we can't just create a declaration. In the end we will have an undefined reference. I think here we may just have to report_fatal_error if they are not in the module. On the emscripten side we need to ensure that they are included in the link before codegen. Your previous emscripten patch caused them to be exported, but I can't remember whether it also ensures that they are linked in? | |
986 | This should be report_fatal_error instead of assert, so that it doesn't get compiled away in release builds (i.e. there could be input code that does this and works on other platforms). | |
1030 | Could we just use std::copy(F.begin(), F.end(), std::back_inserter(BBs)) or some such? |
(There's a lot of comments btw, partly because obviously this is a big CL, but also because I'm still trying to completely grok the way it works).
- Add a test case for SjLj handling
- Modify the type of THREW from i1 to i32 (sjlj requires it to be i32)
Address comments
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
42 | That should have been "a function call that might longjmp". Fixed. | |
166 | Currently the condition is specified in WebAssemgly::canLongjmp(). It excludes all intrinsics and all other functions this pass is creating or using, but otherwise function calls are considered longjmp-able. This can be improved by only considering function calls that are reachable from any setjmp, but currently all function calls are candidates. | |
181 | For each callsite to setjmp. | |
402 | There are other shared methods between CallInst and InvokeInst that are used in this function: | |
503 | Yes. And it's OK because then Callee would not be equal to nullptr anyway. | |
509 | It should have been 'There are functions in JS code". Fixed. | |
653 | They can, but they currently use other member variables (SetThrewFName, SetTempRet0FName, ThrewGV, ThrewValueGV, and TempRet0GV). Do you think it's better to make the two functions as static functions and pass these as arguments? | |
744 | At the end of runOnModule function, we have some cleanups to do, and we also need to run createSetTempRet0Function and createSetThrewFunction. I made these two functions run at the very end of runOnModule method because I wanted to add those functions only there is some change made from either EH or SjLj handling. This is the reason why I couldn't do the early return. Do you have suggestions? if (!Changed) { // Delete unused global variables and functions ThrewGV->eraseFromParent(); ThrewValueGV->eraseFromParent(); TempRet0GV->eraseFromParent(); if (ResumeF) ResumeF->eraseFromParent(); if (EHTypeIDF) EHTypeIDF->eraseFromParent(); if (EmLongjmpF) EmLongjmpF->eraseFromParent(); if (SaveSetjmpF) SaveSetjmpF->eraseFromParent(); if (TestSetjmpF) TestSetjmpF->eraseFromParent(); return false; } // If we have made any changes while doing exception handling or // setjmp/longjmp handling, we have to create these functions for JavaScript // to call. createSetThrewFunction(M); createSetTempRet0Function(M); | |
747 | The patch that ensured malloc and free to be exported was not emscripten but binaryen patch. So it didn't do anything about the linking phase. It just ensures them to be exported if they exist. But it looks like dlmalloc.bc is always included in linking now in emscripten anyway: |
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
509 | Oh, i see, these callees are functions defined in JS code. I think maybe "These are functions in JS glue code" is better still. | |
653 | Oh; no, I think it's fine as it is. | |
744 | Could you:
Would that work? I realize that's some churn but I think it would improve readability. | |
750 | How about "malloc and free must be linked into the module if setjmp is used" so it's a little more actionable and clear? | |
969 | If it's simpler or less code, you can just directly allocate an add rather than creating a whole new IRB if you want. | |
1030 | Oh, I guess std::copy doesn't work because we want pointers and we have have iterators/refs? I actually think the simple for-each loop is more readable than std::transform in this case. So, I guess change it back? (sorry!) | |
1038 | this can be more idiomatic as assert(!isa<InvokeInst>(&I)) | |
lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp | ||
168–175 | Is this moved from somewhere else or is it new? If it's new, can we put it into a different CL, so we can better track possible breakage? (and in any case we should also have a test for it). | |
test/CodeGen/WebAssembly/lower-em-sjlj.ll | ||
2 | could add a test that calls in other functions (which do not call setjmp) are not modified, and a test which calls only longjmp but not setjmp |
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
744 | I have some questions:
Because DoEH does not necessarily mean we are going to do some transformation, even if we only create those two (actually there's one more, TempRet0GV, and I guess you mean all these three?) only if (DoSjLj || DOEH), it's still possible they may not be used so we should delete them at the end. I guess your intention was to create those variables only if (DoEH || DoSjLj) and obviate the need to do this part, right? if (!Changed) { ThrewGV->eraseFromParent(); ThrewValueGV->eraseFromParent(); TempRet0GV->eraseFromParent(); ... } But we still need to this because DoEH does not mean we need to EH. What do you think?
if (!Changed) { // Delete unused global variables and functions ThrewGV->eraseFromParent(); ThrewValueGV->eraseFromParent(); TempRet0GV->eraseFromParent(); if (ResumeF) ResumeF->eraseFromParent(); if (EHTypeIDF) EHTypeIDF->eraseFromParent(); if (EmLongjmpF) EmLongjmpF->eraseFromParent(); if (SaveSetjmpF) SaveSetjmpF->eraseFromParent(); if (TestSetjmpF) TestSetjmpF->eraseFromParent(); return false; } And if I am to put getOrCreateResumeF kind of function, it would be consistent to create that kind of function for every function I create in this pass. Do you prefer that? | |
lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp | ||
168–175 | It's new. But I'm not sure how I can add a test for this. Because currently this EH and SjLj options are in the backend so we can pass those options to the EHSjLj pass, and EHSjLj pass has default parameters 'true' for each of these options, as follows: WebAssemblyLowerEmscriptenEHSjLj(bool EnableEH = true, bool EnableSjLj = true) We need to set these variables to true by default to enable opt testing in *.ll files. If we want to delete these default parameters and add two separate 'enable' kind of options also in this EHSjLj pass (separately from the options that are in the backend), we can't initialize this pass with INITIALIZE_PASS macro anymore. There is something called INITIALIZE_PASS_WITH_OPTIONS or something, and I had a look at it briefly, but it looks like it's for more detailed options. (Correct me if I'm mistaken) So what I mean is, currently there is no way to enable only SjLj and not EH using only 'opt'. (We can do that in the backend of course, but what I'm talking about is the opt test). |
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
744 | OK; so EmLongjmpF, SaveSetJmpF, and TestSetjmpF are only needed by setjmp and created if needed, ResumeF and EHTypeIDF are created speculatively to avoid an extra pass, and all 3 GVs are shared, and so may also be needed or not. I think it's probably OK as it is rather than converting everything to getOrCreate. | |
lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp | ||
168–176 | Sorry; I meant the change that runs lowerInvoke. We can have a test that runs llc on a ll file that has invokes, and doesn't use the -enable-emscripten-cxx-exceptions flag, and check that the invokes are lowered away. |
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
744 | So do I leave it as it is? |
lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp | ||
---|---|---|
744 | Yeah, as-is is fine. |
Looks like this breaks some of the existing tests now: https://wasm-stat.us/builders/linux/builds/10379/steps/LLVM/logs/stdio
Probably should revert it and then land a fixed version.
Isn't longjmp() a noreturn function which is always followed by unreachable? What gets put into the post-longjmp side of the split block?