This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] WebAssemblyLowerEmscriptenEHSjLj: use getter/setter for accessing tempRet0
ClosedPublic

Authored by sbc100 on Oct 12 2018, 7:31 PM.

Details

Summary

Rather than assuming that tempRet0 exists in linear memory only assume
the getter/setter functions exist. This avoids conflicting with
binaryen which declares a wasm global for this purpose and defines
its own getter and setter for that.

The other advantage of doing things this way is that it leaving
it up to the linker/finalizer to decide how to actually store this
temporary. As it happens binaryen uses a wasm global which is more
appropriate since it is thread safe.

This also allows us to change the way this is stored in the future
(memory, TLS memory, wasm global) without modifying LLVM.

This is part of a 4 part change:
LLVM: https://reviews.llvm.org/D53240
fastcomp: https://github.com/kripken/emscripten-fastcomp/pull/237
emscripten: https://github.com/kripken/emscripten/pull/7358
binaryen: https://github.com/WebAssembly/binaryen/pull/1709

Diff Detail

Repository
rL LLVM

Event Timeline

sbc100 created this revision.Oct 12 2018, 7:31 PM
sbc100 edited the summary of this revision. (Show Details)Oct 12 2018, 7:47 PM
sbc100 added reviewers: jgravelle-google, sunfish.
aheejin added a comment.EditedOct 12 2018, 8:41 PM
  • If __THREW__, __threwValue, and __tempRet0 are all variables used to communicate between wasm and JS side, isn't it inconsistent that __tempRet0 is the only function that requires a separate setter and getter here?
  • Pasted from my comment in emscripten repo: They are used to mean different things, and they don't even set the same thing (one sets a global variable in linear memory, and the other sets an wasm global). Should they share the same name in the first place?
sbc100 updated this revision to Diff 170007.Oct 17 2018, 7:54 AM
  • format
  • If __THREW__, __threwValue, and __tempRet0 are all variables used to communicate between wasm and JS side, isn't it inconsistent that __tempRet0 is the only function that requires a separate setter and getter here?
  • Pasted from my comment in emscripten repo: They are used to mean different things, and they don't even set the same thing (one sets a global variable in linear memory, and the other sets an wasm global). Should they share the same name in the first place?

We could convert THREW and __threwValue to also be stored as wasm globals and also use a setter. But for now I'm focusing on setTempRet0 and getTempRet0. This change is one of three patches that unify the behavior. Once they land all access will be via the binaryen-generated helper functions. Part of the reason for doing it that way is that neither C code nor this emscripten pass can easily use wasm globals.

aheejin added a comment.EditedOct 17 2018, 9:37 AM

Do you think all variables used to communicate between wasm and JS use not global variables but a few predetermined wasm globals? I think that's a bit strict restriction.

And I'll repeat what I said in my previous comment:

They are used to mean different things, and they don't even set the same thing (one sets a global variable in linear memory, and the other sets an wasm global). Should they share the same name in the first place?

I couldn't understand the motivation behind this change, that's all. This stemmed from my patch that tried to first eliminate seemingly unused getTempRet0 and then tried to export it to be consistent with other methods, which broke another test. The reason that another test was broken was because we were using __tempRet0 for two different purpose. So I thought if we divide the variable the problem would be solved.

Maybe I'm mistaken, so then please let me know. 😅

Do you think all variables used to communicate between wasm and JS use not global variables but a few predetermined wasm globals? I think that's a bit strict restriction.

No, the backing store used by setTempRet0 and getTempRet0 should independent of the users of those functions. That is one reason for this change. So that we can independently choose to use either wasm global or a memory location. As it happens wasm globals are basically TLS so in this case I do think its better than a raw memory location. Of course a TLS memory location would be find too, but that would probably be marginally more expensive because you'd need to add an offset to the TLS base.

And I'll repeat what I said in my previous comment:

They are used to mean different things, and they don't even set the same thing (one sets a global variable in linear memory, and the other sets an wasm global). Should they share the same name in the first place?

I couldn't understand the motivation behind this change, that's all. This stemmed from my patch that tried to first eliminate seemingly unused getTempRet0 and then tried to export it to be consistent with other methods, which broke another test. The reason that another test was broken was because we were using __tempRet0 for two different purpose. So I thought if we divide the variable the problem would be solved.

I think unifying and simplifying if fine in this case. This notion is that tempRet0 is used a TLS scratch space and I think its OK that it has several use cases. The alternative you are suggesting is to create several sets of global+getter+setter right? getReturnHighBits/setReturnHighBits/__returnHighBit + getLongJmpArg/setLongJmpArg/__longJmpArg ... etc. That seems lake added complexity we don't need although I can see that argument. Even if we do decide that is better, I'd first like to solve it this way, so at least we have one unified example or how to do this, and we can split it out later if we decide that is preferable.

Maybe I'm mistaken, so then please let me know. 😅

aheejin added a comment.EditedOct 17 2018, 6:09 PM

The alternative you are suggesting is to create several sets of global+getter+setter right? getReturnHighBits/setReturnHighBits/__returnHighBit + getLongJmpArg/setLongJmpArg/__longJmpArg ... etc.

Not really. What I was suggesting was use different names for EH's __tempRet0 and other.test_sixtyfour_bit_return_value's __tempRet0, add getter or setter for each of them only if it is necessary, and that's all. In other words what I meant was we might not need to impose a strict getter/setter rules for every variable, because we don't know how many we will end up with. Person who creates a new variable can add getter or setter in the library if it is necessary, but I was not sure if we want to impose a strict rule that all of getter and setter for cross-communicating variables should be generated from binaryen's legalize-js-interface.

The alternative you are suggesting is to create several sets of global+getter+setter right? getReturnHighBits/setReturnHighBits/__returnHighBit + getLongJmpArg/setLongJmpArg/__longJmpArg ... etc.

Not really. What I was suggesting was use different names for EH's __tempRet0 and other.test_sixtyfour_bit_return_value's __tempRet0, add getter or setter for each of them only if it is necessary, and that's all. In other words what I meant was we might not need to impose a strict getter/setter rules for every variable, because we don't know how many we will end up with. Person who creates a new variable can add getter or setter in the library if it is necessary, but I was not sure if we want to impose a strict rule that all of getter and setter for cross-communicating variables should be generated from binaryen's legalize-js-interface.

This change doesn't mind if its binaryen's legalize-js-interface defines this stuff, or if they come from a C library. The main point of this change is to allow whoever defines these function to choose how the getters/setters work. We don't mandate the existence of the global memory location

We might be able to rename these functions as followup change. I'm not sure how much emscripten depends on these names. But I think this change is a good first step in any case.

I did a little more investigating and there it turns out that decouple EH from getTempRet/setTempRet would be difficult because __cxa_find_matching_catch JS function in emscripten uses setTempRet0. So it seems that for now anyway it makes sense keep these unified.

aheejin added a comment.EditedOct 18 2018, 10:56 AM

I did a little more investigating and there it turns out that decouple EH from getTempRet/setTempRet would be difficult because __cxa_find_matching_catch JS function in emscripten uses setTempRet0. So it seems that for now anyway it makes sense keep these unified.

Not sure what you mean. That's the only place setTempRet0 is used for EH and it was the reason this pass generated setTempRet0 function for, which is also written in the comment. You sound like it is being used elsewhere in EH and now you've discovered it is also being used in __cxa_find_matching_catch so it is hard to separate variable names, which I don't understand.

And I still think treating only __tempRet0 special is a bit weird. What I initially suggested was not imposing any getter/setter requirement for any variable (regardless of where they come from. Binaryen or library or wherever), but if we want to do that, maybe we should be consistent on all variables. If it's the case you want to go that route, maybe they can come from follow-up patches though.

I think what I was trying express was that the names setTempRet0/getTempRet0 are embedded in emscripten for both returning the high bits of an i64 function and for EH, so renaming them to two independent sets of functions would a non-trivial change to emscripten and asm2wasm as well as this pass.

This change simply decouples llvm from the implementation of setTempRet0/getTempRet0 , in the same way that I am aslo doing in binaryen: https://github.com/WebAssembly/binaryen/pull/1709. I think are you suggesting that we do the same for the variables behind setThrew? I wouldn't be against that but I don't see that same urgent needs since there are not conflicting implementations.

sbc100 edited the summary of this revision. (Show Details)Oct 22 2018, 6:40 AM
sbc100 edited the summary of this revision. (Show Details)Oct 22 2018, 6:45 AM
kripken accepted this revision.Oct 22 2018, 11:54 AM
This revision is now accepted and ready to land.Oct 22 2018, 11:54 AM
This revision was automatically updated to reflect the committed changes.