It should use a reserved name, rather than one which can collide with a user-defined symbol.
@sunfish, is there anything that would need to change on the Emscripten side to support this?
ncw on Feb 23 2018, 7:11 AM.Authored by
I'm not very familiar with the Emscripten parts of this, but some grepping in the emscripten tree found these lines:
src/preamble.js: env['memory'] = Module['wasmMemory']; src/preamble.js: env['memory'] = providedBuffer; src/preamble.js: assert(env['memory'] instanceof ArrayBuffer);
Do you have an emscripten setup you could use to test a change there - I'd have to get emscripten set up, which I don't have currently, I haven't contributed to Emscripten for about four years :(
I'll make an Emscripten PR though if no-one volunteers.
But we don't do that for other linker-provided symbols like __wasm_call_ctors or __stack_pointer, so why should we do it for __linear_memory? We do provide a commandline-parameter for the entry-point, but that's a user-provided symbol so it's not really the same.
I've made an Emscripten pull request here, to be merged at the same time as this change: https://github.com/kripken/emscripten/pull/6293
As long as emscripten can control the things it needs to we can di
The only reason would be to allow lld to be flexible enough to meet emscripten's needs. I've got a feeling that changing this on the emscripten side my be more complex than that change you propose there. Remember there is the asm2wasm and s2wasm case too. I imagine that this name is assumed and used in more places. My guiding principle here is that lld need to be able to support the needs of emscripten but try to avoid making our output emscripten-specific where possible. Command line flags that emscripten can use to alter the lld behaviour are reasonable in some cases. Maybe this is one of them, maybe not?
An alternative change on the emscripten side would be have the emscripten tool that post-processes the wasm file rename the memory export (this tool already exists and will continue to exist for the foreseeable and it seems reasonable for it to do this kind of thing).
I guess the multi-step process would be:
Seems a bit laborious, but if it's necessary then it's necessary. I don't really have a say on the process... for merging it, whatever the Emscripten devs want really.
Basically I'm not in a position to merge this LLD change; someone from Emscripten would need to nice and coordinate :) I'm just trying to make it easy-ish for someone with Emscripten commit access to have a go at making the change.
I was just speaking to @jgravelle-google about this issue. I think we are leaning towards just leaving the status quo is that maybe we should just leave the status quo. The only downside is that we will need to produce an error if somebody tries to export a symbol called memory. Given that we want to recommend limiting the number of exports (i.e. hidden is the default in for our clang generated symbols) this doesn't seems like a big risk/cost.
We figured that all embedding/hosting code will probably want easy access to this object so giving it a more obscure name seems unnecessarily complicated. We also have existing users in the form of emscripten and wasm.js (and possibly others) which would all need to updated.
I guess we will want the error message in any case (even if we rename this).
Additional thoughts of mine:
Who are the expected users of these symbols? Anyone who wants to call __wasm_call_ctors or read __data_end are probably building their own external start function, or writing their own malloc.
My philosophy in this is that most users shouldn't need to touch anything __-prefixed, though toolchain authors should be expected to.
So I think in the limit we'll want to have a flag for this, so that non-Emscripten users can use whatever they think makes sense. At which time we should change the default to be __-prefixed. But I'm not sure if we have enough user-driven need to say for sure.
Some more thoughts - I don't think we want to add a commandline flag for this. Better to pick a name and stick with it. If that's going to be "memory" - then fine. But there's no-one asking for it to be customised. If we went down that route, for consistency we'd have to add a commandline flag for all the linker-created symbols; __linear_memory is really no different to __stack_pointer.
a) I personally find it odd that we use the string "__linear_memory" in the object files, then change it to "memory" in LLD. But, we use the string "__stack_pointer" in both object file and linked output. That's clearer.
In fact though, I don't think it's a correct expectation that people will be using the memory export by its export name. Only toolchain writers will do that, in order to wrap the ArrayBuffer in a TypedArray for applications to use. That's how it is currently in Emscripten - there is no user or code that's using that "memory" string (and it's not even available in asm.js mode, I don't think), the name of the Wasm export is entirely hidden from application code. Instead it's all done through helpers on the Runtime object or using TypedArray globals HEAP8. LLD could import/export the memory under any name it chooses, as long as Emscripten knows where to put it; it really don't end up polluting any user code, at least in the way Emscripten currently works. I'd expect that to hold for other toolchains too, since the raw WebAssembly.Memory object is virtually never going to be used without going through some toolchain-provided TypedArray.
It would be a shame to have the linker step outside of the reserved namespace and stomp on an unprefixed name, especially as the product is still unreleased. :)
You do make some good points and I agree with most of them.
I would say that it is not just toolchain who will want to do access the wasm memory. Its basically anyone who wants to write there own JS to load and use wasm module. If I just want to build my own wasm and then load it myself and call a function I will almost certainly want to read and write data to the wasm's memory. I'm not saying that is a strong argument for calling it memory I'm just saying its not always an internal detail.
I agree, and I think "no one is asking for this" is an important point. I think if people ask for it in the future it will be something we can do, which is why I'm biased towards leaving this as-is for now.
I don't think they will though. The use cases for passing things on the stack (that I can think of) are when the JS embedder wants to allocate new memory and pass that back to the wasm module, and in that case doing so via the stack is an optimization over using an exported (or JS-implemented) malloc/free.
So in my head, I'm planning around a world where Clang+LLVM+LLD is the extent of the toolchain, and people write their minimal JS glue code themselves because it's small enough to avoid the overhead of some JS dependency to do it for them. For context, Hello World in Emscripten is has ~10K of minified JS glue code. Writing the JS glue by hand keeps both files combined under 1K.
All that is to say, having slept on it, you're right, and for the sanity of LLD on its own, creating a non-prefixed symbol is a bizarre design decision. It's not LLD's responsibility to be maximally ergonomic for highly specific use cases, and frankly as a user the difference between exports.memory and exports.__linear_memory isn't that weird or interesting. I still need to examine the wasm module that LLD produces to see what it's called either way, then I use whatever it is once in my program and call it done.
So, to reiterate, yeah __linear_memory is probably the right direction.
How about we do the multi-stage thing:
Would you accept/approve a patch to add the __linear_memory export (without removing memory) on the understanding that memory is deprecated?
Otherwise this is just going to stagnate forever...
I've been persuaded I think that we should rename this, and let emscripten coerce it as it sees fit. However I think maybe we should have common scheme for all these synthetic/internal exports. How about: __wasm_memory, __wasm_table, __wasm_data_start, etc?
Renaming them all will take a bit more work (for everyone) - I'm not sure it's especially urgent. Personally I'm happy as long as they all have underscores, and aren't wildly inconsistent.
I like the idea of a consistent naming scheme with double-underscores for all of those special symbols. I also don't see any reason emscripten shouldn't use those names too. If it's easier, we could just prepare both an emscripten and an LLVM change and land them more-or-less at the same time and avoid a 3-stage process.
The prompt here is that the LLVM wasm backend will be leaving "experimental" status. That means in about 6 months, it'll be part of an LLVM release. Once that happens, it'll be harder to make changes like this.
So yeah we don't need it *today* or anything, but it'd also be good to avoid scrambling to get it in at the end.