This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Rename imported/exported memory symbol to __linear_memory
Needs ReviewPublic

Authored by ncw on Feb 23 2018, 7:11 AM.

Details

Reviewers
sbc100
sunfish
Summary

It should use a reserved name, rather than one which can collide with a user-defined symbol.

@sunfish, is there anything that would need to change on the Emscripten side to support this?

Event Timeline

ncw created this revision.Feb 23 2018, 7:11 AM

I'm not very familiar with the Emscripten parts of this, but some grepping in the emscripten tree found these lines:

src/preamble.js:    env['memory'] = Module['wasmMemory'];
src/preamble.js:    env['memory'] = providedBuffer;
src/preamble.js:    assert(env['memory'] instanceof ArrayBuffer);
ncw added a comment.Feb 23 2018, 9:18 AM

Do you have an emscripten setup you could use to test a change there - I'd have to get emscripten set up, which I don't have currently, I haven't contributed to Emscripten for about four years :(

I'll make an Emscripten PR though if no-one volunteers.

Might be easier to add a command line flag to control the name here, rather than changing emscripten?

ncw added a comment.Mar 1 2018, 7:28 AM

Might be easier to add a command line flag to control the name here, rather than changing emscripten?

But we don't do that for other linker-provided symbols like __wasm_call_ctors or __stack_pointer, so why should we do it for __linear_memory? We do provide a commandline-parameter for the entry-point, but that's a user-provided symbol so it's not really the same.

I've made an Emscripten pull request here, to be merged at the same time as this change: https://github.com/kripken/emscripten/pull/6293

sbc100 added a comment.Mar 1 2018, 9:19 AM

As long as emscripten can control the things it needs to we can di

In D43675#1023703, @ncw wrote:

Might be easier to add a command line flag to control the name here, rather than changing emscripten?

But we don't do that for other linker-provided symbols like __wasm_call_ctors or __stack_pointer, so why should we do it for __linear_memory? We do provide a commandline-parameter for the entry-point, but that's a user-provided symbol so it's not really the same.

I've made an Emscripten pull request here, to be merged at the same time as this change: https://github.com/kripken/emscripten/pull/6293

The only reason would be to allow lld to be flexible enough to meet emscripten's needs. I've got a feeling that changing this on the emscripten side my be more complex than that change you propose there. Remember there is the asm2wasm and s2wasm case too. I imagine that this name is assumed and used in more places. My guiding principle here is that lld need to be able to support the needs of emscripten but try to avoid making our output emscripten-specific where possible. Command line flags that emscripten can use to alter the lld behaviour are reasonable in some cases. Maybe this is one of them, maybe not?

An alternative change on the emscripten side would be have the emscripten tool that post-processes the wasm file rename the memory export (this tool already exists and will continue to exist for the foreseeable and it seems reasonable for it to do this kind of thing).
@jgravelle-google what do you think?

ncw added a comment.Mar 1 2018, 9:28 AM

I guess the multi-step process would be:

  1. Add commandline option allowing LLD's name for the memory import to be changed
  2. Emscripten passes on the option to request the new symbol name, and fixes whatever needs to be fixed
  3. Then we change the default and remove the commandline option

Seems a bit laborious, but if it's necessary then it's necessary. I don't really have a say on the process... for merging it, whatever the Emscripten devs want really.

Basically I'm not in a position to merge this LLD change; someone from Emscripten would need to nice and coordinate :) I'm just trying to make it easy-ish for someone with Emscripten commit access to have a go at making the change.

sbc100 added a comment.Mar 1 2018, 1:02 PM

I was just speaking to @jgravelle-google about this issue. I think we are leaning towards just leaving the status quo is that maybe we should just leave the status quo. The only downside is that we will need to produce an error if somebody tries to export a symbol called memory. Given that we want to recommend limiting the number of exports (i.e. hidden is the default in for our clang generated symbols) this doesn't seems like a big risk/cost.

We figured that all embedding/hosting code will probably want easy access to this object so giving it a more obscure name seems unnecessarily complicated. We also have existing users in the form of emscripten and wasm.js (and possibly others) which would all need to updated.

I guess we will want the error message in any case (even if we rename this).

Additional thoughts of mine:

Who are the expected users of these symbols? Anyone who wants to call __wasm_call_ctors or read __data_end are probably building their own external start function, or writing their own malloc.
But what kind of program will want to use the memory export? Any program that wants to pass structs or strings across a wasm boundary.
What kind of program will need to use an imported memory's name? Well, all of them.

My philosophy in this is that most users shouldn't need to touch anything __-prefixed, though toolchain authors should be expected to.

So I think in the limit we'll want to have a flag for this, so that non-Emscripten users can use whatever they think makes sense. At which time we should change the default to be __-prefixed. But I'm not sure if we have enough user-driven need to say for sure.
From an Emscripten perspective, the best thing I can think of to do is to have the wasm-emscripten-finalize tool in Binaryen rewrite the memory import from whatever name it has to whatever Emscripten expects. This is one point of change, LLD doesn't need to know it exists, Emscripten doesn't need to change that string depending on what it's targeting. We don't need a flag just for Emscripten, but we may want a flag for non-Emscripten toolchains.

ncw added a comment.Mar 2 2018, 9:28 AM

Some more thoughts - I don't think we want to add a commandline flag for this. Better to pick a name and stick with it. If that's going to be "memory" - then fine. But there's no-one asking for it to be customised. If we went down that route, for consistency we'd have to add a commandline flag for all the linker-created symbols; __linear_memory is really no different to __stack_pointer.

a) I personally find it odd that we use the string "__linear_memory" in the object files, then change it to "memory" in LLD. But, we use the string "__stack_pointer" in both object file and linked output. That's clearer.
b) Surely just as many users will want to access the stack pointer (to pass things on the stack!) as will want to access the linear memory? So if you're expecting people to be typing "__linear_memory" in their code, surely you should also be concerned about them typing "__stack_pointer".

In fact though, I don't think it's a correct expectation that people will be using the memory export by its export name. Only toolchain writers will do that, in order to wrap the ArrayBuffer in a TypedArray for applications to use. That's how it is currently in Emscripten - there is no user or code that's using that "memory" string (and it's not even available in asm.js mode, I don't think), the name of the Wasm export is entirely hidden from application code. Instead it's all done through helpers on the Runtime object or using TypedArray globals HEAP8. LLD could import/export the memory under any name it chooses, as long as Emscripten knows where to put it; it really don't end up polluting any user code, at least in the way Emscripten currently works. I'd expect that to hold for other toolchains too, since the raw WebAssembly.Memory object is virtually never going to be used without going through some toolchain-provided TypedArray.

It would be a shame to have the linker step outside of the reserved namespace and stomp on an unprefixed name, especially as the product is still unreleased. :)

sbc100 added a comment.Mar 2 2018, 9:52 AM

You do make some good points and I agree with most of them.

I would say that it is not just toolchain who will want to do access the wasm memory. Its basically anyone who wants to write there own JS to load and use wasm module. If I just want to build my own wasm and then load it myself and call a function I will almost certainly want to read and write data to the wasm's memory. I'm not saying that is a strong argument for calling it memory I'm just saying its not always an internal detail.

I don't think we want to add a commandline flag for this. Better to pick a name and stick with it. If that's going to be "memory" - then fine. But there's no-one asking for it to be customised.

I agree, and I think "no one is asking for this" is an important point. I think if people ask for it in the future it will be something we can do, which is why I'm biased towards leaving this as-is for now.

Surely just as many users will want to access the stack pointer (to pass things on the stack!) as will want to access the linear memory?

I don't think they will though. The use cases for passing things on the stack (that I can think of) are when the JS embedder wants to allocate new memory and pass that back to the wasm module, and in that case doing so via the stack is an optimization over using an exported (or JS-implemented) malloc/free.
On the flip side, a use case for reading memory external to the module that I can think of off the top of my head is converting a C string to a JS string. Or reading a vararg buffer from JS (which requires carnal enough knowledge of the ABI that you're a toolchain developer at this point :D )

Only toolchain writers will do that

So in my head, I'm planning around a world where Clang+LLVM+LLD is the extent of the toolchain, and people write their minimal JS glue code themselves because it's small enough to avoid the overhead of some JS dependency to do it for them. For context, Hello World in Emscripten is has ~10K of minified JS glue code. Writing the JS glue by hand keeps both files combined under 1K.
So I'm interested in highly lightweight toolchains that reuse minimal JS glue, and in those contexts referencing the memory export by name will be nearly mandatory.

It would be a shame to have the linker step outside of the reserved namespace and stomp on an unprefixed name, especially as the product is still unreleased. :)

All that is to say, having slept on it, you're right, and for the sanity of LLD on its own, creating a non-prefixed symbol is a bizarre design decision. It's not LLD's responsibility to be maximally ergonomic for highly specific use cases, and frankly as a user the difference between exports.memory and exports.__linear_memory isn't that weird or interesting. I still need to examine the wasm module that LLD produces to see what it's called either way, then I use whatever it is once in my program and call it done.

So, to reiterate, yeah __linear_memory is probably the right direction.

ncw added a comment.Mar 27 2018, 11:06 AM

How about we do the multi-stage thing:

  1. LLD should output two exports for the linear memory - __linear_memory and memory.
  2. Then Emscripten switches over to using __linear_memory at its leisure
  3. Finally we tidy up the unused memory export

Would you accept/approve a patch to add the __linear_memory export (without removing memory) on the understanding that memory is deprecated?

Otherwise this is just going to stagnate forever...

How about you provide a flag to emscripten to choose the name for the export? That way they can choose if/when to switch to the new default name (or not).

@sbc100 @ncw Can you comment on the status of this issue? Is this still something we want to fix before we consider the basic ABI to be basically stable?

I've been persuaded I think that we should rename this, and let emscripten coerce it as it sees fit. However I think maybe we should have common scheme for all these synthetic/internal exports. How about: __wasm_memory, __wasm_table, __wasm_data_start, etc?

ncw added a comment.Sep 21 2018, 8:08 AM

Renaming them all will take a bit more work (for everyone) - I'm not sure it's especially urgent. Personally I'm happy as long as they all have underscores, and aren't wildly inconsistent.

I like the idea of a consistent naming scheme with double-underscores for all of those special symbols. I also don't see any reason emscripten shouldn't use those names too. If it's easier, we could just prepare both an emscripten and an LLVM change and land them more-or-less at the same time and avoid a 3-stage process.

The prompt here is that the LLVM wasm backend will be leaving "experimental" status. That means in about 6 months, it'll be part of an LLVM release. Once that happens, it'll be harder to make changes like this.

So yeah we don't need it *today* or anything, but it'd also be good to avoid scrambling to get it in at the end.

"memory" is now a de-facto standard. We may still be able to change it, but we'll now need a transition plan.

Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2019, 9:52 AM