This is an archive of the discontinued LLVM Phabricator instance.

Wasm entrypoint changes #2 (export entrypoint in "start" section) APPLY AFTER D40724
AbandonedPublic

Authored by ncw on Nov 28 2017, 7:16 AM.

Details

Reviewers
sbc100
Summary

This was originally posted on GitHub here: https://github.com/WebAssembly/lld/pull/15

Now that the GitHub repo has been archived, I can't post a reply there! So I'm opening this for discussion despite Sam's lack of enthusiasm...

I said:

The --entry option is there to allow designating the entry-point symbol, but it isn't wired in!

LLD needs to actually set the entrypoint on the Wasm output, using the "start" section.

Sam said:

The reason we do this right now is for parity the expectations of emscripten and also wasm.js in the musl repo (which we use for testing):
https://github.com/jfbastien/musl/blob/wasm-prototype-1/arch/wasm32/wasm.js

Both these loaders expect to be able to run the main function explicitly and get the return code from it.

There has been some discussion about the role of the start section and how the linker should use it. I think it was originally intented to tune the C/C++ internalizer functions but not run main(). This is certainly one concievable option, but it would involve the linker generating said function, or assuming its name (in musl this i called __libc_start_init for example).

For now I think it makes sense to keep it empty and let the embedder choose to run what it wants when it whats rather than on instantiate.

My response:

I see where you're coming from... but surely the whole point of the "entrypoint" argument is to determine which symbol (if any) should be set as the Wasm module's entry point.

If in fact you'd like it so that the default is not to have an entrypoint, then we could easily change the default to be --entry '' (empty string). But for those of us who do want an entrypoint on our Wasm modules, it would be possible to do that.

I agree that we don't really want "main" to be run as part of the Wasm entrypoint.

Maybe the long-term solution is to fiddle with Musl, to create a new function "_start_wasm" that calls both __init_libc and __libc_start_init (but not main), and make that function name part of the Wasm "linkage conventions".

That would be basically this PR, plus a one-liner to change "_start" to "_start_wasm".

Finally, the reason why I'm interesting in actually using Wasm's "start" functionality is because it seems to me to be a better form level of encapsulation. No-one really wants it to be possible to touch global variables that aren't initialised, I think it is preferable that the WebAssembly instantiation fails if the globals can't be initialised, and make it impossible to invoke exports on an uninitialised module.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

ncw created this revision.Nov 28 2017, 7:16 AM

Thinking about this issue, I find myself going back to main. Besides asm.js/etc. compatibility, is there a reason we don't want to call main from the wasm start function, along with all the static constructors? Doing so would be pretty tidy from an ABI perspective.

main having a return value doesn't seem to be it, because musl (and all others) handle this by passing main's return value to a call to exit.

Is it needing to return to the browser event loop between the static constructors and main? However, it seems like in web contexts, modules often wouldn't have a normal main, but would either be libraries that just exist to be called from other code, or would integrate with an event loop.

Is it to have more control over C++ static constructor order? I haven't looked at this closely, but it seems like the C++ rules have enough wiggle room. Wasm loaders should do the right thing with an acyclic dependency graph, and if it's cyclic, C++ doesn't seem to have anything to help.

Are there other issues?

tl;dr: I think I agree with @sunfish

More specifically, if I were starting from scratch, I'd want:

  1. Wire up --entry in the linker like this (although I might not give it a default; instead make the absence of the flag mean "don't make a start section").
  2. In my hypothetical non-emscripten C toolchain, use a crt1.o or similar file with a _start() function that runs the constructors, calls main(), and exits with its result, just like conventional _start() functions do today (e.g. musl's __libc_start_main). This would work with both web and non-web use cases.
  3. For use cases which want to have no main, or return to the event loop after constructors, or call the constructors and/or main explicitly, there are lots of options. A toolchain (e.g. emscripten) could use a different crt file (possibly optionally based on a flag), or use -nostartfiles, or just make some libstartup.a that's linked by default by clang, that a toolchain could replace or cause not to be linked.
  4. Non-C toolchains would the also be able to control their startup by using lld's --entry flag or not, and they'd have their own standard library startup and language-defined entry point.
ruiu added a subscriber: ruiu.Nov 28 2017, 2:50 PM

I believe you want to report an error when a user passes both --relocatable and --entry.

sbc100 edited edge metadata.Nov 28 2017, 10:45 PM

OK, it sounds like have the entry section call the entry point by default seems to be the consensus. We will likely need to modify musl wasm.js to handle that before this change lands. And I imagine emscripten will want to pass --entry to override this behaviour (but its already going to be passing a lot of cusom flags).

Can you add test (or tests)?

wasm/Driver.cpp
328

Don't we always want some kind of entry point for non-relocatable binaries. Without it it will be hard to gc unused functions. (I suppose we could use the '-u' flag to pin the function we want kept instead?)

352

We probably don't need these two checks here and in Writer.cpp.

ncw added a comment.Nov 29 2017, 9:09 AM
In D40559#938273, @ruiu wrote:

I believe you want to report an error when a user passes both --relocatable and --entry.

That's already done by the existing code in LinkerDriver::link.

I've added a new test, and updated the existing ones, and I think all the cases are covered now.

wasm/Driver.cpp
328

I think it makes sense to allow having no entry point. In this case, the only exports will be the (non-hidden) functions defined in the .o files, plus the (non-hidden) functions defined in objects that were loaded from any .a files.

It would be nice for "completeness" to add a -u commandline argument, to allow pulling in symbols from .a files that come from translation units we're not otherwise touching.

I'm expecting that 99% of people with a Wasm module will be wanting to export an API consisting of several functions, so the --entry argument was never really going to be enough to be useful in the general case for selecting which functions should be exported.

I'll add -u support to wasm-lld while I'm at it!

352

Done, I've kept the second checks as an assertion though. (And also added a check for whether Sym is actually defined, fixing a crash.)

ncw updated this revision to Diff 124763.Nov 29 2017, 9:10 AM
  • Added tests
dschuff added inline comments.Nov 29 2017, 10:10 AM
wasm/Driver.cpp
328

+1. For Emscripten, for now we will probably just use the "no entry" case and call main explicitly.

I'm not a big fan of having --entry '' be the way to disable the entry point. If we really want to allow that then perhaps --no-entry? I'm not sure we really need that feature though. The other linkers all require an entry point I think.

Anyway, perhaps this change can be split into pieces?

  1. Add support for --undefined/-u
  2. Create Start section with Config->Entry
  3. Add --no-entry option

(2) is the real substantive change here, I'm working on change the musl so that we can land it without breaking the waterfall.

test/wasm/data-layout.ll
3

Why is this needed? Shouldn't allow-undefined be enough here?

wasm/Driver.cpp
301

Does this need to be in Config, or can it just be a local ?

ruiu added a comment.Nov 29 2017, 4:42 PM

I'm not a big fan of having --entry '' be the way to disable the entry point. If we really want to allow that then perhaps --no-entry? I'm not sure we really need that feature though. The other linkers all require an entry point I think.

I'd agree. --dynamic-linker <path> and --no-dynamic-linker are a precedence.

The existing ELF linkers automatically deduce entry point if no -entry option is given. Please look at https://github.com/llvm-project/llvm-project-20170507/blob/master/lld/ELF/Writer.cpp#L1755

That being said, I don't think you want to copy that hacky behavior to a shiny new file format. I'd make -entry mandatory when creating a wasm executable.

I'm running into some issues with our current musl tests using the wasm start section, even purely for static initializers.

The problem is that we are currently exporting our memory to JS, but the memory export cannot be accessed until the module is instantiated (i.e. after the start function). This means that any system calls (e.g. puts) don't work during the start function because they have no access to the memory.

Importing the memory would fix this.

Maybe I'm missing something?

ncw added a comment.Nov 30 2017, 1:49 AM

OK, I can split this review into two, one for --undefined and one for the --entry changes (I'll do it tomorrow), and I could open a third review for the --no-entry support.

I'm not too bothered by the --entry '' syntax, mainly because I expect 99% of people to be happy with the default. When we get the default right, who _wouldn't_ want to run static initialisers? I think the default should be some new name, eg _start_wasm, to differentiate its behaviour from the standard _start function, rather than redefine the behaviour of _start on the wasm platform to not include running main. (This would smooth integration with upstream Musl, and give developers the option of passing --entry _start if they _do_ want to run main, but if we actually change _start then there's no way to specify that you actually want the traditional behaviour.)

In D40559#939945, @ruiu wrote:

That being said, I don't think you want to copy that hacky behavior to a shiny new file format. I'd make -entry mandatory when creating a wasm executable.

That would be a bit awkward. For a "default" C/C++ executable, we should aim to keep the link-line as short as possible, and not require the dev to know magic names for functions like _start that are supplied by the platform. (I'm imagining that the Clang+LLD toolchain will be usable and useful as a freestanding thing, which is how my company currently uses Emscripten - we basically provide all the JS syscalls ourselves now, and use only the Emscripten compiler.)

test/wasm/data-layout.ll
3

--allow-undefined doesn't allow the entrypoint to be undefined. Without an --entry argument you get --entry _start by default, and if that function isn't found then it will cause an error. I think that's the right behaviour; it wouldn't make any sense to automatically convert _start into an import, since that's not likely to be what the dev intended.

wasm/Driver.cpp
301

Good point, it doesn't _have_ to go in Config. But, if this is ever refactored from being one long do-it-all method, it will have to go there (see ELF driver). It's nicer on the eyes to have all the different options match here, I felt.

ruiu added a comment.Nov 30 2017, 12:53 PM

As to --entry '', I can say as a linker guy that it looks odd. When something shouldn't be set, we usually pass -no-<something> instead of -<something> ''. For consistency with other options, I strongly prefer adding -no-entry, or not to set the entry point function if no -entry option is given. --entry '' is, well, just a bit too odd.

wasm/Driver.cpp
301

It will eventually be in Config (e.g. when you implement the gc-sections), but in general, in lld, we do it when we need it. It looks like you don't need to store it even to a local variable. You could do

for (StringRef S : getArgs(Args, OPT_undefined)
  addSyntheticUndefinedFunction(S);

at line 331.

Also, the empty string "" is a valid symbol name on wasm, so it'd be awkward to make it a special case.

ncw updated this revision to Diff 125135.Dec 1 2017, 6:59 AM
ncw retitled this revision from Run Wasm entrypoint on load to Wasm entrypoint changes #2 (export entrypoint in "start" section) APPLY AFTER D40724.

Split out into three reviews as requested; this change to export the entrypoint as the "start" section is diff #2.

ncw updated this revision to Diff 125552.Dec 5 2017, 9:23 AM
ncw marked 3 inline comments as done.

Rebased, all comments acted on I believe.

There basically wasn't much feedback on this change, should be OK to go in I think. The only thing people objected to was the --entry '' syntax, which is fixed as requested in another Diff.

sbc100 added a comment.Dec 5 2017, 7:27 PM

Sadly I think we are blocked on: https://github.com/WebAssembly/design/issues/1160.

Once your --no-entry change lands and this is rebased on top this change should be very small right? Maybe we could land it behind a flag then to make this behaviour optional? --use-wasm-start?

Out of interest, do you have a use case where you strongly prefer this over calling the function explicitly from the host side?

ncw added a comment.Dec 6 2017, 3:19 AM

Sadly I think we are blocked on: https://github.com/WebAssembly/design/issues/1160.

Once your --no-entry change lands and this is rebased on top this change should be very small right? Maybe we could land it behind a flag then to make this behaviour optional? --use-wasm-start?

Out of interest, do you have a use case where you strongly prefer this over calling the function explicitly from the host side?

Because of fact that syscalls can't work until after WebAssembly instantiation (in order to get the Memory), you're right that the "start" block is actually less useful than I'd thought :(

However, the changes here to --entry are still worth doing, I think.

I reckon, now that we have --undefined, the semantics of the default behaviour that Emscripten wants is actually --undefined=_start_wasm rather than --entry _start_wasm.

  • Change the default value for --entry to be --no-entry rather than --entry _start
  • Make the following change to the Clang driver, to use --undefined so that the symbol is exported, but not set as the Wasm entrypoint:
diff --git a/lib/Driver/ToolChains/WebAssembly.cpp b/lib/Driver/ToolChains/WebAssembly.cpp
index 8ae1b6c2f5..d0888fc5b7 100644
--- a/lib/Driver/ToolChains/WebAssembly.cpp
+++ b/lib/Driver/ToolChains/WebAssembly.cpp
@@ -51,7 +51,7 @@ void wasm::Linker::ConstructJob(Compilation &C, const JobAction &JA,
   ToolChain.AddFilePathLibArgs(Args, CmdArgs);
 
   if (!Args.hasArg(options::OPT_nostdlib, options::OPT_nostartfiles))
-    CmdArgs.push_back(Args.MakeArgString(ToolChain.GetFilePath("crt1.o")));
+    CmdArgs.push_back("--undefined=_start_wasm");
 
   AddLinkerInputs(ToolChain, Inputs, Args, CmdArgs, JA);

(Here, I'm using _start_wasm instead of _start, which is a new symbol I've submitted in a patch to upstream Musl. It's like _start, but it runs the libc static initialisers without then running main. The name is flexible - we just need libc and Clang to agree on a symbol name that has that behaviour, and there isn't any existing conventional name for such a symbol.)

ncw updated this revision to Diff 125699.Dec 6 2017, 4:25 AM

Another rebase. A new test was added on master, "stack-pointer.ll" which needed an adjustment to pass with the entrypoint changes.

I'd rather not change the behaviour of --entry since its a flag that is in common usage. I'd also rather not default --no-entry since that means that by default nothing will get pulled into the link (i.e. all users will need explicit --entry and/or --undefined otherwise nothing will be linked).

I don't mind adding code to support writing the start section, but since it is (currently) of very limited use I think we should not enable it by default. How about putting it behing a new flag (i.e. --start=foo). Even that much I think is a little controversial since we don't know if there will be any users of this flag.

Sorry to be negative about this change. I do think in the future we may have more use (or even default use) of the start section, but thats later down the road.

I'd like to probe the solution of importing the linear memory a little more. There's nothing in wasm that prevents syscalls from working in the wasm start function. The problem seems to just be pre-ES6-module JS. Non-Web, future-ES6-modules-Web, and future programs that are all wasm with no JS won't have this problem.

I haven't thought about it too deeply yet, but it's tempting to wonder if we could get by with the workaround of importing linear memory for now, to avoid adding interfaces that we won't need in the future.

ncw added a comment.Dec 6 2017, 1:39 PM

I'd rather not change the behaviour of --entry since its a flag that is in common usage. I'd also rather not default --no-entry since that means that by default nothing will get pulled into the link (i.e. all users will need explicit --entry and/or --undefined otherwise nothing will be linked).

I don't mind adding code to support writing the start section, but since it is (currently) of very limited use I think we should not enable it by default. How about putting it behing a new flag (i.e. --start=foo). Even that much I think is a little controversial since we don't know if there will be any users of this flag.

Sorry to be negative about this change. I do think in the future we may have more use (or even default use) of the start section, but thats later down the road.

That's OK, if it's not useful it's OK to drop it. I think my requests D40724 and D40725 should be OK without this one, if that's preferred.

I'm personally not too concerned by compatibility, since I don't think anyone is using the new LLD toolchain in production yet!

What I was suggesting was defaulting to --no-entry, but at the same time making the Clang driver default to adding --undefined=_start(_wasm) to the link line (where it currently pulls in crt1.o). That's actually preserving the original behaviour, since the old --entry argument was the same as the new --undefined, namely it prods the linker to include the symbol but doesn't do anything special to call it.

Now that we have --undefined, it's odd to have --entry as basically a synonym for it, when we could make it actually set the Wasm entry point (not by default, but for people who want it, eg if using --import-memory).

ncw abandoned this revision.Dec 19 2017, 3:28 AM

Abandoning. There's not much that can be salvaged from here, it's all superseded.