This is an archive of the discontinued LLVM Phabricator instance.

[wasm-ld] Add support for calling constructors in reactors.
Needs ReviewPublic

Authored by sunfish on Oct 13 2022, 11:55 AM.

Details

Reviewers
sbc100
Summary

With [WebAssembly/wasi-libc#328], we no longer want the behavior of
automatically inserting constructor calls on every command export.
The libc is now responsible for calling constructor calls in its
_start entrypoint. So it makes sense to remove support for
auto-wrapping command exports now.

At the same time, with [WebAssembly/wasi-libc#328], we are moving to
a place where users who had previously been building reactors and
getting away with not running the constructors will suddenly have a
harder time doing so.

And as in [WebAssembly/wasi-libc#329], we're anticipating Wasm modules
being used in more environments where these invariants won't be as easy
to enforce.

So instead of removing the auto-wrapping *command* export code entirely,
convert into code to code that auto-wraps *reactor* exports. And give
it logic to call __wasm_call_ctors only if it hasn't been called yet.
That way, constructors will run automatically on the first export call,
so existing users won't be suddenly broken, and so reactors in general
won't be as tricky to use.

[WebAssembly/wasi-libc#328]: https://github.com/WebAssembly/wasi-libc/pull/328
[WebAssembly/wasi-libc#338]: https://github.com/WebAssembly/wasi-libc/pull/338
[WebAssembly/wasi-libc#329]: https://github.com/WebAssembly/wasi-libc/pull/329

Diff Detail

Event Timeline

sunfish created this revision.Oct 13 2022, 11:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 11:55 AM
sunfish requested review of this revision.Oct 13 2022, 11:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 11:55 AM
Herald added a subscriber: aheejin. · View Herald Transcript
sbc100 edited the summary of this revision. (Show Details)Oct 13 2022, 12:10 PM

Aren't reactors supposed to define an _initialize function that takes care of this?

Perhaps the idea is to make the reactor model work even for users that don't link against crt1_reactor.o?

In that case could we instead create a default _initialize function here and avoid the wrappers completely?

e.g.

if (!isDefined("_initialize")) {
  create_default_initialize();
}

Obviously this would mean that if a user supplies their own initialize function that they need to take care of calling __wasm_call_ctors, but that seems like a pretty reasonable expectation.

lld/test/wasm/init-fini-no-gc.ll
59

These days we can use llvm-object -d to disassembly the code here, making the test more readable. See lld/test/wasm/tls.s for example.

lld/test/wasm/reactor-exports.s
87–90

Ditto, we can use disassembly here now.

lld/wasm/Symbols.h
539

This could be tricky because wasm globals are non-shared and therefore thread local.

@tlively solved this problem for data segment initialization by using a word of memory instead. Search for initMemoryFlag to see how it was done in that case.

lld/wasm/Writer.cpp
1083

Odd indentation here?

Aren't reactors supposed to define an _initialize function that takes care of this?

Yes, but there are two considerations. One, not everyone is calling _initialize today, because in some situations today things work without it, however with the change to make malloc have a constructor, that will change. The other is that I'm anticipating use cases, such as loading a wasm module in JS, where _initialize might not be automatically called, and developers won't know to manually call it.

Perhaps the idea is to make the reactor model work even for users that don't link against crt1_reactor.o?

Yep. It'll be a transition, but that's one of the things this will enable.

In that case could we instead create a default _initialize function here and avoid the wrappers completely?

I don't think we can avoid the wrappers, if there are constructors. The wasm start function is too early for user constructors, because it's limited in what it can do. As I mentioned above, I want to avoid the requirement that _initialize be called explicitly, so wrappers seem like the only option.

One way we eventually can avoid the wrappers is to eliminate all the constructors, such as by writing a malloc which doesn't need them, and making other changes, and encouraging users to avoid static constructors. That is a longer-term goal though.

sbc100 added a comment.EditedOct 13 2022, 2:38 PM

Aren't reactors supposed to define an _initialize function that takes care of this?

Yes, but there are two considerations. One, not everyone is calling _initialize today, because in some situations today things work without it, however with the change to make malloc have a constructor, that will change. The other is that I'm anticipating use cases, such as loading a wasm module in JS, where _initialize might not be automatically called, and developers won't know to manually call it.

Perhaps the idea is to make the reactor model work even for users that don't link against crt1_reactor.o?

Yep. It'll be a transition, but that's one of the things this will enable.

In that case could we instead create a default _initialize function here and avoid the wrappers completely?

I don't think we can avoid the wrappers, if there are constructors. The wasm start function is too early for user constructors, because it's limited in what it can do. As I mentioned above, I want to avoid the requirement that _initialize be called explicitly, so wrappers seem like the only option.

One way we eventually can avoid the wrappers is to eliminate all the constructors, such as by writing a malloc which doesn't need them, and making other changes, and encouraging users to avoid static constructors. That is a longer-term goal though.

Sounds reasonable.

However, shouldn't the long term goal be to have all wasi loaders run the _initialize function correctly, as spec'd?

Also, IIUC if _initialize exists and calls __wasm_call_ctors then these wrappers won't be generated (because __wasm_call_ctors will be referenced) .. so this solution doesn't help modules that contain a valid _initialize function. Again it seems like fixing the runtime such that it calls initialize would be better solution here. At least from the perspective of wasi.

What about this solution that I think works even in the face of an correct _initialize function:

  1. Is no _initialize function exists, create a synthetic one that just calls __wasm_call_ctors
  2. Create a wrapper around _initialize which can detect when it has been run, or not
  3. Create wrappers around all the exports that guard against _initialize not being called yet

One problem with this solution is that it adds non-optional overhead to all reactors and even well-behaved runtimes that call _initialize have to then pay that price.. so I guess there should be way to disable it too?

sunfish updated this revision to Diff 467639.Oct 13 2022, 5:05 PM
  • Use a memory location instead of a wasm global, and support threads.
  • Use llvm-objdump -d in tests.
sunfish updated this revision to Diff 467640.Oct 13 2022, 5:07 PM
sunfish marked 4 inline comments as done.
  • Fix indentation.

However, shouldn't the long term goal be to have all wasi loaders run the _initialize function correctly, as spec'd?

I'll be proposing we drop the _initialize convention altogether. It was never properly standardized, and I now think we're better off without it. One of the nice things about this will be that we don't need wasi-specific loaders for reactors. Given a reactor, we can just load it into any wasm runtime and call functions on it, and it just works. This is how Web-oriented Wasm tends to work—you create a wasm module that exports/imports stuff, and then write JS code that instantiates it and calls it. You don't expect an implicitly-created _initialize function that you have to remember to call first. So dropping the _initialize function will make this kind of reactor-like use case less different from Wasm on the Web.

Also, IIUC if _initialize exists and calls __wasm_call_ctors then these wrappers won't be generated (because __wasm_call_ctors will be referenced) .. so this solution doesn't help modules that contain a valid _initialize function.

Yeah; the next step here will be to change the clang driver to stop passing --entry=_initialize and use --no-entry when linking a reactor, which will stop emitting _initialize and enable the wrappers.

abrown added a subscriber: abrown.Oct 14 2022, 8:59 AM

lso, IIUC if _initialize exists and calls __wasm_call_ctors then these wrappers won't be generated (because __wasm_call_ctors will be referenced) .. so this solution doesn't help modules that contain a valid _initialize function.

Yeah; the next step here will be to change the clang driver to stop passing --entry=_initialize and use --no-entry when linking a reactor, which will stop emitting _initialize and enable the wrappers.

Ok, so if emscripten (or any other toolchain) wants to continue to handle initialization using _start (function name not section name) and/or _initialize then it won't pay the cost of the extra wrappers?