This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] New-style command support
ClosedPublic

Authored by sunfish on Jun 11 2020, 1:48 PM.

Details

Reviewers
sbc100
Summary

This adds support for new-style command support. In this mode, all exports
are considered command entrypoints, and the linker inserts calls to
__wasm_call_ctors and __wasm_call_dtors for all such entrypoints.

This enables support for:

  • Command entrypoints taking arguments other than strings and return values other than int.
  • Multicall executables without requiring on the use of string-based command-line arguments.

This new behavior is disabled when the input has an explicit call to
__wasm_call_ctors, indicating code not expecting new-style command
support.

This change does mean that wasm-ld no longer supports DCE-ing the
__wasm_call_ctors function when there are no calls to it. If there are no
calls to it, and there are ctors present, we assume it's wasm-ld's job to
insert the calls. This seems ok though, because if there are ctors present,
the program is expecting them to be called. This change affects the
init-fini-gc.ll test.

Diff Detail

Event Timeline

sunfish created this revision.Jun 11 2020, 1:48 PM
sunfish updated this revision to Diff 270386.Jun 12 2020, 7:06 AM

Use the return value of handleUndefined instead of doing a separate lookup, which avoids a redundant lookup and handles LazySymbols properly.

sunfish updated this revision to Diff 270412.Jun 12 2020, 8:38 AM

Run clang-format and fix lint warnings.

Interesting direction! I'm only skimmed it for now. Not strongly opposed to this direction although it makes me sad to see more codegen/magic in the linker.

lld/wasm/Driver.cpp
475

Can we make this a separate change? Its seems pretty distinct.

lld/wasm/MarkLive.cpp
76

Is this just a de-duplication with the code below? It is related to this change to independent?

lld/wasm/Writer.cpp
1159

Can you replace these 2 lines with WasmSym::callCtors->isExported() ?

sunfish updated this revision to Diff 270489.Jun 12 2020, 12:21 PM
  • Address review feedback
  • Mark callCtors live in the MarkLive pass rather than in the Writer pass.
sunfish marked 6 inline comments as done.Jun 12 2020, 12:48 PM
sunfish added inline comments.
lld/wasm/Driver.cpp
475

Ok, I've removed it, and I'll submit it in a separate patch.

lld/wasm/MarkLive.cpp
76

I guess it is deduplication and possibly separable, but it's closely related to the rest of the patch.

Previously, when the input didn't have a __wasm_call_ctors call, we'd treat the init functions as dead code and delete them. With this patch, when the input doesn't have a __wasm_call_ctors call, we assume it's wasm-ld's job to insert __wasm_call_ctors calls. So it no longer makes sense to say "if __wasm_call_ctors is live then the init functions are live". The init functions are now just always live.

lld/wasm/Writer.cpp
1159

Done.

sbc100 added inline comments.Jun 26 2020, 4:45 PM
lld/test/wasm/command-exports-no-tors.ll
5

no-tors.ll -> no-ctors.ll?

Also, I've been converting existing tests over to the .s format. For an example of this see ctor_return_value.s. If possible, I'd prefer to add .s tests only going forward .

lld/test/wasm/command-exports.ll
2

Ditto .s if possible.

lld/test/wasm/init-fini-gc.ll
8

I guess we should rename this test...?

27

Why this?

72

What do we actually want to test for here?

lld/wasm/Writer.cpp
624

config->isPic is already checked at the call site.

Description needs updating after --emscripten was split out.

sbc100 added inline comments.Jun 26 2020, 5:02 PM
lld/wasm/Writer.cpp
946

The comment say *or* but the code says "&&".

Can this part just be reverted? Since callCtors should not be live unless its needed.

sunfish updated this revision to Diff 282304.Jul 31 2020, 1:27 PM
sunfish marked 9 inline comments as done.

Address review feedback.

lld/test/wasm/command-exports-no-tors.ll
5

no-tors.ll has no ctors *or* dtors. I've now converted these to .s tests.

lld/test/wasm/init-fini-gc.ll
8

Renamed.

27

That's how __cxa_atexit will typically be defined. And we don't want it being exported to prevent GC, which might defeat the purpose of the test.

72

I've now added comments to these CHECK lines.

lld/wasm/Writer.cpp
624

I've now changed it to an assert.

Friendly ping!

sbc100 accepted this revision.Sep 30 2020, 5:17 PM

The description needs updating to reflect the fact that this change no longer adds the --emscripten flag.

lld/test/wasm/command-exports-no-tors.s
8 ↗(On Diff #282304)

You can remove the .section directives for all functions I believe to make the tests most concise.

10 ↗(On Diff #282304)

And these explicit .type entries for functions. (the .functype on its own below I think is enough).

18 ↗(On Diff #282304)

And these two lines can go too.

lld/test/wasm/command-exports.s
94 ↗(On Diff #282304)

No need for __dso_handle here or above.

This revision is now accepted and ready to land.Sep 30 2020, 5:17 PM
sunfish edited the summary of this revision. (Show Details)Sep 30 2020, 5:19 PM
sunfish closed this revision.Sep 30 2020, 7:06 PM

Thanks! Description updated, fixes applied, and landed in https://reviews.llvm.org/rG6cd8511e5932

realuptime added a subscriber: realuptime.EditedFeb 20 2022, 12:51 AM

Hey, this new behaviour broke existing functionality!

I can still see this a workaround by making a call to __wasm_call_ctors.
Be sure that people are notified when such functionality change!

Calling by default C++ static constructors and functions (static int x = somefunc()) for each "export" entry point is a very very bad idea!!!
I had to compile with -fno-c++-static-destructors first and then refactor my code.
Or use no_destroy flag(https://clang.llvm.org/docs/AttributeReference.html#no-destroy).
For example, on the project that I am working on I am calling a function(export.NeedsDisplay()) for each rendered frame, which could be 1000 times per second without VSYNC!

The project:
http://myapp.eu/wasi/

Here is my GitHub issue, which is closed now after spending two days finding the cause!
https://github.com/WebAssembly/WASI/issues/471

yamt added a subscriber: yamt.Jun 27 2022, 2:11 AM

there are use cases for a command to export non-main functions.
eg. malloc/free stuff mentioned in https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
this change broke them.
for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2022, 2:11 AM
Herald added subscribers: pmatos, asb. · View Herald Transcript

there are use cases for a command to export non-main functions.
eg. malloc/free stuff mentioned in https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
this change broke them.
for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

I believe the correct fix is to reference __wasm_call_ctors() somewhere, either by calling it or by exporting. A common way to do this is to have a _start function that calls __wasm_call_ctors and then main.

there are use cases for a command to export non-main functions.
eg. malloc/free stuff mentioned in https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
this change broke them.

One consideration is that that wamr API doesn't seem compatible with closest thing we have to a spec, which states "Command instances may assume that they will be called from the environment at most once. Command instances may assume that none of their exports are accessed outside the duration of that call.". It's debatable how authoritative that spec is, but the big picture is that there's not currently much clarity on the relationship between LLVM and engines.

for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

It's difficult to talk about the "right thing to do" in the short term. I'm giving a presentation at the upcoming WASI meeting about "WASI Preview2", which is a plan for a major new successor to the current wasi_snapshot_preview1 API, which will include a new clear definition of "commands", which I expect will be our best chance at fixing these issues.

I believe the correct fix is to reference __wasm_call_ctors() somewhere, either by calling it or by exporting. A common way to do this is to have a _start function that calls __wasm_call_ctors and then main.

This may work for now, but note that it isn't a documented workaround, and we may break it in future LLVM releases.

yamt added a comment.Jun 27 2022, 5:44 PM

there are use cases for a command to export non-main functions.
eg. malloc/free stuff mentioned in https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
this change broke them.

One consideration is that that wamr API doesn't seem compatible with closest thing we have to a spec, which states "Command instances may assume that they will be called from the environment at most once. Command instances may assume that none of their exports are accessed outside the duration of that call.". It's debatable how authoritative that spec is, but the big picture is that there's not currently much clarity on the relationship between LLVM and engines.

  • if "outside the duration of that call" mean it's ok to access exports during the call of _start, the wamr api seems compatible to me. (those malloc/free exports are used during the execution of _start, as far as i understand.)
  • does the "command" concept implies wasi?
  • i feel it's simpler to have two types of --export (eg. a new cli option to specify extra entry points), rather than flipping the semantics of all --export with a global command/reactor switch.

for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

It's difficult to talk about the "right thing to do" in the short term. I'm giving a presentation at the upcoming WASI meeting about "WASI Preview2", which is a plan for a major new successor to the current wasi_snapshot_preview1 API, which will include a new clear definition of "commands", which I expect will be our best chance at fixing these issues.

unfortunately the meeting is at midnight for me.

there are use cases for a command to export non-main functions.
eg. malloc/free stuff mentioned in https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
this change broke them.

One consideration is that that wamr API doesn't seem compatible with closest thing we have to a spec, which states "Command instances may assume that they will be called from the environment at most once. Command instances may assume that none of their exports are accessed outside the duration of that call.". It's debatable how authoritative that spec is, but the big picture is that there's not currently much clarity on the relationship between LLVM and engines.

  • if "outside the duration of that call" mean it's ok to access exports during the call of _start, the wamr api seems compatible to me. (those malloc/free exports are used during the execution of _start, as far as i understand.)

I'm referring to the "will be called from the environment at most once" part.
If I understand correctly, these malloc/free exports are used to make
additional calls into the instance.

  • does the "command" concept implies wasi?

That's a complex question :-). The concept of a "command" isn't in core
WebAssembly, at least. Consequently, it needs to be defined in some other
standards layer. At the moment, WASI has a definition for its own use.

It looks like the component model will provide a definition soon, with the
canonical ABI defining how a "command" looks in core WebAssembly, and with
WASI Preview2 I'm planning to propose WASI switch to that.

  • i feel it's simpler to have two types of --export (eg. a new cli option to specify extra entry points), rather than flipping the semantics of all --export with a global command/reactor switch.

I don't currently have a clear view of what functionality LLVM should support,
and am reluctant to add more compatibility surface area until we have more
clarity about this.

for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

It's difficult to talk about the "right thing to do" in the short term. I'm giving a presentation at the upcoming WASI meeting about "WASI Preview2", which is a plan for a major new successor to the current wasi_snapshot_preview1 API, which will include a new clear definition of "commands", which I expect will be our best chance at fixing these issues.

unfortunately the meeting is at midnight for me.

I'll post the slides in the meeting notes, and I'll be following up with
documentation and standards proposals.

yamt added a comment.Jun 30 2022, 12:25 AM

there are use cases for a command to export non-main functions.
eg. malloc/free stuff mentioned in https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/memory_tune.md
this change broke them.

One consideration is that that wamr API doesn't seem compatible with closest thing we have to a spec, which states "Command instances may assume that they will be called from the environment at most once. Command instances may assume that none of their exports are accessed outside the duration of that call.". It's debatable how authoritative that spec is, but the big picture is that there's not currently much clarity on the relationship between LLVM and engines.

  • if "outside the duration of that call" mean it's ok to access exports during the call of _start, the wamr api seems compatible to me. (those malloc/free exports are used during the execution of _start, as far as i understand.)

I'm referring to the "will be called from the environment at most once" part.
If I understand correctly, these malloc/free exports are used to make
additional calls into the instance.

the sentence made me think it was only about the entry point of the command. (_start)
what's "called" here is "command instances", not exported functions, isn't it?

a related question:
does it mean to prevent any callback-style apis?
i have a system in which the embedder provides C-callback style api to modules.
it's via C function pointers (func table indexes), not via exports, though.

  • does the "command" concept implies wasi?

That's a complex question :-). The concept of a "command" isn't in core
WebAssembly, at least. Consequently, it needs to be defined in some other
standards layer. At the moment, WASI has a definition for its own use.

It looks like the component model will provide a definition soon, with the
canonical ABI defining how a "command" looks in core WebAssembly, and with
WASI Preview2 I'm planning to propose WASI switch to that.

ok. thank you for explanation.

  • i feel it's simpler to have two types of --export (eg. a new cli option to specify extra entry points), rather than flipping the semantics of all --export with a global command/reactor switch.

I don't currently have a clear view of what functionality LLVM should support,
and am reluctant to add more compatibility surface area until we have more
clarity about this

i agree it isn't the best moment to make changes.

my point is that some of exports are entry points of the command and the others are not.

for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

It's difficult to talk about the "right thing to do" in the short term. I'm giving a presentation at the upcoming WASI meeting about "WASI Preview2", which is a plan for a major new successor to the current wasi_snapshot_preview1 API, which will include a new clear definition of "commands", which I expect will be our best chance at fixing these issues.

unfortunately the meeting is at midnight for me.

I'll post the slides in the meeting notes, and I'll be following up with
documentation and standards proposals.

ok. thank you.

IIUC, with commands you can call back into the module, but only while the command entry point (_start) is running. i.e. while _start is running it can call out, and you can callback in as much as you want.

Once _start returns the instance is no longer in a valid state and can no longer called into (for example it would not be valid to call malloc after _start returns since the heap data structures may have been torn down).

If you want to call into a module after _start returns (and have the state preserved between calls) then what you want is a reactor and you should use _initialize rather than _start.

yamt added a comment.EditedJul 8 2022, 12:09 AM

IIUC, with commands you can call back into the module, but only while the command entry point (_start) is running. i.e. while _start is running it can call out, and you can callback in as much as you want.

Once _start returns the instance is no longer in a valid state and can no longer called into (for example it would not be valid to call malloc after _start returns since the heap data structures may have been torn down).

If you want to call into a module after _start returns (and have the state preserved between calls) then what you want is a reactor and you should use _initialize rather than _start.

sure.
but i'm not sure how it's related to this change.

in the wamr malloc/free use case mentioned above, these exports are used while _start is running as far as i know.

yamt added a comment.Jul 8 2022, 12:55 AM

for my very specific case it can be worked around eg. by "-mexec-model=reactor -emain". but i feel it isn't a right thing to do.

It's difficult to talk about the "right thing to do" in the short term. I'm giving a presentation at the upcoming WASI meeting about "WASI Preview2", which is a plan for a major new successor to the current wasi_snapshot_preview1 API, which will include a new clear definition of "commands", which I expect will be our best chance at fixing these issues.

unfortunately the meeting is at midnight for me.

I'll post the slides in the meeting notes, and I'll be following up with
documentation and standards proposals.

https://github.com/WebAssembly/meetings/blob/main/wasi/2022/WASI-06-30.md