This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] When loading libraries look for companion `.imports` file
ClosedPublic

Authored by sbc100 on Jan 10 2018, 3:33 PM.

Details

Summary

This allows libraries to supply a list of symbols which are
allowed to be undefined at link time (i.e. result in imports).

This method replaces the existing mechanism (-allow-undefined-file)
used by the clang driver to allow undefined symbols in libc.

For more on the motivation for this see:
https://github.com/WebAssembly/tool-conventions/issues/35

In the long run we hope to remove this features and instead
include this information in the object format itself.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

sbc100 created this revision.Jan 10 2018, 3:33 PM
sbc100 updated this revision to Diff 129355.Jan 10 2018, 3:49 PM
  • Support all library usage
ruiu added inline comments.Jan 10 2018, 11:13 PM
wasm/Driver.cpp
183–194

This looks suspicious --

  1. It loads an entire .import file for an .a file, even if we won't end up linking any object file from that archive file. That doesn't make much sense to me. Shouldn't this be per-file?
  1. IMO a side-by-side magic text file is in general not a good idea. We don't have such notion in Unix linker, and we get used to copy .a and .o files to other directories expecting that they would just work fine. This seems error-prone.
sbc100 added inline comments.Jan 11 2018, 7:16 AM
wasm/Driver.cpp
183–194

I agree its not ideal. But it is incrementally better than that current solution which requires the user (normally the clang driver) to pass extra linker args. The long term solution probably involved marking the symbold with attribute(something) in the source and marking those symbols in the object format. So I'd like to land this incremental solution now. We can leave the bug open to create a more robust solution.

This change will also allow us to remove the -allow-undefined-file option, which will be nice cleanup.

sbc100 updated this revision to Diff 129466.Jan 11 2018, 9:49 AM
  • fix test
ruiu added inline comments.Jan 11 2018, 12:54 PM
wasm/Driver.cpp
164

nit: "read" is more common than "process" for reading a file in lld.

183–194

If that's the case, I'm fine with this, but could you describe it as the comment for this code? It's somewhat surprising feature, so it needs some explanation to keep it readable.

193

I'd return here instead of } else {.

sbc100 edited the summary of this revision. (Show Details)Jan 11 2018, 1:55 PM
sbc100 edited the summary of this revision. (Show Details)
sbc100 updated this revision to Diff 129515.Jan 11 2018, 1:57 PM
sbc100 marked 3 inline comments as done.
  • feedback
sbc100 edited the summary of this revision. (Show Details)Jan 11 2018, 1:58 PM
ruiu added a comment.Jan 11 2018, 2:00 PM

How about adding a comment about the .import file?

sbc100 updated this revision to Diff 129524.Jan 11 2018, 2:12 PM
sbc100 edited the summary of this revision. (Show Details)
  • add comment
ruiu accepted this revision.Jan 11 2018, 2:15 PM

LGTM

wasm/Driver.cpp
165

typo: are can

167

that list -> that lists
syscall function -> syscall functions

This revision is now accepted and ready to land.Jan 11 2018, 2:15 PM
This revision was automatically updated to reflect the committed changes.
ncw added a subscriber: ncw.EditedJan 29 2018, 5:51 AM

I've been having further thoughts about this libc.imports business.

What I'm doing in my "minscripten" minimal Wasm toolchain turns the problem the other way up: it's not the business of a library file like libc.a to declare what symbols it "imports". Instead, at the point when you link, the linker should have all the JavaScript files and so on all available - so as a pre-link step, Emscripten should be generating the --allow-undefined-symbols file based on the JavaScript symbols. Emscripten definitely knows what JS symbols are available at the point when it invokes LLD via Clang, so it should be telling LLD which symbols are allowed to be imported.

In D41922#990507, @ncw wrote:

I've been having further thoughts about this libc.imports business.

What I'm doing in my "minscripten" minimal Wasm toolchain turns the problem the other way up: it's not the business of a library file like libc.a to declare what symbols it "imports". Instead, at the point when you link, the linker should have all the JavaScript files and so on all available - so as a pre-link step, Emscripten should be generating the --allow-undefined-symbols file based on the JavaScript symbols. Emscripten definitely knows what JS symbols are available at the point when it invokes LLD via Clang, so it should be telling LLD which symbols are allowed to be imported.

I'm not sure about that. Isn't it libc that knows exactly what symbols it expects to come from the embeder? Sure with emscripten we could imagine a complex compiler driver that is aware of all the JS functions and construct a long list of possible imports to be passed to linker, but I don't think that a very nice approach and requires complexity in the driver that it would nice to avoid. I like Dan's approach to being able to mark such function in the source language: https://reviews.llvm.org/D42520.

In any case it seems like we can support both methods: import defined by the libraries you link as well as explicitly allowed imports via the linker command line.

On this topic I was considering removing --allow-undefined-file since we are no longer using it.. and perhaps renaming --allow-undefined to --import (to match the existing --export). Are you using these flags currently in your toolchain?

ncw added a comment.EditedJan 29 2018, 12:35 PM

I'm not sure about that. Isn't it libc that knows exactly what symbols it expects to come from the embeder? Sure with emscripten we could imagine a complex compiler driver that is aware of all the JS functions and construct a long list of possible imports to be passed to linker, but I don't think that a very nice approach and requires complexity in the driver that it would nice to avoid. I like Dan's approach to being able to mark such function in the source language: https://reviews.llvm.org/D42520.

Not from my point of view! That would be an "inversion of control" in the Dependency Injection jargon. We often write software that has an external dependency on a higher layer (for example Musl does this when it invokes the "kernel"/embedder for syscalls, or a library that requires the developer to define/resolve some symbols that the library doesn't provide). The rule is: if you implement it, you get to say where it comes from. So anything that Musl doesn't provide, Musl shouldn't care where it comes from: whoever provides the syscalls has business of getting them through to Musl. Since Musl doesn't provide the syscalls, it feels wrong for me have libc.imports file shipped along with Musl, since that imports file is effectively a file whose whole purpose is to declare the implementation language (JS vs C) for certain routines. Only the person who provides those routines should care which language they're written in (ie which "side" of the JS-Wasm interface provides them).

You mentioned a "complex" linker driver that is "aware of all the JS functions". But what I'm envisaging is fairly simple: Emscripten (and my "minscripten" toolchain) currently construct a JS file that loads the Wasm module and builds some JS shims around it. Those tools are fully aware of which linkable symbols are available on the JS side - it's not a hardcoded list, it's in fact part of the working set of the Emscripten toolchain; they've got a variable already in their Python code that knows about the those symbols, because it's an unavoidable part of building the JS module. There's no practical reason why LLD couldn't be told an exact list of which functions are available on the JS side.

And indeed, the nice thing about that is it's fully automatic - who wants to build a .imports file manually? If you write some functions in C, and some functions in JS, you want Emscripten to link them together for you, rather than having to manually write a second set of declarations mirroring your JS files for the benefit of LLD. Anything that can be derived automatically from the developer's code, should be. Hence the build toolchain ought to be passing the list of JS-defined symbols down to LLD, since those are precisely the symbols that it's OK for LLD to be importing.

In any case it seems like we can support both methods: import defined by the libraries you link as well as explicitly allowed imports via the linker command line.

On this topic I was considering removing --allow-undefined-file since we are no longer using it.. and perhaps renaming --allow-undefined to --import (to match the existing --export). Are you using these flags currently in your toolchain?

I am using --allow-undefined-file, and I think that it is useful. I'm not using --allow-undefined or --export, but I suspect they must be useful to someone.

In D41922#991056, @ncw wrote:

I'm not sure about that. Isn't it libc that knows exactly what symbols it expects to come from the embeder? Sure with emscripten we could imagine a complex compiler driver that is aware of all the JS functions and construct a long list of possible imports to be passed to linker, but I don't think that a very nice approach and requires complexity in the driver that it would nice to avoid. I like Dan's approach to being able to mark such function in the source language: https://reviews.llvm.org/D42520.

Not from my point of view! That would be an "inversion of control" in the Dependency Injection jargon. We often write software that has an external dependency on a higher layer (for example Musl does this when it invokes the "kernel"/embedder for syscalls, or a library that requires the developer to define/resolve some symbols that the library doesn't provide). The rule is: if you implement it, you get to say where it comes from. So anything that Musl doesn't provide, Musl shouldn't care where it comes from: whoever provides the syscalls has business of getting them through to Musl. Since Musl doesn't provide the syscalls, it feels wrong for me have libc.imports file shipped along with Musl, since that imports file is effectively a file whose whole purpose is to declare the implementation language (JS vs C) for certain routines. Only the person who provides those routines should care which language they're written in (ie which "side" of the JS-Wasm interface provides them).

You mentioned a "complex" linker driver that is "aware of all the JS functions". But what I'm envisaging is fairly simple: Emscripten (and my "minscripten" toolchain) currently construct a JS file that loads the Wasm module and builds some JS shims around it. Those tools are fully aware of which linkable symbols are available on the JS side - it's not a hardcoded list, it's in fact part of the working set of the Emscripten toolchain; they've got a variable already in their Python code that knows about the those symbols, because it's an unavoidable part of building the JS module. There's no practical reason why LLD couldn't be told an exact list of which functions are available on the JS side.

And indeed, the nice thing about that is it's fully automatic - who wants to build a .imports file manually? If you write some functions in C, and some functions in JS, you want Emscripten to link them together for you, rather than having to manually write a second set of declarations mirroring your JS files for the benefit of LLD. Anything that can be derived automatically from the developer's code, should be. Hence the build toolchain ought to be passing the list of JS-defined symbols down to LLD, since those are precisely the symbols that it's OK for LLD to be importing.

In any case it seems like we can support both methods: import defined by the libraries you link as well as explicitly allowed imports via the linker command line.

On this topic I was considering removing --allow-undefined-file since we are no longer using it.. and perhaps renaming --allow-undefined to --import (to match the existing --export). Are you using these flags currently in your toolchain?

I am using --allow-undefined-file, and I think that it is useful. I'm not using --allow-undefined or --export, but I suspect they must be useful to someone.

I agree that providing the imports file is a hack and needs to go. That is way we want to allow for source level attributes. Once that lands we can remove support for the imports file. If you prefer the idea of the linker driver controlling this stuff we don't need to remove the support for --allow-undefined/-file. I don't think there is a reason these two approaches can't both exist (Unless you are opposed to Dan added the new source attribute?). I think the main thing is that we agree this imports hack should be removed.

From my POV, I've very keen on the vanilla out-of-the-box clang "just working", and not have to rely on a higher level or external toolchain drivers. I would love just to be able to do "wasm-clang hello.c -o hello.wasm" and have it produce something usable.

ncw added a comment.Jan 29 2018, 1:27 PM

From my POV, I've very keen on the vanilla out-of-the-box clang "just working", and not have to rely on a higher level or external toolchain drivers. I would love just to be able to do "wasm-clang hello.c -o hello.wasm" and have it produce something usable.

Me too! I don't think that the Wasm output should need any post-processing at all. The only tool I want really is to act as a JavaScript "linker" to build the JavaScript module, that calls the Wasm module, and exposes a higher-level API on top of the C-level provided by the Wasm exports. For the moment, that does need an extra tool to build.