Page MenuHomePhabricator

[WebAssembly] Initial wasm linker implementation
ClosedPublic

Authored by sbc100 on Jun 29 2017, 5:23 PM.

Details

Summary

This linking backend is a work in progress, but is enough
to link simple programs including linking against library
archives.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ruiu added inline comments.Sep 19 2017, 4:30 PM
wasm/OutputSections.cpp
106 ↗(On Diff #115885)

Can you use ArrayRef?

147 ↗(On Diff #115885)

Is it okay to fallthrough? If so, add LLVM_FALLTHROUGH.

150 ↗(On Diff #115885)

Why you can't directly call encodeULEB128(Reloc.Value, Buf, 5); here?

155 ↗(On Diff #115885)

Ditto -- it seems you could call encodeSLEB128(Reloc.Value, Buf, 5); here.

164 ↗(On Diff #115885)

Previous switch doesn't have a default, so the control could reach here with Encoding uninitialized, no?

165–172 ↗(On Diff #115885)

Remove {}

195 ↗(On Diff #115885)

I believe you don't need this because raw_string_ostream::~raw_string_ostream would do this for you.

201 ↗(On Diff #115885)

If BodySize is always initialized to 0, the preferred way is to declare the variable with the initial value (i.e. BodySize = 0;)

213 ↗(On Diff #115885)

remove const as it is obvious that we don't mutate Content.

230–231 ↗(On Diff #115885)

You want to merge successive log into one function call because otherwise other thread could write other lines between the two lines.

wasm/OutputSections.h
50 ↗(On Diff #115885)

I don't know why you need to aggregate input sections' relocations into one place. For COFF or ELF, we don't do that. Each input section has writeTo method and knows how to copy itself to a given buffer and relocate itself. Can't you do in the same way?

wasm/Symbols.cpp
46 ↗(On Diff #115885)

If you know File is of ObjectFile, use cast instead of dyn_cast.

wasm/Symbols.h
94 ↗(On Diff #115885)

I geuss you can initialize

ArchiveSymbol = {nullptr, 0, 0};
106–109 ↗(On Diff #115885)

In lld we don't use operator overloading unless it is absolutely necessary. Since you can explicitly call toString() before emitting it to an output buffer, you can live without this. I'd remove this function.

wasm/Writer.cpp
35 ↗(On Diff #115885)

constexpr instead of const may make more sense.

69 ↗(On Diff #115885)

Do you need this?

144 ↗(On Diff #115885)

We use pre-increments unless there's reason to use post-increments.

193–195 ↗(On Diff #115885)

You can omit {}

206–210 ↗(On Diff #115885)

You can omit {}

266 ↗(On Diff #115885)

is?

345 ↗(On Diff #115885)

Remove an empty function.

375 ↗(On Diff #115885)

auto -> auto *

384 ↗(On Diff #115885)

auto -> auto *

504 ↗(On Diff #115885)

We usually name this Offset.

Is uint32_t enough? I thought 64-bit executables could be larger than 4 GiB.

584 ↗(On Diff #115885)

I'm a bit confused about what this function does. Does this calculate file offsets as opposed to virtual memory offsets? In general, this file needs more comments.

607–612 ↗(On Diff #115885)

Looks like it's the same as

if (WasmFile->memories().size() > 1)
  fatal(File->getName() + ": contains more than one memory");
627 ↗(On Diff #115885)

No else after fatal, as fatal does not return.

764–770 ↗(On Diff #115885)

You could write

void wasm::writeResult() { Writer().run(); }

without these namespaces.

wasm/Writer.h
22–26 ↗(On Diff #115885)

It seems you are not using this struct.

wasm/WriterUtils.cpp
23 ↗(On Diff #115885)

Use the LLVM naming convention.

34–35 ↗(On Diff #115885)

fatal is _Noreturn.

39–40 ↗(On Diff #115885)

Instead of enclosing the entire file, can you explicitly add wasm:: to each public functions?

wasm/WriterUtils.h
39–43 ↗(On Diff #115885)

These two functions look the same.

ruiu added a comment.Sep 19 2017, 4:33 PM

I'm sorry that I accidentally hit "Submit" button while I was writing a comment.

So, generally, this patch is improved in a right direction. There are still some code that I don't understand yet, mainly in Writer, but it is probably due to my lack of knowledge about wasm.

Thank you for all the feedback @ruiu. It not surprise to me that a patch of this size takes a long time to review. I'm currently working on a fairly large change to allow data sections with the same name to be combined into a single section.. in order to allow things like init_array/fini_array. My aim is to get that done before the initial version of this patch lands.

sbc100 updated this revision to Diff 116282.Sep 21 2017, 3:20 PM
sbc100 marked 33 inline comments as done.
  • [WebAssembly] Update test output to match llvm changes
  • [WebAssembly] only run wasm tests when wasm is enabled
  • [WebAssembly] Review feedback
sbc100 added inline comments.Sep 21 2017, 3:20 PM
test/wasm/archive.ll
1 ↗(On Diff #115885)

I added lit.local.cfg to this directory to disable them all in this case.

wasm/InputFiles.cpp
166–171 ↗(On Diff #115885)

Yes, you are right. The work i'm doing to combine data sections will address, and makes this code much cleaner. So this will be fixed on my next upload.

208 ↗(On Diff #115885)

Yes, llvm-ar would be needed to great the archive indexes.

I see yes, it might be good to handle archives with no symbol table. I'll add a TODO note here.

266 ↗(On Diff #115885)

Ah, I was copying from the COFF linker. We should probably make this consistent at some point (perhaps as good candidate for shared code?)

wasm/OutputSections.cpp
56–57 ↗(On Diff #115885)

switched to llvm_unreachable

195 ↗(On Diff #115885)

True, but the getSize() below depends on Header, so we need the value before the end of the function. I could refactor to avoid this, especially since its only a logging statement.

wasm/OutputSections.h
24–25 ↗(On Diff #115885)

I'm creating these using emplace_back() which seems to require a constructor. I could just use push_back() instead I guess?

ruiu added a comment.Sep 24 2017, 4:18 PM

Is this ready for another round of review?

wasm/WriterUtils.cpp
38 ↗(On Diff #116282)

More conventional way of doing it is to add using namespace lld; at the start of each file and do not enclose a file with namespace lld {}.

sbc100 updated this revision to Diff 120681.Oct 27 2017, 12:59 PM
  • Merge branch 'master' into wasm
  • [WebAssembly] Use UINT32_MAX rather than 0 for adddresses of undefined globals/functions
  • [WebAssembly] Combine data segments with the same name
  • Reflect upstream path changes to include/lld/common (#7)
  • Delete lldConfig library from CMakeFiles.txt
  • Add lldCommon to wasm LINK_LIBS (left over from last update)
  • Merge branch 'master' into wasm
  • Fix a merge conflict
  • Merge remote-tracking branch 'origin/master' into wasm
  • [WebAssembly] Ensure than name section is in function index order
  • [WebAssembly] Fix for SLEB relocations of type UINT32_MAX.
  • Merge remote-tracking branch 'origin/master' into wasm_work
  • [WebAssembly] Switch to Common/ErrorHandler.h
  • [WebAssembly] use Common/Threads.h
ruiu added a comment.Oct 30 2017, 4:06 PM

This patch generally looking good. Looks like all my big concerns have already been resolved.

I have a little concern on documentation. Since wasm is a new object file format, and it employs different concepts than ELF or COFF, we cannot assume any prior knowledge to readers on wasm file format or linking semantics other than the common knowledge about linking. So, in short, I think this patch needs a bit more comment. Particularly, comments from high level would help readers understand your code, because once you understand a concept, minor details of your code will look pretty obvious. (This is a pretty straightforward linker implementation, so once you understand what it is supposed to do, it is easy to understand the code, I think.)

tools/lld/CMakeLists.txt
21 ↗(On Diff #120681)

Looks like <arch>-<toolname> is more common than <toolname>-<arch>. For example, cross gcc for AVR installed on my system is avr-gcc and not gcc-avr. So, maybe wasm-ld is a better name than ld-wasm.

Sorry for back-and-forth.

wasm/InputFiles.cpp
135 ↗(On Diff #120681)

Please add a brief function comment saying that this function returns a segment that contains a given symbol.

155–158 ↗(On Diff #120681)

nit: pre-++ is a bit more C++-ish.

233 ↗(On Diff #120681)

count++ → ++Count

244 ↗(On Diff #120681)

each other.)

wasm/InputFiles.h
83–85 ↗(On Diff #120681)

They are private functions, no?

117 ↗(On Diff #120681)

nit: add a blank line.

123–127 ↗(On Diff #120681)

Can you add a blank line before each line comment?

wasm/InputSegment.h
7 ↗(On Diff #120681)

Can you describe what is "segment" in wasm? I was thinking that segments are another name for output sections, but looks like it is a different concept.

wasm/OutputSections.cpp
56–57 ↗(On Diff #115885)

Looks like this function is not updated.

33 ↗(On Diff #120681)

Return StringRef instead of const char *.

wasm/Symbols.cpp
35 ↗(On Diff #120681)

Can you add function comments to functions defined in this file?

wasm/Writer.cpp
69 ↗(On Diff #115885)

Looks like you didn't update this file for my previous comments.

sbc100 updated this revision to Diff 122722.Nov 13 2017, 1:49 PM
sbc100 marked 14 inline comments as done.
  • [WebAssembly] Handle early exit via exitLld
  • Merge remote-tracking branch 'origin/master' into wasm
  • [WebAssembly] Fix --verbose flag
  • [WebAssembly] Fix gcc warnings/errors
  • [WebAssembly] Fix code relocation for large output files
  • Merge remote-tracking branch 'origin/master' into wasm
  • [WebAssembly] Support comments in wasm.syms
  • Merge remote-tracking branch 'origin/master' into wasm
  • [WebAssembly] Update for llvm change
  • [WebAssembly] Review feedback
sbc100 updated this revision to Diff 122735.Nov 13 2017, 3:04 PM
  • add docs
sbc100 updated this revision to Diff 122741.Nov 13 2017, 3:48 PM
  • Remove whole-file namespace wrapping

Hi ruiu.

I've added some documentation, removed the whole-file namesspace wrapping, integrated with the recent libCommon changes and upstream llvm changes.

Hopefully we can land this soon, so we can move to incremental diffs that will be easier to review. Or maybe you think I think try to be more feature complete for the initial version?

ruiu accepted this revision.Nov 16 2017, 4:17 AM

I'd write more documents to describe the wasm object file format and the wasm linker's semantics, but we can do that later.

LGTM with these changes.

Thank you very much for your patience!

docs/WebAssembly.rst
8 ↗(On Diff #122741)

mimic

33 ↗(On Diff #122741)

functions

wasm/Driver.cpp
82 ↗(On Diff #122741)

If you change this to LinkerDriver().link(Args), can you remove global variable Driver?

150 ↗(On Diff #122741)

Could you revert this '#' check if you don't need it now? I mean if we want to add a way to have a comment in some file, we want to do the same thing for ELF and wasm, instead of inventing a different way for each target.

This revision is now accepted and ready to land.Nov 16 2017, 4:17 AM
sbc100 updated this revision to Diff 123246.Nov 16 2017, 2:59 PM
sbc100 marked an inline comment as done.
  • Fix use after free
  • Merge remote-tracking branch 'origin/master' into wasm
  • [WebAssembly] Add test case for output >128 functions
  • [WebAssembly] Refactor the way relocations are applied and written
  • [WebAssemlby] Parallel code writing
  • [WebAssembly] Parallel data writing
  • [WebAssembly] clang-format
  • [WebAssembly] IWYU
  • fix typo in addDefinedGlobal
  • Merge branch 'master' into wasm
  • review feedback
aheejin added inline comments.Nov 16 2017, 3:16 PM
wasm/Driver.cpp
102 ↗(On Diff #122741)

After this was changed from const to constexpr, VS 2017 compiler on the waterfall bot does not compile this, saying:

tools\lld\wasm\Options.inc(28): error C2131: expression did not evaluate to a constant
tools\lld\wasm\Options.inc(28): note: failure was caused by non-constant arguments or reference to a non-constant symbol

I'm not really sure what the problem is, since this looks fine to me, but anyway could you possibly revert this to const, to make the VS2017 happy?

sbc100 updated this revision to Diff 123249.Nov 16 2017, 3:23 PM
  • fix windows build
sbc100 updated this revision to Diff 123255.Nov 16 2017, 3:30 PM
  • final tweaks (i hope!)

No problem! Glad to land this initial version. Lets hope it sticks ..

docs/WebAssembly.rst
33 ↗(On Diff #122741)

Reworded.

wasm/Driver.cpp
102 ↗(On Diff #122741)

Ok, I reverted that for now. The other linker backed all use const rather than `constexpr` anyway.

150 ↗(On Diff #122741)

This is already a completely wasm-specific feature. And it quite useful to support comments at the moment.

See: https://github.com/jfbastien/musl/blob/wasm-prototype-1/arch/wasm32/wasm.syms

In the long run I think we might be able to find a better solution to this. Some way to inform the linker about the expected undefined symbols. Maybe some kind of declspec in the source code?

sbc100 edited the summary of this revision. (Show Details)Nov 16 2017, 3:30 PM
This revision was automatically updated to reflect the committed changes.