Page MenuHomePhabricator

[WebAssembly] Add symbol table to LLD, 1/2

Authored by ncw on Jan 11 2018, 10:45 AM.



WebAssembly object files will now have an explicit symbol table. This commit is the first of two halves, adding support throughout the code for addressing symbols by index; however, the symbols are still written out using a single import/export (depending on whether defined).

This half of the symbol table work is valid on its own, and the second half doesn't need to be committed immediately after. However, the corresponding LLVM change D41954 must be committed simultaneously with this.

Diff Detail

rLLD LLVM Linker

Event Timeline

ncw created this revision.Jan 11 2018, 10:45 AM
ncw updated this revision to Diff 129658.Jan 12 2018, 10:32 AM
ncw edited the summary of this revision. (Show Details)
ncw added a reviewer: sbc100.


  • Some fixes after rebase
  • Most tests now passing (a couple of bugs still to iron out)

Thankfully there's not too much to change in LLD!

ncw updated this revision to Diff 129732.Jan 12 2018, 4:45 PM

More updates, fixed all but two tests.

The remaining two tests are rather hard to fix; my changes unfortunately don't play well with "relocatable" output.

Problem #1

The stack-pointer.ll test uses --emit-relocs but not --relocatable. This means that the __stack_pointer symbol is _not_ imported; it's instead defined, but we can't export it (because it's mutable). Thus, the _global_ has an output index, but there is no _symbol_ index for it (so far, behaviour hasn't changed...). What this breaks is that I've made it so that relocs reference the symbol index - so relocs against the stack pointer won't work with --emit-relocs but not --relocatable.

Hmm. Maybe we can just fix the test to use --relocatable :( at least until we can export mutable globals.

Problem #2

The other failing test is init-fini.ll. What causes the problem is that we take the address of a BINDING_LOCAL symbol. When linking with --relocatable, we try and create a relocation against the function and fail. This is because the function isn't exported as a Symbol in the relocatable output.

This can probably be fixed by just exporting Symbols for the locals as well when doing relocatable. We'd need to budge the names to keep them unique, which they won't be if we export the locals from all object files. That would at least make LLD's relocatable output match what Clang outputs.


Linking (relocating) against Symbols rather than functions/globals does make aliases nice and elegant... but breaks some other things.

Another (less good?) solution might be to split the relocations into "FUNCTION_SYMBOL_INDEX_LEB" and "FUNCTION_WASM_INDEX_LEB", where the SYMBOL_INDEX version in outputted by WasmObjectWriter (Clang) and the WASM_INDEX version is outputted by LLD when it's creating relocatable output. The we can cope with both the cases above, at the expense of even more code handling different types of relocations.

ncw updated this revision to Diff 129858.Jan 15 2018, 7:25 AM

Simple rebase to fix conflicts. The two problems above remain.

ncw updated this revision to Diff 129980.Jan 16 2018, 9:49 AM
ncw retitled this revision from [WebAssembly] Refactor symbol index handling (LLD) to [WebAssembly] Symbol changes #5: add "symbol index" concept (LLD).
ncw edited the summary of this revision. (Show Details)

Updated: split up the big review into six smaller ones.

The two big "problems" described above are now fixed. Problem #1 (stack pointer incompatible with --emit-relocs is fixed by using --relocatable for that test instead. Problem #2 (taking the address of a local function) is/was fixed in D42105.

ncw added inline comments.Jan 16 2018, 9:56 AM

Updated test expectations: Snuck in an test update here that's not strictly required...


Updated test expectations: Because the import has been removed (not part of the test assertions, sadly) this index has now gone down by one.

ncw updated this revision to Diff 130187.Jan 17 2018, 7:50 AM


  • Simple rebase to fix conflict with __heap_base change
ncw updated this revision to Diff 130377.Jan 18 2018, 2:42 AM
ncw edited the summary of this revision. (Show Details)


  • Got rid of change to stack-pointer.ll test. That's now split out into another review, #4b (D42237).
ncw updated this revision to Diff 130654.Jan 19 2018, 10:27 AM


  • Simple rebase, fixing conflicts along the way
ncw updated this revision to Diff 131111.Jan 23 2018, 11:07 AM


  • Now follows the symtab proposal! Or at least, it's first half of that work. The various indexes now match the exact values they will have after the symtab refactor, but we're still reading/writing the symbols using the imports+exports rather than the linking metadata section
  • Next bits of symtab should follow in another review
ncw updated this revision to Diff 131256.Jan 24 2018, 6:48 AM

Updated to remove conflicts in the hope it can be merged soonish... Getting the relocation indexes right is a prerequisite for the new symtab stuff.

ncw retitled this revision from [WebAssembly] Symbol changes #5: add "symbol index" concept (LLD) to [WebAssembly] Add symbol table to LLD, 1/2.Jan 25 2018, 8:08 AM
ncw edited the summary of this revision. (Show Details)
ncw updated this revision to Diff 132217.Jan 31 2018, 11:07 AM

Simple rebase on top of D42751, to allow those changes to be cleared out of the way

ncw updated this revision to Diff 132285.Jan 31 2018, 3:19 PM

Rebased and renamed WasmIndex -> ElementIndex

ruiu added a subscriber: ruiu.Jan 31 2018, 3:47 PM
ruiu added inline comments.

This is not related to this particular patch, but since I found this in this patch, I'll write the comment here. This piece of code is alarming because it uses hash tables. In general, in lld, I'd strongly recommend not to use hash tables or anything that is more complicated than the vector, because hash tables can make the linker noticeably slower.

If you really need to use a hash table, please look it up only once for a new piece of data, just like we do in the SymbolTable. In SymbolTable, we do lookup a new symbol name in the hash table only once to obtain a pointer, and after that, we access the symbols through the pointer. We never look up the symbol table more than once for the same data to minimize the cost of hash table lookup. I designed everything in lld that way, and that is I believe one of the big reasons why lld is so fast.

I'm not sure if it is applicable to this piece of code. If you have a small number of types and look the hash table only a few times, that's perhaps fine. But if each function have a type, and you need to insert/look up a hash table for each function, that's too much. If that's the case, could you revisit this code and redesign?

sbc100 added inline comments.Jan 31 2018, 3:59 PM

Yes, I think we can eliminate lookupType completely once the symbol table changes land. We can just have the TypeIndex from each symbol as we call registerType.

ncw added inline comments.Jan 31 2018, 4:06 PM

In fact I remember making the same change in WasmObjectWriter (then reverting it), using a vector of bool to store whether something had been allocated rather than a DenseMap.

In LLD there are a few hash map lookups we've added I'm afraid :( In fact COMDAT is another one, I think I can rid of the hashmap LLD uses for that as well. That was my fault, before I realised hash maps were forbidden.

ruiu added inline comments.Jan 31 2018, 4:48 PM

It's not forbidden. :) For example SymbolTable is a valid use of the hash table -- I can't think of any way to avoid hash table in that class. But it is strongly discouraged because it sometimes has a significant performance impact.

ncw updated this revision to Diff 132355.Feb 1 2018, 3:57 AM

Update: no functional changes, just fixed the compile after the "calculateTypes" change added a conflict

sbc100 added inline comments.Feb 6 2018, 7:39 PM

This should be called "Symbol:" now, right? (at last with 2/2)



ncw updated this revision to Diff 133221.Feb 7 2018, 8:32 AM
ncw marked an inline comment as done.

Rebased, attended to review comments/renaming


Renamed to "Symbol:"


Hmm, I've left it as just "Index" for the time being. We could rename it in a later commit.

Relocations have a Type/Index/Addend, which is consistent for the different types, whereas if we change it to "Symbol" it would then have to be different for TYPE relocations, which seems uglier to me (initially).

ncw updated this revision to Diff 133307.Feb 7 2018, 2:26 PM

Update: folded D42539 into this patch

ncw abandoned this revision.Feb 14 2018, 5:31 AM