This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add symbol table to LLVM, 1/2
AbandonedPublic

Authored by ncw on Jan 11 2018, 10:45 AM.

Details

Reviewers
sbc100
sunfish
Summary

WebAssembly object files will now have an explicit symbol table. This commit is the first of two halves, adding support throughout the code for addressing symbols by index; however, the symbols are still written out using a single import/export (depending on whether defined).

This half of the symbol table work is valid on its own, and the second half doesn't need to be committed immediately after. However, the corresponding LLD change D41955 must be committed simultaneously with this.


Addresses: https://github.com/WebAssembly/tool-conventions/issues/34

Changes:

  • For IMPORTs, go back to where we in November! Get rid of "alt index" stuff and don't generate an import for weakly-exported symbols. Now each MCSymbol generates a single import or export (never both) and each import/export generates a single LLD Symbol. Whew!
  • Track both "symbol index" and "Wasm index" for each entity. A Wasm function can have more than one symbol attached. Hopefully the naming now avoids any confusion as to what index is referring to what.
  • Document new scheme in the headers

Diff Detail

Repository
rL LLVM

Event Timeline

ncw created this revision.Jan 11 2018, 10:45 AM

Looks promising!

If we are going to remove debug names from the symbol table (which seems like a reasonable idea) can you split that into a separate change?

sbc100 added inline comments.Jan 11 2018, 11:31 AM
include/llvm/BinaryFormat/Wasm.h
93 ↗(On Diff #129478)

How about just:

uint32_t Index;   // Function index space

There is really only one index that a function can have.

If we want to be extra clear we could also add new types for FunctionIndex and GlobalIndex I suppose.

include/llvm/Object/Wasm.h
57

Its a single index space right? How about just "Index into the symbol index space, which consists of imports and exports"

159

Do we need these new methods?

265

Do these needs to be separate? Why not just have "Symbols"?

lib/MC/WasmObjectWriter.cpp
579–581

I guess we can make this an if/else now since all relocations except TYPE_INDEX are now symbol-based.

It would be nice if we could unify TYPE_INDEX too but I don't think we can sadly since its used by call_indirect to call function pointers for which there is not a symbol.

776

Why this change?

878

Rename this argument to Symbolnfo?

lib/Object/WasmObjectFile.cpp
754

isValidFunctionWasmIndex?

759

Should we add isValidGlobalIndex? Maybe that plus the above can be a separate simply change?

950

Again, not sure we need to separate these do we? Seems like just makes our lives harder.

test/tools/llvm-nm/wasm/weak-symbols.yaml
61 ↗(On Diff #129478)

It would probably be useful to have the symbol index here too so it clear which symbol the info is for. I've been going this for other sections too: https://reviews.llvm.org/D41877

I guess you would need to rename the existing Index: field then. Perhaps to WasmIndex?

ncw added a comment.Jan 11 2018, 3:24 PM

Thanks for the early feedback!

I hadn't quite finished work on it, and was going to add some reviewers tomorrow.

The tests aren't all passing, some more need to be updated.

Regarding the tests - the diffs are going to seem larger & more invasive than I would have liked. Sorry :( Part of the changes are due to (partially) undoing D41472 at the same time.

Splitting it up would be a fiddle, but might be possible. Changing the Symbol index space kind-of forces most of the changes to happen all at once; some of the changes though are stuffed in a bit gratuitously I admit.

include/llvm/BinaryFormat/Wasm.h
93 ↗(On Diff #129478)

Oops, yes, need to rename it back! I renamed every var with "Index" in to force myself to examine/audit everywhere they were used.

(In my mind there are three possible index spaces for functions: the index into std::vector<WasmFunction> Functions, secondly the Wasm function index space, and thirdly the function Symbol index space. They're pretty clear from the context though! This one didn't need renaming.)

include/llvm/Object/Wasm.h
57

I've made functions/globals have separate Symbol index spaces, just like they have separate Wasm index spaces.

My rationale:

  1. Matches Wasm index space(s)
  2. Makes range checks easier in dozens of places - no need to check "is < max && is correct type" when the index spaces are separate
  3. No chance of type confusion - forgetting to do a type check and then accidentally using a Symbol of the wrong type. I really feel that merging the index spaces would increase the risk of bugs.
159

They're short and simple! I find it disappointing to have at least two (redundant) instances of the following code:

int NumImportedGlobals = 0;
int NumImportedFunctions = 0;
for (auto Import : Imports) {
  switch (Import.Type) {
  case Global: ++NumImportedGlobals; break;
  case Function: ++NumImportedFunctions; break;
  }
}

How verbose! Definitely worth eliminating silly counting loops; the WasmObjectFile is an ideal place to store simple information like how many imports/exports there are, so that other code doesn't have to do the counting.

265

See above. I could be convinced (or told!) to merge them, but for the moment I think it's more robust (and shorter overall) to keep separately-typed objects separate.

include/llvm/ObjectYAML/WasmYAML.h
133 ↗(On Diff #129478)

This is something I slightly regret. The need for a "Kind" field here is the one downside I can think of for keeping the function/global index space of Symbols separate.

"Index" var below needs comment.

lib/MC/WasmObjectWriter.cpp
776

Good catch. I wanted to make the intermediate object files as similar as possible to the final output files.

Why the odd differences? When making normal output, LLD started the Table from 1, but for relocatable output, LLD started it at zero, and WasmObjectWriter also started from zero. They should all just agree on what the right convention is, and use it.

I did consider not outputting a table at all from WasmObjectWriter (and for LLD's relocatable output). It's not used... so it's just dead code to generate something redundant. On the other hand, it's "nice" to have the object file mirror the behaviour of the LLD output. I'm still not 100% convinced "nice" is enough to justify dozens of lines of useless code however. Do you have any thoughts?

lib/Object/WasmObjectFile.cpp
295

I've decided actually to edit this.

Currently, I've changed it to get rid of the weird/unused DEBUG symbols, and generate a fresh+shiny "names" section in LLD for the browser to use. But, I should actually keep the original vector of names around, just for the time being, for wasm2yaml's sake.

950

Aha, you've pounced on the one line of code where it's extra...

test/tools/llvm-nm/wasm/weak-symbols.yaml
61 ↗(On Diff #129478)

The index here is already the symbol index (since it's symbol metadata!)

I need to make a corresponding PR in the Linking Conventions repo, to document what I've done (assuming this is eventually merged). Most of the linking metadata uses Symbol index space, just one or two bits of metadata use the wasm index space.

ncw updated this revision to Diff 129651.Jan 12 2018, 10:09 AM
ncw marked 6 inline comments as done.
ncw edited the summary of this revision. (Show Details)
ncw added reviewers: sbc100, sunfish.

Changes:

  • Responded to Sam's feedback
  • Fixed all test failures

It's pretty large, I'll look into splitting up into a couple of smaller commits. That drags out the time to get the chain of patches committed which increases rebasing time... but it's probably worth doing here. I'll have to come back to it next week, I can't do any more refactoring on it this week.

I'm happy with my changes overall now.

ncw updated this revision to Diff 129857.Jan 15 2018, 7:21 AM

Simple rebase to fix conflicts.

ncw updated this revision to Diff 129977.Jan 16 2018, 9:43 AM
ncw retitled this revision from [WebAssembly] Refactor symbol index handling (LLVM) to [WebAssembly] Symbol changes #5: add "symbol index" concept (LLVM).
ncw edited the summary of this revision. (Show Details)

Updated: split up the big review into six smaller ones. This is the core change still, the change that motivated it all!

ncw marked 3 inline comments as done.Jan 16 2018, 9:53 AM
ncw added inline comments.
test/MC/WebAssembly/global-ctor-dtor.ll
155

Updated test expectations: some benign reordering of iteration now means that globals come after functions.

test/MC/WebAssembly/weak-alias.ll
68

Updated test expectations: getting rid of these imports that duplicate the exports is exactly what we were trying to achieve!

The consequence is that all the function/global indexes below have been decremented by 1 - except for index zero (foo_alias) which is now 5. Each index change below matches my expectations.

ncw edited the summary of this revision. (Show Details)Jan 17 2018, 2:59 AM
ncw updated this revision to Diff 130147.Jan 17 2018, 5:30 AM

Updated:

  • Simple rebase to fix conflict
ncw updated this revision to Diff 130653.Jan 19 2018, 10:27 AM

Updated:

  • Simple rebase, fixing conflicts along the way
ncw updated this revision to Diff 131108.Jan 23 2018, 11:05 AM

Updated:

  • Now in line with the new symtab proposal!
  • NB Doesn't actually include the symtab itself, but changes relocations to use the new symbol indexes
  • The next review after this will do the actual refactor whereby global symbols are split into global/data symbols, and are written out in the linking metadata rather than in the imports/exports table
ncw updated this revision to Diff 131255.Jan 24 2018, 6:44 AM

Updating to remove conflict, just in hope it'll be merged...

ncw updated this revision to Diff 131457.Jan 25 2018, 8:05 AM
ncw marked 2 inline comments as done.
ncw retitled this revision from [WebAssembly] Symbol changes #5: add "symbol index" concept (LLVM) to [WebAssembly] Add symbol table to LLVM, 1/2.
ncw edited the summary of this revision. (Show Details)

Updated:

  • Rejigged to reduce the diff
  • Makes clearer the progression to the final symbol tab implementation

I'm happy with this, awaiting review.

ncw updated this revision to Diff 132215.Jan 31 2018, 11:03 AM

Simple update to remove conflicts (now based on top of D42750)

Looks like this needs anther rebase

sbc100 accepted this revision.Jan 31 2018, 1:38 PM

Ok.. I think we are almost there now.

I'm still not a fan of landing this in two parts, but I'm prepared to do it if land them close together I guess.

include/llvm/Object/Wasm.h
55

I think "either" is not supposed to be here.

61

Can we avoid renaming this for now and continue to call it ElementIndex? Especially since the above "SymbolIndex" is removed in the followup change. We can rename it later perhaps, but I keeping it the same will reduce churn and make this change more concise.

105

Revert this change, since SymbolIndex is a temporary measure?

lib/Object/WasmObjectFile.cpp
458

So we want to INIT_FUNCS to refer to symbols rather than functions?

969

Revert this?

This revision is now accepted and ready to land.Jan 31 2018, 1:38 PM
ncw updated this revision to Diff 132286.Jan 31 2018, 3:21 PM
ncw marked 4 inline comments as done.

Updated, renamed WasmIndex -> ElementIndex

lib/Object/WasmObjectFile.cpp
458

I think so. An init func could perfectly well be a weak symbol like a C++ inline ctor or something - I don't think the linking conventions should force the frontend into always emitting a non-discardable thunk for init functions (rather than allowing the init declaration to link to a shared symbol).

We should add a test for this at some point I guess. I've just checked and the following really does work: void __attribute__((weak,constructor)) myFunc() {}.

ncw updated this revision to Diff 133222.Feb 7 2018, 8:33 AM

Rebased, attended to review comments/renaming

sbc100 added inline comments.Feb 7 2018, 12:05 PM
lib/MC/WasmObjectWriter.cpp
1049

Can SymbolIndices[&WS] = NumSymbols++; be moved in the AddSymbol? That would seem logical to me.

1103

I don't like the term WasmIndex here. Can we just revert this or come up with a better name? Looks like its actially used as the Export.Index below, so maybe ExportIndex? Just reverting to Index seems reasonable though.

1138–1141

Revert this indentation change? Seems like continue; is still what we want.

ncw updated this revision to Diff 133305.Feb 7 2018, 2:21 PM

Updated; renamed WasmIndex -> Index, folder assignment into AddSymbol helper

ncw abandoned this revision.Feb 14 2018, 5:32 AM