This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add symbol table to LLVM, 2/2
AbandonedPublic

Authored by ncw on Jan 24 2018, 10:45 AM.

Details

Reviewers
sbc100
sunfish
Summary

This patch gets rid of the Globals! Now we have "true" Data symbols, as requested by Sam.

And, there's an explicit symbol table.

All tests pass (after some updates). Function bodies and relocations etc have not changed, which adds confidence that it's working.

First half of symbol table support: D41954

Must be committed along with the corresponding LLD change: D42585

Implements: https://github.com/WebAssembly/tool-conventions/pull/39

Diff Detail

Event Timeline

ncw created this revision.Jan 24 2018, 10:45 AM
ncw updated this revision to Diff 131483.Jan 25 2018, 10:52 AM
ncw retitled this revision from [WebAssembly] Symbol changes #7: add symtab, LLVM to [WebAssembly] Add symbol table to LLVM, 2/2.
ncw edited the summary of this revision. (Show Details)
ncw added reviewers: sbc100, sunfish.

I had a few hours this afternoon after my "real" work, and finished off the first implementation: all tests now passing!

(NB corresponding LLD changes not yet done.)

ncw updated this revision to Diff 131611.Jan 26 2018, 10:15 AM
ncw edited the summary of this revision. (Show Details)

Updated:

  • Added some more bits and pieces to truly make Wasm globals first-class objects
  • Removed some unused code for emitting globals from a .global_variables section, seems to be completely unused

This changeset is getting a little bit big, but it is actually rather hard to split up. There's not much that could be done independently in its own commit - not without making the intermediate bits non-functioning. Hmm. Removing the .global_variables section could be split out.

In any case, there six other reviews that need to go in before this one, so we'll maybe get onto the review of all this code next week.

sbc100 added inline comments.Jan 29 2018, 5:18 PM
include/llvm/BinaryFormat/Wasm.h
75

This separation of GlobalType looks good, but can perhaps be standalone change unrelated to symbols?

150

How about WasmDataReference or something like that? Since this isn't really symbol but a location a symbol can point to right?

151

Segment?

227

Can this be split out and added separately? (Or is it only relevant in this context?)

include/llvm/Object/Wasm.h
41

If these mirror the existing types then maybe just drop this enum completely?

57–58

I think we can drop SymbolIndex completely now can't we? The yaml can still include it just like it does for the other Index's but I don't think needs to be part of this data structure does it?

test/Object/obj2yaml.test
658

Can you just call this "Index" and have it come first, just like we do for the other indexes (i.e. Type index). It should always be contiguous and always start at zero but can still be useful in the yaml output I think.

Some initial comments. Looks good in general!

Although I like small changes, I'm not sure I want to land that part 1 of this separately. Lets see how this evolves and how they look when combined too.

lib/MC/WasmObjectWriter.cpp
575

Nice!

1210

If this only __stack_pointer right now, can we wait on adding support for defined globals? Can this, and WASM_COMDAT_GLOBAL itself be a separate change?

ncw added a comment.Jan 30 2018, 4:44 PM

Some initial comments. Looks good in general!

Although I like small changes, I'm not sure I want to land that part 1 of this separately. Lets see how this evolves and how they look when combined too.

Parts 1 & 2 should certainly be in different svn commits (I feel), because otherwise you'd have a single commit where basically every line of test code changes. In the current split, roughly half the test output changes in each commit, which is rather more reassuring. Are you suggesting actually squashing them into a single mega-review?

I can see your point though that Parts 1 & 2 might want to be merged in immediate succession. Is your reasoning simply that you don't want to break the binary format too often?

In practice by the time a couple of other chunks are split off it'll be harder to merge the patch queue in one blast, rather than incrementally. Part 1 is sensible on its own - all functionality works and tests pass.

There are a few smaller changes that we could clean up now while pondering the big ones: D42095, D42539, D42540.

include/llvm/BinaryFormat/Wasm.h
75

Good point, candidate for splitting.

227

If it were split out, it would have to come after the big symbol change refactor, since before that point globals are all fake. I guess that would be possible to do that, but it would be a fairly small post-commit. I can remove it, for now.

include/llvm/Object/Wasm.h
41

Possibly, yes, but it does mirror existing usage. The enums in BinaryFormat are enum : unsigned and are used for assigning values read out of a binary, and the enums in Object are enum class and are used for logical operations like switch statements where you want the strongest compiler warnings.

57–58

I could check - the concept of SymbolIndex is still very much used, but maybe the variable isn't? I'd be surprised, but if it is now obsolete I'll remove it.

lib/MC/WasmObjectWriter.cpp
991–992

Now this I will split out into another commit.

Do you know how this code got here? It's completely unused, and appears like it wouldn't even work. That's because we're emitting globals here that aren't related to any MCSymbols, so they don't generate relocations even in the existing code - and without a R_GLOBAL relocation how could this have ever worked after linking?

1210

Yes, this little bit of stub code for defined globals isn't necessary and could be pulled out into a later commit. Would make sense.

Note - although this looks like an un-split mega-diff, I do actually have several followup chunks already split out, but I just haven't made the reviews for them to avoid premature spam.

test/Object/obj2yaml.test
658

:( Drat, regenerate every test again.... OK, fair enough, it's worth getting it right while we're at it.

ncw updated this revision to Diff 132221.Jan 31 2018, 11:16 AM

Updated:

  • Rebased on top of latest work
  • Pulled out some chunks in separate reviews: D42750, and one to follow afterwards
sbc100 added inline comments.Jan 31 2018, 2:43 PM
test/MC/WebAssembly/array-fill.ll
18–21

Shouldn't this be SymbolTable now?

21

Lets put "Kind" before "Name". i.e. common fields coming first makes sense to me.

25

Just Offset and Size make sense to me. If you want to be specific then maybe SegmentOffset

test/MC/WebAssembly/unnamed-data.ll
66

How about calling this just "Segment". We could even list the name here if it has one, rather than the index?

Since we use -fdata-sections by default the Offset will almost alwasys be zero, so perhaps we can elide it when it is zero (save for Size).. to make the output less verbose?

test/ObjectYAML/wasm/linking_section.yaml
57

Strange, some tests seem to use use "Index" and others use "SymbolIndex" are some not up-to-date?

tools/yaml2obj/yaml2wasm.cpp
149

Wrapping in ifndef _NDEBUG is more explicit I think.

ncw updated this revision to Diff 132307.Jan 31 2018, 6:03 PM

Updated: addressed comments

ncw marked 5 inline comments as done.Jan 31 2018, 6:07 PM
ncw added inline comments.
test/MC/WebAssembly/unnamed-data.ll
66

As discussed, printing the segment name is more fiddly than it's worth given yaml2obj has to go the other way.

I've renamed DataSize -> Size, DataOffset -> Offset, and made Offset default to 0 (optional)

test/ObjectYAML/wasm/linking_section.yaml
57

Oops, just hadn't updated a couple

tools/yaml2obj/yaml2wasm.cpp
149

Hm, sorry, I forgot to do that. I just try and avoid ifdefs since they're ugly.

ncw updated this revision to Diff 132354.Feb 1 2018, 3:54 AM
ncw marked 2 inline comments as done.

Updated:

  • Fixed compile error after rebase
  • Added "Defined" flag to WasmObject, use it to simplify a few lines of code
sbc100 added a comment.Feb 6 2018, 7:29 PM

Looks like this needs a rebase. Part 1 seems to apply fine, but this part isn't applying on top of that currently.

include/llvm/BinaryFormat/Wasm.h
235

I think maybe we should expose the symtab struct type here, and have WasmObjectFile have a symtab() method for iterating them directly.

My preference is for all the the structures that map directly from the binary should be declared here in BinaryFormat/Wasm.h.

Then perhaps we can even from the existing WasmSymbol from the other Wasm.h? Or at least have it be a wrapper around the one declared here?

include/llvm/Object/Wasm.h
41

I don't think we need to duplicate this just for that reason. I think we should just remove this completely.

sbc100 added inline comments.Feb 6 2018, 7:33 PM
include/llvm/BinaryFormat/Wasm.h
233

[Not really part of this change but...] Should we add a separate type of BSS here? Or can BSS symbols be expresses as DATA with no segment?

Should these be WASM_SYMTYPE_XXX or WASM_SYMBOL_XXX or maybe just WASM_SYM_XXX ?

ncw updated this revision to Diff 133229.Feb 7 2018, 8:58 AM
ncw marked 3 inline comments as done.

Updated with rebase and comments addressed

include/llvm/BinaryFormat/Wasm.h
233

Renamed to WASM_SYMBOL_TYPE_XXX - a bit verbose but matches conventions for the other enums.

Regarding BSS: I've been thinking/planning how to handle that; I can work on it after these patches if you want. What I had in mind was that the segment-metadata would gain the ability to describe BSS segments via a flag (currently no segment flags are defined), and for BSS segments there would be no corresponding Wasm segment. Currently we have Wasm segments mapping 1-to-1 to segment metadata; in the new scheme we'd have possibly more segment metadata than Wasm segments, and BSS segment metadata wouldn't have a corresponding Wasm segment.

235

Good idea, makes sense. I've added it to the LinkingMetadata struct rather than have a new symtab() method on the WasmObjectFile.

In fact, I've got a future commit where I'll add the Comdats to the LinkingMetadata (rather than their location as a freestanding method on WasmObjectFile).

I've stripped the guts out of llvm::object::WasmSymbol, but I've kept it around for the time being. It currently contains some extra information that isn't stored directly in the binary format, so it makes sense to keep it - in the same way that llvm::object::WasmSegment and WasmSection wrap the underlying BinaryFormat structs.

include/llvm/Object/Wasm.h
41

Ok, removed.

ncw updated this revision to Diff 133306.Feb 7 2018, 2:24 PM

Updated: just a rebase to fix conflicts after variable renaming in patch 1/2

sbc100 added inline comments.Feb 7 2018, 3:32 PM
lib/MC/WasmObjectWriter.cpp
151

Isn't this now the same as WasmSymbolInfo? (i.e. can we remove this?

1017

The return value of getType() here needs to be WASM_SYMBOL_TYPE_*.. but its currently a MCSymboLWasm::SymbolType. Rather than casting here, can we just remove MCSymboLWasm::SymbolType completely?

ncw updated this revision to Diff 133388.Feb 8 2018, 2:28 AM

Updated: Removed redundant WasmObjectWriter::WasmSymbolEntry and MCSymbolWasm::SymbolType, now that there is an appropriate named struct/enum for these in BinaryFormat/Wasm.h

ncw marked 8 inline comments as done.Feb 8 2018, 2:36 AM
ncw added inline comments.
include/llvm/BinaryFormat/Wasm.h
233

I've had a better plan for BSS: we should keep the one-to-one correspondence Wasm segments and linking segments. Instead, we should tweak the "segment metadata" to include a Size, and there should be a WASM_SEGMENT_BSS flag. For BSS segments, the Wasm elem will just be empty (rather than contain the zeros explicitly), and the segment's size in LLD shall be taken from the metadata Size; for non-BSS segments the metadata's Size much match the size of the Wasm elem. Thus we can skip long runs of zeros - but without needing yet another concept of "index".

sbc100 added inline comments.Feb 8 2018, 8:40 AM
include/llvm/BinaryFormat/Wasm.h
233

Sounds good to me

sbc100 added inline comments.Feb 8 2018, 10:39 AM
lib/Object/WasmObjectFile.cpp
460

I think we should use readUint8 here.

I'm also not sure about the use of Optional in WasmSymbolInfo. I'd rather just explicit this byte directly, like we do with the HAS_MAX flag in WasmLimits.

ncw updated this revision to Diff 133531.Feb 8 2018, 4:14 PM

Updated: removed use of Optional in symbol struct, used union instead for consistency with other structs. Using existing "DEFINED" flag to indicate whether a segment is present.

ncw marked an inline comment as done.Feb 8 2018, 4:15 PM
ncw added inline comments.
lib/Object/WasmObjectFile.cpp
460

Done. I was able to re-use the existing DEFINED flag to keep the struct compact

ncw abandoned this revision.Feb 14 2018, 5:31 AM
ncw marked an inline comment as done.