This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] COMDAT: core LLVM support
ClosedPublic

Authored by ncw on Dec 5 2017, 10:28 AM.

Details

Summary

This adds COMDAT support to the LLVM frontend and Wasm object-file support.

For spec, see https://github.com/WebAssembly/tool-conventions/pull/31

For corresponding LLD change, see https://bugs.llvm.org/show_bug.cgi?id=35533, and D40845

Diff Detail

Repository
rL LLVM

Event Timeline

ncw created this revision.Dec 5 2017, 10:28 AM
ncw edited the summary of this revision. (Show Details)
sbc100 edited edge metadata.Dec 5 2017, 11:01 AM

Thanks for doing this. Looks like a good direction!

include/llvm/BinaryFormat/Wasm.h
79 ↗(On Diff #125570)

Why a new struct here but not for Function and Segment?

If we decide a new struct is needed, we have tended to prefer composition over inheritance for this kind of thing.

218 ↗(On Diff #125570)

Give that we already have symbol and segment info, and that functions and globals are both symbols, would it make sense to add the COMDAT to the existing segment / symbol info? I'm don't feel strongly about this, just wondering what you think?

lib/MC/WasmObjectWriter.cpp
148 ↗(On Diff #125570)

Should this maybe go in BinaryFormat/Wasm.h since its part of the binary format. Then the reader could use it too?

And maybe another struct type which is a Name + a vector of WasmComdatEntry would be useful to represent a whole COMDAT declaration?

lib/Object/WasmObjectFile.cpp
413 ↗(On Diff #125570)

Read into a WasmComdatEntry maybe?

ncw added inline comments.Dec 5 2017, 1:08 PM
include/llvm/BinaryFormat/Wasm.h
79 ↗(On Diff #125570)

I knew this would come up! It's a hack/workaround. I would have preferred to add the member to WasmGlobal, but WasmGlobal is used inside a union (in WasmImport) and StringRef is non-POD so can't go in a union.

I'll fix it up one way or another, WasmGlobalDecl is a (small) bodge just to get it working. Possible solutions: use a Variant instead of a union in WasmImport; or, review the hierarchy of types and come up with a cleaner representation of the Wasm data. (I just need an hour to draw some diagrams and read the existing code to understand where the Comdat field belongs.)

218 ↗(On Diff #125570)

I mainly did it this way for size. C++ function names can be hundreds of characters long, so you don't want to use the name very often in the object file.

ELF in fact does something very sneaky, and in the COMDAT group directly references the symbol name in the symbol table, for ultimate compactness.

If we put the long (string) COMDAT identifier next to each segment/symbol that would be quite a bit of bloat. We could have a list of COMDAT names separately, and then in the symbol data have an integer index into the COMDAT list, but I'm not sure that's any tidier than this approach.

lib/MC/WasmObjectWriter.cpp
148 ↗(On Diff #125570)

Good idea.

ncw updated this revision to Diff 126165.Dec 8 2017, 9:07 AM
ncw added a reviewer: ruiu.

OK, I've cleaned this up, and I think it's ready to go.

  • Firstly, note that this can be merged before LLD gains COMDAT support (ie - this doesn't need corresponding changes in LLD to keep working)
  • I've made a matching PR in the tool-conventions repo, https://github.com/WebAssembly/tool-conventions/pull/31
  • Added test coverage for each bit

Now, the contentious bit - is it actually the right design to be using ELF-style COMDAT?

Well, I think it _maybe_ is, and it's better than nothing. I think we basically need to just keep on improving LLD, and as we go we'll add more flags and bits and pieces as needed. LLD linking is a bit different to x86 linking, certainly when we start doing more clever dead code removal, and I don't want to anticipate that too much.

I think there's value in merging a simple implementation soon, and improving it.

Anything more fancy than the COMDAT support I've done here, would require a correspondingly more fancy/complicated initial review on the LLD side.

ncw updated this revision to Diff 126324.Dec 11 2017, 2:33 AM
ncw edited the summary of this revision. (Show Details)
ncw added a reviewer: sunfish.

Update - added "flags" field and removed explicit Comdat for globals, following Dan's comments on the GitHub PR.

ncw marked 2 inline comments as done.Dec 11 2017, 2:34 AM
ncw added inline comments.
lib/MC/WasmObjectWriter.cpp
148 ↗(On Diff #125570)

I've decided to keep it this way, after having a fiddle to see what it would look like.

Firstly, I'm just storing the COMDAT information in the same way as existing linking data. We don't have a struct for the "names" custom section, and when we read them in, we attach the names directly to the objects.

I actually like the way it's represented. Conceptually, the COMDAT name *is* attached to the function/data-segment. The linking section is just a way to attach extra information to the standard Wasm sections, and I think it makes sense to model it that way.

Based on the discussion in the bug (https://bugs.llvm.org/show_bug.cgi?id=35533) it sounds like we need to take a look at SHF_LINK_ORDER and how that works. It sounds like its different to the "group" concept. But if we do decide groups are the way to go this change lgtm.

ncw updated this revision to Diff 127341.Dec 18 2017, 5:33 AM

Changes:

  • Rebased
  • Added yaml2obj support to match obj2yaml, tweaked the YAML display for neatness
ncw added a comment.Dec 18 2017, 8:49 AM

Based on the discussion in the bug (https://bugs.llvm.org/show_bug.cgi?id=35533) it sounds like we need to take a look at SHF_LINK_ORDER and how that works. It sounds like its different to the "group" concept. But if we do decide groups are the way to go this change lgtm.

I've added a looong comment to that bug. SHF_LINK_ORDER really is very very similar to "Comdat groups" - the only differences are 1) ELF historical encoding stuff, and 2) SHF_LINK_ORDER creates "anonymous" linker groupings rather than named Comdat groups.

I basically can't see what all the fuss is about! Comdat "groupings" are what we want, and all the alternatives we're discussing provide the same semantic. It's just arguing about the name and the details? Maybe if you've been working on this for a while though the details seem bigger.

I can't see a major functional difference between what I've implemented here, and what SHF_LINK_ORDER would provide. I might be missing something though....

sbc100 added inline comments.Dec 19 2017, 9:42 PM
include/llvm/Object/Wasm.h
144 ↗(On Diff #127341)

I'd rather expose an ArrayRef here (I think you can use a local function-local set() to detect duplicate entries while parsing). For consistency and so that comdats appear in order.

lib/Object/WasmObjectFile.cpp
438 ↗(On Diff #127341)

while (ComdatCount--)?

449 ↗(On Diff #127341)

while (EntryCount--)

468 ↗(On Diff #127341)

We now have isValidFunctionIndex(Index) for this.

lib/ObjectYAML/WasmYAML.cpp
65 ↗(On Diff #127341)

std::map funny to me here. This this its because I worry about iteration order being consistent. I'd want the comdats to be displayed in the order they appear in the input file, not alphabetically.

How about std::vector<Comdat> where Comdat is a struct with a name + entries?

ncw updated this revision to Diff 127684.Dec 20 2017, 4:50 AM
ncw marked 5 inline comments as done.

Updated:

  • Responded to Sam's comments
  • Fixed a minor issue with the function indices
  • Fixed a bug in my addition to yaml2wasm
include/llvm/Object/Wasm.h
144 ↗(On Diff #127341)

Good point, done.

lib/ObjectYAML/WasmYAML.cpp
65 ↗(On Diff #127341)

OK, for YAML it's using a vector now. That makes sense, thanks.

ncw retitled this revision from Wasm COMDAT: core LLVM support to [WebAssembly] COMDAT: core LLVM support.Dec 20 2017, 8:08 AM

I'm behind on some of the discussions on this topic at the moment, but overall, this looks like it's headed in a good direction, so I'm in favor of this patch.

sbc100 accepted this revision.Jan 5 2018, 2:25 AM
This revision is now accepted and ready to land.Jan 5 2018, 2:25 AM

Would you mind re-basing this, and then I'll land it.

ncw updated this revision to Diff 129128.Jan 9 2018, 11:00 AM

Thanks!

Rebased.

NB. I haven't yet re-run the tests, this is an optimistic update - it takes a looong time to compile everything, and after a rebase pretty much everything needs to recompile. Just building the tests now.

ncw added a comment.Jan 9 2018, 11:45 AM

MC tests pass, but I've just noticed I missed one in CodeGen/WebAssembly/comdat.ll. I won't be able to fix it today, sorry..

This revision was automatically updated to reflect the committed changes.
sbc100 added inline comments.Jan 9 2018, 4:12 PM
tools/obj2yaml/wasm2yaml.cpp
52 ↗(On Diff #127684)

I ended up exposing the Index directly in the WasmFunction struct to avoid the need for this.

When testing with lld I noticed that the comdat group names are appearing as undefined symbols in the final output.

See: https://reviews.llvm.org/D41959

I this case it results in unresolved symbol: sharedComdat

Can you think of a way to avoid including these?

llvm/trunk/lib/ObjectYAML/WasmYAML.cpp