This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] COMDAT: LLD support
ClosedPublic

Authored by ncw on Dec 5 2017, 10:34 AM.

Details

Summary

See https://bugs.llvm.org/show_bug.cgi?id=35533, and D40844

Things covered:

  • Removing duplicate data segments (as determined by COMDATs emitted by the frontend)
  • Removing duplicate globals and functions in COMDATs
  • Checking that each time a COMDAT is seen it has the same symbols as at other times (ie it's a stronger check than simply giving all the symbols in the COMDAT weak linkage)

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

ncw created this revision.Dec 5 2017, 10:34 AM
ruiu added a subscriber: ruiu.Dec 6 2017, 4:40 PM

I just skimmed through the code, but it looks like what this code does is different from ELF. Is there a reason to be different?

sbc100 edited edge metadata.Dec 7 2017, 5:12 PM

I create a patch based partially on this one that just does the compression of the function table. https://reviews.llvm.org/D40989.
Hopefully that means this can be rebased into a smaller change.

ncw updated this revision to Diff 126342.Dec 11 2017, 4:55 AM
ncw edited the summary of this revision. (Show Details)

Updates:

  • Added test
  • Rebased on top of Sam's table deduplication
  • Added name-to-segment map to avoid N^2 lookup when deduplicating N inline/Comdat data segments

I think it's an OK start on the problem.

Having looked more at what LDD does with ELF, it's hard to copy what it's doing. I think in spirit the basic process is identical, it's just that relocations and code/data segments are rather different in Wasm.

I agree it would be nice to move towards the ELF linker, by adding section GC etc, but that's maybe out-of-scope for what I'm trying to achieve here.

ncw updated this revision to Diff 127355.Dec 18 2017, 7:03 AM

Rebased and simplified using Sam's latest tweaks to InputSegment handling.

ncw updated this revision to Diff 127540.Dec 19 2017, 8:54 AM

I've now split some of this out into D41390.

The remaining changes here are now very short! It's pretty good value for its line count, hopefully short and inoffensive.

I'm not doing anything to it right now, apart from waiting.

I think it's now much closer to the way the ELF linker does it, which should alleviate concerns.

ncw updated this revision to Diff 127731.Dec 20 2017, 8:12 AM
ncw retitled this revision from Wasm COMDAT: LLD support to [WebAssembly] COMDAT: LLD support.
ncw edited the summary of this revision. (Show Details)

Rebased on top of Sam's refactoring work...

APPLY AFTER:

  • D41315 (Sam's commit "Output functions individually")
  • D41419 (Sam's commit "Add InputChunk base class")
  • D41390 (My commit, "Don't write out discarded symbols")

Also added support for "relocatable" output from LLD, which should also include the Comdats merged from all the input files.

sbc100 added a comment.Jan 9 2018, 5:23 PM

Could you rebase this? Does this really depend D41390

ncw updated this revision to Diff 129267.Jan 10 2018, 6:17 AM

Rebased as requested.

Does this depend on D41390?

I did it this way round, because I thought D41390 was the less controversial, and hence would be applied first. I wanted to add the Discarded flag and behaviour, without having to wait for discussion of the COMDAT to grind on.

The COMDAT support here does depend on the Discarded flag added in D41390, so I think the order does make sense.

If feels like there should be a better way implement D41390 . Do you think we could detagle the two CLs such that this one can land sooner?

sbc100 added inline comments.Jan 10 2018, 11:52 AM
wasm/InputFiles.cpp
281

Can we check the comdats when the segment or function is first created to avoid two loops?

Maybe we could then avoid even adding the to the Functions or Segments list at all and just discard them?

ncw updated this revision to Diff 129430.Jan 11 2018, 4:22 AM
ncw marked an inline comment as done.
ncw edited the summary of this revision. (Show Details)

Changes:

  • Disentangled from D41390
  • Changed to assume that all Functions/InputSegments are covered by Symbols. The "discardable" entities are the function and segment definitions, not the Symbols referring to them, so previously I was iterating over the Functions/Segments to make sure they were discarded. We know though that the MCWasmWriter will never output a COMDAT Function/Segment without an accompanying Symbol, so the change is safe.

One thing I did like a bit about the previous implementation was that it made Comdat behave pretty much just like (non-Comdat) weak symbols; now the Comdat symbols are discarded earlier, which I guess makes sense.

wasm/InputFiles.cpp
281

It was harder before, but in the last fortnight of refactoring, the signatures all match up nicely now, making that easier :)

You're right that's now the best place for it.

I've moved the check above, to where we add the symbols to the SymbolTable.

ncw updated this revision to Diff 129598.Jan 12 2018, 3:31 AM

Simple rebase, to fix conflict with master after latest commits

Looks like we need one more rebase then this is good to go.

wasm/InputChunks.h
37

no need to move this I think?

wasm/InputFiles.cpp
173

I guess this could be a local lambda function since its just used in one function?

wasm/Writer.cpp
443

I think you can just use F->getOutputIndex() here and avoid NumFunctionImports and use a foreach loop instread

This revision was not accepted when it landed; it landed in state Needs Review.Jan 12 2018, 2:28 PM
This revision was not accepted when it landed; it landed in state Needs Review.
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.
ncw added a comment.Jan 12 2018, 2:41 PM

Thank you for reviewing and landing! And also for the "setComdat/isComdat" fix in the Clang WasmObjectWriter, I missed a trick there.

I'm glad the community's able to accept patches so easily.

wasm/InputChunks.h
37

Oh, I just did it while I was at it...

I really don't like it when method don't match up! I like them to be in the same order in parent and child classes, and I like them to be in the same order in the header and code files, so I can navigate quickly. Just a habit.

It's public in both derived classes, and implemented right next to getSize(), so I felt it belongs here.

I don't mind.