This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Use DenseMapInfo traits from LLVM repo. NFC
ClosedPublic

Authored by ncw on Mar 6 2018, 6:26 AM.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

ncw created this revision.Mar 6 2018, 6:26 AM

I'll defer to @ruiu but seems good to me.

wasm/Writer.cpp
123

Interesting. Is this the preferred way to do this? Seems better I guess.

@ruiu, you know more about the llvm ADT stuff than me. Is this how it should have been from the beginning?

ncw added inline comments.Mar 6 2018, 2:26 PM
wasm/Writer.cpp
123

In fact - why are we needing to do this in LLD at all? Surely the == operators and DenseMap traits for these fundamental types should be in the LLVM repo (at the end of BinaryFormat/Wasm.h) rather than here in LLD?

There's another mostly-but-not-quite-identical copy of this code in WasmObjectWriter (WasmFunctionTypeDenseMapInfo). So moving it into Wasm.h would make sense to me.

Are core headers like BinaryFormat/Wasm.h "supposed" to define container-specific headers like DenseMap traits and equality operators, for the structures they define?

ruiu added a comment.Mar 6 2018, 2:29 PM

Yeah, I agree with Nicholas. Moving this piece of code to llvm makes sense to me.

sbc100 added a comment.Mar 6 2018, 2:35 PM

I'm not sure about putting this stuff in the BinaryFormat. To me it makes sense to keep those headers as mostly structures and enums. If its just a matter of having this in two places I don't think that a huge deal (although the one in llvm is missing a recent fix which argues against my point :)

ncw added a comment.Mar 6 2018, 2:52 PM

I'm not sure about putting this stuff in the BinaryFormat. To me it makes sense to keep those headers as mostly structures and enums. If its just a matter of having this in two places I don't think that a huge deal (although the one in llvm is missing a recent fix which argues against my point :)

Bikeshed time! Feel free to suggest any header you like, I'm not picky... Would it be allowed to give Wasm another header like BinaryFormat/Wasm{Util/Ops/MapTraits/...}.h? The DenseMap traits are a bit specific and maybe could earn a header of their own, but personally I feel utilities like operator== should really go alongside their structs, since they're so useful to have around.

sbc100 accepted this revision.Mar 8 2018, 8:37 AM

Having two copies seems fine for now.

wasm/WriterUtils.h
19–20

Do these need to be moved into the namespace? if so can you drop the extra llvm:: prefixes?

40

Should we just inline this and drop the operators overloads for signatures? Or are they used elsewhere? (In which case the comment neeeds updating I guess).

This revision is now accepted and ready to land.Mar 8 2018, 8:37 AM
ncw added a comment.Mar 8 2018, 9:05 AM

Thanks for the reviews - I've actually been deluged with "real" work and won't have any time at all to respond and land things today. Sorry for filing a batch of patches and then ignoring them...

Patches I have coming up - in my queue and just need to submit them:

  • Fiddling with COMDAT handling, to avoid the need to do a hashmap lookup per symbol
  • Improve MarkLive to allow pruning imports as well as defined functions
  • Output demangled names in the "name" section for linked output (Dan Gohman indicated that was the intended design, and it makes good sense actually)
  • Add support for thread-local variables

After that, all the patches I accumulated over Feb will be caught up!

Things I intend to look into after that:

  • Save some space by avoiding writing out zeros for BSS segments
  • ICF support (fold identical constant data sections together)
ncw updated this revision to Diff 137747.Mar 9 2018, 7:11 AM
ncw retitled this revision from [WebAssembly] Move WasmSignatureDenseMapInfo to header for reuse. NFC to [WebAssembly] Use DenseMapInfo traits from LLVM repo. NFC.
ncw edited the summary of this revision. (Show Details)

Updated: just for neatness, I think I may as well move them to LLVM, now we've started a discussion of it.

The structures themselves and DenseMapInfo come from there, and LLD shouldn't need to be defining much stuff in the llvm namespace.

Also addressed Sam's comments (oops, non-member operator== should go in the same namespace as their struct, you're right). What is the fix that's missing for the other DenseMap traits implementation? In WasmObjectWriter it has a slightly different way of handling the empty/tombstone values, is that what you mean?

This revision was automatically updated to reflect the committed changes.
ruiu added a comment.Mar 14 2018, 2:33 PM
  • ICF support (fold identical constant data sections together)

For ICF, I'd recommend you make an improvement to the compiler and the wasm object file format first. The current implementation of lld's ICF is in some sense too aggressive -- it merges functions when they are identical. But that's actually a violation of the C/C++ spec because if you take addresses of two different functions, they must be distinct. If your program depends on pointer uniqueness, lld's ICF break your program.

On the other hand, lld is conservative on merging data. Comparing two pointers for data is more common in programs, so we don't merge two read-only data by ICF even if their contents are identical. So there's a missed opportunity here.

What we really should do is to make an improvement to the compiler so that it emits the information as to whether an address is significant or not for each chunk, and use that information in the linker. In this way, we can optimize linker output for size the most.

  • ICF support (fold identical constant data sections together)

For ICF, I'd recommend you make an improvement to the compiler and the wasm object file format first. The current implementation of lld's ICF is in some sense too aggressive -- it merges functions when they are identical. But that's actually a violation of the C/C++ spec because if you take addresses of two different functions, they must be distinct. If your program depends on pointer uniqueness, lld's ICF break your program.

On the other hand, lld is conservative on merging data. Comparing two pointers for data is more common in programs, so we don't merge two read-only data by ICF even if their contents are identical. So there's a missed opportunity here.

What we really should do is to make an improvement to the compiler so that it emits the information as to whether an address is significant or not for each chunk, and use that information in the linker. In this way, we can optimize linker output for size the most.

Would including llvm's nnamed_addr/local_unnamed_addr attribute in the symbol info be right direction for this?

ncw added a comment.Mar 14 2018, 2:51 PM
  • ICF support (fold identical constant data sections together)

For ICF, I'd recommend you make an improvement to the compiler and the wasm object file format first. The current implementation of lld's ICF is in some sense too aggressive -- it merges functions when they are identical. But that's actually a violation of the C/C++ spec because if you take addresses of two different functions, they must be distinct. If your program depends on pointer uniqueness, lld's ICF break your program.

On the other hand, lld is conservative on merging data. Comparing two pointers for data is more common in programs, so we don't merge two read-only data by ICF even if their contents are identical. So there's a missed opportunity here.

What we really should do is to make an improvement to the compiler so that it emits the information as to whether an address is significant or not for each chunk, and use that information in the linker. In this way, we can optimize linker output for size the most.

Cool! Good points.

Interestingly, for Wasm we can merge functions without violating pointer uniqueness. That's because we can create two different pointers for the exact same function! Function pointers are indexes into the "table", and the table in turn is populated with real function addresses. Currently we give each function one and only one pointer (so two aliases to the same function get the same pointer, as required), but if we merge two functions (via some "OutputFunction" object) then we'd still de-duplicate the table entries based on the InputFunctions, allowing one OutputFunction to have two pointers, and address-taken relocations would get the right value for the symbol.

I'll think about data too, that's a good point about needing to look at the relocations to work out whether the data is mergable.