This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Output functions individually
ClosedPublic

Authored by sbc100 on Dec 15 2017, 4:53 PM.

Details

Summary

The code section is now written out one function
at a time rather than all the functions in a given
objects being serialized at once.

This change lays the groundwork for supporting
--gc-sections.

Event Timeline

sbc100 created this revision.Dec 15 2017, 4:53 PM
sbc100 added a reviewer: ruiu.Dec 15 2017, 4:54 PM
sbc100 updated this revision to Diff 127210.Dec 15 2017, 4:57 PM

remove debugging

sbc100 updated this revision to Diff 127211.Dec 15 2017, 4:58 PM
  • remove debugging
sbc100 updated this revision to Diff 127213.Dec 15 2017, 5:01 PM
  • add const
sbc100 updated this revision to Diff 127214.Dec 15 2017, 5:01 PM
  • remove debugging
sbc100 updated this revision to Diff 127215.Dec 15 2017, 5:02 PM
  • fix header
sbc100 added a reviewer: ncw.Dec 15 2017, 5:02 PM
ruiu added inline comments.Dec 15 2017, 5:58 PM
wasm/InputFiles.cpp
131

This is not a review comment to lld, but why don't you emit one relocation section for each function? IIUC, this code scans an entire relocation section each time you instantiate a function, so that's O(n*m) where n and m are the number of relocations and functions.

179

i -> I

i++ -> ++I

183

CopyRelocationsRange -> copyRelocationsRange

wasm/InputFiles.h
113–117

What is a relation between Segment and Function? Is Function a type of Segment?

sbc100 retitled this revision from [WebAssembly] Output function individually to [WebAssembly] Output functions individually.Dec 15 2017, 7:17 PM
ncw added a comment.Dec 16 2017, 7:00 AM

(This is towards https://bugs.llvm.org/show_bug.cgi?id=35534)

It looks like you're still emitting all functions, and the output hasn't actually changed? I haven't read it thoroughly yet, but from the fact that the tests haven't changed I deduce this patch doesn't do any pruning.

The next simple/logical thing to do is to handle weak symbols - when assigning the function indices, check for duplicates. Only functions that actually provide a Symbol should be kept, and the rest discarded (if the Symbol that they would have provided points to a different Function). That's a straightforward and quick check that's probably worth having, even if GC is later used to do even more vigorous pruning of unused code.

wasm/InputFiles.h
113–117

A "segment" is a data segment (a section containing data rather than text). When createDefined is used to make a function symbol, Function will not non-null and Segment null; and vice-versa when it's used for making symbols that refer to global variables, which have an associate segment containing the initial value.

sbc100 updated this revision to Diff 127252.Dec 16 2017, 11:50 AM
sbc100 marked 2 inline comments as done.
  • feedback
sbc100 added inline comments.Dec 16 2017, 11:50 AM
wasm/InputFiles.cpp
131

You are right we probably can/should do better here.

We tried to design the relocation system such that relocations would apply to the basic primitive of the object format: the wasm section.

However, the only two sections that contain relocations so far are the code and data sections, and we are not trying to split these up into GC'able chunks (a single segment for data and a single function for code). Going back to revist the relocations in the object file it would probably make a lot more sense at this point to one relocation section per function and per data segments. i.e. reloc.CODE.1 .. reloc.CODE.n rather than simply reloc.CODE. I've opened bug in our tool-conventions repo to discuss changing this: https://github.com/WebAssembly/tool-conventions/issues/32

wasm/InputFiles.h
113–117

In the wasm format the CODE section is made up of a sequence of functions (essentially blobs of wasm code). Whereas the DATA section is made up of a sequence of data segments. We could try to treat these to equally in the abstraction here, but unlike ELF they are represented in the same way.

sbc100 updated this revision to Diff 127254.Dec 16 2017, 11:53 AM
  • feedback
ncw accepted this revision.Dec 18 2017, 5:46 AM

Looks good, this definitely goes in the right direction and will be useful.

wasm/InputFiles.cpp
131

The linking tools convention shouldn't need to be changed - there's nothing stopping us just using a map when we read in the relocations in LLD. There's minimal advantage to storing the list of relocations per-section in the object file, which doesn't have to reflect whether LLD wants to use a map/vector for lookups.

Either way, I agree this commit doesn't need to be held up on that.

This revision is now accepted and ready to land.Dec 18 2017, 5:46 AM
ncw added a comment.Dec 18 2017, 10:52 AM

I've just been doing some testing, to evaluate the filesize from Wasm LLD.

I have here a binary that's part of our HTML5 product (the VNC Connect SDK, which includes an asm.js blob).

  • In asm.js uncompressed, unminified, the output is 12684717 (13MB), gzips to 2138009 (2.1MB)
  • LLD's current output is 1611558 (1.6MB)! Hooray.
  • Of current output, 50695 (50KB) is duplicated string data. Tiny actually! Calculated as difference between strings and strings | sort -u output.
  • Of current output, 87179 (85KB) is data from functions that are provided in multiple translation units (ie inline functions - where we can avoid outputting the duplicate copies). This was calculated with some hacky extra logging on my COMDAT branch.
  • Current output gzips to 465671 (450KB)

So, we're actually pretty small already, at least compared to asm.js from Emscripten. Pruning some duplicate data would be nice, but (to me) it's lower-priority than essential functionality, considering the output is surprisingly competitive with asm.js already.

The next steps for me are to continue debugging the Comdat support to the point where our full app runs successfully.

sbc100 added inline comments.Dec 18 2017, 11:09 AM
wasm/InputFiles.h
113–117

Sorry, I meant to say: they are *not* represented in the same way.

ncw added inline comments.Dec 19 2017, 4:07 AM
wasm/InputSegment.h
39

Would you mind reverting this change (pointer to ref) before merging? I'd like to add a member static InputSegment Discarded (see ELF InputSection) - but that's awkward if the members are forced to be non-null. Ditto for InputFunction.

I have a followup change that will unify the InputFunction and InputSegment, but I'd like to land this change as-is to keep reasonably sizes. @ruiu WDYT?

sbc100 updated this revision to Diff 127649.Dec 19 2017, 6:54 PM
  • remove two redundant methods
sbc100 updated this revision to Diff 127651.Dec 19 2017, 8:28 PM
  • remove unused function

ping @ruiu. I've got quick a few other pending changes stacked up on top of this one now. (sorry I know the holidays are coming up).

sbc100 updated this revision to Diff 127660.Dec 19 2017, 9:51 PM
  • remove unused 'using'
ncw added inline comments.Dec 20 2017, 5:06 AM
wasm/InputSegment.h
39

Actually - if we're switching to isDiscarded rather than InputSegment::Discarded, then the change from pointer to ref is fine. Sorry! I'll change the Comdat review to match.

sbc100 added a comment.Jan 4 2018, 3:19 PM

@ruiu PTAL. This change is needed before I can land https://reviews.llvm.org/D41426

wasm/InputFiles.h
113–117

I have a followup change that unifies these two with single base class

sbc100 updated this revision to Diff 128670.Jan 4 2018, 4:16 PM
  • check condition outside loop
sbc100 added a comment.Jan 8 2018, 3:43 PM

@ruiu: ping

(sorry for repeated ping, just that this change is blocking others)

ruiu added inline comments.Jan 8 2018, 3:59 PM
wasm/InputFunction.h
36

IIUC, since wasm functions don't exist in the regular virtual address space, it is always considered zero in size. Defining this function seems a bit pointless to me because functions don't have the notion of "size" at all in webassembly. Can you remove this function?

47

Not sure if I understand this, but is Offset a correct term for wasm functions? I think functions have indices, but they don't have output offsets in a normal sense (because they do not exist in the regular address space).

sbc100 updated this revision to Diff 128999.Jan 8 2018, 4:01 PM
  • rebase
sbc100 updated this revision to Diff 129120.Jan 9 2018, 10:22 AM
  • - feedback
sbc100 added inline comments.Jan 9 2018, 10:22 AM
wasm/InputFunction.h
36

removed getSize()

47

This is the offset of the bytes for this function withing the output CODE section.

ruiu accepted this revision.Jan 9 2018, 1:34 PM

LGTM

wasm/InputFiles.cpp
29–31

You are using namespace llvm::wasm just above this line.

This revision was automatically updated to reflect the committed changes.