Download Raw Diff

Details

Reviewers

Commits

rG668447677a8b: [WebAssembly] Add a flag to control merging data segments
rLLD332013: [WebAssembly] Add a flag to control merging data segments
rL332013: [WebAssembly] Add a flag to control merging data segments

Summary

Merging data segments produces smaller code sizes because each segment has some
boilerplate. Therefore, merging data segments is generally the right approach,
especially with wasm where binaries are typically delivered over the network.

However, when analyzing wasm binaries, it can be helpful to get a conservative
picture of which functions are using which data segments[0]. Perhaps there is a
large data segment that you didn't expect to be included in the wasm, introduced
by some library you're using, and you'd like to know which library it was. In
this scenario, merging data segments only makes the analysis worse.

Alternatively, perhaps you will remove some dead functions by-hand[1] that can't
be statically proven dead by the compiler or lld, and removing these functions
might make some data garbage collect-able, and you'd like to run --gc-sections
again so that this now-unused data can be collected. If the segments were
originally merged, then a single use of the merged data segment will entrench
all of the data.

[0] https://github.com/rustwasm/twiggy
[1] https://github.com/fitzgen/wasm-snip

Diff Detail

Repository: rL LLVM

Event Timeline

fitzgen created this revision.May 3 2018, 7:12 PM

Herald added subscribers: llvm-commits, sunfish, JDevlieghere and 2 others. · View Herald TranscriptMay 3 2018, 7:12 PM

I don't know how to write a test for lld -- would appreciate some guidance doing that!

I do have a little test case I have been using:

This is a.c:

char A_HELLO[] = "hello";
char A_GOODBYE[] = "goodbye";
char A_WHATEVER[] = "whatever";
int N_ITERATIONS = 42;

This is b.c:

// b.c
#define WASM_EXPORT __attribute__((visibility("default")))

extern char* A_HELLO;
extern char* A_GOODBYE;
extern char* A_WHATEVER;
extern int N_ITERATIONS;

extern void print(char*);

WASM_EXPORT
void printem() {
    for (int i = 0; i < N_ITERATIONS; i++) {
        print(A_HELLO);
        print(A_GOODBYE);
        print(A_WHATEVER);
    }
}

To build and test:

~/llvm/obj/bin/clang --target=wasm32-unknown-unknown-wasm -c a.c
~/llvm/obj/bin/clang --target=wasm32-unknown-unknown-wasm -c b.c

~/llvm/obj/bin/wasm-ld --allow-undefined -o merged-data-segments.wasm a.o b.o
~/llvm/obj/bin/wasm-ld --allow-undefined --no-merge-data-segments -o no-merged-data-segments.wasm a.o b.o

wasm-objdump -x merged-data-segments.wasm
wasm-objdump -x no-merged-data-segments.wasm

This is the dump for merged-data-segments.wasm:

merged-data-segments.wasm:	file format wasm 0x1

Section Details:

Type:
 - type[0] () -> nil
 - type[1] (i32) -> nil
Import:
 - func[0] sig=0 <_start> <- env._start
 - func[1] sig=1 <print> <- env.print
Function:
 - func[2] sig=0 <__wasm_call_ctors>
 - func[3] sig=0 <printem>
Table:
 - table[0] type=anyfunc initial=1 max=1
Memory:
 - memory[0] pages: initial=2
Global:
 - global[0] i32 mutable=1 - init i32=66592
 - global[1] i32 mutable=0 - init i32=66592
 - global[2] i32 mutable=0 - init i32=1052
Export:
 - memory[0] -> "memory"
 - global[1] -> "__heap_base"
 - global[2] -> "__data_end"
 - func[3] <printem> -> "printem"
Data:
 - segment[0] size=28 - init i32=1024
  - 0000400: 6865 6c6c 6f00 676f 6f64 6279 6500 7768  hello.goodbye.wh
  - 0000410: 6174 6576 6572 0000 2a00 0000            atever..*...
Custom:
 - name: "name"
 - func[0] _start
 - func[1] print
 - func[2] __wasm_call_ctors
 - func[3] printem

And this is the dump for no-merged-data-segments.wasm:

no-merged-data-segments.wasm:	file format wasm 0x1

Section Details:

Type:
 - type[0] () -> nil
 - type[1] (i32) -> nil
Import:
 - func[0] sig=0 <_start> <- env._start
 - func[1] sig=1 <print> <- env.print
Function:
 - func[2] sig=0 <__wasm_call_ctors>
 - func[3] sig=0 <printem>
Table:
 - table[0] type=anyfunc initial=1 max=1
Memory:
 - memory[0] pages: initial=2
Global:
 - global[0] i32 mutable=1 - init i32=66592
 - global[1] i32 mutable=0 - init i32=66592
 - global[2] i32 mutable=0 - init i32=1052
Export:
 - memory[0] -> "memory"
 - global[1] -> "__heap_base"
 - global[2] -> "__data_end"
 - func[3] <printem> -> "printem"
Data:
 - segment[0] size=6 - init i32=1024
  - 0000400: 6865 6c6c 6f00                           hello.
 - segment[1] size=8 - init i32=1030
  - 0000406: 676f 6f64 6279 6500                      goodbye.
 - segment[2] size=9 - init i32=1038
  - 000040e: 7768 6174 6576 6572 00                   whatever.
 - segment[3] size=4 - init i32=1048
  - 0000418: 2a00 0000                                *...
Custom:
 - name: "name"
 - func[0] _start
 - func[1] print
 - func[2] __wasm_call_ctors
 - func[3] printem

Thanks for the patch! I'll take a look at create a test case or two.

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

In D46417#1088769, @sbc100 wrote:

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

I will read up on llvm-lit and write up a test. Thanks!

sbc100 added inline comments.May 4 2018, 5:18 PM

wasm/Writer.cpp
861 ↗	(On Diff #145136)	I did some experimenting with this locally and it looks like the better solution here is to change the above line to `if (!Config->MergeDataSegments)`. Then you don't need the below change.

In D46417#1088769, @sbc100 wrote:

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

It looks like these tests might require llc, llvm-as, etc, but I can't figure out how to enable building them. I get this when I run the tests:

+ ./bin/llvm-lit /home/fitzgen/lld/test/wasm/
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llc in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-as in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-mc in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-nm in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-objdump in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-pdbutil in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-readelf in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-readobj in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find obj2yaml in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find yaml2obj in /home/fitzgen/llvm/obj/./bin
-- Testing: 42 tests, 42 threads --
FAIL: lld :: wasm/call-indirect.ll (1 of 42)
FAIL: lld :: wasm/export-table.test (2 of 42)
FAIL: lld :: wasm/archive.ll (3 of 42)
FAIL: lld :: wasm/custom-sections.ll (4 of 42)
FAIL: lld :: wasm/alias.ll (5 of 42)
FAIL: lld :: wasm/comdats.ll (6 of 42)
FAIL: lld :: wasm/conflict.test (7 of 42)
FAIL: lld :: wasm/entry.ll (8 of 42)
FAIL: lld :: wasm/data-layout.ll (9 of 42)
FAIL: lld :: wasm/cxx-mangling.ll (10 of 42)
FAIL: lld :: wasm/function-imports-first.ll (11 of 42)
FAIL: lld :: wasm/demangle.ll (12 of 42)
FAIL: lld :: wasm/driver.ll (13 of 42)
FAIL: lld :: wasm/invalid-stack-size.test (14 of 42)
FAIL: lld :: wasm/function-imports.ll (15 of 42)
FAIL: lld :: wasm/function-index.test (16 of 42)
FAIL: lld :: wasm/gc-sections.ll (17 of 42)
FAIL: lld :: wasm/import-memory.test (18 of 42)
FAIL: lld :: wasm/export.ll (19 of 42)
FAIL: lld :: wasm/strip-debug.test (20 of 42)
FAIL: lld :: wasm/symbol-type-mismatch.ll (21 of 42)
FAIL: lld :: wasm/reloc-addend.ll (22 of 42)
FAIL: lld :: wasm/locals-duplicate.test (23 of 42)
FAIL: lld :: wasm/undefined-entry.test (24 of 42)
FAIL: lld :: wasm/import-table.test (25 of 42)
FAIL: lld :: wasm/init-fini.ll (26 of 42)
FAIL: lld :: wasm/load-undefined.test (27 of 42)
FAIL: lld :: wasm/gc-imports.ll (28 of 42)
FAIL: lld :: wasm/visibility-hidden.ll (29 of 42)
FAIL: lld :: wasm/relocatable.ll (30 of 42)
FAIL: lld :: wasm/local-symbols.ll (31 of 42)
FAIL: lld :: wasm/weak-symbols.ll (32 of 42)
FAIL: lld :: wasm/undefined-weak-call.ll (33 of 42)
FAIL: lld :: wasm/weak-undefined.ll (34 of 42)
FAIL: lld :: wasm/many-functions.ll (35 of 42)
FAIL: lld :: wasm/version.ll (36 of 42)
FAIL: lld :: wasm/undefined.ll (37 of 42)
FAIL: lld :: wasm/signature-mismatch.ll (38 of 42)
FAIL: lld :: wasm/weak-alias-overide.ll (39 of 42)
FAIL: lld :: wasm/stack-pointer.ll (40 of 42)
FAIL: lld :: wasm/weak-alias.ll (41 of 42)
FAIL: lld :: wasm/signature-mismatch-weak.ll (42 of 42)
Testing Time: 0.14s
********************
Failing Tests (42):
    lld :: wasm/alias.ll
    lld :: wasm/archive.ll
    lld :: wasm/call-indirect.ll
    lld :: wasm/comdats.ll
    lld :: wasm/conflict.test
    lld :: wasm/custom-sections.ll
    lld :: wasm/cxx-mangling.ll
    lld :: wasm/data-layout.ll
    lld :: wasm/demangle.ll
    lld :: wasm/driver.ll
    lld :: wasm/entry.ll
    lld :: wasm/export-table.test
    lld :: wasm/export.ll
    lld :: wasm/function-imports-first.ll
    lld :: wasm/function-imports.ll
    lld :: wasm/function-index.test
    lld :: wasm/gc-imports.ll
    lld :: wasm/gc-sections.ll
    lld :: wasm/import-memory.test
    lld :: wasm/import-table.test
    lld :: wasm/init-fini.ll
    lld :: wasm/invalid-stack-size.test
    lld :: wasm/load-undefined.test
    lld :: wasm/local-symbols.ll
    lld :: wasm/locals-duplicate.test
    lld :: wasm/many-functions.ll
    lld :: wasm/reloc-addend.ll
    lld :: wasm/relocatable.ll
    lld :: wasm/signature-mismatch-weak.ll
    lld :: wasm/signature-mismatch.ll
    lld :: wasm/stack-pointer.ll
    lld :: wasm/strip-debug.test
    lld :: wasm/symbol-type-mismatch.ll
    lld :: wasm/undefined-entry.test
    lld :: wasm/undefined-weak-call.ll
    lld :: wasm/undefined.ll
    lld :: wasm/version.ll
    lld :: wasm/visibility-hidden.ll
    lld :: wasm/weak-alias-overide.ll
    lld :: wasm/weak-alias.ll
    lld :: wasm/weak-symbols.ll
    lld :: wasm/weak-undefined.ll

  Unexpected Failures: 42

Is there some documentation for building those tools you can point me at?

fitzgen updated this revision to Diff 145522.May 7 2018, 1:11 PM

fitzgen edited the summary of this revision. (Show Details)

fitzgen marked an inline comment as done.

In D46417#1090261, @fitzgen wrote:

In D46417#1088769, @sbc100 wrote:

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

It looks like these tests might require llc, llvm-as, etc, but I can't figure out how to enable building them. I get this when I run the tests:

Fixed by running ninja check-lld instead of llvm-lit directly. All the current tests are passing now.

Ok, I've added two tests: one for when we do --no-merge-data-segments and one for when we don't.

Change itself good now.

I think we can simplify/improve the new tests though.

test/wasm/merge-data-segments.test
4 ↗	(On Diff #145538)	You can do the no merge case in the same file with something like this: # wasm-ld --allow-undefined --no-merge-data-segments -o %t.nomerge.wasm %t.data-segments-a.o %t.data-segments-b.o # RUN: obj2yaml %t.wasm \| FileCheck %s -check-prefix=NO-MERGE Also I think we can avoid adding new input file, and just put the bitcode here in the test file. You'll need to rename this to a ".ll" file and use ";" for comments. Also, you can make the bitcode itself minimal? Just declaring the data should be enough no need to for any code I think.

This iteration has a single .ll test file that checks both merging and separate data segments.

fitzgen marked an inline comment as done.May 8 2018, 3:17 PM

Nice! Much cleaner now.

test/wasm/data-segment-merging.ll
6 ↗	(On Diff #145797)	Maybe just call these @a, @b, @c, @d?

This revision is now accepted and ready to land.May 8 2018, 4:39 PM

The test data is now named @a, @b, @c, @d.

Hi Sam,

Assuming everything is ready, can you commit this for me? I don't have commit access.

Thanks!

Closed by commit rL332013: [WebAssembly] Add a flag to control merging data segments (authored by sbc). · Explain WhyMay 10 2018, 11:27 AM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

wasm: Add a flag to control merging data segments
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 146169

lld/trunk/test/wasm/data-segment-merging.ll

lld/trunk/wasm/Config.h

lld/trunk/wasm/Driver.cpp

lld/trunk/wasm/Options.td

lld/trunk/wasm/Writer.cpp

This is an archive of the discontinued LLVM Phabricator instance.

wasm: Add a flag to control merging data segmentsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 146169

lld/trunk/test/wasm/data-segment-merging.ll

lld/trunk/wasm/Config.h

lld/trunk/wasm/Driver.cpp

lld/trunk/wasm/Options.td

lld/trunk/wasm/Writer.cpp

wasm: Add a flag to control merging data segments
ClosedPublic