This is an archive of the discontinued LLVM Phabricator instance.

wasm: Add a flag to control merging data segments
ClosedPublic

Authored by fitzgen on May 3 2018, 7:12 PM.

Details

Summary

Merging data segments produces smaller code sizes because each segment has some
boilerplate. Therefore, merging data segments is generally the right approach,
especially with wasm where binaries are typically delivered over the network.

However, when analyzing wasm binaries, it can be helpful to get a conservative
picture of which functions are using which data segments[0]. Perhaps there is a
large data segment that you didn't expect to be included in the wasm, introduced
by some library you're using, and you'd like to know which library it was. In
this scenario, merging data segments only makes the analysis worse.

Alternatively, perhaps you will remove some dead functions by-hand[1] that can't
be statically proven dead by the compiler or lld, and removing these functions
might make some data garbage collect-able, and you'd like to run --gc-sections
again so that this now-unused data can be collected. If the segments were
originally merged, then a single use of the merged data segment will entrench
all of the data.

[0] https://github.com/rustwasm/twiggy
[1] https://github.com/fitzgen/wasm-snip

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

fitzgen created this revision.May 3 2018, 7:12 PM

I don't know how to write a test for lld -- would appreciate some guidance doing that!

I do have a little test case I have been using:

This is a.c:

char A_HELLO[] = "hello";
char A_GOODBYE[] = "goodbye";
char A_WHATEVER[] = "whatever";
int N_ITERATIONS = 42;

This is b.c:

// b.c
#define WASM_EXPORT __attribute__((visibility("default")))

extern char* A_HELLO;
extern char* A_GOODBYE;
extern char* A_WHATEVER;
extern int N_ITERATIONS;

extern void print(char*);

WASM_EXPORT
void printem() {
    for (int i = 0; i < N_ITERATIONS; i++) {
        print(A_HELLO);
        print(A_GOODBYE);
        print(A_WHATEVER);
    }
}

To build and test:

~/llvm/obj/bin/clang --target=wasm32-unknown-unknown-wasm -c a.c
~/llvm/obj/bin/clang --target=wasm32-unknown-unknown-wasm -c b.c

~/llvm/obj/bin/wasm-ld --allow-undefined -o merged-data-segments.wasm a.o b.o
~/llvm/obj/bin/wasm-ld --allow-undefined --no-merge-data-segments -o no-merged-data-segments.wasm a.o b.o

wasm-objdump -x merged-data-segments.wasm
wasm-objdump -x no-merged-data-segments.wasm

This is the dump for merged-data-segments.wasm:

merged-data-segments.wasm:	file format wasm 0x1

Section Details:

Type:
 - type[0] () -> nil
 - type[1] (i32) -> nil
Import:
 - func[0] sig=0 <_start> <- env._start
 - func[1] sig=1 <print> <- env.print
Function:
 - func[2] sig=0 <__wasm_call_ctors>
 - func[3] sig=0 <printem>
Table:
 - table[0] type=anyfunc initial=1 max=1
Memory:
 - memory[0] pages: initial=2
Global:
 - global[0] i32 mutable=1 - init i32=66592
 - global[1] i32 mutable=0 - init i32=66592
 - global[2] i32 mutable=0 - init i32=1052
Export:
 - memory[0] -> "memory"
 - global[1] -> "__heap_base"
 - global[2] -> "__data_end"
 - func[3] <printem> -> "printem"
Data:
 - segment[0] size=28 - init i32=1024
  - 0000400: 6865 6c6c 6f00 676f 6f64 6279 6500 7768  hello.goodbye.wh
  - 0000410: 6174 6576 6572 0000 2a00 0000            atever..*...
Custom:
 - name: "name"
 - func[0] _start
 - func[1] print
 - func[2] __wasm_call_ctors
 - func[3] printem

And this is the dump for no-merged-data-segments.wasm:

no-merged-data-segments.wasm:	file format wasm 0x1

Section Details:

Type:
 - type[0] () -> nil
 - type[1] (i32) -> nil
Import:
 - func[0] sig=0 <_start> <- env._start
 - func[1] sig=1 <print> <- env.print
Function:
 - func[2] sig=0 <__wasm_call_ctors>
 - func[3] sig=0 <printem>
Table:
 - table[0] type=anyfunc initial=1 max=1
Memory:
 - memory[0] pages: initial=2
Global:
 - global[0] i32 mutable=1 - init i32=66592
 - global[1] i32 mutable=0 - init i32=66592
 - global[2] i32 mutable=0 - init i32=1052
Export:
 - memory[0] -> "memory"
 - global[1] -> "__heap_base"
 - global[2] -> "__data_end"
 - func[3] <printem> -> "printem"
Data:
 - segment[0] size=6 - init i32=1024
  - 0000400: 6865 6c6c 6f00                           hello.
 - segment[1] size=8 - init i32=1030
  - 0000406: 676f 6f64 6279 6500                      goodbye.
 - segment[2] size=9 - init i32=1038
  - 000040e: 7768 6174 6576 6572 00                   whatever.
 - segment[3] size=4 - init i32=1048
  - 0000418: 2a00 0000                                *...
Custom:
 - name: "name"
 - func[0] _start
 - func[1] print
 - func[2] __wasm_call_ctors
 - func[3] printem
sbc100 added a comment.May 4 2018, 4:28 PM

Thanks for the patch! I'll take a look at create a test case or two.

sbc100 added a comment.May 4 2018, 4:51 PM

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

I will read up on llvm-lit and write up a test. Thanks!

sbc100 added inline comments.May 4 2018, 5:18 PM
wasm/Writer.cpp
861

I did some experimenting with this locally and it looks like the better solution here is to change the above line to if (!Config->MergeDataSegments).

Then you don't need the below change.

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

It looks like these tests might require llc, llvm-as, etc, but I can't figure out how to enable building them. I get this when I run the tests:

+ ./bin/llvm-lit /home/fitzgen/lld/test/wasm/
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llc in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-as in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-mc in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-nm in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-objdump in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-pdbutil in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-readelf in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find llvm-readobj in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find obj2yaml in /home/fitzgen/llvm/obj/./bin
llvm-lit: /home/fitzgen/llvm/utils/lit/lit/llvm/subst.py:126: note: Did not find yaml2obj in /home/fitzgen/llvm/obj/./bin
-- Testing: 42 tests, 42 threads --
FAIL: lld :: wasm/call-indirect.ll (1 of 42)
FAIL: lld :: wasm/export-table.test (2 of 42)
FAIL: lld :: wasm/archive.ll (3 of 42)
FAIL: lld :: wasm/custom-sections.ll (4 of 42)
FAIL: lld :: wasm/alias.ll (5 of 42)
FAIL: lld :: wasm/comdats.ll (6 of 42)
FAIL: lld :: wasm/conflict.test (7 of 42)
FAIL: lld :: wasm/entry.ll (8 of 42)
FAIL: lld :: wasm/data-layout.ll (9 of 42)
FAIL: lld :: wasm/cxx-mangling.ll (10 of 42)
FAIL: lld :: wasm/function-imports-first.ll (11 of 42)
FAIL: lld :: wasm/demangle.ll (12 of 42)
FAIL: lld :: wasm/driver.ll (13 of 42)
FAIL: lld :: wasm/invalid-stack-size.test (14 of 42)
FAIL: lld :: wasm/function-imports.ll (15 of 42)
FAIL: lld :: wasm/function-index.test (16 of 42)
FAIL: lld :: wasm/gc-sections.ll (17 of 42)
FAIL: lld :: wasm/import-memory.test (18 of 42)
FAIL: lld :: wasm/export.ll (19 of 42)
FAIL: lld :: wasm/strip-debug.test (20 of 42)
FAIL: lld :: wasm/symbol-type-mismatch.ll (21 of 42)
FAIL: lld :: wasm/reloc-addend.ll (22 of 42)
FAIL: lld :: wasm/locals-duplicate.test (23 of 42)
FAIL: lld :: wasm/undefined-entry.test (24 of 42)
FAIL: lld :: wasm/import-table.test (25 of 42)
FAIL: lld :: wasm/init-fini.ll (26 of 42)
FAIL: lld :: wasm/load-undefined.test (27 of 42)
FAIL: lld :: wasm/gc-imports.ll (28 of 42)
FAIL: lld :: wasm/visibility-hidden.ll (29 of 42)
FAIL: lld :: wasm/relocatable.ll (30 of 42)
FAIL: lld :: wasm/local-symbols.ll (31 of 42)
FAIL: lld :: wasm/weak-symbols.ll (32 of 42)
FAIL: lld :: wasm/undefined-weak-call.ll (33 of 42)
FAIL: lld :: wasm/weak-undefined.ll (34 of 42)
FAIL: lld :: wasm/many-functions.ll (35 of 42)
FAIL: lld :: wasm/version.ll (36 of 42)
FAIL: lld :: wasm/undefined.ll (37 of 42)
FAIL: lld :: wasm/signature-mismatch.ll (38 of 42)
FAIL: lld :: wasm/weak-alias-overide.ll (39 of 42)
FAIL: lld :: wasm/stack-pointer.ll (40 of 42)
FAIL: lld :: wasm/weak-alias.ll (41 of 42)
FAIL: lld :: wasm/signature-mismatch-weak.ll (42 of 42)
Testing Time: 0.14s
********************
Failing Tests (42):
    lld :: wasm/alias.ll
    lld :: wasm/archive.ll
    lld :: wasm/call-indirect.ll
    lld :: wasm/comdats.ll
    lld :: wasm/conflict.test
    lld :: wasm/custom-sections.ll
    lld :: wasm/cxx-mangling.ll
    lld :: wasm/data-layout.ll
    lld :: wasm/demangle.ll
    lld :: wasm/driver.ll
    lld :: wasm/entry.ll
    lld :: wasm/export-table.test
    lld :: wasm/export.ll
    lld :: wasm/function-imports-first.ll
    lld :: wasm/function-imports.ll
    lld :: wasm/function-index.test
    lld :: wasm/gc-imports.ll
    lld :: wasm/gc-sections.ll
    lld :: wasm/import-memory.test
    lld :: wasm/import-table.test
    lld :: wasm/init-fini.ll
    lld :: wasm/invalid-stack-size.test
    lld :: wasm/load-undefined.test
    lld :: wasm/local-symbols.ll
    lld :: wasm/locals-duplicate.test
    lld :: wasm/many-functions.ll
    lld :: wasm/reloc-addend.ll
    lld :: wasm/relocatable.ll
    lld :: wasm/signature-mismatch-weak.ll
    lld :: wasm/signature-mismatch.ll
    lld :: wasm/stack-pointer.ll
    lld :: wasm/strip-debug.test
    lld :: wasm/symbol-type-mismatch.ll
    lld :: wasm/undefined-entry.test
    lld :: wasm/undefined-weak-call.ll
    lld :: wasm/undefined.ll
    lld :: wasm/version.ll
    lld :: wasm/visibility-hidden.ll
    lld :: wasm/weak-alias-overide.ll
    lld :: wasm/weak-alias.ll
    lld :: wasm/weak-symbols.ll
    lld :: wasm/weak-undefined.ll

  Unexpected Failures: 42

Is there some documentation for building those tools you can point me at?

fitzgen updated this revision to Diff 145522.May 7 2018, 1:11 PM
fitzgen edited the summary of this revision. (Show Details)
fitzgen marked an inline comment as done.

FYI the tests are pretty simple. They live in 'test/wasm' and can be run using llvm-lit. The command I normally run is ninja && ./bin/llvm-lit ../llvm/tools/lld/test

It looks like these tests might require llc, llvm-as, etc, but I can't figure out how to enable building them. I get this when I run the tests:

Fixed by running ninja check-lld instead of llvm-lit directly. All the current tests are passing now.

fitzgen updated this revision to Diff 145538.May 7 2018, 1:41 PM

Ok, I've added two tests: one for when we do --no-merge-data-segments and one for when we don't.

sbc100 added a comment.May 7 2018, 2:05 PM

Change itself good now.

I think we can simplify/improve the new tests though.

test/wasm/merge-data-segments.test
4

You can do the no merge case in the same file with something like this:

# wasm-ld --allow-undefined --no-merge-data-segments -o %t.nomerge.wasm %t.data-segments-a.o %t.data-segments-b.o
​# RUN: obj2yaml %t.wasm | FileCheck %s -check-prefix=NO-MERGE

Also I think we can avoid adding new input file, and just put the bitcode here in the test file. You'll need to rename this to a ".ll" file and use ";" for comments.

Also, you can make the bitcode itself minimal? Just declaring the data should be enough no need to for any code I think.

fitzgen updated this revision to Diff 145797.May 8 2018, 3:17 PM

This iteration has a single .ll test file that checks both merging and separate data segments.

fitzgen marked an inline comment as done.May 8 2018, 3:17 PM
sbc100 accepted this revision.May 8 2018, 4:39 PM

Nice! Much cleaner now.

test/wasm/data-segment-merging.ll
6 ↗(On Diff #145797)

Maybe just call these @a, @b, @c, @d?

This revision is now accepted and ready to land.May 8 2018, 4:39 PM
fitzgen updated this revision to Diff 145827.May 8 2018, 5:33 PM

The test data is now named @a, @b, @c, @d.

fitzgen marked an inline comment as done.May 8 2018, 5:34 PM

Hi Sam,

Assuming everything is ready, can you commit this for me? I don't have commit access.

Thanks!

This revision was automatically updated to reflect the committed changes.