This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add support for table linking to wasm-ld
ClosedPublic

Authored by wingo on Nov 20 2020, 8:02 AM.

Details

Summary

This patch adds support to wasm-ld for linking multiple table references together, in a manner similar to wasm globals. The indirect function table is synthesized as needed.

To manage the transitional period in which the compiler doesn't yet produce TABLE_NUMBER relocations and doesn't residualize table symbols, the linker will detect object files which have table imports or definitions, but no table symbols. In that case it will synthesize symbols for the defined and imported tables.

As a change, relocatable objects are now written with table symbols, which can cause symbol renumbering in some of the tests. If no object file requires an indirect function table, none will be written to the file. Note that for legacy ObjFile inputs, this test is conservative: as we don't have relocs for each use of the indirecy function table, we just assume that any incoming indirect function table should be propagated to the output.

Diff Detail

Event Timeline

wingo created this revision.Nov 20 2020, 8:02 AM
wingo requested review of this revision.Nov 20 2020, 8:02 AM
wingo planned changes to this revision.Nov 20 2020, 8:03 AM

Marking as revisions planned, as it's still a wip. LLD tests not passing yet. Just a savepoint before the weekend ~~

wingo retitled this revision from [WebAssembly] Linker detects need for function table via symbols to [WebAssembly] Add support for table linking to wasm-ld.Nov 20 2020, 8:07 AM
wingo edited the summary of this revision. (Show Details)
wingo updated this revision to Diff 307274.Nov 24 2020, 1:19 AM

Finish implementation, update tests

wingo updated this revision to Diff 307275.Nov 24 2020, 1:20 AM

Update summary

wingo edited the summary of this revision. (Show Details)Nov 24 2020, 1:22 AM
wingo updated this revision to Diff 307279.Nov 24 2020, 1:30 AM
  • Remove unused InputTable "synthetic" flag
wingo updated this revision to Diff 307994.Nov 27 2020, 2:18 AM

Fix a couple missing / bogus places where TABLE_NUMBER relocs were incorrectly handled in the linker

wingo added a comment.Dec 1 2020, 5:58 AM

I will assume that all of the clang-tidy feedback is valid and will adjust accordingly. Otherwise I think this is ready for review @sbc100. If it is too big, I think there may be small pieces I can split off, or it could just make sense as-is.

Maybe there are two logical parts we can separate here:

  1. Support for first class tables in input files
  1. Support for the special __indirect_function_table

I guess they are somewhat intertwined by I can imagine landing (1) without changing the way we handle the special table 0. Maybe its not worth the work to split up?

lld/wasm/Driver.cpp
791

We use a slightly different naming convention in lld.

I think the plan is to transition all of llvm eventually to this style and we are trying it out. So locals and parameters start with lowercase, but are otherwise camelcase.

lld/wasm/InputFiles.cpp
542

Could we make this separate function just to split it up?

How about:

if (!haveTableSymbol)
  synthesizeTableSymbols();
lld/wasm/SymbolTable.h
110 ↗(On Diff #307994)

The other globally-known symbols are kept in Symbols.h:WasmSym .. seems like maybe this should go there too?

lld/wasm/Symbols.cpp
19

Why this new include?

sbc100 added inline comments.Dec 1 2020, 12:04 PM
lld/wasm/SyntheticSections.h
370

This looks unused.

lld/wasm/Writer.cpp
825

This last seems a little out of place since its not assigning indexes. Would this make more sense alongside wherever out.elemSec is populated once we know its complete?

sbc100 added inline comments.Dec 1 2020, 12:13 PM
lld/wasm/SyntheticSections.h
153–154

How is (2) handled after this change?

Imagine a program that contains just this code:

EXPORT_OR_OTHERWISE_KEEP_ALIVE
 void call_func(func_ptr f) {
    f()
}

Now we a call_indirect instruction but no TABLE_INDEX relocations.

In fact we have no relocations at all, but the resulting program needs to have a table in order to validate.

I guess once you move to new relocation model to call_indirect this program *will* have a TABLE_NUMBER relocation. But we need to support the case where this code was compiled with the old model too.

The change to lld/test/wasm/stack-pointer.ll looks a little suspicious because it removes the table from the object file which would potentially make it fail to validate.

wingo added inline comments.Dec 2 2020, 8:01 AM
lld/wasm/SyntheticSections.h
153–154

Humm this is a head-scratcher! I think https://reviews.llvm.org/D91637 may have introduced a bad state.

Backing up -- The intention with the change here is that any object file that needs an indirect function table should either

  • have a TABLE_NUMBER reloc against the table ("new" object file with call_indirect)
  • have the table marked as no-strip ("new" object file, caused by assembler recording a TABLE_INDEX reloc)
  • or, if the incoming object file is "old" (has table import or definition but no table symbols), then those tables are treated as having no-strip symbols.

For new files, this works fine. For very old files, also fine -- they always have an __indirect_function_table import. But since https://reviews.llvm.org/D91637, there can be object files which refer to __indirect_function_table via call_indirect, but which don't have TABLE_INDEX relocs. In that case the object file will lack the __indirect_function_table import.

The problem is mitigated in that if any other linker input imports __indirect_function_table, then the result is an output file that validates.

wingo marked 4 inline comments as done.Dec 8 2020, 1:54 AM

New patch addresses micro-feedback, but still thinking if a patch split is possible. Probably also want to land a precursor also to ensure that the compiler produces __indirect_function_table imports for object files with call_indirect but no TABLE_INDEX relocs.

lld/wasm/Symbols.cpp
19

We can't fit a WasmTableType in a TableSymbol without growing the symbol size, so instead TableSymbol has a WasmTableType*, with the precondition that the symbol's lifespan is a subset of the WasmTableType's lifespan.

But in TableSymbol::setLimits, we can't just fiddle the limits in place, because they are embedded in a WasmTableType that doesn't belong to the symbol; we need to allocate a fresh table type in the arena. So Memory.h allow us to allocate a new WasmTableType using make<>.

Did I get the memory management right? Feedback very welcome.

lld/wasm/Writer.cpp
825

Sure. I tried initially putting it in TableSection::finalizeContents but of course this symbol might be imported, so I instead arranged to call it just after populating elemSec.

wingo updated this revision to Diff 310112.Dec 8 2020, 1:54 AM
wingo marked an inline comment as done.

Address feedback

wingo edited the summary of this revision. (Show Details)Dec 8 2020, 1:59 AM
wingo marked an inline comment as done.Dec 8 2020, 6:10 AM

Maybe there are two logical parts we can separate here:

  1. Support for first class tables in input files
  1. Support for the special __indirect_function_table

I guess they are somewhat intertwined by I can imagine landing (1) without changing the way we handle the special table 0. Maybe its not worth the work to split up?

For what it is worth, this was the first tack that I took, but I couldn't find a useful intermediate point where we still had consistent files. Simple things like whether __indirect_function_table is imported, exported, or neither end up propagating into special cases all over the place -- at least that was my conclusion at the time and I think it might still hold. It seemed cleaner to me to make something more conventional using symbols and relocs like everything else in the linker, and then adapt the current input files to that newer more conventional machinery. Dunno. WDYT?

wingo planned changes to this revision.Jan 5 2021, 3:14 AM

Will split this one.

wingo updated this revision to Diff 314554.Jan 5 2021, 3:15 AM

Split from D94075

wingo updated this revision to Diff 316598.Jan 14 2021, 2:23 AM

Rebase to include synthetic tables

wingo added a comment.Jan 14 2021, 4:50 AM

I think this one is ready for review @sbc100, when you have a moment :)

Nice. I like that we were able to delete almost as much code as was added here.

Just a few more comments but mostly lgtm.

lld/wasm/Driver.cpp
791

provide seems like a slightly odd verb. (Or is it just me?) How about calling this createIndirectFunctionTable or defineIndirectFunctionTable?

Or maybe creatDefinedIndirectFunctionTable to match the function below?

805

I don't think it ever make sense to pass StringRef by reference does it?

822

I guess this is existing behaviour. However I wonder if it would be better (in the future) to have --import-table and --export-table only effect the table if exists?

823

Can you just use functionTableName directly below rather than adding this local?

lld/wasm/InputFiles.cpp
312

Could you add a comment about the purpose of this function either in the header or here. IIUC this can be removed if ever drop support for older objects files, is that right?

313

Hoa

322

In other places we use saver which is StringSaver defined in lld/include/lld/Common/Memory.h

491

Ah.. I see you added a comment here rather than in the definition. Seems reasonable.

lld/wasm/SymbolTable.h
108 ↗(On Diff #316598)

IIUC this is only need to support legacy object files? Perhaps add a TODO or a note here that to that effect?

lld/wasm/Writer.cpp
775

Can/should we call this after scanRelocations in the caller?

There are many other things that need to happen only after relocations are scanned. I don't see why this one should inlined, unless I'm missing something?

wingo updated this revision to Diff 316666.Jan 14 2021, 8:05 AM
wingo marked 9 inline comments as done.

Address feedback

lld/wasm/Driver.cpp
805

Good point :)

822

Good question. Is it reasonable to make the rest of the toolchain inspect the output module to see if it actually needed a table? My initial thought was "well they asked for it, let's give them what they expect" but perhaps this is not the right thing.

lld/wasm/InputFiles.cpp
312

Good point! And yes it could be dropped in some future.

313

What does this mean? :)

322

Good tip, thanks. Though, I think the comment was actually erroneous; shouldn't the name be just (e.g.) "__indirect_function_table" ?

491

The updated patch includes a comment both places; lmk if i should remove one / change something / etc.

lld/wasm/SymbolTable.h
108 ↗(On Diff #316598)

I think not, actually -- before, there was some custom logic to reify an __indirect_function_table as part of the writer. Now, for all files, if an indirect function table definition is needed, it will be made with respect to a synthesized InputTable. So this vector will only ever have 1 or 0 elements.

lld/wasm/Writer.cpp
775

Sure, no prob.

sbc100 accepted this revision.Jan 14 2021, 8:10 AM
This revision is now accepted and ready to land.Jan 14 2021, 8:10 AM

Would you mind formatting the commit message to wrap at 80 chars? I know llvm commits are bit all over the place on the style used but I find wrapped commit message easier to read myself.

Thanks for the review!

Would you mind formatting the commit message to wrap at 80 chars? I know llvm commits are bit all over the place on the style used but I find wrapped commit message easier to read myself.

Yeah no problem -- actually the local commit message is wrapped, I just unwrapped it for phab. Incidentally I find it weird that we can land patches from local commits where the message doesn't have to correspond with what was in phab!

This revision was automatically updated to reflect the committed changes.

This seems to have a bug that we discovered on the emscripten builders where it's exporting the table twice.
The output (from wasm-objdump):

Export[14]:
 - memory[0] -> "memory"
 - table[0] -> "__indirect_function_table"
 - func[3] <__wasm_call_ctors> -> "__wasm_call_ctors"
 - func[5] <main> -> "main"
 - table[0] -> "__indirect_function_table"
 - func[6] <stackSave> -> "stackSave"
 - func[7] <stackRestore> -> "stackRestore"
 - func[8] <stackAlloc> -> "stackAlloc"
 - func[9] <emscripten_stack_init> -> "emscripten_stack_init"
 - func[10] <emscripten_stack_get_free> -> "emscripten_stack_get_free"
 - func[11] <emscripten_stack_get_end> -> "emscripten_stack_get_end"
 - func[12] <saveSetjmp> -> "saveSetjmp"
 - func[13] <testSetjmp> -> "testSetjmp"
 - func[14] <setThrew> -> "setThrew"

I'll see if I can find a more useful reproducer

it's funny, since this CL doesn't really modify the export behavior.
I'm going to go ahead and revert it though ("revert first and ask questions later" being the usual LLVM practice), until we can figure out what's happening.

Thanks for the sherriffing, I'll get on it :)

wingo reopened this revision.Jan 18 2021, 1:34 AM

Reopening, as that will let the diff to the previously landed patch be legible; lmk if you prefer a new rev.

This revision is now accepted and ready to land.Jan 18 2021, 1:34 AM
wingo updated this revision to Diff 317287.Jan 18 2021, 1:55 AM

Fix double-export of __indirect_function_table

wingo requested review of this revision.Jan 18 2021, 1:58 AM

PTAL @sbc100 -- the problem was that I had neglected to remove some special logic regarding export of the indirect function table. Instead it's now handled in the usual way, as a flag on the symbol.

Added a test that succeeds either way with the current patch, but which starts failing once MC writes table symbols in https://reviews.llvm.org/D92215 with the older version of this patch.

Tested llvm-lit on MC, CodeGen, and LLD with this change with the whole patch series, and all pass.

sbc100 accepted this revision.Jan 18 2021, 7:02 AM

Thanks andy! Still lgtm.

I think re-using the same review makes sense (although I don't know the conventions/rules are around that).

lld/test/wasm/export-table-test-2.test
1 ↗(On Diff #317287)

Can we come up with a better name for this new test? I don't have any great idea but just calling it "-test-2" doesn't seem useful.

How about export-table-explicit.test? Since this test includes and explicit declaration of the __indirect_function_table?

This revision is now accepted and ready to land.Jan 18 2021, 7:02 AM
wingo updated this revision to Diff 317352.Jan 18 2021, 7:20 AM

Rename new test to export-table-explicit.test

wingo marked an inline comment as done.Jan 18 2021, 7:21 AM
wingo added inline comments.
lld/test/wasm/export-table-test-2.test
1 ↗(On Diff #317287)

Good idea; done :)

This revision was automatically updated to reflect the committed changes.
wingo marked an inline comment as done.

It looks like this change once again broke some emscripten tests. This time it made it past our auto-rollers because they run the tests in questions.

The only real issue looks like a crash:

https://app.circleci.com/pipelines/github/emscripten-core/emscripten/10572/workflows/c3d06117-e1a6-4c55-95b0-80333209ad10

See the browsers tests log:

https://circleci.com/api/v1.1/project/github/emscripten-core/emscripten/388624/output/107/0?file=true&allocation-id=6006f0022f4f9c06d96ced0d-0-build%2F7C3546A8

/tmp/emscripten_test_browser_ixzsc8j6/building/glbook/Chapter_2/Hello_Triangle/CH02_HelloTriangle.o
wasm-ld: /b/s/w/ir/cache/builder/emscripten-releases/llvm-project/lld/wasm/SymbolTable.cpp:278: lld::wasm::DefinedTable *lld::wasm::SymbolTable::addSyntheticTable(llvm::StringRef, uint32_t, lld::wasm::InputTable *): Assertion `!s || s->isUndefined()' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /root/emsdk/upstream/bin/wasm-ld -o test.wasm /tmp/emscripten_test_browser_ixzsc8j6/building/glbook/Chapter_2/Hello_Triangle/CH02_HelloTriangle.o /tmp/emscripten_temp_eu76mn7s/report_result_0.o -L/root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libcompiler_rt.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc++-noexcept.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc++abi-noexcept.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libdlmalloc.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc_rt_wasm.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libsockets.a -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --allow-undefined --strip-debug --export-table --export main --export emscripten_stack_get_end --export emscripten_stack_get_free --export emscripten_stack_init --export stackSave --export stackRestore --export stackAlloc --export __wasm_call_ctors --export fflush --export __errno_location -z stack-size=5242880 --initial-memory=16777216 --no-entry --max-memory=16777216 --global-base=1024
 #0 0x00007f93c0d537f3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/root/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x84f7f3)
 #1 0x00007f93c0d5153e llvm::sys::RunSignalHandlers() (/root/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x84d53e)
 #2 0x00007f93c0d53cbf SignalHandler(int) (/root/emsdk/upstream/bin/../lib/libLLVM-12git.so+0x84fcbf)
 #3 0x00007f93c43a6890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
 #4 0x00007f93bfb9be97 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3ee97)
 #5 0x00007f93bfb9d801 abort (/lib/x86_64-linux-gnu/libc.so.6+0x40801)
 #6 0x00007f93bfb8d39a (/lib/x86_64-linux-gnu/libc.so.6+0x3039a)
 #7 0x00007f93bfb8d412 (/lib/x86_64-linux-gnu/libc.so.6+0x30412)
 #8 0x00000000007caba2 (/root/emsdk/upstream/bin/wasm-ld+0x7caba2)
 #9 0x00000000007b43f3 lld::wasm::(anonymous namespace)::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/root/emsdk/upstream/bin/wasm-ld+0x7b43f3)
#10 0x00000000007ad836 lld::wasm::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) (/root/emsdk/upstream/bin/wasm-ld+0x7ad836)
#11 0x000000000048df76 lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) (/root/emsdk/upstream/bin/wasm-ld+0x48df76)
#12 0x000000000048db00 main (/root/emsdk/upstream/bin/wasm-ld+0x48db00)
#13 0x00007f93bfb7eb97 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b97)
#14 0x000000000048d7da _start (/root/emsdk/upstream/bin/wasm-ld+0x48d7da)
emcc: error: '/root/emsdk/upstream/bin/wasm-ld -o test.wasm /tmp/emscripten_test_browser_ixzsc8j6/building/glbook/Chapter_2/Hello_Triangle/CH02_HelloTriangle.o /tmp/emscripten_temp_eu76mn7s/report_result_0.o -L/root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libcompiler_rt.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc++-noexcept.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc++abi-noexcept.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libdlmalloc.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libc_rt_wasm.a /root/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libsockets.a -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --allow-undefined --strip-debug --export-table --export main --export emscripten_stack_get_end --export emscripten_stack_get_free --export emscripten_stack_init --export stackSave --export stackRestore --export stackAlloc --export __wasm_call_ctors --export fflush --export __errno_location -z stack-size=5242880 --initial-memory=16777216 --no-entry --max-memory=16777216 --global-base=1024' failed (-6)

I'm pretty sure thse test uses partial linking (wasm-ld -r) so I guess that is the problem. We are probably generating the __indirect_function_table in the relocatable case when we should no be. It should only be the final link to an executable that generates it, right?

wingo added a comment.Jan 19 2021, 8:17 AM

Sadly I need to run out but can look again early in the morning. I am not sure how relocatable results should work; if the intention is that they need to pass a subsequent final link pass, then I think it's reasonable to want the indirect function table to only be built by the final link, and not in the--relocatable case. Feel very free to back out patches as necessary.

I see the bug was fixed in https://reviews.llvm.org/D94993. That patch LGTM, fwiw. Thank you!