This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Move .debug_line section address of dead function outside section range
ClosedPublic

Authored by yurydelendik on Jul 17 2018, 12:26 PM.

Details

Summary

Currently we are pointing all debug information that refer removed function code
to the beginning of the code section (offset = 0). A debugger may want to
resolve code offset to the debug information, which will collide with offsets
of the live functions.

Moving offsets of dead functions outside code section range.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

yurydelendik created this revision.Jul 17 2018, 12:26 PM
ruiu added a subscriber: ruiu.Jul 18 2018, 11:01 AM

I believe for other architectures, when the linker emits debug info for a dead function, it treats relocations pointing a dead section as if it were pointing to address zero. Is there any reason to do that differnetly for wasm?

I believe for other architectures, when the linker emits debug info for a dead function, it treats relocations pointing a dead section as if it were pointing to address zero. Is there any reason to do that differnetly for wasm?

The .debug_line section encodes instructions offset in relative form, so the next address advances by delta, and that introduces non-zero addresses for dead code. It is hard to tell if an address that is close to 0 points to dead section or just to the beginning of the code (unless you have relocation sections handy). It will make consumption of the debug information way easier if dead pointers will not point to the same locations where live data can reside.

I'm still not sure I understand why writing zero here is not a good idea.

wasm/InputFiles.cpp
152

Could/Should this be "->isLive()"?

I'm still not sure I understand why writing zero here is not a good idea.

It will be probably easy to demonstrate on the example. When running the test case (test/wasm/debug-removed-fn.ll) without the patch, the .debug_line data contains the following info

0x0000000000000005      1      0      1   0             0  is_stmt
0x0000000000000008      2      3      1   0             0  is_stmt prologue_end
0x0000000000000009      2      3      1   0             0  is_stmt end_sequence
0x0000000000000000      5      0      1   0             0  is_stmt
0x0000000000000003      6      3      1   0             0  is_stmt prologue_end
0x0000000000000004      6      3      1   0             0  is_stmt end_sequence

With patch:

0x0000000000000005      1      0      1   0             0  is_stmt
0x0000000000000008      2      3      1   0             0  is_stmt prologue_end
0x0000000000000009      2      3      1   0             0  is_stmt end_sequence
0x00000000fdead000      5      0      1   0             0  is_stmt
0x00000000fdead003      6      3      1   0             0  is_stmt prologue_end
0x00000000fdead004      6      3      1   0             0  is_stmt end_sequence

Notice that the "bar" function debug data "claims" that the function can be found at addresses 0-4 (without the patch). Then same true for .debug_info.

If the both functions are exported, .debug_line looks like:

0x0000000000000005      1      0      1   0             0  is_stmt
0x0000000000000008      2      3      1   0             0  is_stmt prologue_end
0x0000000000000009      2      3      1   0             0  is_stmt end_sequence
0x000000000000000a      5      0      1   0             0  is_stmt
0x000000000000000d      6      3      1   0             0  is_stmt prologue_end
0x000000000000000e      6      3      1   0             0  is_stmt end_sequence

Does this explain the problem?

yurydelendik added inline comments.Jul 18 2018, 3:11 PM
wasm/InputFiles.cpp
152

InputFunction/InputChunk does not provide such method. Do you want to add this attribute to the InpurtFunction?

I'm still not sure I understand why writing zero here is not a good idea.

It will be probably easy to demonstrate on the example. When running the test case (test/wasm/debug-removed-fn.ll) without the patch, the .debug_line data contains the following info

0x0000000000000005      1      0      1   0             0  is_stmt
0x0000000000000008      2      3      1   0             0  is_stmt prologue_end
0x0000000000000009      2      3      1   0             0  is_stmt end_sequence
0x0000000000000000      5      0      1   0             0  is_stmt
0x0000000000000003      6      3      1   0             0  is_stmt prologue_end
0x0000000000000004      6      3      1   0             0  is_stmt end_sequence

With patch:

0x0000000000000005      1      0      1   0             0  is_stmt
0x0000000000000008      2      3      1   0             0  is_stmt prologue_end
0x0000000000000009      2      3      1   0             0  is_stmt end_sequence
0x00000000fdead000      5      0      1   0             0  is_stmt
0x00000000fdead003      6      3      1   0             0  is_stmt prologue_end
0x00000000fdead004      6      3      1   0             0  is_stmt end_sequence

Notice that the "bar" function debug data "claims" that the function can be found at addresses 0-4 (without the patch). Then same true for .debug_info.

If the both functions are exported, .debug_line looks like:

0x0000000000000005      1      0      1   0             0  is_stmt
0x0000000000000008      2      3      1   0             0  is_stmt prologue_end
0x0000000000000009      2      3      1   0             0  is_stmt end_sequence
0x000000000000000a      5      0      1   0             0  is_stmt
0x000000000000000d      6      3      1   0             0  is_stmt prologue_end
0x000000000000000e      6      3      1   0             0  is_stmt end_sequence

Does this explain the problem?

Ish - but existing debuggers/linkers already tolerate this sort of debug info - so it'd be ideal/better if WAsm did too. Any address derived from zero would be considered invalid/uninteresting. (similarly for the debug_info itself - you'd find similarly confusing data, where functions start at zero and are X bytes long - so technically a bunch of non-zero bytes are covered by that range, but currently it seems existing debuggers/linkers are OK with this contract too).

Ish - but existing debuggers/linkers already tolerate this sort of debug info - so it'd be ideal/better if WAsm did too. Any address derived from zero would be considered invalid/uninteresting. (similarly for the debug_info itself - you'd find similarly confusing data, where functions start at zero and are X bytes long - so technically a bunch of non-zero bytes are covered by that range, but currently it seems existing debuggers/linkers are OK with this contract too).

Currently, the wasm function (dead or alive) never starts from 0. The Sym->Function->getFunctionCodeOffset() returns size of the function size field that is placed before function body. If we want debug info to point strict 0, we can change the DEAD_FUNCTION_OFFSET to be 0 -- the rest of the patch will stay pretty much the same. I just took one step further and moved that completely on of code range -- with other platforms that's happening automatically since PC is almost never close to 0.

FWIW, the algorithm for the detection of dead function debug entries looks like https://github.com/kripken/emscripten/pull/6884/files#diff-d7e06c82407a72c0476db85b2cf5d3deR157 and requires linker to produce compact LEB values for functions size field length.

Ish - but existing debuggers/linkers already tolerate this sort of debug info - so it'd be ideal/better if WAsm did too. Any address derived from zero would be considered invalid/uninteresting. (similarly for the debug_info itself - you'd find similarly confusing data, where functions start at zero and are X bytes long - so technically a bunch of non-zero bytes are covered by that range, but currently it seems existing debuggers/linkers are OK with this contract too).

Currently, the wasm function (dead or alive) never starts from 0. The Sym->Function->getFunctionCodeOffset() returns size of the function size field that is placed before function body. If we want debug info to point strict 0, we can change the DEAD_FUNCTION_OFFSET to be 0 -- the rest of the patch will stay pretty much the same. I just took one step further and moved that completely on of code range -- with other platforms that's happening automatically since PC is almost never close to 0.

FWIW, the algorithm for the detection of dead function debug entries looks like https://github.com/kripken/emscripten/pull/6884/files#diff-d7e06c82407a72c0476db85b2cf5d3deR157 and requires linker to produce compact LEB values for functions size field length.

Guess there's no chance of reusing the existing ELF logic for this? (I assume the ELF linker and WAsm linker are too different for that)

If it's not going to fall out naturally as a consequence of a single shared implementation, then I'm not too fussed with what value you give dead code - I'd still lean towards zero, but I'll leave it to Rui.

sbc100 added inline comments.Aug 21 2018, 11:35 AM
wasm/InputFiles.cpp
152

But Sym does have this, no? See line 134 above? Maybe I'm missing something.

sbc100 added inline comments.Aug 21 2018, 11:42 AM
wasm/InputFiles.cpp
152

You should be able use the same pattern as R_WEBASSEMBLY_MEMORY_ADDR_LEB above I think.

Doing that, should allow you to revert InputFiles.cpp and InputChunks.h making this patch really trivial. We could then consider the DEADFUNC thing as a followup?

Use Sym->isLive; start dead function offset at 0.

yurydelendik marked 3 inline comments as done.Sep 24 2018, 2:17 PM
sbc100 accepted this revision.Sep 24 2018, 4:21 PM
This revision is now accepted and ready to land.Sep 24 2018, 4:21 PM
This revision was automatically updated to reflect the committed changes.