This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Fixed objdump not parsing function headers.
ClosedPublic

Authored by aardappel on Jan 14 2019, 2:57 PM.

Details

Summary

objdump was interpreting the function header containing the locals
declaration as instructions. To parse these without injecting target
specific code in objdump, MCDisassembler::onSymbolStart was added to
be implemented by the WebAssembly implemention.

WasmObjectFile now returns a code offset for the "address" of a symbol,
rather than the index. This is also more in-line with what other
targets do.

Also ensured that the AsmParser correctly puts each function
in its own segment to enable this test case.

Diff Detail

Repository
rL LLVM

Event Timeline

aardappel created this revision.Jan 14 2019, 2:57 PM
aardappel updated this revision to Diff 181658.Jan 14 2019, 3:30 PM

rebased / new instruction names.

Generally LGTM.

Previously objdump had listed "function index" for address of function symbols because in the wasm world the address of a function is its index. But I guess listing code section offset is fine too.

This change will require a corresponding lld-side change.

include/llvm/MC/MCDisassembler/MCDisassembler.h
91 ↗(On Diff #181658)

actual bytes at the symbol location?

include/llvm/Object/Wasm.h
225 ↗(On Diff #181658)

Do we still need to non-const version?

lib/MC/MCParser/WasmAsmParser.cpp
96 ↗(On Diff #181658)

This seems unrelated? Separate change maybe?

lib/Object/WasmObjectFile.cpp
1016 ↗(On Diff #181658)

Seems strange to have to duplicate the body here.

lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
302 ↗(On Diff #181658)

Separate change?

lib/Target/WebAssembly/Disassembler/LLVMBuild.txt
22 ↗(On Diff #181658)

What made this necessary?

lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
133 ↗(On Diff #181658)

But will any symbol ever point to this address?

181 ↗(On Diff #181658)

Separate bug fix?

aardappel marked 9 inline comments as done.Jan 14 2019, 4:24 PM
include/llvm/Object/Wasm.h
225 ↗(On Diff #181658)

Yup I checked, it is in use.

lib/MC/MCParser/WasmAsmParser.cpp
96 ↗(On Diff #181658)

This change is required for the tests to work (see commit message)

lib/Object/WasmObjectFile.cpp
1016 ↗(On Diff #181658)

Afaik having one call the other requires a const_cast which is a bit awkward?

lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
302 ↗(On Diff #181658)

I personally feel small "drive by" fixes are ok. I could have also left this type out of the test but felt this was better.

lib/Target/WebAssembly/Disassembler/LLVMBuild.txt
22 ↗(On Diff #181658)

The disassembler calls WebAssembly::anyTypeToString which I didn't want to duplicate. And the WebAssemblyAsmPrinter library is really small, just a single small .cpp

lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
133 ↗(On Diff #181658)

objdump automatically generates a symbol at address 0 if there isn't one. Rather than adding more wasm specifics into the main objdump code to prevent this, I made use of that fact to have it parse just the function count.

181 ↗(On Diff #181658)

This isn't even a bug, it's just a refactoring (making an argument non-default).

aardappel updated this revision to Diff 181670.Jan 14 2019, 4:43 PM
aardappel marked an inline comment as done.

code review fixes

sbc100 accepted this revision.Jan 14 2019, 4:45 PM
This revision is now accepted and ready to land.Jan 14 2019, 4:45 PM

Is this actually what we want?
If I'm debugging a crash in a browser, the browser shows stack traces by function index rather than file offset, in this format
${url}:wasm-function[${funcIndex}]:${pcOffset}
(see https://webassembly.github.io/spec/web-api/index.html#conventions)
So it might be better for objdump to match that format.

Is this change just for consistency (between targets) of printing, or is there another reason why it's better to have "addresses" be offsets rather than indices?

Oh I had actually forgotten that the PC offset is the offset in the module, not the offset within the function. So for objdump either way would be ok, since you could easily find what function a particular instruction is in. I guess my point is not that the formats have to match, (since e.g. you probably wouldn't be using the same code to parse both browser output and tool output?) but if you are debugging and looking at a stack trace, you should easily be able to find the same location in an objdump-produced assembly dump.... Without a calculator (annoyance of mismatched hex/dec address formats was what originally led me to add that section to the spec :) )

@dschuff This change only affects how internally symbol.getAddress works, which is needed for objdump disassembly to work at all, and what is reported for objdump symbol tables.. it should not affect what you see in a stack trace.

I can also not make this change, but then I'll need further wasm specifics in the general objdump and object code. For example I could introduce a new getOffset or whatever, that does the same as getAddress on most targets but has a different meaning in wasm. Then I could leave getAddress as is and objdump symbol table output would not change as it does now.

Yeah, thinking about this a little more, I think it makes sense. IIRC the "address" (or value of a symbol, which I think is equivalent?) in the object file/linking/executable context refers to the section offset rather than memory-mapped address even for ELF, which would make our objdump consistent with other targets too. For the purposes of LLVM internals, I think you are right that having that consistency seems best; for the users, it's possible we'll want to put some more info in the final displayed output, but we can figure out how to get that once the guts are hooked up.

Ok, sounds like we're good to land this then..

dschuff accepted this revision.Jan 17 2019, 9:32 AM
This revision was automatically updated to reflect the committed changes.