This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Initial Disassembler.
ClosedPublic

Authored by aardappel on Apr 19 2018, 4:25 PM.

Details

Summary

This implements a new table-gen emitter to create tables for
a wasm disassembler, and a dissassembler to use them.

Comes with 2 tests, that tests a few instructions manually. Is also able to
disassemble large .wasm files with objdump reasonably.

Not working so well, to be addressed in followups:

  • objdump appears to be passing an incorrect starting point.
  • since the disassembler works an instruction at a time, and it is disassembling stack instruction, it has no idea of pseudo register assignments. These registers are required for the instruction printing code that follows. For now, all such registers appear in the output as $0.

Diff Detail

Repository
rL LLVM

Event Timeline

aardappel created this revision.Apr 19 2018, 4:25 PM
aardappel updated this revision to Diff 145519.May 7 2018, 12:47 PM

This actually implements the disassembler, some parts not working
ideally, see comments.

sbc100 added a comment.May 7 2018, 2:34 PM

Is this ready to be submitted now?

lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
70 ↗(On Diff #145519)

Mark these internal functions as static?

70 ↗(On Diff #145519)

Does it make sense to pass an arrayref by ref?

82 ↗(On Diff #145519)

I general I think llvm style says not to use auto unless is immediately obvious what the type is (i.g. if there is a cast on the right hand side which already names the type).

I also wouldn't use auto for start above.

89 ↗(On Diff #145519)

decodeULEB128 functions already detect malformed LEBs. I think you can avoid the entire readByte loop and just pass the error argument to decodeULEB128, along with the actual end of the Bytes array.

lib/Target/WebAssembly/InstPrinter/WebAssemblyInstPrinter.cpp
202 ↗(On Diff #145519)

Formatting looks strange. git clang-format origin/master?

lib/Target/WebAssembly/InstPrinter/WebAssemblyInstPrinter.h
46 ↗(On Diff #145519)

It looks like none of these functions use this new arg. Is it needed for some other reason?

lib/Target/WebAssembly/WebAssembly.td
87 ↗(On Diff #145519)

Can this be 0 and avoid the new STI argument?

unittests/MC/Disassembler.cpp
66 ↗(On Diff #145519)

Probably want a better if you want to add this new test :)

utils/TableGen/DisassemblerEmitter.cpp
133 ↗(On Diff #145519)

This is a standalone function so it should start with capital letter I think

aardappel marked 9 inline comments as done.May 7 2018, 3:24 PM

It can be ready to submit if we agree that for the moment having non-sense registers in the output is ok (looks like this will need works elsewhere before that can be properly fixed).
It could also probably need more tests. I haven't run it on a larger binary yet.

lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
70 ↗(On Diff #145519)

No, it doesn't.. it was originally mutable, forgot to change it.

89 ↗(On Diff #145519)

You're totally right.. I had initially assumed there was a possibility of it reading past the end.

lib/Target/WebAssembly/InstPrinter/WebAssemblyInstPrinter.cpp
202 ↗(On Diff #145519)

did git clang-format HEAD^ since I am maybe behind on master..

lib/Target/WebAssembly/InstPrinter/WebAssemblyInstPrinter.h
46 ↗(On Diff #145519)

Because of the changes I made in the .td files, the generated code that interfaces with this code now expects these extra arguments. No idea why.

lib/Target/WebAssembly/WebAssembly.td
87 ↗(On Diff #145519)

Ahh that's where that came from, thanks :)

unittests/MC/Disassembler.cpp
66 ↗(On Diff #145519)

I presume you mean a better name.. fixed.

utils/TableGen/DisassemblerEmitter.cpp
133 ↗(On Diff #145519)

I don't see any mention of special treatmeant of stand-alone functions in https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
In fact, it has an example that happens to be a stand-alone function which starts with lowercase.

aardappel updated this revision to Diff 145567.May 7 2018, 3:26 PM
aardappel marked 7 inline comments as done.

Fixes due to sbc100 review + clang-format.

sbc100 added a comment.May 7 2018, 3:37 PM

I don't think there is a problem with landing a partial implementation. Up to you I guess.
I'm wondering if this change will be enough for me to use it in some of the existing tests where are force to use obj2yaml.

lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
139 ↗(On Diff #145567)

At least to my eye this is overuse of auto, but i can't see what the type of OT is here.

See: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

aardappel marked an inline comment as done.May 7 2018, 3:49 PM
aardappel added inline comments.
lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
139 ↗(On Diff #145567)

I'm an "maximally auto" kind of person so I may miss these on occasion :)

aardappel updated this revision to Diff 145578.May 7 2018, 3:52 PM
aardappel marked an inline comment as done.

Forgot test file (wasm.txt) + 1 more review fix.

haven't looked at most of the code, but thought about the register operand issue.

lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
170 ↗(On Diff #145519)

I think we will need to generate the register assignments, either here or in a post-processing step. I don't think it currently makes sense to have a second version of all the instructions with implicit operands (even though they wouldn't be needed by MC in explicit-locals mode).

You'd think it ought to be as simple as running a "pass" over the MCInsts, but last I checked there wasn't really any facility for running passes over MC; it's structured much more like a stream. (and looking at the use of getInstruction in e.g. LLVMDisasmInstruction in MCDisassembler or in llvm-objdump, it looks like the returned MCInsts just get passed directly through to the printer. We could maybe hang some state off of the WebAssemblyDisassembler?

aardappel added inline comments.May 10 2018, 8:58 AM
lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
170 ↗(On Diff #145519)

Yes, we can easily keep a stack to track assignment + types in WebAssemblyDisassembler, but we'd be making the assumption that instructions are always disassembled in order. To accomodate this code being called for disassembling single instructions, this code would have to be "robust" and still function if the stack is empty etc.

More-over to get correct types we'd need access to locals/args, which we don't have. We currently don't even know where a function starts and stops.

Then, in table-gen there is currently one instruction per return type, i.e. i32.call vs i64.call, which are actually the same instruction. We'd either need to strip these types in a post-process, or we'd need to output more complicated tables that contain all of them, so the dissassembler can pick the right one based on stack type information, if it has any.

Since this isn't quite trivial, I suggest I land this basic disassembler (which will at least allow objdump etc to show something useful), then experiment with the above in follow-ups.

dschuff added inline comments.May 10 2018, 2:29 PM
lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
170 ↗(On Diff #145519)

More-over to get correct types we'd need access to locals/args, which we don't have. We currently don't even know where a function starts and stops.

Ah yeah, that's the really unfortunate bit that I hadn't thought about.

Then, in table-gen there is currently one instruction per return type, i.e. i32.call vs i64.call, which are actually the same instruction.

This is mostly just for call, right? It looks all the flavors have the same mnemonic, so maybe we can just fake it or something.

Since this isn't quite trivial, I suggest I land this basic disassembler (which will at least allow objdump etc to show something useful), then experiment with the above in follow-ups.

Yeah I think that makes sense. This seems like a reasonable unit of code to put in, do you want to remove the DO NOT MERGE and consider it ready then? I like that this is one of the few areas of LLVM that can get proper unit tests. We should maybe also add at least a basic mc test as well though to ensure things can be hooked up end-to-end. See e.g. test/MC/Disassembler, but we can probably put the most coverage in the unit tests.

sbc100 added inline comments.May 10 2018, 2:33 PM
test/MC/Disassembler/WebAssembly/wasm.txt
1 ↗(On Diff #145578)

Can you drop the -elf? Its hopefully not long for this world anyway.

aardappel added inline comments.May 10 2018, 2:39 PM
lib/Target/WebAssembly/Disassembler/WebAssemblyDisassembler.cpp
170 ↗(On Diff #145519)

You're right, it is actually few instructions: call, call_indirect, select. The first two already have a shorter version in the tables, so if I on purpose select the one with the least operands, only select remains? So maybe we can add an instruction for that one.. then we can largely ignore types.

I have a bit more testing I want to do with objdump before we can merge this, I will let y'all know when its ready.

There are already 2 tests here, one in unittests/MC/Disassembler.cpp and one in test/MC/Disassembler/WebAssembly/wasm.txt, see below. The latter in particular could use more tests I am sure.

aardappel marked an inline comment as done.May 10 2018, 2:40 PM
aardappel updated this revision to Diff 146238.May 10 2018, 3:08 PM

Last review fix.

aardappel edited the summary of this revision. (Show Details)May 10 2018, 3:13 PM
This revision was not accepted when it landed; it landed in state Needs Review.May 10 2018, 3:22 PM
This revision was automatically updated to reflect the committed changes.