Page MenuHomePhabricator

[WebAssembly] Added initial AsmParser implementation.

Authored by aardappel on Mar 9 2018, 2:37 PM.



It uses the MC framework and the tablegen matcher to do the
heavy lifting. Can handle both explicit and implicit locals
(-disable-wasm-explicit-locals). Comes with a small regression

This is a first basic implementation that can parse most llvm .s
output and round-trips most instructions succesfully, but in order
to keep the commit small, does not address all issues.

There are a fair number of mismatches between what MC / assembly
matcher think a "CPU" should look like and what WASM provides,
some already have workarounds in this commit (e.g. the way it
deals with register operands) and some that require further work.
Some of that further work may involve changing what the
Disassembler outputs (and what s2wasm parses), so are probably
best left to followups.

Some known things missing:

  • Many directives are ignored and not emitted.
  • Vararg calls are parsed but extra args not emitted.
  • Loop signatures are likely incorrect.
  • $drop= is not emitted.
  • Disassembler does not output SIMD types correctly, so assembler can't test them.

Diff Detail


Event Timeline

aardappel created this revision.Mar 9 2018, 2:37 PM
aardappel updated this revision to Diff 137862.Mar 9 2018, 3:49 PM

Fixed out of date comments in WebAssemblyParser.cpp

WIP review, read up through WebAssemblyAsmParser::ParseInstruction

291 ↗(On Diff #137862)

Wait, is truthiness here used to represent failure? That's... confusing.
Oh, that's overridden from MCTargetAsmParser::ParseInstruction, cool.

320 ↗(On Diff #137862)

... I feel like this sort of thing might be cleaner with a #define ERR_IF(x) if(x) { return true; } macro, but I may be in the minority there.

34 ↗(On Diff #137843)

Nit: I feel this should be "wasm-asm-parser"? The other targets define this as:

lib/Target/ARM/AsmParser/ARMAsmParser.cpp:#define DEBUG_TYPE "asm-parser"
lib/Target/Mips/AsmParser/MipsAsmParser.cpp:#define DEBUG_TYPE "mips-asm-parser"
lib/Target/AVR/AsmParser/AVRAsmParser.cpp:#define DEBUG_TYPE "avr-asm-parser"
lib/Target/Hexagon/AsmParser/HexagonAsmParser.cpp:#define DEBUG_TYPE "mcasmparser"

so it probably doesn't really matter.

82 ↗(On Diff #137843)

The repetition here makes me think something is odd, but I'm not convinced that other designs would be better.

One thing that might reduce the conceptual repetition is changing SMLoc Start, SMLoc End to AsmToken Tok, and just calling Tok.getLoc()/Tok.getEndLoc() from here, instead of at each callsite.

117 ↗(On Diff #137843)

Nit: "not" -> "Not", for consistent capitalization

284 ↗(On Diff #137843)

Typo: should be ParseSingleInteger?

aardappel marked 4 inline comments as done.Mar 12 2018, 3:38 PM
aardappel added inline comments.
291 ↗(On Diff #137862)

Yup, just following along with what LLVM does here.

320 ↗(On Diff #137862)

I'd just swap the truthyness. Since the rest of the code doesn't use such a macro, there's little point I'm afraid.

82 ↗(On Diff #137843)

There is one call-site (for one instance of TokOp) where no token is available, so the best I can do is make all of em use Tok and then still have one at that has 2 SMLoc's. Not sure if that is much cleaner in total.

aardappel updated this revision to Diff 138098.Mar 12 2018, 3:41 PM
aardappel marked 2 inline comments as done.

Minor changes due to @jgravelle-google's review.

looking pretty good overall, here are a couple of high-level comments.

402 ↗(On Diff #137862)

Wow, is it just me or did the diff engine really screw this file up :/

215 ↗(On Diff #137862)

There are a couple of confusing things to me about this. First, the SIMD proposal has only one value type (v128) and the wasm backend has a corresponding register class (V128) into which all the SIMD MVTs (v4f32 et al) can go. But in the assembly language the scalar instructions use the value type name in their mnemonic and the SIMD instructions use the lane division interpretation names. So that's a bit of inherent weirdness that will have to be reflected in this code somewhere. But the other thing is that this function is named ParseRegType which to me would indicate the type of register (i.e. register class which in C++ code would be e.g. WebAssembly::I32RegClass), but what it returns is actually a physical register number. What we maybe actually want is to reflect either a wasm value type (although it looks like that's maybe not actually even reflected directly in the MIR, just in the binary format) or an MVT (e.g. MVT::i32 or MVT::v4f32).

... ok actually looking at the way these reg types are used I now actually understand the comment you wrote in WebAssemblyOperand::RegOp. Not sure I have a better idea right now. I guess it's because the generated matcher expects the reg operand to be one of the physical registers and validates that?

249 ↗(On Diff #137862)

Up until now, the stack "reg" numbers on the "$push" and "$pop" tokens were really just annotations for readability, (because of course the operands push and pop on the stack in a defined order). So far nothing has actually depended on those notations, and this seems to be adding that dependence. Not that we absolutely can't do that; but it would be nice if we didn't have to. It looks like it might make this code more awkward though, because it appears that we are just getting the lexer tokens in order, and we'd maybe have to buffer them or something before processing them in right-to-left order. Am I understanding this correctly?

291 ↗(On Diff #137862)

There are a lot of places in LLVM backends that return true for failure and false for success ¯\_(ツ)_/¯
edit: actually there doesn't appear to be a failure path in this function, but its caller below clearly expects that possibility?

aardappel marked 2 inline comments as done.Mar 12 2018, 4:59 PM
aardappel added inline comments.
402 ↗(On Diff #137862)

whatever diff this uses seems generally much worse than github/gerrit/critique..

215 ↗(On Diff #137862)

Yes, this was originally to please the generated matcher. Though looking at the code, it seems the different v128 layouts are only needed for emitParam / emitLocal, whereas the matcher would be fine with generic registers. I can try splitting these up and reverting to a single physical register type, and see if it still works.

249 ↗(On Diff #137862)

The generated matcher does need registers in pretty much the same locations as these, so thats why it seemed natural to follow along. Certainly the number part is non-essential, since I am essentially re-using the assignment from the disassembler, though to accept code that does not have these numbers, I presume I would have to track a stack to able to assign them myself. That's not hard, but also not entirely trivial, so that may be a nice thing to do for a followup. In fact, I think it be nice to be able to not require these at all, and completely skip them on parse, and have a mode (initially for the non-elf path) that makes the dissassembler not output them either. Opinions welcome.

291 ↗(On Diff #137862)

I'm guessing this originally had code that could fail parsing in it.. I'll make it void.

34 ↗(On Diff #137843)

I'll change it anyway, seems more logical to have a -

sbc100 added inline comments.Mar 12 2018, 5:27 PM
10 ↗(On Diff #138098)

This diff looks message up to. How did you generate/upload this diff? arc diff?

55 ↗(On Diff #138098)

And this one..

dschuff added inline comments.Mar 12 2018, 5:39 PM
249 ↗(On Diff #137862)

Yeah, that was sort of what I was getting at. s2wasm ignores the numbers entirely and tracks the stack itself. I do think it would be nice to not need them at all.

I think It sort of comes down to what we want the assembly language to look like. The non-ELF path will soon be the only path, so we don't need to worry about it for the long term. I think my preference for the language would be to keep it simpler and not require the numbers. This would require the assembler to track the stack, but even if it used the numbers, it would probably need to validate them anyway so overall it wouldn't be much different. I do actually like having the numbers as an annotation though (it makes the asm easier to read), and there is a way to attach annotations to assembly output, so maybe we should use it for that.

aardappel marked 2 inline comments as done.Mar 15 2018, 2:52 PM
aardappel added inline comments.
249 ↗(On Diff #137862)

agreed. I think it should be an option for the dissassembler (I personally would prefer to read the code without them), and the assemble should permit but ignore them.

Tracking the stack myself may be a bit more interesting than usual, since the current assembler code has no idea of instruction semantics, it leaves all of that to the table-gen matcher. So I'd need to at least dip into the table-gen to extract information on instruction signature if I want to avoid hard-coding any of that.

10 ↗(On Diff #138098)

git format-patch, as recommended by I'll try arc for my next PR.

249 ↗(On Diff #137862)

To throw my own opinions in the mix,
Design goals I think are important:

  1. the .s syntax should be easy to read
  2. it should support inline asm fairly easily

To that end we could take and existing (wasm-elf) .s:

i32.const $push3=, 0
i32.load  $push2=, __stack_pointer($pop3)
i32.const $push4=, 16
i32.sub   $1=, $pop2, $pop4

and write it as

i32.const 0
i32.load  offset=__stack_pointer
i32.const 16
set_local 1

which is generally identical to .wat. We (in principle) can do this because the push/pop annotations are essentially cosmetic: we know them statically on a per-opcode basis. (This is almost certainly less the case for function calls.)
This should be the easiest thing to write inline asm for, and map most directly to the official text format, which should help the learning curve.

As a secondary idea, we keep the numbers, but observe that we can reuse virtual registers / stack slots after popping. Then the number refers to the depth of the stack at any given point. This also serves as a proxy for register pressure in the engines.

i32.const $push0=, 0
i32.load  $push0=, __stack_pointer($pop0)
i32.const $push1=, 16
i32.sub   $1=, $pop0, $pop1

I think i'd be in favor of letting this patch go in as it is for now, since it uses our existing syntax, and then have a separate discussion of how to change it on tool-conventions.

249 ↗(On Diff #137862)

Yeah, when formatted as an annotation the push/pop labels would probably end up looking something like this, when using explicit locals:

i32.const 0   #  push3
i32.load  offset=__stack_pointer    #  pop3
i32.const 16  #  push4
i32.sub    # pop4, pop3, push4
set_local 1 # pop5

Without explicit locals it gets more interesting because the actual parsed syntax is different from the spec (and of course we have to parse it). In that case it might make more sense to also explicitly print/parse pop operations.

But anyway we shouldn't be having this discussion here, an issue on tool-conventions probably makes more sense.

aardappel marked 13 inline comments as done.Mar 19 2018, 10:21 AM
aardappel added inline comments.
215 ↗(On Diff #137862)

Refactored how register types and layout are handled, and reverted to a single SIMD register in the latest patch.

249 ↗(On Diff #137862)

Agreed. I'm fine either making them optional and ignored (in wasm mode, not elf), or comments, or omitted all-together. I am assuming this will be in a follow-up though. I can start a conversation on tool-conventions to include everyone.

aardappel marked an inline comment as done.

Addressing all comments so far, except for the removal of $push etc annotations.

dschuff accepted this revision.Mar 19 2018, 4:08 PM


This revision is now accepted and ready to land.Mar 19 2018, 4:08 PM

Want to land this @dschuff ? Or I can if you prefer.

39 ↗(On Diff #138098)

"used a" -> "used as"

This revision was automatically updated to reflect the committed changes.