This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Parsing missing directives to produce valid .o
ClosedPublic

Authored by aardappel on Oct 29 2018, 3:35 PM.

Details

Summary

The assembler was able to assemble and then dump back to .s, but
was failing to parse certain directives necessary for valid .o
output:

  • .type directives are now recognized to distinguish function symbols and others.
  • .size is now parsed to provide function size.
  • __stack_pointer is now special-cased with code that is shared with Codegen.

Added test that previously was failing on all 3 of the above.

Also makes a small change in StringSwitch.h for some missing
functionality (being able to detect no match).

Diff Detail

Event Timeline

aardappel created this revision.Oct 29 2018, 3:35 PM
aardappel added inline comments.Oct 29 2018, 3:40 PM
include/llvm/ADT/StringSwitch.h
214

Since this is a core LLVM class, should I get someone in particular to review? Or am I better off writing the calling code differently?

include/llvm/MC/MCSymbolWasm.h
73

Suggestions welcome where to move this code. The only library/.cpp that Codegen and Assembler share is the WebAssemblyInfo lib, which seemed very unrelated.

lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
282

I could remove the code below, left in for code review purposes.

aardappel edited the summary of this revision. (Show Details)Oct 29 2018, 3:42 PM
dschuff added inline comments.Oct 29 2018, 4:07 PM
include/llvm/ADT/StringSwitch.h
214

For our use case, we have T = wasm::WasmSymbolType and R defaults to T. Could we set R to Optional<wasm::WasmSymbolType> and just use Default instead of NoDefault?

dschuff added inline comments.Oct 29 2018, 4:24 PM
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
282

In general, I think making the assembler dumb is not a bad thing. Also you could argue that conventions for symbol names are compiler/ABI conventions and should not be automatically assumed by the assembler. So I guess if we thought of it that way it would mean that we'd have something like this generic code here, and the compiler would be required to output a directive declaring the type of __stack_pointer as a global, right? I think that is actually better than making the assembler smart about the compiler's ABI. Even the factoring of the Assembler/MC code reflects this idea; it's why there's no good place to put that code if it were shared.

aardappel added inline comments.Oct 29 2018, 5:05 PM
include/llvm/ADT/StringSwitch.h
214

That doesn't work because the existing Default always returns a T, i.e. there is no way to indicate a null Optional.

lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
282

Ok, so I could make the InstPrinter print out this info, but really the only information that should be in the .S output is wether it is global (which is already in there, and I can recover using the commented code below), and the type of the var, which I can also infer in the assembler. So really I don't need to share the function, the only reason to do was that a) it happens the same in both places, and b) that if someone decides to refactor this code they'll be forced to look at both.

I could make the assembler dumber if the InstPrinter also outputs the type. This could be in a directive like .global __stack_pointer i32 but there is no obvious prelude/post output in InstPrinter (on the wasm side). Or we could pack it all in the symbol reference somehow, like __stack_pointer@GLOBAL:I32 (would maybe require changes from the general parser) or __stack_pointer@GLOBAL_I32 (polute with wasm specific symbol types).

sbc100 added inline comments.Oct 30 2018, 8:52 AM
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
138

LastSymbolType?

145

Use = nullptr above instead?

282

Seem like either .global __stack_pointer or __stack_pointer@GLOBAL_I32 would be preferable to checkKnownSymbol. For one thing, it would be nice if authors of .S files could declare (or at least use) globals.

Also, .global is probably a bad directive name because .globl already exists for symbol binding.

376

Can we limit this to supporting just function and global here? Sections are created using .section directive and I think data is the default.

Then maybe you can avoid the StringSwitch change too.

396

Isn't .size already handled by the ELFAsmParser, e.g. ELFAsmParser::ParseDirectiveSize? Or do we not use the ELFAsmParser at all anymore?

aardappel added inline comments.Nov 1 2018, 9:28 AM
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
282

Talked with @sunfish also and he seemed to prefer a directive.

What name should we use instead? .wasm_global? That's a bit odd as none of the other directives have wasm_ in them. Maybe .globaltype __stack_pointer i32 makes sense.

396

Without me parsing this, the binary writer would complain the size wasn't set. I'll look into it.

dschuff added inline comments.Nov 1 2018, 10:08 AM
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
282

I like the directive idea too. ELF has .type (https://sourceware.org/binutils/docs/as/Type.html) for symbol types. I think we have been using it for functions and global variables already as a consequence of our overlap/inheritance of ELF. It seems like this would fit in the same space. Whether we directly use ELF's semantics for the possible types or the ELF implementation code would be another question, but I think conceptually .type makes sense.

kristina added inline comments.Nov 1 2018, 11:12 AM
include/llvm/ADT/StringSwitch.h
214

The general convention is to have an "invalid" type as a default which can be explicitly checked for, something like wasm::WASM_SYMBOL_TYPE_INVALID. IMO if the enum has no default value and yet it's possible for it to end up in an "invalid" state, it *should* have a default value, which makes it much easier to understand the code.

aardappel added inline comments.Nov 1 2018, 12:47 PM
include/llvm/ADT/StringSwitch.h
214

I agree, and most enums have these. I didn't want to add such an enum value here since 0 was already taken by a legal value, and I thought it should be possible to detect that no case applies. But this is the best solution I will add such a value, will need to be value 4 though (enum storage type is unsigned).

lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
282

We use it as .type test0,@function already, so I could add parsing of .type __stack_pointer,@global,i32 ?

I'd have to output such directives at the end of a .s though, if InstPrinter permits, since I won't know what globals I will encounter at the start.

aardappel marked 16 inline comments as done.Nov 2 2018, 9:48 AM
aardappel added inline comments.
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
396

Yes, ELFAsmParser::ParseDirectiveSize is never called while assembling with llvm-mc.

aardappel updated this revision to Diff 172385.Nov 2 2018, 9:49 AM
aardappel marked an inline comment as done.

Refactored due to code review changes, should be a lot cleaner & simpler now :)

dschuff added inline comments.
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
195

This is pre-existing, so I don't want to hold up this CL, but I think this should only accept v128 and not the MVT types. In the wasm spec there is only a v128 value type (and hence params can only have this one type). In principle LLVM *shouldn't* need to know the MVT in its .param directive, although it's possible that something depends on that now. We should look into this separately. +cc @tlively

364–370

It just occurred to me: do we need to make this stateful like this? Should a .type directive create a symbol/type association regardless of where it is with respect to the actual label? It looks like the ELF Asm parser (ParseDirectiveType) just looks up the symbol (creating if necessary) and emits a symbol attribute to the streamer (which is what actually manipulates the symbol). Maybe we could use this pattern in wasm too, to make asm vs object more uniform?

dschuff accepted this revision.Nov 2 2018, 10:38 AM

I like the idea of trying to make the assembler less stateful if possible, but that can be another CL

This revision is now accepted and ready to land.Nov 2 2018, 10:38 AM
sbc100 added inline comments.Nov 2 2018, 11:07 AM
include/llvm/BinaryFormat/Wasm.h
245 ↗(On Diff #172385)

I'd rather avoid adding this if possible. We don't want to allow symbols to have an invalid type, or have to add this invalid case to our switch statements over symbol type.

lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
364–370

This sounds good to me. Than we could avoid LastTypeOfSymbol and GlobalTypeFixups completely?

aardappel updated this revision to Diff 172414.Nov 2 2018, 12:20 PM

Made assembler less stateful thanks to getOrCreateSymbol.

aardappel marked an inline comment as done.Nov 2 2018, 12:20 PM
aardappel added inline comments.
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
195

I am fine modifying this either way, but yes, for MVT we don't have a generic type, so it is currently returning v16i8 which is bad.

364–370

Made it less stateful using getOrCreateSymbol.

sbc100 added inline comments.Nov 2 2018, 12:38 PM
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
376

So now that you don't need to store Type can you avoid adding WASM_SYMBOL_TYPE_INVALID and just do something like.

if ("function")
  setType(FUNCTION)
else ("global')
  setType(GLOBAL)
else
  error
393

What isn't ELFAsmParser::ParseDirectiveSize which then calls getStreamer().emitELFSize doing this already?

otherwise lgtm

sbc100 accepted this revision.Nov 2 2018, 2:30 PM

After speaking offline we decided to land this and then followup by investigating how to share more code (or not) with the ELFAsmParser.

aardappel updated this revision to Diff 172439.Nov 2 2018, 2:49 PM
aardappel marked an inline comment as done.

Attempt to fix generic directive parsing, but will do this in a followup.

aardappel marked an inline comment as done.Nov 2 2018, 2:50 PM
aardappel added inline comments.
lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
393

As discussed offline will do this in a followup, as it requires our own implementation of ELFAsmParser

This revision was automatically updated to reflect the committed changes.
aardappel marked an inline comment as done.

This patch addresses global types, but do we have a way to recognize function types for undefined functions? I filed a bug for that: PR39557

This patch addresses global types, but do we have a way to recognize function types for undefined functions? I filed a bug for that: PR39557

Thanks! Yes, I am sure there are plenty of directives missing. Effectively we have to encode all metadata the backend generates that is used for constructing .o in them.