This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Added initial type checker to MC Assembler
ClosedPublic

Authored by aardappel on Jun 25 2021, 12:43 PM.

Details

Summary

This to protect against non-sensical instruction sequences being assembled,
which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate.

Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input.

Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly.

A -no-type-check flag was added to llvm-mc (and any other tools that take asm input) that surpresses type errors, as a quick escape hatch for tests that were not intended to be type correct.

This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future.

Diff Detail

Event Timeline

aardappel created this revision.Jun 25 2021, 12:43 PM
aardappel requested review of this revision.Jun 25 2021, 12:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2021, 12:43 PM

This is a first version that passes all (modified) tests. There is still a few things wrong with this, but I thought I'd get some feedback on this direction as it stands.

sbc100 added inline comments.Jun 28 2021, 1:24 PM
lld/test/wasm/Inputs/call-ret32.s
17

Remove extra newline too?

lld/test/wasm/bsymbolic.s
102

Is this change needed here or just for consistency? If its needed why does declaring the linkage type as global up front help the validator?

lld/test/wasm/gc-imports.s
18

Probably best to leave _start void->void as that is in some sense part of the ABI

lld/test/wasm/relocatable-comdat.s
11

Ditto, leave sig alone and just add drop maybe?

lld/test/wasm/relocation-bad-tls.s
6

Ditto

lld/test/wasm/shared64.s
82

Yup.. good catch.

lld/test/wasm/signature-mismatch-relocatable.s
15

Ditto.

llvm/test/CodeGen/WebAssembly/global-get.ll
76

This seems like i32_external_used should have some kind of type, no? I guess we should look at the FIXME here before landing.

llvm/test/MC/WebAssembly/basic-assembly.s
59

Did we not loose coverage of these different block types here?

114

Why this? Why not just drop?

llvm/test/MC/WebAssembly/globals.s
26

Nice!

Just an initial review of the test output.. Looks pretty great in general. Nice to see some glaring errors in the tests get fixed.

Do you feel like landing the lld tests up front as an NFC maybe? (I could go either way on that since maybe landing together is nice too).

aardappel marked 8 inline comments as done.Jun 28 2021, 2:03 PM
aardappel added inline comments.
lld/test/wasm/bsymbolic.s
102

not needed, probably part of an earlier refactor I changed later. reverted.

llvm/test/CodeGen/WebAssembly/global-get.ll
76

Yes, this is the case I mentioned I couldn't figure out. This seems like the only place where we are declaring custom globals and they are handled differently from the standard ones it seems..

llvm/test/MC/WebAssembly/basic-assembly.s
59

This file is really a random collection of wasm features, meant to test the assembler more than the features in question. This file does not promise to be complete in any way.. I simply made the changes in a way to require as few changes to the original as possible while type-checking.

114

I guess I was testing global.set as to how its handled in the type checker? drop would allow any type from global.get, so not a great test. Again, this file is a grab-bag of tests.

aardappel updated this revision to Diff 355039.Jun 28 2021, 2:07 PM
aardappel marked an inline comment as done.

Addressed most of sbc code review

sbc100 added a subscriber: pmatos.Jun 28 2021, 2:45 PM

lgtm

We should probably wait for @dschuff input before landing.

llvm/include/llvm/CodeGen/MachineModuleInfoImpls.h
113

The classes above seem to track sets of MCSymbol* (see SymbolListTy). Would that make sense here too (rather than StringRef)?

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
248

Why not SmallVector like above?

257

Can we do SkipTypeCheck(Options.MCNoTypeCheck) here in the member initializer? (same with is64(STI.getTargetTriple().isArch64Bit())?

Then the don't need to inline = false above either I guess?

269

Is this a permanent caveat or should we have a TODO here?

1195

All this new code here makes me wonder if the typechecker should be its own class with its own state. Maybe not worth it?

1361

I surprised this functionality doesn't exist somewhere else already..

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
212

Is this function called maybe times with the same name? Can/shouldSetWasmSymType we avoid re-doing the work herein?

I just saw you have a FIXME for that already below. Maybe we can tell if this symbols has already had its type set and early out?

366

The old HasAddressTakenFunction was only set if !F.isIntrinsic() too.. is this change OK/indended?

375

Left over debugging?

528

So other AsmPrinters use this print the linkage of a single symbol, but we are co-opting to print the all signatures?

531

Is this just const cast? Why not use const_cast?

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
86

I think the direction is towards lowerSnakeCase in this API. Also it would match better with getMCSymbolForFunction above to call this setWasmSymType.

llvm/lib/Target/WebAssembly/WebAssemblyInstrTable.td
16

I don't really grok this part of the change. @pmatos will know more. Can we split part out since it feels like its own bugfix. How was it not breaking stuff? (Or was it?)

llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp
94

Actually maybe we should just call this method something like getOrCreateWasmSymbol? (don't feel strongly about this).

aardappel marked 10 inline comments as done.Jun 28 2021, 3:38 PM
aardappel added inline comments.
llvm/include/llvm/CodeGen/MachineModuleInfoImpls.h
113

The pass that uses this runs before AsmPrinter/MCLower, thus MCContext etc is not available yet.

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
248

There's only ever 1 instance of these per assembler, so there is no allocation cost. But will change them to SmallVector anyway.

269

This is intended to be fairly permanent. The assembler is run on the inline asm contents without any context of a function, or globals, or anything, which breaks everything this type checker does. We'd need to somehow recover all that information. I made inline asm "work" in the past because, why not, but I am not sure there's any legit uses for it in Wasm.

1195

Yup, that could probably be done? Lemme give that a go..

1361

Nope.. I definitely looked.

This stuff is really ugly though, it does a linear search thru the table, and then the type checker above does linear if-checks against the resulting string.

Would be awesome to not do this.

We could check directly against the opcode, but there's one opcode per type (8 or so of them), and then of course occasionally we make new types. So this would make for a very fast but very fragile switch-case above.

3rd option is to generate some new table thru tablegen that has the exact mapping we need.

Opinions?

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
366

not intended, good catch!

528

Yes, it was difficult to find a point where to emit this, since the constructor is executed way before anything is run, there's no onStartOfFile or similar, so this is the earliest part of emitting the first function which seemed most suitable.

llvm/lib/Target/WebAssembly/WebAssemblyInstrTable.td
16

It wasn't breaking much, since this was so far only used in asm. I'd rather keep this all in one place, showing the things the type-checker found.

dschuff added inline comments.Jul 8 2021, 5:55 PM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
243

why is Stack wrapped inside another struct (instead of a typedef or using)? I don't see any other uses of TypeStack anywhere other than just via TS.

1012

I think you can use LLVM_DUMP_METHOD rather than wrapping this whole method in LLVM_DEBUG. I forget exactly what that does (it's not exactly the same as LLVM_DEBUG) but might still do what you want.

edit: based on how it's used below, I think you'd still need to wrap calls in LLVM_DEBUG() which might be what you're trying to avoid now. So I'll leave it up to you to decide whether you want to do that, but leave this comment here in case you weren't aware of LLVM_DUMP_METHOD

1156

Is this right? function signatures for indirect calls aren't checked until runtime?

1361

This method is really only used by DumpStack, which is only used under LLVM_DEBUG, right? linear search might not be so bad in that case.

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
357

I guess this only needs to run here in the case that there were no linkages to emit? Can that ever happen (a module with no functions and no globals?)

aardappel marked 4 inline comments as done and an inline comment as not done.Jul 9 2021, 10:29 AM
aardappel added inline comments.
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
243

That was to prepare for having copies of this stack in TypedLabels, but that so far isn't implemented. I can remove it for now.

1012

Yes, wasn't aware of it.. LLVM_DUMP_METHOD by itself doesn't do anything, but there is the practice to create dump methods on types rather than what I did here.. I could do that on TypeStack that I am about the remove :) But in all, the existing code is simpler.

1156

I'm not sure what you're asking. For the assembler, the instruction specifies a signature inline, so it is checked against the stack at.. assembly time. For Wasm in general, the i32 on the stack can be any old integer, and thus, any function with any signature, so that is checked at runtime.

1361

No, it is used by the big long if-else chain of the main type checker above, so it is definitely a possible performance issue.

llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
357

yes, this was the location where the code was originally run from, so I put it here to make sure it runs at least once. It may well be that it can never happen, but I'd rather protect against that.

aardappel updated this revision to Diff 357566.Jul 9 2021, 10:34 AM

Addressed code review.

Also moved all of the type checking functionality to its own files, which will mess up past code comment locations, sorry. But it does look a whole lot neater :)

dschuff accepted this revision.Jul 9 2021, 11:14 AM
dschuff added inline comments.
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
1156

oh right, this checks the function as called, not as declared.

This revision is now accepted and ready to land.Jul 9 2021, 11:14 AM

Ok, I can land this as-is, with the few warts it has. Might be easier to address things in followups given the size.

Yeah, I think that makes sense.

This revision was landed with ongoing or failed builds.Jul 9 2021, 2:08 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Jul 9 2021, 4:05 PM

Thanks for updating the .gn files, but that's not necessary: A bot syncs them from the cmake files.