This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Ensure 'end_function' in functions
ClosedPublic

Authored by aheejin on Jan 5 2023, 5:18 PM.

Details

Summary

Local info is supposed to be emitted in the start of every function.
When there are locals, .local section should be present, and we emit
local info according to the section.

If there is no locals, empty local info should be emitted. This empty
local info is emitted whenever a first instruction is emitted within a
function without encountering a .local section. If there is no
instruction, end_function pseudo instruction should be present and the
empty local info will be emitted when parsing the pseudo instruction.

The following assembly is malformed because the function test doesn't
have an end_function at the end, and the parser doesn't end up
emitting the empty local info needed. But currently we don't error out
and silently produce an invalid binary.

.functype test () -> ()
test:

This patch adds one extra state to the Wasm assembly parser,
FunctionLabel to detect whether a function label is parsed but not
ended properly when the next function starts or the file ends.

It is somewhat tricky to distinguish FunctionLabel and
FunctionStart, because it is not always possible to ensure the state
goes from FunctionLabel -> FunctionStart. .functype directive does
not seem to be mandated before a function label, in which case we don't
know if the label is a function at the time of parsing. But when we do
know the label is function, we would like to ensure it ends with an
end_function properly. Also we would like to error out when it does
not.

For example,

.functype test() -> ()
test:

We should error out for this because we know test is a function and it
doesn't end with an end_function. This PR fixes this.

test:

We don't error out for this because there is no info that test is a
function, so we don't know whether there should be an end_function or
not.

test:
.functype test() -> ()

We error out for this currently already, because we currently switch to
FunctionStart state when we first see .functype directive after its
label definition.

Fixes https://github.com/llvm/llvm-project/issues/57427.

Diff Detail

Event Timeline

aheejin created this revision.Jan 5 2023, 5:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 5:18 PM
Herald added subscribers: pmatos, asb, wingo and 5 others. · View Herald Transcript
aheejin requested review of this revision.Jan 5 2023, 5:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 5:18 PM
aheejin edited the summary of this revision. (Show Details)Jan 5 2023, 5:29 PM
aheejin edited the summary of this revision. (Show Details)
aheejin planned changes to this revision.Jan 5 2023, 7:20 PM
sbc100 added inline comments.Jan 6 2023, 2:23 AM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
882

It seems like if this .functype is the start of a function this condition should be CurrentState == FunctionLabel, no?

aheejin added inline comments.Jan 6 2023, 8:54 AM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
882

No. Because the grammar doesn't seem to be very rigidly fixed, many cases are possible.

.functype test () ->()
test:
  .functype test () -> ()

This looks to be the case for the generated .s files now. Each function has two .functype directives. Because of the first .functype, by the time we parse test:, we know it is a function label. (This changes the state to FuncLabel) And the second .functype will run this code. (This changes the state to FuncStart) Note that the first .functype directive does not run this part of the code because it fails to meet if (WasmSym->isDefined()) at that point. So in this case the state changes are FileStart -> FuncLabel -> FuncStart.

test:
  .functype test () -> ()

This lacks the .functype before the function. We don't seem to mandate .functype before a function label. We currently accept this, and many functions in the tests under test/MC/WebAssembly also lack the .functype before the function label. In this case, by the time we parse test:, we don't know it is a function label. So we don't enter FuncLabel state. But by the time we parse .functype after that, we run this code and change the state to FuncStart. So the state changes are FileStart -> FuncStart.

.functype test () -> ()
test:

This is the invalid file https://github.com/llvm/llvm-project/issues/57427 reported. Unlike the case before a label, we mandate .functype after a function label. (Otherwise we don't enter FuncStart state at all.) But we don't reject this file; we just silently create an invalid binary, because we don't have any other instruction or end_function in this function, so we don't end up calling ensureLocals. But this test is still recognized as a function because of the .functype before the label, so this writer tries to write this function to the binary but doesn't end up emitting the local info. What this CL does is just reject this kind of malformed file.

So the state doesn't always move FuncLabel -> FuncStart. It sometimes moves to FuncStart directly. So what I wanted was to push Function only once for them. (We can't create two separate things to push because sometimes we don't enter FuncLabel) Hence the condition:

if (CurrentState != FunctionLabel) {
aheejin added inline comments.Jan 6 2023, 8:59 AM
llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
882

I forgot to say this in the comment above. So for the third case,

.functype test () -> ()
test:

The state changes are FileStart -> FuncLabel, and it stops there. We pushed Function when we parse the label, so when we reach the end of a file (or the next function label), we can figure out that this file is malformed. The previous code doesn't do anything when we parse a function label, and moves the state to FuncStart only after it sees .functype after parsing a function label, so we couldn't detect this is malformed.

sbc100 accepted this revision.Jan 9 2023, 3:24 AM
This revision is now accepted and ready to land.Jan 9 2023, 3:24 AM
sbc100 added a comment.Jan 9 2023, 3:26 AM

I wonder if its too late to change the requirement such that .functype should always come before the label? This would make the code simpler and the assembly format more consistent maybe?

asb added a comment.Jan 9 2023, 3:41 AM

I wonder if its too late to change the requirement such that .functype should always come before the label? This would make the code simpler and the assembly format more consistent maybe?

I'd hope there's still a fair amount of freedom for evolving this text format...but I also don't know if anyone is relying on it. Do we know of any wasm projects at all that use LLVM's assembly format for more than the odd inline asm.

I don't have the link to hand right now, but I seem to remember we'd thought about moving to a different marker for the start of a function, as the single-pass parser requires .functype declarations for all functions to be emitted up-front, leading to the change I introduced that printed all such declarations at the top of the file as well.

aheejin added a subscriber: aardappel.EditedJan 9 2023, 11:06 AM

I'm not very familiar with the history with which our assembly format has evolved. I also wonder if we really need two .functype directives. (The 'before label' one seems to be alowed to be omitted, but in normal circumstances it is not) If we mandate the .functype directive before a function label, can we not have the second .functype after a function label? Do we actually need the second one? Maybe @aardappel can shed some light on this, even though I'm not sure if he's still following this repo.

The reason I started working on this was just that I randomly grabbed an existing bug in llvm-project repo, thinking it looked very easy to fix.. But it turned out there could be many issues we can consider about deciding what our format should be. But at the same time, as @asb said, I don't know whether someone is actually relying on this, and we didn't hear many complaints about this anyway (other than the bug this CL fixes, which was about a invalid .s file), so I'm not sure if we need to invest much time on this at this point..

I think this change is find to land as is. If we decided to simplify that .s format we can possibly simplify the parser code later.

This revision was landed with ongoing or failed builds.Jan 9 2023, 11:11 AM
This revision was automatically updated to reflect the committed changes.
asb added a comment.Jan 9 2023, 1:41 PM

I'm not very familiar with the history with which our assembly format has evolved. I also wonder if we really need to .functype directives. (The 'before label' one seems to be alowed to be omitted, but in normal circumstances it is not) If we mandate the .functype directive before a function label, can we not have the second .functype after a function label? Do we actually need the second one? Maybe @aardappel can shed some light on this, even though I'm not sure if he's still following this repo.

The reason I started working on this was just that I randomly grabbed an existing bug in llvm-project repo, thinking it looked very easy to fix.. But it turned out there could be many issues we can consider about deciding what our format should be. But at the same time, as @asb said, I don't know whether someone is actually relying on this, and we didn't hear many complaints about this anyway (other than the bug this CL fixes, which was about a invalid .s file), so I'm not sure if we need to invest much time on this at this point..

Yes, landing this change as-is is definitely the right path. My thought re .functype was Just something to keep in mind if anyone looks at further changes in the future.