This is an archive of the discontinued LLVM Phabricator instance.

[MC] Add three-state parseDirective as a replacement for ParseDirective
ClosedPublic

Authored by barannikov88 on Jun 29 2023, 8:41 AM.

Details

Summary

Conventionally, parsing methods return false on success and true on
error. However, directive parsing methods need a third state: the
directive is not target specific. AsmParser::parseStatement detected
this case by using a fragile heuristic: if the target parser did not
consume any tokens, the directive is assumed to be not target-specific.

Some targets fail to follow the convention: they return success after
emitting an error or do not consume the entire line and return failure
on successful parsing. This was partially worked around by checking for
pending errors in parseStatement.

This patch tries to improve the situation by introducing parseDirective
method that returns ParseStatus -- three-state class. The new method
should eventually replace the old one returning bool.

ParseStatus is intentionally implicitly constructible from bool to allow
uses like return Error(Loc, "message"). It also has a potential to
replace OperandMatchResulTy as it is more convenient to use due to the
implicit construction from bool and more type safe.

Diff Detail

Event Timeline

barannikov88 created this revision.Jun 29 2023, 8:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 8:41 AM

Use the new method in WebAssembly target

barannikov88 published this revision for review.Jun 29 2023, 8:57 AM
barannikov88 added reviewers: MaskRay, benshi001.
barannikov88 added a subscriber: benshi001.
barannikov88 added inline comments.
llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
700

@benshi001
There is something odd here. It handles cases like: .word foo - bar, emits some data, but then lets the generic parser handle the directive, which results in emitting more data. It also looks like this code path is not covered by object-emission tests. This was introduced in D38029.

Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 8:57 AM

From the wasm side, this looks very nice!

MaskRay added inline comments.Jun 29 2023, 9:49 AM
llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
700

Thank your for noting this.

Side note: I find that many target AsmParser have poor test coverage, and we need some attention.

barannikov88 added inline comments.Jun 29 2023, 9:59 AM
llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
700

I'll try to add a few tests by the end of the week

Update CSKY, Sparc and VE

barannikov88 added inline comments.Jun 29 2023, 11:01 AM
llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h
149

Does anyone have a preference for ParseResult over ParseStatus? I'm not sure which one is better.

MaskRay accepted this revision.EditedJun 29 2023, 11:34 AM

Conventionally, parsing methods return false on success and true on error. However, directive parsing methods need a third state: [...]

Agreed. Thanks for cleaning up this.

ParseStatus is intentionally implicitly constructible from bool to allow uses like return Error(Loc, "message").

Nice!

It also has a potential to replace OperandMatchResulTy as it is more convenient to use due to the implicit construction from bool and more type safe.

Agreed. It looks like OperandMatchResulTy could be named more generically to support what ParseStatus does in this patch, but unfortunately this is difficult to change as the three enum members have 1000+ references...

llvm/lib/Target/AVR/AsmParser/AVRAsmParser.cpp
700

Thanks! I need to amend my comment. Many target AsmParser have poor test coverage for error conditions.

(For some targets (e.g. CSKY), I think many non-error-condition code paths are not covered.)
For future in-tree targets, I guess we should pay more attention on their test coverage..

This revision is now accepted and ready to land.Jun 29 2023, 11:34 AM

Minor suggestion to the summary: I think you can say that AsmParser::parseStatement has the fragile heuristic in the first paragraph, so that curious readers can find this important location immediately.
Otherwise due to changes to a lot of lib/Target code, they may not notice this quickly.

barannikov88 edited the summary of this revision. (Show Details)Jun 29 2023, 1:01 PM

@MaskRay

Thank you for your comments,

Minor suggestion to the summary: I think you can say that AsmParser::parseStatement has the fragile heuristic in the first paragraph, so that curious readers can find this important location immediately.
Otherwise due to changes to a lot of lib/Target code, they may not notice this quickly.

I tried to improve the summary. If it is what your meant, please feel free to update it (looks like phabricator allows it).

It looks like OperandMatchResulTy could be named more generically to support what ParseStatus does in this patch, but unfortunately this is difficult to change as the three enum members have 1000+ references...

I think it could be handled gradually, but I haven't come with a migration plan yet. The most difficult part is tblgenerated methods such as MatchOperandParserImpl, which poison every operand parsing method they call with ParseStatus across all targets.

@MaskRay

Thank you for your comments,

Minor suggestion to the summary: I think you can say that AsmParser::parseStatement has the fragile heuristic in the first paragraph, so that curious readers can find this important location immediately.
Otherwise due to changes to a lot of lib/Target code, they may not notice this quickly.

I tried to improve the summary. If it is what your meant, please feel free to update it (looks like phabricator allows it).

LG. Thanks!

It looks like OperandMatchResulTy could be named more generically to support what ParseStatus does in this patch, but unfortunately this is difficult to change as the three enum members have 1000+ references...

I think it could be handled gradually, but I haven't come with a migration plan yet. The most difficult part is tblgenerated methods such as MatchOperandParserImpl, which poison every operand parsing method they call with ParseStatus across all targets.

I haven't investigated closely but I guess some user-defined conversion function may help gradual migration.

Address llvm-ml failures

Add some tests

MaskRay added inline comments.Jun 29 2023, 4:05 PM
llvm/test/MC/MSP430/directive-byte-word-long-invalid.s
4

:[[#@LINE+3]]

[[@LINE]] is deprecated FileCheck syntax. I add a : before the line number just in case 3 matches 13, but I don't feel useful including the filename, so... Feel free to not adopt my leading : suggestion.

Add one more test and revert a couple of wrong changes

Use modern FileCheck syntax in new tests

barannikov88 marked an inline comment as done.Jun 29 2023, 4:41 PM
barannikov88 added inline comments.
llvm/test/MC/MSP430/directive-byte-word-long-invalid.s
4

:[[#@LINE+3]]

[[@LINE]] is deprecated FileCheck syntax. I add a : before the line number just in case 3 matches 13, but I don't feel useful including the filename, so... Feel free to not adopt my leading : suggestion.

I didn't know that it is deprecated, thanks for the heads-up. I used the new syntax in new tests, but kept the legacy syntax in the CSKY test for consistency with existing checks.

About the extra colon, I think it is not worth the effort ;) The chances that it will match a different line seem fairly low.

4

:[[#@LINE+3]]

[[@LINE]] is deprecated FileCheck syntax. I add a : before the line number just in case 3 matches 13, but I don't feel useful including the filename, so... Feel free to not adopt my leading : suggestion.

barannikov88 marked an inline comment as done.

Drop outdated comment

Add a few more checks to WebAssemblyAsmParser

It is difficult to write good tests for WebAssembly directives becasue its parseDirective does not track column numbers. It needs larger refactoring, and I don't have time for this. I'll have to leave it to someone else.

It is difficult to write good tests for WebAssembly directives becasue its parseDirective does not track column numbers. It needs larger refactoring, and I don't have time for this. I'll have to leave it to someone else.

Understood, thanks for calling that out.

barannikov88 added a comment.EditedJun 30 2023, 2:16 PM

It is difficult to write good tests for WebAssembly directives becasue its parseDirective does not track column numbers. It needs larger refactoring, and I don't have time for this. I'll have to leave it to someone else.

Understood, thanks for calling that out.

Sorry, I was wrong about that. Here is an example of the current diagnostics:

/mnt/d/llvm-project/llvm/test/MC/WebAssembly/directives-invalid.s:13:18: error: Unknown type in .globaltype directive: i42
.globaltype sym, i42
                 ^

/mnt/d/llvm-project/llvm/test/MC/WebAssembly/directives-invalid.s:16:22: error: Expected identifier, got:

.globaltype sym, i32,
                     ^
/mnt/d/llvm-project/llvm/test/MC/WebAssembly/directives-invalid.s:16:22: error: Unknown type in .globaltype modifier:

.globaltype sym, i32,
                     ^

/mnt/d/llvm-project/llvm/test/MC/WebAssembly/directives-invalid.s:19:23: error: Unknown type in .globaltype modifier: unmutable
.globaltype sym, i32, unmutable
                      ^

        .globaltype     sym, i32, immutable
/mnt/d/llvm-project/llvm/test/MC/WebAssembly/directives-invalid.s:22:32: error: Expected EOL, instead got: ,
.globaltype sym, i32, immutable,
                               ^

That is, column numbers are printed correctly. Still, there is a room for improvement. If you're going to put this into your backlog, here are a few ideas:

  • split parseDirective into several methods, one per directive;
  • use parseComma(), parseOptionalToken(), parseEOL() instead of custom expect() and isNext();
  • report errors via Error() with SMRange argument instead of "got: something" message;
  • it is conventional to start diagnostic messages with a lowercase letter.

I'll add a few tests verifying the current behavior.

Add a few wasm tests

kosarev added a subscriber: kosarev.Jul 2 2023, 1:54 AM
kosarev added inline comments.
llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
520

Can we just remove this function? +Same for LoongArch, M68k, Xtensa.

barannikov88 added inline comments.Jul 2 2023, 2:29 AM
llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
520

I guess we can. The old method was pure virtual and had to be overridden, the new one does not have to.

barannikov88 marked an inline comment as done.Jul 4 2023, 2:03 PM
barannikov88 added inline comments.
llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
520