This is an archive of the discontinued LLVM Phabricator instance.

[MC/AsmLexer] Add '?' (Question) token
ClosedPublic

Authored by barannikov88 on Jun 30 2023, 4:08 AM.

Details

Summary

'?' is a valid token in our downstream target. There seem to be no way
to do target-specific lexing, so just add make AsmParser recognize it.

Diff Detail

Event Timeline

barannikov88 created this revision.Jun 30 2023, 4:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 4:08 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
barannikov88 requested review of this revision.Jun 30 2023, 4:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 4:08 AM

Update a comment

Rearrange switch cases to match the comment

I'll be out for a few days and don't have time to read many patches today.
Do you whether GNU assembler accepts ? for any target? If not, this can be a bit strange...

I'll be out for a few days and don't have time to read many patches today.

It is not urgent and I'm not pushing you. It's just I don't know who else to ask for review, not many people have been contributing to MC layer last years.

Do you whether GNU assembler accepts ? for any target? If not, this can be a bit strange...

Last time I checked, GNU assembler didn't have dedicated lexer. It has helper routines for parsing expressions, identifiers, numbers etc. but otherwise parsing is up to the target, so I guess it can accept pretty much everything.

I wish I could do it by adding some code to target-specific parser, but this is not currently possible: unrecognized characters are replaced with AsmToken::Error by the lexer and they do not even get to MCTargetAsmParser, which is the only customization point (apart from MCAsmInfo callbacks).

Update failing unit test

In MSVC, mangled C++ names start with ? and the symbols are quoted. If your target needs ?, you can use quotes as well?

In GNU assembler, there is an undocumented MRI mode that accepts ?, but we don't support the mode.

In MSVC, mangled C++ names start with ? and the symbols are quoted. If your target needs ?, you can use quotes as well?

'?' is already allowed to be a part of an identifier, and it can also be the first character of an identifier if doesAllowQuestionAtStartOfIdentifier returns true.
I need this symbol in a different context -- it is used as ternary operator in some instructions, e.g. ".float vreg0 = not mask ? vreg1 : vreg2" (assembler syntax is "high level"-ish).
The spaces around '?' are optional.

Note that adding a dedicated token kind for '?' can't break anything, because it used to be a hard error to use it in non-identifier context. This also matches how other "special" identifier characters are handled: '$', '@' and '#'.

MaskRay accepted this revision.Jul 7 2023, 3:19 PM

I agree that this is a way forward and enable disabling ? in the middle of symbol names.

(Unfortunately D72680 did not add ? at symbol start tests, and I am unclear whether ml.exe does accept these symbols.)

This revision is now accepted and ready to land.Jul 7 2023, 3:19 PM
This revision was landed with ongoing or failed builds.Jul 13 2023, 5:46 PM
This revision was automatically updated to reflect the committed changes.