'?' is a valid token in our downstream target. There seem to be no way
to do target-specific lexing, so just add make AsmParser recognize it.
Details
- Reviewers
MaskRay - Commits
- rG682fe17e08c5: [MC/AsmLexer] Add '?' (Question) token
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'll be out for a few days and don't have time to read many patches today.
Do you whether GNU assembler accepts ? for any target? If not, this can be a bit strange...
It is not urgent and I'm not pushing you. It's just I don't know who else to ask for review, not many people have been contributing to MC layer last years.
Do you whether GNU assembler accepts ? for any target? If not, this can be a bit strange...
Last time I checked, GNU assembler didn't have dedicated lexer. It has helper routines for parsing expressions, identifiers, numbers etc. but otherwise parsing is up to the target, so I guess it can accept pretty much everything.
I wish I could do it by adding some code to target-specific parser, but this is not currently possible: unrecognized characters are replaced with AsmToken::Error by the lexer and they do not even get to MCTargetAsmParser, which is the only customization point (apart from MCAsmInfo callbacks).
In MSVC, mangled C++ names start with ? and the symbols are quoted. If your target needs ?, you can use quotes as well?
In GNU assembler, there is an undocumented MRI mode that accepts ?, but we don't support the mode.
'?' is already allowed to be a part of an identifier, and it can also be the first character of an identifier if doesAllowQuestionAtStartOfIdentifier returns true.
I need this symbol in a different context -- it is used as ternary operator in some instructions, e.g. ".float vreg0 = not mask ? vreg1 : vreg2" (assembler syntax is "high level"-ish).
The spaces around '?' are optional.
Note that adding a dedicated token kind for '?' can't break anything, because it used to be a hard error to use it in non-identifier context. This also matches how other "special" identifier characters are handled: '$', '@' and '#'.
I agree that this is a way forward and enable disabling ? in the middle of symbol names.
(Unfortunately D72680 did not add ? at symbol start tests, and I am unclear whether ml.exe does accept these symbols.)