- Previously, https://reviews.llvm.org/D72680 introduced a new attribute called AllowSymbolAtNameStart (in relation to the MAsmParser changes) in MCAsmInfo.h which (according to the comment in the header) allows the following behaviour:
/// This is true if the assembler allows $ @ ? characters at the start of /// symbol names. Defaults to false.
- However, the usage of this field in AsmLexer.cpp doesn't seem completely accurate* for a couple of reasons.
default: if (MAI.doesAllowSymbolAtNameStart()) { // Handle Microsoft-style identifier: [a-zA-Z_$.@?][a-zA-Z0-9_$.@#?]* if (!isDigit(CurChar) && isIdentifierChar(CurChar, MAI.doesAllowAtInName(), AllowHashInIdentifier)) return LexIdentifier(); }
- The Dollar and At tokens, when occurring at the start of the string, are treated as separate tokens (AsmToken::Dollar and AsmToken::At respectively) and not lexed as an Identifier.
- I'm not too sure why MAI.doesAllowAtInName() is used when AllowAtInIdentifier could be used. For X86 platforms, afaict, this shouldn't be an issue, since the CommentString attribute isn't "@". (alternatively the call to the setter can be set anywhere else as needed). The AllowAtInName does have an additional important meaning, but in the context of AsmLexer, shouldn't mean anything different compared to AllowAtInIdentifier
My proposal is the following:
- Introduce 3 new fields called AllowQuestionTokenAtStartOfString, AllowDollarTokenAtStartOfString and AllowAtTokenAtStartOfString in MCAsmInfo.h which will encapsulate the previously documented behaviour of "allowing $, @, ? characters at the start of symbol names")
- Introduce these fields where "$", "@" are lexed, and treat them as identifiers depending on whether Allow[Dollar|At]TokenAtStartOfString is set.
- For the sole case of "?", append it to the existing logic for treating a "default" token as an Identifier.
z/OS (HLASM) will also make use of some of these fields in follow up patches.
completely accurate* - This was based on the comments and the intended behaviour the code. I might have completely misinterpreted it, and if that is the case my sincere apologies. We can close this patch if necessary, if there are no changes to be made :)
Depends on https://reviews.llvm.org/D99374
It might be better to name these ...AtStartOfIdentifier (the previous nomenclature was ...AtNameStart, that works too).
Also nitpick: I would consider dropping Token from the names or using Symbol since I'd argue that these symbols never become tokens in this case. That is super pedantic though. :)