This is an archive of the discontinued LLVM Phabricator instance.

[AsmParser][SystemZ][z/OS] Use updated framework in AsmLexer to accept special tokens as Identifiers
ClosedPublic

Authored by anirudhp on Apr 21 2021, 7:51 AM.

Details

Summary
  • Previously, https://reviews.llvm.org/D99889 changed the framework in the AsmLexer to treat special tokens, if they occur at the start of the string, as Identifiers.
  • These are used by the MASM Parser implementation in LLVM, and we can extend some of the changes made in the previous patch to SystemZ.
  • In SystemZ, the special "tokens" referred to here are "_", "$", "@", "#". [_|$|@|#] are already supported as "part" of an Identifier.
  • The changes in this patch ensure that these special tokens, when they occur at the start of the Identifier, are treated as Identifiers.

Diff Detail

Event Timeline

anirudhp created this revision.Apr 21 2021, 7:51 AM
anirudhp requested review of this revision.Apr 21 2021, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2021, 7:51 AM

Just making sure I understand, this is so an identifier can start with # without being a comment?

anirudhp updated this revision to Diff 339313.Apr 21 2021, 10:33 AM
  • Rebase on latest master

Just making sure I understand, this is so an identifier can start with # without being a comment?

No.
Currently, if the AllowAdditionalComments is set to true, any string starting with # will be lexed as a comment. This is to ensure that if it isn't lexed as a comment, we lex it as an identifier and not as an AsmToken::Hash. Like the comment in MCAsmInfo.h for the added attribute says:

/// If the CommentString is also set to "#", setting this option will have
/// no effect, and the string will be lexed as a comment.
epastor added inline comments.Apr 22 2021, 5:57 AM
llvm/include/llvm/MC/MCAsmInfo.h
207–208

Is there a place where we could add a test for this behavior, if it's how we want this to behave? Is there some natural environment/configuration where we expect the CommentString to be set to "#" while AllowHashAtStartOfIdentifier is also true?

anirudhp added inline comments.Apr 22 2021, 7:41 AM
llvm/include/llvm/MC/MCAsmInfo.h
207–208

Sure, I'll update the unit test to test the interaction between the permutations of the two attributes.

By default, CommentString is set to #. But there's another point here. Whenever a comment string starts with # it will be attempted to be lexed as a comment irrespective of what CommentString is set to because the Lexer currently lexes strings that start with # as a possible comment. This is now guarded by the AllowAdditionalComments attribute in MCAsmInfo.h.

So, if AllowHashAtStartOfIdentifier is set to true, then it will be only lexed as an Identifier, if it is not already lexed as a possible comment. If, under the current Lexer's rules, it is possible to be lexed as a comment, it will be done so (and not as an Identifier)

anirudhp updated this revision to Diff 339636.Apr 22 2021, 8:04 AM
  • Added additional tests to test behaviour between setting AllowHashAtStartOfIdentifier and when AsmLexer "decides" to lex a string as a possible comment.
  • Updated comments for a few attributes in MCAsmInfo.h to make it more clear.
  • Rebase on lastest master
llvm/include/llvm/MC/MCAsmInfo.h
207–208

I have added a few more tests.

Thanks for adding tests. LGTM

This revision is now accepted and ready to land.Apr 26 2021, 10:44 AM