This is an archive of the discontinued LLVM Phabricator instance.

[AsmParser][SystemZ][z/OS] Add in support to accept "#" as part of an Identifier token
ClosedPublic

Authored by anirudhp on Mar 24 2021, 9:36 AM.

Details

Summary
  • This patch adds in support to accept the "#" character as part of an Identifier.
  • This support is needed especially for the HLASM dialect since "#" is treated as part of the valid "Alphabet" range
  • The way this is done is by making use of the previous precedent set by the AllowAtInIdentifier field in MCAsmLexer.h. A new field called AllowHashInIdentifier is introduced.
  • The static function IsIdentifierChar is also updated to accept the # character if the AllowHashInIdentifier field is set to true.

Note: The field introduced in MCAsmLexer.h could very well be moved to MCAsmInfo.h. I'm not opposed to it. I decided to put it in MCAsmLexer since there seems to be some sort of precedent already with AllowAtInIdentifier.

Diff Detail

Event Timeline

anirudhp created this revision.Mar 24 2021, 9:36 AM
anirudhp requested review of this revision.Mar 24 2021, 9:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2021, 9:36 AM
llvm/lib/MC/MCParser/AsmLexer.cpp
145–146

nit: This comment should probably be updated to reflect the new changes.

730

same suggestion here as well. This comment may need to be updated.

anirudhp updated this revision to Diff 333031.Mar 24 2021, 9:57 AM
  • Addressing Comments (Changing comment with the updated grammar)
anirudhp marked 2 inline comments as done.Mar 24 2021, 9:57 AM
anirudhp added inline comments.
llvm/lib/MC/MCParser/AsmLexer.cpp
145–146

Good catch. Done.

anirudhp marked an inline comment as done.Mar 24 2021, 9:57 AM
anirudhp updated this revision to Diff 333075.Mar 24 2021, 11:45 AM
  • Addressed clang-tidy warnings/errors
anirudhp edited the summary of this revision. (Show Details)Mar 24 2021, 12:13 PM
anirudhp edited the summary of this revision. (Show Details)

There is one failing mlir unit test which I suspect might be caused by this patch and should be fixed.

There is one failing mlir unit test which I suspect might be caused by this patch and should be fixed.

Its probably these 2 commits and not because of my commit: https://reviews.llvm.org/rGf6e0fc2ddd8e9286e52a9931c833a64366a1ba41 and https://reviews.llvm.org/rGbc888a0fd61aa3a34102593867f76b5600c7a61e. It was fixed recently. I might have to rebase on latest master just to be consistent.

There is one failing mlir unit test which I suspect might be caused by this patch and should be fixed.

Its probably these 2 commits and not because of my commit: https://reviews.llvm.org/rGf6e0fc2ddd8e9286e52a9931c833a64366a1ba41 and https://reviews.llvm.org/rGbc888a0fd61aa3a34102593867f76b5600c7a61e. It was fixed recently. I might have to rebase on latest master just to be consistent.

oh you're right, those are more likely to be the cause. It would be good to rebase and double-check. The changes LGTM

This revision is now accepted and ready to land.Mar 29 2021, 7:34 AM
anirudhp updated this revision to Diff 333879.Mar 29 2021, 8:16 AM
  • Rebasing on latest master

Gentle Ping :)

anirudhp added a reviewer: nickdesaulniers.
anirudhp added a reviewer: MaskRay.
llvm/include/llvm/MC/MCParser/MCAsmLexer.h
151

The only caller is a test?

I think if you add an llvm/test/MC/SystemZ style test, you'd find this code probably isn't working as expected.

It's curious to me why this differs from how AllowAtInIdentifier is set in the constructor for AsmLexer.

llvm/lib/MC/MCParser/AsmLexer.cpp
149

If you flip the order of AllowHash && C == '#' that would be better, since AllowHash is mostly always false for the general case. Feel free to reorder AllowAt and move both of those to be the final checks.

733

do we need a similar method to MCAsmInfo::doesAllowAtInName? I wonder why MCAsmInfo and MCAsmLexer have their own fields for that? Seems like perhaps a (pre-existing) mistake.

llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp
98

it's not clear to me how any of the changes to llvm/lib/ affect these changes to llvm/unittests/ in regards to explicitly calling Lex.

anirudhp added inline comments.Mar 30 2021, 11:01 AM
llvm/include/llvm/MC/MCParser/MCAsmLexer.h
151

The only caller is a test?

For now, yes. There is another patch in waiting (see https://reviews.llvm.org/D98276). Ideally, the call to the setter would be set in the constructor for the HLASMAsmParser class)

I think if you add an llvm/test/MC/SystemZ style test, you'd find this code probably isn't working as expected.

A bit of background:
I probably should have mentioned this in the description, but these changes would be used by the z/OS target for which we cannot currently generate code yet (work for this is ongoing). Hence, the non-existence of a lit test (which would make things very easy for these types of changes).

Since the LLVM community prefers to have a test for any functional code changes, I came up with a unit test in the interim, since this would help "alleviate" some of the wait time (i.e. while we wait for the corresponding ObjectFile/Streamer support for the z/OS target and other patches that might be in review)

To answer your question:
Yes, a test in llvm/test/MC/SystemZ would not work as "intended" because the call to the setter isn't made anywhere. To be more specific, for the s390x-ibm-linux triple, the behaviour would be "unexpected". For an s390x-none-zos triple, the test wouldn't even run because we don't have relevant ObjectFile/Streamer support yet (as mentioned above)

It's curious to me why this differs from how AllowAtInIdentifier is set in the constructor for AsmLexer.

It doesn't differ in terms of the behaviour that it is exposing. I don't really see an issue with moving it to a constructor and providing with similar behaviour (i.e. is MAI's CommentString isn't # set it to true), but since a default value of false was provided at the time of declaration, I didn't see the need for it.

If it is preferred to move to the constructor, I don't mind doing so.

llvm/lib/MC/MCParser/AsmLexer.cpp
733

I do think you're right, I don't see a reason why for the MS-case, especially in the lexing phase, that we should be using the MCAsmInfo::doesAllowAtInName field. I did spend some time thinking about this. To me, it does look like a "pre-existing mistake" in this particular case, but I will admit I am not too familiar with the MS asm semantics.

However, in terms of the behaviour of the two fields:

From the comment in MCAsmInfo.h, it was to "allow @ in a symbol name" and this is used in AsmParser.cpp. You could have a symbol name such as "abc@variant", where "variant" could possibly be a valid symbol variant (MCSymbolRefExpr::VariantKind). If this attribute is set to true, its assumed there is no special symbol variant for it to lookup.

AllowAtInName in MCAsmLexer.h seems exclusively for lexing purposes. Where you want to include "accepting" @ when you see it as part of an AsmToken::Identifier

llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp
98

My apologies. Could you expand a bit on your comment?

Afaik, we need the initial call to Lex to start the Lexer (to start getting the tokens one by one for later processing and checking). I could technically move it into the lexAndCheckTokens function before I run the loop, if that's what you mean?

But the way it was initially setup, I wanted to ensure the flag was set before the Lexer was run, and not during the checking phase, since at that point, the tokens are already Lexed, and we're merely checking, and I couldn't do that if it was encapsulated into a function that was run prior to the call to the setter.

llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp
98

Ah, sorry, I see now that the call to Lex was moved out of the setup code. Yeah that's fine.

MaskRay accepted this revision.Mar 30 2021, 5:21 PM
MaskRay added inline comments.
llvm/include/llvm/MC/MCParser/MCAsmLexer.h
151

Since it will be immediately used and having the utility functions in this patch seem more appropriate, looks good.

anirudhp updated this revision to Diff 334458.Mar 31 2021, 9:08 AM
anirudhp marked an inline comment as done.
  • Addressed Comments x2 (Reordered the check in isIdentifierChar)
anirudhp marked an inline comment as done.Mar 31 2021, 9:08 AM
anirudhp marked an inline comment as done.
nickdesaulniers accepted this revision.Mar 31 2021, 2:03 PM