This is an archive of the discontinued LLVM Phabricator instance.

[AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings.
ClosedPublic

Authored by anirudhp on Mar 24 2021, 11:24 AM.

Details

Summary
  • Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute.
  • However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference)

For example:
Consider a target which sets the CommentString attribute to '*'.
The following strings are all lexed as comments.

"# abc" -> comment
"// abc" -> comment
"/* abc */ -> comment
"* abc" -> comment
  • In HLASM however, only "*" is accepted as a comment string, and nothing else.
  • To achieve this, an additional attribute (AllowAdditionalComments) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#".

Depends on https://reviews.llvm.org/D99277

Diff Detail

Event Timeline

anirudhp created this revision.Mar 24 2021, 11:24 AM
anirudhp requested review of this revision.Mar 24 2021, 11:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2021, 11:24 AM
anirudhp updated this revision to Diff 333082.Mar 24 2021, 12:08 PM
  • Few formatting changes + making the unittest a bit more robust in the tokens it was checking
anirudhp updated this revision to Diff 334823.Apr 1 2021, 2:01 PM
  • Rebasing on latest master.

If the C and BCPL style comments don't cause conflict with HLASM, I'd rather we don't add any code. We don't necessarily reject syntax which are unlikely used incorrectly by the user.
If such distinction is really needed, maybe name the variable with a positive meaning, e.g. AllowCCommentMarker, instead of Only* (Disallow*).

If the C and BCPL style comments don't cause conflict with HLASM, I'd rather we don't add any code. We don't necessarily reject syntax which are unlikely used incorrectly by the user.

They do cause "conflict". HLASM currently does not treat the additional comment strings as "comments".

If such distinction is really needed, maybe name the variable with a positive meaning, e.g. AllowCCommentMarker, instead of Only* (Disallow*).

How about a new field called AllowAdditionalComments instead? This will encapsulate allowing the C-style line and block comments, and the "#" string as additional comment strings in addition to the CommentString attribute.

MaskRay added a comment.EditedApr 4 2021, 11:45 AM

If the C and BCPL style comments don't cause conflict with HLASM, I'd rather we don't add any code. We don't necessarily reject syntax which are unlikely used incorrectly by the user.

They do cause "conflict". HLASM currently does not treat the additional comment strings as "comments".

This does not necessarily mean a conflict. As an concrete example, ld.lld treats # as an additional comment marker for linker scripts. GNU ld don't do this.
The ld.lld way can be seen as an extension. Since there is no meaningful syntax starting with # in GNU ld linker scripts, reserving # for an additional comment marker is totally fine.

My question here is similar. If there is no meaningful syntax starting with /* or //, we don't necessarily detect such cases.

If such distinction is really needed, maybe name the variable with a positive meaning, e.g. AllowCCommentMarker, instead of Only* (Disallow*).

How about a new field called AllowAdditionalComments instead? This will encapsulate allowing the C-style line and block comments, and the "#" string as additional comment strings in addition to the CommentString attribute.

Looks good.

Note that # is special - some dialects don't use it for comment markers at an arbitrary column but allows it at the line beginning.
If HLASM does not use # at line beginning as useful syntax, we don't necessarily detect it.

This does not necessarily mean a conflict. As an concrete example, ld.lld treats # as an additional comment marker for linker scripts. GNU ld don't do this.
The ld.lld way can be seen as an extension. Since there is no meaningful syntax starting with # in GNU ld linker scripts, reserving # for an additional comment marker is totally fine.
My question here is similar. If there is no meaningful syntax starting with /* or //, we don't necessarily detect such cases.

Ahh I see. I think I understand what you're saying. With respect to the extent of the support we are providing for HLASM (only for 1st param of inline asm block only concerned with Z machine instructions), I don't believe there would be any meaningful syntax starting with // or /* .. */, or # (there shouldn't be a reason for a user to use it). However, we also want to ensure that for the z/OS target, for the HLASM syntax support we are providing, we don't leak any accepted gnu syntax.

i.e. Let's say as a usecase, a user for z/OS using the HLASM syntax, does use the // .... , /* ... */ , # ..... in an inline asm statement (even though it might be not anything meaningful).

Under the current behaviour, llvm's assembler/asm lexer will lex it as a line comment. And since we want to reject any accepted gnu syntax, we want to ensure that those "additional comment" syntaxes are not accepted, i.e. they are lexed as regular tokens (non line comments in this case).

anirudhp updated this revision to Diff 335256.Apr 5 2021, 7:47 AM
  • Addressed Comments
  • Renamed MCAsmInfo attribute with a more positive meaning
  • Updated tests and AsmLexer with new attribute name.

Changes LGTM. You will need to update the title and description to reflect the new approach you took to allow additional comment strings, rather than disable them.

This revision is now accepted and ready to land.Apr 7 2021, 8:08 AM
anirudhp retitled this revision from [AsmParser][SystemZ][z/OS] Add in support to only use CommentString as a possible comment syntax to [AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings..Apr 7 2021, 8:11 AM
anirudhp edited the summary of this revision. (Show Details)
anirudhp added a comment.EditedApr 8 2021, 9:43 AM

@MaskRay Would you happen to have any other review comments on this patch that I can address?

neumannh added a subscriber: neumannh.
neumannh removed a subscriber: neumannh.
myiwanch accepted this revision.Apr 12 2021, 2:40 PM

The changes to MCAsmInfo are pretty small, so it shouldn't affect others too much.
Looks good

anirudhp set the repository for this revision to rG LLVM Github Monorepo.Apr 13 2021, 6:43 AM
anirudhp updated this revision to Diff 337136.Apr 13 2021, 6:51 AM
  • Updating repo to rg Github monorepo (no functional changes)
anirudhp updated this revision to Diff 337140.Apr 13 2021, 7:03 AM
  • Rebase on latest master