This is an archive of the discontinued LLVM Phabricator instance.

[AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1
ClosedPublic

Authored by anirudhp on Mar 9 2021, 11:29 AM.

Details

Summary
  • This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM (main RFC here)
  • This patch in particular introduces HLASM Parser support for Z machine instructions.
  • The approach taken here was to subclass AsmParser, and make various functions and variables as "protected" wherever appropriate.
  • The HLASMAsmParser class overrides the parseStatement function. Two new private functions parseAsHLASMLabel and parseAsMachineInstruction are introduced as well.

The general syntax is laid out as follows (more information available in HLASM V1R6 Language Reference Manual - Chapter 2 - Instruction Statement Format):

<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
  1. TokA is referred to as the Name Entry. This token is optional
  2. TokB is referred to as the Operation Entry. This token is mandatory.
  3. TokC is referred to as the Operand Entry. This token is mandatory
  4. TokD is referred to as the Remarks Entry. This token is optional
  • If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), Tok B as the Operation Entry and so on.
  • If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field
  • TokC (Operand Entry), no spaces are allowed between OperandEntries. If a space occurs it is classified as an error.
  • TokD if provided is taken as is, and emitted as a comment.

The following additional approach was examined, but not taken:

  • Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.

Testing:

  • This patch doesn't have tests added with it, for the sole reason that MCStreamer Support and Object File support hasn't been added for the z/OS target (yet). Hence, it's not possible generate code outright for the z/OS target. They are in the process of being committed / process of being worked on.
  • Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.

Diff Detail

Event Timeline

anirudhp created this revision.Mar 9 2021, 11:29 AM
anirudhp requested review of this revision.Mar 9 2021, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2021, 11:29 AM
anirudhp edited the summary of this revision. (Show Details)Mar 9 2021, 11:37 AM
anirudhp retitled this revision from [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser to [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1.
anirudhp edited the summary of this revision. (Show Details)Mar 9 2021, 11:40 AM
anirudhp updated this revision to Diff 329427.Mar 9 2021, 11:51 AM
anirudhp updated this revision to Diff 330073.Mar 11 2021, 2:38 PM
  • A bit of refactoring (reduced a bit of duplication of code). Removed functions which no longer need to be under "protected".
uweigand added inline comments.Mar 12 2021, 12:31 AM
llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
511

Maybe just pass MAI into the routine, so it can check those parameters itself?

llvm/lib/MC/MCParser/AsmParser.cpp
6193

I guess now this doesn't really have to be a separate routine any more?

llvm/include/llvm/MC/MCAsmInfo.h
566

nit: Move this between getCommentString and getLabelSuffix to maintain the same order the class members are listed as.

anirudhp updated this revision to Diff 330253.Mar 12 2021, 8:26 AM
  • Addressing comments
anirudhp marked 2 inline comments as done and an inline comment as not done.Mar 12 2021, 8:29 AM
anirudhp added inline comments.
llvm/include/llvm/MC/MCAsmInfo.h
566

Sure. But I'll wait until https://reviews.llvm.org/D97703 is merged in, as its around the same area. When I rebase it, I'll make sure to move it to the right place.

llvm/lib/MC/MCParser/AsmParser.cpp
6193

Yes, you're right. But since the rest of the function is essentially just getTargetParser().ParseInstruction and getTargetParser().MatchAndEmitInstruction(), I didn't want to decompose logic in checkAndGenerateLOCDirectiveForInstruction into the same function. I'd rather have it as a separate routine, since it does something different.

anirudhp marked an inline comment as done.Mar 12 2021, 8:29 AM
uweigand added inline comments.Mar 12 2021, 9:14 AM
llvm/lib/MC/MCParser/AsmParser.cpp
6193

My point was that it might be preferable to minimize changes to existing code. Just split the old parseStatement in two parts. Leave the second part (the new parseAndMatchAndEmitTargetInstruction) where it is above. The diff should just show the end of one function and the header of the new function being inserted, the rest of the lines are unchanged.

anirudhp updated this revision to Diff 330756.Mar 15 2021, 12:07 PM
  • Addressed Comments x2
  • Removing an unneeded function based on refactoring
anirudhp marked an inline comment as done.Mar 15 2021, 12:07 PM

Ping Ping! :) :)

I have two nits, but overall LGTM, your patch may need to be rebased because CI is failing

llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
279

nit: Do we need to save this to a bool? We can just do the following if there is only one use.
if (MAI->getEmitGNUAsmStartIndentationMarker())

anirudhp updated this revision to Diff 335290.Apr 5 2021, 10:31 AM
  • Rebasing on latest master + setting the AllowHashInIdentifier in HLASMAsmParser since it has landed.
anirudhp marked 2 inline comments as done.Apr 5 2021, 10:31 AM

Hi All!
Does anyone have feedback on the patch / approach taken? Any opinions would be greatly welcome and appreciated! :)

Is it possible to add partial unit tests like you did in your previous patches?

Is it possible to add partial unit tests like you did in your previous patches?

No (atleast from the top of my head). The very instantiation of the Parser instance is dependent on the target triple itself. And since we don't really have other object writer/streamer support yet, there's no way to instantiate for z/OS (even in the unit test). Also, writing tests end to test the end to end implementation for the parser would be a tremendous amount of work (comparing it to writing simple mc tests).

Kai added inline comments.Apr 21 2021, 11:26 AM
llvm/include/llvm/MC/MCAsmInfo.h
130

The comment contradicts the source below.

anirudhp updated this revision to Diff 341296.Apr 28 2021, 1:06 PM
  • Rebased on latest to bring in new changes
  • Corrected erroneous comment
anirudhp marked an inline comment as done.Apr 28 2021, 1:06 PM

Ping! :)

Kai accepted this revision.Apr 30 2021, 8:54 AM

LGTM.

llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1377

Nit: If you combine both ifs, then it's more clearer that you just check that there is no space.

This revision is now accepted and ready to land.Apr 30 2021, 8:54 AM
anirudhp updated this revision to Diff 342545.May 3 2021, 2:35 PM
  • Addressing small comment regarding coalescing if statements + rebase on latest.
anirudhp updated this revision to Diff 343260.May 5 2021, 6:23 PM
  • Rebase on latest master + update various HLASM specific setters from MCAsmLexer.
anirudhp updated this revision to Diff 343506.May 6 2021, 2:45 PM
  • Rebase on latest + changes to account for API change of MCContext and MCObjectFileInfo
uweigand added inline comments.May 7 2021, 7:53 AM
llvm/lib/MC/MCParser/AsmParser.cpp
6193

Can you please move this new parseAndMatchAndEmitTargetInstruction routine up in the source file to immediately after AsmParser::parseStatement ? That should make the diff smaller and easier to review.

llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1389–1415

You can avoid the duplicate error message by moving the code in here up before the isNot(Asm::EndOfStatement) check into a separate

if (isParsingHLASM() && getTok().is(AsmToken::Space))

block

1399

Should be skip the leading space(s), or are they intentionally part of the "remark"?

anirudhp updated this revision to Diff 344158.May 10 2021, 12:48 PM
anirudhp marked an inline comment as done.
  • Addressed comments.
anirudhp marked 3 inline comments as done.May 10 2021, 12:49 PM
anirudhp added inline comments.
llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1399

They are intentionally part of the "Remark" field.

anirudhp marked an inline comment as done.May 10 2021, 12:49 PM
uweigand added inline comments.May 11 2021, 4:04 AM
llvm/lib/MC/MCParser/AsmParser.cpp
2291

Maybe those two lines should move into parseAndMatchAndEmitTargetInstruction, they are identical between the two callers.

2299

Can you rename the parameters so that you don't need to change the code below just because the names are different? Ideally, the rest of the function should have an empty diff to the current source.

2304

Here's some more spurious differences that should be removed.

llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1386

I suggested to move the HLASM remark handling here (*before* the EndOfStatement check), I still think this would be a better place. Is there any reason why this doesn't work?

anirudhp added inline comments.May 11 2021, 7:57 AM
llvm/lib/MC/MCParser/AsmParser.cpp
2304

clang-tidy complains that the variable i is lower case, when it should be upper case. Is it okay if we ignore clang-tidy in this case in favour of a more unchanged diff or vice-versa? I'm not sure which is preferred / more important.

uweigand added inline comments.May 11 2021, 8:29 AM
llvm/lib/MC/MCParser/AsmParser.cpp
2304

Given that this is already in the current sources, I'd prefer to leave i in lower case. Once the diff is gone, clang-tidy shouldn't run on in anymore anyway ...

anirudhp updated this revision to Diff 344530.May 11 2021, 12:15 PM
  • Addressed Comments (Moved location of parseAndMatchAndEmitTargetInstruction to make the diff easier to view, changed SystemZAsmParser to emit Remark fields differently)
anirudhp marked 5 inline comments as done.May 11 2021, 12:15 PM
uweigand added inline comments.May 12 2021, 2:31 AM
llvm/lib/MC/MCParser/AsmParser.cpp
6182

Having two variables that are always the complement of one another seems odd, in particular because ShouldParseAsLabel doesn't appear to be used anywhere?

6203

Why not simply "return parseAsMachineInstruction"?

llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1396

This still looks odd to me. What seems to be going on is that the LexUntilEndOfStatement lexes everything including the initial space. The explicit Lex then lexes the EndOfStatement marker.

I believe that would be wrong: the EndOfStatement is already lexed below (last line of this function). And the comment just immediately following here seems to imply that the "Lex" was intended to lex that initial *space*. But for that to be true, wouldn't the Lex have to be done *before* the LexUntilEndOfStatement?

anirudhp added inline comments.May 12 2021, 8:00 AM
llvm/lib/MC/MCParser/AsmParser.cpp
6182

I guess I could probably remove the assert itself. The if else block determines that either one of the flags would be set but not both, so I don't see the point in the assert. There's no other path that would would set both of them to true. If a flag is set incorrectly, the parsing logic for it should catch it.

ShouldParseAsLabel doesn't appear to be used anywhere?

Currently, no. But since these patches are staggered, the next sequence of patches once this lands, will add in support for parsing in labels. But I agree with your overall comment, that one variable should do the trick. Two are unnecessary. I will remove the ShouldParseAsLabel flag.

llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1396

This still looks odd to me. What seems to be going on is that the LexUntilEndOfStatement lexes everything including the initial space. The explicit Lex then lexes the EndOfStatement marker.

Hmm, I don't think this is exactly right. The Lexer's SkipSpace attribute is set to false in AsmParser.cpp. Thus, spaces aren't "ignored" as part of regular tokens, and you have an actual AsmToken::Space.

For example, When we have something like this:
" abc def ghij \n"
The sequence of AsmTokens seen are: Space, Identifier, Space, Identifier, Space, Identifier, Space, EndOfStatement"

Relevant implementation in AsmLexer.cpp (the LexToken() function).

case '\t':
  IsAtStartOfStatement = OldIsAtStartOfStatement;
  while (*CurPtr == ' ' || *CurPtr == '\t')
    CurPtr++;
  if (SkipSpace)
    return LexToken(); // Ignore whitespace.
  else
    return AsmToken(AsmToken::Space, StringRef(TokStart, CurPtr - TokStart));

The CurPtr field always jumps over the space, but if SkipSpace is set to false, we don't lex the "next" token, we just return an AsmToken::Space where the onus is on the user to do with it what they want. If its set to true, we just recursively call LexToken() again to return the next possible non-space AsmToken.

If you take a look at the LexUntilEndOfStatement function:

StringRef AsmLexer::LexUntilEndOfStatement() {
  TokStart = CurPtr;

  while (!isAtStartOfComment(CurPtr) &&     // Start of line comment.
         !isAtStatementSeparator(CurPtr) && // End of statement marker.
         *CurPtr != '\n' && *CurPtr != '\r' && CurPtr != CurBuf.end()) {
    ++CurPtr;
  }
  return StringRef(TokStart, CurPtr-TokStart);
}

CurPtr already points to the first non-space character, and like you mentioned it goes on until the "end of statement" is reached. After that, a StringRef is returned.

So, when you come into the relevant block in the code:

    if (isParsingHLASM() && getTok().is(AsmToken::Space)) {
      // We've confirmed that there is a Remark field.
      StringRef Remark(getLexer().LexUntilEndOfStatement());
      Parser.Lex();
....
....
....

We've confirmed that there is a leading space(s), and we want to basically get everything after a space until the end of the statement, to emit as a comment. Once we've done that however, the current AsmToken is still an AsmToken::Space, so when we call Lex, we're still "lexing" the AsmToken::Space, and returning the next token to be lexed which is now a "\n", because we've effectively moved the CurPtr to the end of statement (in the previous call). The point to note here is that the LexUntilEndOfStatement function doesn't return an AsmToken, it simply returns a StringRef from CurPtr until it encounters the first \n (or the carriage return or the end of the string), while advancing CurPtr. So when we exit the block, we still have a AsmToken::EndOfStatement to be checked.

The other way of looking at it is possibly the following (this should be the same behaviour, but I haven't tried it yet, so some pseudo code).

if (isParsingHLASM() && getTok().is(AsmToken::Space)) {
     // Lex the leading spaces first
     
     // In a loop, lex all tokens until you reach the end of statement token
     // aggregating all the "returned" values into a string.
     // i.e. while (getTok().isNot(AsmToken::EndOfStatement) {
     //     RemarkString += getTok().getString()
     //     Parser.Lex()
     // }

     // Emit aggregated string as a comment 
}

Apologies for the very long comment. Please let me know if I've missed something / misunderstood something.

anirudhp updated this revision to Diff 344860.May 12 2021, 10:14 AM
  • Addressed comments (removed unneeded variable in AsmParser.cpp and assert)
anirudhp marked 2 inline comments as done.May 12 2021, 10:20 AM
uweigand accepted this revision.May 18 2021, 8:27 AM

OK, this looks good to me now. Thanks!

llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
1396

Thanks for the details. I think you're correct here. I guess I was confused because LexUntilEndOfStatement directly refers to CurPtr, which is different from the CurTok maintained by the lexer. But existing users of LexUntilEndOfStatement do it the same as you do here, so that should be fine.

anirudhp marked 2 inline comments as done.May 18 2021, 9:12 AM
anirudhp updated this revision to Diff 346198.May 18 2021, 10:00 AM
  • Rebase on latest.