This is an archive of the discontinued LLVM Phabricator instance.

[AsmParser] Hash is not a comment on some targets
ClosedPublic

Authored by olista01 on Aug 7 2017, 8:58 AM.

Details

Summary

The '#' token is not a comment for all targets (on ARM and AArch64 it marks an immediate operand), so we shouldn't treat it as such.

Comments are already converted to AsmToken::EndOfStatement by AsmLexer::LexLineComment, so this check was unnecessary.

Diff Detail

Repository
rL LLVM

Event Timeline

olista01 created this revision.Aug 7 2017, 8:58 AM
niravd requested changes to this revision.Aug 7 2017, 11:00 AM

LexLineComment will only be called when we # is a asm string line comment or we're at the start of a statement. This change would miss the case of a hash-based (preprocessor) EOL comment trailing a non-empty line which we do need to catch. I believe (but am not 100% certain) that I did find a live assembly case where the # as line comment interpreation for AArch64 was used.

The current variation between preprocessor assembly and preprocessed assembly is definitely non-ideal in that some typos will be interpreted as valid parses but AFAIK there's no context in the in-tree targets at least where its ambiguous whether the hash signals a comment or not. All things being equal I think our "preprocessor" behavior should be consistent across targets.

If we want to soften this I suggest we make AsmLexer Target dependent as well and parse Hash token as EndOfStatement.

This revision now requires changes to proceed.Aug 7 2017, 11:00 AM

Do you have an example of a situation where the preprocessor can emit a # mid-line, which is expected to be ignored?

According to the CPP docs [0], line-markers and other directives will always be emitted with the # as the first token on the line. The code in AsmLexer::LexToken already turns lines like this into AsmToken::HashDirective (for line-markers) or AsmToken::EndOfStatement (for any other statement starting with #, regardless of the target's comment token).

[0] https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html

niravd added a comment.Aug 8 2017, 8:50 AM

Hmm, looking around the only cases I see is that after an assembly label # non-directives are accepted by GCC (here's an example in the wild: https://github.com/android/platform_bionic/blob/master/libc/arch-x86/bionic/__bionic_clone.S). Looking at the parser we do have an explicit check for that in the parser so this should be okay. I also see that at least the arm GCC 4.8.4 errors on hash non-directives at the end of line with various assembly directives / instructions. I suspect there are more test cases that will need to be fixed.

This change should also modify MCAsmParser::parseToken and parseEOL and parseStatement where we special case Hash tokens as comments. A quick check on the tests show a few 2 more llvm tests that need minor fixing up.

olista01 updated this revision to Diff 110216.Aug 8 2017, 9:26 AM
olista01 edited edge metadata.
niravd accepted this revision.Aug 8 2017, 10:31 AM

LGTM.

This revision is now accepted and ready to land.Aug 8 2017, 10:31 AM
This revision was automatically updated to reflect the committed changes.