This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Allow slash-star comments in instruction operands
Needs ReviewPublic

Authored by abel-bernabeu on Jun 15 2023, 3:45 AM.

Details

Summary
The following commenting style was unsupported, leading to compilation
errors:

```
unsigned long int dst;
__asm__ __volatile__(
  "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n"
  "add zero, %[dst], %[dst]\n"
  : [ dst ] "=r"(dst)
  :
  :);
```

A code review of the top level parser (AsmParser class) showed that
it was the backend's responsibility to handle the comments, but
RISC-V's backend did not handle the comments in any way.

Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain, was suggested by Sergei
Barannikov in a Discourse discussion thread:

https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8

In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation. Here we just do the RISC-V work.

We would also like to mention David Spikett from Arm's community for pointing
out where to start looking within the LLVM code base.

Co-Authored-By: Sergei Barannikov <barannikov88@gmail.com>
Co-Authored-By: Fangrui Song <i@maskray.me>

Differential Revision: https://reviews.llvm.org/D153008

Diff Detail

Unit TestsFailed

Event Timeline

abel-bernabeu created this revision.Jun 15 2023, 3:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 3:45 AM
abel-bernabeu requested review of this revision.Jun 15 2023, 3:45 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 15 2023, 3:45 AM
abel-bernabeu edited the summary of this revision. (Show Details)Jun 15 2023, 3:48 AM
abel-bernabeu edited the summary of this revision. (Show Details)
abel-bernabeu edited the summary of this revision. (Show Details)Jun 15 2023, 3:57 AM
abel-bernabeu added a reviewer: rogfer01.
DavidSpickett added a subscriber: DavidSpickett.

Is the comment here relevant? https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8 Does this patch do that already?

Also is it a problem that the ignored comments are not seen in the output? Perhaps you are just not checking for those bits. For comments on the end, those are propagated to the assembly output I know that much.

abel-bernabeu added a comment.EditedJun 15 2023, 4:07 AM

For the customer who reported the problem, the comments are in the input source (doing their job explaining what the operands are).

Now, if a comment is not seen when compiling with "-S" it is less of a problem than having the compilation not succeeding. I did not see an obvious way to pass those comments to the AsmParser instance.

My understanding is that the ParseInstruction interface needs to be extended for the comments to be collected and passed to the top level parser (similarly to what is done with the operands).

rogfer01 added a comment.EditedJun 15 2023, 5:37 AM

Hi @abel-bernabeu, I did a quick experiment replacing all the calls to getLexer().Lex() with getParser().Lex() and now your second testcase is accepted and looks like this in the output

f2:
	addi	sp, sp, -32
	sd	ra, 24(sp)
	sd	s0, 16(sp)
	addi	s0, sp, 32
	#APP
	lui	a0, 1	# this is fine 	# this should also be fine 
	addiw	a0, a0, 564
	add	zero, a0, a0

	#NO_APP

I understand this is the best we can do within assembly syntax.

I'm not suggesting we should change all of them. There are ~20 occurrences of getLexer.Lex() so I think it should not be too onerous to review each one.

What do you think?

[RISCV] Allow slash-star comments in instruction operands

It has been reported by one of Esperanto's customers that slash-start
comments ("/*") within inline assembly were only allowed before the
first instruction operand or at the end of the lines. Those comments
were, however, not allowed when interleaved within the operands.

An example follows:

unsigned long int dst;
__asm__ __volatile__(
  "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n"
  "add zero, %[dst], %[dst]\n"
  : [ dst ] "=r"(dst)
  :
  :);

A code review of the top level parser (AsmParser class) showed that
when comments were placed before the instruction operand or at end of
a line, then they were gracefully handled irrespective of the backend.
When the comments were interleaved within the instruction operands it
was the backend's responsibility to handle the comments.

RISC-V's backend did not handle the comments in any way.

Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain solution was suggested
by Sergei Barannikov in a Discourse discussion thread:

https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8

In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation.

The getLexer().Lex() approach relies on the user to explicitly handle
the comments, whereas the suggested getParser().Lex() alternive already
handles the comments in the same way as done for non-target-specific
assembly directives.

Here we just do the RISC-V work. Other backends should also do their own
review.

In addition to Sergei Barannikov, I would also we thank David Spikett
from Arm's community for pointing out where to start looking within the
LLVM code base, and also the patch reviewers.
abel-bernabeu edited the summary of this revision. (Show Details)Jun 15 2023, 7:00 AM

[RISCV] Allow slash-star comments in instruction operands

It has been reported by one of Esperanto's customers that slash-start
comments ("/*") within inline assembly were only allowed before the
first instruction operand or at the end of the lines. Those comments
were, however, not allowed when interleaved within the operands.

An example follows:

unsigned long int dst;
__asm__ __volatile__(
  "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n"
  "add zero, %[dst], %[dst]\n"
  : [ dst ] "=r"(dst)
  :
  :);

A code review of the top level parser (AsmParser class) showed that
when comments were placed before the instruction operand or at end of
a line, then they were gracefully handled irrespective of the backend.
When the comments were interleaved within the instruction operands it
was the backend's responsibility to handle the comments.

RISC-V's backend did not handle the comments in any way.

Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain solution was suggested
by Sergei Barannikov in a Discourse discussion thread:

https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8

In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation.

The getLexer().Lex() approach relies on the user to explicitly handle
the comments, whereas the suggested getParser().Lex() alternive already
handles the comments in the same way as done for non-target-specific
assembly directives.

Here we just do the RISC-V work. Other backends should also do their own
review.

In addition to Sergei Barannikov, I would also we thank David Spikett
from Arm's community for pointing out where to start looking within the
LLVM code base, and also the patch reviewers.

abel-bernabeu edited the summary of this revision. (Show Details)Jun 15 2023, 7:01 AM

I was a bit reluctant to start a discussion on migrating getLexer().Lex() to getParser().Lex(), but I guess it makes sense to do it now rather than deferring.

It is cleaner now, I agree.

[RISCV] Allow slash-star comments in instruction operands

It has been reported by one of Esperanto's customers that slash-start
comments ("/*") within inline assembly were only allowed before the
first instruction operand or at the end of the lines. Those comments
were, however, not allowed when interleaved within the operands.

An example follows:

```
unsigned long int dst;
__asm__ __volatile__(
  "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n"
  "add zero, %[dst], %[dst]\n"
  : [ dst ] "=r"(dst)
  :
  :);
```

A code review of the top level parser (AsmParser class) showed that
when comments were placed before the instruction operand or at end of
a line, then they were gracefully handled irrespective of the backend.
When the comments were interleaved within the instruction operands it
was the backend's responsibility to handle the comments.

RISC-V's backend did not handle the comments in any way.

Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain solution was suggested
by Sergei Barannikov in a Discourse discussion thread:

https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8

In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation.

The getLexer().Lex() approach relies on the user to explicitly handle
the comments, whereas the suggested getParser().Lex() alternive already
handles the comments in the same way as done for non-target-specific
assembly directives.

Here we just do the RISC-V work. Other backends should also do their own
review.

In addition to Sergei Barannikov, I would also like to thank David Spikett
from Arm's community for pointing out where to start looking within the
LLVM code base, and also the patch reviewers.
abel-bernabeu edited the summary of this revision. (Show Details)Jun 15 2023, 7:09 AM

[RISCV] Allow slash-star comments in instruction operands

It has been reported by one of Esperanto's customers that slash-start
comments ("/*") within inline assembly were only allowed before the
first instruction operand or at the end of the lines. Those comments
were, however, not allowed when interleaved within the operands.

An example follows:

unsigned long int dst;
__asm__ __volatile__(
  "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n"
  "add zero, %[dst], %[dst]\n"
  : [ dst ] "=r"(dst)
  :
  :);

A code review of the top level parser (AsmParser class) showed that
when comments were placed before the instruction operand or at end of
a line, then they were gracefully handled irrespective of the backend.
When the comments were interleaved within the instruction operands it
was the backend's responsibility to handle the comments.

RISC-V's backend did not handle the comments in any way.

Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain, was suggested by Sergei
Barannikov in a Discourse discussion thread:

https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8

In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation.

The getLexer().Lex() approach relies on the user to explicitly handle
the comments, whereas the suggested getParser().Lex() alternive already
handles the comments in the same way as done for non-target-specific
assembly directives.

Here we just do the RISC-V work. Other backends should also do their own
review.

In addition to Sergei Barannikov, I would also like to thank David Spikett
from Arm's community for pointing out where to start looking within the
LLVM code base, and also the patch reviewers.

abel-bernabeu edited the summary of this revision. (Show Details)Jun 15 2023, 7:14 AM
DavidSpickett added inline comments.Jun 15 2023, 7:18 AM
clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c
23 ↗(On Diff #531746)

Check for the comment content here too? # this is fine # this should also be fine

barannikov88 added a subscriber: barannikov88.EditedJun 15 2023, 7:43 AM

Does the test have to be a C source? Why not just plain asm (fed into llvm-mc)?

PS
Please don't repeat the commin message on each arc diff, it is noisy. Just write what has been changed since the last revision, e.g. "Address review comments" or "Apply clang-format".

jrtc27 requested changes to this revision.Jun 15 2023, 7:51 AM

Clang tests should not compile to asm. You want an IR test.

This revision now requires changes to proceed.Jun 15 2023, 7:51 AM

Clang tests should not compile to asm. You want an IR test.

Jessica, are there any exceptions for tests is under CodeGen/RISCV intended to exercise the assembly parser?

I have just written a test that reproduces the way I manually test the feature.

Clang tests should not compile to asm. You want an IR test.

Jessica, are there any exceptions for tests is under CodeGen/RISCV intended to exercise the assembly parser?

I have just written a test that reproduces the way I manually test the feature.

No. You can test the assembly parser just as easily from IR.

I'll also note that your assembly isn't particularly minimal, which it should be, unless the add zero, %[dst], %[dst] lines are doing something I'm not aware of?

abel-bernabeu added a comment.EditedJun 15 2023, 10:01 AM

Thanks everyone, taking note of all the comments for improving the test:

  • Simplify the test (can be one instruction, no problem).
  • Check at IR level
  • Check for the comments being placed in the output
  • Do "arc diff" with one-line descriptions

Will update before tomorrow at mid-day (CET).

MaskRay added a comment.EditedJun 15 2023, 11:24 AM

You may add a test file beside llvm/test/CodeGen/RISCV/inline-asm*.
Perhaps llvm/test/MC/RISCV/ is a better choice.

I hope that this patch focuses on getLexer().Lex() instances that actually cause a problem.

Many getLexer().Lex() instances followed by is(...) form a pattern that can be rewritten in a better way. I'll check these instances and fix them.

Simplified the test. Also I check that the comments are a carried over to the assembly output.

This new update still applies many unneeded getParser().Lex(); and adds a test at a wrong layer (clang/test/CodeGen):

abel-bernabeu added a comment.EditedJun 16 2023, 9:13 AM

Clang tests should not compile to asm. You want an IR test.

Jessica, are there any exceptions for tests is under CodeGen/RISCV intended to exercise the assembly parser?

I have just written a test that reproduces the way I manually test the feature.

No. You can test the assembly parser just as easily from IR.

I'll also note that your assembly isn't particularly minimal, which it should be, unless the add zero, %[dst], %[dst] lines are doing something I'm not aware of?

Jessica, the LLVM IR form contains the inline assembly still unparsed. Look, this is what I get:

abel@Docker:~/work/llvm-project/build$ ./bin/clang -fPIC --target=riscv64-unknown-elf   -o - -S  -emit-llvm  ../clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c 
; ModuleID = '../clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c'
source_filename = "../clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-unknown-elf"

; Function Attrs: noinline nounwind optnone
define i64 @f() #0 {
  %1 = alloca i64, align 8
  %2 = call i64 asm "li /* this is fine */ $0 , /* this is also fine */ 0 /* and last but not least */\0A", "=r"() #1, !srcloc !6
  store i64 %2, ptr %1, align 8
  %3 = load i64, ptr %1, align 8
  ret i64 %3
}

attributes #0 = { noinline nounwind optnone "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+c,+m,+relax,-d,-e,-experimental-smaia,-experimental-ssaia,-experimental-zca,-experimental-zcb,-experimental-zcd,-experimental-zcf,-experimental-zcmp,-experimental-zcmt,-experimental-zfa,-experimental-zfbfmin,-experimental-zicond,-experimental-zihintntl,-experimental-ztso,-experimental-zvbb,-experimental-zvbc,-experimental-zvfbfmin,-experimental-zvfbfwma,-experimental-zvfh,-experimental-zvkg,-experimental-zvkn,-experimental-zvknc,-experimental-zvkned,-experimental-zvkng,-experimental-zvknha,-experimental-zvknhb,-experimental-zvks,-experimental-zvksc,-experimental-zvksed,-experimental-zvksg,-experimental-zvksh,-experimental-zvkt,-f,-h,-save-restore,-svinval,-svnapot,-svpbmt,-v,-xsfvcp,-xtheadba,-xtheadbb,-xtheadbs,-xtheadcmo,-xtheadcondmov,-xtheadfmemidx,-xtheadmac,-xtheadmemidx,-xtheadmempair,-xtheadsync,-xtheadvdot,-xventanacondops,-zawrs,-zba,-zbb,-zbc,-zbkb,-zbkc,-zbkx,-zbs,-zdinx,-zfh,-zfhmin,-zfinx,-zhinx,-zhinxmin,-zicbom,-zicbop,-zicboz,-zicntr,-zicsr,-zifencei,-zihintpause,-zihpm,-zk,-zkn,-zknd,-zkne,-zknh,-zkr,-zks,-zksed,-zksh,-zkt,-zmmul,-zve32f,-zve32x,-zve64d,-zve64f,-zve64x,-zvl1024b,-zvl128b,-zvl16384b,-zvl2048b,-zvl256b,-zvl32768b,-zvl32b,-zvl4096b,-zvl512b,-zvl64b,-zvl65536b,-zvl8192b" }
attributes #1 = { nounwind memory(none) }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"target-abi", !"lp64"}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"frame-pointer", i32 2}
!4 = !{i32 8, !"SmallDataLimit", i32 0}
!5 = !{!"clang version 17.0.0 (git@github.com:llvm/llvm-project.git 61bab164d4c3b15ba13ddd53de7bdeb6b8c9de30)"}
!6 = !{i64 139}

Only the substitution of "%[dst]" for "$0" is observable on the LLVM IR.

Am I misinterpreting your suggestion? Thanks in advance for you clarification.

abel-bernabeu marked an inline comment as done.Jun 16 2023, 9:34 AM

This new update still applies many unneeded getParser().Lex(); and adds a test at a wrong layer (clang/test/CodeGen):

Your comment in here suggests a full review of the parser to make the code consistent with the principle that a token should be eaten in the same function that checks for a specific pattern.

https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/10

I like the suggestion, just a couple of questions:

  • is it something that you would like to try yourself?
  • do you want me to create a refactoring ticket for myself? Not sure when will I be highly available for a major refactor, but I can try to assign time at work for the task.

Let me know your preference.

abel-bernabeu added a comment.EditedJun 16 2023, 9:45 AM

This new update still applies many unneeded getParser().Lex(); and adds a test at a wrong layer (clang/test/CodeGen):

Would you rather have:

  • LLVM IR as input
  • assembly as output
  • the test placed under llvm/test/CodeGen/RISCV/

?

That would make the test more verbose, but it would reduce the testing scope. Please confirm your preference.

The test has been moved to llvm/test/CodeGen/RISCV

Also has been reworked for using LLVM IR as input, so clang is not needed in the loop.

Clang tests should not compile to asm. You want an IR test.

My bad. You suggested IR as input (rather than C) and makes total sense. The latest update captures that suggestion.

This new update still applies many unneeded getParser().Lex(); and adds a test at a wrong layer (clang/test/CodeGen):

Would you rather have:

  • LLVM IR as input
  • assembly as output
  • the test placed under llvm/test/CodeGen/RISCV/

?

That would make the test more verbose, but it would reduce the testing scope. Please confirm your preference.

I created D153204 as an alternative. Thanks for reporting the issue!

This new update still applies many unneeded getParser().Lex(); and adds a test at a wrong layer (clang/test/CodeGen):

Would you rather have:

  • LLVM IR as input
  • assembly as output
  • the test placed under llvm/test/CodeGen/RISCV/

?

That would make the test more verbose, but it would reduce the testing scope. Please confirm your preference.

I created D153204 as an alternative. Thanks for reporting the issue!

Thanks for reviewing. Added my comments on your patch proposal.

abel-bernabeu edited the summary of this revision. (Show Details)Jun 18 2023, 3:06 PM

Updated commit message:

  • fixed a typo
  • added a co-author
abel-bernabeu edited the summary of this revision. (Show Details)Jun 18 2023, 3:16 PM

Fixed a typo on commit message

abel-bernabeu edited the summary of this revision. (Show Details)Jun 18 2023, 3:19 PM
  • I'd prefer Lex() over equivalent getParser().Lex() because it is shorter.
  • Ideally, every change should be tested.
  • The tests should be minimal. That is, they should be assembly files passed to llvm-mc rather than ll files passed to llc.

I'm not very familiar with RISC-V, so I'll leave further review to code owners.

Rebased on top of the latest changes in RISCVAsmParser.cpp

Moved the testing to llc-mc test cases under lvm/test/MC/RISCV/

Covered with tests for every single case where the parser consumes a
token and a potential comment needs to be handled. Having manually
verified that the testing coverage is complete.

Reworked the code using getLexer().peekTokens for ignoring comments.

If a comment is not handled proper, now is a bug, because the feature
is complete and the testing has full coverage.

abel-bernabeu edited the summary of this revision. (Show Details)Jun 19 2023, 9:04 AM
abel-bernabeu added inline comments.Jun 19 2023, 12:27 PM
llvm/test/MC/RISCV/comments.ll
26

This should be CHECK-NEXT rather than CHECK. Will update again tonight.

abel-bernabeu updated this revision to Diff 532753.EditedJun 19 2023, 2:35 PM

Added --match-full-lines to FileCheck for making the check stricter.

Removed a gratuitous blank line

abel-bernabeu accepted this revision.Jun 19 2023, 2:43 PM

Am happy with the current patch version

On the zdinx test, changed a FileCheck comment that was written starting with "//" rather than "#".

Changed for "#" just for consistency with the rest of tests.

abel-bernabeu accepted this revision.Jun 19 2023, 2:54 PM

@jrtc27:

The tests:

  • are under llvm/test/MC/RISCV
  • use llvm-mc as assembler (rather than C!)
  • are as simple as they can be
  • cover every line touched by this patch.

Would you unblock this now? Thanks for your feedback so far.

@MaskRay:

Let me know if you miss any testing or places where comments are still not possible. Am pretty sure I am covering every case.

You have been added as coauthor here (with the standard "Co-Authored-By: " tag on the commit message) and you can close https://reviews.llvm.org/D153204 in return :)

abel-bernabeu updated this revision to Diff 532837.EditedJun 20 2023, 2:36 AM

I just got this flashback...

For code coverage of one of the Lex() calls in ParseInstruction we needed one case
where an instruction has no operands.

We had, not one, but three. Am keeping just the "nop" case, which is the simplest
and most known instruction and moving it to the top of the test suite, so the tests
are sorted by increasing complexity (from the simplest to reach test points to
the most sophisticated and least frequently used parser features).

craig.topper added inline comments.Jun 21 2023, 12:23 AM
llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1586

I don't think this style of comment is common in llvm.

1598

Can this be a regular array instead of a SmallVector?

1601

Capitalize variable names

barannikov88 added inline comments.Jun 21 2023, 1:14 AM
llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp
1610

This is much more common.

Addressed some review comments:

  • Doxygen-style comments are not common, so using triple-slash instead.
  • Using a regular C array on a place where the usage of SmallVector is not justified.
  • Proper capitalization of a variable
  • Usage of "&&" rather than "and", to blend better with the existing code style.
abel-bernabeu signed these changes with MFA.Jun 21 2023, 2:42 AM
abel-bernabeu accepted this revision.
abel-bernabeu marked 4 inline comments as done.

Reduced the commit message verbosity by one half, without losing anything.

Even less verbosity...

abel-bernabeu edited the summary of this revision. (Show Details)Jun 21 2023, 2:51 AM
abel-bernabeu edited the summary of this revision. (Show Details)