The following commenting style was unsupported, leading to compilation errors: ``` unsigned long int dst; __asm__ __volatile__( "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n" "add zero, %[dst], %[dst]\n" : [ dst ] "=r"(dst) : :); ``` A code review of the top level parser (AsmParser class) showed that it was the backend's responsibility to handle the comments, but RISC-V's backend did not handle the comments in any way. Beyond the obvious solution of explicitly handling the comments within the RISC-V backend, another, easier to maintain, was suggested by Sergei Barannikov in a Discourse discussion thread: https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8 In summary, all backends, including the RISC-V's, should switch from getLexer().Lex() to getParser().Lex() in their ParseInstruction implementation. Here we just do the RISC-V work. We would also like to mention David Spikett from Arm's community for pointing out where to start looking within the LLVM code base. Co-Authored-By: Sergei Barannikov <barannikov88@gmail.com> Co-Authored-By: Fangrui Song <i@maskray.me> Differential Revision: https://reviews.llvm.org/D153008
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Is the comment here relevant? https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8 Does this patch do that already?
Also is it a problem that the ignored comments are not seen in the output? Perhaps you are just not checking for those bits. For comments on the end, those are propagated to the assembly output I know that much.
For the customer who reported the problem, the comments are in the input source (doing their job explaining what the operands are).
Now, if a comment is not seen when compiling with "-S" it is less of a problem than having the compilation not succeeding. I did not see an obvious way to pass those comments to the AsmParser instance.
My understanding is that the ParseInstruction interface needs to be extended for the comments to be collected and passed to the top level parser (similarly to what is done with the operands).
Hi @abel-bernabeu, I did a quick experiment replacing all the calls to getLexer().Lex() with getParser().Lex() and now your second testcase is accepted and looks like this in the output
f2: addi sp, sp, -32 sd ra, 24(sp) sd s0, 16(sp) addi s0, sp, 32 #APP lui a0, 1 # this is fine # this should also be fine addiw a0, a0, 564 add zero, a0, a0 #NO_APP
I understand this is the best we can do within assembly syntax.
I'm not suggesting we should change all of them. There are ~20 occurrences of getLexer.Lex() so I think it should not be too onerous to review each one.
What do you think?
[RISCV] Allow slash-star comments in instruction operands It has been reported by one of Esperanto's customers that slash-start comments ("/*") within inline assembly were only allowed before the first instruction operand or at the end of the lines. Those comments were, however, not allowed when interleaved within the operands. An example follows: unsigned long int dst; __asm__ __volatile__( "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n" "add zero, %[dst], %[dst]\n" : [ dst ] "=r"(dst) : :); A code review of the top level parser (AsmParser class) showed that when comments were placed before the instruction operand or at end of a line, then they were gracefully handled irrespective of the backend. When the comments were interleaved within the instruction operands it was the backend's responsibility to handle the comments. RISC-V's backend did not handle the comments in any way. Beyond the obvious solution of explicitly handling the comments within the RISC-V backend, another, easier to maintain solution was suggested by Sergei Barannikov in a Discourse discussion thread: https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8 In summary, all backends, including the RISC-V's, should switch from getLexer().Lex() to getParser().Lex() in their ParseInstruction implementation. The getLexer().Lex() approach relies on the user to explicitly handle the comments, whereas the suggested getParser().Lex() alternive already handles the comments in the same way as done for non-target-specific assembly directives. Here we just do the RISC-V work. Other backends should also do their own review. In addition to Sergei Barannikov, I would also we thank David Spikett from Arm's community for pointing out where to start looking within the LLVM code base, and also the patch reviewers.
[RISCV] Allow slash-star comments in instruction operands
It has been reported by one of Esperanto's customers that slash-start
comments ("/*") within inline assembly were only allowed before the
first instruction operand or at the end of the lines. Those comments
were, however, not allowed when interleaved within the operands.
An example follows:
unsigned long int dst; __asm__ __volatile__( "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n" "add zero, %[dst], %[dst]\n" : [ dst ] "=r"(dst) : :);
A code review of the top level parser (AsmParser class) showed that
when comments were placed before the instruction operand or at end of
a line, then they were gracefully handled irrespective of the backend.
When the comments were interleaved within the instruction operands it
was the backend's responsibility to handle the comments.
RISC-V's backend did not handle the comments in any way.
Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain solution was suggested
by Sergei Barannikov in a Discourse discussion thread:
In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation.
The getLexer().Lex() approach relies on the user to explicitly handle
the comments, whereas the suggested getParser().Lex() alternive already
handles the comments in the same way as done for non-target-specific
assembly directives.
Here we just do the RISC-V work. Other backends should also do their own
review.
In addition to Sergei Barannikov, I would also we thank David Spikett
from Arm's community for pointing out where to start looking within the
LLVM code base, and also the patch reviewers.
I was a bit reluctant to start a discussion on migrating getLexer().Lex() to getParser().Lex(), but I guess it makes sense to do it now rather than deferring.
It is cleaner now, I agree.
[RISCV] Allow slash-star comments in instruction operands It has been reported by one of Esperanto's customers that slash-start comments ("/*") within inline assembly were only allowed before the first instruction operand or at the end of the lines. Those comments were, however, not allowed when interleaved within the operands. An example follows: ``` unsigned long int dst; __asm__ __volatile__( "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n" "add zero, %[dst], %[dst]\n" : [ dst ] "=r"(dst) : :); ``` A code review of the top level parser (AsmParser class) showed that when comments were placed before the instruction operand or at end of a line, then they were gracefully handled irrespective of the backend. When the comments were interleaved within the instruction operands it was the backend's responsibility to handle the comments. RISC-V's backend did not handle the comments in any way. Beyond the obvious solution of explicitly handling the comments within the RISC-V backend, another, easier to maintain solution was suggested by Sergei Barannikov in a Discourse discussion thread: https://discourse.llvm.org/t/interleaving-several-c-style-comments-in-the-same-inline-assembly-line/71353/8 In summary, all backends, including the RISC-V's, should switch from getLexer().Lex() to getParser().Lex() in their ParseInstruction implementation. The getLexer().Lex() approach relies on the user to explicitly handle the comments, whereas the suggested getParser().Lex() alternive already handles the comments in the same way as done for non-target-specific assembly directives. Here we just do the RISC-V work. Other backends should also do their own review. In addition to Sergei Barannikov, I would also like to thank David Spikett from Arm's community for pointing out where to start looking within the LLVM code base, and also the patch reviewers.
[RISCV] Allow slash-star comments in instruction operands
It has been reported by one of Esperanto's customers that slash-start
comments ("/*") within inline assembly were only allowed before the
first instruction operand or at the end of the lines. Those comments
were, however, not allowed when interleaved within the operands.
An example follows:
unsigned long int dst; __asm__ __volatile__( "li /* this was fine */ %[dst], /* this was NOT fine */ 0x1234\n" "add zero, %[dst], %[dst]\n" : [ dst ] "=r"(dst) : :);
A code review of the top level parser (AsmParser class) showed that
when comments were placed before the instruction operand or at end of
a line, then they were gracefully handled irrespective of the backend.
When the comments were interleaved within the instruction operands it
was the backend's responsibility to handle the comments.
RISC-V's backend did not handle the comments in any way.
Beyond the obvious solution of explicitly handling the comments within
the RISC-V backend, another, easier to maintain, was suggested by Sergei
Barannikov in a Discourse discussion thread:
In summary, all backends, including the RISC-V's, should switch
from getLexer().Lex() to getParser().Lex() in their ParseInstruction
implementation.
The getLexer().Lex() approach relies on the user to explicitly handle
the comments, whereas the suggested getParser().Lex() alternive already
handles the comments in the same way as done for non-target-specific
assembly directives.
Here we just do the RISC-V work. Other backends should also do their own
review.
In addition to Sergei Barannikov, I would also like to thank David Spikett
from Arm's community for pointing out where to start looking within the
LLVM code base, and also the patch reviewers.
clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c | ||
---|---|---|
24 | Check for the comment content here too? # this is fine # this should also be fine |
Does the test have to be a C source? Why not just plain asm (fed into llvm-mc)?
PS
Please don't repeat the commin message on each arc diff, it is noisy. Just write what has been changed since the last revision, e.g. "Address review comments" or "Apply clang-format".
Jessica, are there any exceptions for tests is under CodeGen/RISCV intended to exercise the assembly parser?
I have just written a test that reproduces the way I manually test the feature.
No. You can test the assembly parser just as easily from IR.
I'll also note that your assembly isn't particularly minimal, which it should be, unless the add zero, %[dst], %[dst] lines are doing something I'm not aware of?
Thanks everyone, taking note of all the comments for improving the test:
- Simplify the test (can be one instruction, no problem).
- Check at IR level
- Check for the comments being placed in the output
- Do "arc diff" with one-line descriptions
Will update before tomorrow at mid-day (CET).
You may add a test file beside llvm/test/CodeGen/RISCV/inline-asm*.
Perhaps llvm/test/MC/RISCV/ is a better choice.
I hope that this patch focuses on getLexer().Lex() instances that actually cause a problem.
Many getLexer().Lex() instances followed by is(...) form a pattern that can be rewritten in a better way. I'll check these instances and fix them.
Simplified the test. Also I check that the comments are a carried over to the assembly output.
This new update still applies many unneeded getParser().Lex(); and adds a test at a wrong layer (clang/test/CodeGen):
Jessica, the LLVM IR form contains the inline assembly still unparsed. Look, this is what I get:
abel@Docker:~/work/llvm-project/build$ ./bin/clang -fPIC --target=riscv64-unknown-elf -o - -S -emit-llvm ../clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c ; ModuleID = '../clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c' source_filename = "../clang/test/CodeGen/RISCV/riscv-inline-asm-gcc-commenting.c" target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128" target triple = "riscv64-unknown-unknown-elf" ; Function Attrs: noinline nounwind optnone define i64 @f() #0 { %1 = alloca i64, align 8 %2 = call i64 asm "li /* this is fine */ $0 , /* this is also fine */ 0 /* and last but not least */\0A", "=r"() #1, !srcloc !6 store i64 %2, ptr %1, align 8 %3 = load i64, ptr %1, align 8 ret i64 %3 } attributes #0 = { noinline nounwind optnone "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+c,+m,+relax,-d,-e,-experimental-smaia,-experimental-ssaia,-experimental-zca,-experimental-zcb,-experimental-zcd,-experimental-zcf,-experimental-zcmp,-experimental-zcmt,-experimental-zfa,-experimental-zfbfmin,-experimental-zicond,-experimental-zihintntl,-experimental-ztso,-experimental-zvbb,-experimental-zvbc,-experimental-zvfbfmin,-experimental-zvfbfwma,-experimental-zvfh,-experimental-zvkg,-experimental-zvkn,-experimental-zvknc,-experimental-zvkned,-experimental-zvkng,-experimental-zvknha,-experimental-zvknhb,-experimental-zvks,-experimental-zvksc,-experimental-zvksed,-experimental-zvksg,-experimental-zvksh,-experimental-zvkt,-f,-h,-save-restore,-svinval,-svnapot,-svpbmt,-v,-xsfvcp,-xtheadba,-xtheadbb,-xtheadbs,-xtheadcmo,-xtheadcondmov,-xtheadfmemidx,-xtheadmac,-xtheadmemidx,-xtheadmempair,-xtheadsync,-xtheadvdot,-xventanacondops,-zawrs,-zba,-zbb,-zbc,-zbkb,-zbkc,-zbkx,-zbs,-zdinx,-zfh,-zfhmin,-zfinx,-zhinx,-zhinxmin,-zicbom,-zicbop,-zicboz,-zicntr,-zicsr,-zifencei,-zihintpause,-zihpm,-zk,-zkn,-zknd,-zkne,-zknh,-zkr,-zks,-zksed,-zksh,-zkt,-zmmul,-zve32f,-zve32x,-zve64d,-zve64f,-zve64x,-zvl1024b,-zvl128b,-zvl16384b,-zvl2048b,-zvl256b,-zvl32768b,-zvl32b,-zvl4096b,-zvl512b,-zvl64b,-zvl65536b,-zvl8192b" } attributes #1 = { nounwind memory(none) } !llvm.module.flags = !{!0, !1, !2, !3, !4} !llvm.ident = !{!5} !0 = !{i32 1, !"wchar_size", i32 4} !1 = !{i32 1, !"target-abi", !"lp64"} !2 = !{i32 8, !"PIC Level", i32 2} !3 = !{i32 7, !"frame-pointer", i32 2} !4 = !{i32 8, !"SmallDataLimit", i32 0} !5 = !{!"clang version 17.0.0 (git@github.com:llvm/llvm-project.git 61bab164d4c3b15ba13ddd53de7bdeb6b8c9de30)"} !6 = !{i64 139}
Only the substitution of "%[dst]" for "$0" is observable on the LLVM IR.
Am I misinterpreting your suggestion? Thanks in advance for you clarification.
Your comment in here suggests a full review of the parser to make the code consistent with the principle that a token should be eaten in the same function that checks for a specific pattern.
I like the suggestion, just a couple of questions:
- is it something that you would like to try yourself?
- do you want me to create a refactoring ticket for myself? Not sure when will I be highly available for a major refactor, but I can try to assign time at work for the task.
Let me know your preference.
Would you rather have:
- LLVM IR as input
- assembly as output
- the test placed under llvm/test/CodeGen/RISCV/
?
That would make the test more verbose, but it would reduce the testing scope. Please confirm your preference.
The test has been moved to llvm/test/CodeGen/RISCV
Also has been reworked for using LLVM IR as input, so clang is not needed in the loop.
My bad. You suggested IR as input (rather than C) and makes total sense. The latest update captures that suggestion.
- I'd prefer Lex() over equivalent getParser().Lex() because it is shorter.
- Ideally, every change should be tested.
- The tests should be minimal. That is, they should be assembly files passed to llvm-mc rather than ll files passed to llc.
I'm not very familiar with RISC-V, so I'll leave further review to code owners.
Rebased on top of the latest changes in RISCVAsmParser.cpp
Moved the testing to llc-mc test cases under lvm/test/MC/RISCV/
Covered with tests for every single case where the parser consumes a
token and a potential comment needs to be handled. Having manually
verified that the testing coverage is complete.
Reworked the code using getLexer().peekTokens for ignoring comments.
If a comment is not handled proper, now is a bug, because the feature
is complete and the testing has full coverage.
llvm/test/MC/RISCV/comments.ll | ||
---|---|---|
25 ↗ | (On Diff #532686) | This should be CHECK-NEXT rather than CHECK. Will update again tonight. |
On the zdinx test, changed a FileCheck comment that was written starting with "//" rather than "#".
Changed for "#" just for consistency with the rest of tests.
The tests:
- are under llvm/test/MC/RISCV
- use llvm-mc as assembler (rather than C!)
- are as simple as they can be
- cover every line touched by this patch.
Would you unblock this now? Thanks for your feedback so far.
Let me know if you miss any testing or places where comments are still not possible. Am pretty sure I am covering every case.
You have been added as coauthor here (with the standard "Co-Authored-By: " tag on the commit message) and you can close https://reviews.llvm.org/D153204 in return :)
I just got this flashback...
For code coverage of one of the Lex() calls in ParseInstruction we needed one case
where an instruction has no operands.
We had, not one, but three. Am keeping just the "nop" case, which is the simplest
and most known instruction and moving it to the top of the test suite, so the tests
are sorted by increasing complexity (from the simplest to reach test points to
the most sophisticated and least frequently used parser features).
llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp | ||
---|---|---|
1608 | This is much more common. |
Addressed some review comments:
- Doxygen-style comments are not common, so using triple-slash instead.
- Using a regular C array on a place where the usage of SmallVector is not justified.
- Proper capitalization of a variable
- Usage of "&&" rather than "and", to blend better with the existing code style.
Check for the comment content here too? # this is fine # this should also be fine