Add support for syntax highlighting assembly. The current patch uses a different color to highlight registers, immediate and addresses.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
In its current state this patch is a WIP but I wanted to get some early feedback before I spend too much time on this. My motivation is color/syntax highlighting assembly in LLDB.
I like how the first patch merges the "markup" and the "highlighting": that's something I considered myself. I'll need to take a more thorough look tomorrow to understand the purpose of the span and stack it uses. At least for the arm64 backend I don't think that's necessary as the start and end are clearly delineated.
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
40–42 | Could be enum class instead of namespace. | |
llvm/lib/MC/MCInstPrinter.cpp | ||
199 | Some targets may want to highlight other operand types [that are unique to the target] and / or using different color set. Consider Hexagon: one might want to hightlight braces and other punctuators like '++', for example. Can this method be made AArch64-specific? | |
llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp | ||
64 | It might make sense to use RAII object instead of resetting the style to None every time. |
Overall this is a very cool idea. It does seem like it will need sanity checking for each architecture that enables it, but that doesn't mean opting in can't be as simple as a single bool somewhere.
I applied this and the lldb change and used lldb/test/Shell/Commands/command-disassemble-aarch64-extensions.s to test it since it's got most of the possible syntax in it.
First thing I notice is that in something like smmla v1.4s, v16.16b, v31.16b we don't treat the .16b as part of the operand. Same with /m and friends for SVE. I thought this would be bad but in fact I like it. It makes it clear what is register and what is format/mode so no one is thinking that p0/m is literally a register called p0/m, it's p0 with the /m mode for disabled lanes.
One strange thing was in one instruction the immediate is a different colour between objdump and lldb. Objump:
38: 54000490 bc.eq <red>0xc8</red> <lbl>
lldb:
test.o[0x38] <+56>: bc.eq <yellow>0xc8</yellow>
Then I realised that this is because lldb doesn't resolve the branch target to a label because the function ends before it would see it. so it's still an immediate operand. Objdump finds the target which makes it an address. If I add a nop after the label so that the label is part of the chunk lldb looks at, I get:
test.o[0x38] <+56>: bc.eq <yellow>0xc8</yellow> ; <+200>
So FYI in case you also stumble into that.
I unified coloring and markup and used a WithColor-like RAII object. I think compared to the existing code, it improves the readability a lot, while simultaneously completely abstracting the coloring.
- Fix missing space that caused test failures.
- Rename Markup::Address to Markup::Target and add Markup::Memory.
I added a test as part of D159234. That required a few more changes so I broke it up into separate patches to make it easier to review.
Perhaps we can add an option to llvm-mc, similar to --mdis, and tests near test/MC/Disassembler/AArch64/marked-up.txt.
Then keep very simple tests in llvm-objdump to test that coloring can be enabled.
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
95 | Coloring the comments too would be nice. |
Limit formatted_raw_ostream workaround to disassembling with color only. Apparently this changes output for certain tests. The formatted_raw_ostream class has several issues related to buffering, but that's orthogonal to this patch.
llvm/test/MC/Disassembler/AArch64/colored.txt | ||
---|---|---|
5 | When testing something that is format sensitive, we want one test to test every space and typically use --strict-whitespace --match-full-lines, e.g. llvm/test/tools/llvm-readobj/ELF/relocations.test | |
7 | I wonder whether this is sufficient to cover changes? Things like printImmSVE seem uncovered? | |
llvm/tools/llvm-mc/llvm-mc.cpp | ||
550 | Document this a bit in the description? | |
601 | IP is immediately dereferenced, so the assert seems unneeded (even if AC_MDisassemble uses it as well) |
llvm/test/MC/Disassembler/AArch64/colored.txt | ||
---|---|---|
7 | Since the patch uses the same mechanism for emitting markup annotations and colors, I don't see the need to duplicate the testing for the two. The latter is (at least partially) covered by marked-up.txt. If we want to make sure every instruction type is covered, then the markup annotations are a much better way to do so: they're much easier to read and write and they are supported on all platforms (unlike ANSI escape codes). |
LGTM. I think @jhenderson may have some opinions.
llvm/docs/CommandGuide/llvm-mc.rst needs updating.
llvm/test/MC/Disassembler/AArch64/colored.txt | ||
---|---|---|
7 | Agree! |
Looks reasonable to me, although it's been a while since I last looked at this sort of area.
llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp | ||
---|---|---|
1240–1241 | Just making sure I fully follow here: this lis essentially setting the markup to Immediate for the rest of the function, whereas the cases like in printShifter below just do it for the sequence of << commands? |
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
100 | [[nodiscard]] on constructors is a C++20 feature, even though it works on older compilers (like gcc 7.4 we claim to support), it still produces an annoying (especially if you compile with -Werror) warning. Just wanted to mention. |
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
100 | Correction - it's a DR against C++17 so it's retroactively added there, but doesn't change the fact that older compilers may still produce warnings. |
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
100 | https://github.com/llvm/llvm-project/commit/48987fb8cc844667c6ee2a315140cb8f3d817fc1 removes it, just because it's blocking some downstream stuff we have. I do see the logic in having it so we don't forget to colour things but not sure how to do that in a compatible way right now. |
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
100 | We could do the macro back-compat thing: // [[nodiscard]] on constructors is a C++20 feature... #if __cplusplus >= 202002L # define LLVM_CTOR_NODISCARD [[nodiscard]] #else # define LLVM_CTOR_NODISCARD #endif |
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
100 | Since this was a defect report against C++17, it should've been applied against it retroactively, so the proper check would be something like this: #ifdef __has_cpp_attribute # if __has_cpp_attribute(nodiscard) >= 201907L ... see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1771r1.pdf and https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations/#examples |
llvm/include/llvm/MC/MCInstPrinter.h | ||
---|---|---|
100 |
Could be enum class instead of namespace.