This is an archive of the discontinued LLVM Phabricator instance.

[llvm] Add assembly syntax highlighting
ClosedPublic

Authored by JDevlieghere on Aug 29 2023, 9:39 PM.

Details

Summary

Add support for syntax highlighting assembly. The current patch uses a different color to highlight registers, immediate and addresses.

Diff Detail

Event Timeline

JDevlieghere created this revision.Aug 29 2023, 9:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 9:39 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
JDevlieghere requested review of this revision.Aug 29 2023, 9:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 9:39 PM

In its current state this patch is a WIP but I wanted to get some early feedback before I spend too much time on this. My motivation is color/syntax highlighting assembly in LLDB.

In 2019 there is a patch series (D65189/D65190/D65191). I wonder whether some parts can be reused.

In 2019 there is a patch series (D65189/D65190/D65191). I wonder whether some parts can be reused.

I like how the first patch merges the "markup" and the "highlighting": that's something I considered myself. I'll need to take a more thorough look tomorrow to understand the purpose of the span and stack it uses. At least for the arm64 backend I don't think that's necessary as the start and end are clearly delineated.

barannikov88 added inline comments.
llvm/include/llvm/MC/MCInstPrinter.h
40–42

Could be enum class instead of namespace.

llvm/lib/MC/MCInstPrinter.cpp
199

Some targets may want to highlight other operand types [that are unique to the target] and / or using different color set. Consider Hexagon: one might want to hightlight braces and other punctuators like '++', for example. Can this method be made AArch64-specific?

llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
64

It might make sense to use RAII object instead of resetting the style to None every time.

Overall this is a very cool idea. It does seem like it will need sanity checking for each architecture that enables it, but that doesn't mean opting in can't be as simple as a single bool somewhere.

I applied this and the lldb change and used lldb/test/Shell/Commands/command-disassemble-aarch64-extensions.s to test it since it's got most of the possible syntax in it.

First thing I notice is that in something like smmla v1.4s, v16.16b, v31.16b we don't treat the .16b as part of the operand. Same with /m and friends for SVE. I thought this would be bad but in fact I like it. It makes it clear what is register and what is format/mode so no one is thinking that p0/m is literally a register called p0/m, it's p0 with the /m mode for disabled lanes.

One strange thing was in one instruction the immediate is a different colour between objdump and lldb. Objump:

38: 54000490      bc.eq   <red>0xc8</red> <lbl>

lldb:

test.o[0x38] <+56>:  bc.eq  <yellow>0xc8</yellow>

Then I realised that this is because lldb doesn't resolve the branch target to a label because the function ends before it would see it. so it's still an immediate operand. Objdump finds the target which makes it an address. If I add a nop after the label so that the label is part of the chunk lldb looks at, I get:

test.o[0x38] <+56>:  bc.eq  <yellow>0xc8</yellow>           ; <+200>

So FYI in case you also stumble into that.

I unified coloring and markup and used a WithColor-like RAII object. I think compared to the existing code, it improves the readability a lot, while simultaneously completely abstracting the coloring.

Hoist llvm-objdump changes out of this patch.

  • Fix missing space that caused test failures.
  • Rename Markup::Address to Markup::Target and add Markup::Memory.

Seems fine, but we need some tests:)

Seems fine, but we need some tests:)

I added a test as part of D159234. That required a few more changes so I broke it up into separate patches to make it easier to review.

MaskRay added a comment.EditedAug 30 2023, 10:42 PM

Perhaps we can add an option to llvm-mc, similar to --mdis, and tests near test/MC/Disassembler/AArch64/marked-up.txt.

Then keep very simple tests in llvm-objdump to test that coloring can be enabled.

love it!

llvm/include/llvm/MC/MCInstPrinter.h
100

maybe this should be [[nodiscard]]

llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
282

stray tab?

295

likewise

jroelofs added inline comments.Aug 31 2023, 10:26 AM
llvm/include/llvm/MC/MCInstPrinter.h
95

Coloring the comments too would be nice.

JDevlieghere marked 6 inline comments as done.
  • Add llvm-mc test
  • Add [[nodiscard]]
  • Fix stray tab
llvm/include/llvm/MC/MCInstPrinter.h
95

Agreed. Time permitting I might add that in a follow-up.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
282

Yeah huh not sure how that snuck in there.

Limit formatted_raw_ostream workaround to disassembling with color only. Apparently this changes output for certain tests. The formatted_raw_ostream class has several issues related to buffering, but that's orthogonal to this patch.

Separate test into its own file so we can easily skip it on Windows.

MaskRay added inline comments.Aug 31 2023, 11:14 PM
llvm/test/MC/Disassembler/AArch64/colored.txt
4 ↗(On Diff #555255)

When testing something that is format sensitive, we want one test to test every space and typically use --strict-whitespace --match-full-lines, e.g. llvm/test/tools/llvm-readobj/ELF/relocations.test

6 ↗(On Diff #555255)

I wonder whether this is sufficient to cover changes? Things like printImmSVE seem uncovered?

llvm/tools/llvm-mc/llvm-mc.cpp
550 ↗(On Diff #555255)

Document this a bit in the description?

600 ↗(On Diff #555255)

IP is immediately dereferenced, so the assert seems unneeded (even if AC_MDisassemble uses it as well)

JDevlieghere marked 4 inline comments as done.Aug 31 2023, 11:33 PM
JDevlieghere added inline comments.
llvm/test/MC/Disassembler/AArch64/colored.txt
6 ↗(On Diff #555255)

Since the patch uses the same mechanism for emitting markup annotations and colors, I don't see the need to duplicate the testing for the two. The latter is (at least partially) covered by marked-up.txt. If we want to make sure every instruction type is covered, then the markup annotations are a much better way to do so: they're much easier to read and write and they are supported on all platforms (unlike ANSI escape codes).

JDevlieghere marked an inline comment as done.
MaskRay accepted this revision.Aug 31 2023, 11:42 PM

LGTM. I think @jhenderson may have some opinions.

llvm/docs/CommandGuide/llvm-mc.rst needs updating.

llvm/test/MC/Disassembler/AArch64/colored.txt
6 ↗(On Diff #555255)

Agree!

This revision is now accepted and ready to land.Aug 31 2023, 11:42 PM
jhenderson accepted this revision.Sep 1 2023, 12:55 AM

Looks reasonable to me, although it's been a while since I last looked at this sort of area.

llvm/lib/Target/AArch64/MCTargetDesc/AArch64InstPrinter.cpp
1240–1241

Just making sure I fully follow here: this lis essentially setting the markup to Immediate for the rest of the function, whereas the cases like in printShifter below just do it for the sequence of << commands?

This revision was landed with ongoing or failed builds.Sep 1 2023, 7:57 AM
This revision was automatically updated to reflect the committed changes.
JDevlieghere marked 2 inline comments as done.
Allen added a subscriber: Allen.Sep 1 2023, 9:28 AM
danilaml added inline comments.
llvm/include/llvm/MC/MCInstPrinter.h
100

[[nodiscard]] on constructors is a C++20 feature, even though it works on older compilers (like gcc 7.4 we claim to support), it still produces an annoying (especially if you compile with -Werror) warning. Just wanted to mention.

danilaml added inline comments.Sep 4 2023, 6:30 AM
llvm/include/llvm/MC/MCInstPrinter.h
100

Correction - it's a DR against C++17 so it's retroactively added there, but doesn't change the fact that older compilers may still produce warnings.

DavidSpickett added inline comments.Sep 5 2023, 1:02 AM
llvm/include/llvm/MC/MCInstPrinter.h
100

https://github.com/llvm/llvm-project/commit/48987fb8cc844667c6ee2a315140cb8f3d817fc1 removes it, just because it's blocking some downstream stuff we have.

I do see the logic in having it so we don't forget to colour things but not sure how to do that in a compatible way right now.

jroelofs added inline comments.Sep 5 2023, 8:34 AM
llvm/include/llvm/MC/MCInstPrinter.h
100

We could do the macro back-compat thing:

// [[nodiscard]] on constructors is a C++20 feature...
#if __cplusplus >= 202002L
#  define LLVM_CTOR_NODISCARD [[nodiscard]]
#else
#  define LLVM_CTOR_NODISCARD
#endif
danilaml added inline comments.Sep 5 2023, 10:55 AM
llvm/include/llvm/MC/MCInstPrinter.h
100

Since this was a defect report against C++17, it should've been applied against it retroactively, so the proper check would be something like this:

#ifdef __has_cpp_attribute
#  if  __has_cpp_attribute(nodiscard) >= 201907L
...

see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1771r1.pdf and https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations/#examples

JDevlieghere added inline comments.Sep 5 2023, 3:30 PM
llvm/include/llvm/MC/MCInstPrinter.h
100