Page MenuHomePhabricator

[llvm-objdump] Implement highlighting
Needs ReviewPublic

Authored by seiya on Jul 23 2019, 10:28 PM.

Details

Summary

As the first step, this patch highlights some elements: registers and immediate values,
and some symbol names.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Jul 24 2019, 5:46 AM
llvm/test/tools/llvm-objdump/highlight.test
1 ↗(On Diff #211432)

Try and avoid adding a binary if you can. There's no particular reason this can't use yaml2obj to do things. Just add a .text section with appropriate content.

2 ↗(On Diff #211432)

REQUIRES usually go first in the output.

4 ↗(On Diff #211432)

Move this comment to near the top of the file, and make it a bit more verbose, so that it describes the test as a whole. Take a look at recent new tests in various of the LLVM tools for examples.

5 ↗(On Diff #211432)

I might be missing something, but how does this show that the syntax highlighting works? Is it cross-platform?

llvm/tools/llvm-objdump/llvm-objdump.cpp
336

LLVM already has a --color option (see for example llvm-dwarfdump output, WithColor::warning() and error() calls etc etc).

I'm inclined to think that syntax highlighting can be on by default and just use --color=false (or whatever it is) to turn it off. If we go that route, the --color option should be documented in the llvm-objdump command guide, and also we should check it is in the tool help text (since it's in a different option category).

803–804

How about instead of "HeaderLength" this is called "InstructionCol". The thing you're measuring isn't really a "Header".

1227

Use ArrayRef here.

grimar added inline comments.Jul 24 2019, 6:05 AM
llvm/test/tools/llvm-objdump/highlight.test
4 ↗(On Diff #211432)

We usually start comment from ## nowadays.

llvm/tools/llvm-objdump/llvm-objdump.cpp
350

It is probably not clear what is Reset. Can it be Default?

354

Make it const?

368

It is generally not a good practice to do a lookup twice.
You can use std::map::find() to avoid that.

(Also I do not think you will be able to use operator[] after making HighlightColors const)

1230

You use BeforeText only once, so you do not need it it seems (just inline it).

1244

Seems this would better look as ifs:

ObjdumpHighlightColor Color;
if (Span.Type == MarkupType::Imm)
  Color = ObjdumpHighlightColor::Immediate;
else if (Span.Type == MarkupType::Reg)
  Color = ObjdumpHighlightColor::Register;
else
  Color = ObjdumpHighlightColor::Reset;
MaskRay added inline comments.Jul 24 2019, 6:21 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
336

It looks lib/Support/WithColor.cpp#L106 already handles this...

bool WithColor::colorsEnabled() {
  if (DisableColors)
    return false;
  if (UseColor == cl::BOU_UNSET)
    return OS.has_colors();  // e.g. is a terminal
  return UseColor == cl::BOU_TRUE;
}
345

Maybe just ObjdumpColor.

369

Use WithColor::changeColor

seiya retitled this revision from [llvm-objcopy] Implement highlighting to [llvm-objdump] Implement highlighting.Jul 30 2019, 8:00 PM
seiya updated this revision to Diff 212532.Jul 31 2019, 1:46 AM
seiya marked 13 inline comments as done.
  • Use --color to enable syntax highlighting.
  • Addressed some review comments.
seiya added a comment.Jul 31 2019, 2:20 AM

Do you have a design or spec for what you consider to be "syntax highlighting?" From the early example you show for coloring literals, registers, etc., I don't see the benefit. For the current output, you can tell literals from registers, right? After all, the assembler is able to re-parse them in.

The original rough design is described in this document.

In addition to this patch, I plan to implement support for symbol names, relocations, and ascii strings in .rodata, and hopefully, highlight lines by instruction types (for example, highlight jmp instructions with green just like radare2 theme). As a handy (not feature-rich like radare2) disassembler, simple syntax highlighting feature is attractive at least to me.

After all, the assembler is able to re-parse them in.

I'm sorry but I can't see your point. Syntax highlighting is not for programs, but for humans to improve readability by emphasizing contexts.

And in general, I dislike color markups in output as they always end up relying on some system/platform screen/color support. For instance, I don't think your highlighting will work on my OpenVMS x86 system.

I agree with you that platform-independence is important too. I don't have experience with OpenVMS at all so could you explain why this patch won't work on it? This patch uses llvm::sys::Process::OutputColor internally (through raw_fd_ostream::changeColor) to avoid being platform-specific.

seiya marked 4 inline comments as done.Jul 31 2019, 2:40 AM
seiya added inline comments.
llvm/include/llvm/Support/WithColor.h
102

Changes to WithColor.cpp and WithColor.h should be a separate patch, but before creating that I'd like to hear your comments.

llvm/test/tools/llvm-objdump/highlight.test
5 ↗(On Diff #211432)

Oh I should have added shell to REQUIRES. This test makes sure that some elements in the disassembly are highlighted by checking ANSI escape sequences.

llvm/tools/llvm-objdump/llvm-objdump.cpp
336

I'm inclined to think that syntax highlighting can be on by default and just use --color=false (or whatever it is) to turn it off. If we go that route, the --color option should be documented in the llvm-objdump command guide, and also we should check it is in the tool help text (since it's in a different option category).

What do you think about the compatibility with GNU objdump? Making highlighting enabled by default is incompatible behavior with GNU objcopy. Using with objdump in script (e.g., llvm-objdump -d foo | grep bar) won't be broken (it checks if stdout is a tty) so it won't be problem I think.

336

Can you investigate a bit if it is possible to have both --color and --color=when ?

Did you mean whether can we allow both --color and --color=true in command-line options or not and how it is handled in colorsEnabled?

MaskRay added inline comments.Jul 31 2019, 4:46 AM
llvm/test/tools/llvm-objdump/highlight.test
4 ↗(On Diff #212532)
# UNSUPPORTED: system-windows
# REQUIRES: shell, x86-registered-target
8 ↗(On Diff #212532)

llvm-objdump -d --no-show-raw-insn --color %t | tr -d '\033' | FileCheck %s --check-prefix=ENABLED

15 ↗(On Diff #212532)

# ENABLED-NEXT: 619 movl 2099697([0;36m%rip[0m), [0;36m%edx[0m

MaskRay added inline comments.Jul 31 2019, 4:49 AM
llvm/lib/Support/WithColor.cpp
103

There is no need to split.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2416

WithColor::isColorOptionSet() && outs().has_colors()

MaskRay added inline comments.Jul 31 2019, 4:53 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
2416

Place extern cl::opt<cl::boolOrDefault> UseColor; at toplevel.

ColorsEnabled = UseColor == cl::BOU_UNSET; may be simpler

MaskRay removed a subscriber: MaskRay.
seiya updated this revision to Diff 212689.Jul 31 2019, 3:57 PM
seiya marked 7 inline comments as done.
  • Addressed review comments.
seiya updated this revision to Diff 212692.Jul 31 2019, 4:00 PM
  • Update the command guide.
seiya added inline comments.Jul 31 2019, 4:11 PM
llvm/test/tools/llvm-objdump/highlight.test
8 ↗(On Diff #212532)

It looks much better. Thanks!

jhenderson added inline comments.Aug 5 2019, 6:47 AM
llvm/test/tools/llvm-objdump/highlight.test
4 ↗(On Diff #212532)

Do we need both UNSUPPORTED: system-windows and REQUIRES: shell? I'd think the latter effectively implies the former (but could be wrong).

1 ↗(On Diff #212692)

What is "it" here?

Perhaps make this more verbose: "Show that llvm-objdump highlights ... by default."

1 ↗(On Diff #211432)

I'm being dumb. This should probably just be an assembly file and use llvm-mc to compile it. That'll make the input text more readable too. Also, the test should be in the X86 directory.

llvm/tools/llvm-objdump/llvm-objdump.cpp
336

What do you think about the compatibility with GNU objdump?

Compatibility is only important for things that can affect parsers (though I have no idea why anybody would be parsing disassembly...). If this can't affect parser behaviour, then I don't think we need to worry about it. If it can, then it should be off by default.

369

Use WithColor::changeColor

This is marked as done, but I don't see it being used?

369–371

Rather than static casting things to int, for the assert, compare the enum values directly:

assert(Color != ObjdumpColor::Sentinel)

The whole point of a enum class is that you can't implicitly convert from/to them. You shouldn't be casting them ever.

Also add a message to the assert i.e. assert(... && "Color cannot be Sentinel");.

This all becomes moot if you follow this suggestion: instead of HighlightColors being an array, it should probably be a std::unordered_map or similar with ObjdumpColor as the key. That way you don't need the cast for the lookup, although you will need to check against HighlightColors.end().

seiya updated this revision to Diff 213841.Aug 7 2019, 4:15 AM
seiya marked 13 inline comments as done.
  • Addressed review comments.
seiya added inline comments.Aug 7 2019, 4:17 AM
llvm/test/tools/llvm-objdump/highlight.test
4 ↗(On Diff #212532)

Basically, REQUIRES: shell implies that the test environment is not Windows, but it looks that setting $LIT_USE_INTERNAL_SHELL forces running tests on Windows that requires shell [1].

Namely, UNSUPPORTED: system-windows disables this test on Windows even if $LIT_USE_INTERNAL_SHELL is set.

I'm not familiar with Windows, but I think this it makes sense because I suppose that SetConsoleTextAttribute API does not emit ANSI escape sequences.

[1]: https://reviews.llvm.org/D31423

1 ↗(On Diff #212692)

What is "it" here?

"it" referred to llvm-objdump. Fixed the comment.

1 ↗(On Diff #211432)

The input file should be an executable to test symbol highlighting because currently only RIP-relative symbols are supported. That said, I think this YAML lacks readability too. Do you have any idea?

llvm/tools/llvm-objdump/llvm-objdump.cpp
336

I think so too (nobody parse the disassembly output written to a tty not a pipe, I believe).

Do you think syntax highlighting should be enabled by default? This current implementation enables the feature only if --color is explicitly specified (ColorsEnabled = UseColor == cl::BOU_TRUE;).

369

I've replaced outs().changeColor with WithColor.

369–371

I used an array since it should be faster lookup but it was a premature optimization. I've replaced with std::map instead of std::unordered_map here. Enum class as key of std::unordered_map is not supported because of "the specified hash does not meet the Hash requirements" (I must have misunderstood something... it is supported in C++14).

jhenderson added inline comments.Aug 7 2019, 4:58 AM
llvm/test/tools/llvm-objdump/X86/highlight.test
1 ↗(On Diff #213841)

Let's rename this test "disassembly-highlighting.test" (or whatever best matches the naming style of other disassembly tests in llvm-objdump). In particular, saying "disassemble" or "disassembly" or similar in the test name is important.

On that note, this comment should say "when disassembling" somewhere.

3 ↗(On Diff #213841)

You can remove "and the X86 disassembler" now.

44 ↗(On Diff #213841)

The input file should be an executable to test symbol highlighting because currently only RIP-relative symbols are supported. That said, I think this YAML lacks readability too. Do you have any idea?

I don't I'm afraid. You might be able to simulate it using an assembly file with a single text section still.

If you do stick with the YAML, I think you probably have too much content in your sections. You only check for 3 instructions, so you can reduce the bytes in the content down to those few. You can also significantly simplify much of this object. You probably don't need either .data and .bss. You probably don't need the section alignment fields. You probably can simplify the addresses to round numbers. You may not need all those symbols either (if I'm reading the checks correctly, only 'foo' is important). You don't need the entry label. Basically, you probably are better off starting with an empty YAML and building it up to the bare minimum for the test rather than using obj2yaml on a binary you want to simulate.

llvm/tools/llvm-objdump/llvm-objdump.cpp
336

Yes, I think syntax highlighting can be on by default (it is for llvm-dwarfdump for example).

seiya updated this revision to Diff 214061.Aug 7 2019, 8:52 PM
seiya marked 7 inline comments as done.
  • Enable syntax highlighting by default.
  • Added WithColor::colorsAvailable
    • I'll submit this as a separate patch if it's a good approach.
  • Replaced the lengthy YAML with a hand-assembled shorter one.
seiya marked an inline comment as done.Aug 7 2019, 9:02 PM
seiya added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
2415

@MaskRay I've splitted WithColor::colorsEnabled into colorsEnabled and colorsAvailable again. Do you think I should do (UseColor == cl::BOU_UNSET && OS.has_colors()) || UseColor == cl::BOU_TRUE here instead? I wonder if keeping UseColor static is better.

jhenderson added inline comments.Aug 8 2019, 2:10 AM
llvm/include/llvm/Support/WithColor.h
97

I think this needs rephrasing. How about "Determine whether the specified stream supports colors."?

llvm/lib/Support/WithColor.cpp
99

How about naming this colorsEnabled too?

102

I'm not sure this clause makes sense to me? If the stream doesn't have colours, saying colours are available for it because somebody has specified --color seems wrong...

Note that colorsEnabled() handles this case via DisableColors, I believe.

llvm/test/tools/llvm-objdump/X86/disassembly-highlighting.test
3

The comment says "by default", but the test doesn't show the default behaviour! It only shows behaviour when --color or --color=false is specified.

seiya marked 2 inline comments as done.Aug 12 2019, 1:57 AM
seiya added inline comments.
llvm/include/llvm/Support/WithColor.h
97

That sounds not sufficient to me. colorsAvailable returns true if the specified stream supports colors or --color=1 is given. As you suggested in another comment, I think adding another colorsEnabled() would be better:

/// Determine whether colors are displayed.
bool colorsEnabled();
static bool colorsEnabled(raw_ostream &OS);
llvm/lib/Support/WithColor.cpp
102

I'm not sure this clause makes sense to me? If the stream doesn't have colours, saying colours are available for it because somebody has specified --color seems wrong...

That makes sense. I'll rename it to colorsEnabled.

Note that colorsEnabled() handles this case via DisableColors, I believe.

DisableColors is not related to the --color option: it disables colors regardless of the option:

/// @param DisableColors Whether to ignore color changes regardless of -color
/// and support in OS
WithColor(raw_ostream &OS, HighlightColor S, bool DisableColors = false)
seiya marked an inline comment as done.Aug 12 2019, 1:58 AM
seiya added inline comments.
llvm/test/tools/llvm-objdump/X86/disassembly-highlighting.test
3

Good catch! I'll fix this.

rupprecht added inline comments.Aug 15 2019, 4:08 PM
llvm/tools/llvm-objdump/llvm-objdump.cpp
366

IIUC, using WithColor to do nested coloring doesn't really work -- ~WithColor resets the terminal color to plain, *not* to what the color was before. So if you have some stream of characters like:
AAABBBCCCBBB
where A maps to no coloring (X), B maps to red (R), and C maps to green (G), you should want to see:
XXXRRRGGGRRR
But I think it will actually be:
XXXRRRGGGXXX
because ~WithColor will reset to plain after closing the span of green.

In practice, I'm not sure any assembly actually has anything after nested spans -- the end of any nested span will line up with the span it's nested in. Still, it seems fragile/something that should be checked for somehow. Is this something you're handling somewhere that I'm just missing?

seiya marked an inline comment as done.Aug 16 2019, 2:54 AM
seiya added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
366

Good point. As you said, this patch doesn't support such a case. A good example is memory operand:

movq  <imm:$1>,  <mem:(<reg:%rax>, <reg:%rbx>)>
                      ^          ^                        
                Change color.    WithColor resets the color here.

I'll consider implementing a new class WithHighlightColor, which behaves like WithColor, but it saves/restores colors using a stack or something.

MaskRay added inline comments.Aug 16 2019, 3:28 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
350

= 0 can be deleted. You don't use its integral value.

seiya updated this revision to Diff 216115.Aug 20 2019, 5:51 AM
seiya marked 9 inline comments as done.
seiya added inline comments.Aug 20 2019, 5:52 AM
llvm/include/llvm/Support/WithColor.h
97

Updated to "Determine whether colors are displayed..." instead of "... stream supports colors" because even if the stream supports colors, it can be disabled by -color=false.

llvm/test/tools/llvm-objdump/X86/disassembly-highlighting.test
3

I just remembered that I added --color intentionally because stdout is not a tty. I've updated the comment: "... by default" → "... if colors are enabled".

seiya updated this revision to Diff 216117.Aug 20 2019, 5:55 AM

clang-formatted

jhenderson added inline comments.Aug 20 2019, 6:14 AM
llvm/lib/Support/WithColor.cpp
102

Under what situations can OS.has_colors() return false? I'm not convinced we want to return true if a stream doesn't have colours, even if the switch has been specified.

llvm/tools/llvm-objdump/llvm-objdump.cpp
368

a -> A

Would it make sense for this code to exist in WithColor.h/.cpp?

In fact, I wonder if this should be used by WithColor directly, i.e. have WithColor restore the previous state by default, rather than the plain version.

370

Consider replacing the pair with a simple struct e.g. ColorState. That'll make the code using it much clearer.

397

destructed -> destroyed

seiya marked 2 inline comments as done.Mon, Aug 26, 7:57 PM
seiya added inline comments.
llvm/lib/Support/WithColor.cpp
102

In UNIX,OS.has_colors() returns false if either the stdout is not a tty (say, pipe) or the it does not support colors. Even if it does not support colors, it's sometimes useful to enable colors with the switch.

For example, in disassembly-highlighting.test, since the stdout of llvm-objdump is a pipe to FileCheck, OS.has_colors() will return false. However, in order to check that the output includes colors, we need a way to force colors by the switch.

llvm/tools/llvm-objdump/llvm-objdump.cpp
368

Would it make sense for this code to exist in WithColor.h/.cpp?

I think so too. I'll submit this part as a separate patch.

In fact, I wonder if this should be used by WithColor directly, i.e. have WithColor restore the previous state by default, rather than the plain version.

That sounds good to me. I'll try it.