This is an archive of the discontinued LLVM Phabricator instance.

[LLDB} Added syntax highlighting support
ClosedPublic

Authored by teemperor on Jul 13 2018, 6:15 PM.

Details

Summary

This patch adds syntax highlighting support to LLDB. When enabled (and lldb is allowed
to use colors), printed source code is annotated with the ANSI color escape sequences.

So far we have only one highlighter which is based on Clang and is responsible for all
languages that are supported by Clang. It essentially just runs the raw lexer over the input
and then surrounds the specific tokens with the configured escape sequences.

Diff Detail

Event Timeline

teemperor created this revision.Jul 13 2018, 6:15 PM

Also: This patch doesn't answer the question how the user will be able to configure the specific colors used for each token (i.e. there is only one style available at the moment). The reason for this is that there seems to be a lot of opinions about whether we want to flood the settings with highlighting-specific fields or if we just provide a fixed set of common styles. So I would prefer if we could defer the discussion about this to another patch that will address this (Unless we somehow magically achieve consensus on this one).

Eugene.Zelenko retitled this revision from Added syntax highlighting support to [LLDB} Added syntax highlighting support.Jul 14 2018, 10:46 AM
Eugene.Zelenko added a reviewer: zturner.
Eugene.Zelenko set the repository for this revision to rL LLVM.
Eugene.Zelenko added a project: Restricted Project.

Ansi escape codes do not work on Windows. LLVM’s stream output classes have
color support that does work in a cross platform manner. Can we use those
instead?

labath added a subscriber: labath.Jul 16 2018, 7:17 AM

We're also trying to avoid adding new clang-specific code to the debugger core. I think it would make more sense if the (clang-based) c++ highlighter was provided by some plugin. I see a couple of options:

  • the c++ language plugin: I think this is the most low-level plugin that is still language specific. However, it is specific to c(++), whereas here you format other languages too.
  • the clang expression parser plugin: it's a bit of a stretch, but syntax higlighting is a kind of expression parsing
  • a completely new plugin

@zturner We can migrate the existing AnsiTerminal.h to reuse the LLVM coloring backend. This way we fix also the code that already uses this convenient interface.

@labath I think we can add to the Language class the option to add its related syntax highlighting support. I'll check if/how that would work.

(Thanks for the reviews!)

teemperor planned changes to this revision.Jul 16 2018, 9:47 AM
teemperor updated this revision to Diff 157054.Jul 24 2018, 9:28 AM

[Updated patch to address Pavel's comments, thanks!]

@zturner So I looked into the Windows support:

Windows requires us to directly flush/signal/write/flush to the console output stream. However lldb's output is buffered by design into a StreamString first to prevent overlapping with the process output. So we can't just add Windows support to the coloring backend as we don't have direct access to the output stream.

If we want to fix this in lldb, then we could write an ANSI color code interpreter and let that run over our final output while we print it. That wouldn't be too complex and would fix all existing coloring output in lldb. It would also fix that lldb configs that enable color support on Windows are broken (because people just add color codes there).

However, it seems Windows starting with W10 anyway supports ANSI color codes (or at least you can enable them in the settings). So that interpreter is only really necessary until everyone moved to a version that supports color codes: https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences#span-idtextformattingspanspan-idtextformattingspanspan-idtextformattingspantext-formatting

So I would suggest we maybe just hack in the color interpreter and drop it when W7 reaches EoL (?). I can make another patch for that if it sounds good.

teemperor planned changes to this revision.Jul 24 2018, 2:52 PM

Actually, we also have to tear down the plugins, so let me fix that first.

teemperor updated this revision to Diff 157169.Jul 24 2018, 4:26 PM
teemperor added a reviewer: davide.
  • Removed some leftover code from the refactoring.
  • Correctly SetUp/TearDown the test case.

The patch is bigger than ideal for a single change, but I like the way it is structured. Thank you for abstracting the clang specifics.

The one high-level question that came to mind when looking this over is whether we really need to do all this file name matching to get the language type. I would expect we normally display the source code we have debug info for. Is that ever not the case? If it is, then we should be able to just get the language from the debug info and avoid relying on the file names. Have you looked at whether that is possible?

include/lldb/Core/Highlighter.h
50

given stream

150

s/hat/that/

source/Core/Highlighter.cpp
17–18

Are these clang includes used here? If they are, this entire abstraction exercise will have been for nothing.

30

This call isn't exactly cheap. Maybe you could just call the function once during initialization and just store the result?

35

This isn't correct, as you're not writing m_prefix, but it's transmogrified version. Btw, do you really need this return value anyway?

source/Plugins/Language/ClangCommon/ClangHighlighter.cpp
22

Is this used?

156

Do you need to make a copy of the data here? It looks like both of these objects get destroyed at the end of this function anyway.

unittests/Language/Highlighting/HighlighterTest.cpp
133–134

You can just do EXPECT_EQ("", highlightC("", s)). EXPECT_EQ uses operator==, which does the right thing when one of the args is a std::string.

teemperor marked 8 inline comments as done.
  • Addressed Pavel's comments. Also cleaned up some more includes.

Yes, knowing the language we currently display would be nice, but I didn't see a good way to reliably get this information. The File class doesn't know anything about the meaning of its contents. And the SourceManager can be given a SymbolContextList, but looking at the test coverage that list is often just a nullptr.

So I went with the file name matching as it doesn't have any surprising side effects (like starting to demangle symbols) or any special requirements to the caller (like already knowing the language type/symbol context of what we will display).

I think it's a reasonable fallback that should always work and we can later add functionality to pass a language to the SourceManager/File/etc., if the caller has that information available.

(Also thanks for the review).

teemperor added inline comments.Jul 25 2018, 1:26 PM
source/Core/Highlighter.cpp
35

Good catch. And the return value is just to make the SourceManager happy which always returns the total amount of bytes written. I'm working on a patch that will move all the 'written byte counting' in lldb into the Stream class, but as of now that's how it works.

Thanks for the explanation. This looks fine to me, though I would feel better if someone else gave it a look too.

source/Core/Highlighter.cpp
30

extra semicolons

35

That sounds like a good idea. When you do that, could you please refer to the llvm raw_ostream classes to see how they handle this. Long term, it would be great if we could replace lldb's classes with those, so I'd like to avoid diverging from them if possible.

teemperor updated this revision to Diff 157702.Jul 27 2018, 9:51 AM
  • Removed extra semicolons that slipped in somehow.
teemperor marked an inline comment as done.Jul 27 2018, 9:53 AM
teemperor added inline comments.
source/Core/Highlighter.cpp
35

raw_ostream has the same functionality with tell(), so that is actually in line with the idea of moving to LLVM's stream classes.

davide accepted this revision.Aug 1 2018, 2:15 PM
This revision is now accepted and ready to land.Aug 1 2018, 2:15 PM
This revision was automatically updated to reflect the committed changes.