diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst --- a/llvm/docs/CommandGuide/llvm-symbolizer.rst +++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst @@ -12,7 +12,9 @@ ----------- :program:`llvm-symbolizer` reads input names and addresses from the command-line -and prints corresponding source code locations to standard output. +and prints corresponding source code locations to standard output. It can also +symbolize logs containing :doc:`Symbolizer Markup <../SymbolizerMarkupFormat>` via +:option:`--filter`. If no address is specified on the command-line, it reads the addresses from standard input. If no input name is specified on the command-line, but addresses @@ -213,6 +215,12 @@ Look up the object using the given build ID, specified as a hexadecimal string. Mutually exclusive with :option:`--obj`. +.. option:: --color [=<always|auto|never>] + + Specify whether to use color in :option:`--filter` mode. Defaults to + ``auto``, which detects whether standard output supports color. Specifying + ``--color`` alone is equivalent to ``--color=always``. + .. option:: --debuginfod, --no-debuginfod Whether or not to try debuginfod lookups for debug binaries. Unless specified, @@ -239,6 +247,15 @@ link section, use the specified path as a basis for locating the debug data if it cannot be found relative to the object. +.. option:: --filter + + Reads logs from standard input, converts contained + :doc:`Symbolizer Markup <../SymbolizerMarkupFormat>` into human-readable form, + and prints the results to standard output. Presently, only the following + markup elements are supported: + + * ``{{{symbol}}}`` + .. _llvm-symbolizer-opt-f: .. 
option:: --functions [=<none|short|linkage>], -f diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst --- a/llvm/docs/Reference.rst +++ b/llvm/docs/Reference.rst @@ -43,6 +43,7 @@ StackMaps SpeculativeLoadHardening Statepoints + SymbolizerMarkupFormat SystemLibrary TestingGuide TransformMetadata @@ -79,6 +80,9 @@ :doc:`OptBisect` A command line option for debugging optimization-induced failures. +:doc:`SymbolizerMarkupFormat` + A reference for the log symbolizer markup accepted by ``llvm-symbolizer``. + :doc:`The Microsoft PDB File Format ` A detailed description of the Microsoft PDB (Program Database) file format. diff --git a/llvm/docs/SymbolizerMarkupFormat.rst b/llvm/docs/SymbolizerMarkupFormat.rst new file mode 100644 --- /dev/null +++ b/llvm/docs/SymbolizerMarkupFormat.rst @@ -0,0 +1,434 @@ +========================== +Symbolizer Markup Format +========================== + +.. contents:: + :local: + +Overview +======== + +This document defines a text format for log messages that can be processed by a +symbolizing filter. The basic idea is that logging code emits text that contains +raw address values and so forth, without the logging code doing any real work to +convert those values to human-readable form. Instead, logging text uses the +markup format defined here to identify pieces of information that should be +converted to human-readable form after the fact. As with other markup formats, +the expectation is that most of the text will be displayed as is, while the +markup elements will be replaced with expanded text, or converted into active UI +elements, that present more details in symbolic form. + +This means there is no need for symbol tables, DWARF debugging sections, or +similar information to be directly accessible at runtime. There is also no need +at runtime for any logic intended to compute human-readable presentation of +information, such as C++ symbol demangling.
Instead, logging must include markup +elements that give the contextual information necessary to make sense of the raw +data, such as memory layout details. + +This format identifies markup elements with a syntax that is both simple and +distinctive. It's simple enough to be matched and parsed with straightforward +code. It's distinctive enough that character sequences that look like the start +or end of a markup element should rarely if ever appear incidentally in logging +text. It's specifically intended not to require sanitizing plain text, such as +the HTML/XML requirement to replace ``<`` with ``&lt;`` and the like. + +:manpage:`llvm-symbolizer(1)` includes a symbolizing filter via its ``--filter`` +option. + +Scope and assumptions +===================== + +A symbolizing filter implementation will be independent both of the target +operating system and machine architecture where the logs are generated and of +the host operating system and machine architecture where the filter runs. + +This format assumes that the symbolizing filter processes intact whole lines. If +long lines might be split during some stage of a logging pipeline, they must be +reassembled to restore the original line breaks before feeding lines into the +symbolizing filter. Most markup elements must appear entirely on a single line +(often with other text before and/or after the markup element). There are some +markup elements that are specified to span lines, with line breaks in the middle +of the element. Even in those cases, the filter is not expected to handle line +breaks in arbitrary places inside a markup element, but only inside certain +fields. + +This format assumes that the symbolizing filter processes a coherent stream of +log lines from a single process address space context. If a logging stream +interleaves log lines from more than one process, these must be collated into +separate per-process log streams and each stream processed by a separate +instance of the symbolizing filter.
Because the kernel and user processes use +disjoint address regions in most operating systems, a single user process +address space plus the kernel address space can be treated as a single address +space for symbolization purposes if desired. + +Dependence on Build IDs +======================= + +The symbolizer markup scheme relies on contextual information about runtime +memory address layout to make it possible to convert markup elements into useful +symbolic form. This relies on having an unmistakable identification of which +binary was loaded at each address. + +An ELF Build ID is the payload of an ELF note with name ``"GNU"`` and type +``NT_GNU_BUILD_ID``, a unique byte sequence that identifies a particular binary +(executable, shared library, loadable module, or driver module). The linker +generates this automatically based on a hash that includes the complete symbol +table and debugging information, even if this is later stripped from the binary. + +This specification uses the ELF Build ID as the sole means of identifying +binaries. Each binary relevant to the log must have been linked with a unique +Build ID. The symbolizing filter must have some means of mapping a Build ID back +to the original ELF binary (either the whole unstripped binary, or a stripped +binary paired with a separate debug file). + +Colorization +============ + +The markup format supports a restricted subset of ANSI X3.64 SGR (Select Graphic +Rendition) control sequences. These are unlike other markup elements: + +* They specify presentation details (bold or colors) rather than semantic + information. The association of semantic meaning with color (e.g. red for + errors) is chosen by the code doing the logging, rather than by the UI + presentation of the symbolizing filter. This is a concession to existing code + (e.g. LLVM sanitizer runtimes) that use specific colors and would require + substantial changes to generate semantic markup instead. 
+ +* A single control sequence changes "the state", rather than being a + hierarchical structure that surrounds affected text. + +The filter processes ANSI SGR control sequences only within a single line. If a +control sequence to enter a bold or color state is encountered, it's expected +that the control sequence to reset to default state will be encountered before +the end of that line. If a "dangling" state is left at the end of a line, the +filter may reset to default state for the next line. + +An SGR control sequence is not interpreted inside any other markup element. +However, other markup elements may appear between SGR control sequences, and the +color/bold state is expected to apply to the symbolic output that replaces the +markup element in the filter's output. + +The accepted SGR control sequences all have the form ``"\033[%um"`` (expressed here +using C string syntax), where ``%u`` is one of these: + +==== ============================ =============================================== +Code Effect Notes +==== ============================ =============================================== +0 Reset to default formatting. +1 Bold text Combines with color states, doesn't reset them. +30 Black foreground +31 Red foreground +32 Green foreground +33 Yellow foreground +34 Blue foreground +35 Magenta foreground +36 Cyan foreground +37 White foreground +==== ============================ =============================================== + +Common markup element syntax +============================ + +All the markup elements share a common syntactic structure to facilitate simple +matching and parsing code. Each element has the form:: + + {{{tag:fields}}} + +``tag`` identifies one of the element types described below, and is always a +short alphabetic string that must be in lower case. The rest of the element +consists of zero or more fields; ``{{{reset}}}``, for example, has none. Fields +are separated by ``:`` and cannot contain any ``:`` or ``}`` characters.
How many fields must be or may be present and +what they contain is specified for each element type. + +No markup elements or ANSI SGR control sequences are interpreted inside the +contents of a field. + +In the descriptions of each element type, ``printf``-style placeholders indicate +field contents: + +``%s`` + A string of printable characters, not including ``:`` or ``}``. + +``%p`` + An address value represented by ``0x`` followed by an even number of + hexadecimal digits (using either lower-case or upper-case for ``A``–``F``). + If the digits are all ``0`` then the ``0x`` prefix may be omitted. No more + than 16 hexadecimal digits are expected to appear in a single value (64 bits). + +``%u`` + A nonnegative decimal integer. + +``%i`` + A nonnegative integer. The digits are hexadecimal if prefixed by ``0x``, octal + if prefixed by ``0``, or decimal otherwise. + +``%x`` + A sequence of an even number of hexadecimal digits (using either lower-case or + upper-case for ``A``–``F``), with no ``0x`` prefix. This represents an + arbitrary sequence of bytes, such as an ELF Build ID. + +Presentation elements +===================== + +These are elements that convey a specific program entity to be displayed in +human-readable symbolic form. + +``{{{symbol:%s}}}`` + Here ``%s`` is the linkage name for a symbol or type. It may require + demangling according to language ABI rules. Even for unmangled names, it's + recommended that this markup element be used to identify a symbol name so that + it can be presented distinctively. + + Examples:: + + {{{symbol:_ZN7Mangled4NameEv}}} + {{{symbol:foobar}}} + +``{{{pc:%p}}}``, ``{{{pc:%p:ra}}}``, ``{{{pc:%p:pc}}}`` [#not_yet_implemented]_ + + Here ``%p`` is the memory address of a code location. It might be presented as a + function name and source location. The second two forms distinguish the kind of + code location, as described in detail for bt elements below. 
+ + Examples:: + + {{{pc:0x12345678}}} + {{{pc:0xffffffff9abcdef0}}} + +``{{{data:%p}}}`` [#not_yet_implemented]_ + + Here ``%p`` is the memory address of a data location. It might be presented as + the name of a global variable at that location. + + Examples:: + + {{{data:0x12345678}}} + {{{data:0xffffffff9abcdef0}}} + +``{{{bt:%u:%p}}}``, ``{{{bt:%u:%p:ra}}}``, ``{{{bt:%u:%p:pc}}}`` [#not_yet_implemented]_ + + This represents one frame in a backtrace. It usually appears on a line by + itself (surrounded only by whitespace), in a sequence of such lines with + ascending frame numbers. So the human-readable output might be formatted + assuming that, such that it looks good for a sequence of bt elements each + alone on its line with uniform indentation of each line. But it can appear + anywhere, so the filter should not remove any non-whitespace text surrounding + the element. + + Here ``%u`` is the frame number, which starts at zero for the location of the + fault being identified, increments to one for the caller of frame zero's call + frame, to two for the caller of frame one, etc. ``%p`` is the memory address + of a code location. + + Code locations in a backtrace come from two distinct sources. Most backtrace + frames describe a return address code location, i.e. the instruction + immediately after a call instruction. This is the location of code that has + yet to run, since the function called there has not yet returned. Hence the + code location of actual interest is usually the call site itself rather than + the return address, i.e. one instruction earlier. When presenting the source + location for a return address frame, the symbolizing filter will subtract one + byte or one instruction length from the actual return address for the call + site, with the intent that the address logged can be translated directly to a + source location for the call site and not for the apparent return site + thereafter (which can be confusing). 
When inlined functions are involved, the + call site and the return site can appear to be in different functions at + entirely unrelated source locations rather than just a line away, making the + confusion of showing the return site rather than the call site quite severe. + + Often the first frame in a backtrace ("frame zero") identifies the precise + code location of a fault, trap, or asynchronous interrupt rather than a return + address. At other times, even the first frame is actually a return address + (for example, backtraces collected at the time of an object allocation and + reported later when the allocated object is used or misused). When a system + supports in-thread trap handling, there may also be frames after the first + that represent a precise interrupted code location rather than a return + address, presented as the "caller" of a trap handler function (for example, + signal handlers in POSIX systems). + + Return address frames are identified by the ``:ra`` suffix. Precise code + location frames are identified by the ``:pc`` suffix. + + Traditional practice has often been to collect backtraces as simple address + lists, losing the distinction between return address code locations and + precise code locations. Some such code applies the "subtract one" adjustment + described above to the address values before reporting them, and it's not + always clear or consistent whether this adjustment has been applied or not. + These ambiguous cases are supported by the ``bt`` and ``pc`` forms with no + ``:ra`` or ``:pc`` suffix, which indicate it's unclear which sort of code + location this is. However, it's highly recommended that all emitters use the + suffixed forms and deliver address values with no adjustments applied. When + traditional practice has been ambiguous, most cases seem to have printed + return address code locations without adjustment.
So the symbolizing filter will usually apply the + "subtract one byte" adjustment to an address printed without a disambiguating + suffix. Assuming that a call instruction is longer than one byte on all + supported machines, applying the "subtract one byte" adjustment a second time + still results in an address somewhere in the call instruction, so a little + sloppiness here often does little or no harm. + + Examples:: + + {{{bt:0:0x12345678:pc}}} + {{{bt:1:0xffffffff9abcdef0:ra}}} + +``{{{hexdict:...}}}`` [#not_yet_implemented]_ + + This element can span multiple lines. Here ``...`` is a sequence of key-value + pairs where a single ``:`` separates each key from its value, and arbitrary + whitespace separates the pairs. The value (right-hand side) of each pair + either is one or more ``0`` digits, or is ``0x`` followed by hexadecimal + digits. Each value might be a memory address or might be some other integer + (including an integer that looks like a likely memory address but actually has + an unrelated purpose). When the contextual information about the memory layout + suggests that a given value could be a code location or a global variable data + address, it might be presented as a source location or variable name or with + active UI that makes such interpretation optionally visible. + + The intended use is for things like register dumps, where the emitter doesn't + know which values might have a symbolic interpretation but a presentation that + makes plausible symbolic interpretations available might be very useful to + someone reading the log. At the same time, a flat text presentation should + usually avoid interfering too much with the original contents and formatting + of the dump. For example, it might use footnotes with source locations for + values that appear to be code locations. 
An active UI presentation might show + the dump text as is, but highlight values with symbolic information available + and pop up a presentation of symbolic details when a value is selected. + + Example:: + + {{{hexdict: + CS: 0 RIP: 0x6ee17076fb80 EFL: 0x10246 CR2: 0 + RAX: 0xc53d0acbcf0 RBX: 0x1e659ea7e0d0 RCX: 0 RDX: 0x6ee1708300cc + RSI: 0 RDI: 0x6ee170830040 RBP: 0x3b13734898e0 RSP: 0x3b13734898d8 + R8: 0x3b1373489860 R9: 0x2776ff4f R10: 0x2749d3e9a940 R11: 0x246 + R12: 0x1e659ea7e0f0 R13: 0xd7231230fd6ff2e7 R14: 0x1e659ea7e108 R15: 0xc53d0acbcf0 + }}} + +Trigger elements +================ + +These elements trigger an external action and are presented to the user in +human-readable form. Generally the external action results in a linkable page. +The link, or other information about the external action, can then be presented +to the user. + +``{{{dumpfile:%s:%s}}}`` [#not_yet_implemented]_ + + Here the first ``%s`` is an identifier for a type of dump and the second + ``%s`` is an identifier for a particular dump that's just been published. The + types of dumps, the exact meaning of "published", and the nature of the + identifier are outside the scope of the markup format per se. In general it + might correspond to writing a file by that name or something similar. + + This element may trigger additional post-processing work beyond symbolizing + the markup. It indicates that a dump file of some sort has been published. + Some logic attached to the symbolizing filter may understand certain types of + dump file and trigger additional post-processing of the dump file upon + encountering this element (e.g. generating visualizations, symbolization). The + expectation is that the information collected from contextual elements + (described below) in the logging stream may be necessary to decode the content + of the dump.
So if the symbolizing filter triggers other processing, it may + need to feed some distilled form of the contextual information to those + processes. + + An example of a type identifier is ``sancov``, for dumps from LLVM + `SanitizerCoverage `_. + + Example:: + + {{{dumpfile:sancov:sancov.8675}}} + +Contextual elements +=================== + +These are elements that supply information necessary to convert presentation +elements to symbolic form. Unlike presentation elements, they are not directly +related to the surrounding text. Contextual elements should appear alone on +lines with no other non-whitespace text, so that the symbolizing filter might +elide the whole line from its output without hiding any other log text. + +The contextual elements themselves do not necessarily need to be presented in +human-readable output. However, the information they impart may be essential to +understanding the logging text even after symbolization. So it's recommended +that this information be preserved in some form when the original raw log with +markup may no longer be readily accessible for whatever reason. + +Contextual elements should appear in the logging stream before they are needed. +That is, if some piece of context may affect how the symbolizing filter would +interpret or present a later presentation element, the necessary contextual +elements should have appeared somewhere earlier in the logging stream. It should +always be possible for the symbolizing filter to be implemented as a single pass +over the raw logging stream, accumulating context and massaging text as it goes. + +``{{{reset}}}`` [#not_yet_implemented]_ + + This should be output before any other contextual element. The need for this + contextual element is to support implementations that handle logs coming from + multiple processes. Such implementations might not know when a new process + starts or ends. 
Because some identifying information (like process IDs) might + be the same between old and new processes, a way is needed to distinguish two + processes with such identical identifying information. This element tells + such implementations to reset the state of a filter so that information from a + previous process's contextual elements is not assumed for a new process that + just happens to have the same identifying information. + +``{{{module:%i:%s:%s:...}}}`` [#not_yet_implemented]_ + + This element represents a so-called "module". A "module" is a single linked + binary, such as a loaded ELF file. Usually each module occupies a contiguous + range of memory. + + Here ``%i`` is the module ID which is used by other contextual elements to + refer to this module. The first ``%s`` is a human-readable identifier for the + module, such as an ELF ``DT_SONAME`` string or a file name; but it might be + empty. It's only for casual information. Only the module ID is used to refer + to this module in other contextual elements, never the ``%s`` string. The + ``module`` element defining a module ID must always be emitted before any + other elements that refer to that module ID, so that a filter never needs to + keep track of dangling references. The second ``%s`` is the module type and it + determines what the remaining fields are. The following module types are + supported: + + * ``elf:%x`` + + Here ``%x`` encodes an ELF Build ID. The Build ID should refer to a single + linked binary. The Build ID string is the sole way to identify the binary from + which this module was loaded. + + Example:: + + {{{module:1:libc.so:elf:83238ab56ba10497}}} + +``{{{mmap:%p:%i:...}}}`` [#not_yet_implemented]_ + + This contextual element is used to give information about a particular region + in memory. ``%p`` is the starting address and ``%i`` gives the size of the + region of memory.
The ``...`` part can take different forms to give different + information about the specified region of memory. The allowed forms are the + following: + + * ``load:%i:%s:%p`` + + This subelement informs the filter that a segment was loaded from a module. + The module is identified by its module ID ``%i``. The ``%s`` is one or more of + the letters 'r', 'w', and 'x' (in that order and in either upper or lower + case) to indicate this segment of memory is readable, writable, and/or + executable. The symbolizing filter can use this information to guess whether + an address is a likely code address or a likely data address in the given + module. The remaining ``%p`` gives the module relative address. For ELF files + the module relative address will be the ``p_vaddr`` of the associated program + header. For example if your module's executable segment has + ``p_vaddr=0x1000``, ``p_memsz=0x1234``, and was loaded at ``0x7acba69d5000`` + then you need to subtract ``0x7acba69d4000`` from any address between + ``0x7acba69d5000`` and ``0x7acba69d6234`` to get the module relative address. + The starting address will usually have been rounded down to the active page + size, and the size rounded up. + + Example:: + + {{{mmap:0x7acba69d5000:0x5a000:load:1:rx:0x1000}}} + +.. rubric:: Footnotes + +.. [#not_yet_implemented] This markup element is not yet implemented in + :manpage:`llvm-symbolizer(1)`. diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Filter.h b/llvm/include/llvm/DebugInfo/Symbolize/Filter.h new file mode 100644 --- /dev/null +++ b/llvm/include/llvm/DebugInfo/Symbolize/Filter.h @@ -0,0 +1,76 @@ +//===- Filter.h -------------------------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. 
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +/// +/// \file +/// This file declares a log filter that replaces symbolizer markup with +/// human-readable expressions. +/// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_DEBUGINFO_SYMBOLIZE_FILTER_H +#define LLVM_DEBUGINFO_SYMBOLIZE_FILTER_H + +#include "Markup.h" + +#include "llvm/Support/WithColor.h" +#include "llvm/Support/raw_ostream.h" + +namespace llvm { +namespace symbolize { + +/// Filter to convert parsed log symbolizer markup elements into human-readable +/// text. +class LogFilter { +public: + LogFilter(raw_ostream &OS, Optional<bool> ColorsEnabled = llvm::None); + + /// Begins a logical \p Line of markup. + /// + /// This must be called for each line of the input stream before calls to + /// filter() for elements of that line. The provided \p Line must be the + /// same one passed to parseLine() to produce the elements later passed to + /// filter(). + /// + /// This informs the filter that a new line is beginning and establishes a + /// context for error location reporting. + void beginLine(StringRef Line); + + /// Handle a \p Node of symbolizer markup. + /// + /// If the node is a recognized, valid markup element, it is replaced with a + /// human-readable string. If the node isn't an element or the element isn't + /// recognized, it is output verbatim. If the element is recognized but isn't + /// valid, it is omitted from the output.
+ void filter(const MarkupNode &Node); + +private: + bool trySGR(const MarkupNode &Node); + + void beginHighlight(); + void endHighlight(); + void resetColorState(); + + bool checkTag(const MarkupNode &Node) const; + bool checkNumFields(const MarkupNode &Node, size_t Size) const; + + void reportTypeError(StringRef Str, StringRef TypeName) const; + void reportLocation(StringRef::iterator Loc) const; + + raw_ostream &OS; + const bool ColorsEnabled; + + StringRef Line; + + Optional<raw_ostream::Colors> Color; + bool Bold = false; +}; + +} // end namespace symbolize +} // end namespace llvm + +#endif // LLVM_DEBUGINFO_SYMBOLIZE_FILTER_H diff --git a/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt b/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt --- a/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt +++ b/llvm/lib/DebugInfo/Symbolize/CMakeLists.txt @@ -1,6 +1,7 @@ add_llvm_component_library(LLVMSymbolize DIFetcher.cpp DIPrinter.cpp + Filter.cpp Markup.cpp SymbolizableObjectFile.cpp Symbolize.cpp diff --git a/llvm/lib/DebugInfo/Symbolize/Filter.cpp b/llvm/lib/DebugInfo/Symbolize/Filter.cpp new file mode 100644 --- /dev/null +++ b/llvm/lib/DebugInfo/Symbolize/Filter.cpp @@ -0,0 +1,143 @@ +//===-- lib/DebugInfo/Symbolize/Filter.cpp -------------------------------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +/// +/// \file +/// This file defines the implementation of a log filter that replaces +/// symbolizer markup with human-readable expressions.
+/// +//===----------------------------------------------------------------------===// + +#include "llvm/DebugInfo/Symbolize/Filter.h" + +#include "llvm/ADT/None.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/StringSwitch.h" +#include "llvm/Demangle/Demangle.h" +#include "llvm/Support/WithColor.h" +#include "llvm/Support/raw_ostream.h" + +using namespace llvm; +using namespace llvm::symbolize; + +LogFilter::LogFilter(raw_ostream &OS, Optional<bool> ColorsEnabled) + : OS(OS), ColorsEnabled(ColorsEnabled.getValueOr( + WithColor::defaultAutoDetectFunction()(OS))) {} + +void LogFilter::beginLine(StringRef Line) { + this->Line = Line; + resetColorState(); +} + +void LogFilter::filter(const MarkupNode &Node) { + if (!checkTag(Node)) + return; + + if (trySGR(Node)) + return; + + if (Node.Tag == "symbol") { + if (!checkNumFields(Node, 1)) + return; + beginHighlight(); + OS << llvm::demangle(Node.Fields.front().str()); + endHighlight(); + return; + } + + OS << Node.Text; +} + +bool LogFilter::trySGR(const MarkupNode &Node) { + if (Node.Text == "\033[0m") { + resetColorState(); + return true; + } + if (Node.Text == "\033[1m") { + Bold = true; + if (ColorsEnabled) + OS.changeColor(raw_ostream::Colors::SAVEDCOLOR, Bold); + return true; + } + auto SGRColor = StringSwitch<Optional<raw_ostream::Colors>>(Node.Text) + .Case("\033[30m", raw_ostream::Colors::BLACK) + .Case("\033[31m", raw_ostream::Colors::RED) + .Case("\033[32m", raw_ostream::Colors::GREEN) + .Case("\033[33m", raw_ostream::Colors::YELLOW) + .Case("\033[34m", raw_ostream::Colors::BLUE) + .Case("\033[35m", raw_ostream::Colors::MAGENTA) + .Case("\033[36m", raw_ostream::Colors::CYAN) + .Case("\033[37m", raw_ostream::Colors::WHITE) + .Default(llvm::None); + if (SGRColor) { + Color = *SGRColor; + // Keep any active bold state: bold combines with colors. + if (ColorsEnabled) + OS.changeColor(*Color, Bold); + return true; + } + + return false; +} + +// Begin highlighting text by picking a different color than the current color +// state.
+void LogFilter::beginHighlight() {
+  if (!ColorsEnabled)
+    return;
+  OS.changeColor(Color == raw_ostream::Colors::BLUE ? raw_ostream::Colors::CYAN
+                                                    : raw_ostream::Colors::BLUE,
+                 Bold);
+}
+
+// End highlighting text, restoring the current color state.
+void LogFilter::endHighlight() {
+  if (!ColorsEnabled)
+    return;
+  if (Color) {
+    OS.changeColor(*Color, Bold);
+  } else {
+    OS.resetColor();
+    if (Bold)
+      OS.changeColor(raw_ostream::Colors::SAVEDCOLOR, Bold);
+  }
+}
+
+void LogFilter::resetColorState() {
+  if (!Color && !Bold)
+    return;
+  Color.reset();
+  Bold = false;
+  if (ColorsEnabled)
+    OS.resetColor();
+}
+
+bool LogFilter::checkTag(const MarkupNode &Node) const {
+  if (any_of(Node.Tag, [](char C) { return C < 'a' || C > 'z'; })) {
+    WithColor::error(errs()) << "tags must be all lowercase characters\n";
+    reportLocation(Node.Tag.begin());
+    return false;
+  }
+  return true;
+}
+
+bool LogFilter::checkNumFields(const MarkupNode &Node, size_t Size) const {
+  if (Node.Fields.size() != Size) {
+    WithColor::error(errs()) << "expected " << Size << " fields; found "
+                             << Node.Fields.size() << "\n";
+    reportLocation(Node.Tag.end());
+    return false;
+  }
+  return true;
+}
+
+void LogFilter::reportLocation(StringRef::iterator Loc) const {
+  errs() << Line;
+  for (unsigned I = 0, E = Loc - Line.begin(); I < E; ++I)
+    errs() << ' ';
+  WithColor(errs(), HighlightColor::String) << '^';
+  errs() << '\n';
+}
diff --git a/llvm/test/DebugInfo/symbolize-filter-color.test b/llvm/test/DebugInfo/symbolize-filter-color.test
new file mode 100644
--- /dev/null
+++ b/llvm/test/DebugInfo/symbolize-filter-color.test
@@ -0,0 +1,31 @@
+RUN: echo -e "\033[1mbold\033[0mreset" > %t.input
+RUN: echo -e "\033[1mboldnoreset" >> %t.input
+RUN: echo -e "resetafternewline" >> %t.input
+RUN: echo -e "\033[30mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[31mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[32mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[33mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[34mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[35mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[36mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[37mcolor\033[0m" >> %t.input
+RUN: echo -e "\033[33mbefore{{{symbol:highlight}}}after\033[0m" >> %t.input
+RUN: echo -e "\033[34msame{{{symbol:highlight}}}after\033[0m" >> %t.input
+RUN: echo -e "\033[1mbold{{{symbol:highlight}}}after\033[0m" >> %t.input
+RUN: llvm-symbolizer --filter --color=always < %t.input > %t.output
+RUN: FileCheck %s --input-file=%t.output --match-full-lines --implicit-check-not {{.}}
+
+CHECK: {{.}}[1mbold{{.}}[0mreset
+CHECK: {{.}}[1mboldnoreset
+CHECK: {{.}}[0mresetafternewline
+CHECK: {{.}}[0;30mcolor{{.}}[0m
+CHECK: {{.}}[0;31mcolor{{.}}[0m
+CHECK: {{.}}[0;32mcolor{{.}}[0m
+CHECK: {{.}}[0;33mcolor{{.}}[0m
+CHECK: {{.}}[0;34mcolor{{.}}[0m
+CHECK: {{.}}[0;35mcolor{{.}}[0m
+CHECK: {{.}}[0;36mcolor{{.}}[0m
+CHECK: {{.}}[0;37mcolor{{.}}[0m
+CHECK: {{.}}[0;33mbefore{{.}}[0;34mhighlight{{.}}[0;33mafter{{.}}[0m
+CHECK: {{.}}[0;34msame{{.}}[0;36mhighlight{{.}}[0;34mafter{{.}}[0m
+CHECK: {{.}}[1mbold{{.}}[0;1;34mhighlight{{.}}[0m{{.}}[1mafter{{.}}[0m
diff --git a/llvm/test/DebugInfo/symbolize-filter-error-location.test b/llvm/test/DebugInfo/symbolize-filter-error-location.test
new file mode 100644
--- /dev/null
+++ b/llvm/test/DebugInfo/symbolize-filter-error-location.test
@@ -0,0 +1,17 @@
+RUN: split-file %s %t
+RUN: llvm-symbolizer --debug-file-directory=%p/Inputs --filter < %t/log > /dev/null 2> %t.err
+RUN: FileCheck %s -input-file=%t.err --match-full-lines --strict-whitespace
+
+CHECK:error: expected 1 fields; found 0
+CHECK:[[BEGIN:[{]{3}]]symbol[[END:[}]{3}]]
+CHECK:         ^
+CHECK:error: expected 1 fields; found 0
+CHECK:foo[[BEGIN]]symbol[[END]]bar[[BEGIN]]symbol[[END]]baz
+CHECK:            ^
+CHECK:error: expected 1 fields; found 0
+CHECK:foo[[BEGIN]]symbol[[END]]bar[[BEGIN]]symbol[[END]]baz
+CHECK:                           ^
+
+;--- log
+{{{symbol}}}
+foo{{{symbol}}}bar{{{symbol}}}baz
diff --git a/llvm/test/DebugInfo/symbolize-filter-symbol.test b/llvm/test/DebugInfo/symbolize-filter-symbol.test
new file mode 100644
--- /dev/null
+++ b/llvm/test/DebugInfo/symbolize-filter-symbol.test
@@ -0,0 +1,10 @@
+RUN: split-file %s %t
+RUN: llvm-symbolizer --filter < %t/input > %t.output
+RUN: FileCheck %s --input-file=%t.output --match-full-lines --implicit-check-not {{.}}
+
+CHECK: foo
+CHECK: Mangled::Name()
+
+;--- input
+{{{symbol:foo}}}
+{{{symbol:_ZN7Mangled4NameEv}}}
diff --git a/llvm/test/DebugInfo/symbolize-filter-tag.test b/llvm/test/DebugInfo/symbolize-filter-tag.test
new file mode 100644
--- /dev/null
+++ b/llvm/test/DebugInfo/symbolize-filter-tag.test
@@ -0,0 +1,10 @@
+RUN: split-file %s %t
+RUN: llvm-symbolizer --filter < %t/input 2> %t.error
+RUN: FileCheck %s --input-file=%t.error --match-full-lines
+
+CHECK: error: tags must be all lowercase characters
+CHECK: error: tags must be all lowercase characters
+
+;--- input
+{{{t2g}}}
+{{{tAg}}}
diff --git a/llvm/test/tools/llvm-symbolizer/filter.test b/llvm/test/tools/llvm-symbolizer/filter.test
new file mode 100644
--- /dev/null
+++ b/llvm/test/tools/llvm-symbolizer/filter.test
@@ -0,0 +1,21 @@
+RUN: echo -e "a{{{symbol:foo}}}b\n{{{symbol:bar}}}\n" > %t.input
+RUN: llvm-symbolizer --filter < %t.input > %t.nocolor
+RUN: FileCheck %s --check-prefix=NOCOLOR --input-file=%t.nocolor --match-full-lines --implicit-check-not {{.}}
+
+NOCOLOR: afoob
+NOCOLOR: bar
+
+RUN: llvm-symbolizer --filter --color < %t.input > %t.color
+RUN: FileCheck %s --check-prefix=COLOR --input-file=%t.color --match-full-lines --implicit-check-not {{.}}
+
+RUN: llvm-symbolizer --filter --color=auto < %t.input > %t.autocolor
+RUN: FileCheck %s --check-prefix=NOCOLOR --input-file=%t.autocolor --match-full-lines --implicit-check-not {{.}}
+
+RUN: llvm-symbolizer --filter --color=never < %t.input > %t.nevercolor
+RUN: FileCheck %s --check-prefix=NOCOLOR --input-file=%t.nevercolor --match-full-lines --implicit-check-not {{.}}
+
+RUN: llvm-symbolizer --filter --color=always < %t.input > %t.alwayscolor
+RUN: FileCheck %s --check-prefix=COLOR --input-file=%t.alwayscolor --match-full-lines --implicit-check-not {{.}}
+
+COLOR: a{{.}}[0;34mfoo{{.}}[0mb
+COLOR: {{.}}[0;34mbar{{.}}[0m
diff --git a/llvm/tools/llvm-symbolizer/Opts.td b/llvm/tools/llvm-symbolizer/Opts.td
--- a/llvm/tools/llvm-symbolizer/Opts.td
+++ b/llvm/tools/llvm-symbolizer/Opts.td
@@ -23,12 +23,15 @@
 def basenames : Flag<["--"], "basenames">, HelpText<"Strip directory names from paths">;
 defm build_id : Eq<"build-id", "Build ID used to look up the object file">;
 defm cache_size : Eq<"cache-size", "Max size in bytes of the in-memory binary cache.">;
+def color : F<"color", "Use color when symbolizing log markup.">;
+def color_EQ : Joined<["--"], "color=">, HelpText<"Whether to use color when symbolizing log markup: always, auto, never">, Values<"always,auto,never">;
 defm debug_file_directory : Eq<"debug-file-directory", "Path to directory where to look for debug files">, MetaVarName<"<dir>">;
 defm debuginfod : B<"debuginfod", "Use debuginfod to find debug binaries", "Don't use debuginfod to find debug binaries">;
 defm default_arch
     : Eq<"default-arch", "Default architecture (for multi-arch objects)">,
       Group<grp_mach_o>;
 defm demangle : B<"demangle", "Demangle function names", "Don't demangle function names">;
+def filter : Flag<["--"], "filter">, HelpText<"Filter symbolizer markup from log on stdin.">;
 def functions : F<"functions", "Print function name for a given address">;
 def functions_EQ : Joined<["--"], "functions=">, HelpText<"Print function name for a given address">, Values<"none,short,linkage">;
 def help : F<"help", "Display this help">;
diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -19,6 +19,8 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Config/config.h"
 #include "llvm/DebugInfo/Symbolize/DIPrinter.h"
+#include "llvm/DebugInfo/Symbolize/Filter.h"
+#include "llvm/DebugInfo/Symbolize/Markup.h"
 #include "llvm/DebugInfo/Symbolize/SymbolizableModule.h"
 #include "llvm/DebugInfo/Symbolize/Symbolize.h"
 #include "llvm/Debuginfod/DIFetcher.h"
@@ -337,6 +339,17 @@
   return IsAddr2Line ? FunctionNameKind::None : FunctionNameKind::LinkageName;
 }
 
+static Optional<bool> parseColorArg(const opt::InputArgList &Args) {
+  if (Args.hasArg(OPT_color))
+    return true;
+  if (const opt::Arg *A = Args.getLastArg(OPT_color_EQ))
+    return StringSwitch<Optional<bool>>(A->getValue())
+        .Case("always", true)
+        .Case("never", false)
+        .Case("auto", None);
+  return None;
+}
+
 static SmallVector<uint8_t> parseBuildIDArg(const opt::InputArgList &Args,
                                             int ID) {
   const opt::Arg *A = Args.getLastArg(ID);
@@ -352,6 +365,22 @@
   return BuildID;
 }
 
+// Symbolize the log from stdin and write the result to stdout.
+static void filterLog(const opt::InputArgList &Args) {
+  MarkupParser Parser;
+  LogFilter Filter(outs(), parseColorArg(Args));
+  for (std::string InputString; std::getline(std::cin, InputString);) {
+    InputString += '\n';
+    Parser.parseLine(InputString);
+    Filter.beginLine(InputString);
+    while (Optional<MarkupNode> Element = Parser.nextNode())
+      Filter.filter(*Element);
+  }
+  Parser.flush();
+  while (Optional<MarkupNode> Element = Parser.nextNode())
+    Filter.filter(*Element);
+}
+
 ExitOnError ExitOnErr;
 
 int main(int argc, char **argv) {
@@ -413,6 +442,11 @@
     }
   }
 
+  if (Args.hasArg(OPT_filter)) {
+    filterLog(Args);
+    return 0;
+  }
+
   auto Style = IsAddr2Line ? OutputStyle::GNU : OutputStyle::LLVM;
   if (const opt::Arg *A = Args.getLastArg(OPT_output_style_EQ)) {
     if (strcmp(A->getValue(), "GNU") == 0)