This is an archive of the discontinued LLVM Phabricator instance.

Differential D146854

[Symbolizer] Add flag to dump process context JSON from markup
Needs Review · Public

Authored by mysterymath on Mar 24 2023, 4:42 PM.

Details

Reviewers

mcgrathr
phosek
jhenderson
MaskRay

Summary

This creates a simple JSON format for representing the process contexts
encountered by the markup filter and a flag, --dump-process-context, for
exporting it. Using this flag suppresses the usual symbolized log output
and instead emits this JSON. This provides a machine-readable
representation of the layouts of the processes that emitted the
marked-up logs.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mysterymath created this revision. Mar 24 2023, 4:42 PM

Herald added a reviewer: jhenderson. Mar 24 2023, 4:42 PM

Herald added a project: Restricted Project.

Herald added a subscriber: hiraditya.

mysterymath requested review of this revision. Mar 24 2023, 4:42 PM

Herald added a project: Restricted Project. Mar 24 2023, 4:42 PM

Herald added subscribers: llvm-commits, MaskRay.

Minimize diff.

Add check for flag validity.

Harbormaster completed remote builds in B221705: Diff 508252. Mar 24 2023, 5:33 PM

A few minor points from me that I spotted whilst skimming over this due to my herald rule pinging me. I am not familiar enough with the markup stuff to be able to review that.

llvm/test/DebugInfo/symbolize-filter-markup-dump-context.test
5–6 ↗	(On Diff #508252)	You should check that the error message is the expected error message here. Otherwise, these could be failing for an unrelated reason and you wouldn't notice.
9 ↗	(On Diff #508252)	CHECK-NEXT (or CHECK-SAME) here and below?
llvm/tools/llvm-symbolizer/Opts.td
34	Nit: for consistency with other options.

dblaikie added a subscriber: dblaikie. Mar 27 2023, 11:03 AM

Address review comments.

Harbormaster completed remote builds in B222074: Diff 508741. Mar 27 2023, 1:04 PM

Thanks. No more comments from me.

"context" as the name for this confuses me a bit - perhaps you can describe it more (maybe I'm just thinking about it wrong, or maybe it could benefit from a rename)

When I think of "dumping context" - I think of dumping the text surrounding the symbolizer markup, which it isn't doing. It looks like it's something like "dump markup as json", is that accurate? (& it drops/doesn't print out any of the non-markup content present in the input?)

phosek added inline comments. Mar 28 2023, 9:02 AM

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
451	For attribute naming, in JSON generated from LLVM tools we typically use either `PascalCase` or `snake_case`, `camelCase` is a bit unusual.

In D146854#4227650, @dblaikie wrote:

"context" as the name for this confuses me a bit - perhaps you can describe it more (maybe I'm just thinking about it wrong, or maybe it could benefit from a rename)

When I think of "dumping context" - I think of dumping the text surrounding the symbolizer markup, which it isn't doing. It looks like it's something like "dump markup as json", is that accurate? (& it drops/doesn't print out any of the non-markup content present in the input?)

The term "context" comes from the symbolizer markup format: https://llvm.org/docs/SymbolizerMarkupFormat.html#contextual-elements
In this case it's not actually dumping the presentation requests from the markup, it's dumping a completed representation of the process layout contained in the "contextual elements" of the markup. These contextualize virtual addresses to allow symbolizing them later; and it's that contextual information that it'd be useful to export and import into other tools.
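
To make the "contextualize virtual addresses" point concrete, here is a minimal C++ sketch of how a consumer of the dumped JSON might turn a runtime address into a module and a module-relative address. It is an illustration only, not code from this revision: `MMapEntry`, `Resolved`, and `resolve` are hypothetical names whose fields mirror the `mmaps` attributes, and the JSON parsing step is assumed to have happened already.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of one entry of the "mmaps" array in the dumped JSON.
struct MMapEntry {
  uint64_t Address;               // start of the mapping in the process
  uint64_t Size;                  // length of the mapping in bytes
  uint64_t ModuleID;              // module that backs the mapping
  uint64_t ModuleRelativeAddress; // module address that is mapped at Address
};

// Hypothetical result type: where a runtime address lives inside a module.
struct Resolved {
  uint64_t ModuleID;
  uint64_t ModuleRelativeAddress;
};

// Find the mapping that contains Addr and translate Addr into that module's
// own address space. Returns std::nullopt if no mapping contains Addr.
std::optional<Resolved> resolve(const std::vector<MMapEntry> &MMaps,
                                uint64_t Addr) {
  for (const MMapEntry &M : MMaps) {
    // Written as a subtraction so that Address + Size cannot overflow.
    if (Addr >= M.Address && Addr - M.Address < M.Size)
      return Resolved{M.ModuleID, Addr - M.Address + M.ModuleRelativeAddress};
  }
  return std::nullopt;
}
```

With the three mappings from this revision's test log (address/size 0x10/0x10, 0x20/0x30, and 0x50/0x60), `resolve(MMaps, 0x20)` falls in the second mapping and yields module 1 at module-relative address 0x3.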

mysterymath added inline comments. Mar 28 2023, 2:54 PM

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
451	I did a quick search around the codebase, and it doesn't look like there's a consistent convention; there's kebab case, pascal case, and camelCase all represented. Didn't really take the time to get accurate counts, but it does seem that various style guides suggest lower camel case: https://google.github.io/styleguide/jsoncstyleguide.xml https://jsonapi.org/recommendations/

mysterymath added a child revision: D148045: [NFC][Symbolizer] Refactor out ProcessContext. Apr 11 2023, 1:49 PM

Rebase.

Harbormaster completed remote builds in B224868: Diff 512581. Apr 11 2023, 2:50 PM

Dump name field as well.

Harbormaster completed remote builds in B224891: Diff 512611. Apr 11 2023, 4:11 PM

mysterymath added a reviewer: MaskRay. Apr 12 2023, 11:37 AM

In D146854#4228721, @mysterymath wrote:

In D146854#4227650, @dblaikie wrote:

"context" as the name for this confuses me a bit - perhaps you can describe it more (maybe I'm just thinking about it wrong, or maybe it could benefit from a rename)

When I think of "dumping context" - I think of dumping the text surrounding the symbolizer markup, which it isn't doing. It looks like it's something like "dump markup as json", is that accurate? (& it drops/doesn't print out any of the non-markup content present in the input?)

The term "context" comes from the symbolizer markup format: https://llvm.org/docs/SymbolizerMarkupFormat.html#contextual-elements
In this case it's not actually dumping the presentation requests from the markup, it's dumping a completed representation of the process layout contained in the "contextual elements" of the markup. These contextualize virtual addresses to allow symbolizing them later; and it's that contextual information that it'd be useful to export and import into other tools.

I have the same confusion as well. If there are most significant contextual elements, can their names be mentioned in the help message of --dump-context? Just mentioning the element name should still make the message brief.

llvm/docs/CommandGuide/llvm-symbolizer.rst
247

Defined markup contexts briefly in the flag help message and more thoroughly in
the rst documentation.

Make --filter-markup a doc reference.

Harbormaster completed remote builds in B226732: Diff 515136. Apr 19 2023, 5:53 PM

Rename "[markup] context" to "process context".

mysterymath retitled this revision from [Symbolizer] Add flag to dump symbolizer markup context JSON. to [Symbolizer] Add flag to dump process context JSON from markup. Apr 24 2023, 3:44 PM

mysterymath edited the summary of this revision.

mcgrathr added inline comments. Apr 28 2023, 10:24 PM

llvm/docs/CommandGuide/llvm-symbolizer.rst
248	I think these should be independent options: that is, you can produce filter output, or JSON output, or both.
To me, it seems reasonable to have the JSON switch take a separate explicit output file and have a separate flag that says whether or not to produce filtered markup output, e.g. --parse-markup that is implied by --filter-markup but can be specified separately to consume that input (and which is superfluous with --filter-markup and useless without either --filter-markup or --dump-context).
As I mentioned pedantically in another change, I think "process" is actually overly and unnecessarily specific here. Perhaps in the context (pun intention unspecified) of llvm-symbolizer, "context" alone is sufficient, but "address context" also seems pretty clear, specific, and precisely applicable to what matters about it to symbolization.
In the long run, I think it makes sense for this not to even imply --parse-markup because we can and should eventually have other sources for the same kind of information, such as switches like elfutils tools such as eu-unstrip have, for /proc/PID/maps (Linux format) text files, live process IDs (that on systems like Linux means read /proc/PID/maps), ELF core files that have NT_GNU_BUILD_ID notes themselves and/or PT_LOAD memory images with enough ELF data to reconstruct runtime layouts and build IDs post mortem, Linux kernel module layouts, and more not covered in elfutils, like minidump format or whatever else. Of course, there could be separate tools that produce this same JSON schema for each of those things, but for at least some of them building direct command-line switch support that can be used in llvm-symbolizer and other tools will make sense.
I think --parse-markup also has a nice parity with the coming switch to parse this JSON format as the input source.
Furthermore, --parse-markup alone is useful for traditional llvm-symbolizer uses without --filter-markup: I can use --parse-markup=logfile when feeding the symbolizer addresses manually on the command line (or via the stdin protocol if markup input and symbolization-request input formats can come from different explicit inputs since only one or the other can be stdin).
252	When we implement support for the trigger elements (dumpfile), that will be a separate reason to distinguish different context snapshot states. (In between reset elements, trigger elements need to be associated with the context snapshot formed from only the module+mmap elements that were emitted before the trigger element, not later additions/changes.)
So I think it makes sense to preemptively describe it here, and specify the JSON schema, as being an array of context snapshots. In future each one might contain descriptors for additional non-context elements that are to be interpreted in the context of the snapshot they appear in. In fact, it could be an option in the JSON dumper feature even now to include the bt et al that were mentioned as if they were trigger elements. That's really tantamount to implementing trigger elements and then adding another switch that says "make all the presentation elements act as trigger elements" too, so it might as well come after trigger elements. But in how we prepare the schema for future optional expansion, we should think about those cases now even if we take a while to implement them all.
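
The "array of context snapshots" idea could look roughly like the following C++ sketch. It is an illustration only and not part of this revision: `SnapshotCollector`, `ContextSnapshot`, `ModuleRecord`, and the `Cause` labels are hypothetical, and modelling a trigger as "copy the current state without clearing it" is one assumed reading of the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one module element.
struct ModuleRecord {
  uint64_t ID;
  std::string Name;
  std::string BuildID;
};

// One entry of the output array: the layout as known at a given moment.
struct ContextSnapshot {
  std::string Cause; // "trigger", "reset", or "finish" (assumed labels)
  std::vector<ModuleRecord> Modules;
};

class SnapshotCollector {
public:
  void addModule(ModuleRecord M) { Current.push_back(std::move(M)); }

  // A trigger sees only what was emitted before it. Later additions to the
  // same context must not leak into this snapshot, so the state is copied.
  void trigger() { Snapshots.push_back({"trigger", Current}); }

  // A reset closes the current context and starts an empty one.
  void reset() { close("reset"); }

  // End of input closes whatever context is still open.
  void finish() { close("finish"); }

  const std::vector<ContextSnapshot> &snapshots() const { return Snapshots; }

private:
  void close(const char *Cause) {
    assert(Cause && "a cause label is required");
    Snapshots.push_back({Cause, std::move(Current)});
    Current.clear(); // leave the moved-from vector in a known empty state
  }

  std::vector<ModuleRecord> Current;
  std::vector<ContextSnapshot> Snapshots;
};
```

Feeding it one module, a trigger, a second module, a reset, a third module, and then end of input produces three snapshots holding one, two, and one modules respectively.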
llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
35	It seems like it might be a nicer structure to keep this just responsible for the parsing and not the dumping per se. That is, instead of just a built-in flag here, this could be an optional callback to make with ProcessContext whenever a new one is ready, paired with a flag about whether to produce the transformed version of the input text (that could just be a nullable OS pointer here). Then a separate ProcessContext->JSON layer can be trivially plumbed to build a callback to pass here. When you add the JSON input option, then plumbing that to do JSON->ProcessContext->JSON and markup->ProcessContext->JSON and various such combinations as unit tests becomes very attractive.
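
A rough C++ sketch of the layering suggested in the comment on line 35 follows. It is not the real `MarkupFilter` interface: `FilterSketch`, `ContextSketch`, `ContextCallback`, and `toJSONSketch` are hypothetical names, the context is reduced to a list of module names, and the JSON writer does no string escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced stand-in for ProcessContext.
struct ContextSketch {
  std::vector<std::string> ModuleNames;
};

using ContextCallback = std::function<void(const ContextSketch &)>;

// Parsing layer: it knows nothing about JSON.
class FilterSketch {
public:
  // FilteredOut may be null when no filtered text is wanted. OnContext may be
  // empty when nobody is interested in completed contexts.
  FilterSketch(std::ostream *FilteredOut, ContextCallback OnContext)
      : FilteredOut(FilteredOut), OnContext(std::move(OnContext)) {}

  void module(const std::string &Name) {
    Ctx.ModuleNames.push_back(Name);
    if (FilteredOut)
      *FilteredOut << "[[[ELF module \"" << Name << "\"]]]\n";
  }

  // Called at a reset element or at end of input.
  void contextDone() {
    if (OnContext)
      OnContext(Ctx);
    Ctx.ModuleNames.clear();
  }

private:
  std::ostream *FilteredOut;
  ContextCallback OnContext;
  ContextSketch Ctx;
};

// Separate serialization layer that can be plugged in as the callback.
std::string toJSONSketch(const ContextSketch &Ctx) {
  std::ostringstream OS;
  OS << "{\"modules\":[";
  for (std::size_t I = 0; I < Ctx.ModuleNames.size(); ++I)
    OS << (I ? "," : "") << '"' << Ctx.ModuleNames[I] << '"';
  OS << "]}";
  return OS.str();
}
```

In this shape, "JSON only" is a null stream plus a callback that calls `toJSONSketch`, "filter only" is a stream plus an empty callback, and "both" passes both, which is the independence the reviewer asks for.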

Revision Contents

Path	Size
llvm/docs/CommandGuide/llvm-symbolizer.rst	10 lines
llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h	11 lines
llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp	112 lines
llvm/test/DebugInfo/symbolize-filter-markup-dump-process-context.test	74 lines
llvm/tools/llvm-symbolizer/Opts.td	1 line
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp	11 lines

Diff 516540

llvm/docs/CommandGuide/llvm-symbolizer.rst

@@ 236 lines not shown @@
 .. _llvm-symbolizer-opt-C:

 .. option:: --demangle, -C

   Print demangled function names, if the names are mangled (e.g. the mangled
   name `_Z3bazv` becomes `baz()`, whilst the non-mangled name `foz` is printed
   as is). Defaults to true.

+.. option:: --dump-process-context, --no-dump-process-context

+  Only valid with :option:`--filter-markup`. Emits a JSON representation of the
+  encountered process contexts instead of symbolizing the markup. A process
+  context is a map of the process's runtime memory layout obtained from the
+  contextual markup elements, i.e. `module`, `mmap`, etc.

+  Markup may have multiple contexts separated by `reset` elements. The contexts
+  are returned as a JSON array of objects, even if only one context is present.

 .. option:: --dwp <path>

   Use the specified DWP file at ``<path>`` for any CUs that have split DWARF
   debug data.

 .. option:: --fallback-debug-path <path>

   When a separate file contains debug data, and is referenced by a GNU debug
@@ 275 lines not shown @@

Inline comment by MaskRay (marked Done) on the "Only valid with" line, suggesting:

 .. option:: --dump-context, --no-dump-context
-  Only valid with `--filter-markup`. Emits a JSON representation of the
+  Only valid with ``--filter-markup``. Emits a JSON representation of the
   encountered contexts (i.e., process layouts) instead of symbolizing the

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h

@@ 12 lines not shown @@
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_DEBUGINFO_SYMBOLIZE_MARKUPFILTER_H
 #define LLVM_DEBUGINFO_SYMBOLIZE_MARKUPFILTER_H

 #include "llvm/ADT/DenseMap.h"
 #include "llvm/DebugInfo/Symbolize/Markup.h"
 #include "llvm/Object/BuildID.h"
+#include "llvm/Support/JSON.h"
 #include "llvm/Support/WithColor.h"
 #include "llvm/Support/raw_ostream.h"
 #include <map>

 namespace llvm {
 namespace symbolize {

 class LLVMSymbolizer;

 /// Filter to convert parsed log symbolizer markup elements into human-readable
 /// text.
 class MarkupFilter {
 public:
+  /// @param DumpProcessContext If true, instead of symbolizing the log, emit a
+  /// JSON representation of the encountered process contexts.
   MarkupFilter(raw_ostream &OS, LLVMSymbolizer &Symbolizer,
-               std::optional<bool> ColorsEnabled = std::nullopt);
+               std::optional<bool> ColorsEnabled = std::nullopt,
+               bool DumpProcessContext = false);

   /// Filters a line containing symbolizer markup and writes the human-readable
   /// results to the output stream.
   ///
   /// Invalid or unimplemented markup elements are removed. Some output may be
   /// deferred until future filter() or finish() call.
   void filter(StringRef Line);

@@ 50 lines not shown @@ private:
   void filterNode(const MarkupNode &Node);

   bool tryPresentation(const MarkupNode &Node);
   bool trySymbol(const MarkupNode &Node);
   bool tryPC(const MarkupNode &Node);
   bool tryBackTrace(const MarkupNode &Node);
   bool tryData(const MarkupNode &Node);

+  bool tryTrigger(const MarkupNode &Node);

   bool trySGR(const MarkupNode &Node);

   void highlight();
   void highlightValue();
   void restoreColor();
   void resetColor();

   void printRawElement(const MarkupNode &Element);
   void printValue(Twine Value);

+  void dumpProcessContext();

   std::optional<Module> parseModule(const MarkupNode &Element) const;
   std::optional<MMap> parseMMap(const MarkupNode &Element) const;

   std::optional<uint64_t> parseAddr(StringRef Str) const;
   std::optional<uint64_t> parseModuleID(StringRef Str) const;
   std::optional<uint64_t> parseSize(StringRef Str) const;
   object::BuildID parseBuildID(StringRef Str) const;
   std::optional<std::string> parseMode(StringRef Str) const;
@@ 11 lines not shown @@ private:
   const MMap *getOverlappingMMap(const MMap &Map) const;
   const MMap *getContainingMMap(uint64_t Addr) const;

   uint64_t adjustAddr(uint64_t Addr, PCType Type) const;

   StringRef lineEnding() const;

   raw_ostream &OS;
+  std::unique_ptr<json::OStream> JOS;
   LLVMSymbolizer &Symbolizer;
   const bool ColorsEnabled;

   MarkupParser Parser;

   // Current line being filtered.
   StringRef Line;

@@ 19 lines not shown @@

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp

@@ 22 lines not shown @@
 #include "llvm/DebugInfo/Symbolize/Markup.h"
 #include "llvm/DebugInfo/Symbolize/Symbolize.h"
 #include "llvm/Debuginfod/Debuginfod.h"
 #include "llvm/Demangle/Demangle.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/Format.h"
 #include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/JSON.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/WithColor.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>

 using namespace llvm;
 using namespace llvm::symbolize;

 MarkupFilter::MarkupFilter(raw_ostream &OS, LLVMSymbolizer &Symbolizer,
-                           std::optional<bool> ColorsEnabled)
+                           std::optional<bool> ColorsEnabled,
+                           bool DumpProcessContext)
     : OS(OS), Symbolizer(Symbolizer),
       ColorsEnabled(
-          ColorsEnabled.value_or(WithColor::defaultAutoDetectFunction()(OS))) {}
+          ColorsEnabled.value_or(WithColor::defaultAutoDetectFunction()(OS))) {
+  if (DumpProcessContext) {
+    ColorsEnabled = false;
+    JOS = std::make_unique<json::OStream>(OS, /*Indent=*/2);
+    JOS->arrayBegin();
+  }
+}

 void MarkupFilter::filter(StringRef Line) {
   this->Line = Line;
   resetColor();

   Parser.parseLine(Line);
   SmallVector<MarkupNode> DeferredNodes;
   // See if the line is a contextual (i.e. contains a contextual element).
@@ 14 lines not shown @@
 }

 void MarkupFilter::finish() {
   Parser.flush();
   while (std::optional<MarkupNode> Node = Parser.nextNode())
     filterNode(*Node);
   endAnyModuleInfoLine();
   resetColor();
+  if (JOS) {
+    dumpProcessContext();
+    JOS->arrayEnd();
+    OS << '\n';
+  }
   Modules.clear();
   MMaps.clear();
 }

 // See if the given node is a contextual element and handle it if so. This may
 // either output or defer the element; in the former case, it will first emit
 // any DeferredNodes.
 //
@@ 30 lines not shown @@ bool MarkupFilter::tryMMap(const MarkupNode &Node,
   assert(Res.second && "Overlap check should ensure emplace succeeds.");
   MMap &MMap = Res.first->second;

   if (!MIL || MIL->Mod != MMap.Mod) {
     endAnyModuleInfoLine();
     for (const MarkupNode &Node : DeferredNodes)
       filterNode(Node);
     beginModuleInfoLine(MMap.Mod);
+    if (!JOS)
       OS << "; adds";
   }
   MIL->MMaps.push_back(&MMap);
   return true;
 }

 bool MarkupFilter::tryReset(const MarkupNode &Node,
                             const SmallVector<MarkupNode> &DeferredNodes) {
   if (Node.Tag != "reset")
     return false;
   if (!checkNumFields(Node, 0))
     return true;

+  if (JOS)
+    dumpProcessContext();

   if (!Modules.empty() || !MMaps.empty()) {
     endAnyModuleInfoLine();
     for (const MarkupNode &Node : DeferredNodes)
       filterNode(Node);

+    if (!JOS) {
       highlight();
       OS << "[[[reset]]]" << lineEnding();
       restoreColor();
+    }

     Modules.clear();
     MMaps.clear();
   }
   return true;
 }

 bool MarkupFilter::tryModule(const MarkupNode &Node,
@@ 12 lines not shown @@ if (!Res.second) {
     return true;
   }
   Module &Module = *Res.first->second;

   endAnyModuleInfoLine();
   for (const MarkupNode &Node : DeferredNodes)
     filterNode(Node);
   beginModuleInfoLine(&Module);
+  if (!JOS) {
     OS << "; BuildID=";
     printValue(toHex(Module.BuildID, /*LowerCase=*/true));
+  }
   return true;
 }

 void MarkupFilter::beginModuleInfoLine(const Module *M) {
+  if (!JOS) {
     highlight();
     OS << "[[[ELF module";
     printValue(formatv(" #{0:x} ", M->ID));
     OS << '"';
     printValue(M->Name);
     OS << '"';
+  }
   MIL = ModuleInfoLine{M};
 }

 void MarkupFilter::endAnyModuleInfoLine() {
   if (!MIL)
     return;
+  if (!JOS) {
     llvm::stable_sort(MIL->MMaps, [](const MMap *A, const MMap *B) {
       return A->Addr < B->Addr;
     });
     for (const MMap *M : MIL->MMaps) {
       OS << (M == MIL->MMaps.front() ? ' ' : ',');
       OS << '[';
       printValue(formatv("{0:x}", M->Addr));
       OS << '-';
       printValue(formatv("{0:x}", M->Addr + M->Size - 1));
       OS << "](";
       printValue(M->Mode);
       OS << ')';
     }
     OS << "]]]" << lineEnding();
     restoreColor();
+  }
   MIL.reset();
 }

 // Handle a node that is known not to be a contextual element.
 void MarkupFilter::filterNode(const MarkupNode &Node) {
   if (!checkTag(Node))
     return;
+  if (JOS)
+    return;
   if (tryPresentation(Node))
     return;
   if (trySGR(Node))
     return;

   OS << Node.Text;
 }

@@ 180 lines not shown @@ bool MarkupFilter::tryData(const MarkupNode &Node) {
   }

   highlight();
   OS << Symbol->Name;
   restoreColor();
   return true;
 }

+void MarkupFilter::dumpProcessContext() {
+  JOS->object([&] {
+    JOS->attributeArray("modules", [&] {
+      for (const auto &[_, Module] : Modules) {
+        JOS->objectBegin();
+        JOS->attribute("id", Module->ID);
+        JOS->attribute("name", Module->Name);
+        JOS->attribute("type", "elf");
+        JOS->attribute("buildID", toHex(Module->BuildID, /*LowerCase=*/true));
+        JOS->objectEnd();
+      }
+    });
+    JOS->attributeArray("mmaps", [&] {
+      for (const auto &[_, Map] : MMaps) {
+        JOS->objectBegin();
+        JOS->attribute("address", Map.Addr);
+        JOS->attribute("size", Map.Size);
+        JOS->attribute("type", "load");
+        JOS->attribute("moduleID", Map.Mod->ID);
+        JOS->attribute("mode", Map.Mode);
+        JOS->attribute("moduleRelativeAddress", Map.ModuleRelativeAddr);
+        JOS->objectEnd();
+      }
+    });
+  });
+}

 bool MarkupFilter::trySGR(const MarkupNode &Node) {
   if (Node.Text == "\033[0m") {
     resetColor();
     return true;
   }
   if (Node.Text == "\033[1m") {
     Bold = true;
     if (ColorsEnabled)
@@ 347 lines not shown @@

llvm/test/DebugInfo/symbolize-filter-markup-dump-process-context.test

This file was added.

				RUN: split-file %s %t
				RUN: llvm-symbolizer --filter-markup --dump-process-context < %t/log > %t.out
				RUN: FileCheck %s --input-file=%t.out --match-full-lines \
				RUN: --implicit-check-not {{.}}
				RUN: not llvm-symbolizer --dump-process-context 2>&1 | FileCheck --match-full-lines --implicit-check-not {{.}} --check-prefix ERR %s
				RUN: not llvm-symbolizer --no-dump-process-context 2>&1 | FileCheck --match-full-lines --implicit-check-not {{.}} --check-prefix ERR %s

				CHECK: [
				CHECK-NEXT: {
				CHECK-NEXT: "modules": [
				CHECK-NEXT: {
				CHECK-NEXT: "id": 0,
				CHECK-NEXT: "name": "b.o",
				CHECK-NEXT: "type": "elf",
				CHECK-NEXT: "buildID": "ab"
				CHECK-NEXT: },
				CHECK-NEXT: {
				CHECK-NEXT: "id": 1,
				CHECK-NEXT: "name": "a.o",
				CHECK-NEXT: "type": "elf",
				CHECK-NEXT: "buildID": "cd"
				CHECK-NEXT: }
				CHECK-NEXT: ],
				CHECK-NEXT: "mmaps": [
				CHECK-NEXT: {
				CHECK-NEXT: "address": 16,
				CHECK-NEXT: "size": 16,
				CHECK-NEXT: "type": "load",
				CHECK-NEXT: "moduleID": 1,
				CHECK-NEXT: "mode": "r",
				CHECK-NEXT: "moduleRelativeAddress": 2
				CHECK-NEXT: },
				CHECK-NEXT: {
				CHECK-NEXT: "address": 32,
				CHECK-NEXT: "size": 48,
				CHECK-NEXT: "type": "load",
				CHECK-NEXT: "moduleID": 1,
				CHECK-NEXT: "mode": "w",
				CHECK-NEXT: "moduleRelativeAddress": 3
				CHECK-NEXT: },
				CHECK-NEXT: {
				CHECK-NEXT: "address": 80,
				CHECK-NEXT: "size": 96,
				CHECK-NEXT: "type": "load",
				CHECK-NEXT: "moduleID": 0,
				CHECK-NEXT: "mode": "rx",
				CHECK-NEXT: "moduleRelativeAddress": 4
				CHECK-NEXT: }
				CHECK-NEXT: ]
				CHECK-NEXT: },
				CHECK-NEXT: {
				CHECK-NEXT: "modules": [
				CHECK-NEXT: {
				CHECK-NEXT: "id": 0,
				CHECK-NEXT: "name": "c.o",
				CHECK-NEXT: "type": "elf",
				CHECK-NEXT: "buildID": "ef"
				CHECK-NEXT: }
				CHECK-NEXT: ],
				CHECK-NEXT: "mmaps": []
				CHECK-NEXT: }
				CHECK-NEXT: ]

				ERR: error: --[no-]dump-process-context can only be used with --filter-markup

				;--- log
				{{{module:1:a.o:elf:cd}}}
				{{{module:0:b.o:elf:ab}}}
				{{{mmap:0x10:0x10:load:1:r:0x2}}}
				{{{mmap:0x20:0x30:load:1:w:0x3}}}
				{{{mmap:0x50:0x60:load:0:rx:0x4}}}
				{{{pc:0x20}}}
				{{{reset}}}
				{{{module:0:c.o:elf:ef}}}
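
The relationship between the `mmap` elements in the log above and the expected JSON can be illustrated with a small standalone sketch. It uses only the C++ standard library and is not the real MarkupFilter parser. `MMapFields`, `splitFields`, `parseU64`, `parseMMapElement`, and `checkExample` are hypothetical names, and the field order is inferred from this test rather than taken from the markup specification.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mirror of one entry in the "mmaps" array.
struct MMapFields {
  uint64_t Address;
  uint64_t Size;
  std::string Type;
  uint64_t ModuleID;
  std::string Mode;
  uint64_t ModuleRelativeAddress;
};

// Split the text found between "{{{" and "}}}" on ':'.
static std::vector<std::string> splitFields(const std::string &Text) {
  std::vector<std::string> Fields;
  std::string::size_type Start = 0, Colon;
  while ((Colon = Text.find(':', Start)) != std::string::npos) {
    Fields.push_back(Text.substr(Start, Colon - Start));
    Start = Colon + 1;
  }
  Fields.push_back(Text.substr(Start));
  return Fields;
}

// Base 0 lets std::stoull accept both "0x10" and "16". Trailing garbage
// after a valid number is not rejected in this sketch.
static uint64_t parseU64(const std::string &S) {
  return static_cast<uint64_t>(std::stoull(S, nullptr, 0));
}

// Parse a line such as "{{{mmap:0x10:0x10:load:1:r:0x2}}}". Returns
// std::nullopt for anything that is not a seven-field mmap element.
std::optional<MMapFields> parseMMapElement(const std::string &Line) {
  const std::string Open = "{{{", Close = "}}}";
  if (Line.size() < Open.size() + Close.size() ||
      Line.compare(0, Open.size(), Open) != 0 ||
      Line.compare(Line.size() - Close.size(), Close.size(), Close) != 0)
    return std::nullopt;
  const std::vector<std::string> F = splitFields(
      Line.substr(Open.size(), Line.size() - Open.size() - Close.size()));
  if (F.size() != 7 || F[0] != "mmap")
    return std::nullopt;
  try {
    return MMapFields{parseU64(F[1]), parseU64(F[2]), F[3],
                      parseU64(F[4]), F[5],           parseU64(F[6])};
  } catch (const std::exception &) {
    return std::nullopt; // A numeric field did not parse.
  }
}

// Usage: the second mapping in the log, and an element that is not an mmap.
void checkExample() {
  const auto M = parseMMapElement("{{{mmap:0x20:0x30:load:1:w:0x3}}}");
  assert(M && M->Address == 32 && M->Size == 48 && M->ModuleID == 1);
  assert(M->Type == "load" && M->Mode == "w" && M->ModuleRelativeAddress == 3);
  assert(!parseMMapElement("{{{pc:0x20}}}"));
}
```

The hexadecimal values in the log (`0x20`, `0x30`) correspond to the decimal values in the CHECK lines (`32`, `48`).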

llvm/tools/llvm-symbolizer/Opts.td

[...]

def color : F<"color", "Use color when symbolizing log markup.">;

def color_EQ : Joined<["--"], "color=">, HelpText<"Whether to use color when symbolizing log markup: always, auto, never">, Values<"always,auto,never">;

defm debug_file_directory : Eq<"debug-file-directory", "Path to directory where to look for debug files">, MetaVarName<"<dir>">;

defm debuginfod : B<"debuginfod", "Use debuginfod to find debug binaries", "Don't use debuginfod to find debug binaries">;

defm default_arch

: Eq<"default-arch", "Default architecture (for multi-arch objects)">,

Group<grp_mach_o>;

defm demangle : B<"demangle", "Demangle function names", "Don't demangle function names">;

defm dump_process_context : B<"dump-process-context", "Dump process context JSON from markup instead of symbolizing", "Don't dump process contexts">;

jhenderson (inline comment, not done): Nit: for consistency with other options.

- defm dump_context : B<"dump-context", "Only dump symbolizer markup context as JSON.", "Don't dump markup context.">;

+ defm dump_context : B<"dump-context", "Only dump symbolizer markup context as JSON", "Don't dump markup context">;

def filter_markup : Flag<["--"], "filter-markup">, HelpText<"Filter symbolizer markup from stdin.">;

def functions : F<"functions", "Print function name for a given address">;

def functions_EQ : Joined<["--"], "functions=">, HelpText<"Print function name for a given address">, Values<"none,short,linkage">;

def help : F<"help", "Display this help">;

defm dwp : Eq<"dwp", "Path to DWP file to be use for any split CUs">, MetaVarName<"<file>">;

defm dsym_hint

: Eq<"dsym-hint",

"Path to .dSYM bundles to search for debug info for the object files">,

[...]

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

[...]
if (BuildID.empty()) {
errs() << A->getSpelling() + ": expected a build ID, but got '" + V + "'\n";
exit(1);
}
return BuildID;
}

// Symbolize markup from stdin and write the result to stdout.
static void filterMarkup(const opt::InputArgList &Args, LLVMSymbolizer &Symbolizer) {
- MarkupFilter Filter(outs(), Symbolizer, parseColorArg(Args));
+ MarkupFilter Filter(outs(), Symbolizer, parseColorArg(Args),
+ Args.hasFlag(OPT_dump_process_context,
+ OPT_no_dump_process_context, false));
std::string InputString;
while (std::getline(std::cin, InputString)) {
InputString += '\n';
Filter.filter(InputString);
}
Filter.finish();
}

[...]
#endif
if (Args.hasFlag(OPT_debuginfod, OPT_no_debuginfod, canUseDebuginfod()))
enableDebuginfod(Symbolizer, Args);

if (Args.hasArg(OPT_filter_markup)) {
filterMarkup(Args, Symbolizer);
return 0;
}

		if (Args.hasArg(OPT_dump_process_context) ||
		Args.hasArg(OPT_no_dump_process_context)) {
		errs() << "error: --[no-]dump-process-context can only be used with "
		"--filter-markup\n";
		return EXIT_FAILURE;
		}

auto Style = IsAddr2Line ? OutputStyle::GNU : OutputStyle::LLVM;
if (const opt::Arg *A = Args.getLastArg(OPT_output_style_EQ)) {
if (strcmp(A->getValue(), "GNU") == 0)
Style = OutputStyle::GNU;
else if (strcmp(A->getValue(), "JSON") == 0)
Style = OutputStyle::JSON;
else
Style = OutputStyle::LLVM;
[...]
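
The flag handling in this file (a boolean flag pair whose last occurrence wins, plus an error when the pair is used without `--filter-markup`) can be sketched in plain C++. This is a simplified model, not LLVM's option library: `ArgList`, `hasArg`, `hasFlag`, `Action`, `chooseAction`, and `checkExamples` are hypothetical stand-ins written for illustration, and the last-one-wins behaviour is assumed to match what `Args.hasFlag` does in the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed command line.
using ArgList = std::vector<std::string>;

// True if Name appears anywhere on the command line.
static bool hasArg(const ArgList &Args, const std::string &Name) {
  for (const std::string &A : Args)
    if (A == Name)
      return true;
  return false;
}

// Boolean flag pair: whichever of Pos/Neg appears last decides the value;
// Default is returned when neither appears.
static bool hasFlag(const ArgList &Args, const std::string &Pos,
                    const std::string &Neg, bool Default) {
  for (auto It = Args.rbegin(); It != Args.rend(); ++It) {
    if (*It == Pos)
      return true;
    if (*It == Neg)
      return false;
  }
  return Default;
}

enum class Action { DumpProcessContext, FilterMarkup, Symbolize, Error };

// Mirrors the order of checks in the patch: markup filtering is handled
// first, then the dump flags are rejected if they appear on their own.
Action chooseAction(const ArgList &Args) {
  const std::string Dump = "--dump-process-context";
  const std::string NoDump = "--no-dump-process-context";
  if (hasArg(Args, "--filter-markup"))
    return hasFlag(Args, Dump, NoDump, false) ? Action::DumpProcessContext
                                              : Action::FilterMarkup;
  if (hasArg(Args, Dump) || hasArg(Args, NoDump))
    return Action::Error;
  return Action::Symbolize;
}

// Usage: the three invocations exercised by the RUN lines of the test.
void checkExamples() {
  assert(chooseAction({"--filter-markup", "--dump-process-context"}) ==
         Action::DumpProcessContext);
  assert(chooseAction({"--dump-process-context"}) == Action::Error);
  assert(chooseAction({"--no-dump-process-context"}) == Action::Error);
}
```

Rejecting `--no-dump-process-context` as well as the positive form keeps the error message symmetric, which is what the `ERR` check in the test expects.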


Diff 516540
