
Details

Reviewers

phosek
mcgrathr
jhenderson
peter.smith

Commits

rG2040b6df0a3f: [Symbolize] Parser for log symbolizer markup.

Summary

This adds a parser for the log symbolizer markup format discussed in
https://discourse.llvm.org/t/rfc-log-symbolizer/61282. The parser
operates in a line-by-line fashion with minimal memory requirements.

This doesn't yet include support for multi-line tags or specific parsing
for ANSI X3.64 SGR control sequences, but it can be extended to do so.
The latter can also be relatively easily handled by examining the
resulting text elements.
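
As a rough illustration of the line-by-line approach, the following standalone sketch splits one line into plain text and `{{{tag:field:...}}}` elements. It is not the code from this patch: it uses std::string_view in place of llvm::StringRef, the names SketchNode and parseLineSketch are hypothetical, and SGR handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the patch's node type.
struct SketchNode {
  std::string_view Text;                // Full text of the node in the input.
  std::string_view Tag;                 // Empty for plain text.
  std::vector<std::string_view> Fields; // Empty unless the element has fields.
};

// Splits one line into text nodes and element nodes. Candidates with an empty
// tag, such as "{{{}}}", are not elements and stay part of the plain text.
std::vector<SketchNode> parseLineSketch(std::string_view Line) {
  constexpr std::size_t NPos = std::string_view::npos;
  std::vector<SketchNode> Nodes;
  std::size_t TextStart = 0; // Start of text not yet emitted.
  std::size_t Pos = 0;       // Where to resume searching for "{{{".
  while (true) {
    std::size_t Begin = Line.find("{{{", Pos);
    if (Begin == NPos)
      break;
    std::size_t End = Line.find("}}}", Begin + 3);
    if (End == NPos)
      break;
    End += 3;
    assert(End >= Begin + 6 && "candidate spans both markers");
    Pos = End;

    std::string_view Content = Line.substr(Begin + 3, End - Begin - 6);
    std::size_t Colon = Content.find(':');
    std::string_view Tag = Content.substr(0, Colon);
    if (Tag.empty())
      continue; // Not an element; keep scanning after it.

    // Emit any plain text that precedes the element.
    if (Begin > TextStart)
      Nodes.push_back({Line.substr(TextStart, Begin - TextStart), {}, {}});

    SketchNode Element{Line.substr(Begin, End - Begin), Tag, {}};
    if (Colon != NPos) {
      // Split everything after the first ':' into fields; "tag:" yields one
      // empty field.
      std::string_view Rest = Content.substr(Colon + 1);
      while (true) {
        std::size_t Next = Rest.find(':');
        Element.Fields.push_back(Rest.substr(0, Next));
        if (Next == NPos)
          break;
        Rest.remove_prefix(Next + 1);
      }
    }
    Nodes.push_back(std::move(Element));
    TextStart = End;
  }
  // Whatever remains contains no further elements.
  if (TextStart < Line.size())
    Nodes.push_back({Line.substr(TextStart), {}, {}});
  return Nodes;
}
```

The returned nodes only reference the input line, so, as with the parser in this patch, the caller has to keep the line alive while the nodes are in use.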

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mysterymath created this revision. Apr 29 2022, 10:28 AM

Herald added a project: Restricted Project. Apr 29 2022, 10:28 AM

Herald added subscribers: hiraditya, mgorny.

Fixup before code review.

Add tests for field parsing.

mysterymath published this revision for review. Apr 29 2022, 12:36 PM

mysterymath added reviewers: phosek, mcgrathr, jhenderson.

Herald added a project: Restricted Project. Apr 29 2022, 12:36 PM

Herald added a subscriber: llvm-commits.

Fix typos in comments; expand a bit.

Harbormaster completed remote builds in B162048: Diff 426146. Apr 29 2022, 3:08 PM

Use SmallVector for Buffer; allow main loop to run malloc-free.

mysterymath added a child revision: D124798: [Symbolize] Parse multi-line markup elements. May 2 2022, 1:20 PM

CurEntry => NextIdx

Harbormaster completed remote builds in B162320: Diff 426508. May 2 2022, 3:05 PM

mysterymath added a reviewer: peter.smith. Jun 6 2022, 10:41 AM

Fix parsing for empty field.

Harbormaster completed remote builds in B168941: Diff 435716. Jun 9 2022, 5:01 PM

My apologies for the delay in responding. I've only taken a look at this patch alone. Not looked forward to D124798 and D126980. In general looks good to me. I've a few small suggestions, mostly in naming and comments.

llvm/include/llvm/DebugInfo/Symbolize/Markup.h
12	Although not for this patch, I note that in D126980 the docs for the SymbolizerFormat are added, it would be good to put a link as that specification is needed to understand the code. Perhaps worth adding a TODO for this patch.
34	I got a bit confused with the name `MarkupElement`. Looking at the spec in https://fuchsia.dev/fuchsia-src/reference/kernel/symbolizer_markup and the `An element of symbolizer markup` I was expecting `MarkupElement` to be just a MarkupElement of the form `{{{tag:fields}}}` but it looks like this is a general case of (IIUC):
	TextElement: a string containing no Markup Elements or SGR codes.
	SGRElement: a special case of Text Element that contains a single SGR control code.
	MarkupElement: Also has Tag and Fields.
	Maybe worth finding a different word to element to distinguish? Something like `MarkupPiece` or `MarkupComponent`? If we do change it will be worth updating the comment. I see that you've used just element in other comments. Perhaps just Element.
61	An alternative behaviour could be that calling `parseLine()` without iterating through `nextLine()` would be to reset NextIdx and clear the Buffer. You'd then not leave the Buffer and NextIdx in an inconsistent state. Not a strong opinion as I expect the number of users of the parser to be low.
llvm/lib/DebugInfo/Symbolize/Markup.cpp
38	Similar to the comment I made in the header. If the Buffer isn't empty I think you could do the equivalent of: NextIdx.reset(); Buffer.clear();
56	IIUC this is looking specifically for a MarkupElement of the form `{{{tag}}}` or `{{{tag:fields}}}` perhaps worth renaming to `parseMarkupElement()`. Could be worth putting in the comment about the expected syntax of the Markup.
76	I expect a common user error could be an upper case tag. Is it worth a warning? This may not be the right place for it though.
96	Sentence in comment looks unfinished. Perhaps `so these are added as individual elements.`
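
The reset behaviour suggested in the comments on lines 61 and 38 can be sketched in isolation. The class below is a hypothetical stand-in (NodeBufferSketch, refill, and next are not names from the patch) that stores plain strings instead of markup nodes. Refilling the buffer drops anything the caller has not consumed and restarts iteration, so the buffer and the index never disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffer illustrating the suggested semantics: a refill discards
// pending items and resets the read position in one step.
class NodeBufferSketch {
public:
  // Replaces the buffered items; anything still pending is discarded.
  void refill(std::vector<std::string> Items) {
    Buffer = std::move(Items);
    NextIdx = 0;
  }

  // Returns the next buffered item, or std::nullopt once all are consumed.
  std::optional<std::string> next() {
    assert(NextIdx <= Buffer.size() && "index must stay within the buffer");
    if (NextIdx == Buffer.size()) {
      Buffer.clear();
      NextIdx = 0;
      return std::nullopt;
    }
    return std::move(Buffer[NextIdx++]);
  }

private:
  std::vector<std::string> Buffer;
  std::size_t NextIdx = 0;
};
```

With this shape a caller can stop reading items as soon as it loses interest in the rest of a line and simply refill for the next one.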

Address review comments.

llvm/include/llvm/DebugInfo/Symbolize/Markup.h
34	I was also split about this naming convention; markup languages like to describe themselves as composed of markup elements, but they're not, they're also composed of all the stuff between the elements. I'd considered just "Element", but that would give this the name "llvm::symbolize::Element", which is a bit too broadly named for what this does. I did a very quick survey of what other SAX-ish HTML parsers do, it looks like they've settled on calling the abstract type a Node, while an actual element is an Element or Tag, which subclasses Node. The usage of Node here would be akin to linked list nodes: an ordered collection of units forming a stream.
61	Ah, that's cleaner; then you can stop handling nodes as soon as you decide you don't care about the rest. I can actually make use of that in the filter later.
llvm/lib/DebugInfo/Symbolize/Markup.cpp
56	Renaming MarkupElement to MarkupNode should make this clearer; with that, Element exclusively means a markup element.
76	I thought a bit more about this, and it seems like this is a distinction that doesn't belong in the determination of whether or not the given text is an element, but whether the element is well-formed. This is more akin to the handling of things like integer parsing errors, which occur later, where we can give good error messages. Accordingly, I've pulled this out of this change, to be added into the third of the series.
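
The split described in the reply on line 76, recognising an element first and judging whether it is well formed later, might look roughly like the sketch below. The rule it enforces (tags made only of lower-case ASCII letters) is an assumption chosen for illustration, not something this review or the specification text quoted here states, and checkTagSketch is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical well-formedness check that runs after an element has already
// been recognised. Returns an error message, or std::nullopt if the tag passes.
// Assumption for illustration only: tags consist of lower-case ASCII letters.
std::optional<std::string> checkTagSketch(std::string_view Tag) {
  assert(!Tag.empty() && "the recognition step already rejects empty tags");
  for (char C : Tag)
    if (C < 'a' || C > 'z')
      return "tag '" + std::string(Tag) + "' contains unexpected character '" +
             C + "'";
  return std::nullopt;
}
```

Keeping this check out of the recogniser means `{{{tAg}}}` is still reported as an element, and a later stage can attach a precise message to it.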

Harbormaster completed remote builds in B170359: Diff 437685. Jun 16 2022, 3:55 PM

LGTM thanks for the updates. I spotted what looked like one comment update for the change to Node, but that can easily be fixed up if needed.

llvm/include/llvm/DebugInfo/Symbolize/Markup.h
35	I think `and following elements.` could be `and following nodes`.

This revision is now accepted and ready to land. Jun 17 2022, 12:58 AM

In D124686#3591327, @peter.smith wrote:

LGTM thanks for the updates. I spotted what looked like one comment update for the change to Node, but that can easily be fixed up if needed.

Nice catch, thanks!

Fix missed element->node comment update.

This revision was landed with ongoing or failed builds. Jun 17 2022, 10:26 AM

Closed by commit rG2040b6df0a3f: [Symbolize] Parser for log symbolizer markup. (authored by mysterymath).

This revision was automatically updated to reflect the committed changes.

mysterymath added a commit: rG2040b6df0a3f: [Symbolize] Parser for log symbolizer markup.

Harbormaster completed remote builds in B170549: Diff 437960. Jun 17 2022, 12:33 PM

Is it already planned to add this markup to LLVM's own crash dumps when they aren't already symbolized? Might be a handy use-case for the feature, providing some built-in-to-llvm exercisizing/advertising/experience with the feature? (if the markup is meant to be compatible with human readers who might not be able to symbolize the data later - which seems like a nice feature too)

In D124686#3598029, @dblaikie wrote:

Is it already planned to add this markup to LLVM's own crash dumps when they aren't already symbolized? Might be a handy use-case for the feature, providing some built-in-to-llvm exercisizing/advertising/experience with the feature? (if the markup is meant to be compatible with human readers who might not be able to symbolize the data later - which seems like a nice feature too)

We've had a few discussions about this; I think broadly yes, although we haven't hashed out the full details yet. One of the options would be to always emit symbolizer markup in LLVM, then pass it to a forked llvm-symbolizer instance if one can be found. That way, you'd get online symbolization if possible, but graceful degradation to markup if not.

In D124686#3616216, @mysterymath wrote:

In D124686#3598029, @dblaikie wrote:

Is it already planned to add this markup to LLVM's own crash dumps when they aren't already symbolized? Might be a handy use-case for the feature, providing some built-in-to-llvm exercisizing/advertising/experience with the feature? (if the markup is meant to be compatible with human readers who might not be able to symbolize the data later - which seems like a nice feature too)

We've had a few discussions about this; I think broadly yes, although we haven't hashed out the full details yet. One of the options would be to always emit symbolizer markup in LLVM, then pass it to a forked llvm-symbolizer instance if one can be found. That way, you'd get online symbolization if possible, but graceful degradation to markup if not.

Ah, that makes sense - yeah, I was wondering/figuring the markup probably wasn't human readable, so always passing it through the processor, which can either add the symbolizing info, or at least remove the markup if there's no debug info/other symbolizing to do, etc.

Diff 437962

llvm/include/llvm/DebugInfo/Symbolize/Markup.h

This file was added.

				//===- Markup.h -------------------------------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file declares the log symbolizer markup data model and parser.
				///
				/// \todo Add a link to the reference documentation once added.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_DEBUGINFO_SYMBOLIZE_MARKUP_H
				#define LLVM_DEBUGINFO_SYMBOLIZE_MARKUP_H

				#include <iostream>

				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/Regex.h"

				namespace llvm {
				namespace symbolize {

				/// A node of symbolizer markup.
				///
				/// If only the Text field is set, this represents a region of text outside a
				/// markup element. ANSI SGR control codes are also reported this way; if
				/// detected, then the control code will be the entirety of the Text field, and
				/// any surrounding text will be reported as preceding and following nodes.
				struct MarkupNode {
				  /// The full text of this node in the input.
				  StringRef Text;

				  /// If this represents an element, the tag. Otherwise, empty.
				  StringRef Tag;

				  /// If this represents an element with fields, a list of the field contents.
				  /// Otherwise, empty.
				  SmallVector<StringRef> Fields;

				  bool operator==(const MarkupNode &Other) const {
				    return Text == Other.Text && Tag == Other.Tag && Fields == Other.Fields;
				  }
				  bool operator!=(const MarkupNode &Other) const { return !(*this == Other); }
				};

				/// Parses a log containing symbolizer markup into a sequence of nodes.
				class MarkupParser {
				public:
				  MarkupParser();

				  /// Parses an individual \p Line of input.
				  ///
				  /// Nodes from the previous parseLine() call that haven't yet been extracted
				  /// by nextNode() are discarded. The nodes returned by nextNode() may
				  /// reference the input string, so it must be retained by the caller until the
				  /// last use.
				  void parseLine(StringRef Line);

				  /// Returns the next node from the most recent parseLine() call.
				  ///
				  /// Calling nextNode() may invalidate the contents of the node returned by the
				  /// previous call.
				  ///
				  /// \returns the next markup node or None if none remain.
				  Optional<MarkupNode> nextNode() {
				    if (!NextIdx)
				      NextIdx = 0;
				    if (*NextIdx == Buffer.size()) {
				      NextIdx.reset();
				      Buffer.clear();
				      return None;
				    }
				    return std::move(Buffer[(*NextIdx)++]);
				  }

				private:
				  Optional<MarkupNode> parseElement(StringRef Line);
				  void parseTextOutsideMarkup(StringRef Text);

				  // Buffer for nodes parsed from the current line.
				  SmallVector<MarkupNode> Buffer;

				  // Next buffer index to return or None if nextNode has not yet been called.
				  Optional<size_t> NextIdx;

				  // Regular expression matching supported ANSI SGR escape sequences.
				  const Regex SGRSyntax;
				};

				} // end namespace symbolize
				} // end namespace llvm

				#endif // LLVM_DEBUGINFO_SYMBOLIZE_MARKUP_H

llvm/lib/DebugInfo/Symbolize/CMakeLists.txt

 add_llvm_component_library(LLVMSymbolize
   DIFetcher.cpp
   DIPrinter.cpp
+  Markup.cpp
   SymbolizableObjectFile.cpp
   Symbolize.cpp

   ADDITIONAL_HEADER_DIRS
   ${LLVM_MAIN_INCLUDE_DIR}/llvm/DebugInfo/Symbolize

   LINK_COMPONENTS
   DebugInfoDWARF
   DebugInfoPDB
   Object
   Support
   Demangle
   )

llvm/lib/DebugInfo/Symbolize/Markup.cpp

This file was added.

				//===- lib/DebugInfo/Symbolize/Markup.cpp ------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the log symbolizer markup data model and parser.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/DebugInfo/Symbolize/Markup.h"

				#include "llvm/ADT/StringExtras.h"

				namespace llvm {
				namespace symbolize {

				// Matches the following:
				// "\033[0m"
				// "\033[1m"
				// "\033[30m" -- "\033[37m"
				static const char SGRSyntaxStr[] = "\033\\[([0-1]|3[0-7])m";

				MarkupParser::MarkupParser() : SGRSyntax(SGRSyntaxStr) {}

				static StringRef takeTo(StringRef Str, StringRef::iterator Pos) {
				  return Str.take_front(Pos - Str.begin());
				}
				static void advanceTo(StringRef &Str, StringRef::iterator Pos) {
				  Str = Str.drop_front(Pos - Str.begin());
				}

				void MarkupParser::parseLine(StringRef Line) {
				  Buffer.clear();
				  while (!Line.empty()) {
				    // Find the first valid markup element, if any.
				    if (Optional<MarkupNode> Element = parseElement(Line)) {
				      parseTextOutsideMarkup(takeTo(Line, Element->Text.begin()));
				      Buffer.push_back(std::move(*Element));
				      advanceTo(Line, Element->Text.end());
				    } else {
				      // The line doesn't contain any more markup elements, so emit it as text.
				      parseTextOutsideMarkup(Line);
				      return;
				    }
				  }
				}

				// Finds and returns the next valid markup element in the given line. Returns
				// None if the line contains no valid elements.
				Optional<MarkupNode> MarkupParser::parseElement(StringRef Line) {
				  while (true) {
				    // Find next element using begin and end markers.
				    size_t BeginPos = Line.find("{{{");
				    if (BeginPos == StringRef::npos)
				      return None;
				    size_t EndPos = Line.find("}}}", BeginPos + 3);
				    if (EndPos == StringRef::npos)
				      return None;
				    EndPos += 3;
				    MarkupNode Element;
				    Element.Text = Line.slice(BeginPos, EndPos);
				    Line = Line.substr(EndPos);

				    // Parse tag.
				    StringRef Content = Element.Text.drop_front(3).drop_back(3);
				    StringRef FieldsContent;
				    std::tie(Element.Tag, FieldsContent) = Content.split(':');
				    if (Element.Tag.empty())
				      continue;

				    // Parse fields.
				    if (!FieldsContent.empty())
				      FieldsContent.split(Element.Fields, ":");
				    else if (Content.back() == ':')
				      Element.Fields.push_back(FieldsContent);

				    return Element;
				  }
				}

				static MarkupNode textNode(StringRef Text) {
				  MarkupNode Node;
				  Node.Text = Text;
				  return Node;
				}

				// Parses a region of text known to be outside any markup elements. Such text
				// may still contain SGR control codes, so the region is further subdivided into
				// control codes and true text regions.
				void MarkupParser::parseTextOutsideMarkup(StringRef Text) {
				  if (Text.empty())
				    return;
				  SmallVector<StringRef> Matches;
				  while (SGRSyntax.match(Text, &Matches)) {
				    // Emit any text before the SGR element.
				    if (Matches.begin()->begin() != Text.begin())
				      Buffer.push_back(textNode(takeTo(Text, Matches.begin()->begin())));

				    Buffer.push_back(textNode(*Matches.begin()));
				    advanceTo(Text, Matches.begin()->end());
				  }
				  if (!Text.empty())
				    Buffer.push_back(textNode(Text));
				}

				} // end namespace symbolize
				} // end namespace llvm
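
The subdivision performed by parseTextOutsideMarkup can be tried outside LLVM with a small standalone sketch. It swaps llvm::Regex for std::regex and returns owned strings rather than views into the input. splitSgrSketch is a hypothetical name, and the pattern accepts the same three forms listed in the comment above SGRSyntaxStr.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Splits Text into supported SGR control codes and the plain text between
// them. Supported codes: ESC[0m, ESC[1m, and ESC[30m through ESC[37m.
std::vector<std::string> splitSgrSketch(const std::string &Text) {
  static const std::regex SgrSyntax("\033\\[([0-1]|3[0-7])m");
  std::vector<std::string> Pieces;
  auto Begin = Text.cbegin();
  std::smatch Match;
  while (std::regex_search(Begin, Text.cend(), Match, SgrSyntax)) {
    // Emit any text before the control code, then the code itself.
    if (Match[0].first != Begin)
      Pieces.emplace_back(Begin, Match[0].first);
    Pieces.emplace_back(Match[0].first, Match[0].second);
    Begin = Match[0].second;
  }
  // Emit the trailing text, if any.
  if (Begin != Text.cend())
    Pieces.emplace_back(Begin, Text.cend());
  return Pieces;
}
```

Unsupported sequences such as ESC[2m do not match and therefore stay inside the surrounding text piece, which is the behaviour the unit tests in this revision expect from the real parser.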

llvm/unittests/DebugInfo/CMakeLists.txt

 add_subdirectory(CodeView)
 add_subdirectory(DWARF)
 add_subdirectory(GSYM)
 add_subdirectory(MSF)
 add_subdirectory(PDB)
+add_subdirectory(Symbolizer)

llvm/unittests/DebugInfo/Symbolizer/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS Symbolize)
				add_llvm_unittest(DebugInfoSymbolizerTests MarkupTest.cpp)
				target_link_libraries(DebugInfoSymbolizerTests PRIVATE LLVMTestingSupport)

llvm/unittests/DebugInfo/Symbolizer/MarkupTest.cpp

This file was added.


				//===- unittest/DebugInfo/Symbolizer/MarkupTest.cpp - Markup parser tests -===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/DebugInfo/Symbolize/Markup.h"

				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/SmallString.h"
				#include "llvm/ADT/Twine.h"
				#include "llvm/Support/FormatVariadic.h"

				#include "gmock/gmock.h"
				#include "gtest/gtest.h"

				namespace {

				using namespace llvm;
				using namespace llvm::symbolize;
				using namespace testing;

				Matcher<MarkupNode> isNode(StringRef Text, StringRef Tag = "",
				                           Matcher<SmallVector<StringRef>> Fields = IsEmpty()) {
				  return AllOf(Field("Text", &MarkupNode::Text, Text),
				               Field("Tag", &MarkupNode::Tag, Tag),
				               Field("Fields", &MarkupNode::Fields, Fields));
				}

				TEST(SymbolizerMarkup, NoLines) { EXPECT_EQ(MarkupParser{}.nextNode(), None); }

				TEST(SymbolizerMarkup, LinesWithoutMarkup) {
				  MarkupParser Parser;

				  Parser.parseLine("text");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("text")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("discarded");
				  Parser.parseLine("kept");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("kept")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{}}")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{}}}")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{}}}")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{:field}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{:field}}}")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tag:");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{tag:")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tag:field}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{tag:field}}")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("a\033[2mb");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("a\033[2mb")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("a\033[38mb");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("a\033[38mb")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("a\033[4mb");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("a\033[4mb")));
				  EXPECT_THAT(Parser.nextNode(), None);
				}

				TEST(SymbolizerMarkup, LinesWithMarkup) {
				  MarkupParser Parser;

				  Parser.parseLine("{{{tag}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{tag}}}", "tag")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tag:f1:f2:f3}}}");
				  EXPECT_THAT(Parser.nextNode(),
				              testing::Optional(isNode("{{{tag:f1:f2:f3}}}", "tag",
				                                       ElementsAre("f1", "f2", "f3"))));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tag:}}}");
				  EXPECT_THAT(Parser.nextNode(),
				              testing::Optional(isNode("{{{tag:}}}", "tag", ElementsAre(""))));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tag:}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{tag:}}")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{t2g}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{t2g}}}", "t2g")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tAg}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{tAg}}}", "tAg")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("a{{{b}}}c{{{d}}}e");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("a")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{b}}}", "b")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("c")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{d}}}", "d")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("e")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{}}}{{{tag}}}");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{}}}")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("{{{tag}}}", "tag")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("\033[0mA\033[1mB\033[30mC\033[37m");
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("\033[0m")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("A")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("\033[1m")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("B")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("\033[30m")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("C")));
				  EXPECT_THAT(Parser.nextNode(), testing::Optional(isNode("\033[37m")));
				  EXPECT_THAT(Parser.nextNode(), None);

				  Parser.parseLine("{{{tag:\033[0m}}}");
				  EXPECT_THAT(Parser.nextNode(),
				              testing::Optional(
				                  isNode("{{{tag:\033[0m}}}", "tag", ElementsAre("\033[0m"))));
				  EXPECT_THAT(Parser.nextNode(), None);
				}

				} // namespace

This is an archive of the discontinued LLVM Phabricator instance.

[Symbolize] Parser for log symbolizer markup.
ClosedPublic
