This is an archive of the discontinued LLVM Phabricator instance.

[Symbolize] Parse multi-line markup elements.
ClosedPublic

Authored by mysterymath on May 2 2022, 12:52 PM.

Details

Summary

This allows registering certain tags as possibly beginning multi-line
elements in the symbolizer markup parser. The parser is kept agnostic to
how lines are delimited; it reports the entire contents, including line
endings, once the end of element marker is reached.

Diff Detail

Event Timeline

mysterymath created this revision.May 2 2022, 12:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 12:52 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
mysterymath requested review of this revision.May 2 2022, 12:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 12:52 PM

Add to comments; use "multi-line" consistently.

Merge changes to previous patch.

Merge parent.

Improve comment for flush().

peter.smith added inline comments.Jun 20 2022, 4:33 AM
llvm/include/llvm/DebugInfo/Symbolize/Markup.h
70

Will be useful to expand the comment to indicate the expected usage for multiline tags. It looks like multiple calls to parseLine() are needed with the only information that can be used to determine if we are in the middle of processing a mulitple line case is that nextNode() will return None but there is still input remaining?

76

Could be worth giving an example?

/// This allows the parser to finish any deferred processing, such as an in progress Multiline tag, and
/// may cause nextNode() to return additional nodes.
llvm/lib/DebugInfo/Symbolize/Markup.cpp
49

I think this could occur due to malformed input? For example

parseLine("{{{first");
nextElement() returns None
parseLine("}}"); // Typo only two closing braces.
nextElement() returns None
parseLine("{{{second"); // Assert failure "At most one multi-line element can begin at a time"

Would imply that an assertions build would crash on malformed input, but would append {{{second to the in progress multiline otherwise. I think this would be useful to report as a parse error. An assert seems like it is more appropriate for a coding error (misuse of the API).

mysterymath marked 3 inline comments as done.

Add comments; remove assert.

llvm/lib/DebugInfo/Symbolize/Markup.cpp
49

This assertion shouldn't ever be tripped; there's an early if(!InProgressMultiline.empty()) continue/return. I've removed it, as it didn't add much clarity.

For the example you gave, the current behavior would be to consider this a multi-line element with tag first (presuming the first line was {{{first:; multi-line elements have to begin with the field delimiter) and with the contents being everything else given. No elements would be emitted since the end tag still hasn't been seen, if flush() were called, the text would be scanned for control codes and emitted.

Refactor markup parser to parse lazily on each node request.

With the semantic changes to parseLine() in the previous change, this only
requires a slight modifification to the parsing algorithm. This has the
advantage that only the portion of the line that is actually used is parsed,
which is advantageous for context lines, where any suffix of the context
element is summarily discarded.

Add missed NextIdx=0 call.

peter.smith accepted this revision.Jun 22 2022, 1:38 AM

LGTM thanks for the updates. I think the refactoring to a lazy interface has made it easier to see the intended usage.

This revision is now accepted and ready to land.Jun 22 2022, 1:38 AM

Clean up usage of NextIdx.

Missed comment.

Harbormaster completed remote builds in B171360: Diff 439074.
This revision was landed with ongoing or failed builds.Jun 22 2022, 10:03 AM
This revision was automatically updated to reflect the committed changes.