This allows registering certain tags as possibly beginning multi-line
elements in the symbolizer markup parser. The parser is kept agnostic to
how lines are delimited; it reports the entire contents, including line
endings, once the end of element marker is reached.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/include/llvm/DebugInfo/Symbolize/Markup.h | ||
|---|---|---|
| 70 | Will be useful to expand the comment to indicate the expected usage for multiline tags. It looks like multiple calls to parseLine() are needed with the only information that can be used to determine if we are in the middle of processing a mulitple line case is that nextNode() will return None but there is still input remaining? | |
| 76 | Could be worth giving an example? /// This allows the parser to finish any deferred processing, such as an in progress Multiline tag, and /// may cause nextNode() to return additional nodes. | |
| llvm/lib/DebugInfo/Symbolize/Markup.cpp | ||
| 49 | I think this could occur due to malformed input? For example parseLine("{{{first");
nextElement() returns None
parseLine("}}"); // Typo only two closing braces.
nextElement() returns None
parseLine("{{{second"); // Assert failure "At most one multi-line element can begin at a time"Would imply that an assertions build would crash on malformed input, but would append {{{second to the in progress multiline otherwise. I think this would be useful to report as a parse error. An assert seems like it is more appropriate for a coding error (misuse of the API). | |
Add comments; remove assert.
| llvm/lib/DebugInfo/Symbolize/Markup.cpp | ||
|---|---|---|
| 49 | This assertion shouldn't ever be tripped; there's an early if(!InProgressMultiline.empty()) continue/return. I've removed it, as it didn't add much clarity. For the example you gave, the current behavior would be to consider this a multi-line element with tag first (presuming the first line was {{{first:; multi-line elements have to begin with the field delimiter) and with the contents being everything else given. No elements would be emitted since the end tag still hasn't been seen, if flush() were called, the text would be scanned for control codes and emitted. | |
Refactor markup parser to parse lazily on each node request.
With the semantic changes to parseLine() in the previous change, this only
requires a slight modifification to the parsing algorithm. This has the
advantage that only the portion of the line that is actually used is parsed,
which is advantageous for context lines, where any suffix of the context
element is summarily discarded.
LGTM thanks for the updates. I think the refactoring to a lazy interface has made it easier to see the intended usage.
Will be useful to expand the comment to indicate the expected usage for multiline tags. It looks like multiple calls to parseLine() are needed with the only information that can be used to determine if we are in the middle of processing a mulitple line case is that nextNode() will return None but there is still input remaining?