This is an archive of the discontinued LLVM Phabricator instance.

[Symbolizer] Implement contextual symbolizer markup elements.
ClosedPublic

Authored by mysterymath on Jul 11 2022, 2:47 PM.

Details

Summary

This change implements the contextual symbolizer markup elements: reset,
module, and mmap. These provide information about the runtime context of
the binary necessary to resolve addresses to symbolic values.

Summary information is printed to the output about this context.
Multiple mmap elements for the same module line are coalesced together.
The standard requires that such elements occur on their own lines to
allow for this; accordingly, anything after a contextual element on a
line is silently discarded.

Implementing this cleanly requires that the filter drive the parser;
this allows skipped sections to avoid being parsed. This also makes the
filter quite a bit easier to use, at the cost of some unused
flexibility.

Diff Detail

Event Timeline

mysterymath created this revision.Jul 11 2022, 2:47 PM
Herald added a project: Restricted Project. · View Herald Transcript
mysterymath requested review of this revision.Jul 11 2022, 2:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2022, 2:47 PM

Clean up diff.

phosek added inline comments.Jul 11 2022, 3:09 PM
llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
130–131

I'd recommend using DenseMap which is more performant than std::map.

Use Densemap<unique_ptr> for Modules; fix flipped overlappingMMap condition.

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
130–131

For Modules, it's desirable that Module pointers should be stable; this allows module IDs to be looked up once during parsing, then the resulting pointers stored indefinitely. But this is better done via a DenseMap to unique_ptr, so I've done so.

For MMaps, the collection is ordered by address, and this is used to quickly identify overlapping MMap regions. This isn't obvious though, so I've added a comment.

mysterymath marked an inline comment as done.Jul 11 2022, 4:50 PM
jhenderson added inline comments.Jul 12 2022, 12:57 AM
llvm/docs/CommandGuide/llvm-symbolizer.rst
254–255

The term "only" implies there are many other elements that aren't supported (at least more than the number of supported elements). Is that still the case? If not, delete "only".

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
68
119
mysterymath marked 3 inline comments as done.

Add periods. Correct typo.

llvm/docs/CommandGuide/llvm-symbolizer.rst
254–255

Yep, still far more unimplemented than implemented. It's also not actually possble to use the markup for symbolization tasks yet, this just establishes the baseline.
SGTM on the language though; it should probably become just a list of supported elements once there's enough present to actually be useful, then a list of unsupported elements once it's past the halfway point.

preceeding -> preceding

Fixed use after free with MarginNodes. MarginNodes ended up being unnecessary,
since when a new module line is opened, the next thing to print will
necessarily be that module line. That means the next things to be output will
be the margin nodes, and these can be summarily printed.

This is also true for the prefix of the module info line, but the code seems a
bit cleaner if the info handling is centralized in flushModuleInfo().

Apologies for the delay in responding. Some small suggestions but otherwise looking good to me.

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
64

Would it be worth calling this ModuleLineInfo? or ModuleInfoLine? This would make it explicit from the name that this is a textual element and not information about Module that is permanently retained?

69

Would InlineMMaps be more descriptive? I think this is true when there is a module info line that has a module element immediately followed by mmaps for that module id. False if a new module info line is started for a previously seen module id?

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
89

Was thinking if there was a neat way of making sure that if true then all Deferred nodes had been output. Only thing I could think of was clearing Deferred in the FlushDeferred lambda and asserting that it was empty here. It would sadly mean making DeferredNodes non const in tryContextualElement so debateable whether it is worth doing.

302

Perhaps ASSIGN_OR_RETURN_NONE? Makes it a bit clearer at the call site. Not a strong opinion though.

llvm/test/DebugInfo/symbolize-filter-markup-module.test
13

Worth adding a newline to space the errors from the check as I think that would match the log below, which does have a newline in it.

mysterymath marked 5 inline comments as done.

Address review comments. Refactored out printing module info line prefix.

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
69

Refactored this away.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
89

We don't necessarily need to output the DeferredNodes if the function returns true. In case of errors, the line is still considered contextual, but it still might be foldable away. In this case, we wouldn't ever call FlushDeferred, but we would still return true.

peter.smith accepted this revision.Jul 15 2022, 1:31 AM

Thanks for the updates. I've no more substantial comments. Will be worth waiting a bit to see if anyone else on the list have anything more to add.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
148

May be worth making it more explicit that mmap needs to be processed before the other tags due to the presence of endModuleInfoLine here. Without looking carefully it could seem like new tags can be added anywhere. Some possibilities:

  • Comment in header or by mmap.
  • if (Element.Tag == "mmap") { ... } else { endModuleInfoLine(); ... }
  • if (Element.Tag != "mmap") { endModuleInfoLine(); } else if (ElementTag == "mmap") { ... } if (Element.Tag == "reset") { ... }
This revision is now accepted and ready to land.Jul 15 2022, 1:31 AM
phosek added inline comments.Jul 15 2022, 2:13 AM
llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
121

One potential improvement I've been thinking about would be to split the handling of different tags into separate methods. What I had in mind would be to have an enum for known tags, use llvm:StringSwitch to convert the tag to an enum and then use it for the dispatch. The only complication is endModuleInfoLine which needs a special handling. This could also be done in a future change.

mysterymath marked an inline comment as done.

Address review comments.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
121

I played around with this a bit; it seems like there's exactly one string comparison per element per known tag, so going through an enum woudn't really add anything from a performance standpoint without building a real lexer and trie-ing-up the tags.

The string-switch approach would be more readable though; along the same lines, I broke up out each tag in tryContextualElement into a try<TAG>() function. Each begins with a tag check, so the end result for tryContextualElement ends up looking more like a usual case analysis in a recursive-descent parser.

148

Ended up splitting these out into methods, so I've added a comment in the greatly-reduced tryContextualElement().

Just a couple of typos spotted in a comment.

One thing I find hard to reason about is the DeferredNodes. We add deferred nodes in tryContextualLine, but never in tryContextualElement, from that point onwards we can flush or clear only. As the latter two are read-write we can't use const to let readers know from the type signature what is and isn't expected. One possible way around this is to make deferred nodes its own type that implements more than one interface. An interface that permits adding, and one that does not (only supports flush and clear). The latter interface can be passed to tryContextualElement. Possibly overboard for this patch when there are only a few ContextualElements, but could be worth it if it gets more complicated.

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
122

typos:
conextual -> contextual
tha -> that

mysterymath marked an inline comment as done.

I did a bit more hammering on the semantics of contexutal line elision.
Previously, contextual elements would be output as text if they were preceded
by text, which wasn't intended. Fixing this means that a non-contextual line
will be entirely deferred. This means that tryContextualLine() doesn't really
make sense anymore; it always handles the whole line. I've thus made the
modified version of it into the new body of filter().

I also made the deferred nodes a const parameter of the try calls again; this
seems cleaner than moving it out into the body, since it's produced by filter()
and possibly consumed by the try() calls.

Fix indentation.

Remove unused declaration.

Remove unnecessary flag from test RUN: lines.

phosek added inline comments.Jul 20 2022, 12:48 AM
llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
121

FYI the code generated by llvm::StringSwitch is more efficient than plain string comparison, the pattern used by llvm::StringSwitch implementation was chosen deliberately to be recognized by the optimizer. However, I don't think the string comparisons used here are performance critical so the difference is probably going to be in the noise.

mysterymath added inline comments.Jul 20 2022, 10:04 AM
llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
122

Removed.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
121

Ah, that's good to know; I had thought it suspicious that StringSwitch even existed, given that it doesn't tend to make the surrounding code that much clearer than if/else. I didn't know that the C++ optimizer could do detailed looks into things like StringRef's, but it makes sense in retrospect.

Thanks for the update, I prefer the deferred nodes approach you have. I've only got a couple of optional suggestions.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
45

I think it would be useful to retain the information in the previous comment

// See if the line consists of whitespace and SGR nodes followed by a contextual
// element. In this case, anything after the contextual element can be elided,
// and the entire line might be folded away with the next contextual element.

Without this contextual line looks like it is a term that isn't really defined, other than by guessing from the context.

53

I had to look at the definition of endModuleLine() to check what happened if there was no module to end. Could be worth breaking the symmetry and changing the name to something like endAnyModuleInfoLine which implies that there may not be one? Alternatively // This was not a contextual line. Output any pending ModuleInfo.

mysterymath marked 2 inline comments as done.

Address review comments.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
45

Done; updated the text to match the new semantics.

53

I like endAnyModuleInfoLine; it captures the notion I was trying to convey better than endModuleLine. Done.

peter.smith accepted this revision.Jul 21 2022, 1:24 AM

Thanks for the update. I've no more comments. I'm happy with the changes.

This revision was landed with ongoing or failed builds.Jul 21 2022, 11:29 AM
This revision was automatically updated to reflect the committed changes.

This broke building with GCC 9:

In file included from ../tools/llvm-symbolizer/llvm-symbolizer.cpp:23:
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:55:19: error: declaration of ‘const llvm::symbolize::MarkupFilter::Module* llvm::symbolize::MarkupFilter::MMap::Module’ changes meaning of ‘Module’ [-fpermissive]
   55 |     const Module *Module;
      |                   ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:46:10: note: ‘Module’ declared here as ‘struct llvm::symbolize::MarkupFilter::Module’
   46 |   struct Module {
      |          ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:65:19: error: declaration of ‘const llvm::symbolize::MarkupFilter::Module* llvm::symbolize::MarkupFilter::ModuleInfoLine::Module’ changes meaning of ‘Module’ [-fpermissive]
   65 |     const Module *Module;
      |                   ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:46:10: note: ‘Module’ declared here as ‘struct llvm::symbolize::MarkupFilter::Module’
   46 |   struct Module {
      |          ^~~~~~

This broke building with GCC 9:

In file included from ../tools/llvm-symbolizer/llvm-symbolizer.cpp:23:
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:55:19: error: declaration of ‘const llvm::symbolize::MarkupFilter::Module* llvm::symbolize::MarkupFilter::MMap::Module’ changes meaning of ‘Module’ [-fpermissive]
   55 |     const Module *Module;
      |                   ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:46:10: note: ‘Module’ declared here as ‘struct llvm::symbolize::MarkupFilter::Module’
   46 |   struct Module {
      |          ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:65:19: error: declaration of ‘const llvm::symbolize::MarkupFilter::Module* llvm::symbolize::MarkupFilter::ModuleInfoLine::Module’ changes meaning of ‘Module’ [-fpermissive]
   65 |     const Module *Module;
      |                   ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:46:10: note: ‘Module’ declared here as ‘struct llvm::symbolize::MarkupFilter::Module’
   46 |   struct Module {
      |          ^~~~~~

Sorry about that, see https://github.com/llvm/llvm-project/commit/6605187103a2369418d014a7f146fee4a04b11bf

This broke building with GCC 9:

In file included from ../tools/llvm-symbolizer/llvm-symbolizer.cpp:23:
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:55:19: error: declaration of ‘const llvm::symbolize::MarkupFilter::Module* llvm::symbolize::MarkupFilter::MMap::Module’ changes meaning of ‘Module’ [-fpermissive]
   55 |     const Module *Module;
      |                   ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:46:10: note: ‘Module’ declared here as ‘struct llvm::symbolize::MarkupFilter::Module’
   46 |   struct Module {
      |          ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:65:19: error: declaration of ‘const llvm::symbolize::MarkupFilter::Module* llvm::symbolize::MarkupFilter::ModuleInfoLine::Module’ changes meaning of ‘Module’ [-fpermissive]
   65 |     const Module *Module;
      |                   ^~~~~~
../include/llvm/DebugInfo/Symbolize/MarkupFilter.h:46:10: note: ‘Module’ declared here as ‘struct llvm::symbolize::MarkupFilter::Module’
   46 |   struct Module {
      |          ^~~~~~

Sorry about that, see https://github.com/llvm/llvm-project/commit/6605187103a2369418d014a7f146fee4a04b11bf

Thanks, that seems to have done the trick!

dyung added a subscriber: dyung.Jul 21 2022, 1:11 PM

One of the tests you added symbolize-filter-markup-reset.test seems to be failing on the PS5 Windows bot, can you take a look?

https://lab.llvm.org/staging/#/builders/204/builds/1751

One of the tests you added symbolize-filter-markup-reset.test seems to be failing on the PS5 Windows bot, can you take a look?

https://lab.llvm.org/staging/#/builders/204/builds/1751

Looking on my Windows box; I'll revert if I can't get to the bottom of it in a few minutes.

One of the tests you added symbolize-filter-markup-reset.test seems to be failing on the PS5 Windows bot, can you take a look?

https://lab.llvm.org/staging/#/builders/204/builds/1751

Looking on my Windows box; I'll revert if I can't get to the bottom of it in a few minutes.

Seems to have been a use-after-free; this should be fixed by cc0a1078f5fb53638e6c125f3be2728dfe84e129.