This is an archive of the discontinued LLVM Phabricator instance.

[Symbolizer] Implement data symbolizer markup element.
ClosedPublic

Authored by mysterymath on Jul 20 2022, 11:06 AM.

Details

Summary

This connects the Symbolizer to the markup filter and enables the first
working end-to-end flow using the filter.

Diff Detail

Event Timeline

mysterymath created this revision.Jul 20 2022, 11:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2022, 11:06 AM
mysterymath requested review of this revision.Jul 20 2022, 11:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2022, 11:06 AM
phosek added inline comments.Jul 21 2022, 2:13 AM
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
470–471

This should be moved as well so we can use debuginfod with the symbolizer filter.

471–477

This is not something that needs to be addressed in this change, but I think it would be valuable to support different output styles with the symbolizer filter as well, ideally by extending and using the existing printers. The JSON output in particular would be useful for consumption from other tools and scripts.

Address review comments.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
470–471

Ah, good catch. Done.

471–477

The use case makes sense, but it would be a somewhat odd way of interpreting the symbolizer markup. For the output to make sense as (e.g.) JSON, you'd need to have the filter ignore any text and SGR nodes present in the log, or you'd need to expect that the input is only markup and whitespace.

Maybe we could support an auxiliary output for this. When the log symbolizer encounters a presentation element, it could also tee the results to the auxiliary in one of these existing formats. This would keep the semantics of the main log symbolization pipeline simple and consistent, while still allowing machine processing of the referenced symbols, perhaps in concert with the filtered markup output.

Integrate from upstream.

One small comment on the test, I'll be out on vacation next week so may be slow to respond. I've not got any objections so please don't wait for me if others are happy to approve.

llvm/test/DebugInfo/symbolize-filter-markup-data.test
18

Possibly worth a test with too many fields?

33

I can't see a label l in the test. Was the intention to set the size of long to 4? If so I think you'll need .size long, 4. The same applies to .size b, 1 below.

Checking the object file the sizes of long and byte are 0.

mysterymath marked an inline comment as done.

Correct size directive.

llvm/test/DebugInfo/symbolize-filter-markup-data.test
18

Eh, for that test to fail and this one to pass, there would have to be a discontinuity in how the implementation handles inputs with too many fields vs too few fields. The implementation currently uses "==", and that's rather unlikely to accidentally change. A possible class of errors would be to swap checkNumFields with checkNumFieldsAtLeast or similar, but these each come with different messages.

33

Yep, these didn't get updated when I renamed the variables. Done.

Apologies for the delay in responding, have been on vacation. I've got one suggestion on the output of the module line having looked at the test. Otherwise looks good to me.

llvm/test/DebugInfo/symbolize-filter-markup-data.test
13

I don't know how much scope there is to change the output format of the mmap information in the module line given that this is based on a pre-existing markup. When I saw 0x0(r) - 0x10(r) I wasn't sure whether this was one contiguous range [0x0, 0x10) or two separate ranges, not necessarily contiguous with one starting at 0x0 and one at 0x10. I guessed at the latter given the {{{mmap}}} elements but it would have been harder to understand if I'd just seen the output and not the input.

Perhaps use a comma instead of a dash to separate the mmaps? As an alternative I guess the size could be incorporated in some way.

No worries if this isn't possible.

mysterymath marked an inline comment as done.

Address review comments.

llvm/test/DebugInfo/symbolize-filter-markup-data.test
13

Generally agree; there shouldn't be anything that hard-depends on the format, as this is intended to be human readable, not machine readable. I think we can mostly escape Hyrum's law too, since usage of this markup format is pretty limited at present.

Spent a bit of time thinking about this, and ended up with [0x0-0x9](r),[0x10-0x11](r). Using brackets agrees with the mathematical notation for inclusive ranges. Using inclusive ranges and dashes inside makes this also readable to those unfamiliar with that notation.

peter.smith accepted this revision.Aug 4 2022, 2:07 AM

Thanks for the update. Looks good to me.

This revision is now accepted and ready to land.Aug 4 2022, 2:07 AM
This revision was landed with ongoing or failed builds.Aug 4 2022, 10:20 AM
This revision was automatically updated to reflect the committed changes.