This is an archive of the discontinued LLVM Phabricator instance.

[Symbolizer] Implement pc element in symbolizing filter.
ClosedPublic

Authored by mysterymath on Aug 3 2022, 2:30 PM.

Details

Summary

Implements the pc element for the symbolizing filter, including it's
"ra" and "pc" modes. Return addresses ("ra") are adjusted by
decrementing one. By default, {{{pc}}} elements are assumed to point to
precise code ("pc") locations. Backtrace elements will adopt the
opposite convention.

Along the way, some minor refactors of value printing and colorization.

Diff Detail

Event Timeline

mysterymath created this revision.Aug 3 2022, 2:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2022, 2:30 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
mysterymath requested review of this revision.Aug 3 2022, 2:30 PM
Herald added a project: Restricted Project. · View Herald Transcript

Change PC location information from angle brackets to brackets to avoid
conflicting with C++ template arguments.

phosek added inline comments.Aug 4 2022, 12:13 AM
llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
74–83

Nit: it's more common in LLVM to use capitalized names for enums, see https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

Correct enum naming convention.
Handle default PC type out of band.
Fix data test.

llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h
74–83

I remember looking this up because I wasn't sure, then apparently I promptly forgot about what I found. Done.

Implementation looks good to me. One quick question to clarify my understanding.

llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
255

Just wanted to check my understanding of whether it is important when subtracting from Addr to land on the start of an instruction? My guess is not, as it would be difficult on a variable length instruction set.

Reading the documentation for ra in the backtrace section:
the symbolizing filter will subtract one byte or one instruction length from the actual return address for the call site
It looks like from the test case

4: e8 00 00 00 00         callq   0x9 <first+0x9> // Line 4
9: 5d                            popq    %rbp                 // Line 5

That address 9 - 1 is within the callq instruction and not the start yet it still displays line 4 so it all looks OK.

On AArch64 it would be possible to subtract 4 as all instructions are 4-bytes, but no need if subtracting 1 will work.

mysterymath marked an inline comment as done.Aug 4 2022, 11:08 AM
mysterymath added inline comments.
llvm/lib/DebugInfo/Symbolize/MarkupFilter.cpp
255

Yep, that's the standard hack, since the Symbolizer will return the same source location for any byte within the call. It's decidedly gross, but piping through per-target instruction length information is probably gross-er.

mysterymath marked an inline comment as done.

Add explanatory comment for ra adjustment.

peter.smith accepted this revision.Aug 5 2022, 1:27 AM

Thanks for the explanation. LGTM. May be worth waiting a day or so to see if anyone else on the review has any further comments.

This revision is now accepted and ready to land.Aug 5 2022, 1:27 AM

Use adjustAddr function declared in header.

Fix test on Windows.

This revision was landed with ongoing or failed builds.Aug 8 2022, 11:09 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Aug 8 2022, 11:33 AM

Looks like this breaks tests on Windows: http://45.33.8.238/win/64016/step_11.txt

Please take a look, and revert for now if it takes a while to fix.

Looks like this breaks tests on Windows: http://45.33.8.238/win/64016/step_11.txt

Please take a look, and revert for now if it takes a while to fix.

Sorry, I had just fixed this, but it got dropped due to a git snafu on my end. Should be fixed by 0d6cf1e8b5