This is an archive of the discontinued LLVM Phabricator instance.

[windows][support] Improve backtrace emitted in crash report without llvm-symbolizer
ClosedPublic

Authored by bd1976llvm on Jun 15 2022, 4:19 PM.

Details

Summary

Currently the backtrace emitted on windows when llvm-symbolizer is not available looks like:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: c:\\u\\br2\\bin\\ld.lld.exe -o C:\\Users\\BDUNBO~1\\AppData\\Local\\Temp\\lit-tmp-8ymp966z\\tmpztn1fyw1
0x00007FF64EE51D20 (0x0000000000000083 0x00007FF64EEFCAA1 0x0000000000000000 0x000001750CD01140)
0x00007FF64EEFC9DB (0x0000000000000000 0x0000000703B8EBA9 0x0000017603357418 0x0000000000000001)
...

This is often not useful: the program counter (RIP) values in the left-most column cannot easily be decoded, because each address includes the run-time base address of its containing module, and those base addresses are not known. This change emits the module path, the module base address, and the offset within the module alongside each address.

Example output after this change:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: c:\\u\\br2\\bin\\ld.lld.exe -o C:\\Users\\BDUNBO~1\\AppData\\Local\\Temp\\lit-tmp-8ymp966z\\tmpztn1fyw1
0x00007FF64EE51D20 (0x0000000000000083 0x00007FF64EEFCAA1 0x0000000000000000 0x000001750CD01140), c:\u\br2\bin\ld.lld.exe(0x0000000000B70000) + 0x17E8F7 byte(s)
0x00007FF64EEFC9DB (0x0000000000000000 0x0000000703B8EBA9 0x0000017603357418 0x0000000000000001), C:\WINDOWS\SYSTEM32\ntdll.dll(0x0000000077560000) + 0x46E2C byte(s)
...

The output above is what is produced when file and line information cannot be retrieved for symbols. This is reasonably likely, for example if the .pdb files were not shipped with the toolchain binaries. If the file and line info can be retrieved, lines like the following are output:

0x00007FF64EE51D20 (0x0000000000000083 0x00007FF64EEFCAA1 0x0000000000000000 0x000001750CD01140), C:\WINDOWS\SYSTEM32\ntdll.dll(0x0000000077560000) + 0x46E2C byte(s), RtlAllocateHeap() + 0x109C byte(s)

Event Timeline

bd1976llvm created this revision. Jun 15 2022, 4:19 PM
Herald added a project: Restricted Project. Jun 15 2022, 4:19 PM
bd1976llvm requested review of this revision. Jun 15 2022, 4:19 PM
rnk added a comment. Jun 15 2022, 4:24 PM

Seems useful

llvm/lib/Support/Windows/Signals.inc
339–340

I propose we scrap this parameter printing. I've never found it useful. It's a bunch of hex noise that makes the crash report look scary. My understanding is that it doesn't work well on Win64, since it assumes parameters are homed into the shadow stack space, which Clang doesn't generally do.

bd1976llvm added inline comments. Jun 15 2022, 4:32 PM
llvm/lib/Support/Windows/Signals.inc
339–340

Yes. Nice suggestion. We recently debugged a horrible, close-to-impossible-to-reproduce issue where this backtrace information was all we had available. I'll ask tomorrow whether anyone made use of these parameters and get back to you...

how's this compare to the linux behavior? be nice to have a similar format

aaron.ballman added inline comments.
llvm/lib/Support/Windows/Signals.inc
339–340

I think the stack traces we print out are basically just there for the call stack alone -- in order to do any real debugging of the situation, I've found I've needed to go to the minidump anyway (https://reviews.llvm.org/D18216). I'd be fine dropping the parameter information to reduce the amount of noise in the crash report.

bd1976llvm edited the summary of this revision.

I have asked around and no one made use of those numbers so I have removed them in the latest diff. However, people would have found the exception code useful so I have added that. Output now looks like:

With llvm-symbolizer:
Stack dump:
0. Program arguments: C:\\u\\br2\\bin\\ld.lld.exe
Exception Code: 0x80000003
#0 0x00007ff61d276294 (C:\u\br2\bin\ld.lld.exe+0x1f6294)
#1 0x00007ff61d2761dd (C:\u\br2\bin\ld.lld.exe+0x1f61dd)
#2 0x00007ff61d12652c (C:\u\br2\bin\ld.lld.exe+0xa652c)
...

Without llvm-symbolizer:
Stack dump:
0. Program arguments: C:\\u\\br2\\bin\\ld.lld.exe
Exception Code: 0x80000003
0x00007FF61D276294, C:\u\br2\bin\ld.lld.exe(0x00007FF61D080000) + 0x1F6294 byte(s)
0x00007FF61D2761DD, C:\u\br2\bin\ld.lld.exe(0x00007FF61D080000) + 0x1F61DD byte(s)
0x00007FF61D12652C, C:\u\br2\bin\ld.lld.exe(0x00007FF61D080000) + 0xA652C byte(s)
...
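
The offset in the "without llvm-symbolizer" lines is simply the program counter minus the run-time base address of the containing module. A minimal sketch of that arithmetic and of the line format follows. `formatFrame` is a hypothetical helper written for illustration, not the code in Signals.inc, and it assumes the module path and base address have already been looked up by the OS-specific layer.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the Signals.inc implementation): renders one frame
// as "<PC>, <module>(<base>) + <offset> byte(s)".
std::string formatFrame(uint64_t PC, const std::string &ModulePath,
                        uint64_t ModuleBase) {
  // The offset is only meaningful if the PC lies at or above the module base.
  assert(PC >= ModuleBase && "PC must not be below the module base");
  uint64_t Offset = PC - ModuleBase;

  char Buf[64];
  std::string Out;

  // Absolute address, zero-padded to 16 upper-case hex digits.
  std::snprintf(Buf, sizeof(Buf), "0x%016" PRIX64 ", ", PC);
  Out += Buf;
  Out += ModulePath;

  // Module base (padded) followed by the unpadded offset into the module.
  std::snprintf(Buf, sizeof(Buf), "(0x%016" PRIX64 ") + 0x%" PRIX64 " byte(s)",
                ModuleBase, Offset);
  Out += Buf;
  return Out;
}
```

For the first frame above, `formatFrame(0x00007FF61D276294, "C:\\u\\br2\\bin\\ld.lld.exe", 0x00007FF61D080000)` reproduces the line ending in `+ 0x1F6294 byte(s)`. Because the offset does not depend on where the module was loaded, it can be matched against the binary after the fact.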

> how's this compare to the linux behavior? be nice to have a similar format

There are differences. This doesn't really bother me as a crash dump is platform specific but if this is important I can try to make the output consistent (or maybe that should be another change)?

> I have asked around and no one made use of those numbers so I have removed them in the latest diff. However, people would have found the exception code useful so I have added that. Output now looks like:

+1 to keeping the exception code, that can be very helpful information to have. (I would not be sad if we decided to map the hex value to a more human readable string as well.)
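
A mapping from the hex value to a readable name could look roughly like the sketch below. This was not part of the patch as reviewed. `exceptionCodeName` is a hypothetical function, and the table covers only a handful of well-known Windows exception codes. The values are written as literals so the sketch compiles without `<windows.h>`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: returns a readable name for a few common Windows
// exception codes, or an empty string if the code is not in this small table.
std::string exceptionCodeName(uint32_t Code) {
  switch (Code) {
  case 0x80000003u:
    return "EXCEPTION_BREAKPOINT";
  case 0xC0000005u:
    return "EXCEPTION_ACCESS_VIOLATION";
  case 0xC000001Du:
    return "EXCEPTION_ILLEGAL_INSTRUCTION";
  case 0xC0000094u:
    return "EXCEPTION_INT_DIVIDE_BY_ZERO";
  case 0xC00000FDu:
    return "EXCEPTION_STACK_OVERFLOW";
  default:
    return ""; // Unknown: the caller should fall back to printing the hex value.
  }
}
```

With this table, the `0x80000003` in the output above would be reported as a breakpoint exception. Unknown codes would still need the raw hex value printed.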

rnk accepted this revision. Jun 17 2022, 10:13 AM

lgtm

+1 for the exception code. I think it may appear as the process return code and some systems will log that, but logging it ourselves, too, is for the best.

I think we could probably go further here. The unsymbolized output when we use llvm-symbolizer and it can't find PDB files is practically the same as the new output with your change. It would be nice to be able to fetch symbols and then re-pipe the crash report text through llvm-symbolizer and get a symbolized stack trace. That can be future work, this is a nice improvement as is.
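
As a rough illustration of the re-piping idea, the module path and offset can be recovered from a frame line in the `#N 0x... (module+0xoffset)` form shown earlier. This is only a sketch of the text-parsing step. `ModuleOffset` and `parseFrameLine` are hypothetical names, and the line format is assumed to be exactly the one in the example output.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical result type: the module path and the offset into that module.
struct ModuleOffset {
  std::string Module;
  uint64_t Offset;
};

// Hypothetical parser for lines such as
//   #0 0x00007ff61d276294 (C:\u\br2\bin\ld.lld.exe+0x1f6294)
// Returns std::nullopt for lines that do not match that shape.
std::optional<ModuleOffset> parseFrameLine(const std::string &Line) {
  // First '(' and last ')' so that paths containing parentheses still work.
  size_t Open = Line.find('(');
  size_t Close = Line.rfind(')');
  if (Open == std::string::npos || Close == std::string::npos || Close < Open)
    return std::nullopt;

  std::string Inner = Line.substr(Open + 1, Close - Open - 1);

  // The offset is introduced by the last "+0x" inside the parentheses.
  size_t Plus = Inner.rfind("+0x");
  if (Plus == std::string::npos || Plus == 0)
    return std::nullopt;

  // Validate the digits up front so std::stoull cannot throw.
  std::string Hex = Inner.substr(Plus + 3);
  if (Hex.empty() || Hex.size() > 16 ||
      Hex.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos)
    return std::nullopt;

  return ModuleOffset{Inner.substr(0, Plus), std::stoull(Hex, nullptr, 16)};
}
```

Each parsed pair could then be handed to llvm-symbolizer once symbols have been fetched. Since these are module-relative offsets, an option such as `--relative-address` would probably be needed, but that should be checked against the llvm-symbolizer documentation.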

This revision is now accepted and ready to land. Jun 17 2022, 10:13 AM

> > how's this compare to the linux behavior? be nice to have a similar format
>
> There are differences. This doesn't really bother me as a crash dump is platform specific but if this is important I can try to make the output consistent (or maybe that should be another change)?

oh, wasn't so much about it being exactly the same formatting (though that seems nice to have) - but whether it contained the same information (or is there a real difference between the platforms that motivates the differences?) - like I don't recall any of this parameter dumping in the linux crash dump.

(& maybe if the OS-specific part was lower level (surfacing the module base addresses and offsets) leaving the rendering to a higher level portable piece of code, that'd address the inconsistencies and reduce code duplication?)

> lgtm
>
> +1 for the exception code. I think it may appear as the process return code and some systems will log that, but logging it ourselves, too, is for the best.
>
> I think we could probably go further here. The unsymbolized output when we use llvm-symbolizer and it can't find PDB files is practically the same as the new output with your change. It would be nice to be able to fetch symbols and then re-pipe the crash report text through llvm-symbolizer and get a symbolized stack trace. That can be future work, this is a nice improvement as is.

That sounds like this: https://reviews.llvm.org/D126980 perhaps? (we could change the crash dump output to include the markup required for the filtering, specifically when we can't already symbolize at crash-time)

bd1976llvm closed this revision. Jun 20 2022, 5:13 AM

I have made a commit with the current improvements.

@dblaikie I like your ideas for further improvements. These can be addressed in future patches.