This is an archive of the discontinued LLVM Phabricator instance.

[crashlog] Implement parser for JSON encoded crashlogs
ClosedPublic

Authored by JDevlieghere on Nov 9 2020, 11:54 PM.

Details

Summary

Add a parser for JSON crashlogs. The CrashLogParser now defers to either the JSONCrashLogParser or the TextCrashLogParser. It first tries to interpret the input as JSON, and if that fails falling back to the textual parser.

Diff Detail

Event Timeline

JDevlieghere requested review of this revision.Nov 9 2020, 11:54 PM
JDevlieghere created this revision.

JSON crash logs are almost here??? I remember asking for those back when I was at Apple! Great news.

It would be nice to have a crash log from a small but real process we might be able to load as an end to end test?

lldb/examples/python/crashlog.py
413–414

you ok with this raising throwing an exception when/if these key/value pairs are not in the file?

JDevlieghere marked an inline comment as done.Nov 10 2020, 9:26 PM

JSON crash logs are almost here??? I remember asking for those back when I was at Apple! Great news.

Yep! 🥳

It would be nice to have a crash log from a small but real process we might be able to load as an end to end test?

I'd be happy to, but is there a way to test the crashlog script without having a dsymForUUID?

lldb/examples/python/crashlog.py
413–414

Yeah, at least initially. It appears the format is still not locked down, so I hope this will give us more signal when things end up changing. But good point, long term we should be more defensive.

JSON crash logs are almost here??? I remember asking for those back when I was at Apple! Great news.

Yep! 🥳

Nice!

It would be nice to have a crash log from a small but real process we might be able to load as an end to end test?

I'd be happy to, but is there a way to test the crashlog script without having a dsymForUUID?

Yes, it would still load it up correctly and just show you the stack frame that are in the JSON format. The only reason you need dsymForUUID is to expand inline frames and add source file and line info if it isn't already in the crash log. So the test would test that it can parse the JSON format correctly and display what is already in the crash log without needing to actually re-symbolicate. "crashlog" doesn't assume it will be able to find symbols for all images. It will keep the string from the crashlog file that describes the address:

Thread 0:: CrBrowserMain  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff678a2dfa mach_msg_trap + 10
1   libsystem_kernel.dylib        	0x00007fff678a3170 mach_msg + 60
2   com.apple.CoreFoundation      	0x00007fff2d6f4ef5 __CFRunLoopServiceMachPort + 247
3   com.apple.CoreFoundation      	0x00007fff2d6f39c2 __CFRunLoopRun + 1319
4   com.apple.CoreFoundation      	0x00007fff2d6f2e3e CFRunLoopRunSpecific + 462
5   com.apple.HIToolbox           	0x00007fff2c31fabd RunCurrentEventLoopInMode + 292
6   com.apple.HIToolbox           	0x00007fff2c31f7d5 ReceiveNextEventCommon + 584
7   com.apple.HIToolbox           	0x00007fff2c31f579 _BlockUntilNextEventMatchingListInModeWithFilter + 64
8   com.apple.AppKit              	0x00007fff2a965039 _DPSNextEvent + 883
9   com.apple.AppKit              	0x00007fff2a963880 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1352
10  com.apple.AppKit              	0x00007fff2a95558e -[NSApplication run] + 658
11  com.apple.AppKit              	0x00007fff2a927396 NSApplicationMain + 777
12  com.sketchup.SketchUp.2017    	0x000000010a62638c main + 144
13  libdyld.dylib                 	0x00007fff67761cc9 start + 1

So for the first frame here from the text based crash log it would keep "mach_msg_trap + 10" in case it can't symbolicate any better.

You could also make a fakey dsymForUUID script like we do in some other corefile tests. Compile a C file with foo(), bar(), baz(), substitute the addresses of those symbols from the binary into the crashlog.json, substitute the UUID of the compiled program in the crashlog.json, then the fakey-dsymForUUID finds the binary, lldb loads it at the binary at the correct offset. It's a lot of futzing around, but it would be possible.

You could also make a fakey dsymForUUID script like we do in some other corefile tests. Compile a C file with foo(), bar(), baz(), substitute the addresses of those symbols from the binary into the crashlog.json, substitute the UUID of the compiled program in the crashlog.json, then the fakey-dsymForUUID finds the binary, lldb loads it at the binary at the correct offset. It's a lot of futzing around, but it would be possible.

Crash logs can be loaded on other machines as well. Nothing stopping the test from running on linux, windows, or other systems. The main thing we want to test is if the JSON format creates the right objects with the right info

Crash logs can be loaded on other machines as well. Nothing stopping the test from running on linux, windows, or other systems. The main thing we want to test is if the JSON format creates the right objects with the right info

Yeah, fakey-dsymForUUID isn't necessary. You compile a userland C program, grab the addresses of a few of its functions, substitute them & the UUID into the crashlog.json, then create an SBTarget with the compiled program and run the crashlog.json and test that it slid the binary & prints the right backtrace or whatever.

I guess the register context part of the crashlog.json would be the most architecture-dependent, but I bet crashlog.json doesn't use any of that when printing the backtrace -- it only uses the stack trace's pc values. So you could have a "x86_64" crashlog.json, build an arm64e userland program, substitute in the slide-adjusted values of a few functions and the UUID and it would load & symbolicate in lldb.

JDevlieghere marked an inline comment as done.

Add tests for JSON and text format

clayborg accepted this revision.Nov 12 2020, 2:31 PM

Thanks for adding the extra tests!

This revision is now accepted and ready to land.Nov 12 2020, 2:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2020, 1:50 PM