This is an archive of the discontinued LLVM Phabricator instance.

Make crashlog.py work or binaries with spaces in their names
ClosedPublic

Authored by aprantl on Dec 12 2018, 11:50 AM.

Details

Summary

This is a little dangerous since the crashlog files aren't 100% unambiguous, but the risk is mitigated by using a non-greedy +? pattern.

rdar://problem/38478511

Diff Detail

Repository
rL LLVM

Event Timeline

aprantl updated this revision to Diff 177889.Dec 12 2018, 11:50 AM
aprantl created this revision.
aprantl updated this revision to Diff 177890.Dec 12 2018, 12:03 PM

Be more verbose about what is a valid arch.

jingham requested changes to this revision.Dec 14 2018, 12:56 PM

The frame regex will get confused if you had a binary called "foo 0x1" Then we would treat "foo" as the binary name, 0x1 as the pc, and the rest would be the function name. I don't see how to avoid that altogether. Maybe we could insist that the address have at least 8 numbers in it? Then your binary would have to be called "foo 0x123456789" before we get confused, at which point my caring level has dropped pretty low.

I'm not sure about the image regex. In the part where we are grabbing version numbers, you've replaced:

([^<]+)

with

([0-9a-zA-Z_]+)

In crashlogs, I see lines like:

0x10b60b000 -        0x10f707fff  com.apple.LLDB.framework (1.1000.11.38.2 - 1000.11.38.2) <96E36F5C-1A83-39A1-8713-5FDD9701C3F1> /Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/LLDB

In the first form "([^<]+)" will grab "(1.1000.11.38.2 - 1000.11.38.2) ". I presume we just discard this bit. But ([0-9a-zA-Z_]+) will stop at the first"(" for the version number, won't it?

This revision now requires changes to proceed.Dec 14 2018, 12:56 PM
clayborg requested changes to this revision.Dec 14 2018, 1:52 PM
clayborg added a subscriber: clayborg.
clayborg added inline comments.
examples/python/crashlog.py
99 ↗(On Diff #177890)

This regex doesn't work. Head to https://regex101.com/ and try it out with either:

   0x104591000 -        0x1055cfff7 +llvm-dwarfdump (0) <B104CFA1-046A-36A6-8EB4-07DDD7CC2DF3> /Users/USER/Documents/*/llvm-dwarfdump
0x7fff63f20000 -     0x7fff63f77ff7  libc++.1.dylib (400.9.4) <D4AB366F-48A9-3C7D-91BD-41198F69DD57> /usr/lib/libc++.1.dylib

The old regex works

101 ↗(On Diff #177890)

This one doesn't seem to work either

aprantl updated this revision to Diff 178375.Dec 15 2018, 1:26 PM

Fixed the reported problems with the regexes and added testcases with example lines. I used the funny test setup to avoid burdening the installed crashlog.py script with unit test code that would need to be parsed every time.

clayborg requested changes to this revision.Dec 16 2018, 2:11 PM
clayborg added inline comments.
examples/python/crashlog.py
99 ↗(On Diff #178375)

A better regex that removes the need for "image_regex_no_uuid" is:

(0x[0-9a-fA-F]+)[-\s]+(0x[0-9a-fA-F]+)\s+[+]?([^\s]+)\s+(\(.+\))?\s?(<([-0-9a-fA-F]+)>)? (.*)

The identifier can't contain spaces AFAIK. The problem with the above regex is the "(.+)" is greedy and for

0x104591000 -        0x1055cfff7 +llvm-dwarfdump (0) <B104CFA1-046A-36A6-8EB4-07DDD7CC2DF3> /Users/USER/Documents/*/llvm-dwarfdump

it consumes "llvm-dwarfdump (0)" and the "(\(.+\))?" gets nothing. Changing "(.+)" to "([^\s]+)" fixes things. Added an extra parens around the UUID <> characters so we won't need image_regex_no_uuid anymore

This revision now requires changes to proceed.Dec 16 2018, 2:11 PM
aprantl updated this revision to Diff 178476.Dec 17 2018, 9:03 AM

Got rid of the no_uuid regex and added even more testcases.

clayborg accepted this revision.Dec 17 2018, 9:07 AM

Looks good

This revision was not accepted when it landed; it landed in state Needs Review.Dec 17 2018, 9:30 AM
This revision was automatically updated to reflect the committed changes.