This is an archive of the discontinued LLVM Phabricator instance.

[lldb/crashlog] Load inlined symbol into interactive crashlog
ClosedPublic

Authored by mib on Mar 23 2023, 3:19 PM.

Details

Summary

Sometimes, crash reports come with inlined symbols. These provide the
exact stacktrace from the user binary.

However, when investigating a crash, it's very likely that the images related
to the crashed thread are not available on the debugging user system or
that the versions don't match. This causes interactive crashlog to show
a degraded backtrace in lldb.

This patch aims to address that issue, by parsing the inlined symbols
from the crash report and load them into lldb's target.

This patch is a follow-up to 27f27d1, focusing on inlined symbols
loading from legacy (non-json) crash reports.

To do so, it updates the stack frame regular expression to make the
capture groups more granular, to be able to extract the symbol name, the
offset and the source location if available, while making it more
maintainable.

So now, when parsing the crash report, we build a data structure
containing all the symbol information for each stackframe. Then, after
launching the scripted process for interactive mode, we write a JSON
symbol file for each module, only containing the symbols that it contains.

Finally, we load the json symbol file into lldb, before showing the user
the process status and backtrace.

rdar://97345586

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>

Diff Detail

Event Timeline

mib created this revision.Mar 23 2023, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2023, 3:19 PM
mib requested review of this revision.Mar 23 2023, 3:19 PM
mib added inline comments.Mar 23 2023, 3:28 PM
lldb/examples/python/crashlog.py
552–557

Same problem here I guess: IIUC if pc = image['base'] + frame_offset, then I believe frame_offset is not the address of the beginning of the function, but rather an address in the middle of the function (either at a callsite, or at the crash site).

669

@kastiglione may be you have a better idea how to handle symbol + offset where + offset might be optional.

880

I'm not sure if I should subtract the offset from the frame_addr so the symbol is at the right address.

Currently, the symbol name is symbol + offset, so I didn't have to do anything subtraction.

kastiglione added inline comments.Mar 23 2023, 3:45 PM
lldb/examples/python/crashlog.py
669

symbol is always present?

bulbazord added inline comments.Mar 23 2023, 3:50 PM
lldb/examples/python/crashlog.py
540

is image not the same as json_image from a few lines above this?

552–557

Yeah so pc is definitely not the right address of the symbol. It should be pc - frame_offset right?

606–611

What is the difference between frame_ofs and frame_offset?

1175–1176

Are these indented too much?

1194–1195

Won't this string look like "couldn't import crash report inlined symbols for %s (%s)"? You could probably just do string concatenation instead.

mib added inline comments.Mar 23 2023, 3:57 PM
lldb/examples/python/crashlog.py
540

+1

606–611

frame_ofs is actually the frame symbol name. I guess I should rename this for more clarity.

Some stack frame might have a symbol name and a offset, other might know have a symbol name.

Now that I'm thinking about this, I'm not sure if a crash report could be already symbolicated (i.e. symbol:line:column). In that case I think it will be pretty tricky to get the symbol address from the source info.

669

Technically, in the past the offs (symbol + offset) was optional:

r'(?: +(.*))?'

So I guess symbol could be missing.

kastiglione added inline comments.Mar 23 2023, 7:40 PM
lldb/examples/python/crashlog.py
669

Breaking it down, there is:

  1. a symbol (maybe)
  2. the literal text " + " followed by a number (also maybe)

I'll start start with 2:

\+ (\d+)

For the symbol, .*, it should instead have at least a length of 1. I'd use + instead of *. And, it shouldn't be _any_ character. At the very least it should be non-space, which is \S.

To combine these at a high level it looks like:

(?:(<symbol>)(?:<offset>)?)?

Filling in these two with symbol='\S+' and offset=' \+ (\d+)', it becomes:

(?:(\S+)(?: \+ (\d+))?)?

Here's some python real session that minimally exercises this code:

% python
>>> import re
>>> pat = re.compile(r"(?:(\S+)(?: \+ (\d+))?)?")
>>> pat.match("func + 123").groups()
('func', '123')
>>> pat.match("func").groups()
('func', None)
>>> pat.match("").groups()
(None, None)
>>>
JDevlieghere added inline comments.Mar 24 2023, 10:16 AM
lldb/examples/python/crashlog.py
1187–1192

We shouldn't keep those temp files around once they've been loaded by LLDB. We should use a tempfile.NamedTemporaryFile for this with the UUID as part of its name pattern.

lldb/include/lldb/API/SBModuleSpec.h
78

We don't need the uuid_len, the string should be NULL terminated. That matches other FromString SB APIs.

lldb/source/API/SBModuleSpec.cpp
136

newline after LLDB_INSTRUMENT

mib updated this revision to Diff 511805.Apr 7 2023, 3:11 PM
mib marked 7 inline comments as done and an inline comment as not done.

Address some of the comments.

kastiglione added inline comments.Apr 7 2023, 3:22 PM
lldb/examples/python/crashlog.py
669
(?:(?: +\+ +)([0-9]+))

Non-capturing groups are only needed when you use a quantifier, like ?, *, or +. These regex has two non-capturing groups that don't use a quantifier, which means you can remove the groups:

+\+ +([0-9]+)
mib marked 2 inline comments as done.May 18 2023, 3:13 PM
mib added inline comments.
lldb/examples/python/crashlog.py
669

Thanks for the tips! Regarding replacing the .+ by \S+ doesn't work because the symbol name could have a space in it (for objc symbols).

I'll try to break down these regex in smaller ones to make it easier to debug in the future.

mib updated this revision to Diff 523948.May 19 2023, 3:14 PM

Address feedbacks:

  • Simplify and Improve regex
  • Add test
mib edited the summary of this revision. (Show Details)May 19 2023, 3:15 PM
mib set the repository for this revision to rG LLVM Github Monorepo.

Looks good! A few minor comments.

lldb/examples/python/crashlog.py
618–619

I think this comment better describes parse_thread rather than parse_asi_backtrace no?

624–625

Does it make sense for self.images to be a dictionary instead of a list? Something that maps image_name -> image_info. That way you could do something like self.images[frame_img_name].symbols[frame_symbol] = { ... } instead of iterating over every image (with appropriate error checking in the event that frame_img_name isn't in self.images yet).

629

if frame_offset is None, you're doing int(None) which will give you a TypeError. Above you can do something like frame_offset_value = 0 and in the if frame_offset: block you can do frame_offset_value = int(frame_offset).

688

typo: infor -> info

691

What does garbage mean in this case? I assume it's boilerplate of some kind.

917
921

Same situation here as above, frame_offset might be None and then you'll crash with a TypeError.

mib marked 2 inline comments as done.May 19 2023, 3:55 PM
mib marked 7 inline comments as done.May 19 2023, 4:06 PM
mib added inline comments.
lldb/examples/python/crashlog.py
618–619

+1, that's an oversight.

624–625

I thought about that but the problem is that JSONCrashLogParser uses image index and TextCrashLogParser uses image name as dictionary keys. It would be pretty convoluted to make the dictionary that is the same for both parser so I'd rather keep a list here.

629

Good point!

691

When I broke down the regex, I really to isolate the capture groups that we cared the most about.
By "garbage" I mean text that are required to parse the whole thing but that isn't captured.

mib updated this revision to Diff 523961.May 19 2023, 4:18 PM
mib marked 4 inline comments as done.

Address @bulbazord comments

bulbazord accepted this revision.May 19 2023, 4:20 PM

Ok, looks good to me now. Thanks!

This revision is now accepted and ready to land.May 19 2023, 4:20 PM
kastiglione added inline comments.May 19 2023, 4:29 PM
lldb/examples/python/crashlog.py
683

this is just "spaces", not "spaces and + sign"

685

this is "space and + sign" not "spaces"

690

: does not need to be escaped

mib updated this revision to Diff 523980.May 19 2023, 8:06 PM
mib marked 3 inline comments as done.

Address @kastiglione comments and reformat.

This revision was landed with ongoing or failed builds.May 19 2023, 8:08 PM
This revision was automatically updated to reflect the committed changes.