This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Parallelize fetching symbol files in crashlog.py
ClosedPublic

Authored by JDevlieghere on May 6 2022, 10:57 AM.

Details

Reviewers
mib
clayborg
Summary

When using dsymForUUID, the majority of the time spent symbolicating a crashlog with crashlog.py goes to waiting for it to complete. Currently, we call dsymForUUID sequentially while iterating over the modules. We can drastically cut down this time by calling dsymForUUID in parallel. This patch uses Python's ThreadPoolExecutor (introduced in Python 3.2) to parallelize this I/O-bound operation.
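
The general pattern described in the summary can be sketched as follows. This is a minimal illustration and not the code from the patch: `fetch_symbol_file`, `fetch_all`, the sleep that simulates the lookup, and the returned path are all hypothetical stand-ins for a slow dsymForUUID call.

```python
import concurrent.futures
import time


def fetch_symbol_file(uuid):
    """Hypothetical stand-in for a slow, I/O-bound dsymForUUID lookup."""
    time.sleep(0.05)  # simulate waiting on an external tool
    return uuid, "/tmp/symbols/{}.dSYM".format(uuid)  # made-up path


def fetch_all(uuids, max_workers=8):
    """Run the lookups on a thread pool and collect results by UUID."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every lookup up front so they can overlap.
        futures = [executor.submit(fetch_symbol_file, uuid) for uuid in uuids]
        # Handle each result as soon as its lookup finishes.
        for future in concurrent.futures.as_completed(futures):
            uuid, path = future.result()
            results[uuid] = path
    return results
```

Threads suit this kind of workload because each worker spends most of its time waiting on an external process rather than executing Python code.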

The performance improvement is hard to benchmark, because even with an empty local cache, consecutive calls to dsymForUUID for the same UUID complete faster. With warm caches, I'm seeing a ~30% performance improvement (~90s -> ~60s). I suspect the gains will be much bigger for a cold cache.

Event Timeline

JDevlieghere created this revision. May 6 2022, 10:57 AM
Herald added a project: Restricted Project. May 6 2022, 10:57 AM
JDevlieghere requested review of this revision. May 6 2022, 10:57 AM
JDevlieghere edited the summary of this revision.
JDevlieghere added inline comments. May 6 2022, 11:00 AM
lldb/examples/python/crashlog.py
272

The changes to this line and the one below are needed because "Getting symbols [...]" and "Resolved symbols [...]" no longer appear one after the other. With this operation parallelized, you get a bunch of consecutive "Getting symbols [...]" lines followed by a bunch of "Resolved symbols [...]" lines.
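
The ordering change can be reproduced with a small sketch. It is an illustration under stated assumptions, not the crashlog.py code: `symbolicate` and `load` are hypothetical, the message text is a placeholder, and a `threading.Barrier` stands in for the slow lookup so that the parallel ordering is deterministic in the demonstration.

```python
import concurrent.futures
import threading


def symbolicate(names, parallel):
    """Return the log lines produced by loading each name."""
    log = []
    lock = threading.Lock()
    # Stand-in for the slow lookup: in the parallel case, no worker
    # finishes until every worker has started.
    barrier = threading.Barrier(len(names)) if parallel else None

    def record(message):
        with lock:
            log.append(message)

    def load(name):
        record("Getting symbols for " + name)  # placeholder message text
        if barrier is not None:
            barrier.wait()
        record("Resolved symbols for " + name)  # placeholder message text

    if parallel:
        # One worker per name so that every task can reach the barrier.
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(names)) as executor:
            list(executor.map(load, names))
    else:
        for name in names:
            load(name)
    return log
```

Called with `parallel=False`, the two messages for each name stay adjacent; with `parallel=True`, all "Getting" lines precede all "Resolved" lines. In real runs without the barrier the grouping is only typical, not guaranteed.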

mib accepted this revision. May 6 2022, 11:04 AM

Very cool! LGTM!

This revision is now accepted and ready to land. May 6 2022, 11:04 AM
clayborg accepted this revision. May 10 2022, 1:36 PM

lgtm!