When using dsymForUUID, the majority of time symbolication a crashlog with crashlog.py is spent waiting for it to complete. Currently, we're calling dsymForUUID sequentially when iterating over the modules. We can drastically cut down this time by calling dsymForUUID in parallel. This patch uses Python's ThreadPoolExecutor (introduced in Python 3.2) to parallelize this IO-bound operation.
The performance improvement is hard to benchmark, because even with an empty local cache, consecutive calls to dsymForUUID for the same UUID complete faster. With warm caches, I'm seeing a ~30% performance improvement (~90s -> ~60s). I suspect the gains will be much bigger for a cold cache.
The changes to this line and the one below are the result of "Getting symbols [...]" and "Resolved symbols [...]" no longer appearing after each other. By parallelizing this operation you get a bunch of consecutive "Getting symbols [...]" followed by a bunch of "Resolved symbols [...]" lines.