
[libunwind] Optimize dl_iterate_phdr's findUnwindSectionsByPhdr

Authored by rprichard on Sep 17 2020, 9:19 PM.



Currently, findUnwindSectionsByPhdr is slightly micro-optimized for the
case where the first callback has the target address, and is otherwise
very inefficient -- it decodes .eh_frame_hdr even when no PT_LOAD
matches the PC. (If the FrameHeaderCache is enabled, then the
micro-optimization only helps the first time unwind info is looked up.)

Instead, it makes more sense to optimize for the case where the
callback *doesn't* find the target address, so search for a PT_LOAD
segment first, and only look for the unwind info section if a matching
PT_LOAD is found.
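
The reordering described above can be sketched roughly as follows. This is an illustrative callback, not the code from this revision: `LookupState`, `findByPhdr`, and `lookupUnwindInfo` are hypothetical names, and the sketch assumes a Linux-style `<link.h>` that provides `dl_iterate_phdr` and `PT_GNU_EH_FRAME`. It only records which program header describes `.eh_frame_hdr`; actually decoding that section is left out.

```cpp
#include <link.h>   // dl_iterate_phdr, dl_phdr_info, ElfW (g++ defines _GNU_SOURCE)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lookup state; libunwind's real callback data differs.
struct LookupState {
  uintptr_t targetAddr = 0;                // PC being looked up
  bool foundLoad = false;                  // a PT_LOAD segment contains the PC
  uintptr_t loadStart = 0;                 // start of that segment
  uintptr_t loadEnd = 0;                   // one past the end of that segment
  const ElfW(Phdr) *ehFrameHdr = nullptr;  // PT_GNU_EH_FRAME header, if any
};

// Called once per loaded object. Returning 0 continues the iteration,
// returning non-zero stops it.
static int findByPhdr(dl_phdr_info *info, size_t, void *data) {
  LookupState *st = static_cast<LookupState *>(data);

  // Cheap rejection: segments are mapped at or above dlpi_addr.
  if (info->dlpi_phnum == 0 || st->targetAddr < info->dlpi_addr)
    return 0;

  // Step 1: look only for a PT_LOAD segment that contains the PC.
  for (size_t i = 0; i < info->dlpi_phnum; ++i) {
    const ElfW(Phdr) &ph = info->dlpi_phdr[i];
    if (ph.p_type != PT_LOAD)
      continue;
    uintptr_t begin = info->dlpi_addr + ph.p_vaddr;
    uintptr_t end = begin + ph.p_memsz;
    if (st->targetAddr >= begin && st->targetAddr < end) {
      st->foundLoad = true;
      st->loadStart = begin;
      st->loadEnd = end;
      break;
    }
  }
  if (!st->foundLoad)
    return 0;  // common case: wrong object, no unwind info was touched

  // Step 2: only for the matching object, locate the unwind info header.
  for (size_t i = 0; i < info->dlpi_phnum; ++i) {
    if (info->dlpi_phdr[i].p_type == PT_GNU_EH_FRAME) {
      st->ehFrameHdr = &info->dlpi_phdr[i];
      break;
    }
  }
  return 1;  // assumption of this sketch: stop once the owning object is found
}

// Hypothetical helper: returns true if some loaded object contains pc.
bool lookupUnwindInfo(uintptr_t pc, LookupState &out) {
  out = LookupState();
  out.targetAddr = pc;
  dl_iterate_phdr(findByPhdr, &out);
  assert(!out.foundLoad || (out.loadStart <= pc && pc < out.loadEnd));
  return out.foundLoad;
}
```

Calling `lookupUnwindInfo` with the address of any function in the running program should report a containing segment. The point of the two-pass shape is that objects that do not contain the PC, which is almost all of them when many DSOs are loaded, are rejected after scanning program headers alone.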

This change helps on an Android benchmark with 100 shared objects,
where the DSO at the end of the dl_iterate_phdr list throws 10000
exceptions. Assuming the frame cache is disabled, this change cuts
about 30-40% off the benchmark's runtime.

Event Timeline

rprichard created this revision. Sep 17 2020, 9:19 PM
Herald added projects: Restricted Project, Restricted Project. Sep 17 2020, 9:19 PM
Herald added a reviewer: Restricted Project.
rprichard requested review of this revision. Sep 17 2020, 9:19 PM

I uploaded the Android benchmark I used:

Results on a Pixel 3 blueline device, running Q, with taskset 10:


ninja: Entering directory `out'
[102/102] /x/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/
out/: 105 files pushed, 0 skipped. 44.9 MB/s (2362029 bytes in 0.050s)
/x/multitime-android/multitime: 1 file pushed, 0 skipped. 226.5 MB/s (123912 bytes in 0.001s)
===> /data/local/tmp/multitime results
1: /data/local/tmp/out/main
            Mean                 Std.Dev.    Min         Median      Max
real        0.4897+/-0.04305     0.1671      0.1677      0.4727      0.7813      
user        0.4815+/-0.04297     0.1668      0.1533      0.4617      0.7733      
sys         0.0048+/-0.00074     0.0029      0.0000      0.0033      0.0133


ninja: Entering directory `out'
[102/102] /x/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/
out/: 105 files pushed, 0 skipped. 46.3 MB/s (2362029 bytes in 0.049s)
/x/multitime-android/multitime: 1 file pushed, 0 skipped. 247.7 MB/s (123912 bytes in 0.000s)
===> /data/local/tmp/multitime results
1: /data/local/tmp/out/main
            Mean                 Std.Dev.    Min         Median      Max
real        0.3254+/-0.02297     0.0892      0.1630      0.3386      0.4572      
user        0.3174+/-0.02284     0.0887      0.1567      0.3333      0.4500      
sys         0.0051+/-0.00067     0.0026      0.0000      0.0067      0.0133

The large variation between min and max run-times comes from the cbdata->targetAddr < pinfo->dlpi_addr optimization in findUnwindSectionsByPhdr. On Bionic, when the dynamic loader loads a group of DSOs all at once, it randomizes the order in which the files are mapped into memory, so the order of the DSOs' addresses with respect to each other varies from one run to the next. In other respects, the order of the libraries is deterministic -- e.g. the order in which constructors are called, and the order in which dl_iterate_phdr iterates over the DSOs.

FWIW: An Android NDK user did complain about the performance of EH with many libraries:

I'm still interested in making the FrameHeaderCache actually work on Android, but for now an NDK app can't use the cache because the dlpi_adds/dlpi_subs fields aren't present in most Android versions.

saugustine accepted this revision. Sep 21 2020, 10:55 AM

Thanks for the review. It looks like I still need someone from the libunwind group to accept it, though.

compnerd accepted this revision. Sep 22 2020, 9:26 AM
This revision is now accepted and ready to land. Sep 22 2020, 9:26 AM
This revision was landed with ongoing or failed builds. Sep 23 2020, 3:42 PM
This revision was automatically updated to reflect the committed changes.