This is an archive of the discontinued LLVM Phabricator instance.

[dsymutil] Fix O(n^2) behavior when running on ld64.lld's current ICF output
ClosedPublic

Authored by thakis on Apr 6 2022, 7:10 AM.

Details

Summary

STABS information consists of a list of records in the linked binary
that look like this:

OSO: path/to/some.o
SO: path/to/some.c
FUN: sym1
FUN: sym2
...

The linked binary has one such set of records for every .o file linked
into it.

When dsymutil processes the binary's STABS information, it:

  1. Reads the .o file mentioned in the OSO line
  2. For each FUN entry after it in the main executable's STABS info:
    1. it looks up that symbol in the symbol of that .o file
    2. if it doesn't find it there, it goes through all symbols in the main binary at the same address and sees if any of those match

With ICF, ld64.lld's STABS output claims that all identical functions
that were folded are in the .o file of the one that's deemed the
canonical one. Many small functions might be folded into a single
function, so there are .o OSO entries that end up with many FUN lines,
but almost none of them exist in the .o file's symbol table.

Previously, dsymutil would do a full scan of all symbols in the main
executable _for every of these entries_.

This patch instead scans all aliases once and remembers them per name.
This reduces the alias resolution complexity from
O(number_of_aliases_in_o_file * number_of_symbols_in_main_executable) to
O(number_of_aliases_in_o_file * log(number_of_aliases_in_o_file)).

In practice, it reduces the time spent to run dsymutil on
Chromium Framework from 26 min (after https://reviews.llvm.org/D89444)
or 12 min (before https://reviews.llvm.org/D89444) to ~8m20s.

We probably want to change how ld64.lld writes STABS entries when ICF
is enabled, but making dsymutil not have pathological performance for
this input seems like a good change as well.

No expected behavior change (other than it's faster). I verified that
for Chromium Framework, the generated .dSYM is identical with and
without this patch.

Diff Detail

Event Timeline

thakis created this revision.Apr 6 2022, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 7:10 AM
thakis requested review of this revision.Apr 6 2022, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 7:10 AM
thakis edited the summary of this revision. (Show Details)Apr 6 2022, 7:14 AM

(Note to self: Upstream bug is https://crbug.com/1280965)

thakis edited the summary of this revision. (Show Details)Apr 6 2022, 7:15 AM
thakis edited the summary of this revision. (Show Details)
thakis edited the summary of this revision. (Show Details)
thakis added a subscriber: Restricted Project.
thakis added a comment.Apr 6 2022, 7:59 AM

Didn't see the LLVM.tools/dsymutil/ARM::extern-alias.test failure locally since I'm building with only X86 and Aarch64 enabled. Looking…

thakis updated this revision to Diff 420893.Apr 6 2022, 8:42 AM

This fixes the test. It found a problem, I had confused main binary and object file values.

Need to re-run numbers since we do some more work now – but the improvement in O() complexity remains, so hopefully it's still a good improvement.

JDevlieghere accepted this revision.Apr 6 2022, 8:51 AM

Thanks Nico. LGTM modulo the inline nit.

llvm/tools/dsymutil/MachODebugMapParser.cpp
68

SeenValues sounds a bit generic. Maybe something like SeenAliasValues would make it more obvious that this is specific to aliases?

This revision is now accepted and ready to land.Apr 6 2022, 8:51 AM
thakis marked an inline comment as done.Apr 6 2022, 9:39 AM

Thanks!

I re-ran the numbers. With the tweaked patch, needed 9m30s on the first run and 8m30s on the second run (warmer disk cache), so doing this correctly didn't cost a lot of performance. Still way over 2x faster than HEAD, and still 25% faster than HEAD with D89444 reverted.

This revision was landed with ongoing or failed builds.Apr 6 2022, 9:40 AM
This revision was automatically updated to reflect the committed changes.