This is an archive of the discontinued LLVM Phabricator instance.

[lld/mac] Write every weak symbol only once in the output
ClosedPublic

Authored by thakis on May 7 2021, 8:22 AM.

Details

Reviewers
int3
gkm
Group Reviewers
Restricted Project
Commits
rGd5a70db1938c: [lld/mac] Write every weak symbol only once in the output
Summary

Before this, if an inline function was defined in several input files,
lld would write each copy of the inline function to the output. With this
patch, it only writes one copy.

This reduces the size of Chromium Framework from 378MB to 345MB (compared
to 290MB when linked with ld64, which also does dead-stripping, something
lld doesn't do yet), and makes linking it faster:

    N           Min           Max        Median           Avg        Stddev
x  10     3.9957051     4.3496981     4.1411121      4.156837    0.10092097
+  10      3.908154      4.169318     3.9712729     3.9846753   0.075773012
Difference at 95.0% confidence
        -0.172162 +/- 0.083847
        -4.14165% +/- 2.01709%
        (Student's t, pooled s = 0.0892373)
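
For reference, the interval in the ministat output above can be reproduced from the two summary rows alone. This is a minimal sketch, assuming a pooled-variance two-sample comparison and a two-sided 95% critical value of 2.101 for 18 degrees of freedom (taken from a standard t-table, not from the output itself). The `Sample`, `Interval`, and `compare` names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Summary statistics for one ministat row (illustrative type).
struct Sample {
  int n;
  double mean;
  double stddev;
};

struct Interval {
  double diff;      // mean(after) - mean(before)
  double halfWidth; // +/- margin at the chosen confidence level
  double pooled;    // pooled standard deviation
};

// Pooled-variance two-sample comparison. tCritical is the two-sided critical
// value for n1 + n2 - 2 degrees of freedom (assumed 2.101 for 18 dof at 95%).
Interval compare(const Sample &before, const Sample &after, double tCritical) {
  assert(before.n > 1 && after.n > 1);
  int dof = before.n + after.n - 2;
  double pooledVar = ((before.n - 1) * before.stddev * before.stddev +
                      (after.n - 1) * after.stddev * after.stddev) /
                     dof;
  double pooled = std::sqrt(pooledVar);
  double halfWidth =
      tCritical * pooled * std::sqrt(1.0 / before.n + 1.0 / after.n);
  Interval result = {after.mean - before.mean, halfWidth, pooled};
  return result;
}
```

Calling `compare` with the `x` row as `before`, the `+` row as `after`, and 2.101 as the critical value should give values close to the reported difference, margin, and pooled s.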

Implementation-wise, when merging two weak symbols, this sets a
"canOmitFromOutput" flag on the InputSection belonging to the weak symbol that
is not put in the symbol table. We then don't write InputSections that have this
flag set, as long as they are not referenced from other symbols. (Such references
happen e.g. for object files that don't set .subsections_via_symbols or that use
.alt_entry.)
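
The flag-and-skip mechanism described above can be sketched roughly as follows. This is a simplified, self-contained model and not the code in the patch: only the `InputSection` and `canOmitFromOutput` names come from the summary, while the `Symbol` struct, the `symbolRefs` counter, and the `mergeWeak` and `shouldWrite` functions are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Simplified stand-in for lld's InputSection.
struct InputSection {
  int symbolRefs = 0;             // hypothetical: symbols defined in this section
  bool canOmitFromOutput = false; // set when a weak definition here lost a merge
};

// Simplified stand-in for a weak defined symbol.
struct Symbol {
  std::string name;
  InputSection *isec;
};

typedef std::unordered_map<std::string, Symbol> SymbolTable;

// Keep the first weak definition of each name in the symbol table and mark
// the section of every later duplicate as omittable.
void mergeWeak(SymbolTable &symtab, const Symbol &candidate) {
  assert(candidate.isec != nullptr);
  ++candidate.isec->symbolRefs;
  bool inserted = symtab.insert({candidate.name, candidate}).second;
  if (!inserted)
    candidate.isec->canOmitFromOutput = true;
}

// A section is skipped only if it was marked omittable and no other symbol
// lives in it. Sections shared by several symbols are always written.
bool shouldWrite(const InputSection &isec) {
  return !(isec.canOmitFromOutput && isec.symbolRefs <= 1);
}
```

In this model, a section that holds only a losing weak definition is dropped, while one that also holds a second symbol (as with .alt_entry) is kept, mirroring the "referenced from more than one symbol" restriction.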

Some restrictions:

  • not yet done for bitcode inputs
  • no "comdat" handling (kindNoneGroupSubordinate* in ld64) -- Frame Descriptor Entries (FDEs), Language Specific Data Areas (LSDAs) (that is, catch block unwind information) and Personality Routines associated with weak functions are still not stripped. This is wasteful, but harmless.
  • However, this does strip weaks from __unwind_info (which is needed for correctness and not just for size)
  • This nopes out on InputSections that are referenced from more than one symbol (e.g. from .alt_entry) for now

Things that work based on symbols Just Work:

  • map files (change in MapFile.cpp is no-op and not needed; I just found it a bit more explicit)
  • exports

Things that work with InputSections need to explicitly check whether
an InputSection is written (e.g. unwind info).

This patch is useful in itself, but it's also likely a useful foundation
for dead_strip.

I used to have a "canonicalRepresentative" pointer on InputSection instead of
just the bool, which would be handy for ICF too. But I ended up not needing it
for this patch, so I removed that again for now.

Event Timeline

thakis created this revision. May 7 2021, 8:22 AM
Herald added a project: Restricted Project.
thakis requested review of this revision. May 7 2021, 8:22 AM
gkm added a subscriber: gkm. May 7 2021, 9:16 AM
gkm accepted this revision. May 7 2021, 12:49 PM

LGTM

lld/MachO/InputFiles.cpp
614–615

These are already specified as initial values in the class declaration.

lld/MachO/UnwindInfoSection.cpp
218–219

... or omit entirely.

lld/test/MachO/weak-definition-gc.s
49
This revision is now accepted and ready to land. May 7 2021, 12:49 PM
thakis marked 2 inline comments as done. May 7 2021, 2:11 PM

Thanks!

lld/MachO/InputFiles.cpp
614–615

This is actually a bit subtle: the creation 3 lines up uses the copy ctor, so we do need to reset to the default here. (This is covered by the test.)
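
To illustrate the copy-constructor point, here is a minimal sketch, not the code from InputFiles.cpp. Only `InputSection` and `canOmitFromOutput` are names from this review; the `splitOff` helper is hypothetical.

```cpp
#include <cassert>

// Simplified stand-in for lld's InputSection.
struct InputSection {
  // The in-class initializer only applies to default construction.
  bool canOmitFromOutput = false;
};

// Hypothetical helper that creates a new section by copying an existing one.
InputSection splitOff(const InputSection &orig) {
  InputSection copy(orig); // copy ctor: inherits orig's current field values
  // The in-class default is not re-applied on copy, so reset it explicitly.
  copy.canOmitFromOutput = false;
  return copy;
}
```

Without the explicit reset, a section copied from one that was already marked omittable would start out marked as well.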

This revision was landed with ongoing or failed builds. May 7 2021, 2:11 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. May 7 2021, 2:12 PM