This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Teach ICF to dedup functions with identical unwind info
ClosedPublic

Authored by int3 on Sep 16 2021, 8:37 PM.

Details

Summary

Dedup'ing unwind info is tricky because each CUE contains a different
function address, if ICF operated naively and compared the entire
contents of each CUE, entries with identical unwind info but belonging
to different functions would never be considered identical. To work
around this problem, we slice away the function address before
performing ICF. We rely on relocateCompactUnwind() to correctly handle
these truncated input sections.

Here are the numbers before and after D109944, D109945, and this diff
were applied, as tested on my 3.2 GHz 16-Core Intel Xeon W:

Without any optimizations:

           base           diff           difference (95% CI)
sys_time   0.849 ± 0.015  0.896 ± 0.012  [  +4.8% ..   +6.2%]
user_time  3.357 ± 0.030  3.512 ± 0.023  [  +4.3% ..   +5.0%]
wall_time  3.944 ± 0.039  4.032 ± 0.031  [  +1.8% ..   +2.6%]
samples    40             38

With -dead_strip:

           base           diff           difference (95% CI)
sys_time   0.847 ± 0.010  0.896 ± 0.012  [  +5.2% ..   +6.5%]
user_time  3.377 ± 0.014  3.532 ± 0.015  [  +4.4% ..   +4.8%]
wall_time  3.962 ± 0.024  4.060 ± 0.030  [  +2.1% ..   +2.8%]
samples    47             30

With -dead_strip and --icf=all:

           base           diff           difference (95% CI)
sys_time   0.935 ± 0.013  0.957 ± 0.018  [  +1.5% ..   +3.2%]
user_time  3.472 ± 0.022  6.531 ± 0.046  [ +87.6% ..  +88.7%]
wall_time  4.080 ± 0.040  5.329 ± 0.060  [ +30.0% ..  +31.2%]
samples    37             30

Unsurprisingly, ICF is now a lot slower, likely due to the much larger
number of input sections it needs to process. But the rest of the
linker only suffers a mild slowdown.

Note that the compact-unwind-bad-reloc.s test was expanded because we
now handle the relocation for CUE's function address in a separate code
path from the rest of the CUE relocations. The extended test covers both
code paths.

Diff Detail

Event Timeline

int3 created this revision.Sep 16 2021, 8:37 PM
Herald added a project: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Sep 16 2021, 8:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2021, 8:37 PM
int3 updated this revision to Diff 373134.Sep 16 2021, 8:44 PM

add comment

Unsurprisingly, ICF is now a lot slower, likely due to the much larger number of input sections it needs to process.

Is it worth making this an option that could be turned off?

int3 added a comment.Sep 17 2021, 11:09 AM

If there's demand for it I guess we can add it later. Maybe as part of some -O1/2/3 flags.

thakis added a subscriber: thakis.Oct 26 2021, 5:58 PM

I patched this in to play with it, and I discovered that this makes lld crash for the repro in comment 0 of https://bugs.llvm.org/show_bug.cgi?id=50411.

int3 planned changes to this revision.Oct 27 2021, 3:08 PM
int3 requested review of this revision.Oct 27 2021, 8:32 PM

That crash gets fixed by D112687. I think it'd be fairly awkward/tricky to write a test for that particular crash though

int3 updated this revision to Diff 384461.Nov 3 2021, 8:35 AM

update

int3 added a comment.Nov 12 2021, 12:53 PM

Sigh I just spent a decent amount of time setting up a Linux env to track down my test failures in D113721, only to find that the culprit was really that I had it stacked upon this diff in my local copy :/

Can I get a stamp to land this?

This revision is now accepted and ready to land.Nov 12 2021, 12:57 PM