This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Enable EH frame relocation / pruning
ClosedPublic

Authored by int3 on Jul 12 2022, 12:12 AM.

Details

Reviewers
MaskRay
oontvoo
Group Reviewers
Restricted Project
Commits
rG403d61aeddec: [lld-macho] Enable EH frame relocation / pruning
Summary

This just removes the code that gates the logic. The main issue here is
perf impact: without D122258: [MC] Omit DWARF unwind info if compact unwind is present where eligible, LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:

chromium_framework

This is without the use of -femit-dwarf-unwind=no-compact-unwind:

           base           diff           difference (95% CI)
sys_time   1.826 ± 0.019  1.962 ± 0.034  [  +6.5% ..   +8.4%]
user_time  9.306 ± 0.054  9.926 ± 0.082  [  +6.2% ..   +7.1%]
wall_time  8.225 ± 0.068  8.947 ± 0.128  [  +8.0% ..   +9.6%]
samples    15             22

With that flag enabled, the regression mostly disappears, as hoped:

           base           diff           difference (95% CI)
sys_time   1.839 ± 0.062  1.866 ± 0.068  [  -0.9% ..   +3.8%]
user_time  9.452 ± 0.068  9.490 ± 0.067  [  -0.1% ..   +0.9%]
wall_time  8.383 ± 0.127  8.452 ± 0.114  [  -0.1% ..   +1.8%]
samples    17             21

Unnamed internal app

Without -femit-dwarf-unwind, this is the perf hit:

           base           diff           difference (95% CI)
sys_time   1.372 ± 0.029  1.317 ± 0.024  [  -4.6% ..   -3.5%]
user_time  2.835 ± 0.028  2.980 ± 0.027  [  +4.8% ..   +5.4%]
wall_time  3.205 ± 0.079  3.383 ± 0.066  [  +4.9% ..   +6.2%]
samples    102            83

With -femit-dwarf-unwind, the perf hit almost disappears:

           base           diff           difference (95% CI)
sys_time   1.274 ± 0.026  1.270 ± 0.025  [  -0.9% ..   +0.3%]
user_time  2.812 ± 0.023  2.822 ± 0.035  [  +0.1% ..   +0.7%]
wall_time  3.166 ± 0.047  3.174 ± 0.059  [  -0.2% ..   +0.7%]
samples    95             97

Just for fun, I measured the impact of -femit-dwarf-unwind on ld64
(base has the extra DWARF unwind info in the input object files,
diff doesn't):

           base           diff           difference (95% CI)
sys_time   1.128 ± 0.010  1.124 ± 0.023  [  -1.3% ..   +0.6%]
user_time  7.176 ± 0.030  7.106 ± 0.094  [  -1.5% ..   -0.4%]
wall_time  7.874 ± 0.041  7.795 ± 0.121  [  -1.7% ..   -0.3%]
samples    16             25

And for LLD:

           base           diff           difference (95% CI)
sys_time   1.315 ± 0.019  1.280 ± 0.019  [  -3.2% ..   -2.0%]
user_time  2.980 ± 0.022  2.822 ± 0.016  [  -5.5% ..   -5.0%]
wall_time  3.369 ± 0.038  3.175 ± 0.033  [  -6.2% ..   -5.3%]
samples    47             47

So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...

Diff Detail

Event Timeline

int3 created this revision.Jul 12 2022, 12:12 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 12 2022, 12:12 AM
int3 requested review of this revision.Jul 12 2022, 12:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2022, 12:12 AM
int3 edited the summary of this revision. (Show Details)Jul 12 2022, 12:17 AM
int3 added subscribers: oontvoo, thakis.

@thakis @oontvoo y'all might want to have a look at this since deploying it w/o a perf will require adding a new flag to your builds

The patch description of D122258 doesn't mention any -femit-dwarf-unwind flags, and -femit-dwarf-unwind=no-compact-unwind to me isn't very self-explanatory. So just to make sure I got it right, from reading through the diff:

  • On Intel, if I pass -femit-dwarf-unwind=no-compact-unwind, clang will only emit dwarf unwind information for functions where compact unwind information can't express their unwinding behavior (i.e. almost nothing gets dwarf unwind info)
  • On arm, that's the default behavior even without that flag (?)

So we should add -femit-dwarf-unwind=no-compact-unwind to our builds, yes? Want me to make repro files at the same chromium rev with and without that flag? Is it sufficient to do that on intel, or does the flag _have_ an effect on arm?

(Also, FYI, in case you want to do more timing, I _think_ mold does eh frame handling (not 100% sure though) and it's advanced a lot in the last few weeks and can link Chromium Framework as of 3 weeks ago or so…)

int3 added a comment.Jul 12 2022, 10:01 AM

Want me to make repro files at the same chromium rev with and without that flag?

That would be great, yes :)

Is it sufficient to do that on intel, or does the flag _have_ an effect on arm?

D122258: [MC] Omit DWARF unwind info if compact unwind is present where eligible makes it so the default behavior on arm64 and arm64_32 is to omit the redundant DWARF info, so the flag only has an effect on other platforms. So Intel and I guess arm32 too (but of course LLD doesn't handle that so it's moot)

would you be ok with flipping the default? (ie., preserving current behaviour?) rationale being: we are adding to the [n]ever-ending list of knobs that one needs to set to have compatible behaviours with LD64 and I worry it's getting a bit ugly.
if not, can we also add a note for this to the documentation somewhere? Otherwise, we might "forget" to set it and spend countless hours debugging stuff :\

what's ld64's default?

int3 added a comment.Jul 12 2022, 7:08 PM

we are adding to the [n]ever-ending list of knobs that one needs to set to have compatible behaviours with LD64 and I worry it's getting a bit ugly

But parsing / relocating / pruning EH frames *is* what LD64 is doing, so to be compatible with it, we should turn this on :)

It's also a correctness issue -- w/o this flag, we aren't actually applying the right relocations to our EH frames.

Moreover the compiler flag is only needed for perf reasons, and it helps both LLD and LD64, so it's something that can likely be turned on globally (unless you are targeting old x86_64 platforms)

if not, can we also add a note for this to the documentation somewhere?

This should be documented somewhere for sure, but the question is where... it's not really a diff between ld64 and LLD, so ld64-vs-lld.rst is not the right place. I will of course put it in the release notes for sure when the time comes for the llvm-15 cut. And I suppose once we have a full-blown LLD-MachO doc page we can mention this compiler flag there?

int3 added a comment.Jul 12 2022, 7:13 PM

Re benchmarking mold: it doesn't look like it builds on macos yet, and I'm a little lazy to set up a Linux box for now...

int3 updated this revision to Diff 444343.Jul 13 2022, 11:22 AM

add to release notes

oontvoo accepted this revision.Jul 13 2022, 4:24 PM

Re benchmarking mold: it doesn't look like it builds on macos yet, and I'm a little lazy to set up a Linux box for now...

i have a linux box (probably well setup) - can run this if you'd like

This revision is now accepted and ready to land.Jul 13 2022, 4:24 PM
int3 added a comment.Jul 13 2022, 4:28 PM

Thanks! Thakis told me it does build on macOS so I tried again, and it turns out I was just building with the wrong compiler (locally-built copy instead of the system one). So I have a working build now, will benchmark later when I'm no longer using the machine :)

int3 added a comment.Jul 13 2022, 4:40 PM

Well, it looks like mold segfaults on the internal build that I was testing on, so no numbers for now

Here are repro files. Sorry it took a bit.

Chromium Framework for intel: https://drive.google.com/file/d/10uskfM01xf86XW3Qk8_eJDoQO9h5831W/view?usp=sharing

Same, but with -femit-dwarf-unwind=no-compact-unwind in cflags (but not in asmflags – from what I understand, the flag has no effect for asm files): https://drive.google.com/file/d/1GG1xAGhvIC3vrofJpskxuYSTfQz2w--N/view?usp=sharing

(at chromium rev 3dac9776078c561f24f,
use_goma = true
is_debug = false
symbol_level = 0

followed steps in https://bugs.llvm.org/show_bug.cgi?id=48657#c0

my local diff for adding the flag:

diff --git a/build/config/mac/BUILD.gn b/build/config/mac/BUILD.gn
index fa114a572138c..db0eedf7de21e 100644
--- a/build/config/mac/BUILD.gn
+++ b/build/config/mac/BUILD.gn
@@ -55,6 +55,9 @@ config("compiler") {
   if (export_libcxxabi_from_executables) {
     ldflags += [ "-Wl,-undefined,dynamic_lookup" ]
   }
+  cflags += [ "-femit-dwarf-unwind=no-compact-unwind" ]
 }

)

The flag shouldn't be needed on iOS since that's arm64…oh I guess maybe for iOS simulator it's needed too, right?

int3 edited the summary of this revision. (Show Details)Jul 13 2022, 6:09 PM

Here are repro files.

Thanks! I've updated the commit message with the numbers. Looks like -femit-dwarf-unwind does achieve its intended effect.

oh I guess maybe for iOS simulator it's needed too, right?

yep

int3 retitled this revision from [lld-macho] Enable EH frame parsing / pruning to [lld-macho] Enable EH frame relocation / pruning.Jul 13 2022, 6:13 PM
This revision was automatically updated to reflect the committed changes.

Thanks!

It'd be nice if the commit description was more explicit about -femit-dwarf-unwind=no-compact-unwind only having an effect on intel (or explaining the flag at all – as far as I can tell, there isn't a place that describes the flag well.)

d'oh, too slow :/

int3 added a comment.Jul 13 2022, 6:20 PM

Oh whoops I should have waited a bit heh. Well maybe I can add it somewhere else -- Options.td? I was looking at clang.rst too but it doesn't look like most of its flags are documented there

Oh whoops I should have waited a bit heh. Well maybe I can add it somewhere else -- Options.td? I was looking at clang.rst too but it doesn't look like most of its flags are documented there

+1 for Options.td

I don't think any user reads Options.td.

https://clang.llvm.org/docs/UsersManual.html looks like it might be the right place?

I don't think any user reads Options.td.

well that's where I usually looks up lld's related flags/options :)

https://clang.llvm.org/docs/UsersManual.html looks like it might be the right place?

int3 added a comment.Jul 14 2022, 6:49 AM

Okay, added it here: D129772: [clang] Document -femit-compact-unwind option in the User’s Manual

well that's where I usually looks up lld's related flags/options :)

The flag is mentioned there, though the eligible flag values and the reason for the flag's existence is not really explained. But judging from the other flags in that file, it isn't the best place for lengthy descriptions