This is an archive of the discontinued LLVM Phabricator instance.

Differential D129540

[lld-macho] Enable EH frame relocation / pruning
ClosedPublic

Authored by int3 on Jul 12 2022, 12:12 AM.

Details

Reviewers

MaskRay
oontvoo

Group Reviewers

Restricted Project

Commits

rG403d61aeddec: [lld-macho] Enable EH frame relocation / pruning

Summary

This just removes the code that gates the logic. The main issue here is
perf impact: without D122258 ("[MC] Omit DWARF unwind info if compact
unwind is present where eligible"), LLD takes a significant perf hit because
it now has to do a lot more work in the input parsing phase. But with
that change to eliminate unnecessary EH frames from input object files,
the perf overhead here is minimal. Concretely, here are the numbers for
some builds as measured on my 16-core Mac Pro:

chromium_framework

This is without the use of -femit-dwarf-unwind=no-compact-unwind:

           base           diff           difference (95% CI)
sys_time   1.826 ± 0.019  1.962 ± 0.034  [  +6.5% ..   +8.4%]
user_time  9.306 ± 0.054  9.926 ± 0.082  [  +6.2% ..   +7.1%]
wall_time  8.225 ± 0.068  8.947 ± 0.128  [  +8.0% ..   +9.6%]
samples    15             22

With that flag enabled, the regression mostly disappears, as hoped:

           base           diff           difference (95% CI)
sys_time   1.839 ± 0.062  1.866 ± 0.068  [  -0.9% ..   +3.8%]
user_time  9.452 ± 0.068  9.490 ± 0.067  [  -0.1% ..   +0.9%]
wall_time  8.383 ± 0.127  8.452 ± 0.114  [  -0.1% ..   +1.8%]
samples    17             21

Unnamed internal app

Without -femit-dwarf-unwind, this is the perf hit:

           base           diff           difference (95% CI)
sys_time   1.372 ± 0.029  1.317 ± 0.024  [  -4.6% ..   -3.5%]
user_time  2.835 ± 0.028  2.980 ± 0.027  [  +4.8% ..   +5.4%]
wall_time  3.205 ± 0.079  3.383 ± 0.066  [  +4.9% ..   +6.2%]
samples    102            83

With -femit-dwarf-unwind, the perf hit almost disappears:

           base           diff           difference (95% CI)
sys_time   1.274 ± 0.026  1.270 ± 0.025  [  -0.9% ..   +0.3%]
user_time  2.812 ± 0.023  2.822 ± 0.035  [  +0.1% ..   +0.7%]
wall_time  3.166 ± 0.047  3.174 ± 0.059  [  -0.2% ..   +0.7%]
samples    95             97

Just for fun, I measured the impact of -femit-dwarf-unwind on ld64
(base has the extra DWARF unwind info in the input object files,
diff doesn't):

           base           diff           difference (95% CI)
sys_time   1.128 ± 0.010  1.124 ± 0.023  [  -1.3% ..   +0.6%]
user_time  7.176 ± 0.030  7.106 ± 0.094  [  -1.5% ..   -0.4%]
wall_time  7.874 ± 0.041  7.795 ± 0.121  [  -1.7% ..   -0.3%]
samples    16             25

And for LLD:

           base           diff           difference (95% CI)
sys_time   1.315 ± 0.019  1.280 ± 0.019  [  -3.2% ..   -2.0%]
user_time  2.980 ± 0.022  2.822 ± 0.016  [  -5.5% ..   -5.0%]
wall_time  3.369 ± 0.038  3.175 ± 0.033  [  -6.2% ..   -5.3%]
samples    47             47

So parsing the extra EH frames is a lot more expensive for us than for
ld64. But given that we are quite a lot faster than ld64 to begin with,
I guess this isn't entirely unexpected...
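
As a reading aid for the tables above, here is a minimal sketch of how the point estimate behind each "difference" column can be sanity-checked from the two means. It assumes the difference is the plain relative change of the diff mean over the base mean; the tool that produced the 95% confidence intervals is not named in this review, so the interval computation is not reproduced. The names `percentChange` and `within` are made up for illustration.

```cpp
// Percent change of the "diff" mean relative to the "base" mean.
// Positive results mean the diff build is slower than base.
constexpr double percentChange(double baseMean, double diffMean) {
  return (diffMean / baseMean - 1.0) * 100.0;
}

// True if `value` lies inside the closed interval [lo, hi].
constexpr bool within(double value, double lo, double hi) {
  return lo <= value && value <= hi;
}

// chromium_framework wall_time without the flag: 8.225 -> 8.947,
// reported interval [+8.0% .. +9.6%].
static_assert(within(percentChange(8.225, 8.947), 8.0, 9.6),
              "point estimate should fall inside the reported interval");
```

For the chromium_framework wall_time row this gives roughly +8.8%, which sits inside the reported interval.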

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

int3 created this revision. Jul 12 2022, 12:12 AM

Herald added projects: Restricted Project, Restricted Project. Jul 12 2022, 12:12 AM

int3 requested review of this revision. Jul 12 2022, 12:12 AM

Herald added a project: Restricted Project. Jul 12 2022, 12:12 AM

Herald added a subscriber: llvm-commits.

@thakis @oontvoo y'all might want to have a look at this, since deploying it w/o a perf regression will require adding a new flag to your builds

Harbormaster completed remote builds in B174801: Diff 443840. Jul 12 2022, 12:25 AM

The patch description of D122258 doesn't mention any -femit-dwarf-unwind flags, and -femit-dwarf-unwind=no-compact-unwind to me isn't very self-explanatory. So just to make sure I got it right, from reading through the diff:

On Intel, if I pass -femit-dwarf-unwind=no-compact-unwind, clang will only emit dwarf unwind information for functions where compact unwind information can't express their unwinding behavior (i.e. almost nothing gets dwarf unwind info)
On arm, that's the default behavior even without that flag (?)

So we should add -femit-dwarf-unwind=no-compact-unwind to our builds, yes? Want me to make repro files at the same chromium rev with and without that flag? Is it sufficient to do that on intel, or does the flag _have_ an effect on arm?

(Also, FYI, in case you want to do more timing, I _think_ mold does eh frame handling (not 100% sure though) and it's advanced a lot in the last few weeks and can link Chromium Framework as of 3 weeks ago or so…)

Want me to make repro files at the same chromium rev with and without that flag?

That would be great, yes :)

Is it sufficient to do that on intel, or does the flag _have_ an effect on arm?

D122258 ("[MC] Omit DWARF unwind info if compact unwind is present where eligible") makes it so the default behavior on arm64 and arm64_32 is to omit the redundant DWARF info, so the flag only has an effect on other platforms: Intel, and I guess arm32 too (but of course LLD doesn't handle that, so it's moot).
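
To make the per-architecture behavior discussed above concrete, here is a rough sketch of the decision as described in this thread. It is not clang's actual implementation: the enum names and the function `emitsDwarfUnwind` are hypothetical, and the `Always` mode is an assumption added to model "emit DWARF unwind info unconditionally", which is the non-arm64 default described here.

```cpp
// Hypothetical names for illustration; these are not clang's internal types.
enum class Arch { X86_64, Arm64, Arm64_32 };
enum class EmitDwarfUnwind { Default, Always, NoCompactUnwind };

// Per the thread: arm64 and arm64_32 omit redundant DWARF unwind info by
// default; other targets keep emitting it unless the flag is passed.
constexpr bool omitsRedundantByDefault(Arch arch) {
  return arch == Arch::Arm64 || arch == Arch::Arm64_32;
}

// Returns true if DWARF unwind info would be emitted for a function, given
// whether compact unwind can express that function's unwinding behavior.
constexpr bool emitsDwarfUnwind(Arch arch, EmitDwarfUnwind mode,
                                bool compactUnwindEncodable) {
  return mode == EmitDwarfUnwind::Always            ? true
         : mode == EmitDwarfUnwind::NoCompactUnwind ? !compactUnwindEncodable
         : omitsRedundantByDefault(arch)            ? !compactUnwindEncodable
                                                    : true;
}

// x86_64 with -femit-dwarf-unwind=no-compact-unwind: no DWARF entry for a
// function that compact unwind can already describe.
static_assert(!emitsDwarfUnwind(Arch::X86_64, EmitDwarfUnwind::NoCompactUnwind,
                                /*compactUnwindEncodable=*/true),
              "redundant DWARF unwind info is omitted");
```

Under this model the flag changes nothing on arm64, which matches the answer above that it only has an effect on other platforms.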

Would you be OK with flipping the default (i.e., preserving current behaviour)? Rationale being: we are adding to the [n]ever-ending list of knobs that one needs to set to have compatible behaviours with LD64, and I worry it's getting a bit ugly.
If not, can we also add a note for this to the documentation somewhere? Otherwise, we might "forget" to set it and spend countless hours debugging stuff :\

what's ld64's default?

we are adding to the [n]ever-ending list of knobs that one needs to set to have compatible behaviours with LD64 and I worry it's getting a bit ugly

But parsing / relocating / pruning EH frames *is* what LD64 is doing, so to be compatible with it, we should turn this on :)

It's also a correctness issue -- w/o this flag, we aren't actually applying the right relocations to our EH frames.

Moreover the compiler flag is only needed for perf reasons, and it helps both LLD and LD64, so it's something that can likely be turned on globally (unless you are targeting old x86_64 platforms)

if not, can we also add a note for this to the documentation somewhere?

This should be documented somewhere for sure, but the question is where... it's not really a diff between ld64 and LLD, so ld64-vs-lld.rst is not the right place. I will of course put it in the release notes when the time comes for the llvm-15 cut. And I suppose once we have a full-blown LLD-MachO doc page we can mention this compiler flag there?

Re benchmarking mold: it doesn't look like it builds on macos yet, and I'm a little lazy to set up a Linux box for now...

add to release notes

Herald added a reviewer: MaskRay. Jul 13 2022, 11:22 AM

Harbormaster completed remote builds in B175182: Diff 444343. Jul 13 2022, 1:06 PM

Herald added a subscriber: StephenFan. Jul 13 2022, 1:06 PM

In D129540#3647173, @int3 wrote:

Re benchmarking mold: it doesn't look like it builds on macos yet, and I'm a little lazy to set up a Linux box for now...

i have a linux box (probably well setup) - can run this if you'd like

This revision is now accepted and ready to land. Jul 13 2022, 4:24 PM

Thanks! Thakis told me it does build on macOS so I tried again, and it turns out I was just building with the wrong compiler (locally-built copy instead of the system one). So I have a working build now, will benchmark later when I'm no longer using the machine :)

Well, it looks like mold segfaults on the internal build that I was testing on, so no numbers for now

Here are repro files. Sorry it took a bit.

Chromium Framework for intel: https://drive.google.com/file/d/10uskfM01xf86XW3Qk8_eJDoQO9h5831W/view?usp=sharing

Same, but with -femit-dwarf-unwind=no-compact-unwind in cflags (but not in asmflags – from what I understand, the flag has no effect for asm files): https://drive.google.com/file/d/1GG1xAGhvIC3vrofJpskxuYSTfQz2w--N/view?usp=sharing

(at chromium rev 3dac9776078c561f24f,
use_goma = true
is_debug = false
symbol_level = 0

followed steps in https://bugs.llvm.org/show_bug.cgi?id=48657#c0

my local diff for adding the flag:

diff --git a/build/config/mac/BUILD.gn b/build/config/mac/BUILD.gn
index fa114a572138c..db0eedf7de21e 100644
--- a/build/config/mac/BUILD.gn
+++ b/build/config/mac/BUILD.gn
@@ -55,6 +55,9 @@ config("compiler") {
   if (export_libcxxabi_from_executables) {
     ldflags += [ "-Wl,-undefined,dynamic_lookup" ]
   }
+  cflags += [ "-femit-dwarf-unwind=no-compact-unwind" ]
 }

)

The flag shouldn't be needed on iOS since that's arm64…oh I guess maybe for iOS simulator it's needed too, right?

Here are repro files.

Thanks! I've updated the commit message with the numbers. Looks like -femit-dwarf-unwind does achieve its intended effect.

oh I guess maybe for iOS simulator it's needed too, right?

yep

int3 retitled this revision from [lld-macho] Enable EH frame parsing / pruning to [lld-macho] Enable EH frame relocation / pruning. Jul 13 2022, 6:13 PM

Closed by commit rG403d61aeddec: [lld-macho] Enable EH frame relocation / pruning (authored by int3). Jul 13 2022, 6:14 PM

This revision was automatically updated to reflect the committed changes.

int3 added a commit: rG403d61aeddec: [lld-macho] Enable EH frame relocation / pruning.

Thanks!

It'd be nice if the commit description was more explicit about -femit-dwarf-unwind=no-compact-unwind only having an effect on intel (or explaining the flag at all – as far as I can tell, there isn't a place that describes the flag well.)

d'oh, too slow :/

Oh whoops I should have waited a bit heh. Well maybe I can add it somewhere else -- Options.td? I was looking at clang.rst too but it doesn't look like most of its flags are documented there

In D129540#3650381, @int3 wrote:

Oh whoops I should have waited a bit heh. Well maybe I can add it somewhere else -- Options.td? I was looking at clang.rst too but it doesn't look like most of its flags are documented there

+1 for Options.td

I don't think any user reads Options.td.

https://clang.llvm.org/docs/UsersManual.html looks like it might be the right place?

In D129540#3651283, @thakis wrote:

I don't think any user reads Options.td.

well that's where I usually look up lld's related flags/options :)

https://clang.llvm.org/docs/UsersManual.html looks like it might be the right place?

Okay, added it here: D129772: [clang] Document -femit-compact-unwind option in the User’s Manual

well that's where I usually look up lld's related flags/options :)

The flag is mentioned there, though the eligible flag values and the reason for the flag's existence are not really explained. But judging from the other flags in that file, it isn't the best place for lengthy descriptions

Revision Contents

Path                         Size
lld/MachO/Config.h           3 lines
lld/MachO/Driver.cpp         1 line
lld/MachO/InputFiles.cpp     4 lines
lld/docs/ReleaseNotes.rst    6 lines

Diff 444482

lld/MachO/Config.h

@@ (125 lines not shown) @@ struct Configuration {
   bool emitDataInCodeInfo = false;
   bool emitEncryptionInfo = false;
   bool timeTraceEnabled = false;
   bool dataConst = false;
   bool dedupLiterals = true;
   bool omitDebugInfo = false;
   bool warnDylibInstallName = false;
   bool ignoreOptimizationHints = false;
-  // Temporary config flag that will be removed once we have fully implemented
-  // support for __eh_frame.
-  bool parseEhFrames = false;
   uint32_t headerPad;
   uint32_t dylibCompatibilityVersion = 0;
   uint32_t dylibCurrentVersion = 0;
   uint32_t timeTraceGranularity = 500;
   unsigned optimize;
   std::string progName;

   // For `clang -arch arm64 -arch x86_64`, clang will:
@@ (76 lines not shown) @@

lld/MachO/Driver.cpp

@@ (1,299 lines not shown) @@ config->dedupLiterals =
       args.hasFlag(OPT_deduplicate_literals, OPT_icf_eq, false) ||
       config->icfLevel != ICFLevel::none;
   config->warnDylibInstallName = args.hasFlag(
       OPT_warn_dylib_install_name, OPT_no_warn_dylib_install_name, false);
   config->ignoreOptimizationHints = args.hasArg(OPT_ignore_optimization_hints);
   config->callGraphProfileSort = args.hasFlag(
       OPT_call_graph_profile_sort, OPT_no_call_graph_profile_sort, true);
   config->printSymbolOrder = args.getLastArgValue(OPT_print_symbol_order);
-  config->parseEhFrames = static_cast<bool>(getenv("LLD_IN_TEST"));

   // FIXME: Add a commandline flag for this too.
   config->zeroModTime = getenv("ZERO_AR_DATE");

   std::array<PlatformType, 3> encryptablePlatforms{
       PLATFORM_IOS, PLATFORM_WATCHOS, PLATFORM_TVOS};
   config->emitEncryptionInfo =
       args.hasFlag(OPT_encryptable, OPT_no_encryption,
@@ (323 lines not shown) @@

lld/MachO/InputFiles.cpp

@@ (341 lines not shown) @@ if (sectionType(sec.flags) == S_CSTRING_LITERALS ||
   // FIXME: parallelize this?
   cast<CStringInputSection>(isec)->splitIntoPieces();
   } else {
   isec = make<WordLiteralInputSection>(section, data, align);
   }
   section.subsections.push_back({0, isec});
   } else if (auto recordSize = getRecordSize(segname, name)) {
   splitRecords(*recordSize);
-  } else if (config->parseEhFrames && name == section_names::ehFrame &&
+  } else if (name == section_names::ehFrame &&
   segname == segment_names::text) {
   splitEhFrames(data, *sections.back());
   } else if (segname == segment_names::llvm) {
   if (config->callGraphProfileSort && name == section_names::cgProfile)
   checkError(parseCallGraph(data, callGraph));
   // ld64 does not appear to emit contents from sections within the __LLVM
   // segment. Symbols within those sections point to bitcode metadata
   // instead of actual symbols. Global symbols within those sections could
@@ (753 lines not shown) @@ Section **s = StringSwitch<Section **>(sec->name)
   .Case(section_names::compactUnwind, &compactUnwindSection)
   .Case(section_names::ehFrame, &ehFrameSection)
   .Default(nullptr);
   if (s)
   *s = sec;
   }
   if (compactUnwindSection)
   registerCompactUnwind(*compactUnwindSection);
-  if (config->parseEhFrames && ehFrameSection)
+  if (ehFrameSection)
   registerEhFrames(*ehFrameSection);
   }

   template <class LP> void ObjFile::parseLazy() {
   using Header = typename LP::mach_header;
   using NList = typename LP::nlist;

   auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
@@ (1,064 lines not shown) @@

lld/docs/ReleaseNotes.rst

@@ (59 lines not shown) @@
 MinGW Improvements
 ------------------

 * ...

 MachO Improvements
 ------------------

-* Item 1.
+* We now support proper relocation and pruning of EH frames. Note: this
+  comes at some performance overhead on x86_64 builds, and we recommend adding
+  the ``-femit-compact-unwind=no-compact-unwind`` compile flag to avoid it.
+  (`D129540 <https://reviews.llvm.org/D129540>`_,
+  `D122258 <https://reviews.llvm.org/D122258>`_)

 WebAssembly Improvements
 ------------------------