Page MenuHomePhabricator

[lld-macho] Enable copy-on-write for input buffers
AbandonedPublic

Authored by int3 on Jul 16 2021, 11:41 PM.

Details

Reviewers
gkm
Group Reviewers
Restricted Project
Summary

The Mach-O format makes extensive use of embedded addends,
particularly for x86_64. This makes ICF's job a bit more difficult:
sections that are otherwise semantically identical may have different
raw bytes due to the embedded addends. Hashing and comparing the raw
bytes naively means that we miss folding opportunities.

This diff canonicalizes these sections by writing over those embedded
addends with zeros. In order to make this possible, the our previously
read-only MemoryBuffers need to be made into copy-on-write.

This requires some awkward casts, but I'm not sure there's a better way to do
things...

Diff Detail

Unit TestsFailed

TimeTest
2,910 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
3,040 msx64 debian > libarcher.parallel::parallel-firstprivate.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-firstprivate.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-firstprivate.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-firstprivate.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-firstprivate.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-firstprivate.c
2,840 msx64 debian > libarcher.parallel::parallel-simple2.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c
2,750 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
2,750 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (16 Failed)

Event Timeline

int3 created this revision.Jul 16 2021, 11:41 PM
Herald added a project: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Jul 16 2021, 11:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2021, 11:41 PM
thakis added a subscriber: thakis.Jul 19 2021, 5:38 PM

How much of a difference does ICF'ing these make?

int3 added a comment.Jul 19 2021, 8:53 PM

Right now, pretty small. (0.3% for the internal build I'm testing, and 0.002% for chromium_framework.) But I'm building upon this to apply ICF to functions that have unwind info, since CompactUnwindEntries use unsigned section relocations -- which embed their addends -- to refer to functions. This has a much larger impact -- 3.2% for the internal build, about doubling the impact that ICF currently has. Not sure about chromium_framework yet, as my implementation is currently crashing.

OTOH, we might want CompactUnwindEntries to eventually use a special-cased subclass of InputSection for performance reasons, and that subclass could just ignore the functionAddress pointer when doing hashing / equality comparison. So deduping code with embedded addends is not a strict prerequisite. However, it does make the initial implementation of unwind ICF simpler, and moreover the code complexity here is largely self-contained -- unlike adding a new subclass, we don't have to add extra handling code throughout the linker.

Looks pretty good to me, other than the comment about the casting.

llvm/lib/Support/MemoryBufferRef.cpp
21

Is there any way to prevent casting if this was called on a non-writeable buffer? I was thinking of some flag based approach that is set for writeable buffer and have that be determine whether casting can be done. That way, this can be a bit safer than having to let the developer have context about whether this is safe or not.

Any idea what ld64 does in this case? Might be worth checking (should be testable, I think - give it a really big input section that it needs to canonicalize/fold, and see if that produces more total memory usage than if the section is small?) - and if it doesn't have the same negative effect on memory usage, maybe that's a hint that there's some other lower-(memory)-cost solution to the problem?

Any idea what ld64 does in this case? Might be worth checking (should be testable, I think - give it a really big input section that it needs to canonicalize/fold, and see if that produces more total memory usage than if the section is small?) - and if it doesn't have the same negative effect on memory usage, maybe that's a hint that there's some other lower-(memory)-cost solution to the problem?

I believe ld64 takes a different approach: it treats the fixups as-if-zero when comparing/hashing content, without sacrificing the mmap optimization.

int3 added a comment.Jul 26 2021, 4:30 PM

I am not able to get ld64 to dedup even the simplest of functions, let alone more complicated cases like these. Or perhaps I'm missing something in how ld64 is supposed to be invoked... this is what I've tried:

(base) ~/tmp: cat test.s
.text
.weak_definition _foo
.weak_definition _bar
.weak_definition _baz

_foo:
  ret

_bar:
  ret

_baz:
  ret

.subsections_via_symbols

(base) ~/tmp: llvm-mc -filetype=obj -triple=x86_64-apple-darwin19.0.0 test.s > test.o
(base) ~/tmp: ld test.o -o test -dylib
(base) ~/tmp: llvm-objdump -d test

test:	file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000003fb5 <_foo>:
    3fb5: c3                           	retq

0000000000003fb6 <_bar>:
    3fb6: c3                           	retq

0000000000003fb7 <_baz>:
    3fb7: c3                           	retq

Right now, pretty small. (0.3% for the internal build I'm testing, and 0.002% for chromium_framework.) But I'm building upon this to apply ICF to functions that have unwind info, since CompactUnwindEntries use unsigned section relocations -- which embed their addends -- to refer to functions. This has a much larger impact -- 3.2% for the internal build, about doubling the impact that ICF currently has. Not sure about chromium_framework yet, as my implementation is currently crashing.

How much size win do you need to catch up to ld64?

(It's probably clear that I'm asking because this seems like a somewhat ugly change conceptually, and I'm wondering if it's worth it.)

I am not able to get ld64 to dedup even the simplest of functions, let alone more complicated cases like these.

It's not safe to dedup symbols in general, since it's not always known if their addresses are compared.

Mach-O can be marked "autohide" or .weak_def_can_be_hidden, something LLVM sets on symbols when GlobalValue::canBeOmittedFromSymbolTable (see https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Globals.cpp#L311 or grep for MCSA_WeakDefAutoPrivate) -- that is, when a symbol is linkonce_odr and unnamed_addr. GlobalVariables that are also const can just be local_unnamed_addr. For C++ code, this should be true for most inlines and templates.

ld64 reuses this autohide flag to trigger dedup only when it's safe.

int3 planned changes to this revision.Aug 26 2021, 12:16 PM
ormris removed a subscriber: ormris.Jan 24 2022, 11:44 AM
int3 abandoned this revision.Mar 16 2022, 2:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 2:33 PM