This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Add initial support for chained fixups
ClosedPublic

Authored by BertalanD on Aug 24 2022, 7:26 AM.

Details

Summary

This commit adds support for chained fixups, which were introduced in
Apple's late 2020 OS releases. This format replaces the dyld opcodes
used for supplying rebase and binding information, and produces smaller
binaries and allows page-in linking.

A high-level overview of the format and my implementation can be found
in SyntheticSections.h.

This feature is not enabled by default for now, as further real world
testing is required to ensure there are no regressions.

Like in ld64, lazy binding is disabled when chained fixups are in use,
and the -init_offsets transformation is performed by default.

Diff Detail

Event Timeline

BertalanD created this revision.Aug 24 2022, 7:26 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 24 2022, 7:26 AM
Herald added a subscriber: mgrang. · View Herald Transcript
BertalanD requested review of this revision.Aug 24 2022, 7:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2022, 7:26 AM
thakis added a subscriber: thakis.Aug 24 2022, 12:07 PM

Very cool. I haven't really looked at it yet. (I'll play with it once the prerequisites are landed; fewer patch invocations then :) )

lld/MachO/Driver.cpp
1571

Emit a diag if makeChainedFixups is true but the deployment target (config->platformInfo.minimum) is too old to support them.

Below are no action required:

  • Idea guying, not for this patch (and likely never): What does dyld do if a binary has both bind opcodes and chained fixups? Does it use the opcodes on older macOS (or iOS or…) versions but the fixups on newer versions?
  • nit: I'd s/makeChainedFixups/emitFixupChains/ (to be closer to the flag name), but meh. Oh, you made it match the section, not the flag. Even more meh then, but s/make/emit/ in the variable still seems marginally better to me.
lld/MachO/SyntheticSections.h
668

Maybe mention that the motivation is that text pages can be relocated lazily at page-in time, and that they don't become dirty and can be discarded and paged in again as needed. That is, the benefits are:

  • somewhat faster startup (no need to do all relocs at app startup)
  • more memory-efficient, since text pages can be discarded

That motivates why the fixups are per-page and then chained (since that's how they're accessed at page-in time).

BertalanD added inline comments.Aug 25 2022, 2:03 PM
lld/MachO/Driver.cpp
1571

Emit a diag if makeChainedFixups is true but the deployment target (config->platformInfo.minimum) is too old to support them.

Will do!

Idea guying, not for this patch (and likely never): What does dyld do if a binary has both bind opcodes and chained fixups? Does it use the opcodes on older macOS (or iOS or…) versions but the fixups on newer versions?

That would be an awesome hack. We could *maybe* make chained fixups' new binding behavior (always eager, uses the GOT instead of LazyPointerSection) work with bind opcodes, but it still wouldn't work with an older dyld. LC_DYLD_CHAINED_FIXUPS is defined as 0x34 | LC_REQ_DYLD, which means dyld will refuse to load the file with a "load command unknown" message if it does not know about it.

nit: I'd s/makeChainedFixups/emitFixupChains/ (to be closer to the flag name), but meh. Oh, you made it match the section, not the flag. Even more meh then, but s/make/emit/ in the variable still seems marginally better to me.

makeChainedFixups it is then. The double "fixup chains" and "chained fixups" naming is not ideal, but oh well. Apple's user-facing documentation uses the latter, that's what we should do as well IMO.

BertalanD added inline comments.Aug 25 2022, 3:17 PM
lld/MachO/Driver.cpp
1571

of course I meant emitFixupChains. note to self: don't respond to code review at 11 PM.

BertalanD updated this revision to Diff 455900.Aug 26 2022, 7:06 AM

Added diagnostics for too old deployment targets, rebase target address overflows, and tweaked comments.

BertalanD marked an inline comment as done.Aug 26 2022, 7:25 AM
BertalanD updated this revision to Diff 456019.Aug 26 2022, 2:11 PM

Attempt to fix the Windows build.

thakis added inline comments.Aug 28 2022, 5:45 AM
lld/test/MachO/chained-fixups-addend.s
15

I just noticed that otool -chained_fixups -dyld_info a.out prints dyld_info output before chained_fixups output. We should probably change llvm-otool to match (and then update this test – looks like that should be doable?)

BertalanD added inline comments.Aug 29 2022, 10:02 AM
lld/test/MachO/chained-fixups-addend.s
15

I believe the symbol table should also come before either of these. It would simplify the flat-namespace-interposable test. Could we also fix that in D132865?

thakis added inline comments.Aug 29 2022, 4:53 PM
lld/test/MachO/chained-fixups-addend.s
15

Do you mean -I output? That's already before the two others. Do you mean --syms output? That doesn't have an otool equivalent, so we can't match otool's order there (which D132865 is about).

Sorry, I'm not sure what you mean. I'm guessing you mean "It'd be convenient if objdump --syms output was before --chained-fixups and --dyld-info output". If so, sure go for it, but imho it doesn't belong in a patch that makes output consistent with otool. (…at least I think there's no way to trigger --syms output via llvm-otool?)

BertalanD marked 2 inline comments as done.Aug 29 2022, 11:47 PM
BertalanD added inline comments.
lld/test/MachO/chained-fixups-addend.s
15

What I meant to write (and spectacularly failed to do so) is that llvm-objdump --macho --syms --dyld-info should print the symbol table before the fixups just like it puts the symbol table before --bind or --rebase, for consistency. Not sure where I got the idea about Apple otool being like this. Anyway, I agree that I should put it up as a separate patch then.

BertalanD marked an inline comment as done.

Rebase on top of D132865 and D132947.

thakis added a comment.Sep 3 2022, 8:16 AM

Sorry that the review is taking so long! Here's one question about the driver code (and a few minor suggestions).

lld/MachO/Driver.cpp
933

(This seems unrelated. I'd land that separately if you want (no review needed), or just revert it. This runs just once, so std::vector is good enough, and in c++20 vector is constexpr anyways. And having to manually repeat the number of elements is a tiny bit inconvenient.)

982

Shouldn't this return false too (…or fall through to the return false two lines further down)? (If this is intentional, then this check should probably be below the arch and pic checks below – else we return true here without. checkout archs and pic!)

Does ld just warn when this happens? Maybe we should see if we can get away with making this an error.

Maybe it'd be good to have some tests for the diags here.

1705

If you want, you can land moving this further up in a separate commit too (
"""
[lld/mac] Move isPic up

Needed for chainedFixups patch.
No behavior change.
""") – just land, no review needed.

But keeping it here is fine too.

lld/MachO/Options.td
1234

I think you can move these out of undocumented. The help text says "experimental", that seems clear enough :)

(…and then we won't forget to move it out of undocumented when we enable it by default.)

BertalanD marked an inline comment as not done.Sep 4 2022, 11:33 AM

Thank you for the review!

lld/MachO/Driver.cpp
933

I know this change is absolutely insignificant, but this pointless allocation has been bothering me so much...

I'll land this change separately.

982

Shouldn't this return false too (…or fall through to the return false two lines further down)?

Oh, right. It should definitely fall through. I didn't want to make this a hard error, because ld64 doesn't have such a diagnostic at all. I figured we might want to play around with enabling it for unsupported targets to see what happens, and calling error() would have prevented that.

As for the rest of the diagnostics in this function:

  • Chained fixups do exist for arm64_32, but this patch does not attempt to implement the 32-bit encoding, which has some peculiarities due to the next offsets not being able to encode the whole range of a page. And I don't think we should make things more complicated for the sake of a probably soon-to-be-obsoleted platform.
  • ld64 falls back to generating dyld opcodes when PIE is disabled, even if chained fixups were explicitly requested with -fixup_chains. That's just weird.

Maybe it'd be good to have some tests for the diags here.

Agreed, I'm on it.

lld/MachO/Options.td
1234

I'll move it into grp_rare then. -fixup_chains is not documented in the ld64 man page, but I guess it's useful enough to move it to be near the less esoteric options.

thakis accepted this revision.Sep 16 2022, 7:26 AM

This is excellent.

Apologies for taking so long to take a real look.

A long list of nits below, but nothing substantial.

lld/MachO/Arch/ARM64Common.h
117

(this should be

encodePage21(&buf32[0], d, stubCode[0], pageBits(pointerVA) - pcPageBits);
encodePageOff12(&buf32[1], d, stubCode[1], pointerVA);

after rebasing across D133269)

lld/MachO/Driver.cpp
933

Software is one of the few areas where human brains have to deal with 9+ orders of magnitude within a relatively short timespan: a single bit can be a critical source of a bug, and right after that you might think about how to best organize gigabytes of data. You might try to shave off cycles form a very hot inner loop, or have a function that's 10^6 times as inefficient than it could be and it'll still finished imperceptibly fast as it's only called once.

This is really hard to wrap your brain around.

I get it, superfluous allocations feel silly. But a _single_ superfluous allocation that happens just one during startup is absolutely, completely irrelevant. (This one happens to go away by itself when we adopt C++20 too, but that fact really doesn't matter here.)

Go ahead and change it if you want (in a separate commit), but learning to quiet the part of the brain that screams and shouts "this! is! so! inefficient!" and considering if it actually matters is imho a very useful skill to have.

(In contrast, increasing readability and clarity code on the other hand is almost always worth it.)

1526

Maybe add something like "Like in ld64, enabling chained fixups enables the -init_offsets transformation" to commit message.

lld/MachO/SyntheticSections.cpp
340

nit: This conditional is duplicated in writeChainedFixup. Should it be in a file-local-static definedNeedsBind() helper?

346

nit: return here, keep non-chained path dedented and don't add an else

355

Maybe assert(reinterpret_cast<uintptr_t>(buf) % target->wordSize == 0 && "unaligned fixup address")? (optional)

357

nit: maybe & 0xf'ffff'ffff to make the truncation explicit?

373

(same)

2094

Since we always write this section: Can we end up in a situation where we write this section but pageStarts is empty (and calling back() is UB)?

AFAICT pageStarts is only added to if there actually are fixups, and a tiny program not calling anything might not have any (?)

2102

Out of curiosity: What are the implications of (not) doing this?

2106

(same question as above)

2140

Is the on-disk layout of this section described anywhere? Consider putting a short comment at the top of this function that describes the layout, maybe.

Maybe something like:

// LC_DYLD_CHAINED_FIXUPS data consists of (in this order):
// * A dyld_chained_fixups_header
// * A dyld_chained_starts_in_image
// * One dyld_chained_starts_in_segment per segment
// * List of all imports (dyld_chained_import, dyld_chained_import_addend, 
     or dyld_chained_import_addend64)
// * Names of imported symbols

(Do we have flexibility where to put the symbol name table? Does ld64 put it after the imports?)

2142

Maybe add // This is step 3 of the algorithm described in the class comment of ChainedFixupsSection.

2149

Explicitly list needed captures instead of saying &. & capture lists are bug magnets. (In this case it's obviously harmless, but it's imho still a good habit to almost never capture &)

2161

What is this hole for? Is it filled elsewhere?

i.e. maybe a comment like "dyld_chained_starts_in_image is followed by an uint32_t for each segment. Leave room for it, and fill it via the segStarts pointer. This is extremely obvious once you know it, but it took me a bit to figure out. (Now that I did, it's obvious to me too and I don't know why it took me that long. But it did.)

2168

Cute. But doesn't buf = alignTo<8>(buf); do the same thing in slightly less roundabout? :D

2178

nit: Maybe make writeImport return void and do buf += importEntrySize(importFormat) in the next line instead? feels a bit clearer to me – that way, writeImport has only a single responsibility.

(But a writing function returning how much it wrote is fairly common too, so up to you)

2196

Maybe add // This is step 2 of the algorithm described in the class comment of ChainedFixupsSection.

2197

nit: Should something somewhere error out if !isUInt<32>(symtabSize) since even dyld_chained_import_addend64::name_offset is too small then?

(I doubt anything will ever hit that; otoh you check most other bounds and it's like 2 lines)

lld/MachO/SyntheticSections.h
275

No change needed here, but: They're bound eagerly at page-in time on new OS versions, which in a way means they're always bound lazily as-needed, right?

(Comment is fine as-is, I'm asking just to make sure I'm understanding things. And there are OS versions that have chained fixups but no page-in linking.)

706

nit: dylibs are binaries too. s/binaries/executables/, but I'm guessing bundles (MH_BUNDLES) can have chained fixups as well, so maybe s/binaries and dylibs/binaries/ instead to cover all three.

735

Great comment :)

770

remove trailing extraneous ;, then clang-format will do a better job of formatting this line

(…maybe we should build llvm with -Wextra-semi :) )

lld/MachO/Writer.cpp
1094

A bit unfortunate that clang-format wants to reformat this, makes it hard to see that the change here is the addition of in.chainedFixups. Even if you land the reformatting without the addition first, everything will shift around when you add it and it's still a bit hard to see (but maybe slightly easier since the change is on the first touched line then). So maybe precommit the format change (and the change to std::array – but see above :P)

1200

Maybe add // This is step 5 of the algorithm described in the class comment of ChainedFixupsSection.

This revision is now accepted and ready to land.Sep 16 2022, 7:26 AM

Also, you mentioned elsewhere that lld "creates 5 more __DATA pages for Chromium (which apparently aren't page-in linked?) because LLD doesn't do ObjC optimizations". Could you file an issue for that with some details and reference that in a FIXME somewhere (commit message, maybe)?

Maybe include a short summary of missing pieces in the commit message might generally be good:

  • Link to that to-be-filed ObjC thing
  • DYLD_CHAINED_PTR_64_OFFSET support
  • …I think that's it?
BertalanD added inline comments.Sep 17 2022, 8:36 AM
lld/MachO/Driver.cpp
933

This is great advice, I'm going to keep this in mind the next time I do changes like this one.

(fwiw, this change has already landed in 4f688d00f43056a94869aff32d5cdcddedf88a84 though)

lld/MachO/SyntheticSections.cpp
2094

SegmentInfo objects are created by traversing ChainedFixupsSection::locations, so they should always have at least one entry in pageStarts.

2102

I do not know, unfortunately I couldn't find any design docs that would explain the rationale.

The source releases aren't too helpful either: ld64 mentions // move to DYLD_CHAINED_PTR_64_OFFSET when OS is ready (https://github.com/apple-opensource/ld64/blob/e28c028b20af187a16a7161d89e91868a450cadc/src/ld/OutputFile.cpp#L5249), but not much more.

lld/MachO/SyntheticSections.h
275

They're bound eagerly at page-in time on new OS versions, which in a way means they're always bound lazily as-needed, right?

That's correct. They are bound lazily on macOS 13+ in the sense that symbol lookups aren't performed upfront at load time. I wouldn't call it true lazy binding though, as the observable behavior is the same as eager binding: symbols from dlopen()-ed dylibs can't fulfill undefined references:

❯ cat undef.c
int undef() { return 42; }
❯ clang -dynamiclib undef.c -o libundef.dylib
❯ cat main.c
#include <dlfcn.h>
int undef();

int main() { dlopen("./libundef.dylib", RTLD_NOW); return undef(); }
❯ clang main.c -undefined dynamic_lookup
ld: warning: -undefined dynamic_lookup may not work with chained fixups
❯ ./a.out
dyld[59352]: symbol not found in flat namespace '_undef'
[1]    59352 abort      ./a.out

I believe this is not only because the GOT entries for undef and dlopen end up on the same page (so calling dlopen would cause both symbols to be bound): the WWDC video mentions that dlopen()-ed dylibs continue to be relocated by dyld, so it wouldn't surprise me if the kernel knew nothing about their symbols at all even after they've been loaded.

BertalanD updated this revision to Diff 461012.Sep 17 2022, 9:47 AM
BertalanD edited the summary of this revision. (Show Details)

rebased and started addressing review comments

int3 added a subscriber: int3.Sep 19 2022, 11:59 AM
int3 added inline comments.
lld/MachO/SyntheticSections.cpp
2159–2160

I think return buf - reinterpret_cast<uint8_t *>(header) should work too and is a bit more succinct

lld/MachO/SyntheticSections.h
276
keith accepted this revision.Sep 22 2022, 4:09 PM
BertalanD updated this revision to Diff 464524.Oct 1 2022, 2:50 PM

Cleaned up writeStub() implementations by taking the GOT/lazy pointer address as a parameter. Addressed review comments from Jez.

BertalanD marked 15 inline comments as done and an inline comment as not done.Oct 3 2022, 1:07 PM
BertalanD added inline comments.
lld/MachO/SyntheticSections.cpp
355

We diagnose unaligned fixups when chaining them together in Writer::buildFixupChains(). The format actually allows 8k+4-addresses.

BertalanD updated this revision to Diff 464795.Oct 3 2022, 1:43 PM
BertalanD marked an inline comment as not done.

Added driver tests, tweaked help messages.

BertalanD marked 3 inline comments as done.Oct 3 2022, 1:44 PM
BertalanD added inline comments.
lld/MachO/Options.td
1234

Moving out of grp_undocumented feels wrong to me. I made the help text slightly more useful though.

This revision was landed with ongoing or failed builds.Oct 4 2022, 2:51 AM
This revision was automatically updated to reflect the committed changes.

I'll be mostly AFK for a couple of hours, so feel free to revert immediately if this commit starts breaking bots.