This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Support calls to functions in dylibs
ClosedPublic

Authored by int3 on Apr 15 2020, 9:47 PM.

Details

Summary

This diff implements lazy symbol binding -- very similar to the PLT
mechanism in ELF.

ELF's .plt section is broken up into two sections in Mach-O:
StubsSection and StubHelperSection. Calls to functions in dylibs will
end up calling into StubsSection, which contains indirect jumps to
addresses stored in the LazyPointerSection (the counterpart to ELF's
.plt.got).

Initially, the LazyPointerSection contains addresses that point into one
of the entry points in the middle of the StubHelperSection. The code in
StubHelperSection will push on the stack an offset into the
LazyBindingSection. The push is followed by a jump to the beginning of
the StubHelperSection (similar to PLT0), which then calls into
dyld_stub_binder. dyld_stub_binder is a non-lazily bound symbol, so this
call looks it up in the GOT.

The stub binder will look up the bind opcodes in the LazyBindingSection
at the given offset. The bind opcodes will tell the binder to update the
address in the LazyPointerSection to point to the symbol, so that
subsequent calls don't have to redo the symbol resolution. The binder
will then jump to the resolved symbol.

Depends on D78269.

Diff Detail

Event Timeline

int3 created this revision.Apr 15 2020, 9:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2020, 9:47 PM
MaskRay added inline comments.Apr 27 2020, 2:24 PM
lld/MachO/InputSection.cpp
11–12

Why is #include "SyntheticSections.h" dropped?

lld/test/MachO/dylink-lazy.s
21

--disassemble-all can be shorten as -D. --no-show-raw-insn may not be needed for -D

42

Consider increasing indentation (i.e. add one space) for instructions covered by the section Disassembly of section __TEXT,__stub_helper:

int3 updated this revision to Diff 260850.Apr 29 2020, 12:11 AM
int3 marked 6 inline comments as done.

address comments

lld/MachO/InputSection.cpp
11–12

It was originally included so we could access in.got. But now we're just using target->getDylibSymbolVA() from Target.h.

lld/test/MachO/dylink-lazy.s
21

Will change. Though it looks like --no-show-raw-insn is still needed

42

Looking at a smattering of lld-ELF's tests, I found files that used zero, two, four, and eight spaces :|

An odd number of spaces feels a bit, uh, odd :P I think I'll go with two

smeenai added inline comments.Apr 29 2020, 7:12 PM
lld/MachO/Arch/X86_64.cpp
74

I'd add a comment that the + 1 is because RIP on x86-64 reflects the next instruction instead of the current instruction.

Or perhaps that can be a more general comment above this section, since it's relevant for the other stubs as well.

106

Nit: can type be a RelocationInfoType instead of a uint8_t?

113

Are there any other sorts of relocations we would conceivably need to handle for this in the future? If not, this should be an llvm_unreachable

From playing around with this a bit, clang generates an X86_64_RELOC_GOT for dylib function calls when you compile with -fno-plt, so perhaps that's at least one other case we need to care about (eventually).

lld/MachO/Driver.cpp
161

Perhaps add a TODO to replace this with a stub tbd file when we have TAPI support?

lld/MachO/InputFiles.h
68

Why can't we just use the other constructor with your dummy MemoryBufferRef?

lld/MachO/SyntheticSections.cpp
265

What's the purpose of this return, given we're gonna error anyway?

lld/MachO/SyntheticSections.h
102

The comment is great!

151

Huh, interesting.

CC @Ktwu

lld/test/MachO/dylink-lazy.s
49

How come we aren't checking the value here?

int3 updated this revision to Diff 261122.Apr 29 2020, 9:23 PM

factor out more target-specific code

int3 marked 5 inline comments as done.Apr 29 2020, 9:45 PM
int3 added inline comments.
lld/MachO/Arch/X86_64.cpp
113

I'm not sure if there are others, but I guess I can make this an assert for now and relax it if/when we discover more

lld/MachO/InputFiles.h
68

The DylibFile constructor attempts to read from the MemoryBufferRef and throws errors when it doesn't find what it expects

lld/MachO/SyntheticSections.cpp
265

error doesn't cause the program to exit, for that we need fatal. But I guess this should really be a fatal.

lld/MachO/SyntheticSections.h
102

Thanks! It took me quite a while to figure out how everything fit together, so it was fun to explain it :)

lld/test/MachO/dylink-lazy.s
49

I guess you mean on line 48? That's the offset into the bind opcode stream, and there's no nice arithmetic to describe it. Hard-coding the value will be brittle -- it will vary based on which entries appear earlier in the stream, and how large they are (which includes the length of their symbol names). There's the counterargument that these things aren't likely to change that often in our tests of course...

lld/MachO/Arch/X86_64.cpp
92

i'm wondering if we can replace these numbers (7, 11, 15, ... ) either with some expression (like sizeof(..) etc) or named constants or add comments

smeenai added inline comments.Apr 29 2020, 10:28 PM
lld/MachO/SyntheticSections.cpp
265

I know that error doesn't cause the program to exit right away, but it will eventually cause the program to exit with an error. Given that, I was wondering if not having the return here and instead falling through to the os << below would cause any harm.

lld/test/MachO/dylink-lazy.s
49

Yeah, we should strike a balance between testing everything and having brittle tests, but in this scenario, assuming we don't modify the test itself, I'd say it's worth checking the actual offset. I'd even consider checking the bind opcode bytes we're emitting (assuming those aren't likely to change from future linker changes, only from future changes to this test itself), to ensure we don't accidentally regress those in the future.

int3 updated this revision to Diff 261160.Apr 30 2020, 2:28 AM
int3 marked 10 inline comments as done.

address comments

lld/MachO/Arch/X86_64.cpp
92

lld-ELF uses a bunch of literal constants too, but I agree it's not ideal. I've added offsets to the comments and added a helper function, hope it's more readable now :)

106

The input parameters would need to be static_cast. Alternatively we could change the definition of relocation_info::type to be a RelocationInfoType, and declare the enum as enum RelocationInfoType : uint8_t, but that currently runs into compile-time errors in the for the lld-macho backend due to an ambiguous type conversion (since what was formerly a strict enum is now maybe also a uint8_t). Seems like too much work to sort it out right now

lld/MachO/SyntheticSections.cpp
265

Chatted offline, and we agreed that fatal is fine for things that we haven't implemented yet. It's not a user error, so error() -- which allows for more than one error diagnostic to be emitted -- isn't really useful as the user can't do much to address the problem. And the code is not unreachable with valid inputs, so llvm_unreachable isn't quite right either.

lld/test/MachO/dylink-lazy.s
49

Fair enough. Re the bind opcodes, they're being indirectly tested via the "Bind table" and "Lazy bind table" checks above, so I think we've got that covered

Harbormaster failed remote builds in B55270: Diff 261162!
Harbormaster failed remote builds in B55268: Diff 261160!
int3 updated this revision to Diff 262247.May 5 2020, 3:56 PM

rebase

smeenai accepted this revision.May 5 2020, 10:26 PM

LGTM

This revision is now accepted and ready to land.May 5 2020, 10:26 PM
This revision was automatically updated to reflect the committed changes.