Page MenuHomePhabricator

[lld-macho] Add basic support for linking against dylibs
ClosedPublic

Authored by int3 on Mar 16 2020, 2:07 PM.

Details

Summary

This diff implements:

  • dylib loading (much of which is being restored from @pcc and @ruiu's original work)
  • The GOT_LOAD relocation, which allows us to load non-lazy dylib symbols
  • Basic bind opcode emission, which tells dyld how to populate the GOT

Depends on D75382.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I agree with the previous discussions about llvm-mc vs yaml2obj. Most of time an assembly test is easier to read. In some cases yaml2obj can be more convenient. One notable inability of llvm-mc is that it cannot create invalid tests.

As to binary blobs: I am another disliker. I hope we can avoid them if possible. This may require some contribution to yaml2obj.

  1. contribute yaml2obj 2) improve lld Mach-O add YAML tests

is better than

  1. improve lld Mach-O and add binary tests 2) contribute to yaml2obj 3) convert binary tests to YAML tests

Note that the latter can increase the repository size indefinitely with these binary blobs. I am concerned about some practice in some areas of llvm-project, in particular, XCOFF in Object, and llvm-objdump Mach-O.

int3 added a comment.Mar 18 2020, 8:01 PM

Just to be clear, none of the tests right now would benefit from additional work on yaml2obj. We just need lld-macho to produce dylibs, which I expect I'll have have a diff for by EOW. In the meantime I'll leave the tests as-is in YAML, and I'm happy to wait for the dylib-production diff to be approved before landing everything as a stack, so we don't have to worry about sullying trunk with undesired test formats :)

int3 updated this revision to Diff 251544.Mar 19 2020, 8:18 PM

alphabetize

int3 updated this revision to Diff 252809.Mar 26 2020, 5:24 AM

clang-tidy

int3 updated this revision to Diff 253051.Mar 27 2020, 1:00 AM

use objdump --bind instead of obj2yaml to get symbol addresses

int3 updated this revision to Diff 253272.Mar 27 2020, 6:24 PM

fix test

int3 added a reviewer: Ktwu.Mar 28 2020, 1:37 AM
int3 added a reviewer: gkm.
ruiu added inline comments.Mar 30 2020, 11:30 PM
lld/MachO/InputFiles.cpp
215

Can you add a brief comment here that we initialize dylibName.

223

Ditto -- we are initializing symbols.

lld/MachO/Symbols.h
79

Isn't uint32_t enough? It's not a big deal, but I'd use uint32_t for an index of GOT.

lld/MachO/SyntheticSections.h
34

So GOT has no data but dynamic relocations? If so, I'd leave a brief comment here.

36

nit: add a blank line before a label.

lld/MachO/Writer.cpp
300

Replace auto with a concrete type.

357–360

nit: we usually do

os << foo << bar
   << baz << fizz;

instead of

os << foo << bar;
os << baz << fizz;
int3 marked 8 inline comments as done.Mar 31 2020, 11:39 AM
int3 added inline comments.
lld/MachO/Writer.cpp
357–360

sgtm. Any thoughts on whether we should make a wrapper class like the ByteBuffer in ReaderWriter/MachO? I didn't see a whole lot of usage of raw_ostreams in the other lld implementations but I didn't find any abstraction layer over them either

int3 updated this revision to Diff 253940.Mar 31 2020, 11:40 AM
int3 marked an inline comment as done.

address comments

int3 updated this revision to Diff 253943.Mar 31 2020, 11:51 AM

static_cast

Harbormaster completed remote builds in B51160: Diff 253940.
int3 updated this revision to Diff 253976.Mar 31 2020, 12:56 PM
int3 edited the summary of this revision. (Show Details)
int3 removed reviewers: Ktwu, gkm.
int3 removed a subscriber: grimar.

clang-format

int3 added reviewers: Ktwu, gkm.Mar 31 2020, 1:01 PM
int3 added a subscriber: grimar.
int3 marked an inline comment as done.Mar 31 2020, 4:03 PM
int3 added inline comments.
lld/MachO/SyntheticSections.h
34

Actually, I'm not sure it's right to say that it has "no data" -- it does actually occupy space in the binary (unlike the __PAGEZERO segment, a section can't have zero filesize while having a non-zero-length address range), so I think saying it contains all zeroes is more accurate.

int3 updated this revision to Diff 254040.Mar 31 2020, 4:26 PM

rephrase comments

int3 updated this revision to Diff 254050.Mar 31 2020, 5:14 PM
int3 marked 2 inline comments as done.

forgot to address one more

ruiu added inline comments.Mar 31 2020, 9:52 PM
lld/MachO/Writer.cpp
357–360

We don't usually use raw_ostream because we usually write to the output file in two passes: in the first pass, we compute an offset for each output element, and then in the second pass, we let each output element to copy itself to the output file.

However, that two-pass technique cannot be used to create a ULEB-encoded stuff. because we don't know the exact size of each output element until we fix the file contents. Constructing an ULEB-encoded section contents are naturally sequential. So the usage of raw_ostream in for this section is fine. I don't see any problem with that.

This LGTM with the comments. @ruiu, @MaskRay, any further comments? I think it's fine leaving the tests as YAML for now, since subsequent diffs in the stack will enabling LLD to produce dylibs itself and we'll replace the tests at that point.

lld/MachO/InputFiles.cpp
225

Is it correct to just consider LC_SYMTAB for this, or should we also be consulting LC_DYSYMTAB and/or the export trie? If so, we don't have to address that in this diff, but we should add a TODO.

We should also add a TODO about handling a missing LC_SYMTAB.

lld/MachO/InputFiles.h
37

Where is this used?

lld/MachO/SyntheticSections.h
24

It's kinda interesting to me that GotSection would inherit from InputSection, since it's not an input section, of course.

LLD ELF explains this design decision like so:

// Synthetic sections are designed as input sections as opposed to
// output sections because we want to allow them to be manipulated
// using linker scripts just like other input sections from regular
// files.

For Mach-O, we don't have linker scripts, so perhaps that reasoning doesn't hold.

COFF uses Chunks instead of InputSections, since there's more synthesized data (https://lld.llvm.org/NewLLD.html#important-data-structures). I think Mach-O has a good amount of linker-synthesized data and input section processing as well, so a COFF-like design might make more sense. It's hard to say until we have more of the implementation though.

For now, I think leaving this inheritance as-is is fine, but we should add a TODO to reconsider it once we've implemented more of the linker. @ruiu, @MaskRay, what do you think?

int3 marked 2 inline comments as done.Apr 9 2020, 5:11 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
225

I *think* the export trie is only consumed by dyld. Let me see what the deal is with LC_DYSYMTAB...

lld/MachO/InputFiles.h
37

error("dylib " + path + " missing LC_ID_DYLIB load command"); in InputFiles.cpp

int3 updated this revision to Diff 256500.Apr 9 2020, 8:56 PM
int3 marked an inline comment as done.

add TODO; use reinterpret_cast

int3 added inline comments.Apr 9 2020, 8:57 PM
lld/MachO/InputFiles.cpp
225

Okay, I looked through loader.h; didn't entirely understand it, but I think LC_DYSYMTAB just contains additional info about / indirect references to symbols in LC_SYMTAB.

I'll give other reviewers a few more days to take a look.

lld/MachO/InputFiles.cpp
235

We shouldn't be doing this for symbols that are undefined in the dylib's symbol table, right? (And if so, we should add a test for that, though I don't think that'll be possible till after your follow-ups.)

Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable. Whether or not we do that depends on ld64's behavior there.

MaskRay added inline comments.Apr 15 2020, 10:31 AM
lld/MachO/InputFiles.cpp
235

Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable.

Yes. See D57385 and D57569.

MaskRay added inline comments.Apr 15 2020, 10:32 AM
lld/MachO/InputFiles.cpp
235

It is also used to set the exportDynamic property of a symbol. When linking an executable, the symbols used by DSOs will be added to .dynsym

int3 marked 2 inline comments as done.Apr 15 2020, 3:24 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
235

I think re-exported symbols also appear as undefined in the symbol table... I need to figure out how to distinguish them. I'll add a TODO.

lld/MachO/InputFiles.h
37

Actually, I see that we already have getName(). It just doesn't always work because we neglected to copy the strings from our input arguments before discarding them. I'll fix it and use that

int3 updated this revision to Diff 257875.Apr 15 2020, 3:39 PM

use getName(); add TODO for undefined symbols

smeenai accepted this revision.Apr 20 2020, 4:20 PM

LGTM with the comments addressed. I can commit this for you once they are.

lld/MachO/Driver.cpp
92–93

I believe these should actually be placed on the search path *after* any user-specified -L directories. At least that's what ld64 appears to do, based on the output from its -v option.

lld/MachO/InputSection.cpp
34

Nit: when one branch of an if-else has braces, all the others should too.

36

(same here)

This revision is now accepted and ready to land.Apr 20 2020, 4:20 PM
int3 marked an inline comment as done.Apr 20 2020, 10:37 PM
int3 added inline comments.
lld/MachO/Driver.cpp
92–93

Oh good catch. I'll make our -v option emit that too so we can test for it

This revision was automatically updated to reflect the committed changes.