This is an archive of the discontinued LLVM Phabricator instance.


Differential D76252

[lld-macho] Add basic support for linking against dylibs
ClosedPublic

Authored by int3 on Mar 16 2020, 2:07 PM.


Details

Reviewers

pete
lgerbarg
ruiu
pcc
MaskRay
mtrent
kledzik
smeenai
respindola
bbaren
alexander-shaposhnikov
jhenderson
christylee
Ktwu
gkm

Commits

rG060efd24c7f0: [lld-macho] Add basic support for linking against dylibs

Summary

This diff implements:

dylib loading (much of which is being restored from @pcc and @ruiu's original work)
The GOT_LOAD relocation, which allows us to load non-lazy dylib symbols
Basic bind opcode emission, which tells dyld how to populate the GOT

Depends on D75382.
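
To make the last bullet of the summary concrete, here is a minimal sketch of the kind of opcode stream dyld consumes to bind one GOT slot. The opcode values are the ones defined in `<mach-o/loader.h>`; the helper names (`encodeUleb128`, `emitBind`) and their parameters are hypothetical, and the exact sequence this diff emits may differ.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Opcode values as defined in <mach-o/loader.h>.
constexpr uint8_t BIND_OPCODE_DONE = 0x00;
constexpr uint8_t BIND_OPCODE_SET_DYLIB_ORDINAL_IMM = 0x10;
constexpr uint8_t BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40;
constexpr uint8_t BIND_OPCODE_SET_TYPE_IMM = 0x50;
constexpr uint8_t BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x70;
constexpr uint8_t BIND_OPCODE_DO_BIND = 0x90;
constexpr uint8_t BIND_TYPE_POINTER = 1;

// Hypothetical helper: append `value` to `out` in ULEB128 form.
void encodeUleb128(uint64_t value, std::vector<uint8_t> &out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // more bytes follow
    out.push_back(byte);
  } while (value != 0);
}

// Hypothetical helper: emit the opcodes that bind one pointer-sized slot at
// `segmentOffset` inside segment `segmentIndex` to `symbolName`, looked up in
// the dylib with the given 1-based ordinal (assumed to fit in 4 bits here).
std::vector<uint8_t> emitBind(uint8_t dylibOrdinal,
                              const std::string &symbolName,
                              uint8_t segmentIndex, uint64_t segmentOffset) {
  std::vector<uint8_t> out;
  out.push_back(static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |
                                     (dylibOrdinal & 0x0f)));
  out.push_back(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM); // flags = 0
  out.insert(out.end(), symbolName.begin(), symbolName.end());
  out.push_back(0); // the symbol name is NUL-terminated
  out.push_back(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER);
  out.push_back(static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |
                                     (segmentIndex & 0x0f)));
  encodeUleb128(segmentOffset, out);
  out.push_back(BIND_OPCODE_DO_BIND);
  out.push_back(BIND_OPCODE_DONE);
  return out;
}
```

Calling `emitBind(1, "_puts", 2, 0x10)` would describe "store the address of `_puts` from the first dylib at offset 0x10 of segment 2", which is roughly what a GOT entry needs at load time.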

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

int3 created this revision. Mar 16 2020, 2:07 PM

Herald added a project: Restricted Project. Mar 16 2020, 2:07 PM

Herald added subscribers: llvm-commits, mgorny.

int3 added reviewers: pete, lgerbarg, ruiu, pcc, MaskRay, mtrent, kledzik, smeenai, respindola, bbaren, alexander-shaposhnikov, jhenderson, christylee. Mar 16 2020, 2:08 PM

Herald added a subscriber: ormris. Mar 16 2020, 2:08 PM

int3 retitled this revision from [lld] Add basic support for linking against dylibs to [lld-macho] Add basic support for linking against dylibs. Mar 16 2020, 2:15 PM

delete extra line

Harbormaster completed remote builds in B49359: Diff 250627. Mar 16 2020, 2:46 PM

Harbormaster completed remote builds in B49361: Diff 250629. Mar 16 2020, 3:19 PM

Initial set of comments. I'm still going through the rest.

lld/MachO/Arch/X86_64.cpp
47	Nit: change the two spaces between "are" and "only" to one
lld/MachO/InputFiles.cpp
78	Universal binary support could technically be a diff by itself. I'd prefer to do so, although it's small enough that I'm not super opposed to folding it into this diff. We definitely need to add tests for it either way. llvm-lipo is complete enough to be used for the tests. I'd also note in the commit message that some of this functionality is being restored from @pcc and @ruiu's initial commit, for completeness.
89	We should add some sort of checking here to prevent going out of bounds of the file (if you have a ridiculously large `nfat_arch` in a malformed file, for example). We should also add tests for the error conditions here if possible (assuming yaml2obj lets us construct malformed universal binaries that would trigger the conditions).
99	I believe ld64 warns if you give it a universal binary input file and it can't find any slices for the architecture being linked. We should do the same.
246	Is it valid to not have an `LC_ID_DYLIB` load command? If it is, should we figure out a fallback `dylibName`? If not, we should error. (Also, we should add tests for this either way if possible.)
251	Is it valid to not have a symtab? If not, we should error.
266	Unnecessary comment change? We aren't adding support for archive files yet. (CC @Ktwu, this'll need to be adjusted when archive file support is added.)
lld/test/MachO/Inputs/goodbye-dylib.yaml
2	Description for hello and goodbye seems swapped.
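
The bounds check requested above for `nfat_arch` can be sketched as follows. This is only an illustration under stated assumptions: the field layout follows `<mach-o/fat.h>` (a big-endian `fat_header` followed by 20-byte `fat_arch` entries), while `fatHeaderInBounds`, `read32be` and the size constants are hypothetical names that do not appear in this diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t FAT_MAGIC = 0xcafebabe; // from <mach-o/fat.h>
constexpr size_t kFatHeaderSize = 8;       // magic + nfat_arch
constexpr size_t kFatArchSize = 20;        // cputype, cpusubtype, offset, size, align

// Universal binary headers are stored big-endian.
uint32_t read32be(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Hypothetical check: true only if the header, the whole fat_arch table and
// every slice it describes lie inside the buffer.
bool fatHeaderInBounds(const std::vector<uint8_t> &buf) {
  if (buf.size() < kFatHeaderSize || read32be(buf.data()) != FAT_MAGIC)
    return false;
  // Use 64-bit arithmetic so that a ridiculously large nfat_arch cannot
  // overflow the size computation.
  uint64_t nfatArch = read32be(buf.data() + 4);
  if (kFatHeaderSize + nfatArch * kFatArchSize > buf.size())
    return false;
  for (uint64_t i = 0; i < nfatArch; ++i) {
    const uint8_t *arch = buf.data() + kFatHeaderSize + i * kFatArchSize;
    uint64_t offset = read32be(arch + 8);
    uint64_t size = read32be(arch + 12);
    if (offset + size > buf.size())
      return false;
  }
  return true;
}
```

A real implementation would report an error naming the file instead of returning `false`, but the arithmetic is the part that guards against malformed input.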

MaskRay added inline comments. Mar 16 2020, 9:48 PM

lld/MachO/Arch/X86_64.cpp
38	This message enhancement should be moved to the initial patch.
lld/test/MachO/Inputs/goodbye-dylib.yaml
2	Use `##` for comments and `#` for RUN and CHECK lines.
109	Are all these fields important? I think we probably should think what fields are optional and improve yaml2obj. Otherwise the verbosity can make tests a lot more complex.
lld/test/MachO/dylink.s
4	`>` -> `-o`
8	Should obj2yaml and llvm-objdump -d tests be separated?

int3 edited the summary of this revision. Mar 16 2020, 10:17 PM

ruiu added inline comments. Mar 16 2020, 10:20 PM

lld/MachO/Arch/X86_64.cpp
38	Yeah, but I'm not worried about the exact order and the contents of initial patch series, as this is going to be submitted as soon as the first patch is submitted, and no one will be using the feature only with the first patch, so I think I'm fine with this.
lld/MachO/Target.h
18	Are you going to make it work only for 64-bit platforms? IIRC, both macOS and iOS are dropping 32-bit app support, so it is probably a good choice to not support 32-bit, but just to confirm.
lld/MachO/Writer.cpp
316	I wonder if you can directly dyn_cast to `DylibSymbol *` as `r.target.dyn_cast<DylibSymbol *>()`.
334	Can you add a comment as to what dyld info contains?

address comments

lld/MachO/InputFiles.cpp
78	Fair enough, will split into another diff
251	Not sure why, but unlike the LC_ID_DYLIB case, trying to create a dylib without LC_SYMTAB makes yaml2obj crash. Probably some offsets / indices are off. Looking at ld64's source, it seems that not having an LC_SYMTAB is only an error if LC_DYLD_INFO(_ONLY) is also missing. The dylib I looked at had LC_DYLD_INFO_ONLY, so I'm not sure if such a case arises in practice... Anyway, proper handling of a missing LC_SYMTAB should probably cover the ObjectFile case as well, so I'm inclined to punt on it for now.
lld/test/MachO/Inputs/goodbye-dylib.yaml
109	@smeenai earlier asked about deleting LC_SYMTAB, and I ran into some issues with yaml2obj while trying it (see my other comment). Given that using yaml2obj is mostly a stop-gap measure until lld itself can emit dylibs, I'd prefer not to spend too much time making the test case minimal.

int3 edited the summary of this revision. Mar 16 2020, 10:48 PM

add comment about LC_DYLD_INFO

int3 marked 2 inline comments as done. Mar 16 2020, 10:50 PM

int3 added inline comments.

lld/MachO/Arch/X86_64.cpp
38	no worries, wasn't hard to split out
lld/MachO/Target.h
18	I'm not sure we're 100% not going to support it, but we're definitely prioritizing 64-bit for now
lld/MachO/Writer.cpp
316	just tried, doesn't work sadly

MaskRay added inline comments. Mar 16 2020, 10:50 PM

lld/MachO/Arch/X86_64.cpp
38	I would argue that when sending patches as a series, we should make an extra effort to keep each commit clean and avoid unneeded code movement where possible... I think this patch is good in its current state, but I hope Jez can make some final cleanups to avoid subsequent formatting/cleanup changes.

address another comment

lld/test/MachO/dylink.s
8	not possible with the current setup; I'm using obj2yaml to get the address of the GOT, and to check that the order of the symbol references in it matches the addresses printed by objdump

Harbormaster completed remote builds in B49393: Diff 250692. Mar 16 2020, 11:25 PM

Harbormaster completed remote builds in B49398: Diff 250697. Mar 17 2020, 12:08 AM

Harbormaster completed remote builds in B49397: Diff 250696.

As to the tests, we generally don't use yaml2obj to write tests (except for COFF which is historically using yaml2obj). If you are using yaml2obj just until lld itself is able to produce files used by tests, I don't think you have to spend too much time writing tests in yaml2obj. Instead, you can submit binary files for now and then replace them later.

lld/MachO/Target.h
18	Then please leave a comment here saying that we are currently supporting only 64-bit targets as macOS and iOS are deprecating 32-bit apps.

The no-id-dylink.s test needs yaml2obj to generate an invalid dylib, but yeah I can replace the other two with binary blobs

In D76252#1926984, @int3 wrote:

The no-id-dylink.s test needs yaml2obj to generate an invalid dylib, but yeah I can replace the other two with binary blobs

We are trying to avoid binary blob in LLVM binary utilities + lld. @grimar has done lots of work in this area. Can you figure out how much effort it takes to implement relevant features in yaml2obj Mach-O? I hope we can avoid binary blobs early.

I am indifferent as to whether we use binary blobs or yaml2obj for now, but I expect I'll have a diff up to make lld produce dylibs within the next week.

In D76252#1927200, @MaskRay wrote:

We are trying to avoid binary blob in LLVM binary utilities + lld. @grimar has done lots of work in this area. Can you figure out how much effort it takes to implement relevant features in yaml2obj Mach-O? I hope we can avoid binary blobs early.

Yeah, but my point is that we are not going to keep using yaml2obj for most test cases. Once lld is able to produce the binary files used in the tests, we can replace the binary blobs with an invocation of lld. So checking in binary files as a temporary measure is pretty much okay to me, and it actually looks like a good engineering decision to me. At this early stage of development, I think things don't always have to be done in the "right" way, as there is usually lots of scaffolding which will be replaced or removed later.

rebase

Harbormaster completed remote builds in B49553: Diff 251001. Mar 18 2020, 12:31 AM

In D76252#1926005, @ruiu wrote:

As to the tests, we generally don't use yaml2obj to write tests (except for COFF which is historically using yaml2obj). If you are using yaml2obj just until lld itself is able to produce files used by tests, I don't think you have to spend too much time writing tests in yaml2obj. Instead, you can submit binary files for now and then replace them later.

Hi @ruiu,

I'm just doing a quick drive by comment here, since I'm not really doing much LLD work at the moment myself. Why are you opposed to yaml2obj being used to generate test inputs instead of llvm-mc? Surely yaml2obj gives more direct control over what is actually in the input? In llvm-mc you only have relatively limited control over some things like relocations.

It seems to me like yaml2obj is a better instrument to be using for the vast majority of test cases, with only a limited set using llvm-mc as a catch-all test to make sure that we don't miss some odd interaction. Of course, this is all making the assumption that mach-o yaml2obj is straightforward to use (I have a feeling it might not be).

In D76252#1928398, @jhenderson wrote:

Hi @ruiu,

I'm just doing a quick drive by comment here, since I'm not really doing much LLD work at the moment myself. Why are you opposed to yaml2obj being used to generate test inputs instead of llvm-mc? Surely yaml2obj gives more direct control over what is actually in the input? In llvm-mc you only have relatively limited control over some things like relocations.

It seems to me like yaml2obj is a better instrument to be using for the vast majority of test cases, with only a limited set using llvm-mc as a catch-all test to make sure that we don't miss some odd interaction. Of course, this is all making the assumption that mach-o yaml2obj is straightforward to use (I have a feeling it might not be).

You can write tests with yaml2obj too, but if you take a look at ELF, don't you think that we can write fine-grained test cases with llvm-mc too? I want tests to be realistic input files, and in that respect, test cases written in llvm-mc are better than ones in obj2yaml.

I also found that tests written in llvm-mc are easier to maintain. For example, imagine that you already have a test file containing tests for relocations and you want to add a new type of relocation. Then, you'll have to not only add a new relocation to a yaml file but also add a few extra bytes with some addend at a right bit-wise position to the .text section, which isn't easy to do by hand. So, I guess you'll end up writing a test in assembly, compile it and then convert it to yaml -- but if that's the case, we should write tests in assembly in the first place. On the other hand, I didn't find a test case that is easy to write in yaml but not in assembly. There might be some, but that shouldn't be too many. So it looks to me that llvm-mc is a better choice to write test files.

In D76252#1928449, @ruiu wrote:

You can write tests with yaml2obj too, but if you take a look at ELF, don't you think that we can write fine-grained test cases with llvm-mc too? I want tests to be realistic input files, and in that respect, test cases written in llvm-mc are better than ones in obj2yaml.

I also found that tests written in llvm-mc are easier to maintain. For example, imagine that you already have a test file containing tests for relocations and you want to add a new type of relocation. Then, you'll have to not only add a new relocation to a yaml file but also add a few extra bytes with some addend at a right bit-wise position to the .text section, which isn't easy to do by hand. So, I guess you'll end up writing a test in assembly, compile it and then convert it to yaml -- but if that's the case, we should write tests in assembly in the first place. On the other hand, I didn't find a test case that is easy to write in yaml but not in assembly. There might be some, but that shouldn't be too many. So it looks to me that llvm-mc is a better choice to write test files.

I think there is no universal approach, and I agree with both of you. For example, for testing relocation relaxations I'd definitely prefer using YAML because it is important to get the particular opcodes and relocations in the inputs.
It is probably easier to use YAML than to use llvm-mc and verify the inputs with objdump or the like. For many other things, using llvm-mc is probably a bit more natural.

What I do not agree with is that committing binaries is OK. From my last experience working on yaml2obj and converting LLVM binaries, I found them to be a source of all kinds of problems. If we can use yaml2obj for that out of the box, I think we should.
If we can't, then it is probably OK to use binaries temporarily, but a bug should be reported about the missing yaml2obj features and the test should have a FIXME comment with a link.

In D76252#1928528, @grimar wrote:

I think there is no universal approach, and I agree with both of you. For example, for testing relocation relaxations I'd definitely prefer using YAML because it is important to get the particular opcodes and relocations in the inputs.
It is probably easier to use YAML than to use llvm-mc and verify the inputs with objdump or the like. For many other things, using llvm-mc is probably a bit more natural.

What I do not agree with is that committing binaries is OK. From my last experience working on yaml2obj and converting LLVM binaries, I found them to be a source of all kinds of problems. If we can use yaml2obj for that out of the box, I think we should.
If we can't, then it is probably OK to use binaries temporarily, but a bug should be reported about the missing yaml2obj features and the test should have a FIXME comment with a link.

I think that we are on the same page. I'm not recommending submitting binary files as a permanent measure. What I'm saying is that if a test file can eventually be produced by mach-o lld itself, then we should allow binary files as a temporary measure. For example, it is okay to submit a shared object file as a binary file as a temporary measure until lld is able to produce a shared library file.

I agree with the previous discussions about llvm-mc vs yaml2obj. Most of the time an assembly test is easier to read. In some cases yaml2obj can be more convenient. One notable limitation of llvm-mc is that it cannot create invalid inputs.

As to binary blobs: I am another disliker. I hope we can avoid them if possible. This may require some contribution to yaml2obj.

1) contribute to yaml2obj 2) improve lld Mach-O and add YAML tests

is better than

1) improve lld Mach-O and add binary tests 2) contribute to yaml2obj 3) convert binary tests to YAML tests

Note that the latter can increase the repository size indefinitely with these binary blobs. I am concerned about some practice in some areas of llvm-project, in particular, XCOFF in Object, and llvm-objdump Mach-O.

Just to be clear, none of the tests right now would benefit from additional work on yaml2obj. We just need lld-macho to produce dylibs, which I expect I'll have a diff for by EOW. In the meantime I'll leave the tests as-is in YAML, and I'm happy to wait for the dylib-production diff to be approved before landing everything as a stack, so we don't have to worry about sullying trunk with undesired test formats :)

alphabetize

Harbormaster completed remote builds in B49830: Diff 251544. Mar 19 2020, 8:50 PM

Ktwu mentioned this in D76742: [lld-macho] Add basic symbol table output. Mar 24 2020, 5:14 PM

Ktwu added a child revision: D76742: [lld-macho] Add basic symbol table output. Mar 24 2020, 5:21 PM

clang-tidy

int3 added a child revision: D76839: [lld-macho] Extend SyntheticSections to cover all segment load commands. Mar 26 2020, 5:24 AM

Harbormaster completed remote builds in B50522: Diff 252809. Mar 26 2020, 6:28 AM

use objdump --bind instead of obj2yaml to get symbol addresses

int3 mentioned this in D76908: [lld-macho] Add support for emitting dylibs with a single symbol. Mar 27 2020, 1:12 AM

Harbormaster failed remote builds in B50662: Diff 253051! Mar 27 2020, 1:34 AM

fix test

Harbormaster failed remote builds in B50761: Diff 253272! Mar 27 2020, 7:17 PM

int3 added a reviewer: Ktwu. Mar 28 2020, 1:37 AM

int3 added a reviewer: gkm.

ruiu added inline comments. Mar 30 2020, 11:30 PM

lld/MachO/InputFiles.cpp
215	Can you add a brief comment here that we initialize `dylibName`.
223	Ditto -- we are initializing `symbols`.
lld/MachO/Symbols.h
79	Isn't uint32_t enough? It's not a big deal, but I'd use uint32_t for an index of GOT.
lld/MachO/SyntheticSections.h
34	So GOT has no data but dynamic relocations? If so, I'd leave a brief comment here.
36	nit: add a blank line before a label.
lld/MachO/Writer.cpp
300	Replace auto with a concrete type.
357–360	nit: we usually do os << foo << bar << baz << fizz; instead of os << foo << bar; os << baz << fizz;

int3 marked 8 inline comments as done. Mar 31 2020, 11:39 AM

int3 added inline comments.

lld/MachO/Writer.cpp
357–360	sgtm. Any thoughts on whether we should make a wrapper class like the ByteBuffer in ReaderWriter/MachO? I didn't see a whole lot of usage of `raw_ostream`s in the other lld implementations but I didn't find any abstraction layer over them either

address comments

static_cast

Harbormaster completed remote builds in B51163: Diff 253943. Mar 31 2020, 12:33 PM

Harbormaster completed remote builds in B51160: Diff 253940.

clang-format

int3 added reviewers: Ktwu, gkm. Mar 31 2020, 1:01 PM

int3 added a subscriber: grimar.

int3 marked an inline comment as done. Mar 31 2020, 4:03 PM

int3 added inline comments.

lld/MachO/SyntheticSections.h
34	Actually, I'm not sure it's right to say that it has "no data" -- it does actually occupy space in the binary (unlike the __PAGEZERO segment, a section can't have zero filesize while having a non-zero-length address range), so I think saying it contains all zeroes is more accurate.

rephrase comments

Harbormaster completed remote builds in B51215: Diff 254040. Mar 31 2020, 5:06 PM

forgot to address one more

Harbormaster completed remote builds in B51223: Diff 254050. Mar 31 2020, 6:12 PM

ruiu added inline comments. Mar 31 2020, 9:52 PM

lld/MachO/Writer.cpp
357–360	We don't usually use `raw_ostream` because we usually write to the output file in two passes: in the first pass, we compute an offset for each output element, and in the second pass, we let each output element copy itself to the output file. However, that two-pass technique cannot be used to create ULEB-encoded data, because we don't know the exact size of each output element until we fix the file contents. Constructing ULEB-encoded section contents is naturally sequential, so the use of `raw_ostream` for this section is fine. I don't see any problem with that.
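
The point about ULEB-encoded contents being naturally sequential can be illustrated with a small sketch. `std::ostream` stands in for `llvm::raw_ostream` here, and `writeUleb128` and `buildOffsets` are hypothetical helpers rather than code from this diff. The encoded length depends on the value being written, so the position of each element is only known once everything before it has been emitted.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Write `value` to `os` as ULEB128 and return the number of bytes written.
// Values below 0x80 take one byte; larger values take more.
unsigned writeUleb128(uint64_t value, std::ostream &os) {
  unsigned count = 0;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // continuation bit
    os.put(static_cast<char>(byte));
    ++count;
  } while (value != 0);
  return count;
}

// Hypothetical section builder: appends each offset right after the previous
// one, so no per-element file offset has to be computed up front.
std::string buildOffsets(const uint64_t *offsets, unsigned n) {
  std::ostringstream os;
  for (unsigned i = 0; i < n; ++i)
    writeUleb128(offsets[i], os);
  return os.str();
}
```

With fixed-size records a first pass could assign offsets before any byte is written; here the total size only falls out of the sequential write.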

This LGTM with the comments. @ruiu, @MaskRay, any further comments? I think it's fine leaving the tests as YAML for now, since subsequent diffs in the stack will enable LLD to produce dylibs itself and we'll replace the tests at that point.

lld/MachO/InputFiles.cpp
225	Is it correct to just consider LC_SYMTAB for this, or should we also be consulting LC_DYSYMTAB and/or the export trie? If so, we don't have to address that in this diff, but we should add a TODO. We should also add a TODO about handling a missing LC_SYMTAB.
lld/MachO/InputFiles.h
37	Where is this used?
lld/MachO/SyntheticSections.h
24	It's kinda interesting to me that GotSection would inherit from InputSection, since it's not an input section, of course. LLD ELF explains this design decision like so: "Synthetic sections are designed as input sections as opposed to output sections because we want to allow them to be manipulated using linker scripts just like other input sections from regular files." For Mach-O, we don't have linker scripts, so perhaps that reasoning doesn't hold. COFF uses Chunks instead of InputSections, since there's more synthesized data (https://lld.llvm.org/NewLLD.html#important-data-structures). I think Mach-O has a good amount of linker-synthesized data and input section processing as well, so a COFF-like design might make more sense. It's hard to say until we have more of the implementation though. For now, I think leaving this inheritance as-is is fine, but we should add a TODO to reconsider it once we've implemented more of the linker. @ruiu, @MaskRay, what do you think?

int3 marked 2 inline comments as done. Apr 9 2020, 5:11 PM

int3 added inline comments.

lld/MachO/InputFiles.cpp
225	I think the export trie is only consumed by dyld. Let me see what the deal is with LC_DYSYMTAB...
lld/MachO/InputFiles.h
37	`error("dylib " + path + " missing LC_ID_DYLIB load command");` in InputFiles.cpp
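
The "missing LC_ID_DYLIB" error discussed in this thread boils down to walking the load commands and checking whether one of a given type is present. The following is a minimal sketch under stated assumptions: a 64-bit Mach-O image whose byte order matches the host, the `LC_ID_DYLIB` value from `<mach-o/loader.h>`, and hypothetical names (`hasLoadCommand`, `read32`, the size constants) that are not taken from InputFiles.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint32_t LC_ID_DYLIB = 0xd;    // from <mach-o/loader.h>
constexpr size_t kMachHeader64Size = 32; // sizeof(mach_header_64)
constexpr size_t kNcmdsOffset = 16;      // offset of mach_header_64::ncmds

// Read a 32-bit field; assumes the file and the host share a byte order.
uint32_t read32(const uint8_t *p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Hypothetical helper: true if the image contains a load command of type
// `cmd`. Returns false for truncated or malformed command lists.
bool hasLoadCommand(const std::vector<uint8_t> &buf, uint32_t cmd) {
  if (buf.size() < kMachHeader64Size)
    return false;
  uint32_t ncmds = read32(buf.data() + kNcmdsOffset);
  size_t pos = kMachHeader64Size;
  for (uint32_t i = 0; i < ncmds; ++i) {
    if (buf.size() - pos < 8) // need room for cmd + cmdsize
      return false;
    uint32_t type = read32(buf.data() + pos);
    uint32_t cmdsize = read32(buf.data() + pos + 4);
    if (type == cmd)
      return true;
    if (cmdsize < 8 || cmdsize > buf.size() - pos)
      return false;
    pos += cmdsize;
  }
  return false;
}
```

A caller in the spirit of the error message quoted above would check `hasLoadCommand(buf, LC_ID_DYLIB)` and report the dylib's path when it returns `false`.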

add TODO; use reinterpret_cast

int3 added inline comments. Apr 9 2020, 8:57 PM

lld/MachO/InputFiles.cpp
225	Okay, I looked through loader.h; didn't entirely understand it, but I think LC_DYSYMTAB just contains additional info about / indirect references to symbols in LC_SYMTAB.

Harbormaster failed remote builds in B52628: Diff 256500! Apr 9 2020, 9:25 PM

int3 removed a child revision: D76742: [lld-macho] Add basic symbol table output. Apr 10 2020, 12:17 AM

int3 mentioned this in D76839: [lld-macho] Extend SyntheticSections to cover all segment load commands. Apr 10 2020, 12:21 AM

update

Harbormaster failed remote builds in B52887: Diff 256899! Apr 12 2020, 6:42 PM

I'll give other reviewers a few more days to take a look.

lld/MachO/InputFiles.cpp
235	We shouldn't be doing this for symbols that are undefined in the dylib's symbol table, right? (And if so, we should add a test for that, though I don't think that'll be possible till after your follow-ups.) Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable. Whether or not we do that depends on ld64's behavior there.

MaskRay added inline comments. Apr 15 2020, 10:31 AM

lld/MachO/InputFiles.cpp
235	"Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable." Yes. See D57385 and D57569.

MaskRay added inline comments. Apr 15 2020, 10:32 AM

lld/MachO/InputFiles.cpp
235	It is also used to set the `exportDynamic` property of a symbol. When linking an executable, the symbols used by DSOs will be added to `.dynsym`

int3 marked 2 inline comments as done. Apr 15 2020, 3:24 PM

int3 added inline comments.

lld/MachO/InputFiles.cpp
235	I think re-exported symbols also appear as undefined in the symbol table... I need to figure out how to distinguish them. I'll add a TODO.
lld/MachO/InputFiles.h
37	Actually, I see that we already have `getName()`. It just doesn't always work because we neglected to copy the strings from our input arguments before discarding them. I'll fix it and use that

use getName(); add TODO for undefined symbols

Harbormaster failed remote builds in B53443: Diff 257875! Apr 15 2020, 4:02 PM

rebase

int3 removed a parent revision: D75382: [lld] Initial commit for new Mach-O backend. Apr 16 2020, 1:20 PM

Harbormaster failed remote builds in B53616: Diff 258139! Apr 16 2020, 1:24 PM

LGTM with the comments addressed. I can commit this for you once they are.

lld/MachO/Driver.cpp
92–93	I believe these should actually be placed on the search path after any user-specified `-L` directories. At least that's what ld64 appears to do, based on the output from its `-v` option.
lld/MachO/InputSection.cpp
34	Nit: when one branch of an if-else has braces, all the others should too.
36	(same here)
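
The ordering requested for Driver.cpp above (user-specified `-L` directories first, system defaults afterwards) can be sketched like this. The two default directories and both helper names (`buildSearchPaths`, `findDylib`) are assumptions made for illustration; the existence check is injected as a callback so the sketch does not touch the real filesystem.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: user -L directories come first, in command-line order,
// followed by the assumed system defaults.
std::vector<std::string>
buildSearchPaths(const std::vector<std::string> &userDirs) {
  std::vector<std::string> paths(userDirs);
  paths.push_back("/usr/lib");
  paths.push_back("/usr/local/lib");
  return paths;
}

// Hypothetical helper: return the first "<dir>/lib<name>.dylib" for which
// `exists` reports true, or an empty string if no directory has it.
std::string findDylib(const std::vector<std::string> &paths,
                      const std::string &name,
                      const std::function<bool(const std::string &)> &exists) {
  for (const std::string &dir : paths) {
    std::string candidate = dir + "/lib" + name + ".dylib";
    if (exists(candidate))
      return candidate;
  }
  return "";
}
```

Because the lookup stops at the first hit, appending the defaults after the user directories is what lets a `-L` directory shadow a system copy of the same library.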

This revision is now accepted and ready to land. Apr 20 2020, 4:20 PM

int3 marked an inline comment as done. Apr 20 2020, 10:37 PM

int3 added inline comments.

lld/MachO/Driver.cpp
92–93	Oh good catch. I'll make our `-v` option emit that too so we can test for it

address comments

Harbormaster completed remote builds in B54040: Diff 258902. Apr 20 2020, 10:45 PM

Closed by commit rG060efd24c7f0: [lld-macho] Add basic support for linking against dylibs (authored by int3, committed by smeenai). Apr 21 2020, 2:06 PM

This revision was automatically updated to reflect the committed changes.

smeenai mentioned this in rG9598778bd191: [lld-macho] Add support for emitting dylibs with a single symbol. Apr 27 2020, 2:02 PM

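Revision Contents

Path	Size
lld/MachO/Arch/X86_64.cpp	6 lines
lld/MachO/CMakeLists.txt	1 line
lld/MachO/Config.h	4 lines
lld/MachO/Driver.cpp	48 lines
lld/MachO/InputFiles.h	13 lines
lld/MachO/InputFiles.cpp	45 lines
lld/MachO/InputSection.h	6 lines
lld/MachO/InputSection.cpp	14 lines
lld/MachO/Options.td	9 lines
lld/MachO/OutputSegment.h	5 lines
lld/MachO/SymbolTable.h	6 lines
lld/MachO/SymbolTable.cpp	10 lines
lld/MachO/Symbols.h	24 lines
lld/MachO/SyntheticSections.h	52 lines
lld/MachO/SyntheticSections.cpp	36 lines
lld/MachO/Target.h	3 lines
lld/MachO/Writer.h	2 lines
lld/MachO/Writer.cpp	103 lines
lld/test/MachO/Inputs/goodbye-dylib.yaml	175 lines
lld/test/MachO/Inputs/hello-dylib.yaml	169 lines
lld/test/MachO/Inputs/no-id-dylib.yaml	160 lines
lld/test/MachO/dylink.s	35 lines
lld/test/MachO/missing-dylib.s	5 lines
lld/test/MachO/no-id-dylink.s	13 lines
lld/test/MachO/search-paths.test	12 lines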

Diff 259101

lld/MachO/Arch/X86_64.cpp

	Show All 26 Lines
	X86_64::X86_64() {			X86_64::X86_64() {
	cpuType = CPU_TYPE_X86_64;			cpuType = CPU_TYPE_X86_64;
	cpuSubtype = CPU_SUBTYPE_X86_64_ALL;			cpuSubtype = CPU_SUBTYPE_X86_64_ALL;
	}			}

	uint64_t X86_64::getImplicitAddend(const uint8_t *loc, uint8_t type) const {			uint64_t X86_64::getImplicitAddend(const uint8_t *loc, uint8_t type) const {
	switch (type) {			switch (type) {
	case X86_64_RELOC_SIGNED:			case X86_64_RELOC_SIGNED:
				case X86_64_RELOC_GOT_LOAD:
	return read32le(loc);			return read32le(loc);
	default:			default:
	error("TODO: Unhandled relocation type " + std::to_string(type));			error("TODO: Unhandled relocation type " + std::to_string(type));
				MaskRayUnsubmitted Done Reply Inline Actions This message enhancement should be moved to the initial patch. MaskRay: This message enhancement should be moved to the initial patch.
				ruiuUnsubmitted Done Reply Inline Actions Yeah, but I'm not worried about the exact order and the contents of initial patch series, as this is going to be submitted as soon as the first patch is submitted, and no one will be using the feature only with the first patch, so I think I'm fine with this. ruiu: Yeah, but I'm not worried about the exact order and the contents of initial patch series, as…
				MaskRayUnsubmitted Not Done Reply Inline Actions I would argue that when sending patches as a series, we should make extra efforts keeping each commit clean and avoiding unneeded code move if possible... I think the this patch is good in its current status but will hope Jez can make some final cleanups to avoid subsequent formatting/cleanup changes. MaskRay: I would argue that when sending patches as a series, we should make extra efforts keeping each…
				int3AuthorUnsubmitted Done Reply Inline Actions no worries, wasn't hard to split out int3: no worries, wasn't hard to split out
	return 0;			return 0;
	}			}
	}			}

	void X86_64::relocateOne(uint8_t *loc, uint8_t type, uint64_t val) const {			void X86_64::relocateOne(uint8_t *loc, uint8_t type, uint64_t val) const {
	switch (type) {			switch (type) {
	case X86_64_RELOC_SIGNED:			case X86_64_RELOC_SIGNED:
	// This type is only used for pc-relative relocations, so offset by 4 since			case X86_64_RELOC_GOT_LOAD:
	// the RIP has advanced by 4 at this point.			// These types are only used for pc-relative relocations, so offset by 4
				smeenai (Done): Nit: change the two spaces between "are" and "only" to one
				// since the RIP has advanced by 4 at this point.
	write32le(loc, val - 4);			write32le(loc, val - 4);
	break;			break;
	default:			default:
	llvm_unreachable(			llvm_unreachable(
	"getImplicitAddend should have flagged all unhandled relocation types");			"getImplicitAddend should have flagged all unhandled relocation types");
	}			}
	}			}

	} // namespace			} // namespace

	TargetInfo *macho::createX86_64TargetInfo() {			TargetInfo *macho::createX86_64TargetInfo() {
	static X86_64 t;			static X86_64 t;
	return &t;			return &t;
	}			}
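
The "offset by 4" comment in relocateOne above can be illustrated with a small standalone sketch. The helper name `ripRelativeDisp` is hypothetical and is not part of lld. It assumes the 4-byte displacement field is the last part of the instruction, so the RIP the CPU uses is the field address plus 4.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not lld code): computes the 32-bit displacement to
// store in a RIP-relative operand.
//   fieldAddr  - virtual address of the 4-byte displacement field
//   targetAddr - virtual address being referenced
// Assumption: the field ends the instruction, so RIP == fieldAddr + 4 when
// the displacement is applied.
inline int32_t ripRelativeDisp(uint64_t fieldAddr, uint64_t targetAddr) {
  const int64_t delta = static_cast<int64_t>(targetAddr - (fieldAddr + 4));
  // A rel32 operand can only reach targets within +/- 2 GiB.
  assert(delta >= INT32_MIN && delta <= INT32_MAX &&
         "target out of rel32 range");
  return static_cast<int32_t>(delta);
}
```

This mirrors the arithmetic in the diff, where `val` is already "target minus field address" by the time relocateOne runs and the function only subtracts the remaining 4.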

lld/MachO/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS Options.td)			set(LLVM_TARGET_DEFINITIONS Options.td)
	tablegen(LLVM Options.inc -gen-opt-parser-defs)			tablegen(LLVM Options.inc -gen-opt-parser-defs)
	add_public_tablegen_target(MachOOptionsTableGen)			add_public_tablegen_target(MachOOptionsTableGen)

	add_lld_library(lldMachO2			add_lld_library(lldMachO2
	Arch/X86_64.cpp			Arch/X86_64.cpp
	Driver.cpp			Driver.cpp
	InputFiles.cpp			InputFiles.cpp
	InputSection.cpp			InputSection.cpp
	OutputSegment.cpp			OutputSegment.cpp
	SymbolTable.cpp			SymbolTable.cpp
	Symbols.cpp			Symbols.cpp
				SyntheticSections.cpp
	Target.cpp			Target.cpp
	Writer.cpp			Writer.cpp

	LINK_COMPONENTS			LINK_COMPONENTS
	${LLVM_TARGETS_TO_BUILD}			${LLVM_TARGETS_TO_BUILD}
	BinaryFormat			BinaryFormat
	Core			Core
	Object			Object
	Show All 11 Lines

lld/MachO/Config.h

	//===- Config.h -------------------------------------------------- C++ --===//			//===- Config.h -------------------------------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLD_MACHO_CONFIG_H			#ifndef LLD_MACHO_CONFIG_H
	#define LLD_MACHO_CONFIG_H			#define LLD_MACHO_CONFIG_H

	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"

				#include <vector>

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class Symbol;			class Symbol;

	struct Configuration {			struct Configuration {
	llvm::StringRef outputFile;			llvm::StringRef outputFile;
	Symbol *entry;			Symbol *entry;

				std::vector<llvm::StringRef> searchPaths;
	};			};

	extern Configuration *config;			extern Configuration *config;

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/Driver.cpp

Show All 15 Lines
#include "Writer.h"		#include "Writer.h"

#include "lld/Common/Args.h"		#include "lld/Common/Args.h"
#include "lld/Common/Driver.h"		#include "lld/Common/Driver.h"
#include "lld/Common/ErrorHandler.h"		#include "lld/Common/ErrorHandler.h"
#include "lld/Common/LLVM.h"		#include "lld/Common/LLVM.h"
#include "lld/Common/Memory.h"		#include "lld/Common/Memory.h"
#include "lld/Common/Version.h"		#include "lld/Common/Version.h"
		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/MachO.h"		#include "llvm/BinaryFormat/MachO.h"
#include "llvm/BinaryFormat/Magic.h"		#include "llvm/BinaryFormat/Magic.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
#include "llvm/Option/Option.h"		#include "llvm/Option/Option.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"

using namespace llvm;		using namespace llvm;
Show All 31 Lines	opt::InputArgList MachOOptTable::parse(ArrayRef<const char *> argv) {
if (missingCount)		if (missingCount)
error(Twine(args.getArgString(missingIndex)) + ": missing argument");		error(Twine(args.getArgString(missingIndex)) + ": missing argument");

for (opt::Arg *arg : args.filtered(OPT_UNKNOWN))		for (opt::Arg *arg : args.filtered(OPT_UNKNOWN))
error("unknown argument: " + arg->getSpelling());		error("unknown argument: " + arg->getSpelling());
return args;		return args;
}		}

		// This is for -lfoo. We'll look for libfoo.dylib from search paths.
		static Optional<std::string> findDylib(StringRef name) {
		for (StringRef dir : config->searchPaths) {
		std::string path = (dir + "/lib" + name + ".dylib").str();
		if (fs::exists(path))
		return path;
		}
		error("library not found for -l" + name);
		return None;
		}

static TargetInfo *createTargetInfo(opt::InputArgList &args) {		static TargetInfo *createTargetInfo(opt::InputArgList &args) {
StringRef s = args.getLastArgValue(OPT_arch, "x86_64");		StringRef s = args.getLastArgValue(OPT_arch, "x86_64");
if (s != "x86_64")		if (s != "x86_64")
error("missing or unsupported -arch " + s);		error("missing or unsupported -arch " + s);
return createX86_64TargetInfo();		return createX86_64TargetInfo();
}		}

		static std::vector<StringRef> getSearchPaths(opt::InputArgList &args) {
		std::vector<StringRef> ret{args::getStrings(args, OPT_L)};
		if (!args.hasArg(OPT_Z)) {
		ret.push_back("/usr/lib");
		smeenai (Not Done): I believe these should actually be placed on the search path after any user-specified `-L` directories. At least that's what ld64 appears to do, based on the output from its `-v` option.
		int3 (Author, Done): Oh good catch. I'll make our `-v` option emit that too so we can test for it
		ret.push_back("/usr/local/lib");
		}
		return ret;
		}

static void addFile(StringRef path) {		static void addFile(StringRef path) {
Optional<MemoryBufferRef> buffer = readFile(path);		Optional<MemoryBufferRef> buffer = readFile(path);
if (!buffer)		if (!buffer)
return;		return;
MemoryBufferRef mbref = *buffer;		MemoryBufferRef mbref = *buffer;

switch (identify_magic(mbref.getBuffer())) {		switch (identify_magic(mbref.getBuffer())) {
case file_magic::macho_object:		case file_magic::macho_object:
inputFiles.push_back(make<ObjFile>(mbref));		inputFiles.push_back(make<ObjFile>(mbref));
break;		break;
		case file_magic::macho_dynamically_linked_shared_lib:
		inputFiles.push_back(make<DylibFile>(mbref));
		break;
default:		default:
error(path + ": unhandled file type");		error(path + ": unhandled file type");
}		}
}		}

bool macho::link(llvm::ArrayRef<const char *> argsArr, bool canExitEarly,		bool macho::link(llvm::ArrayRef<const char *> argsArr, bool canExitEarly,
raw_ostream &stdoutOS, raw_ostream &stderrOS) {		raw_ostream &stdoutOS, raw_ostream &stderrOS) {
lld::stdoutOS = &stdoutOS;		lld::stdoutOS = &stdoutOS;
lld::stderrOS = &stderrOS;		lld::stderrOS = &stderrOS;

MachOOptTable parser;		MachOOptTable parser;
opt::InputArgList args = parser.parse(argsArr.slice(1));		opt::InputArgList args = parser.parse(argsArr.slice(1));

if (args.hasArg(OPT_v)) {
message(getLLDVersion());
freeArena();
return !errorCount();
}

config = make<Configuration>();		config = make<Configuration>();
symtab = make<SymbolTable>();		symtab = make<SymbolTable>();
target = createTargetInfo(args);		target = createTargetInfo(args);

config->entry = symtab->addUndefined(args.getLastArgValue(OPT_e, "_main"));		config->entry = symtab->addUndefined(args.getLastArgValue(OPT_e, "_main"));
config->outputFile = args.getLastArgValue(OPT_o, "a.out");		config->outputFile = args.getLastArgValue(OPT_o, "a.out");
		config->searchPaths = getSearchPaths(args);

		if (args.hasArg(OPT_v)) {
		message(getLLDVersion());
		std::vector<StringRef> &searchPaths = config->searchPaths;
		message("Library search paths:\n" +
		llvm::join(searchPaths.begin(), searchPaths.end(), "\n"));
		freeArena();
		return !errorCount();
		}

getOrCreateOutputSegment("__TEXT", VM_PROT_READ | VM_PROT_EXECUTE);		getOrCreateOutputSegment("__TEXT", VM_PROT_READ | VM_PROT_EXECUTE);
getOrCreateOutputSegment("__DATA", VM_PROT_READ | VM_PROT_WRITE);		getOrCreateOutputSegment("__DATA", VM_PROT_READ | VM_PROT_WRITE);
		getOrCreateOutputSegment("__DATA_CONST", VM_PROT_READ | VM_PROT_WRITE);

for (opt::Arg *arg : args) {		for (opt::Arg *arg : args) {
switch (arg->getOption().getID()) {		switch (arg->getOption().getID()) {
case OPT_INPUT:		case OPT_INPUT:
addFile(arg->getValue());		addFile(arg->getValue());
break;		break;
		case OPT_l:
		if (Optional<std::string> path = findDylib(arg->getValue()))
		addFile(*path);
		break;
}		}
}		}

if (!isa<Defined>(config->entry)) {		if (!isa<Defined>(config->entry)) {
error("undefined symbol: " + config->entry->getName());		error("undefined symbol: " + config->entry->getName());
return false;		return false;
}		}

		createSyntheticSections();

// Initialize InputSections.		// Initialize InputSections.
for (InputFile *file : inputFiles)		for (InputFile *file : inputFiles)
for (InputSection *sec : file->sections)		for (InputSection *sec : file->sections)
inputSections.push_back(sec);		inputSections.push_back(sec);

// Add input sections to output segments.		// Add input sections to output segments.
for (InputSection *isec : inputSections) {		for (InputSection *isec : inputSections) {
OutputSegment *os =		OutputSegment *os =
getOrCreateOutputSegment(isec->segname, VM_PROT_READ | VM_PROT_WRITE);		getOrCreateOutputSegment(isec->segname, VM_PROT_READ | VM_PROT_WRITE);
		isec->parent = os;
os->sections[isec->name].push_back(isec);		os->sections[isec->name].push_back(isec);
}		}

// Write to an output file.		// Write to an output file.
writeResult();		writeResult();

if (canExitEarly)		if (canExitEarly)
exitLld(errorCount() ? 1 : 0);		exitLld(errorCount() ? 1 : 0);

freeArena();		freeArena();
return !errorCount();		return !errorCount();
}		}
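
The search-path ordering discussed in the inline comments (user `-L` directories first, then the defaults unless `-Z` is given) and the `-lfoo` to `libfoo.dylib` mapping can be sketched without any LLVM dependencies. The names `buildSearchPaths` and `dylibCandidate` are hypothetical stand-ins for getSearchPaths and the path construction inside findDylib. The sketch only builds strings and does not touch the filesystem.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for getSearchPaths(): user-specified -L directories
// come first, followed by the default directories unless -Z was passed.
inline std::vector<std::string>
buildSearchPaths(const std::vector<std::string> &userDirs, bool noDefaultDirs) {
  std::vector<std::string> paths(userDirs);
  if (!noDefaultDirs) {
    paths.push_back("/usr/lib");
    paths.push_back("/usr/local/lib");
  }
  return paths;
}

// Hypothetical helper: the file that -l<name> would look for inside `dir`.
inline std::string dylibCandidate(const std::string &dir,
                                  const std::string &name) {
  assert(!name.empty() && "-l requires a library name");
  return dir + "/lib" + name + ".dylib";
}
```

A caller would walk the result of `buildSearchPaths` in order and take the first candidate that exists, which is what gives earlier `-L` directories priority over the defaults.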

lld/MachO/InputFiles.h

	Show All 21 Lines
	class InputSection;			class InputSection;
	class Symbol;			class Symbol;
	struct Reloc;			struct Reloc;

	class InputFile {			class InputFile {
	public:			public:
	enum Kind {			enum Kind {
	ObjKind,			ObjKind,
				DylibKind,
	};			};

	virtual ~InputFile() = default;			virtual ~InputFile() = default;

	Kind kind() const { return fileKind; }			Kind kind() const { return fileKind; }
	StringRef getName() const { return mb.getBufferIdentifier(); }			StringRef getName() const { return mb.getBufferIdentifier(); }

	MemoryBufferRef mb;			MemoryBufferRef mb;
				smeenai (Not Done): Where is this used?
				int3 (Author, Done): `error("dylib " + path + " missing LC_ID_DYLIB load command");` in InputFiles.cpp
				int3 (Author, Done): Actually, I see that we already have `getName()`. It just doesn't always work because we neglected to copy the strings from our input arguments before discarding them. I'll fix it and use that
	std::vector<Symbol *> symbols;			std::vector<Symbol *> symbols;
	std::vector<InputSection *> sections;			std::vector<InputSection *> sections;
	StringRef dylibName;

	protected:			protected:
	InputFile(Kind kind, MemoryBufferRef mb) : mb(mb), fileKind(kind) {}			InputFile(Kind kind, MemoryBufferRef mb) : mb(mb), fileKind(kind) {}

	std::vector<InputSection *> parseSections(ArrayRef<llvm::MachO::section_64>);			std::vector<InputSection *> parseSections(ArrayRef<llvm::MachO::section_64>);

	void parseRelocations(const llvm::MachO::section_64 &,			void parseRelocations(const llvm::MachO::section_64 &,
	std::vector<Reloc> &relocs);			std::vector<Reloc> &relocs);

	private:			private:
	const Kind fileKind;			const Kind fileKind;
	};			};

	// .o file			// .o file
	class ObjFile : public InputFile {			class ObjFile : public InputFile {
	public:			public:
	explicit ObjFile(MemoryBufferRef mb);			explicit ObjFile(MemoryBufferRef mb);
	static bool classof(const InputFile *f) { return f->kind() == ObjKind; }			static bool classof(const InputFile *f) { return f->kind() == ObjKind; }
	};			};

				// .dylib file
				class DylibFile : public InputFile {
				public:
				explicit DylibFile(MemoryBufferRef mb);
				static bool classof(const InputFile *f) { return f->kind() == DylibKind; }

				StringRef dylibName;
				uint64_t ordinal = 0; // Ordinal numbering starts from 1, so 0 is a sentinel
				};

	extern std::vector<InputFile *> inputFiles;			extern std::vector<InputFile *> inputFiles;

	llvm::Optional<MemoryBufferRef> readFile(StringRef path);			llvm::Optional<MemoryBufferRef> readFile(StringRef path);

	} // namespace macho			} // namespace macho

	std::string toString(const macho::InputFile *file);			std::string toString(const macho::InputFile *file);
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/InputFiles.cpp

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	Optional<MemoryBufferRef> macho::readFile(StringRef path) {
if (auto ec = mbOrErr.getError()) {		if (auto ec = mbOrErr.getError()) {
error("cannot open " + path + ": " + ec.message());		error("cannot open " + path + ": " + ec.message());
return None;		return None;
}		}

std::unique_ptr<MemoryBuffer> &mb = *mbOrErr;		std::unique_ptr<MemoryBuffer> &mb = *mbOrErr;
MemoryBufferRef mbref = mb->getMemBufferRef();		MemoryBufferRef mbref = mb->getMemBufferRef();
make<std::unique_ptr<MemoryBuffer>>(std::move(mb)); // take mb ownership		make<std::unique_ptr<MemoryBuffer>>(std::move(mb)); // take mb ownership

		// If this is a regular non-fat file, return it.
		smeenai (Done): Universal binary support could technically be a diff by itself. I'd prefer to do so, although it's small enough that I'm not super opposed to folding it into this diff. We definitely need to add tests for it either way. llvm-lipo is complete enough to be used for the tests. I'd also note in the commit message that some of this functionality is being restored from @pcc and @ruiu's initial commit, for completeness.
		int3 (Author, Done): Fair enough, will split into another diff
		const char *buf = mbref.getBufferStart();
		auto *hdr = reinterpret_cast<const MachO::fat_header *>(buf);
		if (read32be(&hdr->magic) != MachO::FAT_MAGIC)
return mbref;		return mbref;

		error("TODO: Add support for universal binaries");
		return None;
}		}

static const load_command *findCommand(const mach_header_64 *hdr,		static const load_command *findCommand(const mach_header_64 *hdr,
uint32_t type) {		uint32_t type) {
		smeenai (Not Done): We should add some sort of checking here to prevent going out of bounds of the file (if you have a ridiculously large `nfat_arch` in a malformed file, for example). We should also add tests for the error conditions here if possible (assuming yaml2obj lets us construct malformed universal binaries that would trigger the conditions).
const uint8_t *p =		const uint8_t *p =
reinterpret_cast<const uint8_t *>(hdr) + sizeof(mach_header_64);		reinterpret_cast<const uint8_t *>(hdr) + sizeof(mach_header_64);

for (uint32_t i = 0, n = hdr->ncmds; i < n; ++i) {		for (uint32_t i = 0, n = hdr->ncmds; i < n; ++i) {
auto *cmd = reinterpret_cast<const load_command *>(p);		auto *cmd = reinterpret_cast<const load_command *>(p);
if (cmd->cmd == type)		if (cmd->cmd == type)
return cmd;		return cmd;
p += cmd->cmdsize;		p += cmd->cmdsize;
}		}
return nullptr;		return nullptr;
		smeenai (Not Done): I believe ld64 warns if you give it a universal binary input file and it can't find any slices for the architecture being linked. We should do the same.
}		}

std::vector<InputSection *>		std::vector<InputSection *>
InputFile::parseSections(ArrayRef<section_64> sections) {		InputFile::parseSections(ArrayRef<section_64> sections) {
std::vector<InputSection *> ret;		std::vector<InputSection *> ret;
ret.reserve(sections.size());		ret.reserve(sections.size());

auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());		auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	ObjFile::ObjFile(MemoryBufferRef mb) : InputFile(ObjKind, mb) {

if (const load_command *cmd = findCommand(hdr, LC_SEGMENT_64)) {		if (const load_command *cmd = findCommand(hdr, LC_SEGMENT_64)) {
auto *c = reinterpret_cast<const segment_command_64 *>(cmd);		auto *c = reinterpret_cast<const segment_command_64 *>(cmd);
objSections = ArrayRef<section_64>{		objSections = ArrayRef<section_64>{
reinterpret_cast<const section_64 *>(c + 1), c->nsects};		reinterpret_cast<const section_64 *>(c + 1), c->nsects};
sections = parseSections(objSections);		sections = parseSections(objSections);
}		}

		// TODO: Error on missing LC_SYMTAB?
if (const load_command *cmd = findCommand(hdr, LC_SYMTAB)) {		if (const load_command *cmd = findCommand(hdr, LC_SYMTAB)) {
auto *c = reinterpret_cast<const symtab_command *>(cmd);		auto *c = reinterpret_cast<const symtab_command *>(cmd);
const char *strtab = reinterpret_cast<const char *>(buf) + c->stroff;		const char *strtab = reinterpret_cast<const char *>(buf) + c->stroff;
ArrayRef<const nlist_64> nList(		ArrayRef<const nlist_64> nList(
reinterpret_cast<const nlist_64 *>(buf + c->symoff), c->nsyms);		reinterpret_cast<const nlist_64 *>(buf + c->symoff), c->nsyms);

symbols.reserve(c->nsyms);		symbols.reserve(c->nsyms);

for (const nlist_64 &sym : nList) {		for (const nlist_64 &sym : nList) {
StringRef name = strtab + sym.n_strx;		StringRef name = strtab + sym.n_strx;

// Undefined symbol		// Undefined symbol
if (!sym.n_sect) {		if (!sym.n_sect) {
error("TODO: Support undefined symbols");		symbols.push_back(symtab->addUndefined(name));
continue;		continue;
}		}

InputSection *isec = sections[sym.n_sect - 1];		InputSection *isec = sections[sym.n_sect - 1];
const section_64 &objSec = objSections[sym.n_sect - 1];		const section_64 &objSec = objSections[sym.n_sect - 1];
uint64_t value = sym.n_value - objSec.addr;		uint64_t value = sym.n_value - objSec.addr;

// Global defined symbol		// Global defined symbol
Show All 13 Lines	if (!sections.empty()) {
auto it = sections.begin();		auto it = sections.begin();
for (const section_64 &sec : objSections) {		for (const section_64 &sec : objSections) {
parseRelocations(sec, (*it)->relocs);		parseRelocations(sec, (*it)->relocs);
++it;		++it;
}		}
}		}
}		}

		DylibFile::DylibFile(MemoryBufferRef mb) : InputFile(DylibKind, mb) {
		auto *buf = reinterpret_cast<const uint8_t *>(mb.getBufferStart());
		auto *hdr = reinterpret_cast<const mach_header_64 *>(mb.getBufferStart());

		// Initialize dylibName.
		if (const load_command *cmd = findCommand(hdr, LC_ID_DYLIB)) {
		ruiu (Done): Can you add a brief comment here that we initialize `dylibName`.
		auto *c = reinterpret_cast<const dylib_command *>(cmd);
		dylibName = reinterpret_cast<const char *>(cmd) + read32le(&c->dylib.name);
		} else {
		error("dylib " + getName() + " missing LC_ID_DYLIB load command");
		return;
		}

		// Initialize symbols.
		ruiu (Done): Ditto -- we are initializing `symbols`.
		if (const load_command *cmd = findCommand(hdr, LC_SYMTAB)) {
		auto *c = reinterpret_cast<const symtab_command *>(cmd);
		smeenai (Not Done): Is it correct to just consider LC_SYMTAB for this, or should we also be consulting LC_DYSYMTAB and/or the export trie? If so, we don't have to address that in this diff, but we should add a TODO. We should also add a TODO about handling a missing LC_SYMTAB.
		int3 (Author, Done): I think the export trie is only consumed by dyld. Let me see what the deal is with LC_DYSYMTAB...
		int3 (Author, Done): Okay, I looked through loader.h; didn't entirely understand it, but I think LC_DYSYMTAB just contains additional info about / indirect references to symbols in LC_SYMTAB.
		const char *strtab = reinterpret_cast<const char *>(buf + c->stroff);
		ArrayRef<const nlist_64> nList(
		reinterpret_cast<const nlist_64 *>(buf + c->symoff), c->nsyms);

		symbols.reserve(c->nsyms);

		for (const nlist_64 &sym : nList) {
		StringRef name = strtab + sym.n_strx;
		// TODO: Figure out what to do about undefined symbols: ignore or warn
		// if unsatisfied? Also make sure we handle re-exported symbols
		smeenai (Not Done): We shouldn't be doing this for symbols that are undefined in the dylib's symbol table, right? (And if so, we should add a test for that, though I don't think that'll be possible till after your follow-ups.) Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable. Whether or not we do that depends on ld64's behavior there.
		MaskRay (Not Done): > Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable. Yes. See D57385 and D57569.
		MaskRay (Not Done): It is also used to set the `exportDynamic` property of a symbol. When linking an executable, the symbols used by DSOs will be added to `.dynsym`.
		int3 (Author, Done): I think re-exported symbols also appear as undefined in the symbol table... I need to figure out how to distinguish them. I'll add a TODO.
		// correctly.
		symbols.push_back(symtab->addDylib(name, this));
		}
		}
		}

// Returns "<internal>" or "baz.o".		// Returns "<internal>" or "baz.o".
std::string lld::toString(const InputFile *file) {		std::string lld::toString(const InputFile *file) {
return file ? std::string(file->getName()) : "<internal>";		return file ? std::string(file->getName()) : "<internal>";
}		}
		smeenai (Done): Unnecessary comment change? We aren't adding support for archive files yet. (CC @Ktwu, this'll need to be adjusted when archive file support is added.)
		smeenai (Done): Is it valid to not have an `LC_ID_DYLIB` load command? If it is, should we figure out a fallback `dylibName`? If not, we should error. (Also, we should add tests for this either way if possible.)
		smeenai (Not Done): Is it valid to not have a symtab? If not, we should error.
		int3 (Author, Done): Not sure why, but unlike the LC_ID_DYLIB case, trying to create a dylib without LC_SYMTAB makes yaml2obj crash. Probably some offsets / indices are off. Looking at ld64's source, it seems that not having an LC_SYMTAB is only an error if LC_DYLD_INFO(_ONLY) is also missing. The dylib I looked at had LC_DYLD_INFO_ONLY, so I'm not sure if such a case arises in practice... Anyway, proper handling of a missing LC_SYMTAB should probably cover the ObjectFile case as well, so I'm inclined to punt on it for now.
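
The out-of-bounds concern raised in the inline comments could be addressed for the load-command walk along the following lines. This is only a sketch under stated assumptions and is not the code in this diff: `LoadCommand` is a simplified, hypothetical mirror of the real `load_command` struct, `findCommandOffset` is a made-up name, and the host is assumed to share the file's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified, hypothetical mirror of the Mach-O load_command header.
struct LoadCommand {
  uint32_t cmd;
  uint32_t cmdsize;
};

// Walks at most `ncmds` load commands stored in [cmds, cmds + size) and
// returns the byte offset of the first one whose `cmd` equals `type`.
// Returns -1 if there is no match or if a command would run past the buffer.
inline int64_t findCommandOffset(const uint8_t *cmds, size_t size,
                                 uint32_t ncmds, uint32_t type) {
  assert((cmds != nullptr || size == 0) && "null buffer with nonzero size");
  size_t off = 0; // invariant: off <= size
  for (uint32_t i = 0; i < ncmds; ++i) {
    // The fixed-size header must fit in the remaining bytes.
    if (size - off < sizeof(LoadCommand))
      return -1;
    LoadCommand lc;
    std::memcpy(&lc, cmds + off, sizeof(lc)); // avoids unaligned reads
    // cmdsize must cover the header and stay inside the buffer.
    if (lc.cmdsize < sizeof(LoadCommand) || lc.cmdsize > size - off)
      return -1;
    if (lc.cmd == type)
      return static_cast<int64_t>(off);
    off += lc.cmdsize;
  }
  return -1;
}
```

Rejecting a `cmdsize` smaller than the header also rules out an infinite loop on a zero-sized command, and an inflated `ncmds` simply runs out of buffer and returns -1.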

lld/MachO/InputSection.h

	Show All 13 Lines
	#include "llvm/ADT/PointerUnion.h"			#include "llvm/ADT/PointerUnion.h"
	#include "llvm/BinaryFormat/MachO.h"			#include "llvm/BinaryFormat/MachO.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class InputFile;			class InputFile;
	class InputSection;			class InputSection;
				class OutputSegment;
	class Symbol;			class Symbol;

	struct Reloc {			struct Reloc {
	uint8_t type;			uint8_t type;
	uint32_t addend;			uint32_t addend;
	uint32_t offset;			uint32_t offset;
	llvm::PointerUnion<Symbol *, InputSection *> target;			llvm::PointerUnion<Symbol *, InputSection *> target;
	};			};

	class InputSection {			class InputSection {
	public:			public:
	void writeTo(uint8_t *buf);			virtual ~InputSection() = default;
				virtual void writeTo(uint8_t *buf);
				virtual size_t getSize() const { return data.size(); }

	InputFile *file = nullptr;			InputFile *file = nullptr;
				OutputSegment *parent = nullptr;
	StringRef name;			StringRef name;
	StringRef segname;			StringRef segname;

	ArrayRef<uint8_t> data;			ArrayRef<uint8_t> data;
	uint64_t addr = 0;			uint64_t addr = 0;
	uint32_t align = 1;			uint32_t align = 1;
	uint32_t flags = 0;			uint32_t flags = 0;

	Show All 9 Lines

lld/MachO/InputSection.cpp

	//===- InputSection.cpp ---------------------------------------------------===//			//===- InputSection.cpp ---------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "InputSection.h"			#include "InputSection.h"
	#include "Symbols.h"			#include "Symbols.h"
				#include "SyntheticSections.h"
	#include "Target.h"			#include "Target.h"
	#include "lld/Common/Memory.h"			#include "lld/Common/Memory.h"
	#include "llvm/Support/Endian.h"			#include "llvm/Support/Endian.h"

	using namespace llvm::MachO;			using namespace llvm::MachO;
	using namespace llvm::support;			using namespace llvm::support;
	using namespace lld;			using namespace lld;
	using namespace lld::macho;			using namespace lld::macho;

	std::vector<InputSection *> macho::inputSections;			std::vector<InputSection *> macho::inputSections;

	void InputSection::writeTo(uint8_t *buf) {			void InputSection::writeTo(uint8_t *buf) {
	memcpy(buf, data.data(), data.size());			memcpy(buf, data.data(), data.size());

	for (Reloc &r : relocs) {			for (Reloc &r : relocs) {
	uint64_t va = 0;			uint64_t va = 0;
	if (auto *s = r.target.dyn_cast<Symbol *>())			if (auto *s = r.target.dyn_cast<Symbol *>()) {
				if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s)) {
				va = in.got->addr - ImageBase + dylibSymbol->gotIndex * WordSize;
				} else {
	va = s->getVA();			va = s->getVA();
	else if (auto *isec = r.target.dyn_cast<InputSection *>())			}
				} else if (auto *isec = r.target.dyn_cast<InputSection *>()) {
				smeenai (Not Done): Nit: when one branch of an if-else has braces, all the others should too.
	va = isec->addr;			va = isec->addr;
	else			} else {
				smeenai (Not Done): (same here)
	llvm_unreachable("Unknown relocation target");			llvm_unreachable("Unknown relocation target");
				}

	uint64_t val = va + r.addend;			uint64_t val = va + r.addend;
	if (1) // TODO: handle non-pcrel relocations			if (1) // TODO: handle non-pcrel relocations
	val -= addr - ImageBase + r.offset;			val -= addr - ImageBase + r.offset;
	target->relocateOne(buf + r.offset, r.type, val);			target->relocateOne(buf + r.offset, r.type, val);
	}			}
	}			}
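
The GOT-slot address used for DylibSymbol targets in writeTo above can be worked through in isolation. `gotSlotVA` and `kWordSize` are hypothetical names chosen for this sketch, and the 8-byte word size is an assumption for x86-64.

```cpp
#include <cassert>
#include <cstdint>

// Assumed pointer size on x86-64; stands in for the diff's WordSize.
constexpr uint64_t kWordSize = 8;

// Hypothetical helper (not lld code): image-relative address of the GOT
// slot that dyld will fill in for a dylib symbol.
//   gotAddr   - absolute virtual address of the GOT section
//   imageBase - virtual address the image is linked at
//   gotIndex  - zero-based slot index assigned to the symbol
inline uint64_t gotSlotVA(uint64_t gotAddr, uint64_t imageBase,
                          uint32_t gotIndex) {
  assert(gotAddr >= imageBase && "GOT must lie within the image");
  return gotAddr - imageBase + gotIndex * kWordSize;
}
```

Because the relocated field's own address is also taken relative to the image base in writeTo, the difference between the two is the plain pc-relative distance to the slot.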

lld/MachO/Options.td

	include "llvm/Option/OptParser.td"			include "llvm/Option/OptParser.td"

				def L: JoinedOrSeparate<["-"], "L">, MetaVarName<"<dir>">,
				HelpText<"Add directory to library search path">;

				def Z: Flag<["-"], "Z">,
				HelpText<"Do not add standard directories to library search path">;

	def arch: Separate<["-"], "arch">, MetaVarName<"<arch-name>">,			def arch: Separate<["-"], "arch">, MetaVarName<"<arch-name>">,
	HelpText<"Architecture to link">;			HelpText<"Architecture to link">;

	def e: Separate<["-"], "e">, HelpText<"Name of entry point symbol">;			def e: Separate<["-"], "e">, HelpText<"Name of entry point symbol">;

				def l: Joined<["-"], "l">, MetaVarName<"<libname>">,
				HelpText<"Base name of library searched for in -L directories">;

	def o: Separate<["-"], "o">, MetaVarName<"<path>">,			def o: Separate<["-"], "o">, MetaVarName<"<path>">,
	HelpText<"Path to file to write output">;			HelpText<"Path to file to write output">;

	def v: Flag<["-"], "v">, HelpText<"Display the version number and exit">;			def v: Flag<["-"], "v">, HelpText<"Display the version number and exit">;

	// Ignored options			// Ignored options
	def: Flag<["-"], "demangle">;			def: Flag<["-"], "demangle">;
	def: Flag<["-"], "dynamic">;			def: Flag<["-"], "dynamic">;
	def: Flag<["-"], "no_deduplicate">;			def: Flag<["-"], "no_deduplicate">;
	def: Separate<["-"], "lto_library">;			def: Separate<["-"], "lto_library">;
	def: Separate<["-"], "macosx_version_min">;			def: Separate<["-"], "macosx_version_min">;

lld/MachO/OutputSegment.h

	Show All 13 Lines

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class InputSection;			class InputSection;

	class OutputSegment {			class OutputSegment {
	public:			public:
				InputSection *firstSection() const { return sections.front().second.at(0); }

				InputSection *lastSection() const { return sections.back().second.back(); }

	StringRef name;			StringRef name;
	uint32_t perms;			uint32_t perms;
				uint8_t index;
	llvm::MapVector<StringRef, std::vector<InputSection *>> sections;			llvm::MapVector<StringRef, std::vector<InputSection *>> sections;
	};			};

	extern std::vector<OutputSegment *> outputSegments;			extern std::vector<OutputSegment *> outputSegments;

	OutputSegment *getOrCreateOutputSegment(StringRef name, uint32_t perms);			OutputSegment *getOrCreateOutputSegment(StringRef name, uint32_t perms);

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/SymbolTable.h

	Show All 10 Lines

	#include "lld/Common/LLVM.h"			#include "lld/Common/LLVM.h"
	#include "llvm/ADT/CachedHashString.h"			#include "llvm/ADT/CachedHashString.h"
	#include "llvm/Object/Archive.h"			#include "llvm/Object/Archive.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class InputFile;
	class InputSection;
	class ArchiveFile;			class ArchiveFile;
				class DylibFile;
				class InputSection;
	class Symbol;			class Symbol;

	class SymbolTable {			class SymbolTable {
	public:			public:
	Symbol *addDefined(StringRef name, InputSection *isec, uint32_t value);			Symbol *addDefined(StringRef name, InputSection *isec, uint32_t value);

	Symbol *addUndefined(StringRef name);			Symbol *addUndefined(StringRef name);

				Symbol *addDylib(StringRef name, DylibFile *file);

	ArrayRef<Symbol *> getSymbols() const { return symVector; }			ArrayRef<Symbol *> getSymbols() const { return symVector; }
	Symbol *find(StringRef name);			Symbol *find(StringRef name);

	private:			private:
	std::pair<Symbol *, bool> insert(StringRef name);			std::pair<Symbol *, bool> insert(StringRef name);
	llvm::DenseMap<llvm::CachedHashStringRef, int> symMap;			llvm::DenseMap<llvm::CachedHashStringRef, int> symMap;
	std::vector<Symbol *> symVector;			std::vector<Symbol *> symVector;
	};			};

	extern SymbolTable *symtab;			extern SymbolTable *symtab;

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/SymbolTable.cpp

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	Symbol *SymbolTable::addUndefined(StringRef name) {
bool wasInserted;		bool wasInserted;
std::tie(s, wasInserted) = insert(name);		std::tie(s, wasInserted) = insert(name);

if (wasInserted)		if (wasInserted)
replaceSymbol<Undefined>(s, name);		replaceSymbol<Undefined>(s, name);
return s;		return s;
}		}

		Symbol *SymbolTable::addDylib(StringRef name, DylibFile *file) {
		Symbol *s;
		bool wasInserted;
		std::tie(s, wasInserted) = insert(name);

		if (wasInserted)
		replaceSymbol<DylibSymbol>(s, file, name);
		return s;
		}

SymbolTable *macho::symtab;		SymbolTable *macho::symtab;

lld/MachO/Symbols.h

	Show All 12 Lines
	#include "Target.h"			#include "Target.h"
	#include "lld/Common/Strings.h"			#include "lld/Common/Strings.h"
	#include "llvm/Object/Archive.h"			#include "llvm/Object/Archive.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class InputSection;			class InputSection;
	class InputFile;			class DylibFile;
	class ArchiveFile;			class ArchiveFile;

	struct StringRefZ {			struct StringRefZ {
	StringRefZ(const char *s) : data(s), size(-1) {}			StringRefZ(const char *s) : data(s), size(-1) {}
	StringRefZ(StringRef s) : data(s.data()), size(s.size()) {}			StringRefZ(StringRef s) : data(s.data()), size(s.size()) {}

	const char *data;			const char *data;
	const uint32_t size;			const uint32_t size;
	};			};

	class Symbol {			class Symbol {
	public:			public:
	enum Kind {			enum Kind {
	DefinedKind,			DefinedKind,
	UndefinedKind,			UndefinedKind,
				DylibKind,
	};			};

	Kind kind() const { return static_cast<Kind>(symbolKind); }			Kind kind() const { return static_cast<Kind>(symbolKind); }

	StringRef getName() const { return {name.data, name.size}; }			StringRef getName() const { return {name.data, name.size}; }

	uint64_t getVA() const;			uint64_t getVA() const;

	InputFile *file;

	protected:			protected:
	Symbol(Kind k, InputFile *file, StringRefZ name)			Symbol(Kind k, StringRefZ name) : symbolKind(k), name(name) {}
	: file(file), symbolKind(k), name(name) {}

	Kind symbolKind;			Kind symbolKind;
	StringRefZ name;			StringRefZ name;
	};			};

	class Defined : public Symbol {			class Defined : public Symbol {
	public:			public:
	Defined(StringRefZ name, InputSection *isec, uint32_t value)			Defined(StringRefZ name, InputSection *isec, uint32_t value)
	: Symbol(DefinedKind, nullptr, name), isec(isec), value(value) {}			: Symbol(DefinedKind, name), isec(isec), value(value) {}

	InputSection *isec;			InputSection *isec;
	uint32_t value;			uint32_t value;

	static bool classof(const Symbol *s) { return s->kind() == DefinedKind; }			static bool classof(const Symbol *s) { return s->kind() == DefinedKind; }
	};			};

	class Undefined : public Symbol {			class Undefined : public Symbol {
	public:			public:
	Undefined(StringRefZ name) : Symbol(UndefinedKind, nullptr, name) {}			Undefined(StringRefZ name) : Symbol(UndefinedKind, name) {}

	static bool classof(const Symbol *s) { return s->kind() == UndefinedKind; }			static bool classof(const Symbol *s) { return s->kind() == UndefinedKind; }
	};			};

				class DylibSymbol : public Symbol {
				public:
				DylibSymbol(DylibFile *file, StringRefZ name)
				: Symbol(DylibKind, name), file(file) {}

				static bool classof(const Symbol *s) { return s->kind() == DylibKind; }

				DylibFile *file;
				uint32_t gotIndex = UINT32_MAX;
				ruiu (Done): Isn't uint32_t enough? It's not a big deal, but I'd use uint32_t for an index of GOT.
				};

	inline uint64_t Symbol::getVA() const {			inline uint64_t Symbol::getVA() const {
	if (auto *d = dyn_cast<Defined>(this))			if (auto *d = dyn_cast<Defined>(this))
	return d->isec->addr + d->value - ImageBase;			return d->isec->addr + d->value - ImageBase;
	return 0;			return 0;
	}			}

	union SymbolUnion {			union SymbolUnion {
	alignas(Defined) char a[sizeof(Defined)];			alignas(Defined) char a[sizeof(Defined)];
	alignas(Undefined) char b[sizeof(Undefined)];			alignas(Undefined) char b[sizeof(Undefined)];
				alignas(DylibSymbol) char c[sizeof(DylibSymbol)];
	};			};

	template <typename T, typename... ArgT>			template <typename T, typename... ArgT>
	void replaceSymbol(Symbol *s, ArgT &&... arg) {			void replaceSymbol(Symbol *s, ArgT &&... arg) {
	static_assert(sizeof(T) <= sizeof(SymbolUnion), "SymbolUnion too small");			static_assert(sizeof(T) <= sizeof(SymbolUnion), "SymbolUnion too small");
	static_assert(alignof(T) <= alignof(SymbolUnion),			static_assert(alignof(T) <= alignof(SymbolUnion),
	"SymbolUnion not aligned enough");			"SymbolUnion not aligned enough");
	assert(static_cast<Symbol *>(static_cast<T *>(nullptr)) == nullptr &&			assert(static_cast<Symbol *>(static_cast<T *>(nullptr)) == nullptr &&
	Show All 11 Lines
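
The SymbolUnion and replaceSymbol() pair above lets a symbol change kind in place, so that every Symbol * handed out by the symbol table stays valid. The tail of replaceSymbol() is elided in this view, so the following is only a minimal sketch of the pattern, assuming it ends in a placement new as the lld ELF port does. The simplified class bodies, the makeUndefined() helper, and the returned pointer are illustrative and not part of this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <utility>

// Simplified stand-ins for the classes in Symbols.h (hypothetical fields).
struct Symbol {
  enum Kind : uint8_t { DefinedKind, UndefinedKind, DylibKind };
  Kind kind;

protected:
  explicit Symbol(Kind k) : kind(k) {}
};

struct Defined : Symbol {
  explicit Defined(uint32_t v) : Symbol(DefinedKind), value(v) {}
  uint32_t value;
};

struct Undefined : Symbol {
  Undefined() : Symbol(UndefinedKind) {}
};

struct DylibSymbol : Symbol {
  explicit DylibSymbol(uint32_t ord) : Symbol(DylibKind), ordinal(ord) {}
  uint32_t ordinal;
  uint32_t gotIndex = UINT32_MAX;
};

// Raw storage large and aligned enough for any concrete symbol kind.
union SymbolUnion {
  alignas(Defined) char a[sizeof(Defined)];
  alignas(Undefined) char b[sizeof(Undefined)];
  alignas(DylibSymbol) char c[sizeof(DylibSymbol)];
};

// Constructs a T over the storage that currently holds *s.
// Returning the new pointer is a convenience added for this sketch.
template <typename T, typename... ArgT>
T *replaceSymbol(Symbol *s, ArgT &&...arg) {
  static_assert(sizeof(T) <= sizeof(SymbolUnion), "SymbolUnion too small");
  static_assert(alignof(T) <= alignof(SymbolUnion),
                "SymbolUnion not aligned enough");
  assert(s != nullptr && "cannot replace a null symbol");
  return new (s) T(std::forward<ArgT>(arg)...);
}

// Hypothetical helper: creates the initial Undefined symbol in the storage.
Symbol *makeUndefined(SymbolUnion &storage) {
  return new (&storage) Undefined();
}
```

Because the replacement reuses the same address, a relocation that captured the pointer while the symbol was still Undefined observes the DylibSymbol once a dylib providing that name has been loaded.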

lld/MachO/SyntheticSections.h

This file was added.

				//===- SyntheticSections.h -------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLD_MACHO_SYNTHETIC_SECTIONS_H
				#define LLD_MACHO_SYNTHETIC_SECTIONS_H

				#include "InputSection.h"
				#include "Target.h"
				#include "llvm/ADT/SetVector.h"

				namespace lld {
				namespace macho {

				class DylibSymbol;

				// This section will be populated by dyld with addresses to non-lazily-loaded
				// dylib symbols.
				class GotSection : public InputSection {
				public:
				smeenai (Not Done): It's kinda interesting to me that GotSection would inherit from InputSection, since it's not an input section, of course. LLD ELF explains this design decision like so:
				    // Synthetic sections are designed as input sections as opposed to
				    // output sections because we want to allow them to be manipulated
				    // using linker scripts just like other input sections from regular
				    // files.
				For Mach-O, we don't have linker scripts, so perhaps that reasoning doesn't hold. COFF uses Chunks instead of InputSections, since there's more synthesized data (https://lld.llvm.org/NewLLD.html#important-data-structures). I think Mach-O has a good amount of linker-synthesized data and input section processing as well, so a COFF-like design might make more sense. It's hard to say until we have more of the implementation though. For now, I think leaving this inheritance as-is is fine, but we should add a TODO to reconsider it once we've implemented more of the linker. @ruiu, @MaskRay, what do you think?
				GotSection();

				void addEntry(DylibSymbol &sym);
				const llvm::SetVector<const DylibSymbol *> &getEntries() const {
				return entries;
				}

				size_t getSize() const override { return entries.size() * WordSize; }

				void writeTo(uint8_t *buf) override {
				ruiu (Done): So GOT has no data but dynamic relocations? If so, I'd leave a brief comment here.
				int3 (Author, Done): Actually, I'm not sure it's right to say that it has "no data" -- it does actually occupy space in the binary (unlike the __PAGEZERO segment, a section can't have zero filesize while having a non-zero-length address range), so I think saying it contains all zeroes is more accurate.
				// Nothing to write, GOT contains all zeros at link time; it's populated at
				// runtime by dyld.
				ruiu (Done): nit: add a blank line before a label.
				}

				private:
				llvm::SetVector<const DylibSymbol *> entries;
				};

				struct InStruct {
				GotSection *got;
				};

				extern InStruct in;

				} // namespace macho
				} // namespace lld

				#endif

lld/MachO/SyntheticSections.cpp

This file was added.

				//===- SyntheticSections.cpp ---------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "SyntheticSections.h"
				#include "Symbols.h"

				using namespace llvm::MachO;

				namespace lld {
				namespace macho {

				GotSection::GotSection() {
				segname = "__DATA_CONST";
				name = "__got";
				align = 8;
				flags = S_NON_LAZY_SYMBOL_POINTERS;

				// TODO: section_64::reserved1 should be an index into the indirect symbol
				// table, which we do not currently emit
				}

				void GotSection::addEntry(DylibSymbol &sym) {
				if (entries.insert(&sym)) {
				sym.gotIndex = entries.size() - 1;
				}
				}

				InStruct in;

				} // namespace macho
				} // namespace lld
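
GotSection::addEntry() above relies on llvm::SetVector to hand each dylib symbol a stable GOT slot exactly once. A minimal standalone sketch of that behaviour, using only the standard library, is given below. The GotEntries class and the stripped-down DylibSymbol are hypothetical stand-ins, and the 8-byte word size mirrors WordSize from Target.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for lld's DylibSymbol; only the GOT index matters here.
struct DylibSymbol {
  uint32_t gotIndex = UINT32_MAX; // UINT32_MAX means "no GOT slot yet"
};

// Insertion-ordered set of symbols, approximating llvm::SetVector.
class GotEntries {
public:
  // Assigns the next free slot on first insertion; repeated calls are no-ops.
  void addEntry(DylibSymbol &sym) {
    assert(entries.size() < UINT32_MAX && "GOT index would overflow");
    if (index.emplace(&sym, entries.size()).second) {
      sym.gotIndex = static_cast<uint32_t>(entries.size());
      entries.push_back(&sym);
    }
  }

  // Section size in bytes: one 8-byte word per entry.
  uint64_t getSize() const { return entries.size() * 8; }

  const std::vector<const DylibSymbol *> &getEntries() const {
    return entries;
  }

private:
  std::vector<const DylibSymbol *> entries;               // slot order
  std::unordered_map<const DylibSymbol *, size_t> index;  // membership test
};
```

Several relocations may reference the same symbol, so deduplication is what keeps the GOT at one slot per symbol, and the insertion order is what later fixes the order of the bind opcodes.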

lld/MachO/Target.h

	Show All 9 Lines
	#define LLD_MACHO_TARGET_H			#define LLD_MACHO_TARGET_H

	#include <cstdint>			#include <cstdint>

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	enum {			enum {
				// We are currently only supporting 64-bit targets since macOS and iOS are
				ruiu (Done): Are you going to make it work only for 64-bit platforms? IIRC, both macOS and iOS are dropping 32-bit app support, so it is probably a good choice to not support 32-bit, but just to confirm.
				int3 (Author, Done): I'm not sure we're 100% not going to support it, but we're definitely prioritizing 64-bit for now
				ruiu (Done): Then please leave a comment here saying that we are currently supporting only 64-bit targets as macOS and iOS are deprecating 32-bit apps.
				// deprecating 32-bit apps.
				WordSize = 8,
	PageSize = 4096,			PageSize = 4096,
	ImageBase = 4096,			ImageBase = 4096,
	MaxAlignmentPowerOf2 = 32,			MaxAlignmentPowerOf2 = 32,
	};			};

	class TargetInfo {			class TargetInfo {
	public:			public:
	virtual ~TargetInfo() = default;			virtual ~TargetInfo() = default;
	Show All 16 Lines

lld/MachO/Writer.h

	//===- Writer.h -------------------------------------------------*- C++ -*-===//			//===- Writer.h -------------------------------------------------*- C++ -*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLD_MACHO_WRITER_H			#ifndef LLD_MACHO_WRITER_H
	#define LLD_MACHO_WRITER_H			#define LLD_MACHO_WRITER_H

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	void writeResult();			void writeResult();

				void createSyntheticSections();

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/Writer.cpp

//===- Writer.cpp ---------------------------------------------------------===//		//===- Writer.cpp ---------------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "Writer.h"		#include "Writer.h"
#include "Config.h"		#include "Config.h"
#include "InputFiles.h"		#include "InputFiles.h"
#include "InputSection.h"		#include "InputSection.h"
#include "OutputSegment.h"		#include "OutputSegment.h"
#include "SymbolTable.h"		#include "SymbolTable.h"
#include "Symbols.h"		#include "Symbols.h"
		#include "SyntheticSections.h"
#include "Target.h"		#include "Target.h"

#include "lld/Common/ErrorHandler.h"		#include "lld/Common/ErrorHandler.h"
#include "lld/Common/Memory.h"		#include "lld/Common/Memory.h"
#include "llvm/BinaryFormat/MachO.h"		#include "llvm/BinaryFormat/MachO.h"
		#include "llvm/Support/EndianStream.h"
#include "llvm/Support/LEB128.h"		#include "llvm/Support/LEB128.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::MachO;		using namespace llvm::MachO;
using namespace llvm::support;		using namespace llvm::support;
using namespace lld;		using namespace lld;
using namespace lld::macho;		using namespace lld::macho;
Show All 10 Lines	public:
virtual void writeTo(uint8_t *buf) const = 0;		virtual void writeTo(uint8_t *buf) const = 0;
};		};

class Writer {		class Writer {
public:		public:
Writer() : buffer(errorHandler().outputBuffer) {}		Writer() : buffer(errorHandler().outputBuffer) {}

void createLoadCommands();		void createLoadCommands();
		void scanRelocations();
void assignAddresses();		void assignAddresses();

		void createDyldInfoContents();

void openFile();		void openFile();
void writeHeader();		void writeHeader();
void writeSections();		void writeSections();

void run();		void run();

std::vector<LoadCommand *> loadCommands;		std::vector<LoadCommand *> loadCommands;
std::unique_ptr<FileOutputBuffer> &buffer;		std::unique_ptr<FileOutputBuffer> &buffer;
Show All 21 Lines
public:		public:
uint32_t getSize() const override { return sizeof(segment_command_64); }		uint32_t getSize() const override { return sizeof(segment_command_64); }

void writeTo(uint8_t *buf) const override {		void writeTo(uint8_t *buf) const override {
auto *c = reinterpret_cast<segment_command_64 *>(buf);		auto *c = reinterpret_cast<segment_command_64 *>(buf);
c->cmd = LC_SEGMENT_64;		c->cmd = LC_SEGMENT_64;
c->cmdsize = getSize();		c->cmdsize = getSize();
strcpy(c->segname, "__LINKEDIT");		strcpy(c->segname, "__LINKEDIT");
		c->vmaddr = addr;
c->fileoff = fileOff;		c->fileoff = fileOff;
c->filesize = contents.size();		c->filesize = c->vmsize = contents.size();
c->maxprot = VM_PROT_READ | VM_PROT_WRITE;		c->maxprot = VM_PROT_READ | VM_PROT_WRITE;
c->initprot = VM_PROT_READ;		c->initprot = VM_PROT_READ;
}		}

uint64_t getOffset() const { return fileOff + contents.size(); }		uint64_t getOffset() const { return fileOff + contents.size(); }

uint64_t fileOff = 0;		uint64_t fileOff = 0;
		uint64_t addr = 0;
SmallVector<char, 128> contents;		SmallVector<char, 128> contents;
};		};

class LCDyldInfo : public LoadCommand {		class LCDyldInfo : public LoadCommand {
public:		public:
uint32_t getSize() const override { return sizeof(dyld_info_command); }		uint32_t getSize() const override { return sizeof(dyld_info_command); }

void writeTo(uint8_t *buf) const override {		void writeTo(uint8_t *buf) const override {
auto *c = reinterpret_cast<dyld_info_command *>(buf);		auto *c = reinterpret_cast<dyld_info_command *>(buf);
c->cmd = LC_DYLD_INFO_ONLY;		c->cmd = LC_DYLD_INFO_ONLY;
c->cmdsize = getSize();		c->cmdsize = getSize();
		c->bind_off = bindOff;
		c->bind_size = bindSize;
c->export_off = exportOff;		c->export_off = exportOff;
c->export_size = exportSize;		c->export_size = exportSize;
}		}

		uint64_t bindOff = 0;
		uint64_t bindSize = 0;
uint64_t exportOff = 0;		uint64_t exportOff = 0;
uint64_t exportSize = 0;		uint64_t exportSize = 0;
};		};

class LCDysymtab : public LoadCommand {		class LCDysymtab : public LoadCommand {
public:		public:
uint32_t getSize() const override { return sizeof(dysymtab_command); }		uint32_t getSize() const override { return sizeof(dysymtab_command); }

Show All 16 Lines	public:
void writeTo(uint8_t *buf) const override {		void writeTo(uint8_t *buf) const override {
auto *c = reinterpret_cast<segment_command_64 *>(buf);		auto *c = reinterpret_cast<segment_command_64 *>(buf);
buf += sizeof(segment_command_64);		buf += sizeof(segment_command_64);

c->cmd = LC_SEGMENT_64;		c->cmd = LC_SEGMENT_64;
c->cmdsize = getSize();		c->cmdsize = getSize();
memcpy(c->segname, name.data(), name.size());		memcpy(c->segname, name.data(), name.size());

InputSection *firstSec = seg->sections.front().second[0];
InputSection *lastSec = seg->sections.back().second.back();

// dyld3's MachOLoaded::getSlide() assumes that the __TEXT segment starts		// dyld3's MachOLoaded::getSlide() assumes that the __TEXT segment starts
// from the beginning of the file (i.e. the header).		// from the beginning of the file (i.e. the header).
// TODO: replace this logic by creating a synthetic __TEXT,__mach_header		// TODO: replace this logic by creating a synthetic __TEXT,__mach_header
// section instead.		// section instead.
c->fileoff = name == "__TEXT" ? 0 : firstSec->addr - ImageBase;		c->fileoff = name == "__TEXT" ? 0 : seg->firstSection()->addr - ImageBase;
c->vmaddr = c->fileoff + ImageBase;		c->vmaddr = c->fileoff + ImageBase;
c->vmsize = c->filesize = lastSec->addr + lastSec->data.size() - c->vmaddr;		c->vmsize = c->filesize =
		seg->lastSection()->addr + seg->lastSection()->getSize() - c->vmaddr;
c->maxprot = VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;		c->maxprot = VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;
c->initprot = seg->perms;		c->initprot = seg->perms;
c->nsects = seg->sections.size();		c->nsects = seg->sections.size();

for (auto &p : seg->sections) {		for (auto &p : seg->sections) {
StringRef s = p.first;		StringRef s = p.first;
std::vector<InputSection *> &sections = p.second;		std::vector<InputSection *> &sections = p.second;

auto *sectHdr = reinterpret_cast<section_64 *>(buf);		auto *sectHdr = reinterpret_cast<section_64 *>(buf);
buf += sizeof(section_64);		buf += sizeof(section_64);

memcpy(sectHdr->sectname, s.data(), s.size());		memcpy(sectHdr->sectname, s.data(), s.size());
memcpy(sectHdr->segname, name.data(), name.size());		memcpy(sectHdr->segname, name.data(), name.size());

sectHdr->addr = sections[0]->addr;		sectHdr->addr = sections[0]->addr;
sectHdr->offset = sections[0]->addr - ImageBase;		sectHdr->offset = sections[0]->addr - ImageBase;
sectHdr->align = sections[0]->align;		sectHdr->align = sections[0]->align;
uint32_t maxAlign = 0;		uint32_t maxAlign = 0;
for (const InputSection *section : sections)		for (const InputSection *section : sections)
maxAlign = std::max(maxAlign, section->align);		maxAlign = std::max(maxAlign, section->align);
sectHdr->align = Log2_32(maxAlign);		sectHdr->align = Log2_32(maxAlign);
sectHdr->flags = sections[0]->flags;		sectHdr->flags = sections[0]->flags;
sectHdr->size = sections.back()->addr + sections.back()->data.size() -		sectHdr->size = sections.back()->addr + sections.back()->getSize() -
sections[0]->addr;		sections[0]->addr;
}		}
}		}

private:		private:
StringRef name;		StringRef name;
OutputSegment *seg;		OutputSegment *seg;
};		};
▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	void Writer::createLoadCommands() {

loadCommands.push_back(linkEditSeg);		loadCommands.push_back(linkEditSeg);
loadCommands.push_back(dyldInfoSeg);		loadCommands.push_back(dyldInfoSeg);
loadCommands.push_back(symtabSeg);		loadCommands.push_back(symtabSeg);
loadCommands.push_back(make<LCPagezero>());		loadCommands.push_back(make<LCPagezero>());
loadCommands.push_back(make<LCLoadDylinker>());		loadCommands.push_back(make<LCLoadDylinker>());
loadCommands.push_back(make<LCDysymtab>());		loadCommands.push_back(make<LCDysymtab>());
loadCommands.push_back(make<LCMain>());		loadCommands.push_back(make<LCMain>());

		uint8_t segIndex = 1; // LCPagezero is a segment load command
		for (OutputSegment *seg : outputSegments) {
		if (!seg->sections.empty()) {
		loadCommands.push_back(make<LCSegment>(seg->name, seg));
		seg->index = segIndex++;
		}
		}

		uint64_t dylibOrdinal = 1;
		for (InputFile *file : inputFiles) {
		if (auto *dylibFile = dyn_cast<DylibFile>(file)) {
		loadCommands.push_back(make<LCLoadDylib>(dylibFile->dylibName));
		dylibFile->ordinal = dylibOrdinal++;
		}
		}

// TODO: dyld requires libSystem to be loaded. libSystem is a universal		// TODO: dyld requires libSystem to be loaded. libSystem is a universal
// binary and we don't have support for that yet, so mock it out here.		// binary and we don't have support for that yet, so mock it out here.
loadCommands.push_back(make<LCLoadDylib>("/usr/lib/libSystem.B.dylib"));		loadCommands.push_back(make<LCLoadDylib>("/usr/lib/libSystem.B.dylib"));
		}

for (OutputSegment *seg : outputSegments)		void Writer::scanRelocations() {
if (!seg->sections.empty())		for (InputSection *sect : inputSections)
		ruiu (Done): Replace auto with a concrete type.
loadCommands.push_back(make<LCSegment>(seg->name, seg));		for (Reloc &r : sect->relocs)
		if (auto *s = r.target.dyn_cast<Symbol *>())
		if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s))
		in.got->addEntry(*dylibSymbol);
}		}

void Writer::assignAddresses() {		void Writer::assignAddresses() {
uint64_t addr = ImageBase + sizeof(mach_header_64);		uint64_t addr = ImageBase + sizeof(mach_header_64);

uint64_t size = 0;		uint64_t size = 0;
for (LoadCommand *lc : loadCommands)		for (LoadCommand *lc : loadCommands)
size += lc->getSize();		size += lc->getSize();
sizeofCmds = size;		sizeofCmds = size;
addr += size;		addr += size;

for (OutputSegment *seg : outputSegments) {		for (OutputSegment *seg : outputSegments) {
		ruiu (Done): I wonder if you can directly dyn_cast to `DylibSymbol *` as `r.target.dyn_cast<DylibSymbol *>()`.
		int3 (Author, Done): just tried, doesn't work sadly
addr = alignTo(addr, PageSize);		addr = alignTo(addr, PageSize);

for (auto &p : seg->sections) {		for (auto &p : seg->sections) {
ArrayRef<InputSection *> sections = p.second;		ArrayRef<InputSection *> sections = p.second;
for (InputSection *isec : sections) {		for (InputSection *isec : sections) {
addr = alignTo(addr, isec->align);		addr = alignTo(addr, isec->align);
isec->addr = addr;		isec->addr = addr;
addr += isec->data.size();		addr += isec->getSize();
}		}
}		}
}		}

		addr = alignTo(addr, PageSize);
		linkEditSeg->addr = addr;
linkEditSeg->fileOff = addr - ImageBase;		linkEditSeg->fileOff = addr - ImageBase;
}		}

		// LC_DYLD_INFO_ONLY contains symbol import/export information. Imported
		ruiu (Done): Can you add a comment as to what dyld info contains?
		// symbols are described by a sequence of bind opcodes, which allow for a
		// compact encoding. Exported symbols are described using a trie.
		void Writer::createDyldInfoContents() {
		uint64_t sectionStart = linkEditSeg->getOffset();
		raw_svector_ostream os{linkEditSeg->contents};

		if (in.got->getSize() != 0) {
		// Emit bind opcodes, which tell dyld which dylib symbols to load.

		// Tell dyld to write to the section containing the GOT.
		os << static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |
		in.got->parent->index);
		encodeULEB128(in.got->addr - in.got->parent->firstSection()->addr, os);
		for (const DylibSymbol *sym : in.got->getEntries()) {
		// TODO: Implement compact encoding -- we only need to encode the
		// differences between consecutive symbol entries.
		if (sym->file->ordinal <= BIND_IMMEDIATE_MASK) {
		os << static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |
		sym->file->ordinal);
		} else {
		error("TODO: Support larger dylib symbol ordinals");
		continue;
		}
		os << static_cast<uint8_t>(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM)
		<< sym->getName() << '\0'
		<< static_cast<uint8_t>(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER)
		ruiu (Done): nit: we usually do os << foo << bar << baz << fizz; instead of os << foo << bar; os << baz << fizz;
		int3 (Author, Done): sgtm. Any thoughts on whether we should make a wrapper class like the ByteBuffer in ReaderWriter/MachO? I didn't see a whole lot of usage of `raw_ostream`s in the other lld implementations but I didn't find any abstraction layer over them either
		ruiu (Not Done): We don't usually use `raw_ostream` because we usually write to the output file in two passes: in the first pass, we compute an offset for each output element, and in the second pass, we let each output element copy itself to the output file. However, that two-pass technique cannot be used to create ULEB-encoded data, because we don't know the exact size of each output element until we fix the file contents. Constructing ULEB-encoded section contents is naturally sequential, so the usage of `raw_ostream` for this section is fine. I don't see any problem with that.
		<< static_cast<uint8_t>(BIND_OPCODE_DO_BIND);
		}

		os << static_cast<uint8_t>(BIND_OPCODE_DONE);

		dyldInfoSeg->bindOff = sectionStart;
		dyldInfoSeg->bindSize = linkEditSeg->getOffset() - sectionStart;
		}

		// TODO: emit bind opcodes for lazy symbols.
		// TODO: Implement symbol export trie.
		}

void Writer::openFile() {		void Writer::openFile() {
Expected<std::unique_ptr<FileOutputBuffer>> bufferOrErr =		Expected<std::unique_ptr<FileOutputBuffer>> bufferOrErr =
FileOutputBuffer::create(config->outputFile, fileSize,		FileOutputBuffer::create(config->outputFile, fileSize,
FileOutputBuffer::F_executable);		FileOutputBuffer::F_executable);

if (!bufferOrErr)		if (!bufferOrErr)
error("failed to open " + config->outputFile + ": " +		error("failed to open " + config->outputFile + ": " +
llvm::toString(bufferOrErr.takeError()));		llvm::toString(bufferOrErr.takeError()));
Show All 27 Lines	for (auto &sect : seg->sections)
isec->writeTo(buf + isec->addr - ImageBase);		isec->writeTo(buf + isec->addr - ImageBase);

memcpy(buf + linkEditSeg->fileOff, linkEditSeg->contents.data(),		memcpy(buf + linkEditSeg->fileOff, linkEditSeg->contents.data(),
linkEditSeg->contents.size());		linkEditSeg->contents.size());
}		}

void Writer::run() {		void Writer::run() {
createLoadCommands();		createLoadCommands();
		scanRelocations();
assignAddresses();		assignAddresses();

		// Fill __LINKEDIT contents
		createDyldInfoContents();
fileSize = linkEditSeg->fileOff + linkEditSeg->contents.size();		fileSize = linkEditSeg->fileOff + linkEditSeg->contents.size();

openFile();		openFile();
if (errorCount())		if (errorCount())
return;		return;

writeHeader();		writeHeader();
writeSections();		writeSections();

if (auto e = buffer->commit())		if (auto e = buffer->commit())
error("failed to write to the output file: " + toString(std::move(e)));		error("failed to write to the output file: " + toString(std::move(e)));
}		}

void macho::writeResult() { Writer().run(); }		void macho::writeResult() { Writer().run(); }

		void macho::createSyntheticSections() {
		in.got = make<GotSection>();
		inputSections.push_back(in.got);
		}
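
The bind opcode stream built by Writer::createDyldInfoContents() above can be reproduced in isolation. The sketch below is a rough standalone approximation: it uses std::vector<uint8_t> in place of raw_svector_ostream, a hand-written encodeULEB128() in place of the one from llvm/Support/LEB128.h, and a hypothetical Import struct in place of DylibSymbol. The opcode values are the ones defined in <mach-o/loader.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Opcode values as defined in <mach-o/loader.h>.
enum : uint8_t {
  BIND_TYPE_POINTER = 1,
  BIND_IMMEDIATE_MASK = 0x0F,
  BIND_OPCODE_DONE = 0x00,
  BIND_OPCODE_SET_DYLIB_ORDINAL_IMM = 0x10,
  BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40,
  BIND_OPCODE_SET_TYPE_IMM = 0x50,
  BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x70,
  BIND_OPCODE_DO_BIND = 0x90,
};

// Appends value as ULEB128: 7 payload bits per byte, high bit set on all
// bytes except the last.
void encodeULEB128(uint64_t value, std::vector<uint8_t> &out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Hypothetical stand-in for a DylibSymbol: its name and its dylib's ordinal.
struct Import {
  std::string name;
  uint8_t ordinal;
};

// Emits one uncompressed bind sequence per import, starting at the given
// offset within the segment that has load-command index segIndex.
std::vector<uint8_t> encodeBinds(uint8_t segIndex, uint64_t offset,
                                 const std::vector<Import> &imports) {
  assert(segIndex <= BIND_IMMEDIATE_MASK);
  std::vector<uint8_t> out;
  out.push_back(
      static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | segIndex));
  encodeULEB128(offset, out);
  for (const Import &imp : imports) {
    assert(imp.ordinal <= BIND_IMMEDIATE_MASK); // larger ordinals need ULEB
    out.push_back(
        static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | imp.ordinal));
    out.push_back(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM);
    out.insert(out.end(), imp.name.begin(), imp.name.end());
    out.push_back('\0'); // symbol names are NUL-terminated in the stream
    out.push_back(
        static_cast<uint8_t>(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER));
    out.push_back(BIND_OPCODE_DO_BIND);
  }
  out.push_back(BIND_OPCODE_DONE);
  return out;
}
```

The segment and offset are set only once because BIND_OPCODE_DO_BIND advances the target address by one pointer after each bind, so consecutive GOT slots are filled in entry order. As ruiu's comment notes, the size of the stream is only known once it has been produced, which is why it is built sequentially rather than in two passes.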

lld/test/MachO/Inputs/goodbye-dylib.yaml

This file was added.

				## This yaml file was originally generated from linking the following source
				## input with ld64:
				smeenai (Done): Description for hello and goodbye seems swapped.
				MaskRay (Done): Use `##` for comments and `#` for RUN and CHECK lines.
				##
				## .section __TEXT,__cstring
				## .globl _goodbye_world
				##
				## _goodbye_world:
				## .asciz "Goodbye world!\n"
				##
				## When lld can produce dylibs, we will use that instead for our test setup.

				--- !mach-o
				FileHeader:
				magic: 0xFEEDFACF
				cputype: 0x01000007
				cpusubtype: 0x00000003
				filetype: 0x00000006
				ncmds: 11
				sizeofcmds: 624
				flags: 0x00100085
				reserved: 0x00000000
				LoadCommands:
				- cmd: LC_SEGMENT_64
				cmdsize: 232
				segname: __TEXT
				vmaddr: 0
				vmsize: 4096
				fileoff: 0
				filesize: 4096
				maxprot: 5
				initprot: 5
				nsects: 2
				flags: 0
				Sections:
				- sectname: __text
				segname: __TEXT
				addr: 0x0000000000000FF0
				size: 0
				offset: 0x00000FF0
				align: 0
				reloff: 0x00000000
				nreloc: 0
				flags: 0x80000400
				reserved1: 0x00000000
				reserved2: 0x00000000
				reserved3: 0x00000000
				content: ''
				- sectname: __cstring
				segname: __TEXT
				addr: 0x0000000000000FF0
				size: 16
				offset: 0x00000FF0
				align: 0
				reloff: 0x00000000
				nreloc: 0
				flags: 0x00000002
				reserved1: 0x00000000
				reserved2: 0x00000000
				reserved3: 0x00000000
				content: 476F6F6462796520776F726C64210A00
				- cmd: LC_SEGMENT_64
				cmdsize: 72
				segname: __LINKEDIT
				vmaddr: 4096
				vmsize: 4096
				fileoff: 4096
				filesize: 72
				maxprot: 1
				initprot: 1
				nsects: 0
				flags: 0
				- cmd: LC_ID_DYLIB
				cmdsize: 64
				dylib:
				name: 24
				timestamp: 1
				current_version: 0
				compatibility_version: 0
				PayloadString: '@executable_path/libgoodbye.dylib'
				ZeroPadBytes: 7
				- cmd: LC_DYLD_INFO_ONLY
				cmdsize: 48
				rebase_off: 0
				rebase_size: 0
				bind_off: 0
				bind_size: 0
				weak_bind_off: 0
				weak_bind_size: 0
				lazy_bind_off: 0
				lazy_bind_size: 0
				export_off: 4096
				export_size: 24
				- cmd: LC_SYMTAB
				cmdsize: 24
				symoff: 4128
				nsyms: 1
				stroff: 4144
				strsize: 24
				- cmd: LC_DYSYMTAB
				cmdsize: 80
				ilocalsym: 0
				nlocalsym: 0
				iextdefsym: 0
				nextdefsym: 1
				iundefsym: 1
				nundefsym: 0
				tocoff: 0
				ntoc: 0
				modtaboff: 0
				MaskRay (Not Done): Are all these fields important? I think we probably should think what fields are optional and improve yaml2obj. Otherwise the verbosity can make tests a lot more complex.
				int3 (Author, Done): @smeenai earlier asked about deleting LC_SYMTAB, and I ran into some issues with yaml2obj while trying it (see my other comment). Given that using yaml2obj is mostly a stop-gap measure until lld itself can emit dylibs, I'd prefer not to spend too much time making the test case minimal.
				nmodtab: 0
				extrefsymoff: 0
				nextrefsyms: 0
				indirectsymoff: 0
				nindirectsyms: 0
				extreloff: 0
				nextrel: 0
				locreloff: 0
				nlocrel: 0
				- cmd: LC_UUID
				cmdsize: 24
				uuid: EA09CDDC-A3EA-3EB9-8C4F-334077FE6E5A
				- cmd: LC_BUILD_VERSION
				cmdsize: 32
				platform: 1
				minos: 659200
				sdk: 659200
				ntools: 1
				Tools:
				- tool: 3
				version: 34734080
				- cmd: LC_SOURCE_VERSION
				cmdsize: 16
				version: 0
				- cmd: LC_FUNCTION_STARTS
				cmdsize: 16
				dataoff: 4120
				datasize: 8
				- cmd: LC_DATA_IN_CODE
				cmdsize: 16
				dataoff: 4128
				datasize: 0
				LinkEditData:
				ExportTrie:
				TerminalSize: 0
				NodeOffset: 0
				Name: ''
				Flags: 0x0000000000000000
				Address: 0x0000000000000000
				Other: 0x0000000000000000
				ImportName: ''
				Children:
				- TerminalSize: 3
				NodeOffset: 18
				Name: _goodbye_world
				Flags: 0x0000000000000000
				Address: 0x0000000000000FF0
				Other: 0x0000000000000000
				ImportName: ''
				NameList:
				- n_strx: 2
				n_type: 0x0F
				n_sect: 2
				n_desc: 0
				n_value: 4080
				StringTable:
				- ' '
				- _goodbye_world
				- ''
				- ''
				- ''
				- ''
				- ''
				- ''
				- ''
				...
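As a sanity check on the YAML above, the `content:` value of the `__cstring` section can be decoded and compared with the string in the header comment and with the `size:` field. The sketch below is only an illustration; `decodeHex` is a hypothetical helper, not part of yaml2obj or lld.

```cpp
#include <cstddef>
#include <string>

// Convert one hex digit to its value; returns -1 for a non-hex character.
int hexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decode a hex string such as a section's `content:` value into raw bytes.
// Returns an empty string if the input is malformed.
std::string decodeHex(const std::string &hex) {
  if (hex.size() % 2 != 0)
    return {};
  std::string bytes;
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    int hi = hexDigit(hex[i]);
    int lo = hexDigit(hex[i + 1]);
    if (hi < 0 || lo < 0)
      return {};
    bytes.push_back(static_cast<char>(hi * 16 + lo));
  }
  return bytes;
}
```

Decoding `476F6F6462796520776F726C64210A00` gives the 16 bytes of `"Goodbye world!\n"` plus its NUL terminator, which agrees with `size: 16`.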

lld/test/MachO/Inputs/hello-dylib.yaml

This file was added.

				## This yaml file was originally generated from linking the following source
				## input with ld64:
				##
				## .section __TEXT,__cstring
				## .globl _hello_world
				##
				## _hello_world:
				## .asciz "Hello world!\n"
				##
				## When lld can produce dylibs, we will use that instead for our test setup.

				--- !mach-o
				FileHeader:
				magic: 0xFEEDFACF
				cputype: 0x01000007
				cpusubtype: 0x00000003
				filetype: 0x00000006
				ncmds: 11
				sizeofcmds: 616
				flags: 0x00100085
				reserved: 0x00000000
				LoadCommands:
				- cmd: LC_SEGMENT_64
				cmdsize: 232
				segname: __TEXT
				vmaddr: 0
				vmsize: 4096
				fileoff: 0
				filesize: 4096
				maxprot: 5
				initprot: 5
				nsects: 2
				flags: 0
				Sections:
				- sectname: __text
				segname: __TEXT
				addr: 0x0000000000000FF2
				size: 0
				offset: 0x00000FF2
				align: 0
				reloff: 0x00000000
				nreloc: 0
				flags: 0x80000400
				reserved1: 0x00000000
				reserved2: 0x00000000
				reserved3: 0x00000000
				content: ''
				- sectname: __cstring
				segname: __TEXT
				addr: 0x0000000000000FF2
				size: 14
				offset: 0x00000FF2
				align: 0
				reloff: 0x00000000
				nreloc: 0
				flags: 0x00000002
				reserved1: 0x00000000
				reserved2: 0x00000000
				reserved3: 0x00000000
				content: 48656C6C6F20776F726C64210A00
				- cmd: LC_SEGMENT_64
				cmdsize: 72
				segname: __LINKEDIT
				vmaddr: 4096
				vmsize: 4096
				fileoff: 4096
				filesize: 64
				maxprot: 1
				initprot: 1
				nsects: 0
				flags: 0
				- cmd: LC_ID_DYLIB
				cmdsize: 56
				dylib:
				name: 24
				timestamp: 1
				current_version: 0
				compatibility_version: 0
				PayloadString: '@executable_path/libhello.dylib'
				ZeroPadBytes: 1
				- cmd: LC_DYLD_INFO_ONLY
				cmdsize: 48
				rebase_off: 0
				rebase_size: 0
				bind_off: 0
				bind_size: 0
				weak_bind_off: 0
				weak_bind_size: 0
				lazy_bind_off: 0
				lazy_bind_size: 0
				export_off: 4096
				export_size: 24
				- cmd: LC_SYMTAB
				cmdsize: 24
				symoff: 4128
				nsyms: 1
				stroff: 4144
				strsize: 16
				- cmd: LC_DYSYMTAB
				cmdsize: 80
				ilocalsym: 0
				nlocalsym: 0
				iextdefsym: 0
				nextdefsym: 1
				iundefsym: 1
				nundefsym: 0
				tocoff: 0
				ntoc: 0
				modtaboff: 0
				nmodtab: 0
				extrefsymoff: 0
				nextrefsyms: 0
				indirectsymoff: 0
				nindirectsyms: 0
				extreloff: 0
				nextrel: 0
				locreloff: 0
				nlocrel: 0
				- cmd: LC_UUID
				cmdsize: 24
				uuid: 4826226E-9210-3984-A388-D5BD6D6DB368
				- cmd: LC_BUILD_VERSION
				cmdsize: 32
				platform: 1
				minos: 659200
				sdk: 659200
				ntools: 1
				Tools:
				- tool: 3
				version: 34734080
				- cmd: LC_SOURCE_VERSION
				cmdsize: 16
				version: 0
				- cmd: LC_FUNCTION_STARTS
				cmdsize: 16
				dataoff: 4120
				datasize: 8
				- cmd: LC_DATA_IN_CODE
				cmdsize: 16
				dataoff: 4128
				datasize: 0
				LinkEditData:
				ExportTrie:
				TerminalSize: 0
				NodeOffset: 0
				Name: ''
				Flags: 0x0000000000000000
				Address: 0x0000000000000000
				Other: 0x0000000000000000
				ImportName: ''
				Children:
				- TerminalSize: 3
				NodeOffset: 16
				Name: _hello_world
				Flags: 0x0000000000000000
				Address: 0x0000000000000FF2
				Other: 0x0000000000000000
				ImportName: ''
				NameList:
				- n_strx: 2
				n_type: 0x0F
				n_sect: 2
				n_desc: 0
				n_value: 4082
				StringTable:
				- ' '
				- _hello_world
				- ''
				...
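The `LC_ID_DYLIB` numbers in these two inputs follow from the layout of the load command: a 24-byte fixed part (six 32-bit fields, which is why `name:` is 24), then the NUL-terminated install name, padded so that `cmdsize` is a multiple of 8. The sketch below reproduces that arithmetic; the helper names are hypothetical, and it assumes `ZeroPadBytes` counts every byte after the name, including the terminator.

```cpp
#include <cstdint>
#include <string>

// Fixed part of a dylib_command: cmd, cmdsize, name offset, timestamp,
// current_version, compatibility_version (six 32-bit fields).
constexpr uint32_t kDylibCommandFixedSize = 24;

// Round `n` up to the next multiple of 8.
uint32_t alignTo8(uint32_t n) { return (n + 7) & ~uint32_t(7); }

// Hypothetical helper: cmdsize for an LC_ID_DYLIB carrying `installName`.
uint32_t dylibCommandSize(const std::string &installName) {
  uint32_t nameLen = static_cast<uint32_t>(installName.size());
  return alignTo8(kDylibCommandFixedSize + nameLen + 1); // +1 for the NUL
}

// Hypothetical helper: bytes after the name (terminator plus padding),
// assumed here to correspond to the YAML's ZeroPadBytes field.
uint32_t zeroPadBytes(const std::string &installName) {
  uint32_t nameLen = static_cast<uint32_t>(installName.size());
  return dylibCommandSize(installName) - kDylibCommandFixedSize - nameLen;
}
```

With `@executable_path/libhello.dylib` (31 characters) this gives a `cmdsize` of 56 and 1 pad byte; with `@executable_path/libgoodbye.dylib` (33 characters) it gives 64 and 7, matching both YAML files.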

lld/test/MachO/Inputs/no-id-dylib.yaml

This file was added.

				## This yaml file was originally generated from linking the following source
				## input with ld64:
				##
				## .section __TEXT,__cstring
				## .globl _hello_world
				##
				## _hello_world:
				## .asciz "Hello world!\n"
				##
				## Then we deleted the LC_ID_DYLIB command from the YAML file.

				--- !mach-o
				FileHeader:
				magic: 0xFEEDFACF
				cputype: 0x01000007
				cpusubtype: 0x00000003
				filetype: 0x00000006
				ncmds: 10
				sizeofcmds: 616
				flags: 0x00100085
				reserved: 0x00000000
				LoadCommands:
				- cmd: LC_SEGMENT_64
				cmdsize: 232
				segname: __TEXT
				vmaddr: 0
				vmsize: 4096
				fileoff: 0
				filesize: 4096
				maxprot: 5
				initprot: 5
				nsects: 2
				flags: 0
				Sections:
				- sectname: __text
				segname: __TEXT
				addr: 0x0000000000000FF2
				size: 0
				offset: 0x00000FF2
				align: 0
				reloff: 0x00000000
				nreloc: 0
				flags: 0x80000400
				reserved1: 0x00000000
				reserved2: 0x00000000
				reserved3: 0x00000000
				content: ''
				- sectname: __cstring
				segname: __TEXT
				addr: 0x0000000000000FF2
				size: 14
				offset: 0x00000FF2
				align: 0
				reloff: 0x00000000
				nreloc: 0
				flags: 0x00000002
				reserved1: 0x00000000
				reserved2: 0x00000000
				reserved3: 0x00000000
				content: 48656C6C6F20776F726C64210A00
				- cmd: LC_SEGMENT_64
				cmdsize: 72
				segname: __LINKEDIT
				vmaddr: 4096
				vmsize: 4096
				fileoff: 4096
				filesize: 64
				maxprot: 1
				initprot: 1
				nsects: 0
				flags: 0
				- cmd: LC_DYLD_INFO_ONLY
				cmdsize: 48
				rebase_off: 0
				rebase_size: 0
				bind_off: 0
				bind_size: 0
				weak_bind_off: 0
				weak_bind_size: 0
				lazy_bind_off: 0
				lazy_bind_size: 0
				export_off: 4096
				export_size: 24
				- cmd: LC_SYMTAB
				cmdsize: 24
				symoff: 4128
				nsyms: 1
				stroff: 4144
				strsize: 16
				- cmd: LC_DYSYMTAB
				cmdsize: 80
				ilocalsym: 0
				nlocalsym: 0
				iextdefsym: 0
				nextdefsym: 1
				iundefsym: 1
				nundefsym: 0
				tocoff: 0
				ntoc: 0
				modtaboff: 0
				nmodtab: 0
				extrefsymoff: 0
				nextrefsyms: 0
				indirectsymoff: 0
				nindirectsyms: 0
				extreloff: 0
				nextrel: 0
				locreloff: 0
				nlocrel: 0
				- cmd: LC_UUID
				cmdsize: 24
				uuid: 4826226E-9210-3984-A388-D5BD6D6DB368
				- cmd: LC_BUILD_VERSION
				cmdsize: 32
				platform: 1
				minos: 659200
				sdk: 659200
				ntools: 1
				Tools:
				- tool: 3
				version: 34734080
				- cmd: LC_SOURCE_VERSION
				cmdsize: 16
				version: 0
				- cmd: LC_FUNCTION_STARTS
				cmdsize: 16
				dataoff: 4120
				datasize: 8
				- cmd: LC_DATA_IN_CODE
				cmdsize: 16
				dataoff: 4128
				datasize: 0
				LinkEditData:
				ExportTrie:
				TerminalSize: 0
				NodeOffset: 0
				Name: ''
				Flags: 0x0000000000000000
				Address: 0x0000000000000000
				Other: 0x0000000000000000
				ImportName: ''
				Children:
				- TerminalSize: 3
				NodeOffset: 16
				Name: _hello_world
				Flags: 0x0000000000000000
				Address: 0x0000000000000FF2
				Other: 0x0000000000000000
				ImportName: ''
				NameList:
				- n_strx: 2
				n_type: 0x0F
				n_sect: 2
				n_desc: 0
				n_value: 4082
				StringTable:
				- ' '
				- _hello_world
				- ''
				...
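The `minos`, `sdk`, and tool `version` values in the `LC_BUILD_VERSION` commands of these inputs are packed version numbers. A minimal decoder, assuming the usual Mach-O `xxxx.yy.zz` packing (16 bits major, 8 bits minor, 8 bits patch), could look like this; `decodePackedVersion` is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Decode a Mach-O packed version: the high 16 bits are the major version,
// the next 8 bits the minor version, and the low 8 bits the patch level.
std::string decodePackedVersion(uint32_t v) {
  return std::to_string(v >> 16) + "." + std::to_string((v >> 8) & 0xff) +
         "." + std::to_string(v & 0xff);
}
```

Under that assumption, `659200` decodes to `10.15.0` and the tool version `34734080` decodes to `530.0.0`.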

lld/test/MachO/dylink.s

This file was added.

				# REQUIRES: x86
				# RUN: mkdir -p %t
				# RUN: yaml2obj %p/Inputs/hello-dylib.yaml -o %t/libhello.dylib
				# RUN: yaml2obj %p/Inputs/goodbye-dylib.yaml -o %t/libgoodbye.dylib
				MaskRay (Done): `>` -> `-o`
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t/dylink.o
				# RUN: lld -flavor darwinnew -o %t/dylink -Z -L%t -lhello -lgoodbye %t/dylink.o
				# RUN: llvm-objdump --bind -d %t/dylink | FileCheck %s

				MaskRay (Done): Should obj2yaml and llvm-objdump -d tests be separated?
				int3 (Author, Done): Not possible with the current setup; I'm using obj2yaml to get the address of the GOT, and to check that the order of the symbol references in it matches the addresses printed by objdump.
				# CHECK: movq [[#%u, HELLO_OFF:]](%rip), %rsi
				# CHECK-NEXT: [[#%x, HELLO_RIP:]]:

				# CHECK: movq [[#%u, GOODBYE_OFF:]](%rip), %rsi
				# CHECK-NEXT: [[#%x, GOODBYE_RIP:]]:

				# CHECK-LABEL: Bind table:
				# CHECK-DAG: __DATA_CONST __got 0x{{0*}}[[#%x, HELLO_RIP + HELLO_OFF]] pointer 0 libhello _hello_world
				# CHECK-DAG: __DATA_CONST __got 0x{{0*}}[[#%x, GOODBYE_RIP + GOODBYE_OFF]] pointer 0 libgoodbye _goodbye_world

				.section __TEXT,__text
				.globl _main

				_main:
				movl $0x2000004, %eax # write() syscall
				mov $1, %rdi # stdout
				movq _hello_world@GOTPCREL(%rip), %rsi
				mov $13, %rdx # length of str
				syscall

				movl $0x2000004, %eax # write() syscall
				mov $1, %rdi # stdout
				movq _goodbye_world@GOTPCREL(%rip), %rsi
				mov $15, %rdx # length of str
				syscall
				mov $0, %rax
				ret
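The `CHECK-DAG` lines in this test add the captured displacement (`HELLO_OFF`) to the address of the instruction that follows the `movq` (`HELLO_RIP`), because a RIP-relative operand is resolved against the address of the next instruction. A minimal sketch of that computation follows; the function name and the sample addresses are made up for illustration.

```cpp
#include <cstdint>

// Address referenced by a RIP-relative operand: the signed 32-bit
// displacement is added to the address of the *next* instruction.
uint64_t ripRelativeTarget(uint64_t nextInsnAddr, int32_t disp) {
  return nextInsnAddr + static_cast<uint64_t>(static_cast<int64_t>(disp));
}
```

For instance, a displacement of `0x20` with the next instruction at `0x1000` refers to `0x1020`. In the test, the equivalent sum has to equal the `__got` address that `llvm-objdump --bind` prints for the symbol.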

lld/test/MachO/missing-dylib.s

This file was added.

				# REQUIRES: x86
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t.o
				# RUN: not lld -flavor darwinnew -Z -o %t -lmissing %t.o 2>&1 | FileCheck %s

				# CHECK: library not found for -lmissing

lld/test/MachO/no-id-dylink.s

This file was added.

				# REQUIRES: x86
				# RUN: mkdir -p %t
				# RUN: yaml2obj %p/Inputs/no-id-dylib.yaml -o %t/libnoid.dylib
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s -o %t/no-id-dylink.o
				# RUN: not lld -flavor darwinnew -o %t/no-id-dylink -Z -L%t -lnoid %t/no-id-dylink.o 2>&1 | FileCheck %s
				# CHECK: dylib {{.*}}libnoid.dylib missing LC_ID_DYLIB load command

				.text
				.globl _main

				_main:
				mov $0, %rax
				ret

lld/test/MachO/search-paths.test

This file was added.

				RUN: mkdir -p %t

				RUN: lld -flavor darwinnew -v -L%t 2>&1 | FileCheck -DDIR=%t %s
				CHECK: Library search paths:
				CHECK-NEXT: [[DIR]]
				CHECK-NEXT: /usr/lib
				CHECK-NEXT: /usr/local/lib

				RUN: lld -flavor darwinnew -v -L%t -Z 2>&1 | FileCheck -DDIR=%t --check-prefix=CHECK_Z %s
				CHECK_Z: Library search paths:
				CHECK_Z-NEXT: [[DIR]]
				CHECK_Z-NOT: /usr/
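The behaviour this test checks can be summarised as: directories given with `-L` come first, followed by `/usr/lib` and `/usr/local/lib` unless `-Z` is passed. The sketch below models that ordering and the `lib<name>.dylib` lookup implied by `-lhello` in dylink.s. It is an illustration under those assumptions, not the code in Driver.cpp, and both function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the search path list: user-supplied -L directories
// first, then the two default directories unless -Z was given.
std::vector<std::string>
librarySearchPaths(const std::vector<std::string> &userDirs,
                   bool noDefaultPaths) {
  std::vector<std::string> paths(userDirs);
  if (!noDefaultPaths) {
    paths.push_back("/usr/lib");
    paths.push_back("/usr/local/lib");
  }
  return paths;
}

// Hypothetical helper: the file name probed in `dir` for `-l<name>`.
// Only the .dylib form exercised by the tests in this diff is modelled.
std::string dylibCandidate(const std::string &dir, const std::string &name) {
  return dir + "/lib" + name + ".dylib";
}
```

If none of the candidates built from the search paths exists, a linker would report an error such as the `library not found for -lmissing` message checked in missing-dylib.s.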

Revision Contents

Diff 259101
