This is an archive of the discontinued LLVM Phabricator instance.

[lld] Initial commit for new Mach-O backend
ClosedPublic

Authored by int3 on Feb 28 2020, 1:14 PM.

Details

Summary

This is the first commit for the new Mach-O backend, designed to roughly follow the architecture of the existing ELF and COFF backends, and building off work that @ruiu and @pcc did in a branch a while back. Note that this is a very stripped-down commit with the bare minimum of functionality for ease of review. We'll be following up with more diffs soon.

Currently, we're able to generate a simple "Hello World!" executable that runs on OS X Catalina (and possibly on earlier OS X versions; I haven't tested them). (This executable can be obtained by compiling test/MachO/relocations.s.) We're mocking out a few load commands to achieve this -- for example, we can't load dynamic libraries, but Catalina requires binaries to be linked against dyld, so we hardcode the emission of a LC_LOAD_DYLIB command. Other mocked out load commands include LC_SYMTAB and LC_DYSYMTAB.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 1:14 PM
int3 edited the summary of this revision. (Show Details)Feb 28 2020, 1:18 PM
int3 planned changes to this revision.Feb 28 2020, 1:19 PM
int3 edited the summary of this revision. (Show Details)Feb 28 2020, 1:22 PM
int3 added subscribers: ruiu, pcc.
int3 requested review of this revision.Feb 28 2020, 1:28 PM

Excited to see this! (I generally like playing with binary formats, though I know close to zero about Mach-O.)

https://lists.llvm.org/pipermail/llvm-dev/2018-June/123784.html

But I don't think that's the real issue. I think the real issue is the lack of maintenance and ownership of the mach-O lld tree. There's no activities for the tree for years, though we've been making efforts to keep it compile and pass all the existing tests.

So, is this an announcement that you want to claim the maintenance? ☺️

Excited to see this! (I generally like playing with binary formats, though I know close to zero about Mach-O.)

https://lists.llvm.org/pipermail/llvm-dev/2018-June/123784.html

But I don't think that's the real issue. I think the real issue is the lack of maintenance and ownership of the mach-O lld tree. There's no activities for the tree for years, though we've been making efforts to keep it compile and pass all the existing tests.

So, is this an announcement that you want to claim the maintenance? ☺️

Yes :) Although note that we're developing a new Mach-O backend that uses the same design as COFF and ELF, not working with the existing one.

smeenai added a subscriber: Ktwu.Feb 28 2020, 2:47 PM
dmajor added a subscriber: dmajor.Feb 28 2020, 3:10 PM
keith added a subscriber: keith.Feb 28 2020, 4:15 PM
MaskRay added inline comments.Feb 28 2020, 11:08 PM
lld/MachO/Driver.cpp
32

Reorder llvm namespaces before lld.

67

auto -> opt::Arg

148

excess parentheses

lld/MachO/InputFiles.cpp
104

MachO/section-names.s should test a long section name.

105

There should be a test.

125

Logically unreachable code uses llvm_unreachable. Unimplemented features should call error() instead.

149
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_vector.h:398:7: error: static_assert failed due to requirement 'is_same<llvm::MachO
::section_64, const llvm::MachO::section_64>::value' "std::vector must have a non-const, non-volatile value_type"
      static_assert(is_same<typename remove_cv<_Tp>::type, _Tp>::value,
lld/MachO/Target.cpp
21

How many targets do Mach-O have? If it can be larger, like ELF/Arch/X86_64.cpp, we can create MachO/Arch accordingly.

37

Logically unreachable code uses llvm_unreachable. Unimplemented relocation types should call error() instead.

lld/MachO/Writer.cpp
250

Is it a const?

Who defines it as /usr/lib/dyld ?

lld/test/MachO/duplicate-symbol.s
5

Does lld -flavor darwinnew -o /dev/null %t.o %t.o work?

lld/test/MachO/relocations.s
12

Delete unneeded instructions.

lld/test/MachO/section-names.s
24

too much indentation

lld/test/MachO/undefined-entry.s
11

8-space indentation may be too much. 2 is fine.
If the instruction is not significant, delete mov $0, %rax

lld/tools/lld/CMakeLists.txt
15

I hope we can name it lldMacho.

plotfi added a subscriber: plotfi.Mar 1 2020, 5:42 PM
ruiu added a comment.Mar 1 2020, 9:07 PM

This seems a pretty straightforward port of the existing backends. Very nice!

Can you extend the commit message a bit to explain what you can do with this initial patch -- e.g. can you create a syntactically-valid mach-o executable with this? Can you create a trivial executable? (I guess the answer is no because you need to support dynamic linking.)? etc.

int3 edited the summary of this revision. (Show Details)Mar 2 2020, 6:46 AM
int3 edited the summary of this revision. (Show Details)
int3 marked 4 inline comments as done.Mar 2 2020, 6:48 AM

I guess the answer is no because you need to support dynamic linking.

The answer is actually yes, though we cheat a bit :D I've updated the summary.

lld/MachO/InputFiles.cpp
149

ah I need to figure out how to build with gcc locally... but it looks like the error comes from the ArrayRef of const, so I've removed the const

lld/MachO/Target.cpp
21

we'll want to target ARM at some point. I'll create the subdir

lld/MachO/Writer.cpp
250

ld64 defaults this value to /usr/lib/dyld but allows you to override it via the env var LD_DYLD_PATH. However recent versions of the kernel don't execute anything with a different dyld path so it's effectively a constant: https://github.com/apple/darwin-xnu/pull/8

I'll make this a const and add a comment.

lld/tools/lld/CMakeLists.txt
15

I'm not sure if anyone is still depending on the old partial Mach-O implementation. If no one objects I'll take over its build/flavor names

int3 updated this revision to Diff 247627.Mar 2 2020, 6:50 AM

address comments

int3 marked 14 inline comments as done.Mar 2 2020, 6:52 AM
int3 added inline comments.
lld/MachO/InputFiles.cpp
104

okay, added a 16 char section name (the maximum allowed by the format)

lld/test/MachO/duplicate-symbol.s
5

yup it does

int3 marked 2 inline comments as done.Mar 2 2020, 6:53 AM
int3 updated this revision to Diff 247629.Mar 2 2020, 6:56 AM

update

smeenai added inline comments.Mar 2 2020, 11:16 AM
lld/tools/lld/CMakeLists.txt
15

I believe some people are using it right now, so I was planning to wait until we were more feature-complete to do this.

pcc added inline comments.Mar 2 2020, 11:44 AM
lld/MachO/InputFiles.cpp
28

Since this isn't implemented yet, maybe this part of the comment should be omitted?

38

I don't believe that this is true in general. In the case where R_SCATTERED is false, bit 27 of r_word1 (the extern bit) indicates whether the relocation refers to a symbol or a section address. You can see this by grepping for rExtern in the existing lld, and seeing how relocations with/without the bit are handled (e.g. http://llvm-cs.pcc.me.uk/tools/lld/lib/ReaderWriter/MachO/ArchHandler_arm64.cpp#434 ).

int3 marked an inline comment as done.Mar 2 2020, 12:26 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
38

This was originally written by @ruiu, but I *think* what this means is that R_SCATTERED determines whether the relocation is referring to something within the same section, and not whether it's referring to a symbol or a section. I'll rephrase things...

int3 updated this revision to Diff 247709.Mar 2 2020, 12:32 PM

update comment

int3 marked an inline comment as done.Mar 2 2020, 12:33 PM
aprantl added inline comments.
lld/MachO/Driver.cpp
1

Putting a -*- C++ -*- mark into a .cpp file is redundant. These are only necessary for .h files where the language is ambiguous.

lld/MachO/OutputSegment.cpp
16

@ruiu, @pcc - just curious, would you mind adding some comments regarding the design considerations which motivated this approach (using global variables to store various parts of the object file, storing the state globally etc) ?

pcc added inline comments.Mar 2 2020, 1:37 PM
lld/MachO/InputFiles.cpp
38

I think that both non-R_SCATTERED and R_SCATTERED can refer to locations within different sections. You can see this in the implementation of "atomByAddr" in the old linker:
http://llvm-cs.pcc.me.uk/tools/lld/lib/ReaderWriter/MachO/MachONormalizedFileToAtoms.cpp#527
which contains code to look up potentially different sections.

This function is called e.g. here:
http://llvm-cs.pcc.me.uk/tools/lld/lib/ReaderWriter/MachO/ArchHandler_arm64.cpp#444

and the "symbol" field (which becomes the first argument of that function) is filled in here:
http://llvm-cs.pcc.me.uk/tools/lld/lib/ReaderWriter/MachO/MachONormalizedFileBinaryUtils.h#144
(and code below to handle the non-R_SCATTERED case).

I would summarise the situation as being:

  • non-scattered relocations may refer to either an external symbol or a location within a (same or different) section (index is specified in relocation, or inferred from address if 0)
  • scattered relocations may only refer to a location within a (same or different) section (index is inferred from address).
MaskRay added inline comments.Mar 2 2020, 1:50 PM
lld/MachO/OutputSegment.cpp
16

I started to be heavily involved in lld/ELF since mid 2018. I do not know much about the previous history.

You find the discussions in D70421 and https://reviews.llvm.org/D70766#1761624 helpful. If we can avoid global variables, that will be nice.

int3 marked an inline comment as done.Mar 3 2020, 8:38 AM
int3 added inline comments.
lld/MachO/InputFiles.cpp
38

Thanks for the pointers! That makes sense. One question:

section (index is specified in relocation, or inferred from address if 0)

Can a valid relocation actually have 0 as the section index? Relocation::value for a non-scattered relocation is always zero, so I'm not sure what address we could infer things from...

pcc added inline comments.Mar 3 2020, 9:15 AM
lld/MachO/InputFiles.cpp
38

In this case, the address that we infer things from is stored inline in the section data (see getImplicitAddend here, or the second argument to atomByAddr in the old linker).

int3 updated this revision to Diff 247959.Mar 3 2020, 10:46 AM

update comments

int3 marked an inline comment as done.Mar 3 2020, 10:47 AM
int3 added inline comments.
lld/MachO/InputFiles.cpp
38

I see, thanks!

int3 marked an inline comment as done.Mar 3 2020, 10:47 AM
ruiu added inline comments.Mar 4 2020, 1:20 AM
lld/MachO/OutputSegment.cpp
16

This variable is essentially a singleton, as you need only one list of output segments for each instance of the linker, and in many places you'll need to access it, so I decided to make it a global variable. Other singletons are also represented by global variables. In addition to that, this code is simply written in an old-fashioned way to reflect my taste, which shouldn't be necessarily bad. Actually I'm fairly satisfied with this design as it is pretty straightforward.

An alternative design is to define a "context" object and make that object to have all singletons like this. That design isn't bad too and is perhaps arguably better, but if you do that you'll end up having to add one extra parameter to virtually all functions.

Anyway, this topic tends to become a bikesheddy, so I'd focus on bringing up the mach-o linker while reusing the exact same design as the other ports.

zapster added a subscriber: zapster.Mar 4 2020, 5:17 AM
lhames added a subscriber: lhames.Mar 4 2020, 12:35 PM

So, is this an announcement that you want to claim the maintenance? ☺️

Yes :) Although note that we're developing a new Mach-O backend that uses the same design as COFF and ELF, not working with the existing one.

I'm happy to resign as code owner of the existing MachO LLD codebase. If/when possible, I think it should be removed from the tree.

keith removed a subscriber: keith.Mar 4 2020, 1:01 PM
alexander-shaposhnikov added inline comments.
lld/MachO/OutputSegment.cpp
16

@ruiu, for what it's worth, my 0.02$.
it seems like there are multiple downsides of this "design", starting from readability, clarity of the internal model of the output file, ending with, for example, if one wanted to use LLD as a library and construct multiple instances of different output files, or, e.g. write unit-tests in memory to verify some parts of the functionality separately. Probably, the list can be continued. So it's not clear at all what dictates them to be singletons.
If i am not mistaken at some point it was also mentioned in the LLVM Coding Standards https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors, some it would be interesting to find out why it was ignored for ELF/COFF.
Regarding the argument "I'd focus on bringing up the mach-o linker while reusing the exact same design as the other ports" - I fully support this effort of writing a modern solid implementation of LLD for MachO and, in particular, don't want to block this diff, however, this argument looks quite weak, in the future when more complexity will be accumulated in this code it'll be much harder to change the course, at least it is hard to see how it can get any simpler. cc: David @dblaikie

Alpha added a subscriber: Alpha.Mar 6 2020, 8:52 AM
dblaikie added inline comments.Mar 6 2020, 6:21 PM
lld/MachO/OutputSegment.cpp
16

These issues were discussed at length previously without a particularly satisfying resolution (perhaps more accurately - it didn't go the way I wanted it to go, but not everything can/should/does) in the context of the broader LLVM project (being library centric, etc) - and lld has continued to exist (& be adopted in production even by those, myself included, who had/have misgivings about the design choices here).

(it's probably not helpful to use quotation marks around the word 'design' to call into question someone else's perspective - it is an intentional design choice that has been discussed at length in the past, not an accident or choice made casually)

While writing the design discussion/conclusions down somewhere probably would be nice - even the act of doing that might just end up igniting opinions/relitigating the previous decisions & generally being a bit unproductive. So it's not super high on my priority list nor something I'd strongly encourage/request others do either.

dblaikie added inline comments.Mar 6 2020, 6:23 PM
lld/MachO/OutputSegment.cpp
16

Oh, one other thing here, I guess - LLVM started with some similar issues & eventually LLVMContext was added, etc. Probably (I'm totally guessing here) when LLVM was a bit larger than lld is now. So it's probably still pretty tractible if/when anyone wants to librarify lld by making it uses non halt-on-error error handling, and moving global state into an LLDContext or similar. (I'm guessing the error handling might be the bigger part of that thing - and could be solved without adding the context & doing that work would help justify the work to migrate from global variables to a context)

lld/MachO/OutputSegment.cpp
16

@dblaikie ok, thanks. Even though it feels like some decisions here are suboptimal (and I voiced these concerns because this is just the beginning of the project (for MachO), later it will arguably get more complicated), that's just my personal opinion, no more than that. I can only emphasize that don't want to block this diff and I'm also interested in having MachO support in LLD.

This revision is now accepted and ready to land.Mar 6 2020, 9:41 PM
dblaikie added inline comments.Mar 7 2020, 11:42 AM
lld/MachO/OutputSegment.cpp
16

Yeah, I can totally appreciate the timing there - a new target being a new/independent opportunity to try something different and see how it looks/might inform the existing implementations, but just given the history here I think that'd be difficult and end up derailing this review so better handled as an independent proposal/experiment (which, to be sure, could be done on one target implementation - similarly to inform/see if it's a good idea for the others, etc)

Thanks for keeping these sort of design choices in mind, in any case!

ruiu added inline comments.Mar 9 2020, 1:41 AM
lld/MachO/OutputSegment.cpp
16

I wrote a comment but deleted to avoid derailing the code review (and most comments were already mentioned in previous threads anyway.) As to this mach-o port, I strongly prefer to stick with the same design as other ports so that we don't have to invent new ways of doing something but focus on writing a working mach-o linker first. In fact, if we want to experiment an idea to change the way how we manage the context, I think we should do that for other port, as it should give you a whole picture how it will look like at last.

@ruiu, as LLD code owner, would you be able to review this and give a LGTM as well, since it's the initial commit for a new backend?

ruiu accepted this revision.Mar 10 2020, 3:14 AM

LGTM

Thank you very much for doing this!

lld/MachO/InputFiles.cpp
142

nit: ObjKind -> objKind

int3 marked an inline comment as done.Mar 12 2020, 11:38 AM

Thanks everyone for the reviews and suggestions!

int3 added inline comments.Mar 12 2020, 11:38 AM
lld/MachO/InputFiles.cpp
142

hm this is an enum value, not a member/parameter, so I think CamelCase is correct here?

smeenai added inline comments.Mar 12 2020, 11:59 AM
lld/MachO/InputFiles.cpp
142

Yeah ObjKind is consistent with LLD ELF.

I'll carefully read through the patch. Hope you don't mind keeping it open for more time.

I'll carefully read through the patch. Hope you don't mind keeping it open for more time.

Sure, go ahead. The more eyes on this the better :)

ruiu added a comment.Mar 13 2020, 12:09 AM

Jez and Shoaib, if you guys have more patches on this one, you can send them now for review if you want.

Jez and Shoaib, if you guys have more patches on this one, you can send them now for review if you want.

Yup, we'll start putting up follow-ups soon (probably next week).

int3 updated this revision to Diff 250317.Mar 13 2020, 3:54 PM

clang-tidy + rebase

MaskRay added inline comments.Mar 16 2020, 9:15 PM
lld/MachO/Driver.cpp
41

const char *NAME[] = VALUE;

75

Add a -arch i386 test. There is currently no test.

90

Add a test similar to test/ELF/invalid/executable.s

127

Add an -e test with an undefined symbol.

140

vec can be omitted.

lld/MachO/InputFiles.cpp
25

the beginning

Do you mean that a symbol defined in the section must be at the beginning of a subsection, unless the N_ALT_ENTRY attribute is set?

71

Add a cannot open test. See test/ELF/basic.s (I don't think placing many things in one file is good but you can copy one of the tests (%t.no.such.file))

85
86

reinterpret_cast is more idiomatic.

107

What if sec.align is >= 32 ?

MaskRay added inline comments.Mar 16 2020, 9:41 PM
lld/MachO/Arch/X86_64.cpp
51

getImplicitAddend errors for an unknown relocation type. Is here unreachable?

lld/MachO/Options.td
12

--version uses a double-dash form while other long options use single-dash form. Can you confirm?

lld/MachO/Target.h
19

pageSize
imageBase

28

In ELF, D73254 renamed relocateOne to relocate and passed in Relocation. You may think whether Mach-O will need other fields of Reloc.

lld/MachO/Writer.cpp
39

virtual uint64_t getSize() const = 0;

50

Why are the two variables not placed with other member variables?

95

const

100

Can os be local to createDyldInfoContents?

int3 updated this revision to Diff 250695.Mar 16 2020, 10:48 PM

minor update

int3 updated this revision to Diff 250999.Mar 17 2020, 11:46 PM
int3 marked 20 inline comments as done.

address comments

int3 updated this revision to Diff 251000.Mar 17 2020, 11:47 PM

update

lld/MachO/Arch/X86_64.cpp
52

good point, hadn't noticed that Writer would exit if errorCount was nonzero.

lld/MachO/InputFiles.cpp
25

It was @ruiu who originally wrote this, but I believe it means that a symbol w/o the N_ALT_ENTRY attribute defines the beginning of a subsection. So your statement should be true by definition. Maybe "by default" should be replaced with "by definition"...

86

fair enough. I've converted all the C-style casts I could find.

107

I haven't found the logic in ld64 that handles this, but some experimentation (creating object files with high alignments using yaml2obj and then passing it to ld64) suggests it does applies mod 256 to sec.align w/o emitting warnings or anything. Obviously 2 ** 256 is still ridiculously large, and ultimately ld64 emits a 2GB object file, so it looks like some silent overflows are happening.

I'm leaning towards applying mod 32 here and emitting a warning.

lld/MachO/Options.td
12

hm, ld64 actually doesn't have a version flag at all, I was just copying lld-elf here... but I suppose it would make sense for this to be single-dash to be consistent with the other flags here (and in ld64)

lld/MachO/Target.h
19

clang-tidy specifies enums to be CamelCase

int3 marked 3 inline comments as done.Mar 17 2020, 11:55 PM
ruiu added inline comments.Mar 18 2020, 12:22 AM
lld/MachO/InputFiles.cpp
107

mod 32 seems a bit too arbitrary, so how about always setting isec->align to sec.align with a warning if sec.align is too large?

lld/MachO/Options.td
12

It looks like ld64 does have -v instead of --version or -version.

Harbormaster failed remote builds in B49555: Diff 251003!
Harbormaster failed remote builds in B49551: Diff 250999!
int3 marked an inline comment as done.Mar 18 2020, 12:34 AM
int3 added inline comments.
lld/MachO/InputFiles.cpp
107

how do you propose we align the addresses for sections with large align values?

ruiu added inline comments.Mar 18 2020, 12:42 AM
lld/MachO/InputFiles.cpp
107

Oh, the first thing I'd like to point out is that we store raw alignment values instead of power-of-two values to alignment variables in other ports, and mach-o probably should follow that to avoid confusion, even if it means we compute a power-of-two and then convert it back to an exponent.

To answer to your question, if you use uint32_t for align member, the maximum alignment we can represent is naturally limited to 2^31. How about print out an error (instead of warning) if it is greater than that?

int3 updated this revision to Diff 251013.Mar 18 2020, 1:45 AM
int3 marked 3 inline comments as done.

change alignment handling

lld/MachO/InputFiles.cpp
107

Ah, thanks for the explanation. And yeah, I guess ld64's behavior isn't worth emulating, erroring out is better.

ruiu accepted this revision.Mar 18 2020, 2:47 AM

LGTM

lld/MachO/InputFiles.cpp
109

nit: error messages should start with a lowercase letter.

110

Set a dummy value (e.g. 1) to isec->align if the input is invalid, so that even after we meet an error condition, all the member variables are at least initialized with some value instead of left uninitialized.

int3 updated this revision to Diff 251075.Mar 18 2020, 7:14 AM
int3 marked an inline comment as done.

lowercase

int3 marked an inline comment as done.Mar 18 2020, 7:15 AM
int3 added inline comments.
lld/MachO/InputFiles.cpp
110

align is default-initialized to zero in the class definition

int3 updated this revision to Diff 251139.Mar 18 2020, 11:35 AM

more const + override

int3 updated this revision to Diff 251533.Mar 19 2020, 7:29 PM

fix align handling

int3 updated this revision to Diff 252808.Mar 26 2020, 5:24 AM

have LC::getSize() return uint32_t since sizeofcmds is 32 bit

gkm added a subscriber: gkm.Mar 26 2020, 12:54 PM
ruiu added a comment.Mar 30 2020, 10:19 PM

Can you submit this change? I think this is good enough as an initial commit, and you can make further changes on top of this.

int3 added a comment.Mar 31 2020, 11:13 AM

I don't have commit access yet. @smeenai, could you do it?

I don't have commit access yet. @smeenai, could you do it?

Sure, I can get to this soon.

I don't have commit access yet. @smeenai, could you do it?

Sure, I can get to this soon.

Make sure to use git c --amend --author='Jez Ng <.............>'

MaskRay accepted this revision.Mar 31 2020, 11:37 AM
This revision was automatically updated to reflect the committed changes.

I don't have commit access yet. @smeenai, could you do it?

Sure, I can get to this soon.

Make sure to use git c --amend --author='Jez Ng <.............>'

@int3 used arc diff to upload the patch, so arc patch sets the correct author information automatically.

MaskRay added a comment.EditedApr 2 2020, 11:55 AM

This was reverted by af39151f3c54b90ff0ae1064f540a97b640298e9 along with my gn build fix (c886e2be1ee4c57ef9f3a75509b4f2299ad9f61b).

@ostannard Hope you can paste a notice and give some time before reverting :)
This looks like an easy fix. A revert could be more disruptive.

Relanded with commit 6acd3003755db3944f7fcbea7542bd574a41c0a0