This is an archive of the discontinued LLVM Phabricator instance.

Table of Contents

lld/MachO/CMakeLists.txt
lld/MachO/Driver.cpp
lld/MachO/ExportTrie.h
lld/MachO/ExportTrie.cpp
lld/MachO/InputFiles.cpp
lld/MachO/InputSection.h
lld/MachO/InputSection.cpp
lld/MachO/MergedOutputSection.h
lld/MachO/MergedOutputSection.cpp
lld/MachO/OutputSection.h
lld/MachO/OutputSection.cpp
lld/MachO/OutputSegment.h
lld/MachO/OutputSegment.cpp
lld/MachO/Symbols.h
lld/MachO/SyntheticSections.h
lld/MachO/SyntheticSections.cpp
lld/MachO/Writer.cpp
lld/test/MachO/Inputs/libfunction.s
lld/test/MachO/section-merge.s

Differential D77893

[lld] Merge Mach-O input sections
ClosedPublic

Authored by Ktwu on Apr 10 2020, 1:07 PM.

Details

Reviewers

ruiu
pcc
gkm
MaskRay
alexander-shaposhnikov
christylee
smeenai
int3

Commits

rG6cb073133c56: [lld] Merge Mach-O input sections

Summary

Similar to other formats, input sections in the MachO implementation are now grouped under output sections. This is primarily a refactor, although there's some new logic (like resolving the output section's flags based on its inputs).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Looks good at a high level!

lld/MachO/InputSection.h
49–50	LLD ELF has an `outSecOff` in its InputSections, which tracks the offset of this particular section within its output section. I think that'd be better than storing the absolute address in the InputSection.
lld/MachO/OutputSection.cpp
21	What's this computation doing?
33	We should experiment with how ld64 handles merging hidden sections, or if that's even a thing it does. (I don't think you can specify a section is hidden yourself; ld64 just has a list of atoms it defines to be hidden.)
lld/MachO/OutputSegment.cpp
39–43	I'm wondering if it'd be better to construct an OutputSection independently and then pass that into this function instead.
53–54	Is the check flipped?

Ktwu marked 3 inline comments as done. Apr 14 2020, 4:32 PM

Ktwu added inline comments.

lld/MachO/InputSection.h
49–50	Interesting, I'll look into that.
lld/MachO/OutputSection.cpp
21	It's copying what InputSection did to calculate its file offset. I didn't entirely comprehend this math tbh.
lld/MachO/OutputSegment.cpp
53–54	Oops, yes.

Passing most local tests now

Harbormaster failed remote builds in B53281: Diff 257574! Apr 14 2020, 6:31 PM

Looking good! Needs tests once everything's working :)

lld/MachO/InputSection.h
38	I think `getVA` would be more in line with the LLD naming for this concept.
47	Similarly, since this is the same notion as LLD ELF's `outSecOff`, I'd probably stick with that name, just so it's easy to map concepts between the two.
lld/MachO/OutputSection.cpp
21	Okay, this makes sense.
31	Can you add more details to this error message? It'd be ideal to have things like the section in question, the object file it's coming from, etc. Also, is this what ld64 does?
35	Can you add a TODO for figuring out how we should handle input sections with conflicting hidden-ness?

Ktwu marked 4 inline comments as done. Apr 16 2020, 11:42 AM

Ktwu added inline comments.

lld/MachO/OutputSection.cpp
31	Sure. So far as I can tell, ld64 doesn't do section merging on a flag level like I first thought; now that I'm diving into it, OutputFile.cpp separates flags into individual boolean attributes. It's not the easiest codebase to navigate :/

Added a test; unfortunately symbol export is blocking the test, so I need to rebase

Harbormaster failed remote builds in B53605: Diff 258118! Apr 16 2020, 12:15 PM

int3 added inline comments. Apr 16 2020, 12:25 PM

lld/MachO/OutputSection.cpp
21	The parent segment's start address is defined as the address of the first section it contains. So `addr - parent->firstSection()->addr` computes the section's offset within the segment. Maybe we don't need this any more if we have `outSecOff` (going to investigate)
35	Personally I don't think we should support hidden InputSections until we find a use case (I'm not aware of one so far)

int3 added inline comments. Apr 16 2020, 12:50 PM

lld/MachO/OutputSection.cpp
21	Oh, outSecOff is for the InputSection's offset within the OutputSection, but this computation is for the OutputSection's offset within its segment. So having `outSecOff` doesn't impact this. That said the ELF implementation seems to have an explicit `OutputSection::offset` field, though I'm not sure what populates it... but I think this scheme of computing the offset from the address is fine for now
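
The two offsets discussed in this thread can be kept apart with a small sketch. The types below are hypothetical, simplified stand-ins (the `...Sketch` names and `getFileOffset` are not taken from the diff): `outSecOff` locates an input section inside its output section, while the output section's file offset is derived from its address relative to the first section of its segment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the lld-macho classes under review.
struct OutputSegmentSketch;

struct OutputSectionSketch {
  uint64_t addr = 0;                     // virtual address of the output section
  OutputSegmentSketch *parent = nullptr; // owning segment
};

struct OutputSegmentSketch {
  uint64_t fileOff = 0; // file offset at which the segment starts
  std::vector<OutputSectionSketch *> sections;
  // The segment's start address is the address of its first section.
  const OutputSectionSketch *firstSection() const { return sections.front(); }
};

struct InputSectionSketch {
  uint64_t outSecOff = 0;                // offset within the parent output section
  OutputSectionSketch *parent = nullptr; // output section this input was merged into
  // The absolute address is derived on demand instead of being stored.
  uint64_t getVA() const { return parent->addr + outSecOff; }
};

// Offset of the output section within its segment, added to the segment's
// file offset. This mirrors the `addr - parent->firstSection()->addr` term.
inline uint64_t getFileOffset(const OutputSectionSketch &osec) {
  const OutputSegmentSketch *seg = osec.parent;
  return seg->fileOff + (osec.addr - seg->firstSection()->addr);
}
```

With this split, moving an output section only changes `addr`; every input section inside it keeps its `outSecOff`.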

Ktwu added a parent revision: D76977: [lld-macho] Implement basic export trie. Apr 16 2020, 1:57 PM

Rebased, various comments about renames, basic test

Ktwu retitled this revision from WIP [lld] Merge Mach-O input sections to [lld] Merge Mach-O input sections. Apr 16 2020, 3:02 PM

Ktwu edited the summary of this revision.

Ktwu added reviewers: ruiu, pcc, gkm, MaskRay, alexander-shaposhnikov, christylee, smeenai, int3.

Harbormaster failed remote builds in B53646: Diff 258182! Apr 16 2020, 3:05 PM

int3 added inline comments. Apr 16 2020, 3:10 PM

lld/MachO/OutputSection.cpp
35	I.e. I think the synthetic sections could become OutputSections. So the only InputSections would actually be from real inputs, which will never be hidden

Ktwu marked 2 inline comments as done. Apr 16 2020, 7:20 PM

Ktwu added inline comments.

lld/MachO/OutputSection.cpp
35	Yeah I could try that!

Ktwu added a child revision: D78342: [lld] Add archive file support to Mach-O backend. Apr 16 2020, 7:30 PM

Refactored to turn synthetic input sections into output sections. A new class represents merged input sections, the MergedOutputSection.

Harbormaster failed remote builds in B54051: Diff 258916! Apr 21 2020, 1:35 AM

Pruned unnecessary hidden checks, coalesced some functions, removed unused imports

tweak comment

Harbormaster failed remote builds in B54132: Diff 259061! Apr 21 2020, 11:22 AM

Harbormaster failed remote builds in B54128: Diff 259056!

int3 added inline comments. Apr 21 2020, 7:56 PM

lld/MachO/MergedOutputSection.cpp
21	bikeshed: Do we want to prefix all member accesses with `this->`? lld-ELF seems to use it inconsistently, lld-COFF in just a few places... I have a slight preference towards omitting it, but we should be consistent either way
lld/MachO/OutputSection.cpp
26–29	can we just define this method on MergedOutputSection?
lld/MachO/OutputSection.h
21	How does this class hierarchy compare to that in the ELF implementation? I know they have `SectionBase`, `InputSectionBase`, and `MergeInputSection`... was planning to dig into it eventually but curious if you have looked
lld/MachO/OutputSegment.h
34–35	nit: I think `using SectionMap = ...` is the more modern C++ way (and is the method favored by lld-ELF/COFF)
43	@ruiu will object to the `auto` :) I personally hate typing out an `std::pair` type, but we should at least unpack `i.second` into a var with a named type. Nit 2: use `auto &` here. Alternatively, we could replace the loop with `std::any_of`.
52	`addOutputSection` seems like a more fitting name
55–56	If `sections` isn't private any more, it doesn't need an accessor. Edit: I see that it's non-private only because we need to sort it. How about making the sorting a method on this class? Related thought... I see that MapVector has a `takeVector` method that clears out the map and returns the underlying vector. Maybe we could do that -- have a `getSortedSections` method that returns an empty vector until we actually sort things. That would mean that the comparator can work on an actual vector & take single elements instead of `std::pair`s. Just my 2c, might be overcomplicating things here.
lld/MachO/SyntheticSections.h
50	Should this field live in the parent class? Edit: Oh I see, OutputSections don't define `segname` because that's accessed through `parent->name`, so only InputSections and synthetic sections need to define segnames. How about defining a `SyntheticSection` superclass that defines this field in its ctor, and have every class in this file inherit from it?
lld/MachO/Writer.cpp
254–255	Oh I see... looks like the problem with my initial sorting scheme is that it assumed that the segments were only ever created once all the sections were sorted. But this didn't account for __LINKEDIT, though I was lucky enough to create it after the sorting, so things worked out... but explicitly sorting both the segments and sections is more reliable. I think we can simplify the `order` method a bit though -- it's currently written to support comparing sections across different segments, and we're not using that functionality any more. Can we have two separate `order` methods, one for segments and the other for sections in the same segment?
395–396	Thanks, I think it's clearer this way
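
The OutputSegment.h suggestions above (a `using` alias, a named type for `i.second`, and `std::any_of`) could look roughly like the sketch below. It assumes the loop in question asks whether any section in the segment is needed; the diff body is not shown here, so that is a guess. The `...Sketch` types are hypothetical, and a vector of pairs stands in for `llvm::MapVector`, which also iterates over (key, value) pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an output section.
struct OutputSectionSketch {
  bool needed = true;
  bool isNeeded() const { return needed; }
};

// `using` aliases instead of `typedef`. The vector of pairs is only a
// stand-in for the MapVector used in the actual code.
using SectionEntry = std::pair<std::string, OutputSectionSketch *>;
using SectionMap = std::vector<SectionEntry>;

// Explicit loop: bind each entry by reference and unpack `.second` into a
// variable with a spelled-out type.
inline bool isNeededLoop(const SectionMap &sections) {
  for (const SectionEntry &entry : sections) {
    const OutputSectionSketch *osec = entry.second;
    if (osec->isNeeded())
      return true;
  }
  return false;
}

// Equivalent formulation with std::any_of.
inline bool isNeededAnyOf(const SectionMap &sections) {
  return std::any_of(
      sections.begin(), sections.end(),
      [](const SectionEntry &entry) { return entry.second->isNeeded(); });
}
```

Both functions return the same result; the second trades the named local for a shorter body.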

int3 added inline comments. Apr 21 2020, 8:08 PM

lld/MachO/OutputSegment.h
52	also, I think it would make sense if we moved `createOutputSection` from Writer.cpp to a method on this class, then this helper method can be made private

Ktwu marked 7 inline comments as done. Apr 21 2020, 10:22 PM

Ktwu added inline comments.

lld/MachO/OutputSegment.h
55–56	Yes to section sorting living here, nay to the `takeVector` idea (at least in this diff).
lld/MachO/SyntheticSections.h
50	Yup, a base SyntheticSection class sounds ++
lld/MachO/Writer.cpp
254–255	I was thinking about that, too, so I'll try it.

Ktwu marked 3 inline comments as done. Apr 22 2020, 1:02 PM

Ktwu added inline comments.

lld/MachO/MergedOutputSection.cpp
21	Good point; if I was rigorous, I think I'd prefer omitting it, too.
lld/MachO/OutputSection.cpp
26–29	Since output segments just contain output sections now, they're not aware of whether they contain synthetic sections or mergeable sections. I'm not sure how to cast that away (I guess a dynamic cast at runtime, but I figured this would be cheaper).
lld/MachO/OutputSection.h
21	I hadn't looked too closely tbh.

Move comparator logic within output segment / section
Refactored creation of synthetic classes to use a base class / add to segments within constructor
removed instances of "this"

Harbormaster failed remote builds in B54297: Diff 259376! Apr 22 2020, 2:09 PM

The change looks good, but it needs a bunch more testing, e.g:

Various flag merging scenarios (I recognize some of that is in flux right now, but e.g. the pure instructions merging)
Checking for merging text and data sections as well as cstrings, including checking the merging of the contents as well as the symbols
Perhaps something for the section sorting, since you're changing that (though perhaps the existing tests for that already provide enough coverage)

lld/MachO/ExportTrie.h
27	@Ktwu, if these changes were incorporated in D76977, would you not need to depend on that? That one's a meatier review than this one, so it'd be easier to get this in first if possible.
lld/MachO/MergedOutputSection.cpp
34	Super nit: `isecAddr` is a bit more understandable IMO.
37	Super nit: `InputSection` loop variables are usually `isec`
52	Could you elaborate on what might be wrong in this comment?
66	segment -> section
67	If I'm understanding this correctly, it'll end up negating the pure bit when you don't want it to. If both `inputFlags` and `flags` have the bit set, the result of the AND for that bit will be 1, so it'll be 0 in your mask, so it'll get unset.
lld/MachO/MergedOutputSection.h
20	Could you add a class comment about what this represents? I'd prefer renaming `OutputSection` to `OutputSectionBase` and `MergedOutputSection` to just `OutputSection`, to be more in line with ELF, but I don't feel super strongly about that.
lld/MachO/OutputSection.h
21	`MergeInputSection` is for the ELF SHF_MERGE concept, where a linker can merge the objects inside a section ... it's used for strings, for example (where the linker can perform string merging and tail merging). I think ld64 just treats certain section types as mergeable instead of having a special flag for that.
22	Can you add a class comment?
27–28	Why do these need to be virtual?
42	Is this needed for anything other than MergedOutputSection?
lld/MachO/OutputSegment.cpp
39–43	If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of `addOutputSection` and make it a public member variable.
41	`auto &i`, and same comment about unpacking `i.second`
lld/MachO/OutputSegment.h
34–35	This one needs to be addressed.
75	Given that this might be the common case, would it make sense to cache the `find` somehow?
lld/MachO/SyntheticSections.h
166–170	I believe the `in` is short for "globally accessible input sections". LLD ELF has a corresponding struct called `Out` for output sections (in OutputSections.h), and given that our synthetic sections are output sections now, it might make sense to adopt that name.
lld/MachO/Writer.cpp
326–329	Nit: might be nicer to assign these to variables instead of chaining.
376–385	Nit: use an explicit type instead of auto
lld/test/MachO/section-merge.s
4	It'd be nice to check that the contents of the sections are merged correctly as well. Also, we should be checking that text segments are merged correctly, since that's the most common case.
32–33	We aren't defining these symbols in this file, so the `.global` directives aren't doing anything.

@smeenai's comments aside, it looks pretty good to me :)

lld/MachO/MergedOutputSection.h
20	+1 on the renaming
lld/MachO/OutputSection.h
77–78	"which stores them in a MapVector of section name -> section" seems clearer. I think it's worth pointing out we have a MapVector since it doesn't make sense to sort any other kind of map
79–80	nit: take by const ref
84	I think `operator<` typically returns a `bool`
lld/MachO/SyntheticSections.cpp
30	ultra nit: how about "Synthetic sections always know which segment they belong to, so hook them up when they're made"? "No need to orphan" seems a bit weird because it hints that one might naively want to orphan them but doesn't indicate why

Ktwu marked 11 inline comments as done. Apr 24 2020, 4:31 PM

Ktwu added inline comments.

lld/MachO/ExportTrie.h
27	As discussed in the meeting, section merging depends on being able to have more than one symbol at a time for testing. The export trie is a hard dependency if we want to test this properly :/
lld/MachO/MergedOutputSection.h
20	Sure.

Ktwu added inline comments. Apr 24 2020, 4:31 PM

lld/MachO/MergedOutputSection.cpp
34	Sure
52	I'll add this to the comment, but in short, there's no research that's gone into validating what the merge behavior for flags like this ought to be (it's also why there are no tests for flag merging).
66	Oops thanks.
67	Ahhh nice catch (and ideally one that a test will confirm). I believe what I want is: `uint32_t pureMask = ~MachO::S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);`
lld/MachO/OutputSection.h
27–28	ah, they don't, it's refactoring cruft
42	No, it's not needed except for MergedOutputSection. Should I dynamically cast each output section before trying to merge an input section into it instead? (I wanted to avoid the runtime hit of doing that.)
lld/MachO/OutputSegment.cpp
39–43	I believe I hit issues trying to cache this on adding an OutputSection; I think the GotSection made things difficult since its isNeeded() attribute is dynamic.
lld/MachO/OutputSegment.h
75	Good idea!
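
The pure-instructions mask from the MergedOutputSection.cpp line 67 thread can be checked with a small standalone sketch. Several things here are assumptions: the surrounding `flags |= inputFlags; flags &= pureMask;` steps, the function names, and the "buggy" variant, which is only a hypothetical reconstruction of the failure mode described in the review (the original line is not shown on this page). The constant is the Mach-O value of `S_ATTR_PURE_INSTRUCTIONS`, redefined locally so the sketch has no LLVM dependency.

```cpp
#include <cassert>
#include <cstdint>

// Mach-O section attribute bit, redefined locally for this sketch.
constexpr uint32_t S_ATTR_PURE_INSTRUCTIONS = 0x80000000u;

// Hypothetical reconstruction of the problem: when both sides have the pure
// bit set, the AND is 1, the mask bit is 0, and the bit gets cleared.
inline uint32_t mergeFlagsBuggy(uint32_t flags, uint32_t inputFlags) {
  uint32_t pureMask = ~(S_ATTR_PURE_INSTRUCTIONS & inputFlags & flags);
  flags |= inputFlags;
  return flags & pureMask;
}

// Mask proposed in the reply: every non-pure bit of the mask is 1, and the
// pure bit is 1 only when both the input and the output are already pure.
inline uint32_t mergeFlagsFixed(uint32_t flags, uint32_t inputFlags) {
  uint32_t pureMask = ~S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);
  flags |= inputFlags; // merge all attribute bits
  return flags & pureMask; // drop "pure" unless both sides were pure
}
```

With the proposed mask, the merged section stays pure only if every input merged so far was pure, and all other attribute bits are simply OR-ed together.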

smeenai added inline comments. Apr 24 2020, 5:16 PM

lld/MachO/OutputSection.h
42	Yeah, dynamically casting wouldn't be ideal. We could leave this as-is, or make `getOrCreateOutputSection` return a `MergedOutputSection *` instead of an `OutputSection *` (idk if that'd cause other complications).

Ktwu marked 9 inline comments as done. Apr 27 2020, 3:16 PM

Ktwu added inline comments.

lld/MachO/OutputSection.h
42	I believe that would require having an assert that the output class being returned, if already created, is in fact a mergeable section. I think a static_cast would be what we'd want, but I like having the explicit assert here in case something goes wrong instead of the undefined behavior of a static_cast gone wrong.
lld/MachO/Writer.cpp
326–329	I personally prefer chaining :D
376–385	What's up with `auto`? It's not like it's forbidden from use in the style guide: https://llvm.org/docs/CodingStandards.html#id27 I'm curious about folks' reasoning behind its usage (or discouragement of).

Comments. More significantly, added more checks to verify __text merging.

Harbormaster failed remote builds in B54890: Diff 260486! Apr 27 2020, 4:44 PM

smeenai added inline comments. Apr 28 2020, 12:23 PM

lld/MachO/OutputSection.h
42	Ah. The assert would require an `isa`, which is basically the same overhead as a `dyn_cast`, so perhaps just leaving it as-is is best for now.
lld/MachO/Writer.cpp
326–329	Haha, fair enough.
376–385	Good question. LLD has its own style guidelines (e.g. variables are lowerCamelCase). Unfortunately, I don't think those are codified anywhere, but I'm basing this on what I've seen in reviews. `auto` is discouraged unless the actual type is spelled out in the same expression somewhere (e.g. as a result of a cast) or is a huge pain to type out (e.g. iterators). Otherwise explicit types are preferred. I personally like that because I don't use an IDE (and I haven't set up any ctags or LSP-like things for my editor), so having explicit types available makes it easier for me to comprehend the code. It's a little less clear-cut in cases like `auto linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);`, where it'd be pretty fair to assume that `getOrCreateOutputSegment` returns an `OutputSegment *`. It's a bit ambiguous whether it'd be a pointer, reference, or copy, but you could use `auto *` to disambiguate that. Nevertheless, it's not too much more typing to just spell the name out, so I'd prefer to err on the side of explicitness there. There was a post to the mailing list about auto usage in LLVM a while back, but there was no clear resolution: http://llvm.1065342.n5.nabble.com/llvm-dev-RFC-Modernizing-our-use-of-auto-td123947.html (the authors on that thread are bogus; if you want the original authors, look for that subject on http://lists.llvm.org/pipermail/llvm-dev/2018-November/ and http://lists.llvm.org/pipermail/llvm-dev/2018-December/)
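
For the OutputSection.h line 42 thread above, the options being weighed (a checked cast versus an assert followed by `static_cast`) can be sketched without LLVM headers. Everything below is hypothetical: the `...Sketch` types, `asMerged`, and `castMerged` are illustrative names, and the hand-rolled `kind` tag only imitates what LLVM-style RTTI (`classof`, `isa`, `dyn_cast`) provides.

```cpp
#include <cassert>

// Hypothetical base class carrying a kind tag, in the spirit of LLVM-style RTTI.
struct OutputSectionSketch {
  enum Kind { MergedKind, SyntheticKind };
  explicit OutputSectionSketch(Kind k) : kind(k) {}
  const Kind kind;
};

struct MergedOutputSectionSketch : OutputSectionSketch {
  MergedOutputSectionSketch() : OutputSectionSketch(MergedKind) {}
  int numInputs = 0;
  void mergeInput() { ++numInputs; }
  // Plays the role of the `classof` hook: a single tag comparison.
  static bool classof(const OutputSectionSketch *s) {
    return s->kind == MergedKind;
  }
};

struct SyntheticSectionSketch : OutputSectionSketch {
  SyntheticSectionSketch() : OutputSectionSketch(SyntheticKind) {}
};

// Checked cast: returns nullptr on a kind mismatch.
inline MergedOutputSectionSketch *asMerged(OutputSectionSketch *s) {
  return MergedOutputSectionSketch::classof(s)
             ? static_cast<MergedOutputSectionSketch *>(s)
             : nullptr;
}

// Assert plus static_cast: the same tag comparison still runs whenever
// asserts are enabled, which is the overhead point made in the thread.
inline MergedOutputSectionSketch *castMerged(OutputSectionSketch *s) {
  assert(MergedOutputSectionSketch::classof(s) && "not a merged section");
  return static_cast<MergedOutputSectionSketch *>(s);
}
```

In both variants the cost is one tag comparison, so the choice is mostly about how a mismatch is reported rather than about speed.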

smeenai added inline comments. Apr 28 2020, 5:02 PM

lld/MachO/MergedOutputSection.cpp
40	@int3 is changing this to take the section's alignment into account in D79050, so this should follow suit.
48	Same here.
lld/MachO/Writer.cpp
344	Would it make sense to have each `OutputSection` store its `fileOff` (in addition to or instead of the `OutputSegment` holding it), so that we don't need to recompute it in the writeSections loop below?

There's also the comment in D79050 about setting the alignment in the SyntheticSection base class once we have that.
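
Taking alignment into account when laying out merged input sections might look like the sketch below. This is not the code from this diff or from D79050; the struct, `alignUp`, and `layoutInputs` are hypothetical names, and alignments are assumed to be powers of two expressed in bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an input section being merged.
struct InputSectionSketch {
  uint64_t size = 0;
  uint32_t align = 1;     // power of two, in bytes (assumption)
  uint64_t outSecOff = 0; // assigned by layoutInputs()
};

// Round `value` up to the next multiple of `align` (power of two).
inline uint64_t alignUp(uint64_t value, uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Assign each input section an aligned offset within the output section and
// return the total size of the merged output section.
inline uint64_t layoutInputs(std::vector<InputSectionSketch> &inputs) {
  uint64_t offset = 0;
  for (InputSectionSketch &isec : inputs) {
    offset = alignUp(offset, isec.align);
    isec.outSecOff = offset;
    offset += isec.size;
  }
  return offset;
}
```

For example, a 5-byte section followed by a 16-byte-aligned section places the second one at offset 16 rather than 5.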

I'm assuming the diff hasn't been updated with the OutputSection -> OutputSectionBase and MergedOutputSection -> OutputSection renaming yet (assuming you're good with that).

lld/MachO/MergedOutputSection.cpp
67	Yup, looks good.
lld/MachO/SyntheticSections.h
166–170	This one still needs addressing, though I'm okay with doing it in a follow-up if you'd prefer.
lld/test/MachO/section-merge.s
46	Might be easier to make sense of this as assembly (`llvm-objdump -d`)

Ktwu marked 6 inline comments as done and an inline comment as not done. Apr 29 2020, 10:10 PM

Ktwu added inline comments.

lld/MachO/MergedOutputSection.h
20	Er, actually, I like MergedOutputSection because it conveys more information about what kind of OutputSection it is. Given how many of the other classes used don't use -Base in their name -- InputSections, OutputSegments -- it feels clunky to have an OutputSectionBase. @int3 do you feel strongly about renaming this?
lld/test/MachO/section-merge.s
46	Is this a big deal? I compared this output to ld; I don't care about the content so much as making sure it matches what ld outputs.

Rebased, aligned file offsets, and tested with UBSAN.

clang-format

smeenai added inline comments. Apr 29 2020, 10:49 PM

lld/test/MachO/section-merge.s
46	I think it makes the test a lot more intelligible and easy to verify. Right now, for me, this is just a blob of bytes. If it were written out as instructions, I could verify that it's the instructions in `_some_function` followed by the instructions in `_main` (as it should be). In general, matching ld64's output is important, but we also want our tests to work well standalone. The cstring check below is great because it's easy to tell at a glance that all the strings from the various input files are being combined together (as they should be); using the disassembly will let us do the same for the text section.

smeenai added inline comments. Apr 29 2020, 10:53 PM

lld/MachO/MergedOutputSection.h
20	Sure, I'm good with the parent class being named `OutputSection` and having specific subclasses representing the different types of output sections. I'm not the biggest fan of the `MergedOutputSection` name because to me it suggests output sections being merged together rather than input sections being merged into a single output section, but I can't think of anything better either, so I'm good with it if we can't come up with a better name. Naming is hard :D

Harbormaster failed remote builds in B55252: Diff 261131! Apr 29 2020, 11:13 PM

Harbormaster failed remote builds in B55251: Diff 261130!

int3 added inline comments. Apr 30 2020, 3:53 AM

lld/MachO/MergedOutputSection.h
20	I prefer the rename because it more closely parallels what lld-ELF has. Moreover ELF's MergeInputSection is quite different, so it's almost like a false parallel... Re conveying information, my mental model has been that OutputSections are by default mergeable, and only the special SyntheticSections aren't, so I don't feel that there's a need to explicitly call out the mergeability. `-Base` being a clunky, non-functionally-descriptive suffix is a fair point though...

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section __DATA_CONST,__got? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a __TEXT,__mach_header section, and treats it as distinct from its own hidden synthesized section with that name.

lld/MachO/MergedOutputSection.h
20	To be fair, I don't think LLD ELF has an OutputSectionBase. It does have an InputSectionBase though. The hierarchy is also different cos ELF synthetic sections are input sections (so they can be manipulated by linker scripts), whereas ours are output sections, so it's a bit hard to compare.
20	The comment looks great! Super nit: LLD always formats these types of comments using single-line comments (`//`) instead of `/* */`, so we should follow suit here.
lld/MachO/OutputSection.cpp
23	We should only hit this if we have a programming error on our end, right? If so, this should be `llvm_unreachable` instead of `error`.
lld/MachO/OutputSection.h
22	Comment looks great! Same nit about using single-line comments.
lld/MachO/OutputSegment.cpp
39–43	Ah, makes sense.
50	Should we assert that `sections[os->name]` doesn't already exist?
54	FWIW, `auto` is fine for iterators, but this is fine too.
lld/MachO/OutputSegment.h
75	This one isn't addressed, but given that we shouldn't have too many segments (besides the ones already in the map, I can only think of `__DATA`), perhaps this is okay as-is?
lld/MachO/SyntheticSections.cpp
32	Super nit: use a member initialization list instead.
70–71	@int3 how come these strings are just here directly vs. all the other ones being named constants? Also, not this diff, but is `__DATA_CONST` correct? ld64 puts this in `__DATA` instead as far as I can see. It's not quite constant cos the dynamic linker's gonna fill it in, though idk if it has the equivalent to ELF's RELRO.
lld/test/MachO/section-merge.s
16	We only need to check the properties we care about. Detailed symbol table checking should happen in the symbol table tests. Over here, I think we just care about the symbol name, section, and value, so we can drop the checks for the other fields (Extern, Type, RefType, Flags). We should also be checking for the text section symbols.

int3 added inline comments. Apr 30 2020, 7:21 PM

lld/test/MachO/section-merge.s
16	+1 for terser checks. We can also use `llvm-objdump --syms` here -- its output is much more compact and suitable for when we're not checking all the properties

int3 added inline comments. Apr 30 2020, 7:23 PM

lld/MachO/SyntheticSections.cpp
70–71	I don't think we're currently referencing this from anywhere else (in particular I didn't give them an order in the sorting comparator), so it wasn't technically necessary, though we could definitely make them constants for the sake of uniformity

Ktwu marked 11 inline comments as done. Apr 30 2020, 11:45 PM

Ktwu added inline comments.

lld/MachO/MergedOutputSection.h
20	D:
lld/MachO/OutputSection.cpp
23	Yup, since we're assuming that synthetic sections cannot be merged, this shouldn't happen.
lld/MachO/OutputSegment.h
75	Ah, no, I legit forgot to address this, so I don't mind getting to it...
lld/MachO/SyntheticSections.cpp
70–71	I'll make 'em a constant for now; can we deal with DATA vs DATA_CONST in another diff if need be?
lld/test/MachO/section-merge.s
16	llvm-objdump --syms is pretty nice, so I'll use that here instead.
46	Fair enough!

initialization list for SyntheticSection
caching position for "default" sections
objdump instead of readobj in test
constants for __DATA_CONST
llvm_unreachable

Harbormaster failed remote builds in B55413: Diff 261433! Apr 30 2020, 11:47 PM

int3 added inline comments. May 1 2020, 12:50 AM

lld/MachO/SyntheticSections.cpp
70–71	Oh sorry I missed the 2nd part of the comment about the segment. I'm pretty sure it's in `__DATA_CONST` at least on Catalina; just tried it out. But yeah we can deal with it in another diff if necessary

int3 marked an inline comment as done. May 1 2020, 2:58 AM

int3 added inline comments.

lld/MachO/MergedOutputSection.h
20	Not sure if that emoji was in reaction to my naming nit, but please feel free to stick with what you have. I was just giving my 2c, but it's your diff :)

smeenai added inline comments. May 1 2020, 1:54 PM

lld/MachO/SyntheticSections.cpp
70–71	Ah, sorry, I was just wondering why these weren't constants. I didn't mean to imply that you had to do it in this diff, but thanks for taking care of it :) @int3 interesting, this appears to be a Catalina vs older OS thing. Possibly related to dyld3?

Looks great! @int3, any other comments?

lld/test/MachO/section-merge.s
12	This seems to be a leftover from testing :)

This revision is now accepted and ready to land. May 1 2020, 1:56 PM

Yeah this lgtm, let's ship it (after rebasing). Pretty sure D78168: [lld-macho][rfc] Have Symbol::getVA() return a non-relative virtual address causes a rebase conflict, not sure if there's anything else...

int3 accepted this revision. May 1 2020, 2:19 PM

Remove testing cruft, oops

In D77893#2014371, @smeenai wrote:

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section __DATA_CONST,__got? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a __TEXT,__mach_header section, and treats it as distinct from its own hidden synthesized section with that name.

For a large internal binary, I confirmed that none of the input section names clashed with our synthetic section names, so I think it's pretty safe to just error out if we run into that.

int3 edited parent revisions, added: D78269: [lld-macho] Support X86_64_RELOC_BRANCH; removed: D76977: [lld-macho] Implement basic export trie. May 1 2020, 5:06 PM

Closed by commit rG6cb073133c56: [lld] Merge Mach-O input sections (authored by Ktwu, committed by int3). May 1 2020, 5:31 PM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B55498: Diff 261554! May 1 2020, 8:01 PM

Noticed a couple of minor things while rebasing on top of this. I've folded changes for them into D78270: [lld-macho] Support calls to functions in dylibs.

lld/MachO/OutputSection.h
40	IMO this should just be `return true`. Whether a section is hidden is orthogonal from whether it is needed: hidden sections will never have a header regardless of whether they have a body. (I know we override this method with `return false` for synthetic sections, but regardless I think it's confusing to write it this way for non-synthetic sections.)
lld/MachO/Writer.cpp
117–118	I think this should stay as `getSections().empty()`, and the check for `isNeeded()` should be moved outside writeTo(). We should just not create LCSegment commands for unneeded segments. The `empty()` check however is still needed because `__LINKEDIT` can be empty.
368–369	We should avoid writing unneeded output sections. (My stacked diff makes that assumption for one of the stub helper synthetic sections.)

In D77893#2021478, @int3 wrote:

Noticed a couple of minor things while rebasing on top of this. I've folded changes for them into D78270: [lld-macho] Support calls to functions in dylibs.

Ideally they'd be separated into their own small changes so that it's easier to review them.

If it's changes you feel pretty confident about, it's fine to just commit them directly (for post-commit review). If it's changes where you want input, putting them up for review separately makes it easier to give focused feedback.

Fair enough, I was being lazy... I'd made those changes while fixing rebase conflicts + getting tests to pass, so it was convenient that way. I'll try to untangle them

Alright, put up D79460

int3 added inline comments. May 6 2020, 2:53 AM

lld/MachO/OutputSegment.h
55–56	I was looking at the implementation of `MapVector` today and I realized that the sort only operates on the vector and not the map, so the container exhibits some questionable behavior after sorting :D Given `MapVector<int, int> mv; mv[2] = 98; mv[1] = 99; std::sort(mv.begin(), mv.end()); for (int i = 1; i <= 2; ++i) fprintf(stderr, "%d %d\n", i, mv[i]);`, this prints `1 98` followed by `2 99`. So I think we should really use `takeVector` before sorting. Two more refactoring ideas to tack on to that: (1) We could filter out the unneeded OutputSections at this stage too, so we don't have to worry about checking isNeeded() afterward. (2) Maybe we could consider not creating the OutputSegments till after the sorting/filtering has been done, so we don't have the OutputSegments in a state where some operations aren't valid. But that would probably mean an additional `outputSections` global, so there's some tradeoff there. Up to you.
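
The stale-index behavior described above can be reproduced without LLVM using a simplified model. `MiniMapVector` below is a hypothetical stand-in, not the real `llvm::MapVector`: it keeps a vector of (key, value) pairs plus a map from key to vector index, which is enough to show why sorting through `begin()`/`end()` breaks lookups and why taking the vector first avoids that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a map-plus-vector container.
class MiniMapVector {
public:
  using Entry = std::pair<int, int>;

  // Look up `key`, appending a zero-valued entry if it is not present yet.
  int &operator[](int key) {
    auto result = index.emplace(key, entries.size());
    if (result.second)
      entries.push_back({key, 0});
    return entries[result.first->second].second;
  }

  std::vector<Entry>::iterator begin() { return entries.begin(); }
  std::vector<Entry>::iterator end() { return entries.end(); }

  // Hand the entries to the caller and drop the index, so no stale lookups
  // through this container remain possible.
  std::vector<Entry> takeVector() {
    std::vector<Entry> out = std::move(entries);
    entries.clear();
    index.clear();
    return out;
  }

private:
  std::map<int, std::size_t> index; // key -> position in `entries`
  std::vector<Entry> entries;       // insertion-ordered (key, value) pairs
};

// Sorting in place reorders only the vector; the index still holds the old
// positions, so each key now resolves to the other key's value.
inline std::pair<int, int> lookupAfterInPlaceSort() {
  MiniMapVector mv;
  mv[2] = 98;
  mv[1] = 99;
  std::sort(mv.begin(), mv.end());
  return {mv[1], mv[2]};
}

// Taking the vector first leaves a plain vector that is safe to sort.
inline std::vector<MiniMapVector::Entry> sortedAfterTake() {
  MiniMapVector mv;
  mv[2] = 98;
  mv[1] = 99;
  std::vector<MiniMapVector::Entry> v = mv.takeVector();
  std::sort(v.begin(), v.end());
  return v;
}
```

In this model `lookupAfterInPlaceSort()` yields 98 for key 1 and 99 for key 2, matching the output quoted in the comment, while `sortedAfterTake()` returns the pairs in key order with their original values.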

Ktwu removed a child revision: D78342: [lld] Add archive file support to Mach-O backend. May 7 2020, 11:48 AM

Ktwu added a child revision: D78342: [lld] Add archive file support to Mach-O backend. May 8 2020, 6:14 PM

int3 mentioned this in rGdb157d27337f: [lld-macho] Follow-up to D77893. May 9 2020, 9:16 PM

smeenai mentioned this in D87199: [lld-macho] Implement support for PIC. Sep 23 2020, 11:15 PM

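Revision Contents

Path	Size
lld/MachO/CMakeLists.txt	2 lines
lld/MachO/Driver.cpp	1 line
lld/MachO/ExportTrie.h	2 lines
lld/MachO/ExportTrie.cpp	6 lines
lld/MachO/InputFiles.cpp	2 lines
lld/MachO/InputSection.h	18 lines
lld/MachO/InputSection.cpp	8 lines
lld/MachO/MergedOutputSection.h	51 lines
lld/MachO/MergedOutputSection.cpp	72 lines
lld/MachO/OutputSection.h	100 lines
lld/MachO/OutputSection.cpp	23 lines
lld/MachO/OutputSegment.h	55 lines
lld/MachO/OutputSegment.cpp	94 lines
lld/MachO/Symbols.h	2 lines
lld/MachO/SyntheticSections.h	42 lines
lld/MachO/SyntheticSections.cpp	65 lines
lld/MachO/Writer.cpp	199 lines
lld/test/MachO/Inputs/libfunction.s	6 lines
lld/test/MachO/section-merge.s	35 lines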

Diff 261592

lld/MachO/CMakeLists.txt

  set(LLVM_TARGET_DEFINITIONS Options.td)
  tablegen(LLVM Options.inc -gen-opt-parser-defs)
  add_public_tablegen_target(MachOOptionsTableGen)

  add_lld_library(lldMachO2
  Arch/X86_64.cpp
  Driver.cpp
  ExportTrie.cpp
  InputFiles.cpp
  InputSection.cpp
+ MergedOutputSection.cpp
+ OutputSection.cpp
  OutputSegment.cpp
  SymbolTable.cpp
  Symbols.cpp
  SyntheticSections.cpp
  Target.cpp
  Writer.cpp

  LINK_COMPONENTS
  [15 lines not shown]

lld/MachO/Driver.cpp

	//===- Driver.cpp ---------------------------------------------------------===//			//===- Driver.cpp ---------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "Driver.h"			#include "Driver.h"
	#include "Config.h"			#include "Config.h"
	#include "InputFiles.h"			#include "InputFiles.h"
				#include "OutputSection.h"
	#include "OutputSegment.h"			#include "OutputSegment.h"
	#include "SymbolTable.h"			#include "SymbolTable.h"
	#include "Symbols.h"			#include "Symbols.h"
	#include "Target.h"			#include "Target.h"
	#include "Writer.h"			#include "Writer.h"

	#include "lld/Common/Args.h"			#include "lld/Common/Args.h"
	#include "lld/Common/Driver.h"			#include "lld/Common/Driver.h"
	▲ Show 20 Lines • Show All 161 Lines • Show Last 20 Lines

lld/MachO/ExportTrie.h

	Show All 18 Lines
	struct TrieNode;			struct TrieNode;
	class Symbol;			class Symbol;

	class TrieBuilder {			class TrieBuilder {
	public:			public:
	void addSymbol(const Symbol &sym) { exported.push_back(&sym); }			void addSymbol(const Symbol &sym) { exported.push_back(&sym); }
	// Returns the size in bytes of the serialized trie.			// Returns the size in bytes of the serialized trie.
	size_t build();			size_t build();
	void writeTo(uint8_t *buf);			void writeTo(uint8_t *buf) const;
				smeenai (Not Done): @Ktwu, if these changes were incorporated in D76977, would you not need to depend on that? That one's a meatier review than this one, so it'd be easier to get this in first if possible.
				Ktwu (author, Done): As discussed in the meeting, section merging depends on being able to have more than one symbol at a time for testing. The export trie is a hard dependency if we want to test this properly :/

	private:			private:
	TrieNode *makeNode();			TrieNode *makeNode();
	void sortAndBuild(llvm::MutableArrayRef<const Symbol *> vec, TrieNode *node,			void sortAndBuild(llvm::MutableArrayRef<const Symbol *> vec, TrieNode *node,
	size_t lastPos, size_t pos);			size_t lastPos, size_t pos);

	std::vector<const Symbol *> exported;			std::vector<const Symbol *> exported;
	std::vector<TrieNode *> nodes;			std::vector<TrieNode *> nodes;
	};			};

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/ExportTrie.cpp

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	struct TrieNode {
Optional<ExportInfo> info;		Optional<ExportInfo> info;
// Estimated offset from the start of the serialized trie to the current node.		// Estimated offset from the start of the serialized trie to the current node.
// This will converge to the true offset when updateOffset() is run to a		// This will converge to the true offset when updateOffset() is run to a
// fixpoint.		// fixpoint.
size_t offset = 0;		size_t offset = 0;

// Returns whether the new estimated offset differs from the old one.		// Returns whether the new estimated offset differs from the old one.
bool updateOffset(size_t &nextOffset);		bool updateOffset(size_t &nextOffset);
void writeTo(uint8_t *buf);		void writeTo(uint8_t *buf) const;
};		};

bool TrieNode::updateOffset(size_t &nextOffset) {		bool TrieNode::updateOffset(size_t &nextOffset) {
// Size of the whole node (including the terminalSize and the outgoing edges.)		// Size of the whole node (including the terminalSize and the outgoing edges.)
// In contrast, terminalSize only records the size of the other data in the		// In contrast, terminalSize only records the size of the other data in the
// node.		// node.
size_t nodeSize;		size_t nodeSize;
if (info) {		if (info) {
Show All 15 Lines	bool TrieNode::updateOffset(size_t &nextOffset) {
// On input, 'nextOffset' is the new preferred location for this node.		// On input, 'nextOffset' is the new preferred location for this node.
bool result = (offset != nextOffset);		bool result = (offset != nextOffset);
// Store new location in node object for use by parents.		// Store new location in node object for use by parents.
offset = nextOffset;		offset = nextOffset;
nextOffset += nodeSize;		nextOffset += nodeSize;
return result;		return result;
}		}

void TrieNode::writeTo(uint8_t *buf) {		void TrieNode::writeTo(uint8_t *buf) const {
buf += offset;		buf += offset;
if (info) {		if (info) {
// TrieNodes with Symbol info: size, flags address		// TrieNodes with Symbol info: size, flags address
uint64_t flags = 0; // TODO: emit proper flags		uint64_t flags = 0; // TODO: emit proper flags
uint32_t terminalSize =		uint32_t terminalSize =
getULEB128Size(flags) + getULEB128Size(info->address);		getULEB128Size(flags) + getULEB128Size(info->address);
buf += encodeULEB128(terminalSize, buf);		buf += encodeULEB128(terminalSize, buf);
buf += encodeULEB128(flags, buf);		buf += encodeULEB128(flags, buf);
▲ Show 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	do {
more = false;		more = false;
for (TrieNode *node : nodes)		for (TrieNode *node : nodes)
more |= node->updateOffset(offset);		more |= node->updateOffset(offset);
} while (more);		} while (more);

return offset;		return offset;
}		}

void TrieBuilder::writeTo(uint8_t *buf) {		void TrieBuilder::writeTo(uint8_t *buf) const {
for (TrieNode *node : nodes)		for (TrieNode *node : nodes)
node->writeTo(buf);		node->writeTo(buf);
}		}

} // namespace macho		} // namespace macho
} // namespace lld		} // namespace lld

lld/MachO/InputFiles.cpp

	Show All 37 Lines
	//			//
	// Without the above differences, I think you can use your knowledge about ELF			// Without the above differences, I think you can use your knowledge about ELF
	// and COFF for Mach-O.			// and COFF for Mach-O.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "InputFiles.h"			#include "InputFiles.h"
	#include "InputSection.h"			#include "InputSection.h"
	#include "OutputSegment.h"			#include "OutputSection.h"
	#include "SymbolTable.h"			#include "SymbolTable.h"
	#include "Symbols.h"			#include "Symbols.h"
	#include "Target.h"			#include "Target.h"

	#include "lld/Common/ErrorHandler.h"			#include "lld/Common/ErrorHandler.h"
	#include "lld/Common/Memory.h"			#include "lld/Common/Memory.h"
	#include "llvm/BinaryFormat/MachO.h"			#include "llvm/BinaryFormat/MachO.h"
	#include "llvm/Support/Endian.h"			#include "llvm/Support/Endian.h"
	▲ Show 20 Lines • Show All 215 Lines • Show Last 20 Lines

lld/MachO/InputSection.h

	Show All 13 Lines
	#include "llvm/ADT/PointerUnion.h"			#include "llvm/ADT/PointerUnion.h"
	#include "llvm/BinaryFormat/MachO.h"			#include "llvm/BinaryFormat/MachO.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class InputFile;			class InputFile;
	class InputSection;			class InputSection;
	class OutputSegment;			class OutputSection;
	class Symbol;			class Symbol;

	struct Reloc {			struct Reloc {
	uint8_t type;			uint8_t type;
	uint32_t addend;			uint32_t addend;
	uint32_t offset;			uint32_t offset;
	llvm::PointerUnion<Symbol *, InputSection *> target;			llvm::PointerUnion<Symbol *, InputSection *> target;
	};			};

	class InputSection {			class InputSection {
	public:			public:
	virtual ~InputSection() = default;			virtual ~InputSection() = default;
	virtual size_t getSize() const { return data.size(); }			virtual size_t getSize() const { return data.size(); }
	virtual uint64_t getFileSize() const { return getSize(); }			virtual uint64_t getFileSize() const { return getSize(); }
	uint64_t getFileOffset() const;			uint64_t getFileOffset() const;
	// Don't emit section_64 headers for hidden sections.			uint64_t getVA() const;
				smeenai (Done): I think `getVA` would be more in line with the LLD naming for this concept.
	virtual bool isHidden() const { return false; }
	// Unneeded sections are omitted entirely (header and body).
	virtual bool isNeeded() const { return true; }
	virtual void writeTo(uint8_t *buf);			virtual void writeTo(uint8_t *buf);

	InputFile *file = nullptr;			InputFile *file = nullptr;
	OutputSegment *parent = nullptr;
	StringRef name;			StringRef name;
	StringRef segname;			StringRef segname;

	ArrayRef<uint8_t> data;			OutputSection *parent = nullptr;
				uint64_t outSecOff = 0;
				smeenai (Done): Similarly, since this is the same notion as LLD ELF's `outSecOff`, I'd probably stick with that name, just so it's easy to map concepts between the two.
				uint64_t outSecFileOff = 0;

	// TODO these properties ought to live in an OutputSection class.
	// Move them once available.
	uint64_t addr = 0;
	uint32_t align = 1;			uint32_t align = 1;
				smeenai (Not Done): LLD ELF has an `outSecOff` in its InputSections, which tracks the offset of this particular section within its output section. I think that'd be better than storing the absolute address in the InputSection.
				Ktwu (author, Done): Interesting, I'll look into that.
	uint32_t sectionIndex = 0;
	uint32_t flags = 0;			uint32_t flags = 0;

				ArrayRef<uint8_t> data;
	std::vector<Reloc> relocs;			std::vector<Reloc> relocs;
	};			};

	extern std::vector<InputSection *> inputSections;			extern std::vector<InputSection *> inputSections;

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/InputSection.cpp

	Show All 16 Lines
	using namespace llvm::MachO;			using namespace llvm::MachO;
	using namespace llvm::support;			using namespace llvm::support;
	using namespace lld;			using namespace lld;
	using namespace lld::macho;			using namespace lld::macho;

	std::vector<InputSection *> macho::inputSections;			std::vector<InputSection *> macho::inputSections;

	uint64_t InputSection::getFileOffset() const {			uint64_t InputSection::getFileOffset() const {
	return parent->fileOff + addr - parent->firstSection()->addr;			return parent->fileOff + outSecFileOff;
	}			}

				uint64_t InputSection::getVA() const { return parent->addr + outSecOff; }

	void InputSection::writeTo(uint8_t *buf) {			void InputSection::writeTo(uint8_t *buf) {
	if (!data.empty())			if (!data.empty())
	memcpy(buf, data.data(), data.size());			memcpy(buf, data.data(), data.size());

	for (Reloc &r : relocs) {			for (Reloc &r : relocs) {
	uint64_t va = 0;			uint64_t va = 0;
	if (auto s = r.target.dyn_cast<Symbol *>()) {			if (auto s = r.target.dyn_cast<Symbol *>()) {
	if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s)) {			if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s)) {
	va = in.got->addr + dylibSymbol->gotIndex * WordSize;			va = in.got->addr + dylibSymbol->gotIndex * WordSize;
	} else {			} else {
	va = s->getVA();			va = s->getVA();
	}			}
	} else if (auto isec = r.target.dyn_cast<InputSection *>()) {			} else if (auto isec = r.target.dyn_cast<InputSection *>()) {
	va = isec->addr;			va = isec->getVA();
	} else {			} else {
	llvm_unreachable("Unknown relocation target");			llvm_unreachable("Unknown relocation target");
	}			}

	uint64_t val = va + r.addend;			uint64_t val = va + r.addend;
	if (1) // TODO: handle non-pcrel relocations			if (1) // TODO: handle non-pcrel relocations
	val -= addr + r.offset;			val -= getVA() + r.offset;
	target->relocateOne(buf + r.offset, r.type, val);			target->relocateOne(buf + r.offset, r.type, val);
	}			}
	}			}
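To make the new address arithmetic concrete, here is a minimal sketch of how an input section's virtual address and file offset fall out of its parent's base plus a per-section offset, and how the PC-relative relocation value in `writeTo` is then formed. `OutSec`, `InSec`, and `pcrelValue` are hypothetical names for this illustration and carry only the fields the arithmetic needs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, pared-down output section: just its base address and
// base file offset.
struct OutSec {
  uint64_t addr;
  uint64_t fileOff;
};

// Hypothetical, pared-down input section storing offsets relative to its
// parent output section rather than an absolute address.
struct InSec {
  const OutSec *parent;
  uint64_t outSecOff;     // offset within the parent's address range
  uint64_t outSecFileOff; // offset within the parent's file range

  uint64_t getVA() const { return parent->addr + outSecOff; }
  uint64_t getFileOffset() const { return parent->fileOff + outSecFileOff; }
};

// PC-relative value for a relocation located `relocOffset` bytes into
// `site`, mirroring `val = va + addend; val -= getVA() + r.offset;`.
uint64_t pcrelValue(const InSec &site, uint32_t relocOffset,
                    uint64_t targetVA, uint32_t addend) {
  return targetVA + addend - (site.getVA() + relocOffset);
}
```

Storing offsets instead of absolute addresses means an input section never needs updating when its output section moves; only the parent's `addr` and `fileOff` change.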

lld/MachO/MergedOutputSection.h

This file was added.

				//===- OutputSection.h ------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLD_MACHO_MERGED_OUTPUT_SECTION_H
				#define LLD_MACHO_MERGED_OUTPUT_SECTION_H

				#include "InputSection.h"
				#include "OutputSection.h"
				#include "lld/Common/LLVM.h"

				namespace lld {
				namespace macho {

				// Linking multiple files will inevitably mean resolving sections in different
				// files that are labeled with the same segment and section name. This class
				smeenai (Not Done): Could you add a class comment about what this represents? I'd prefer renaming `OutputSection` to `OutputSectionBase` and `MergedOutputSection` to just `OutputSection`, to be more in line with ELF, but I don't feel super strongly about that.
				int3 (Not Done): +1 on the renaming
				Ktwu (author, Not Done): Sure.
				Ktwu (author, Done): Er, actually, I like MergedOutputSection because it conveys more information about what kind of OutputSection it is. Given how many of the other classes used don't use -Base in their name -- InputSections, OutputSegments -- it feels clunky to have an OutputSectionBase. @int3 do you feel strongly about renaming this?
				smeenai (Not Done): Sure, I'm good with the parent class being named `OutputSection` and having specific subclasses representing the different types of output sections. I'm not the biggest fan of the `MergedOutputSection` name because to me it suggests output sections being merged together rather than input sections being merged into a single output section, but I can't think of anything better either, so I'm good with it if we can't come up with a better name. Naming is hard :D
				int3 (Not Done): I prefer the rename because it more closely parallels what lld-ELF has. Moreover ELF's MergeInputSection is quite different, so it's almost like a false parallel... Re conveying information, my mental model has been that OutputSections are by default mergeable, and only the special SyntheticSections aren't, so I don't feel that there's a need to explicitly call out the mergeability. `-Base` being a clunky, non-functionally-descriptive suffix is a fair point though...
				smeenai (Not Done): To be fair, I don't think LLD ELF has an OutputSectionBase. It does have an InputSectionBase though. The hierarchy is also different cos ELF synthetic sections are input sections (so they can be manipulated by linker scripts), whereas ours are output sections, so it's a bit hard to compare.
				smeenai (Done): The comment looks great! Super nit: LLD always formats these types of comments using single-line comments (`//`) instead of `/* */`, so we should follow suit here.
				Ktwu (author, Done): D:
				int3 (Done): Not sure if that emoji was in reaction to my naming nit, but please feel free to stick with what you have. I was just giving my 2c, but it's your diff :)
				// contains all such sections and writes the data from each section sequentially
				// in the final binary.
				class MergedOutputSection : public OutputSection {
				public:
				MergedOutputSection(StringRef name) : OutputSection(name) {}

				const InputSection *firstSection() const { return inputs.front(); }
				const InputSection *lastSection() const { return inputs.back(); }

				// These accessors will only be valid after finalizing the section
				size_t getSize() const override { return size; }
				uint64_t getFileSize() const override { return fileSize; }

				void mergeInput(InputSection *input) override;
				void finalize() override;

				void writeTo(uint8_t *buf) const override;

				std::vector<InputSection *> inputs;

				private:
				void mergeFlags(uint32_t inputFlags);

				size_t size = 0;
				uint64_t fileSize = 0;
				};

				} // namespace macho
				} // namespace lld

				#endif

lld/MachO/MergedOutputSection.cpp

This file was added.

				//===- OutputSection.cpp --------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "MergedOutputSection.h"
				#include "lld/Common/ErrorHandler.h"
				#include "lld/Common/Memory.h"
				#include "llvm/BinaryFormat/MachO.h"

				using namespace llvm;
				using namespace llvm::MachO;
				using namespace lld;
				using namespace lld::macho;

				void MergedOutputSection::mergeInput(InputSection *input) {
				if (inputs.empty()) {
				align = input->align;
				int3 (Not Done): bikeshed: Do we want to prefix all member accesses with `this->`? lld-ELF seems to use it inconsistently, lld-COFF in just a few places... I have a slight preference towards omitting it, but we should be consistent either way
				Ktwu (author, Done): Good point; if I was rigorous, I think I'd prefer omitting it, too.
				flags = input->flags;
				} else {
				mergeFlags(input->flags);
				align = std::max(align, input->align);
				}

				inputs.push_back(input);
				input->parent = this;
				}

				void MergedOutputSection::finalize() {
				uint64_t isecAddr = addr;
				uint64_t isecFileOff = fileOff;
				smeenai (Not Done): Super nit: `isecAddr` is a bit more understandable IMO.
				Ktwu (author, Done): Sure
				for (InputSection *i : inputs) {
				i->outSecOff = alignTo(isecAddr, i->align) - addr;
				i->outSecFileOff = alignTo(isecFileOff, i->align) - fileOff;
				smeenai (Not Done): Super nit: `InputSection` loop variables are usually `isec`
				isecAddr += i->getSize();
				isecFileOff += i->getFileSize();
				}
				smeenai (Done): @int3 is changing this to take the section's alignment into account in D79050, so this should follow suit.
				size = isecAddr - addr;
				fileSize = isecFileOff - fileOff;
				}
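The layout pass in `finalize()` can be illustrated with a small standalone sketch. It assumes alignments are powers of two expressed in bytes, and `Isec`, `alignUp`, and `layout` are hypothetical names for this example. Unlike the loop in this revision, the sketch moves the cursor to the aligned position before adding the section size, so any padding is counted in the total; that is a choice made for the illustration, not a claim about what landed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified input section: only what layout needs.
struct Isec {
  uint64_t size;
  uint32_t align;     // assumed: a power of two, in bytes
  uint64_t outSecOff; // filled in by layout()
};

// Rounds `value` up to the next multiple of `align` (a power of two).
uint64_t alignUp(uint64_t value, uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Assigns each input section its offset within the output section that
// starts at `addr`, and returns the output section's total size.
uint64_t layout(uint64_t addr, std::vector<Isec> &inputs) {
  uint64_t cursor = addr;
  for (Isec &isec : inputs) {
    cursor = alignUp(cursor, isec.align); // insert padding if needed
    isec.outSecOff = cursor - addr;
    cursor += isec.size;
  }
  return cursor - addr;
}
```

For three inputs of sizes 5, 8, and 3 with alignments 1, 4, and 16 placed at address 0x1000, the offsets come out as 0, 8, and 16, and the total size is 19 bytes.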

				void MergedOutputSection::writeTo(uint8_t *buf) const {
				for (InputSection *isec : inputs) {
				isec->writeTo(buf + isec->outSecFileOff);
				}
				smeenai (Done): Same here.
				}

				// TODO: this is most likely wrong; reconsider how section flags
				// are actually merged. The logic presented here was written without
				smeenai (Not Done): Could you elaborate on what might be wrong in this comment?
				Ktwu (author, Done): I'll add this to the comment, but in short, there's no research that's gone into validating what the merge behavior for flags like this ought to be (it's also why there are no tests for flag merging).
				// any form of informed research.
				void MergedOutputSection::mergeFlags(uint32_t inputFlags) {
				uint8_t sectionFlag = MachO::SECTION_TYPE & inputFlags;
				if (sectionFlag != (MachO::SECTION_TYPE & flags))
				error("Cannot add merge section; inconsistent type flags " +
				Twine(sectionFlag));

				uint32_t inconsistentFlags =
				MachO::S_ATTR_DEBUG | MachO::S_ATTR_STRIP_STATIC_SYMS |
				MachO::S_ATTR_NO_DEAD_STRIP | MachO::S_ATTR_LIVE_SUPPORT;
				if ((inputFlags ^ flags) & inconsistentFlags)
				error("Cannot add merge section; cannot merge inconsistent flags");

				// Negate pure instruction presence if any section isn't pure.
				smeenai (Not Done): segment -> section
				Ktwu (author, Done): Oops thanks.
				uint32_t pureMask = ~MachO::S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);
				smeenai (Not Done): If I'm understanding this correctly, it'll end up negating the pure bit when you don't want it to. If both `inputFlags` and `flags` have the bit set, the result of the AND for that bit will be 1, so it'll be 0 in your mask, so it'll get unset.
				Ktwu (author, Done): Ahhh nice catch (and ideally one that a test will confirm). I believe what I want is: `uint32_t pureMask = ~MachO::S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);`
				smeenai (Not Done): Yup, looks good.

				// Merge the rest
				flags |= inputFlags;
				flags &= pureMask;
				}
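Since the thread notes that flag merging has no tests, a worked example of the pure-instructions mask may help. The sketch below isolates just that part of `mergeFlags` in a hypothetical free function `mergePure`. The constants `kPureInstructions` and `kSomeInstructions` are local names for this example, and their values are assumed to match the Mach-O `S_ATTR_PURE_INSTRUCTIONS` and `S_ATTR_SOME_INSTRUCTIONS` definitions.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of two Mach-O section attribute bits (assumed values).
constexpr uint32_t kPureInstructions = 0x80000000u;
constexpr uint32_t kSomeInstructions = 0x00000400u;

// Merges `inputFlags` into `flags`, keeping the pure-instructions bit only
// when both sides have it set.
uint32_t mergePure(uint32_t flags, uint32_t inputFlags) {
  // Every bit except "pure" is always allowed through; "pure" is allowed
  // through only if it is present in both operands.
  uint32_t pureMask = ~kPureInstructions | (inputFlags & flags);
  flags |= inputFlags;
  flags &= pureMask;
  return flags;
}
```

When both sections are pure the bit survives; when only one is, the mask reduces to `~kPureInstructions` and the bit is cleared, while unrelated attribute bits are simply OR-ed together.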

lld/MachO/OutputSection.h

This file was added.

				//===- OutputSection.h ------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLD_MACHO_OUTPUT_SECTION_H
				#define LLD_MACHO_OUTPUT_SECTION_H

				#include "lld/Common/LLVM.h"
				#include "llvm/ADT/DenseMap.h"

				namespace lld {
				namespace macho {

				class InputSection;
				class OutputSegment;

				// Output sections represent the finalized sections present within the final
				int3 (Not Done): How does this class hierarchy compare to that in the ELF implementation? I know they have `SectionBase`, `InputSectionBase`, and `MergeInputSection`... was planning to dig into it eventually but curious if you have looked
				Ktwu (author, Done): I hadn't looked too closely tbh.
				smeenai (Not Done): `MergeInputSection` is for the ELF SHF_MERGE concept, where a linker can merge the objects inside a section ... it's used for strings, for example (where the linker can perform string merging and tail merging). I think ld64 just treats certain sections types as mergeable instead of having a special flags for that.
				// linked executable. They can represent special sections (like the symbol
				smeenai (Not Done): Can you add a class comment?
				smeenai (Not Done): Comment looks great! Same nit about using single-line comments.
				// table), or represent coalesced sections from the various inputs given to the
				// linker with the same segment / section name.
				class OutputSection {
				public:
				OutputSection(StringRef name) : name(name) {}
				virtual ~OutputSection() = default;
				smeenai (Not Done): Why do these need to be virtual?
				Ktwu (author, Done): ah, they don't, it's refactoring cruft

				// These accessors will only be valid after finalizing the section.
				uint64_t getSegmentOffset() const;

				// How much space the section occupies in the address space.
				virtual size_t getSize() const = 0;
				// How much space the section occupies in the file. Most sections are copied
				// as-is so their file size is the same as their address space size.
				virtual uint64_t getFileSize() const { return getSize(); }

				// Hidden sections omit header content, but body content is still present.
				virtual bool isHidden() const { return !this->isNeeded(); }
				int3 (Done): IMO this should just be `return true`. Whether a section is hidden is orthogonal from whether it is needed: hidden sections will never have a header regardless of whether they have a body. (I know we override this method with `return false` for synthetic sections, but regardless I think it's confusing to write it this way for non-synthetic sections.)
				// Unneeded sections are omitted entirely (header and body).
				virtual bool isNeeded() const { return true; }
				smeenai (Not Done): Is this needed for anything other than MergedOutputSection?
				Ktwu (author, Done): No, it's not needed except for MergedOutputSection. Should I dynamically cast each output section before trying to merge an input section into it instead (I wanted to avoid the runtime hit doing that).
				smeenai (Not Done): Yeah, dynamically casting wouldn't be ideal. We could leave this as-is, or make `getOrCreateOutputSection` return a `MergedOutputSection *` instead of an `OutputSection *` (idk if that'd cause other complications).
				Ktwu (author, Done): I believe that would require having an assert that the output class being returned, if already created, is in fact a mergeable section. I think a static_cast would be what we'd want, but I like having the explicit assert here in case something goes wrong instead of the undefined behavior of a static_cast gone wrong.
				smeenai (Not Done): Ah. The assert would require an `isa`, which is basically the same overhead as a `dyn_cast`, so perhaps just leaving it as-is is best for now.

				// Some sections may allow coalescing other raw input sections.
				virtual void mergeInput(InputSection *input);

				// Specifically finalizes addresses and section size, not content.
				virtual void finalize() {
				// TODO investigate refactoring synthetic section finalization logic into
				// overrides of this function.
				}

				virtual void writeTo(uint8_t *buf) const = 0;

				StringRef name;
				OutputSegment *parent = nullptr;

				uint32_t index = 0;
				uint64_t addr = 0;
				uint64_t fileOff = 0;
				uint32_t align = 1;
				uint32_t flags = 0;
				};

				class OutputSectionComparator {
				public:
				OutputSectionComparator(uint32_t segmentOrder,
				const std::vector<StringRef> &sectOrdering)
				: segmentOrder(segmentOrder) {
				for (uint32_t j = 0, m = sectOrdering.size(); j < m; ++j)
				sectionOrdering[sectOrdering[j]] = j;
				}

				uint32_t sectionOrder(StringRef secname) {
				auto sectIt = sectionOrdering.find(secname);
				if (sectIt != sectionOrdering.end())
				return sectIt->second;
				return sectionOrdering.size();
				int3 (Done): "which stores them in a MapVector of section name -> section" seems clearer. I think it's worth pointing out we have a MapVector since it doesn't make sense to sort any other kind of map
				}

				int3 (Done): nit: take by const ref
				// Sort sections within a common segment, which stores them in
				// a MapVector of section name -> section
				bool operator()(const std::pair<StringRef, OutputSection *> &a,
				const std::pair<StringRef, OutputSection *> &b) {
				int3 (Done): I think `operator<` typically returns a `bool`
				return sectionOrder(a.first) < sectionOrder(b.first);
				}

				bool operator<(const OutputSectionComparator &b) {
				return segmentOrder < b.segmentOrder;
				}

				private:
				uint32_t segmentOrder;
				llvm::DenseMap<StringRef, uint32_t> sectionOrdering;
				};

				} // namespace macho
				} // namespace lld

				#endif
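The ordering rule in `OutputSectionComparator` (sections named in the preferred list sort by their position, everything else sorts after them) can be sketched in plain standard C++. `SectionOrder` and `sortedNames` are hypothetical names for this illustration. The sketch uses `std::map` and `std::stable_sort` rather than the LLVM containers in the diff, so the relative order of unlisted sections is preserved; that is a property of this example, not something the revision states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: ranks section names by their position in a preferred
// ordering; names not in the ordering all share the rank just past the end.
class SectionOrder {
public:
  explicit SectionOrder(const std::vector<std::string> &ordering) {
    for (std::size_t i = 0; i < ordering.size(); ++i)
      rank[ordering[i]] = static_cast<uint32_t>(i);
  }

  uint32_t order(const std::string &name) const {
    auto it = rank.find(name);
    if (it != rank.end())
      return it->second;
    return static_cast<uint32_t>(rank.size());
  }

private:
  std::map<std::string, uint32_t> rank;
};

// Returns `names` sorted by preferred order, with unlisted names last.
std::vector<std::string> sortedNames(std::vector<std::string> names,
                                     const std::vector<std::string> &ordering) {
  SectionOrder cmp(ordering);
  std::stable_sort(names.begin(), names.end(),
                   [&](const std::string &a, const std::string &b) {
                     return cmp.order(a) < cmp.order(b);
                   });
  return names;
}
```

Given the preferred ordering `__text`, `__stubs`, an input of `__data`, `__stubs`, `__custom`, `__text` is rearranged so that the two listed names lead in their preferred order and the two unlisted names follow in their original order.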

lld/MachO/OutputSection.cpp

This file was added.

				//===- OutputSection.cpp --------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "OutputSection.h"
				#include "OutputSegment.h"
				#include "lld/Common/ErrorHandler.h"

				using namespace llvm;
				using namespace lld;
				using namespace lld::macho;

				uint64_t OutputSection::getSegmentOffset() const {
				return addr - parent->firstSection()->addr;
				}

				void OutputSection::mergeInput(InputSection *input) {
				smeenai: What's this computation doing?
				Ktwu: It's copying what InputSection did to calculate its file offset. I didn't entirely comprehend this math tbh.
				smeenai: Okay, this makes sense.
				int3: The parent segment's start address is defined as the address of the first section it contains. So `addr - parent->firstSection()->addr` computes the section's offset within the segment. Maybe we don't need this any more if we have `outSecOff` (going to investigate)
				int3: Oh, outSecOff is for the InputSection's offset within the OutputSection, but this computation is for the OutputSection's offset within its segment. So having `outSecOff` doesn't impact this. That said the ELF implementation seems to have an explicit `OutputSection::offset` field, though I'm not sure what populates it... but I think this scheme of computing the offset from the address is fine for now
				llvm_unreachable("Cannot merge input section into unmergable output section");
				}
				int3: hidden sections should still be written, only their headers get omitted. Also I think we may be able to have the isHidden() property on (synthetic) OutputSections and not on InputSections
				smeenai: We should experiment with how ld64 handles merging hidden sections, or if that's even a thing it does. (I don't think you can specify a section is hidden yourself; ld64 just has a list of atoms it defines to be hidden.)
				smeenai: Can you add a TODO for figuring out how we should handle input sections with conflicting hidden-ness?
				int3: Personally I don't think we should support hidden InputSections until we find a use case (I'm not aware of one so far)
				int3: I.e. I think the synthetic sections could become OutputSections. So the only InputSections would actually be from real inputs, which will never be hidden
				Ktwu: Yeah I could try that!
				smeenai: Can you add more details to this error message? It'd be ideal to have things like the section in question, the object file it's coming from, etc. Also, is this what ld64 does?
				Ktwu: Sure. So far as I can tell, ld64 doesn't do section merging on a flag level like I first thought; now that I'm diving into it, OutputFile.cpp separates flags into individual boolean attributes. It's not the easiest codebase to navigate :/
				int3: can we just define this method on MergedOutputSection?
				Ktwu: Since output segments just contain output sections now, they're not aware of whether they contain synthetic sections or mergable sections. I'm not sure how to cast that away (I guess a dynamic cast at runtime, but I figured this would be cheaper).
				smeenai: We should only hit this if we have a programming error on our end, right? If so, this should be `llvm_unreachable` instead of `error`.
				Ktwu: Yup, since we're assuming that synthetic sections cannot be merged, this shouldn't happen.
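
The two offsets distinguished in the thread above can be sketched as follows. This is a simplified model and not the patch's classes: `Segment`, `OutSec`, `segmentOffset`, and `inputVA` are hypothetical names, and the `inputVA` helper assumes an input section's address is its output section's address plus an `outSecOff`, as suggested for the ELF-style layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified output section: only its virtual address.
struct OutSec {
  uint64_t addr;
};

// Hypothetical segment: its start address is defined as the address of the
// first section it contains, as described in the review thread.
struct Segment {
  std::vector<OutSec> sections;
  uint64_t startAddr() const { return sections.front().addr; }
};

// Offset of an output section within its segment
// (mirrors `addr - parent->firstSection()->addr`).
uint64_t segmentOffset(const Segment &seg, const OutSec &sec) {
  return sec.addr - seg.startAddr();
}

// Assumed relation: address of an input section from its parent's address
// plus its offset inside that parent (the `outSecOff` idea).
uint64_t inputVA(const OutSec &parent, uint64_t outSecOff) {
  return parent.addr + outSecOff;
}
```

The first function measures an output section against its segment; the second measures an input section against its output section, so neither replaces the other.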

lld/MachO/OutputSegment.h

	//===- OutputSegment.h ------------------------------------------*- C++ -*-===//			//===- OutputSegment.h ------------------------------------------*- C++ -*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLD_MACHO_OUTPUT_SEGMENT_H			#ifndef LLD_MACHO_OUTPUT_SEGMENT_H
	#define LLD_MACHO_OUTPUT_SEGMENT_H			#define LLD_MACHO_OUTPUT_SEGMENT_H

				#include "OutputSection.h"
	#include "lld/Common/LLVM.h"			#include "lld/Common/LLVM.h"
	#include "llvm/ADT/MapVector.h"			#include "llvm/ADT/MapVector.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	namespace segment_names {			namespace segment_names {

	constexpr const char *text = "__TEXT";			constexpr const char *text = "__TEXT";
	constexpr const char *pageZero = "__PAGEZERO";			constexpr const char *pageZero = "__PAGEZERO";
	constexpr const char *linkEdit = "__LINKEDIT";			constexpr const char *linkEdit = "__LINKEDIT";
				constexpr const char *dataConst = "__DATA_CONST";

	} // namespace segment_names			} // namespace segment_names

				class OutputSection;
				class OutputSegmentComparator;
	class InputSection;			class InputSection;

	class OutputSegment {			class OutputSegment {
	public:			public:
	InputSection *firstSection() const { return sections.front().second.at(0); }			using SectionMap = typename llvm::MapVector<StringRef, OutputSection *>;
				using SectionMapEntry = typename std::pair<StringRef, OutputSection *>;
				int3: nit: I think `using SectionMap = ...` is the more modern C++ way (and is the method favored by lld-ELF/COFF)
				smeenai: This one needs to be addressed.

	InputSection *lastSection() const { return sections.back().second.back(); }			const OutputSection *firstSection() const { return sections.front().second; }
				const OutputSection *lastSection() const { return sections.back().second; }

	bool isNeeded() const {			bool isNeeded() const {
	return !sections.empty() || name == segment_names::linkEdit;			if (name == segment_names::linkEdit)
				return true;
				for (const SectionMapEntry &i : sections) {
				int3: @ruiu will object to the `auto` :) I personally hate typing out an `std::pair` type, but we should at least unpack `i.second` into a var with a named type. nit 2: use `auto &` here. Alternatively we could replace the loop with `std::any_of`.
				OutputSection *os = i.second;
				if (os->isNeeded())
				return true;
				}
				return false;
	}			}

	void addSection(InputSection *);			OutputSection *getOrCreateOutputSection(StringRef name);
				void addOutputSection(OutputSection *os);
				int3: `addOutputSection` seems like a more fitting name
				int3: also, I think it would make sense if we moved `createOutputSection` from Writer.cpp to a method on this class, then this helper method can be made private
				void sortOutputSections(OutputSegmentComparator *comparator);

	const llvm::MapVector<StringRef, std::vector<InputSection *>> &			const SectionMap &getSections() const { return sections; }
	getSections() const {			size_t numNonHiddenSections() const;
				int3: if `sections` isn't private any more, it doesn't need an accessor. Edit: I see that it's non-private only because we need to sort it. How about making the sorting a method on this class? Related thought... I see that MapVector has a `takeVector` method that clears out the map and returns the underlying vector. Maybe we could do that: have a `getSortedSections` method that returns an empty vector until we actually sort things. That would mean that the comparator can work on an actual vector & take single elements instead of `std::pair`s. Just my 2c, might be overcomplicating things here
				Ktwu: Yes to section sorting living here, nay to the `takeVector` idea (at least in this diff).
				int3: I was looking at the implementation of `MapVector` today and I realized that the sort only operates on the vector and not the map, so the container exhibits some questionable behavior after sorting :D
				    MapVector<int, int> mv;
				    mv[2] = 98;
				    mv[1] = 99;
				    std::sort(mv.begin(), mv.end());
				    for (int i = 1; i <= 2; ++i)
				      fprintf(stderr, "%d %d\n", i, mv[i]);
				This prints
				    1 98
				    2 99
				So I think we should really use `takeVector` before sorting. Two more refactoring ideas to tack on to that:
				- We could filter out the unneeded OutputSections at this stage too, so we don't have to worry about checking isNeeded() afterward.
				- Maybe we could consider not creating the OutputSegments till after the sorting/filtering has been done, so we don't have the OutputSegments in a state where some operations aren't valid. But that would probably mean an additional `outputSections` global, so there's some tradeoff there.
				Up to you
	return sections;
	}

	uint64_t fileOff = 0;			uint64_t fileOff = 0;
	StringRef name;			StringRef name;
	uint32_t numNonHiddenSections = 0;
	uint32_t maxProt = 0;			uint32_t maxProt = 0;
	uint32_t initProt = 0;			uint32_t initProt = 0;
	uint8_t index;			uint8_t index;

	private:			private:
	llvm::MapVector<StringRef, std::vector<InputSection *>> sections;			SectionMap sections;
				};

				class OutputSegmentComparator {
				public:
				OutputSegmentComparator();

				OutputSectionComparator *sectionComparator(const OutputSegment *os) {
				auto it = orderMap.find(os->name);
				if (it == orderMap.end()) {
				return defaultPositionComparator;
				smeenai: Given that this might be the common case, would it make sense to cache the `find` somehow?
				Ktwu: Good idea!
				smeenai: This one isn't addressed, but given that we shouldn't have too many segments (besides the ones already in the map, I can only think of `__DATA`), perhaps this is okay as-is?
				Ktwu: Ah, no, I legit forgot to address this, so I don't mind getting to it...
				}
				return &it->second;
				}

				bool operator()(const OutputSegment *a, const OutputSegment *b) {
				return *sectionComparator(a) < *sectionComparator(b);
				}

				private:
				const StringRef defaultPosition = StringRef();
				llvm::DenseMap<StringRef, OutputSectionComparator> orderMap;
				OutputSectionComparator *defaultPositionComparator;
	};			};

	extern std::vector<OutputSegment *> outputSegments;			extern std::vector<OutputSegment *> outputSegments;

	OutputSegment *getOutputSegment(StringRef name);			OutputSegment *getOutputSegment(StringRef name);
	OutputSegment *getOrCreateOutputSegment(StringRef name);			OutputSegment *getOrCreateOutputSegment(StringRef name);
				void sortOutputSegmentsAndSections();

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif
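
The behavior int3 describes in the `getSections()` thread of this header can be reproduced without LLVM. The sketch below is a deliberately simplified model of `llvm::MapVector<int, int>` (a vector of pairs plus a key-to-index map); `MiniMapVector`, `sortInPlace`, and this `takeVector` are hypothetical stand-ins written for illustration, not LLVM's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: entries live in `vec` in insertion order, and `index`
// maps each key to its position in `vec`.
struct MiniMapVector {
  std::vector<std::pair<int, int>> vec;
  std::map<int, std::size_t> index;

  int &operator[](int key) {
    auto it = index.find(key);
    if (it == index.end()) {
      it = index.emplace(key, vec.size()).first;
      vec.emplace_back(key, 0);
    }
    return vec[it->second].second;
  }

  // Sorts only the vector. The key -> position map is not updated, so
  // lookups by key can afterwards land on the wrong entry.
  void sortInPlace() { std::sort(vec.begin(), vec.end()); }

  // Hands the entries to the caller and empties the container, so no stale
  // position map survives the sort that follows.
  std::vector<std::pair<int, int>> takeVector() {
    std::vector<std::pair<int, int>> out;
    out.swap(vec);
    index.clear();
    return out;
  }
};
```

After inserting `mv[2] = 98; mv[1] = 99;` and calling `sortInPlace()`, looking up key 1 yields 98 and key 2 yields 99, which matches the output quoted in the comment. Taking the vector first and sorting the result avoids the stale lookups.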

lld/MachO/OutputSegment.cpp

	//===- OutputSegment.cpp --------------------------------------------------===//			//===- OutputSegment.cpp --------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "OutputSegment.h"			#include "OutputSegment.h"
	#include "InputSection.h"			#include "InputSection.h"
				#include "MergedOutputSection.h"
				#include "SyntheticSections.h"

				#include "lld/Common/ErrorHandler.h"
	#include "lld/Common/Memory.h"			#include "lld/Common/Memory.h"
	#include "llvm/BinaryFormat/MachO.h"			#include "llvm/BinaryFormat/MachO.h"

	using namespace llvm;			using namespace llvm;
	using namespace llvm::MachO;			using namespace llvm::MachO;
	using namespace lld;			using namespace lld;
	using namespace lld::macho;			using namespace lld::macho;

	static uint32_t initProt(StringRef name) {			static uint32_t initProt(StringRef name) {
	if (name == segment_names::text)			if (name == segment_names::text)
	return VM_PROT_READ | VM_PROT_EXECUTE;			return VM_PROT_READ | VM_PROT_EXECUTE;
	if (name == segment_names::pageZero)			if (name == segment_names::pageZero)
	return 0;			return 0;
	if (name == segment_names::linkEdit)			if (name == segment_names::linkEdit)
	return VM_PROT_READ;			return VM_PROT_READ;
	return VM_PROT_READ | VM_PROT_WRITE;			return VM_PROT_READ | VM_PROT_WRITE;
	}			}

	static uint32_t maxProt(StringRef name) {			static uint32_t maxProt(StringRef name) {
	if (name == segment_names::pageZero)			if (name == segment_names::pageZero)
	return 0;			return 0;
	return VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;			return VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;
	}			}

	void OutputSegment::addSection(InputSection *isec) {			size_t OutputSegment::numNonHiddenSections() const {
	isec->parent = this;			size_t count = 0;
	std::vector<InputSection *> &vec = sections[isec->name];			for (const OutputSegment::SectionMapEntry &i : sections) {
				smeenai: `auto &i`, and same comment about unpacking `i.second`
	if (vec.empty() && !isec->isHidden()) {			OutputSection *os = i.second;
	++numNonHiddenSections;			count += (os->isHidden() ? 0 : 1);
				smeenai: I'm wondering if it'd be better to construct an OutputSection independently and then pass that into this function instead.
				smeenai: If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of `addOutputSection` and make it a public member variable.
				Ktwu: I believe I hit issues trying to cache this on adding an OutputSection; I think the GotSection made things difficult since its isNeeded() attribute is dynamic.
				smeenai: Ah, makes sense.
	}			}
	vec.push_back(isec);			return count;
				}

				void OutputSegment::addOutputSection(OutputSection *os) {
				os->parent = this;
				std::pair<SectionMap::iterator, bool> result =
				smeenai: Should we assert that `sections[os->name]` doesn't already exist?
				sections.insert(SectionMapEntry(os->name, os));
				if (!result.second) {
				llvm_unreachable("Attempted to set section, but a section with the same "
				"name already exists");
				smeenai: Is the check flipped?
				Ktwu: Oops, yes.
				smeenai: FWIW, `auto` is fine for iterators, but this is fine too.
				}
				}

				OutputSection *OutputSegment::getOrCreateOutputSection(StringRef name) {
				OutputSegment::SectionMap::iterator i = sections.find(name);
				if (i != sections.end()) {
				return i->second;
				}

				auto *os = make<MergedOutputSection>(name);
				addOutputSection(os);
				return os;
				}

				void OutputSegment::sortOutputSections(OutputSegmentComparator *comparator) {
				llvm::stable_sort(sections, *comparator->sectionComparator(this));
				}

				OutputSegmentComparator::OutputSegmentComparator() {
				// This defines the order of segments and the sections within each segment.
				// Segments that are not mentioned here will end up at defaultPosition;
				// sections that are not mentioned will end up at the end of the section
				// list for their given segment.
				std::vector<std::pair<StringRef, std::vector<StringRef>>> ordering{
				{segment_names::pageZero, {}},
				{segment_names::text, {section_names::header}},
				{defaultPosition, {}},
				// Make sure __LINKEDIT is the last segment (i.e. all its hidden
				// sections must be ordered after other sections).
				{segment_names::linkEdit,
				{
				section_names::binding,
				section_names::export_,
				section_names::symbolTable,
				section_names::stringTable,
				}},
				};

				for (uint32_t i = 0, n = ordering.size(); i < n; ++i) {
				auto &p = ordering[i];
				StringRef segname = p.first;
				const std::vector<StringRef> &sectOrdering = p.second;
				orderMap.insert(std::pair<StringRef, OutputSectionComparator>(
				segname, OutputSectionComparator(i, sectOrdering)));
				}

				// Cache the position for the default comparator since this is the likely
				// scenario.
				defaultPositionComparator = &orderMap.find(defaultPosition)->second;
	}			}

	static llvm::DenseMap<StringRef, OutputSegment *> nameToOutputSegment;			static llvm::DenseMap<StringRef, OutputSegment *> nameToOutputSegment;
	std::vector<OutputSegment *> macho::outputSegments;			std::vector<OutputSegment *> macho::outputSegments;

	OutputSegment *macho::getOutputSegment(StringRef name) {			OutputSegment *macho::getOutputSegment(StringRef name) {
	return nameToOutputSegment.lookup(name);			return nameToOutputSegment.lookup(name);
	}			}

	OutputSegment *macho::getOrCreateOutputSegment(StringRef name) {			OutputSegment *macho::getOrCreateOutputSegment(StringRef name) {
	OutputSegment *&segRef = nameToOutputSegment[name];			OutputSegment *&segRef = nameToOutputSegment[name];
	if (segRef != nullptr)			if (segRef != nullptr)
	return segRef;			return segRef;

	segRef = make<OutputSegment>();			segRef = make<OutputSegment>();
	segRef->name = name;			segRef->name = name;
	segRef->maxProt = maxProt(name);			segRef->maxProt = maxProt(name);
	segRef->initProt = initProt(name);			segRef->initProt = initProt(name);

	outputSegments.push_back(segRef);			outputSegments.push_back(segRef);
	return segRef;			return segRef;
	}			}

				void macho::sortOutputSegmentsAndSections() {
				// Sorting only can happen once all outputs have been collected.
				// Since output sections are grouped by segment, sorting happens
				// first over all segments, then over sections per segment.
				auto comparator = OutputSegmentComparator();
				llvm::stable_sort(outputSegments, comparator);

				// Now that the output sections are sorted, assign the final
				// output section indices.
				uint32_t sectionIndex = 0;
				for (OutputSegment *seg : outputSegments) {
				seg->sortOutputSections(&comparator);
				for (auto &p : seg->getSections()) {
				OutputSection *section = p.second;
				if (!section->isHidden()) {
				section->index = ++sectionIndex;
				}
				}
				}
				}
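
The index assignment at the end of `sortOutputSegmentsAndSections` numbers sections across all segments in their final order, starting at 1 and skipping hidden sections. Below is a minimal standalone sketch of that loop; `Sec`, `Seg`, and `assignIndices` are hypothetical names, and the segments are assumed to be sorted already.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified section: a hidden flag and the assigned index.
// Index 0 means "no index assigned".
struct Sec {
  explicit Sec(bool h) : hidden(h) {}
  bool hidden;
  uint32_t index = 0;
};

// Hypothetical segment: just its sections in final order.
using Seg = std::vector<Sec>;

// Walk segments, then sections within each segment, and give every
// non-hidden section the next 1-based index. Hidden sections keep index 0.
void assignIndices(std::vector<Seg> &segments) {
  uint32_t next = 0;
  for (Seg &seg : segments)
    for (Sec &sec : seg)
      if (!sec.hidden)
        sec.index = ++next;
}
```

Because the counter runs across segment boundaries, the first visible section of a later segment continues from the last index used in the previous one.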

lld/MachO/Symbols.h

public:
static bool classof(const Symbol *s) { return s->kind() == DylibKind; }		static bool classof(const Symbol *s) { return s->kind() == DylibKind; }

DylibFile *file;		DylibFile *file;
uint32_t gotIndex = UINT32_MAX;		uint32_t gotIndex = UINT32_MAX;
};		};

inline uint64_t Symbol::getVA() const {		inline uint64_t Symbol::getVA() const {
if (auto *d = dyn_cast<Defined>(this))		if (auto *d = dyn_cast<Defined>(this))
return d->isec->addr + d->value;		return d->isec->getVA() + d->value;
return 0;		return 0;
}		}

union SymbolUnion {		union SymbolUnion {
alignas(Defined) char a[sizeof(Defined)];		alignas(Defined) char a[sizeof(Defined)];
alignas(Undefined) char b[sizeof(Undefined)];		alignas(Undefined) char b[sizeof(Undefined)];
alignas(DylibSymbol) char c[sizeof(DylibSymbol)];		alignas(DylibSymbol) char c[sizeof(DylibSymbol)];
};		};

lld/MachO/SyntheticSections.h

	//===- SyntheticSections.h --------------------------------------*- C++ -*-===//			//===- SyntheticSections.h --------------------------------------*- C++ -*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLD_MACHO_SYNTHETIC_SECTIONS_H			#ifndef LLD_MACHO_SYNTHETIC_SECTIONS_H
	#define LLD_MACHO_SYNTHETIC_SECTIONS_H			#define LLD_MACHO_SYNTHETIC_SECTIONS_H

	#include "ExportTrie.h"			#include "ExportTrie.h"
	#include "InputSection.h"			#include "OutputSection.h"
	#include "Target.h"			#include "Target.h"
	#include "llvm/ADT/SetVector.h"			#include "llvm/ADT/SetVector.h"

	using namespace llvm::MachO;

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	namespace section_names {			namespace section_names {

	constexpr const char *pageZero = "__pagezero";			constexpr const char *pageZero = "__pagezero";
	constexpr const char *header = "__mach_header";			constexpr const char *header = "__mach_header";
	constexpr const char *binding = "__binding";			constexpr const char *binding = "__binding";
	constexpr const char *export_ = "__export";			constexpr const char *export_ = "__export";
	constexpr const char *symbolTable = "__symbol_table";			constexpr const char *symbolTable = "__symbol_table";
	constexpr const char *stringTable = "__string_table";			constexpr const char *stringTable = "__string_table";
				constexpr const char *got = "__got";

	} // namespace section_names			} // namespace section_names

	class DylibSymbol;			class DylibSymbol;
	class LoadCommand;			class LoadCommand;

				class SyntheticSection : public OutputSection {
				public:
				SyntheticSection(const char *segname, const char *name);
				virtual ~SyntheticSection() = default;
				};

	// The header of the Mach-O file, which must have a file offset of zero.			// The header of the Mach-O file, which must have a file offset of zero.
	class MachHeaderSection : public InputSection {			class MachHeaderSection : public SyntheticSection {
	public:			public:
	MachHeaderSection();			MachHeaderSection();
	void addLoadCommand(LoadCommand *);			void addLoadCommand(LoadCommand *);
	bool isHidden() const override { return true; }			bool isHidden() const override { return true; }
	size_t getSize() const override;			size_t getSize() const override;
	void writeTo(uint8_t *buf) override;			void writeTo(uint8_t *buf) const override;

	private:			private:
				int3: should this field live in the parent class? Edit: Oh I see, OutputSections don't define `segname` because that's accessed through `parent->name`, so only InputSections and synthetic sections need to define segnames. How about defining a `SyntheticSection` superclass that defines this field in its ctor, and have every class in this file inherit from it?
				Ktwu: Yup, a base SyntheticSection class sounds ++
	std::vector<LoadCommand *> loadCommands;			std::vector<LoadCommand *> loadCommands;
	uint32_t sizeOfCmds = 0;			uint32_t sizeOfCmds = 0;
	};			};

	// A hidden section that exists solely for the purpose of creating the			// A hidden section that exists solely for the purpose of creating the
	// __PAGEZERO segment, which is used to catch null pointer dereferences.			// __PAGEZERO segment, which is used to catch null pointer dereferences.
	class PageZeroSection : public InputSection {			class PageZeroSection : public SyntheticSection {
	public:			public:
	PageZeroSection();			PageZeroSection();
	bool isHidden() const override { return true; }			bool isHidden() const override { return true; }
	size_t getSize() const override { return ImageBase; }			size_t getSize() const override { return ImageBase; }
	uint64_t getFileSize() const override { return 0; }			uint64_t getFileSize() const override { return 0; }
				void writeTo(uint8_t *buf) const override {}
	};			};

	// This section will be populated by dyld with addresses to non-lazily-loaded			// This section will be populated by dyld with addresses to non-lazily-loaded
	// dylib symbols.			// dylib symbols.
	class GotSection : public InputSection {			class GotSection : public SyntheticSection {
	public:			public:
	GotSection();			GotSection();

	void addEntry(DylibSymbol &sym);			void addEntry(DylibSymbol &sym);
	const llvm::SetVector<const DylibSymbol *> &getEntries() const {			const llvm::SetVector<const DylibSymbol *> &getEntries() const {
	return entries;			return entries;
	}			}

	size_t getSize() const override { return entries.size() * WordSize; }

	bool isNeeded() const override { return !entries.empty(); }			bool isNeeded() const override { return !entries.empty(); }

	void writeTo(uint8_t *buf) override {			size_t getSize() const override { return entries.size() * WordSize; }

				void writeTo(uint8_t *buf) const override {
	// Nothing to write, GOT contains all zeros at link time; it's populated at			// Nothing to write, GOT contains all zeros at link time; it's populated at
	// runtime by dyld.			// runtime by dyld.
	}			}

	private:			private:
	llvm::SetVector<const DylibSymbol *> entries;			llvm::SetVector<const DylibSymbol *> entries;
	};			};

	// Stores bind opcodes for telling dyld which symbols to load non-lazily.			// Stores bind opcodes for telling dyld which symbols to load non-lazily.
	class BindingSection : public InputSection {			class BindingSection : public SyntheticSection {
	public:			public:
	BindingSection();			BindingSection();
	void finalizeContents();			void finalizeContents();
	size_t getSize() const override { return contents.size(); }			size_t getSize() const override { return contents.size(); }
	// Like other sections in __LINKEDIT, the binding section is special: its			// Like other sections in __LINKEDIT, the binding section is special: its
	// offsets are recorded in the LC_DYLD_INFO_ONLY load command, instead of in			// offsets are recorded in the LC_DYLD_INFO_ONLY load command, instead of in
	// section headers.			// section headers.
	bool isHidden() const override { return true; }			bool isHidden() const override { return true; }
	bool isNeeded() const override;			bool isNeeded() const override;
	void writeTo(uint8_t *buf) override;			void writeTo(uint8_t *buf) const override;

	SmallVector<char, 128> contents;			SmallVector<char, 128> contents;
	};			};

	// Stores a trie that describes the set of exported symbols.			// Stores a trie that describes the set of exported symbols.
	class ExportSection : public InputSection {			class ExportSection : public SyntheticSection {
	public:			public:
	ExportSection();			ExportSection();
	void finalizeContents();			void finalizeContents();
	size_t getSize() const override { return size; }			size_t getSize() const override { return size; }
	// Like other sections in __LINKEDIT, the export section is special: its			// Like other sections in __LINKEDIT, the export section is special: its
	// offsets are recorded in the LC_DYLD_INFO_ONLY load command, instead of in			// offsets are recorded in the LC_DYLD_INFO_ONLY load command, instead of in
	// section headers.			// section headers.
	bool isHidden() const override { return true; }			bool isHidden() const override { return true; }
	void writeTo(uint8_t *buf) override;			void writeTo(uint8_t *buf) const override;

	private:			private:
	TrieBuilder trieBuilder;			TrieBuilder trieBuilder;
	size_t size = 0;			size_t size = 0;
	};			};

	// Stores the strings referenced by the symbol table.			// Stores the strings referenced by the symbol table.
	class StringTableSection : public InputSection {			class StringTableSection : public SyntheticSection {
	public:			public:
	StringTableSection();			StringTableSection();
	// Returns the start offset of the added string.			// Returns the start offset of the added string.
	uint32_t addString(StringRef);			uint32_t addString(StringRef);
	size_t getSize() const override { return size; }			size_t getSize() const override { return size; }
	// Like other sections in __LINKEDIT, the string table section is special: its			// Like other sections in __LINKEDIT, the string table section is special: its
	// offsets are recorded in the LC_SYMTAB load command, instead of in section			// offsets are recorded in the LC_SYMTAB load command, instead of in section
	// headers.			// headers.
	bool isHidden() const override { return true; }			bool isHidden() const override { return true; }
	void writeTo(uint8_t *buf) override;			void writeTo(uint8_t *buf) const override;

	private:			private:
	// An n_strx value of 0 always indicates the empty string, so we must locate			// An n_strx value of 0 always indicates the empty string, so we must locate
	// our non-empty string values at positive offsets in the string table.			// our non-empty string values at positive offsets in the string table.
	// Therefore we insert a dummy value at position zero.			// Therefore we insert a dummy value at position zero.
	std::vector<StringRef> strings{"\0"};			std::vector<StringRef> strings{"\0"};
	size_t size = 1;			size_t size = 1;
	};			};

	struct SymtabEntry {			struct SymtabEntry {
	Symbol *sym;			Symbol *sym;
	size_t strx;			size_t strx;
	};			};

	class SymtabSection : public InputSection {			class SymtabSection : public SyntheticSection {
	public:			public:
	SymtabSection(StringTableSection &);			SymtabSection(StringTableSection &);
	void finalizeContents();			void finalizeContents();
	size_t getNumSymbols() const { return symbols.size(); }			size_t getNumSymbols() const { return symbols.size(); }
	size_t getSize() const override;			size_t getSize() const override;
	// Like other sections in __LINKEDIT, the symtab section is special: its			// Like other sections in __LINKEDIT, the symtab section is special: its
	// offsets are recorded in the LC_SYMTAB load command, instead of in section			// offsets are recorded in the LC_SYMTAB load command, instead of in section
	// headers.			// headers.
	bool isHidden() const override { return true; }			bool isHidden() const override { return true; }
	void writeTo(uint8_t *buf) override;			void writeTo(uint8_t *buf) const override;

	private:			private:
	StringTableSection &stringTableSection;			StringTableSection &stringTableSection;
	std::vector<SymtabEntry> symbols;			std::vector<SymtabEntry> symbols;
	};			};

	struct InStruct {			struct InStruct {
	GotSection *got = nullptr;			GotSection *got = nullptr;
	};			};

	extern InStruct in;			extern InStruct in;
				smeenai (Not Done): I believe the `in` is short for "globally accessible input sections". LLD ELF has a corresponding struct called `Out` for output sections (in OutputSections.h), and given that our synthetic sections are output sections now, it might make sense to adopt that name.
				smeenai (Not Done): This one still needs addressing, though I'm okay with doing it in a follow-up if you'd prefer.

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif
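
The `StringTableSection` comments above describe reserving offset zero for the empty string so that every real name gets a positive `n_strx`. A minimal standalone sketch of that layout follows. It uses `std::string` rather than LLVM's `StringRef`, and the class name `ToyStringTable` is hypothetical and not part of LLD:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for StringTableSection; not LLD's actual class.
class ToyStringTable {
public:
  // Returns the offset at which `str` will start in the emitted table.
  uint32_t addString(const std::string &str) {
    uint32_t strx = size;
    strings.push_back(str);
    size += static_cast<uint32_t>(str.size()) + 1; // null terminator
    return strx;
  }

  uint32_t getSize() const { return size; }

  // `buf` must point at getSize() zero-initialized bytes, so the gaps
  // skipped between strings already hold the terminators.
  void writeTo(uint8_t *buf) const {
    uint32_t off = 0;
    for (const std::string &str : strings) {
      std::memcpy(buf + off, str.data(), str.size());
      off += static_cast<uint32_t>(str.size()) + 1;
    }
  }

private:
  // Offset 0 is reserved for the empty string, hence one dummy entry
  // and an initial size of 1.
  std::vector<std::string> strings{""};
  uint32_t size = 1;
};
```

With this sketch the first string added lands at offset 1, never at 0, which is the property the symbol table relies on.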

lld/MachO/SyntheticSections.cpp

	[20 unchanged lines not shown]

	using namespace llvm;			using namespace llvm;
	using namespace llvm::MachO;			using namespace llvm::MachO;
	using namespace llvm::support;			using namespace llvm::support;

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	MachHeaderSection::MachHeaderSection() {			SyntheticSection::SyntheticSection(const char *segname, const char *name)
				: OutputSection(name) {
				int3 (Done): ultra nit: how about "Synthetic sections always know which segment they belong to, so hook them up when they're made"? "No need to orphan" seems a bit weird because it hints that one might naively want to orphan them but doesn't indicate why.
				// Synthetic sections always know which segment they belong to so hook
				// them up when they're made
				smeenai (Done): Super nit: use a member initialization list instead.
				getOrCreateOutputSegment(segname)->addOutputSection(this);
				}

	// dyld3's MachOLoaded::getSlide() assumes that the __TEXT segment starts			// dyld3's MachOLoaded::getSlide() assumes that the __TEXT segment starts
	// from the beginning of the file (i.e. the header).			// from the beginning of the file (i.e. the header).
	segname = segment_names::text;			MachHeaderSection::MachHeaderSection()
	name = section_names::header;			: SyntheticSection(segment_names::text, section_names::header) {}
	}

	void MachHeaderSection::addLoadCommand(LoadCommand *lc) {			void MachHeaderSection::addLoadCommand(LoadCommand *lc) {
	loadCommands.push_back(lc);			loadCommands.push_back(lc);
	sizeOfCmds += lc->getSize();			sizeOfCmds += lc->getSize();
	}			}

	size_t MachHeaderSection::getSize() const {			size_t MachHeaderSection::getSize() const {
	return sizeof(mach_header_64) + sizeOfCmds;			return sizeof(mach_header_64) + sizeOfCmds;
	}			}

	void MachHeaderSection::writeTo(uint8_t *buf) {			void MachHeaderSection::writeTo(uint8_t *buf) const {
	auto *hdr = reinterpret_cast<mach_header_64 *>(buf);			auto *hdr = reinterpret_cast<mach_header_64 *>(buf);
	hdr->magic = MH_MAGIC_64;			hdr->magic = MH_MAGIC_64;
	hdr->cputype = CPU_TYPE_X86_64;			hdr->cputype = CPU_TYPE_X86_64;
	hdr->cpusubtype = CPU_SUBTYPE_X86_64_ALL | CPU_SUBTYPE_LIB64;			hdr->cpusubtype = CPU_SUBTYPE_X86_64_ALL | CPU_SUBTYPE_LIB64;
	hdr->filetype = config->outputType;			hdr->filetype = config->outputType;
	hdr->ncmds = loadCommands.size();			hdr->ncmds = loadCommands.size();
	hdr->sizeofcmds = sizeOfCmds;			hdr->sizeofcmds = sizeOfCmds;
	hdr->flags = MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL;			hdr->flags = MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL;

	uint8_t *p = reinterpret_cast<uint8_t *>(hdr + 1);			uint8_t *p = reinterpret_cast<uint8_t *>(hdr + 1);
	for (LoadCommand *lc : loadCommands) {			for (LoadCommand *lc : loadCommands) {
	lc->writeTo(p);			lc->writeTo(p);
	p += lc->getSize();			p += lc->getSize();
	}			}
	}			}

	PageZeroSection::PageZeroSection() {			PageZeroSection::PageZeroSection()
	segname = segment_names::pageZero;			: SyntheticSection(segment_names::pageZero, section_names::pageZero) {}
	name = section_names::pageZero;
	}

	GotSection::GotSection() {			GotSection::GotSection()
	segname = "__DATA_CONST";			: SyntheticSection(segment_names::dataConst, section_names::got) {
				smeenai (Not Done): @int3 how come these strings are just here directly vs. all the other ones being named constants? Also, not this diff, but is `__DATA_CONST` correct? ld64 puts this in `__DATA` instead as far as I can see. It's not quite constant cos the dynamic linker's gonna fill it in, though idk if it has an equivalent to ELF's RELRO.
				int3 (Not Done): I don't think we're currently referencing this from anywhere else (in particular I didn't give them an order in the sorting comparator), so it wasn't technically necessary, though we could definitely make them constants for the sake of uniformity.
				Ktwu, author (Done): I'll make 'em a constant for now; can we deal with __DATA vs __DATA_CONST in another diff if need be?
				int3 (Not Done): Oh sorry, I missed the 2nd part of the comment about the segment. I'm pretty sure it's in `__DATA_CONST`, at least on Catalina; just tried it out. But yeah, we can deal with it in another diff if necessary.
				smeenai (Not Done): Ah, sorry, I was just wondering why these weren't constants. I didn't mean to imply that you had to do it in this diff, but thanks for taking care of it :) @int3 interesting, this appears to be a Catalina vs older OS thing. Possibly related to dyld3?
	name = "__got";
	align = 8;			align = 8;
	flags = S_NON_LAZY_SYMBOL_POINTERS;			flags = S_NON_LAZY_SYMBOL_POINTERS;

	// TODO: section_64::reserved1 should be an index into the indirect symbol			// TODO: section_64::reserved1 should be an index into the indirect symbol
	// table, which we do not currently emit			// table, which we do not currently emit
	}			}

	void GotSection::addEntry(DylibSymbol &sym) {			void GotSection::addEntry(DylibSymbol &sym) {
	if (entries.insert(&sym)) {			if (entries.insert(&sym)) {
	sym.gotIndex = entries.size() - 1;			sym.gotIndex = entries.size() - 1;
	}			}
	}			}

	BindingSection::BindingSection() {			BindingSection::BindingSection()
	segname = segment_names::linkEdit;			: SyntheticSection(segment_names::linkEdit, section_names::binding) {}
	name = section_names::binding;
	}

	bool BindingSection::isNeeded() const { return in.got->isNeeded(); }			bool BindingSection::isNeeded() const { return in.got->isNeeded(); }

	// Emit bind opcodes, which are a stream of byte-sized opcodes that dyld			// Emit bind opcodes, which are a stream of byte-sized opcodes that dyld
	// interprets to update a record with the following fields:			// interprets to update a record with the following fields:
	// * segment index (of the segment to write the symbol addresses to, typically			// * segment index (of the segment to write the symbol addresses to, typically
	// the __DATA_CONST segment which contains the GOT)			// the __DATA_CONST segment which contains the GOT)
	// * offset within the segment, indicating the next location to write a binding			// * offset within the segment, indicating the next location to write a binding
	// * symbol type			// * symbol type
	// * symbol library ordinal (the index of its library's LC_LOAD_DYLIB command)			// * symbol library ordinal (the index of its library's LC_LOAD_DYLIB command)
	// * symbol name			// * symbol name
	// * addend			// * addend
	// When dyld sees BIND_OPCODE_DO_BIND, it uses the current record state to bind			// When dyld sees BIND_OPCODE_DO_BIND, it uses the current record state to bind
	// a symbol in the GOT, and increments the segment offset to point to the next			// a symbol in the GOT, and increments the segment offset to point to the next
	// entry. It does not clear the record state after doing the bind, so			// entry. It does not clear the record state after doing the bind, so
	// subsequent opcodes only need to encode the differences between bindings.			// subsequent opcodes only need to encode the differences between bindings.
	void BindingSection::finalizeContents() {			void BindingSection::finalizeContents() {
	if (!isNeeded())			if (!isNeeded())
	return;			return;

	raw_svector_ostream os{contents};			raw_svector_ostream os{contents};
	os << static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |			os << static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |
	in.got->parent->index);			in.got->parent->index);
	encodeULEB128(in.got->addr - in.got->parent->firstSection()->addr, os);			encodeULEB128(in.got->getSegmentOffset(), os);
	for (const DylibSymbol *sym : in.got->getEntries()) {			for (const DylibSymbol *sym : in.got->getEntries()) {
	// TODO: Implement compact encoding -- we only need to encode the			// TODO: Implement compact encoding -- we only need to encode the
	// differences between consecutive symbol entries.			// differences between consecutive symbol entries.
	if (sym->file->ordinal <= BIND_IMMEDIATE_MASK) {			if (sym->file->ordinal <= BIND_IMMEDIATE_MASK) {
	os << static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |			os << static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |
	sym->file->ordinal);			sym->file->ordinal);
	} else {			} else {
	error("TODO: Support larger dylib symbol ordinals");			error("TODO: Support larger dylib symbol ordinals");
	continue;			continue;
	}			}
	os << static_cast<uint8_t>(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM)			os << static_cast<uint8_t>(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM)
	<< sym->getName() << '\0'			<< sym->getName() << '\0'
	<< static_cast<uint8_t>(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER)			<< static_cast<uint8_t>(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER)
	<< static_cast<uint8_t>(BIND_OPCODE_DO_BIND);			<< static_cast<uint8_t>(BIND_OPCODE_DO_BIND);
	}			}

	os << static_cast<uint8_t>(BIND_OPCODE_DONE);			os << static_cast<uint8_t>(BIND_OPCODE_DONE);
	}			}

	void BindingSection::writeTo(uint8_t *buf) {			void BindingSection::writeTo(uint8_t *buf) const {
	memcpy(buf, contents.data(), contents.size());			memcpy(buf, contents.data(), contents.size());
	}			}

	ExportSection::ExportSection() {			ExportSection::ExportSection()
	segname = segment_names::linkEdit;			: SyntheticSection(segment_names::linkEdit, section_names::export_) {}
	name = section_names::export_;
	}

	void ExportSection::finalizeContents() {			void ExportSection::finalizeContents() {
	// TODO: We should check symbol visibility.			// TODO: We should check symbol visibility.
	for (const Symbol *sym : symtab->getSymbols())			for (const Symbol *sym : symtab->getSymbols())
	if (auto *defined = dyn_cast<Defined>(sym))			if (auto *defined = dyn_cast<Defined>(sym))
	trieBuilder.addSymbol(*defined);			trieBuilder.addSymbol(*defined);
	size = trieBuilder.build();			size = trieBuilder.build();
	}			}

	void ExportSection::writeTo(uint8_t *buf) { trieBuilder.writeTo(buf); }			void ExportSection::writeTo(uint8_t *buf) const { trieBuilder.writeTo(buf); }

	SymtabSection::SymtabSection(StringTableSection &stringTableSection)			SymtabSection::SymtabSection(StringTableSection &stringTableSection)
	: stringTableSection(stringTableSection) {			: SyntheticSection(segment_names::linkEdit, section_names::symbolTable),
	segname = segment_names::linkEdit;			stringTableSection(stringTableSection) {
	name = section_names::symbolTable;
	// TODO: When we introduce the SyntheticSections superclass, we should make			// TODO: When we introduce the SyntheticSections superclass, we should make
	// all synthetic sections aligned to WordSize by default.			// all synthetic sections aligned to WordSize by default.
	align = WordSize;			align = WordSize;
	}			}

	size_t SymtabSection::getSize() const {			size_t SymtabSection::getSize() const {
	return symbols.size() * sizeof(nlist_64);			return symbols.size() * sizeof(nlist_64);
	}			}

	void SymtabSection::finalizeContents() {			void SymtabSection::finalizeContents() {
	// TODO support other symbol types			// TODO support other symbol types
	for (Symbol *sym : symtab->getSymbols())			for (Symbol *sym : symtab->getSymbols())
	if (isa<Defined>(sym))			if (isa<Defined>(sym))
	symbols.push_back({sym, stringTableSection.addString(sym->getName())});			symbols.push_back({sym, stringTableSection.addString(sym->getName())});
	}			}

	void SymtabSection::writeTo(uint8_t *buf) {			void SymtabSection::writeTo(uint8_t *buf) const {
	auto *nList = reinterpret_cast<nlist_64 *>(buf);			auto *nList = reinterpret_cast<nlist_64 *>(buf);
	for (const SymtabEntry &entry : symbols) {			for (const SymtabEntry &entry : symbols) {
	nList->n_strx = entry.strx;			nList->n_strx = entry.strx;
	// TODO support other symbol types			// TODO support other symbol types
	// TODO populate n_desc			// TODO populate n_desc
	if (auto defined = dyn_cast<Defined>(entry.sym)) {			if (auto defined = dyn_cast<Defined>(entry.sym)) {
	nList->n_type = N_EXT | N_SECT;			nList->n_type = N_EXT | N_SECT;
	nList->n_sect = defined->isec->sectionIndex;			nList->n_sect = defined->isec->parent->index;
	// For the N_SECT symbol type, n_value is the address of the symbol			// For the N_SECT symbol type, n_value is the address of the symbol
	nList->n_value = defined->value + defined->isec->addr;			nList->n_value = defined->value + defined->isec->getVA();
	}			}
	++nList;			++nList;
	}			}
	}			}

	StringTableSection::StringTableSection() {			StringTableSection::StringTableSection()
	segname = segment_names::linkEdit;			: SyntheticSection(segment_names::linkEdit, section_names::stringTable) {}
	name = section_names::stringTable;
	}

	uint32_t StringTableSection::addString(StringRef str) {			uint32_t StringTableSection::addString(StringRef str) {
	uint32_t strx = size;			uint32_t strx = size;
	strings.push_back(str);			strings.push_back(str);
	size += str.size() + 1; // account for null terminator			size += str.size() + 1; // account for null terminator
	return strx;			return strx;
	}			}

	void StringTableSection::writeTo(uint8_t *buf) {			void StringTableSection::writeTo(uint8_t *buf) const {
	uint32_t off = 0;			uint32_t off = 0;
	for (StringRef str : strings) {			for (StringRef str : strings) {
	memcpy(buf + off, str.data(), str.size());			memcpy(buf + off, str.data(), str.size());
	off += str.size() + 1; // account for null terminator			off += str.size() + 1; // account for null terminator
	}			}
	}			}

	InStruct in;			InStruct in;

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld
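
The comment above `BindingSection::finalizeContents()` describes the bind opcode stream as a small state machine that dyld interprets. A standalone sketch of the same uncompacted encoding follows. The opcode values are those from `<mach-o/loader.h>`. `ToyBinding` and `emitBindOpcodes` are hypothetical names used only for illustration, and the sketch assumes every dylib ordinal fits in the 4-bit immediate:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Opcode values as defined in <mach-o/loader.h>.
constexpr uint8_t BIND_OPCODE_DONE = 0x00;
constexpr uint8_t BIND_OPCODE_SET_DYLIB_ORDINAL_IMM = 0x10;
constexpr uint8_t BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40;
constexpr uint8_t BIND_OPCODE_SET_TYPE_IMM = 0x50;
constexpr uint8_t BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x70;
constexpr uint8_t BIND_OPCODE_DO_BIND = 0x90;
constexpr uint8_t BIND_TYPE_POINTER = 1;
constexpr uint8_t BIND_IMMEDIATE_MASK = 0x0F;

// Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last.
void encodeULEB128(uint64_t value, std::vector<uint8_t> &out) {
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

// Hypothetical record: one GOT entry to bind.
struct ToyBinding {
  std::string name;
  uint8_t ordinal; // index of the dylib's LC_LOAD_DYLIB command
};

std::vector<uint8_t> emitBindOpcodes(uint8_t segIndex, uint64_t segOffset,
                                     const std::vector<ToyBinding> &bindings) {
  std::vector<uint8_t> out;
  // Point the record at the start of the GOT within its segment.
  out.push_back(
      static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | segIndex));
  encodeULEB128(segOffset, out);
  for (const ToyBinding &b : bindings) {
    assert(b.ordinal <= BIND_IMMEDIATE_MASK && "larger ordinals not sketched");
    out.push_back(
        static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | b.ordinal));
    out.push_back(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM);
    out.insert(out.end(), b.name.begin(), b.name.end());
    out.push_back('\0');
    out.push_back(
        static_cast<uint8_t>(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER));
    // DO_BIND writes the pointer and advances the offset by pointer size.
    out.push_back(BIND_OPCODE_DO_BIND);
  }
  out.push_back(BIND_OPCODE_DONE);
  return out;
}
```

Because dyld keeps the record state after each `BIND_OPCODE_DO_BIND`, the ordinal and type opcodes repeated here per symbol are redundant when consecutive entries share them; that is the compact encoding the TODO in the diff refers to.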

lld/MachO/Writer.cpp

[31 unchanged lines not shown]
class LCDyldInfo;		class LCDyldInfo;
class LCSymtab;		class LCSymtab;

class Writer {		class Writer {
public:		public:
Writer() : buffer(errorHandler().outputBuffer) {}		Writer() : buffer(errorHandler().outputBuffer) {}

void scanRelocations();		void scanRelocations();
void createHiddenSections();		void createOutputSections();
void sortSections();
void createLoadCommands();		void createLoadCommands();
void assignAddresses(OutputSegment *);		void assignAddresses(OutputSegment *);
void createSymtabContents();		void createSymtabContents();

void openFile();		void openFile();
void writeSections();		void writeSections();

void run();		void run();
[16 unchanged lines not shown]	public:

uint32_t getSize() const override { return sizeof(dyld_info_command); }		uint32_t getSize() const override { return sizeof(dyld_info_command); }

void writeTo(uint8_t *buf) const override {		void writeTo(uint8_t *buf) const override {
auto *c = reinterpret_cast<dyld_info_command *>(buf);		auto *c = reinterpret_cast<dyld_info_command *>(buf);
c->cmd = LC_DYLD_INFO_ONLY;		c->cmd = LC_DYLD_INFO_ONLY;
c->cmdsize = getSize();		c->cmdsize = getSize();
if (bindingSection->isNeeded()) {		if (bindingSection->isNeeded()) {
c->bind_off = bindingSection->getFileOffset();		c->bind_off = bindingSection->fileOff;
c->bind_size = bindingSection->getFileSize();		c->bind_size = bindingSection->getFileSize();
}		}
if (exportSection->isNeeded()) {		if (exportSection->isNeeded()) {
c->export_off = exportSection->getFileOffset();		c->export_off = exportSection->fileOff;
c->export_size = exportSection->getFileSize();		c->export_size = exportSection->getFileSize();
}		}
}		}

BindingSection *bindingSection;		BindingSection *bindingSection;
ExportSection *exportSection;		ExportSection *exportSection;
};		};

[9 unchanged lines not shown]
};		};

class LCSegment : public LoadCommand {		class LCSegment : public LoadCommand {
public:		public:
LCSegment(StringRef name, OutputSegment *seg) : name(name), seg(seg) {}		LCSegment(StringRef name, OutputSegment *seg) : name(name), seg(seg) {}

uint32_t getSize() const override {		uint32_t getSize() const override {
return sizeof(segment_command_64) +		return sizeof(segment_command_64) +
seg->numNonHiddenSections * sizeof(section_64);		seg->numNonHiddenSections() * sizeof(section_64);
}		}

void writeTo(uint8_t *buf) const override {		void writeTo(uint8_t *buf) const override {
auto *c = reinterpret_cast<segment_command_64 *>(buf);		auto *c = reinterpret_cast<segment_command_64 *>(buf);
buf += sizeof(segment_command_64);		buf += sizeof(segment_command_64);

c->cmd = LC_SEGMENT_64;		c->cmd = LC_SEGMENT_64;
c->cmdsize = getSize();		c->cmdsize = getSize();
memcpy(c->segname, name.data(), name.size());		memcpy(c->segname, name.data(), name.size());
c->fileoff = seg->fileOff;		c->fileoff = seg->fileOff;
c->maxprot = seg->maxProt;		c->maxprot = seg->maxProt;
c->initprot = seg->initProt;		c->initprot = seg->initProt;

if (seg->getSections().empty())		if (!seg->isNeeded())
return;		return;
		int3 (Done): I think this should stay as `getSections().empty()`, and the check for `isNeeded()` should be moved outside writeTo(). We should just not create LCSegment commands for unneeded segments. The `empty()` check however is still needed because `__LINKEDIT` can be empty.

c->vmaddr = seg->firstSection()->addr;		c->vmaddr = seg->firstSection()->addr;
c->vmsize =		c->vmsize =
seg->lastSection()->addr + seg->lastSection()->getSize() - c->vmaddr;		seg->lastSection()->addr + seg->lastSection()->getSize() - c->vmaddr;
c->nsects = seg->numNonHiddenSections;		c->nsects = seg->numNonHiddenSections();

for (auto &p : seg->getSections()) {		for (auto &p : seg->getSections()) {
StringRef s = p.first;		StringRef s = p.first;
ArrayRef<InputSection *> sections = p.second;		OutputSection *section = p.second;
for (InputSection *isec : sections)		c->filesize += section->getFileSize();
c->filesize += isec->getFileSize();		if (section->isHidden())
if (sections[0]->isHidden())
continue;		continue;

auto *sectHdr = reinterpret_cast<section_64 *>(buf);		auto *sectHdr = reinterpret_cast<section_64 *>(buf);
buf += sizeof(section_64);		buf += sizeof(section_64);

memcpy(sectHdr->sectname, s.data(), s.size());		memcpy(sectHdr->sectname, s.data(), s.size());
memcpy(sectHdr->segname, name.data(), name.size());		memcpy(sectHdr->segname, name.data(), name.size());

sectHdr->addr = sections[0]->addr;		sectHdr->addr = section->addr;
sectHdr->offset = sections[0]->getFileOffset();		sectHdr->offset = section->fileOff;
sectHdr->align = sections[0]->align;		sectHdr->align = Log2_32(section->align);
uint32_t maxAlign = 0;		sectHdr->flags = section->flags;
for (const InputSection *section : sections)		sectHdr->size = section->getSize();
maxAlign = std::max(maxAlign, section->align);
sectHdr->align = Log2_32(maxAlign);
sectHdr->flags = sections[0]->flags;
sectHdr->size = sections.back()->addr + sections.back()->getSize() -
sections[0]->addr;
}		}
}		}

private:		private:
StringRef name;		StringRef name;
OutputSegment *seg;		OutputSegment *seg;
};		};

[15 unchanged lines not shown]	LCSymtab(SymtabSection *symtabSection, StringTableSection *stringTableSection)
: symtabSection(symtabSection), stringTableSection(stringTableSection) {}		: symtabSection(symtabSection), stringTableSection(stringTableSection) {}

uint32_t getSize() const override { return sizeof(symtab_command); }		uint32_t getSize() const override { return sizeof(symtab_command); }

void writeTo(uint8_t *buf) const override {		void writeTo(uint8_t *buf) const override {
auto *c = reinterpret_cast<symtab_command *>(buf);		auto *c = reinterpret_cast<symtab_command *>(buf);
c->cmd = LC_SYMTAB;		c->cmd = LC_SYMTAB;
c->cmdsize = getSize();		c->cmdsize = getSize();
c->symoff = symtabSection->getFileOffset();		c->symoff = symtabSection->fileOff;
c->nsyms = symtabSection->getNumSymbols();		c->nsyms = symtabSection->getNumSymbols();
c->stroff = stringTableSection->getFileOffset();		c->stroff = stringTableSection->fileOff;
c->strsize = stringTableSection->getFileSize();		c->strsize = stringTableSection->getFileSize();
}		}

SymtabSection *symtabSection = nullptr;		SymtabSection *symtabSection = nullptr;
StringTableSection *stringTableSection = nullptr;		StringTableSection *stringTableSection = nullptr;
};		};

class LCLoadDylib : public LoadCommand {		class LCLoadDylib : public LoadCommand {
[61 unchanged lines not shown]	void writeTo(uint8_t *buf) const override {
memcpy(buf, path.data(), path.size());		memcpy(buf, path.data(), path.size());
buf[path.size()] = '\0';		buf[path.size()] = '\0';
}		}

private:		private:
// Recent versions of Darwin won't run any binary that has dyld at a		// Recent versions of Darwin won't run any binary that has dyld at a
// different location.		// different location.
const StringRef path = "/usr/lib/dyld";		const StringRef path = "/usr/lib/dyld";
};		};

class SectionComparator {
public:
struct OrderInfo {
uint32_t segmentOrder;
DenseMap<StringRef, uint32_t> sectionOrdering;
};

SectionComparator() {
// This defines the order of segments and the sections within each segment.
// Segments that are not mentioned here will end up at defaultPosition;
// sections that are not mentioned will end up at the end of the section
// list for their given segment.
std::vector<std::pair<StringRef, std::vector<StringRef>>> ordering{
{segment_names::pageZero, {}},
{segment_names::text, {section_names::header}},
{defaultPosition, {}},
// Make sure __LINKEDIT is the last segment (i.e. all its hidden
// sections must be ordered after other sections).
{segment_names::linkEdit,
{
section_names::binding,
section_names::export_,
section_names::symbolTable,
section_names::stringTable,
}},
};

for (uint32_t i = 0, n = ordering.size(); i < n; ++i) {
auto &p = ordering[i];
StringRef segname = p.first;
const std::vector<StringRef> &sectOrdering = p.second;
OrderInfo &info = orderMap[segname];
info.segmentOrder = i;
for (uint32_t j = 0, m = sectOrdering.size(); j < m; ++j)
info.sectionOrdering[sectOrdering[j]] = j;
}
}

// Return a {segmentOrder, sectionOrder} pair. Using this as a key will
// ensure that all sections in the same segment are sorted contiguously.
std::pair<uint32_t, uint32_t> order(const InputSection *isec) {
auto it = orderMap.find(isec->segname);
if (it == orderMap.end())
return {orderMap[defaultPosition].segmentOrder, 0};
OrderInfo &info = it->second;
auto sectIt = info.sectionOrdering.find(isec->name);
if (sectIt != info.sectionOrdering.end())
return {info.segmentOrder, sectIt->second};
return {info.segmentOrder, info.sectionOrdering.size()};
}

bool operator()(const InputSection *a, const InputSection *b) {
return order(a) < order(b);
}

private:
const StringRef defaultPosition = StringRef();
DenseMap<StringRef, OrderInfo> orderMap;
};

} // namespace		} // namespace
		int3 (Not Done): Oh I see... looks like the problem with my initial sorting scheme is that it assumed that the segments were only ever created once all the sections were sorted. But this didn't account for __LINKEDIT, though I was lucky enough to create it after the sorting, so things worked out... but explicitly sorting both the segments and sections is more reliable. I think we can simplify the `order` method a bit though: it's currently written to support comparing sections across different segments, and we're not using that functionality any more. Can we have two separate `order` methods, one for segments and the other for sections in the same segment?
		Ktwu, author (Done): I was thinking about that, too, so I'll try it.
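
One way the split suggested in this thread could look is sketched below: one lookup ranks segments and a second ranks sections within a single segment. This is not the code that landed. `ToySectionOrder` is a hypothetical name, `std::map` and `std::string` stand in for `DenseMap` and `StringRef`, and the segment and section name strings are assumed placeholders for the `segment_names::` and `section_names::` constants:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper illustrating two separate order lookups.
class ToySectionOrder {
public:
  ToySectionOrder() {
    // Same table shape as the removed SectionComparator: unlisted segments
    // take the default position, unlisted sections go last in their segment.
    const std::vector<std::pair<std::string, std::vector<std::string>>>
        ordering{
            {"__PAGEZERO", {}},
            {"__TEXT", {"__mach_header"}},
            {defaultPosition, {}},
            {"__LINKEDIT",
             {"__binding", "__export", "__symbol_table", "__string_table"}},
        };
    for (uint32_t i = 0; i < ordering.size(); ++i) {
      Info &info = orderMap[ordering[i].first];
      info.segmentOrder = i;
      const std::vector<std::string> &sects = ordering[i].second;
      for (uint32_t j = 0; j < sects.size(); ++j)
        info.sectionOrder[sects[j]] = j;
    }
  }

  // Rank of a segment among all segments.
  uint32_t segmentOrder(const std::string &segname) const {
    auto it = orderMap.find(segname);
    if (it == orderMap.end())
      return orderMap.at(defaultPosition).segmentOrder;
    return it->second.segmentOrder;
  }

  // Rank of a section among the sections of one segment.
  uint32_t sectionOrder(const std::string &segname,
                        const std::string &sectname) const {
    auto it = orderMap.find(segname);
    if (it == orderMap.end())
      return 0;
    const Info &info = it->second;
    auto sectIt = info.sectionOrder.find(sectname);
    if (sectIt != info.sectionOrder.end())
      return sectIt->second;
    return static_cast<uint32_t>(info.sectionOrder.size());
  }

private:
  struct Info {
    uint32_t segmentOrder = 0;
    std::map<std::string, uint32_t> sectionOrder;
  };
  const std::string defaultPosition; // empty key marks "everything else"
  std::map<std::string, Info> orderMap;
};
```

Segments would then be sorted by `segmentOrder` and each segment's sections by `sectionOrder`, so neither comparison has to reason about sections from different segments.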

template <typename SectionType, typename... ArgT>
SectionType *createInputSection(ArgT &&... args) {
auto *section = make<SectionType>(std::forward<ArgT>(args)...);
inputSections.push_back(section);
return section;
}

void Writer::scanRelocations() {		void Writer::scanRelocations() {
for (InputSection *sect : inputSections)		for (InputSection *sect : inputSections)
for (Reloc &r : sect->relocs)		for (Reloc &r : sect->relocs)
if (auto *s = r.target.dyn_cast<Symbol *>())		if (auto *s = r.target.dyn_cast<Symbol *>())
if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s))		if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s))
in.got->addEntry(*dylibSymbol);		in.got->addEntry(*dylibSymbol);
}		}

[33 unchanged lines not shown]	void Writer::createLoadCommands() {
}		}

// TODO: dyld requires libSystem to be loaded. libSystem is a universal		// TODO: dyld requires libSystem to be loaded. libSystem is a universal
// binary and we don't have support for that yet, so mock it out here.		// binary and we don't have support for that yet, so mock it out here.
headerSection->addLoadCommand(		headerSection->addLoadCommand(
make<LCLoadDylib>("/usr/lib/libSystem.B.dylib"));		make<LCLoadDylib>("/usr/lib/libSystem.B.dylib"));
}		}

void Writer::createHiddenSections() {		void Writer::createOutputSections() {
headerSection = createInputSection<MachHeaderSection>();		// First, create hidden sections
bindingSection = createInputSection<BindingSection>();		headerSection = make<MachHeaderSection>();
stringTableSection = createInputSection<StringTableSection>();		bindingSection = make<BindingSection>();
symtabSection = createInputSection<SymtabSection>(*stringTableSection);		stringTableSection = make<StringTableSection>();
exportSection = createInputSection<ExportSection>();		symtabSection = make<SymtabSection>(*stringTableSection);
		exportSection = make<ExportSection>();

switch (config->outputType) {		switch (config->outputType) {
case MH_EXECUTE:		case MH_EXECUTE:
createInputSection<PageZeroSection>();		make<PageZeroSection>();
break;		break;
case MH_DYLIB:		case MH_DYLIB:
break;		break;
default:		default:
llvm_unreachable("unhandled output file type");		llvm_unreachable("unhandled output file type");
}		}
}

void Writer::sortSections() {
llvm::stable_sort(inputSections, SectionComparator());

// TODO This is wrong; input sections ought to be grouped into		// Then merge input sections into output sections/segments.
// output sections, which are then organized like this.
uint32_t sectionIndex = 0;
// Add input sections to output segments.
for (InputSection *isec : inputSections) {		for (InputSection *isec : inputSections) {
if (isec->isNeeded()) {		getOrCreateOutputSegment(isec->segname)
if (!isec->isHidden())		->getOrCreateOutputSection(isec->name)
isec->sectionIndex = ++sectionIndex;		->mergeInput(isec);
getOrCreateOutputSegment(isec->segname)->addSection(isec);
}
}		}
		smeenai (Not Done): Nit: might be nicer to assign these to variables instead of chaining.
		Ktwu, author (Done): I personally prefer chaining :D
		smeenai (Not Done): Haha, fair enough.
}		}

void Writer::assignAddresses(OutputSegment *seg) {		void Writer::assignAddresses(OutputSegment *seg) {
addr = alignTo(addr, PageSize);		addr = alignTo(addr, PageSize);
fileOff = alignTo(fileOff, PageSize);		fileOff = alignTo(fileOff, PageSize);
seg->fileOff = fileOff;		seg->fileOff = fileOff;

for (auto &p : seg->getSections()) {		for (auto &p : seg->getSections()) {
ArrayRef<InputSection *> sections = p.second;		OutputSection *section = p.second;
for (InputSection *isec : sections) {		addr = alignTo(addr, section->align);
		int3 (Not Done): The isHidden check should be retained here.
addr = alignTo(addr, isec->align);
// We must align the file offsets too to avoid misaligned writes of		// We must align the file offsets too to avoid misaligned writes of
// structs.		// structs.
fileOff = alignTo(fileOff, isec->align);		fileOff = alignTo(fileOff, section->align);
isec->addr = addr;		section->addr = addr;
addr += isec->getSize();		section->fileOff = fileOff;
		smeenai (Done): Would it make sense to have each `OutputSection` store its `fileOff` (in addition to or instead of the `OutputSegment` holding it), so that we don't need to recompute it in the writeSections loop below?
fileOff += isec->getFileSize();		section->finalize();
}
		addr += section->getSize();
		fileOff += section->getFileSize();
}		}
}		}
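The address assignment in the new `assignAddresses` can be sketched in isolation. This is a minimal, self-contained illustration and not the code from the patch: the `OutputSection` struct below is a hypothetical stand-in with only the fields used here, and `alignTo` is a local helper that rounds up to a multiple of the alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an output section; not lld's real class.
struct OutputSection {
  uint64_t align = 1;    // required alignment in bytes
  uint64_t size = 0;     // size in memory
  uint64_t fileSize = 0; // size on disk
  uint64_t addr = 0;     // assigned virtual address
  uint64_t fileOff = 0;  // assigned file offset
};

// Round value up to the next multiple of align (align must be nonzero).
uint64_t alignTo(uint64_t value, uint64_t align) {
  assert(align != 0 && "alignment must be nonzero");
  return (value + align - 1) / align * align;
}

// Walk the sections of one segment in order. The segment starts on a page
// boundary; each section is then aligned to its own requirement, and both
// the address and the file offset are recorded on the section itself.
void assignAddresses(std::vector<OutputSection> &sections, uint64_t &addr,
                     uint64_t &fileOff, uint64_t pageSize) {
  addr = alignTo(addr, pageSize);
  fileOff = alignTo(fileOff, pageSize);
  for (OutputSection &sec : sections) {
    addr = alignTo(addr, sec.align);
    fileOff = alignTo(fileOff, sec.align);
    sec.addr = addr;
    sec.fileOff = fileOff;
    addr += sec.size;
    fileOff += sec.fileSize;
  }
}
```

Storing `fileOff` on each section, as suggested in the inline comment, means a later write pass does not have to repeat the alignment arithmetic.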

void Writer::openFile() {		void Writer::openFile() {
Expected<std::unique_ptr<FileOutputBuffer>> bufferOrErr =		Expected<std::unique_ptr<FileOutputBuffer>> bufferOrErr =
FileOutputBuffer::create(config->outputFile, fileOff,		FileOutputBuffer::create(config->outputFile, fileOff,
FileOutputBuffer::F_executable);		FileOutputBuffer::F_executable);

if (!bufferOrErr)		if (!bufferOrErr)
error("failed to open " + config->outputFile + ": " +		error("failed to open " + config->outputFile + ": " +
llvm::toString(bufferOrErr.takeError()));		llvm::toString(bufferOrErr.takeError()));
else		else
buffer = std::move(*bufferOrErr);		buffer = std::move(*bufferOrErr);
}		}

void Writer::writeSections() {		void Writer::writeSections() {
uint8_t *buf = buffer->getBufferStart();		uint8_t *buf = buffer->getBufferStart();
for (OutputSegment *seg : outputSegments) {		for (OutputSegment *seg : outputSegments) {
uint64_t fileOff = seg->fileOff;		for (auto &p : seg->getSections()) {
for (auto &sect : seg->getSections()) {		OutputSection *section = p.second;
for (InputSection *isec : sect.second) {		section->writeTo(buf + section->fileOff);
		int3: We should avoid writing unneeded output sections. (My stacked diff makes that assumption for one of the stub helper synthetic sections.)
fileOff = alignTo(fileOff, isec->align);
isec->writeTo(buf + fileOff);
fileOff += isec->getFileSize();
}
}		}
}		}
}		}
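As a rough sketch of the two review points on `writeSections` (write at the stored per-section offset, and skip sections that are not needed), the loop could look like the following. Everything here is illustrative: the `needed` flag and the simplified `OutputSection` are assumptions made for this example, not names taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an output section; not lld's real class.
struct OutputSection {
  uint64_t fileOff = 0;      // offset assigned during address assignment
  std::vector<uint8_t> data; // bytes to emit
  bool needed = true;        // hypothetical "should this be written" flag

  void writeTo(uint8_t *buf) const {
    if (!data.empty())
      std::memcpy(buf, data.data(), data.size());
  }
};

// Write each needed section at its precomputed file offset. No alignment
// arithmetic is repeated here because fileOff was stored earlier.
void writeSections(const std::vector<OutputSection> &sections, uint8_t *buf,
                   size_t bufSize) {
  for (const OutputSection &sec : sections) {
    if (!sec.needed)
      continue; // skip sections that should not appear in the output
    assert(sec.fileOff + sec.data.size() <= bufSize && "section out of range");
    sec.writeTo(buf + sec.fileOff);
  }
}
```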

void Writer::run() {		void Writer::run() {
scanRelocations();
createHiddenSections();
// Sort and assign sections to their respective segments. No more sections can
// be created after this method runs.
sortSections();
// dyld requires __LINKEDIT segment to always exist (even if empty).		// dyld requires __LINKEDIT segment to always exist (even if empty).
		OutputSegment *linkEditSegment =
getOrCreateOutputSegment(segment_names::linkEdit);		getOrCreateOutputSegment(segment_names::linkEdit);
// No more segments can be created after this method runs.
		scanRelocations();

		// Sort and assign sections to their respective segments. No more sections nor
		// segments may be created after this method runs.
		createOutputSections();
		sortOutputSegmentsAndSections();

		smeenai: Nit: use an explicit type instead of auto
		Ktwu: What's up with `auto`? It's not like it's forbidden from use in the style guide: https://llvm.org/docs/CodingStandards.html#id27
		I'm curious about folks' reasoning behind its usage (or discouragement of).
		smeenai: Good question. LLD has its own style guidelines (e.g. variables are lowerCamelCase). Unfortunately, I don't think those are codified anywhere, but I'm basing this on what I've seen in reviews. `auto` is discouraged unless the actual type is spelled out in the same expression somewhere (e.g. as a result of a cast) or is a huge pain to type out (e.g. iterators). Otherwise explicit types are preferred. I personally like that because I don't use an IDE (and I haven't set up any ctags or LSP-like things for my editor), so having explicit types available makes it easier for me to comprehend the code.
		It's a little less clear-cut in cases like
		auto linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);
		where it'd be pretty fair to assume that `getOrCreateOutputSegment` returns an `OutputSegment`. It's a bit ambiguous whether it'd be a pointer, reference, or copy, but you could use `auto *` to disambiguate that. Nevertheless, it's not too much more typing to just spell the name out, so I'd prefer to err on the side of explicitness there.
		There was a post to the mailing list about auto usage in LLVM a while back, but there was no clear resolution: http://llvm.1065342.n5.nabble.com/llvm-dev-RFC-Modernizing-our-use-of-auto-td123947.html (the authors on that thread are bogus; if you want the original authors, look for that subject on http://lists.llvm.org/pipermail/llvm-dev/2018-November/ and http://lists.llvm.org/pipermail/llvm-dev/2018-December/)
createLoadCommands();		createLoadCommands();

// Ensure that segments (and the sections they contain) are allocated		// Ensure that segments (and the sections they contain) are allocated
// addresses in ascending order, which dyld requires.		// addresses in ascending order, which dyld requires.
//		//
// Note that at this point, __LINKEDIT sections are empty, but we need to		// Note that at this point, __LINKEDIT sections are empty, but we need to
// determine addresses of other segments/sections before generating its		// determine addresses of other segments/sections before generating its
// contents.		// contents.
for (OutputSegment *seg : outputSegments)		for (OutputSegment *seg : outputSegments)
		if (seg != linkEditSegment)
assignAddresses(seg);		assignAddresses(seg);
		int3: Thanks, I think it's clearer this way

// Fill __LINKEDIT contents.		// Fill __LINKEDIT contents.
bindingSection->finalizeContents();		bindingSection->finalizeContents();
exportSection->finalizeContents();		exportSection->finalizeContents();
symtabSection->finalizeContents();		symtabSection->finalizeContents();

// Now that __LINKEDIT is filled out, do a proper calculation of its		// Now that __LINKEDIT is filled out, do a proper calculation of its
// addresses and offsets. We don't have to recalculate the other segments		// addresses and offsets.
// since sortSections() ensures that __LINKEDIT is the last segment.		assignAddresses(linkEditSegment);
assignAddresses(getOutputSegment(segment_names::linkEdit));

openFile();		openFile();
if (errorCount())		if (errorCount())
return;		return;

writeSections();		writeSections();

if (auto e = buffer->commit())		if (auto e = buffer->commit())
error("failed to write to the output file: " + toString(std::move(e)));		error("failed to write to the output file: " + toString(std::move(e)));
}		}
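The `auto` discussion in the inline thread on `linkEditSegment` comes down to what the reader can tell from the declaration alone. The sketch below illustrates that point with hypothetical stand-ins (`getSegmentPtr`, `getSegmentRef`, and the simplified `OutputSegment` are invented for this example and are not lld's declarations): plain `auto` on a reference-returning call silently makes a copy, while an explicit type or `auto *` makes the pointer visible at the declaration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins; not lld's real declarations.
struct OutputSegment {
  std::string name;
};

OutputSegment linkEditStorage{"__LINKEDIT"};
OutputSegment *getSegmentPtr() { return &linkEditStorage; }
OutputSegment &getSegmentRef() { return linkEditStorage; }

static_assert(
    std::is_same<decltype(getSegmentRef()), OutputSegment &>::value,
    "getSegmentRef returns a reference");

// Plain `auto` drops the reference, so this mutates a copy only.
bool autoCopiesFromReference() {
  auto copy = getSegmentRef(); // deduced as OutputSegment, not OutputSegment &
  copy.name = "changed";
  return linkEditStorage.name == "__LINKEDIT";
}

// An explicit type and `auto *` both make it obvious that a pointer is held.
bool explicitPointerAliases() {
  OutputSegment *seg = getSegmentPtr(); // explicit, preferred in the review
  auto *same = getSegmentPtr();         // `auto *` at least shows the pointer
  return seg == same && seg == &linkEditStorage;
}
```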

void macho::writeResult() { Writer().run(); }		void macho::writeResult() { Writer().run(); }

void macho::createSyntheticSections() {		void macho::createSyntheticSections() { in.got = make<GotSection>(); }
in.got = createInputSection<GotSection>();
}

lld/test/MachO/Inputs/libfunction.s

This file was added.

				.section __TEXT,__text
				.globl _some_function

				_some_function:
				mov $1, %rax
				ret

lld/test/MachO/section-merge.s

This file was added.

				# REQUIRES: x86
				# RUN: mkdir -p %t
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %p/Inputs/libhello.s \
				# RUN: -o %t/libhello.o
				smeenai: It'd be nice to check that the contents of the sections are merged correctly as well. Also, we should be checking that text segments are merged correctly, since that's the most common case.
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %p/Inputs/libgoodbye.s \
				# RUN: -o %t/libgoodbye.o
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %p/Inputs/libfunction.s \
				# RUN: -o %t/libfunction.o
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s \
				# RUN: -o %t/main.o
				# RUN: lld -flavor darwinnew -o %t/output %t/libfunction.o %t/libgoodbye.o %t/libhello.o %t/main.o

				smeenai: This seems to be a leftover from testing :)
				# RUN: llvm-objdump --syms %t/output | FileCheck %s
				# CHECK: SYMBOL TABLE:
				# CHECK-DAG: {{[0-9a-z]+}} g O __TEXT,__cstring _goodbye_world
				# CHECK-DAG: {{[0-9a-z]+}} g O __TEXT,__cstring _hello_its_me
				smeenai: We only need to check the properties we care about. Detailed symbol table checking should happen in the symbol table tests. Over here, I think we just care about the symbol name, section, and value, so we can drop the checks for the other fields (Extern, Type, RefType, Flags). We should also be checking for the text section symbols.
				int3: +1 for terser checks. We can also use `llvm-objdump --syms` here -- its output is much more compact and suitable for when we're not checking all the properties
				Ktwu: llvm-objdump --syms is pretty nice, so I'll use that here instead.
				# CHECK-DAG: {{[0-9a-z]+}} g O __TEXT,__cstring _hello_world
				# CHECK-DAG: {{[0-9a-z]+}} g F __TEXT,__text _main
				# CHECK-DAG: {{[0-9a-z]+}} g F __TEXT,__text _some_function

				# RUN: llvm-objdump -d %t/output | FileCheck %s --check-prefix DATA
				# DATA: Disassembly of section __TEXT,__text:
				# DATA: {{0*}}[[#%x,BASE:]] <_some_function>:
				# DATA-NEXT: [[#BASE]]: 48 c7 c0 01 00 00 00 movq $1, %rax
				# DATA-NEXT: [[#BASE + 0x7]]: c3 retq
				# DATA: {{0*}}[[#BASE + 0x8]] <_main>:
				# DATA-NEXT: [[#BASE + 0x8]]: 48 c7 c0 00 00 00 00 movq $0, %rax
				# DATA-NEXT: [[#BASE + 0xf]]: c3 retq

				.section __TEXT,__text
				.global _main

				_main:
				smeenai: We aren't defining these symbols in this file, so the `.global` directives aren't doing anything.
				mov $0, %rax
				ret
				smeenai: Might be easier to make sense of this as assembly (`llvm-objdump -d`)
				Ktwu: Is this a big deal? I compared this output to ld; I don't care about the content so much as making sure it matches what ld outputs.
				smeenai: I think it makes the test a lot more intelligible and easy to verify. Right now, for me, this is just a blob of bytes. If it were written out as instructions, I could verify that it's the instructions in `_some_function` followed by the instructions in `_main` (as it should be). In general, matching ld64's output is important, but we also want our tests to work well standalone. The cstring check below is great because it's easy to tell at a glance that all the strings from the various input files are being combined together (as they should be); using the disassembly will let us do the same for the text section.
				Ktwu: Fair enough!
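The layout the disassembly checks expect (`_some_function` at `BASE`, `_main` at `BASE + 0x8`) follows from appending input sections in link order. A minimal sketch of that merging step, under stated assumptions: `MergedSection` and its fields are hypothetical, `mergeInput` only borrows its name from the call in the diff, and `outSecOff` follows the LLD ELF naming mentioned earlier in the review rather than anything confirmed in this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an input section.
struct InputSection {
  uint64_t align = 1;        // required alignment in bytes
  std::vector<uint8_t> data; // section contents
  uint64_t outSecOff = 0;    // offset within the merged output section
};

// Hypothetical merged output section: inputs are appended in link order.
struct MergedSection {
  uint64_t align = 1; // strictest alignment seen among the inputs
  uint64_t size = 0;  // running size of the merged contents
  std::vector<InputSection *> inputs;

  void mergeInput(InputSection *isec) {
    assert(isec->align != 0 && "alignment must be nonzero");
    align = std::max(align, isec->align);
    // Pad up to the input's alignment, then record where it lands.
    size = (size + isec->align - 1) / isec->align * isec->align;
    isec->outSecOff = size;
    size += isec->data.size();
    inputs.push_back(isec);
  }
};
```

With two 8-byte inputs merged in order, the second one lands at offset 8, which is the relationship the `DATA` checks encode relative to `BASE`.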


Diff 261592
