This is an archive of the discontinued LLVM Phabricator instance.

Differential D77893

[lld] Merge Mach-O input sections
ClosedPublic

Authored by Ktwu on Apr 10 2020, 1:07 PM.

Details

Reviewers

ruiu
pcc
gkm
MaskRay
alexander-shaposhnikov
christylee
smeenai
int3

Commits

rG6cb073133c56: [lld] Merge Mach-O input sections

Summary

Similar to other formats, input sections in the MachO implementation are now grouped under output sections. This is primarily a refactor, although there's some new logic (like resolving the output section's flags based on its inputs).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Looks good at a high level!

lld/MachO/InputSection.h
52–53	LLD ELF has an `outSecOff` in its InputSections, which tracks the offset of this particular section within its output section. I think that'd be better than storing the absolute address in the InputSection.
lld/MachO/OutputSection.cpp
21	What's this computation doing?
33	We should experiment with how ld64 handles merging hidden sections, or if that's even a thing it does. (I don't think you can specify a section is hidden yourself; ld64 just has a list of atoms it defines to be hidden.)
lld/MachO/OutputSegment.cpp
36–41	I'm wondering if it'd be better to construct an OutputSection independently and then pass that into this function instead.
51–52	Is the check flipped?

Ktwu marked 3 inline comments as done.Apr 14 2020, 4:32 PM

Ktwu added inline comments.

lld/MachO/InputSection.h
52–53	Interesting, I'll look into that.
lld/MachO/OutputSection.cpp
21	It's copying what InputSection did to calculate its file offset. I didn't entirely comprehend this math tbh.
lld/MachO/OutputSegment.cpp
51–52	Oops, yes.

Passing most local tests now

Harbormaster failed remote builds in B53281: Diff 257574!Apr 14 2020, 6:31 PM

Looking good! Needs tests once everything's working :)

lld/MachO/InputSection.h
38	I think `getVA` would be more in line with the LLD naming for this concept.
51	Similarly, since this is the same notion as LLD ELF's `outSecOff`, I'd probably stick with that name, just so it's easy to map concepts between the two.
lld/MachO/OutputSection.cpp
21	Okay, this makes sense.
31	Can you add more details to this error message? It'd be ideal to have things like the section in question, the object file it's coming from, etc. Also, is this what ld64 does?
35	Can you add a TODO for figuring out how we should handle input sections with conflicting hidden-ness?

Ktwu marked 4 inline comments as done.Apr 16 2020, 11:42 AM

Ktwu added inline comments.

lld/MachO/OutputSection.cpp
31	Sure. So far as I can tell, ld64 doesn't do section merging on a flag level like I first thought; now that I'm diving into it, OutputFile.cpp separates flags into individual boolean attributes. It's not the easiest codebase to navigate :/

Added a test; unfortunately symbol export is blocking the test, so I need to rebase

Harbormaster failed remote builds in B53605: Diff 258118!Apr 16 2020, 12:15 PM

int3 added inline comments.Apr 16 2020, 12:25 PM

lld/MachO/OutputSection.cpp
21	The parent segment's start address is defined as the address of the first section it contains. So `addr - parent->firstSection()->addr` computes the section's offset within the segment. Maybe we don't need this any more if we have `outSecOff` (going to investigate)
35	Personally I don't think we should support hidden InputSections until we find a use case (I'm not aware of one so far)

int3 added inline comments.Apr 16 2020, 12:50 PM

lld/MachO/OutputSection.cpp
21	Oh, outSecOff is for the InputSection's offset within the OutputSection, but this computation is for the OutputSection's offset within its segment. So having `outSecOff` doesn't impact this. That said the ELF implementation seems to have an explicit `OutputSection::offset` field, though I'm not sure what populates it... but I think this scheme of computing the offset from the address is fine for now
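
The two offsets discussed in this thread can be told apart with a minimal sketch. The struct layouts and the `offsetInSegment` name below are simplified assumptions for illustration, not the classes from the diff; only `outSecOff`, `getVA`, and `firstSection()` are names taken from the comments above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the lld-macho classes under review.
struct OutputSegment;

struct OutputSection {
  uint64_t addr = 0;               // virtual address assigned by the writer
  OutputSegment *parent = nullptr; // segment that contains this section
  uint64_t offsetInSegment() const;
};

struct OutputSegment {
  std::vector<OutputSection *> sections;
  OutputSection *firstSection() const { return sections.front(); }
};

// A segment starts at the address of its first section, so subtracting that
// address yields this section's offset within the segment.
uint64_t OutputSection::offsetInSegment() const {
  assert(parent && !parent->sections.empty());
  return addr - parent->firstSection()->addr;
}

struct InputSection {
  OutputSection *parent = nullptr;
  uint64_t outSecOff = 0; // offset within the parent output section
  // The absolute address is derived on demand instead of being stored.
  uint64_t getVA() const { return parent->addr + outSecOff; }
};
```

Here `outSecOff` locates an input section inside its output section, while `offsetInSegment()` locates an output section inside its segment, which is why adding the former does not replace the latter.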

Ktwu added a parent revision: D76977: [lld-macho] Implement basic export trie.Apr 16 2020, 1:57 PM

Rebased, various comments about renames, basic test

Ktwu retitled this revision from WIP [lld] Merge Mach-O input sections to [lld] Merge Mach-O input sections.Apr 16 2020, 3:02 PM

Ktwu edited the summary of this revision.

Ktwu added reviewers: ruiu, pcc, gkm, MaskRay, alexander-shaposhnikov, christylee, smeenai, int3.

Harbormaster failed remote builds in B53646: Diff 258182!Apr 16 2020, 3:05 PM

int3 added inline comments.Apr 16 2020, 3:10 PM

lld/MachO/OutputSection.cpp
35	I.e. I think the synthetic sections could become OutputSections. So the only InputSections would actually be from real inputs, which will never be hidden

Ktwu marked 2 inline comments as done.Apr 16 2020, 7:20 PM

Ktwu added inline comments.

lld/MachO/OutputSection.cpp
35	Yeah I could try that!

Ktwu added a child revision: D78342: [lld] Add archive file support to Mach-O backend.Apr 16 2020, 7:30 PM

Refactored to turn synthetic input sections into output sections. A new class represents merged input sections, the MergedOutputSection.

Harbormaster failed remote builds in B54051: Diff 258916!Apr 21 2020, 1:35 AM

Pruned unnecessary hidden checks, coalesced some functions, removed unused imports

tweak comment

Harbormaster failed remote builds in B54132: Diff 259061!Apr 21 2020, 11:22 AM

Harbormaster failed remote builds in B54128: Diff 259056!

int3 added inline comments.Apr 21 2020, 7:56 PM

lld/MachO/MergedOutputSection.cpp
20 ↗	(On Diff #258916)	bikeshed: Do we want to prefix all member accesses with `this->`? lld-ELF seems to use it inconsistently, lld-COFF in just a few places... I have a slight preference towards omitting it, but we should be consistent either way
lld/MachO/OutputSection.cpp
26–29	can we just define this method on MergedOutputSection?
lld/MachO/OutputSection.h
21	How does this class hierarchy compare to that in the ELF implementation? I know they have `SectionBase`, `InputSectionBase`, and `MergeInputSection`... was planning to dig into it eventually but curious if you have looked
lld/MachO/OutputSegment.h
32	nit: I think `using SectionMap = ...` is the more modern C++ way (and is the method favored by lld-ELF/COFF)
41	@ruiu will object to the `auto` :) I personally hate typing out an `std::pair` type, but we should at least unpack `i.second` into a var with a named type. Nit 2: use `auto &` here. Alternatively, we could replace the loop with `std::any_of`.
43	`addOutputSection` seems like a more fitting name
44	If `sections` isn't private any more, it doesn't need an accessor. Edit: I see that it's non-private only because we need to sort it. How about making the sorting a method on this class? Related thought... I see that MapVector has a `takeVector` method that clears out the map and returns the underlying vector. Maybe we could do that -- have a `getSortedSections` method that returns an empty vector until we actually sort things. That would mean that the comparator can work on an actual vector & take single elements instead of `std::pair`s. Just my 2c, might be overcomplicating things here
lld/MachO/SyntheticSections.h
43 ↗	(On Diff #258916)	should this field live in the parent class? edit: Oh I see, OutputSections don't define `segname` because that's accessed through `parent->name`, so only InputSections and synthetic sections need to define segnames. How about defining a `SyntheticSection` superclass that defines this field in its ctor, and have every class in this file inherit from it?
lld/MachO/Writer.cpp
309–310	Oh I see... looks like the problem with my initial sorting scheme is that it assumed that the segments were only ever created once all the sections were sorted. But this didn't account for __LINKEDIT, though I was lucky enough to create it after the sorting, so things worked out... but explicitly sorting both the segments and sections is more reliable. I think we can simplify the `order` method a bit though -- it's currently written to support comparing sections across different segments, and we're not using that functionality any more. Can we have two separate `order` methods, one for segments and the other for sections in the same segment?
476	Thanks, I think it's clearer this way
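
The `using` alias and `std::any_of` suggestions for OutputSegment.h could be combined roughly as follows. This is a sketch under assumptions: a `std::vector` of pairs stands in for the `MapVector`, `segmentIsNeeded` is a hypothetical name, and the loop being replaced is assumed to ask whether any section in the segment is needed.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the output section class in the diff.
struct OutputSection {
  bool needed = true;
  bool isNeeded() const { return needed; }
};

// `using` aliases instead of typedefs; the vector of pairs is an assumption
// standing in for the MapVector of section name -> section.
using SectionMapEntry = std::pair<std::string, OutputSection *>;
using SectionMap = std::vector<SectionMapEntry>;

// Hypothetical helper: a segment is needed if any of its sections is needed.
bool segmentIsNeeded(const SectionMap &sections) {
  return std::any_of(sections.begin(), sections.end(),
                     [](const SectionMapEntry &entry) {
                       // Unpack the pair into a variable with a named type.
                       const OutputSection *osec = entry.second;
                       return osec->isNeeded();
                     });
}
```

Spelling out `SectionMapEntry` in the lambda keeps the element type visible without `auto`, which matches the preference for explicit types voiced in the review.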

int3 added inline comments.Apr 21 2020, 8:08 PM

lld/MachO/OutputSegment.h
43	also, I think it would make sense if we moved `createOutputSection` from Writer.cpp to a method on this class, then this helper method can be made private

Ktwu marked 7 inline comments as done.Apr 21 2020, 10:22 PM

Ktwu added inline comments.

lld/MachO/OutputSegment.h
44	Yes to section sorting living here, nay to the `takeVector` idea (at least in this diff).
lld/MachO/SyntheticSections.h
43 ↗	(On Diff #258916)	Yup, a base SyntheticSection class sounds ++
lld/MachO/Writer.cpp
309–310	I was thinking about that, too, so I'll try it.

Ktwu marked 3 inline comments as done.Apr 22 2020, 1:02 PM

Ktwu added inline comments.

lld/MachO/MergedOutputSection.cpp
20 ↗	(On Diff #258916)	Good point; if I was rigorous, I think I'd prefer omitting it, too.
lld/MachO/OutputSection.cpp
26–29	Since output segments just contain output sections now, they're not aware of whether they contain synthetic sections or mergeable sections. I'm not sure how to cast that away (I guess a dynamic cast at runtime, but I figured this would be cheaper).
lld/MachO/OutputSection.h
21	I hadn't looked too closely tbh.

Move comparator logic within output segment / section
Refactored creation of synthetic classes to use a base class / add to segments within constructor
removed instances of "this"

Harbormaster failed remote builds in B54297: Diff 259376!Apr 22 2020, 2:09 PM

The change looks good, but it needs a bunch more testing, e.g:

Various flag merging scenarios (I recognize some of that is in flux right now, but e.g. the pure instructions merging)
Checking for merging text and data sections as well as cstrings, including checking the merging of the contents as well as the symbols
Perhaps something for the section sorting, since you're changing that (though perhaps the existing tests for that already provide enough coverage)

lld/MachO/ExportTrie.h
27 ↗	(On Diff #259376)	@Ktwu, if these changes were incorporated in D76977, would you not need to depend on that? That one's a meatier review than this one, so it'd be easier to get this in first if possible.
lld/MachO/MergedOutputSection.cpp
33 ↗	(On Diff #259376)	Super nit: `isecAddr` is a bit more understandable IMO.
36 ↗	(On Diff #259376)	Super nit: `InputSection` loop variables are usually `isec`
51 ↗	(On Diff #259376)	Could you elaborate on what might be wrong in this comment?
65 ↗	(On Diff #259376)	segment -> section
66 ↗	(On Diff #259376)	If I'm understanding this correctly, it'll end up negating the pure bit when you don't want it to. If both `inputFlags` and `flags` have the bit set, the result of the AND for that bit will be 1, so it'll be 0 in your mask, so it'll get unset.
lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #259376)	Could you add a class comment about what this represents? I'd prefer renaming `OutputSection` to `OutputSectionBase` and `MergedOutputSection` to just `OutputSection`, to be more in line with ELF, but I don't feel super strongly about that.
lld/MachO/OutputSection.h
21	`MergeInputSection` is for the ELF SHF_MERGE concept, where a linker can merge the objects inside a section ... it's used for strings, for example (where the linker can perform string merging and tail merging). I think ld64 just treats certain section types as mergeable instead of having a special flag for that.
22	Can you add a class comment?
27–28	Why do these need to be virtual?
42	Is this needed for anything other than MergedOutputSection?
lld/MachO/OutputSegment.cpp
36–41	If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of `addOutputSection` and make it a public member variable.
38	`auto &i`, and same comment about unpacking `i.second`
lld/MachO/OutputSegment.h
32	This one needs to be addressed.
62	Given that this might be the common case, would it make sense to cache the `find` somehow?
lld/MachO/SyntheticSections.h
165–169 ↗	(On Diff #259376)	I believe the `in` is short for "globally accessible input sections". LLD ELF has a corresponding struct called `Out` for output sections (in OutputSections.h), and given that our synthetic sections are output sections now, it might make sense to adopt that name.
lld/MachO/Writer.cpp
402–415	Nit: might be nicer to assign these to variables instead of chaining.
466	Nit: use an explicit type instead of auto
lld/test/MachO/section-merge.s
4	It'd be nice to check that the contents of the sections are merged correctly as well. Also, we should be checking that text segments are merged correctly, since that's the most common case.
32–33	We aren't defining these symbols in this file, so the `.global` directives aren't doing anything.

@smeenai's comments aside, it looks pretty good to me :)

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #259376)	+1 on the renaming
lld/MachO/OutputSection.h
77–78	"which stores them in a MapVector of section name -> section" seems clearer. I think it's worth pointing out we have a MapVector since it doesn't make sense to sort any other kind of map
79–80	nit: take by const ref
84	I think `operator<` typically returns a `bool`
lld/MachO/SyntheticSections.cpp
34	ultra nit: how about "Synthetic sections always know which segment they belong to, so hook them up when they're made"? "No need to orphan" seems a bit weird because it hints that one might naively want to orphan them but doesn't indicate why

Ktwu marked 11 inline comments as done.Apr 24 2020, 4:31 PM

Ktwu added inline comments.

lld/MachO/ExportTrie.h
27 ↗	(On Diff #259376)	As discussed in the meeting, section merging depends on being able to have more than one symbol at a time for testing. The export trie is a hard dependency if we want to test this properly :/
lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #259376)	Sure.

Ktwu added inline comments.Apr 24 2020, 4:31 PM

lld/MachO/MergedOutputSection.cpp
33 ↗	(On Diff #259376)	Sure
51 ↗	(On Diff #259376)	I'll add this to the comment, but in short, there's no research that's gone into validating what the merge behavior for flags like this ought to be (it's also why there are no tests for flag merging).
65 ↗	(On Diff #259376)	Oops thanks.
66 ↗	(On Diff #259376)	Ahhh nice catch (and ideally one that a test will confirm). I believe what I want is: `uint32_t pureMask = ~MachO::S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);`
lld/MachO/OutputSection.h
27–28	ah, they don't, it's refactoring cruft
42	No, it's not needed except for MergedOutputSection. Should I dynamically cast each output section before trying to merge an input section into it instead? (I wanted to avoid the runtime hit of doing that.)
lld/MachO/OutputSegment.cpp
36–41	I believe I hit issues trying to cache this on adding an OutputSection; I think the GotSection made things difficult since its isNeeded() attribute is dynamic.
lld/MachO/OutputSegment.h
62	Good idea!
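
The corrected pure-instructions mask from the MergedOutputSection.cpp line 66 thread can be checked with a small sketch. The `mergeFlags` signature here is hypothetical, and ORing the remaining attribute bits together is an assumption (the review itself notes that the intended merge behavior for flags has not been researched). The two constants use the standard Mach-O values.

```cpp
#include <cstdint>

// Mach-O section attribute bits (values as in the Mach-O loader headers).
constexpr uint32_t S_ATTR_PURE_INSTRUCTIONS = 0x80000000u;
constexpr uint32_t S_ATTR_SOME_INSTRUCTIONS = 0x00000400u;

// Hypothetical free-function version of the merge step. The pure bit survives
// only if both the output section and the incoming input section have it set.
uint32_t mergeFlags(uint32_t flags, uint32_t inputFlags) {
  // All bits except the pure bit are always kept; the pure bit is kept only
  // when it is set in both operands.
  uint32_t pureMask = ~S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);
  // Assumption: every other attribute bit is simply unioned.
  uint32_t merged = flags | inputFlags;
  return merged & pureMask;
}
```

With this mask, merging two pure sections keeps the bit set, and merging a pure section with a non-pure one clears it, which is the case the earlier mask got backwards.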

smeenai added inline comments.Apr 24 2020, 5:16 PM

lld/MachO/OutputSection.h
42	Yeah, dynamically casting wouldn't be ideal. We could leave this as-is, or make `getOrCreateOutputSection` return a `MergedOutputSection *` instead of an `OutputSection *` (idk if that'd cause other complications).

Ktwu marked 9 inline comments as done.Apr 27 2020, 3:16 PM

Ktwu added inline comments.

lld/MachO/OutputSection.h
42	I believe that would require having an assert that the output class being returned, if already created, is in fact a mergeable section. I think a static_cast would be what we'd want, but I like having the explicit assert here in case something goes wrong instead of the undefined behavior of a static_cast gone wrong.
lld/MachO/Writer.cpp
402–415	I personally prefer chaining :D
466	What's up with `auto`? It's not like it's forbidden from use in the style guide: https://llvm.org/docs/CodingStandards.html#id27 I'm curious about folks' reasoning behind its usage (or discouragement of).
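
The assert-then-`static_cast` option weighed in the OutputSection.h line 42 thread might look roughly like this. The `Kind` enum, `classof`, and `asMerged` are hypothetical names written for this sketch, loosely modeled on LLVM-style kind tags; they are not taken from the diff.

```cpp
#include <cassert>

// Simplified base class with a kind tag (an assumption for this sketch).
struct OutputSection {
  enum Kind { MergedKind, SyntheticKind };
  explicit OutputSection(Kind k) : kind(k) {}
  virtual ~OutputSection() = default;
  Kind kind;
};

struct MergedOutputSection : OutputSection {
  MergedOutputSection() : OutputSection(MergedKind) {}
  static bool classof(const OutputSection *osec) {
    return osec->kind == MergedKind;
  }
};

struct SyntheticSection : OutputSection {
  SyntheticSection() : OutputSection(SyntheticKind) {}
};

// Hypothetical helper: verify the dynamic type explicitly, then downcast.
// Without the assert, a wrong static_cast would be undefined behavior.
MergedOutputSection *asMerged(OutputSection *osec) {
  assert(MergedOutputSection::classof(osec) &&
         "expected a mergeable output section");
  return static_cast<MergedOutputSection *>(osec);
}
```

The check only runs in builds with assertions enabled, where it costs about as much as the dynamic cast it is meant to avoid.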

Comments. More significantly, added more checks to verify __text merging.

Harbormaster failed remote builds in B54890: Diff 260486!Apr 27 2020, 4:44 PM

smeenai added inline comments.Apr 28 2020, 12:23 PM

lld/MachO/OutputSection.h
42	Ah. The assert would require an `isa`, which is basically the same overhead as a `dyn_cast`, so perhaps just leaving it as-is is best for now.
lld/MachO/Writer.cpp
402–415	Haha, fair enough.
466	Good question. LLD has its own style guidelines (e.g. variables are lowerCamelCase). Unfortunately, I don't think those are codified anywhere, but I'm basing this on what I've seen in reviews. `auto` is discouraged unless the actual type is spelled out in the same expression somewhere (e.g. as a result of a cast) or is a huge pain to type out (e.g. iterators). Otherwise explicit types are preferred. I personally like that because I don't use an IDE (and I haven't set up any ctags or LSP-like things for my editor), so having explicit types available makes it easier for me to comprehend the code. It's a little less clear-cut in cases like `auto linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);` where it'd be pretty fair to assume that `getOrCreateOutputSegment` returns an `OutputSegment *`. It's a bit ambiguous whether it'd be a pointer, reference, or copy, but you could use `auto *` to disambiguate that. Nevertheless, it's not too much more typing to just spell the name out, so I'd prefer to err on the side of explicitness there. There was a post to the mailing list about auto usage in LLVM a while back, but there was no clear resolution: http://llvm.1065342.n5.nabble.com/llvm-dev-RFC-Modernizing-our-use-of-auto-td123947.html (the authors on that thread are bogus; if you want the original authors, look for that subject on http://lists.llvm.org/pipermail/llvm-dev/2018-November/ and http://lists.llvm.org/pipermail/llvm-dev/2018-December/)

smeenai added inline comments.Apr 28 2020, 5:02 PM

lld/MachO/MergedOutputSection.cpp
39 ↗	(On Diff #260486)	@int3 is changing this to take the section's alignment into account in D79050, so this should follow suit.
47 ↗	(On Diff #260486)	Same here.
lld/MachO/Writer.cpp
430	Would it make sense to have each `OutputSection` store its `fileOff` (in addition to or instead of the `OutputSegment` holding it), so that we don't need to recompute it in the writeSections loop below?
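
Taking alignment into account when laying out input sections inside an output section could be sketched as below. Everything here is an assumption for illustration: `alignTo` is a local helper written for this example (LLVM provides a similarly named utility), and the `InputSection` fields and `assignOffsets` are hypothetical simplifications, not the code in D79050.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local helper for this sketch: round `value` up to a multiple of `align`.
uint64_t alignTo(uint64_t value, uint64_t align) {
  assert(align != 0 && "alignment must be nonzero");
  return (value + align - 1) / align * align;
}

// Hypothetical, simplified input section.
struct InputSection {
  uint64_t size = 0;
  uint64_t align = 1;
  uint64_t outSecOff = 0; // filled in by assignOffsets
};

// Place each input section at the next offset that satisfies its alignment
// and return the total size of the merged output section.
uint64_t assignOffsets(std::vector<InputSection> &inputs) {
  uint64_t size = 0;
  for (InputSection &isec : inputs) {
    size = alignTo(size, isec.align);
    isec.outSecOff = size;
    size += isec.size;
  }
  return size;
}
```

For three inputs of sizes 3, 8, and 2 with alignments 1, 8, and 4, this assigns offsets 0, 8, and 16 and a total size of 18.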

There's also the comment in D79050 about setting the alignment in the SyntheticSection base class once we have that.

I'm assuming the diff hasn't been updated with the OutputSection -> OutputSectionBase and MergedOutputSection -> OutputSection renaming yet (assuming you're good with that).

lld/MachO/MergedOutputSection.cpp
66 ↗	(On Diff #259376)	Yup, looks good.
lld/MachO/SyntheticSections.h
165–169 ↗	(On Diff #259376)	This one still needs addressing, though I'm okay with doing it in a follow-up if you'd prefer.
lld/test/MachO/section-merge.s
46	Might be easier to make sense of this as assembly (`llvm-objdump -d`)

Ktwu marked 6 inline comments as done and an inline comment as not done.Apr 29 2020, 10:10 PM

Ktwu added inline comments.

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #259376)	Er, actually, I like MergedOutputSection because it conveys more information about what kind of OutputSection it is. Given how many of the other classes used don't use -Base in their name -- InputSections, OutputSegments -- it feels clunky to have an OutputSectionBase. @int3 do you feel strongly about renaming this?
lld/test/MachO/section-merge.s
46	Is this a big deal? I compared this output to ld; I don't care about the content so much as making sure it matches what ld outputs.

Rebased, aligned file offsets, and tested with UBSAN.

clang-format

smeenai added inline comments.Apr 29 2020, 10:49 PM

lld/test/MachO/section-merge.s
46	I think it makes the test a lot more intelligible and easy to verify. Right now, for me, this is just a blob of bytes. If it were written out as instructions, I could verify that it's the instructions in `_some_function` followed by the instructions in `_main` (as it should be). In general, matching ld64's output is important, but we also want our tests to work well standalone. The cstring check below is great because it's easy to tell at a glance that all the strings from the various input files are being combined together (as they should be); using the disassembly will let us do the same for the text section.

smeenai added inline comments.Apr 29 2020, 10:53 PM

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #259376)	Sure, I'm good with the parent class being named `OutputSection` and having specific subclasses representing the different types of output sections. I'm not the biggest fan of the `MergedOutputSection` name because to me it suggests output sections being merged together rather than input sections being merged into a single output section, but I can't think of anything better either, so I'm good with it if we can't come up with a better name. Naming is hard :D

Harbormaster failed remote builds in B55252: Diff 261131!Apr 29 2020, 11:13 PM

Harbormaster failed remote builds in B55251: Diff 261130!

int3 added inline comments.Apr 30 2020, 3:53 AM

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #259376)	I prefer the rename because it more closely parallels what lld-ELF has. Moreover ELF's MergeInputSection is quite different, so it's almost like a false parallel... Re conveying information, my mental model has been that OutputSections are by default mergeable, and only the special SyntheticSections aren't, so I don't feel that there's a need to explicitly call out the mergeability. `-Base` being a clunky, non-functionally-descriptive suffix is a fair point though...

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section __DATA_CONST,__got? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a __TEXT,__mach_header section, and treats it as distinct from its own hidden synthesized section with that name.

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #261131)	The comment looks great! Super nit: LLD always formats these types of comments using single-line comments (`//`) instead of `/* */`, so we should follow suit here.
19 ↗	(On Diff #259376)	To be fair, I don't think LLD ELF has an OutputSectionBase. It does have an InputSectionBase though. The hierarchy is also different cos ELF synthetic sections are input sections (so they can be manipulated by linker scripts), whereas ours are output sections, so it's a bit hard to compare.
lld/MachO/OutputSection.cpp
23	We should only hit this if we have a programming error on our end, right? If so, this should be `llvm_unreachable` instead of `error`.
lld/MachO/OutputSection.h
22	Comment looks great! Same nit about using single-line comments.
lld/MachO/OutputSegment.cpp
36–41	Ah, makes sense.
47	Should we assert that `sections[os->name]` doesn't already exist?
51	FWIW, `auto` is fine for iterators, but this is fine too.
lld/MachO/OutputSegment.h
62	This one isn't addressed, but given that we shouldn't have too many segments (besides the ones already in the map, I can only think of `__DATA`), perhaps this is okay as-is?
lld/MachO/SyntheticSections.cpp
36	Super nit: use a member initialization list instead.
69	@int3 how come these strings are just here directly vs. all the other ones being named constants? Also, not this diff, but is `__DATA_CONST` correct? ld64 puts this in `__DATA` instead as far as I can see. It's not quite constant cos the dynamic linker's gonna fill it in, though idk if it has the equivalent to ELF's RELRO.
lld/test/MachO/section-merge.s
16	We only need to check the properties we care about. Detailed symbol table checking should happen in the symbol table tests. Over here, I think we just care about the symbol name, section, and value, so we can drop the checks for the other fields (Extern, Type, RefType, Flags). We should also be checking for the text section symbols.

int3 added inline comments.Apr 30 2020, 7:21 PM

lld/test/MachO/section-merge.s
16	+1 for terser checks. We can also use `llvm-objdump --syms` here -- its output is much more compact and suitable for when we're not checking all the properties

int3 added inline comments.Apr 30 2020, 7:23 PM

lld/MachO/SyntheticSections.cpp
69	I don't think we're currently referencing this from anywhere else (in particular I didn't give them an order in the sorting comparator), so it wasn't technically necessary, though we could definitely make them constants for the sake of uniformity

Ktwu marked 11 inline comments as done.Apr 30 2020, 11:45 PM

Ktwu added inline comments.

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #261131)	D:
lld/MachO/OutputSection.cpp
23	Yup, since we're assuming that synthetic sections cannot be merged, this shouldn't happen.
lld/MachO/OutputSegment.h
62	Ah, no, I legit forgot to address this, so I don't mind getting to it...
lld/MachO/SyntheticSections.cpp
69	I'll make 'em a constant for now; can we deal with DATA vs DATA_CONST in another diff if need be?
lld/test/MachO/section-merge.s
16	llvm-objdump --syms is pretty nice, so I'll use that here instead.
46	Fair enough!

initialization list for SyntheticSection
caching position for "default" sections
objdump instead of readobj in test
constants for __DATA_CONST
llvm_unreachable

Harbormaster failed remote builds in B55413: Diff 261433!Apr 30 2020, 11:47 PM

int3 added inline comments.May 1 2020, 12:50 AM

lld/MachO/SyntheticSections.cpp
69	Oh sorry I missed the 2nd part of the comment about the segment. I'm pretty sure it's in `__DATA_CONST` at least on Catalina; just tried it out. But yeah we can deal with it in another diff if necessary

int3 marked an inline comment as done.May 1 2020, 2:58 AM

int3 added inline comments.

lld/MachO/MergedOutputSection.h
19 ↗	(On Diff #261131)	Not sure if that emoji was in reaction to my naming nit, but please feel free to stick with what you have. I was just giving my 2c, but it's your diff :)

smeenai added inline comments.May 1 2020, 1:54 PM

lld/MachO/SyntheticSections.cpp
69	Ah, sorry, I was just wondering why these weren't constants. I didn't mean to imply that you had to do it in this diff, but thanks for taking care of it :) @int3 interesting, this appears to be a Catalina vs older OS thing. Possibly related to dyld3?

Looks great! @int3, any other comments?

lld/test/MachO/section-merge.s
12	This seems to be a leftover from testing :)

This revision is now accepted and ready to land.May 1 2020, 1:56 PM

Yeah this lgtm, let's ship it (after rebasing). Pretty sure D78168: [lld-macho][rfc] Have Symbol::getVA() return a non-relative virtual address causes a rebase conflict, not sure if there's anything else...

int3 accepted this revision.May 1 2020, 2:19 PM

Remove testing cruft, oops

In D77893#2014371, @smeenai wrote:

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section __DATA_CONST,__got? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a __TEXT,__mach_header section, and treats it as distinct from its own hidden synthesized section with that name.

For a large internal binary, I confirmed that none of the input section names clashed with our synthetic section names, so I think it's pretty safe to just error out if we run into that.

int3 edited parent revisions, added: D78269: [lld-macho] Support X86_64_RELOC_BRANCH; removed: D76977: [lld-macho] Implement basic export trie.May 1 2020, 5:06 PM

Closed by commit rG6cb073133c56: [lld] Merge Mach-O input sections (authored by Ktwu, committed by int3).May 1 2020, 5:31 PM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B55498: Diff 261554!May 1 2020, 8:01 PM

Noticed a couple of minor things while rebasing on top of this. I've folded changes for them into D78270: [lld-macho] Support calls to functions in dylibs.

lld/MachO/OutputSection.h
41	IMO this should just be `return true`. Whether a section is hidden is orthogonal from whether it is needed: hidden sections will never have a header regardless of whether they have a body. (I know we override this method with `return false` for synthetic sections, but regardless I think it's confusing to write it this way for non-synthetic sections.)
lld/MachO/Writer.cpp
118–119	I think this should stay as `getSections().empty()`, and the check for `isNeeded()` should be moved outside writeTo(). We should just not create LCSegment commands for unneeded segments. The `empty()` check however is still needed because `__LINKEDIT` can be empty.
451–452	we should avoid writing unneeded output sections (My stacked diff makes that assumption for one of the stub helper synthetic section)
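
The distinction drawn above between hidden and needed sections can be illustrated with a minimal sketch. The subclass names and the `countWritten` helper are hypothetical; only `isNeeded()` and the notion of hidden sections come from the review.

```cpp
#include <vector>

struct OutputSection {
  virtual ~OutputSection() = default;
  // Whether the section should be written at all; regular sections always are.
  virtual bool isNeeded() const { return true; }
  // Whether the section is left out of the section headers.
  virtual bool isHidden() const { return false; }
};

// Hypothetical synthetic section that has a body but never a header.
struct HiddenSyntheticSection : OutputSection {
  bool isHidden() const override { return true; }
};

// Hypothetical synthetic section that turned out to be unnecessary.
struct UnneededSyntheticSection : OutputSection {
  bool isNeeded() const override { return false; }
};

struct Counts {
  int headers;
  int bodies;
};

// Unneeded sections are skipped entirely; hidden sections are written but
// contribute no header. The two properties are checked independently.
Counts countWritten(const std::vector<const OutputSection *> &sections) {
  Counts counts{0, 0};
  for (const OutputSection *osec : sections) {
    if (!osec->isNeeded())
      continue;
    ++counts.bodies;
    if (!osec->isHidden())
      ++counts.headers;
  }
  return counts;
}
```

Keeping the default `isNeeded()` as a plain `return true` reflects the point that hiddenness only affects headers and should not feed into whether a section is needed.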

In D77893#2021478, @int3 wrote:

Noticed a couple of minor things while rebasing on top of this. I've folded changes for them into D78270: [lld-macho] Support calls to functions in dylibs.

Ideally they'd be separated into their own small changes so that it's easier to review them.

If it's changes you feel pretty confident about, it's fine to just commit them directly (for post-commit review). If it's changes where you want input, putting them up for review separately makes it easier to give focused feedback.

Fair enough, I was being lazy... I'd made those changes while fixing rebase conflicts + getting tests to pass, so it was convenient that way. I'll try to untangle them

Alright, put up D79460

int3 added inline comments.May 6 2020, 2:53 AM

lld/MachO/OutputSegment.h
44	I was looking at the implementation of `MapVector` today and I realized that the sort only operates on the vector and not the map, so the container exhibits some questionable behavior after sorting :D

    MapVector<int, int> mv;
    mv[2] = 98;
    mv[1] = 99;
    std::sort(mv.begin(), mv.end());
    for (int i = 1; i <= 2; ++i)
      fprintf(stderr, "%d %d\n", i, mv[i]);

This prints

    1 98
    2 99

So I think we should really use `takeVector` before sorting. Two more refactoring ideas to tack on to that:

We could filter out the unneeded OutputSections at this stage too, so we don't have to worry about checking isNeeded() afterward.

Maybe we could consider not creating the OutputSegments till after the sorting/filtering has been done, so we don't have the OutputSegments in a state where some operations aren't valid. But that would probably mean an additional `outputSections` global, so there's some tradeoff there. Up to you
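
The stale-index behavior described above can be reproduced without LLVM using a miniature container. `MiniMapVector` is a hypothetical stand-in written for this sketch (an index map plus a vector of pairs); it only mirrors the general idea of `MapVector`, not its actual implementation.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// Hypothetical miniature of the MapVector idea: a key -> index map alongside
// a vector of (key, value) pairs kept in insertion order.
template <typename K, typename V> class MiniMapVector {
  std::map<K, unsigned> index;
  std::vector<std::pair<K, V>> entries;

public:
  V &operator[](const K &key) {
    auto result = index.emplace(key, 0u);
    if (result.second) {
      // New key: remember where its pair lives in the vector.
      result.first->second = static_cast<unsigned>(entries.size());
      entries.push_back(std::make_pair(key, V()));
    }
    return entries[result.first->second].second;
  }
  typename std::vector<std::pair<K, V>>::iterator begin() {
    return entries.begin();
  }
  typename std::vector<std::pair<K, V>>::iterator end() {
    return entries.end();
  }
  // Hand the vector to the caller and drop the now-meaningless index map.
  std::vector<std::pair<K, V>> takeVector() {
    std::vector<std::pair<K, V>> out = std::move(entries);
    entries.clear();
    index.clear();
    return out;
  }
};

// Sorting in place reorders the vector but leaves the index map untouched,
// so lookups by key land on the wrong element afterwards.
std::vector<int> lookupsAfterInPlaceSort() {
  MiniMapVector<int, int> mv;
  mv[2] = 98;
  mv[1] = 99;
  std::sort(mv.begin(), mv.end());
  return {mv[1], mv[2]};
}

// Taking the vector first avoids the inconsistency: only the vector remains.
std::vector<std::pair<int, int>> sortedViaTakeVector() {
  MiniMapVector<int, int> mv;
  mv[2] = 98;
  mv[1] = 99;
  std::vector<std::pair<int, int>> sorted = mv.takeVector();
  std::sort(sorted.begin(), sorted.end());
  return sorted;
}
```

In this sketch `lookupsAfterInPlaceSort()` yields 98 for key 1 and 99 for key 2, matching the swapped output reported in the comment, while `sortedViaTakeVector()` returns the pairs in key order with their original values.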

Ktwu removed a child revision: D78342: [lld] Add archive file support to Mach-O backend.May 7 2020, 11:48 AM

Ktwu added a child revision: D78342: [lld] Add archive file support to Mach-O backend.May 8 2020, 6:14 PM

int3 mentioned this in rGdb157d27337f: [lld-macho] Follow-up to D77893.May 9 2020, 9:16 PM

smeenai mentioned this in D87199: [lld-macho] Implement support for PIC.Sep 23 2020, 11:15 PM

Diff 258182

lld/MachO/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS Options.td)			set(LLVM_TARGET_DEFINITIONS Options.td)
	tablegen(LLVM Options.inc -gen-opt-parser-defs)			tablegen(LLVM Options.inc -gen-opt-parser-defs)
	add_public_tablegen_target(MachOOptionsTableGen)			add_public_tablegen_target(MachOOptionsTableGen)

	add_lld_library(lldMachO2			add_lld_library(lldMachO2
	Arch/X86_64.cpp			Arch/X86_64.cpp
	Driver.cpp			Driver.cpp
	ExportTrie.cpp			ExportTrie.cpp
	InputFiles.cpp			InputFiles.cpp
	InputSection.cpp			InputSection.cpp
				OutputSection.cpp
	OutputSegment.cpp			OutputSegment.cpp
	SymbolTable.cpp			SymbolTable.cpp
	Symbols.cpp			Symbols.cpp
	SyntheticSections.cpp			SyntheticSections.cpp
	Target.cpp			Target.cpp
	Writer.cpp			Writer.cpp

	LINK_COMPONENTS			LINK_COMPONENTS
	Show All 15 Lines

lld/MachO/Driver.cpp

	//===- Driver.cpp ---------------------------------------------------------===//			//===- Driver.cpp ---------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "Driver.h"			#include "Driver.h"
	#include "Config.h"			#include "Config.h"
	#include "InputFiles.h"			#include "InputFiles.h"
				#include "OutputSection.h"
	#include "OutputSegment.h"			#include "OutputSegment.h"
	#include "SymbolTable.h"			#include "SymbolTable.h"
	#include "Symbols.h"			#include "Symbols.h"
	#include "Target.h"			#include "Target.h"
	#include "Writer.h"			#include "Writer.h"

	#include "lld/Common/Args.h"			#include "lld/Common/Args.h"
	#include "lld/Common/Driver.h"			#include "lld/Common/Driver.h"
	▲ Show 20 Lines • Show All 154 Lines • Show Last 20 Lines

lld/MachO/InputFiles.cpp

	Show All 37 Lines
	//			//
	// Without the above differences, I think you can use your knowledge about ELF			// Without the above differences, I think you can use your knowledge about ELF
	// and COFF for Mach-O.			// and COFF for Mach-O.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "InputFiles.h"			#include "InputFiles.h"
	#include "InputSection.h"			#include "InputSection.h"
	#include "OutputSegment.h"			#include "OutputSection.h"
	#include "SymbolTable.h"			#include "SymbolTable.h"
	#include "Symbols.h"			#include "Symbols.h"
	#include "Target.h"			#include "Target.h"

	#include "lld/Common/ErrorHandler.h"			#include "lld/Common/ErrorHandler.h"
	#include "lld/Common/Memory.h"			#include "lld/Common/Memory.h"
	#include "llvm/BinaryFormat/MachO.h"			#include "llvm/BinaryFormat/MachO.h"
	#include "llvm/Support/Endian.h"			#include "llvm/Support/Endian.h"
	▲ Show 20 Lines • Show All 191 Lines • Show Last 20 Lines

lld/MachO/InputSection.h

	Show All 13 Lines
	#include "llvm/ADT/PointerUnion.h"			#include "llvm/ADT/PointerUnion.h"
	#include "llvm/BinaryFormat/MachO.h"			#include "llvm/BinaryFormat/MachO.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	class InputFile;			class InputFile;
	class InputSection;			class InputSection;
	class OutputSegment;			class OutputSection;
	class Symbol;			class Symbol;

	struct Reloc {			struct Reloc {
	uint8_t type;			uint8_t type;
	uint32_t addend;			uint32_t addend;
	uint32_t offset;			uint32_t offset;
	llvm::PointerUnion<Symbol *, InputSection *> target;			llvm::PointerUnion<Symbol *, InputSection *> target;
	};			};

	class InputSection {			class InputSection {
	public:			public:
	virtual ~InputSection() = default;			virtual ~InputSection() = default;
	virtual size_t getSize() const { return data.size(); }			virtual size_t getSize() const { return data.size(); }
	virtual uint64_t getFileSize() const { return getSize(); }			virtual uint64_t getFileSize() const { return getSize(); }
	uint64_t getFileOffset() const;			uint64_t getFileOffset() const;
				uint64_t getVA() const;
				smeenai (Done): I think `getVA` would be more in line with the LLD naming for this concept.

	// Don't emit section_64 headers for hidden sections.			// Don't emit section_64 headers for hidden sections.
	virtual bool isHidden() const { return false; }			virtual bool isHidden() const { return false; }
	// Unneeded sections are omitted entirely (header and body).			// Unneeded sections are omitted entirely (header and body).
	virtual bool isNeeded() const { return true; }			virtual bool isNeeded() const { return true; }
	virtual void writeTo(uint8_t *buf);			virtual void writeTo(uint8_t *buf);

	InputFile *file = nullptr;			InputFile *file = nullptr;
	OutputSegment *parent = nullptr;
	StringRef name;			StringRef name;
	StringRef segname;			StringRef segname;

	ArrayRef<uint8_t> data;			OutputSection *parent = nullptr;
				uint64_t outSecOff = 0;
				smeenai (Done): Similarly, since this is the same notion as LLD ELF's `outSecOff`, I'd probably stick with that name, just so it's easy to map concepts between the two.

	// TODO these properties ought to live in an OutputSection class.
	// Move them once available.
	uint64_t addr = 0;
	uint32_t align = 1;			uint32_t align = 1;
				smeenai (Not Done): LLD ELF has an `outSecOff` in its InputSections, which tracks the offset of this particular section within its output section. I think that'd be better than storing the absolute address in the InputSection.
				Ktwu (Author, Done): Interesting, I'll look into that.
	uint32_t sectionIndex = 0;
	uint32_t flags = 0;			uint32_t flags = 0;

				ArrayRef<uint8_t> data;
	std::vector<Reloc> relocs;			std::vector<Reloc> relocs;
	};			};

	extern std::vector<InputSection *> inputSections;			extern std::vector<InputSection *> inputSections;

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

	#endif			#endif

lld/MachO/InputSection.cpp

	Show All 16 Lines
	using namespace llvm::MachO;			using namespace llvm::MachO;
	using namespace llvm::support;			using namespace llvm::support;
	using namespace lld;			using namespace lld;
	using namespace lld::macho;			using namespace lld::macho;

	std::vector<InputSection *> macho::inputSections;			std::vector<InputSection *> macho::inputSections;

	uint64_t InputSection::getFileOffset() const {			uint64_t InputSection::getFileOffset() const {
	return parent->fileOff + addr - parent->firstSection()->addr;			return parent->getFileOffset() + outSecOff;
	}			}

				uint64_t InputSection::getVA() const { return parent->addr + outSecOff; }

	void InputSection::writeTo(uint8_t *buf) {			void InputSection::writeTo(uint8_t *buf) {
	memcpy(buf, data.data(), data.size());			memcpy(buf, data.data(), data.size());

	for (Reloc &r : relocs) {			for (Reloc &r : relocs) {
	uint64_t va = 0;			uint64_t va = 0;
	if (auto s = r.target.dyn_cast<Symbol *>()) {			if (auto s = r.target.dyn_cast<Symbol *>()) {
	if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s)) {			if (auto *dylibSymbol = dyn_cast<DylibSymbol>(s)) {
	va = in.got->addr - ImageBase + dylibSymbol->gotIndex * WordSize;			va = in.got->getVA() - ImageBase + dylibSymbol->gotIndex * WordSize;
	} else {			} else {
	va = s->getVA();			va = s->getVA();
	}			}
	} else if (auto isec = r.target.dyn_cast<InputSection *>())			} else if (auto isec = r.target.dyn_cast<InputSection *>())
	va = isec->addr;			va = isec->getVA();
	else			else
	llvm_unreachable("Unknown relocation target");			llvm_unreachable("Unknown relocation target");

	uint64_t val = va + r.addend;			uint64_t val = va + r.addend;
	if (1) // TODO: handle non-pcrel relocations			if (1) // TODO: handle non-pcrel relocations
	val -= addr - ImageBase + r.offset;			val -= getVA() - ImageBase + r.offset;
	target->relocateOne(buf + r.offset, r.type, val);			target->relocateOne(buf + r.offset, r.type, val);
	}			}
	}			}
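The new `getVA()` and `getFileOffset()` bodies above both compose a parent base with `outSecOff`. A minimal sketch of that layering follows, assuming hypothetical trimmed-down types (`Segment`, `OutSec`, `InSec`) that are not the classes from this diff; only the arithmetic mirrors the code shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OutputSegment: where the segment starts in the
// file, and the address of its first section (taken as the segment's start).
struct Segment {
  uint64_t fileOff = 0;
  uint64_t firstSectionAddr = 0;
};

// Hypothetical stand-in for OutputSection.
struct OutSec {
  const Segment *parent = nullptr;
  uint64_t addr = 0;

  // File offset = segment file offset + section's offset within the segment.
  uint64_t getFileOffset() const {
    assert(parent && addr >= parent->firstSectionAddr);
    return parent->fileOff + (addr - parent->firstSectionAddr);
  }
};

// Hypothetical stand-in for InputSection: it stores only its offset within
// the output section rather than an absolute address.
struct InSec {
  const OutSec *parent = nullptr;
  uint64_t outSecOff = 0;

  uint64_t getVA() const { return parent->addr + outSecOff; }
  uint64_t getFileOffset() const {
    return parent->getFileOffset() + outSecOff;
  }
};
```

Storing a relative offset means that moving an output section only requires updating its own `addr`; the input sections' addresses follow from it.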

lld/MachO/OutputSection.h

This file was added.

				//===- OutputSection.h ------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLD_MACHO_OUTPUT_SECTION_H
				#define LLD_MACHO_OUTPUT_SECTION_H

				#include "InputSection.h"
				#include "lld/Common/LLVM.h"
				#include "llvm/ADT/MapVector.h"

				namespace lld {
				namespace macho {

				class InputSection;
				class OutputSegment;

				int3 (Not Done): How does this class hierarchy compare to that in the ELF implementation? I know they have `SectionBase`, `InputSectionBase`, and `MergeInputSection`... was planning to dig into it eventually but curious if you have looked
				Ktwu (Author, Done): I hadn't looked too closely tbh.
				smeenai (Not Done): `MergeInputSection` is for the ELF SHF_MERGE concept, where a linker can merge the objects inside a section ... it's used for strings, for example (where the linker can perform string merging and tail merging). I think ld64 just treats certain section types as mergeable instead of having a special flag for that.
				class OutputSection {
				smeenai (Not Done): Can you add a class comment?
				smeenai (Not Done): Comment looks great! Same nit about using single-line comments.
				public:
				const InputSection *firstSection() const { return inputs.front(); }
				const InputSection *lastSection() const { return inputs.back(); }

				// These accessors will only be valid after finalizing the section
				uint64_t getFileOffset() const;
				smeenai (Not Done): Why do these need to be virtual?
				Ktwu (Author, Done): ah, they don't, it's refactoring cruft
				size_t getSize() const { return size; }
				size_t getFileSize() const { return fileSize; }
				bool isHidden() const { return hidden; }

				void addInput(InputSection *input);
				void finalize();

				void writeTo(uint8_t *buf) const;

				StringRef name;
				OutputSegment *parent = nullptr;
				std::vector<InputSection *> inputs;

				int3 (Done): IMO this should just be `return true`. Whether a section is hidden is orthogonal from whether it is needed: hidden sections will never have a header regardless of whether they have a body. (I know we override this method with `return false` for synthetic sections, but regardless I think it's confusing to write it this way for non-synthetic sections.)
				uint32_t index = 0;
				smeenai (Not Done): Is this needed for anything other than MergedOutputSection?
				Ktwu (Author, Done): No, it's not needed except for MergedOutputSection. Should I dynamically cast each output section before trying to merge an input section into it instead? (I wanted to avoid the runtime hit of doing that.)
				smeenai (Not Done): Yeah, dynamically casting wouldn't be ideal. We could leave this as-is, or make `getOrCreateOutputSection` return a `MergedOutputSection *` instead of an `OutputSection *` (idk if that'd cause other complications).
				Ktwu (Author, Done): I believe that would require having an assert that the output class being returned, if already created, is in fact a mergeable section. I think a static_cast would be what we'd want, but I like having the explicit assert here in case something goes wrong instead of the undefined behavior of a static_cast gone wrong.
				smeenai (Not Done): Ah. The assert would require an `isa`, which is basically the same overhead as a `dyn_cast`, so perhaps just leaving it as-is is best for now.

				uint64_t addr = 0;
				uint32_t align = 1;
				uint32_t flags = 0;

				private:
				void mergeFlags(uint32_t inputFlags);

				bool hidden = false;
				size_t size = 0;
				size_t fileSize = 0;
				};

				} // namespace macho
				} // namespace lld

				#endif
				int3 (Done): nit: take by const ref
				int3 (Done): I think `operator<` typically returns a `bool`
				int3 (Done): "which stores them in a MapVector of section name -> section" seems clearer. I think it's worth pointing out we have a MapVector since it doesn't make sense to sort any other kind of map
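The header above declares `finalize()` and notes that the size accessors are only valid after it runs. A minimal sketch of what such a layout pass might compute follows, assuming hypothetical names (`Piece`, `alignUp`, `layout`) and power-of-two alignments. Unlike the loop in OutputSection.cpp in this diff, the sketch advances the cursor from the aligned position, so alignment padding is counted in the total size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an InputSection: only what layout needs.
struct Piece {
  uint64_t size;
  uint32_t align;
  uint64_t outSecOff; // filled in by layout()
};

// Rounds value up to the next multiple of align.
// Assumption: align is a non-zero power of two.
inline uint64_t alignUp(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Assigns each piece an offset relative to base and returns the total size
// of the output section, including any alignment padding between pieces.
inline uint64_t layout(uint64_t base, std::vector<Piece> &pieces) {
  uint64_t cursor = base;
  for (Piece &p : pieces) {
    cursor = alignUp(cursor, p.align); // skip padding first
    p.outSecOff = cursor - base;       // offset within the output section
    cursor += p.size;                  // then reserve the piece's bytes
  }
  return cursor - base;
}
```

For example, pieces of sizes 3, 8 and 4 with alignments 1, 8 and 4, laid out from base 0x1000, receive offsets 0, 8 and 16, for a total size of 20 bytes.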

lld/MachO/OutputSection.cpp

This file was added.

				//===- OutputSection.cpp --------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "OutputSection.h"
				#include "OutputSegment.h"
				#include "lld/Common/ErrorHandler.h"
				#include "lld/Common/Memory.h"
				#include "llvm/BinaryFormat/MachO.h"

				using namespace llvm;
				using namespace llvm::MachO;
				using namespace lld;
				using namespace lld::macho;

				uint64_t OutputSection::getFileOffset() const {
				return parent->fileOff + addr - parent->firstSection()->addr;
				smeenai (Not Done): What's this computation doing?
				Ktwu (Author, Done): It's copying what InputSection did to calculate its file offset. I didn't entirely comprehend this math tbh.
				smeenai (Not Done): Okay, this makes sense.
				int3 (Not Done): The parent segment's start address is defined as the address of the first section it contains. So `addr - parent->firstSection()->addr` computes the section's offset within the segment. Maybe we don't need this any more if we have `outSecOff` (going to investigate)
				int3 (Not Done): Oh, outSecOff is for the InputSection's offset within the OutputSection, but this computation is for the OutputSection's offset within its segment. So having `outSecOff` doesn't impact this. That said the ELF implementation seems to have an explicit `OutputSection::offset` field, though I'm not sure what populates it... but I think this scheme of computing the offset from the address is fine for now
				}

				smeenai (Done): We should only hit this if we have a programming error on our end, right? If so, this should be `llvm_unreachable` instead of `error`.
				Ktwu (Author, Done): Yup, since we're assuming that synthetic sections cannot be merged, this shouldn't happen.
				void OutputSection::addInput(InputSection *input) {
				if (this->inputs.empty()) {
				this->align = input->align;
				this->flags = input->flags;
				} else {
				this->mergeFlags(input->flags);
				int3 (Not Done): can we just define this method on MergedOutputSection?
				Ktwu (Author, Done): Since output segments just contain output sections now, they're not aware of whether they contain synthetic sections or mergeable sections. I'm not sure how to cast that away (I guess a dynamic cast at runtime, but I figured this would be cheaper).
				this->align = std::max(this->align, input->align);
				}
				smeenai (Not Done): Can you add more details to this error message? It'd be ideal to have things like the section in question, the object file it's coming from, etc. Also, is this what ld64 does?
				Ktwu (Author, Done): Sure. So far as I can tell, ld64 doesn't do section merging on a flag level like I first thought; now that I'm diving into it, OutputFile.cpp separates flags into individual boolean attributes. It's not the easiest codebase to navigate :/

				// TODO: reconsider how hidden inputs are merged (or figure out
				smeenai (Not Done): We should experiment with how ld64 handles merging hidden sections, or if that's even a thing it does. (I don't think you can specify a section is hidden yourself; ld64 just has a list of atoms it defines to be hidden.)
				// if they need merging at all)
				this->hidden |= input->isHidden();
				smeenai (Done): Can you add a TODO for figuring out how we should handle input sections with conflicting hidden-ness?
				int3 (Not Done): Personally I don't think we should support hidden InputSections until we find a use case (I'm not aware of one so far)
				int3 (Not Done): I.e. I think the synthetic sections could become OutputSections. So the only InputSections would actually be from real inputs, which will never be hidden
				Ktwu (Author, Done): Yeah I could try that!
				this->inputs.push_back(input);
				input->parent = this;
				}

				void OutputSection::finalize() {
				uint64_t addr = this->addr;
				this->fileSize = 0;

				for (InputSection *i : inputs) {
				i->outSecOff = alignTo(addr, i->align) - this->addr;
				addr += i->getSize();
				this->fileSize += i->getFileSize();
				}
				this->size = addr - this->addr;
				}

				void OutputSection::writeTo(uint8_t *buf) const {
				for (InputSection *i : inputs) {
				i->writeTo(buf);
				buf += i->getFileSize();
				}
				}

				// TODO: this is most likely wrong; reconsider how section flags
				// are actually merged.
				int3 (Not Done): hidden sections should still be written, only their headers get omitted. Also I think we may be able to have the isHidden() property on (synthetic) OutputSections and not on InputSections
				void OutputSection::mergeFlags(uint32_t inputFlags) {
				uint8_t sectionFlag = MachO::SECTION_TYPE & inputFlags;
				if (sectionFlag != (MachO::SECTION_TYPE & this->flags))
				error("Cannot add merge section; inconsistent type flags " +
				Twine(sectionFlag));

				uint32_t inconsistentFlags =
				MachO::S_ATTR_DEBUG | MachO::S_ATTR_STRIP_STATIC_SYMS |
				MachO::S_ATTR_NO_DEAD_STRIP | MachO::S_ATTR_LIVE_SUPPORT;
				if ((inputFlags ^ this->flags) & inconsistentFlags)
				error("Cannot add merge section; cannot merge inconsistent flags");

				// Negate pure instruction presence if any segment isn't pure.
				uint32_t pureMask =
				~(MachO::S_ATTR_PURE_INSTRUCTIONS & inputFlags & this->flags);

				// Merge the rest
				this->flags |= inputFlags;
				this->flags &= pureMask;
				}
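The TODO above `mergeFlags` already says the merging is most likely wrong. As written in this diff, `pureMask` clears `S_ATTR_PURE_INSTRUCTIONS` exactly when both sides have it set, which appears to be the opposite of what the "Negate pure instruction presence if any segment isn't pure" comment describes. The following is a minimal sketch of one plausible reading of that comment, where the bit survives only if both sides have it. The function name `mergeSectionFlags` is hypothetical, the constants are local copies of the Mach-O values, and this is not claimed to match ld64 or the committed lld behavior.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the Mach-O constants referenced in the diff.
constexpr uint32_t kSectionType = 0x000000ffu;      // SECTION_TYPE
constexpr uint32_t kPureInstructions = 0x80000000u; // S_ATTR_PURE_INSTRUCTIONS

// Hypothetical helper: merges inFlags into outFlags.
// Returns false, leaving outFlags untouched, if the section types differ.
inline bool mergeSectionFlags(uint32_t &outFlags, uint32_t inFlags) {
  if ((outFlags & kSectionType) != (inFlags & kSectionType))
    return false;

  // The pure-instructions bit is kept only when both sides have it set.
  uint32_t bothPure = outFlags & inFlags & kPureInstructions;

  // Union all other attribute bits, then apply the computed pure bit.
  outFlags |= inFlags;
  outFlags = (outFlags & ~kPureInstructions) | bothPure;

  assert((outFlags & kSectionType) == (inFlags & kSectionType));
  return true;
}
```

Merging a pure section with another pure section keeps the attribute, while merging it with a non-pure section drops it. The other consistency checks from the diff (debug, dead-strip, and so on) are left out to keep the sketch focused.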

lld/MachO/OutputSegment.h

	//===- OutputSegment.h ------------------------------------------*- C++ -*-===//			//===- OutputSegment.h ------------------------------------------*- C++ -*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLD_MACHO_OUTPUT_SEGMENT_H			#ifndef LLD_MACHO_OUTPUT_SEGMENT_H
	#define LLD_MACHO_OUTPUT_SEGMENT_H			#define LLD_MACHO_OUTPUT_SEGMENT_H

				#include "OutputSection.h"
	#include "lld/Common/LLVM.h"			#include "lld/Common/LLVM.h"
	#include "llvm/ADT/MapVector.h"			#include "llvm/ADT/MapVector.h"

	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	namespace segment_names {			namespace segment_names {

	constexpr const char *text = "__TEXT";			constexpr const char *text = "__TEXT";
	constexpr const char *pageZero = "__PAGEZERO";			constexpr const char *pageZero = "__PAGEZERO";
	constexpr const char *linkEdit = "__LINKEDIT";			constexpr const char *linkEdit = "__LINKEDIT";

	} // namespace segment_names			} // namespace segment_names

				class OutputSection;
	class InputSection;			class InputSection;

	class OutputSegment {			class OutputSegment {
	public:			public:
	InputSection *firstSection() const { return sections.front().second.at(0); }			typedef llvm::MapVector<StringRef, OutputSection *> SectionMap;
				int3 (Done): nit: I think `using SectionMap = ...` is the more modern C++ way (and is the method favored by lld-ELF/COFF)
				smeenai (Done): This one needs to be addressed.

	InputSection *lastSection() const { return sections.back().second.back(); }			const OutputSection *firstSection() const { return sections.front().second; }

				const OutputSection *lastSection() const { return sections.back().second; }

	bool isNeeded() const {			bool isNeeded() const {
	return !sections.empty() || name == segment_names::linkEdit;			return !sections.empty() || name == segment_names::linkEdit;
	}			}

				int3 (Done): @ruiu will object to the `auto` :) I personally hate typing out an `std::pair` type, but we should at least unpack `i.second` into a var with a named type. nit 2: use `auto &` here. Alternatively we could replace the loop with `std::any_of`
	void addSection(InputSection *);			OutputSection *addSection(InputSection *);

				int3 (Done): `addOutputSection` seems like a more fitting name
				int3 (Done): also, I think it would make sense if we moved `createOutputSection` from Writer.cpp to a method on this class, then this helper method can be made private
	const llvm::MapVector<StringRef, std::vector<InputSection *>> &			const SectionMap &getSections() const { return sections; }
				int3 (Not Done): if `sections` isn't private any more, it doesn't need an accessor. edit: I see that it's non-private only because we need to sort it. How about making the sorting a method on this class? Related thought... I see that MapVector has a `takeVector` method that clears out the map and returns the underlying vector. Maybe we could do that -- have a `getSortedSections` method that returns an empty vector until we actually sort things. That would mean that the comparator can work on an actual vector & take single elements instead of `std::pair`s. Just my 2c, might be overcomplicating things here
				Ktwu (Author, Done): Yes to section sorting living here, nay to the `takeVector` idea (at least in this diff).
				int3 (Not Done): I was looking at the implementation of `MapVector` today and I realized that the sort only operates on the vector and not the map, so the container exhibits some questionable behavior after sorting :D MapVector<int, int> mv; mv[2] = 98; mv[1] = 99; std::sort(mv.begin(), mv.end()); for (int i = 1; i <= 2; ++i) fprintf(stderr, "%d %d\n", i, mv[i]); This prints "1 98" and "2 99". So I think we should really use `takeVector` before sorting. Two more refactoring ideas to tack on to that: We could filter out the unneeded OutputSections at this stage too, so we don't have to worry about checking isNeeded() afterward. Maybe we could consider not creating the OutputSegments till after the sorting/filtering has been done, so we don't have the OutputSegments in a state where some operations aren't valid. But that would probably mean an additional `outputSections` global, so there's some tradeoff there. Up to you
	getSections() const {
	return sections;
	}

				uint32_t numNonHiddenSections = 0;
	uint64_t fileOff = 0;			uint64_t fileOff = 0;
	StringRef name;			StringRef name;
	uint32_t numNonHiddenSections = 0;
	uint32_t maxProt = 0;			uint32_t maxProt = 0;
	uint32_t initProt = 0;			uint32_t initProt = 0;
	uint8_t index;			uint8_t index;
				SectionMap sections;
	private:
	llvm::MapVector<StringRef, std::vector<InputSection *>> sections;
	};			};

	extern std::vector<OutputSegment *> outputSegments;			extern std::vector<OutputSegment *> outputSegments;

	OutputSegment *getOutputSegment(StringRef name);			OutputSegment *getOutputSegment(StringRef name);
	OutputSegment *getOrCreateOutputSegment(StringRef name);			OutputSegment *getOrCreateOutputSegment(StringRef name);

	} // namespace macho			} // namespace macho
	} // namespace lld			} // namespace lld

				smeenai (Not Done): Given that this might be the common case, would it make sense to cache the `find` somehow?
				Ktwu (Author, Done): Good idea!
				smeenai (Not Done): This one isn't addressed, but given that we shouldn't have too many segments (besides the ones already in the map, I can only think of `__DATA`), perhaps this is okay as-is?
				Ktwu (Author, Done): Ah, no, I legit forgot to address this, so I don't mind getting to it...
	#endif			#endif

lld/MachO/OutputSegment.cpp

	Show All 27 Lines
	}			}

	static uint32_t maxProt(StringRef name) {			static uint32_t maxProt(StringRef name) {
	if (name == segment_names::pageZero)			if (name == segment_names::pageZero)
	return 0;			return 0;
	return VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;			return VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;
	}			}

	void OutputSegment::addSection(InputSection *isec) {			OutputSection *OutputSegment::addSection(InputSection *input) {
	isec->parent = this;			OutputSegment::SectionMap::iterator i = this->sections.find(input->name);
	std::vector<InputSection *> &vec = sections[isec->name];			if (i != this->sections.end()) {
				smeenai (Done): `auto &i`, and same comment about unpacking `i.second`
	if (vec.empty() && !isec->isHidden()) {			auto os = i->second;
	++numNonHiddenSections;			os->addInput(input);
				return os;
				smeenai (Not Done): I'm wondering if it'd be better to construct an OutputSection independently and then pass that into this function instead.
				smeenai (Not Done): If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of `addOutputSection` and make it a public member variable.
				Ktwu (Author, Done): I believe I hit issues trying to cache this on adding an OutputSection; I think the GotSection made things difficult since its isNeeded() attribute is dynamic.
				smeenai (Not Done): Ah, makes sense.
	}			}
	vec.push_back(isec);
				auto *os = make<OutputSection>();
				os->name = input->name;
				os->parent = this;
				os->addInput(input);
				smeenai (Done): Should we assert that `sections[os->name]` doesn't already exist?
				this->sections[os->name] = os;

				if (!os->isHidden()) {
				this->numNonHiddenSections++;
				smeenai (Done): FWIW, `auto` is fine for iterators, but this is fine too.
				}
				smeenai (Not Done): Is the check flipped?
				Ktwu (Author, Done): Oops, yes.

				return os;
	}			}
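The merge performed by the new `addSection` can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: `MiniInputSection`, `MiniOutputSection`, and `MiniOutputSegment` are hypothetical stand-ins rather than lld's classes, the `outSecOff` field follows the ELF-style suggestion made earlier in this review, and the alignment handling is an assumption about how a merged section might be sized.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the linker's section classes.
struct MiniInputSection {
  std::string name;
  uint64_t size = 0;
  uint32_t align = 1;     // power of two, in bytes
  uint64_t outSecOff = 0; // offset inside the merged output section
};

struct MiniOutputSection {
  std::string name;
  uint32_t align = 1; // maximum alignment over all inputs
  uint64_t size = 0;  // running size, including alignment padding
  std::vector<MiniInputSection *> inputs;

  void addInput(MiniInputSection *isec) {
    assert(isec->align != 0 && (isec->align & (isec->align - 1)) == 0);
    if (isec->align > align)
      align = isec->align;
    // Round the running size up to this input's alignment.
    size = (size + isec->align - 1) & ~uint64_t(isec->align - 1);
    isec->outSecOff = size;
    size += isec->size;
    inputs.push_back(isec);
  }
};

struct MiniOutputSegment {
  std::map<std::string, MiniOutputSection> sections;

  // Inputs that share a name land in the same output section.
  MiniOutputSection &addSection(MiniInputSection *isec) {
    MiniOutputSection &os = sections[isec->name]; // created on first use
    os.name = isec->name;
    os.addInput(isec);
    return os;
  }
};
```

Adding two `__cstring` inputs of 13 bytes (alignment 1) and 5 bytes (alignment 8) to one segment would yield a single output section of 21 bytes with alignment 8, with the second input placed at offset 16.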

	static llvm::DenseMap<StringRef, OutputSegment *> nameToOutputSegment;			static llvm::DenseMap<StringRef, OutputSegment *> nameToOutputSegment;
	std::vector<OutputSegment *> macho::outputSegments;			std::vector<OutputSegment *> macho::outputSegments;

	OutputSegment *macho::getOutputSegment(StringRef name) {			OutputSegment *macho::getOutputSegment(StringRef name) {
	return nameToOutputSegment.lookup(name);			return nameToOutputSegment.lookup(name);
	}			}
	Show All 14 Lines

lld/MachO/Symbols.h

Show First 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	public:
static bool classof(const Symbol *s) { return s->kind() == DylibKind; }		static bool classof(const Symbol *s) { return s->kind() == DylibKind; }

DylibFile *file;		DylibFile *file;
uint32_t gotIndex = UINT32_MAX;		uint32_t gotIndex = UINT32_MAX;
};		};

inline uint64_t Symbol::getVA() const {		inline uint64_t Symbol::getVA() const {
if (auto *d = dyn_cast<Defined>(this))		if (auto *d = dyn_cast<Defined>(this))
return d->isec->addr + d->value - ImageBase;		return d->isec->getVA() + d->value - ImageBase;
return 0;		return 0;
}		}

union SymbolUnion {		union SymbolUnion {
alignas(Defined) char a[sizeof(Defined)];		alignas(Defined) char a[sizeof(Defined)];
alignas(Undefined) char b[sizeof(Undefined)];		alignas(Undefined) char b[sizeof(Undefined)];
alignas(DylibSymbol) char c[sizeof(DylibSymbol)];		alignas(DylibSymbol) char c[sizeof(DylibSymbol)];
};		};
Show All 18 Lines

lld/MachO/SyntheticSections.cpp

	Show All 25 Lines
	namespace lld {			namespace lld {
	namespace macho {			namespace macho {

	MachHeaderSection::MachHeaderSection() {			MachHeaderSection::MachHeaderSection() {
	// dyld3's MachOLoaded::getSlide() assumes that the __TEXT segment starts			// dyld3's MachOLoaded::getSlide() assumes that the __TEXT segment starts
	// from the beginning of the file (i.e. the header).			// from the beginning of the file (i.e. the header).
	segname = segment_names::text;			segname = segment_names::text;
	name = section_names::header;			name = section_names::header;
	}			}
				int3 (Done): ultra nit: how about "Synthetic sections always know which segment they belong to, so hook them up when they're made"? "No need to orphan" seems a bit weird because it hints that one might naively want to orphan them but doesn't indicate why

	void MachHeaderSection::addLoadCommand(LoadCommand *lc) {			void MachHeaderSection::addLoadCommand(LoadCommand *lc) {
				smeenai (Done): Super nit: use a member initialization list instead.
	loadCommands.push_back(lc);			loadCommands.push_back(lc);
	sizeOfCmds += lc->getSize();			sizeOfCmds += lc->getSize();
	}			}

	size_t MachHeaderSection::getSize() const {			size_t MachHeaderSection::getSize() const {
	return sizeof(mach_header_64) + sizeOfCmds;			return sizeof(mach_header_64) + sizeOfCmds;
	}			}

	Show All 16 Lines

	PageZeroSection::PageZeroSection() {			PageZeroSection::PageZeroSection() {
	segname = segment_names::pageZero;			segname = segment_names::pageZero;
	name = section_names::pageZero;			name = section_names::pageZero;
	}			}

	GotSection::GotSection() {			GotSection::GotSection() {
	segname = "__DATA_CONST";			segname = "__DATA_CONST";
	name = "__got";			name = "__got";
				smeenai (Not Done): @int3 how come these strings are just here directly vs. all the other ones being named constants? Also, not this diff, but is `__DATA_CONST` correct? ld64 puts this in `__DATA` instead as far as I can see. It's not quite constant cos the dynamic linker's gonna fill it in, though idk if it has the equivalent to ELF's RELRO.
				int3 (Not Done): I don't think we're currently referencing this from anywhere else (in particular I didn't give them an order in the sorting comparator), so it wasn't technically necessary, though we could definitely make them constants for the sake of uniformity
				Ktwu (Author, Done): I'll make 'em a constant for now; can we deal with __DATA vs __DATA_CONST in another diff if need be?
				int3 (Not Done): Oh sorry I missed the 2nd part of the comment about the segment. I'm pretty sure it's in `__DATA_CONST` at least on Catalina; just tried it out. But yeah we can deal with it in another diff if necessary
				smeenai (Not Done): Ah, sorry, I was just wondering why these weren't constants. I didn't mean to imply that you had to do it in this diff, but thanks for taking care of it :) @int3 interesting, this appears to be a Catalina vs older OS thing. Possibly related to dyld3?
	align = 8;			align = 8;
	flags = S_NON_LAZY_SYMBOL_POINTERS;			flags = S_NON_LAZY_SYMBOL_POINTERS;

	// TODO: section_64::reserved1 should be an index into the indirect symbol			// TODO: section_64::reserved1 should be an index into the indirect symbol
	// table, which we do not currently emit			// table, which we do not currently emit
	}			}

	void GotSection::addEntry(DylibSymbol &sym) {			void GotSection::addEntry(DylibSymbol &sym) {
	Show All 24 Lines
	// subsequent opcodes only need to encode the differences between bindings.			// subsequent opcodes only need to encode the differences between bindings.
	void BindingSection::finalizeContents() {			void BindingSection::finalizeContents() {
	if (!isNeeded())			if (!isNeeded())
	return;			return;

	raw_svector_ostream os{contents};			raw_svector_ostream os{contents};
	os << static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |			os << static_cast<uint8_t>(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB |
	in.got->parent->index);			in.got->parent->index);
	encodeULEB128(in.got->addr - in.got->parent->firstSection()->addr, os);			encodeULEB128(in.got->outSecOff, os);
	for (const DylibSymbol *sym : in.got->getEntries()) {			for (const DylibSymbol *sym : in.got->getEntries()) {
	// TODO: Implement compact encoding -- we only need to encode the			// TODO: Implement compact encoding -- we only need to encode the
	// differences between consecutive symbol entries.			// differences between consecutive symbol entries.
	if (sym->file->ordinal <= BIND_IMMEDIATE_MASK) {			if (sym->file->ordinal <= BIND_IMMEDIATE_MASK) {
	os << static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |			os << static_cast<uint8_t>(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM |
	sym->file->ordinal);			sym->file->ordinal);
	} else {			} else {
	error("TODO: Support larger dylib symbol ordinals");			error("TODO: Support larger dylib symbol ordinals");
	▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	void SymtabSection::writeTo(uint8_t *buf) {			void SymtabSection::writeTo(uint8_t *buf) {
	auto *nList = reinterpret_cast<nlist_64 *>(buf);			auto *nList = reinterpret_cast<nlist_64 *>(buf);
	for (const SymtabEntry &entry : symbols) {			for (const SymtabEntry &entry : symbols) {
	nList->n_strx = entry.strx;			nList->n_strx = entry.strx;
	// TODO support other symbol types			// TODO support other symbol types
	// TODO populate n_desc			// TODO populate n_desc
	if (auto defined = dyn_cast<Defined>(entry.sym)) {			if (auto defined = dyn_cast<Defined>(entry.sym)) {
	nList->n_type = N_EXT | N_SECT;			nList->n_type = N_EXT | N_SECT;
	nList->n_sect = defined->isec->sectionIndex;			nList->n_sect = defined->isec->parent->index;
	// For the N_SECT symbol type, n_value is the address of the symbol			// For the N_SECT symbol type, n_value is the address of the symbol
	nList->n_value = defined->value + defined->isec->addr;			nList->n_value = defined->value + defined->isec->getVA();
	}			}
	++nList;			++nList;
	}			}
	}			}

	StringPoolSection::StringPoolSection() {			StringPoolSection::StringPoolSection() {
	segname = segment_names::linkEdit;			segname = segment_names::linkEdit;
	name = section_names::stringPool;			name = section_names::stringPool;
	Show All 21 Lines

lld/MachO/Writer.cpp

Show All 32 Lines
class LCSymtab;		class LCSymtab;

class Writer {		class Writer {
public:		public:
Writer() : buffer(errorHandler().outputBuffer) {}		Writer() : buffer(errorHandler().outputBuffer) {}

void scanRelocations();		void scanRelocations();
void createHiddenSections();		void createHiddenSections();
void sortSections();		void createOutputSections();
void createLoadCommands();		void createLoadCommands();
void assignAddresses(OutputSegment *);		void assignAddresses(OutputSegment *);
void createSymtabContents();		void createSymtabContents();

void openFile();		void openFile();
void writeSections();		void writeSections();

void run();		void run();
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	void writeTo(uint8_t *buf) const override {

c->cmd = LC_SEGMENT_64;		c->cmd = LC_SEGMENT_64;
c->cmdsize = getSize();		c->cmdsize = getSize();
memcpy(c->segname, name.data(), name.size());		memcpy(c->segname, name.data(), name.size());
c->fileoff = seg->fileOff;		c->fileoff = seg->fileOff;
c->maxprot = seg->maxProt;		c->maxprot = seg->maxProt;
c->initprot = seg->initProt;		c->initprot = seg->initProt;

if (seg->getSections().empty())		if (seg->getSections().empty())
return;		return;
		int3 (Done): I think this should stay as `getSections().empty()`, and the check for `isNeeded()` should be moved outside writeTo(). We should just not create LCSegment commands for unneeded segments. The `empty()` check however is still needed because `__LINKEDIT` can be empty.

c->vmaddr = seg->firstSection()->addr;		c->vmaddr = seg->firstSection()->addr;
c->vmsize =		c->vmsize =
seg->lastSection()->addr + seg->lastSection()->getSize() - c->vmaddr;		seg->lastSection()->addr + seg->lastSection()->getSize() - c->vmaddr;
c->nsects = seg->numNonHiddenSections;		c->nsects = seg->numNonHiddenSections;

for (auto &p : seg->getSections()) {		for (auto &p : seg->getSections()) {
StringRef s = p.first;		StringRef s = p.first;
ArrayRef<InputSection *> sections = p.second;		OutputSection *section = p.second;
for (InputSection *isec : sections)		c->filesize += section->getFileSize();
c->filesize += isec->getFileSize();		if (section->isHidden())
if (sections[0]->isHidden())
continue;		continue;

auto *sectHdr = reinterpret_cast<section_64 *>(buf);		auto *sectHdr = reinterpret_cast<section_64 *>(buf);
buf += sizeof(section_64);		buf += sizeof(section_64);

memcpy(sectHdr->sectname, s.data(), s.size());		memcpy(sectHdr->sectname, s.data(), s.size());
memcpy(sectHdr->segname, name.data(), name.size());		memcpy(sectHdr->segname, name.data(), name.size());

sectHdr->addr = sections[0]->addr;		sectHdr->addr = section->addr;
sectHdr->offset = sections[0]->getFileOffset();		sectHdr->offset = section->getFileOffset();
sectHdr->align = sections[0]->align;		sectHdr->align = Log2_32(section->align);
uint32_t maxAlign = 0;		sectHdr->flags = section->flags;
for (const InputSection *section : sections)		sectHdr->size = section->getSize();
maxAlign = std::max(maxAlign, section->align);
sectHdr->align = Log2_32(maxAlign);
sectHdr->flags = sections[0]->flags;
sectHdr->size = sections.back()->addr + sections.back()->getSize() -
sections[0]->addr;
}		}
}		}

private:		private:
StringRef name;		StringRef name;
OutputSegment *seg;		OutputSegment *seg;
};		};

▲ Show 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	std::pair<uint32_t, uint32_t> order(const InputSection *isec) {
OrderInfo &info = it->second;		OrderInfo &info = it->second;
auto sectIt = info.sectionOrdering.find(isec->name);		auto sectIt = info.sectionOrdering.find(isec->name);
if (sectIt != info.sectionOrdering.end())		if (sectIt != info.sectionOrdering.end())
return {info.segmentOrder, sectIt->second};		return {info.segmentOrder, sectIt->second};
return {info.segmentOrder, info.sectionOrdering.size()};		return {info.segmentOrder, info.sectionOrdering.size()};
}		}

bool operator()(const InputSection *a, const InputSection *b) {		bool operator()(const InputSection *a, const InputSection *b) {
return order(a) < order(b);		return order(a) < order(b);
}		}
		int3 (Not Done): Oh I see... looks like the problem with my initial sorting scheme is that it assumed that the segments were only ever created once all the sections were sorted. But this didn't account for __LINKEDIT, though I was lucky enough to create it after the sorting, so things worked out... but explicitly sorting both the segments and sections is more reliable. I think we can simplify the `order` method a bit though -- it's currently written to support comparing sections across different segments, and we're not using that functionality any more. Can we have two separate `order` methods, one for segments and the other for sections in the same segment?
		Ktwu (Author, Done): I was thinking about that, too, so I'll try it.

private:		private:
const StringRef defaultPosition = StringRef();		const StringRef defaultPosition = StringRef();
DenseMap<StringRef, OrderInfo> orderMap;		DenseMap<StringRef, OrderInfo> orderMap;
};		};

} // namespace		} // namespace
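int3's suggestion above, to split ordering into one lookup for segments and another for sections within a segment, might look roughly like the sketch below. `OrderTable` and `sortByOrder` are hypothetical names invented for illustration; the idea is that one table ranks segment names and a separate table per segment ranks section names, so nothing ever compares sections across segments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: ranks names by their position in a preferred list.
struct OrderTable {
  std::unordered_map<std::string, size_t> rank;

  explicit OrderTable(const std::vector<std::string> &names) {
    for (size_t i = 0; i < names.size(); ++i)
      rank.emplace(names[i], i);
  }

  // Names missing from the list sort after every listed name.
  size_t order(const std::string &name) const {
    auto it = rank.find(name);
    return it != rank.end() ? it->second : rank.size();
  }
};

// Stable sort, so unlisted names keep their relative input order.
inline void sortByOrder(std::vector<std::string> &names,
                        const OrderTable &table) {
  std::stable_sort(names.begin(), names.end(),
                   [&](const std::string &a, const std::string &b) {
                     return table.order(a) < table.order(b);
                   });
}
```

With this shape, one `OrderTable` would be built for the segment names and one per segment for its section names, and each comparison only ever consults a single table.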

▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	void Writer::createHiddenSections() {
case MH_DYLIB:		case MH_DYLIB:
break;		break;
default:		default:
error("unhandled output file type");		error("unhandled output file type");
return;		return;
}		}
}		}

void Writer::sortSections() {		void Writer::createOutputSections() {
llvm::stable_sort(inputSections, SectionComparator());		llvm::stable_sort(inputSections, SectionComparator());

// TODO This is wrong; input sections ought to be grouped into		// Add input sections to output sections/segments.
// output sections, which are then organized like this.
uint32_t sectionIndex = 0;
// Add input sections to output segments.
for (InputSection *isec : inputSections) {		for (InputSection *isec : inputSections) {
if (isec->isNeeded()) {		if (isec->isNeeded()) {
if (!isec->isHidden())
isec->sectionIndex = ++sectionIndex;
getOrCreateOutputSegment(isec->segname)->addSection(isec);		getOrCreateOutputSegment(isec->segname)->addSection(isec);
}		}
}		}

		// Now that the input sections are sorted, assign the final
		// output section indices.
		uint32_t sectionIndex = 0;
		for (OutputSegment *seg : outputSegments) {
		for (auto &p : seg->getSections()) {
		OutputSection *section = p.second;
		if (!section->isHidden()) {
		section->index = ++sectionIndex;
		}
		int3 (Not Done): isHidden check should be retained here
		}
		}
		smeenai (Not Done): Nit: might be nicer to assign these to variables instead of chaining.
		Ktwu (Author, Done): I personally prefer chaining :D
		smeenai (Not Done): Haha, fair enough.
}		}
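The second loop in `createOutputSections()` numbers output sections after all inputs have been grouped. As a small standalone sketch (with a hypothetical `IndexedSection` type standing in for `OutputSection`), the numbering runs across all segments in order, starts at 1, and skips hidden sections, which keeps the indices usable as Mach-O `n_sect` values where 0 means "no section":

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an output section.
struct IndexedSection {
  bool hidden = false;
  uint32_t index = 0; // 0 means no index was assigned
};

// Assigns 1-based indices to non-hidden sections, walking segments in order.
// Returns the number of sections that received an index.
inline uint32_t
assignSectionIndices(std::vector<std::vector<IndexedSection>> &segments) {
  uint32_t sectionIndex = 0;
  for (std::vector<IndexedSection> &seg : segments)
    for (IndexedSection &sec : seg)
      if (!sec.hidden)
        sec.index = ++sectionIndex;
  return sectionIndex;
}
```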

void Writer::assignAddresses(OutputSegment *seg) {		void Writer::assignAddresses(OutputSegment *seg) {
addr = alignTo(addr, PageSize);		addr = alignTo(addr, PageSize);
fileOff = alignTo(fileOff, PageSize);		fileOff = alignTo(fileOff, PageSize);
seg->fileOff = fileOff;		seg->fileOff = fileOff;

for (auto &p : seg->getSections()) {		for (auto &p : seg->getSections()) {
ArrayRef<InputSection *> sections = p.second;		OutputSection *section = p.second;
for (InputSection *isec : sections) {		addr = alignTo(addr, section->align);
addr = alignTo(addr, isec->align);		section->addr = addr;
isec->addr = addr;		section->finalize();
addr += isec->getSize();
fileOff += isec->getFileSize();		addr += section->getSize();
}		fileOff += section->getFileSize();
		smeenai (Done): Would it make sense to have each `OutputSection` store its `fileOff` (in addition to or instead of the `OutputSegment` holding it), so that we don't need to recompute it in the writeSections loop below?
}		}
}		}
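smeenai's question about storing `fileOff` on each output section can be made concrete with a small layout sketch. Everything here is an assumption for illustration: `SketchSection` is a hypothetical type, a section's file size is taken to equal its in-memory size, and both the address and the file offset are rounded up to the section's alignment (the diff as shown only aligns `addr`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds value up to the next multiple of align (align must be a power of two).
inline uint64_t alignToPow2(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical output section that remembers where it was placed.
struct SketchSection {
  uint64_t size = 0;
  uint64_t align = 1;
  uint64_t addr = 0;
  uint64_t fileOff = 0;
};

// Lays sections out back to back and records both positions on each one,
// so a later write pass can use s.fileOff directly instead of recomputing it.
inline void assignAddresses(std::vector<SketchSection> &sections,
                            uint64_t &addr, uint64_t &fileOff) {
  for (SketchSection &s : sections) {
    addr = alignToPow2(addr, s.align);
    fileOff = alignToPow2(fileOff, s.align);
    s.addr = addr;
    s.fileOff = fileOff;
    addr += s.size;
    fileOff += s.size;
  }
}
```

A writer loop built on this would then reduce to writing each section at `buf + s.fileOff`.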

void Writer::openFile() {		void Writer::openFile() {
Expected<std::unique_ptr<FileOutputBuffer>> bufferOrErr =		Expected<std::unique_ptr<FileOutputBuffer>> bufferOrErr =
FileOutputBuffer::create(config->outputFile, fileOff,		FileOutputBuffer::create(config->outputFile, fileOff,
FileOutputBuffer::F_executable);		FileOutputBuffer::F_executable);

if (!bufferOrErr)		if (!bufferOrErr)
error("failed to open " + config->outputFile + ": " +		error("failed to open " + config->outputFile + ": " +
llvm::toString(bufferOrErr.takeError()));		llvm::toString(bufferOrErr.takeError()));
else		else
buffer = std::move(*bufferOrErr);		buffer = std::move(*bufferOrErr);
}		}

void Writer::writeSections() {		void Writer::writeSections() {
uint8_t *buf = buffer->getBufferStart();		uint8_t *buf = buffer->getBufferStart();
for (OutputSegment *seg : outputSegments) {		for (OutputSegment *seg : outputSegments) {
uint64_t fileOff = seg->fileOff;		uint64_t fileOff = seg->fileOff;
for (auto &sect : seg->getSections()) {		for (auto &p : seg->getSections()) {
for (InputSection *isec : sect.second) {		OutputSection *section = p.second;
isec->writeTo(buf + fileOff);		section->writeTo(buf + fileOff);
		int3 (Not Done): we should avoid writing unneeded output sections (My stacked diff makes that assumption for one of the stub helper synthetic section)
fileOff += isec->getFileSize();		fileOff += section->getFileSize();
}
}		}
}		}
}		}

void Writer::run() {		void Writer::run() {
scanRelocations();		scanRelocations();
createHiddenSections();		createHiddenSections();
// Sort and assign sections to their respective segments. No more sections can		// Sort and assign sections to their respective segments. No more sections can
// be created after this method runs.		// be created after this method runs.
sortSections();		createOutputSections();
// dyld requires __LINKEDIT segment to always exist (even if empty).		// dyld requires __LINKEDIT segment to always exist (even if empty).
getOrCreateOutputSegment(segment_names::linkEdit);		auto *linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);
// No more segments can be created after this method runs.		// No more segments can be created after this method runs.
		smeenai (Done): Nit: use an explicit type instead of auto
		Ktwu (Author, Done): What's up with `auto`? It's not like it's forbidden from use in the style guide: https://llvm.org/docs/CodingStandards.html#id27 I'm curious about folks' reasoning behind its usage (or discouragement of).
		smeenai (Not Done): Good question. LLD has its own style guidelines (e.g. variables are lowerCamelCase). Unfortunately, I don't think those are codified anywhere, but I'm basing this on what I've seen in reviews. `auto` is discouraged unless the actual type is spelled out in the same expression somewhere (e.g. as a result of a cast) or is a huge pain to type out (e.g. iterators). Otherwise explicit types are preferred. I personally like that because I don't use an IDE (and I haven't set up any ctags or LSP-like things for my editor), so having explicit types available makes it easier for me to comprehend the code. It's a little less clear-cut in cases like `auto *linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);` where it'd be pretty fair to assume that `getOrCreateOutputSegment` returns an `OutputSegment *`. It's a bit ambiguous whether it'd be a pointer, reference, or copy, but you could use `auto *` to disambiguate that. Nevertheless, it's not too much more typing to just spell the name out, so I'd prefer to err on the side of explicitness there. There was a post to the mailing list about auto usage in LLVM a while back, but there was no clear resolution: http://llvm.1065342.n5.nabble.com/llvm-dev-RFC-Modernizing-our-use-of-auto-td123947.html (the authors on that thread are bogus; if you want the original authors, look for that subject on http://lists.llvm.org/pipermail/llvm-dev/2018-November/ and http://lists.llvm.org/pipermail/llvm-dev/2018-December/)
createLoadCommands();		createLoadCommands();

// Ensure that segments (and the sections they contain) are allocated		// Ensure that segments (and the sections they contain) are allocated
// addresses in ascending order, which dyld requires.		// addresses in ascending order, which dyld requires.
//		//
// Note that at this point, __LINKEDIT sections are empty, but we need to		// Note that at this point, __LINKEDIT sections are empty, but we need to
// determine addresses of other segments/sections before generating its		// determine addresses of other segments/sections before generating its
// contents.		// contents.
for (OutputSegment *seg : outputSegments)		for (OutputSegment *seg : outputSegments)
assignAddresses(seg);		assignAddresses(seg);
		int3 (Not Done): Thanks, I think it's clearer this way

// Fill __LINKEDIT contents.		// Fill __LINKEDIT contents.
bindingSection->finalizeContents();		bindingSection->finalizeContents();
exportSection->finalizeContents();		exportSection->finalizeContents();
symtabSection->finalizeContents();		symtabSection->finalizeContents();

// Now that __LINKEDIT is filled out, do a proper calculation of its		// Now that __LINKEDIT is filled out, do a proper calculation of its
// addresses and offsets. We don't have to recalculate the other segments		// addresses and offsets. We don't have to recalculate the other segments
// since sortSections() ensures that __LINKEDIT is the last segment.		// since createOutputSections() ensures that __LINKEDIT is the last segment.
assignAddresses(getOutputSegment(segment_names::linkEdit));		assignAddresses(linkEditSegment);

openFile();		openFile();
if (errorCount())		if (errorCount())
return;		return;

writeSections();		writeSections();

if (auto e = buffer->commit())		if (auto e = buffer->commit())
error("failed to write to the output file: " + toString(std::move(e)));		error("failed to write to the output file: " + toString(std::move(e)));
}		}
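The `auto` convention smeenai describes in the thread inside `Writer::run()` can be summarized with a short sketch. It uses plain standard-library types and hypothetical names (`Segment`, `getOrCreateSegment`, `readValue`, `linkEditName`) rather than lld's, and it reflects the reviewer's reading of an uncodified convention, not an official rule.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical types used only to illustrate the convention.
struct Segment {
  std::string name;
};
struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {
  int value = 7;
};

std::map<std::string, Segment> segments;

Segment *getOrCreateSegment(const std::string &name) {
  // Iterator types are tedious to spell out, so `auto` is accepted here.
  auto it = segments.find(name);
  if (it == segments.end())
    it = segments.emplace(name, Segment{name}).first;
  return &it->second;
}

int readValue(Base *b) {
  // The type is spelled out in the cast on the same line, so `auto *` is fine.
  if (auto *d = dynamic_cast<Derived *>(b))
    return d->value;
  return -1;
}

const std::string &linkEditName() {
  // The return type is not visible at the call site, so write it explicitly.
  Segment *linkEdit = getOrCreateSegment("__LINKEDIT");
  return linkEdit->name;
}
```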

void macho::writeResult() { Writer().run(); }		void macho::writeResult() { Writer().run(); }

void macho::createSyntheticSections() {		void macho::createSyntheticSections() {
in.got = createInputSection<GotSection>();		in.got = createInputSection<GotSection>();
}		}

lld/test/MachO/section-merge.s

This file was added.

				# REQUIRES: x86
				# RUN: mkdir -p %t
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %p/Inputs/libhello.s \
				# RUN: -o %t/libhello.o
				smeenai (Not Done): It'd be nice to check that the contents of the sections are merged correctly as well. Also, we should be checking that text segments are merged correctly, since that's the most common case.
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %p/Inputs/libgoodbye.s \
				# RUN: -o %t/libgoodbye.o
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %s \
				# RUN: -o %t/main.o
				# RUN: lld -flavor darwinnew -o %t/output %t/libgoodbye.o %t/libhello.o %t/main.o
				# RUN: llvm-readobj -symbols %t/output | FileCheck %s

				# CHECK: Name: _goodbye_world
				smeenai (Not Done): This seems to be a leftover from testing :)
				# CHECK-NEXT: Extern
				# CHECK-NEXT: Type: Section (0xE)
				# CHECK-NEXT: Section: __cstring (0x2)
				# CHECK-NEXT: RefType:
				smeenai (Not Done): We only need to check the properties we care about. Detailed symbol table checking should happen in the symbol table tests. Over here, I think we just care about the symbol name, section, and value, so we can drop the checks for the other fields (Extern, Type, RefType, Flags). We should also be checking for the text section symbols.
				int3 (Not Done): +1 for terser checks. We can also use `llvm-objdump --syms` here -- its output is much more compact and suitable for when we're not checking all the properties
				Ktwu (Author, Done): llvm-objdump --syms is pretty nice, so I'll use that here instead.
				# CHECK-NEXT: Flags [ (0x0)
				# CHECK-NEXT: ]
				# CHECK-NEXT: Value: 0x[[#%X,BASE:]]

				# CHECK: Name: _hello_world
				# CHECK-NEXT: Extern
				# CHECK-NEXT: Type: Section (0xE)
				# CHECK-NEXT: Section: __cstring (0x2)
				# CHECK-NEXT: RefType:
				# CHECK-NEXT: Flags [ (0x0)
				# CHECK-NEXT: ]
				# CHECK-NEXT: Value: 0x[[#BASE + 0x10]]

				.section __TEXT,__text
				.global _goodbye_world
				.global _hello_world
				.global _main
				smeenai (Not Done): We aren't defining these symbols in this file, so the `.global` directives aren't doing anything.

				_main:
				mov $0, %rax
				ret
				smeenai (Not Done): Might be easier to make sense of this as assembly (`llvm-objdump -d`)
				Ktwu (Author, Done): Is this a big deal? I compared this output to ld; I don't care about the content so much as making sure it matches what ld outputs.
				smeenai (Not Done): I think it makes the test a lot more intelligible and easy to verify. Right now, for me, this is just a blob of bytes. If it were written out as instructions, I could verify that it's the instructions in `_some_function` followed by the instructions in `_main` (as it should be). In general, matching ld64's output is important, but we also want our tests to work well standalone. The cstring check below is great because it's easy to tell at a glance that all the strings from the various input files are being combined together (as they should be); using the disassembly will let us do the same for the text section.
				Ktwu (Author, Done): Fair enough!


Diff 258182
