This is an archive of the discontinued LLVM Phabricator instance.

[lld] Merge Mach-O input sections
ClosedPublic

Authored by Ktwu on Apr 10 2020, 1:07 PM.

Details

Summary

Similar to other formats, input sections in the MachO implementation are now grouped under output sections. This is primarily a refactor, although there's some new logic (like resolving the output section's flags based on its inputs).

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Looks good at a high level!

lld/MachO/InputSection.h
52–53

LLD ELF has an outSecOff in its InputSections, which tracks the offset of this particular section within its output section. I think that'd be better than storing the absolute address in the InputSection.

lld/MachO/OutputSection.cpp
21

What's this computation doing?

33

We should experiment with how ld64 handles merging hidden sections, or if that's even a thing it does. (I don't think you can specify a section is hidden yourself; ld64 just has a list of atoms it defines to be hidden.)

lld/MachO/OutputSegment.cpp
36–41

I'm wondering if it'd be better to construct an OutputSection independently and then pass that into this function instead.

51–52

Is the check flipped?

Ktwu marked 3 inline comments as done.Apr 14 2020, 4:32 PM
Ktwu added inline comments.
lld/MachO/InputSection.h
52–53

Interesting, I'll look into that.

lld/MachO/OutputSection.cpp
21

It's copying what InputSection did to calculate its file offset. I didn't entirely comprehend this math tbh.

lld/MachO/OutputSegment.cpp
51–52

Oops, yes.

Ktwu updated this revision to Diff 257574.Apr 14 2020, 5:39 PM

Passing most local tests now

Looking good! Needs tests once everything's working :)

lld/MachO/InputSection.h
38

I think getVA would be more in line with the LLD naming for this concept.

51

Similarly, since this is the same notion as LLD ELF's outSecOff, I'd probably stick with that name, just so it's easy to map concepts between the two.

lld/MachO/OutputSection.cpp
21

Okay, this makes sense.

31

Can you add more details to this error message? It'd be ideal to have things like the section in question, the object file it's coming from, etc.

Also, is this what ld64 does?

35

Can you add a TODO for figuring out how we should handle input sections with conflicting hidden-ness?

Ktwu marked 4 inline comments as done.Apr 16 2020, 11:42 AM
Ktwu added inline comments.
lld/MachO/OutputSection.cpp
31

Sure.

So far as I can tell, ld64 doesn't do section merging on a flag level like I first thought; now that I'm diving into it, OutputFile.cpp separates flags into individual boolean attributes. It's not the easiest codebase to navigate :/

Ktwu updated this revision to Diff 258118.Apr 16 2020, 11:53 AM

Added a test; unfortunately symbol export is blocking the text so I need to rebase

int3 added inline comments.Apr 16 2020, 12:25 PM
lld/MachO/OutputSection.cpp
21

The parent segment's start address is defined as the address of the first section it contains. So addr - parent->firstSection()->addr computes the section's offset within the segment. Maybe we don't need this any more if we have outSecOff (going to investigate)

35

Personally I don't think we should support hidden InputSections until we find a use case (I'm not aware of one so far)

int3 added inline comments.Apr 16 2020, 12:50 PM
lld/MachO/OutputSection.cpp
21

Oh, outSecOff is for the InputSection's offset within the OutputSection, but this computation is for the OutputSection's offset within its segment. So having outSecOff doesn't impact this. That said the ELF implementation seems to have an explicit OutputSection::offset field, though I'm not sure what populates it... but I think this scheme of computing the offset from the address is fine for now

Ktwu updated this revision to Diff 258182.Apr 16 2020, 2:53 PM

Rebased, various comments about renames, basic test

Ktwu retitled this revision from WIP [lld] Merge Mach-O input sections to [lld] Merge Mach-O input sections.Apr 16 2020, 3:02 PM
Ktwu edited the summary of this revision. (Show Details)
int3 added inline comments.Apr 16 2020, 3:10 PM
lld/MachO/OutputSection.cpp
35

I.e. I think the synthetic sections could become OutputSections. So the only InputSections would actually be from real inputs, which will never be hidden

Ktwu marked 2 inline comments as done.Apr 16 2020, 7:20 PM
Ktwu added inline comments.
lld/MachO/OutputSection.cpp
35

Yeah I could try that!

Ktwu updated this revision to Diff 258916.Apr 21 2020, 12:51 AM

Refactored to turn synthetic input sections into output sections. A new class represents merged input sections, the MergedOutputSection.

Ktwu updated this revision to Diff 259056.Apr 21 2020, 10:52 AM

Pruned unnecessary hidden checks, coalesced some functions, removed unused imports

Ktwu updated this revision to Diff 259061.Apr 21 2020, 11:03 AM

tweak comment

int3 added inline comments.Apr 21 2020, 7:56 PM
lld/MachO/MergedOutputSection.cpp
20 ↗(On Diff #258916)

bikeshed: Do we want to prefix all member accesses with this->? lld-ELF seems to use it inconsistently, lld-COFF in just a few places... I have a slight preference towards omitting it, but we should be consistent either way

lld/MachO/OutputSection.cpp
26–29

can we just define this method on MergedOutputSection?

lld/MachO/OutputSection.h
21

How does this class hierarchy compare to that in the ELF implementation? I know they have SectionBase, InputSectionBase, and MergeInputSection... was planning to dig into it eventually but curious if you have looked

lld/MachO/OutputSegment.h
32

nit: I think using SectionMap = ... is the more modern C++ way (and is the method favored by lld-ELF/COFF)

41

@ruiu will object to the auto :) I personally hate typing out an std::pair type, but we should at least unpack i.second into a var with a named type

nit 2: use auto & here

alternatively we could replace the loop with std::any_of

43

addOutputSection seems like a more fitting name

44

if sections isn't private any more, it doesn't need an accessor

edit: I see that it's non-private only because we need to sort it. How about making the sorting a method on this class?

Related thought... I see that MapVector has a takeVector method that clears out the map and returns the underlying vector. Maybe we could do that -- have a getSortedSections method that returns an empty vector until we actually sort things. That would mean that the comparator can work on an actual vector & take single elements instead of std::pairs. Just my 2c, might be overcomplicating things here

lld/MachO/SyntheticSections.h
43 ↗(On Diff #258916)

should this field live in the parent class?

edit: Oh I see, OutputSections don't define segname because that's accessed through parent->name, so only InputSections and synthetic sections need to define segnames. How about defining a SyntheticSection superclass that defines this field in its ctor, and have every class in this file inherit from it?

lld/MachO/Writer.cpp
309–310

Oh I see... looks like the problem with my initial sorting scheme is that it assumed that the segments were only ever created once all the sections were sorted. But this didn't account for __LINKEDIT, though I was lucky enough to create it after the sorting, so things worked out... but explicitly sorting both the segments and sections is more reliable. I think we can simplify the order method a bit though -- it's currently written to support comparing sections across different segments, and we're not using that functionality any more. Can we have two separate order methods, one for segments and the other for sections in the same segment?

476

Thanks, I think it's clearer this way

int3 added inline comments.Apr 21 2020, 8:08 PM
lld/MachO/OutputSegment.h
43

also, I think if would make sense if we moved createOutputSection from Writer.cpp to a method on this class, then this helper method can be made private

Ktwu marked 7 inline comments as done.Apr 21 2020, 10:22 PM
Ktwu added inline comments.
lld/MachO/OutputSegment.h
44

Yes to section sorting living here, nay to the takeVector idea (at least in this diff).

lld/MachO/SyntheticSections.h
43 ↗(On Diff #258916)

Yup, a base SyntheticSection class sounds ++

lld/MachO/Writer.cpp
309–310

I was thinking about that, too, so I'll try it.

Ktwu marked 3 inline comments as done.Apr 22 2020, 1:02 PM
Ktwu added inline comments.
lld/MachO/MergedOutputSection.cpp
20 ↗(On Diff #258916)

Good point; if I was rigorous, I think I'd prefer omitting it, too.

lld/MachO/OutputSection.cpp
26–29

Since output segments just contain output sections now, they're not aware of whether they contain synthetic sections or mergable sections. I'm not sure how to cast that away (I guess a dynamic cast at runtime, but I figured this would be cheaper).

lld/MachO/OutputSection.h
21

I hadn't looked too closely tbh.

Ktwu updated this revision to Diff 259376.Apr 22 2020, 1:10 PM
  • Move comparator logic within output segment / section
  • Refactored creation of synthetic classes to use a base class / add to segments within constructor
  • removed instances of "this"

The change looks good, but it needs a bunch more testing, e.g:

  • Various flag merging scenarios (I recognize some of that is in flux right now, but e.g. the pure instructions mergnig)
  • Checking for merging text and data sections as well as cstrings, including checking the merging of the contents as well as the symbols
  • Perhaps something for the section sorting, since you're changing that (though perhaps the existing tests for that already provide enough coverage)
lld/MachO/ExportTrie.h
27 ↗(On Diff #259376)

@Ktwu, if these changes were incorporated in D76977, would you not need to depend on that? That one's a meatier review than this one, so it'd be easier to get this in first if possible.

lld/MachO/MergedOutputSection.cpp
33 ↗(On Diff #259376)

Super nit: isecAddr is a bit more understandable IMO.

36 ↗(On Diff #259376)

Super nit: InputSection loop variables are usually isec

51 ↗(On Diff #259376)

Could you elaborate on what might be wrong in this comment?

65 ↗(On Diff #259376)

segment -> section

66 ↗(On Diff #259376)

If I'm understanding this correctly, it'll end up negating the pure bit when you don't want it to. If both inputFlags and flags have the bit set, the result of the AND for that bit will be 1, so it'll be 0 in your mask, so it'll get unset.

lld/MachO/MergedOutputSection.h
19 ↗(On Diff #259376)

Could you add a class comment about what this represents?

I'd prefer renaming OutputSection to OutputSectionBase and MergedOutputSection to just OutputSection, to be more in line with ELF, but I don't feel super strongly about that.

lld/MachO/OutputSection.h
21

MergeInputSection is for the ELF SHF_MERGE concept, where a linker can merge the objects inside a section ... it's used for strings, for example (where the linker can perform string merging and tail merging). I think ld64 just treats certain sections types as mergeable instead of having a special flags for that.

22

Can you add a class comment?

27–28

Why do these need to be virtual?

42

Is this needed for anything other than MergedOutputSection?

lld/MachO/OutputSegment.cpp
36–41

If this is gonna be called often, we should either do some sort of caching, or else just do the computation as part of addOutputSection and make it a public member variable.

38

auto &i, and same comment about unpacking i.second

lld/MachO/OutputSegment.h
32

This one needs to be addressed.

62

Given that this might be the common case, would it make sense to cache the find somehow?

lld/MachO/SyntheticSections.h
165–169 ↗(On Diff #259376)

I believe the in is short for "globally accessibly input sections". LLD ELF has a corresponding struct called Out for output sections (in OutputSections.h), and given that our synthetic sections are output sections now, it might make sense to adopt that name.

lld/MachO/Writer.cpp
402–415

Nit: might be nicer to assign these to variables instead of chaining.

466

Nit: use an explicit type instead of auto

lld/test/MachO/section-merge.s
4

It'd be nice to check that the contents of the sections are merged correctly as well. Also, we should be checking that text segments are merged correctly, since that's the most common case.

32–33

We aren't defining these symbols in this file, so the .global directives aren't doing anything.

int3 added a comment.Apr 24 2020, 3:16 PM

@smeenai's comments aside, it looks pretty good to me :)

lld/MachO/MergedOutputSection.h
19 ↗(On Diff #259376)

+1 on the renaming

lld/MachO/OutputSection.h
77–78

"which stores them in a MapVector of section name -> section" seems clearer. I think it's worth pointing out we have a MapVector since it doesn't make sense to sort any other kind of map

79–80

nit: take by const ref

84

I think operator< typically returns a bool

lld/MachO/SyntheticSections.cpp
34

ultra nit: how about "Synthetic sections always know which segment they belong to, so hook them up when they're made"? "No need to orphan" seems a bit weird because it hints that one might naively want to orphan them but doesn't indicate why

Ktwu marked 11 inline comments as done.Apr 24 2020, 4:31 PM
Ktwu added inline comments.
lld/MachO/ExportTrie.h
27 ↗(On Diff #259376)

As discussed in the meeting, section merging depends on being able to have more than one symbol at a time for testing. The export trie is a hard dependency if we want to test this properly :/

lld/MachO/MergedOutputSection.h
19 ↗(On Diff #259376)

Sure.

Ktwu added inline comments.Apr 24 2020, 4:31 PM
lld/MachO/MergedOutputSection.cpp
33 ↗(On Diff #259376)

Sure

51 ↗(On Diff #259376)

I'll add this to the comment, but in short, there's no research that's gone into validating what the merge behavior for flags like this ought to be (it's also why there are no tests for flag merging).

65 ↗(On Diff #259376)

Oops thanks.

66 ↗(On Diff #259376)

Ahhh nice catch (and ideally one that a test will confirm). I believe what I want is:

uint32_t pureMask = ~MachO::S_ATTR_PURE_INSTRUCTIONS | (inputFlags & flags);
lld/MachO/OutputSection.h
27–28

ah, they don't, it's refactoring cruft

42

No, it's not needed except for MergedOutputSection. Should I dynamically cast each output section before trying to merge an input section into it instead (I wanted to avoid the runtime hit doing that).

lld/MachO/OutputSegment.cpp
36–41

I believe I hit issues trying to cache this on adding an OutputSection; I think the GotSection made things difficult since its isNeeded() attribute is dynamic.

lld/MachO/OutputSegment.h
62

Good idea!

smeenai added inline comments.Apr 24 2020, 5:16 PM
lld/MachO/OutputSection.h
42

Yeah, dynamically casting wouldn't be ideal. We could leave this as-is, or make getOrCreateOutputSection return a MergedOutputSection * instead of an OutputSection * (idk if that'd cause other complications).

Ktwu marked 9 inline comments as done.Apr 27 2020, 3:16 PM
Ktwu added inline comments.
lld/MachO/OutputSection.h
42

I believe that would require having an assert that the output class being returned, if already created, is in fact a mergeable section. I think a static_cast would be what we'd want, but I like having the explicit assert here in case something goes wrong instead of the undefined behavior of a static_cast gone wrong.

lld/MachO/Writer.cpp
402–415

I personally prefer chaining :D

466

What's up with auto? It's not like it's forbidden from use in the style guide:

https://llvm.org/docs/CodingStandards.html#id27

I'm curious about folks' reasoning behind its usage (or discouragement of).

Ktwu updated this revision to Diff 260486.Apr 27 2020, 3:53 PM
Ktwu marked an inline comment as done.

Comments. More significantly, added more checks to verify __text merging.

smeenai added inline comments.Apr 28 2020, 12:23 PM
lld/MachO/OutputSection.h
42

Ah. The assert would require an isa, which is basically the same overhead as a dyn_cast, so perhaps just leaving it as-is is best for now.

lld/MachO/Writer.cpp
402–415

Haha, fair enough.

466

Good question. LLD has its own style guidelines (e.g. variables are lowerCamelCase). Unfortunately, I don't think those are codified anywhere, but I'm basing this on what I've seen in reviews.

auto is discouraged unless the actual type is spelled out in the same expression somewhere (e.g. as a result of a cast) or is a huge pain to type out (e.g. iterators). Otherwise explicit types are preferred. I personally like that because I don't use an IDE (and I haven't set up any ctags or LSP-like things for my editor), so having explicit types available makes it easier for me to comprehend the code.

It's a little less clear-cut in cases like

auto linkEditSegment = getOrCreateOutputSegment(segment_names::linkEdit);

where it'd be pretty fair to assume that getOrCreateOutputSegment returns an OutputSegment *. It's a bit ambiguous whether it'd be a pointer, reference, or copy, but you could use auto * to disambiguate that. Nevertheless, it's not too much more typing to just spell the name out, so I'd prefer to err on the side of explicitness there.

There was a post to the mailing list about auto usage in LLVM a while back, but there was no clear resolution: http://llvm.1065342.n5.nabble.com/llvm-dev-RFC-Modernizing-our-use-of-auto-td123947.html (the authors on that thread are bogus; if you want the original authors, look for that subject on http://lists.llvm.org/pipermail/llvm-dev/2018-November/ and http://lists.llvm.org/pipermail/llvm-dev/2018-December/)

smeenai added inline comments.Apr 28 2020, 5:02 PM
lld/MachO/MergedOutputSection.cpp
39 ↗(On Diff #260486)

@int3 is changing this to take the section's alignment into account in D79050, so this should follow suit.

47 ↗(On Diff #260486)

Same here.

lld/MachO/Writer.cpp
430

Would it make sense to have each OutputSection store its fileOff (in addition to or instead of the OutputSegment holding it), so that we don't need to recompute it in the writeSections loop below?

There's also the comment in D79050 about setting the alignment in the SyntheticSection base class once we have that.

I'm assuming the diff hasn't been updated with the OutputSection -> OutputSectionBase and MergedOutputSection -> OutputSection renaming yet (assuming you're good with that).

lld/MachO/MergedOutputSection.cpp
66 ↗(On Diff #259376)

Yup, looks good.

lld/MachO/SyntheticSections.h
165–169 ↗(On Diff #259376)

This one still needs addressing, though I'm okay with doing it in a follow-up if you'd prefer.

lld/test/MachO/section-merge.s
46

Might be easier to make sense of this as assembly (llvm-objdump -d)

Ktwu marked 6 inline comments as done and an inline comment as not done.Apr 29 2020, 10:10 PM
Ktwu added inline comments.
lld/MachO/MergedOutputSection.h
19 ↗(On Diff #259376)

Er, actually, I like MergedOutputSection because it conveys more information about what kind of OutputSection it is. Given how many of the other classes used don't use -Base in their name -- InputSections, OutputSegments -- it feels clunky to have an OutputSectionBase.

@int3 do you feel strongly about renaming this?

lld/test/MachO/section-merge.s
46

Is this a big deal? I compared this output to ld; I don't care about the content so much as making sure it matches what ld outputs.

Ktwu updated this revision to Diff 261130.Apr 29 2020, 10:13 PM
Ktwu marked an inline comment as not done.

Rebased, aligned file offsets, and tested with UBSAN.

Ktwu updated this revision to Diff 261131.Apr 29 2020, 10:16 PM

clang-format

smeenai added inline comments.Apr 29 2020, 10:49 PM
lld/test/MachO/section-merge.s
46

I think it makes the test a lot more intelligible and easy to verify. Right now, for me, this is just a blob of bytes. If it were written out as instructions, I could verify that it's the instructions in _some_function followed by the instructions in _main (as it should be).

In general, matching ld64's output is important, but we also want our tests to work well standalone. The cstring check below is great because it's easy to tell at a glance that all the strings from the various input files are being combined together (as they should be); using the disassembly will let us do the same for the text section.

smeenai added inline comments.Apr 29 2020, 10:53 PM
lld/MachO/MergedOutputSection.h
19 ↗(On Diff #259376)

Sure, I'm good with the parent class being named OutputSection and having specific subclasses representing the different types of output sections. I'm not the biggest fan of the MergedOutputSection name because to me it suggests output sections being merged together rather than input sections being merged into a single output section, but I can't think of anything better either, so I'm good with it if we can't come up with a better name.

Naming is hard :D

int3 added inline comments.Apr 30 2020, 3:53 AM
lld/MachO/MergedOutputSection.h
19 ↗(On Diff #259376)

I prefer the rename because it more closely parallels what lld-ELF has. Moreover ELF's MergeInputSection is quite different, so it's almost like a false parallel...

Re conveying information, my mental model has been that OutputSections are by default mergeable, and only the special SyntheticSections aren't, so I don't feel that there's a need to explicitly call out the mergeability.

-Base being a clunky, non-functionally-descriptive suffix is a fair point though...

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section __DATA_CONST,__got? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a __TEXT,__mach_header section, and treats it as distinct from its own hidden synthesized section with that name.

lld/MachO/MergedOutputSection.h
19 ↗(On Diff #261131)

The comment looks great!

Super nit: LLD always formats these types of comments using single-line comments (//) instead of /* */, so we should follow suit here.

19 ↗(On Diff #259376)

To be fair, I don't think LLD ELF has an OutputSectionBase. It does have an InputSectionBase though.

The hierarchy is also different cos ELF synthetic sections are input sections (so they can be manipulated by linker scripts), whereas ours are output sections, so it's a bit hard to compare.

lld/MachO/OutputSection.cpp
23

We should only hit this if we have a programming error on our end, right? If so, this should be llvm_unreachable instead of error.

lld/MachO/OutputSection.h
22

Comment looks great! Same nit about using single-line comments.

lld/MachO/OutputSegment.cpp
36–41

Ah, makes sense.

47

Should we assert that sections[os->name] doesn't already exist?

51

FWIW, auto is fine for iterators, but this is fine too.

lld/MachO/OutputSegment.h
62

This one isn't addressed, but given that we shouldn't have too many segments (besides the ones already in the map, I can only think of __DATA), perhaps this is okay as-is?

lld/MachO/SyntheticSections.cpp
36

Super nit: use a member initialization list instead.

69

@int3 how come these strings are just here directly vs. all the other ones being named constants?

Also, not this diff, but is __DATA_CONST correct? ld64 puts this in __DATA instead as far as I can see. It's not quite constant cos the dynamic linker's gonna fill it in, though idk if it has the equivalent to ELF's RELRO.

lld/test/MachO/section-merge.s
16

We only need to check the properties we care about. Detailed symbol table checking should happen in the symbol table tests. Over here, I think we just care about the symbol name, section, and value, so we can drop the checks for the other fields (Extern, Type, RefType, Flags)

We should also be checking for the text section symbols.

int3 added inline comments.Apr 30 2020, 7:21 PM
lld/test/MachO/section-merge.s
16

+1 for terser checks. We can also use llvm-objdump --syms here -- its output is much more compact and suitable for when we're not checking all the properties

int3 added inline comments.Apr 30 2020, 7:23 PM
lld/MachO/SyntheticSections.cpp
69

I don't think we're currently referencing this from anywhere else (in particular I didn't give them an order in the sorting comparator), so it wasn't technically necessary, though we could definitely make them constants for the sake of uniformity

Ktwu marked 11 inline comments as done.Apr 30 2020, 11:45 PM
Ktwu added inline comments.
lld/MachO/MergedOutputSection.h
19 ↗(On Diff #261131)

D:

lld/MachO/OutputSection.cpp
23

Yup, since we're assuming that synthetic sections cannot be merged, this shouldn't happen.

lld/MachO/OutputSegment.h
62

Ah, no, I legit forgot to address this, so I don't mind getting to it...

lld/MachO/SyntheticSections.cpp
69

I'll make 'em a constant for now; can we deal with DATA vs DATA_CONST in another diff if need be?

lld/test/MachO/section-merge.s
16

llvm-objdump --syms is pretty nice, so I'll use that here instead.

46

Fair enough!

Ktwu updated this revision to Diff 261433.Apr 30 2020, 11:46 PM
Ktwu marked 2 inline comments as done.
  • initialization list for SyntheticSection
  • caching position for "default" sections
  • objdump instead of readobj in test
  • constants for __DATA_CONST
  • llvm_unreachable
int3 added inline comments.May 1 2020, 12:50 AM
lld/MachO/SyntheticSections.cpp
69

Oh sorry I missed the 2nd part of the comment about the segment. I'm pretty sure it's in __DATA_CONST at least on Catalina; just tried it out. But yeah we can deal with it in another diff if necessary

int3 marked an inline comment as done.May 1 2020, 2:58 AM
int3 added inline comments.
lld/MachO/MergedOutputSection.h
19 ↗(On Diff #261131)

Not sure if that emoji was in reaction to my naming nit, but please feel free to stick with what you have. I was just giving my 2c, but it's your diff :)

smeenai added inline comments.May 1 2020, 1:54 PM
lld/MachO/SyntheticSections.cpp
69

Ah, sorry, I was just wondering why these weren't constants. I didn't mean to imply that you had to do it in this diff, but thanks for taking care of it :)

@int3 interesting, this appears to be a Catalina vs older OS thing. Possibly related to dyld3?

smeenai accepted this revision.May 1 2020, 1:56 PM

Looks great! @int3, any other comments?

lld/test/MachO/section-merge.s
12

This seems to be a leftover from testing :)

This revision is now accepted and ready to land.May 1 2020, 1:56 PM
int3 added a comment.May 1 2020, 2:19 PM

Yeah this lgtm, let's ship it (after rebasing). Pretty sure D78168: [lld-macho][rfc] Have Symbol::getVA() return a non-relative virtual address causes a rebase conflict, not sure if there's anything else...

int3 accepted this revision.May 1 2020, 2:19 PM
Ktwu updated this revision to Diff 261554.May 1 2020, 2:52 PM

Remove testing cruft, oops

One thing I just thought about (sorry :/). How do we want to handle sections in input files with the same name as our special sections? For example, what if the user gives us an input file with the section __DATA_CONST,__got? ld64 appears to handle this fine; it just combines the user-provided section with its own synthesized one. It also handles the case where a user input file has a __TEXT,__mach_header section, and treats it as distinct from its own hidden synthesized section with that name.

For a large internal binary, I confirmed that none of the input section names clashed with our synthetic section names, so I think it's pretty safe to just error out if we run into that.

This revision was automatically updated to reflect the committed changes.
int3 marked 2 inline comments as done.May 5 2020, 3:56 PM

Noticed a couple of minor things while rebasing on top of this. I've folded changes for them into D78270: [lld-macho] Support calls to functions in dylibs.

lld/MachO/OutputSection.h
41

IMO this should just be return true. Whether a section is hidden is orthogonal from whether it is needed: hidden sections will never have a header regardless of whether they have a body. (I know we override this method with return false for synthetic sections, but regardless I think it's confusing to write it this way for non-synthetic sections.)

lld/MachO/Writer.cpp
118–119

I think this should stay as getSections().empty(), and the check for isNeeded() should be moved outside writeTo(). We should just not create LCSegment commands for unneeded segments. The empty() check however is still needed because __LINKEDIT can be empty.

451–452

we should avoid writing unneeded output sections (My stacked diff makes that assumption for one of the stub helper synthetic section)

Noticed a couple of minor things while rebasing on top of this. I've folded changes for them into D78270: [lld-macho] Support calls to functions in dylibs.

Ideally they'd be separated into their own small changes so that it's easier to review them.

If it's changes you feel pretty confident about, it's fine to just commit them directly (for post-commit review). If it's changes where you want input, putting them up for review separately makes it easier to give focused feedback.

int3 added a comment.EditedMay 5 2020, 5:23 PM

Fair enough, I was being lazy... I'd made those changes while fixing rebase conflicts + getting tests to pass, so it was convenient that way. I'll try to untangle them

int3 added a comment.May 5 2020, 5:36 PM

Alright, put up D79460

int3 added inline comments.May 6 2020, 2:53 AM
lld/MachO/OutputSegment.h
44

I was looking at the implementation of MapVector today and I realized that the sort only operates on the vector and not the map, so the container exhibits some questionable behavior after sorting :D

MapVector<int, int> mv;
mv[2] = 98;
mv[1] = 99;
std::sort(mv.begin(), mv.end());
for (int i = 1; i <= 2; ++i)
  fprintf(stderr, "%d %d\n", i, mv[i]);

This prints

1 98
2 99

So I think we should really use takeVector before sorting. Two more refactoring ideas to tack on to that:

  1. We could filter out the unneeded OutputSections as this stage too, so we don't have to worry about checking isNeeded() afterward.
  2. Maybe we could consider not creating the OutputSegments till after the sorting/filtering has been done, so we don't have the OutputSegments in a state where some operations aren't valid. But that would probably mean an additional outputSections global, so there's some tradeoff there. Up to you