Page MenuHomePhabricator

[lld-macho] Support .subsections_via_symbols

Authored by int3 on Thu, May 14, 1:42 AM.



This diff restores and builds upon @pcc and @ruiu's initial work on

The .subsections_via_symbols directive indicates we can split each
section along symbol boundaries, unless those symbols have been marked
with .alt_entry.

We exercise this functionality in our tests by using order files that
rearrange those symbols.

Depends on D79668.

Diff Detail

Event Timeline

int3 created this revision.Thu, May 14, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptThu, May 14, 1:42 AM
smeenai requested changes to this revision.Fri, May 15, 5:07 PM

Looks really good! Putting this back on your queue for some minor comments and questions.


Nit: would emplace_back let us get rid of one pair of curlies?


I think \p only makes sense for Doxygen-style comments (which would have triple slashes)? I'm not very well versed with this though.


Nit: make this static


I know you're just moving this code here, but we should also add a TODO that the offset of 4 only works for relocations with a length of 4 bytes (i.e. a length field of 2).


What's the reason for this?


Thanks for the comment!


The implication is that the nList entries aren't (necessarily) in address order, right? Otherwise we'd know the final subsection for an alt-entry symbol by the time we reached it.


Nit: now that we're not adjusting addend anywhere, we can just get rid of the variable and use r.addend directly everywhere, right?


It'd be nice to dump the string section and confirm the ordering is correct (to make sure we're setting subsection contents directly).


Could you add a comment explaining the - 3? I believe it's because L_.str should end up at CSTRING_ADDR + 4, and the leaq instruction is 7 bytes long so the relocation is against ADDR + 7, so you get CSTRING_ADDR + 4 - (ADDR + 7), but that took a bit of working out :)


Hmm, how can it emit a symbol relocation against a private symbol? Or does it convert it into a non-private symbol?

In general, ld64 atomizes cstring sections by NUL delimiters instead of by symbols, so we'll need to adjust things if we go down that route. This is good for now though.


It'd be good to verify the rest of the instructions as well, so we can confirm the subsection contents are correct.

This revision now requires changes to proceed.Fri, May 15, 5:07 PM
int3 marked 17 inline comments as done.Fri, May 15, 9:16 PM
int3 added inline comments.

doesn't seem to work:

/Users/jezng/local/llvm-project/lld/MachO/InputFiles.cpp:146:17: error: no matching member function for call to 'emplace_back'
    subsections.emplace_back({0, isec});
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/vector:722:19: note: candidate function [with _Args = <>] not viable: requires 0 arguments, but 1 was provided
        void      emplace_back(_Args&&... __args);

not really sure what that means, but I'm guessing that the initializer list is screwing up the template resolution?


Ah, yeah I'm not very familiar with Doxygen either... in any case, this static function probably doesn't merit a Doxygen comment, but I wanted a way to denote a parameter, and typing out "parameter foo" quickly gets repetitive. I guess I could do something like "offset: an offset relative to ..."


The symbol we're splitting on may not be at an aligned offset from the first input section. E.g. we may have a section __text aligned to 0x8 and a symbol offset 0x4 from the start of __text.


yup. I'll make that explicit


yup, you're right that it's not straightforward...


it doesn't emit a symbol relocation against a private symbol, but private symbols are omitted from the object file, so we have no way of reordering them...

int3 updated this revision to Diff 264413.Fri, May 15, 9:16 PM
int3 marked 4 inline comments as done.

address comments

smeenai accepted this revision.Mon, May 18, 3:20 PM



Ah, yeah, I think there's some interesting clashes between template deduction and initializer lists.


Hmm. I guess choosing the minimum makes sense in that case? I'd be curious if ld64 does that or just preserves the alignment of the original section (and aligns the subsection accordingly).

This revision is now accepted and ready to land.Mon, May 18, 3:20 PM
int3 marked an inline comment as done.Mon, May 18, 8:09 PM
int3 added inline comments.

Good point, I hadn't actually checked what ld64 was doing -- the present minalign behavior was from @pcc's original implementation (with some tweaking).

Some testing suggests that what ld64 is doing is preserving the alignment of the original section, plus adding an offset for each subsection to preserve how far away it is from an aligned address. This is the file I was playing around with to observe this behavior:

.globl _main
.p2align 4
  .space 2
  .space 1



linked with the order file


_bar ends up at an aligned address + 2, and baz at another aligned address + 3.

I think I'd like to punt on emulating this for now, but I'll add a TODO.

int3 updated this revision to Diff 264774.Mon, May 18, 8:17 PM

add TODO

thakis added inline comments.

This breaks the build on Linux:
And Windows:

Please take a look, and revert if it takes a while to investigate.

This revision was automatically updated to reflect the committed changes.
int3 marked an inline comment as done.Tue, May 19, 8:31 AM
int3 added inline comments.

weird, I'd wouldn't have expected ArrayRef to construct an std::vector :/ reverted for now, thanks

int3 reopened this revision.Tue, May 19, 8:52 AM
This revision is now accepted and ready to land.Tue, May 19, 8:52 AM
int3 updated this revision to Diff 264935.Tue, May 19, 8:53 AM

ArrayRef is immutable so elements don't need to be const

This revision was automatically updated to reflect the committed changes.