This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Support .subsections_via_symbols
ClosedPublic

Authored by int3 on May 14 2020, 1:42 AM.

Details

Summary

This diff restores and builds upon @pcc and @ruiu's initial work on
subsections.

The .subsections_via_symbols directive indicates we can split each
section along symbol boundaries, unless those symbols have been marked
with .alt_entry.

We exercise this functionality in our tests by using order files that
rearrange those symbols.

Depends on D79668.

Diff Detail

Event Timeline

int3 created this revision.May 14 2020, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2020, 1:42 AM
smeenai requested changes to this revision.May 15 2020, 5:07 PM

Looks really good! Putting this back on your queue for some minor comments and questions.

lld/MachO/InputFiles.cpp
146

Nit: would emplace_back let us get rid of one pair of curlies?

153

I think \p only makes sense for Doxygen-style comments (which would have triple slashes)? I'm not very well versed with this though.

157

Nit: make this static

200

I know you're just moving this code here, but we should also add a TODO that the offset of 4 only works for relocations with a length of 4 bytes (i.e. a length field of 2).

213

What's the reason for this?

215

Thanks for the comment!

253

The implication is that the nList entries aren't (necessarily) in address order, right? Otherwise we'd know the final subsection for an alt-entry symbol by the time we reached it.

lld/MachO/InputSection.cpp
45

Nit: now that we're not adjusting addend anywhere, we can just get rid of the variable and use r.addend directly everywhere, right?

lld/test/MachO/subsections-section-relocs.s
10

It'd be nice to dump the string section and confirm the ordering is correct (to make sure we're setting subsection contents directly).

15

Could you add a comment explaining the - 3? I believe it's because L_.str should end up at CSTRING_ADDR + 4, and the leaq instruction is 7 bytes long so the relocation is against ADDR + 7, so you get CSTRING_ADDR + 4 - (ADDR + 7), but that took a bit of working out :)

33

Hmm, how can it emit a symbol relocation against a private symbol? Or does it convert it into a non-private symbol?

In general, ld64 atomizes cstring sections by NUL delimiters instead of by symbols, so we'll need to adjust things if we go down that route. This is good for now though.

lld/test/MachO/subsections-symbol-relocs.s
29

It'd be good to verify the rest of the instructions as well, so we can confirm the subsection contents are correct.

This revision now requires changes to proceed.May 15 2020, 5:07 PM
int3 marked 17 inline comments as done.May 15 2020, 9:16 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
146

doesn't seem to work:

/Users/jezng/local/llvm-project/lld/MachO/InputFiles.cpp:146:17: error: no matching member function for call to 'emplace_back'
    subsections.emplace_back({0, isec});
    ~~~~~~~~~~~~^~~~~~~~~~~~
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/vector:722:19: note: candidate function [with _Args = <>] not viable: requires 0 arguments, but 1 was provided
        void      emplace_back(_Args&&... __args);

not really sure what that means, but I'm guessing that the initializer list is screwing up the template resolution?

153

Ah, yeah I'm not very familiar with Doxygen either... in any case, this static function probably doesn't merit a Doxygen comment, but I wanted a way to denote a parameter, and typing out "parameter foo" quickly gets repetitive. I guess I could do something like "offset: an offset relative to ..."

213

The symbol we're splitting on may not be at an aligned offset from the first input section. E.g. we may have a section __text aligned to 0x8 and a symbol offset 0x4 from the start of __text.

253

yup. I'll make that explicit

lld/test/MachO/subsections-section-relocs.s
15

yup, you're right that it's not straightforward...

33

it doesn't emit a symbol relocation against a private symbol, but private symbols are omitted from the object file, so we have no way of reordering them...

int3 updated this revision to Diff 264413.May 15 2020, 9:16 PM
int3 marked 4 inline comments as done.

address comments

smeenai accepted this revision.May 18 2020, 3:20 PM

LGTM

lld/MachO/InputFiles.cpp
146

Ah, yeah, I think there's some interesting clashes between template deduction and initializer lists.

213

Hmm. I guess choosing the minimum makes sense in that case? I'd be curious if ld64 does that or just preserves the alignment of the original section (and aligns the subsection accordingly).

This revision is now accepted and ready to land.May 18 2020, 3:20 PM
int3 marked an inline comment as done.May 18 2020, 8:09 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
213

Good point, I hadn't actually checked what ld64 was doing -- the present minalign behavior was from @pcc's original implementation (with some tweaking).

Some testing suggests that what ld64 is doing is preserving the alignment of the original section, plus adding an offset for each subsection to preserve how far away it is from an aligned address. This is the file I was playing around with to observe this behavior:

.globl _main
.p2align 4
.text
_foo:
  .space 2
_bar:
  .space 1
_baz:

_main:
  ret

.subsections_via_symbols

linked with the order file

_baz
_bar
_foo

_bar ends up at an aligned address + 2, and baz at another aligned address + 3.

I think I'd like to punt on emulating this for now, but I'll add a TODO.

int3 updated this revision to Diff 264774.May 18 2020, 8:17 PM

add TODO

thakis added a subscriber: thakis.May 19 2020, 7:57 AM
thakis added inline comments.
lld/MachO/InputFiles.cpp
318

This breaks the build on Linux: http://45.33.8.238/linux/18060/step_4.txt
And Windows: http://45.33.8.238/win/15558/step_4.txt

Please take a look, and revert if it takes a while to investigate.

This revision was automatically updated to reflect the committed changes.
int3 marked an inline comment as done.May 19 2020, 8:31 AM
int3 added inline comments.
lld/MachO/InputFiles.cpp
318

weird, I'd wouldn't have expected ArrayRef to construct an std::vector :/ reverted for now, thanks

int3 reopened this revision.May 19 2020, 8:52 AM
This revision is now accepted and ready to land.May 19 2020, 8:52 AM
int3 updated this revision to Diff 264935.May 19 2020, 8:53 AM

ArrayRef is immutable so elements don't need to be const

This revision was automatically updated to reflect the committed changes.