This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Support -order_file
ClosedPublic

Authored by int3 on May 9 2020, 3:07 AM.

Details

Summary

The order file indicates how input sections should be sorted within each output
section, based on the symbols contained within those sections.

This diff sets the stage for implementing and testing
.subsections_via_symbols, where we will break up InputSections by each symbol
and sort them more granularly.

Depends on D79228.

Diff Detail

Event Timeline

int3 created this revision.May 9 2020, 3:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2020, 3:08 AM
int3 updated this revision to Diff 263008.May 9 2020, 3:10 AM

update

int3 updated this revision to Diff 263009.May 9 2020, 3:12 AM

update

int3 updated this revision to Diff 264127.May 14 2020, 4:55 PM

rebase + support archive files

smeenai requested changes to this revision.May 15 2020, 2:10 PM

Looks great; it's a really clean implementation :) Just had a few comments and questions; requesting changes to put it back on your queue.

lld/MachO/Config.h
31

From what I understand of https://llvm.org/docs/ProgrammersManual.html#dss-stringmap, StringMap copies key strings over into its own storage. That's great when you want it to take ownership of the key strings, but all our strings here should already be part of the global allocator pool. I think DenseMap<StringRef, SymbolPriorityEntry> would work better for us here.

In the process, I went down a bit of a rabbit hole on LLVM's hash tables and CachedHashStringRef, and now I'm confused about LLD's usage of the latter :D I'll probably post to llvm-dev.

38

Grammar nit: "naturally end up" -> "naturally ends up"

40

It'd be helpful to have a comment explaining what these represent.

lld/MachO/Driver.cpp
132

I'm assuming the names are the same ld64 uses? ("arm" in particular.)

Also, considering these are string constants and there's therefore no ownership concerns, a DenseSet<StringRef> might be better, for the same reasons as discussed above with the DenseMap? (Tbh, for this number of entries, just a linear search might be faster still, but that's definitely getting into unnecessary premature optimization territory :)).

168

Add a TODO or other comment to serve as a reminder to change this when we add support for other architectures?

171

It's valid (albeit unlikely) for a symbol name to end with .o. It's also valid (albeit unlikely) for a symbol to have the same name as an architecture.

Can we keep track of fields explicitly instead? For each line, build up its colon delimited fields. If you have more than three fields, it's an error. If you have three fields, the first one should be an architecture (which we should verify), the second one should be an object file (which we should verify), and the third one should be the symbol name. If you have only one field, it's the symbol name. And so on.

lld/MachO/InputFiles.cpp
326

Ah, I didn't realize we'd need to copy over the symbols too. Given that, making a global objFiles vector (or other similar solution to not have to do this copying) would be a good follow-up.

lld/MachO/Writer.cpp
323

ELF iterates over the global symbols from the symbol table and only looks at the input files for local symbols. I'm not entirely sure why it's done that way ... perhaps it's so that for weak symbols, the ordering reflects the final symbol resolution?

As in, let's say foo.o defines importantFunction weakly and bar.o defines it strongly. With the input file search, we'd order both foo.o and bar.o for importantFunction. With the symbol table search, we'd only order bar.o.

We don't have weak symbols right now, so we can just add a TODO and punt on it for now. It'd be interesting to look at how ld64 handles such weak symbol ordering scenarios though.

lld/test/MachO/order-file.s
5

You have a : after .text, so that's gonna create a label called .text instead of being the .text directive.

18

For all these tests, can you pass the object files in both orders, to confirm that the order file does the right thing regardless of the order of object files on the command line?

32

Can we also test the FOO-FIRST case for archives?

34

Can you add a test case for the order file being something like

foo.o:_foo
_main

to make sure the object file matching works in the positive case as well? (We're only checking the negative case above, with bar.o.) It's indirectly tested by the ordering tests below, but it'd be nice to have an explicit test too.

62

Probably worth adding a comment about what's going on here ... it took me a bit to understand it.

With multiple symbols pointing to the same location, I think it's technically arbitrary which one llvm-objdump chooses to display, but it seems like it's just picking the first name, so that works for us.

Is the intention here that _foo and _bar are different symbols in the same section (and they'd be split into their own sections once we have subsections via symbols), or that they're aliases to the same symbol (and therefore ordering one should order the other)?

This revision now requires changes to proceed.May 15 2020, 2:10 PM
int3 marked 4 inline comments as done.May 15 2020, 4:41 PM
int3 added inline comments.
lld/MachO/Config.h
41

I guess I should change this to a DenseMap too

lld/MachO/Driver.cpp
132

I'm assuming the names are the same ld64 uses? ("arm" in particular.)

yup

171

I think ld64 makes these assumptions as well (that symbol names in the order file will never end with an .o), but yeah, I'll try and make this more robust

lld/test/MachO/order-file.s
62

they're aliases to the same symbol (and therefore ordering one should order the other)?

This latter one is the reason. The subsections code actually looks for and avoids creating a new subsection when two symbols alias the same location.

int3 marked 13 inline comments as done.May 15 2020, 8:32 PM
int3 added inline comments.
lld/test/MachO/order-file.s
62

I'll make _bar non-global so the objdump output will be unambiguous

int3 updated this revision to Diff 264409.May 15 2020, 8:32 PM

address comments

smeenai accepted this revision.May 18 2020, 3:07 PM

Thanks for changing the parsing! LGTM with the comments addressed.

lld/MachO/Driver.cpp
131

I *think* this'll end up creating a global constructor, which is unfortunate. Would a std::array suffice, either globally or as a function static? I think it'd let us avoid that. Or even an llvm::ArrayRef, perhaps? (Haven't thought about that too much.)

Also, I'm assuming the DenseSet can't just be directly constructed with the names :(

182

This could either be the object file or the architecture. We'll need some additional checking here (I think assuming it's an architecture if it doesn't end with .o will just do the right thing, cos then the later check will bail out).

This revision is now accepted and ready to land.May 18 2020, 3:07 PM
int3 marked 2 inline comments as done.May 18 2020, 4:10 PM
int3 added inline comments.
lld/MachO/Driver.cpp
131

DenseSet can be directly constructed; this is defined outside isArchName so that the error message can list out all the valid archs

182

oh yeah, good catch

int3 updated this revision to Diff 264765.May 18 2020, 6:40 PM

address comments

This revision was automatically updated to reflect the committed changes.
smeenai added inline comments.May 19 2020, 1:25 PM
lld/MachO/Driver.cpp
182

Hah, a ternary operator as an lvalue ... neat. It's been a long time since I've seen that.

Useless language trivia: ternary operators can only be lvalues in C++, not C.