Page MenuHomePhabricator

[lld-macho] Fix duplicate symbols with relocatable objects
AcceptedPublic

Authored by keith on Nov 30 2021, 8:28 PM.

Details

Reviewers
thakis
oontvoo
Group Reviewers
Restricted Project
Summary

In the case your framework bundles contain relocatable objects, and your
objects include LC_LINKER_OPTIONs for the framework, previously they
would not be deduplicated like they would have if they were static
archives. This was also the case if you passed -framework for the
framework as well.

Diff Detail

Event Timeline

keith created this revision.Nov 30 2021, 8:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2021, 8:28 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
keith requested review of this revision.Nov 30 2021, 8:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2021, 8:28 PM
keith added a comment.Nov 30 2021, 8:29 PM

Hoping for some feedback here. This feels pretty weird in the case that you actually passed the same object multiple times, and you wanted that to surface duplicate symbol errors, but maybe that isn't a big deal.

keith added a comment.Nov 30 2021, 9:01 PM

Ah yea the tests confirm the concern about duplicate symbols in the invalid case.

thakis accepted this revision.Dec 1 2021, 5:46 AM
thakis added a subscriber: thakis.

lg

This revision is now accepted and ready to land.Dec 1 2021, 5:46 AM
oontvoo added a subscriber: oontvoo.Dec 1 2021, 6:33 AM
oontvoo added inline comments.
lld/MachO/Driver.cpp
255

nit: no longer applying only to archives

lld/test/MachO/framework-object.ll
6

llvm-mc

12

could you spell out the output here to be either /dev/null or %t/... ?
(without it, it'd put the output in location not actually writeable on our build machines and that'd cause problems during integrates)

oontvoo requested changes to this revision.Dec 1 2021, 8:34 AM
oontvoo added inline comments.
lld/MachO/Driver.cpp
324–341

On a second thought, this is suspicious. Doesn't this mean %lld foo.o foo.o wouldn't yield "duplicate symbols" error anymore?

This revision now requires changes to proceed.Dec 1 2021, 8:34 AM
keith added inline comments.Dec 1 2021, 11:52 AM
lld/MachO/Driver.cpp
324–341

Yea I mentioned this with a top level comment as well, it does break the related test. I need to find a better heuristic for skipping these, since we only want to do it for LC_LINKER_OPTION's / duplicate -framework commands etc (not sure if the latter is handled today or not, especially as it conflicts with -framework foo -weak_framework foo etc).

I was hoping to do this specifically in the LC_LINKER_OPTION logic to avoid changing things here, but the issue is the order of hitting those options vs the command line options depends on the order things are passed, so I couldn't rely on that.

oontvoo added inline comments.Dec 1 2021, 12:08 PM
lld/MachO/Driver.cpp
324–341

Ah, I've missed the previous comments. Sorry!

In any case, I can think of a contrived example where you have the input obj files (standalone or in archives) containing no symbols but contributing to global ctors. In that case, wouldn't we want the side effect of these being loaded individually?

keith updated this revision to Diff 391177.Dec 1 2021, 5:59 PM

Add complex solution for discussion

keith updated this revision to Diff 391178.Dec 1 2021, 6:00 PM

Formatting fixes

keith added a comment.Dec 1 2021, 6:03 PM

I pushed another solution here that is pretty complex but I think covers the intended logic. The gist is:

  1. If something is loaded implicitly, that can happen any number of times
  2. If something is loaded implicitly, and then explicitly, that's fine, the implicit loads can still happen any number of times
  3. If something is loaded explicitly, and then implicitly, that's fine, the implicit loads can still happen any number of times
  4. If something is loaded explicitly multiple times, that is not fine, regardless of implicit loads

Note that for 2 and 3 the only difference is order based on whether the object file comes first or the -framework flag.

I would be interested in hearing how other folks think we can simplify this

int3 added a subscriber: int3.Dec 1 2021, 6:06 PM
int3 added inline comments.
lld/MachO/Driver.cpp
262–263

this seems a bit awkward since a) loadedObjects now contains both objects and archives and b) we are querying this map even if the file ends up being a dylib. Can we have separate loadedArchives and loadedObjects maps?

324–341

I don't think it'd be terribly hard to handle this, we'd just need an extra boolean to addFile to indicate if a file is being loaded via add{Framework,Library} or if it's being loaded directly via a path. That said... I'm not sure there's much value in handling this edge case. I'm more than happy to just dedup everything and document this behavioral difference in ld64-vs-lld.rst.

I can think of a contrived example where you have the input obj files (standalone or in archives) containing no symbols but contributing to global ctors

What does 'contributing to global ctors' mean exactly?

lld/test/MachO/framework-object.ll
2

not sure this test deserves its own file... seems like we could fold it into framework.s and lc-linker-option.ll

int3 added a comment.Dec 1 2021, 6:07 PM

Commented before reading your latest changes, but my opinion is unchanged :)

keith added inline comments.Dec 1 2021, 6:27 PM
lld/MachO/Driver.cpp
324–341

FWIW tensorflow is how I discovered this. They produce an object file in a framework and you get corresponding linker options loading it if you include it from swift. So our app doesn't link in that case. On the swift side there is a private flag to exclude the linker option, but that's not ideal

int3 added inline comments.Dec 1 2021, 6:33 PM
lld/MachO/Driver.cpp
324–341

I'm not sure there's much value in handling this edge case

Just to be clear, I meant there's no much value in raising duplicate symbol errors when an object file is specified twice on the command line. We should definitely handle object files in frameworks that get loaded multiple times

oontvoo added inline comments.Dec 1 2021, 6:47 PM
lld/MachO/Driver.cpp
324–341

Just to be clear, I meant there's no much value in raising duplicate symbol errors when an object file is specified twice on the command line.

Ok, agreed there's no value in raising duplicate error (that's just the side effect of not loading the file again)

What does 'contributing to global ctors' mean exactly?

The "contrived" case I'm talking about is this:

// main.c
#include <stdio.h>
void a_ctor() {
  // do stuff
  printf("ctor called\n");
}

int main() {return 0; }

// my_init.s
// This file defines no symbol - but it sticks the _a_ctor into the global ctors list
L_.str:  
	.section	__DATA,__mod_init_func,mod_init_funcs
	.p2align	3
	.quad	_a_ctor

Link it ld <other args> init.o init.o main.o

Run it:

$ ./my_main.out 
ctor called
ctor called

With this patch, it'll only execute the ctor once

lld/test/MachO/framework-object.ll
15–16

here too?

int3 added inline comments.Dec 1 2021, 6:51 PM
lld/MachO/Driver.cpp
324–341

oh right I see what you mean. yeah seems very edge-case-y...

oontvoo added inline comments.Dec 1 2021, 7:13 PM
lld/MachO/Driver.cpp
324–341

In normal applications, this might be an edge case, yes, but it's not uncommon to see all kinds of creative usages of custom init (eg., the sanitizers do it all the time).

To think of it another way, if we were to propose this change to other ports (eg ELF), do you think they'd accept it?
If yes, then ok 🤐
If no, then why would MachO do it? (it's a weird behaviour ... and tbh, I don't think they'd accept it )

But I've said my piece and it seems your position has the popular vote here :P so ...

("this change" specifically being "if you see multiple identical input paths then, only look at one of them")

smeenai added a subscriber: smeenai.Dec 1 2021, 8:50 PM
smeenai added inline comments.
lld/MachO/Driver.cpp
324–341

I believe LLD COFF already deduplicates input files (but I also think they're just emulating Microsoft's linker's behavior there).

oontvoo accepted this revision.Dec 2 2021, 11:36 AM

LG
(Please see the requested changes in the test file, though. Thanks!)

lld/MachO/Driver.cpp
324–341

That's fair.
Feel free to revert to the simpler version. :)

This revision is now accepted and ready to land.Dec 2 2021, 11:36 AM
int3 added inline comments.Dec 2 2021, 11:22 PM
lld/MachO/Driver.cpp
324–341

@oontvoo I think you make good points, but I also think that the rare builds relying on this can easily restructure themselves to not have to load the same file multiple times.

Moreover, LLD-ELF already deviates from bfd in several ways, and I'm not sure this additional discrepancy would be that egregious.