In the case your framework bundles contain relocatable objects, and your
objects include LC_LINKER_OPTIONs for the framework, previously they
would not be deduplicated like they would have if they were static
archives. This was also the case if you passed -framework for the
framework as well.
Details
- Reviewers
thakis oontvoo int3 - Group Reviewers
Restricted Project - Commits
- rG187ce07a06f5: [lld-macho] Fix duplicate symbols with relocatable objects
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Hoping for some feedback here. This feels pretty weird in the case that you actually passed the same object multiple times, and you wanted that to surface duplicate symbol errors, but maybe that isn't a big deal.
lld/MachO/Driver.cpp | ||
---|---|---|
254 | nit: no longer applying only to archives | |
lld/test/MachO/framework-object.ll | ||
5 ↗ | (On Diff #390895) | llvm-mc |
11 ↗ | (On Diff #390895) | could you spell out the output here to be either /dev/null or %t/... ? |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | On a second thought, this is suspicious. Doesn't this mean %lld foo.o foo.o wouldn't yield "duplicate symbols" error anymore? |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | Yea I mentioned this with a top level comment as well, it does break the related test. I need to find a better heuristic for skipping these, since we only want to do it for LC_LINKER_OPTION's / duplicate -framework commands etc (not sure if the latter is handled today or not, especially as it conflicts with -framework foo -weak_framework foo etc). I was hoping to do this specifically in the LC_LINKER_OPTION logic to avoid changing things here, but the issue is the order of hitting those options vs the command line options depends on the order things are passed, so I couldn't rely on that. |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | Ah, I've missed the previous comments. Sorry! In any case, I can think of a contrived example where you have the input obj files (standalone or in archives) containing no symbols but contributing to global ctors. In that case, wouldn't we want the side effect of these being loaded individually? |
I pushed another solution here that is pretty complex but I think covers the intended logic. The gist is:
- If something is loaded implicitly, that can happen any number of times
- If something is loaded implicitly, and then explicitly, that's fine, the implicit loads can still happen any number of times
- If something is loaded explicitly, and then implicitly, that's fine, the implicit loads can still happen any number of times
- If something is loaded explicitly multiple times, that is not fine, regardless of implicit loads
Note that for 2 and 3 the only difference is order based on whether the object file comes first or the -framework flag.
I would be interested in hearing how other folks think we can simplify this
lld/MachO/Driver.cpp | ||
---|---|---|
261–262 | this seems a bit awkward since a) loadedObjects now contains both objects and archives and b) we are querying this map even if the file ends up being a dylib. Can we have separate loadedArchives and loadedObjects maps? | |
323–325 | I don't think it'd be terribly hard to handle this, we'd just need an extra boolean to addFile to indicate if a file is being loaded via add{Framework,Library} or if it's being loaded directly via a path. That said... I'm not sure there's much value in handling this edge case. I'm more than happy to just dedup everything and document this behavioral difference in ld64-vs-lld.rst.
What does 'contributing to global ctors' mean exactly? | |
lld/test/MachO/framework-object.ll | ||
1 ↗ | (On Diff #390895) | not sure this test deserves its own file... seems like we could fold it into framework.s and lc-linker-option.ll |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | FWIW tensorflow is how I discovered this. They produce an object file in a framework and you get corresponding linker options loading it if you include it from swift. So our app doesn't link in that case. On the swift side there is a private flag to exclude the linker option, but that's not ideal |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 |
Just to be clear, I meant there's no much value in raising duplicate symbol errors when an object file is specified twice on the command line. We should definitely handle object files in frameworks that get loaded multiple times |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 |
Ok, agreed there's no value in raising duplicate error (that's just the side effect of not loading the file again)
The "contrived" case I'm talking about is this: // main.c #include <stdio.h> void a_ctor() { // do stuff printf("ctor called\n"); } int main() {return 0; } // my_init.s // This file defines no symbol - but it sticks the _a_ctor into the global ctors list L_.str: .section __DATA,__mod_init_func,mod_init_funcs .p2align 3 .quad _a_ctor Link it ld <other args> init.o init.o main.o Run it: $ ./my_main.out ctor called ctor called With this patch, it'll only execute the ctor once | |
lld/test/MachO/framework-object.ll | ||
14–15 ↗ | (On Diff #391177) | here too? |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | oh right I see what you mean. yeah seems very edge-case-y... |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | In normal applications, this might be an edge case, yes, but it's not uncommon to see all kinds of creative usages of custom init (eg., the sanitizers do it all the time). To think of it another way, if we were to propose this change to other ports (eg ELF), do you think they'd accept it? But I've said my piece and it seems your position has the popular vote here :P so ... ("this change" specifically being "if you see multiple identical input paths then, only look at one of them") |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | I believe LLD COFF already deduplicates input files (but I also think they're just emulating Microsoft's linker's behavior there). |
LG
(Please see the requested changes in the test file, though. Thanks!)
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | That's fair. |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | @oontvoo I think you make good points, but I also think that the rare builds relying on this can easily restructure themselves to not have to load the same file multiple times. Moreover, LLD-ELF already deviates from bfd in several ways, and I'm not sure this additional discrepancy would be that egregious. |
lld/MachO/Driver.cpp | ||
---|---|---|
254 | I'm not sure what this was referring to, but I believe it was a comment I had in a previous iteration, let me know if you meant something else! | |
323–325 | Sorry to resurrect this one, but I think with my current implementation of tracking implicit vs explicit, we get the best of both worlds here, I tested the ctors example above and it gets called twice as expected, but duplicate symbols still get reported as expected as well. Let me know if you have any concerns with it at this point though! | |
336 | I'm loosely considering not supporting this just to reduce the complexity here in general, since I think this case is much less likely to happen, but I don't have a strong preference, happy to hear thoughts from others | |
lld/test/MachO/framework-object.ll | ||
1 ↗ | (On Diff #390895) | Happy to, what's the line for a new file or not? I guess I was thinking separate was nicer just because files with a ton of unrelated RUNs gets a bit harder to navigate IMO. But also not having a clear name for this one did feel a bit weird |
5 ↗ | (On Diff #390895) | Are you saying I should rewrite this to use asm for this case instead? Or is there some version of that command that can do this same conversion? AFAICT if the source is either, I have to use multiple tools to get an object one time, and bitcode the other time. |
lg thanks! (please run clang-format )
lld/test/MachO/framework-object.ll | ||
---|---|---|
5 ↗ | (On Diff #390895) | actually ignore this - I'd previously thought llc wasn't on the list of available tools for test, hence I was suggesting using llvm-mc instead. |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | Hm, I think the current implementation looks a bit complicated and may have subtle bugs. For instance, what happens if a library is loaded implicitly first, and then subsequently is loaded explicitly twice in a row? I *think* LLD would end up loading it only once with this implementation... A possibly simpler implementation would be to hoist DylibFile::explicitlyLinked into the parent InputFile class, then combine the two loaded*Objects hashmaps into one. Personally though I'd prefer a simpler always-dedup implementation -- there'd be fewer potential bugs to avoid :) | |
336 | I think consistency is nice. (I'm assuming ld64 also dedups bitcode?) | |
lld/test/MachO/framework-object.ll | ||
1 ↗ | (On Diff #390895) |
Yeah that's kind of my bar. I try to have test files that are conceptually disjoint, so it's easy to decide which file to add future tests to |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | While I'd prefer the LD64's behaviour, I think I agree with int3 that simple implementation here is better. So my vote would have been: (a) is clearly not a good idea, hence 👍 for (b) |
lld/MachO/Driver.cpp | ||
---|---|---|
323–325 | I just pushed a different approach here. I let us get a bit off track because of how this was implemented, as far as I know at this point this is only a problem with frameworks, and the only reason I had implemented it here was because this is where we already had the info to know if it was one of these cases. I just pushed hoisting this out into addFramework itself instead, which reads the magic there and makes this decision and caching. This shouldn't have any performance impact on reading twice since the first time we read the file we cache it in memory anyways, so in the case we don't do anything, and immediately call addFile, it should still be fine. Let me know what you think! |
yeah, a more limited-scope change seems like a good idea :)
lld/MachO/Driver.cpp | ||
---|---|---|
391 | Q: why the name "container"? | |
402 | is this cleaner than caching all frameworks, regardless of file type? Or would we actually behave differently if this cached all files? (just curious, not asking you to change it. But we should probably have a comment here about why we're only caching object and bitcode files) | |
411 | nit: no need for braces |
lld/MachO/Driver.cpp | ||
---|---|---|
402 | Similar topic, is it better to check the cache first? (before even bothering to read the content of the file?) |
lld/MachO/Driver.cpp | ||
---|---|---|
391 | I don't feel strongly about it, totally up for suggestions, my thought here was that unlike dylib frameworks where the framework structure is long lived, for these other file types the framework structure just serves as an ephemeral container for the binary that's then linked in and the framework is practically discarded | |
402 |
I was careful to exclude dylibs from this because theoretically their isNeeded and other attrs could flip (I'm not sure how gracefully that's handled today), and I assume in that case we want to be careful about letting new loads overwrite previous loads? For the other file types I was just trying to reduce the radius of this logic, since archives for example have separate caching logic in addFile | |
402 |
Great point! Moving the cache checking up here greatly simplifies the implementation, since we no longer have to do the magic reading at all this way! Which seems obvious in retrospect |
lgtm, thanks for taking the time to hash out the different approaches!
lld/MachO/Driver.cpp | ||
---|---|---|
391 | hmm. that's... quite a few steps of thinking :D how about loadedObjectFrameworks? | |
402 | sgtm. Can we have a comment here that mentions how dylib and archives and separate handling? | |
411 | so actually we avoid braces when it's a single-line if statement, but if we are looking at else if, the convention is to use braces for it if any other part of the if-else chain requires braces. Sorry to make you go back and forth... | |
lld/test/MachO/lc-linker-option.ll | ||
85 |
Thanks for all the feedback folks, this is definitely better than my original version!
lld/MachO/Driver.cpp | ||
---|---|---|
410 | file can be null here (e.g. if loading an invalid tbd file), which now makes lld crash with Assertion failed: (Val && "isa<> used on a null pointer"), function doit, file Casting.h, line 104. Could you fix this? |
lld/MachO/Driver.cpp | ||
---|---|---|
410 | Thanks! https://reviews.llvm.org/D124271 |
nit: no longer applying only to archives