Builds that error out on duplicate symbols can still succeed if the symbols
will be dead stripped. Currently, this is the current behavior in ld64.
https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/Resolver.cpp#L2018.
In order to provide an easier to path for adoption, introduce a new flag that will
retain compatibility with ld64's behavior (similar to --deduplicate-literals). This is
turned off by default since we do not encourage this behavior in the linker.
Details
- Reviewers
MaskRay thakis int3 - Group Reviewers
Restricted Project - Commits
- rG58edaef3fe08: [lld-macho] Do not error out on dead stripped duplicate symbols
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
it's far better to align with ld64's behavior to be closer as a drop-in replacement
FWIW, I don't agree with this argument, and our doc listing intentional differences doesn't either. (Not deduping strings by default is a big difference and we're stricter in various other areas too.)
I don't have a problem with the patch itself (people who want the stricter behavior can not dead_strip, and most projects probably don't do dead stripping in debug builds), but you need to come up with a better justification: why is fixing this problem harder than others you have to fix when you want to use lld?
FWIW, I don't agree with this argument, and our doc listing intentional differences doesn't either. (Not deduping strings by default is a big difference and we're stricter in various other areas too.)
I'm not sure if people were aware of this behavior, so it wouldn't be listed as an intentional difference if it's a new finding. Deduping strings is also a performance concern. If LLD had done that, it would slowed down build speed. OTOH, this change has negligible impact to LLD's performance.
why is fixing this problem harder than others you have to fix when you want to use lld?
While it's definitely worth addressing the bad source code issues, there are also third-party libraries that we don't own that requires dead stripping in order for the link to succeed. While I think it's a valiant effort to engage with all those maintainers to fix this bug, it also seems infeasible to fix all of them. That said, would you be more inclined to this change if it were gated behind a flag instead of being the default behavior?
worth addressing the bad source code issues, there are also third-party libraries that we don't own that requires dead stripping in order for the link to succeed. While I think it's a valiant effort to engage with all those maintainers to fix this bug, it also seems infeasible to fix all of them. That said, would you be more inclined to this change if it were gated behind a flag instead of being the default behavior?
Do you have data how much harder these are to fix than (say) the text deduping failures? Like, how many known-broken libraries are we talking?
why is fixing this problem harder than others you have to fix when you want to use lld?
I would argue that reducing barriers to LLD adoption is good, regardless of whether the problem is harder or easier to fix. Even if it's just one extra error, that error might be the one that makes a potential adopter decide that adoption is just too much effort.
(Ngl I have already slightly regretted the choice of not deduplicating strings by default. I'm not convinced it saves us that much link time, and moreover I think it's more likely than not that the average adopter would just add that --deduplicate-literals flag rather than fixing their build.)
I think there's a good argument for having stricter behavior by default, though. My preference would be to support this but gate it under a flag (or possibly have a --ld64-compat flag that enables this behavior as well as --deduplicate-literals.)
Do you have data how much harder these are to fix than (say) the text deduping failures?
I don't have data on how much harder these are to fix since I'm not familiar with most of the third-party libraries in-use. This would require reaching out to the integrators/owners to understand how to fix the offending symbols.
Like, how many known-broken libraries are we talking?
I've kicked off a full re-build of the world, and so far, I'm seeing about 51 failures and counting (at least 11k more in pending). A good chunk of them can be deduplicated to the same issues. Just based on inspection, I'm seeing at least 5 major third-parties that probably need to be fixed. That might seem low, but it's also not trivial to fix legacy related code that also depended on this particular behavior for many years.
With those numbers in mind, are you more open to accepting this change or at least behind some flag to make users easier to transition to LLD? (I can also report back with the final results if you're interested but it would take some time for all the test runs to finish. But tbh, I'm not sure if knowing that changes anything about the above statement I made).
I'm surprised it's that many, but that sounds pretty convincing.
If it's not too much trouble, maybe putting this behind a flag like Jez suggests might be good – code that can _only_ link with -dead_strip seems pretty broken and it might be nice to catch this by default.
Thanks much!
lld/MachO/Config.h | ||
---|---|---|
138 ↗ | (On Diff #464072) | call variable like the flag (deadStripDuplicates) |
lld/MachO/SymbolTable.cpp | ||
---|---|---|
49 | How about "DuplicateSymbolDiag" to parallel "UndefinedDiag"? IMO a "reporter" sounds like something that has nontrivial logic in it, rather than a POD type | |
56 | maybe make this a SmallVector? I think MaskRay has been converting a lot of std::vectors to SmallVector in order to save on binary size | |
97–103 | could consider moving this into reportPendingDuplicateSymbols -- that way builds that have --dead-strip-duplicates won't pay the cost of building these strings that said the cost is probably trivial, so up to you | |
lld/docs/MachO/ld64-vs-lld.rst | ||
18 ↗ | (On Diff #464072) | sorry this is my OCD |
19 ↗ | (On Diff #464072) | or "strips dead code" |
20–21 ↗ | (On Diff #464072) | super nits:
|
Fixed up nits and naming
lld/MachO/SymbolTable.cpp | ||
---|---|---|
97–103 | Did a very minor refactor on the struct to be able to move the string building to reportPendingDuplicateSymbols. In particular, I needed to save the src locations and files since accessing the values directly from defined would have been replaced with incorrect values by the time we got to the reporting stage. |
How about "DuplicateSymbolDiag" to parallel "UndefinedDiag"? IMO a "reporter" sounds like something that has nontrivial logic in it, rather than a POD type