This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Do not error out on dead stripped duplicate symbols
ClosedPublic

Authored by thevinster on Sep 27 2022, 11:51 PM.

Details

Summary

Builds that error out on duplicate symbols can still succeed if the symbols
will be dead stripped. Currently, this is the current behavior in ld64.
https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/Resolver.cpp#L2018.
In order to provide an easier to path for adoption, introduce a new flag that will
retain compatibility with ld64's behavior (similar to --deduplicate-literals). This is
turned off by default since we do not encourage this behavior in the linker.

Diff Detail

Event Timeline

thevinster created this revision.Sep 27 2022, 11:51 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 27 2022, 11:51 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
thevinster published this revision for review.Sep 28 2022, 12:06 AM
thevinster edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2022, 12:08 AM
thakis added a subscriber: thakis.Sep 28 2022, 4:27 AM

it's far better to align with ld64's behavior to be closer as a drop-in replacement

FWIW, I don't agree with this argument, and our doc listing intentional differences doesn't either. (Not deduping strings by default is a big difference and we're stricter in various other areas too.)

I don't have a problem with the patch itself (people who want the stricter behavior can not dead_strip, and most projects probably don't do dead stripping in debug builds), but you need to come up with a better justification: why is fixing this problem harder than others you have to fix when you want to use lld?

kyulee added a subscriber: kyulee.Sep 28 2022, 9:40 AM

FWIW, I don't agree with this argument, and our doc listing intentional differences doesn't either. (Not deduping strings by default is a big difference and we're stricter in various other areas too.)

I'm not sure if people were aware of this behavior, so it wouldn't be listed as an intentional difference if it's a new finding. Deduping strings is also a performance concern. If LLD had done that, it would slowed down build speed. OTOH, this change has negligible impact to LLD's performance.

why is fixing this problem harder than others you have to fix when you want to use lld?

While it's definitely worth addressing the bad source code issues, there are also third-party libraries that we don't own that requires dead stripping in order for the link to succeed. While I think it's a valiant effort to engage with all those maintainers to fix this bug, it also seems infeasible to fix all of them. That said, would you be more inclined to this change if it were gated behind a flag instead of being the default behavior?

worth addressing the bad source code issues, there are also third-party libraries that we don't own that requires dead stripping in order for the link to succeed. While I think it's a valiant effort to engage with all those maintainers to fix this bug, it also seems infeasible to fix all of them. That said, would you be more inclined to this change if it were gated behind a flag instead of being the default behavior?

Do you have data how much harder these are to fix than (say) the text deduping failures? Like, how many known-broken libraries are we talking?

int3 added a subscriber: int3.Sep 28 2022, 11:44 AM

why is fixing this problem harder than others you have to fix when you want to use lld?

I would argue that reducing barriers to LLD adoption is good, regardless of whether the problem is harder or easier to fix. Even if it's just one extra error, that error might be the one that makes a potential adopter decide that adoption is just too much effort.

(Ngl I have already slightly regretted the choice of not deduplicating strings by default. I'm not convinced it saves us that much link time, and moreover I think it's more likely than not that the average adopter would just add that --deduplicate-literals flag rather than fixing their build.)

I think there's a good argument for having stricter behavior by default, though. My preference would be to support this but gate it under a flag (or possibly have a --ld64-compat flag that enables this behavior as well as --deduplicate-literals.)

Do you have data how much harder these are to fix than (say) the text deduping failures?

I don't have data on how much harder these are to fix since I'm not familiar with most of the third-party libraries in-use. This would require reaching out to the integrators/owners to understand how to fix the offending symbols.

Like, how many known-broken libraries are we talking?

I've kicked off a full re-build of the world, and so far, I'm seeing about 51 failures and counting (at least 11k more in pending). A good chunk of them can be deduplicated to the same issues. Just based on inspection, I'm seeing at least 5 major third-parties that probably need to be fixed. That might seem low, but it's also not trivial to fix legacy related code that also depended on this particular behavior for many years.

With those numbers in mind, are you more open to accepting this change or at least behind some flag to make users easier to transition to LLD? (I can also report back with the final results if you're interested but it would take some time for all the test runs to finish. But tbh, I'm not sure if knowing that changes anything about the above statement I made).

I'm surprised it's that many, but that sounds pretty convincing.

If it's not too much trouble, maybe putting this behind a flag like Jez suggests might be good – code that can _only_ link with -dead_strip seems pretty broken and it might be nice to catch this by default.

thevinster edited the summary of this revision. (Show Details)

Add flag to gate dead stripped duplicate symbols

thevinster edited the summary of this revision. (Show Details)Sep 29 2022, 3:31 PM
thevinster edited the summary of this revision. (Show Details)

Update comment

thakis accepted this revision.Sep 30 2022, 6:17 AM

Thanks much!

lld/MachO/Config.h
138

call variable like the flag (deadStripDuplicates)

This revision is now accepted and ready to land.Sep 30 2022, 6:17 AM
int3 added inline comments.Sep 30 2022, 12:02 PM
lld/MachO/SymbolTable.cpp
49

How about "DuplicateSymbolDiag" to parallel "UndefinedDiag"? IMO a "reporter" sounds like something that has nontrivial logic in it, rather than a POD type

56

maybe make this a SmallVector? I think MaskRay has been converting a lot of std::vectors to SmallVector in order to save on binary size

97–98

could consider moving this into reportPendingDuplicateSymbols -- that way builds that have --dead-strip-duplicates won't pay the cost of building these strings

that said the cost is probably trivial, so up to you

lld/docs/MachO/ld64-vs-lld.rst
18

sorry this is my OCD

19

or "strips dead code"

20–21

super nits:

  1. ld64's behavior is not a violation of ODR; rather the build is violating ODR and ld64 is not flagging it
  2. we're not maintaining compatibility so much as providing a way for the user to obtain compatible behavior
  3. double backticks because rST (I hate how it's subtly different from markdown but oh well)
int3 accepted this revision.Sep 30 2022, 12:02 PM

And thanks for doing this!

thevinster marked 7 inline comments as done.

Fixed up nits and naming

lld/MachO/SymbolTable.cpp
97–98

Did a very minor refactor on the struct to be able to move the string building to reportPendingDuplicateSymbols. In particular, I needed to save the src locations and files since accessing the values directly from defined would have been replaced with incorrect values by the time we got to the reporting stage.

int3 added inline comments.Sep 30 2022, 2:35 PM
lld/MachO/SymbolTable.cpp
60

maybe rename this to dupSymDiags :)

97–98

gotcha makes sense

thevinster marked an inline comment as done.

Rename dupSymReporter to dupSymDiags

This revision was landed with ongoing or failed builds.Sep 30 2022, 3:10 PM
This revision was automatically updated to reflect the committed changes.
lld/test/MachO/dead-strip.s