This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Ensure objects from static libraries linked in correct order
AbandonedPublic

Authored by tejohnson on Jan 10 2017, 11:49 AM.

Details

Reviewers
pcc
mehdi_amini
Summary

I recently discovered that ThinLTO native objects links sometimes
occur in a different order than the objects were selected out of
static libraries by the native linker. The ThinLTO ModuleMap
is built (e.g. for gold-plugin) in the original order the objects are
listed in the library. However, the linker may select for inclusion
a later object in a static library first, based on the strong references
it satisfies. It's copies of weak symbols may then be prevailing instead
of an earlier object in the same library. However, the native objects
are linked in the order they were added to the ModuleMap, which would
result in the wrong copy of a weak function being selected in the final
native link for ThinLTO, resulting in incorrect behavior.

Since the linker doesn't provide the native object ordering to the
plugin, we can deduce the correct ordering among modules defining
the same weak symbol based on the prevailing symbols
in the combined index. With this patch we use a stable topological
sort to sort the ModuleMap based on the partial ordering provided by
the combined index.

Event Timeline

tejohnson updated this revision to Diff 83840.Jan 10 2017, 11:49 AM
tejohnson retitled this revision from to [ThinLTO] Ensure objects from static libraries linked in correct order.
tejohnson updated this object.
tejohnson added reviewers: mehdi_amini, pcc.
tejohnson added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Jan 10 2017, 11:58 AM

I'm not fond of this patch, this does not seems it should belong here: the client of the API is providing the module in order, we should just return them back in the same order. Up to the client to handle this correctly.

lib/LTO/LTO.cpp
839

That assuming that it is possible, i.e. that a linker will always ask for the first symbol to be the prevailing one (if it is not the case you may not be able to generate a topological sort)

pcc edited edge metadata.Jan 10 2017, 11:59 AM

Couldn't this be fixed more simply by dropping the losing weak definitions?

I like better what @pcc said :)

In D28523#641676, @pcc wrote:

Couldn't this be fixed more simply by dropping the losing weak definitions?

Maybe, maybe not. A few thoughts:

  • Right now we don't drop non-prevailing LinkOnceAny or WeakAny to available_externally in thinLTOResolveWeakForLinkerGUID, but I suppose we could since although they are not guaranteed to be identical (as in the ODR case), we know which is prevailing and it can't be inlined anyway (isInterposableLinkage).
  • We can't always drop to available_externally, e.g. in the case where this is an alias or aliasee.
  • The original case that motivated my looking into this was more complicated, and in fact we did drop the non-prevailing copies. In that case we had 2 linkonce_odr copies, and both were in comdats. We dropped the non-prevailing copy to available_externally, but it was first in the static library so currently linked first in the native link. The problem was that the comdat group in the module with the non-prevailing copy was still non-empty, it looked like:

COMDAT group section [ 238] `.group' [_ZN4foo_E] contains 2 sections:

[Index]    Name
[  239]   .ctors
[  240]   .rela.ctors

With the wrong object ordering on the final link, this partial comdat group was selected and the one containing the intended prevailing copies was completely removed. (I don't have the final IR handy anymore, but the symbol was originally in the llvm.global_ctors list so presumably that's why we still end up with a comdat group like that?)

Perhaps there is a way to fix my original issue by removing the comdat and all of its constituents when we mark a symbol as available_externally in the backend, but I haven't thought through this too much since discovering the weak symbol issue in the test case here that is also related to the final native link order issue. Fixing that issue by dropping the non-prevailing weak symbol requires also fixing the comdat issue, as well as addressing the limitation we have on dropping non-prevailing copies that are involved with aliases (requires duplication of the aliased definition I believe).

In general, it seems like we're asking for trouble by performing the final native link in a different order than the linker decided on during the thin link. That being said, I don't love this solution, but I didn't think of a better way to determine that ordering.

tejohnson abandoned this revision.Jan 14 2017, 6:07 PM

After discussions on IRC, we've decided to pursue a different set of fixes for this and related issues. Essentially forcing the second link to have the same resolution, by aggressively dropping any non-prevailing defs. This will require a few fixes, such as enabling dropping non-prevailing non-odr weak/linkonce, enabling dropping non-prevailing weak involved in an alias, and ensuring comdats with non-prevailing copies are removed. The first new patch relates to the last issue (D28737).