This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Perform conservative weak/linkonce resolution in distributed backend case
AbandonedPublic

Authored by tejohnson on Jul 14 2016, 7:43 AM.

Details

Summary

One tricky aspect of the weak/linkonce resolution in the distributed
backend case occurs if the link involves --start-lib/--end-lib with some
of the object files. Because there are two separate links (the ThinLink
and then the final native link), depending on the intervening importing
and inlining we can get into a situation where the linkonce selected
as prevailing in the ThinLink is no longer linked in by the second link.
The linker will only pull symbols from an archive library, in this
case formed via --start-lib/--end-lib, if there is a strong reference
to a symbol in that library from a library/object listed earlier in the
link, which may no longer be the case after importing etc. Note that
the gold-plugin does not know whether the objects are in a library
formed by --start-lib/--end-lib.

To handle this, under thinlto_index_only (which indicates we will have separate
ThinLTO processes), a new flag to thinLTOResolveWeakForLinkerInIndex and
a change to the isPrevailing callback conservatively ensure that otherwise non-prevailing
linkonce/weak are kept and also converted to weak when exported.

The new thinlto_preserve_nonprevailodr.ll test case ensures this works.

Diff Detail

Event Timeline

tejohnson updated this revision to Diff 63972.Jul 14 2016, 7:43 AM
tejohnson retitled this revision from to [ThinLTO] Perform index-based weak/linkonce resolution in import pass.
tejohnson updated this object.
tejohnson added a reviewer: mehdi_amini.
tejohnson added a subscriber: llvm-commits.

As I just noted in D21545, that patch will have the side effect of enabling the linkonce/weak resolution for the distributed backend case. I like the way that patch adds the resolution before invoking the importer, instead of doing it via the FunctionImporter pass as is done here. However, it will expose the bug I am fixing here with the new PreserveNonPrevailing flag. I think I should probably change this patch just to fix that bug and add the new test case, and submit that just after D21545 goes in.

As such, please review the patch ignoring the changes to Transforms/IPO/FunctionImport.cpp, which will go away once D21545 goes in.

pcc added inline comments.Jul 14 2016, 11:56 AM
include/llvm/LTO/LTO.h
71

Instead of adding this parameter (and adding back isExported), can you simply pass in an isPrevailing that always returns true?

tejohnson added inline comments.Jul 14 2016, 12:39 PM
include/llvm/LTO/LTO.h
71

Good idea on changing the IsPrevailing callback to avoid passing IsExported. However we still need the flag to avoid converting to available externally in the non exported case. We need to keep link once as link once e.g. when there is a ref in the same module in case it isn't inlined (and the previous prevailing copy is no longer linked in).

As I just noted in D21545, that patch will have the side effect of enabling the linkonce/weak resolution for the distributed backend case. I like the way that patch adds the resolution before invoking the importer, instead of doing it via the FunctionImporter pass as is done here. However, it will expose the bug I am fixing here with the new PreserveNonPrevailing flag. I think I should probably change this patch just to fix that bug and add the new test case, and submit that just after D21545 goes in.

As such, please review the patch ignoring the changes to Transforms/IPO/FunctionImport.cpp, which will go away once D21545 goes in.

Actually, I just realized that I can simulate the effect of D21545 without my FunctionImport pass changes via the right sequence of llvm-lto -thinlto-action invocations. Will rework the patch that way so that it can go in and hopefully fix this issue before D21545 goes in and exposes it.

pcc added inline comments.Jul 14 2016, 1:27 PM
include/llvm/LTO/LTO.h
71

However we still need the flag to avoid converting to available externally in the non exported case. We need to keep link once as link once e.g. when there is a ref in the same module in case it isn't inlined (and the previous prevailing copy is no longer linked in).

In the non-exported case, aren't we changing the linkage to internal anyway? That was what D21883 was about.

Anyway, I'll wait and see what your updated patch looks like.

tejohnson retitled this revision from [ThinLTO] Perform index-based weak/linkonce resolution in import pass to [ThinLTO] Perform conservative weak/linkonce resolution in distributed backend case.Jul 14 2016, 7:50 PM
tejohnson updated this object.
tejohnson updated this revision to Diff 64081.Jul 14 2016, 7:51 PM

Rebase and address review comments:

  • Fold the isExported check into the isPrevailing callback from

gold-plugin, to avoid needing to pass that callback.

  • Remove FunctionImport.cpp changes, since they will be obviated by

D21545.

  • Rework test case to use llvm-lto for weak symbol resolution and

importing, to simulate the behavior of the ThinLTO backend invoked by
clang after D21545.

mehdi_amini edited edge metadata.Jul 18 2016, 3:24 PM

Right now it is not clear to me why it would be legit (understand: if it should be supported) to perform the second link using static archive.
The first link already selected object from the archives, so we should be able to provide a list of objects to the second/final link.

Right now it is not clear to me why it would be legit (understand: if it should be supported) to perform the second link using static archive.
The first link already selected object from the archives, so we should be able to provide a list of objects to the second/final link.

That's a good question and an idea I thought about briefly but discarded for a couple reasons. I was concerned about requiring communication between the ThinLink and final link to build the link line (it would be more difficult to support in a build system, and also seems conceptually more complicated). Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior. But maybe I just need to think through that some more...

Added davidxl for thoughts on changing the final link line as Mehdi suggested.

That's a good question and an idea I thought about briefly but discarded for a couple reasons. I was concerned about requiring communication between the ThinLink and final link to build the link line (it would be more difficult to support in a build system, and also seems conceptually more complicated).

How is the final link invocation computed right now?

Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior.

Your observations about the linker picking different symbols seem to indicate that the --start-lib/--end-lib model is already broken.
If a list of .o on the command line is not enough for relinking, there's gonna be a need for a "linker resolution map" file that drives the linker.

That's a good question and an idea I thought about briefly but discarded for a couple reasons. I was concerned about requiring communication between the ThinLink and final link to build the link line (it would be more difficult to support in a build system, and also seems conceptually more complicated).

How is the final link invocation computed right now?

The link line is essentially the same, but with native .o instead of the bitcode .o. (See the new test case for an example)

Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior.

Your observations about the linker picking different symbols seem to indicate that the --start-lib/--end-lib model is already broken.

When you say "is already broken" do you mean even in non-ThinLTO mode? I'm not sure why - it is just like having an archive of the objects between each start/end pair.

If a list of .o on the command line is not enough for relinking, there's gonna be a need for a "linker resolution map" file that drives the linker.

In ThinLTO it is because of the change (between the ThinLink and native object link) in which strong references exist between objects/libraries due to importing and inlining. But I believe with this patch and the follow-on D22467 the importing and symbol resolution is made suitably conservative.

davidxl edited edge metadata.Jul 18 2016, 5:35 PM

How about the following solution:

  1. in second link, force referencing symbols that are marked as prevailing definitions

or

  1. for distributed build mode, do not even do prevailing symbol selection in the first link. The side effect is the same as what this patch does -- slightly increased object file size (not the final binary size).

That's a good question and an idea I thought about briefly but discarded for a couple reasons. I was concerned about requiring communication between the ThinLink and final link to build the link line (it would be more difficult to support in a build system, and also seems conceptually more complicated).

How is the final link invocation computed right now?

The link line is essentially the same, but with native .o instead of the bitcode .o. (See the new test case for an example)

The question is: in the presence of static archives, how do you generates --start-lib/--end-lib? This seems to already require some build-system integration?

Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior.

Your observations about the linker picking different symbols seem to indicate that the --start-lib/--end-lib model is already broken.

When you say "is already broken" do you mean even in non-ThinLTO mode? I'm not sure why - it is just like having an archive of the objects between each start/end pair.

I'm only talking about ThinLTO and the two-stage linking, i.e. the second invocation of the linker does not end-up with the same prevailing resolution as the first invocation. Your current patches are working around this deficiency.

If a list of .o on the command line is not enough for relinking, there's gonna be a need for a "linker resolution map" file that drives the linker.

In ThinLTO it is because of the change (between the ThinLink and native object link) in which strong references exist between objects/libraries due to importing and inlining. But I believe with this patch and the follow-on D22467 the importing and symbol resolution is made suitably conservative.

I don't see any justification for --start-lib/--end-lib right now.

That's a good question and an idea I thought about briefly but discarded for a couple reasons. I was concerned about requiring communication between the ThinLink and final link to build the link line (it would be more difficult to support in a build system, and also seems conceptually more complicated).

How is the final link invocation computed right now?

The link line is essentially the same, but with native .o instead of the bitcode .o. (See the new test case for an example)

The question is: in the presence of static archives, how do you generates --start-lib/--end-lib? This seems to already require some build-system integration?

We use --start-lib/--end-lib in all of our links - not regular archive libraries. Yes, a distributed build on regular archive libraries will require build system integration to extract the individual object files first.

Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior.

Your observations about the linker picking different symbols seem to indicate that the --start-lib/--end-lib model is already broken.

When you say "is already broken" do you mean even in non-ThinLTO mode? I'm not sure why - it is just like having an archive of the objects between each start/end pair.

I'm only talking about ThinLTO and the two-stage linking, i.e. the second invocation of the linker does not end-up with the same prevailing resolution as the first invocation. Your current patches are working around this deficiency.

If a list of .o on the command line is not enough for relinking, there's gonna be a need for a "linker resolution map" file that drives the linker.

In ThinLTO it is because of the change (between the ThinLink and native object link) in which strong references exist between objects/libraries due to importing and inlining. But I believe with this patch and the follow-on D22467 the importing and symbol resolution is made suitably conservative.

I don't see any justification for --start-lib/--end-lib right now.

We use --start-lib/--end-lib internally instead of regular objects. So it is not a matter of justification, it is a matter of keeping that working.

I don't see any justification for --start-lib/--end-lib right now.

We use --start-lib/--end-lib internally instead of regular objects. So it is not a matter of justification, it is a matter of keeping that working.

Sorry, I meant "instead of regular archives".

The link line is essentially the same, but with native .o instead of the bitcode .o. (See the new test case for an example)

The question is: in the presence of static archives, how do you generates --start-lib/--end-lib? This seems to already require some build-system integration?

We use --start-lib/--end-lib in all of our links - not regular archive libraries. Yes, a distributed build on regular archive libraries will require build system integration to extract the individual object files first.

Ok I see, makes sense, I thought you were avoiding re-creating the static library and emulating it with these options.

Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior.

Your observations about the linker picking different symbols seem to indicate that the --start-lib/--end-lib model is already broken.

When you say "is already broken" do you mean even in non-ThinLTO mode? I'm not sure why - it is just like having an archive of the objects between each start/end pair.

I'm only talking about ThinLTO and the two-stage linking, i.e. the second invocation of the linker does not end-up with the same prevailing resolution as the first invocation. Your current patches are working around this deficiency.

If a list of .o on the command line is not enough for relinking, there's gonna be a need for a "linker resolution map" file that drives the linker.

In ThinLTO it is because of the change (between the ThinLink and native object link) in which strong references exist between objects/libraries due to importing and inlining. But I believe with this patch and the follow-on D22467 the importing and symbol resolution is made suitably conservative.

I don't see any justification for --start-lib/--end-lib right now.

We use --start-lib/--end-lib internally instead of regular objects. So it is not a matter of justification, it is a matter of keeping that working.

I don't believe this is relevant: the fact that the first link is taking libraries as an input does not make it a compelling case to use them for the second link. Static libraries or start-lib/end-lib are a specific semantic model, and I believe it is just wrong to pass them to the final link.

The reason is that the first link is performing linker resolution: this decision process carry some specific semantic with archives. After this resolution and the ThinLTO process, there is no reason that makes sense to me right now to repeat this process.

It is possible that it is because I have a different mental model of static archives right now. AFAIK, the semantic difference between plain objects and archive is that an object defined in an archive is loaded and selected by the linker only if one the symbol it defines is referenced.

Keeping the linker semantic with ThinLTO means that the objects and symbols selected during the first link should be the "source of truth": i.e. we don't want a different linker resolution during the second link. Every objects that was selected for the first link should be included in the second link (hence it is wrong to use --start-lib/--end-lib).

Also, the distributed build system probably needs to handle the case where an object in the archive was not selected to be part of the link at all, won't be processed by ThinLTO, and there won't be any object to pass to the final link. I'm not sure how you're handling this with gold right now though.

The link line is essentially the same, but with native .o instead of the bitcode .o. (See the new test case for an example)

The question is: in the presence of static archives, how do you generates --start-lib/--end-lib? This seems to already require some build-system integration?

We use --start-lib/--end-lib in all of our links - not regular archive libraries. Yes, a distributed build on regular archive libraries will require build system integration to extract the individual object files first.

Ok I see, makes sense, I thought you were avoiding re-creating the static library and emulating it with these options.

Also I'm not 100% convinced that removing the --start-lib/--end-lib, even if we only include those object files the linker decided to select symbols from, would result in the same linking behavior.

Your observations about the linker picking different symbols seem to indicate that the --start-lib/--end-lib model is already broken.

When you say "is already broken" do you mean even in non-ThinLTO mode? I'm not sure why - it is just like having an archive of the objects between each start/end pair.

I'm only talking about ThinLTO and the two-stage linking, i.e. the second invocation of the linker does not end-up with the same prevailing resolution as the first invocation. Your current patches are working around this deficiency.

If a list of .o on the command line is not enough for relinking, there's gonna be a need for a "linker resolution map" file that drives the linker.

In ThinLTO it is because of the change (between the ThinLink and native object link) in which strong references exist between objects/libraries due to importing and inlining. But I believe with this patch and the follow-on D22467 the importing and symbol resolution is made suitably conservative.

I don't see any justification for --start-lib/--end-lib right now.

We use --start-lib/--end-lib internally instead of regular objects. So it is not a matter of justification, it is a matter of keeping that working.

I don't believe this is relevant: the fact that the first link is taking libraries as an input does not make it a compelling case to use them for the second link. Static libraries or start-lib/end-lib are a specific semantic model, and I believe it is just wrong to pass them to the final link.

The reason is that the first link is performing linker resolution: this decision process carry some specific semantic with archives. After this resolution and the ThinLTO process, there is no reason that makes sense to me right now to repeat this process.

But it shouldn't be a correctness issue to do so.

It is possible that it is because I have a different mental model of static archives right now. AFAIK, the semantic difference between plain objects and archive is that an object defined in an archive is loaded and selected by the linker only if one the symbol it defines is referenced.

That is my understanding as well.

Keeping the linker semantic with ThinLTO means that the objects and symbols selected during the first link should be the "source of truth": i.e. we don't want a different linker resolution during the second link. Every objects that was selected for the first link should be included in the second link (hence it is wrong to use --start-lib/--end-lib).

I disagree that it should be wrong from a correctness point to do the final link with the same options.

I'm not convinced this is a better approach. It makes the build system's job more complicated as noted earlier, and requires it for correctness. I think it is preferable to have the correctness managed by the compiler itself where it is doing the importing and linkonce resolution in the first place, using the necessary conservative behavior in this situation.

Also, the distributed build system probably needs to handle the case where an object in the archive was not selected to be part of the link at all, won't be processed by ThinLTO, and there won't be any object to pass to the final link. I'm not sure how you're handling this with gold right now though.

The gold plugin processes all the bitcode files, and gold simply tells it which symbols will be prevailing and which are preempted. So there is always a resulting object file to pass to the final link.

The reason is that the first link is performing linker resolution: this decision process carry some specific semantic with archives. After this resolution and the ThinLTO process, there is no reason that makes sense to me right now to repeat this process.

But it shouldn't be a correctness issue to do so.

Cf answer to David by email.

Also, the distributed build system probably needs to handle the case where an object in the archive was not selected to be part of the link at all, won't be processed by ThinLTO, and there won't be any object to pass to the final link. I'm not sure how you're handling this with gold right now though.

The gold plugin processes all the bitcode files, and gold simply tells it which symbols will be prevailing and which are preempted. So there is always a resulting object file to pass to the final link.

I don't have gold to verify, but can you confirm what happens with:

A.cpp:
int main() {}
B.cpp:
void foo() {}

when built with:

clang -c -flto -c A.cpp B.cpp
clang -flto A.o --start-lib B.o --end-lib -Wl,-save-temps

(Not sure what is the right options to save the LTO bitcode file).

Will B.o be sent to LTO? Will it be merged with A.o?
Similarly, repeating the process with ThinLTO, will B.o be part of the index? (That'd be a semantic break).

tejohnson abandoned this revision.Jul 19 2016, 6:17 AM

The reason is that the first link is performing linker resolution: this decision process carry some specific semantic with archives. After this resolution and the ThinLTO process, there is no reason that makes sense to me right now to repeat this process.

But it shouldn't be a correctness issue to do so.

Cf answer to David by email.

Yes, as responded there that is a compelling example for doing this via the build system. Abandoning this revision while I work on that solution.

Also, the distributed build system probably needs to handle the case where an object in the archive was not selected to be part of the link at all, won't be processed by ThinLTO, and there won't be any object to pass to the final link. I'm not sure how you're handling this with gold right now though.

The gold plugin processes all the bitcode files, and gold simply tells it which symbols will be prevailing and which are preempted. So there is always a resulting object file to pass to the final link.

I don't have gold to verify, but can you confirm what happens with:

A.cpp:
int main() {}
B.cpp:
void foo() {}

when built with:

clang -c -flto -c A.cpp B.cpp
clang -flto A.o --start-lib B.o --end-lib -Wl,-save-temps

(Not sure what is the right options to save the LTO bitcode file).

Will B.o be sent to LTO? Will it be merged with A.o?
Similarly, repeating the process with ThinLTO, will B.o be part of the index? (That'd be a semantic break).

Previously it was including B.o in the index (with the old binutils I have installed I see that behavior). However, eugenis recently fixed this for the LTO side, via a new callback in gold and support in gold-plugin to skip the resulting files (see https://reviews.llvm.org/rL262676). And I can see that with his changes to the plugin this should avoid putting B.o in the index (will update my binutils shortly to confirm).