This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Enable AutoHide on symbols with local_unnamed_addr
AbandonedPublic

Authored by steven_wu on Feb 15 2018, 3:26 PM.

Details

Summary

Teach thinLTO to analyze and auto hide local_unnamed_addr symbols if
possible.

Diff Detail

Event Timeline

steven_wu created this revision.Feb 15 2018, 3:26 PM
pcc added a comment.Feb 15 2018, 4:11 PM

The last time this was attempted (D28978) we established that the linker can do this itself. I don't think the situation has changed since then.

In D43361#1009486, @pcc wrote:

The last time this was attempted (D28978) we established that the linker can do this itself. I don't think the situation has changed since then.

I didn't know that we have discussed about this once. This issue comes up again because when thinLTO can promote linkonce_odr to weak_odr (in order to keep the symbol, not to disable auto hide), thus it lost auto hide before the object reaches the linker.

There is an alternative which is to teach linker to deduct auto hide property from the bitcode file and overwrite the linkage type afterwards. For C API, there is a new scope LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN can be used to do that but that will be forced the work onto all the linker which is probably not desirable but it solves the same problem.

pcc added a comment.Feb 15 2018, 6:26 PM

There is an alternative which is to teach linker to deduct auto hide property from the bitcode file and overwrite the linkage type afterwards. For C API, there is a new scope LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN can be used to do that but that will be forced the work onto all the linker which is probably not desirable but it solves the same problem.

This seems better to me. Ideally the linker should be reading and resolving symbols from bitcode files in the same way as regular object files, including the auto-hide bit. Linkers would need something similar for any other properties that are contained only in bitcode files but not in object files (such as a general address-taken bit).

In D43361#1009616, @pcc wrote:

There is an alternative which is to teach linker to deduct auto hide property from the bitcode file and overwrite the linkage type afterwards. For C API, there is a new scope LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN can be used to do that but that will be forced the work onto all the linker which is probably not desirable but it solves the same problem.

This seems better to me. Ideally the linker should be reading and resolving symbols from bitcode files in the same way as regular object files, including the auto-hide bit. Linkers would need something similar for any other properties that are contained only in bitcode files but not in object files (such as a general address-taken bit).

But this is extremely fragile, especially through C API. Linker can never trust 100% of all the information provided in the bitcode because there can be optimizations in the LTO pipeline that modify/add/delete any symbols or attributes. Unless the linker knows about all the optimizations, it is risky for linker to redo the linkage change from the optimizers and expect that there are no side affects. In this case, for example, if someone implement an optimization that can introduce code that taken the address of local_unnamed_addr and remove the attributes, then linker is not allowed to auto hide, even through all the original bitcode says unnamed_addr.

pcc added a comment.Feb 16 2018, 11:39 AM
In D43361#1009616, @pcc wrote:

There is an alternative which is to teach linker to deduct auto hide property from the bitcode file and overwrite the linkage type afterwards. For C API, there is a new scope LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN can be used to do that but that will be forced the work onto all the linker which is probably not desirable but it solves the same problem.

This seems better to me. Ideally the linker should be reading and resolving symbols from bitcode files in the same way as regular object files, including the auto-hide bit. Linkers would need something similar for any other properties that are contained only in bitcode files but not in object files (such as a general address-taken bit).

But this is extremely fragile, especially through C API. Linker can never trust 100% of all the information provided in the bitcode because there can be optimizations in the LTO pipeline that modify/add/delete any symbols or attributes. Unless the linker knows about all the optimizations, it is risky for linker to redo the linkage change from the optimizers and expect that there are no side affects. In this case, for example, if someone implement an optimization that can introduce code that taken the address of local_unnamed_addr and remove the attributes, then linker is not allowed to auto hide, even through all the original bitcode says unnamed_addr.

The unnamed_addr attribute means that the address is semantically significant. If a pass changes whether a global's address is semantically significant, that is a change in semantics and a bug in the pass.

pcc added a comment.Feb 16 2018, 11:40 AM

The unnamed_addr attribute means that the address is semantically significant

I mean, that the address is *not* semantically significant.

I am using local_unnamed_addr as an example. Maybe you are right but I think my point is still valid. That is, LTO optimization and code generation can modify the symbol table from the bitcode file. It is already the case that symbols can be internalized, removed, backend can also insert new symbols. Unless there is a rule saying LTO passes cannot modify the linkage or visibility or any other attributes to the symbols, linker cannot trust whatever the bitcode symbol table says and firmly it will not be changed for some good reason.

In this case, if the unnamed_addr attributes are not illegal to change, the linkage can be changed which effectively disable autohide on the symbol.

espindola edited reviewers, added: espindola; removed: rafael.Mar 15 2018, 8:34 AM

Can this be closed?

espindola requested changes to this revision.Mar 15 2018, 3:27 PM

If LTO does change the IR in some way that would drop local_unnamed_addr, it will not be present in the .o that is passed to the linker.

So the linker can decide if it is valid to hide the symbol by looking at all the input files, bitcode or not, including the ones produced by LTO.

This revision now requires changes to proceed.Mar 15 2018, 3:27 PM

So the original of this patch is thinLTO is promoting linkonce_odr to weak_odr. Think the following two situations:

  1. linkonce_odr + local_unnamed_addr --> linkonce_odr

compiler is dropping unnamed_addr, it is not safe to hide.

  1. linkonce_odr + local_unnamed_addr --> weak_odr + local_unnamed_addr

this is the linkage promotion by thinLTO, it is still 'safe' to hide.

For the linker point of view (at least ld64), both of them started with "weak auto-hide" in IR and turns into "weak" in object file. The only thing like can do safely is export the weak symbol.

So the original of this patch is thinLTO is promoting linkonce_odr to weak_odr. Think the following two situations:

  1. linkonce_odr + local_unnamed_addr --> linkonce_odr

compiler is dropping unnamed_addr, it is not safe to hide.

  1. linkonce_odr + local_unnamed_addr --> weak_odr + local_unnamed_addr

this is the linkage promotion by thinLTO, it is still 'safe' to hide.

The solution is to use canBeOmittedFromSymbolTable from include/llvm/Object/IRSymtab.h.

The symbol table is computed earlier and so the above predicate still returns 1. Your examples work with lld and ThinLTO if you want to check the implementation details.

The solution is to use canBeOmittedFromSymbolTable from include/llvm/Object/IRSymtab.h.

The symbol table is computed earlier and so the above predicate still returns 1. Your examples work with lld and ThinLTO if you want to check the implementation details.

It is easy to get 1 correct. This patch is to get 2 to be hidden using LTO.

Sure, linker can consult the symbol table in IRSymtab but the problem is how much can it be trusted? LTO can already add/remove symbols from IRSymtab. Is changing symbol linkage/visibility legal during LTO?

The solution is to use canBeOmittedFromSymbolTable from include/llvm/Object/IRSymtab.h.

The symbol table is computed earlier and so the above predicate still returns 1. Your examples work with lld and ThinLTO if you want to check the implementation details.

It is easy to get 1 correct. This patch is to get 2 to be hidden using LTO.

Using the IRSymtab also solves 2.

Sure, linker can consult the symbol table in IRSymtab but the problem is how much can it be trusted? LTO can already add/remove symbols from IRSymtab. Is changing symbol linkage/visibility legal during LTO?

It can be trusted. That is what lld and the gold plugin are using. The only theoretical change I can think of is a pass adding a use and dropping local_unnamed_addr, but the linker will see that in the .o file that LTO produces.

I might have missed something. Can you be more specific about how it is going to work?

Let's say IRSymtab has symbol "_foo" and it is weak and it can be omitted from symbol table. After LTO codegen, linker sees that "_foo" is weak in the object file without "auto hide" attribute. How can linker figure out if it should hide the symbol or not?

Actually, how does ELF/COFF linker know that if LTO decides to drop local_unnamed_addr (since only macho has flags to indicate that)? Is lld/gold looking at the object file to figure out if there are uses for the addr for this symbol?

Actually, how does ELF/COFF linker know that if LTO decides to drop local_unnamed_addr (since only macho has flags to indicate that)? Is lld/gold looking at the object file to figure out if there are uses for the addr for this symbol?

Right now the symbol is only dropped if it is used only from IR files. Any use from ELF counts as cannot be hidden. This is indeed the simple case as the symbol can be internalized.

But it does point out that we have an implicit assumption: no pass drops local_unnamed_addr.

If we had a strange_pass that did that, what we would get is

  • Linker gets true for every call to canBeOmittedFromSymbolTable.
  • Linker sets VisibleToRegularObj to false for every file.
  • LLVM internalizes the GV
  • strange_pass adds a use of the address of GV
  • we now have an address dependency that can fail.

I am tempered to just document in the language reference that it is invalid for a pass to drop local_unnamed_addr. With that the linker job is even simpler, it doesn't need to check the LTO produced files, just canBeOmittedFromSymbolTable in very IR symbol table.

Peter, what do you think?

pcc added a comment.Mar 15 2018, 6:19 PM

I am tempered to just document in the language reference that it is invalid for a pass to drop local_unnamed_addr. With that the linker job is even simpler, it doesn't need to check the LTO produced files, just canBeOmittedFromSymbolTable in very IR symbol table.

Peter, what do you think?

I agree. If a pass drops local_unnamed_addr from a global, it means that the pass has changed the program semantics such that the global's address is significant. But passes are not allowed to change semantics, therefore dropping local_unnamed_addr is invalid.

If we all agree that dropping local_unnamed_addr is invalid, I can abandon this. Thanks for the feedback!

steven_wu abandoned this revision.Mar 19 2018, 7:16 PM