This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ARM] Add support for thunks to undefined non-weak symbols
ClosedPublic

Authored by peter.smith on Dec 13 2016, 3:18 AM.

Details

Reviewers
ruiu
rafael
Summary

In a shared library an undefined symbol is implicitly imported. If the symbol is called as a function a PLT entry is generated for it. When the caller is a Thumb b.w a thunk to the PLT entry is needed as all PLT entries are in ARM state.

This change allows undefined symbols to have thunks in the same way that shared symbols may have thunks.

I found this after finding segfaults in the Thumb2 release build of various python C extensions. The extensions are built as shared objects
that call back to the python interpreter. The python interpreter symbols are implicitly imported as undefined symbols. If a tail call from Thumb2 is used b.w then the PLT entry (ARM) was entered in Thumb state causing a segfault.

Adding thunks to undefined symbols is straightforward, although it does have the side-effect of adding a template parameter which requires some unrelated code to be updated. I considered trying to convert the undefined symbol to a shared symbol, but decided against it on the grounds that that this might cause problems later on if the non-existent InputFile was ever accessed.

Diff Detail

Event Timeline

peter.smith retitled this revision from to [LLD][ARM] Add support for thunks to undefined non-weak symbols.
peter.smith updated this object.
peter.smith added reviewers: ruiu, rafael.
peter.smith added a subscriber: llvm-commits.
ruiu edited edge metadata.Dec 13 2016, 4:35 PM

The concept of "undefined symbols with thunks" doesn't click in my mind. It feels like a null pointer but with some data. Can we handle such undefined symbols as a different type? For example, what if we promote (or replace) undefined symbols with shared symbols (or defined symbols) after we resolve all symbols if we are creating an DSO?

I think it depends on what the precise meaning of Shared and Undefined. At the moment I think the distinction is:
Shared:

  • The symbol is defined in a known shared object.
  • The symbol details such as type, binding and version are known.
  • The shared object has been added as a dependency to the output.
  • The shared object defining the symbol can be found from the SymbolBody.

Undefined:

  • We have a reference to a symbol but no definition.
  • When linking a shared library we assume that the dynamic loader will find a suitable definition from the executable/shared library or error.
  • In all other cases (besides relocatable -r) an undefined non-weak symbol is an error.
  • There is no input file associated with the symbol.

With those definitions I don't think that there is a problem with Thunks on undefined symbols. There is a call to a symbol that we know nothing about and we are going to rely on the dynamic loader to sort it out for us. As we need the dynamic loader we will necessarily need a PLT entry. The Thunk is from the caller to the PLT entry.

If we make the change to say that in Shared Libraries Undefined symbols become Shared there will be some implications. I think the definition of Shared has to be weakened to something closer to Undefined and any calling code that needs to know about the distinctions will have to do some extra work. For example if we have some diagnostics that print out the shared library name for the Shared symbol will need to test to see if it actually has an associated file.

Other possibilities:

  • A SharedUndefined symbol kind that could replace the if (body.isUndefined() && body.isShared()) with body.isUndefinedShared(), that could support a Thunk and we could keep Undefined to mean error if not defined.
  • Don't solve this problem right now and start work on range-extension thunks. As there can be more than one Thunk per symbol with range extension thunks we'll need to do something differently anyway.

I'm happy to try out the alternatives. I'm on vacation from Monday till the end of the year so I'll probably only be able to get a small change done by then.

ruiu added a comment.Dec 14 2016, 1:53 PM

Thank you for the explanation. I think I'm convinced that undefined symbols with thunks are not really that odd concept.

ELF/Symbols.h
248

Can you also mention that remaining undefined symbols when creating a DSO are resolved by the dynamic linker, and relocation against such undefined symbols need thunk in some architecture such as AArch64?

ELF/Target.cpp
1734

This code is subtle. So, if there is a remaining undefined symbol with a weak linkage when creating a DSO, we'll let the dynamic linker resolve it, instead of handling it as an undefined symbol. Is this consistent with other code in LLD and other linkers? It would be nice to add a comment to clarify.

peter.smith edited edge metadata.

Thanks for the update. I've added the comments.

I've also checked the behaviour of undefined weak references and both gold and lld will create PLT entries for calls to weak references in shared libraries. As a result I've updated the condition and added the behaviour to the test.

peter.smith updated this revision to Diff 82861.Jan 3 2017, 3:43 AM

Rebased patch to account for last couple of weeks changes, no functional changes. Happy New Year!

I don't think that there are any outstanding review comments left to address. Let me know if there is anything else?

ruiu accepted this revision.Jan 3 2017, 3:24 PM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jan 3 2017, 3:24 PM
peter.smith closed this revision.Jan 4 2017, 2:14 AM

Thanks for the review, committed revision 290951