This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO][LTO][Legacy] Fix dependent libraries support by adding querying of the IRSymtab
ClosedPublic

Authored by bd1976llvm on Jun 5 2019, 4:44 PM.

Details

Summary

Dependent libraries support for the legacy api was committed in a broken state (see: https://reviews.llvm.org/D60274). This was missed due to the painful nature of having to integrate the changes into a linker in order to test. This change implements support for dependent libraries in the legacy LTO api:

  • I have removed the current api function, which returns a single string, and added functions to access each dependent library specifier individually.
  • To reduce the testing pain, I have made the api functions as thin as possible to maximize coverage from llvm-lto.
  • When doing ThinLTO the system linker will load the modules lazily when scanning the input files. Unfortunately, when modules are lazily loaded there is no access to module level named metadata. To fix this I have added api functions that allow querying the IRSymtab for the dependent libraries. I hope to expand the api in the future so that, eventually, all the information needed by a client linker during scan can be retrieved from the IRSymtab.

Diff Detail

Repository
rL LLVM

Event Timeline

bd1976llvm created this revision.Jun 5 2019, 4:44 PM
Herald added a project: Restricted Project. · View Herald Transcript
ormris added a subscriber: ormris.Jun 5 2019, 4:56 PM
bd1976llvm retitled this revision from [LTO][Legacy] Fix dependent libraries support by adding querying of the IRSymtab to [ThinLTO][LTO][Legacy] Fix dependent libraries support by adding querying of the IRSymtab.Jun 7 2019, 2:42 AM

This is ELF specific and miss the initial RFC so there is quite some reading to catch up. Should we merge the MachO implementation together with the ELF one? Once you commit this, you might want to file a bug report on me to catch up on the macho side.

Can you also add an integrated testcase (with llvm-lto) to show how this should be integrated with rest of the pipeline? There can be some interesting corner cases, like pulling in a static library with another dependency library (which might or might not already satisfied).

From the first glance, the patch looks fine to me.

Thanks for the response.

This is ELF specific and miss the initial RFC so there is quite some reading to catch up. Should we merge the MachO implementation together with the ELF one? Once you commit this, you might want to file a bug report on me to catch up on the macho side.

We should look at either merging the implementations or providing an api function that provides access to the "llvm.linker.options" named metadata via the IRSymtab. It may not be possible to merge the implementations because in COFF and MACHO the representation of dependent libraries is a string of command line options; whereas, the ELF representation is an array of library strings. This is because for COFF and MACHO there is basically only one linker for each file format so the command line format is fixed. For ELF there are a greater variety of linkers, so I just pass the library strings through to the linker, without processing them into command line arguments, and it is deferred to the linker to interpret the library strings.

Can you also add an integrated testcase (with llvm-lto) to show how this should be integrated with rest of the pipeline? There can be some interesting corner cases, like pulling in a static library with another dependency library (which might or might not already satisfied).

I might be misunderstanding here, but I don't think I can provide any further testing. These api functions are intended to be used by a client linker so that it can do symbol resolution (including processing the dependent libraries) on bitcode files. This is done before invoking the Full/ThinLTO codegen to generate native object files from the input bitcode files. So all we need to do is test that the correct information is supplied to a client linker. How to resolve scenarios such as pulling in a static library which specifies a dependent library is up to the client linker, no?

From the first glance, the patch looks fine to me.

Thanks. I was not sure if the new functions in LTOModule.h/.cpp were in the right file - but I don't see a better place to put them.

We should look at either merging the implementations or providing an api function that provides access to the "llvm.linker.options" named metadata via the IRSymtab. It may not be possible to merge the implementations because in COFF and MACHO the representation of dependent libraries is a string of command line options; whereas, the ELF representation is an array of library strings. This is because for COFF and MACHO there is basically only one linker for each file format so the command line format is fixed. For ELF there are a greater variety of linkers, so I just pass the library strings through to the linker, without processing them into command line arguments, and it is deferred to the linker to interpret the library strings.

It would be good if the API is flexible to return either. Not against having a different API to return linker options but I think that is less clean.

I might be misunderstanding here, but I don't think I can provide any further testing. These api functions are intended to be used by a client linker so that it can do symbol resolution (including processing the dependent libraries) on bitcode files. This is done before invoking the Full/ThinLTO codegen to generate native object files from the input bitcode files. So all we need to do is test that the correct information is supplied to a client linker. How to resolve scenarios such as pulling in a static library which specifies a dependent library is up to the client linker, no?

I agree it should be in the linker implementation.

Thanks. I was not sure if the new functions in LTOModule.h/.cpp were in the right file - but I don't see a better place to put them.

This is fine. Do you think you might want to query symbols from InputFile instead of LTOModule in the future?

We should look at either merging the implementations or providing an api function that provides access to the "llvm.linker.options" named metadata via the IRSymtab. It may not be possible to merge the implementations because in COFF and MACHO the representation of dependent libraries is a string of command line options; whereas, the ELF representation is an array of library strings. This is because for COFF and MACHO there is basically only one linker for each file format so the command line format is fixed. For ELF there are a greater variety of linkers, so I just pass the library strings through to the linker, without processing them into command line arguments, and it is deferred to the linker to interpret the library strings.

It would be good if the API is flexible to return either. Not against having a different API to return linker options but I think that is less clean.

Actually, now I remember that we need both. Developers do want to use linker options on ELF in the future (LLVM retains support for linker options on ELF - https://llvm.org/docs/Extensions.html#linker-options-section-linker-options). It is not implemented yet in LLD; but, I believe that the idea is that in source code "#pragma comment(lib, "foo") writes "foo" into the llvm.dependent-libraries and some new pragma e.g. #pragma linker(lib, "foo") would write ("lib", "foo") into llvm.linker-options which will specify -lfoo to ELF linkers. Given that, I think that we need to have separate api functions in the LTO interface.

Thanks. I was not sure if the new functions in LTOModule.h/.cpp were in the right file - but I don't see a better place to put them.

This is fine. Do you think you might want to query symbols from InputFile instead of LTOModule in the future?

Yes! I would like to expand the api in the future so that a client linker can everything it needs from the IRSymtab.

This revision is now accepted and ready to land.Jun 11 2019, 8:22 AM
This revision was automatically updated to reflect the committed changes.