This is an archive of the discontinued LLVM Phabricator instance.

[lld-link] implement -thinlto-index-only
ClosedPublic

Authored by inglorion on Jul 9 2019, 5:42 PM.

Details

Summary

This implements -thinlto-index-only, -thinlto-index-only:,
and -thinlto-emit-imports-files options in lld-link. They are
analogous to their counterparts in ld.lld: -thinlto-index-only
causes us to perform ThinLTO's thin link and write index files,
but not perform code generation. -thinlto-index-only: does the
same, but also writes a text file listing the native object
files expected to be generated. -thinlto-emit-imports-files
creates a text file next to each index file, listing the files
to import from.

Event Timeline

inglorion created this revision.Jul 9 2019, 5:42 PM
rnk added inline comments.Jul 10 2019, 3:07 PM
lld/COFF/LTO.cpp
57

I guess this is mainly a future extension point for prefix rewriting, so we can have different .thinlto.bc index files for different executable targets.

rnk added a comment.Jul 10 2019, 3:08 PM

Hit send without saying things...

This basically looks good to me, but I figured you should wait to get feedback from other reviewers.

inglorion marked an inline comment as done.Jul 10 2019, 3:40 PM
inglorion added inline comments.
lld/COFF/LTO.cpp
57

Exactly. The ELF linker allows prefix and suffix replace here, and we'll want that for the COFF linker, too. Patch is coming.

inglorion marked an inline comment as done.Jul 10 2019, 3:42 PM
inglorion added inline comments.
lld/COFF/LTO.cpp
57

Originally I had the prefix and suffix replace code in this patch, but I broke it up into smaller pieces.

ruiu accepted this revision.Jul 10 2019, 10:38 PM

LGTM

lld/COFF/LTO.cpp
57

Please move this https://reviews.llvm.org/D64542 so that this identity function is not included in the first patch.

This revision is now accepted and ready to land.Jul 10 2019, 10:38 PM

removed getThinLTOOutputFile identity function

inglorion marked 2 inline comments as done.

new variable naming style

This revision was automatically updated to reflect the committed changes.

Distributed ThinLTO doesn't support archives: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133145.html. @tejohnson suggested -start-lib -end-lib as a workaround, but that's not a thing for COFF. Do you have plans for archive support, or is that not something you need to deal with for your use cases?

rnk added a comment.Jul 15 2019, 3:27 PM

Distributed ThinLTO doesn't support archives: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133145.html. @tejohnson suggested -start-lib -end-lib as a workaround, but that's not a thing for COFF. Do you have plans for archive support, or is that not something you need to deal with for your use cases?

Personally I believe we should prioritize fixing that. It's trivial to fix for thin archives, since those are practically equivalent to the command line object groups.

In D64461#1586582, @rnk wrote:

Distributed ThinLTO doesn't support archives: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133145.html. @tejohnson suggested -start-lib -end-lib as a workaround, but that's not a thing for COFF. Do you have plans for archive support, or is that not something you need to deal with for your use cases?

Personally I believe we should prioritize fixing that. It's trivial to fix for thin archives, since those are practically equivalent to the command line object groups.

Someone asked about this on the llvm-dev mailing list recently, pasting my response below:

Somehow, the module identifier for each constituent object would need to be both unique (we currently generate this from the name of the archive plus the offset in the archive plus the name of the source file IIRC), but also correctly identify the extracted bitcode object used in the post-thinlink backend invocation. This is so it can write out the distributed index file with a filename that gets consumed by the associated backend invocation (passed to -fthinlto-index=), and so that the module paths emitted in those index files correctly identify where we can import functions from. Since the bitcode objects need to be extracted for the corresponding backend clang invocations, a couple possibilities come to mind:

  1. Do it outside the compiler/linker: wrap the whole thing in a script that does the extraction, invokes the link with extracted constituents surrounded by --start-lib/-end-lib pairs, invokes each backend through some parallel or distributed mechanism, and then invokes the final link; or
  2. Add support to pass some kind of mapping file into LTO that maps from each archive constituent to the extracted filename including path that the corresponding ThinLTO backend clang invocation will use, and have LTO set the module identifiers accordingly so that everything "just works" (in theory). We already support some munging of these names (see the thinlto_object_suffix_replace plugin option in either gold-plugin.cpp or in lld), but what you need here is a bit more complicated than a simple suffix change. But since there is already support for adjusting the name, it might not be too bad to add this support.
rnk added a comment.Jul 16 2019, 11:10 AM
In D64461#1586582, @rnk wrote:

Personally I believe we should prioritize fixing that. It's trivial to fix for thin archives, since those are practically equivalent to the command line object groups.

Someone asked about this on the llvm-dev mailing list recently, pasting my response below:

Somehow, the module identifier for each constituent object would need to be both unique (we currently generate this from the name of the archive plus the offset in the archive plus the name of the source file IIRC), but also correctly identify the extracted bitcode object used in the post-thinlink backend invocation. This is so it can write out the distributed index file with a filename that gets consumed by the associated backend invocation (passed to -fthinlto-index=), and so that the module paths emitted in those index files correctly identify where we can import functions from. Since the bitcode objects need to be extracted for the corresponding backend clang invocations, a couple possibilities come to mind:

  1. Do it outside the compiler/linker: wrap the whole thing in a script that does the extraction, invokes the link with extracted constituents surrounded by --start-lib/-end-lib pairs, invokes each backend through some parallel or distributed mechanism, and then invokes the final link; or
  2. Add support to pass some kind of mapping file into LTO that maps from each archive constituent to the extracted filename including path that the corresponding ThinLTO backend clang invocation will use, and have LTO set the module identifiers accordingly so that everything "just works" (in theory). We already support some munging of these names (see the thinlto_object_suffix_replace plugin option in either gold-plugin.cpp or in lld), but what you need here is a bit more complicated than a simple suffix change. But since there is already support for adjusting the name, it might not be too bad to add this support.

For thin archives, the bitcode files should already be extracted, it's just a matter of adjusting the path to the thin archive input. I think what you described for true static archives is quite a bit of work, and I don't think it's worth doing. But, making thin archives work should be feasible, and it moving a build system from fat to thin archives should just be a matter of passing a flag to the archiver. I think that extra ease of use is worth a lot.

To @ruiu's point on the llvm-commits thread, I'm also in favor of adding --start-lib --end-lib to LLD COFF. There's no reason we can't do both.