This is an archive of the discontinued LLVM Phabricator instance.

Add binary filename to the bitcode filename when using -thinlto-index-only
AbandonedPublic

Authored by akhuang on Aug 26 2019, 1:58 PM.

Details

Reviewers
inglorion
rnk
ruiu
Summary

Append binary name to the bitcode filename when using thin archives. This is to make the filenames more unique when doing distributed thinLTO and an object is compiled to multiple binaries.

Event Timeline

akhuang created this revision.Aug 26 2019, 1:58 PM
inglorion added a subscriber: ruiu.

@ruiu, I'd like your opinion on this.

TLDR; I lean towards passing on this change because we already have -thinlto-prefix-replace, but automatically coming up with output-dependent names for files that need to be different depending on output is nice so perhaps others think differently. My thoughts:

This includes the (base)name of the output in the name of the bitcode file buffer when using -thinlto-index-only. That name ends up being used in diagnostics and for distributed ThinLTO files (.thinlto.bc) and imports files, as well as the name we name we record for the file the symbols came from. So a file foo.obj in libfoobar.lib being linked into test.exe used to be called "libfoobar.libfoo.obj576" and would now be called "libfoobar.libfoo.obj576.test.exe" when -thinlto-index-only is in use, and the ThinLTO index would be "libfoobar.libfoo.obj576.test.exe.thinlto.bc" instead of "libfoobar.libfoo.obj576.thinlto.bc".

Without this change, we can achieve unique names for index and imports files by using the -thinlto-prefix-replace flag, e.g. "-thinlto-prefix-replace=;test.exe." would replace the empty prefix with "test.exe.", so that "libfoobar.libfoo.obj576" would become "test.exe.libfoobar.libfoo.obj576". Aside from the different naming scheme, the main difference is that this change makes the name uniquing automatic. I think there is a case to be made for that, but I'm not 100% sold. One objection is that the new names are not guaranteed to be unique; for example, two outputs called "test.exe" in different directories will generate the same names. With -thinlto-prefix-replace, because the renaming needs to be explicitly specified, we punt responsibility for unique names to the user, rather than doing something that will usually work but sometimes lead to incorrect results. Given that I'm not 100% sold and we have an existing mechanism, I lean towards not taking the change.

ruiu added a comment.Aug 26 2019, 10:14 PM

I think I'm convinced by Bob's argument. -thinlto-prefix-replace already exists, and the new option still has a chance to create a non-unique name, so it looks like passing on this is more convincing.

rnk added a comment.Aug 27 2019, 1:21 PM

If I understand the prefix replace option correctly, it operates on the entire input bitcode file path. With your example of -thinlto-prefix-replace:;test.exe, that would end up creating a new tree of thinlto.bc files (and import files) in a new test.exe directory. While that works, it doesn't seem very easy for a build system to use. For myself as a compiler developer, I'd also prefer not to have to navigate between two parallel deeply nested directory trees to find intermediate inputs and outputs. So, I'd really like to come up with a solution that puts the thinlto.bc index file in the same directory as the input bitcode file. If we had a prefix replace variant that worked on the file basename instead of the full path, that seems like it would work well, and fit into the existing flow with the addition of one boolean flag.

lld/COFF/InputFiles.cpp
796

This seems like an awkward place to introduce the binary name. This code is coming up with a fake "identifier" for the memory buffer that points into a static archive. I think we want to inject the binary name in the code that computes the output file path, which is in lto::getThinLTOOutputFile. I'm not sure what's the best way to thread the binary name through to that code, but I think that's the best place to do it.

Separately, I believe this code path only runs for non-thin archives, which aren't used for Chromium. There are two call sites for this ctor in Driver.cpp, and I think they pass an empty archive name for thin archives. Coming up with a unique name for regular, non-thin static archives is challenging, since there could even be multiple members with the same name, which is why offsetInArchive is factored in. I think we should keep that out of scope for now, and just focus on thin archives.

akhuang abandoned this revision.Sep 6 2019, 11:59 AM