This is an archive of the discontinued LLVM Phabricator instance.

[lld] Support separate native object file path in --thinlto-prefix-replace
ClosedPublic

Authored by itf on Feb 22 2023, 3:17 PM.

Details

Summary

Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

  1. the output files of the thin link itself (the .thinlto.bc index files and the

optional .imports files)

  1. the specified object file paths written to the response file given in the

--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.
This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"

This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.

Diff Detail

Event Timeline

itf created this revision.Feb 22 2023, 3:17 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
itf requested review of this revision.Feb 22 2023, 3:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2023, 3:17 PM

What's the motivation of this change?

oontvoo added inline comments.
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

This needs to go under %t (otherwise the tests may fail in our integrates because only stuff under %t is writeable)

6

%t/oldpath

(here and the rest)

11

This is running with the ELF port?
You should use the variable %lld

itf added a comment.Feb 23 2023, 9:51 AM

What's the motivation of this change?

This is necessary for Bazel to support thinlto with tree artifacts.

Bazel tree artifacts are generated folders with an unknown number of files, for example, auto generating source files from a script or the result of extracting a .zip file containing source files.

Bazel tree artifacts require that the folders are "write once". Therefore we cannot insert the link objects in the same folder as the result from the thinlto indexing step. Currently the file "--thinlto-index-only=file" requires the link objects to be in the same folder as the summary+import files from the indexing step.

This change allows having a different prefix for the link objects, and therefore keep the folders containing the summary+import files be "write once".

What's the motivation of this change?

This is necessary for Bazel to support thinlto with tree artifacts.

Bazel tree artifacts are generated folders with an unknown number of files, for example, auto generating source files from a script or the result of extracting a .zip file containing source files.

Bazel tree artifacts require that the folders are "write once". Therefore we cannot insert the link objects in the same folder as the result from the thinlto indexing step. Currently the file "--thinlto-index-only=file" requires the link objects to be in the same folder as the summary+import files from the indexing step.

This change allows having a different prefix for the link objects, and therefore keep the folders containing the summary+import files be "write once".

Could you give an example in the summary how an end user is going to use the proposed feature?
To describe how distributed ThinLTO works, I typically use such an example including ELF relocatable object files and LLVM bitcode files.

echo 'int bb(); int main() { return bb(); }' > a.c
echo 'int elf0(); int bb() { return elf0(); }' > b.c
echo 'int cc() { return 0; }' > c.c
echo 'int elf0() { return 0; }' > elf0.c && clang -c elf0.c
echo '' > elf1.c && clang -c elf1.c

clang -c -O2 -flto=thin a.c b.c c.c
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/' elf0.o a.o -Wl,--start-lib b.o c.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o
clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o
clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o lto/c.o
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains lto/a.o and lto/b.o

IIUC the proposed third component of ; just changes the --thinlto-index-only=a.rsp specified response file (not anything else on the disk), then I am unsure how it is going to help Bazel...
Bazel can preprocess the a.rsp content itself, instead of requiring more from the linker.

BTW: if you or someone is working on distributed ThinLTO in Bazel, I wonder whether you may investigate D130229 to address some issues with the current --thinlto-index-only= design...

lld/ELF/Driver.cpp
1269

The idiomatic way is must be used with --thinlto-index-only=.

lld/MachO/Driver.cpp
1589

This has a build error.

itf updated this revision to Diff 499958.EditedFeb 23 2023, 12:46 PM

Fixes build error caused by not including the new function getOldNewOptionsExtra in driver.cpp into the patch

itf marked an inline comment as done.Feb 23 2023, 1:15 PM

IIUC the proposed third component of ; just changes the --thinlto-index-only=a.rsp specified response file (not anything else on the disk), then I am unsure how it is going to help Bazel...
Bazel can preprocess the a.rsp content itself, instead of requiring more from the linker.

Depending on the contents of a.rsp to stay constant (i.e. exactly one file per line, no extra information such as "start-lib" "end-lib",etc) in order to parse and modify it as a text file from Bazel seemed to be very fragile.

Given that the initial reason for --thinlto-prefix-replace was rLLD365807
This can be used to ensure index files and native object files are stored in unique directories, ,

It seemed more appropriate to add an extra parameter to --thinlto-prefix-replace since it seemed in line with the original design. .

Could you give an example in the summary how an end user is going to use the proposed feature?

Modifying your example:

echo 'int bb(); int main() { return bb(); }' > a.c
echo 'int elf0(); int bb() { return elf0(); }' > b.c
echo 'int cc() { return 0; }' > c.c
echo 'int elf0() { return 0; }' > elf0.c && clang -c elf0.c
echo '' > elf1.c && clang -c elf1.c

clang -c -O2 -flto=thin a.c b.c c.c
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/;ltoobj/' elf0.o a.o -Wl,--start-lib b.o c.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o ltoobj/a.o
clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o ltoobj/b.o
clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o ltoobj/c.o
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains ltoobj/a.o and ltoobj/b.o
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

is cd %t in the previous line not enough to keep the following commands under %t ?

int3 added a subscriber: int3.Feb 23 2023, 1:17 PM
int3 added inline comments.
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

It is. However, none of the other tests do this, so I would prefer we avoid it unless it's required by the nature of this test

MaskRay added inline comments.Feb 23 2023, 1:21 PM
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

I support the use of cd %t as in rm -rf %t && split-file %s %t && cd %t, especially if there are many temporary/output files or complex file hierarchies. It's difficult to ensure a test doesn't write to other directories without %t.

(As mentioned elsewhere, Google has a simplified lit runner where CWD is the source directory instead of some directive relative to %t. And writing to the source directory will cause an error due to Bazel's sandbox.)

llvm/tools/llvm-lto2/llvm-lto2.cpp
323–328

The canonical spelling is /*xxx=*/yyy (no space).

IIUC the proposed third component of ; just changes the --thinlto-index-only=a.rsp specified response file (not anything else on the disk), then I am unsure how it is going to help Bazel...
Bazel can preprocess the a.rsp content itself, instead of requiring more from the linker.

Depending on the contents of a.rsp to stay constant (i.e. exactly one file per line, no extra information such as "start-lib" "end-lib",etc) in order to parse and modify it as a text file from Bazel seemed to be very fragile.

Can Bazel rewrite a.rsp with content xxx/a.o to xxx/random_hash/a.o? It will create a new file but I don't think that's a bottleneck.

Given that the initial reason for --thinlto-prefix-replace was rLLD365807
This can be used to ensure index files and native object files are stored in unique directories, ,

It seemed more appropriate to add an extra parameter to --thinlto-prefix-replace since it seemed in line with the original design. .

Could you give an example in the summary how an end user is going to use the proposed feature?

Modifying your example:

echo 'int bb(); int main() { return bb(); }' > a.c
echo 'int elf0(); int bb() { return elf0(); }' > b.c
echo 'int cc() { return 0; }' > c.c
echo 'int elf0() { return 0; }' > elf0.c && clang -c elf0.c
echo '' > elf1.c && clang -c elf1.c

clang -c -O2 -flto=thin a.c b.c c.c
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/;ltoobj/' elf0.o a.o -Wl,--start-lib b.o c.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o ltoobj/a.o
clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o ltoobj/b.o
clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o ltoobj/c.o
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains ltoobj/a.o and ltoobj/b.o
int3 added inline comments.Feb 23 2023, 1:26 PM
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

aside from consistency, I like explicit paths in order that I can copy and paste individual lines from the verbose test output for debugging purposes. (I prefer to avoid environment variables in test for the same reasons.)

using a sandbox does seem like a better way of ensuring a test doesn't write outside of the test dir

but ok, I won't insist :)

MaskRay added inline comments.Feb 23 2023, 1:33 PM
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

OK:) I use my own simplified lit runner: https://github.com/MaskRay/Config/blob/master/home/bin/ff

% cd lld/test/ELF/lto 
% ff thinlto-index-file.ll
0  rm -rf a && mkdir a && cd a
1  opt -module-summary ../thinlto-index-file.ll -o 1.o
2  opt -module-summary $HOME/llvm/lld/test/ELF/lto/Inputs/thinlto.ll -o 2.o
3  opt -module-summary $HOME/llvm/lld/test/ELF/lto/Inputs/thinlto_empty.ll -o 3.o
4  ld.lld --plugin-opt=thinlto-index-only=1.txt -shared 1.o 2.o 3.o -o /dev/null
5  FileCheck ../thinlto-index-file.ll < 1.txt
6  ld.lld --thinlto-index-only=2.txt -shared 1.o 2.o 3.o -o /dev/null
7  diff 1.txt 2.txt

When I want to inspect something, I just cd a. So cd %t in the test file doesn't bother me that much :)

Another advantage is that some features depend on whether a stale output file exists. rm -rf %t makes the test immune to such bugs. With a lot of %t1 %t2 it's tedious to do rm -f %t1 %t2

itf added a comment.Feb 24 2023, 2:01 PM

Can Bazel rewrite a.rsp with content xxx/a.o to xxx/random_hash/a.o? It will create a new file but I don't think that's a bottleneck.

Currently there is no code in bazel, that reads and modifies a parameter file like a.rsp. It is possible to do so by, for example, calling a shell script / binary executable that modifies the file and saves a new one as one of the actions (like so: https://github.com/bazelbuild/examples/tree/main/rules/actions_run).

However, I'd argue that it is better for lld to have this extra argument. The link objects .o are outputs of actions that can be run in parallel, while the thinlto.bc and the .import files are a result of the thinlto indexing action which run in series, i.e. all the thinlto.bc are generated simultaneously. I find it unexpected that lld would force those files to be in the same folders, given that thinlto-index-only= is particularly useful for distributed thinLTO, where there could be restrictions on what can be saved in each location

MaskRay added a comment.EditedFeb 24 2023, 2:21 PM

Can Bazel rewrite a.rsp with content xxx/a.o to xxx/random_hash/a.o? It will create a new file but I don't think that's a bottleneck.

Currently there is no code in bazel, that reads and modifies a parameter file like a.rsp. It is possible to do so by, for example, calling a shell script / binary executable that modifies the file and saves a new one as one of the actions (like so: https://github.com/bazelbuild/examples/tree/main/rules/actions_run).

However, I'd argue that it is better for lld to have this extra argument. The link objects .o are outputs of actions that can be run in parallel, while the thinlto.bc and the .import files are a result of the thinlto indexing action which run in series, i.e. all the thinlto.bc are generated simultaneously. I find it unexpected that lld would force those files to be in the same folders, given that thinlto-index-only= is particularly useful for distributed thinLTO, where there could be restrictions on what can be saved in each location

I am not convinced. For --thinlto-prefix-replace=';lto/;unique_prefix_for_rsp/', the output bitcode files and import files are in lto/ while the response file uses a different prefix.
This doesn't look like a reasonable build system request for the linker. Why can't Bazel place output bitcode files and import files under unique_prefix_for_rsp/ as well?

I prefer a generic solution for more build systems, not just a feature tuned for some strange aspects of a build system.

itf added a comment.Feb 24 2023, 3:24 PM

Can Bazel rewrite a.rsp with content xxx/a.o to xxx/random_hash/a.o? It will create a new file but I don't think that's a bottleneck.

Currently there is no code in bazel, that reads and modifies a parameter file like a.rsp. It is possible to do so by, for example, calling a shell script / binary executable that modifies the file and saves a new one as one of the actions (like so: https://github.com/bazelbuild/examples/tree/main/rules/actions_run).

However, I'd argue that it is better for lld to have this extra argument. The link objects .o are outputs of actions that can be run in parallel, while the thinlto.bc and the .import files are a result of the thinlto indexing action which run in series, i.e. all the thinlto.bc are generated simultaneously. I find it unexpected that lld would force those files to be in the same folders, given that thinlto-index-only= is particularly useful for distributed thinLTO, where there could be restrictions on what can be saved in each location

I am not convinced. For --thinlto-prefix-replace=';lto/;unique_prefix_for_rsp/', the output bitcode files and import files are in lto/ while the response file uses a different prefix.
This doesn't look like a reasonable build system request for the linker. Why can't Bazel place output bitcode files and import files under unique_prefix_for_rsp/ as well?

Bazel could output bitcode files and import files under unique_prefix_for_rsp/ as well, the issue is the linker enforcing that the object file generated in the next step must also be in this same folder.

I prefer a generic solution for more build systems, not just a feature tuned for some strange aspects of a build system.

The original -thinlto-prefix-replace D64542 is generic because it allows the source files and the index files to be stored in different folders. thinlto-index-only= forces index files and the object files to be stored in the same folder. This would allow index-files and object files to be stored in different folders.

I believed being able to store the results of different compilation steps in different folders was a common aspect of build systems, and often required when running a build in a distributed way.

Modifying your example, currently:

clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/' elf0.o a.o -Wl,--start-lib b.o c.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o otherfolder/a.o
...

Is invalid (would require an intermediate step mv otherfolder/a.o lto/), but

clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/;otherfolder/' elf0.o a.o -Wl,--start-lib b.o c.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o otherfolder/a.o
...

would become valid.

Originally the prefix replace option was added for build system convenience, so it didn't need to move output files around before other actions. It is somewhat arbitrary that the current new path is used for both the indexing action outputs as well as the paths written into the response file, that's just what the one user at the time wanted. I don't have a problem making this a bit more flexible allowing different paths to be specified for those different cases.

oontvoo added inline comments.Feb 27 2023, 5:52 AM
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

+1 - also prefer the explicit paths taht I can copy without havign to look up where the current directory is

itf updated this revision to Diff 501298.Feb 28 2023, 2:54 PM

replaced lld with %lld for MachO

itf added inline comments.Feb 28 2023, 2:59 PM
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

one of the problems of using full paths with %t is that then we cannot use
FileCheck --match-full-lines
it seems. Because
; CHECK: %t/foo.o
Does not seem to expand %t before checking the content of the file.

We can only use FileCheck without --match-full-lines, which is a weaker test

MaskRay added a comment.EditedMar 7 2023, 4:53 PM

I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor). Feel free to take some of the command examples below to make things clearer

Say, our binary target is named $package:b and has a source file $package/b.cc along with other dependencies.

The new semantics can be used this way by newer Bazel: -Wl,--thinlto-index-only=bazel-bin/$package/b-lto-final.params,--thinlto-emit-imports-files,--thinlto-prefix-replace=bazel-bin/;bazel-bin/$package/b.lto/;bazel-bin/$package/b.lto.backend/,--lto-obj-path=bazel-bin/$package/b.lto.merged.o

  • bazel-bin/$package/_objs/b/b.pic.o: bitcode file
  • bazel-bin/$package/_objs.indexing/b/b.pic.o: minimized bitcode file
  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.thinlto.bc: index file
  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.imports: import file
  • bazel-bin/$package/b.lto.backend/$package/_objs/b/b.pic.o: ELF relocatable object file after a backend compile
  • bazel-bin/$package/b-lto-final.params: response file for the final native link (it references ELF relocatable object files after backend compiles)

There are two advantages:

  • (This doesn't require lld changes.) the full bitcode file and the minimized bitcode file are in different directories. This allows a ThinLTO indexing action to ship fewer input files. This may be a Bazel limitation, but a fairly reasonable limitation that other build systems may have as well: they track directories, but not individual files.
  • The ELF relocatable object file after a backend compile is now in a different directory. This ensures the directory b.lto is only used by one action. This requires this lld patch.

AIUI --thinlto-object-suffix-replace=.indexing.o;.o can effectively be deprecated. So after a transition period when we finally remove --thinlto-object-suffix-replace, the feature as proposed in this patch will not increase the overall complexity.

Therefore, I am happy with the direction.

lld/test/ELF/lto/thinlto-index-file-object-prefix-replace.ll
6

Perhaps shorten directories like oldpath, newpath, prefix_replace to something like old, new, out.

9

End full sentences in comments with periods. Fix this everywhere.

53

Fix "No newline at end of file"

lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

Just one positional argument mkdir -p oldpath/prefix_replace

MaskRay added inline comments.Mar 7 2023, 4:57 PM
llvm/include/llvm/LTO/LTO.h
229

NewLinkObjectPrefix refers to the ELF relocatable object file path. Is there a better parameter name to reflect that than NewLinkObjectPrefix? @tejohnson

I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor).

I'll take a stab at suggesting a clearer summary in a little bit.

Feel free to take some of the command examples below to make things clearer

Say, our binary target is named $package:b and has a source file $package/b.cc along with other dependencies.

The new semantics can be used this way by newer Bazel: -Wl,--thinlto-index-only=bazel-bin/$package/b-lto-final.params,--thinlto-emit-imports-files,--thinlto-prefix-replace=bazel-bin/;bazel-bin/$package/b.lto/;bazel-bin/$package/b.lto.backend/,--lto-obj-path=bazel-bin/$package/b.lto.merged.o

  • bazel-bin/$package/_objs/b/b.pic.o: bitcode file
  • bazel-bin/$package/_objs.indexing/b/b.pic.o: minimized bitcode file

To clarify, the above file is currently emitted in the same directory as the full bitcode, and just has a different extension. I suppose you are proposing that we change to the above naming scheme, but that is not so simple, see below.

  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.thinlto.bc: index file
  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.imports: import file
  • bazel-bin/$package/b.lto.backend/$package/_objs/b/b.pic.o: ELF relocatable object file after a backend compile
  • bazel-bin/$package/b-lto-final.params: response file for the final native link (it references ELF relocatable object files after backend compiles)

There are two advantages:

  • (This doesn't require lld changes.) the full bitcode file and the minimized bitcode file are in different directories. This allows a ThinLTO indexing action to ship fewer input files.

Moving these into separate directories won't affect what the action ships (at least in Bazel). The inputs are specified as full paths, not directories. In fact, the main reason on our end for implementing and using minimized bitcode files was to reduce the aggregate amount of input files sent to the indexing action.

This may be a Bazel limitation, but a fairly reasonable limitation that other build systems may have as well: they track directories, but not individual files.

  • The ELF relocatable object file after a backend compile is now in a different directory. This ensures the directory b.lto is only used by one action. This requires this lld patch.

    AIUI --thinlto-object-suffix-replace=.indexing.o;.o can effectively be deprecated. So after a transition period when we finally remove --thinlto-object-suffix-replace, the feature as proposed in this patch will not increase the overall complexity.

Moving the minimized bitcode to a different directory will not obviate the need for an option. The thin link indexing step needs to know where the corresponding full bitcode is so that it can set up the paths for the later ThinLTO backend actions, which need to know the full IR paths where they are importing from. These full IR bitcode paths are recorded in the .thinlto.bc index files as well as the .imports files. If we moved the minimized bitcode to a different directory but used the same suffix, we would need to replace this with a different option that specified the prefix mapping from the path of the minimized IR (input to the indexing action) to the path of the full IR that will be used during importing by the later ThinLTO backend actions.

Therefore, I am happy with the direction.

llvm/include/llvm/LTO/LTO.h
229

Maybe FinalObjectPrefix ?

(I'll be be out for most of the next 4 weeks and may not have much time to review patches.)

Let me write down my understanding and the proposal.

When compiling a source file with -flto=thin, we additionally specify -fthin-link-bitcode= to create a minimized bitcode file.

Here is an example (simplified from how Bazel implements distributed ThinLTO):

for i in a b c; do clang -c -O2 -flto=thin -fthin-link-bitcode=$i.indexing.o $i.c; done
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/',--thinlto-object-suffix-replace='.indexing.o;.o' elf0.o a.indexing.o -Wl,--start-lib [bc].indexing.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o
clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o
clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o lto/c.o
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains lto/a.o and lto/b.o

In the thin link, we specify --thinlto-object-suffix-replace=.indexing.o;.o to change IR module names.

In the emitted index files (*.thinlink.bc) and import files (*.imports), we will see [abc].o instead of [abc].indexing.o.

% llvm-bcanalyzer --dump lto/a.o.thinlto.bc | grep '<ENTRY'
    <ENTRY abbrevid=6 op0=0 op1=97 op2=46 op3=111/> record string = 'a.o'
    <ENTRY abbrevid=6 op0=1 op1=98 op2=46 op3=111/> record string = 'b.o'

Note, thin links only require minimized bitcode files, while backend compiles only require full bitcode files.

To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories.
If these bitcode files are within the same directory, Bazel will have to ship both full and minimized bitcode files to a backend compile action, causing waste.
To ship just full bitcode files in a backend compile action, full and minimized bitcode files need to be in different directories (e.g. a.bc and indexing/a.bc).

While we're here, let's make two more changes. First, let's use .bc instead of .o for the bitcode file extension name.
Second, let's place the initially compiled bitcode files in thin/ instead of the current working directory for clarity.

mkdir -p thin indexing
for i in a b c; do clang -c -O2 -flto=thin -fthin-link-bitcode=indexing/$i.bc $i.c -o thin/$i.bc; done
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace='thin/;index/;obj/;indexing/' elf0.o indexing/a.bc -Wl,--start-lib indexing/[bc].bc -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=index/a.bc.thinlto.bc a.bc -o obj/a.o  # index/a.bc.imports specifies the needed full bitcode files, e.g. thin/a.bc
clang -c -O2 -fthinlto-index=index/b.bc.thinlto.bc b.bc -o obj/b.o
clang -c -O2 -fthinlto-index=index/c.bc.thinlto.bc c.bc -o obj/c.o
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp specifies obj/a.o and obj/b.o

In -Wl,--thinlto-prefix-replace=';index/;obj/;indexing/', we introduce two new components.
The third component, obj/, specifies the output directory for the backend compile.
For tree artifacts, an ouput directory can only be used by one action type.
Since index/ is created by the thin link, we cannot place backend compiled ELF relocatable object files (*.o) there.
Backend compiled ELF relocatable object files have to use a separate directory (obj/).

The fourth component, indexing/, specifies the directory for minimized bitcode files and replaces the previous --thinlto-object-suffix-replace='.indexing.o;.o'.
When ld.lld communicates module names to LLVM LTO, it forms the module identifier a.bc from the input filename indexing/a.bc (strip the first component (an empty string) and prepends the fourth component indexing/).

I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor).

I'll take a stab at suggesting a clearer summary in a little bit.

Thanks!

Feel free to take some of the command examples below to make things clearer

Say, our binary target is named $package:b and has a source file $package/b.cc along with other dependencies.

The new semantics can be used this way by newer Bazel: -Wl,--thinlto-index-only=bazel-bin/$package/b-lto-final.params,--thinlto-emit-imports-files,--thinlto-prefix-replace=bazel-bin/;bazel-bin/$package/b.lto/;bazel-bin/$package/b.lto.backend/,--lto-obj-path=bazel-bin/$package/b.lto.merged.o

  • bazel-bin/$package/_objs/b/b.pic.o: bitcode file
  • bazel-bin/$package/_objs.indexing/b/b.pic.o: minimized bitcode file

To clarify, the above file is currently emitted in the same directory as the full bitcode, and just has a different extension. I suppose you are proposing that we change to the above naming scheme, but that is not so simple, see below.

The "To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories." part in my previous long reply captures my understanding.

Yes, b.pic.o and b.pic.indexing.o are in the same directory, and we use --thinlto-object-suffix-replace='.indexing.o;.o' to derive the full bitcode file from the minimized bitcode file.
With off-topic discussion with @itf , I think I kinda understand why b.pic.o and b.pic.indexing.o need to be in different directories now (with tree artifacts we have to ship the whole directory, not individual files), but more context from the Bazel side will be nice to have.

  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.thinlto.bc: index file
  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.imports: import file
  • bazel-bin/$package/b.lto.backend/$package/_objs/b/b.pic.o: ELF relocatable object file after a backend compile
  • bazel-bin/$package/b-lto-final.params: response file for the final native link (it references ELF relocatable object files after backend compiles)

There are two advantages:

  • (This doesn't require lld changes.) the full bitcode file and the minimized bitcode file are in different directories. This allows a ThinLTO indexing action to ship fewer input files.

Moving these into separate directories won't affect what the action ships (at least in Bazel). The inputs are specified as full paths, not directories. In fact, the main reason on our end for implementing and using minimized bitcode files was to reduce the aggregate amount of input files sent to the indexing action.

This may be a Bazel limitation, but a fairly reasonable limitation that other build systems may have as well: they track directories, but not individual files.

  • The ELF relocatable object file after a backend compile is now in a different directory. This ensures the directory b.lto is only used by one action. This requires this lld patch.

    AIUI --thinlto-object-suffix-replace=.indexing.o;.o can effectively be deprecated. So after a transition period when we finally remove --thinlto-object-suffix-replace, the feature as proposed in this patch will not increase the overall complexity.

Moving the minimized bitcode to a different directory will not obviate the need for an option. The thin link indexing step needs to know where the corresponding full bitcode is so that it can set up the paths for the later ThinLTO backend actions, which need to know the full IR paths where they are importing from. These full IR bitcode paths are recorded in the .thinlto.bc index files as well as the .imports files. If we moved the minimized bitcode to a different directory but used the same suffix, we would need to replace this with a different option that specified the prefix mapping from the path of the minimized IR (input to the indexing action) to the path of the full IR that will be used during importing by the later ThinLTO backend actions.

Agreed. --thinlto-prefix-replace='thin/;index/;obj/;indexing/' in my previous long message should work.

Therefore, I am happy with the direction.

(I'll be be out for most of the next 4 weeks and may not have much time to review patches.)

Ok, will work with @itf on this particular patch, and as mentioned below, let's discuss a follow on proposal to change the minimized bitcode file handling separately.

Let me write down my understanding and the proposal.

When compiling a source file with -flto=thin, we additionally specify -fthin-link-bitcode= to create a minimized bitcode file.

Here is an example (simplified from how Bazel implements distributed ThinLTO):

for i in a b c; do clang -c -O2 -flto=thin -fthin-link-bitcode=$i.indexing.o $i.c; done
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp -Wl,--thinlto-prefix-replace=';lto/',--thinlto-object-suffix-replace='.indexing.o;.o' elf0.o a.indexing.o -Wl,--start-lib [bc].indexing.o -Wl,--end-lib elf1.o
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o
clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o
clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o lto/c.o
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains lto/a.o and lto/b.o

In the thin link, we specify --thinlto-object-suffix-replace=.indexing.o;.o to change IR module names.

In the emitted index files (*.thinlink.bc) and import files (*.imports), we will see [abc].o instead of [abc].indexing.o.

% llvm-bcanalyzer --dump lto/a.o.thinlto.bc | grep '<ENTRY'
    <ENTRY abbrevid=6 op0=0 op1=97 op2=46 op3=111/> record string = 'a.o'
    <ENTRY abbrevid=6 op0=1 op1=98 op2=46 op3=111/> record string = 'b.o'

Note, thin links only require minimized bitcode files, while backend compiles only require full bitcode files.

The above is all correct afaict.

To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories.
If these bitcode files are within the same directory, Bazel will have to ship both full and minimized bitcode files to a backend compile action, causing waste.
To ship just full bitcode files in a backend compile action, full and minimized bitcode files need to be in different directories (e.g. a.bc and indexing/a.bc).

I'm very confused as this is not mentioned anywhere in this patch by @itf. Normally, at least without tree artifacts, we only ship the specified input files, not the entire directory. The stated motivation for this particular patch, which was unrelated to the input files of the indexing action, was:


Bazel tree artifacts require that the folders are "write once". Therefore we cannot insert the link objects in the same folder as the result from the thinlto indexing step. Currently the file "--thinlto-index-only=file" requires the link objects to be in the same folder as the summary+import files from the indexing step.

This change allows having a different prefix for the link objects, and therefore keep the folders containing the summary+import files be "write once".


@itf can you clarify whether the input object handling is different than for non-tree artifacts as well?

Regardless, I would prefer to keep this patch about the original issue of the write-once output folders. Additional refinements to the input files can be discussed in follow on patches.

While we're here, let's make two more changes. First, let's use .bc instead of .o for the bitcode file extension name.

I'm not in favor of hardcoding the extension in the compiler, it should be left to the build system, but I don't believe that what you are proposing would require a specific extension anyway. Currently, the object output file extension of the compile action is not a hardcoded requirement and I don't think it should be. Additionally, when we were initially implementing ThinLTO I recall that the Bazel team specifically wanted to keep the .o extension (otherwise the handling gets a bit trickier for them as a compile action will have different extensions depending on the optimization options). But like I said it doesn't appear to me that what you are proposing would require this.

Second, let's place the initially compiled bitcode files in thin/ instead of the current working directory for clarity.

Again, this should be decided in the build system, not by the compiler. But it also isn't clear to me that your proposal requires a specific name, just that your proposal is that we require the full and minimized bitcode to be in separate directories and supply the mapping between those instead of the mapping between the extensions. As mentioned above, I would like to understand from @itf if that is needed for tree artifacts, and discuss this in a separate patch, as it seems more like a refinement and less like a requirement, and is orthogonal to this particular change. It may also require additional changes to the build system.

I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor).

I'll take a stab at suggesting a clearer summary in a little bit.

Thanks!

Got busy and forgot to do so. I suggest something like the following:


Title: [lld] Support separate native object file path in --thinlto-prefix-replace

Summary:
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.

This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"

This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.


Feel free to take some of the command examples below to make things clearer

Say, our binary target is named $package:b and has a source file $package/b.cc along with other dependencies.

The new semantics can be used this way by newer Bazel: -Wl,--thinlto-index-only=bazel-bin/$package/b-lto-final.params,--thinlto-emit-imports-files,--thinlto-prefix-replace=bazel-bin/;bazel-bin/$package/b.lto/;bazel-bin/$package/b.lto.backend/,--lto-obj-path=bazel-bin/$package/b.lto.merged.o

  • bazel-bin/$package/_objs/b/b.pic.o: bitcode file
  • bazel-bin/$package/_objs.indexing/b/b.pic.o: minimized bitcode file

To clarify, the above file is currently emitted in the same directory as the full bitcode, and just has a different extension. I suppose you are proposing that we change to the above naming scheme, but that is not so simple, see below.

The "To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories." part in my previous long reply captures my understanding.

Yes, b.pic.o and b.pic.indexing.o are in the same directory, and we use --thinlto-object-suffix-replace='.indexing.o;.o' to derive the full bitcode file from the minimized bitcode file.
With off-topic discussion with @itf , I think I kinda understand why b.pic.o and b.pic.indexing.o need to be in different directories now (with tree artifacts we have to ship the whole directory, not individual files), but more context from the Bazel side will be nice to have.

As mentioned earlier, I would like clarification from @itf. But regardless, this seems like a separate refinement and I would prefer to separate the 2 issues and discuss and design any changes to this handling in a different patch.

  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.thinlto.bc: index file
  • bazel-bin/$package/b.lto/$package/_objs/b/b.pic.o.imports: import file
  • bazel-bin/$package/b.lto.backend/$package/_objs/b/b.pic.o: ELF relocatable object file after a backend compile
  • bazel-bin/$package/b-lto-final.params: response file for the final native link (it references ELF relocatable object files after backend compiles)

There are two advantages:

  • (This doesn't require lld changes.) the full bitcode file and the minimized bitcode file are in different directories. This allows a ThinLTO indexing action to ship fewer input files.

Moving these into separate directories won't affect what the action ships (at least in Bazel). The inputs are specified as full paths, not directories. In fact, the main reason on our end for implementing and using minimized bitcode files was to reduce the aggregate amount of input files sent to the indexing action.

This may be a Bazel limitation, but a fairly reasonable limitation that other build systems may have as well: they track directories, but not individual files.

  • The ELF relocatable object file after a backend compile is now in a different directory. This ensures the directory b.lto is only used by one action. This requires this lld patch.

    AIUI --thinlto-object-suffix-replace=.indexing.o;.o can effectively be deprecated. So after a transition period when we finally remove --thinlto-object-suffix-replace, the feature as proposed in this patch will not increase the overall complexity.

Moving the minimized bitcode to a different directory will not obviate the need for an option. The thin link indexing step needs to know where the corresponding full bitcode is so that it can set up the paths for the later ThinLTO backend actions, which need to know the full IR paths where they are importing from. These full IR bitcode paths are recorded in the .thinlto.bc index files as well as the .imports files. If we moved the minimized bitcode to a different directory but used the same suffix, we would need to replace this with a different option that specified the prefix mapping from the path of the minimized IR (input to the indexing action) to the path of the full IR that will be used during importing by the later ThinLTO backend actions.

Agreed. --thinlto-prefix-replace='thin/;index/;obj/;indexing/' in my previous long message should work.

Therefore, I am happy with the direction.

While we're here, let's make two more changes. First, let's use .bc instead of .o for the bitcode file extension name.

I'm not in favor of hardcoding the extension in the compiler, it should be left to the build system, but I don't believe that what you are proposing would require a specific extension anyway. Currently, the object output file extension of the compile action is not a hardcoded requirement and I don't think it should be. Additionally, when we were initially implementing ThinLTO I recall that the Bazel team specifically wanted to keep the .o extension (otherwise the handling gets a bit trickier for them as a compile action will have different extensions depending on the optimization options). But like I said it doesn't appear to me that what you are proposing would require this.

My proposal is about the convention used in Bazel. I don't propose to hard code the .bc path in llvm-project.
Switching from .o to .bc in Bazel is for clarity.

Second, let's place the initially compiled bitcode files in thin/ instead of the current working directory for clarity.

Again, this should be decided in the build system, not by the compiler. But it also isn't clear to me that your proposal requires a specific name, just that your proposal is that we require the full and minimized bitcode to be in separate directories and supply the mapping between those instead of the mapping between the extensions. As mentioned above, I would like to understand from @itf if that is needed for tree artifacts, and discuss this in a separate patch, as it seems more like a refinement and less like a requirement, and is orthogonal to this particular change. It may also require additional changes to the build system.

thin/ is for an arbitrary directory for exposition purpose.
It's just a convention in my example to show how distributed ThinLTO will work.
The Bazel implementation may prefer something like _objs* or other directories.

itf marked an inline comment as done.Mar 14 2023, 2:52 PM

To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories.
If these bitcode files are within the same directory, Bazel will have to ship both full and minimized bitcode files to a backend compile action, causing waste.
To ship just full bitcode files in a backend compile action, full and minimized bitcode files need to be in different directories (e.g. a.bc and indexing/a.bc).

I'm very confused as this is not mentioned anywhere in this patch by @itf. Normally, at least without tree artifacts, we only ship the specified input files, not the entire directory. The stated motivation for this particular patch, which was unrelated to the input files of the indexing action, was:


Bazel tree artifacts require that the folders are "write once". Therefore we cannot insert the link objects in the same folder as the result from the thinlto indexing step. Currently the file "--thinlto-index-only=file" requires the link objects to be in the same folder as the summary+import files from the indexing step.

This change allows having a different prefix for the link objects, and therefore keep the folders containing the summary+import files be "write once".


@itf can you clarify whether the input object handling is different than for non-tree artifacts as well?

When we have an action in bazel that may take multiple tree artifacts as an input, such as the indexing or final linking steps, the action takes as input all the files inside the directory of the tree artifact or none of them. Therefore, in order to have only the thin bitcode be an input tothe indexing step, but not the full bit code, we will need to store the full bitcode and the thin one in different places.

This was not originally part of this proposal because I intended to submit a second patch with a new argument such as "--thinlto-object-prefix-replace". However, after discussing, @MaskRay suggested to simply add a 4th string to --thinlto-prefix-replace=, such as --thinlto-prefix-replace='thin/;index/;obj/;indexing/' instead of adding a brand new argument, and I agreed that this was a cleaner solution.

This 4th string can be added to this current patch or to a following patch. I can see arguments for either: because we are modifying the same argument, it should be in this patch. But because they fix different issues, they should be in different patches.

Regardless, I would prefer to keep this patch about the original issue of the write-once output folders. Additional refinements to the input files can be discussed in follow on patches.

While we're here, let's make two more changes. First, let's use .bc instead of .o for the bitcode file extension name.

I'm not in favor of hardcoding the extension in the compiler, it should be left to the build system, but I don't believe that what you are proposing would require a specific extension anyway. Currently, the object output file extension of the compile action is not a hardcoded requirement and I don't think it should be. Additionally, when we were initially implementing ThinLTO I recall that the Bazel team specifically wanted to keep the .o extension (otherwise the handling gets a bit trickier for them as a compile action will have different extensions depending on the optimization options). But like I said it doesn't appear to me that what you are proposing would require this.

Second, let's place the initially compiled bitcode files in thin/ instead of the current working directory for clarity.

Again, this should be decided in the build system, not by the compiler. But it also isn't clear to me that your proposal requires a specific name, just that your proposal is that we require the full and minimized bitcode to be in separate directories and supply the mapping between those instead of the mapping between the extensions. As mentioned above, I would like to understand from @itf if that is needed for tree artifacts, and discuss this in a separate patch, as it seems more like a refinement and less like a requirement, and is orthogonal to this particular change. It may also require additional changes to the build system.

I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor).

I'll take a stab at suggesting a clearer summary in a little bit.

Thanks!

Got busy and forgot to do so. I suggest something like the following:


Title: [lld] Support separate native object file path in --thinlto-prefix-replace

Summary:
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.

This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"

This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.


In the end, the new format should be:

--thinlto-prefix-replace="origpath;outpath[;objpath][;indexpath]" with 4 arguments, in some order. the new objpath allows us to have each directory be the output of a different action, and ;indexpath allows us to have each directory be either the input of an action or not, without a subset of a directory being the input of an action.

In bazel, this is not an issue when handling non-tree artifacts, because for normal files we can decide to include or not include the file as input to an action. For tree artifacts, when we add a tree artifact as an input to an action, the whole directory gets added.

To support tree artifacts with an unknown number of full bitcode files and minimized bitcode files, we need to use different directories.
If these bitcode files are within the same directory, Bazel will have to ship both full and minimized bitcode files to a backend compile action, causing waste.
To ship just full bitcode files in a backend compile action, full and minimized bitcode files need to be in different directories (e.g. a.bc and indexing/a.bc).

I'm very confused as this is not mentioned anywhere in this patch by @itf. Normally, at least without tree artifacts, we only ship the specified input files, not the entire directory. The stated motivation for this particular patch, which was unrelated to the input files of the indexing action, was:


Bazel tree artifacts require that the folders are "write once". Therefore we cannot insert the link objects in the same folder as the result from the thinlto indexing step. Currently the file "--thinlto-index-only=file" requires the link objects to be in the same folder as the summary+import files from the indexing step.

This change allows having a different prefix for the link objects, and therefore keep the folders containing the summary+import files be "write once".


@itf can you clarify whether the input object handling is different than for non-tree artifacts as well?

When we have an action in bazel that may take multiple tree artifacts as an input, such as the indexing or final linking steps, the action takes as input all the files inside the directory of the tree artifact or none of them. Therefore, in order to have only the thin bitcode be an input tothe indexing step, but not the full bit code, we will need to store the full bitcode and the thin one in different places.

Ok, thanks for confirming.

This was not originally part of this proposal because I intended to submit a second patch with a new argument such as "--thinlto-object-prefix-replace". However, after discussing, @MaskRay suggested to simply add a 4th string to --thinlto-prefix-replace=, such as --thinlto-prefix-replace='thin/;index/;obj/;indexing/' instead of adding a brand new argument, and I agreed that this was a cleaner solution.

This 4th string can be added to this current patch or to a following patch. I can see arguments for either: because we are modifying the same argument, it should be in this patch. But because they fix different issues, they should be in different patches.

I'd prefer to do that in a separate patch and keep this one about the native object output directory.

We'll also want to deprecate --thinlto-object-suffix-replace (I'm not sure what the current deprecation policy is, @MaskRay may know).

Regardless, I would prefer to keep this patch about the original issue of the write-once output folders. Additional refinements to the input files can be discussed in follow on patches.

While we're here, let's make two more changes. First, let's use .bc instead of .o for the bitcode file extension name.

I'm not in favor of hardcoding the extension in the compiler, it should be left to the build system, but I don't believe that what you are proposing would require a specific extension anyway. Currently, the object output file extension of the compile action is not a hardcoded requirement and I don't think it should be. Additionally, when we were initially implementing ThinLTO I recall that the Bazel team specifically wanted to keep the .o extension (otherwise the handling gets a bit trickier for them as a compile action will have different extensions depending on the optimization options). But like I said it doesn't appear to me that what you are proposing would require this.

Second, let's place the initially compiled bitcode files in thin/ instead of the current working directory for clarity.

Again, this should be decided in the build system, not by the compiler. But it also isn't clear to me that your proposal requires a specific name, just that your proposal is that we require the full and minimized bitcode to be in separate directories and supply the mapping between those instead of the mapping between the extensions. As mentioned above, I would like to understand from @itf if that is needed for tree artifacts, and discuss this in a separate patch, as it seems more like a refinement and less like a requirement, and is orthogonal to this particular change. It may also require additional changes to the build system.

I find the summary difficult to follow and I believe users will be more confusing than me (as a ThinLTO contributor).

I'll take a stab at suggesting a clearer summary in a little bit.

Thanks!

Got busy and forgot to do so. I suggest something like the following:


Title: [lld] Support separate native object file path in --thinlto-prefix-replace

Summary:
Currently, the --thinlto-prefix-replace="oldpath;newpath" option is used during
distributed ThinLTO thin links to specify the mapping of the input bitcode object
files' directory tree (oldpath) to the directory tree (newpath) used for both:

1) the output files of the thin link itself (the .thinlto.bc index files and the
optional .imports files)
2) the specified object file paths written to the response file given in the
--thinlto-index-only=${response} option, which is used by the final native
link and must match the paths of the native object files that will be
produced by ThinLTO backend compiles.

This patch expands the --thinlto-prefix-replace option to allow a separate directory
tree mapping to be specified for the object file paths written to the response file
(number 2 above). This is important to support builds and build systems where the
same output directory may not be written by multiple build actions (e.g. the thin link
and the ThinLTO backend compiles).

The new format is: --thinlto-prefix-replace="origpath;outpath[;objpath]"

This replaces the origpath directory tree of the thin link input files with
outpath when writing the thin link index and imports outputs (number 1
above). If objpath is specified it replaces origpath of the input files with
objpath when writing the response file (number 2 above), otherwise it
falls back to the old behavior of using outpath for this as well.


In the end, the new format should be:

--thinlto-prefix-replace="origpath;outpath[;objpath][;indexpath]" with 4 arguments, in some order. the new objpath allows us to have each directory be the output of a different action, and ;indexpath allows us to have each directory be either the input of an action or not, without a subset of a directory being the input of an action.

In bazel, this is not an issue when handling non-tree artifacts, because for normal files we can decide to include or not include the file as input to an action. For tree artifacts, when we add a tree artifact as an input to an action, the whole directory gets added.

itf updated this revision to Diff 507751.Mar 23 2023, 9:36 AM
itf retitled this revision from Add extra parameter to thinlto-prefix-replace to [lld] Support separate native object file path in --thinlto-prefix-replace.
itf edited the summary of this revision. (Show Details)
itf marked 5 inline comments as done.Mar 23 2023, 9:42 AM
itf added inline comments.
llvm/include/llvm/LTO/LTO.h
229

Would FinalObjectPrefix not be confused with being the prefix of the final executable/lib created at the end of the final linking step, where are these "link objects" get linked together?

itf updated this revision to Diff 507778.Mar 23 2023, 9:49 AM
itf marked an inline comment as not done.
itf marked 5 inline comments as done.
itf updated this revision to Diff 508234.Mar 24 2023, 3:52 PM
itf marked 2 inline comments as done.

All changes suggested in the comments have now been implemented

itf marked 2 inline comments as done.Mar 24 2023, 3:52 PM
itf updated this revision to Diff 508673.Mar 27 2023, 8:22 AM
oontvoo added inline comments.Mar 27 2023, 10:39 AM
lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
4

one of the problems of using full paths with %t is that then we cannot use
FileCheck --match-full-lines
it seems. Because
; CHECK: %t/foo.o
Does not seem to expand %t before checking the content of the file.

We can only use FileCheck without --match-full-lines, which is a weaker test

It doesn't expand in a CHECK but you can pass it in as a variable to FileCheck
Eg

; FileCheck -DT_PATH=%t  <rest of args>

; CHECK: [[T_PAT]]/foo.o

I think this looks pretty good now. Just some minor suggestions for documentation comments etc.

lld/COFF/Config.h
223

Can you document this here and in the other Config.h files, specifically noting its relationship with thinLTOPrefixReplace? It's maybe a little confusing that we are replacing the first string in the pair of thinLTOPrefixReplace with this string, so I think it needs to be documented.

lld/ELF/Driver.cpp
1023

Add a brief documentation of what this is doing. Ditto for implementations in other Driver.cpp files.

llvm/include/llvm/LTO/LTO.h
224

nit: s/for/with

229

Hmm, I think I agree with you. How about NativeObjectPrefix ? Sorry for the renaming churn.

itf added inline comments.Mar 28 2023, 10:22 AM
lld/COFF/Config.h
223

I will split thinLTOPrefixReplace into thinLTOPrefixReplaceOld and thinLTOPrefixReplaceNew to help clear up the confusion, as well as add documentation:

// Used for /thinlto-prefix-replace:
// Replace the prefix in paths generated for ThinLTO, replacing thinLTOPrefixReplaceOld with thinLTOPrefixReplaceNew.
//If thinLTOPrefixReplaceNativeObject is defined, replaces the prefix of the object file paths written to the response file given in the
--thinlto-index-only=${response} option,
 llvm::StringRef thinLTOPrefixReplaceOld;
 llvm::StringRef thinLTOPrefixReplaceNew;
 llvm::StringRef thinLTOPrefixReplaceNativeObject;
itf updated this revision to Diff 509768.Mar 30 2023, 12:10 PM

Split thinLTOPrefixReplace into thinLTOPrefixReplaceOld and thinLTOPrefixReplaceNew because otherwise it was unexpeced that the first element of a pair would be replaced by a a variable that comes from outside the pair. Add test for COFF. Fixes typos and add comments to code.

itf updated this revision to Diff 509774.Mar 30 2023, 12:15 PM
itf updated this revision to Diff 509841.Mar 30 2023, 3:50 PM
tejohnson accepted this revision.Mar 31 2023, 8:00 AM

lgtm, thanks!

This revision is now accepted and ready to land.Mar 31 2023, 8:00 AM
itf updated this revision to Diff 510065.Mar 31 2023, 9:32 AM

Adds missing %t to failing COFF test

itf updated this revision to Diff 510113.Mar 31 2023, 1:13 PM

Mostly looks good. I am on a trip til April 10, but I'll be reasonably responsible reading updates to this patch in the evening.

lld/ELF/Driver.cpp
1026

Use C++17 structured binding

lld/test/COFF/thinlto-index-file-object-prefix-replace.ll
39 ↗(On Diff #510113)

delete

lld/test/ELF/lto/thinlto-index-file-object-prefix-replace.ll
16

Here and throughout, add -NEXT if applicable

40

;; Ensure that lld generates error can be simplified to Create an error since the subject (lld) is implied by the context.

42

Here and throughout, always include error: for an error message, and try including the input filename.

lld/test/MachO/thinlto-index-file-object-prefix-replace.ll
3

Prefer && to ; (if %t is somehow non-removable, bail out immediately)

itf updated this revision to Diff 510609.Apr 3 2023, 2:32 PM
itf marked an inline comment as done.

Fix issue with gold plugin. Uses c++17 structured binding for getOldNewOptionsExtra for readability.

itf marked 5 inline comments as done.Apr 3 2023, 2:34 PM
tejohnson added inline comments.Apr 3 2023, 2:43 PM
lld/test/ELF/lto/thinlto-index-file-object-prefix-replace.ll
42

I believe what @MaskRay meant was to match the "error:" from the error message that this presumably comes from (and the input filename assuming that is in the error message I guess?) - I would just go ahead and match the full error message output.

llvm/tools/gold/gold-plugin.cpp
895 ↗(On Diff #510609)

Add a TODO to support the new option syntax used by lld in the gold plugin.

itf updated this revision to Diff 510617.Apr 3 2023, 3:13 PM
tejohnson added inline comments.Apr 3 2023, 3:22 PM
lld/test/ELF/lto/thinlto-index-file-object-prefix-replace.ll
42

Still missing the "error:". I.e. I assume the full output is something like:

error: --thinlto-prefix-replace=old_dir;new_dir;obj_dir must be used with --thinlto-index-only=

I think @MaskRay would like the entire message matched.

llvm/tools/gold/gold-plugin.cpp
896 ↗(On Diff #510617)

Maybe:

// TODO: Add support for optional native object path in thinlto_prefix_replace option to match lld.

itf marked 6 inline comments as done.Apr 3 2023, 3:24 PM
itf added inline comments.
lld/test/COFF/thinlto-index-file-object-prefix-replace.ll
39 ↗(On Diff #510113)

deleted the extra white space

tejohnson added inline comments.Apr 3 2023, 3:27 PM
lld/test/COFF/thinlto-index-file-object-prefix-replace.ll
2 ↗(On Diff #510617)

Remove extra semicolon.

itf updated this revision to Diff 510624.Apr 3 2023, 3:31 PM
itf marked an inline comment as done.
itf added inline comments.
lld/test/ELF/lto/thinlto-index-file-object-prefix-replace.ll
42

I did not realize that the error message was being preprended with "error: ". Fixed now

MaskRay accepted this revision as: MaskRay.Apr 3 2023, 7:37 PM
MaskRay added inline comments.
lld/MachO/Driver.cpp
874

The original function name getOldNewOptions is a misnomer in using "option" instead of other terms like "value". --thinlto-prefix-replace is an option. The argument is a colon separated list.

Since below you use old_dir/etc, you can use oldDir/newDir/etc.

itf updated this revision to Diff 510845.Apr 4 2023, 9:53 AM
itf marked 3 inline comments as done.
itf marked an inline comment as done.