Page MenuHomePhabricator

Add `--dependency-files` option, which is equivalent to compiler option -MD.
Needs ReviewPublic

Authored by ruiu on Jul 29 2019, 9:08 PM.

Details

Summary

Clang and GCC have a feature (-MD flag) to create a dependency file
in a format that make command can read, so that you don't have to
manually maintain dependencies between .c files and .h files.

There was no corresponding feature in the linker. With this patch, you
can now do the same thing by passing --dependency-files=<path> to lld.
I believe this option is worth adding.

This is an example of a generated file:
https://gist.github.com/rui314/f5e2c669c70bec31f260bfd4b8ef41a6

Fixes https://bugs.llvm.org/show_bug.cgi?id=42806.

Event Timeline

ruiu created this revision.Jul 29 2019, 9:08 PM
Herald added a project: Restricted Project. · View Herald Transcript
ruiu edited the summary of this revision. (Show Details)Jul 29 2019, 9:08 PM
pcc added inline comments.Jul 29 2019, 9:19 PM
lld/ELF/Driver.cpp
1408

Clang has a bunch of complicated logic to handle escaping here: http://llvm-cs.pcc.me.uk/tools/clang/lib/Frontend/DependencyFile.cpp#299

Would it be worth extracting it into a function and using it from here? Or maybe there's something simpler that we can do that would be understood by ninja et al.?

MaskRay added inline comments.Jul 29 2019, 10:48 PM
lld/ELF/Config.h
89

Move dependencyFiles below chroot?

lld/ELF/Driver.cpp
1393

statically-linked -> ``

a.o/b.a/c.so are not statically linked..

ruiu marked an inline comment as done.Jul 29 2019, 10:53 PM
ruiu added inline comments.
lld/ELF/Driver.cpp
1393

Yeah, and in that case, c.so is not part of your program, and it is not wrong to say that your executable does not really depend on c.so at link-time as long as c.so maintains a stable API.

ruiu updated this revision to Diff 212277.Jul 29 2019, 11:45 PM
  • quote a filename if it contains a space character
  • add a test
ruiu marked an inline comment as done.Jul 29 2019, 11:46 PM
ruiu added inline comments.
lld/ELF/Driver.cpp
1408

I read through the code and honestly my feeling is that we probably wouldn't have to make our code that bullet-proof. That logic is complicated because it tries to handle pathnames containing # or a special three-letter sequence \ #. If someone actually tried to use a filename that starts with a space followed by '#', that person is intentionally trying to break the build system.

So I made a change to quote a string if it contains a space character, which I believe should be enough.

grimar added inline comments.Jul 30 2019, 12:56 AM
lld/test/ELF/write-dependencies.s
4 ↗(On Diff #212277)

This probably a bit simpler:

llvm-mc -filetype=obj -triple=x86_64-unknown-linux /dev/null -o "%t/bar baz.o"
6 ↗(On Diff #212277)

Since you're testing the output in a particular format, what about adding --strict-whitespace --match-full-lines to FileCheck invocation?

If you need a makefile snippet: { echo -n 'a: '; clang a.o -o a -Wl,-t | sed 's/(.*//' | sort -u | sed '$!s/$/ \\/';} > a.d (assuming there is no special character)...

If there is some special needs, maybe we can improve -t to add more customization?

lld/ELF/Driver.cpp
1393

I think the phrase "statically-linked system libraries" is not accurate. Neither a system .a nor a system .so is statically linked.

ruiu marked an inline comment as done.Jul 30 2019, 2:52 AM
ruiu added inline comments.
lld/ELF/Driver.cpp
1393

I feel it is colloquially OK, but precisely speaking, you are right. How about this? "even though system libraries that are linked to your executable statically are a part of your program."

MaskRay added inline comments.Jul 30 2019, 3:36 AM
lld/ELF/Driver.cpp
1393
"even though system libraries that are linked to your executable statically are a part of your program."

This looks good.

The code looks good but my main question is if we should add a new option, rather than improve an existing option (-t/--trace), which has a fairly similar feature. What --write-dependencies does can be easily implemented with some sed commands. With a small modification, it can be adjusted to Ninja output (and probably dependency format used by other build systems).

No objections to the idea in principle. I think it may need a bit of wider discussion to reach its full potential though. My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

I also think PCC has a good point about special characters and quoting, I'm no expert myself, and have been bitten by edge cases in the past, albeit these were usually routines that had to cope with Unix and Windows paths.

My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

Having it integrated into e.g. cmake would be a great project for someone to take on, but for us (FreeBSD) it would be useful by itself. Our build is about 175K lines of hand-written Makefiles and we could plug it in with a small change in a couple of places. In any case we shouldn't hold up lld support waiting on prospective cmake changes IMO.

ruiu added a comment.Jul 30 2019, 5:29 PM

No objections to the idea in principle. I think it may need a bit of wider discussion to reach its full potential though. My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

These generators like cmake don't know the complete picture of the dependency list -- for example, if you pass -fsanitize to the compiler, it links libasan, and if you update libasan, you may want to rebuild your executable again, but that dependency cannot be tracked by cmake.

You are right that this probably needs wider discussion. I'll start a thread at llvm-dev.

phosek added a subscriber: phosek.Jul 30 2019, 6:40 PM
phosek added inline comments.
lld/ELF/Options.td
419

Have you considered naming the option -dependency-file? That's the name of the CC1 flag (see https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Driver/Options.td#L606) and I think it'd be nice to make those consistent across different tools.

ruiu marked an inline comment as done.Jul 30 2019, 6:47 PM
ruiu added inline comments.
lld/ELF/Options.td
419

No, but that's a good suggestion. --write-dependencies, which is an alias for -MD, is also taken from clang, but in clang, that option doesn't take an argument. Clang's -dependency-file takes an argument. So, for consistency, it seems like -dependency-file is a better name.

ruiu marked 2 inline comments as done.Jul 30 2019, 6:55 PM
ruiu added inline comments.
lld/test/ELF/write-dependencies.s
6 ↗(On Diff #212277)

Since whitespace characters are not significant, I don't think we need to test that, but adding --match-full-lines is a good idea.

ruiu updated this revision to Diff 212491.Jul 30 2019, 6:55 PM
  • address review comments
ruiu retitled this revision from Add `--write-dependencies` option, which is equivalent to compiler option -MD. to Add `--dependency-files` option, which is equivalent to compiler option -MD..Jul 30 2019, 6:55 PM
ruiu edited the summary of this revision. (Show Details)
hctim added a subscriber: hctim.Aug 1 2019, 1:26 PM

My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

Having it integrated into e.g. cmake would be a great project for someone to take on, but for us (FreeBSD) it would be useful by itself. Our build is about 175K lines of hand-written Makefiles and we could plug it in with a small change in a couple of places. In any case we shouldn't hold up lld support waiting on prospective cmake changes IMO.

Ed, does ld.lld a.o -o /dev/null | sed 's/(.*//' | sort -u | sed '$!s/$/ \\/';} generate a usable Makefile fragment? I think the scenarios where people write Makefile are not very common now (think cmake/bazel/meson/scons/buck/...). Makefile and build.ninja are mostly used as an "assembly language". I think a dependency graph feature will be useful, but a Makefile fragment is probably not the best format. (I actually like Makefile more than the alternatives but I know the trend is that people are moving away from hand-written Makefile) At least it looks like the output can be easily generated from ld.lld -t output.

ruiu added a comment.Aug 5 2019, 6:12 AM

My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

Having it integrated into e.g. cmake would be a great project for someone to take on, but for us (FreeBSD) it would be useful by itself. Our build is about 175K lines of hand-written Makefiles and we could plug it in with a small change in a couple of places. In any case we shouldn't hold up lld support waiting on prospective cmake changes IMO.

Ed, does ld.lld a.o -o /dev/null | sed 's/(.*//' | sort -u | sed '$!s/$/ \\/';} generate a usable Makefile fragment? I think the scenarios where people write Makefile are not very common now (think cmake/bazel/meson/scons/buck/...). Makefile and build.ninja are mostly used as an "assembly language". I think a dependency graph feature will be useful, but a Makefile fragment is probably not the best format. (I actually like Makefile more than the alternatives but I know the trend is that people are moving away from hand-written Makefile) At least it looks like the output can be easily generated from ld.lld -t output.

I mentioned Makefile because -MD was created with that usage in mind (at least originally), but I guess that file format can be understood by other build tools. Am I missing something? Even though some build tools can understand C/C++ file dependencies without using a compiler, there's still some languages that the build systems cannot read, but still they can read auto-generated Makefile-compatible dependency file, no?

My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

Having it integrated into e.g. cmake would be a great project for someone to take on, but for us (FreeBSD) it would be useful by itself. Our build is about 175K lines of hand-written Makefiles and we could plug it in with a small change in a couple of places. In any case we shouldn't hold up lld support waiting on prospective cmake changes IMO.

Ed, does ld.lld a.o -o /dev/null | sed 's/(.*//' | sort -u | sed '$!s/$/ \\/';} generate a usable Makefile fragment? I think the scenarios where people write Makefile are not very common now (think cmake/bazel/meson/scons/buck/...). Makefile and build.ninja are mostly used as an "assembly language". I think a dependency graph feature will be useful, but a Makefile fragment is probably not the best format. (I actually like Makefile more than the alternatives but I know the trend is that people are moving away from hand-written Makefile) At least it looks like the output can be easily generated from ld.lld -t output.

I mentioned Makefile because -MD was created with that usage in mind (at least originally), but I guess that file format can be understood by other build tools. Am I missing something? Even though some build tools can understand C/C++ file dependencies without using a compiler, there's still some languages that the build systems cannot read, but still they can read auto-generated Makefile-compatible dependency file, no?

Different build systems use different syntax :/

Some are represented as a function call, e.g. executable('main', 'main.c'). Some are cc_binary(name="main", srcs=["main.c"]). llvm has LLVMBuild.txt. A Makefile fragment cannot be consumed by any of the build systems ;-)

In a hand-written Makefile, a.d is usually used in the following pattern:

a.o a.d: a.c
  gcc -MD -c $<

-include a.d   # suppress the error if a.d does not exist

a.c has many direct and indirect includes, e.g. stdio.h, stddef.h, a.h.
When any of the dependency is updated, a.o will be considered stale and will get rebuilt next time make is executed.

-MD is useful in that it gives the information that isn't available before: we can't tell what files a.c includes.

In the ld.lld --write-dependencies= case, the dependencies (bitcode files, object files, shared objects, and archive members that get pulled out) are already known. If it isn't, we'll get a "undefined reference" error. So I don't know who will be the consumer of the dependency output.

My understanding is that many developers use makefile/ninja generation systems such as cmake rather than hand-write the file themselves. As such would this get much use unless it was integrated into these generators? May be worth approaching them to see if they have any requirements/observations about the option?

Having it integrated into e.g. cmake would be a great project for someone to take on, but for us (FreeBSD) it would be useful by itself. Our build is about 175K lines of hand-written Makefiles and we could plug it in with a small change in a couple of places. In any case we shouldn't hold up lld support waiting on prospective cmake changes IMO.

Ed, does ld.lld a.o -o /dev/null | sed 's/(.*//' | sort -u | sed '$!s/$/ \\/';} generate a usable Makefile fragment? I think the scenarios where people write Makefile are not very common now (think cmake/bazel/meson/scons/buck/...). Makefile and build.ninja are mostly used as an "assembly language". I think a dependency graph feature will be useful, but a Makefile fragment is probably not the best format. (I actually like Makefile more than the alternatives but I know the trend is that people are moving away from hand-written Makefile) At least it looks like the output can be easily generated from ld.lld -t output.

I mentioned Makefile because -MD was created with that usage in mind (at least originally), but I guess that file format can be understood by other build tools. Am I missing something? Even though some build tools can understand C/C++ file dependencies without using a compiler, there's still some languages that the build systems cannot read, but still they can read auto-generated Makefile-compatible dependency file, no?

Different build systems use different syntax :/

Some are represented as a function call, e.g. executable('main', 'main.c'). Some are cc_binary(name="main", srcs=["main.c"]). llvm has LLVMBuild.txt. A Makefile fragment cannot be consumed by any of the build systems ;-)

In a hand-written Makefile, a.d is usually used in the following pattern:

a.o a.d: a.c
  gcc -MD -c $<

-include a.d   # suppress the error if a.d does not exist

a.c has many direct and indirect includes, e.g. stdio.h, stddef.h, a.h.
When any of the dependency is updated, a.o will be considered stale and will get rebuilt next time make is executed.

-MD is useful in that it gives the information that isn't available before: we can't tell what files a.c includes.

In the ld.lld --write-dependencies= case, the dependencies (bitcode files, object files, shared objects, and archive members that get pulled out) are already known. If it isn't, we'll get a "undefined reference" error. So I don't know who will be the consumer of the dependency output.

One use case for ld.lld --dependency-file= is auto-linking which is supported by lld for both ELF and COFF. In case of auto-linking, you don't know which additional dependencies are going to be included until you process all input files. Another case that's specific to ELF are linker scripts which allows specifying additional inputs using the INPUT statement e.g. libc++ installs the following linker script INPUT(libc++.so.1 -lunwind -lc++abi) as libc++.so.

Regarding the depfile format, the simplified Makefile format is the de-facto standard for depfiles and is emitted and supported by build systems like Make or Ninja (which is generated by number of other high-level build systems like CMake, Meson, GN, etc.). Using this format would make this functionality immediately usable.

MaskRay added a subscriber: bd1976llvm.EditedAug 5 2019, 10:43 PM

One use case for ld.lld --dependency-file= is auto-linking which is supported by lld for both ELF and COFF. In case of auto-linking, you don't know which additional dependencies are going to be included until you process all input files. Another case that's specific to ELF are linker scripts which allows specifying additional inputs using the INPUT statement e.g. libc++ installs the following linker script INPUT(libc++.so.1 -lunwind -lc++abi) as libc++.so.

Regarding the depfile format, the simplified Makefile format is the de-facto standard for depfiles and is emitted and supported by build systems like Make or Ninja (which is generated by number of other high-level build systems like CMake, Meson, GN, etc.). Using this format would make this functionality immediately usable.

I carefully think about this. deplibs is probably the only reasonable use case. Let me give an example to ensure I understand it correctly:

touch d.h

cat > a.c <<e
#include "d.h"
#pragma comment(lib, "b.a")
#pragma comment(lib, "c.a")
extern int b, c;
int main() { return b+c; }
e

cat > b.c <<e
int b;
e

cat > c.c <<e
int c;
e

cc -c b.c c.c; ar r b.a b.o; ar r c.a b.o

build.ninja:

rule cc
  depfile = $out.d
  command = clang -MD -MT $out -MF $out.d -c $in -o $out

rule link
  depfile = $out.link.d
  command = clang $in -fuse-ld=lld -Wl,--dependency-file=$out.link.d -o $out

build a: link a.o
build a.o: cc a.c

Alternatively, Makefile:

CFLAGS = -MD -MP -MF $@.d  # note -MP is not required in build.ninja
CC = clang
LDFLAGS = -fuse-ld=lld -Wl,--dependency-file=$@.link.d
a_deps = a.o  # explicit deps (.deplibs are excluded)

# implicit rule of a cannot be used because it would link all files specified in a.link.d together
a: $(a_deps)
	$(LINK.c) $(a_deps) $(LDLIBS) -o $@

-include a.o.d
-include a.link.d

This dependency file approach has an assumption: a.o.d is not older than a.o, a.link.d is not older than a.

  1. if a.o.d does not exist, according to the assumption, a.o does not exist as well. ninja or make generates a.o and a.o.d. a and a.link.d are similar.
  2. if a.c or d.h is updated, a get generated due to the dependency described in a.o.d.
  3. if d.h is deleted and #include "d.h" is deleted from a.c. Note there is no rule to generate d.h. ninja seems to ignore its nonexistence, while Makefile needs a force target. That is why in the Makefile version we use -MP.
  4. if b.a is deleted and #pragma comment(lib, "b.a") is deleted from a.c. ninja ignores b.a and generates a. However, GNU make will error: make: *** No rule to make target 'b.a', needed by 'a'. Stop.

I think ld.lld --write-dependencies= can be plugged into a ninja based build system (including cmake -G Ninja) without change, but not into a Makefile based build system.
To use this feature in meson/bazel/buck/..., a post-processing tool is definitely needed.
However, if a post-processing tool is ever needed, ld.lld -t will be just as simple (or difficult) to use.

Let's revisit this ninja fragment:

rule link
  depfile = $out.link.d
  command = clang $in -fuse-ld=lld -Wl,--dependency-file=$out.link.d -o $out

#pragma comment(lib, "b.a") is not common. If the build system goes through the trouble to use deplibs,
there may likely be other post processing steps. I wonder if it can just point -fuse-ld= to a custom wrapper that processes ld.lld -t output and generates $out.link.d.

So the question is whether we consider this feature too application-specific.
For simple dependency information, we can already use -t.
For richer information, there is a proposal for dependency analysis (https://lists.llvm.org/pipermail/llvm-dev/2019-February/130565.html)

I'd also like to hear from @bd1976llvm, who implemented the deplibs feature in the first place.

lld/ELF/Config.h
87

This probably should be

llvm::SetVector<std::string, std::vector<std::string>,
                llvm::StringSet<llvm::MallocAllocator>>
    dependencyFiles; // for --dependency-file

Some StringRef elements are not internalized.

lld/ELF/Driver.cpp
1377

--dependency-file

ruiu added a comment.Aug 6 2019, 5:35 AM

As to post-process -verbose output, I think that that is too fragile to depend on, as -verbose is mainly for inspection and debugging. Any arbitrary output may be added to the -verbose output that can confuse post-process scripts. I also don't want to recommend users post-process -verbose output because of the same reason -- if people rely on the current -verbose output too much, any change to the file may break scripts in the wild which could prevent us from adding new messages there.

As to post-process -verbose output, I think that that is too fragile to depend on, as -verbose is mainly for inspection and debugging. Any arbitrary output may be added to the -verbose output that can confuse post-process scripts. I also don't want to recommend users post-process -verbose output because of the same reason -- if people rely on the current -verbose output too much, any change to the file may break scripts in the wild which could prevent us from adding new messages there.

It is -t | --trace in ELF, not --verbose. (In COFF there is no --trace counterpart so I asked whether we should add a /trace on the mailing list.) The output format of --trace is stable.

Has there been any progress on this? This need for this came out again in our build today: we currently generate input linker scripts that pull in the right library at link time (e.g. this would like INPUT(/path/to/real/library.so)). Our distributed build needs to know about all the files that are needed as input to send them over to the server, and these input linker scripts are currently posing a problem. We were hoping to use the depfile support to find all such files; the system already knows how to parse depfiles to handle header dependencies so there isn't any additional support needed there. We would really like to see this landed as it'd unblock our progress or having to come up with some custom solution like manually parsing the input linker scripts.

#pragma comment(lib, "b.a") is not common. If the build system goes through the trouble to use deplibs,

These are already being used by libc++ and libc++abi so I wouldn't say it's uncommon.

there may likely be other post processing steps. I wonder if it can just point -fuse-ld= to a custom wrapper that processes ld.lld -t output and generates $out.link.d.

This sounds like a lot of extra effort that would have to be replicated in every build system that would like to use the depfile support (i.e. every build that wants to use linker depfiles would need such wrapper and integrate it into their build).

The implementation of this feature is very minimal and doesn't significantly increase the linker complexity so I don't understand why there's so much pushback?

#pragma comment(lib, "b.a") is not common. If the build system goes through the trouble to use deplibs,

These are already being used by libc++ and libc++abi so I wouldn't say it's uncommon.

there may likely be other post processing steps. I wonder if it can just point -fuse-ld= to a custom wrapper that processes ld.lld -t output and generates $out.link.d.

This sounds like a lot of extra effort that would have to be replicated in every build system that would like to use the depfile support (i.e. every build that wants to use linker depfiles would need such wrapper and integrate it into their build).

See my analysis at https://reviews.llvm.org/D65430#1616208 , I don't think the in-tree C++ implementation reduces complexity for other build systems. It works for Ninja, but not even for Make.

The implementation of this feature is very minimal and doesn't significantly increase the linker complexity so I don't understand why there's so much pushback?

We've been innovative in introducing useful options. That is a good thing. But we've already gotten similar features in this area (e.g. --trace, --verbose). The pushback is about the generality of a feature. The question raised in https://reviews.llvm.org/D65430#1616208 is never answered. I hope we can be aware of feature creep. Ideally we just implement powerful and orthogonal features that can meet a wide range of demand, instead of inventing a new option for every feature request that only differs from the previous ones in some nuance.

phosek added a comment.Fri, Nov 8, 2:21 PM

#pragma comment(lib, "b.a") is not common. If the build system goes through the trouble to use deplibs,

These are already being used by libc++ and libc++abi so I wouldn't say it's uncommon.

there may likely be other post processing steps. I wonder if it can just point -fuse-ld= to a custom wrapper that processes ld.lld -t output and generates $out.link.d.

This sounds like a lot of extra effort that would have to be replicated in every build system that would like to use the depfile support (i.e. every build that wants to use linker depfiles would need such wrapper and integrate it into their build).

See my analysis at https://reviews.llvm.org/D65430#1616208 , I don't think the in-tree C++ implementation reduces complexity for other build systems. It works for Ninja, but not even for Make.

The implementation of this feature is very minimal and doesn't significantly increase the linker complexity so I don't understand why there's so much pushback?

We've been innovative in introducing useful options. That is a good thing. But we've already gotten similar features in this area (e.g. --trace, --verbose). The pushback is about the generality of a feature. The question raised in https://reviews.llvm.org/D65430#1616208 is never answered. I hope we can be aware of feature creep. Ideally we just implement powerful and orthogonal features that can meet a wide range of demand, instead of inventing a new option for every feature request that only differs from the previous ones in some nuance.

That's fair, but I don't feel like we've been upholding that principle in the past and I don't understand why we should start right now. For example, we've introduced a number of features that are specific to Android: -use-android-relr-tags, or Partitions. We've also added IDE specific features: -vs-diagnostics. While it's true that depfiles could only be utilized by Ninja at the moment, Ninja is being used as a backend for multiple build systems: CMake, GN, Meson, Blueprint, kati, to name the most popular ones. This already seems like a much higher bar than for the other features.

To try and answer your questions from https://reviews.llvm.org/D65430#1616208, there are two different use cases I mentioned before, deplibs is one of them, input linker scripts is the other one. Regarding the issue with Make, could we address that using the same approach as -MP does by generating a phony target for each dependency other than the main file? For your example we would generate the following depfile:

a: b.a
b.a:

I do agree that Makefile format is not great, but it's what existing compilers and build systems already use and support so it seems like the most pragmatic option for now. I know there's a discussion about introducing a new (possibly JSON based) depfile format for C++ modules, and if that indeed happens and build system implement support for this format, we could extend lld to support the new format as well, but I don't think we should be blocking this feature on that.