This is an archive of the discontinued LLVM Phabricator instance.

Add a feature to dump dependency graph.
Needs ReviewPublic

Authored by MaskRay on Sep 10 2019, 1:40 AM.

Details

Summary

This patch is not intended to be submitted as-is.

This patch implements an experimental feature that can be enabled
by --print-dependency-graph flag. When you give that option to lld,
lld prints out dependencies between files in the following format:

<file1> <file2> <symbol>

The above line means <file1> uses <symbol> defined by <file2>.

This is a simple feature but seems pretty useful when debugging
why some file gets loaded. To find out why some file gets linked,
you can start from a special file <commandline> to follow the
triplets until you reach the file.

I uploaded an example output of --print-dependency-graph to
https://gist.github.com/rui314/f2f6b1bf5a30a12630183d5a0e5ad100

I don't think this feature is complete. At least I think I need
to do something for garbage-collected sections and ICF-merged
sections.

Rationale for the file format: I considered several different file
format for representing the dependency graph. I chose the above
format because a line-oriented text file format is easy to handle
by a scripting language. Since the same name is repeated many times
in an output, an output tend to be extremely large, but I think that
the advantage of simplicity outweighs the disadvantage, especially
for those who uses the option for the first time to debug some random
problem.

Event Timeline

ruiu created this revision.Sep 10 2019, 1:40 AM
Herald added a project: Restricted Project. · View Herald Transcript
ruiu edited the summary of this revision. (Show Details)Sep 10 2019, 1:43 AM

I looked at the sample provided and I think that if I was a new user, I'd find not clear what is "<internal>" probably.
What about including something like "<internal> is usually a symbol defined by linker" line to the header?

Also I am not sure what is situation now, but before we used to report "<internal>" for linker script defined symbols
in our messages sometimes. Would be nice to have a test for that case.

About printing "-": I am not sure how much it is usefull? Actually it was a bit confusing to see it for me at first. I think if file
is provided by command line it is enough to print "<commandline> <filename>" probably.

I think this feature will be very useful. If section-level dependency information (--gc-sections, --icf=) is required, we may need some variations of the triple representation ((<from input section>, <symbol>, <to input section>)) proposed in https://lists.llvm.org/pipermail/llvm-dev/2019-February/130565.html

lld/ELF/Driver.cpp
266

Should onCommandLine be set for a LazyObjFile?

ruiu updated this revision to Diff 219498.Sep 10 2019, 2:19 AM
  • remove "-" from "<commandline>" lines
ruiu added a comment.Sep 10 2019, 2:21 AM

I looked at the sample provided and I think that if I was a new user, I'd find not clear what is "<internal>" probably.
What about including something like "<internal> is usually a symbol defined by linker" line to the header?

Also I am not sure what is situation now, but before we used to report "<internal>" for linker script defined symbols
in our messages sometimes. Would be nice to have a test for that case.

"<internal>" does not represent a symbol but a file. Ideally we should give more meaningful name for each virtual file, but I think that's beyond the scope of this patch.

About printing "-": I am not sure how much it is usefull? Actually it was a bit confusing to see it for me at first. I think if file
is provided by command line it is enough to print "<commandline> <filename>" probably.

I can remove "-" from the "<commandline>" lines.

ruiu marked an inline comment as done.Sep 10 2019, 2:24 AM
ruiu added inline comments.
lld/ELF/Driver.cpp
266

I can but we don't have to because we are not going to use the bit for LazyObjFile.

grimar added a comment.EditedSep 10 2019, 2:27 AM

I looked at the sample provided and I think that if I was a new user, I'd find not clear what is "<internal>" probably.
What about including something like "<internal> is usually a symbol defined by linker" line to the header?

Also I am not sure what is situation now, but before we used to report "<internal>" for linker script defined symbols
in our messages sometimes. Would be nice to have a test for that case.

"<internal>" does not represent a symbol but a file. Ideally we should give more meaningful name for each virtual file, but I think that's beyond the scope of this patch.

If it reports "<internal>" for a symbol from a linker script still - it should be mentioned in the header. Otherwise it might be not clear where a sybmol is coming from.

ruiu added a comment.Sep 10 2019, 2:46 AM

Here is an example of a simple post-link analysis using good old awk command. Assume file graph contains an output of --print-dependency-graph generated when linking bin/llvm-strings.

You noticed that Path.cpp.o in libLLVMSupport.a gets linked and were wondering why. You can get a list of files that depend on Path.cpp.o with the following command:

$ awk '$2 == "lib/libLLVMSupport.a(Path.cpp.o)" { print $1 }' graph | sort -u
lib/libLLVMSupport.a(CommandLine.cpp.o)
lib/libLLVMSupport.a(MD5.cpp.o)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o)
lib/libLLVMSupport.a(Process.cpp.o)
lib/libLLVMSupport.a(Program.cpp.o)
lib/libLLVMSupport.a(raw_ostream.cpp.o)
lib/libLLVMSupport.a(Signals.cpp.o)

Now you are wondering what symbols are shared between them, so you type in the following command.

$ awk -F'\t' '$2 == "lib/libLLVMSupport.a(Path.cpp.o)" { print $1 " " $3 }' graph | sort -u
lib/libLLVMSupport.a(CommandLine.cpp.o) llvm::sys::fs::current_path(llvm::SmallVectorImpl<char>&)
lib/libLLVMSupport.a(CommandLine.cpp.o) llvm::sys::fs::equivalent(llvm::Twine const&, llvm::Twine const&, bool&)
lib/libLLVMSupport.a(CommandLine.cpp.o) llvm::sys::path::append(llvm::SmallVectorImpl<char>&, llvm::Twine const&, llvm::Twine const&, llvm::Twine const&, llvm::Twine const&)
lib/libLLVMSupport.a(CommandLine.cpp.o) llvm::sys::path::filename(llvm::StringRef, llvm::sys::path::Style)
lib/libLLVMSupport.a(CommandLine.cpp.o) llvm::sys::path::is_relative(llvm::Twine const&, llvm::sys::path::Style)
lib/libLLVMSupport.a(CommandLine.cpp.o) llvm::sys::path::parent_path(llvm::StringRef, llvm::sys::path::Style)
lib/libLLVMSupport.a(MD5.cpp.o) llvm::ArrayRef<unsigned char> llvm::makeArrayRef<unsigned char>(unsigned char const*, unsigned long)
lib/libLLVMSupport.a(MD5.cpp.o) void llvm::SmallVectorTemplateBase<char, true>::uninitialized_move<char*, char*>(char*, char*, char*)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::closeFile(int&)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::getStdinHandle()
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::mapped_file_region::alignment()
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::mapped_file_region::const_data() const
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::mapped_file_region::~mapped_file_region()
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::mapped_file_region::mapped_file_region(int, llvm::sys::fs::mapped_file_region::mapmode, unsigned long, unsigned long, std::error_code&)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::openNativeFileForRead(llvm::Twine const&, llvm::sys::fs::OpenFlags, llvm::SmallVectorImpl<char>*)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::openNativeFile(llvm::Twine const&, llvm::sys::fs::CreationDisposition, llvm::sys::fs::FileAccess, llvm::sys::fs::OpenFlags, unsigned int)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::readNativeFile(int, llvm::MutableArrayRef<char>)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::readNativeFileSlice(int, llvm::MutableArrayRef<char>, unsigned long)
lib/libLLVMSupport.a(MemoryBuffer.cpp.o) llvm::sys::fs::status(int, llvm::sys::fs::file_status&)
lib/libLLVMSupport.a(Process.cpp.o) llvm::SmallString<128u>::SmallString(llvm::StringRef)
lib/libLLVMSupport.a(Process.cpp.o) llvm::SmallVector<char, 128u>::SmallVector<char const*, void>(char const*, char const*)
lib/libLLVMSupport.a(Process.cpp.o) llvm::sys::fs::access(llvm::Twine const&, llvm::sys::fs::AccessMode)
lib/libLLVMSupport.a(Process.cpp.o) llvm::sys::fs::equivalent(llvm::Twine const&, llvm::Twine const&, bool&)
lib/libLLVMSupport.a(Process.cpp.o) llvm::sys::fs::exists(llvm::Twine const&)
lib/libLLVMSupport.a(Process.cpp.o) llvm::sys::path::append(llvm::SmallVectorImpl<char>&, llvm::Twine const&, llvm::Twine const&, llvm::Twine const&, llvm::Twine const&)
lib/libLLVMSupport.a(Process.cpp.o) llvm::sys::path::is_absolute(llvm::Twine const&, llvm::sys::path::Style)
lib/libLLVMSupport.a(Process.cpp.o) std::chrono::duration<long, std::ratio<1l, 1000000000l> >::count() const
lib/libLLVMSupport.a(Process.cpp.o) std::chrono::duration<long, std::ratio<1l, 1000000000l> >::duration<long, void>(long const&)
lib/libLLVMSupport.a(Process.cpp.o) std::chrono::duration<long, std::ratio<1l, 1l> >::count() const
lib/libLLVMSupport.a(Process.cpp.o) std::chrono::duration<long, std::ratio<1l, 1l> >::duration<long, void>(long const&)
lib/libLLVMSupport.a(Process.cpp.o) std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >::time_since_epoch() const
lib/libLLVMSupport.a(Program.cpp.o) llvm::SmallString<128u>::SmallString(llvm::StringRef)
lib/libLLVMSupport.a(Program.cpp.o) llvm::SmallVector<char, 128u>::SmallVector<char const*, void>(char const*, char const*)
lib/libLLVMSupport.a(Program.cpp.o) llvm::SmallVectorTemplateCommon<llvm::StringRef, void>::back()
lib/libLLVMSupport.a(Program.cpp.o) llvm::sys::fs::access(llvm::Twine const&, llvm::sys::fs::AccessMode)
lib/libLLVMSupport.a(Program.cpp.o) llvm::sys::fs::can_execute(llvm::Twine const&)
lib/libLLVMSupport.a(Program.cpp.o) llvm::sys::fs::exists(llvm::Twine const&)
lib/libLLVMSupport.a(Program.cpp.o) llvm::sys::path::append(llvm::SmallVectorImpl<char>&, llvm::Twine const&, llvm::Twine const&, llvm::Twine const&, llvm::Twine const&)
lib/libLLVMSupport.a(raw_ostream.cpp.o) llvm::sys::fs::openFile(llvm::Twine const&, int&, llvm::sys::fs::CreationDisposition, llvm::sys::fs::FileAccess, llvm::sys::fs::OpenFlags, unsigned int)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::fs::access(llvm::Twine const&, llvm::sys::fs::AccessMode)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::fs::createTemporaryFile(llvm::Twine const&, llvm::StringRef, int&, llvm::SmallVectorImpl<char>&)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::fs::createTemporaryFile(llvm::Twine const&, llvm::StringRef, llvm::SmallVectorImpl<char>&)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::fs::exists(llvm::Twine const&)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::fs::getMainExecutable[abi:cxx11](char const*, void*)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::fs::remove(llvm::Twine const&, bool)
lib/libLLVMSupport.a(Signals.cpp.o) llvm::sys::path::parent_path(llvm::StringRef, llvm::sys::path::Style)

You can repeat this process until you reach "<commandline>" file.

I looked at the sample provided and I think that if I was a new user, I'd find not clear what is "<internal>" probably.
What about including something like "<internal> is usually a symbol defined by linker" line to the header?

Also I am not sure what is situation now, but before we used to report "<internal>" for linker script defined symbols
in our messages sometimes. Would be nice to have a test for that case.

"<internal>" does not represent a symbol but a file. Ideally we should give more meaningful name for each virtual file, but I think that's beyond the scope of this patch.

If it reports "<internal>" for a symbol from a linker script still - it should be mentioned in the header. Otherwise it might be not clear where a sybmol is coming from.

If we want to have a meaningful file name for synthesized sections, linker synthesized symbols, and SymbolAssignment symbols, we have to create a new InputFile::Kind

std::string lld::toString(const InputFile *f) {
  if (!f)  // this can be null for various reasons above
    return "<internal>";

I looked at the sample provided and I think that if I was a new user, I'd find not clear what is "<internal>" probably.
What about including something like "<internal> is usually a symbol defined by linker" line to the header?

Also I am not sure what is situation now, but before we used to report "<internal>" for linker script defined symbols
in our messages sometimes. Would be nice to have a test for that case.

"<internal>" does not represent a symbol but a file. Ideally we should give more meaningful name for each virtual file, but I think that's beyond the scope of this patch.

If it reports "<internal>" for a symbol from a linker script still - it should be mentioned in the header. Otherwise it might be not clear where a sybmol is coming from.

If we want to have a meaningful file name for synthesized sections, linker synthesized symbols, and SymbolAssignment symbols, we have to create a new InputFile::Kind

std::string lld::toString(const InputFile *f) {
  if (!f)  // this can be null for various reasons above
    return "<internal>";

I suggested the same previously: D45375

I looked at the sample provided and I think that if I was a new user, I'd find not clear what is "<internal>" probably.
What about including something like "<internal> is usually a symbol defined by linker" line to the header?

Also I am not sure what is situation now, but before we used to report "<internal>" for linker script defined symbols
in our messages sometimes. Would be nice to have a test for that case.

"<internal>" does not represent a symbol but a file. Ideally we should give more meaningful name for each virtual file, but I think that's beyond the scope of this patch.

If it reports "<internal>" for a symbol from a linker script still - it should be mentioned in the header. Otherwise it might be not clear where a sybmol is coming from.

If we want to have a meaningful file name for synthesized sections, linker synthesized symbols, and SymbolAssignment symbols, we have to create a new InputFile::Kind

std::string lld::toString(const InputFile *f) {
  if (!f)  // this can be null for various reasons above
    return "<internal>";

I suggested the same previously: D45375

Interesting. So this current form of the patch is actually an alternative representation of the existing feature --cref. I wonder if we can change the output format of --cref and reuse the option name, or name it --cref=format.

I think that this will definitely be useful, especially to work out all the files that are involved in referencing a symbol. Looking to the future I'd recommend keeping the symbol dependency information, which is mostly useful for objects and libraries, from relocation dependencies as used in GC as the output is likely to be different and will answer different questions. How do you want the final interface to look when there are many diagnostic analyses?

As I think I mentioned in the discussion on the mailing lists, the output of the final state of a data structure can lose information that is useful in tracking problems. A complementary technique is to optionally print information as we process, this can help to fix problems caused by ordering of libraries.

lld/ELF/Driver.cpp
876

There are many types of dependency graph, for example the undefined symbol dependencies are not the same as GC/ICF dependencies. I think it would be worth being more specific, with the option name. There are also some options on future expansion:

  • One flag for each dependency like option each with a different file name
  • One flag for each dependency, but outputting to a single "map like" file
  • An aggregated option like --dependency=<comma separated list> of topics.
1680

This is more a slice of the internal symbol table. In particular it is the set of symbols defined by other objects. We may want to be precise about what we are printing, an object could define a weak symbol, but not refer to it, only to have it defined non-weakly in another file. This would show up as uses.

For each object file participating in the linking of <object file>, print the Symbols that were defined by other object files.
The format of the file is:
# <file1> <file2> <symbol>
# meaning <file1> has <symbol> in its symbol table which was defined in <file2>.
1686

Personally I'd prefer not to include the files on the command line in this particular diagnostic. Although that information is useful I'd prefer to see it in a different diagnostic option to keep this one focused on symbol dependencies.

I can see myself writing a python/perl script to process this for large programs and whereas I could simply cut a static number of lines for the header, I'd have to account for the <commandline> in the parser.

MaskRay commandeered this revision.Jul 31 2020, 5:27 PM
MaskRay edited reviewers, added: ruiu; removed: MaskRay.

Closed by D82437

MaskRay abandoned this revision.Jul 31 2020, 5:27 PM
pcc added a subscriber: pcc.Aug 3 2020, 4:28 PM

I think this is actually a different feature to D82437?

In D67388#2192129, @pcc wrote:

I think this is actually a different feature to D82437?

My bad... I intended to close D65430

MaskRay reclaimed this revision.Aug 3 2020, 4:54 PM

This can be useful!

For me right now: building a shared library from objects built with -ffunction-sections, I'm mostly interested in a finer grain information than "which object file gets loaded": I'd like to be able to track down why a particular symbols end up in the final binary / isn't stripped.