This is an archive of the discontinued LLVM Phabricator instance.

Differential D69607

Add a feature to explain why some file gets included to the linker's output
Needs Review · Public

Authored by ruiu on Oct 30 2019, 12:33 AM.

Details

Reviewers

MaskRay
pcc
grimar
peter.smith
• espindola

Summary

This patch proposes a new option --explain. This patch is not intended to be
committed as-is but is for discussion.

So, I sent https://reviews.llvm.org/D67388 to dump the internal dependency graph
from the linker so that users can run arbitrary graph algorithms to analyze
linker outputs. But I believe in most cases what users want to know is simple:
why is some file that wasn't previously linked now included in the final
binary? I think this situation occurs so frequently that we should probably add
a new feature that answers that particular question, so that users don't have
to write a graph analysis program.

This is an example output of lld when --explain=lib/libLLVMSupport.a(APInt.cpp.o)
is given (shortened to fit to the screen).

This is why 'lib/libLLVMSupport.a(APInt.cpp.o)' is linked:

'(--entry option)' uses '_start' defined in '/usr/lib/x86_64-linux-gnu/crt1.o'
which uses 'main' defined in 'tools/lld/tools/lld/CMakeFiles/lld.dir/lld.cpp.o'
which uses 'StringRef::endswith_lower()' defined in 'lib/libLLVMSupport.a(StringRef.cpp.o)'
which uses 'APInt::zext()' defined in 'lib/libLLVMSupport.a(APInt.cpp.o)'

What we are doing in this patch is the basic breadth-first search in the
dependency graph.
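
For illustration only (this is not code from the patch): a minimal, self-contained sketch of how a breadth-first search can recover the shortest "uses" chain from a root to a target file. The `Edge` struct and the `explainPath` function are hypothetical names, and the graph is a plain edge list rather than lld's internal data structures.

```cpp
#include <algorithm>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical edge: file `from` uses `symbol`, which is defined in file `to`.
struct Edge {
  std::string from, symbol, to;
};

// Returns the edges on a shortest path from any root to `target`.
// The result is empty if `target` is itself a root or is unreachable.
std::vector<Edge> explainPath(const std::vector<std::string> &roots,
                              const std::vector<Edge> &graph,
                              const std::string &target) {
  std::map<std::string, Edge> cameFrom; // file -> edge that first reached it
  std::set<std::string> seen(roots.begin(), roots.end());
  std::deque<std::string> queue(roots.begin(), roots.end());

  while (!queue.empty()) {
    std::string file = queue.front();
    queue.pop_front();
    if (file == target)
      break;
    // Visit every file that `file` depends on, each at most once.
    for (const Edge &e : graph) {
      if (e.from == file && seen.insert(e.to).second) {
        cameFrom.emplace(e.to, e);
        queue.push_back(e.to);
      }
    }
  }

  // Walk the predecessor edges back from the target, then reverse them.
  std::vector<Edge> path;
  for (auto it = cameFrom.find(target); it != cameFrom.end();
       it = cameFrom.find(it->second.from))
    path.push_back(it->second);
  std::reverse(path.begin(), path.end());
  return path;
}
```

Printing the first edge as `'from' uses 'symbol' defined in 'to'` and each later edge as `which uses 'symbol' defined in 'to'` would give output shaped like the example above. Because BFS visits files in order of distance from the roots, the first edge recorded for a file lies on a shortest chain.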

The feature implemented in this patch will be somewhat redundant once we land
https://reviews.llvm.org/D67388, but this option seems pretty practical and easy
to use. So, I guess that adding something like this would make users' lives a bit
easier.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 40331
Build 40435: arc lint + arc unit

Event Timeline

ruiu created this revision. Oct 30 2019, 12:33 AM

Herald added a reviewer: • espindola. Oct 30 2019, 12:33 AM

Herald added a project: Restricted Project.

Herald added subscribers: arichardson, mgorny, emaste.

Harbormaster completed remote builds in B40251: Diff 227036. Oct 30 2019, 12:40 AM

Looks useful.

lld/ELF/Explain.cpp
92	Perhaps it is worth showing the user a list of files available for --explain at this step? Or have another way to get a list of files that can be passed here. Maybe --explain without arguments could show something.

I agree that having the most common case built in is valuable. Unless there is an analysis program that is shipped with the linker I suspect that the majority of users won't want to write their own algorithm.
Looking at the example output:

'(--entry option)' uses '_start'
'/usr/lib/x86_64-linux-gnu/crt1.o' uses 'main'
'lld.cpp.o' uses 'lld::elf::link(llvm::ArrayRef<char const*>)'
'lib/liblldELF.a(Driver.cpp.o)' uses 'llvm::object::Archive::create()'
'lib/libLLVMObject.a(Archive.cpp.o)' uses 'llvm::Expected<bool>::operator bool()'
'lib/libLLVMPasses.a(PassBuilder.cpp.o)' uses 'llvm::InlinerPass::~InlinerPass()'
'lib/libLLVMipo.a(Inliner.cpp.o)'

I'd be tempted to start with the last object in the chain that is unconditionally present on the command line. For example _start, crt1.o, etc. will always be present and most likely not visible to the user as they have been added by the compiler driver. In this case I'd expect lld.cpp.o to be the first entry in the diagnostic.

Although it is implicit, it may be worth stating the dependency as:

'lld.cpp.o' uses 'lld::elf::link(llvm::ArrayRef<char const*>)' defined in 'lib/liblldELF.a(Driver.cpp.o)'

It would make it a bit more verbose, but everything is on one line.

I think this is a cool feature. Could it accept symbol names in addition to files as well? ld64 has a "-why_live" for symbols, and a "-why_load" for object files (https://www.manpagez.com/man/1/ld64/). I've used why_live a lot more than why_load.

Just for the sake of correctness, this is provided by GNU ld as part of the map file (-Wl,-Map=foo.map). It lists all archive members and which object file and symbol triggered the inclusion.

I agree with Nico that it would be nice if this could take symbol names as well. Maybe the --explain output could be added to the output of `--trace`?

lld/ELF/Explain.cpp
96	Hmm, object files aren't really gc roots, so this could give misleading output (e.g. `ld a.o b.o --explain a` with a.o containing `a: ret` and b.o containing `b: call a` with only b in dynsym won't report b.o). Also, it misses archives included with `--whole-archive`. Shouldn't this be more similar to MarkLive.cpp? In other words, include dynsym and don't treat object files as roots.
126	Shouldn't it also look at `dependentSections`?
168	I don't think the `isWeak` part here is correct.

MaskRay added inline comments. Oct 30 2019, 11:30 PM

lld/ELF/Explain.cpp
102	--init and --fini are used in --gc-sections but they do not fetch lazy definitions. I do not know why GNU ld behaves this way.
158	We can make edges a superset of specialEdges, then we can just delete `seen` and use `edges.try_emplace(...)` here.
168	`-u`/`-e` can fetch a weak lazy definition. This may cause a bogus `file unreachable` error.

MaskRay added inline comments. Oct 30 2019, 11:33 PM

lld/ELF/Driver.cpp
1990	Missing full stop.
lld/ELF/Explain.h
12	Missing `#include "lld/Common/LLVM.h"`

MaskRay added inline comments. Oct 30 2019, 11:37 PM

lld/ELF/Options.td
46	It is not obvious that the user is supposed to specify `b.a(b.o)` for an object file in an archive or `b.o` for an object between `--start-lib` and `--end-lib`.

In D69607#1726792, @peter.smith wrote:
I agree that having the most common case built in is valuable. Unless there is an analysis program that is shipped with the linker I suspect that the majority of users won't want to write their own algorithm.
Looking at the example output:
'(--entry option)' uses '_start'
'/usr/lib/x86_64-linux-gnu/crt1.o' uses 'main'
'lld.cpp.o' uses 'lld::elf::link(llvm::ArrayRef<char const*>)'
'lib/liblldELF.a(Driver.cpp.o)' uses 'llvm::object::Archive::create()'
'lib/libLLVMObject.a(Archive.cpp.o)' uses 'llvm::Expected<bool>::operator bool()'
'lib/libLLVMPasses.a(PassBuilder.cpp.o)' uses 'llvm::InlinerPass::~InlinerPass()'
'lib/libLLVMipo.a(Inliner.cpp.o)'
I'd be tempted to start with the last object in the chain that is unconditionally present on the command line. For example _start, crt1.o, etc. will always be present and most likely not visible to the user as they have been added by the compiler driver. In this case I'd expect lld.cpp.o to be the first entry in the diagnostic.

Although it is implicit, it may be worth stating the dependency as:
'lld.cpp.o' uses 'lld::elf::link(llvm::ArrayRef<char const*>)' defined in 'lib/liblldELF.a(Driver.cpp.o)'
It would make it a bit more verbose, but everything is on one line.

How about the new one? I think I can reverse the order by putting the message in the passive voice, but it seems that this order is a bit easier to read.

lld/ELF/Explain.cpp
92	Added a hint as to how to get a list of input files.
96	I think object files are actually roots in this feature's sense if you do not pass `-gc-sections`, so we need to handle two different cases, `-no-gc-sections` and `-gc-sections`. How is this new code?
126	Done.
168	Yeah, this was in the wrong place. I moved this to `enqueue`.

address review comments

Harbormaster completed remote builds in B40324: Diff 227227. Oct 30 2019, 11:53 PM

ruiu edited the summary of this revision. Oct 30 2019, 11:56 PM

ruiu marked 5 inline comments as done. Oct 31 2019, 12:11 AM

ruiu added inline comments.

lld/ELF/Explain.cpp
158	We probably could but looks like it also makes sense to keep them distinctive sets, so I'm leaning towards not doing that.

address review comments

MaskRay added inline comments. Oct 31 2019, 12:17 AM

lld/ELF/Explain.cpp
157	We may consider turning the error into a regular print. A similar functionality `-y not_exist` does not error.

Harbormaster completed remote builds in B40328: Diff 227231. Oct 31 2019, 12:21 AM

ruiu marked an inline comment as done. Oct 31 2019, 12:29 AM

ruiu added inline comments.

lld/ELF/Explain.cpp
158	It's not a strong preference, but error is perhaps better than printing it out as a non-error message? When the control reaches here, something is wrong from the user's perspective.

ruiu edited the summary of this revision. Oct 31 2019, 12:35 AM

grimar added inline comments. Oct 31 2019, 1:54 AM

lld/ELF/Explain.cpp
176	But it doesn't seem to help for the case mentioned in the description? If we have a `main.o` compiled from
  .global _start; _start: callq foo
and a `foo.a` which contains a `foo.o` compiled from
  .globl foo; foo:
then when I invoke `-flavor gnu main.o foo.a "-o" out --verbose` I see:
  lld: main.o
  lld: foo.a
Then, like a possible normal user :) I do: `-flavor gnu --explain=foo.a main.o foo.a "-o" out --verbose` and I see:
  lld: main.o
  lld: foo.a
  lld: error: --explain: no such file or symbol: 'foo.a'. Use --verbose option to see a list of input files.
How am I supposed to realise that I should invoke `-flavor gnu --explain=foo.a(foo.o) main.o foo.a "-o" out --verbose` to see the following?
  lld: main.o
  lld: foo.a
  Explain: This is why 'foo.a(foo.o)' is linked:
  Explain:
  Explain: '(--entry option)' uses '_start' defined in 'main.o'
  Explain: which uses 'foo' defined in 'foo.a(foo.o)'

grimar added inline comments. Oct 31 2019, 1:57 AM

lld/ELF/Explain.cpp
176	btw, note that it says `'(--entry option)' uses '_start'`, but in fact I am not using any command line option like `-e`

ruiu marked 2 inline comments as done. Oct 31 2019, 2:09 AM

ruiu added inline comments.

lld/ELF/Explain.cpp
176	I think that's fine. It's an implicit option but still there.
176	Well, I don't think we should print out all section names because it's just too long, and I believe showing a hint is better. The problem is that --verbose option doesn't show archive members, but that can be fixed simply by adding a log() call to fetch().

address review comments

Harbormaster completed remote builds in B40331: Diff 227243. Oct 31 2019, 2:11 AM

Thanks for the update, I'm happy with the explain output format.

One anomaly that I'm not sure is worth dealing with is the linkerscript GROUP command as used in libc.so

GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a  AS_NEEDED ( /lib/ld-linux-armhf.so.3 ) )

I think that these will just end up as input files on the command line. Although they are non-obvious and implicitly included due to the linker script.

lld/ELF/Explain.cpp
168	Global symbols defined by a comdat group may cause some confusion as there will be multiple object files that define the symbol, but only one group that is selected that will match here. Not a lot that we can do here. Perhaps if we knew that a symbol was defined in a group then we could print that out. "Explain: Symbol foo is defined in COMDAT group g, selected from file foo.o". Perhaps wait and see if anyone does get confused.
lld/ELF/Options.td
47	A suggestion: Explain why a given object file gets linked in to the final binary. Objects can be specified by full path, or by a global symbol that they define. Objects defined in archives are specified by full/path/to/library(object) such as library.a(object.o).
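
For illustration only (not code from the patch): a small sketch of how an argument in the `library.a(object.o)` form described above could be split into its archive and member parts. `FileSpec` and `parseFileSpec` are hypothetical names; the patch itself compares the argument against `toString(f)` rather than parsing it.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type: `archive` is empty when the argument names a
// plain object file rather than an archive member.
struct FileSpec {
  std::string archive;
  std::string object;
};

// Splits "library.a(object.o)" into {"library.a", "object.o"}.
// Anything that does not match that shape is treated as a plain object name.
FileSpec parseFileSpec(std::string_view spec) {
  std::size_t open = spec.find('(');
  if (open != std::string_view::npos && open > 0 && spec.back() == ')')
    return {std::string(spec.substr(0, open)),
            std::string(spec.substr(open + 1, spec.size() - open - 2))};
  return {"", std::string(spec)};
}
```

For example, `parseFileSpec("library.a(object.o)")` yields archive `library.a` and object `object.o`, while `parseFileSpec("b.o")` leaves the archive empty.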

MaskRay added inline comments. Oct 31 2019, 3:21 PM

lld/ELF/Explain.cpp
158	`error` inhibits producing an output. I have a feeling that retaining the output may be slightly more useful. Sometimes the user can specify multiple --explain. The user may still need the output though some of the --explain values are not existent.
168	Explaining more about COMDAT groups looks good to me. We essentially discarded all but one definitions, lost some information in the transformation, and the best we can do here is to select the prevailing definition. The user is hinted from the message that the archive order may matter. A COMDAT group is a SHT_GROUP section with the ELF group section flag GRP_COMDAT. Its name is almost always ".group". Shall we say `the COMDAT group with signature "g"`?
175	Is `--trace` better? --verbose suggests the raw archive names while --trace does not, and the --trace output does not include unused files.
  # --verbose
  ld.lld: a.a
  ld.lld: b.o
  ld.lld: a.a(a.o)
  # --trace
  b.o
  a.a(a.o)
lld/ELF/Options.td
47	Unfortunately this (lld::elf::InputFile::archiveName) is not the full path. Here are some examples:
  ld.lld a.a => a.a
  ld.lld -L. -la => ./liba.a
  ld.lld /tmp/c/a.a => /tmp/c/a.a
So we may need path normalization here. Some users may resolve the path and feed that to lld.

added tests
added a man page entry
removed dependent section tracking because dependencies are within the same object file, while this explain feature works in the file granularity

Harbormaster completed remote builds in B40390: Diff 227410. Nov 1 2019, 3:18 AM

In D69607#1728450, @peter.smith wrote:
Thanks for the update, I'm happy with the explain output format.

One anomaly that I'm not sure is worth dealing with is the linkerscript GROUP command as used in libc.so
GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a  AS_NEEDED ( /lib/ld-linux-armhf.so.3 ) )
I think that these will just end up as input files on the command line. Although they are non-obvious and implicitly included due to the linker script.

In most cases I believe linker scripts are explicitly added to the command line by users, and that wouldn't cause too much confusion as they should know what they are doing. One exception is Linux's libc.so which is not a DSO but actually a linker script, but maybe that's the only implicit use case?

lld/ELF/Explain.cpp
158	I guess that's actually a good thing, no? I want users to use this option only interactively just by appending an --explain to an otherwise complete command line, and if a command line option passed by a user is not correct, that is, well, I think an error.
168	Currently it is not easy to know whether a section was in a comdat group or not. Fortunately, I don't think that's too confusing, at least we don't tell a lie to users. If we report that there's some dependency from one file to another, that dependency does exist in the source code, though other files may have similar dependencies to the same file. So I don't think we need to do something special for comdats.
175	Thank you for the suggestion. I replaced --verbose with --trace.
lld/ELF/Options.td
47	Added that text to the man page, thanks!
47	OK, I made a change to code so that we normalize pathnames before comparison.
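
For illustration only: the normalization code from the patch is not shown on this page, so the following is a sketch of one way to compare path spellings, using the standard library's purely lexical normalization (C++17). `samePathLexically` is a hypothetical helper, and it does not resolve relative paths against the current directory or follow symlinks.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: returns true if the two spellings name the same path
// after lexical normalization ("." removed, "dir/.." collapsed).
bool samePathLexically(const std::string &a, const std::string &b) {
  namespace fs = std::filesystem;
  return fs::path(a).lexically_normal() == fs::path(b).lexically_normal();
}
```

With this, `./liba.a` and `liba.a` compare equal, but `a.a` and `/tmp/c/a.a` do not, since a lexical comparison never makes a relative path absolute.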

address review comments

Harbormaster completed remote builds in B40391: Diff 227412. Nov 1 2019, 3:50 AM

This needs a test case.

add a test file which I forgot to add in a previous patch

Harbormaster completed remote builds in B40392: Diff 227415. Nov 1 2019, 3:59 AM

removed dependent section tracking because dependencies are within the same object file, while this explain feature works in the file granularity

Is it a good idea to make the feature use file granularity in the --gc-sections case? I would imagine that this feature would be used in tricky situations where the explanation for a section being included is not entirely obvious, so doing things at a file granularity may end up hiding critical information. That's just my opinion though.

lld/ELF/Explain.cpp
96	Maybe I'm missing something, but I don't see where dynsyms are being added in the new code in the `--gc-sections` case.

MaskRay added inline comments. Nov 5 2019, 9:16 AM

lld/ELF/Explain.cpp
96	We don't need to handle --init, --fini or .dynsym as opposed to MarkLive.cpp.
  # c.s -> c.o
  .globl _start
  _start: ret
  .section .text.foo,"ax"
  call bar
  # b.s -> b.o -> b.a
  % cat b.s
  .globl bar
  bar: call foo
  # a.s -> a.o -> a.a
  .globl foo
  foo: ret
For example, in `ld.lld c.o b.a a.a --gc-sections -o a`, neither b.a nor a.a is necessary (none of their sections is retained) but we still need to link in b.a and a.a before discarding their sections. The archive handling logic happens before garbage collection.
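
For illustration only: a simplified model of the point above, that archive members are pulled in by symbol references at the file level, before any section is garbage-collected. `File` and `loadFiles` are hypothetical names. The model ignores sections, weak symbols, and command-line ordering, so it is a sketch of the idea and not lld's resolution algorithm.

```cpp
#include <algorithm>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an input file: its name plus the symbols it
// defines and references. Sections are ignored on purpose.
struct File {
  std::string name;
  std::vector<std::string> defines;
  std::vector<std::string> references;
};

// Returns the names of all files that get linked, in load order.
// `objects` are loaded unconditionally; `lazy` members are extracted only
// when an already loaded file references a symbol they define.
std::vector<std::string> loadFiles(const std::vector<File> &objects,
                                   const std::vector<File> &lazy) {
  std::vector<std::string> loaded;
  std::set<std::string> defined;
  std::set<const File *> seen;
  std::deque<const File *> work;

  auto load = [&](const File &f) {
    if (!seen.insert(&f).second)
      return;
    loaded.push_back(f.name);
    defined.insert(f.defines.begin(), f.defines.end());
    work.push_back(&f);
  };

  for (const File &f : objects)
    load(f);

  while (!work.empty()) {
    const File *f = work.front();
    work.pop_front();
    for (const std::string &sym : f->references) {
      if (defined.count(sym))
        continue;
      // Extract the first lazy member that defines the missing symbol.
      for (const File &member : lazy) {
        if (std::find(member.defines.begin(), member.defines.end(), sym) !=
            member.defines.end()) {
          load(member);
          break;
        }
      }
    }
  }
  return loaded;
}
```

Fed the three files from the example (c.o referencing `bar`, b.o in b.a defining `bar` and referencing `foo`, a.o in a.a defining `foo`), the model loads all three, even though a later section-level GC pass would keep nothing from the two archives.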

In D69607#1726863, @thakis wrote:

I think this is a cool feature. Could it accept symbol names in addition to files as well? ld64 has a "-why_live" for symbols, and a "-why_load" for object files (https://www.manpagez.com/man/1/ld64/). I've used why_live a lot more than why_load.

@thakis The current implementation uses the option name --explain and allows either a symbol name or a filename pattern. Do you have a preference for using an alternative name, or using different option names?

In D69607#1734134, @MaskRay wrote:

In D69607#1726863, @thakis wrote:

I think this is a cool feature. Could it accept symbol names in addition to files as well? ld64 has a "-why_live" for symbols, and a "-why_load" for object files (https://www.manpagez.com/man/1/ld64/). I've used why_live a lot more than why_load.

@thakis The current implementation uses the option name --explain and allows either a symbol name or a filename pattern. Do you have a preference for using an alternative name, or using different option names?

I don't have an opinion on the flag name. --explain sounds fine, and it working with both file and symbol names seems fine too.

delphij added a subscriber: delphij. Aug 2 2020, 10:22 PM

Herald added a subscriber: dang. Aug 2 2020, 10:22 PM

simon.giesecke added a subscriber: simon.giesecke. May 27 2021, 1:38 AM

Herald added a subscriber: pengfei. May 27 2021, 1:38 AM

MaskRay mentioned this in D109572: [ELF] Add --why-extract= to query why archive members/lazy object files are extracted. Sep 10 2021, 12:38 AM

I created D109572 (--why-extract) as an alternative.

MaskRay mentioned this in rGa954bb18b143: [ELF] Add --why-extract= to query why archive members/lazy object files are…. Sep 20 2021, 9:52 AM

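Revision Contents

Path	Size
lld/ELF/CMakeLists.txt	1 line
lld/ELF/Driver.cpp	5 lines
lld/ELF/Explain.h	22 lines
lld/ELF/Explain.cpp	252 lines
lld/ELF/InputFiles.h	4 lines
lld/ELF/InputFiles.cpp	4 lines
lld/ELF/Options.td	4 lines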

Diff 227243

lld/ELF/CMakeLists.txt

[22 lines not shown] add_lld_library(lldELF
    Arch/X86.cpp
    Arch/X86_64.cpp
    ARMErrataFix.cpp
    CallGraphSort.cpp
    DWARF.cpp
    Driver.cpp
    DriverUtils.cpp
    EhFrame.cpp
+   Explain.cpp
    ICF.cpp
    InputFiles.cpp
    InputSection.cpp
    LTO.cpp
    LinkerScript.cpp
    MapFile.cpp
    MarkLive.cpp
    OutputSections.cpp
[30 lines not shown]

lld/ELF/Driver.cpp

[18 lines not shown]
  // be harmful when you are doing cross-linking. Therefore, in LLD, we
  // simply trust the compiler driver to pass all required options and
  // don't try to make effort on our side.
  //
  //===----------------------------------------------------------------------===//

  #include "Driver.h"
  #include "Config.h"
+ #include "Explain.h"
  #include "ICF.h"
  #include "InputFiles.h"
  #include "InputSection.h"
  #include "LinkerScript.h"
  #include "MarkLive.h"
  #include "OutputSections.h"
  #include "ScriptParser.h"
  #include "SymbolTable.h"
[1,946 lines not shown] template <class ELFT> void LinkerDriver::link(opt::InputArgList &args) {
    // Read the callgraph now that we know what was gced or icfed
    if (config->callGraphProfileSort) {
      if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file))
        if (Optional<MemoryBufferRef> buffer = readFile(arg->getValue()))
          readCallGraph(*buffer);
      readCallGraphsFromObjectFiles<ELFT>();
    }

+   // Handle --explain.
+   for (auto *arg : args.filtered(OPT_explain))
+     explain<ELFT>(arg->getValue());
+
    // Write the result to the file.
    writeResult<ELFT>();
  }

  } // namespace elf
  } // namespace lld

lld/ELF/Explain.h

This file was added.

				//===- Explain.h ----------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLD_ELF_EXPLAIN_H
				#define LLD_ELF_EXPLAIN_H

				#include "lld/Common/LLVM.h"

				namespace lld {
				namespace elf {

				template <class ELFT> void explain(StringRef fileOrSym);

				} // namespace elf
				} // namespace lld

				#endif

lld/ELF/Explain.cpp

This file was added.

				//===- Explain.cpp --------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements an optional analysis pass that doesn't change the
				// linker's internal state. If you are reading this file to understand how the
				// linker works, you can skip this file now.
				//
				// So, if you are doing a release work, it is a common situation that you find a
				// binary bloat between releases, nail down a file that wasn't linked in a
				// previous release, and try to figure out why the file gets linked to the
				// current one. There was no elegant solution for the last step; you would make
				// a guess and remove some function calls, hoping that that eliminates a
				// dependency to the file.
				//
				// --explain option is intended to be the solution for the above situation. You
				// can specify a filename as an argument for the option, and lld prints out the
				// shortest path from a root to a given file. Below is an example output of lld
				// when `--explain=lib/libLLVMSupport.a(APInt.cpp.o)` is given (shortened to fit
				// to the screen):
				//
				// This is why 'lib/libLLVMSupport.a(APInt.cpp.o)' is linked:
				//
				// '(--entry option)' uses '_start' defined in '/usr/lib/x86_64-linux-gnu/crt1.o'
				// which uses 'main' defined in 'tools/lld/tools/lld/CMakeFiles/lld.dir/lld.cpp.o'
				// which uses 'StringRef::endswith_lower()' defined in 'lib/libLLVMSupport.a(StringRef.cpp.o)'
				// which uses 'APInt::zext()' defined in 'lib/libLLVMSupport.a(APInt.cpp.o)'
				//
				// You can also pass a symbol name instead of a filename to the option, and lld
				// will try to figure out why a file that defines a symbol is linked.
				//
				// What we are doing in this file is the basic breadth-first search in the
				// dependency graph.
				//
				//===----------------------------------------------------------------------===//

				#include "InputFiles.h"
				#include "InputSection.h"
				#include "LinkerScript.h"
				#include "SymbolTable.h"
				#include "Symbols.h"
				#include "lld/Common/ErrorHandler.h"
				#include <deque>

				using namespace llvm;
				using namespace llvm::ELF;
				using namespace llvm::object;

				namespace lld {
				namespace elf {

				namespace {
template <class ELFT> class Explain {
public:
  void run(StringRef fileOrSym);

private:
  InputFile *findFile(StringRef fileOrSym);
  template <class RelTy> void enqueue(const InputSectionBase *sec, RelTy &rel);
  void enqueueSpecial(StringRef cause, StringRef symName);
  void printPath(StringRef fileOrSym);

  // A file object to which we are searching for a path.
  InputFile *target = nullptr;

  // This map represents how we reach a file from root.
  DenseMap<InputFile *, std::pair<InputFile *, Symbol *>> edges;

  // Some root objects are not really object files but command line arguments
  // (e.g. --entry) or linker scripts. This map manages edges to such vertices.
  DenseMap<InputFile *, std::pair<StringRef, StringRef>> specialEdges;

  // We want to visit each object file at most once.
  DenseSet<InputFile *> seen;

  // A queue for breadth-first search.
  std::deque<InputFile *> queue;
};
} // namespace

// This is the main function of the --explain feature.
template <class ELFT> void Explain<ELFT>::run(StringRef fileOrSym) {
  // Find a file object for a given filename or a symbol.
  target = findFile(fileOrSym);
  if (!target)
    return;

  // Collect root objects. Object files are usually root objects (i.e. always
  // included to the result), but if -gc-sections is passed, they need to be
  // (directly or indirectly) referenced by a root symbol.
  if (!config->gcSections) {
    for (InputFile *file : objectFiles)
      if (!file->loadedLazily)
        queue.push_back(file);

    for (InputFile *file : bitcodeFiles)
      if (!file->loadedLazily)
        queue.push_back(file);

    // If the target file is a root object, we don't need to run BFS.
    for (InputFile *file : queue) {
      if (file != target)
        continue;
      outs() << "Explain: File '" << toString(target) << "' is linked because "
             << "it is passed as a command line argument.\n";
      return;
    }
  }

  // Object files referenced by the following symbols are also root objects.
  enqueueSpecial("(--entry option)", config->entry);
  for (StringRef s : config->undefined)
    enqueueSpecial("(--undefined option)", s);
  for (StringRef s : script->referencedSymbols)
    enqueueSpecial("(linker script)", s);

  // Now that we have a complete set of root objects, run BFS.
  while (!queue.empty()) {
    InputFile *file = queue.front();
    queue.pop_front();

    if (file == target) {
      printPath(fileOrSym);
      return;
    }

    if (!isa<ObjFile<ELFT>>(file) && !isa<BitcodeFile>(file))
      continue;

    for (const InputSectionBase *sec : file->getSections()) {
      if (!sec || !sec->isLive())
        continue;

      if (sec->areRelocsRela) {
        for (const typename ELFT::Rela &rel : sec->template relas<ELFT>())
          enqueue(sec, rel);
      } else {
        for (const typename ELFT::Rel &rel : sec->template rels<ELFT>())
          enqueue(sec, rel);
      }

      for (InputSection *dep : sec->dependentSections) {
        auto *sec2 = dyn_cast<InputSection>(dep);
        if (!sec2 || !seen.insert(sec2->file).second)
          return;
        queue.push_back(sec2->file);
        edges[sec2->file] = {sec->file, nullptr};
      }
    }
  }

  error("--explain: unreachable file: " + toString(target));
}

				MaskRayUnsubmitted Not Done Reply Inline Actions We can make edges a superset of specialEdges, then we can just delete `seen` and use `edges.try_emplace(...)` here. MaskRay: We can make edges a superset of specialEdges, then we can just delete `seen` and use `edges.
				ruiuAuthorUnsubmitted Done Reply Inline Actions We probably could but looks like it also makes sense to keep them distinctive sets, so I'm leaning towards not doing that. ruiu: We probably could but looks like it also makes sense to keep them distinctive sets, so I'm…
				ruiuAuthorUnsubmitted Done Reply Inline Actions It's not a strong preference, but error is perhaps better than printing it out as a non-error message? When the control reaches here, something is wrong from the user's perspective. ruiu: It's not a strong preference, but error is perhaps better than printing it out as a non-error…
				MaskRayUnsubmitted Not Done Reply Inline Actions `error` inhibits producing an output. I have a feeling that retaining the output may be slightly more useful. Sometimes the user can specify multiple --explain. The user may still need the output though some of the --explain values are not existent. MaskRay: `error` inhibits producing an output. I have a feeling that retaining the output may be…
				ruiuAuthorUnsubmitted Done Reply Inline Actions I guess that's actually a good thing, no? I want users to use this option only interactively just by appending an --explain to an otherwise complete command line, and if a command line option passed by a user is not correct, that is, well, I think an error. ruiu: I guess that's actually a good thing, no? I want users to use this option only interactively…
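
MaskRay's `try_emplace` suggestion above can be illustrated in isolation. The following is only a sketch, not code from the patch: it uses `std::unordered_map` from the C++17 standard library in place of whatever map type `edges` has in lld, and the `Edge` struct and `visit` helper are hypothetical names introduced for the example. It shows how the predecessor map alone can double as the visited set, because `try_emplace` reports whether the key was newly inserted and leaves an existing entry untouched.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical edge record: the file we came from and the symbol that
// caused the dependency.
struct Edge {
  std::string from;
  std::string symbol;
};

// Records how `file` was first reached. Returns true if `file` was not
// known before; an existing entry is never overwritten, so the first
// (shortest, in a breadth-first search) explanation is kept.
bool visit(std::unordered_map<std::string, Edge> &edges,
           const std::string &file, const std::string &from,
           const std::string &symbol) {
  return edges.try_emplace(file, Edge{from, symbol}).second;
}
```

With this shape, a separate `seen` set is only needed for nodes that have no entry in `edges`, which is the trade-off discussed in the thread.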
				template <class ELFT> InputFile *Explain<ELFT>::findFile(StringRef name) {
				for (InputFile *f : objectFiles)
				if (toString(f) == name)
				return f;

				for (SharedFile *f : sharedFiles)
				if (toString(f) == name)
				return f;

				if (auto *sym = dyn_cast_or_null<Defined>(symtab->find(name))) {
				pcc: I don't think the `isWeak` part here is correct.
				MaskRay: `-u`/`-e` can fetch a weak lazy definition. This may cause a bogus `file unreachable` error.
				ruiu: Yeah, this was in the wrong place. I moved it to `enqueue`.

				peter.smith: Global symbols defined by a COMDAT group may cause some confusion, as there will be multiple object files that define the symbol, but only the one selected group will match here. There is not a lot that we can do here. Perhaps if we knew that a symbol was defined in a group, we could print that out: "Explain: Symbol foo is defined in COMDAT group g, selected from file foo.o". Perhaps wait and see if anyone does get confused.
				MaskRay: Explaining more about COMDAT groups looks good to me. We essentially discarded all but one definition, lost some information in the transformation, and the best we can do here is to select the prevailing definition. The message hints to the user that the archive order may matter. A COMDAT group is a SHT_GROUP section with the ELF group section flag GRP_COMDAT. Its name is almost always ".group". Shall we say `the COMDAT group with signature "g"`?
				ruiu: Currently it is not easy to know whether a section was in a COMDAT group or not. Fortunately, I don't think that's too confusing; at least we don't tell a lie to users. If we report that there's some dependency from one file to another, that dependency does exist in the source code, though other files may have similar dependencies on the same file. So I don't think we need to do something special for COMDATs.
				outs() << "Explain: Symbol '" << name << "' is defined in file '"
				<< toString(sym->file) << "'.\n";
				return sym->file;
				}

				error("--explain: no such file or symbol: '" + name +
				"'. Use --verbose option to see a list of input files.");
				MaskRay: Is `--trace` better? --verbose lists the raw archive names while --trace does not, and the --trace output does not include unused files.
				  # --verbose
				  ld.lld: a.a
				  ld.lld: b.o
				  ld.lld: a.a(a.o)
				  # --trace
				  b.o
				  a.a(a.o)
				ruiu: Thank you for the suggestion. I replaced --verbose with --trace.
				return nullptr;
				grimar: But it doesn't seem to help for the case mentioned in the description. Suppose we have a `main.o` compiled from
				  .global _start; _start: callq foo
				and a `foo.a` which contains a `foo.o` compiled from
				  .globl foo; foo:
				Then when I invoke `-flavor gnu main.o foo.a "-o" out --verbose` I see:
				  lld: main.o
				  lld: foo.a
				Then, like a possible normal user :), I run `-flavor gnu --explain=foo.a main.o foo.a "-o" out --verbose` and I see:
				  lld: main.o
				  lld: foo.a
				  lld: error: --explain: no such file or symbol: 'foo.a'. Use --verbose option to see a list of input files.
				How am I supposed to realise that I should invoke `-flavor gnu --explain=foo.a(foo.o) main.o foo.a "-o" out --verbose` to see the following?
				  lld: main.o
				  lld: foo.a
				  Explain: This is why 'foo.a(foo.o)' is linked:
				  Explain:
				  Explain: '(--entry option)' uses '_start' defined in 'main.o'
				  Explain: which uses 'foo' defined in 'foo.a(foo.o)'
				ruiu: Well, I don't think we should print out all section names because it's just too long, and I believe showing a hint is better. The problem is that the --verbose option doesn't show archive members, but that can be fixed simply by adding a log() call to fetch().

				grimar: By the way, note that it says `'(--entry option)' uses '_start'`, but in fact I am not using any command-line option like `-e`.
				ruiu: I think that's fine. It's an implicit option, but it is still there.
				}

				template <class ELFT>
				template <class RelTy>
				void Explain<ELFT>::enqueue(const InputSectionBase *sec, RelTy &rel) {
				Symbol &sym = sec->getFile<ELFT>()->getRelocTargetSym(rel);
				if (!sym.file || sym.isWeak() || !seen.insert(sym.file).second)
				return;
				queue.push_back(sym.file);
				edges[sym.file] = {sec->file, &sym};
				}

				template <class ELFT>
				void Explain<ELFT>::enqueueSpecial(StringRef cause, StringRef symName) {
				auto *sym = dyn_cast_or_null<Defined>(symtab->find(symName));
				if (!sym)
				return;

				auto *sec = dyn_cast_or_null<InputSection>(sym->section);
				if (!sec || !seen.insert(sec->file).second)
				return;

				queue.push_back(sec->file);
				specialEdges[sec->file] = {cause, symName};
				}

				template <class ELFT> void Explain<ELFT>::printPath(StringRef fileOrSym) {
				std::vector<std::string> files;
				std::vector<std::string> syms;
				InputFile *cur = target;

				for (;;) {
				files.push_back(toString(cur));

				if (specialEdges.count(cur)) {
				StringRef cause;
				StringRef sym;
				std::tie(cause, sym) = specialEdges[cur];
				files.push_back(cause);
				syms.push_back(sym);
				break;
				}

				if (!edges.count(cur))
				break;

				Symbol *sym;
				std::tie(cur, sym) = edges[cur];

				if (sym)
				syms.push_back(toString(*sym));
				else
				syms.push_back("(dependent section)");
				}

				outs() << "Explain: This is why '" << files[0] << "' is linked:\n"
				<< "Explain:\n"
				<< "Explain: '" << files.back() << "' uses '" << syms.back()
				<< "' defined in '" << files[files.size() - 2] << "'\n";

				for (int i = files.size() - 3; i >= 0; --i)
				outs() << "Explain: which uses '" << syms[i] << "' defined in '" << files[i]
				<< "'\n";
				}

				template <class ELFT> void explain(StringRef fileOrSym) {
				Explain<ELFT>().run(fileOrSym);
				}

				template void explain<ELF32LE>(StringRef);
				template void explain<ELF32BE>(StringRef);
				template void explain<ELF64LE>(StringRef);
				template void explain<ELF64BE>(StringRef);

				} // namespace elf
				} // namespace lld
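
The search and the backwards walk in `printPath` can be tried outside lld with a small standalone sketch. This is not the patch's code: the `Graph` alias and the `explainPath` function are hypothetical, files are represented by plain strings, and the dependency graph is assumed to be given up front rather than discovered from relocations. It follows the same idea as the listing above: a breadth-first search records, for each newly seen file, the file and symbol through which it was first reached, and the explanation is recovered by following those records back from the target.

```cpp
#include <algorithm>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the linker's view of its inputs: each file maps
// to the (symbol, defining file) pairs that it references.
using Graph =
    std::map<std::string, std::vector<std::pair<std::string, std::string>>>;

// Returns the chain root -> ... -> target as (file, symbol used to reach it)
// pairs, or an empty vector if target is not reachable from root.
std::vector<std::pair<std::string, std::string>>
explainPath(const Graph &g, const std::string &root,
            const std::string &target) {
  std::set<std::string> seen{root};
  // file -> (file that referenced it, symbol that was referenced)
  std::map<std::string, std::pair<std::string, std::string>> edges;
  std::deque<std::string> queue{root};

  while (!queue.empty()) {
    std::string cur = queue.front();
    queue.pop_front();
    if (cur == target)
      break;
    auto it = g.find(cur);
    if (it == g.end())
      continue;
    for (const auto &ref : it->second) {
      const std::string &sym = ref.first;
      const std::string &def = ref.second;
      if (!seen.insert(def).second)
        continue; // already reached by a path that is at least as short
      edges[def] = {cur, sym};
      queue.push_back(def);
    }
  }

  if (!seen.count(target))
    return {};

  // Walk the recorded edges backwards from the target, then reverse.
  std::vector<std::pair<std::string, std::string>> path;
  for (std::string cur = target; cur != root; cur = edges.at(cur).first)
    path.push_back({cur, edges.at(cur).second});
  path.push_back({root, ""});
  std::reverse(path.begin(), path.end());
  return path;
}
```

Each element after the first pairs a file with the symbol that pulled it in, so a caller can print one "uses ... defined in ..." line per element. Because the search is breadth-first, the chain returned is a shortest one, though other chains of the same length may exist.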

lld/ELF/InputFiles.h

[108 unchanged lines hidden; context: public:]
  mutable std::string toStringCache;

  std::string getSrcMsg(const Symbol &sym, InputSectionBase &sec,
  uint64_t offset);

  // True if this is an argument for --just-symbols. Usually false.
  bool justSymbols = false;

+ // True if this file loaded lazily from an archive or --start-lib/--end-lib.
+ // False if this file is directly specified on the command line.
+ bool loadedLazily = false;

  // outSecOff of .got2 in the current file. This is used by PPC32 -fPIC/-fPIE
  // to compute offsets in PLT call stubs.
  uint32_t ppc32Got2OutSecOff = 0;

  // On PPC64 we need to keep track of which files contain small code model
  // relocations that access the .toc section. To minimize the chance of a
  // relocation overflow, files that do contain said relocations should have
  // their .toc sections sorted closer to the .got section than files that do
[271 unchanged lines hidden]

lld/ELF/InputFiles.cpp

[1,093 unchanged lines hidden; context: MemoryBufferRef mb =]
  toELFString(sym));

  if (tar && c.getParent()->isThin())
  tar->append(relativeToRoot(CHECK(c.getFullName(), this)), mb.getBuffer());

  InputFile *file = createObjectFile(
  mb, getName(), c.getParent()->isThin() ? 0 : c.getChildOffset());
  file->groupId = groupId;
+ file->loadedLazily = true;
  parseFile(file);
+
+ log(toString(file));
  }

  unsigned SharedFile::vernauxNum;

  // Parse the version definitions in the object file if present, and return a
  // vector whose nth element contains a pointer to the Elf_Verdef for version
  // identifier n. Version identifiers that are not definitions map to nullptr.
  template <typename ELFT>
[379 unchanged lines hidden; context: void LazyObjFile::fetch() {]
  file->groupId = groupId;

  mb = {};

  // Copy symbol vector so that the new InputFile doesn't have to
  // insert the same defined symbols to the symbol table again.
  file->symbols = std::move(symbols);

+ file->loadedLazily = true;
  parseFile(file);
  }

  template <class ELFT> void LazyObjFile::parse() {
  using Elf_Sym = typename ELFT::Sym;

  // A lazy object file wraps either a bitcode file or an ELF file.
  if (isBitcode(this->mb)) {
[83 unchanged lines hidden]

lld/ELF/Options.td

[36 unchanged lines hidden; context: defm check_sections: B<"check-sections",]
  "Do not check section addresses for overlaps">;

  defm compress_debug_sections:
  Eq<"compress-debug-sections", "Compress DWARF debug sections">,
  MetaVarName<"[none,zlib]">;

  defm defsym: Eq<"defsym", "Define a symbol alias">, MetaVarName<"<symbol>=<value>">;

+ defm explain:
+ Eq<"explain", "Explain why a given file gets linked to the final binary">,
MaskRay: It is not obvious that the user is supposed to specify `b.a(b.o)` for an object file in an archive or `b.o` for an object between `--start-lib` and `--end-lib`.
+ MetaVarName<"<path>">;
peter.smith: A suggestion:
    Explain why a given object file gets linked in to the final binary.
    Objects can be specified by full path, or by a global symbol that they define.
    Objects defined in archives are specified by full/path/to/library(object) such as library.a(object.o).
MaskRay: Unfortunately this (lld::elf::InputFile::archiveName) is not the full path. Here are some examples:
    ld.lld a.a => a.a
    ld.lld -L. -la => ./liba.a
    ld.lld /tmp/c/a.a => /tmp/c/a.a
So we may need path normalization here. Some users may resolve the path and feed that to lld.
ruiu: OK, I changed the code so that we normalize pathnames before comparison.
ruiu: Added that text to the man page, thanks!

  defm split_stack_adjust_size
  : Eq<"split-stack-adjust-size",
  "Specify adjustment to stack size when a split-stack function calls a "
  "non-split-stack function">,
  MetaVarName<"<value>">;

  defm library_path:
  Eq<"library-path", "Add a directory to the library search path">, MetaVarName<"<dir>">;
[540 unchanged lines hidden]
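
The path normalization mentioned in the `--explain` thread is not visible in this diff, so the following is only a guess at the general idea, written against the C++17 standard library rather than LLVM's path utilities. The helper name `samePath` is hypothetical. It compares two spellings after purely lexical normalization, which is enough to treat `./liba.a` and `liba.a` as equal, but it does not consult the file system and does not reconcile relative with absolute paths.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: true if the two spellings denote the same path once
// "." components and "dir/.." pairs are removed lexically. No file system
// access is performed, so symlinks and the current directory are ignored.
bool samePath(const std::string &a, const std::string &b) {
  return fs::path(a).lexically_normal() == fs::path(b).lexically_normal();
}
```

Under this sketch, `a.a` and `/tmp/c/a.a` still compare unequal even when the link runs in `/tmp/c`; handling that case would need the paths to be made absolute first, which is a design decision the thread leaves open.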

Revision Contents

Diff 227243
