Page MenuHomePhabricator

[lld] Add options to trace all symbols and to trace all symbols originated from a file
Needs RevisionPublic

Authored by hoy on Feb 12 2021, 9:11 AM.

Details

Summary

--trace-symbol=<symbol> only traces one symbol at a time, add :

  • --trace-all-symbols to trace all symbols
  • --trace-symbols-from-file=<file> to trace symbols referenced or defined in a file

Diff Detail

Event Timeline

hoy created this revision.Feb 12 2021, 9:11 AM
hoy requested review of this revision.Feb 12 2021, 9:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2021, 9:11 AM
hoy added a reviewer: wenlei.Feb 12 2021, 9:15 AM

I created https://sourceware.org/bugzilla/show_bug.cgi?id=27407 asking for binutils opinions.

lld/ELF/Symbols.h
535

Can you measure how much slowdown this will cause?

hoy added a comment.Feb 15 2021, 10:33 PM

I created https://sourceware.org/bugzilla/show_bug.cgi?id=27407 asking for binutils opinions.

Thanks for gather's inputs from binutils. I'm curious about how they view this.

Regarding the throughput impact, I was seeing --trace-all-symbols slowed down the linker by 2x for a final executable around 60M unfortunately. Since it is a debug switch, I would imagine people whoever use this would be willing to pay for its cost.

hoy updated this revision to Diff 323888.Feb 15 2021, 10:40 PM

Fixing lint issue.

smeenai added a subscriber: smeenai.
hoy added a comment.Feb 22 2021, 12:30 PM

@MaskRay I'm wondering if you've got any update from the binutils side. Thanks.

In D96613#2579750, @hoy wrote:

@MaskRay I'm wondering if you've got any update from the binutils side. Thanks.

I even pinged that for you: https://sourceware.org/pipermail/binutils/2021-February/115455.html
If you want to reply, you can download https://sourceware.org/pipermail/binutils/2021-February.txt.gz , extract the Message-ID as your In-Reply-To: header.

MaskRay added inline comments.Feb 22 2021, 1:32 PM
lld/ELF/Symbols.h
535

And ping on this question: the condition is pretty complex. Does it affect symbol resolution time?

hoy added a comment.Feb 22 2021, 2:24 PM
In D96613#2579750, @hoy wrote:

@MaskRay I'm wondering if you've got any update from the binutils side. Thanks.

I even pinged that for you: https://sourceware.org/pipermail/binutils/2021-February/115455.html
If you want to reply, you can download https://sourceware.org/pipermail/binutils/2021-February.txt.gz , extract the Message-ID as your In-Reply-To: header.

Thanks for pinging for me!

lld/ELF/Symbols.h
535

Sorry, I misunderstood your question. I haven't seen noticeable link time difference w/ and w/o this function when no tracing switches is turned on.

MaskRay added a comment.EditedFeb 26 2021, 11:43 AM

https://sourceware.org/bugzilla/show_bug.cgi?id=27407
I think the functionality can be straightforwardly emulated. For example, llvm-nm -Du dumps undefined symbols in .dynsym.

% ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}')

This even works with versioned symbols while the patch doesn't. You can tweak -u: e.g. -U dumps defined symbols (-U is not in nm).
This is the flexibility provided by composing tools, which will be a bit inconvenient to implement in code.

The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.

https://sourceware.org/bugzilla/show_bug.cgi?id=27407
I think the functionality can be straightforwardly emulated. For example, llvm-nm -Du dumps undefined symbols in .dynsym.

% ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}')

This even works with versioned symbols while the patch doesn't. You can tweak -u: e.g. -U dumps defined symbols (-U is not in nm).
This is the flexibility provided by composing tools, which will be a bit inconvenient to implement in code.

The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.

In general, while there's a lot of value in being able to compose tools, I strongly disagree that it's an adequate replacement for having functionality directly built into a tool:

  • You have to think about how to construct the appropriate pipeline, and in practice, you're going to have to re-remember how to construct that pipeline every single time you want to use it (vs. just remembering a simple command like --trace-all-symbols). (I know you can use aliases/shell functions/etc., but it's still a lot of overhead.)
  • If you want to integrate the functionality in your build system (e.g. you're trying to generate an aggregated report for all the libraries you build), a pipeline is much harder to integrate into your build system, vs. just adding an argument.
  • You have to worry about platform differences; e.g., lots of Linux utilities have different arguments or behaviors than their BSD (and therefore macOS) equivalents, and Windows doesn't have these utilities at all.
MaskRay added a comment.EditedFeb 26 2021, 12:10 PM

https://sourceware.org/bugzilla/show_bug.cgi?id=27407
I think the functionality can be straightforwardly emulated. For example, llvm-nm -Du dumps undefined symbols in .dynsym.

% ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}')

This even works with versioned symbols while the patch doesn't. You can tweak -u: e.g. -U dumps defined symbols (-U is not in nm).
This is the flexibility provided by composing tools, which will be a bit inconvenient to implement in code.

The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.

In general, while there's a lot of value in being able to compose tools, I strongly disagree that it's an adequate replacement for having functionality directly built into a tool:

My opinion on this is still case by case. For the particular features, (1) the lack of customization of -u/-U` and (2) the this->file usage is my main concern about the new option.
(And a minor concern: needToTraceSymbol cost - that was why I re-stated that whether it would regress the symbol resolution performance)

  • You have to think about how to construct the appropriate pipeline, and in practice, you're going to have to re-remember how to construct that pipeline every single time you want to use it (vs. just remembering a simple command like --trace-all-symbols). (I know you can use aliases/shell functions/etc., but it's still a lot of overhead.)

New options have education costs. I've heard many internal reports where they want some build analysis features but they don't investigate(know) --cref/-Map/-y/etc.

  • If you want to integrate the functionality in your build system (e.g. you're trying to generate an aggregated report for all the libraries you build), a pipeline is much harder to integrate into your build system, vs. just adding an argument.
  • You have to worry about platform differences; e.g., lots of Linux utilities have different arguments or behaviors than their BSD (and therefore macOS) equivalents, and Windows doesn't have these utilities at all.

I think a small set of composable options for build dependency analysis will be useful, but we need to consolidate the requests and think of their composability/maintenance/etc. Apologies but I feel that "Linux utilities have different arguments or behaviors than their BSD" in this particular context is probably a weak argument: here we use LLVM binary utilities and a common 'awk' (which can be reimplemented in build system language straightforwardly). Many downstream groups have bundled LLVM binary utilities as platform-neutral utilities which are already used heavily in build systems. We don't necessarily re-invent features provided by them in LLD.

https://sourceware.org/bugzilla/show_bug.cgi?id=27407
I think the functionality can be straightforwardly emulated. For example, llvm-nm -Du dumps undefined symbols in .dynsym.

% ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}')

This even works with versioned symbols while the patch doesn't. You can tweak -u: e.g. -U dumps defined symbols (-U is not in nm).
This is the flexibility provided by composing tools, which will be a bit inconvenient to implement in code.

The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.

In general, while there's a lot of value in being able to compose tools, I strongly disagree that it's an adequate replacement for having functionality directly built into a tool:

My opinion on this is still case by case. For the particular features, (1) the lack of customization of -u/-U` and (2) the this->file usage is my main concern about the new option.
(And a minor concern: needToTraceSymbol cost - that was why I re-stated that whether it would regress the symbol resolution performance)

Yup, we should still be evaluating new options on their merits and drawbacks, of course. I just meant that we shouldn't be preventing some functionality from being built directly into a tool just because you can also emulate the same results by composing other tools.

  • You have to think about how to construct the appropriate pipeline, and in practice, you're going to have to re-remember how to construct that pipeline every single time you want to use it (vs. just remembering a simple command like --trace-all-symbols). (I know you can use aliases/shell functions/etc., but it's still a lot of overhead.)

New options have education costs. I've heard many internal reports where they want some build analysis features but they don't investigate(know) --cref/-Map/-y/etc.

Agreed, but I also think they're more discoverable (e.g. by reading the --help output or man pages).

  • If you want to integrate the functionality in your build system (e.g. you're trying to generate an aggregated report for all the libraries you build), a pipeline is much harder to integrate into your build system, vs. just adding an argument.
  • You have to worry about platform differences; e.g., lots of Linux utilities have different arguments or behaviors than their BSD (and therefore macOS) equivalents, and Windows doesn't have these utilities at all.
MaskRay added a comment.EditedFeb 26 2021, 12:24 PM

I edited my previous comment to include a bit more information.

For discoverability, if people know that --trace-symbol, re-inventing this mechanism is simple. nm (llvm-nm) is well-known utility to dump the symbol table. Composing nm and ld.lld together is straightforward.

  • If you want to integrate the functionality in your build system (e.g. you're trying to generate an aggregated report for all the libraries you build), a pipeline is much harder to integrate into your build system, vs. just adding an argument.
  • You have to worry about platform differences; e.g., lots of Linux utilities have different arguments or behaviors than their BSD (and therefore macOS) equivalents, and Windows doesn't have these utilities at all.

I think a small set of composable options for build dependency analysis will be useful, but we need to consolidate the requests and think of their composability/maintenance/etc. Apologies but I feel that "Linux utilities have different arguments or behaviors than their BSD" in this particular context is probably a weak argument: here we use LLVM binary utilities and a common 'awk' (which can be reimplemented in build system language straightforwardly). Many downstream groups have bundled LLVM binary utilities as platform-neutral utilities which are already used heavily in build systems. We don't necessarily re-invent features provided by them in LLD.

For discoverability, if people know that --trace-symbol, re-inventing this mechanism is simple. nm (llvm-nm) is well-known utility to dump the symbol table. Composing nm and ld.lld together is straightforward.

I agree that we should be thinking about the composability and maintenance burden of new options, and thinking about this in a holistic way (and not just always adding one-off options for each request). At the same time, I disagree with your assessment of how straightforward it is to compose different tools together. Fair point about the awk functionality in your command being the same across Unixes, but awk still isn't a thing on Windows, and whether it's straightforward or not to reimplement the equivalent functionality in your build system language depends on your build system. Furthermore, I think text processing is inherently fragile in general; it's fair to assume that a tool like llvm-nm isn't going to be changing its output format, but you still have to put a lot more thought into setting up an appropriate pipeline than you would into using a built-in option.

Case in point: you're using the following invocation:

llvm-nm -Du /usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}'

I don't know why this is, but both GNU nm and llvm-nm print a different number of leading spaces for 32 vs. 64-bit binaries. If I do llvm-nm -Du /usr/lib/libc.so.6 (instead of /usr/lib64/libc.so.6), I need to change the substr amount to 12 instead of 20. I could use $2 (as in the second field) instead, but that'll break if my symbol name has spaces in it (which definitely occurs for Objective-C, at least). (Incidentally, you're not putting quotes around your -y command, so it'll also break if the symbol name has spaces in it, which is one of the generally fragile things about shell pipelines.) Fortunately, llvm-nm has a --just-symbol-name option (which at least my version of GNU nm doesn't), which'll work regardless of the output format for the particular architecture, and handle symbol names with spaces without any issues. There's clear value to having the --just-symbol-name flag IMO, even though you could theoretically emulate the same functionality with a shell pipeline, since it lets you not have to worry about all these caveats.

hoy added a comment.Feb 27 2021, 10:41 PM

@MaskRay I think your concern is primarily about the extra condition checks in Symbol::needToTraceSymbol that may slow down the linker. Is there a linker performance testing system on your side that you can help evaluate it? I have manually run the patch against medium-sized programs and haven't seen regressions. Otherwise, I agree with @smeenai that we should be open to changes that make life easier with little cost.

MaskRay added a comment.EditedFeb 27 2021, 11:21 PM

I have several concerns.

  • The slowdown. You can test some internal projects.
  • This does not work with versioned symbols in shared objects.
  • The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.
  • The usefulness is not justified. This looks like a debug option. I raised ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}') as an example how this can be trivially implemented with tool composing.

https://sourceware.org/bugzilla/show_bug.cgi?id=27407 If you can make arguments there, it will probably make this proposal more competitive.

hoy added a comment.Feb 28 2021, 12:27 PM

I have several concerns.

  • The slowdown. You can test some internal projects.

Our internal projects mostly use thinLTO thus the overhead here is negligible. Without thinLTO, generated code becomes much smaller (due to lack of cross-module inlining). Anyway, I'll continue my measurement. If there's any regression, I think we can optimize the checks to just favor non-debug scenario, like another bool flag to identify if config->traceSymbolsFromFile is empty?

  • This does not work with versioned symbols in shared objects.

Does the exiting --trace-symbol handle versioned symbols?

  • The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.

Is this for weak symbols? --trace-symbols-from-file aims to trace where a definition of a symbol is finally picked up.

  • The usefulness is not justified. This looks like a debug option. I raised ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}') as an example how this can be trivially implemented with tool composing.

As @smeenai pointed out, that approach also has limitations.

https://sourceware.org/bugzilla/show_bug.cgi?id=27407 If you can make arguments there, it will probably make this proposal more competitive.

Sorry, I don't quite get this. Are we trying to have the gnu linker implement the two switches as well? I don't see an objection on their side.

MaskRay added a comment.EditedFeb 28 2021, 12:38 PM
In D96613#2593029, @hoy wrote:

I have several concerns.

  • The slowdown. You can test some internal projects.

Our internal projects mostly use thinLTO thus the overhead here is negligible. Without thinLTO, generated code becomes much smaller (due to lack of cross-module inlining). Anyway, I'll continue my measurement. If there's any regression, I think we can optimize the checks to just favor non-debug scenario, like another bool flag to identify if config->traceSymbolsFromFile is empty?

  • This does not work with versioned symbols in shared objects.

Does the exiting --trace-symbol handle versioned symbols?

Yes. See verneed-shared.s

  • The patch uses this->file, however, the value may change if the symbol gets resolved to other files. In the cases I can conceive we want the trace of the full lifetime of a symbol, not only when it is bound to a specific file.

Is this for weak symbols? --trace-symbols-from-file aims to trace where a definition of a symbol is finally picked up.

Not just weak symbols. See resolveUndefined where an undefined symbol's file can be replaced.
By using Symbol::file, the semantics of the patch are not clear.

  • The usefulness is not justified. This looks like a debug option. I raised ld.lld @response.txt $(llvm-nm -Du usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}') as an example how this can be trivially implemented with tool composing.

As @smeenai pointed out, that approach also has limitations.

You are proposing new options, so you need to justify the options. I've also asked other folks and many feel that this is unnecessary.
I forgot llvm-nm --just-symbol-name. How is this proposal more competitive than composing llvm-nm --just-symbol-name and ld.lld?

https://sourceware.org/bugzilla/show_bug.cgi?id=27407 If you can make arguments there, it will probably make this proposal more competitive.

Sorry, I don't quite get this. Are we trying to have the gnu linker implement the two switches as well? I don't see an objection on their side.

When things are in doubts, one tie-breaking thing you can try is to make other implementations accept your proposal. That could justify the merit of the proposal.

MaskRay requested changes to this revision.Jul 28 2021, 8:49 PM
This revision now requires changes to proceed.Jul 28 2021, 8:49 PM