This is an archive of the discontinued LLVM Phabricator instance.

Discard debuginfo for object files empty after GC
ClosedPublic

Authored by rocallahan on Nov 20 2018, 3:02 AM.

Details

Summary

Rust projects tend to link in all object files from all dependent libraries and rely on --gc-sections to strip unused code and data. Unfortunately --gc-sections doesn't currently strip any debuginfo associated with GC'ed sections, so lld links in the full debuginfo from all dependencies even if almost all that code has been discarded. See https://github.com/rust-lang/rust/issues/56068 for some details.

Properly stripping debuginfo for discarded sections would be difficult, but a simple approach that helps significantly is to mark debuginfo sections as live only if their associated object file has at least one live code/data section. This patch does that. In a (contrived but not totally artificial) Rust testcase linked above, it reduces the final binary size from 46MB to 5.1MB.

Diff Detail

Repository
rL LLVM

Event Timeline

rocallahan created this revision.Nov 20 2018, 3:02 AM
ruiu added a comment.Nov 28 2018, 2:57 PM

Thank you for the patch.

What you are doing in this patch is not too complicated and makes sense to me. That said, if actual size saving is not significant as you said in https://github.com/rust-lang/rust/issues/56068#issuecomment-440160568, it may not be worth doing. It seems to me that if debug info is already 2.4GB, shrinking it to 2GB doesn't make much difference. Do you have more convincing examples?

If we decide to optimize DWARF garbage collection, something generic will be cool.

This generic-abi thread has some discussion about that https://groups.google.com/d/msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ (e.g. using COMDAT but it seems challenging and it comes with its own costs)

Thank you for the patch.

What you are doing in this patch is not too complicated and makes sense to me. That said, if actual size saving is not significant as you said in https://github.com/rust-lang/rust/issues/56068#issuecomment-440160568, it may not be worth doing. It seems to me that if debug info is already 2.4GB, shrinking it to 2GB doesn't make much difference. Do you have more convincing examples?

400MB is 400MB... or, if you'd prefer 17% of overall size that was mentioned there. Additional testing might be nice in order to get a better idea of what we're looking at in practice (I guess a clang build would be another good choice), but the overall patch seems to be small and in a lot of ways simplifying for how we're linking.

grimar added a subscriber: grimar.Nov 28 2018, 11:50 PM
ruiu added a comment.Nov 30 2018, 2:03 PM

But 2GB is perhaps still too big and I guess a large part of it can be for dead sections. If we fix this, I'd like to fix it in a proper way so that we can completely eliminate debug info for dead sections.

rocallahan, can I ask why Rust passes all object files to the linker and use -gc-sections to eliminate unused part of these files?

One possible way to fix it is to pass --start-lib to the linker before any object file paths in the command line. That option gives archive file semantics to object files, so if you option, each object file is treated as if that were in an archive file containing that the single object file. (Note that --start-lib is a positional option just like --start-group, so all files between --start-lib and --end-lib get that special treament.)

But 2GB is perhaps still too big and I guess a large part of it can be for dead sections. If we fix this, I'd like to fix it in a proper way so that we can completely eliminate debug info for dead sections.

There's a few options here, mostly using comdat groups for debug infomation associated with sections that the debug information comes from. That said, it still wouldn't encompass compile units that would have no code in them because we didn't choose a single function from a .o file and so this patch is still a strict improvement over that. In addition, it doesn't involve future work to change how we emit debug info and doesn't depend on everyone emitting debug info the same way.

Thoughts?

ruiu added a comment.Dec 4 2018, 4:21 PM

It seems to me that just adding --start-lib to his command line can fix the issue, so I'm waiting for Robert's response. If it doesn't work for some reason, we can analyze why it doesn't work and then discuss what we can do for his problem.

rocallahan added a comment.EditedDec 28 2018, 11:17 PM

Sorry for the delayed reply. I just discovered that Phabricator nofications were being buried by my mail filters.

What you are doing in this patch is not too complicated and makes sense to me. That said, if actual size saving is not significant as you said in https://github.com/rust-lang/rust/issues/56068#issuecomment-440160568, it may not be worth doing. It seems to me that if debug info is already 2.4GB, shrinking it to 2GB doesn't make much difference.

17% saving for a pretty small and conservative patch seems good to me, but I guess it's subjective.

rocallahan, can I ask why Rust passes all object files to the linker and use -gc-sections to eliminate unused part of these files?

I'm not a Rust compiler contributor, but my guess is developer ergonomics. Once upon a time users manually divided code into translation units, each producing an object file, and garbage collection occurred only at the granularity of object files. Later --gc-sections and -ffunction-sections let compilers automatically subdivide object files into smaller units and GC with a granularity of functions. In Rust each translation unit is a whole module (crate); users aren't expected to subdivide libraries into smaller units for GC purposes, and in fact aren't able to. Emitting one section per function mostly means they don't need to ... except it breaks down with DWARF here.

One possible way to fix it is to pass --start-lib to the linker before any object file paths in the command line. That option gives archive file semantics to object files, so if you option, each object file is treated as if that were in an archive file containing that the single object file. (Note that --start-lib is a positional option just like --start-group, so all files between --start-lib and --end-lib get that special treament.)

I will try that and follow up but I don't see how it would make any difference.

Here are some results for the rusoto test in https://github.com/rust-lang/rust/issues/56068#issue-382175735:

LLDBinary size in bytes
LLD 6.0.143,791,192
LLD 8.0.0 1cfcf404f6b23943405bc3c5bb5940b10b91462443,861,056
LLD 8.0.0 1cfcf404f6b23943405bc3c5bb5940b10b914624 --start-lib43,844,760
LLD 8.0.0 1cfcf404f6b23943405bc3c5bb5940b10b914624 + this patch6,281,488

For --start-lib I wrapped --start-lib/--end-lib around all object files.

As I expected --start-lib didn't make much difference. The dependencies containing debuginfo that I want to be GCed away are already packaged into static libraries before being passed to the final link.

I agree that ideally the linker would be able to do fine-grained GC of DWARF at the granularity of individual DIEs and other data items. However, implementing that would be a huge project. Currently AFAICT lld does very little DWARF processing. Wholesale DWARF rewriting would expand the scope of the linker and require lots of testing against various DWARF producers and consumers. I definitely wouldn't want to implement that.

Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2019, 10:31 PM
MaskRay added inline comments.Mar 21 2019, 1:01 AM
lld/ELF/MarkLive.cpp
192 ↗(On Diff #174739)

This check can be changed to !isa<InputSection> && !isa<MergeInputSection>. But do you just want to exclude EhInputSection?

Here are some results for the rusoto test in https://github.com/rust-lang/rust/issues/56068#issue-382175735:

LLDBinary size in bytes
LLD 6.0.143,791,192
LLD 8.0.0 1cfcf404f6b23943405bc3c5bb5940b10b91462443,861,056
LLD 8.0.0 1cfcf404f6b23943405bc3c5bb5940b10b914624 --start-lib43,844,760
LLD 8.0.0 1cfcf404f6b23943405bc3c5bb5940b10b914624 + this patch6,281,488

For --start-lib I wrapped --start-lib/--end-lib around all object files.

As I expected --start-lib didn't make much difference. The dependencies containing debuginfo that I want to be GCed away are already packaged into static libraries before being passed to the final link.

The improvement of this patch looks really promising! Do you have numbers with ld.bfd and gold?

lld/ELF/MarkLive.cpp
195 ↗(On Diff #174739)

!ObjFile<ELFT>::classof(Sec->File)

Can this happen?

199 ↗(On Diff #174739)

The brace is redundant.

@rocallahan I find that people are discussing a generic approach in D59553

rocallahan marked 3 inline comments as done.Mar 24 2019, 6:24 PM

@rocallahan I find that people are discussing a generic approach in D59553

Thanks for the reference. It seems to me that there is no consensus there yet for any approach that would obsolete this change. Of the three approaches discussed there:

  1. Having the linker rewrite DWARF: seems to have been decisively rejected.
  2. Give each function standalone debuginfo in COMDAT: sounds to me like it would be very inefficient, both in the size of intermediate files and in the size of the final binary. DWARF type information currently shared between functions in the same CU would have to be duplicated, and eliminating that duplication would require the linker rewrite DWARF, which brings us back to point 1.
  3. Optimize away unnecessary debuginfo using a post-link tool like dwz: Considerably less efficient than not writing out the bloated DWARF in the first place.
lld/ELF/MarkLive.cpp
192 ↗(On Diff #174739)

Shouldn't I also be excluding SyntheticSection?

195 ↗(On Diff #174739)

I think not. Happy to remove this check if you don't want it.

199 ↗(On Diff #174739)

Thanks.

Do you have numbers with ld.bfd and gold?

I don't have numbers for that exact test with those linkers but they basically behave the same way as LLD.

To be clear, I think the best long-term solution is for LLD to rewrite the DWARF, but from my (admittedly limited) perspective that seems to be at best a distant prospect.

Another clarification:

DWARF type information currently shared between functions in the same CU would have to be duplicated.

I guess that's not necessary. The compiler could put all debuginfo that's shared between functions into a non-COMDAT section for the compilation unit. The downside of that is that that data is not removed by gc-sections ... although my patch here would remove it, if all functions in the CU are removed.

I guess this is what @echristo was getting at in their comment above:

There's a few options here, mostly using comdat groups for debug infomation associated with sections that the debug information comes from. That said, it still wouldn't encompass compile units that would have no code in them because we didn't choose a single function from a .o file and so this patch is still a strict improvement over that. In addition, it doesn't involve future work to change how we emit debug info and doesn't depend on everyone emitting debug info the same way.

ruiu added inline comments.Mar 25 2019, 11:35 AM
lld/ELF/MarkLive.cpp
190 ↗(On Diff #174739)

Since this file is MarkLive, markSection is perhaps a better name.

192 ↗(On Diff #174739)

This needs a comment.

Do you have to visit each file during the mark phase? Looks like you can mark only sections first, and after marking all sections, you can scan all sections to mark files. Looks like they can be two separate stages.

rocallahan marked 2 inline comments as done.Mar 25 2019, 1:07 PM
rocallahan added inline comments.
lld/ELF/MarkLive.cpp
190 ↗(On Diff #174739)

OK

192 ↗(On Diff #174739)

Maybe I'm wrong but I would have expected adding another pass over all sections to be slower than what I'm doing here.

Also we need to distinguish LSDA sections from other live sections since LSDA does not count as "making the file live". So we'd have to add an LSDA flag to InputSection.

Are you sure you want me to make this change?

Also I just noticed I've written LDSA in several places where it should be LSDA!

ruiu added inline comments.Mar 25 2019, 1:18 PM
lld/ELF/MarkLive.cpp
192 ↗(On Diff #174739)

Ah OK, I thought that you set Sec->Live to true only in setSectionLive() but you manipulated that in EnqueueMaybeLDSA as well. Looks like we have too many callback functions in this function -- this file is organized that way because the callback functions were very simple. Now it's been growing organically and probably get to the point that we should just use the regular class-based abstraction. Let me do that first.

ruiu added a comment.Mar 25 2019, 4:29 PM

I committed https://reviews.llvm.org/D59800 which I believe makes your change easier to follow once rebased. Could you rebase it and upload a patch? Thanks!

avl added a subscriber: avl.Apr 3 2019, 4:46 AM

@rocallahan Would you mind to share the performance results of that patch, please ? Similar to above table for rusoto test, but with timings ...

rocallahan updated this revision to Diff 194244.Apr 8 2019, 9:59 PM

Addressed all comments AFAIK. I'll post some performance numbers in a moment.

Updated results for the rusoto test in https://github.com/rust-lang/rust/issues/56068#issue-382175735. The test changed a bit because I'm using an updated Rust toolchain and rusoto_core 0.37.0.

LinkerSize (bytes)Real time (ms)
GNU ld version 2.29.1-23.fc2848,554,7362234
GNU gold (version 2.29.1-23.fc28) 1.1449,888,392813
lld-6.0.1-1.fc28.x86_6449,800,824247
lld d918d74461724a22cedd0b76dc1237392f29565649,873,960224
lld d918d74461724a22cedd0b76dc1237392f295656 + this patch5,390,632158

I am also pleased to report that for my real project, switching from ld.bfd to lld + this patch reduces the total size of our dist built binaries from 2.9GB to 2.0GB.

I'm not sure why it's become more effective, but several variables have changed, including some significant restructuring of the build artifacts for our project. Still, this seems like a pretty good result. (Perhaps even suspiciously good! But I can't find any missing debuginfo with cursory checks.)

ruiu added a comment.Apr 8 2019, 10:48 PM

Nice. One more thing you might want to try is to add -O2 to the linker command line option. When that option is given, lld attempts to tail-merge strings in the string table. That's not very effective, but you might be able to shave off 0.2% or something like that.

ruiu added a comment.Apr 8 2019, 10:58 PM

Overall looking good.

lld/ELF/MarkLive.cpp
190 ↗(On Diff #194244)

Move this code after Sec->Live = true so that when we visit the same section more than once, this piece of code runs only once.

190–191 ↗(On Diff #194244)

S is true only when it is an InputSection, so you have duplicate tests for this. It also looks like you don't have to care about whether a section is an input section or a merged section. Isn't this condition sufficient?

if (Sec->File && !IsLSDA)
  Sec->getFile<ELFT>()->HasLiveCodeOrData = true;
313 ↗(On Diff #194244)

Let's remove this optimization -- I don't think we need this.

321 ↗(On Diff #194244)

Then you can simplify the condition to

if (!IsAlloc && !IsLinkOrder && !IsRel && !Sec->Debug)
  Sec->Live = true;
330 ↗(On Diff #194244)

Remove this variable and just visit all sections even if there's no debug info (which shouldn't take too much time anyway.)

rocallahan marked 4 inline comments as done.Apr 8 2019, 11:17 PM
rocallahan added inline comments.
lld/ELF/MarkLive.cpp
190 ↗(On Diff #194244)

We can't do that. The comment I added tries to explain why. Is it unclear?

190–191 ↗(On Diff #194244)

I want to skip EhInputSection and SyntheticSection here. So I guess the advice to use isa here was wrong and I should revert to checking kind() == Regular || kind() == Merge?

313 ↗(On Diff #194244)

OK

321 ↗(On Diff #194244)

Yep.

ruiu added inline comments.Apr 8 2019, 11:36 PM
lld/ELF/MarkLive.cpp
190 ↗(On Diff #194244)

Ah, OK, got it.

190–191 ↗(On Diff #194244)

But you don't really care even if it is EhInputSection or SyntheticSection, no? I mean an assigned value would not be used but doesn't harm.

rocallahan marked an inline comment as done.Apr 8 2019, 11:55 PM
rocallahan added inline comments.
lld/ELF/MarkLive.cpp
190–191 ↗(On Diff #194244)

I'm assuming that .eh_frame sections don't have associated debuginfo so even if such a section is enqueued somehow, it should not cause debuginfo to be included for that object file. Is it impossible for a .eh_frame section to be enqueued here? Ditto for a SyntheticSection?

To put it another way, the logic here is supposed to be: "debuginfo can only be associated with Regular or Merge sections, so it only makes sense to mark files as having 'live code or data' possibly associated with debuginfo for those section types."

rocallahan edited the summary of this revision. (Show Details)

The results are basically unchanged with the section type checks removed, so I've just gone ahead and done that.

ruiu added inline comments.Apr 9 2019, 12:52 AM
lld/ELF/MarkLive.cpp
190–191 ↗(On Diff #194244)

No debug sections are attached to synthetic sections, but I don't think that logically matters. The logic we want to implement here is

  1. if a file contains one or more debug info sections, and
  2. if at least one of them are marked live, then
  3. all debug sections should be marked live.

If Sec->Debug is true, it is a debug section, and vice versa. If Sec->File is a non-null, it is a file that the section belongs to.

So, in the above logic, there's no logic that depends on the type of the section. That's why I think it is better to not assert any expected type for a debug section. As long as !IsLSDA && Sec->File, that File should be marked live here.

rocallahan marked an inline comment as done.Apr 9 2019, 3:31 AM
rocallahan added inline comments.
lld/ELF/MarkLive.cpp
190–191 ↗(On Diff #194244)
  1. if a file contains one or more debug info sections, and
  1. if at least one of them are marked live, then
  2. all debug sections should be marked live.

That is not quite right. We are not marking any debuginfo sections live until the final pass. The first condition is "if a file contains one or more non-debuginfo sections marked live for non-LSDA data..."

Anyway the diff I just posted contains the change you wanted here.

ruiu accepted this revision.Apr 9 2019, 11:45 PM

LGTM

I'd write a comment explaining why we are handling debug sections in a special way, but that can be done later. Please submit. Thank you for doing this!

This revision is now accepted and ready to land.Apr 9 2019, 11:45 PM

I don't have commit access so I believe someone has to commit this for me?

ruiu added a comment.Apr 10 2019, 3:27 AM

I will do that.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.May 8 2019, 12:22 PM

Hello, in a bit of a https://xkcd.com/1172/ moment this breaks the chromium/android build. We have a list of "resources" (strings, bitmaps, etc) that we list in an XML file which then generates a header with lots of "IDR_foo" constants. As it turns out, now all of these resources are used on all platforms, so we use the following technique to only keep the ones actually used on android:

  • Have an empty template function template<int N> Whitelist() {}
  • Have the resource header expand IDR_foo to (Whitelist<unique_id_per_resource>(), unique_id_per_resource) where unique_id_per_resource is something like 123 (and different for every resource)
  • For every IDR_foo referenced in a cc file, this generates debug info for the function call Whitelist<unique_id_per_resource>()

We then look at 'readelf', '-p', '.debug_str' and grep for WhitelistedResource< and keep all resources whose ID is referenced from debug info.

Before this change, this worked great.

Now, resources that are referenced from functions that make it into the final binary get stripped. From the patch description, it sounds like maybe that shouldn't happen (can you confirm?), but in practice it does.

If this only stripped debug info for functions that don't make it into the final binary, this would be a great change for us though!

That problem only seems to happen when (thin) lto is enabled.

rnk added a subscriber: rnk.May 15 2019, 3:31 PM

I think there's an issue with the whole idea of dropping .debug_info from objects without live sections. Consider:

// a.cpp
int main() {
  return f();
}
// b.cpp
struct Foo {
  int x, y;
};
int f() {
  volatile Foo var;
  var.x = 13;
  var.y = 42;
  return var.x + var.y;
}

When compiled and linked with thinlto, f is imported into the a.cpp TU, but the full definition of the Foo type remains in the b.cpp TU, because importing the full type would be expensive and wasteful.

I used these commands to show the change in behavior before and after this change:

$ clang -O2 -flto=thin -c -ffunction-sections a.cpp b.cpp -g 
# before
$ ninja lld
[1 processes, 3/3 @ 0.8/s : 3.582s ] Linking CXX executable bin/lld
$ clang -Wl,--gc-sections -flto=thin  -fuse-ld=lld a.o b.o  -o t.exe
$ gdb -ex 'b foo' -ex r -ex s -ex 'p var' -batch t.exe
...
Breakpoint 1 at 0x201168: file a.cpp, line 2.

Breakpoint 1, foo () at b.cpp:7
7         var.y = 42;
8         return var.x + var.y;
$1 = {x = 13, y = 42}
# after
$ ninja lld 
$ clang -Wl,--gc-sections -flto=thin  -fuse-ld=lld a.o b.o  -o t.exe 
$ gdb -ex 'b foo' -ex r -ex s -ex 'p var' -batch t.exe
...
Breakpoint 1 at 0x201168: file a.cpp, line 2.

Breakpoint 1, foo () at b.cpp:7
7         var.y = 42;
8         return var.x + var.y;
$1 = <incomplete type>

So, before Foo had a complete type, but now it does not.

I can't seem to construct an example where LLD will throw away useful debug info without thinlto, but Clang often makes assumptions that other object files will provide certain bits of debug info as a size optimization.

Updated results for the rusoto test in https://github.com/rust-lang/rust/issues/56068#issue-382175735. The test changed a bit because I'm using an updated Rust toolchain and rusoto_core 0.37.0.

LinkerSize (bytes)Real time (ms)
GNU ld version 2.29.1-23.fc2848,554,7362234
GNU gold (version 2.29.1-23.fc28) 1.1449,888,392813
lld-6.0.1-1.fc28.x86_6449,800,824247
lld d918d74461724a22cedd0b76dc1237392f29565649,873,960224
lld d918d74461724a22cedd0b76dc1237392f295656 + this patch5,390,632158

@rocallahan
Can you also share the linker command line? You may execute export LLD_REPRODUCE=/tmp/reproduce.tar before the linking stage. After unpacking, there is a response.txt file that contains the full command line. I know little about Rust crates but I worry some component doesn't use static archives .a correctly. If you don't have .a but you have a bunch of .o files, you may place them inside a --start-lib ... --end-lib to get the archive semantic.

If all debug information of an object file is discarded, it sounds like if you switch the build system to use archives, the member in the archive will just get discarded, i.e.

If you do ld.lld main.o a.o b.o c.o d1.o d2.o d3.o before, you should probably switch to

ld.lld main.o --start-lib a.o b.o c.o --end-lib --start-lib d1.o d2.o d3.o --end-lib

rnk added a comment.May 16 2019, 1:56 PM

There's another use case worth considering here, which is -fmodules-debuginfo. In that situation, a standalone object file is emitted that contains nothing but DWARF .debug_info sections, and the object is fed to the linker. If the linker drops it with --gc-sections, it's not going to work. Here's a transcript of me trying to use this feature:
https://reviews.llvm.org/P8150

Note that t.o (the normal object) only contains a forward declaration of Foo, and the complete definition is provided in the debug-info-only module object file, t2.o in this case.

I think, before this change, passing an object with only .debug_info in it (and --gc-sections) used to retain that debug info. I think we need a way to keep doing that going forward. I think that might mean we need a new flag for this new behavior, or we should revert this patch altogether, and recommend that people who want smaller debug info use tools like dsymutil, dwz, DWP, or other format aware things that can remove unreferenced or duplicate DWARF.

This breaking both debug info in thinlto builds and fmodules-debuginfo is probably enough to revert and go back to the drawing board.

I suppose we don't have any debug info quality tests that use thinlto?

I also think we should revert this change while we think about the approach. I've been reluctant to do this because I was impressed by how many bytes of output it saves, and I didn't want to lose that just because it doesn't play nice with ThinLTO. However, after some more data gathering and talking to people, I think that:

  1. There is enough doubt that this is the right approach that we probably should take some time to think about which way we really want to go.
  1. In the meantime, a build configuration that used to work is broken.
  1. To fix that while leaving this change in, we would have to build more things on top of the change, which seems unwise if we're not sure that this is the best approach.

So I think we should revert and rethink.

Reasons I'm not convinced that this is the right approach:

  1. We have now identified multiple use cases that break with this change. ThinLTO is one, -fmodules-debuginfo is another. The general theme here is that the expectation is that if an object file is passed to the linker and not between --start-lib and --end-lib or in a static library, the expectation is that it makes it into the final link. -gc-sections asks to discard unused sections, but it's not clear that this should also apply to debug information, and we have seen some problems.
  1. It's also not clear to me that this is as much of a win as I initially thought. It is in the cases that have been demonstrated, of course, but I'm not sure how well that generalizes. On a local Clang build I performed, for example, it made exactly 0 bytes difference. Perhaps the big wins only happen in pathological cases that are better addressed some other way.

So I'm going to revert this to unbreak the Chromium/Android build, and then we can think some more about how we can get the size win without the breakage.

Reverted in r360955.

Thanks for the revert!

So I'm going to revert this to unbreak the Chromium/Android build, and then we can think some more about how we can get the size win without the breakage.

I think by now the Chromium/Android build is a red herring. This drops debug info from thinlto builds in general and makes debugging there worse. It's just that in chromium/android, the debug info was load bearing and hence we noticed there sooner. We should probably set up debug info quality bots using thinlto somewhere, to have a better way of identifying this type of bug.

echristo's comments on the mail thread for this didn't make it to phab. I apologize for repeating more or less everything he said, I had looked on phab before reading email.

ruiu added a comment.May 17 2019, 5:41 AM

Bob, thank you for reverting this.

So, Robert, looks like this idea didn't work well. We need a different solution. And perhaps a better approach is to use --start-lib and --end-lib. You found that these options didn't work well for your input, but I don't fully understand why that's the case. Are you combining all input object files into a single object file using -r linker option? If you just pass all object files to the linker's command line, the linker can choose only the files that are actually used. I'd like to understand why that didn't work for Rust.

For posterity, https://bugs.chromium.org/p/chromium/issues/detail?id=960881#c31 has the explanation for what exactly was happening in the Chromium/Android build. It's a bit different from rnk's example since no cross-TU imports are happening. It likely explains why rnk couldn't find an example that doesn't use thinlto though: To get the bug, you need to require debug info in a TU but all actual code of the TU needs to be stripped. In regular builds this can't happen, but in thinlto builds it can happen in several cases:

  • rnk's example: A function is inlined into a different TU for inlining, but the definition (+ debug info) is in a different, now-empty TU.
  • my example: An inline function is in two TUs. The thin link step marks one of them as weak_odr and the others as available_externally, which causes the debug info to go into only the TU where the function is weak_odr. If this TU happens to contain no actual code, it will now be dead-stripped, even though the other TUs relied on it to provide debug info. This doesn't require any cross-TU function importing.
mati865 added a subscriber: mati865.Jul 4 2019, 5:59 AM

Sorry for not responding earlier. Phabricator insists on sending me emails about all issues in the system, and I never figured out how to get GMail to show me only emails about Phabricator issues I'm CCed on.

I know little about Rust crates but I worry some component doesn't use static archives .a correctly. If you don't have .a but you have a bunch of .o files, you may place them inside a --start-lib ... --end-lib to get the archive semantic.

As I mentioned earlier, Rust already packs the object files for its "crates" (think libraries) into archives, so --start-lib ... --end-lib has no effect. The problem is that when the linker computes the transitive closure of dependencies at object file granularity, it finds that most object files in most Rust libraries are used. It's only when the linker computes the transitive closure of dependencies at function-sections granularity that it finds many functions are not used, and that in some object files all functions are unused. Is this still unclear?

recommend that people who want smaller debug info use tools like dsymutil, dwz, DWP, or other format aware things that can remove unreferenced or duplicate DWARF.

A big advantage of pruning debuginfo in the linker instead of postprocessing the binaries is that the former makes the build faster (see https://reviews.llvm.org/D54747#1459175) and the latter makes the build slower. When a project statically links a lot of binaries (as large Rust projects tend to do for example) doubling the link I/O required is significant. (There might be a way to wire together the linker and a postprocessor via a tmpfs but it would be ugly and still suboptimal...)

I'm not sure how well that generalizes. On a local Clang build I performed, for example, it made exactly 0 bytes difference. Perhaps the big wins only happen in pathological cases that are better addressed some other way.

You didn't say what kind of project you built locally, but I did expect this to have much less impact for C++ projects because their dependencies are often either a) dynamically linked b) manually vendored (in which case well-optimized projects only build the code they actually use) c) "single-header" libraries or d) manually assigning functions to TUs to optimize the link.

As I mentioned earlier, Rust and C++ are significantly different here in that Rust does not require, or even allow, developers to manually assign functions to TUs. Combine that with Rust's preference for static linking and we have the problem I tried to address here.

It might be possible to tackle this at the Rust level by assigning functions to TUs in a more intelligent way. That would never be as effective as having the linker strip unused data, because in the latter case the linker has full knowledge of what is used in the final binary, and in the former case the Rust compiler does not (and cannot, because the library may be used in different ways by different final binaries even in the same project).

So I think we should revert and rethink.

Thanks for reverting and I'm sorry for causing those regressions.

Even though it was rejected in https://reviews.llvm.org/D59553, I think the way forward would be to implement DWARF-level linking in LLD somehow. That would benefit C++ projects since you could shrink the generated DWARF with better overall performance than a postprocessor. It would also improve and simplify debuggers --- right now it's a real pain for debuggers to collect all relevant DWARF information for a given type from all the TUs clang can smear it over. DWARF-level linking is quite a big task so someone would probably need to pay someone to do it, and the LLD community would need to agree up front that it's acceptable, so time and money isn't wasted.

Whatever you do, it will almost certainly need to be opt-in, since as noted in reason #1 in https://reviews.llvm.org/D54747#1505749 there are existing workflows that assume any debuginfo in a .o file on the link command line is preserved as-is.