Page MenuHomePhabricator

[WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.
Needs ReviewPublic

Authored by avl on Feb 6 2020, 2:17 PM.

Details

Summary

This patch is the implementation for removing obsolete debug info:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135473.html

It uses DWARFLinker which contains code refactored out from dsymutil.

Diff Detail

Unit TestsFailed

TimeTest
120 mswindows > lld.ELF::gc-debuginfo.test
Script: -- : 'RUN: at line 3'; c:\ws\w1\llvm-project\premerge-checks\build\bin\llvm-mc.exe -filetype=obj -triple=x86_64 C:\ws\w1\llvm-project\premerge-checks\lld\test\ELF\gc-debuginfo.test -o C:\ws\w1\llvm-project\premerge-checks\build\tools\lld\test\ELF\Output\gc-debuginfo.test.tmp.o
440 mswindows > lld.ELF/invalid::symtab-sh-info.s
Script: -- : 'RUN: at line 4'; c:\ws\w1\llvm-project\premerge-checks\build\bin\yaml2obj.exe --docnum=1 C:\ws\w1\llvm-project\premerge-checks\lld\test\ELF\invalid\symtab-sh-info.s -o C:\ws\w1\llvm-project\premerge-checks\build\tools\lld\test\ELF\invalid\Output\symtab-sh-info.s.tmp.o

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
grimar added inline comments.Apr 6 2020, 4:46 AM
lld/ELF/LLDDwarfLinker.cpp
144

Why this assert is useful? What scenario did you have in mind?

146

I think you can just inline Value.

194

An excessive empty line.

195

This comment needs expanding I think. "Estimation" is something that usually sounds tricky and requires an explanation of your algorithm implemented.

197

Don't use auto when the type is not obvious.

200

No need to have curly bracers here.

231

foo.size() > 0 -> !foo.empty() when foo is StringRef

257

emptyWarnings is unused. Is it OK? You can inline it if so.

267

Map what to what?

272

The way you work with Expected<> is incorrect. It might fail on error with LLVM_ENABLE_ABI_BREAKING_CHECKS=1.
You need to handle the error or to ignore it explicitly. E.g.:

Expected<GlobPattern> pat = GlobPattern::create(arg);
if (!pat) {
  error("--undefined-glob: " + toString(pat.takeError()));
  return;
}
284

Don't use auto here.

293

StringRef() -> ""

But I think it can be just:

for (auto &sec : outputSections)
  if (isDebugSection(sec->name))
    sec->setDebugData(sectionsMap.lookup(sec->name));
lld/ELF/Options.td
226

When you add an LLD option, you should also update the documentation (lld/docs/ld.lld.1)

lld/ELF/OutputSections.cpp
338

Needs a comment.

lld/ELF/OutputSections.h
128

Did you consider to use llvm::Optional<StringRef>?

llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
461

You do not need to use else after return:

if (Map.getTriple().isOSDarwin() && !Map.getBinaryPath().empty() &&
    Options.FileType == OutputFileType::Object)
  return MachOUtils::generateDsymCompanion(
      Map, Options.Translator, *Streamer->getAsmPrinter().OutStreamer,
      OutFile);
return Streamer->finish(), true;
462

Something is wrong with this line. , true is a part of something else it seems.

avl marked 32 inline comments as done.Apr 6 2020, 10:33 AM
avl added inline comments.
lld/ELF/DWARF.cpp
61

There should be three states here(seen, not seen, seen more than once). DWARFFormValue::dumpAddressSection() needs this to properly dump die address. If name seen more than once it prints section index.

// Print section index if name is not unique.
if (!SecRef.IsNameUnique)
  OS << format(" [%" PRIu64 "]", SectionIndex);
275

right. It assumes that relocations are sorted by r_offset.
It is sorted in scanRelocs(). But I missed that it is sorted not for all types of sections.
Would fix it , Thanks!

lld/ELF/DWARF.h
93

It is used at DWARFFormValue::dumpAddressSection(). When address is dumped, DWARFObject could be asked for section names(LLDDwarfObj is a child of DWARFObject). It could happen when Verbose mode is set for DWARFLinker.

132

Yes, it could be used outside of LLDDwarfObj() constructor. When DWARFObject::getSectionNames() is called the sectionNames is used.
One such place is DWARFFormValue::dumpAddressSection().

lld/ELF/InputSection.h
127

To not do string compare in every usage. i.e. that bit is set in the constructor and at every usage place is checked. Otherwise, it would be necessary to compare strings at every usage place.

lld/ELF/LLDDwarfLinker.cpp
66

The idea was to not put error data into sectionIndexesMap. sec is indeed checked later for nullptr.
But it looks like not putting wrong data is better approach than relying on following checks.

144

To check that relocation passed to the handler is from baseOffset, baseOffset + data.size() range.
i.e. to check that enumerateRelocations works correctly and pass proper relocations to the handler.

257

it is used only in std::make_unique<DwarfFile>.

If emptyWarnings is used inline then it`s constructor would be called for each ObjectFile. Probably we could rely on compiler here...

267

Would add comment.

grimar added a comment.Apr 7 2020, 3:20 AM

Few more comments. They are mostly to give a feedback about the code style, I have not look deep into how it works yet.

lld/ELF/DWARF.cpp
29

I'd call it sectionNumber, having Map in the name here is not useful.

58

To avoid nesting, you can use an early continue:

if (!sec) {
  sectionNames.push_back({"", true});
  continue;
}
...
61

Ah, sorry, I've misread the code slightly, I did not notice the codition is sectionAmountMap[sec.Name] > 1 and not sectionAmountMap[sec.Name] > 0.

82

It is better to avoid using operator[] in situations like this, because
it is not const, when a sec.Name is not found it creates a new map entry. This is not what you want here,
you can use lookup().

lld/ELF/InputSection.h
127

Yeah, but did you measure numbers, i.e. is there a visible benefit? The code in this diff still calls isDebugSection in many places. It looks like a premature optimization to me and/or something that could probably be moved to another patch (which could change all call sites at once).

lld/ELF/LLDDwarfLinker.cpp
207

I think the returned type of this lambda can be deduced:

[](StringRef S) { return S; }

219

This 2 lambdas looks similar. Also you have 2 more below.
It seems you could do something like the following?

auto ReportWarn = [] [&](const Twine &msg, StringRef ctx, const DWARFDie *) {
if (!context.empty())
  warn(ctx+ ": " + msg);
else
  warn(msg);
};

DwarfStreamer outStreamer(..., ReportWarn, ReportWarn);
257

I'd inline to make the code shorter/cleaner. It doesn't look as hot path, so no need to optimize it until a benefit is proven.

avl marked an inline comment as done.Apr 9 2020, 5:25 AM
avl added inline comments.
lld/ELF/DWARF.cpp
275

@grimar @ruiu It looks like it was considered safe to rely on sorted order of relocations - D36079. What do you think ?

grimar added inline comments.Apr 9 2020, 6:18 AM
lld/ELF/DWARF.cpp
275

I do not know. ELF specification does not says they must be. Perhaps we can at least check this with std::is_sorted.
I see lld/Wasm does it:

ArrayRef<WasmRelocation> relocs = section->Relocations;
assert(std::is_sorted(relocs.begin(), relocs.end(),
                      [](const WasmRelocation &r1, const WasmRelocation &r2) {
                        return r1.Offset < r2.Offset;
                      }));

assert will not help in release build though.
Perhaps other people can comment on this, as you see my concern about D36079 was not answered that time it seems.

avl marked an inline comment as done.Apr 9 2020, 6:43 AM
avl added inline comments.
lld/ELF/DWARF.cpp
275

I see. I also did not find any reference to documentation which mentioned that relocations must be sorted(nor ELF file specification, nor DWARF specification).

grimar added inline comments.Apr 9 2020, 7:37 AM
lld/ELF/DWARF.cpp
275

I found it was answered actually:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170731/474875.html
Then it is OK I think.

avl updated this revision to Diff 256333.Apr 9 2020, 10:33 AM

addressed comments:

added test for -flto case.
resolved style issues.

I did not change isDebug bit yet. would make it with separate review.

avl added a comment.Apr 9 2020, 10:43 AM

@jhenderson

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

loading times for debuggers:

no gc-debuginfo case:
gdb 8.1 clang11 1,4G: loading time 56 sec used memory 2166872kb
lldb 8.0 clang11 1,4G: loading time 7.5 sec used memory 1508860kb

gc-debuginfo case:
gdb 8.1 clang11 783M: loading time 51 sec used memory 1741552kb
lldb 8.1 clang11 783M: loading time 4 sec used memory 842980kb

Overall result: gc-debuginfo makes loading time shorter and reduces memory usage for debuggers,

In D74169#1972348, @avl wrote:

@jhenderson

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

loading times for debuggers:

no gc-debuginfo case:
gdb 8.1 clang11 1,4G: loading time 56 sec used memory 2166872kb
lldb 8.0 clang11 1,4G: loading time 7.5 sec used memory 1508860kb

gc-debuginfo case:
gdb 8.1 clang11 783M: loading time 51 sec used memory 1741552kb
lldb 8.1 clang11 783M: loading time 4 sec used memory 842980kb

Overall result: gc-debuginfo makes loading time shorter and reduces memory usage for debuggers,

Might be worth comparing with -gdb-index too (though not sure if gdb-index will be immediately compatible with the DWARF aware linking, so a difficult comparison - but even comparing "gdb-index without DWARF-aware linking V DWARF-aware linking without gdb-index", etc)

avl added a comment.Apr 10 2020, 4:06 AM

Might be worth comparing with -gdb-index too (though not sure if gdb-index will be immediately compatible with the DWARF aware linking, so a difficult comparison - but even comparing "gdb-index without DWARF-aware linking V DWARF-aware linking without gdb-index", etc)

here are the results for clang binary:

-------------------------------------------------------------
|     Options    | Run-time, sec | Used Memory, kb |  Size  |
-------------------------------------------------------------
|                |     35.58     |     9403596     |  1,4G  |
-------------------------------------------------------------
|  gc-debuginfo  |    109.15     |    14275444     |  783M  |
-------------------------------------------------------------
|  gc-debuginfo  |    130.63     |    14394700     |  797M  |
|   gdb-index    |               |                 |        |
-------------------------------------------------------------
|   gdb-index    |     52.88     |     9524976     |  1.5G  | 
-------------------------------------------------------------

I did not check whether gdb-index is compatible with DWARF aware linking output.

There probably exists the room for optimization of gdb-index(related to the strategy of DWARF context creation).
As it was mentioned in https://reviews.llvm.org/D74773#1920751 simple caching of DWARF context results in an 8% performance increase and 20% memory usage increase.
It is probably worth implementing some smart strategy(reference counting?) that will keep DWARF context while there are users of it and release it when it is not expected to be used(so that it does not hold all DWARF contexts forever).

In D74169#1974097, @avl wrote:

Might be worth comparing with -gdb-index too (though not sure if gdb-index will be immediately compatible with the DWARF aware linking, so a difficult comparison - but even comparing "gdb-index without DWARF-aware linking V DWARF-aware linking without gdb-index", etc)

here are the results for clang binary:

-------------------------------------------------------------
|     Options    | Run-time, sec | Used Memory, kb |  Size  |
-------------------------------------------------------------
|                |     35.58     |     9403596     |  1,4G  |
-------------------------------------------------------------
|  gc-debuginfo  |    109.15     |    14275444     |  783M  |
-------------------------------------------------------------
|  gc-debuginfo  |    130.63     |    14394700     |  797M  |
|   gdb-index    |               |                 |        |
-------------------------------------------------------------
|   gdb-index    |     52.88     |     9524976     |  1.5G  | 
-------------------------------------------------------------

Ah, sorry, I meant specifically related to the gdb startup time point you made earlier about DWARF-aware linking improving startup time. It'd be useful to measure startup time with a gdb-index for comparison. (I'm guessing the time in the above chart is linking time, rather than GDB startup time)

I did not check whether gdb-index is compatible with DWARF aware linking output.

There probably exists the room for optimization of gdb-index(related to the strategy of DWARF context creation).
As it was mentioned in https://reviews.llvm.org/D74773#1920751 simple caching of DWARF context results in an 8% performance increase and 20% memory usage increase.
It is probably worth implementing some smart strategy(reference counting?) that will keep DWARF context while there are users of it and release it when it is not expected to be used(so that it does not hold all DWARF contexts forever).

avl added a comment.Apr 10 2020, 10:27 AM

Ah, sorry, I meant specifically related to the gdb startup time point you made earlier about DWARF-aware linking improving startup time. It'd be useful to measure startup time with a gdb-index for comparison. (I'm guessing the time in the above chart is linking time, rather than GDB startup time)

right, these are linking times. here is start-up times:

-------------------------------------------------------------------
 Debugger     |     Options    | load-time, sec | Used Memory, kb |
-------------------------------------------------------------------
| gdb 8.1     |                |      47,79     |     2168172     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.04     |      555408     |
-------------------------------------------------------------------
| gdb 8.1     |  gc-debuginfo  |      44,8      |     1740876     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.7      |      555340     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------
| lldb 8      |                |       6,29     |     1511052     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       6,22     |     1523516     |
-------------------------------------------------------------------
| lldb 8      |  gc-debuginfo  |       3,32     |      843600     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       3.6      |      858532     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------

though gdb-index+gc-debuginfo times could be meaningless.

In D74169#1974622, @avl wrote:

Ah, sorry, I meant specifically related to the gdb startup time point you made earlier about DWARF-aware linking improving startup time. It'd be useful to measure startup time with a gdb-index for comparison. (I'm guessing the time in the above chart is linking time, rather than GDB startup time)

right, these are linking times. here is start-up times:

-------------------------------------------------------------------
 Debugger     |     Options    | load-time, sec | Used Memory, kb |
-------------------------------------------------------------------
| gdb 8.1     |                |      47,79     |     2168172     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.04     |      555408     |
-------------------------------------------------------------------
| gdb 8.1     |  gc-debuginfo  |      44,8      |     1740876     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.7      |      555340     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------
| lldb 8      |                |       6,29     |     1511052     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       6,22     |     1523516     |
-------------------------------------------------------------------
| lldb 8      |  gc-debuginfo  |       3,32     |      843600     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       3.6      |      858532     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------

though gdb-index+gc-debuginfo times could be meaningless.

Thanks! (yeah, the gdb-index + lldb times aren't relevant either, lldb uses a different kind of accelerated access (one that's only supported on the MachO debug info distribution model currently - but I believe someone's working on/plans to work on DWARFv5 debug_names support))

avl added a comment.Apr 13 2020, 8:35 AM

Thanks! (yeah, the gdb-index + lldb times aren't relevant either, lldb uses a different kind of accelerated access (one that's only supported on the MachO debug info distribution model currently - but I believe someone's working on/plans to work on DWARFv5 debug_names support))

dsymutil supports DWARFv5 debug_names table generation. So if --gc-debuginfo specified then debug_names would be generated. This patch has a minor bug with it. I would update it.

avl updated this revision to Diff 257399.Apr 14 2020, 10:28 AM

added test for flto case.(which shows that -flto=thin does not work currently).
added option --debug-names, which enable generation of debug_names table.

avl added a comment.Apr 14 2020, 10:32 AM

This implementation does not work for -flto=thin case. That is the same problem like in D54747: https://reviews.llvm.org/D54747#1503720.

avl updated this revision to Diff 258171.Apr 16 2020, 2:20 PM

rebased.

avl added a comment.Apr 17 2020, 7:23 AM

Re-measured size/performance results:

A: --function-sections --gc-sections
B: --function-sections --gc-sections --gc-debuginfo
C: --function-sections --gc-sections --fdebug-types-section
D: --function-sections --gc-sections --gsplit-dwarf
E: --function-sections --gc-sections --fdebug-types-section --gc-debuginfo
(last case does not produce valid DWARF currently)

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    | 
--------------------------------------------------------------
|    A    |    66min(100%)  |   18.0G(100%) |  16.0G(100.0%) |
--------------------------------------------------------------
|    B    |    81min(123%)  |    9.7G( 54%) |  13.0G( 81.0%) |
--------------------------------------------------------------
|    C    |    58min( 87%)  |   12.0G( 67%) |  15.0G( 93.5%) |
--------------------------------------------------------------
|    D    |    58min( 87%)  |   12.0G( 67%) |   8.7G( 54.0%) |
--------------------------------------------------------------
|    E    |    68min(103%)  |    7.6G( 42%) |  14.0G( 87.5%) |
--------------------------------------------------------------


Clang binary:
--------------------------------------------------------------
| Options |       size     |     link time   |  used memory  |
--------------------------------------------------------------
|    A    |    1.50G(100%) |   26.7sec(100%) |  9191MB(100%) |
--------------------------------------------------------------
|    B    |    0.76G( 50%) |  107.4sec(402%) | 13361MB(145%) |
--------------------------------------------------------------
|    C    |    0.82G( 54%) |   18.3sec( 68%) |  8300MB( 90%) |
--------------------------------------------------------------
|    D    |    0.96G( 64%) |    7.6sec( 28%) |  4255MB( 46%) |
--------------------------------------------------------------
|    E    |    0.58G( 45%) |   58.5sec(219%) | 10317MB(112%) |
--------------------------------------------------------------


lldb loading time:
---------------------------------------------
| Options |       time     |   used memory  |
---------------------------------------------
|    A    |  6.32sec(100%) |  1475MB(100%)  |
---------------------------------------------
|    B    |  3.49sec( 55%) |   825MB( 56%)  |
---------------------------------------------
|    C    |  3.65sec( 58%) |   875MB( 59%)  |
---------------------------------------------
|    D    |  4.12sec( 65%) |  1017MB( 69%)  |
---------------------------------------------
|    E    |  3.95sec( 63%) |   628MB( 43%)  |
---------------------------------------------

Just an excited observer, but wanted to drop some statistics of my own, in case they're helpful.

I have a Rust project (Materialize) that compiles down into a ~800MB (!) binary by default. The weight is almost entirely debug info; .text is less than 5% of its size. This is evidently an artifact of how Rust codegen units are much, much larger than your average C/C++ codegen units and so the toolchain relies heavily on the linker for dead code elimination. Of course that dead code elimination doesn't currently apply to the debug info. This diff is exactly what's needed to solve the problem, as far as I can tell. (There's an upstream issue about this that led me here: https://github.com/rust-lang/rust/issues/56068.)

Without --gc-debuginfo:

# Link time: 10.486s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  24.5%   194Mi   0.0%       0    .debug_info
  24.1%   191Mi   0.0%       0    .debug_loc
  13.8%   109Mi   0.0%       0    .debug_pubtypes
  10.1%  79.9Mi   0.0%       0    .debug_pubnames
   8.8%  70.0Mi   0.0%       0    .debug_str
   8.3%  66.3Mi   0.0%       0    .debug_ranges
   4.4%  35.3Mi   0.0%       0    .debug_line
   3.1%  24.8Mi  66.3%  24.8Mi    .text
   1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
   0.6%  4.79Mi   0.0%       0    .strtab
   0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
 100.0%   793Mi 100.0%  37.4Mi    TOTAL

With --gc-debuginfo:

# Link time: 249.855s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  37.9%   115Mi   0.0%       0    .debug_info
  17.9%  54.7Mi   0.0%       0    .debug_str
  12.0%  36.7Mi   0.0%       0    .debug_ranges
  11.2%  34.4Mi   0.0%       0    .debug_loc
   8.1%  24.8Mi  66.3%  24.8Mi    .text
   6.6%  20.1Mi   0.0%       0    .debug_line
   2.7%  8.15Mi  19.3%  7.23Mi    [35 Others]
   1.6%  4.79Mi   0.0%       0    .strtab
   1.1%  3.22Mi   8.6%  3.22Mi    .eh_frame
   1.0%  2.99Mi   0.0%       0    .symtab
   0.0%       0   5.8%  2.16Mi    .bss
 100.0%   305Mi 100.0%  37.4Mi    TOTAL

So hugely successful in reducing the size of the debug info! The new binary is 40% of the size of the original. Some of that savings is from omitting debug_pubtypes and debug_pubnames, which is easy enough to strip without this patch, but there is also a substantial reduction in debug_info and debug_loc—a much larger reduction than dbz is able to provide.

Unfortunately there is also a massive blowup in link times, from 10s to 250s. That's a 25x increase, much larger than the link time increase for Clang described above, which was about 4x.

Just an excited observer, but wanted to drop some statistics of my own, in case they're helpful.

I have a Rust project (Materialize) that compiles down into a ~800MB (!) binary by default. The weight is almost entirely debug info; .text is less than 5% of its size. This is evidently an artifact of how Rust codegen units are much, much larger than your average C/C++ codegen units and so the toolchain relies heavily on the linker for dead code elimination.

Out of curiosity is there any work in the direction of fragmenting these modules/objects further to reduce this & other negative side effects of that situation?

Of course that dead code elimination doesn't currently apply to the debug info. This diff is exactly what's needed to solve the problem, as far as I can tell. (There's an upstream issue about this that led me here: https://github.com/rust-lang/rust/issues/56068.)

Without --gc-debuginfo:

# Link time: 10.486s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  24.5%   194Mi   0.0%       0    .debug_info
  24.1%   191Mi   0.0%       0    .debug_loc
  13.8%   109Mi   0.0%       0    .debug_pubtypes
  10.1%  79.9Mi   0.0%       0    .debug_pubnames
   8.8%  70.0Mi   0.0%       0    .debug_str
   8.3%  66.3Mi   0.0%       0    .debug_ranges
   4.4%  35.3Mi   0.0%       0    .debug_line
   3.1%  24.8Mi  66.3%  24.8Mi    .text
   1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
   0.6%  4.79Mi   0.0%       0    .strtab
   0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
 100.0%   793Mi 100.0%  37.4Mi    TOTAL

With --gc-debuginfo:

# Link time: 249.855s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  37.9%   115Mi   0.0%       0    .debug_info
  17.9%  54.7Mi   0.0%       0    .debug_str
  12.0%  36.7Mi   0.0%       0    .debug_ranges
  11.2%  34.4Mi   0.0%       0    .debug_loc
   8.1%  24.8Mi  66.3%  24.8Mi    .text
   6.6%  20.1Mi   0.0%       0    .debug_line
   2.7%  8.15Mi  19.3%  7.23Mi    [35 Others]
   1.6%  4.79Mi   0.0%       0    .strtab
   1.1%  3.22Mi   8.6%  3.22Mi    .eh_frame
   1.0%  2.99Mi   0.0%       0    .symtab
   0.0%       0   5.8%  2.16Mi    .bss
 100.0%   305Mi 100.0%  37.4Mi    TOTAL

So hugely successful in reducing the size of the debug info! The new binary is 40% of the size of the original. Some of that savings is from omitting debug_pubtypes and debug_pubnames, which is easy enough to strip without this patch, but there is also a substantial reduction in debug_info and debug_loc—a much larger reduction than dbz is able to provide.

Unfortunately there is also a massive blowup in link times, from 10s to 250s. That's a 25x increase, much larger than the link time increase for Clang described above, which was about 4x.

Out of curiosity is there any work in the direction of fragmenting these modules/objects further to reduce this & other negative side effects of that situation?

I'm afraid I have no idea. Nothing I'm aware of, but I'm not a rustc dev. Perhaps @rocallahan knows. I'll ask on the Rust Zulip, too.

avl updated this revision to Diff 258596.Apr 19 2020, 5:52 AM

allowed multi-threaded execution.

avl added a comment.Apr 19 2020, 5:57 AM

@benesch Thank you for the interest in this patch and the statistics data.

Unfortunately there is also a massive blowup in link times, from 10s to 250s. That's a 25x increase, much larger than the link time increase for Clang described above, which was about 4x.

Excuse me. It looks like my reported statistic is not very accurate. I would re-measure it and re-submit.
There are two things here: first is that reported numbers are for the patch, which I did not update(because of some bug). And another is that calculations are not accurate. I have just updated the patch. Following is what I got for clang:

/usr/bin/time -f "%E %M" bin/clang-11
0:11.62 9509308

/usr/bin/time -f "%E %M" bin/clang-11 --gc-debuginfo
1:15.16 15346672

i.e. it is about 6.5x, not 25x. I would also check the numbers for the updated patch and for your materialize project. Probably, it might be quite close since .text section takes 7.5% of entire binary in clang.
Thank you.

The new binary is 40% of the size of the original. Some of that savings is from omitting debug_pubtypes and debug_pubnames, which is easy enough to strip without this patch

The reason why debug_pubtypes and debug_pubnames were stripped, is that the current patch uses already existed dsymutil code, which does not generate debug_pubtypes and debug_pubnames. This patch changes debug info. Thus it removes other debug info tables, which it does not support(since they most probably would contain no relevant data). Supporting all these tables(as well as DWARF5) is a future task for the DWARFLinker.

avl added a comment.Apr 21 2020, 2:27 PM

Re-measured size/performance results once more. These results are for 12-cores machine. Actually, the more cores the bigger difference between --gc-debuginfo/--no-gc-debuginfo cases.

A: --function-sections --gc-sections
B: --function-sections --gc-sections --gc-debuginfo
C: --function-sections --gc-sections --fdebug-types-section
D: --function-sections --gc-sections --gsplit-dwarf
E: --function-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib
F: --function-sections --gc-sections --fdebug-types-section --gc-debuginfo
(last case does not produce valid DWARF currently)

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    | 
--------------------------------------------------------------
|    A    |    54min(100%)  |   19.0G(100%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    B    |    65min(120%)  |    9.7G( 51%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    C    |    53min( 98%)  |   12.0G( 63%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    D    |    52min( 96%)  |   12.0G( 63%) |   8.2G( 55.0%) |
--------------------------------------------------------------
|    E    |    64min(118%)  |    5.3G( 28%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    F    |    57min(105%)  |    7.6G( 40%) |  14.0G( 93.0%) |
--------------------------------------------------------------


Clang binary:
-------------------------------------------------------------
| Options |      size      |     link time  |  used memory  |
-------------------------------------------------------------
|    A    |    1.50G(100%) |    9sec(100%)  |  9307MB(100%) |
-------------------------------------------------------------
|    B    |    0.76G( 50%) |   72sec(800%)  | 15055MB(161%) |
-------------------------------------------------------------
|    C    |    0.82G( 54%) |    8sec( 89%)  |  8402MB( 90%) |
-------------------------------------------------------------
|    D    |    0.96G( 64%) |    6sec( 67%)  |  4273MB( 46%) |
-------------------------------------------------------------
|    E    |    0.43G( 29%) |   80sec(889%)  | 15000MB(161%) |
-------------------------------------------------------------
|    F    |    0.58G( 39%) |   30sec(334%)  | 10890MB(117%) |
-------------------------------------------------------------


lldb loading time:
--------------------------------------------
| Options |      time     |   used memory  |
--------------------------------------------
|    A    |  6.4sec(100%) |  1495MB(100%)  |
--------------------------------------------
|    B    |  4.0sec( 63%) |   826MB( 55%)  |
--------------------------------------------
|    C    |  3.7sec( 58%) |   877MB( 59%)  |
--------------------------------------------
|    D    |  4.3sec( 67%) |  1023MB( 69%)  |
--------------------------------------------
|    E    |  2.1sec( 33%) |   478MB( 32%)  |
--------------------------------------------
|    F    |  2.7sec( 42%) |   631MB( 43%)  |
--------------------------------------------
avl updated this revision to Diff 260362.Apr 27 2020, 9:56 AM

rebased. added tests.

avl added a comment.Apr 27 2020, 10:36 AM

@ruiu @grimar @MaskRay @dblaikie @clayborg

What do you think about that patch ? Whether it is generally OK to start integration? If it is OK then I would divide the patch to smaller patches and start to integrate.

I think it is worth integrating: it has a small impact on existing lld code(most implementation is in separate library DWARFLinker. Only lld/ELF/DWARF.* and lld/ELF/LLDDwarfLinker.* are noticeably affected). --gc-debuginfo is off by default so it would not affect most of the users. Only those who explicitly specified --gc-debuginfo would use that functionality. The patch noticeably reduces the size of debug info. It also resolves"overlapping address ranges" problem. Upcoming optimizations could minimize the performance reduction.

avl updated this revision to Diff 261860.May 4 2020, 10:16 AM

rebased. improved execution time for 5%.

@avl Would it be possible to rebase this on latest, or is this diff abandoned?

avl updated this revision to Diff 292457.Sep 17 2020, 4:21 AM
avl edited the summary of this revision. (Show Details)

rebased.

avl added a comment.Sep 17 2020, 4:42 AM

@ayermolo Hi Alexander, I rebased on latest but did not check whether current size/performance numbers match with reported previously.

ayermolo added inline comments.Sep 23 2020, 2:58 PM
lld/ELF/LLDDwarfLinker.cpp
81

The RangesTy is defined as a map with uint64_t
using RangesTy = std::map<uint64_t, ObjFileAddressRange>;

Not sure I follow how this works. Am I missing some c++ thing? This fails to compile with clang.

223

There is no AccelTableKind::None, did you mean AccelTableKind::Default?

enum class AccelTableKind {

Apple,   ///< .apple_names, .apple_namespaces, .apple_types, .apple_objc.
Dwarf,   ///< DWARF v5 .debug_names.
Default, ///< Dwarf for DWARF5 or later, Apple otherwise.

};

avl added inline comments.Sep 24 2020, 1:26 AM
lld/ELF/LLDDwarfLinker.cpp
81

The RangesTy is defined as a map with object::SectionedAddress.

/// Map LowPC to AddressHighPC.
using RangesTy = std::map<object::SectionedAddress, AddressHighPC>;

It looks like your local source files are in inconsistent state. This patch was build successfully by pre-merge build bot.

223

It is added by this patch at line 29 of llvm/include/llvm/DWARFLinker/DWARFLinker.h.

llvm/include/llvm/DWARFLinker/DWARFLinker.h
29

Here is the place where AccelTableKind::None is defined.

avl updated this revision to Diff 298246.Oct 14 2020, 2:53 PM

rebased, added command-line option --gc-debuginfo-no-odr.