Page MenuHomePhabricator

[WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.
Needs ReviewPublic

Authored by avl on Feb 6 2020, 2:17 PM.

Details

Summary

This patch is the implementation for removing obsolete debug info:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135473.html

It uses DWARFLinker which contains code refactored out from dsymutil.

Diff Detail

Event Timeline

avl created this revision.Feb 6 2020, 2:17 PM

test/ELF/gc-debuginfo.s works without the code change (after deleting --gc-debuginfo). How many changes do you expect will be needed to make the feature work?

lld/ELF/CMakeLists.txt
57

Is AsmPrinter used?

lld/ELF/LLDDwarfLinker.cpp
71

Capitalize.

Append full stop.

71

// for each object file does not convey more information than the code itself.

Say what this loop is supposed to do.

74

Delete empty line

82

Doesn't make_unique work?

lld/test/ELF/gc-debuginfo.s
3

Delete (invisibile) whitespace

7

- -> --

11

Append # CHECK-NOT: {{.}}

11

Delete trailing space

grimar added a comment.Feb 7 2020, 2:01 AM

test/ELF/gc-debuginfo.s works without the code change (after deleting --gc-debuginfo). How many changes do you expect will be needed to make the feature work?

(Disclaimer: As a person who is interested in this feature implicitly (Alexey and I do work for the same company))

I wonder if that is really important to split patches that much (this patch seems have no functionality at all?). It would be good to see at least some benefit, but perhaps
it is not that easy to do... Anyways:
Personally I'd probably prefer to see a full set of patches for the feature. A reviewer should be able to apply them and debug (if he/she wants to) probably.

lld/ELF/CMakeLists.txt
38

I'd not add "LLD" prefix. All if sources here belong to "LLD" actually.

lld/ELF/Driver.cpp
2004

Missing a full stop.

lld/ELF/LLDDwarfLinker.cpp
13

Is it a final version of the comment? I'd expect to habe more information here probably about what we exactly do.

lld/test/ELF/gc-debuginfo.s
2

I'd ask to add a comment:

## This is a test for blah blah..
avl added a comment.Feb 7 2020, 5:23 AM
How many changes do you expect will be needed to make the feature work?

Hi, following is the approximate list of changes, which assumed to match with corresponding patches :

  1. Do not create DWARFContext inplace. Use ObjFile<ELFT>::dwarf and "llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });" instead.
  1. Patch DebugInfoDWARF library to have a possibility to replace the error handler. Someone could already pass own error handler for DWARFContext. But there are places where WithColor::error() is used directly:
void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
  WithColor::error() << toString(std::move(e));
}

Thus, it is necessary to have a possibility to replace ErrorHandler here.

  1. Replace LLDDwarfObj with DWARFObjInMemory implementation while creating DWARFContext:
for (InputFile *file : objectFiles) {
  std::unique_ptr<ObjectFile> ObjectFile = createELFObjectFile(file.mb);

  std::unique_ptr<DWARFContext> DWARFContext::create (ObjectFile, nullptr, ErrorHandler));
}
  1. Create an implementation for lld/ELF/LLDDwarfLinker.cpp::ObjFileAddressMap.
  1. Create special kind of section - "DebugInputSection : public InputSection". To keep the result of debug info linking and to handle relocations.
  1. add LLD`s implementation for DWARFLinker/DWARFLinker.h::DwarfEmitter, which would write the output of DWARFLinker into DebugInputSection.
avl marked 13 inline comments as done.Feb 7 2020, 5:43 AM
avl added inline comments.
lld/ELF/CMakeLists.txt
38

The idea is that the file name matches the class name. The class name has LLD to be different from DWARFLinker from DWARFLinker library. If it looks redundant, then I would delete this prefix.

57

It is used because LLDDwarfLinker.cpp includes DWARFLinker.h, which includes "llvm/CodeGen/Die.h", which needs "llvm::DIEAbbrev::Profile(llvm::FoldingSetNodeID&)" which is in AsmPrinter.

probably it could be refactored out later.

lld/ELF/LLDDwarfLinker.cpp
13

Ok, I would put more detailed comment.

avl added a comment.Feb 7 2020, 6:03 AM

I wonder if that is really important to split patches that much (this patch seems have no functionality at all?). It would be good to see at least some benefit, but perhaps
it is not that easy to do... Anyways:
Personally I'd probably prefer to see a full set of patches for the feature. A reviewer should be able to apply them and debug (if he/she wants to) probably.

Hi George,

I follow the idea is that small patches are easier to review and easier to integrate. That concrete patch does not have a functionality but it includes DWARLinker library, creates option and creates placeholders for the functionality. Which looks like a small amount of things that could be reviewed.

Creating full set of patches which could be any time applied to the upstream and to the each other is quite time consuming thing because of time for synchronization/merging/rebasing. As an alternative we could discuss main points of future patches[0]. And iteratively apply patches step by step. If that plan is not good - then OK, I would start to work on preparing full set of patches for that feature.

[0] following is the approximate list of changes, which assumed to match with corresponding patches :

  1. Do not create DWARFContext inplace. Use ObjFile<ELFT>::dwarf and "llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });" instead.
  1. Patch DebugInfoDWARF library to have a possibility to replace the error handler. Someone could already pass own error handler for DWARFContext. But there are places where WithColor::error() is used directly:
void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
  WithColor::error() << toString(std::move(e));
}

Thus, it is necessary to have a possibility to replace ErrorHandler here.

  1. Replace LLDDwarfObj with DWARFObjInMemory implementation while creating DWARFContext:
for (InputFile *file : objectFiles) {
  std::unique_ptr<ObjectFile> ObjectFile = createELFObjectFile(file.mb);

  std::unique_ptr<DWARFContext> DWARFContext::create (ObjectFile, nullptr, ErrorHandler));
}
  1. Create an implementation for lld/ELF/LLDDwarfLinker.cpp::ObjFileAddressMap.
  1. Create special kind of section - "DebugInputSection : public InputSection". To keep the result of debug info linking and to handle relocations.
  1. add LLD`s implementation for DWARFLinker/DWARFLinker.h::DwarfEmitter, which would write the output of DWARFLinker into DebugInputSection.
avl updated this revision to Diff 243205.Feb 7 2020, 9:52 AM

addressed comments.

avl retitled this revision from [LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info. to [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info..Feb 14 2020, 11:48 PM

marked the patch as WIP while preparing full set of patches for the feature.

A few comments from me.

lld/ELF/Config.h
158

Unsorted (shold be placed before gcSections + should be gcDebugInfo I think.

lld/ELF/Driver.cpp
2006

Looking at this piece I think that 915, 916 lines should just be:

config->gcDebuginfo = config->gcSections &&
      args.hasFlag(OPT_gc_debuginfo, OPT_no_gc_debuginfo, false);

Perhaps worth to report an error when we have --gc-debug info, but not --gc-sections, but I am not sure here.

lld/ELF/LLDDwarfLinker.cpp
66

We usually do not do such things in LLD. I.e. do not assert on options when the code is trivial like in your case:

if (config->gcSections && config->gcDebuginfo)
  linkDebugInfo<ELFT>();
83

Please do not use new. Is should be something like:

dwarfContexts.push_back(std::make_unique...

88

It would probably be a bit more natural to:

std::unique_ptr<DwarfLinkerObjFile> Obj = std::make_unique<DwarfLinkerObjFile>(
              file->getName(), dwarfObjFile, emptyWarnings);
Obj->Addresses = ;
debugInfoLinker.addObjectFile(*Obj);
objectsForLinking.push_back(std::move(Obj));
93

Addresses = std::make_unique<ObjFileAddressMap>();

lld/ELF/LLDDwarfLinker.h
17

I am not an expert in English, though would rewrite to something like:

// This function is used to remove an unused parts of debug data
// which belongs to garbage collected sections.

(I guess othes might have a better variant though).

lld/test/ELF/gc-debuginfo.s
7

No need to use that much asm:

llvm-mc /dev/null -o %t.o -filetype=obj -triple=aarch64-pc-linux

15

I'd add a test for -r --gc-debuginfo error too since you have this code.

grimar added a comment.EditedFeb 15 2020, 6:27 AM
In D74169#1863848, @avl wrote:

Creating full set of patches which could be any time applied to the upstream and to the each other is quite time consuming thing because of time for synchronization/merging/rebasing.

I do not think them needed to be in state "any time applied to the upstream".
"any time" is indeed too much.

Perhaps we could do something like the following: create a single large patch with all of features which can be applied. And upload it somewhere here as [not for commit].
Then reviewers should be able to review this single patch, but in case of questions they should be able to apply the huge patch to see how the whole feature works.
In this case you'll need to update one huge patch from time to time, e.g. rebase it from time to time after commits of smaller ones. And it should be much simpler.
How does it sound? (the same question for others).

avl added a comment.Feb 16 2020, 1:42 PM

That is OK. The only thing is that I planned to integrate small fixes as soon as they are ready. So I do not have a full implementation at the moment. I will create a single large patch with all of the features sometime later as soon as it would be fully ready.

MaskRay added a comment.EditedFeb 16 2020, 2:22 PM

I agree with

if the additional complexity is not too much, and the new code is nicely isolated from existing code. I think I agree with you that linker is perhaps the best place to drop dead DWARF info.

(One thing I plan to work on is .debug_names accelerator table, which is similar to but more power than .gdb_index.)

On one side, I've heard thoughts that: (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable? (2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?
On the other side, I think this approach may be better than D54747 (reverted by rL360955) and may address the problem it tried to mitigate. Given the savings, we may bend our mind a bit and give more tolerance even if the additional complexity is more than not too much.

I am reluctant to let individual patches in before we can see the benefits, because they could easily turn out to be insufficient and cause churns. There is still possibility that we find --gc-debuginfo not suitable on the linker side and may want to do it in a post-link tool.

I agree with

if the additional complexity is not too much, and the new code is nicely isolated from existing code. I think I agree with you that linker is perhaps the best place to drop dead DWARF info.

(One thing I plan to work on is .debug_names accelerator table, which is similar to but more power than .gdb_index.)

On one side, I've heard thoughts that:
(1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

(2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?

Presumably that'd take significantly longer and use more memory and disk space overall - but, yes, it would get you a running executable sooner. (/somewhat/ like the tradeoff of using .dwo files directly, compared to linking them into a .dwp - dwo files are ready sooner, but take up more space, etc, but dwp takes time to make - good for archival purposes)

But (2) I think is unclear to me - I'd like to know how much more time/memory it takes, if it does take more (it probably will take more) - given that it should produce significantly less output (& at Google at least, output is memory usage - since we write to a memory mapped file (though we use Split DWARF in most places where size/memory usage matters, so this solution isn't immediately relevant there).

(& this also might be reusable/good foundation for DWARF aware linking in llvm-dwp, for what it's worth (again, not clear what the tradeoffs will be like in terms of memory/cpu overhead, output size savings, etc))

On the other side, I think this approach may be better than D54747 (reverted by rL360955) and may address the problem it tried to mitigate. Given the savings, we may bend our mind a bit and give more tolerance even if the additional complexity is more than not too much.

I am reluctant to let individual patches in before we can see the benefits, because they could easily turn out to be insufficient and cause churns. There is still possibility that we find --gc-debuginfo not suitable on the linker side and may want to do it in a post-link tool.

lld/ELF/LLDDwarfLinker.cpp
75–76

Probably should be using make_uniqeu<DWARFContext> too, rather than "unique_ptr(new DWARFContext"?

(& a couple of other opportunities to use make_unique in this code below)

avl added a comment.Feb 19 2020, 2:11 PM
On one side, I've heard thoughts that:
 (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

also, as an theoretical alternative - probably we might want to implement -gsplit-dwarf so that it include only used part of debug info. i.e. that which referenced after garbage collection. As the result size of all .dwo files would be smaller.

(2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?

Presumably that'd take significantly longer and use more memory and disk space overall - but, yes, it would get you a running executable sooner. (/somewhat/ like the tradeoff of using .dwo files directly, compared to linking them into a .dwp - dwo files are ready sooner, but take up more space, etc, but dwp takes time to make - good for archival purposes)

But (2) I think is unclear to me - I'd like to know how much more time/memory it takes,

I am preparing the full patch for that feature, so that it would be possible to estimate these values: how much more time/memory it takes.

In D74169#1883671, @avl wrote:
On one side, I've heard thoughts that:
 (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

also, as an theoretical alternative - probably we might want to implement -gsplit-dwarf so that it include only used part of debug info. i.e. that which referenced after garbage collection. As the result size of all .dwo files would be smaller.

That's not possible - the .dwo files are produced during compilation time, before the linker has done any garbage collection.

(currently even when building the dwp file from dwo files it's not possible to strip the dead debug info - since there's no feedback from the linker that reaches the dwp tool (dwp and linking might be done in parallel on different machines, for instance) - but it'd be possible to implement something more like dsymutil where the linker generates a side-file address mapping/details of the garbage collection and then that is fed into the dwp tool (this would delay running the dwp tool though))

avl added a comment.Feb 21 2020, 4:19 AM

That's not possible - the .dwo files are produced during compilation time, before the linker has done any garbage collection.

(currently even when building the dwp file from dwo files it's not possible to strip the dead debug info - since there's no feedback from the linker that reaches the dwp tool (dwp and linking might be done in parallel on different machines, for instance) - but it'd be possible to implement something more like dsymutil where the linker generates a side-file address mapping/details of the garbage collection and then that is fed into the dwp tool (this would delay running the dwp tool though))

Right. That is not possible in current scheme. Though we could probably change the scheme(if it would be considered beneficial). One alternative is what you were talking about - dsymutil/dwp tool which would take linker report and .dwo files and produce optimized .dwo files. Another alternative is that .dwo files could be optimized during linking time using the same scheme like current proposal. This proposal would take unoptimized .debug* sections of binary and generate optimized contents of destination .debug* sections during linking. Generally, It is possible to take unoptimized .dwo files and make them optimized.

I do not suggest to do it now. That is just possible alternative which could probably be considered if current proposal would show good results.

Another thing, is that this patch could be useful for -gsplit-dwarf even without optimizing .dwo files.
It could skip DW_TAG_skeleton_unit related to deleted code. So that Skeleton file do not have overlapped address ranges like this :

0x00000014: DW_TAG_skeleton_unit
              DW_AT_dwo_name    ("split1.dwo")
              DW_AT_low_pc      (0x0000000000000000)   <<< not assigned because of deleted code
              DW_AT_high_pc     (0x0000000000000008)
              DW_AT_addr_base   (0x00000008)

0x00000034: DW_TAG_skeleton_unit
              DW_AT_dwo_name    ("split2.dwo")
              DW_AT_low_pc      (0x0000000000000000)   <<< not assigned because of deleted code
              DW_AT_high_pc     (0x0000000000000008)
              DW_AT_addr_base   (0x00000008)

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

One last point is to have a flag that enabled DWARF verification. This can be run before the GC operation to ensure the incoming DWARF is in good shape and again after any modifications, to ensure the DWARF is still valid.

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

I believe all the functionality in llvm-dsymutil related to DWARF linking that implements the features you described above (llvm-dsymutil was initially implemented with bug-for-bug compatibility with Apple dsymutil for safety and since then I think a few of those bugs have been fixed/improvements have been made) & that code is now/already in the llvm/lib/DWARFLinker library so it can be reused in lld for this effort (apologies if I'm preaching to the choir/being redundant here) & any others you might have in mind. No doubt there's probably some refactoring to do to improve its reusability as the integration with lld & other situations is done.

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

I believe all the functionality in llvm-dsymutil related to DWARF linking that implements the features you described above (llvm-dsymutil was initially implemented with bug-for-bug compatibility with Apple dsymutil for safety and since then I think a few of those bugs have been fixed/improvements have been made) & that code is now/already in the llvm/lib/DWARFLinker library so it can be reused in lld for this effort (apologies if I'm preaching to the choir/being redundant here) & any others you might have in mind. No doubt there's probably some refactoring to do to improve its reusability as the integration with lld & other situations is done.

That is great to hear! Sorry I haven't looked around at that code in a while, but glad to hear it is all ready to go.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time. I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac. That allows people to get the benefits of the .dwo performance improvements for the edit/compile /debug cycle and then link it into a minimal and optimized DWARF, using contribution from this patch and others to come that optimize the DWARF, for permanent storage on build servers.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time. I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

Note that such post-processing is probably more suited to llvm-objcopy than lld, though of course they can use the same underlying LLVM library.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

It's "complete" in the sense that it's whatever GDB wanted and created a contract that GCC (& Clang/LLVM as best we know/have done so far) agree too, so GDB can rely on things being present/not present in some sense.

That's why debug_gnu_pubnames existed - because of, as you say, the underspecified nature of (or unreliable implementation of) debug_pubnames.

I believe debug_names semantics have been more strictly specified regarding what names will be there and what names will not.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

That allows people to get the benefits of the .dwo performance improvements for the edit/compile /debug cycle and then link it into a minimal and optimized DWARF, using contribution from this patch and others to come that optimize the DWARF, for permanent storage on build servers.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

It's "complete" in the sense that it's whatever GDB wanted and created a contract that GCC (& Clang/LLVM as best we know/have done so far) agree too, so GDB can rely on things being present/not present in some sense.

That's why debug_gnu_pubnames existed - because of, as you say, the underspecified nature of (or unreliable implementation of) debug_pubnames.

Perfect, that is what I was hoping for.

I believe debug_names semantics have been more strictly specified regarding what names will be there and what names will not.

Yes, it indeed fully specifies what is needed. Thanks for the info.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

For the linking, I was more thinking of converting the DWP format back into vanilla DWARF with no DWO/DWP artifacts (skeleton DIEs etc) or any other redirections (string table offsets, address tables, etc). Not sure if this would reduce the DWARF size of all of the DWO/DWP instrstructure was all removed?

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

For the linking, I was more thinking of converting the DWP format back into vanilla DWARF with no DWO/DWP artifacts (skeleton DIEs etc) or any other redirections (string table offsets, address tables, etc). Not sure if this would reduce the DWARF size of all of the DWO/DWP instrstructure was all removed?

Ah, sorry, I see what you mean. Well, the offsets systems in some cases save total size even once linked (eg: having a single 64 bit address with two short uleb indexes compared to two 64 bit addresses - even once the binary is linked and there's no extra bytes of relocation records for that address/addresses) & these features are no longer Split DWARF specific - Clang/LLVM generates almost (entirely, maybe?) all these constructs the same way with split or unsplit DWARF when emitting DWARFv5.

Might save a bit - certainly something someone could experiment with as improvements to DWARFLinker - for dsymutil, lld, and maybe dwp in the future.

(one of the cheap ones that'd be neat to see would be debug_rnglists optimization - wouldn't need to parse DWARF DIEs, etc, would be an entirely local optimization - may be useful for PROPELLER (~= -fbasic-block-sections) - because of the indirection in debug_rnglists (though the rnglist header's offset table) you could totally rewrite range lists to coalesce any contiguous ranges that end up together & rewriting the offsets in the offset table - though this optimization would benefit from being in both lld and dsymutil I think - not applicable to dwp unless address information is passed from the linker like dsymutil)

avl added a comment.Wed, Mar 18, 5:09 AM

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

In D74169#1928711, @avl wrote:

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

Honestly I just do not know the answer, but it sounds to me that we can add them, and do
the refactoring you are proposing after whole functionality is done. (i.e. remove them later if it will be possible/reasonable).

MaskRay added a comment.EditedThu, Mar 19, 10:36 AM
In D74169#1928711, @avl wrote:

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

Honestly I just do not know the answer, but it sounds to me that we can add them, and do
the refactoring you are proposing after whole functionality is done. (i.e. remove them later if it will be possible/reasonable).

Honestly I don't know, either. I echo most of comments on https://reviews.llvm.org/D74169#1880922 For -gsplit-dwarf or -gsplit-dwarf=single users, garbage collection of debug info can provide hardly any benefit. For non -gsplit-dwarf use cases, I am concerned whether the pros (avoid an intermediate file/avoid a post-processing step) overweight cons (time/memory/lld size). The link time/memory may increase significantly. I wonder whether the peak memory usage can increase to 2x given our experience https://reviews.llvm.org/D74773#1920751

I am also curious to learn how much + AllTargetsCodeGens + AllTargetsDescs + AllTargetsInfos + AsmPrinter can increase the size of lld executable in a -DBUILD_SHARED_LIBS=off build.

Doing a full DWARF rewriting can save intermediate files. But if the memory/time trade-off is not that appealing, a standalone tool (which can have more configuration options) may be more suitable. That said, we may need some prototype to verify these theories.

How much do we sacrifice in link speed/peak memory usage/lld executable size in return for X% of .debug_* size saving?

avl added a comment.Fri, Mar 20, 5:46 AM

I am also curious to learn how much + AllTargetsCodeGens + AllTargetsDescs + AllTargetsInfos + AsmPrinter can increase the size of lld executable in a -DBUILD_SHARED_LIBS=off build.

Currently, it has zero effect because of --gc-sections.

I asked about using AsmPrinter in lld to check whether there is a fundamental limitation for that.
So far, I understood that the answer depends on the final results of the implementation.

Doing a full DWARF rewriting can save intermediate files. But if the memory/time trade-off is not that appealing, a standalone tool (which can have more configuration options) may be more suitable. That said, we may need some prototype to verify these theories.

I am currently working on that prototype.

How much do we sacrifice in link speed/peak memory usage/lld executable size in return for X% of .debug_* size saving?

When the prototype is ready, it would be possible to evaluate memory/time results.

I did another prototype(not based on dsymutil, though with a very similar idea) D67469.
Its results are in llvm thread http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html.

avl updated this revision to Diff 254852.Fri, Apr 3, 10:43 AM
avl retitled this revision from [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info. to [WIP][LLD][ELF][DebugInfo] Remove obsolete debug info..
avl edited the summary of this revision. (Show Details)

This version is a full patch implementing "remove obsolete debug info from lld" functionality.
As it was requested, It is presented as a full patch, which could be applied to the current code
base and evaluated. Some parts of that patch are already presented for review - D77169, D76085.
I am going to add tests for this patch and to address comments. Performance results would be presented in a separate message.

Limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections.

Testing: it passes check-all lit testing. It is able to compile llvm code base. llvm-dwarfdump --verify shows zero errors for clang built by lld from this patch.

In D74169#1960135, @avl wrote:

This version is a full patch implementing "remove obsolete debug info from lld" functionality.
As it was requested, It is presented as a full patch, which could be applied to the current code
base and evaluated. Some parts of that patch are already presented for review - D77169, D76085.
I am going to add tests for this patch and to address comments. Performance results would be presented in a separate message.

Limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections.

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

Testing: it passes check-all lit testing. It is able to compile llvm code base. llvm-dwarfdump --verify shows zero errors for clang built by lld from this patch.

Do you have memory usage/compile time numbers for linking various programs with/without this feature?

avl added a comment.Fri, Apr 3, 11:21 AM

Size/Performance/Memory results:

lld binary size with this patch(SHARED_LIBS=OFF) - 49925kb.
lld binary size without this patch(SHARED_LIBS=OFF) - 49699kb.

Performance results for building whole LLVM:

--------------------------------------------------------------------------
|   |       CL options                | LLVM bin size  | LLVM build time |   
--------------------------------------------------------------------------
| A | (default set of options*)       |  100.0%(18.0GB)|  100.0%(75m15s) |
|   |                                 |                |                 |
| B | A +gc-debuginfo(**) +NoOdr=false|    54%(9,7GB)  |  124.0%(93m33s) |
|   |                                 |                |                 |
| C | A +gc-debuginfo +NoOdr=false    |    43%(7,7GB)  |  107.0%(80m26s) |
|   |   +fdebug-types-section(***)    |                |                 |
--------------------------------------------------------------------------

Performance results of lld for building single clang binary:

---------------------------------------------------------------------------------
|   |       CL options             | Clang size   | link time | run-time memory |
---------------------------------------------------------------------------------
| A | (default set of options*)    | 100.0%(1,4GB)| 100%(49s) |   (100%)9.3GB   |
|   |                              |              |           |                 |
| B | A +gc-debuginfo +NoOdr=false | 56.0%(0.78GB)| 296%(145s)|   (153%)14.2GB  |
|   |                              |              |           |                 |
| C | A +gc-debuginfo +NoOdr=true  | 86.0%(1.2GB) | 380%(186s)|   (161%)15GB    |
|   |                              |              |           |                 |
| D | A +gc-debuginfo +NoOdr=false | 42.0%(0.59GB)| 133%(65s) |   (113%)10.5GB  |
|   |   +fdebug-types-section(***) |              |           |                 |
---------------------------------------------------------------------------------

(*)
LLVM_TARGETS_TO_BUILD=X86;AArch64
LLVM_ENABLE_PROJECTS=clang;lld
LLVM_ENABLE_LLD=ON
CMAKE_CXX_FLAGS=--ffunction-sections
CMAKE_C_FLAGS=--ffunction-sections
CMAKE_CXX_FLAGS_DEBUG=-g
CMAKE_C_FLAGS_DEBUG=-g
CMAKE_EXE_LINKER_FLAGS=--Wl,--gc-sections
CMAKE_MODULE_LINKER_FLAGS=-Wl,--gc-sections
CMAKE_SHARED_LINKER_FLAGS=-Wl,--gc-sections

(**)

  • gc-debuginfo - removes obsolete debug info. implemented by this patch.
  • NoOdr - mode of DWARFLinker which switches on/off types deduplication. there is no command line option to switch it on/off.
  • fdebug-types-section - deduplicates types using comdat sections.

(***) fdebug-types-section is not supported by dsymutil/DWARFLinker.
Thus presented results do not show real data.
But these numbers could be taken as an estimation for future results
when DWARF5/-fdebug-types-section would be supported by dsymutil/DWARFLinker.

(****) HW configuration:

OS Ubuntu 18.04
CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
RAM 32018 MiB
Storage SSD

avl added a comment.Fri, Apr 3, 12:27 PM

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

it should but I did not check it yet. I would check and post results here.

In D74169#1960444, @avl wrote:

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

To the best of my knowledge (as the person who implemented type unit support in LLVM... ) type units never reduce the size of object files - they only increase it.

There is no type duplication, so far as I know, in LLVM's object files even without type units - one definition of the type is emitted and all the rest of the DWARF references that type. With type units that type is just the skeleton (DW_AT_declaration and DW_AT_signature), without type units that type is the full definition, without the indirection to the type unit.

Are there particular examples of type duplication within a single object file you have in mind? I'd be super curious to see them - might be some bugs that need fixing.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Bag of DWARF doesn't seem like it'd help in a situation where you know you're going to use a DWARF-aware linker (just put all the DWARF in the CU as normal & know the DWARF-aware linker will introduce cross-CU references as needed as things are deduplicated). The idea for Bag of DWARF was with a DWARF agnostic linker you could reference more than one DIE from in a type unit (or a compile unit) to reduce a bunch of the duplication/overhead that type units introduce (eg: when defining a member function of a type unit, you currently have to duplicate the member function's declaration in the CU that references the TU, because you can't refer to that subprogram declaration DIE in the TU - only the type itself).

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

it should but I did not check it yet. I would check and post results here.

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

(***) fdebug-types-section is not supported by dsymutil/DWARFLinker.
Thus presented results do not show real data.
But these numbers could be taken as an estimation for future results
when DWARF5/-fdebug-types-section would be supported by dsymutil/DWARFLinker.

Sorry, sort of duplicating the other thread of conversation we're having on this review - I don't believe type units would help dsymutil/DWARFLinker - nor that this data would be representative of potential improvements. Could you describe how these numbers would be representative of potential improvements? What benefit would type units provide to DWARF-aware linking?

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

Given this is built on top of the DWARFLinker used for llvm-dsymutil which, I believe, already strips out those duplicate descriptions - hopefully that's already included in the functionality under review? But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?

(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

In D74169#1960444, @avl wrote:

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

.debug_types could be much more efficient if we stop duplicating the decl context just so we can refer to a type. Right now if you have a variable, it will point to a CU local type, which duplicates the containing namespaces and class/struct/union decl contexts. So the overhead of .debug_types is a bit higher. It would be great if we could just refer directly to the type from .debug_types in the DW_TAG_variable and other DIEs that have DW_AT_type attributes.

Also, .debug_types is only there for struct, unions, classes and a few other types. When you have a typedef like "std::string::size_type", or a typedef inside of the std::string, it may or may not appear in the .debug_types definition, but no one can point to those types because the way .debug_types is currently defined, only one type can be referred to within a type unit. The results is that any code wanting to use of one these types must duplicate the entire decl context (namespace "std", class "basic_string<>", typedef "size_type") just so a variable can point to the DWARF.

I mention this because I think letting --gc-debuginfo do its thing with ODR type uniquing leaves the DWARF in a better state: type info for one class isn't strewn across a bunch of .o files, or having large swaths of the type duplicated all over the place. So if we have type uniquing, I would keep .debug_types out of it.

We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

Are object files not actually bigger when using .debug_types? All the duplicated decl context junk in the DWARF is not efficient. I would be surprised if type units make .o files smaller. There is nothing to unique within a single .o file and the type units should only make things bigger.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Right now the ODR type uniquing does this, but it doesn't split the types out into their own compile units. The first time you see a type, you emit it and remember it. All other uses of this type and up using a DW_FORM_ref_addr which is 32 bits to refer to the first emitted type. So the ODR already does this. What it doesn't do is split types out into unique compile units based on their DW_AT_decl_file. If we did, you can slowly combine all DWARF that was defined in header files into DW_TAG_compile_unit DIEs with only types from those files. Then the DWARF would be really clean. The main issue with this approach is that it means you have to make all DWARF in memory to that things can lazily be added to each of these bags (compile units with types in them) since you might end up adding more types to each compile unit. With the current ODR support we can parse the .o files one at a time, and refer to things we have already emitted easily and we don't need to keep everything around so we can add things to the "bags".

The downside of the dsymutil ODR technique, is many times you types that have implicitly added methods (default constructor, assignment operators, move constructors) that are only present if the user used them in their source file. So you might end up with many definitions of "class A", but some have the copy constructor in the DWARF (it was created by the compiler for types where the compiler can create them) and others that don't. So we sometimes have to emit the type multiple times when you have a DW_TAG_subprogram that refers to these implicitly created functions. But overall the approach works well for how easy it is to implement.

I am all for creating compile units and adding all of the types declared in that file to them and having the DWARF be even cleaner though!

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

Given this is built on top of the DWARFLinker used for llvm-dsymutil which, I believe, already strips out those duplicate descriptions - hopefully that's already included in the functionality under review? But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?

(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

dsymutil's ODR does unique the types and only emit one copy. See my comment on implicit methods as one thing that can cause multiple copies of a type to be emitted.

avl added a comment.Fri, Apr 3, 2:13 PM

But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?
(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

right. It generates one copy:

$cat test1.cpp

inline void f1() { } void f2(); int main() { f1(); f2(); }

$cat test2.cpp

inline void f1() { } void f2() { f1(); }

$ clang++ test1.cpp test2.cpp -O0 -g -ffunction-sections -fuse-ld=lld -Wl,--gc-sections; llvm-dwarfdump -a a.out | grep "\"f1\""
                DW_AT_name	("f1")
                DW_AT_name	("f1")
0x00000098: "f1"

$ clang++ test1.cpp test2.cpp -O0 -g -ffunction-sections -fuse-ld=lld -Wl,--gc-sections,--gc-debuginfo; llvm-dwarfdump -a a.out | grep "\"f1\""
                DW_AT_name	("f1")
0x0000008b: "f1"