This is an archive of the discontinued LLVM Phabricator instance.

[WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.
Needs ReviewPublic

Authored by avl on Feb 6 2020, 2:17 PM.
Tokens
"Like" token, awarded by Dushistov.

Details

Summary

This patch is the implementation for removing obsolete debug info:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135473.html

It uses DWARFLinker which contains code refactored out from dsymutil.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
grimar added inline comments.Feb 7 2020, 2:01 AM
lld/ELF/LLDDwarfLinker.cpp
13

Is it a final version of the comment? I'd expect to habe more information here probably about what we exactly do.

avl added a comment.Feb 7 2020, 5:23 AM

How many changes do you expect will be needed to make the feature work?

Hi, following is the approximate list of changes, which assumed to match with corresponding patches :

  1. Do not create DWARFContext inplace. Use ObjFile<ELFT>::dwarf and "llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });" instead.
  1. Patch DebugInfoDWARF library to have a possibility to replace the error handler. Someone could already pass own error handler for DWARFContext. But there are places where WithColor::error() is used directly:
void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
  WithColor::error() << toString(std::move(e));
}

Thus, it is necessary to have a possibility to replace ErrorHandler here.

  1. Replace LLDDwarfObj with DWARFObjInMemory implementation while creating DWARFContext:
for (InputFile *file : objectFiles) {
  std::unique_ptr<ObjectFile> ObjectFile = createELFObjectFile(file.mb);

  std::unique_ptr<DWARFContext> DWARFContext::create (ObjectFile, nullptr, ErrorHandler));
}
  1. Create an implementation for lld/ELF/LLDDwarfLinker.cpp::ObjFileAddressMap.
  1. Create special kind of section - "DebugInputSection : public InputSection". To keep the result of debug info linking and to handle relocations.
  1. add LLD`s implementation for DWARFLinker/DWARFLinker.h::DwarfEmitter, which would write the output of DWARFLinker into DebugInputSection.
avl marked 13 inline comments as done.Feb 7 2020, 5:43 AM
avl added inline comments.
lld/ELF/CMakeLists.txt
39

The idea is that the file name matches the class name. The class name has LLD to be different from DWARFLinker from DWARFLinker library. If it looks redundant, then I would delete this prefix.

57

It is used because LLDDwarfLinker.cpp includes DWARFLinker.h, which includes "llvm/CodeGen/Die.h", which needs "llvm::DIEAbbrev::Profile(llvm::FoldingSetNodeID&)" which is in AsmPrinter.

probably it could be refactored out later.

lld/ELF/LLDDwarfLinker.cpp
13

Ok, I would put more detailed comment.

avl added a comment.Feb 7 2020, 6:03 AM

I wonder if that is really important to split patches that much (this patch seems have no functionality at all?). It would be good to see at least some benefit, but perhaps
it is not that easy to do... Anyways:
Personally I'd probably prefer to see a full set of patches for the feature. A reviewer should be able to apply them and debug (if he/she wants to) probably.

Hi George,

I follow the idea is that small patches are easier to review and easier to integrate. That concrete patch does not have a functionality but it includes DWARLinker library, creates option and creates placeholders for the functionality. Which looks like a small amount of things that could be reviewed.

Creating full set of patches which could be any time applied to the upstream and to the each other is quite time consuming thing because of time for synchronization/merging/rebasing. As an alternative we could discuss main points of future patches[0]. And iteratively apply patches step by step. If that plan is not good - then OK, I would start to work on preparing full set of patches for that feature.

[0] following is the approximate list of changes, which assumed to match with corresponding patches :

  1. Do not create DWARFContext inplace. Use ObjFile<ELFT>::dwarf and "llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });" instead.
  1. Patch DebugInfoDWARF library to have a possibility to replace the error handler. Someone could already pass own error handler for DWARFContext. But there are places where WithColor::error() is used directly:
void DWARFUnit::extractDIEsIfNeeded(bool CUDieOnly) {
if (Error e = tryExtractDIEsIfNeeded(CUDieOnly))
  WithColor::error() << toString(std::move(e));
}

Thus, it is necessary to have a possibility to replace ErrorHandler here.

  1. Replace LLDDwarfObj with DWARFObjInMemory implementation while creating DWARFContext:
for (InputFile *file : objectFiles) {
  std::unique_ptr<ObjectFile> ObjectFile = createELFObjectFile(file.mb);

  std::unique_ptr<DWARFContext> DWARFContext::create (ObjectFile, nullptr, ErrorHandler));
}
  1. Create an implementation for lld/ELF/LLDDwarfLinker.cpp::ObjFileAddressMap.
  1. Create special kind of section - "DebugInputSection : public InputSection". To keep the result of debug info linking and to handle relocations.
  1. add LLD`s implementation for DWARFLinker/DWARFLinker.h::DwarfEmitter, which would write the output of DWARFLinker into DebugInputSection.
avl updated this revision to Diff 243205.Feb 7 2020, 9:52 AM

addressed comments.

avl retitled this revision from [LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info. to [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info..Feb 14 2020, 11:48 PM

marked the patch as WIP while preparing full set of patches for the feature.

A few comments from me.

lld/ELF/Config.h
161

Unsorted (shold be placed before gcSections + should be gcDebugInfo I think.

lld/ELF/Driver.cpp
2056

Looking at this piece I think that 915, 916 lines should just be:

config->gcDebuginfo = config->gcSections &&
      args.hasFlag(OPT_gc_debuginfo, OPT_no_gc_debuginfo, false);

Perhaps worth to report an error when we have --gc-debug info, but not --gc-sections, but I am not sure here.

lld/ELF/LLDDwarfLinker.cpp
67

We usually do not do such things in LLD. I.e. do not assert on options when the code is trivial like in your case:

if (config->gcSections && config->gcDebuginfo)
  linkDebugInfo<ELFT>();
84

Please do not use new. Is should be something like:

dwarfContexts.push_back(std::make_unique...

89

It would probably be a bit more natural to:

std::unique_ptr<DwarfLinkerObjFile> Obj = std::make_unique<DwarfLinkerObjFile>(
              file->getName(), dwarfObjFile, emptyWarnings);
Obj->Addresses = ;
debugInfoLinker.addObjectFile(*Obj);
objectsForLinking.push_back(std::move(Obj));
94

Addresses = std::make_unique<ObjFileAddressMap>();

lld/ELF/LLDDwarfLinker.h
18

I am not an expert in English, though would rewrite to something like:

// This function is used to remove an unused parts of debug data
// which belongs to garbage collected sections.

(I guess othes might have a better variant though).

lld/test/ELF/gc-debuginfo.s
7 ↗(On Diff #243205)

No need to use that much asm:

llvm-mc /dev/null -o %t.o -filetype=obj -triple=aarch64-pc-linux

15 ↗(On Diff #243205)

I'd add a test for -r --gc-debuginfo error too since you have this code.

grimar added a comment.EditedFeb 15 2020, 6:27 AM
In D74169#1863848, @avl wrote:

Creating full set of patches which could be any time applied to the upstream and to the each other is quite time consuming thing because of time for synchronization/merging/rebasing.

I do not think them needed to be in state "any time applied to the upstream".
"any time" is indeed too much.

Perhaps we could do something like the following: create a single large patch with all of features which can be applied. And upload it somewhere here as [not for commit].
Then reviewers should be able to review this single patch, but in case of questions they should be able to apply the huge patch to see how the whole feature works.
In this case you'll need to update one huge patch from time to time, e.g. rebase it from time to time after commits of smaller ones. And it should be much simpler.
How does it sound? (the same question for others).

avl added a comment.Feb 16 2020, 1:42 PM

That is OK. The only thing is that I planned to integrate small fixes as soon as they are ready. So I do not have a full implementation at the moment. I will create a single large patch with all of the features sometime later as soon as it would be fully ready.

MaskRay added a comment.EditedFeb 16 2020, 2:22 PM

I agree with

if the additional complexity is not too much, and the new code is nicely isolated from existing code. I think I agree with you that linker is perhaps the best place to drop dead DWARF info.

(One thing I plan to work on is .debug_names accelerator table, which is similar to but more power than .gdb_index.)

On one side, I've heard thoughts that: (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable? (2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?
On the other side, I think this approach may be better than D54747 (reverted by rL360955) and may address the problem it tried to mitigate. Given the savings, we may bend our mind a bit and give more tolerance even if the additional complexity is more than not too much.

I am reluctant to let individual patches in before we can see the benefits, because they could easily turn out to be insufficient and cause churns. There is still possibility that we find --gc-debuginfo not suitable on the linker side and may want to do it in a post-link tool.

I agree with

if the additional complexity is not too much, and the new code is nicely isolated from existing code. I think I agree with you that linker is perhaps the best place to drop dead DWARF info.

(One thing I plan to work on is .debug_names accelerator table, which is similar to but more power than .gdb_index.)

On one side, I've heard thoughts that:
(1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

(2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?

Presumably that'd take significantly longer and use more memory and disk space overall - but, yes, it would get you a running executable sooner. (/somewhat/ like the tradeoff of using .dwo files directly, compared to linking them into a .dwp - dwo files are ready sooner, but take up more space, etc, but dwp takes time to make - good for archival purposes)

But (2) I think is unclear to me - I'd like to know how much more time/memory it takes, if it does take more (it probably will take more) - given that it should produce significantly less output (& at Google at least, output is memory usage - since we write to a memory mapped file (though we use Split DWARF in most places where size/memory usage matters, so this solution isn't immediately relevant there).

(& this also might be reusable/good foundation for DWARF aware linking in llvm-dwp, for what it's worth (again, not clear what the tradeoffs will be like in terms of memory/cpu overhead, output size savings, etc))

On the other side, I think this approach may be better than D54747 (reverted by rL360955) and may address the problem it tried to mitigate. Given the savings, we may bend our mind a bit and give more tolerance even if the additional complexity is more than not too much.

I am reluctant to let individual patches in before we can see the benefits, because they could easily turn out to be insufficient and cause churns. There is still possibility that we find --gc-debuginfo not suitable on the linker side and may want to do it in a post-link tool.

lld/ELF/LLDDwarfLinker.cpp
75–76

Probably should be using make_uniqeu<DWARFContext> too, rather than "unique_ptr(new DWARFContext"?

(& a couple of other opportunities to use make_unique in this code below)

avl added a comment.Feb 19 2020, 2:11 PM
On one side, I've heard thoughts that:
 (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

also, as an theoretical alternative - probably we might want to implement -gsplit-dwarf so that it include only used part of debug info. i.e. that which referenced after garbage collection. As the result size of all .dwo files would be smaller.

(2) DWARFLinker can significantly increase link time. Can we move it to a post-link tool (dsymutil but for ELF)?

Presumably that'd take significantly longer and use more memory and disk space overall - but, yes, it would get you a running executable sooner. (/somewhat/ like the tradeoff of using .dwo files directly, compared to linking them into a .dwp - dwo files are ready sooner, but take up more space, etc, but dwp takes time to make - good for archival purposes)

But (2) I think is unclear to me - I'd like to know how much more time/memory it takes,

I am preparing the full patch for that feature, so that it would be possible to estimate these values: how much more time/memory it takes.

In D74169#1883671, @avl wrote:
On one side, I've heard thoughts that:
 (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

also, as an theoretical alternative - probably we might want to implement -gsplit-dwarf so that it include only used part of debug info. i.e. that which referenced after garbage collection. As the result size of all .dwo files would be smaller.

That's not possible - the .dwo files are produced during compilation time, before the linker has done any garbage collection.

(currently even when building the dwp file from dwo files it's not possible to strip the dead debug info - since there's no feedback from the linker that reaches the dwp tool (dwp and linking might be done in parallel on different machines, for instance) - but it'd be possible to implement something more like dsymutil where the linker generates a side-file address mapping/details of the garbage collection and then that is fed into the dwp tool (this would delay running the dwp tool though))

avl added a comment.Feb 21 2020, 4:19 AM

That's not possible - the .dwo files are produced during compilation time, before the linker has done any garbage collection.

(currently even when building the dwp file from dwo files it's not possible to strip the dead debug info - since there's no feedback from the linker that reaches the dwp tool (dwp and linking might be done in parallel on different machines, for instance) - but it'd be possible to implement something more like dsymutil where the linker generates a side-file address mapping/details of the garbage collection and then that is fed into the dwp tool (this would delay running the dwp tool though))

Right. That is not possible in current scheme. Though we could probably change the scheme(if it would be considered beneficial). One alternative is what you were talking about - dsymutil/dwp tool which would take linker report and .dwo files and produce optimized .dwo files. Another alternative is that .dwo files could be optimized during linking time using the same scheme like current proposal. This proposal would take unoptimized .debug* sections of binary and generate optimized contents of destination .debug* sections during linking. Generally, It is possible to take unoptimized .dwo files and make them optimized.

I do not suggest to do it now. That is just possible alternative which could probably be considered if current proposal would show good results.

Another thing, is that this patch could be useful for -gsplit-dwarf even without optimizing .dwo files.
It could skip DW_TAG_skeleton_unit related to deleted code. So that Skeleton file do not have overlapped address ranges like this :

0x00000014: DW_TAG_skeleton_unit
              DW_AT_dwo_name    ("split1.dwo")
              DW_AT_low_pc      (0x0000000000000000)   <<< not assigned because of deleted code
              DW_AT_high_pc     (0x0000000000000008)
              DW_AT_addr_base   (0x00000008)

0x00000034: DW_TAG_skeleton_unit
              DW_AT_dwo_name    ("split2.dwo")
              DW_AT_low_pc      (0x0000000000000000)   <<< not assigned because of deleted code
              DW_AT_high_pc     (0x0000000000000008)
              DW_AT_addr_base   (0x00000008)

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

One last point is to have a flag that enabled DWARF verification. This can be run before the GC operation to ensure the incoming DWARF is in good shape and again after any modifications, to ensure the DWARF is still valid.

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

I believe all the functionality in llvm-dsymutil related to DWARF linking that implements the features you described above (llvm-dsymutil was initially implemented with bug-for-bug compatibility with Apple dsymutil for safety and since then I think a few of those bugs have been fixed/improvements have been made) & that code is now/already in the llvm/lib/DWARFLinker library so it can be reused in lld for this effort (apologies if I'm preaching to the choir/being redundant here) & any others you might have in mind. No doubt there's probably some refactoring to do to improve its reusability as the integration with lld & other situations is done.

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

I believe all the functionality in llvm-dsymutil related to DWARF linking that implements the features you described above (llvm-dsymutil was initially implemented with bug-for-bug compatibility with Apple dsymutil for safety and since then I think a few of those bugs have been fixed/improvements have been made) & that code is now/already in the llvm/lib/DWARFLinker library so it can be reused in lld for this effort (apologies if I'm preaching to the choir/being redundant here) & any others you might have in mind. No doubt there's probably some refactoring to do to improve its reusability as the integration with lld & other situations is done.

That is great to hear! Sorry I haven't looked around at that code in a while, but glad to hear it is all ready to go.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time. I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac. That allows people to get the benefits of the .dwo performance improvements for the edit/compile /debug cycle and then link it into a minimal and optimized DWARF, using contribution from this patch and others to come that optimize the DWARF, for permanent storage on build servers.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time. I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

Note that such post-processing is probably more suited to llvm-objcopy than lld, though of course they can use the same underlying LLVM library.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

It's "complete" in the sense that it's whatever GDB wanted and created a contract that GCC (& Clang/LLVM as best we know/have done so far) agree too, so GDB can rely on things being present/not present in some sense.

That's why debug_gnu_pubnames existed - because of, as you say, the underspecified nature of (or unreliable implementation of) debug_pubnames.

I believe debug_names semantics have been more strictly specified regarding what names will be there and what names will not.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

That allows people to get the benefits of the .dwo performance improvements for the edit/compile /debug cycle and then link it into a minimal and optimized DWARF, using contribution from this patch and others to come that optimize the DWARF, for permanent storage on build servers.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

It's "complete" in the sense that it's whatever GDB wanted and created a contract that GCC (& Clang/LLVM as best we know/have done so far) agree too, so GDB can rely on things being present/not present in some sense.

That's why debug_gnu_pubnames existed - because of, as you say, the underspecified nature of (or unreliable implementation of) debug_pubnames.

Perfect, that is what I was hoping for.

I believe debug_names semantics have been more strictly specified regarding what names will be there and what names will not.

Yes, it indeed fully specifies what is needed. Thanks for the info.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

For the linking, I was more thinking of converting the DWP format back into vanilla DWARF with no DWO/DWP artifacts (skeleton DIEs etc) or any other redirections (string table offsets, address tables, etc). Not sure if this would reduce the DWARF size of all of the DWO/DWP instrstructure was all removed?

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

For the linking, I was more thinking of converting the DWP format back into vanilla DWARF with no DWO/DWP artifacts (skeleton DIEs etc) or any other redirections (string table offsets, address tables, etc). Not sure if this would reduce the DWARF size of all of the DWO/DWP instrstructure was all removed?

Ah, sorry, I see what you mean. Well, the offsets systems in some cases save total size even once linked (eg: having a single 64 bit address with two short uleb indexes compared to two 64 bit addresses - even once the binary is linked and there's no extra bytes of relocation records for that address/addresses) & these features are no longer Split DWARF specific - Clang/LLVM generates almost (entirely, maybe?) all these constructs the same way with split or unsplit DWARF when emitting DWARFv5.

Might save a bit - certainly something someone could experiment with as improvements to DWARFLinker - for dsymutil, lld, and maybe dwp in the future.

(one of the cheap ones that'd be neat to see would be debug_rnglists optimization - wouldn't need to parse DWARF DIEs, etc, would be an entirely local optimization - may be useful for PROPELLER (~= -fbasic-block-sections) - because of the indirection in debug_rnglists (though the rnglist header's offset table) you could totally rewrite range lists to coalesce any contiguous ranges that end up together & rewriting the offsets in the offset table - though this optimization would benefit from being in both lld and dsymutil I think - not applicable to dwp unless address information is passed from the linker like dsymutil)

avl added a comment.Mar 18 2020, 5:09 AM

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

In D74169#1928711, @avl wrote:

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

Honestly I just do not know the answer, but it sounds to me that we can add them, and do
the refactoring you are proposing after whole functionality is done. (i.e. remove them later if it will be possible/reasonable).

MaskRay added a comment.EditedMar 19 2020, 10:36 AM
In D74169#1928711, @avl wrote:

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

Honestly I just do not know the answer, but it sounds to me that we can add them, and do
the refactoring you are proposing after whole functionality is done. (i.e. remove them later if it will be possible/reasonable).

Honestly I don't know, either. I echo most of comments on https://reviews.llvm.org/D74169#1880922 For -gsplit-dwarf or -gsplit-dwarf=single users, garbage collection of debug info can provide hardly any benefit. For non -gsplit-dwarf use cases, I am concerned whether the pros (avoid an intermediate file/avoid a post-processing step) overweight cons (time/memory/lld size). The link time/memory may increase significantly. I wonder whether the peak memory usage can increase to 2x given our experience https://reviews.llvm.org/D74773#1920751

I am also curious to learn how much + AllTargetsCodeGens + AllTargetsDescs + AllTargetsInfos + AsmPrinter can increase the size of lld executable in a -DBUILD_SHARED_LIBS=off build.

Doing a full DWARF rewriting can save intermediate files. But if the memory/time trade-off is not that appealing, a standalone tool (which can have more configuration options) may be more suitable. That said, we may need some prototype to verify these theories.

How much do we sacrifice in link speed/peak memory usage/lld executable size in return for X% of .debug_* size saving?

avl added a comment.Mar 20 2020, 5:46 AM

I am also curious to learn how much + AllTargetsCodeGens + AllTargetsDescs + AllTargetsInfos + AsmPrinter can increase the size of lld executable in a -DBUILD_SHARED_LIBS=off build.

Currently, it has zero effect because of --gc-sections.

I asked about using AsmPrinter in lld to check whether there is a fundamental limitation for that.
So far, I understood that the answer depends on the final results of the implementation.

Doing a full DWARF rewriting can save intermediate files. But if the memory/time trade-off is not that appealing, a standalone tool (which can have more configuration options) may be more suitable. That said, we may need some prototype to verify these theories.

I am currently working on that prototype.

How much do we sacrifice in link speed/peak memory usage/lld executable size in return for X% of .debug_* size saving?

When the prototype is ready, it would be possible to evaluate memory/time results.

I did another prototype(not based on dsymutil, though with a very similar idea) D67469.
Its results are in llvm thread http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html.

avl updated this revision to Diff 254852.Apr 3 2020, 10:43 AM
avl retitled this revision from [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info. to [WIP][LLD][ELF][DebugInfo] Remove obsolete debug info..
avl edited the summary of this revision. (Show Details)

This version is a full patch implementing "remove obsolete debug info from lld" functionality.
As it was requested, It is presented as a full patch, which could be applied to the current code
base and evaluated. Some parts of that patch are already presented for review - D77169, D76085.
I am going to add tests for this patch and to address comments. Performance results would be presented in a separate message.

Limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections.

Testing: it passes check-all lit testing. It is able to compile llvm code base. llvm-dwarfdump --verify shows zero errors for clang built by lld from this patch.

In D74169#1960135, @avl wrote:

This version is a full patch implementing "remove obsolete debug info from lld" functionality.
As it was requested, It is presented as a full patch, which could be applied to the current code
base and evaluated. Some parts of that patch are already presented for review - D77169, D76085.
I am going to add tests for this patch and to address comments. Performance results would be presented in a separate message.

Limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections.

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

Testing: it passes check-all lit testing. It is able to compile llvm code base. llvm-dwarfdump --verify shows zero errors for clang built by lld from this patch.

Do you have memory usage/compile time numbers for linking various programs with/without this feature?

avl added a comment.Apr 3 2020, 11:21 AM

Size/Performance/Memory results:

lld binary size with this patch(SHARED_LIBS=OFF) - 49925kb.
lld binary size without this patch(SHARED_LIBS=OFF) - 49699kb.

Performance results for building whole LLVM:

--------------------------------------------------------------------------
|   |       CL options                | LLVM bin size  | LLVM build time |   
--------------------------------------------------------------------------
| A | (default set of options*)       |  100.0%(18.0GB)|  100.0%(75m15s) |
|   |                                 |                |                 |
| B | A +gc-debuginfo(**) +NoOdr=false|    54%(9,7GB)  |  124.0%(93m33s) |
|   |                                 |                |                 |
| C | A +gc-debuginfo +NoOdr=false    |    43%(7,7GB)  |  107.0%(80m26s) |
|   |   +fdebug-types-section(***)    |                |                 |
--------------------------------------------------------------------------

Performance results of lld for building single clang binary:

---------------------------------------------------------------------------------
|   |       CL options             | Clang size   | link time | run-time memory |
---------------------------------------------------------------------------------
| A | (default set of options*)    | 100.0%(1,4GB)| 100%(49s) |   (100%)9.3GB   |
|   |                              |              |           |                 |
| B | A +gc-debuginfo +NoOdr=false | 56.0%(0.78GB)| 296%(145s)|   (153%)14.2GB  |
|   |                              |              |           |                 |
| C | A +gc-debuginfo +NoOdr=true  | 86.0%(1.2GB) | 380%(186s)|   (161%)15GB    |
|   |                              |              |           |                 |
| D | A +gc-debuginfo +NoOdr=false | 42.0%(0.59GB)| 133%(65s) |   (113%)10.5GB  |
|   |   +fdebug-types-section(***) |              |           |                 |
---------------------------------------------------------------------------------

(*)
LLVM_TARGETS_TO_BUILD=X86;AArch64
LLVM_ENABLE_PROJECTS=clang;lld
LLVM_ENABLE_LLD=ON
CMAKE_CXX_FLAGS=--ffunction-sections
CMAKE_C_FLAGS=--ffunction-sections
CMAKE_CXX_FLAGS_DEBUG=-g
CMAKE_C_FLAGS_DEBUG=-g
CMAKE_EXE_LINKER_FLAGS=--Wl,--gc-sections
CMAKE_MODULE_LINKER_FLAGS=-Wl,--gc-sections
CMAKE_SHARED_LINKER_FLAGS=-Wl,--gc-sections

(**)

  • gc-debuginfo - removes obsolete debug info. implemented by this patch.
  • NoOdr - mode of DWARFLinker which switches on/off types deduplication. there is no command line option to switch it on/off.
  • fdebug-types-section - deduplicates types using comdat sections.

(***) fdebug-types-section is not supported by dsymutil/DWARFLinker.
Thus presented results do not show real data.
But these numbers could be taken as an estimation for future results
when DWARF5/-fdebug-types-section would be supported by dsymutil/DWARFLinker.

(****) HW configuration:

OS Ubuntu 18.04
CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
RAM 32018 MiB
Storage SSD

avl added a comment.Apr 3 2020, 12:27 PM

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

it should but I did not check it yet. I would check and post results here.

In D74169#1960444, @avl wrote:

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

To the best of my knowledge (as the person who implemented type unit support in LLVM... ) type units never reduce the size of object files - they only increase it.

There is no type duplication, so far as I know, in LLVM's object files even without type units - one definition of the type is emitted and all the rest of the DWARF references that type. With type units that type is just the skeleton (DW_AT_declaration and DW_AT_signature), without type units that type is the full definition, without the indirection to the type unit.

Are there particular examples of type duplication within a single object file you have in mind? I'd be super curious to see them - might be some bugs that need fixing.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Bag of DWARF doesn't seem like it'd help in a situation where you know you're going to use a DWARF-aware linker (just put all the DWARF in the CU as normal & know the DWARF-aware linker will introduce cross-CU references as needed as things are deduplicated). The idea for Bag of DWARF was with a DWARF agnostic linker you could reference more than one DIE from in a type unit (or a compile unit) to reduce a bunch of the duplication/overhead that type units introduce (eg: when defining a member function of a type unit, you currently have to duplicate the member function's declaration in the CU that references the TU, because you can't refer to that subprogram declaration DIE in the TU - only the type itself).

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

it should but I did not check it yet. I would check and post results here.

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

(***) fdebug-types-section is not supported by dsymutil/DWARFLinker.
Thus presented results do not show real data.
But these numbers could be taken as an estimation for future results
when DWARF5/-fdebug-types-section would be supported by dsymutil/DWARFLinker.

Sorry, sort of duplicating the other thread of conversation we're having on this review - I don't believe type units would help dsymutil/DWARFLinker - nor that this data would be representative of potential improvements. Could you describe how these numbers would be representative of potential improvements? What benefit would type units provide to DWARF-aware linking?

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

Given this is built on top of the DWARFLinker used for llvm-dsymutil which, I believe, already strips out those duplicate descriptions - hopefully that's already included in the functionality under review? But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?

(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

In D74169#1960444, @avl wrote:

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

.debug_types could be much more efficient if we stop duplicating the decl context just so we can refer to a type. Right now if you have a variable, it will point to a CU local type, which duplicates the containing namespaces and class/struct/union decl contexts. So the overhead of .debug_types is a bit higher. It would be great if we could just refer directly to the type from .debug_types in the DW_TAG_variable and other DIEs that have DW_AT_type attributes.

Also, .debug_types is only there for struct, unions, classes and a few other types. When you have a typedef like "std::string::size_type", or a typedef inside of the std::string, it may or may not appear in the .debug_types definition, but no one can point to those types because the way .debug_types is currently defined, only one type can be referred to within a type unit. The results is that any code wanting to use of one these types must duplicate the entire decl context (namespace "std", class "basic_string<>", typedef "size_type") just so a variable can point to the DWARF.

I mention this because I think letting --gc-debuginfo do its thing with ODR type uniquing leaves the DWARF in a better state: type info for one class isn't strewn across a bunch of .o files, or having large swaths of the type duplicated all over the place. So if we have type uniquing, I would keep .debug_types out of it.

We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

Are object files not actually bigger when using .debug_types? All the duplicated decl context junk in the DWARF is not efficient. I would be surprised if type units make .o files smaller. There is nothing to unique within a single .o file and the type units should only make things bigger.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Right now the ODR type uniquing does this, but it doesn't split the types out into their own compile units. The first time you see a type, you emit it and remember it. All other uses of this type and up using a DW_FORM_ref_addr which is 32 bits to refer to the first emitted type. So the ODR already does this. What it doesn't do is split types out into unique compile units based on their DW_AT_decl_file. If we did, you can slowly combine all DWARF that was defined in header files into DW_TAG_compile_unit DIEs with only types from those files. Then the DWARF would be really clean. The main issue with this approach is that it means you have to make all DWARF in memory to that things can lazily be added to each of these bags (compile units with types in them) since you might end up adding more types to each compile unit. With the current ODR support we can parse the .o files one at a time, and refer to things we have already emitted easily and we don't need to keep everything around so we can add things to the "bags".

The downside of the dsymutil ODR technique, is many times you types that have implicitly added methods (default constructor, assignment operators, move constructors) that are only present if the user used them in their source file. So you might end up with many definitions of "class A", but some have the copy constructor in the DWARF (it was created by the compiler for types where the compiler can create them) and others that don't. So we sometimes have to emit the type multiple times when you have a DW_TAG_subprogram that refers to these implicitly created functions. But overall the approach works well for how easy it is to implement.

I am all for creating compile units and adding all of the types declared in that file to them and having the DWARF be even cleaner though!

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

Given this is built on top of the DWARFLinker used for llvm-dsymutil which, I believe, already strips out those duplicate descriptions - hopefully that's already included in the functionality under review? But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?

(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

dsymutil's ODR does unique the types and only emit one copy. See my comment on implicit methods as one thing that can cause multiple copies of a type to be emitted.

avl added a comment.Apr 3 2020, 2:13 PM

But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?
(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

right. It generates one copy:

$cat test1.cpp

inline void f1() { } void f2(); int main() { f1(); f2(); }

$cat test2.cpp

inline void f1() { } void f2() { f1(); }

$ clang++ test1.cpp test2.cpp -O0 -g -ffunction-sections -fuse-ld=lld -Wl,--gc-sections; llvm-dwarfdump -a a.out | grep "\"f1\""
                DW_AT_name	("f1")
                DW_AT_name	("f1")
0x00000098: "f1"

$ clang++ test1.cpp test2.cpp -O0 -g -ffunction-sections -fuse-ld=lld -Wl,--gc-sections,--gc-debuginfo; llvm-dwarfdump -a a.out | grep "\"f1\""
                DW_AT_name	("f1")
0x0000008b: "f1"
avl added a comment.Apr 4 2020, 9:08 AM

To the best of my knowledge (as the person who implemented type unit support in LLVM... ) type units never reduce the size of object files - they only increase it.

There is no type duplication, so far as I know, in LLVM's object files even without type units - one definition of the type is emitted and all the rest of the DWARF references that type. With type units that type is just the skeleton (DW_AT_declaration and DW_AT_signature), without type units that type is the full definition, without the indirection to the type unit.

Are there particular examples of type duplication within a single object file you have in mind? I'd be super curious to see them - might be some bugs that need fixing.

Sorry. I need to stop writing letters in the late evening.
type units make object files bigger. Thus, type units could not be used for reducing object files.

Mainly I am talking about two things:

1 -fdebug-types-section allows processing debug info faster.

dsymutil/DWARFLinker performance heavily depends on the size of input DWARF.
The more DWARF to parse, the more time the linking process would take.
Time to parse whole DWARF with ODR type deduplication(+gc-debuginfo +NoOdr=false) for clang is:

296%(145s)

Resulting DWARF is accurate: no type duplications, no extra bytes needed for type units. But it's processing time: 145s.

Time to parse in case -fdebug-types-section for clang is:
(most of types are in .debug_types section and ignored currently, thus
only .debug_info/.debug_line/.debug_ranges/.debug_loc are parsed)

133%(65s)

This difference is because not necessary to parse type information.
Comdat sections are quickly resolved by the linker based on section id.
Comparing hash id instead of analyzing context makes it work much faster.

So separating type info from the whole DWARF allows to noticeable speedup the debug info linking.
DWARFLinker will speedup debug info processing in the current scheme by supporting type units(-fdebug-types-section).
It looks like, linking time would be close to 133%(65s).

2 Something like "bag of dies" could avoid increasing size of data(created by type units).

This idea is not for the current state. This is a possible DWARF format change.

The linker processes comdat sections with type units quickly because it does not parse contents.
It uses the hash id to compare and either keep either decline the section.
Adding hash id makes processing faster, but it requires type units.

To reduce size of type units it is possible to implement "bag of dies" idea.
Something like this https://reviews.llvm.org/P8164 (global type table/bag of dies - means something similar).
But how "bag of dies" from different object files would be merged ? When single type is put in single
section - it could be merged per type basis. The similar approach for "bag of dies"
would introduce type duplication(the same types could be used in various "bag of dies").

DWARFLinker could be used for merging "bag of dies" instead of lld linker.
It would merge incoming "bag of dies" into single output "bag of dies".
To avoid heavy parsing and context analyzing DWARFLinker
could use solution similar to comdat solution - whether current die should be put into
resulting "bag of dies" would be decided by compare hash id(DW_AT_signature).

It looks like this solution would allow having a minimal size and fast processing.

Bag of DWARF doesn't seem like it'd help in a situation where you know you're going to use a DWARF-aware linker (just put all the DWARF in the CU as normal & know the DWARF-aware linker will introduce cross-CU references as needed as things are deduplicated). The idea for Bag of DWARF was with a DWARF agnostic linker you could reference more than one DIE from in a type unit (or a compile unit) to reduce a bunch of the duplication/overhead that type units introduce (eg: when defining a member function of a type unit, you currently have to duplicate the member function's declaration in the CU that references the TU, because you can't refer to that subprogram declaration DIE in the TU - only the type itself).

right, DWARF-aware linker doing de-duplication using ODR does not need "Bag of DWARF".
But doing de-duplication using ODR works slower than doing de-duplication for types marked with hash.
To add hash to the type it is necessary to put it into "Bag of DWARF".

It seems that Bag of DWARF used with DWARF agnostic linker would have a problem of type duplication.
"Bag of DWARF" from several object files could have duplicated types.
I think DWARF-aware linker is necessary to deduplicate "Bags of DWARF".

We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

yes. that is exactly what I am talking about. using type hash with elements of that type unit(bag of dies) would allow to speedup types de-duplication.

Rather than replying point-by-point, since I think the conversation has got a bit jumbled up - and maybe it's all best left to a separate thread than in this review... but:

The initial idea for bag-of-dwarf was to allow a type unit to expose more than one DIE (so CUs could reference nested types, member functions, parameters, etc). Also potentially to let CUs export entities that are known to have one home & that other DWARF might want to reference (this is a bit more of a stretch - but you could imagine a type with a strong vtable, no point putting it in a type unit, so you put it in the CU - but you could, if you wanted, still expose the type and its member functions as referenceable DIEs that could be referenced from other CUs rather than by having to redeclare (though not define) the entity in that other CU - inter-object-file-DIE references).

I see your point somewhat about how type units could integrate with DWARF-aware linking, but perhaps only in a very specific way - At least as LLVM emits type units, the types should always be identical (the type unit should never contain implicit special members or member function template instantiations, etc) & the variable component of a type (those implicit special members, etc) only ever appear on the declaration of the type that references to the type unit. If you knew that was the kind of input you were getting, you could rely on every type unit being identical (well, with GCC you can rely on each type unit being identical, but the same ODR type might be in multiple distinct type units because it uses exact hashing of the type unit contents, and the contents varies with implicit special members, etc)... Anyway, if you could rely on the type units for a given ODR type being identical or at least equivalent (no implicit special members, etc) - you could just use the first one you saw, drop the rest, and any new members you see can just be attached to a type declaration the way Clang does them today... hmm - I guess that'd work with GCC's output too, just ignoring any duplicate type units even though they aren't quite equivalent - they'd be no worse than Clang's type units. (Clang's never contain any of the special members, GCC's contain the used one - so if your type unit ended up being an arbitrarily chosen one from GCC, it wouldn't be any worse than CLang's - it'd just have some, but not all, the special members, member function templates, etc)

Hmm, I guess no reason you couldn't do this without type units though - maybe - depending on how debuggers handle it. Maybe they aren't good with member function declarations in type declarations if those type declarations aren't referencing type units....

eg: existing debug info with type units can look something like this:

DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.1
    DW_AT_name "t1"
    DW_AT_signature 0x42
    DW_AT_declaration true
  DW_TAG_variable
    DW_AT_type // -> DIE t1.1
DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.2
    DW_AT_name "t1"
    DW_AT_signature 0x42
    DW_AT_declaration true
    DW_TAG_subprogram // DIE: t1::t1
      DW_AT_name "t1"
      DW_AT_declaration
      DW_AT_artificial true
  DW_TAG_subprogram
    DW_AT_specification // -> DIE t1::t1
    // etc... 
  DW_TAG_variable
    DW_AT_type // -> DIE t1.2

type unit, signature 0x42, type offset -> DIE t1.tu
DW_TAG_type_unit
  DW_TAG_structure_type // DIE: t1.tu
    DW_AT_name "t1"
    // line/file/column/size/etc... 
    // let's say there are no explicit members, for the sake of simplicity


Then, without type units you could emit something very similar
DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.1
    DW_AT_name "t1"
    // line/file/column/size/etc... 
  DW_TAG_variable
    DW_AT_type // -> DIE t1.1
DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.2
    DW_AT_name "t1"
    DW_AT_signature 0x42
    DW_AT_declaration true
    DW_TAG_subprogram // DIE: t1::t1
      DW_AT_name "t1"
      DW_AT_declaration
      DW_AT_artificial true
  DW_TAG_subprogram
    DW_AT_specification // -> DIE t1::t1
    // etc... 
  DW_TAG_variable
    DW_AT_type // -> DIE t1.2

ie: don't go back and try to make t1.1 complete/merge all the contents, just tack things on to CU-local declarations whenever you come across a new member that has a definition in this translation unit. (ie: when the linker keeps the code for t1::t1 alive, then keep the subprogram definition alive, which keeps the subprogram declaration alive, which keeps the t1.2 declaration alive - but otherwise skip it and refer to t1.1 directly (or, in a first approximation, just always keep the declaration - but I guess dsymutil already has the smarts to strip the local type definition out entirely (does it use a declaration to reduce encoding length if there's lots of references to a type? Probably not, I guess))

This sort of complex design discussion probably merits some in person or video chat discussions or at least lengthy design discussion on llvm-dev.

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

avl added a comment.Apr 6 2020, 4:25 AM

@dblaikie

This sort of complex design discussion probably merits some in person or video chat discussions or at least lengthy design discussion on llvm-dev.

Agreed. I think it would be better to dicuss it through email and extend it to llvm-dev later. I would start that email thread.

avl added a comment.Apr 6 2020, 4:26 AM

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

no, I have not done yet. I would do and post results here.

grimar added a comment.Apr 6 2020, 4:46 AM

I've put a few first general comments and suggestions inline.

side note: I've noticed you have a build time 75m15s, what looks to me too much for your configuration. Do you know about "-DLLVM_OPTIMIZED_TABLEGEN=1"? It should give an instant significant boost.

lld/ELF/DWARF.cpp
56

Doesn't seem you really need to know the number of sections?
Hence you can use DenseSet<StringRef> seen; (btw, 'seen' is the common name for sets used in a few similar situations in LLD code).

63

sec.Name.size() > 0 -> !sec.empty()

161

Comments should be started from an uppercase letter: check -> Check.
(applies here and for other comments)

166

Seems this assumes that relocations are sorted by r_offset? This is probably a wrong assumpting.
We have the code, for example, which uses the same thing (and perhaps could be reused somehow, not sure):

// Returns the index of the first relocation that points to a region between
// Begin and Begin+Size.
template <class IntTy, class RelTy>
static unsigned getReloc(IntTy begin, IntTy size, const ArrayRef<RelTy> &rels,
                         unsigned &relocI) {
  // Start search from RelocI for fast access. That works because the
  // relocations are sorted in .eh_frame.
  for (unsigned n = rels.size(); relocI < n; ++relocI) {
    const RelTy &rel = rels[relocI];
    if (rel.r_offset < begin)
      continue;

But it says that relocations are sorted in .eh_frame and I believe it is true,
but generally there is no requirement in ELF specification about that relocations must be sorted by offset I think.

183

Why exactly R_ABS/is it important? (Perhaps needs a comment).

lld/ELF/DWARF.h
84

Where it is used? I\ve not found it in this patch.

122

Is it used somewhere outside the template <class ELFT> LLDDwarfObj<ELFT>::LLDDwarfObj(ObjFile<ELFT> *obj) ?
Can it be a local variable?

lld/ELF/Driver.cpp
348

You do not need to have curly bracers around a single code line (the rule to look around to see the style often works good).

lld/ELF/InputSection.h
127

Why it is needed to store this bit? The code calls isDebugSection, is there something wrong with the current approach?

lld/ELF/LLDDwarfLinker.cpp
59

It should be InputSectionBase.

60

I'd suggest something shorter, like secToIndex

65

Prefer ++foo over foo++.

66

btw, what is so wrong with the following version?

for (InputSectionBase *sec : objFile->getSections())
  sectionIndexesMap[sec] = curIdx++;

(i.e. I probably see no issue with the case when sec == nullptr, as the code below handes this case anyways).

75

Continue early?

if (!sec || !sec->isLive() || !sectionIndexesMap.count(sec))
  continue;
104

We often call such things ret. Up to you.

141

This could be a single line, no need toadd an addend later.

144

Why this assert is useful? What scenario did you have in mind?

146

I think you can just inline Value.

194

An excessive empty line.

195

This comment needs expanding I think. "Estimation" is something that usually sounds tricky and requires an explanation of your algorithm implemented.

197

Don't use auto when the type is not obvious.

200

No need to have curly bracers here.

grimar added inline comments.Apr 6 2020, 4:46 AM
lld/ELF/LLDDwarfLinker.cpp
231

foo.size() > 0 -> !foo.empty() when foo is StringRef

257

emptyWarnings is unused. Is it OK? You can inline it if so.

267

Map what to what?

272

The way you work with Expected<> is incorrect. It might fail on error with LLVM_ENABLE_ABI_BREAKING_CHECKS=1.
You need to handle the error or to ignore it explicitly. E.g.:

Expected<GlobPattern> pat = GlobPattern::create(arg);
if (!pat) {
  error("--undefined-glob: " + toString(pat.takeError()));
  return;
}
284

Don't use auto here.

293

StringRef() -> ""

But I think it can be just:

for (auto &sec : outputSections)
  if (isDebugSection(sec->name))
    sec->setDebugData(sectionsMap.lookup(sec->name));
lld/ELF/Options.td
196

When you add an LLD option, you should also update the documentation (lld/docs/ld.lld.1)

lld/ELF/OutputSections.cpp
333

Needs a comment.

lld/ELF/OutputSections.h
128

Did you consider to use llvm::Optional<StringRef>?

llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
460

You do not need to use else after return:

if (Map.getTriple().isOSDarwin() && !Map.getBinaryPath().empty() &&
    Options.FileType == OutputFileType::Object)
  return MachOUtils::generateDsymCompanion(
      Map, Options.Translator, *Streamer->getAsmPrinter().OutStreamer,
      OutFile);
return Streamer->finish(), true;
461

Something is wrong with this line. , true is a part of something else it seems.

avl marked 32 inline comments as done.Apr 6 2020, 10:33 AM
avl added inline comments.
lld/ELF/DWARF.cpp
56

There should be three states here(seen, not seen, seen more than once). DWARFFormValue::dumpAddressSection() needs this to properly dump die address. If name seen more than once it prints section index.

// Print section index if name is not unique.
if (!SecRef.IsNameUnique)
  OS << format(" [%" PRIu64 "]", SectionIndex);
166

right. It assumes that relocations are sorted by r_offset.
It is sorted in scanRelocs(). But I missed that it is sorted not for all types of sections.
Would fix it , Thanks!

lld/ELF/DWARF.h
84

It is used at DWARFFormValue::dumpAddressSection(). When address is dumped, DWARFObject could be asked for section names(LLDDwarfObj is a child of DWARFObject). It could happen when Verbose mode is set for DWARFLinker.

122

Yes, it could be used outside of LLDDwarfObj() constructor. When DWARFObject::getSectionNames() is called the sectionNames is used.
One such place is DWARFFormValue::dumpAddressSection().

lld/ELF/InputSection.h
127

To not do string compare in every usage. i.e. that bit is set in the constructor and at every usage place is checked. Otherwise, it would be necessary to compare strings at every usage place.

lld/ELF/LLDDwarfLinker.cpp
66

The idea was to not put error data into sectionIndexesMap. sec is indeed checked later for nullptr.
But it looks like not putting wrong data is better approach than relying on following checks.

144

To check that relocation passed to the handler is from baseOffset, baseOffset + data.size() range.
i.e. to check that enumerateRelocations works correctly and pass proper relocations to the handler.

257

it is used only in std::make_unique<DwarfFile>.

If emptyWarnings is used inline then it`s constructor would be called for each ObjectFile. Probably we could rely on compiler here...

267

Would add comment.

grimar added a comment.Apr 7 2020, 3:20 AM

Few more comments. They are mostly to give a feedback about the code style, I have not look deep into how it works yet.

lld/ELF/DWARF.cpp
29

I'd call it sectionNumber, having Map in the name here is not useful.

53

To avoid nesting, you can use an early continue:

if (!sec) {
  sectionNames.push_back({"", true});
  continue;
}
...
56

Ah, sorry, I've misread the code slightly, I did not notice the codition is sectionAmountMap[sec.Name] > 1 and not sectionAmountMap[sec.Name] > 0.

63

It is better to avoid using operator[] in situations like this, because
it is not const, when a sec.Name is not found it creates a new map entry. This is not what you want here,
you can use lookup().

lld/ELF/InputSection.h
127

Yeah, but did you measure numbers, i.e. is there a visible benefit? The code in this diff still calls isDebugSection in many places. It looks like a premature optimization to me and/or something that could probably be moved to another patch (which could change all call sites at once).

lld/ELF/LLDDwarfLinker.cpp
207

I think the returned type of this lambda can be deduced:

[](StringRef S) { return S; }

219

This 2 lambdas looks similar. Also you have 2 more below.
It seems you could do something like the following?

auto ReportWarn = [] [&](const Twine &msg, StringRef ctx, const DWARFDie *) {
if (!context.empty())
  warn(ctx+ ": " + msg);
else
  warn(msg);
};

DwarfStreamer outStreamer(..., ReportWarn, ReportWarn);
257

I'd inline to make the code shorter/cleaner. It doesn't look as hot path, so no need to optimize it until a benefit is proven.

avl marked an inline comment as done.Apr 9 2020, 5:25 AM
avl added inline comments.
lld/ELF/DWARF.cpp
166

@grimar @ruiu It looks like it was considered safe to rely on sorted order of relocations - D36079. What do you think ?

grimar added inline comments.Apr 9 2020, 6:18 AM
lld/ELF/DWARF.cpp
166

I do not know. ELF specification does not says they must be. Perhaps we can at least check this with std::is_sorted.
I see lld/Wasm does it:

ArrayRef<WasmRelocation> relocs = section->Relocations;
assert(std::is_sorted(relocs.begin(), relocs.end(),
                      [](const WasmRelocation &r1, const WasmRelocation &r2) {
                        return r1.Offset < r2.Offset;
                      }));

assert will not help in release build though.
Perhaps other people can comment on this, as you see my concern about D36079 was not answered that time it seems.

avl marked an inline comment as done.Apr 9 2020, 6:43 AM
avl added inline comments.
lld/ELF/DWARF.cpp
166

I see. I also did not find any reference to documentation which mentioned that relocations must be sorted(nor ELF file specification, nor DWARF specification).

grimar added inline comments.Apr 9 2020, 7:37 AM
lld/ELF/DWARF.cpp
166

I found it was answered actually:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170731/474875.html
Then it is OK I think.

avl updated this revision to Diff 256333.Apr 9 2020, 10:33 AM

addressed comments:

added test for -flto case.
resolved style issues.

I did not change isDebug bit yet. would make it with separate review.

avl added a comment.Apr 9 2020, 10:43 AM

@jhenderson

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

loading times for debuggers:

no gc-debuginfo case:
gdb 8.1 clang11 1,4G: loading time 56 sec used memory 2166872kb
lldb 8.0 clang11 1,4G: loading time 7.5 sec used memory 1508860kb

gc-debuginfo case:
gdb 8.1 clang11 783M: loading time 51 sec used memory 1741552kb
lldb 8.1 clang11 783M: loading time 4 sec used memory 842980kb

Overall result: gc-debuginfo makes loading time shorter and reduces memory usage for debuggers,

In D74169#1972348, @avl wrote:

@jhenderson

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

loading times for debuggers:

no gc-debuginfo case:
gdb 8.1 clang11 1,4G: loading time 56 sec used memory 2166872kb
lldb 8.0 clang11 1,4G: loading time 7.5 sec used memory 1508860kb

gc-debuginfo case:
gdb 8.1 clang11 783M: loading time 51 sec used memory 1741552kb
lldb 8.1 clang11 783M: loading time 4 sec used memory 842980kb

Overall result: gc-debuginfo makes loading time shorter and reduces memory usage for debuggers,

Might be worth comparing with -gdb-index too (though not sure if gdb-index will be immediately compatible with the DWARF aware linking, so a difficult comparison - but even comparing "gdb-index without DWARF-aware linking V DWARF-aware linking without gdb-index", etc)

avl added a comment.Apr 10 2020, 4:06 AM

Might be worth comparing with -gdb-index too (though not sure if gdb-index will be immediately compatible with the DWARF aware linking, so a difficult comparison - but even comparing "gdb-index without DWARF-aware linking V DWARF-aware linking without gdb-index", etc)

here are the results for clang binary:

-------------------------------------------------------------
|     Options    | Run-time, sec | Used Memory, kb |  Size  |
-------------------------------------------------------------
|                |     35.58     |     9403596     |  1,4G  |
-------------------------------------------------------------
|  gc-debuginfo  |    109.15     |    14275444     |  783M  |
-------------------------------------------------------------
|  gc-debuginfo  |    130.63     |    14394700     |  797M  |
|   gdb-index    |               |                 |        |
-------------------------------------------------------------
|   gdb-index    |     52.88     |     9524976     |  1.5G  | 
-------------------------------------------------------------

I did not check whether gdb-index is compatible with DWARF aware linking output.

There probably exists the room for optimization of gdb-index(related to the strategy of DWARF context creation).
As it was mentioned in https://reviews.llvm.org/D74773#1920751 simple caching of DWARF context results in an 8% performance increase and 20% memory usage increase.
It is probably worth implementing some smart strategy(reference counting?) that will keep DWARF context while there are users of it and release it when it is not expected to be used(so that it does not hold all DWARF contexts forever).

In D74169#1974097, @avl wrote:

Might be worth comparing with -gdb-index too (though not sure if gdb-index will be immediately compatible with the DWARF aware linking, so a difficult comparison - but even comparing "gdb-index without DWARF-aware linking V DWARF-aware linking without gdb-index", etc)

here are the results for clang binary:

-------------------------------------------------------------
|     Options    | Run-time, sec | Used Memory, kb |  Size  |
-------------------------------------------------------------
|                |     35.58     |     9403596     |  1,4G  |
-------------------------------------------------------------
|  gc-debuginfo  |    109.15     |    14275444     |  783M  |
-------------------------------------------------------------
|  gc-debuginfo  |    130.63     |    14394700     |  797M  |
|   gdb-index    |               |                 |        |
-------------------------------------------------------------
|   gdb-index    |     52.88     |     9524976     |  1.5G  | 
-------------------------------------------------------------

Ah, sorry, I meant specifically related to the gdb startup time point you made earlier about DWARF-aware linking improving startup time. It'd be useful to measure startup time with a gdb-index for comparison. (I'm guessing the time in the above chart is linking time, rather than GDB startup time)

I did not check whether gdb-index is compatible with DWARF aware linking output.

There probably exists the room for optimization of gdb-index(related to the strategy of DWARF context creation).
As it was mentioned in https://reviews.llvm.org/D74773#1920751 simple caching of DWARF context results in an 8% performance increase and 20% memory usage increase.
It is probably worth implementing some smart strategy(reference counting?) that will keep DWARF context while there are users of it and release it when it is not expected to be used(so that it does not hold all DWARF contexts forever).

avl added a comment.Apr 10 2020, 10:27 AM

Ah, sorry, I meant specifically related to the gdb startup time point you made earlier about DWARF-aware linking improving startup time. It'd be useful to measure startup time with a gdb-index for comparison. (I'm guessing the time in the above chart is linking time, rather than GDB startup time)

right, these are linking times. here is start-up times:

-------------------------------------------------------------------
 Debugger     |     Options    | load-time, sec | Used Memory, kb |
-------------------------------------------------------------------
| gdb 8.1     |                |      47,79     |     2168172     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.04     |      555408     |
-------------------------------------------------------------------
| gdb 8.1     |  gc-debuginfo  |      44,8      |     1740876     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.7      |      555340     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------
| lldb 8      |                |       6,29     |     1511052     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       6,22     |     1523516     |
-------------------------------------------------------------------
| lldb 8      |  gc-debuginfo  |       3,32     |      843600     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       3.6      |      858532     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------

though gdb-index+gc-debuginfo times could be meaningless.

In D74169#1974622, @avl wrote:

Ah, sorry, I meant specifically related to the gdb startup time point you made earlier about DWARF-aware linking improving startup time. It'd be useful to measure startup time with a gdb-index for comparison. (I'm guessing the time in the above chart is linking time, rather than GDB startup time)

right, these are linking times. here is start-up times:

-------------------------------------------------------------------
 Debugger     |     Options    | load-time, sec | Used Memory, kb |
-------------------------------------------------------------------
| gdb 8.1     |                |      47,79     |     2168172     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.04     |      555408     |
-------------------------------------------------------------------
| gdb 8.1     |  gc-debuginfo  |      44,8      |     1740876     |
-------------------------------------------------------------------
| gdb 8.1     |    gdb-index   |       3.7      |      555340     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------
| lldb 8      |                |       6,29     |     1511052     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       6,22     |     1523516     |
-------------------------------------------------------------------
| lldb 8      |  gc-debuginfo  |       3,32     |      843600     |
-------------------------------------------------------------------
| lldb 8      |    gdb-index   |       3.6      |      858532     |
|             |  gc-debuginfo  |                |                 |
-------------------------------------------------------------------

though gdb-index+gc-debuginfo times could be meaningless.

Thanks! (yeah, the gdb-index + lldb times aren't relevant either, lldb uses a different kind of accelerated access (one that's only supported on the MachO debug info distribution model currently - but I believe someone's working on/plans to work on DWARFv5 debug_names support))

avl added a comment.Apr 13 2020, 8:35 AM

Thanks! (yeah, the gdb-index + lldb times aren't relevant either, lldb uses a different kind of accelerated access (one that's only supported on the MachO debug info distribution model currently - but I believe someone's working on/plans to work on DWARFv5 debug_names support))

dsymutil supports DWARFv5 debug_names table generation. So if --gc-debuginfo specified then debug_names would be generated. This patch has a minor bug with it. I would update it.

avl updated this revision to Diff 257399.Apr 14 2020, 10:28 AM

added test for flto case.(which shows that -flto=thin does not work currently).
added option --debug-names, which enable generation of debug_names table.

avl added a comment.Apr 14 2020, 10:32 AM

This implementation does not work for -flto=thin case. That is the same problem like in D54747: https://reviews.llvm.org/D54747#1503720.

avl updated this revision to Diff 258171.Apr 16 2020, 2:20 PM

rebased.

avl added a comment.Apr 17 2020, 7:23 AM

Re-measured size/performance results:

A: --function-sections --gc-sections
B: --function-sections --gc-sections --gc-debuginfo
C: --function-sections --gc-sections --fdebug-types-section
D: --function-sections --gc-sections --gsplit-dwarf
E: --function-sections --gc-sections --fdebug-types-section --gc-debuginfo
(last case does not produce valid DWARF currently)

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    | 
--------------------------------------------------------------
|    A    |    66min(100%)  |   18.0G(100%) |  16.0G(100.0%) |
--------------------------------------------------------------
|    B    |    81min(123%)  |    9.7G( 54%) |  13.0G( 81.0%) |
--------------------------------------------------------------
|    C    |    58min( 87%)  |   12.0G( 67%) |  15.0G( 93.5%) |
--------------------------------------------------------------
|    D    |    58min( 87%)  |   12.0G( 67%) |   8.7G( 54.0%) |
--------------------------------------------------------------
|    E    |    68min(103%)  |    7.6G( 42%) |  14.0G( 87.5%) |
--------------------------------------------------------------


Clang binary:
--------------------------------------------------------------
| Options |       size     |     link time   |  used memory  |
--------------------------------------------------------------
|    A    |    1.50G(100%) |   26.7sec(100%) |  9191MB(100%) |
--------------------------------------------------------------
|    B    |    0.76G( 50%) |  107.4sec(402%) | 13361MB(145%) |
--------------------------------------------------------------
|    C    |    0.82G( 54%) |   18.3sec( 68%) |  8300MB( 90%) |
--------------------------------------------------------------
|    D    |    0.96G( 64%) |    7.6sec( 28%) |  4255MB( 46%) |
--------------------------------------------------------------
|    E    |    0.58G( 45%) |   58.5sec(219%) | 10317MB(112%) |
--------------------------------------------------------------


lldb loading time:
---------------------------------------------
| Options |       time     |   used memory  |
---------------------------------------------
|    A    |  6.32sec(100%) |  1475MB(100%)  |
---------------------------------------------
|    B    |  3.49sec( 55%) |   825MB( 56%)  |
---------------------------------------------
|    C    |  3.65sec( 58%) |   875MB( 59%)  |
---------------------------------------------
|    D    |  4.12sec( 65%) |  1017MB( 69%)  |
---------------------------------------------
|    E    |  3.95sec( 63%) |   628MB( 43%)  |
---------------------------------------------

Just an excited observer, but wanted to drop some statistics of my own, in case they're helpful.

I have a Rust project (Materialize) that compiles down into a ~800MB (!) binary by default. The weight is almost entirely debug info; .text is less than 5% of its size. This is evidently an artifact of how Rust codegen units are much, much larger than your average C/C++ codegen units and so the toolchain relies heavily on the linker for dead code elimination. Of course that dead code elimination doesn't currently apply to the debug info. This diff is exactly what's needed to solve the problem, as far as I can tell. (There's an upstream issue about this that led me here: https://github.com/rust-lang/rust/issues/56068.)

Without --gc-debuginfo:

# Link time: 10.486s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  24.5%   194Mi   0.0%       0    .debug_info
  24.1%   191Mi   0.0%       0    .debug_loc
  13.8%   109Mi   0.0%       0    .debug_pubtypes
  10.1%  79.9Mi   0.0%       0    .debug_pubnames
   8.8%  70.0Mi   0.0%       0    .debug_str
   8.3%  66.3Mi   0.0%       0    .debug_ranges
   4.4%  35.3Mi   0.0%       0    .debug_line
   3.1%  24.8Mi  66.3%  24.8Mi    .text
   1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
   0.6%  4.79Mi   0.0%       0    .strtab
   0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
 100.0%   793Mi 100.0%  37.4Mi    TOTAL

With --gc-debuginfo:

# Link time: 249.855s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  37.9%   115Mi   0.0%       0    .debug_info
  17.9%  54.7Mi   0.0%       0    .debug_str
  12.0%  36.7Mi   0.0%       0    .debug_ranges
  11.2%  34.4Mi   0.0%       0    .debug_loc
   8.1%  24.8Mi  66.3%  24.8Mi    .text
   6.6%  20.1Mi   0.0%       0    .debug_line
   2.7%  8.15Mi  19.3%  7.23Mi    [35 Others]
   1.6%  4.79Mi   0.0%       0    .strtab
   1.1%  3.22Mi   8.6%  3.22Mi    .eh_frame
   1.0%  2.99Mi   0.0%       0    .symtab
   0.0%       0   5.8%  2.16Mi    .bss
 100.0%   305Mi 100.0%  37.4Mi    TOTAL

So hugely successful in reducing the size of the debug info! The new binary is 40% of the size of the original. Some of that savings is from omitting debug_pubtypes and debug_pubnames, which is easy enough to strip without this patch, but there is also a substantial reduction in debug_info and debug_loc—a much larger reduction than dbz is able to provide.

Unfortunately there is also a massive blowup in link times, from 10s to 250s. That's a 25x increase, much larger than the link time increase for Clang described above, which was about 4x.

Just an excited observer, but wanted to drop some statistics of my own, in case they're helpful.

I have a Rust project (Materialize) that compiles down into a ~800MB (!) binary by default. The weight is almost entirely debug info; .text is less than 5% of its size. This is evidently an artifact of how Rust codegen units are much, much larger than your average C/C++ codegen units and so the toolchain relies heavily on the linker for dead code elimination.

Out of curiosity is there any work in the direction of fragmenting these modules/objects further to reduce this & other negative side effects of that situation?

Of course that dead code elimination doesn't currently apply to the debug info. This diff is exactly what's needed to solve the problem, as far as I can tell. (There's an upstream issue about this that led me here: https://github.com/rust-lang/rust/issues/56068.)

Without --gc-debuginfo:

# Link time: 10.486s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  24.5%   194Mi   0.0%       0    .debug_info
  24.1%   191Mi   0.0%       0    .debug_loc
  13.8%   109Mi   0.0%       0    .debug_pubtypes
  10.1%  79.9Mi   0.0%       0    .debug_pubnames
   8.8%  70.0Mi   0.0%       0    .debug_str
   8.3%  66.3Mi   0.0%       0    .debug_ranges
   4.4%  35.3Mi   0.0%       0    .debug_line
   3.1%  24.8Mi  66.3%  24.8Mi    .text
   1.8%  14.4Mi  25.1%  9.39Mi    [41 Others]
   0.6%  4.79Mi   0.0%       0    .strtab
   0.4%  3.22Mi   8.6%  3.22Mi    .eh_frame
 100.0%   793Mi 100.0%  37.4Mi    TOTAL

With --gc-debuginfo:

# Link time: 249.855s
benesch@gibbon:~/materialize$ bloaty target/release/materialized -n 10
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  37.9%   115Mi   0.0%       0    .debug_info
  17.9%  54.7Mi   0.0%       0    .debug_str
  12.0%  36.7Mi   0.0%       0    .debug_ranges
  11.2%  34.4Mi   0.0%       0    .debug_loc
   8.1%  24.8Mi  66.3%  24.8Mi    .text
   6.6%  20.1Mi   0.0%       0    .debug_line
   2.7%  8.15Mi  19.3%  7.23Mi    [35 Others]
   1.6%  4.79Mi   0.0%       0    .strtab
   1.1%  3.22Mi   8.6%  3.22Mi    .eh_frame
   1.0%  2.99Mi   0.0%       0    .symtab
   0.0%       0   5.8%  2.16Mi    .bss
 100.0%   305Mi 100.0%  37.4Mi    TOTAL

So hugely successful in reducing the size of the debug info! The new binary is 40% of the size of the original. Some of that savings is from omitting debug_pubtypes and debug_pubnames, which is easy enough to strip without this patch, but there is also a substantial reduction in debug_info and debug_loc—a much larger reduction than dbz is able to provide.

Unfortunately there is also a massive blowup in link times, from 10s to 250s. That's a 25x increase, much larger than the link time increase for Clang described above, which was about 4x.

Out of curiosity is there any work in the direction of fragmenting these modules/objects further to reduce this & other negative side effects of that situation?

I'm afraid I have no idea. Nothing I'm aware of, but I'm not a rustc dev. Perhaps @rocallahan knows. I'll ask on the Rust Zulip, too.

avl updated this revision to Diff 258596.Apr 19 2020, 5:52 AM

allowed multi-threaded execution.

avl added a comment.Apr 19 2020, 5:57 AM

@benesch Thank you for the interest in this patch and the statistics data.

Unfortunately there is also a massive blowup in link times, from 10s to 250s. That's a 25x increase, much larger than the link time increase for Clang described above, which was about 4x.

Excuse me. It looks like my reported statistic is not very accurate. I would re-measure it and re-submit.
There are two things here: first is that reported numbers are for the patch, which I did not update(because of some bug). And another is that calculations are not accurate. I have just updated the patch. Following is what I got for clang:

/usr/bin/time -f "%E %M" bin/clang-11
0:11.62 9509308

/usr/bin/time -f "%E %M" bin/clang-11 --gc-debuginfo
1:15.16 15346672

i.e. it is about 6.5x, not 25x. I would also check the numbers for the updated patch and for your materialize project. Probably, it might be quite close since .text section takes 7.5% of entire binary in clang.
Thank you.

The new binary is 40% of the size of the original. Some of that savings is from omitting debug_pubtypes and debug_pubnames, which is easy enough to strip without this patch

The reason why debug_pubtypes and debug_pubnames were stripped, is that the current patch uses already existed dsymutil code, which does not generate debug_pubtypes and debug_pubnames. This patch changes debug info. Thus it removes other debug info tables, which it does not support(since they most probably would contain no relevant data). Supporting all these tables(as well as DWARF5) is a future task for the DWARFLinker.

avl added a comment.Apr 21 2020, 2:27 PM

Re-measured size/performance results once more. These results are for 12-cores machine. Actually, the more cores the bigger difference between --gc-debuginfo/--no-gc-debuginfo cases.

A: --function-sections --gc-sections
B: --function-sections --gc-sections --gc-debuginfo
C: --function-sections --gc-sections --fdebug-types-section
D: --function-sections --gc-sections --gsplit-dwarf
E: --function-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib
F: --function-sections --gc-sections --fdebug-types-section --gc-debuginfo
(last case does not produce valid DWARF currently)

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    | 
--------------------------------------------------------------
|    A    |    54min(100%)  |   19.0G(100%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    B    |    65min(120%)  |    9.7G( 51%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    C    |    53min( 98%)  |   12.0G( 63%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    D    |    52min( 96%)  |   12.0G( 63%) |   8.2G( 55.0%) |
--------------------------------------------------------------
|    E    |    64min(118%)  |    5.3G( 28%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    F    |    57min(105%)  |    7.6G( 40%) |  14.0G( 93.0%) |
--------------------------------------------------------------


Clang binary:
-------------------------------------------------------------
| Options |      size      |     link time  |  used memory  |
-------------------------------------------------------------
|    A    |    1.50G(100%) |    9sec(100%)  |  9307MB(100%) |
-------------------------------------------------------------
|    B    |    0.76G( 50%) |   72sec(800%)  | 15055MB(161%) |
-------------------------------------------------------------
|    C    |    0.82G( 54%) |    8sec( 89%)  |  8402MB( 90%) |
-------------------------------------------------------------
|    D    |    0.96G( 64%) |    6sec( 67%)  |  4273MB( 46%) |
-------------------------------------------------------------
|    E    |    0.43G( 29%) |   80sec(889%)  | 15000MB(161%) |
-------------------------------------------------------------
|    F    |    0.58G( 39%) |   30sec(334%)  | 10890MB(117%) |
-------------------------------------------------------------


lldb loading time:
--------------------------------------------
| Options |      time     |   used memory  |
--------------------------------------------
|    A    |  6.4sec(100%) |  1495MB(100%)  |
--------------------------------------------
|    B    |  4.0sec( 63%) |   826MB( 55%)  |
--------------------------------------------
|    C    |  3.7sec( 58%) |   877MB( 59%)  |
--------------------------------------------
|    D    |  4.3sec( 67%) |  1023MB( 69%)  |
--------------------------------------------
|    E    |  2.1sec( 33%) |   478MB( 32%)  |
--------------------------------------------
|    F    |  2.7sec( 42%) |   631MB( 43%)  |
--------------------------------------------
avl updated this revision to Diff 260362.Apr 27 2020, 9:56 AM

rebased. added tests.

avl added a comment.Apr 27 2020, 10:36 AM

@ruiu @grimar @MaskRay @dblaikie @clayborg

What do you think about that patch ? Whether it is generally OK to start integration? If it is OK then I would divide the patch to smaller patches and start to integrate.

I think it is worth integrating: it has a small impact on existing lld code(most implementation is in separate library DWARFLinker. Only lld/ELF/DWARF.* and lld/ELF/LLDDwarfLinker.* are noticeably affected). --gc-debuginfo is off by default so it would not affect most of the users. Only those who explicitly specified --gc-debuginfo would use that functionality. The patch noticeably reduces the size of debug info. It also resolves"overlapping address ranges" problem. Upcoming optimizations could minimize the performance reduction.

avl updated this revision to Diff 261860.May 4 2020, 10:16 AM

rebased. improved execution time for 5%.

@avl Would it be possible to rebase this on latest, or is this diff abandoned?

avl updated this revision to Diff 292457.Sep 17 2020, 4:21 AM
avl edited the summary of this revision. (Show Details)

rebased.

avl added a comment.Sep 17 2020, 4:42 AM

@ayermolo Hi Alexander, I rebased on latest but did not check whether current size/performance numbers match with reported previously.

ayermolo added inline comments.Sep 23 2020, 2:58 PM
lld/ELF/LLDDwarfLinker.cpp
81

The RangesTy is defined as a map with uint64_t
using RangesTy = std::map<uint64_t, ObjFileAddressRange>;

Not sure I follow how this works. Am I missing some c++ thing? This fails to compile with clang.

223

There is no AccelTableKind::None, did you mean AccelTableKind::Default?

enum class AccelTableKind {

Apple,   ///< .apple_names, .apple_namespaces, .apple_types, .apple_objc.
Dwarf,   ///< DWARF v5 .debug_names.
Default, ///< Dwarf for DWARF5 or later, Apple otherwise.

};

avl added inline comments.Sep 24 2020, 1:26 AM
lld/ELF/LLDDwarfLinker.cpp
81

The RangesTy is defined as a map with object::SectionedAddress.

/// Map LowPC to AddressHighPC.
using RangesTy = std::map<object::SectionedAddress, AddressHighPC>;

It looks like your local source files are in inconsistent state. This patch was build successfully by pre-merge build bot.

223

It is added by this patch at line 29 of llvm/include/llvm/DWARFLinker/DWARFLinker.h.

llvm/include/llvm/DWARFLinker/DWARFLinker.h
29

Here is the place where AccelTableKind::None is defined.

avl updated this revision to Diff 298246.Oct 14 2020, 2:53 PM

rebased, added command-line option --gc-debuginfo-no-odr.

Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2023, 10:31 AM
Herald added subscribers: jplehr, ormris. · View Herald Transcript