Page MenuHomePhabricator

[WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.
Needs ReviewPublic

Authored by avl on Feb 6 2020, 2:17 PM.

Details

Summary

This patch is the implementation for removing obsolete debug info:

http://lists.llvm.org/pipermail/llvm-dev/2019-September/135473.html

It uses DWARFLinker which contains code refactored out from dsymutil.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
In D74169#1883671, @avl wrote:
On one side, I've heard thoughts that:
 (1) if we use -gsplit-dwarf, is .debug* garbage collection still valuable?

For users using -gsplit-dwarf, linker garbage collection provides zero value. But not everyone does, there are reasons to use one or the other in different situations.

also, as an theoretical alternative - probably we might want to implement -gsplit-dwarf so that it include only used part of debug info. i.e. that which referenced after garbage collection. As the result size of all .dwo files would be smaller.

That's not possible - the .dwo files are produced during compilation time, before the linker has done any garbage collection.

(currently even when building the dwp file from dwo files it's not possible to strip the dead debug info - since there's no feedback from the linker that reaches the dwp tool (dwp and linking might be done in parallel on different machines, for instance) - but it'd be possible to implement something more like dsymutil where the linker generates a side-file address mapping/details of the garbage collection and then that is fed into the dwp tool (this would delay running the dwp tool though))

avl added a comment.Feb 21 2020, 4:19 AM

That's not possible - the .dwo files are produced during compilation time, before the linker has done any garbage collection.

(currently even when building the dwp file from dwo files it's not possible to strip the dead debug info - since there's no feedback from the linker that reaches the dwp tool (dwp and linking might be done in parallel on different machines, for instance) - but it'd be possible to implement something more like dsymutil where the linker generates a side-file address mapping/details of the garbage collection and then that is fed into the dwp tool (this would delay running the dwp tool though))

Right. That is not possible in current scheme. Though we could probably change the scheme(if it would be considered beneficial). One alternative is what you were talking about - dsymutil/dwp tool which would take linker report and .dwo files and produce optimized .dwo files. Another alternative is that .dwo files could be optimized during linking time using the same scheme like current proposal. This proposal would take unoptimized .debug* sections of binary and generate optimized contents of destination .debug* sections during linking. Generally, It is possible to take unoptimized .dwo files and make them optimized.

I do not suggest to do it now. That is just possible alternative which could probably be considered if current proposal would show good results.

Another thing, is that this patch could be useful for -gsplit-dwarf even without optimizing .dwo files.
It could skip DW_TAG_skeleton_unit related to deleted code. So that Skeleton file do not have overlapped address ranges like this :

0x00000014: DW_TAG_skeleton_unit
              DW_AT_dwo_name    ("split1.dwo")
              DW_AT_low_pc      (0x0000000000000000)   <<< not assigned because of deleted code
              DW_AT_high_pc     (0x0000000000000008)
              DW_AT_addr_base   (0x00000008)

0x00000034: DW_TAG_skeleton_unit
              DW_AT_dwo_name    ("split2.dwo")
              DW_AT_low_pc      (0x0000000000000000)   <<< not assigned because of deleted code
              DW_AT_high_pc     (0x0000000000000008)
              DW_AT_addr_base   (0x00000008)

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

One last point is to have a flag that enabled DWARF verification. This can be run before the GC operation to ensure the incoming DWARF is in good shape and again after any modifications, to ensure the DWARF is still valid.

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

I believe all the functionality in llvm-dsymutil related to DWARF linking that implements the features you described above (llvm-dsymutil was initially implemented with bug-for-bug compatibility with Apple dsymutil for safety and since then I think a few of those bugs have been fixed/improvements have been made) & that code is now/already in the llvm/lib/DWARFLinker library so it can be reused in lld for this effort (apologies if I'm preaching to the choir/being redundant here) & any others you might have in mind. No doubt there's probably some refactoring to do to improve its reusability as the integration with lld & other situations is done.

I created dsymutil when at Apple, which has now been taken over by llvm-dsymutil. So a few thoughts on general DWARF linking and optimization:

As dsymutil links the DWARF it sees, it does indeed have a linker map of all final addresses for any addresses in each .o file. dsymutil will go through all DIEs within a CU and mark the ones that should be kept by starting with the compile unit DIE, then each DIE referred to the by the final link map (DW_TAG_subprogram, global or static DW_TAG_variable), that had a mapping in the linked executable will be marked as needing to be kept along with all child DIEs and all DIEs that these child DIEs refer to. You will run into many DIEs over and over, but as soon as you run into one that is already marked as needing to be kept, you can avoid recursing into that DIE again, so it never really gets out of hand to do this pass. One thing to be careful of: enumerations. A lot of code uses enums, but they don't use the enum type when that enumeration defines bit masks, so you might end up with code that doesn't refer to an enum type with any variables (they just use a signed or unsigned integer as the variable type), but it is sometimes still a good idea to keep these types around. We had issue where we were dead stripping these types and the enum weren't available in the debug info.

The other thing that dsymutil does to reduce file size, and the most valuable part of size reduction, is it uses ODR to only emit types one time when possible. If you have 100 .o files with std::string inside of them, you can often emit a single copy and change any future .o files references to their local copy, to use a DW_FORM_ref_addr form to refer to the first definition that was emitted. Enabling this allows us to trim debug info size by 75%. Dsymutil uses a method where it figures out the decl file and line for a type and the DW_AT_byte_size, along with a few other checks, to quickly unique types in C++ code. There is some trickiness here where compiler generated constructors and destructors may appear in one type and not in another, so there is some code to make sure there is a DIE within the type refer to, but overall this works really well and is pretty cheap to implement.

Line tables are fully relinked as well and all line entries for addresses that were stripped are removed. This can be tricky because often times the line tables are emitted as one large stream with no DW_LNE_end_sequence opcode at the end of each function address range. New DW_LNE_end_sequence opcodes need to be inserted when optimizing line tables so that you don't end up with an invalid line table where addresses don't increase within a sequence. Adding the new DW_LNE_end_sequence opcode terminates the sequence and allows the next line entries to change addresses to a lower address than the previous DW_LNE_end_sequence line entry.

dsymutil adds accelerator tables, in DWARF5 format or the Apple format, that are calculated by traversing the new DWARF once it is created. This means you don't end up with 100 accelerators tables concatenated together like linkers do now. I am keenly interested in this feature so we can prove that having the accelerator tables helps debugging speeds. Proving the accelerator tables help will help us get toolchains to integrate the accelerator tables into their builds. Having it start off as a separate post production step is a great way to help people figure out if they want to adopt them.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

Having all of this re-linking code in a DWARFLinker library llvm/lib/DebugInfo/DWARF would be great. We have many tools at Facebook that are relinking DWARF, and not doing a very good job. If we had a library that would allow people to re-link their DWARF, either linking .o file DWARF into a main executable, or having an optimization pass that runs on fully linked libraries that might do function outlining or block re-use optimizations, then we can ensure the final DWARF is in great shape.

I believe all the functionality in llvm-dsymutil related to DWARF linking that implements the features you described above (llvm-dsymutil was initially implemented with bug-for-bug compatibility with Apple dsymutil for safety and since then I think a few of those bugs have been fixed/improvements have been made) & that code is now/already in the llvm/lib/DWARFLinker library so it can be reused in lld for this effort (apologies if I'm preaching to the choir/being redundant here) & any others you might have in mind. No doubt there's probably some refactoring to do to improve its reusability as the integration with lld & other situations is done.

That is great to hear! Sorry I haven't looked around at that code in a while, but glad to hear it is all ready to go.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time. I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac. That allows people to get the benefits of the .dwo performance improvements for the edit/compile /debug cycle and then link it into a minimal and optimized DWARF, using contribution from this patch and others to come that optimize the DWARF, for permanent storage on build servers.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time. I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

Note that such post-processing is probably more suited to llvm-objcopy than lld, though of course they can use the same underlying LLVM library.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

It's "complete" in the sense that it's whatever GDB wanted and created a contract that GCC (& Clang/LLVM as best we know/have done so far) agree too, so GDB can rely on things being present/not present in some sense.

That's why debug_gnu_pubnames existed - because of, as you say, the underspecified nature of (or unreliable implementation of) debug_pubnames.

I believe debug_names semantics have been more strictly specified regarding what names will be there and what names will not.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

That allows people to get the benefits of the .dwo performance improvements for the edit/compile /debug cycle and then link it into a minimal and optimized DWARF, using contribution from this patch and others to come that optimize the DWARF, for permanent storage on build servers.

lld already supports gdb_index & there's plans to add debug_names support in the near future (~6 months).

Does it support adding the GDB index after the fact on an already linked binary, or just at normal link time? I want to add a GSYM option to LLD at some point too. Can replace -gline-tables-only in many cases.

I don't think there's been any effort to add it as a post-processing step, no. I believe it can be built either from parsed DWARF or from debug_gnu_pubnames - in either case, at link time.

is debug_gnu_pubnames a new section that is now complete? The old .debug_pub* sections only contained globally visible things which means all debuggers would ignore this section. If accelerator tables were made from incomplete sources that will mean debuggers will have to question if .debug_names is complete or not.

It's "complete" in the sense that it's whatever GDB wanted and created a contract that GCC (& Clang/LLVM as best we know/have done so far) agree too, so GDB can rely on things being present/not present in some sense.

That's why debug_gnu_pubnames existed - because of, as you say, the underspecified nature of (or unreliable implementation of) debug_pubnames.

Perfect, that is what I was hoping for.

I believe debug_names semantics have been more strictly specified regarding what names will be there and what names will not.

Yes, it indeed fully specifies what is needed. Thanks for the info.

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

For the linking, I was more thinking of converting the DWP format back into vanilla DWARF with no DWO/DWP artifacts (skeleton DIEs etc) or any other redirections (string table offsets, address tables, etc). Not sure if this would reduce the DWARF size of all of the DWO/DWP instrstructure was all removed?

I mean, it'd be probably quite practical to extend it to be able to be done as a standalone/post-processing step.

That would be great. As accelerator tables evolve and get updated, it would be nice to have a path to update older accelerator tables to newer more complete versions.

For the sake of object and linked executable size, I suspect we might end up trying to move the indexing out of the object files/linked executable - perhaps putting the names either in a third file, or in the .dwo files and generating an index from them instead of from the objects. Not sure. Some further experimentation expected there.

That is fine too. We make a dwarf linking tool that can link all of the .dwo files back into a normal DWARF file like with dsymutil and .o files on mac.

That's already done - DWP ( https://gcc.gnu.org/wiki/DebugFissionDWP ). But for an interactive scenario over a distributed filesystem shipping the whole linked DWP file back to the user for access by the debugger is a bit unfortunate/slow. Good for archival purposes, as you say - but it's still a bit of an open question as to what to do for interactive users if object file/linked executable size is a major concern - either generating a third file with just the index parts (so building the full index can be done in parallel/distributed separately from the link) or doing an index-only link of the .dwo files (not super great - still means a bottleneck of shipping all the .dwo files to one build machine, etc)...

For the linking, I was more thinking of converting the DWP format back into vanilla DWARF with no DWO/DWP artifacts (skeleton DIEs etc) or any other redirections (string table offsets, address tables, etc). Not sure if this would reduce the DWARF size of all of the DWO/DWP instrstructure was all removed?

Ah, sorry, I see what you mean. Well, the offsets systems in some cases save total size even once linked (eg: having a single 64 bit address with two short uleb indexes compared to two 64 bit addresses - even once the binary is linked and there's no extra bytes of relocation records for that address/addresses) & these features are no longer Split DWARF specific - Clang/LLVM generates almost (entirely, maybe?) all these constructs the same way with split or unsplit DWARF when emitting DWARFv5.

Might save a bit - certainly something someone could experiment with as improvements to DWARFLinker - for dsymutil, lld, and maybe dwp in the future.

(one of the cheap ones that'd be neat to see would be debug_rnglists optimization - wouldn't need to parse DWARF DIEs, etc, would be an entirely local optimization - may be useful for PROPELLER (~= -fbasic-block-sections) - because of the indirection in debug_rnglists (though the rnglist header's offset table) you could totally rewrite range lists to coalesce any contiguous ranges that end up together & rewriting the offsets in the offset table - though this optimization would benefit from being in both lld and dsymutil I think - not applicable to dwp unless address information is passed from the linker like dsymutil)

avl added a comment.Wed, Mar 18, 5:09 AM

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

In D74169#1928711, @avl wrote:

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

Honestly I just do not know the answer, but it sounds to me that we can add them, and do
the refactoring you are proposing after whole functionality is done. (i.e. remove them later if it will be possible/reasonable).

MaskRay added a comment.EditedThu, Mar 19, 10:36 AM
In D74169#1928711, @avl wrote:

@ruiu, @grimar, @MaskRay

While working on "remove obsolete debug info in lld" patch
I need to generate the DWARF section contents. dsymutil
uses AsmPrinter to create DWARF. Thus to create things
in the same way, I need to add AsmPrinter library and targets libraries:

--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -59,6 +59,10 @@ add_lld_library(lldELF
   Option
   Passes
   Support
+  AllTargetsCodeGens
+  AllTargetsDescs
+  AllTargetsInfos
+  AsmPrinter

Would it be OK to add these libraries to lld code ?

If that is not OK, then alternative solution would be to refactor
DWARF generation code to remove dependence on AsmPrinter and friends.

Honestly I just do not know the answer, but it sounds to me that we can add them, and do
the refactoring you are proposing after whole functionality is done. (i.e. remove them later if it will be possible/reasonable).

Honestly I don't know, either. I echo most of comments on https://reviews.llvm.org/D74169#1880922 For -gsplit-dwarf or -gsplit-dwarf=single users, garbage collection of debug info can provide hardly any benefit. For non -gsplit-dwarf use cases, I am concerned whether the pros (avoid an intermediate file/avoid a post-processing step) overweight cons (time/memory/lld size). The link time/memory may increase significantly. I wonder whether the peak memory usage can increase to 2x given our experience https://reviews.llvm.org/D74773#1920751

I am also curious to learn how much + AllTargetsCodeGens + AllTargetsDescs + AllTargetsInfos + AsmPrinter can increase the size of lld executable in a -DBUILD_SHARED_LIBS=off build.

Doing a full DWARF rewriting can save intermediate files. But if the memory/time trade-off is not that appealing, a standalone tool (which can have more configuration options) may be more suitable. That said, we may need some prototype to verify these theories.

How much do we sacrifice in link speed/peak memory usage/lld executable size in return for X% of .debug_* size saving?

avl added a comment.Fri, Mar 20, 5:46 AM

I am also curious to learn how much + AllTargetsCodeGens + AllTargetsDescs + AllTargetsInfos + AsmPrinter can increase the size of lld executable in a -DBUILD_SHARED_LIBS=off build.

Currently, it has zero effect because of --gc-sections.

I asked about using AsmPrinter in lld to check whether there is a fundamental limitation for that.
So far, I understood that the answer depends on the final results of the implementation.

Doing a full DWARF rewriting can save intermediate files. But if the memory/time trade-off is not that appealing, a standalone tool (which can have more configuration options) may be more suitable. That said, we may need some prototype to verify these theories.

I am currently working on that prototype.

How much do we sacrifice in link speed/peak memory usage/lld executable size in return for X% of .debug_* size saving?

When the prototype is ready, it would be possible to evaluate memory/time results.

I did another prototype(not based on dsymutil, though with a very similar idea) D67469.
Its results are in llvm thread http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html.

avl updated this revision to Diff 254852.Fri, Apr 3, 10:43 AM
avl retitled this revision from [WIP][LLD][ELF][DebugInfo] Skeleton implementation of removing obsolete debug info. to [WIP][LLD][ELF][DebugInfo] Remove obsolete debug info..
avl edited the summary of this revision. (Show Details)

This version is a full patch implementing "remove obsolete debug info from lld" functionality.
As it was requested, It is presented as a full patch, which could be applied to the current code
base and evaluated. Some parts of that patch are already presented for review - D77169, D76085.
I am going to add tests for this patch and to address comments. Performance results would be presented in a separate message.

Limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections.

Testing: it passes check-all lit testing. It is able to compile llvm code base. llvm-dwarfdump --verify shows zero errors for clang built by lld from this patch.

In D74169#1960135, @avl wrote:

This version is a full patch implementing "remove obsolete debug info from lld" functionality.
As it was requested, It is presented as a full patch, which could be applied to the current code
base and evaluated. Some parts of that patch are already presented for review - D77169, D76085.
I am going to add tests for this patch and to address comments. Performance results would be presented in a separate message.

Limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections.

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

Testing: it passes check-all lit testing. It is able to compile llvm code base. llvm-dwarfdump --verify shows zero errors for clang built by lld from this patch.

Do you have memory usage/compile time numbers for linking various programs with/without this feature?

avl added a comment.Fri, Apr 3, 11:21 AM

Size/Performance/Memory results:

lld binary size with this patch(SHARED_LIBS=OFF) - 49925kb.
lld binary size without this patch(SHARED_LIBS=OFF) - 49699kb.

Performance results for building whole LLVM:

--------------------------------------------------------------------------
|   |       CL options                | LLVM bin size  | LLVM build time |   
--------------------------------------------------------------------------
| A | (default set of options*)       |  100.0%(18.0GB)|  100.0%(75m15s) |
|   |                                 |                |                 |
| B | A +gc-debuginfo(**) +NoOdr=false|    54%(9,7GB)  |  124.0%(93m33s) |
|   |                                 |                |                 |
| C | A +gc-debuginfo +NoOdr=false    |    43%(7,7GB)  |  107.0%(80m26s) |
|   |   +fdebug-types-section(***)    |                |                 |
--------------------------------------------------------------------------

Performance results of lld for building single clang binary:

---------------------------------------------------------------------------------
|   |       CL options             | Clang size   | link time | run-time memory |
---------------------------------------------------------------------------------
| A | (default set of options*)    | 100.0%(1,4GB)| 100%(49s) |   (100%)9.3GB   |
|   |                              |              |           |                 |
| B | A +gc-debuginfo +NoOdr=false | 56.0%(0.78GB)| 296%(145s)|   (153%)14.2GB  |
|   |                              |              |           |                 |
| C | A +gc-debuginfo +NoOdr=true  | 86.0%(1.2GB) | 380%(186s)|   (161%)15GB    |
|   |                              |              |           |                 |
| D | A +gc-debuginfo +NoOdr=false | 42.0%(0.59GB)| 133%(65s) |   (113%)10.5GB  |
|   |   +fdebug-types-section(***) |              |           |                 |
---------------------------------------------------------------------------------

(*)
LLVM_TARGETS_TO_BUILD=X86;AArch64
LLVM_ENABLE_PROJECTS=clang;lld
LLVM_ENABLE_LLD=ON
CMAKE_CXX_FLAGS=--ffunction-sections
CMAKE_C_FLAGS=--ffunction-sections
CMAKE_CXX_FLAGS_DEBUG=-g
CMAKE_C_FLAGS_DEBUG=-g
CMAKE_EXE_LINKER_FLAGS=--Wl,--gc-sections
CMAKE_MODULE_LINKER_FLAGS=-Wl,--gc-sections
CMAKE_SHARED_LINKER_FLAGS=-Wl,--gc-sections

(**)

  • gc-debuginfo - removes obsolete debug info. implemented by this patch.
  • NoOdr - mode of DWARFLinker which switches on/off types deduplication. there is no command line option to switch it on/off.
  • fdebug-types-section - deduplicates types using comdat sections.

(***) fdebug-types-section is not supported by dsymutil/DWARFLinker.
Thus presented results do not show real data.
But these numbers could be taken as an estimation for future results
when DWARF5/-fdebug-types-section would be supported by dsymutil/DWARFLinker.

(****) HW configuration:

OS Ubuntu 18.04
CPU Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
RAM 32018 MiB
Storage SSD

avl added a comment.Fri, Apr 3, 12:27 PM

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

it should but I did not check it yet. I would check and post results here.

In D74169#1960444, @avl wrote:

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

To the best of my knowledge (as the person who implemented type unit support in LLVM... ) type units never reduce the size of object files - they only increase it.

There is no type duplication, so far as I know, in LLVM's object files even without type units - one definition of the type is emitted and all the rest of the DWARF references that type. With type units that type is just the skeleton (DW_AT_declaration and DW_AT_signature), without type units that type is the full definition, without the indirection to the type unit.

Are there particular examples of type duplication within a single object file you have in mind? I'd be super curious to see them - might be some bugs that need fixing.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Bag of DWARF doesn't seem like it'd help in a situation where you know you're going to use a DWARF-aware linker (just put all the DWARF in the CU as normal & know the DWARF-aware linker will introduce cross-CU references as needed as things are deduplicated). The idea for Bag of DWARF was with a DWARF agnostic linker you could reference more than one DIE from in a type unit (or a compile unit) to reduce a bunch of the duplication/overhead that type units introduce (eg: when defining a member function of a type unit, you currently have to duplicate the member function's declaration in the CU that references the TU, because you can't refer to that subprogram declaration DIE in the TU - only the type itself).

Does it support multipe compile units in a single section in an object file (ala LTO/ThinLTO)?

it should but I did not check it yet. I would check and post results here.

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

(***) fdebug-types-section is not supported by dsymutil/DWARFLinker.
Thus presented results do not show real data.
But these numbers could be taken as an estimation for future results
when DWARF5/-fdebug-types-section would be supported by dsymutil/DWARFLinker.

Sorry, sort of duplicating the other thread of conversation we're having on this review - I don't believe type units would help dsymutil/DWARFLinker - nor that this data would be representative of potential improvements. Could you describe how these numbers would be representative of potential improvements? What benefit would type units provide to DWARF-aware linking?

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

Given this is built on top of the DWARFLinker used for llvm-dsymutil which, I believe, already strips out those duplicate descriptions - hopefully that's already included in the functionality under review? But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?

(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

In D74169#1960444, @avl wrote:

Not so much reason to support type units ("-fdebug-types-section/type units/.debug_types section/multiple .debug_info sections" all essentially describe the same thing). Since type units add object size overhead to reduced linked size in the face of a DWARF-agnostic/ignorant linker. If you have a DWARF aware linker, you'd probably avoid using type units so you could have smaller object files too.

It seems to me type units are useful for the current scheme. They increase the size of type reference(8 bytes vs 4 bytes) and introduce type unit header, but they avoid duplication. All references inside the object file point to the single type description - thus, object files have only one type's description instead of multiple copies. The effect of de-duplication is higher than the introduced increase.

.debug_types could be much more efficient if we stop duplicating the decl context just so we can refer to a type. Right now if you have a variable, it will point to a CU local type, which duplicates the containing namespaces and class/struct/union decl contexts. So the overhead of .debug_types is a bit higher. It would be great if we could just refer directly to the type from .debug_types in the DW_TAG_variable and other DIEs that have DW_AT_type attributes.

Also, .debug_types is only there for struct, unions, classes and a few other types. When you have a typedef like "std::string::size_type", or a typedef inside of the std::string, it may or may not appear in the .debug_types definition, but no one can point to those types because the way .debug_types is currently defined, only one type can be referred to within a type unit. The results is that any code wanting to use of one these types must duplicate the entire decl context (namespace "std", class "basic_string<>", typedef "size_type") just so a variable can point to the DWARF.

I mention this because I think letting --gc-debuginfo do its thing with ODR type uniquing leaves the DWARF in a better state: type info for one class isn't strewn across a bunch of .o files, or having large swaths of the type duplicated all over the place. So if we have type uniquing, I would keep .debug_types out of it.

We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

DWARF aware linker does not reduce the size of object files. It reduces the size of binary, whereas type units allow decreasing object file.

Are object files not actually bigger when using .debug_types? All the duplicated decl context junk in the DWARF is not efficient. I would be surprised if type units make .o files smaller. There is nothing to unique within a single .o file and the type units should only make things bigger.

I think DWARF aware linker could help to get even more size reduction. If instead of type units, there would be another structure - something like "bag of DWARF"(bag of types). You mentioned this idea in http://lists.llvm.org/pipermail/llvm-dev/2019-September/135262.html. "bag of DWARF" would allow to have single type description in object file for type and to have 4-bytes references. DWARF aware linker could merge "bag of DWARF" from various object files. (similar to the current process when linker merges comdat sections. But joining "bag of DWARF" is a more complicated process than selecting one section from the set of comdat sections)

Right now the ODR type uniquing does this, but it doesn't split the types out into their own compile units. The first time you see a type, you emit it and remember it. All other uses of this type and up using a DW_FORM_ref_addr which is 32 bits to refer to the first emitted type. So the ODR already does this. What it doesn't do is split types out into unique compile units based on their DW_AT_decl_file. If we did, you can slowly combine all DWARF that was defined in header files into DW_TAG_compile_unit DIEs with only types from those files. Then the DWARF would be really clean. The main issue with this approach is that it means you have to make all DWARF in memory to that things can lazily be added to each of these bags (compile units with types in them) since you might end up adding more types to each compile unit. With the current ODR support we can parse the .o files one at a time, and refer to things we have already emitted easily and we don't need to keep everything around so we can add things to the "bags".

The downside of the dsymutil ODR technique, is many times you types that have implicitly added methods (default constructor, assignment operators, move constructors) that are only present if the user used them in their source file. So you might end up with many definitions of "class A", but some have the copy constructor in the DWARF (it was created by the compiler for types where the compiler can create them) and others that don't. So we sometimes have to emit the type multiple times when you have a DW_TAG_subprogram that refers to these implicitly created functions. But overall the approach works well for how easy it is to implement.

I am all for creating compile units and adding all of the types declared in that file to them and having the DWARF be even cleaner though!

Nice! Another thing I have been seeing a lot of is when functions are coalesced and many .o files have the same function that gets linked to the same location. We end up with many duplicate DWARF DIEs in the final executable since the linker will just link all of the DWARF for each function over and over (.debug_info and .debug_line content). I found this when making GSYM files from the DWARF and noticing that I had hundreds of duplicate entries that matched exactly. This would be a great follow up patch for --gc-debuginfo

Given this is built on top of the DWARFLinker used for llvm-dsymutil which, I believe, already strips out those duplicate descriptions - hopefully that's already included in the functionality under review? But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?

(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

dsymutil's ODR does unique the types and only emit one copy. See my comment on implicit methods as one thing that can cause multiple copies of a type to be emitted.

avl added a comment.Fri, Apr 3, 2:13 PM

But good to check/ask - @avl does this patch already do the good things for duplicate (or dropped) function sections?
(eg: "inline void f1() { } void f2() { f1(); }" + "inline void f1() { } void f2(); int main() { f1(); f2(); }" compiled and linked at -O0 the DWARF would usually contain two descriptions of 'f1', but ideally with a DWARF aware linker it'd contain only one description of 'f1'. Similarly: "void f1() { } void f2() { }" + "void f2(); int main() { f2(); }" compiled and linked with -ffunction-sections -Wl,-gc-sections would normally contain a DWARF description of "f1" but no code for f1, and a DWARF aware linker would strip out the DWARF description of 'f1' here)

right. It generates one copy:

$cat test1.cpp

inline void f1() { } void f2(); int main() { f1(); f2(); }

$cat test2.cpp

inline void f1() { } void f2() { f1(); }

$ clang++ test1.cpp test2.cpp -O0 -g -ffunction-sections -fuse-ld=lld -Wl,--gc-sections; llvm-dwarfdump -a a.out | grep "\"f1\""
                DW_AT_name	("f1")
                DW_AT_name	("f1")
0x00000098: "f1"

$ clang++ test1.cpp test2.cpp -O0 -g -ffunction-sections -fuse-ld=lld -Wl,--gc-sections,--gc-debuginfo; llvm-dwarfdump -a a.out | grep "\"f1\""
                DW_AT_name	("f1")
0x0000008b: "f1"
avl added a comment.Sat, Apr 4, 9:08 AM

To the best of my knowledge (as the person who implemented type unit support in LLVM... ) type units never reduce the size of object files - they only increase it.

There is no type duplication, so far as I know, in LLVM's object files even without type units - one definition of the type is emitted and all the rest of the DWARF references that type. With type units that type is just the skeleton (DW_AT_declaration and DW_AT_signature), without type units that type is the full definition, without the indirection to the type unit.

Are there particular examples of type duplication within a single object file you have in mind? I'd be super curious to see them - might be some bugs that need fixing.

Sorry. I need to stop writing letters in the late evening.
type units make object files bigger. Thus, type units could not be used for reducing object files.

Mainly I am talking about two things:

1 -fdebug-types-section allows processing debug info faster.

dsymutil/DWARFLinker performance heavily depends on the size of input DWARF.
The more DWARF to parse, the more time the linking process would take.
Time to parse whole DWARF with ODR type deduplication(+gc-debuginfo +NoOdr=false) for clang is:

296%(145s)

Resulting DWARF is accurate: no type duplications, no extra bytes needed for type units. But it's processing time: 145s.

Time to parse in case -fdebug-types-section for clang is:
(most of types are in .debug_types section and ignored currently, thus
only .debug_info/.debug_line/.debug_ranges/.debug_loc are parsed)

133%(65s)

This difference is because not necessary to parse type information.
Comdat sections are quickly resolved by the linker based on section id.
Comparing hash id instead of analyzing context makes it work much faster.

So separating type info from the whole DWARF allows to noticeable speedup the debug info linking.
DWARFLinker will speedup debug info processing in the current scheme by supporting type units(-fdebug-types-section).
It looks like, linking time would be close to 133%(65s).

2 Something like "bag of dies" could avoid increasing size of data(created by type units).

This idea is not for the current state. This is a possible DWARF format change.

The linker processes comdat sections with type units quickly because it does not parse contents.
It uses the hash id to compare and either keep either decline the section.
Adding hash id makes processing faster, but it requires type units.

To reduce size of type units it is possible to implement "bag of dies" idea.
Something like this https://reviews.llvm.org/P8164 (global type table/bag of dies - means something similar).
But how "bag of dies" from different object files would be merged ? When single type is put in single
section - it could be merged per type basis. The similar approach for "bag of dies"
would introduce type duplication(the same types could be used in various "bag of dies").

DWARFLinker could be used for merging "bag of dies" instead of lld linker.
It would merge incoming "bag of dies" into single output "bag of dies".
To avoid heavy parsing and context analyzing DWARFLinker
could use solution similar to comdat solution - whether current die should be put into
resulting "bag of dies" would be decided by compare hash id(DW_AT_signature).

It looks like this solution would allow having a minimal size and fast processing.

Bag of DWARF doesn't seem like it'd help in a situation where you know you're going to use a DWARF-aware linker (just put all the DWARF in the CU as normal & know the DWARF-aware linker will introduce cross-CU references as needed as things are deduplicated). The idea for Bag of DWARF was with a DWARF agnostic linker you could reference more than one DIE from in a type unit (or a compile unit) to reduce a bunch of the duplication/overhead that type units introduce (eg: when defining a member function of a type unit, you currently have to duplicate the member function's declaration in the CU that references the TU, because you can't refer to that subprogram declaration DIE in the TU - only the type itself).

right, DWARF-aware linker doing de-duplication using ODR does not need "Bag of DWARF".
But doing de-duplication using ODR works slower than doing de-duplication for types marked with hash.
To add hash to the type it is necessary to put it into "Bag of DWARF".

It seems that Bag of DWARF used with DWARF agnostic linker would have a problem of type duplication.
"Bag of DWARF" from several object files could have duplicated types.
I think DWARF-aware linker is necessary to deduplicate "Bags of DWARF".

We could move to make .debug_types more useful. Right now a type unit has a single type signature that people can refer to and a single offset within the type unit for the type that people will extract. If we got a DWARF specification change in, we could make it so a single type unit has N signatures and N offsets to types within the type unit. This would allow people to directly reference contained types that are not class, structs or unions, like typedefs. But that requires a DWARF spec change.

yes. that is exactly what I am talking about. using type hash with elements of that type unit(bag of dies) would allow to speedup types de-duplication.

Rather than replying point-by-point, since I think the conversation has got a bit jumbled up - and maybe it's all best left to a separate thread than in this review... but:

The initial idea for bag-of-dwarf was to allow a type unit to expose more than one DIE (so CUs could reference nested types, member functions, parameters, etc). Also potentially to let CUs export entities that are known to have one home & that other DWARF might want to reference (this is a bit more of a stretch - but you could imagine a type with a strong vtable, no point putting it in a type unit, so you put it in the CU - but you could, if you wanted, still expose the type and its member functions as referenceable DIEs that could be referenced from other CUs rather than by having to redeclare (though not define) the entity in that other CU - inter-object-file-DIE references).

I see your point somewhat about how type units could integrate with DWARF-aware linking, but perhaps only in a very specific way - At least as LLVM emits type units, the types should always be identical (the type unit should never contain implicit special members or member function template instantiations, etc) & the variable component of a type (those implicit special members, etc) only ever appear on the declaration of the type that references to the type unit. If you knew that was the kind of input you were getting, you could rely on every type unit being identical (well, with GCC you can rely on each type unit being identical, but the same ODR type might be in multiple distinct type units because it uses exact hashing of the type unit contents, and the contents varies with implicit special members, etc)... Anyway, if you could rely on the type units for a given ODR type being identical or at least equivalent (no implicit special members, etc) - you could just use the first one you saw, drop the rest, and any new members you see can just be attached to a type declaration the way Clang does them today... hmm - I guess that'd work with GCC's output too, just ignoring any duplicate type units even though they aren't quite equivalent - they'd be no worse than Clang's type units. (Clang's never contain any of the special members, GCC's contain the used one - so if your type unit ended up being an arbitrarily chosen one from GCC, it wouldn't be any worse than CLang's - it'd just have some, but not all, the special members, member function templates, etc)

Hmm, I guess no reason you couldn't do this without type units though - maybe - depending on how debuggers handle it. Maybe they aren't good with member function declarations in type declarations if those type declarations aren't referencing type units....

eg: existing debug info with type units can look something like this:

DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.1
    DW_AT_name "t1"
    DW_AT_signature 0x42
    DW_AT_declaration true
  DW_TAG_variable
    DW_AT_type // -> DIE t1.1
DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.2
    DW_AT_name "t1"
    DW_AT_signature 0x42
    DW_AT_declaration true
    DW_TAG_subprogram // DIE: t1::t1
      DW_AT_name "t1"
      DW_AT_declaration
      DW_AT_artificial true
  DW_TAG_subprogram
    DW_AT_specification // -> DIE t1::t1
    // etc... 
  DW_TAG_variable
    DW_AT_type // -> DIE t1.2

type unit, signature 0x42, type offset -> DIE t1.tu
DW_TAG_type_unit
  DW_TAG_structure_type // DIE: t1.tu
    DW_AT_name "t1"
    // line/file/column/size/etc... 
    // let's say there are no explicit members, for the sake of simplicity


Then, without type units you could emit something very similar
DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.1
    DW_AT_name "t1"
    // line/file/column/size/etc... 
  DW_TAG_variable
    DW_AT_type // -> DIE t1.1
DW_TAG_compile_unit
  DW_TAG_structure_type // DIE: t1.2
    DW_AT_name "t1"
    DW_AT_signature 0x42
    DW_AT_declaration true
    DW_TAG_subprogram // DIE: t1::t1
      DW_AT_name "t1"
      DW_AT_declaration
      DW_AT_artificial true
  DW_TAG_subprogram
    DW_AT_specification // -> DIE t1::t1
    // etc... 
  DW_TAG_variable
    DW_AT_type // -> DIE t1.2

ie: don't go back and try to make t1.1 complete/merge all the contents, just tack things on to CU-local declarations whenever you come across a new member that has a definition in this translation unit. (ie: when the linker keeps the code for t1::t1 alive, then keep the subprogram definition alive, which keeps the subprogram declaration alive, which keeps the t1.2 declaration alive - but otherwise skip it and refer to t1.1 directly (or, in a first approximation, just always keep the declaration - but I guess dsymutil already has the smarts to strip the local type definition out entirely (does it use a declaration to reduce encoding length if there's lots of references to a type? Probably not, I guess))

This sort of complex design discussion probably merits some in person or video chat discussions or at least lengthy design discussion on llvm-dev.

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

avl added a comment.Mon, Apr 6, 4:25 AM

@dblaikie

This sort of complex design discussion probably merits some in person or video chat discussions or at least lengthy design discussion on llvm-dev.

Agreed. I think it would be better to dicuss it through email and extend it to llvm-dev later. I would start that email thread.

avl added a comment.Mon, Apr 6, 4:26 AM

One drive-by request/question: have you done any measurements of the impact on debuggers? In other words, does it improve/make worse the time it takes them to load the debug data?

no, I have not done yet. I would do and post results here.

grimar added a comment.Mon, Apr 6, 4:46 AM

I've put a few first general comments and suggestions inline.

side note: I've noticed you have a build time 75m15s, what looks to me too much for your configuration. Do you know about "-DLLVM_OPTIMIZED_TABLEGEN=1"? It should give an instant significant boost.

lld/ELF/DWARF.cpp
35 ↗(On Diff #254852)

Doesn't seem you really need to know the number of sections?
Hence you can use DenseSet<StringRef> seen; (btw, 'seen' is the common name for sets used in a few similar situations in LLD code).

63 ↗(On Diff #254852)

sec.Name.size() > 0 -> !sec.empty()

161 ↗(On Diff #254852)

Comments should be started from an uppercase letter: check -> Check.
(applies here and for other comments)

166 ↗(On Diff #254852)

Seems this assumes that relocations are sorted by r_offset? This is probably a wrong assumpting.
We have the code, for example, which uses the same thing (and perhaps could be reused somehow, not sure):

// Returns the index of the first relocation that points to a region between
// Begin and Begin+Size.
template <class IntTy, class RelTy>
static unsigned getReloc(IntTy begin, IntTy size, const ArrayRef<RelTy> &rels,
                         unsigned &relocI) {
  // Start search from RelocI for fast access. That works because the
  // relocations are sorted in .eh_frame.
  for (unsigned n = rels.size(); relocI < n; ++relocI) {
    const RelTy &rel = rels[relocI];
    if (rel.r_offset < begin)
      continue;

But it says that relocations are sorted in .eh_frame and I believe it is true,
but generally there is no requirement in ELF specification about that relocations must be sorted by offset I think.

183 ↗(On Diff #254852)

Why exactly R_ABS/is it important? (Perhaps needs a comment).

lld/ELF/DWARF.h
84 ↗(On Diff #254852)

Where it is used? I\ve not found it in this patch.

122 ↗(On Diff #254852)

Is it used somewhere outside the template <class ELFT> LLDDwarfObj<ELFT>::LLDDwarfObj(ObjFile<ELFT> *obj) ?
Can it be a local variable?

lld/ELF/Driver.cpp
347

You do not need to have curly bracers around a single code line (the rule to look around to see the style often works good).

lld/ELF/InputSection.h
127 ↗(On Diff #254852)

Why it is needed to store this bit? The code calls isDebugSection, is there something wrong with the current approach?

lld/ELF/LLDDwarfLinker.cpp
59

It should be InputSectionBase.

60

I'd suggest something shorter, like secToIndex

65

Prefer ++foo over foo++.

66

btw, what is so wrong with the following version?

for (InputSectionBase *sec : objFile->getSections())
  sectionIndexesMap[sec] = curIdx++;

(i.e. I probably see no issue with the case when sec == nullptr, as the code below handes this case anyways).

75

Continue early?

if (!sec || !sec->isLive() || !sectionIndexesMap.count(sec))
  continue;
104

We often call such things ret. Up to you.

141

This could be a single line, no need toadd an addend later.

144

Why this assert is useful? What scenario did you have in mind?

146

I think you can just inline Value.

194

An excessive empty line.

195

This comment needs expanding I think. "Estimation" is something that usually sounds tricky and requires an explanation of your algorithm implemented.

197

Don't use auto when the type is not obvious.

200

No need to have curly bracers here.

231

foo.size() > 0 -> !foo.empty() when foo is StringRef

257

emptyWarnings is unused. Is it OK? You can inline it if so.

267

Map what to what?

272

The way you work with Expected<> is incorrect. It might fail on error with LLVM_ENABLE_ABI_BREAKING_CHECKS=1.
You need to handle the error or to ignore it explicitly. E.g.:

Expected<GlobPattern> pat = GlobPattern::create(arg);
if (!pat) {
  error("--undefined-glob: " + toString(pat.takeError()));
  return;
}
284

Don't use auto here.

293

StringRef() -> ""

But I think it can be just:

for (auto &sec : outputSections)
  if (isDebugSection(sec->name))
    sec->setDebugData(sectionsMap.lookup(sec->name));
lld/ELF/Options.td
188

When you add an LLD option, you should also update the documentation (lld/docs/ld.lld.1)

lld/ELF/OutputSections.cpp
314 ↗(On Diff #254852)

Needs a comment.

lld/ELF/OutputSections.h
129 ↗(On Diff #254852)

Did you consider to use llvm::Optional<StringRef>?

llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
456 ↗(On Diff #254852)

You do not need to use else after return:

if (Map.getTriple().isOSDarwin() && !Map.getBinaryPath().empty() &&
    Options.FileType == OutputFileType::Object)
  return MachOUtils::generateDsymCompanion(
      Map, Options.Translator, *Streamer->getAsmPrinter().OutStreamer,
      OutFile);
return Streamer->finish(), true;
457 ↗(On Diff #254852)

Something is wrong with this line. , true is a part of something else it seems.

avl marked 32 inline comments as done.Mon, Apr 6, 10:33 AM
avl added inline comments.
lld/ELF/DWARF.cpp
35 ↗(On Diff #254852)

There should be three states here(seen, not seen, seen more than once). DWARFFormValue::dumpAddressSection() needs this to properly dump die address. If name seen more than once it prints section index.

// Print section index if name is not unique.
if (!SecRef.IsNameUnique)
  OS << format(" [%" PRIu64 "]", SectionIndex);
166 ↗(On Diff #254852)

right. It assumes that relocations are sorted by r_offset.
It is sorted in scanRelocs(). But I missed that it is sorted not for all types of sections.
Would fix it , Thanks!

lld/ELF/DWARF.h
84 ↗(On Diff #254852)

It is used at DWARFFormValue::dumpAddressSection(). When address is dumped, DWARFObject could be asked for section names(LLDDwarfObj is a child of DWARFObject). It could happen when Verbose mode is set for DWARFLinker.

122 ↗(On Diff #254852)

Yes, it could be used outside of LLDDwarfObj() constructor. When DWARFObject::getSectionNames() is called the sectionNames is used.
One such place is DWARFFormValue::dumpAddressSection().

lld/ELF/InputSection.h
127 ↗(On Diff #254852)

To not do string compare in every usage. i.e. that bit is set in the constructor and at every usage place is checked. Otherwise, it would be necessary to compare strings at every usage place.

lld/ELF/LLDDwarfLinker.cpp
66

The idea was to not put error data into sectionIndexesMap. sec is indeed checked later for nullptr.
But it looks like not putting wrong data is better approach than relying on following checks.

144

To check that relocation passed to the handler is from baseOffset, baseOffset + data.size() range.
i.e. to check that enumerateRelocations works correctly and pass proper relocations to the handler.

257

it is used only in std::make_unique<DwarfFile>.

If emptyWarnings is used inline then it`s constructor would be called for each ObjectFile. Probably we could rely on compiler here...

267

Would add comment.

grimar added a comment.Tue, Apr 7, 3:20 AM

Few more comments. They are mostly to give a feedback about the code style, I have not look deep into how it works yet.

lld/ELF/DWARF.cpp
29 ↗(On Diff #254852)

I'd call it sectionNumber, having Map in the name here is not useful.

32 ↗(On Diff #254852)

To avoid nesting, you can use an early continue:

if (!sec) {
  sectionNames.push_back({"", true});
  continue;
}
...
35 ↗(On Diff #254852)

Ah, sorry, I've misread the code slightly, I did not notice the codition is sectionAmountMap[sec.Name] > 1 and not sectionAmountMap[sec.Name] > 0.

63 ↗(On Diff #254852)

It is better to avoid using operator[] in situations like this, because
it is not const, when a sec.Name is not found it creates a new map entry. This is not what you want here,
you can use lookup().

lld/ELF/InputSection.h
127 ↗(On Diff #254852)

Yeah, but did you measure numbers, i.e. is there a visible benefit? The code in this diff still calls isDebugSection in many places. It looks like a premature optimization to me and/or something that could probably be moved to another patch (which could change all call sites at once).

lld/ELF/LLDDwarfLinker.cpp
207

I think the returned type of this lambda can be deduced:

[](StringRef S) { return S; }

219

This 2 lambdas looks similar. Also you have 2 more below.
It seems you could do something like the following?

auto ReportWarn = [] [&](const Twine &msg, StringRef ctx, const DWARFDie *) {
if (!context.empty())
  warn(ctx+ ": " + msg);
else
  warn(msg);
};

DwarfStreamer outStreamer(..., ReportWarn, ReportWarn);
257

I'd inline to make the code shorter/cleaner. It doesn't look as hot path, so no need to optimize it until a benefit is proven.