This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Fix PR51087 Extraneous enum record in DWARF with type units
AbandonedPublic

Authored by Orlando on Dec 8 2021, 1:59 AM.

Details

Summary

Fixes https://llvm.org/PR51087: Extraneous enum record in DWARF with type units.

As explained in PR51087 we sometimes get skeleton DIEs for enums in a Dwarf
Compile Unit (CU) that are not referenced from any CU and are already described
by a type unit.

Types for enums are emitted whether used or not, all together before most types
in the CU. Mechanically, the extraneous CU records are generated because the
enum types are generated with a call to CU->getOrCreateTypeDIE. This function
will recursively get-or-create the parent DIE (in the CU) and the type unit for
each. We don't need the CU-side DIEs if the type units are sucesfully
emitted. Fix by only emitting the type units for enums if possible, falling back
to a call to getOrCreateTypeDIE if not.

Diff Detail

Unit TestsFailed

Event Timeline

Orlando created this revision.Dec 8 2021, 1:59 AM
Orlando requested review of this revision.Dec 8 2021, 2:00 AM

More info about the fix:
I've moved most of the body of addDwarfTypeUnitType into a new function getOrCreateDwarfTypeUnit, which addDwarfTypeUnitType now wraps. getOrCreateDwarfTypeUnit only gets or creates a type unit, and addDwarfTypeUnitType maintains its previous behaviour which is: get or create a type unit and add the info to a DIE (passed in as a parameter) if successful, otherwise construct the type in that DIE directly.

With the patch we call getOrCreateDwarfTypeUnit for each enum, which generates a type unit without adding a skeleton DIE to the CU. If the call fails because a type unit cannot be emitted then we fall back to the old behaviour. If a CU DIE is needed later on it is created via getOrCreateTypeDIE (unchanged with the patch) and the (unchanged) call to addDwarfTypeUnitType will find the already-created type unit signature to add to the DIE.

Possible future improvement:
I think there could be some further work to reduce redundancy with the retained types (see the loop just below the getEnumTypes loop). Firstly, retained types are not actually referenced in the CU, so I don't think there is any point generating a DW_TAG_pointer_type DIE in the CU with skeleton DIEs referencing a type unit for a type. If a type unit is emitted for the pointed-to retained type we could probably skip the CU emission entirely? Secondly, applying the same same fix as for enums (try to emit only the type unit) could remove additional redundancy, more so when coupled with the previous point.

More info about the fix:
I've moved most of the body of addDwarfTypeUnitType into a new function getOrCreateDwarfTypeUnit, which addDwarfTypeUnitType now wraps. getOrCreateDwarfTypeUnit only gets or creates a type unit, and addDwarfTypeUnitType maintains its previous behaviour which is: get or create a type unit and add the info to a DIE (passed in as a parameter) if successful, otherwise construct the type in that DIE directly.

With the patch we call getOrCreateDwarfTypeUnit for each enum, which generates a type unit without adding a skeleton DIE to the CU. If the call fails because a type unit cannot be emitted then we fall back to the old behaviour. If a CU DIE is needed later on it is created via getOrCreateTypeDIE (unchanged with the patch) and the (unchanged) call to addDwarfTypeUnitType will find the already-created type unit signature to add to the DIE.

Possible future improvement:
I think there could be some further work to reduce redundancy with the retained types (see the loop just below the getEnumTypes loop). Firstly, retained types are not actually referenced in the CU, so I don't think there is any point generating a DW_TAG_pointer_type DIE in the CU with skeleton DIEs referencing a type unit for a type. If a type unit is emitted for the pointed-to retained type we could probably skip the CU emission entirely? Secondly, applying the same same fix as for enums (try to emit only the type unit) could remove additional redundancy, more so when coupled with the previous point.

under what conditions do we get this redundant pointer type in the CU you're referring to here?

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1452–1465

Should we do the same thing for non-enum types too?

Thanks for taking a look at this.

under what conditions do we get this redundant pointer type in the CU you're referring to here?

Here's an example with target x86_64-unknown-linux-gnu using clang built at 51dc466642c5.

$ cat test.cpp
struct O { struct S { int x; } y; };
int f(void *p) {
  return static_cast<O::S*>(p)->x;
}

$ clang++ test.cpp -c -gdwarf-5 -fdebug-types-section -o - | llvm-dwarfdump - -o-
<snip>
0x0000003f:   DW_TAG_pointer_type
                DW_AT_type	(0x0000004d "O::S")

0x00000044:   DW_TAG_structure_type
                DW_AT_declaration	(true)
                DW_AT_signature	(0x9ff533511b48ced2)

0x0000004d:     DW_TAG_structure_type
                  DW_AT_declaration	(true)
                  DW_AT_signature	(0x7b36f109e857d200)
<snip>

DIE 0x0000003f is not referenced in the CU. The CU DIE is generated because it's a "retained type" (in CUNode->getRetainedTypes()). It's still emitted if even when we apply the fix in this patch to retained types because DW_TAG_pointer_type types don't get type units (in DwarfUnit::createTypeDIE(const DIEScope*, DIE&, const DIType*). So, my thinking was that consumers probably don't need the DW_TAG_pointer_type DIE to interpret pointers of the derived type; as long as we have the derived type DIE (in a type unit) then that should be enough? I don't have 100% confidence in that assumption though - what do you think?

I'm not sure how often this actually comes up in practice - I only noticed it when writing a test for applying this change to "retained types" too (the part that you have also highlighted with an inline comment). I am not sure of all the ways that a type becomes a "retained type"; the source above is inspired by llvm/test/DebugInfo/COFF/retained-types.ll. I decided to leave the retained types alone in this patch because I felt I didn't fully understand what causes them, what they look like "on average", and how many useless DIEs we really emit because of them. i.e. if retained types are always or mostly pointers then we'd need to apply the change described above (if it's possible) first to see any wins. Do you have any insight on this? Otherwise I can look into this further.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1452–1465

I've responded to this in the main comment thread - SGTM but I have some questions about retained types (which I'm happy to investigate).

dblaikie accepted this revision.Dec 13 2021, 4:29 PM

Thanks for taking a look at this.

under what conditions do we get this redundant pointer type in the CU you're referring to here?

Here's an example with target x86_64-unknown-linux-gnu using clang built at 51dc466642c5.

$ cat test.cpp
struct O { struct S { int x; } y; };
int f(void *p) {
  return static_cast<O::S*>(p)->x;
}

$ clang++ test.cpp -c -gdwarf-5 -fdebug-types-section -o - | llvm-dwarfdump - -o-
<snip>
0x0000003f:   DW_TAG_pointer_type
                DW_AT_type	(0x0000004d "O::S")

0x00000044:   DW_TAG_structure_type
                DW_AT_declaration	(true)
                DW_AT_signature	(0x9ff533511b48ced2)

0x0000004d:     DW_TAG_structure_type
                  DW_AT_declaration	(true)
                  DW_AT_signature	(0x7b36f109e857d200)
<snip>

DIE 0x0000003f is not referenced in the CU. The CU DIE is generated because it's a "retained type" (in CUNode->getRetainedTypes()). It's still emitted if even when we apply the fix in this patch to retained types because DW_TAG_pointer_type types don't get type units (in DwarfUnit::createTypeDIE(const DIEScope*, DIE&, const DIType*). So, my thinking was that consumers probably don't need the DW_TAG_pointer_type DIE to interpret pointers of the derived type; as long as we have the derived type DIE (in a type unit) then that should be enough? I don't have 100% confidence in that assumption though - what do you think?

Yep, that sounds right to me.

So we retain types that we know are used in ways that aren't otherwise tied to the rest of the DWARF graph - if you think about DWARF nodes/DIEs as a reachability graph, starting from functions and variables. Those functions and variables exist in the llvm::Module and reference the DI* description of the function or variable - those DI* descriptions reference types, types reference other types, etc. But if there's never a function or variable that references a type it might be skipped entirely - but in some syntactic constructs/cases, it's important that it isn't skipped - one of the canonical examples of that is where the type is only ever used within an expression and never as the type of a used and named variable.

We probably should only put user defined types in the retained types list - but that would require decomposing non-user-defined types in the frontend (eg: if you have a "void (*)(T1, T2)" type retained - we only need to retain T1 and T2, not the function pointer type wrapped around it (it's reasonable to expect a DWARF consumer to be able to build any non-user-defined type - even ones that that never appeared in the source program (so maybe a user ends up needing "T1 (*)(T2)" in their debugging context, the DWARF consumer would be able to create that - there's no need for us to encode that type or any other, so long as we provide T1 and T2)

But I guess for now we could apply the same logic to enums and to retained types, with one minor change to use dyn_cast instead of cast, in the retained types version - so if it is a top level user defined type in the retained types list, it'd skip the skeleton type DIE in that case. In other cases it'd still end up with the unnecessary poinetr type, function type, whatever.

I'm not sure how often this actually comes up in practice - I only noticed it when writing a test for applying this change to "retained types" too (the part that you have also highlighted with an inline comment). I am not sure of all the ways that a type becomes a "retained type"; the source above is inspired by llvm/test/DebugInfo/COFF/retained-types.ll. I decided to leave the retained types alone in this patch because I felt I didn't fully understand what causes them, what they look like "on average", and how many useless DIEs we really emit because of them. i.e. if retained types are always or mostly pointers then we'd need to apply the change described above (if it's possible) first to see any wins. Do you have any insight on this? Otherwise I can look into this further.

This revision is now accepted and ready to land.Dec 13 2021, 4:29 PM
Orlando updated this revision to Diff 394522.EditedDec 15 2021, 5:49 AM

Thanks for that info on retained types.

We probably should only put user defined types in the retained types list - but that would require decomposing non-user-defined types in the frontend (eg: if you have a "void (*)(T1, T2)" type retained - we only need to retain T1 and T2, not the function pointer type wrapped around it (it's reasonable to expect a DWARF consumer to be able to build any non-user-defined type - even ones that that never appeared in the source program (so maybe a user ends up needing "T1 (*)(T2)" in their debugging context, the DWARF consumer would be able to create that - there's no need for us to encode that type or any other, so long as we provide T1 and T2)

Ok great, I'll file a bug for this shortly.

But I guess for now we could apply the same logic to enums and to retained types, with one minor change to use dyn_cast instead of cast, in the retained types version - so if it is a top level user defined type in the retained types list, it'd skip the skeleton type DIE in that case. In other cases it'd still end up with the unnecessary poinetr type, function type, whatever.

I made this change (does this still look okay to land?) and I wanted to add a test for the retained types too so I wrote this:

$ cat test.cpp
struct Int {
  int i;
  Int() : i(0) {}
  operator int() const { return i; }
};
int f() { return Int(); }

Struct Int is indeed a retained type but even with this latest patch Int gets a CU DIE:

0x0000003b:   DW_TAG_structure_type
                DW_AT_declaration	(true)
                DW_AT_signature	(0x9366bdce481c7390)

Which is only referenced from within its children, e.g. for the overloaded int conversion:

0x0000003b:   DW_TAG_structure_type
                DW_AT_declaration	(true)
                DW_AT_signature	(0x9366bdce481c7390)
<snip>
0x0000008c:   DW_TAG_subprogram
                DW_AT_low_pc	(0x0000000000000000)
                DW_AT_high_pc	(0x0000000000000010)
                DW_AT_frame_base	(DW_OP_reg6 RBP)
                DW_AT_object_pointer	(0x0000009c)
                DW_AT_specification	(0x0000004e "_ZNK3IntcviEv")

0x0000009c:     DW_TAG_formal_parameter
                  DW_AT_location	(DW_OP_fbreg -8)
                  DW_AT_name	("this")
                  DW_AT_type	(0x000000ab "const Int *") # <-- References Int through DW_TAG_pointer_type(DW_TAG_const_type(Int))
                  DW_AT_artificial	(true)
<snip>
0x00000087:   DW_TAG_const_type
                DW_AT_type	(0x0000003b "Int")
<snip>
0x000000ab:   DW_TAG_pointer_type
                DW_AT_type	(0x00000087 "const Int")

These children that are included in the CU (above) are also duplicated in the TU (i.e. we get a full DW_TAG_subprogram DIE in both places). This sounds like a bug, or at least that my implementation needs further work to cover all cases for retained types.

I couldn't generate a test for retained types from source by hand or reduction (from CTMark and clang-3.4) that showed this patch working in action.

I'm happy to land either version of this patch - if we go with it in its current state I can file a(nother) bug for improving the retained types situation.

EDIT: Fixed dwarfdump example.

Thanks for that info on retained types.

We probably should only put user defined types in the retained types list - but that would require decomposing non-user-defined types in the frontend (eg: if you have a "void (*)(T1, T2)" type retained - we only need to retain T1 and T2, not the function pointer type wrapped around it (it's reasonable to expect a DWARF consumer to be able to build any non-user-defined type - even ones that that never appeared in the source program (so maybe a user ends up needing "T1 (*)(T2)" in their debugging context, the DWARF consumer would be able to create that - there's no need for us to encode that type or any other, so long as we provide T1 and T2)

Ok great, I'll file a bug for this shortly.

But I guess for now we could apply the same logic to enums and to retained types, with one minor change to use dyn_cast instead of cast, in the retained types version - so if it is a top level user defined type in the retained types list, it'd skip the skeleton type DIE in that case. In other cases it'd still end up with the unnecessary poinetr type, function type, whatever.

I made this change (does this still look okay to land?) and I wanted to add a test for the retained types too so I wrote this:

$ cat test.cpp
struct Int {
  int i;
  Int() : i(0) {}
  operator int() const { return i; }
};
int f() { return Int(); }

Struct Int is indeed a retained type but even with this latest patch Int gets a CU DIE:

0x0000003b:   DW_TAG_structure_type
                DW_AT_declaration	(true)
                DW_AT_signature	(0x9366bdce481c7390)

Which is only referenced from within its children, e.g. for the overloaded int conversion:

0x0000003b:   DW_TAG_structure_type
                DW_AT_declaration	(true)
                DW_AT_signature	(0x9366bdce481c7390)
<snip>
0x0000008c:   DW_TAG_subprogram
                DW_AT_low_pc	(0x0000000000000000)
                DW_AT_high_pc	(0x0000000000000010)
                DW_AT_frame_base	(DW_OP_reg6 RBP)
                DW_AT_object_pointer	(0x0000009c)
                DW_AT_specification	(0x0000004e "_ZNK3IntcviEv")

0x0000009c:     DW_TAG_formal_parameter
                  DW_AT_location	(DW_OP_fbreg -8)
                  DW_AT_name	("this")
                  DW_AT_type	(0x000000ab "const Int *") # <-- References Int through DW_TAG_pointer_type(DW_TAG_const_type(Int))
                  DW_AT_artificial	(true)
<snip>
0x00000087:   DW_TAG_const_type
                DW_AT_type	(0x0000003b "Int")
<snip>
0x000000ab:   DW_TAG_pointer_type
                DW_AT_type	(0x00000087 "const Int")

These children that are included in the CU (above) are also duplicated in the TU (i.e. we get a full DW_TAG_subprogram DIE in both places). This sounds like a bug, or at least that my implementation needs further work to cover all cases for retained types.

I couldn't generate a test for retained types from source by hand or reduction (from CTMark and clang-3.4) that showed this patch working in action.

I'm happy to land either version of this patch - if we go with it in its current state I can file a(nother) bug for improving the retained types situation.

EDIT: Fixed dwarfdump example.

So in this example the result is unchanged despite the code change to the retained types handling? (or it does change, but not enough?)

Would a simpler struct (struct x { }; x y;) show a positive change (remove the skeleton type DIE from the CU)? That'd be enough of an incentive now. Further value might need follow-up patches.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1455–1461

This could be pulled into a function or lambda and reused across the enum and retained types handling? (either just using the dyn_cast version for both, or keeping the dyn_cast/cast distinction between the two and everything inside/after that could be shared - I'd probably just keep the dyn_cast version and call that from both places. It could return a bool to indicate whether the thing has been handled or not.)

Orlando updated this revision to Diff 394885.Dec 16 2021, 8:19 AM
Orlando marked an inline comment as done.

+ Factored shared code into createMaybeUnusedType
+ Added retained types test llvm/test/DebugInfo/X86/type-units-unused-type.mir

So in this example the result is unchanged despite the code change to the retained types handling? (or it does change, but not enough?)

Yeah that's right, the result is unchanged unfortunately.

Would a simpler struct (struct x { }; x y;) show a positive change (remove the skeleton type DIE from the CU)? That'd be enough of an incentive now. Further value might need follow-up patches.

I managed to get a very simple test case from source (just struct X {};) working in the end by compiling with -Xclang -debug-info-kind=unused-types.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1455–1461

I opted for putting everything inside the new function, including the call to getOrCreateTypeDIE because it felt more clear; having a third wrapper around addDwarfTypeUnitType (e.g. calling and returning value of getOrCreateDwarfTypeUnit else returning false) looked a bit odd. Does this look okay to you?

Orlando updated this revision to Diff 394889.Dec 16 2021, 8:29 AM

+ clang-format

dblaikie added inline comments.Dec 16 2021, 11:40 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1394

Drop the extra semicolon here

llvm/test/DebugInfo/X86/type-units-used-enum.mir
2

I think we'd usually test this with LLVM IR, rather than mir - bit simpler in some ways, at least? (similar feedback on the other tests)
Maybe could test all 3 cases in one test without it being too complicated? (process creation overhead can be significant on some platforms (Windows) so we tend away from making tests /too/ fine grained... which is pretty subjective, admittedly)

Orlando updated this revision to Diff 395011.Dec 16 2021, 3:02 PM
Orlando marked 2 inline comments as done.

+ Remove unnecessary semi-colon
+ Slightly adjust createMaybeUnusedType's comment
+ Merge the 3 tests into one

dblaikie accepted this revision.Dec 16 2021, 3:10 PM

Looks good, thanks!

llvm/test/DebugInfo/Generic/type-units-maybe-unused-types.ll
58 ↗(On Diff #395011)

Might be able to use --implicit-check-not=DW_AT_signature?

Thanks for the review @dblaikie

llvm/test/DebugInfo/Generic/type-units-maybe-unused-types.ll
58 ↗(On Diff #395011)

I started with that but it meant I had to have a load of CHECK: DW_AT_signature between the type unit headers. The approach I've gone with here ends up using fewer checks (CHECK-NOTs).

This revision was landed with ongoing or failed builds.Dec 17 2021, 2:11 AM
This revision was automatically updated to reflect the committed changes.

I appear to have broken a bot here https://lab.llvm.org/buildbot/#/builders/105/builds/18850, looking now.

Seems to have broken some ThinLTO+Split DWARF situation, as follows:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!18}

!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, emissionKind: FullDebug, enums: !2, nameTableKind: GNU)
!1 = !DIFile(filename: "x.cc", directory: "/proc/self/cwd")
!2 = !{!3}
!3 = distinct !DICompositeType(tag: DW_TAG_enumeration_type, name: "E", baseType: !8, size: 64, flags: DIFlagEnumClass, elements: !9, identifier: "_Z1E")
!8 = !DIBasicType(name: "unsigned long", size: 64, encoding: DW_ATE_unsigned)
!9 = !{}
!18 = !{i32 2, !"Debug Info Version", i32 3}

Built with:

llc test.ll -split-dwarf-file=out.dwo -generate-type-units

This will produce an invalid .debug_gnu_pubnames with 0 offset values (.long 0 # DIE offset), or, if you apply this patch (this patch isn't perfectly clean for the codebase just yet - I need to cleanup one other mistake - but for the purpose of testing this patch it'll give a good signal as to where things have gone wrong), it'll assert:

diff --git llvm/include/llvm/CodeGen/DIE.h llvm/include/llvm/CodeGen/DIE.h
index 9e94c401bfae..60bd52ee4032 100644
--- llvm/include/llvm/CodeGen/DIE.h
+++ llvm/include/llvm/CodeGen/DIE.h
@@ -774,7 +774,10 @@ public:
   unsigned getAbbrevNumber() const { return AbbrevNumber; }
   dwarf::Tag getTag() const { return Tag; }
   /// Get the compile/type unit relative offset of this DIE.
-  unsigned getOffset() const { return Offset; }
+  unsigned getOffset() const {
+    assert(Offset && "Offset being queried before it's been computed.");
+    return Offset;
+  }
   unsigned getSize() const { return Size; }
   bool hasChildren() const { return ForceChildren || !Children.empty(); }
   void setForceChildren(bool B) { ForceChildren = B; }
nikic added a subscriber: nikic.Dec 24 2021, 12:27 AM

FYI I deleted the left behind test file in https://github.com/llvm/llvm-project/commit/cb31a57104215c57e01e8e9bfe9e545ae0b6b18b -- maybe the revert got confused due to the rename in between.

FYI I deleted the left behind test file in https://github.com/llvm/llvm-project/commit/cb31a57104215c57e01e8e9bfe9e545ae0b6b18b -- maybe the revert got confused due to the rename in between.

oh, thanks so much, and sorry for the noise!

Thanks very much for the reproducer and revert @dblaikie.

IIUC it looks like this comment in addGlobalTypeUnitType in llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp hints at the issue (which was adding in your patch a0e3c751874f):

// Insert, allowing the entry to remain as-is if it's already present
// This way the CU-level type DIE is preferred over the "can't describe this
// type as a unit offset because it's not really in the CU at all, it's only
// in a type unit"

Prior to my patch, the first call to addGlobalTypeUnitType would always be with the CU DIE. With my patch that is not always true (and sometimes there is no CU DIE at all).

I'm not particularly familiar with pubnames/types, so I'm not sure which of the options I think we have is better/more appropriate:

  1. Always emit CU DIEs for unused type unit types when pubtypes will be emitted (i.e., if we are emitting a pubtypes section then this patch will have no effect).
  2. Or, don't emit a pubtype entry for a type that doesn't have a CU DIE (for example, with this patch applied, an unused enum which is described by a type unit).

Or perhaps something else. What do you think?

Thanks very much for the reproducer and revert @dblaikie.

IIUC it looks like this comment in addGlobalTypeUnitType in llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp hints at the issue (which was adding in your patch a0e3c751874f):

// Insert, allowing the entry to remain as-is if it's already present
// This way the CU-level type DIE is preferred over the "can't describe this
// type as a unit offset because it's not really in the CU at all, it's only
// in a type unit"

Prior to my patch, the first call to addGlobalTypeUnitType would always be with the CU DIE. With my patch that is not always true (and sometimes there is no CU DIE at all).

I'm not particularly familiar with pubnames/types, so I'm not sure which of the options I think we have is better/more appropriate:

  1. Always emit CU DIEs for unused type unit types when pubtypes will be emitted (i.e., if we are emitting a pubtypes section then this patch will have no effect).
  2. Or, don't emit a pubtype entry for a type that doesn't have a CU DIE (for example, with this patch applied, an unused enum which is described by a type unit).

Or perhaps something else. What do you think?

Hrm :/ I'm not super enthusiastic about either direction. I'm starting to wonder whether this general direction is viable - that some DWARF consumers might not find types that are in type units but aren't referenced from the CUs at all.

Do you have a measure of the debug info size improvement of this direction? If it doesn't end up counting for much, maybe it's not worth trying to create this possibly novel debug info?

Do you know/could you look to see if there's any case where GCC produces type units without skeleton types that reference them? If there are such cases, that'd be reassuring, and may also help inform how we should behave with pubnames etc.

& have you tested this sort of DWARF (TUs without skeleton type references) with GDB and LLDB, do they seem to find the types well?

Hrm :/ I'm not super enthusiastic about either direction. I'm starting to wonder whether this general direction is viable - that some DWARF consumers might not find types that are in type units but aren't referenced from the CUs at all.

Do you have a measure of the debug info size improvement of this direction? If it doesn't end up counting for much, maybe it's not worth trying to create this possibly novel debug info?

For non-pathological cases the saving appears to be negligible. The CTMark projects all have total debug section size savings of less than 0.1% with this patch applied when built with -fdebug-types-section -gdwarf-5 in RelWithDebInfo configuration. Unfortunately, I don't actually have access to the pathological source that prompted this patch. We (@jmorse) only have a binary built with clang with what looks like a load of redundant DIEs in it and I don't think it's likely that we'll be able to get our hands on that source.

Do you know/could you look to see if there's any case where GCC produces type units without skeleton types that reference them? If there are such cases, that'd be reassuring, and may also help inform how we should behave with pubnames etc.

I've just had a look with gcc 7.5 (because it's what I have installed; I can spend more time getting/building a recent version?). Interestingly, GCC 7.5 doesn't produce CU skeleton DIEs at all when type units are enabled, even when types are used in the CU.

Taking some source from my test in this patch, you can see that the TU for Ex::Enum (a type that is used) is referenced directly by the DW_AT_type by its signature:

$ cat test.cpp
struct Ex { enum Enum { X }; };
void fun() { Ex::Enum local; }

$ gcc -c -O2 -g -fdebug-types-section -gdwarf-5  test.cpp -o test_gcc
$ llvm-dwarfdump -v test_gcc --name local
test_gcc:	file format elf64-x86-64

0x00000052: DW_TAG_variable [11]   (0x00000035)
              DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000028] = "local")
              DW_AT_decl_file [DW_FORM_data1]	("/home/och/dev/bugs/scratch/test.cpp")
              DW_AT_decl_line [DW_FORM_data1]	(2)
              DW_AT_type [DW_FORM_ref_sig8]	(0x22f6e0a3cab4ab3c)

So, gcc appears to omit the CU skeleton DIE entirely when type units are used. That said, llvm-dwarfdump seems unhappy when we throw pubnames into the mix, but I don't know if this is gcc bug or not (see "error" in the output below):

$ gcc -c -O2 -g -fdebug-types-section -gdwarf-5  test.cpp -o test_gcc_pubnames -gpubnames
$ llvm-dwarfdump -v test_gcc_pubnames
...
.debug_pubnames contents:
length = 0x00000016, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000063
Offset     Name
0x00000035 "fun"

.debug_pubtypes contents:
error: name lookup table at offset 0x0 has a terminator at offset 0x1f before the expected end at 0x26
length = 0x00000026, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000063
Offset     Name
0x0000002e "unsigned int"

& have you tested this sort of DWARF (TUs without skeleton type references) with GDB and LLDB, do they seem to find the types well?

Just checked and LLDB (trunk) appears to be able to find types that are only described by TUs (not referenced in the CU at all). i.e. AFAICT it can handle the output of this patch. I don't have a recent GDB installed (the version I have doesn't seem to be able to handle type units at all).

Looking at the DWARFv5 spec while trying to get my head round pubnames, I see that there's a v5 specific section that combines pubnames and pubtypes called debug_names. Sorry if this is a silly question - this part of DWARF is new to me - does clang support emitting debug_names?

Looking at the DWARFv5 spec while trying to get my head round pubnames, I see that there's a v5 specific section that combines pubnames and pubtypes called debug_names. Sorry if this is a silly question - this part of DWARF is new to me - does clang support emitting debug_names?

Yes. I think if you use -gpubnames you get it even with gdb tuning.

Hrm :/ I'm not super enthusiastic about either direction. I'm starting to wonder whether this general direction is viable - that some DWARF consumers might not find types that are in type units but aren't referenced from the CUs at all.

Do you have a measure of the debug info size improvement of this direction? If it doesn't end up counting for much, maybe it's not worth trying to create this possibly novel debug info?

For non-pathological cases the saving appears to be negligible. The CTMark projects all have total debug section size savings of less than 0.1% with this patch applied when built with -fdebug-types-section -gdwarf-5 in RelWithDebInfo configuration. Unfortunately, I don't actually have access to the pathological source that prompted this patch. We (@jmorse) only have a binary built with clang with what looks like a load of redundant DIEs in it and I don't think it's likely that we'll be able to get our hands on that source.

Ah, /perhaps/ from the DWARF you might be able to hypothesize about the nature of the entities & why the code might've produced DWARF like that.

Do you know/could you look to see if there's any case where GCC produces type units without skeleton types that reference them? If there are such cases, that'd be reassuring, and may also help inform how we should behave with pubnames etc.

I've just had a look with gcc 7.5 (because it's what I have installed; I can spend more time getting/building a recent version?). Interestingly, GCC 7.5 doesn't produce CU skeleton DIEs at all when type units are enabled, even when types are used in the CU.

Taking some source from my test in this patch, you can see that the TU for Ex::Enum (a type that is used) is referenced directly by the DW_AT_type by its signature:

$ cat test.cpp
struct Ex { enum Enum { X }; };
void fun() { Ex::Enum local; }

$ gcc -c -O2 -g -fdebug-types-section -gdwarf-5  test.cpp -o test_gcc
$ llvm-dwarfdump -v test_gcc --name local
test_gcc:	file format elf64-x86-64

0x00000052: DW_TAG_variable [11]   (0x00000035)
              DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000028] = "local")
              DW_AT_decl_file [DW_FORM_data1]	("/home/och/dev/bugs/scratch/test.cpp")
              DW_AT_decl_line [DW_FORM_data1]	(2)
              DW_AT_type [DW_FORM_ref_sig8]	(0x22f6e0a3cab4ab3c)

So, gcc appears to omit the CU skeleton DIE entirely when type units are used.

Ah, sorry, there are several ways GCC emits references to type units that I know of (referenced once, it works like you observed, DW_FORM_ref_sig8 on the DW_AT_type - when referenced more than once, a simple type DIE at the CU level, when members need to be emitted (eg: a member function definition) then the type skeleton goes in the correct namespace too), but I'm not sure if GCC ever produces a type description that is otherwise unreferenced from other parts of DWARF...
(just for the record, here's an example that shows the 3 ways I know of that GCC will emit a reference to a type unit)

namespace ns {
struct t1 { void f1(); };
}
using namespace ns;
#ifdef SINGLE
t1 v1;
#elif DOUBLE
t1 v1;
t1 v2;
#elif MEMBER
void t1::f1() { 
}
#endif

That said, llvm-dwarfdump seems unhappy when we throw pubnames into the mix, but I don't know if this is gcc bug or not (see "error" in the output below):

$ gcc -c -O2 -g -fdebug-types-section -gdwarf-5  test.cpp -o test_gcc_pubnames -gpubnames
$ llvm-dwarfdump -v test_gcc_pubnames
...
.debug_pubnames contents:
length = 0x00000016, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000063
Offset     Name
0x00000035 "fun"

.debug_pubtypes contents:
error: name lookup table at offset 0x0 has a terminator at offset 0x1f before the expected end at 0x26
length = 0x00000026, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000063
Offset     Name
0x0000002e "unsigned int"

Ah, looks like maybe GCC produces a corrupt pubtypes table /except/ in the "MEMBER" case in the above example code. (in that case it produces a duplicate entry in pubtypes, but at least they're both correct). Given that LLVM /only/ produces the "MEMBER" style type unit reference, that seems OK for LLVM's pubnames support there. But doesn't explain how we should create pubnames for types that don't have skeleton references from a CU.

(I'm guessing what GCC does is it tries to emit a pubtype entry from some internal intermediate object that never actually got added to the CU (because it used one of the more terse representation choices, so didn't emit the full/"real" declaration), so it has empty values and corrupts the list)

& have you tested this sort of DWARF (TUs without skeleton type references) with GDB and LLDB, do they seem to find the types well?

Just checked and LLDB (trunk) appears to be able to find types that are only described by TUs (not referenced in the CU at all). i.e. AFAICT it can handle the output of this patch. I don't have a recent GDB installed (the version I have doesn't seem to be able to handle type units at all).

Looking at the DWARFv5 spec while trying to get my head round pubnames, I see that there's a v5 specific section that combines pubnames and pubtypes called debug_names. Sorry if this is a silly question - this part of DWARF is new to me - does clang support emitting debug_names?

Yeah, as @probinson mentioned, -gdwarf-5 -gpubnames should give you DWARFv5 .debug_names.

Hmm... maybe we disable pubnames when there are type units enabled? I hadn't remembered that, and not sure what that does for Split DWARF + type units... because I could've sworn pubnames were /necessary/ for Split DWARf, at least with GDB... so many layers.

No accelerator tables if type units are enabled, as DWARF v4 type units are not compatible with accelerator tables.

(according to `llvm/test/DebugInfo/X86/accel-tables.ll)

*more testing*

Hmm... maybe we disable pubnames when there are type units enabled? I hadn't remembered that, and not sure what that does for Split DWARF + type units... because I could've sworn pubnames were /necessary/ for Split DWARf, at least with GDB... so many layers.

No accelerator tables if type units are enabled, as DWARF v4 type units are not compatible with accelerator tables.

(according to `llvm/test/DebugInfo/X86/accel-tables.ll)

*more testing*

OK, so looks like only -ggnu-pubname / .debug_gnu_pubnames / .debug_gnu_pubtypes is supported with Type Units enabled. Though in theory/according to that comment ^ it should be OK to support debug_names with type units.

Sorry it's taken me a while to get back to you on this - thanks for all that info. Here is a summary of the situation as far as I understand it:

(without any version of these patches applied)

  1. When type units are enabled, types that get a TU also get a skeleton CU DIE.
  2. Some unused types are retained on purpose.
  3. Types that are unreferenced within a CU still get (skeleton) DIEs in the CU, even if type units enabled (what this patch is trying to "fix").

(using pubnames to refer to both pubnames and pubtypes)

  1. -gpubnames / .debug_pubnames is not compatible with type units.
  2. -ggnu-pubnames / .debug_gnu_pubnames is compatible with type units.
  3. .debug_names (DWARFv5) is compatible with type units.

Do you know what the deciding factor is in whether or not one of these kinds of tables is compatible with type units (4, 5, 6)? Naively, I assume that it is whether there provision for referencing a TU directly from the name table? i.e., .debug_gnu_pubnames and .debug_names can both directly reference a TU, but .debug_pubnames cannot. If my assumption is true, then I would think the answer to:

But doesn't explain how we should create pubnames for types that don't have skeleton references from a CU.

is that we only have to do something special for DWARF v4 name tables (.debug_pubnames). But this seems to contradict "No accelerator tables if type units are enabled, as DWARF v4 type units are not compatible with accelerator tables.", which seems to suggest that we don't emit DWARF v4 accelerator tables at all when type units are enabled (name tables == accelerator tables?). I think I am still missing part of this picture. On that note, I found .debug_pubnames and .debug_names documentation easily (v4 and v5 spec), but I'm struggling to find any for .debug_gnu_pubnames.

In any case, the final point in summary:

  1. The complexity we'd introduce in order to adhere to the various constraints makes fixing the issue probably not worth it.

So this should probably be abandoned, at least in it's current form.

Sorry it's taken me a while to get back to you on this - thanks for all that info. Here is a summary of the situation as far as I understand it:

(without any version of these patches applied)

  1. When type units are enabled, types that get a TU also get a skeleton CU DIE.

Not entirely true - if a type is referenced from another type unit, and not from a retained type, there's no skeleton DIE in the CU.

eg:

$ clang++-tot test.cpp -g -c -fdebug-types-section && llvm-dwarfdump-tot test.o -debug-info -debug-types 
test.o: file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000039, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000003d)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("clang version 14.0.0 (git@github.com:llvm/llvm-project.git 499703e9c08a055a9007278e5222b0550aba89bf)")
              DW_AT_language    (DW_LANG_C_plus_plus_14)
              DW_AT_name        ("test.cpp")
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")

0x0000001e:   DW_TAG_variable
                DW_AT_name      ("v1")
                DW_AT_type      (0x00000033 "t2")
                DW_AT_external  (true)
                DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
                DW_AT_decl_line (3)
                DW_AT_location  (DW_OP_addr 0x0)

0x00000033:   DW_TAG_structure_type
                DW_AT_declaration       (true)
                DW_AT_signature (0x4e6669a661523280)

0x0000003c:   NULL

.debug_types contents:
0x00000000: Type Unit: length = 0x0000003a, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08, name = 't2', type_signature = 0x4e6669a661523280, type_offset = 0x001e (next unit at 0x0000003e)

0x00000017: DW_TAG_type_unit
              DW_AT_language    (DW_LANG_C_plus_plus_14)
              DW_AT_stmt_list   (0x00000000)

0x0000001e:   DW_TAG_structure_type
                DW_AT_calling_convention        (DW_CC_pass_by_value)
                DW_AT_name      ("t2")
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("test.cpp")
                DW_AT_decl_line (2)

0x00000027:     DW_TAG_member
                  DW_AT_name    ("v1")
                  DW_AT_type    (0x00000034 "t1")
                  DW_AT_decl_file       ("test.cpp")
                  DW_AT_decl_line       (2)
                  DW_AT_data_member_location    (0x00)

0x00000033:     NULL

0x00000034:   DW_TAG_structure_type
                DW_AT_declaration       (true)
                DW_AT_signature (0xc6694e51369161f2)

0x0000003d:   NULL
0x00000000: Type Unit: length = 0x00000024, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08, name = 't1', type_signature = 0xc6694e51369161f2, type_offset = 0x001e (next unit at 0x00000028)

0x00000017: DW_TAG_type_unit
              DW_AT_language    (DW_LANG_C_plus_plus_14)
              DW_AT_stmt_list   (0x00000000)

0x0000001e:   DW_TAG_structure_type
                DW_AT_calling_convention        (DW_CC_pass_by_value)
                DW_AT_name      ("t1")
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("test.cpp")
                DW_AT_decl_line (1)

0x00000027:   NULL
  1. Some unused types are retained on purpose.

Yep.

  1. Types that are unreferenced within a CU still get (skeleton) DIEs in the CU, even if type units enabled (what this patch is trying to "fix").

Right.

(using pubnames to refer to both pubnames and pubtypes)

  1. -gpubnames / .debug_pubnames is not compatible with type units.

Doesn't seem to be correct:

$ clang++-tot test.cpp -g -c -fdebug-types-section -gpubnames && llvm-dwarfdump-tot test.o -debug-info -debug-types -debug-pubtypes
test.o: file format elf64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000039, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x0000003d)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("clang version 14.0.0 (git@github.com:llvm/llvm-project.git 499703e9c08a055a9007278e5222b0550aba89bf)")
              DW_AT_language    (DW_LANG_C_plus_plus_14)
              DW_AT_name        ("test.cpp")
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/usr/local/google/home/blaikie/dev/scratch")
              DW_AT_GNU_pubnames        (true)

0x0000001e:   DW_TAG_variable
                DW_AT_name      ("v1")
                DW_AT_type      (0x00000033 "t2")
                DW_AT_external  (true)
                DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
                DW_AT_decl_line (3)
                DW_AT_location  (DW_OP_addr 0x0)

0x00000033:   DW_TAG_structure_type
                DW_AT_declaration       (true)
                DW_AT_signature (0x4e6669a661523280)

0x0000003c:   NULL

.debug_types contents:
0x00000000: Type Unit: length = 0x0000003a, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08, name = 't2', type_signature = 0x4e6669a661523280, type_offset = 0x001e (next unit at 0x0000003e)

0x00000017: DW_TAG_type_unit
              DW_AT_language    (DW_LANG_C_plus_plus_14)
              DW_AT_stmt_list   (0x00000000)

0x0000001e:   DW_TAG_structure_type
                DW_AT_calling_convention        (DW_CC_pass_by_value)
                DW_AT_name      ("t2")
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("test.cpp")
                DW_AT_decl_line (2)

0x00000027:     DW_TAG_member
                  DW_AT_name    ("v1")
                  DW_AT_type    (0x00000034 "t1")
                  DW_AT_decl_file       ("test.cpp")
                  DW_AT_decl_line       (2)
                  DW_AT_data_member_location    (0x00)

0x00000033:     NULL

0x00000034:   DW_TAG_structure_type
                DW_AT_declaration       (true)
                DW_AT_signature (0xc6694e51369161f2)

0x0000003d:   NULL
0x00000000: Type Unit: length = 0x00000024, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08, name = 't1', type_signature = 0xc6694e51369161f2, type_offset = 0x001e (next unit at 0x00000028)

0x00000017: DW_TAG_type_unit
              DW_AT_language    (DW_LANG_C_plus_plus_14)
              DW_AT_stmt_list   (0x00000000)

0x0000001e:   DW_TAG_structure_type
                DW_AT_calling_convention        (DW_CC_pass_by_value)
                DW_AT_name      ("t1")
                DW_AT_byte_size (0x01)
                DW_AT_decl_file ("test.cpp")
                DW_AT_decl_line (1)

0x00000027:   NULL

.debug_pubtypes contents:
length = 0x0000001c, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x0000003d
Offset     Name
0x0000000b "t1"
0x00000033 "t2"

So "t2" pubtype entry references the offset of the skeleton DIE in the CU. "t1" references the CU DIE because there is no skeleton to reference. But "t1" is reachable from that CU (via "t2").

GCC's debug info uses the CU DIE offset for both "t1" and "t2", so that's promising that CU references are adequate.

  1. -ggnu-pubnames / .debug_gnu_pubnames is compatible with type units.

Yep, same as non-gnu pubnames/pubtypes here. (generally they have the same content - gnu_pub* came into existence because GCC couldn't rely on the contents of the standard pubnames/pubtypes, so they invented a separate name that could contain a more reliable list/agreement between GCC and GDB, and Clang got in on that at some point - but there's no need for Clang or GCC to produce anything different into non-gnu pubnames/pubtypes)

  1. .debug_names (DWARFv5) is compatible with type units.

In the spec yes, but not in LLVM yet. https://github.com/llvm/llvm-project/blob/964dc368e7c72ad5b7ab995c97920c4deb624631/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L335

Do you know what the deciding factor is in whether or not one of these kinds of tables is compatible with type units (4, 5, 6)? Naively, I assume that it is whether there provision for referencing a TU directly from the name table? i.e., .debug_gnu_pubnames and .debug_names can both directly reference a TU, but .debug_pubnames cannot.

Nah, pubnames/pubtypes (gnu and non-gnu) don't have any way to actually reference a type unit - they "support" type units by describing the CU that references the type. This is why it's not obvious how to fix the bug you're dealing with - emitting a type in a type unit that would be unreferenced (even indirectly) from any CU. There'd be no place to put an index entry that'd allow the DWARF Consumer to find that entity.

For accelerator tables (Apple and the standardized for in DWARFv5) - the Apple version was designed for Apple's situation, where they don't use type units (because they use dsymutil, a DWARF-aware linker - so there's no need to make types in linker-agnostic units, etc). But when it was standardized into debug_names it got support for type units in some form. That support looks more like what you're describing - the ability to reference different units, specifically to reference CUs and TUs by offset, but also to reference "foreign" TUs by hash (this works for referencingc type units in Split DWARF - since they have no skeleton unit (not talking about skeleton type DIEs here, but the skeleton units as part of Split DWARF) in the .o file and so can only be referenced from the .o/exe by type hash, not by unit offset).

If my assumption is true, then I would think the answer to:

But doesn't explain how we should create pubnames for types that don't have skeleton references from a CU.

is that we only have to do something special for DWARF v4 name tables (.debug_pubnames). But this seems to contradict "No accelerator tables if type units are enabled, as DWARF v4 type units are not compatible with accelerator tables.", which seems to suggest that we don't emit DWARF v4 accelerator tables at all when type units are enabled (name tables == accelerator tables?). I think I am still missing part of this picture. On that note, I found .debug_pubnames and .debug_names documentation easily (v4 and v5 spec), but I'm struggling to find any for .debug_gnu_pubnames.

In any case, the final point in summary:

  1. The complexity we'd introduce in order to adhere to the various constraints makes fixing the issue probably not worth it.

So this should probably be abandoned, at least in it's current form.

Possibly. If you have a really pathalogical case it may still be worth further discussion, but yeah, I don't know of a particularly easy path forward.

Orlando reopened this revision.Jan 18 2022, 2:28 AM

That clears up a lot of my confusion.

Possibly. If you have a really pathalogical case it may still be worth further discussion, but yeah, I don't know of a particularly easy path forward.

We're happy to drop this for now. I really appreciate all the help on this one, thanks.

This revision is now accepted and ready to land.Jan 18 2022, 2:28 AM
Orlando abandoned this revision.Jan 18 2022, 2:28 AM