This is an archive of the discontinued LLVM Phabricator instance.

[llvm-debuginfo-analyzer] Support both Reference and Type attrs in single DIE
ClosedPublic

Authored by scott.linder on May 16 2023, 1:20 PM.

Details

Summary

Relax the assumption that at most one Reference-or-Type-like attribute is
present on a DWARF DIE.

Add support for at most one Type attribute (i.e. DW_AT_import xor
DW_AT_type) and separately at most one Reference attribute (i.e.
DW_AT_specification xor DW_AT_abstract_origin xor ...).

Update comment describing old assumption and tag it as a "FIXME" to
reflect the fact that this is perhaps still not general enough.

Add a test based on the case which led me to encounter the bug in the
wild.

Diff Detail

Event Timeline

scott.linder created this revision.May 16 2023, 1:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2023, 1:20 PM
scott.linder requested review of this revision.May 16 2023, 1:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2023, 1:20 PM

@scott.linder Thanks for taking the time to create the patch.

llvm/lib/DebugInfo/LogicalView/Readers/LVELFReader.cpp
1131

In the case of a new offset the original code created:

[Offset, LVElementEntry(nullptr, {Element}) ]

The new code:

[Offset, LVElementEntry(nullptr, References{}, Types{}) ]

The Element parameter is not recorded.

(Cross-posting from the other thread, to keep everything in one place)

I did a little digging and it seems like there is a faulty assumption in LVELFReader:

// We are assuming that DW_AT_specification, DW_AT_abstract_origin,
// DW_AT_type and DW_AT_extension do not appear at the same time
// in the same DIE.

Thanks very much for your analysis and for creating a small reproducible test.

This seems true for GCC, but not for Clang, at least for the simplest reproducer I could create:

$ cat a.cpp
struct S {
    static const int Arr[];
};
const int S::Arr[] = {
    0, 1, 2
};
$ gcc -g -c a.cpp -o a.o
$ build/bin/llvm-dwarfdump --debug-info a.o | grep -B1 -A2 DW_AT_specification
0x00000053:   DW_TAG_variable
                DW_AT_specification     (0x00000028 "Arr")
                DW_AT_decl_line (4)
                DW_AT_decl_column       (0x0b)
$ clang++ -g -c a.cpp -o a.o
$ build/bin/llvm-dwarfdump --debug-info a.o | grep -B1 -A2 DW_AT_specification
0x0000001e:   DW_TAG_variable
                DW_AT_specification     (0x0000003e "Arr")
                DW_AT_type      (0x00000068 "const int[3]")
                DW_AT_location  (DW_OP_addr 0x0)

It seems like the code could be adapted to track these independently, rather than assume only one is present? I can prepare a patch if that sounds reasonable to you!

I am happy with your proposal and a patch.

Edit: This may also be a Clang bug. I haven't read the relevant DWARF sections with enough detail to say whether the DWARF itself is valid or not, but either way it seems nice to support it in llvm-debuginfo-analyzer, especially considering how much flexibility the tool already affords in terms of input

I compiled your test case with the latest Clang:

0x0000001e:   DW_TAG_variable
                     DW_AT_specification	(0x00000031 "Arr")
                     DW_AT_type	(0x00000052 "const int[3]")
                     DW_AT_location	(DW_OP_addrx 0x0)
                     DW_AT_linkage_name	("_ZN1S3ArrE")

0x00000031:     DW_TAG_member
                       DW_AT_name	("Arr")
                       DW_AT_type	(0x0000003a "const int[]")
                       DW_AT_decl_file	("/data/projects/sandbox/pr-62716.cpp")
                       DW_AT_decl_line	(2)
                       DW_AT_external	(true)
                       DW_AT_declaration	(true)

From the DWARF section for DW_AT_specification:

A debugging information entry with a DW_AT_specification attribute does not need to duplicate information provided by the debugging information entry referenced by that specification attribute.

Clang:

  • Added an extra entry for DW_AT_type which seems redundant in DIE (0x1e) as by following the DW_AT_specification we can get the type
  • The referenced types are pointing to different DIEs (0x52 and 0x3a).

I think in this case the duplication may be meaningful, or at least legal and intelligible, as the types at the declaration vs the definition are actually distinct, see https://en.cppreference.com/w/cpp/language/array#Arrays_of_unknown_bound:

extern int x[];      // the type of x is "array of unknown bound of int"
int a[] = {1, 2, 3}; // the type of a is "array of 3 int"

Or from the C++ spec directly, i.e. from 6.8.1 [basic.types.general] in https://github.com/cplusplus/draft/releases/tag/n4917 (emphasis mine):

(6) A class type (such as “class X”) can be incomplete at one point in a translation unit and complete later on;
the type “class X” is the same type at both points. The declared type of an array object can be an array of
incomplete class type and therefore incomplete; if the class type is completed later on in the translation unit,
the array type becomes complete; the array type at those two points is the same type. The declared type of
an array object can be an array of unknown bound and therefore be incomplete at one point in a translation
unit and complete later on; the array types at those two points (“array of unknown bound of T” and “array
of N T”) are different types.
The type of a pointer to array of unknown bound, or of a type defined by a
typedef declaration to be an array of unknown bound, cannot be completed.

I also think so long as the DWARF spec does not prohibit it, it makes sense for tooling to support it. Admittedly there is a lot of wiggle-room and ambiguity in much of the DWARF spec, but in this case the phrasing "does not need to duplicate" seems to heavily imply that the duplication is legal, just not necessary when the attributes match.

llvm/lib/DebugInfo/LogicalView/Readers/LVELFReader.cpp
1131

It is handled below now, as the change also removes the if/else: we use try_emplace to find the entry at Offset, creating an empty one if none exists. Then, we always add Element to the appropriate set.

So the previously explicitly special-cased Iter == ElementTable.end() case is all contained in this first line of the function.

(Cross-posting from the other thread, to keep everything in one place)

I did a little digging and it seems like there is a faulty assumption in LVELFReader:

// We are assuming that DW_AT_specification, DW_AT_abstract_origin,
// DW_AT_type and DW_AT_extension do not appear at the same time
// in the same DIE.

Thanks very much for your analysis and for creating a small reproducible test.

This seems true for GCC, but not for Clang, at least for the simplest reproducer I could create:

$ cat a.cpp
struct S {
    static const int Arr[];
};
const int S::Arr[] = {
    0, 1, 2
};
$ gcc -g -c a.cpp -o a.o
$ build/bin/llvm-dwarfdump --debug-info a.o | grep -B1 -A2 DW_AT_specification
0x00000053:   DW_TAG_variable
                DW_AT_specification     (0x00000028 "Arr")
                DW_AT_decl_line (4)
                DW_AT_decl_column       (0x0b)
$ clang++ -g -c a.cpp -o a.o
$ build/bin/llvm-dwarfdump --debug-info a.o | grep -B1 -A2 DW_AT_specification
0x0000001e:   DW_TAG_variable
                DW_AT_specification     (0x0000003e "Arr")
                DW_AT_type      (0x00000068 "const int[3]")
                DW_AT_location  (DW_OP_addr 0x0)

It seems like the code could be adapted to track these independently, rather than assume only one is present? I can prepare a patch if that sounds reasonable to you!

I am happy with your proposal and a patch.

Edit: This may also be a Clang bug. I haven't read the relevant DWARF sections with enough detail to say whether the DWARF itself is valid or not, but either way it seems nice to support it in llvm-debuginfo-analyzer, especially considering how much flexibility the tool already affords in terms of input

I compiled your test case with the latest Clang:

0x0000001e:   DW_TAG_variable
                     DW_AT_specification	(0x00000031 "Arr")
                     DW_AT_type	(0x00000052 "const int[3]")
                     DW_AT_location	(DW_OP_addrx 0x0)
                     DW_AT_linkage_name	("_ZN1S3ArrE")

0x00000031:     DW_TAG_member
                       DW_AT_name	("Arr")
                       DW_AT_type	(0x0000003a "const int[]")
                       DW_AT_decl_file	("/data/projects/sandbox/pr-62716.cpp")
                       DW_AT_decl_line	(2)
                       DW_AT_external	(true)
                       DW_AT_declaration	(true)

From the DWARF section for DW_AT_specification:

A debugging information entry with a DW_AT_specification attribute does not need to duplicate information provided by the debugging information entry referenced by that specification attribute.

Clang:

  • Added an extra entry for DW_AT_type which seems redundant in DIE (0x1e) as by following the DW_AT_specification we can get the type
  • The referenced types are pointing to different DIEs (0x52 and 0x3a).

I think in this case the duplication may be meaningful, or at least legal and intelligible, as the types at the declaration vs the definition are actually distinct, see https://en.cppreference.com/w/cpp/language/array#Arrays_of_unknown_bound:

extern int x[];      // the type of x is "array of unknown bound of int"
int a[] = {1, 2, 3}; // the type of a is "array of 3 int"

Or from the C++ spec directly, i.e. from 6.8.1 [basic.types.general] in https://github.com/cplusplus/draft/releases/tag/n4917 (emphasis mine):

(6) A class type (such as “class X”) can be incomplete at one point in a translation unit and complete later on;
the type “class X” is the same type at both points. The declared type of an array object can be an array of
incomplete class type and therefore incomplete; if the class type is completed later on in the translation unit,
the array type becomes complete; the array type at those two points is the same type. The declared type of
an array object can be an array of unknown bound and therefore be incomplete at one point in a translation
unit and complete later on; the array types at those two points (“array of unknown bound of T” and “array
of N T”) are different types.
The type of a pointer to array of unknown bound, or of a type defined by a
typedef declaration to be an array of unknown bound, cannot be completed.

I also think so long as the DWARF spec does not prohibit it, it makes sense for tooling to support it. Admittedly there is a lot of wiggle-room and ambiguity in much of the DWARF spec, but in this case the phrasing "does not need to duplicate" seems to heavily imply that the duplication is legal, just not necessary when the attributes match.

Thanks very much for the great links and explanations. The generation of both types now makes sense.

llvm/lib/DebugInfo/LogicalView/Readers/LVELFReader.cpp
1131

Sorry; I missed the Element assignment.
The change looks great.

CarlosAlbertoEnciso added a comment.EditedMay 18 2023, 2:02 AM

Compiling the test case with GCC and Clang, your patch produces:

llvm-debuginfo-analyzer --print=symbols --attribute=reference,producer dw-at-specification-clang.o dw-at-specification-gcc.o
Logical View:
           {File} 'dw-at-specification-clang.o'

             {CompileUnit} 'dw-at-specification.cpp'
               {Producer} 'clang version 17.0.0 ...'
               {Variable} 'Arr' -> 'const int [3]'
                 {Reference} @2
     1         {Struct} 'S'
     2           {Member} extern public 'Arr' -> 'const int [1]'

Logical View:
           {File} 'dw-at-specification-gcc.o'

             {CompileUnit} 'dw-at-specification.cpp'
                {Producer} 'GNU C++17 11.3.0 ...'
     1         {Struct} 'S'
     2           {Variable} extern 'Arr' -> 'const int [1]'
     4         {Variable} 'Arr' -> 'const const int [3]'
                 {Reference} @2

Looks good.

Looking at the DWARF produced by GCC (I have 11.3.0) it produces the same both types as Clang. I think it would be good to include a test for GCC.

This revision is now accepted and ready to land.May 18 2023, 2:23 AM
This revision was landed with ongoing or failed builds.May 23 2023, 1:52 PM
This revision was automatically updated to reflect the committed changes.