This is an archive of the discontinued LLVM Phabricator instance.

Debug Info: Support for DW_AT_export_symbols for anonymous structs
ClosedPublic

Authored by shafik on Aug 16 2019, 8:09 AM.

Details

Summary

This implements the DWARF 5 feature described in:

http://dwarfstd.org/ShowIssue.php?issue=141212.1

To support recognizing anonymous structs:

struct A { 
  struct { // Anonymous struct 
      int y;
  };  
} a;

This patch adds a new (DI)flag to LLVM metadata:

ExportSymbols

Currently there is an ambiguity between unnamed struct and anonymous structs in some cases, see D66175 which implements n incomplete work-around and explains the limitations in the code comments.

Diff Detail

Repository
rL LLVM

Event Timeline

shafik created this revision.Aug 16 2019, 8:09 AM
aprantl added inline comments.Aug 16 2019, 8:17 AM
include/llvm/IR/DebugInfoFlags.def
45 ↗(On Diff #215589)

The re-use of bit 16 is okay, as the comment says, this Apple-specific flag is not in use and there are not bitcode compatibility concerns.

include/llvm/IR/DebugInfoMetadata.h
671 ↗(On Diff #215589)

Like in DINamespace, this should be called getExportSymbols() (or rename both).

test/Assembler/export-symbol-anonymous-class.ll
31 ↗(On Diff #215589)

My guess is that you can reduce the entire test to

!named = !{!9}
!9 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !6, file: !3, line: 2, size: 32, flags: DIFlagExportSymbols | DIFlagTypePassByValue, elements: !10, identifier: "_ZTSN1AUt_E")

otoh, showing the flag in its natural context is also useful, so let's keep it this way.

36 ↗(On Diff #215589)

We can delete this, it distracts

39 ↗(On Diff #215589)

this,

40 ↗(On Diff #215589)

and this

shafik updated this revision to Diff 215619.Aug 16 2019, 9:14 AM
shafik marked 4 inline comments as done.

Deleting some lines from the test as requested.

I'm not sure I agree with the DWARF issue here - name handling is necessarily language-specific, and there's no ambiguity about what an anonymous struct means in C or C++, there's only one way to do name resolution correctly there, and that's by treating them as transparent.

So I'd say rather than requiring all producers to put DW_AT_export_symbols on every anonymous struct in C or C++, consumers should assume its presence (if they want to model this name lookup this way - of course they don't need to model this in terms of DWARF concepts at all) for C and C++.

I'm not sure I agree with the DWARF issue here - name handling is necessarily language-specific, and there's no ambiguity about what an anonymous struct means in C or C++, there's only one way to do name resolution correctly there, and that's by treating them as transparent.

So I'd say rather than requiring all producers to put DW_AT_export_symbols on every anonymous struct in C or C++, consumers should assume its presence (if they want to model this name lookup this way - of course they don't need to model this in terms of DWARF concepts at all) for C and C++.

While the link quoted is a DWARF issue, that issue did actually make it into DWARF 5, (cf. 5.7.1 line 28ff). In case you were already aware of this, are you suggesting we should ignore the DWARF spec in LLVM here?

I'm not sure I agree with the DWARF issue here - name handling is necessarily language-specific, and there's no ambiguity about what an anonymous struct means in C or C++, there's only one way to do name resolution correctly there, and that's by treating them as transparent.

So I'd say rather than requiring all producers to put DW_AT_export_symbols on every anonymous struct in C or C++, consumers should assume its presence (if they want to model this name lookup this way - of course they don't need to model this in terms of DWARF concepts at all) for C and C++.

While the link quoted is a DWARF issue, that issue did actually make it into DWARF 5, (cf. 5.7.1 line 28ff). In case you were already aware of this, are you suggesting we should ignore the DWARF spec in LLVM here?

Ah, thanks for pointing that out - I hadn't read.

Eh, I think it's still pretty unnecessary (& the DWARF spec is only suggestive, doesn't tend to have requirements in this regard) and so I'd probably suggest waiting until some consumer really wants this (& I'd still want to have a discussion with the consumer about why they find this to be needed). But it's really cheap (especially in DWARFv5 where it can use a const_value form & cost no bytes in debug_info (if/when that sort of thing is implemented in LLVM, if it isn't already).

Eh, I think it's still pretty unnecessary (& the DWARF spec is only suggestive, doesn't tend to have requirements in this regard) and so I'd probably suggest waiting until some consumer really wants this (& I'd still want to have a discussion with the consumer about why they find this to be needed). But it's really cheap (especially in DWARFv5 where it can use a const_value form & cost no bytes in debug_info (if/when that sort of thing is implemented in LLVM, if it isn't already).

@shafik, Can you outline the problem Clang has differentiating anonymous structs and lambdas that made this patch necessary?

Eh, I think it's still pretty unnecessary (& the DWARF spec is only suggestive, doesn't tend to have requirements in this regard) and so I'd probably suggest waiting until some consumer really wants this (& I'd still want to have a discussion with the consumer about why they find this to be needed). But it's really cheap (especially in DWARFv5 where it can use a const_value form & cost no bytes in debug_info (if/when that sort of thing is implemented in LLVM, if it isn't already).

@shafik, Can you outline the problem Clang has differentiating anonymous structs and lambdas that made this patch necessary?

The issue in on the LLDB side where we want to reconstruct the clang AST for something like this:

struct A { 
  // anonymous struct
   struct {
       int x;
   };  
  // unnamed struct
   struct {
       int y;
   } C;
};

We have both an anonymous struct and an unnamed struct and the DWARF we currently generated w/o DW_AT_export_symbols would look something like this for the anonymous struct:

0x0000008a:     DW_TAG_structure_type
                  DW_AT_calling_convention      (DW_CC_pass_by_value)
                  DW_AT_byte_size       (0x04)
                  DW_AT_decl_file       ("/Users/shafik/code/anon_class.cpp")
                  DW_AT_decl_line       (2)

0x0000008f:       DW_TAG_member
                    DW_AT_name  ("x")
                    DW_AT_type  (0x0000000000000072 "int")
                    DW_AT_decl_file     ("/Users/shafik/code/anon_class.cpp")
                    DW_AT_decl_line     (3)
                    DW_AT_data_member_location  (0x00)

and like this for the unnamed struct:

0x000000a8:     DW_TAG_structure_type
                  DW_AT_calling_convention      (DW_CC_pass_by_value)
                  DW_AT_byte_size       (0x04)
                  DW_AT_decl_file       ("/Users/shafik/code/anon_class.cpp")
                  DW_AT_decl_line       (5)

0x000000ad:       DW_TAG_member
                    DW_AT_name  ("y")
                    DW_AT_type  (0x0000000000000072 "int")
                    DW_AT_decl_file     ("/Users/shafik/code/anon_class.cpp")
                    DW_AT_decl_line     (6)
                    DW_AT_data_member_location  (0x00)

and we are left not being able to tell the difference between the two cases.

The difference between those two cases would be based on their usage - the types are the same (kinda, basically). In one case there's an anonymous member of the anonymous type (so it's transparent/all the nested names are as-if they were names of members in the outer type) and in the other case there's a named variable of the anonymous type (so the named members are members of that outer member variable).

The difference between those two cases would be based on their usage - the types are the same (kinda, basically). In one case there's an anonymous member of the anonymous type (so it's transparent/all the nested names are as-if they were names of members in the outer type) and in the other case there's a named variable of the anonymous type (so the named members are members of that outer member variable).

The motivation for this change comes from assert that we were seeing during expression parsing when doing code completion see D66175 for more details. Basically we were incorrectly marking lambda, unnamed struct and anonymous structs as anonymous. We put in an improved heuristic but this would allow us to correctly identify anonymous classes in all cases. In expression parsing we are handing off the AST to an embedded clang parser and the assumptions clang are making are valid assumptions if we got the AST correct.

One could argue we could parse the members of the containing class in order to figure out if the struct has a name. From the previous example I gave we would be looking for something like this:

0x0000009c:     DW_TAG_member
                  DW_AT_name    ("C")
                  DW_AT_type    (0x00000000000000a8 "structure ")
                  DW_AT_decl_file       ("/Users/shafik/code/anon_class.cpp")
                  DW_AT_decl_line       (7)
                  DW_AT_data_member_location    (0x04)

but it gets more tricky if for example we have an array of the unnamed struct:

struct A { 
//...
   struct {
       int y;
   } C[10];
};

the DWARF looks like this:

0x0000009c:     DW_TAG_member
                  DW_AT_name    ("C")
                  DW_AT_type    (0x00000000000000bb "structure [10]")
                  DW_AT_decl_file       ("/Users/shafik/code/anon_class_b.cpp")
                  DW_AT_decl_line       (7)
                  DW_AT_data_member_location    (0x04)

....

0x000000bb:   DW_TAG_array_type
                DW_AT_type      (0x00000000000000a8 "structure ")

0x000000c0:     DW_TAG_subrange_type
                  DW_AT_type    (0x00000000000000c7 "__ARRAY_SIZE_TYPE__")
                  DW_AT_count   (0x0a)

and we have to recurse to find the array data to find that it points to the unnamed struct. This seems like a lot of work when we have an attribute we can apply and it is in the spec and so we are simply supporting a simpler and easy to maintain solution.

I also just realized that although I originally talked about LLDB being the consumer, ultimately since we have to assume any AST we generate from DWARF can be used in expression parsing clang is also the consumer as well.

Looking at the dwarfstd.org examples, without the flag (e.g., for DWARF v4) we should emit something like this:

<1> DW_TAG_struct_type
        DW_AT_name "A"
  <2> DW_TAG_struct_type
          // no DW_AT_name
      <3> DW_TAG_member
            DW_AT_name "x"
            DW_AT_type (int)
  <4> DW_TAG_struct_type
          // no DW_AT_name
      <5> DW_TAG_member
            DW_AT_name "y"
            DW_AT_type (int)
  <6> DW_TAG_member // this is A's member for the anonymous struct
          // no DW_AT_name
          DW_AT_type <2>
  <7> DW_TAG_member // this is A's member for the unnamed struct
          DW_AT_name "C"
          DW_AT_type <4>

which means the way the consumer knows to promote the members of <2> to the scope of <1> is that its only reference (from <6>) is via a member that is unnamed, and it knows NOT to promote the members of <4> because the reference to it is from a member that IS named.

Seems cleaner to flag the struct in v5, especially as it has essentially zero size cost (DW_AT_export_names would use DW_FORM_flag_present).

I'm aware that other languages can have different lookup rules, some of which really can't be expressed in DWARF and the consumer has to Just Know, if it's the kind of consumer that is expected to do name lookups. And the consumer also has to be ready to do the more complicated thing for earlier DWARF versions that don't have the flag. But if we have the flag, it seems like we ought to make use of it here.

Looking at the dwarfstd.org examples, without the flag (e.g., for DWARF v4) we should emit something like this:

<1> DW_TAG_struct_type
        DW_AT_name "A"
  <2> DW_TAG_struct_type
          // no DW_AT_name
      <3> DW_TAG_member
            DW_AT_name "x"
            DW_AT_type (int)
  <4> DW_TAG_struct_type
          // no DW_AT_name
      <5> DW_TAG_member
            DW_AT_name "y"
            DW_AT_type (int)
  <6> DW_TAG_member // this is A's member for the anonymous struct
          // no DW_AT_name
          DW_AT_type <2>
  <7> DW_TAG_member // this is A's member for the unnamed struct
          DW_AT_name "C"
          DW_AT_type <4>

which means the way the consumer knows to promote the members of <2> to the scope of <1> is that its only reference (from <6>) is via a member that is unnamed, and it knows NOT to promote the members of <4> because the reference to it is from a member that IS named.

Seems cleaner to flag the struct in v5, especially as it has essentially zero size cost (DW_AT_export_names would use DW_FORM_flag_present).

I'm aware that other languages can have different lookup rules, some of which really can't be expressed in DWARF and the consumer has to Just Know, if it's the kind of consumer that is expected to do name lookups. And the consumer also has to be ready to do the more complicated thing for earlier DWARF versions that don't have the flag. But if we have the flag, it seems like we ought to make use of it here.

Yeah, I can't argue much about not adding it here - given the low cost & extra expressivity.

But, yeah, if I were the one working on lldb, I'd be inclined to make this work correctly without/regardless of the flag (by looking at whether the unnamed struct was referenced from an unnamed member variable or not)

shafik updated this revision to Diff 216293.Aug 20 2019, 5:06 PM

Changed isExportSymbols -> getExportSymbols

Could you please also document the new flag in LangRef.rst under DIFlags?

Thanks for this (just added a few nitpicks inside).

test/Assembler/export-symbol-anonymous-class.ll
20 ↗(On Diff #216293)

now: !llvm.ident = !{!16}

24 ↗(On Diff #216293)

You can delete all the verbose info about the compiler version as following:
distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !3, producer: "clang version 10.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5, nameTableKind: GNU)

25 ↗(On Diff #216293)

This ca be just a /dir as following:
!DIFile(filename: "simple_anon_class.cpp", directory: "/dir")

38 ↗(On Diff #216293)

In order to keep the order of the matadata:
!16 = !{!"clang version 10.0.0"}

shafik updated this revision to Diff 216493.Aug 21 2019, 3:00 PM
shafik marked 4 inline comments as done.
  • Adding documentation to LangRef.rst
  • Simplified the test further.

@djtodoro Thank for the suggestions on how to simplify the test even more.

aprantl accepted this revision.Aug 21 2019, 3:29 PM
This revision is now accepted and ready to land.Aug 21 2019, 3:29 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2019, 10:19 AM