This is an archive of the discontinued LLVM Phabricator instance.

[clang][DebugInfo] Add abi-tags on constructors/destructors as LLVM annotations
AcceptedPublic

Authored by Michael137 on Feb 16 2023, 5:27 AM.

Details

Reviewers
dblaikie
aprantl
Summary

Summary

After this patch, any AbiTagAttr attribute on a constructor/destructor
(like the ones for numerous types in libcxx) will be attached to the
declaration DW_TAG_subprogram as a DW_TAG_LLVM_annotation. E.g.,

DW_TAG_subprogram
  DW_AT_name        ("shared_ptr")
  DW_AT_decl_file   ("./bin/../include/c++/v1/__memory/shared_ptr.h")
  DW_AT_decl_line   (475)
  DW_AT_declaration (true)
  DW_AT_external    (true)
  DW_AT_accessibility       (DW_ACCESS_public)

  DW_TAG_LLVM_annotation
    DW_AT_name      ("abi_tag")
    DW_AT_const_value       ("v170000")

We only emit these annotations for constructors/destructors declarations
because those don't carry linkage names, which is where this information
is usually encoded.

Motivation

TLDR: this makes it possible for LLDB to evaluate expressions involving
ABI-tagged constructors/destructors.

ABI tags affect a function's linkage name, which LLDB uses to resolve
function calls during expression evaluation. However,
constructors/destructors are emitted as declarations and can have
multiple definitions with different linkage names for a single
declaration DIE. LLDB relies on Clang to mangle the function AST node
into the correct linkage name. However, LLDB doesn't know about the
AbiTagAttr that was attached to a constructor/destructor in the
source, so there's no way to cleanly reconstruct the AST and resolve
the function symbol.

With this patch, LLDB can attach the ABI tag to the ctor/dtor decl
and let clang determine which linkage name to pick.

Diff Detail

Event Timeline

Michael137 created this revision.Feb 16 2023, 5:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 5:27 AM
Michael137 requested review of this revision.Feb 16 2023, 5:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2023, 5:27 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
  • Remove leftover comments
  • Remove leftover comments

Left is without patch. Right is with patch.

----------------------------------------------------                  ----------------------------------------------------                 
file: bin/lldb.dSYM/Contents/Resources/DWARF/lldb                     file: bin/lldb.dSYM/Contents/Resources/DWARF/lldb                    
----------------------------------------------------                  ----------------------------------------------------                 
SECTION           SIZE (b)                                            SECTION           SIZE (b)                                           
----------------  --------                                            ----------------  --------                                           
__debug_line       1168840 (6.21%)                                    __debug_line       1168840 (6.18%)                                   
__debug_info       6102802 (32.40%)                                   __debug_info       6196012 (32.73%)                                  
__debug_ranges       11760 (0.06%)                                    __debug_ranges       11760 (0.06%)                                   
__debug_aranges       3360 (0.02%)                                    __debug_aranges       3360 (0.02%)                                   
__debug_loc            412 (0.00%)                                    __debug_loc            412 (0.00%)                                   
__debug_abbrev        8419 (0.04%)                                    __debug_abbrev        8431 (0.04%)                                   
__debug_str        7762681 (41.21%)                                   __debug_str        7762697 (41.01%)                                  
__apple_names       654916 (3.48%)                                    __apple_names       654916 (3.46%)                                   
__apple_namespac     22936 (0.12%)                                    __apple_namespac     22936 (0.12%)                                   
__apple_types      1185783 (6.30%)                                    __apple_types      1185783 (6.26%)                                   
__apple_objc            36 (0.00%)                                    __apple_objc            36 (0.00%)                                   
                                                                                                                                           
 Total Size: 16921945  (89.84%)                                        Total Size: 17015183  (89.89%)                                      
 Total File Size: 18834777                                             Total File Size: 18928015                                           
----------------------------------------------------                  ----------------------------------------------------                 
                                                                                                                                           
----------------------------------------------------                  ----------------------------------------------------                 
file: lib/libc++.1.dylib.dSYM/Contents/Resources/DWARF/libc++.1.dylib file: lib/libc++.1.dylib.dSYM/Contents/Resources/DWARF/libc++.1.dylib
----------------------------------------------------                  ----------------------------------------------------                 
SECTION           SIZE (b)                                            SECTION           SIZE (b)                                           
----------------  --------                                            ----------------  --------                                           
__debug_line        497612 (12.53%)                                   __debug_line        497612 (12.49%)                                  
__debug_aranges       2032 (0.05%)                                    __debug_aranges       2032 (0.05%)                                   
__debug_info       1353725 (34.09%)                                   __debug_info       1366516 (34.30%)                                  
__debug_ranges       10544 (0.27%)                                    __debug_ranges       10544 (0.26%)                                   
__debug_loc            515 (0.01%)                                    __debug_loc            515 (0.01%)                                   
__debug_abbrev        5177 (0.13%)                                    __debug_abbrev        5189 (0.13%)                                   
__debug_str        1084612 (27.31%)                                   __debug_str        1084628 (27.22%)                                  
__apple_names       250484 (6.31%)                                    __apple_names       250484 (6.29%)                                   
__apple_namespac      1476 (0.04%)                                    __apple_namespac      1476 (0.04%)                                   
__apple_types        81090 (2.04%)                                    __apple_types        81090 (2.04%)                                   
__apple_objc            36 (0.00%)                                    __apple_objc            36 (0.00%)                                   
                                                                                                                                           
 Total Size: 3287303  (82.78%)                                         Total Size: 3300122  (82.83%)                                       
 Total File Size: 3971335                                              Total File Size: 3984154                                            
----------------------------------------------------                  ----------------------------------------------------
  • Update test

Size-wise this looks like an acceptable increase. If we created a new DW_AT_LLVM_abi_tag, we could save an extra 4 bytes (assuming DW_FORM_strp) per DIE. That might be worth it?

Ah, accidentally posted to the lldb part of this stack... instead:

Any chance we can make these work more like member functions (could the ctors include their mangled names, for instance)? Or is it the innate nature of ctors having the various C1/C2/etc versions?

Ah, accidentally posted to the lldb part of this stack... instead:

Any chance we can make these work more like member functions (could the ctors include their mangled names, for instance)? Or is it the innate nature of ctors having the various C1/C2/etc versions?

Initially we tried that in https://reviews.llvm.org/D143652. The existence of multiple constructor definitions which aren't linked to the DISubprogram declaration makes it tough. We need to start with a pretty expensive search through the index for all the possible definitions. But then we need to somehow choose the right one to take the linkage name from. And that context isn't available at the point where LLDB parses DWARF.

I'll post some numbers of how much space this would take with Adrian's suggestion and go from there

Ah, accidentally posted to the lldb part of this stack... instead:

Any chance we can make these work more like member functions (could the ctors include their mangled names, for instance)? Or is it the innate nature of ctors having the various C1/C2/etc versions?

Initially we tried that in https://reviews.llvm.org/D143652. The existence of multiple constructor definitions which aren't linked to the DISubprogram declaration makes it tough. We need to start with a pretty expensive search through the index for all the possible definitions. But then we need to somehow choose the right one to take the linkage name from. And that context isn't available at the point where LLDB parses DWARF.

I'll post some numbers of how much space this would take with Adrian's suggestion and go from there

On second thought, since there are possibly multiple ABI tags per decl and only a single attribute is allowed on a DIE we'd need to somehow encode the ABI tag strings into the single attribute. It's doable but I think the patch as proposed would be more maintainable.

Debug-info size difference for liblldb.dylib is below (left is with the annotations, right is without). I.e., debug size difference is ~0.4%

---------------------------------------------------------------------------------          
file: lib/liblldb.dylib.dSYM/Contents/Resources/DWARF/liblldb.dylib                        
---------------------------------------------------------------------------------          
SECTION           SIZE (b)                 SECTION           SIZE (b)                      
----------------  ---------                ----------------  ---------                     
__debug_line       91828488 (8.10%)        __debug_line       91828547 (8.12%)             
__debug_info      268703597 (23.69%)       __debug_info      265011327 (23.44%)            
__debug_ranges      2116400 (0.19%)        __debug_ranges      2116400 (0.19%)             
__debug_aranges      200464 (0.02%)        __debug_aranges      200464 (0.02%)             
__debug_loc           27234 (0.00%)        __debug_loc           27234 (0.00%)             
__debug_abbrev        14129 (0.00%)        __debug_abbrev        14117 (0.00%)             
__debug_str       456849994 (40.28%)       __debug_str       456849978 (40.41%)            
__apple_names      48851592 (4.31%)        __apple_names      48851592 (4.32%)             
__apple_namespac     638616 (0.06%)        __apple_namespac     638616 (0.06%)             
__apple_types      27153903 (2.39%)        __apple_types      27153903 (2.40%)             
__apple_objc             36 (0.00%)        __apple_objc             36 (0.00%)             
                                                                                           
 Total Size: 896384453  (79.04%)            Total Size: 892692214  (78.97%)                
 Total File Size: 1134157253                Total File Size: 1130465014                    
----------------------------------------------------

I'll just put this behind an lldb tuning

Ah, accidentally posted to the lldb part of this stack... instead:

Any chance we can make these work more like member functions (could the ctors include their mangled names, for instance)? Or is it the innate nature of ctors having the various C1/C2/etc versions?

Initially we tried that in https://reviews.llvm.org/D143652. The existence of multiple constructor definitions which aren't linked to the DISubprogram declaration makes it tough. We need to start with a pretty expensive search through the index for all the possible definitions. But then we need to somehow choose the right one to take the linkage name from. And that context isn't available at the point where LLDB parses DWARF.

I'll post some numbers of how much space this would take with Adrian's suggestion and go from there

Hmm, I'd sort of still be inclined to look further at the linkage name option - but maybe the existence of a situation where the usage is built with debug info, but the out of line ctor definitions are all in another library without debug info is a sufficiently motivating use case to justify the addition of these attributes to the ctor declaration. *shrug* Carry on then.

  • Hide behind LLDB tuning

Hmm, I'd sort of still be inclined to look further at the linkage name option - but maybe the existence of a situation where the usage is built with debug info, but the out of line ctor definitions are all in another library without debug info is a sufficiently motivating use case to justify the addition of these attributes to the ctor declaration. *shrug* Carry on then.

That would've been convenient but I don't think we can get it right with attaching the linkage name (even if we are willing to do the search for the correct linkage name). That would essentially coalesce the possibly multiple definitions of the constructor into a single one (since there's only ever a single CXXConstructorDecl in the AST for a given constructor in the source)

(CC @labath)

Michael137 edited the summary of this revision. (Show Details)Feb 20 2023, 10:06 AM
Michael137 edited the summary of this revision. (Show Details)
Michael137 edited the summary of this revision. (Show Details)
aprantl accepted this revision.Feb 20 2023, 2:26 PM
This revision is now accepted and ready to land.Feb 20 2023, 2:26 PM

Hmm, I'd sort of still be inclined to look further at the linkage name option - but maybe the existence of a situation where the usage is built with debug info, but the out of line ctor definitions are all in another library without debug info is a sufficiently motivating use case to justify the addition of these attributes to the ctor declaration. *shrug* Carry on then.

That would've been convenient but I don't think we can get it right with attaching the linkage name (even if we are willing to do the search for the correct linkage name). That would essentially coalesce the possibly multiple definitions of the constructor into a single one (since there's only ever a single CXXConstructorDecl in the AST for a given constructor in the source)

(CC @labath)

The thing that bothers me about this approach is that finding the right constructor in the object file requires reconstructing _exactly_ the mangled name of the constructor variant, and then looking up that name in the symbol table. The construction of the mangled name does not require getting just the abi tags of the constructor itself right. Rather, it requires knowing the abi tag annotations of all arguments (template and regular) of the constructor, and template arguments of all enclosing classes. Is the information in this patch enough to reconstruct this name (_ZN2S1I1AB4asdfE2S2I1BB4asdfEC2I1CB4asdfEET_1DB4asdf, a.k.a, S1<A[abi:asdf]>::S2<B[abi:asdf]>::S2<C[abi:asdf]>(C[abi:asdf], D[abi:asdf]))?

Even if it is, it still feels like we're not using dwarf the way it's meant to be used. Normal function would have DW_AT_linkage_name and DW_AT_addr attributes, and any one of those would be sufficient to find the function (unambiguously). The fact that we can't do that for constructors makes me believe that the problem is elsewhere. In a way, it feels like the right way to represent this situation in DWARF is to have two DW_TAG_subprogram DIEs -- given that there actually are two functions in the object file.

Of course that has a lot of problems of its own -- from breaking all existing debug info consumers to figuring out how to feed this information back into clang (in the LLDB expression parser). However, since we're contemplating extensions anyway, we could at least solve the first problem fairly easily by making the extra constructor an extension -- either an extra extension DIE, or just an extra attribute (with the other linkage name) on the existing constructor. For the second part, I could imagine some kind of an extension for the asm annotation for constructors, which would allow one to pass two mangled names instead of one. I have no idea how hard would it be to implement it though...

Anyway, that's just a dump of my thoughts on this topic. Feel free to disagree with this rant...

Michael137 added a comment.EditedFeb 21 2023, 1:32 AM

The construction of the mangled name does not require getting just the abi tags of the constructor itself right. Rather, it requires knowing the abi tag annotations of all arguments (template and regular) of the constructor, and template arguments of all enclosing classes. Is the information in this patch enough to reconstruct this name (_ZN2S1I1AB4asdfE2S2I1BB4asdfEC2I1CB4asdfEET_1DB4asdf, a.k.a, S1<A[abi:asdf]>::S2<B[abi:asdf]>::S2<C[abi:asdf]>(C[abi:asdf], D[abi:asdf]))?

That's a fair point. I purposefully didn't support abi_tags on all possible declarations because that would've ballooned the debug-info size too much. And we get away with it because libc++ exclusively tags functions. But of course there's no guarantee that they won't start tagging structures or namespaces

we could at least solve the first problem fairly easily by making the extra constructor an extension -- either an extra extension DIE, or just an extra attribute (with the other linkage name) on the existing constructor. For the second part, I could imagine some kind of an extension for the asm annotation for constructors, which would allow one to pass two mangled names instead of one. I have no idea how hard would it be to implement it though...

I've thought about doing it that way for a bit too. It does seem like a more consistent approach. What I concluded was that LLDB would have a hard time with this sort of setup. E.g., how would we know which subprogram (and thus which linkage name) to construct the single CXXConstructorDecl with? And once we are set on a linkage name how would we make clang pick the other one when appropriate?

Although I don't know of a scenario where we ever explicitly need to call the base object constructor from the expression evaluator. So maybe we can get away with only linking the complete object constructor DIE to the subprogram. Via a DW_AT_LLVM_complete_ctor or something and make sure to always attach the C1 linkage name to the CXXConstructorDecl? Would that be closer to what you're imagining? And if a consumer ever does need the C2 linkage name we can extend DWARF with it in the future

The construction of the mangled name does not require getting just the abi tags of the constructor itself right. Rather, it requires knowing the abi tag annotations of all arguments (template and regular) of the constructor, and template arguments of all enclosing classes. Is the information in this patch enough to reconstruct this name (_ZN2S1I1AB4asdfE2S2I1BB4asdfEC2I1CB4asdfEET_1DB4asdf, a.k.a, S1<A[abi:asdf]>::S2<B[abi:asdf]>::S2<C[abi:asdf]>(C[abi:asdf], D[abi:asdf]))?

That's a fair point. I purposefully didn't support abi_tags on all possible declarations because that would've ballooned the debug-info size too much. And we get away with it because libc++ exclusively tags functions. But of course there's no guarantee that they won't start tagging structures or namespaces

Got it.

we could at least solve the first problem fairly easily by making the extra constructor an extension -- either an extra extension DIE, or just an extra attribute (with the other linkage name) on the existing constructor. For the second part, I could imagine some kind of an extension for the asm annotation for constructors, which would allow one to pass two mangled names instead of one. I have no idea how hard would it be to implement it though...

I've thought about doing it that way for a bit too. It does seem like a more consistent approach. What I concluded was that LLDB would have a hard time with this sort of setup. E.g., how would we know which subprogram (and thus which linkage name) to construct the single CXXConstructorDecl with? And once we are set on a linkage name how would we make clang pick the other one when appropriate?

Thanks for considering this.

This is what I meant by the "extension for the asm annotation for constructors" part. In principle, we could extend clang so that one can write something like this:

class A {
  A() asm("mangled_name_for_the_full_constructor", "mangled_name_for_the_base_constructor");
};

and then clang would automatically pick the right mangled name depending on the context. (Exact syntax TBD, and one would need to figure out what to do with "allocating ctor" and "deleting dtor" variants). Once we had that, we generate that annotation using the data we parse from the DWARF. The exact method for that would depend on the chosen format, but it shouldn't be too complicated. I suspect the hardest part would be plumb this extra info all the way through clang and llvm.

Although I don't know of a scenario where we ever explicitly need to call the base object constructor from the expression evaluator.

It's definitely contrived, but I think one could write something like this:

(lldb) expr --top-level class A : public virtual_class_from_DWARF {}
(lldb) expr A().foo()

This is what I meant by the "extension for the asm annotation for constructors" part. In principle, we could extend clang so that one can write something like this:

Ah thanks for elaborating. That's an interesting approach. So this would require adding the asm extension to clang, emitting into DWARF some link from subprogram to the various possible linkage names, and then putting it all together in LLDB. I'll have a look at how feasible this might be

It's definitely contrived, but I think one could write something like this:
(lldb) expr --top-level class A : public virtual_class_from_DWARF {}
(lldb) expr A().foo()

Cool, that's pretty neat, thanks. Can confirm that it tries to pick the C2 constructor (and works fine with the proposed patch series); so regardless of where this goes, at the very least got another test-case :)

Michael137 added a comment.EditedFeb 21 2023, 10:45 AM

We thought a bit about what it would take to link a constructor declaration DIE to the various definitions (e.g., via a DW_AT_LLVM_complete_ctor_linkage_name or DW_AT_LLVM_complete_ctor_ref). The issue with this is that it would mess with type uniquing. E.g., what if two different CUs each had a constructor declaration but differed only in which definitions they linked to? Even if dsymutil were to be able to merge the declarations into a single one, LLDB would still have to support the case where two constructor DIEs for the same type point to different definitions depending on how they were used in the corresponding object files.

One could instead have some new attribute on the various constructor definitions specifying which constructor type it is, and then implement @pavel's suggestion of attaching multiple asm labels to the CXXConstructorDecl. To support that in LLDB we'd have to do a lookup in the DWARF index and filter appropriately. But it's unclear to me whether this is much better than the proposed patch. I pinged the libcxx people regarding their policy of using abi-tags on structures/namespaces

Some more findings on the Clang side: a very naive way to support multiple asm labels is to adjust the ManglingContext::mangleName to be aware of multiple asm labels and pick the correct one based on the GlobalDecl we're mangling. But we'd need to somehow match CtorType to the AsmLabelAttr in question. Which is possibly doable by encoding the CtorType into the label. E.g., asm("C2<mangled_name>") (?) Though that seems hacky

labath added a comment.Mar 6 2023, 3:26 AM

We thought a bit about what it would take to link a constructor declaration DIE to the various definitions (e.g., via a DW_AT_LLVM_complete_ctor_linkage_name or DW_AT_LLVM_complete_ctor_ref). The issue with this is that it would mess with type uniquing. E.g., what if two different CUs each had a constructor declaration but differed only in which definitions they linked to? Even if dsymutil were to be able to merge the declarations into a single one, LLDB would still have to support the case where two constructor DIEs for the same type point to different definitions depending on how they were used in the corresponding object files.

One could instead have some new attribute on the various constructor definitions specifying which constructor type it is, and then implement @pavel's suggestion of attaching multiple asm labels to the CXXConstructorDecl. To support that in LLDB we'd have to do a lookup in the DWARF index and filter appropriately.

What about doing both? Have the declaration DIE specify the linkage names of both structor types (regardless of whether they are used in a particular CU -- just like with "normal" functions), and then have the definition DIEs specify which flavour of the structor are they defining?

But it's unclear to me whether this is much better than the proposed patch. I pinged the libcxx people regarding their policy of using abi-tags on structures/namespaces

Yea.. I don't know. I don't really like it because it's still very much a secret clang-lldb handshake that will not work with other compilers or debuggers. But maybe it's better because it's not a three-way libc++-clang-lldb handshake and has a chance of working with libraries using abi tags in a different way than libc++?

I'm just the peanut gallery here, so feel free to overrule me.

Some more findings on the Clang side: a very naive way to support multiple asm labels is to adjust the ManglingContext::mangleName to be aware of multiple asm labels and pick the correct one based on the GlobalDecl we're mangling. But we'd need to somehow match CtorType to the AsmLabelAttr in question. Which is possibly doable by encoding the CtorType into the label. E.g., asm("C2<mangled_name>") (?) Though that seems hacky

A bit, though if the code change itself is simple/localized, then we might be able to get away with a new attribute, or an extension of the existing one (A() __attribute((full_structor_asm("fooC1"), base_structor_asm("fooC2")) or something -- it's not like people will be actually typing this code anyway...)