This is an archive of the discontinued LLVM Phabricator instance.

lldb: Use the DWARF linkage name when importing C++ methods
ClosedPublic

Authored by nelhage on Nov 20 2017, 10:40 PM.

Details

Summary

When importing C++ methods into clang AST nodes from the DWARF symbol table, preserve the DW_AT_linkage_name and use it as the linker ("asm") name for the symbol.

Concretely, this enables expression to call into names that use the GNU abi_tag extension, and enables lldb to call into code using std::string or std::list from recent versions of libstdc++, as described in https://bugs.llvm.org/show_bug.cgi?id=35310 . This also seems potentially broadly more robust than relying on the DWARF->clang->codegen pipeline to roundtrip properly, but I'm not immediately aware of any other cases in which it makes a difference.

Diff Detail

Repository
rL LLVM

Event Timeline

nelhage created this revision.Nov 20 2017, 10:40 PM
labath added a subscriber: labath.

This seems like a good idea to me (but Greg or Jim should review this).

Could you please also add a test case for this. It looks like you already have a simple program in the bug you linked, so it should be just a matter of turning that into a test. Let me know if you run into problems while doing that.

nelhage updated this revision to Diff 124018.Nov 22 2017, 3:55 PM

Add a test case.

Verified that it passes on my machine and fails with this patch
reverted.

nelhage updated this revision to Diff 124019.Nov 22 2017, 3:57 PM

Remove accidentally-added flycheck file.

clayborg edited edge metadata.Nov 27 2017, 9:42 AM

Jim will need to ok this. A few concerns follow. Most of the time when we don't get the DWARF -> AST conversion right, it can mean that we might code gen incorrect code. Is there not enough information for the GNU abi_tag in the DWARF to actually re-create these classes correctly in the AST? My first inclination would be to fix that if possible. If there is info missing from the DWARF such that we can't re-create the AST correctly, then it seems that this approach will help. Can you clarify what the GNU abi_tag stuff does? Will it change how code would be generated? If we have things in the AST that aren't correct, but have been fixed by correcting the "asm" tag, will we JIT up bad code?
-

Jim will need to ok this. A few concerns follow. Most of the time when we don't get the DWARF -> AST conversion right, it can mean that we might code gen incorrect code. Is there not enough information for the GNU abi_tag in the DWARF to actually re-create these classes correctly in the AST?

Correct, as far as I can tell. Looking at a symbol definition from the test program on the issue via llvm-dwarfdump, I see

0x00000076:     DW_TAG_subprogram
                  DW_AT_linkage_name    ("_ZN1A16helloWorld_cxx11B5cxx11Ev")
                  DW_AT_name    ("helloWorld_cxx11")
                  DW_AT_decl_file       ("/tmp/test_abi_tag.cc")
                  DW_AT_decl_line       (5)
                  DW_AT_declaration     (true)
                  DW_AT_external        (true)
                  DW_AT_accessibility   (DW_ACCESS_public)

0x00000082:       DW_TAG_formal_parameter
                    DW_AT_type  (cu + 0x009b "A*")
                    DW_AT_artificial    (true)

0x00000087:       NULL

The function of the abi_tag annotation is to append the B5cxx11 to the mangled name; I can't find any other indication in the DWARF of the ABI tag.

You can read more about abi_tag here:
https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html
Or an explanation of why it exists and how it is used here:
https://davmac.wordpress.com/2015/07/19/tale-of-two-abis/

The effect of this patch on the code generated by expression is to change the names we emit into the LLVM IR; Previously, we would emit a call to _ZN1A16helloWorld_cxx11Ev, which would then fail when resolving symbols. Now we (correctly) emit a call to _ZN1A16helloWorld_cxx11B5cxx11Ev.

The test case I added in this patch also demonstrates that GCC and clang both support arbitrarily overriding a symbol's mangled name in the source using asm("new_name"). I can't recall having ever encountered a use of that in the wild, so I don't care strongly about supporting it, but if we want to I think this general approach is almost certainly the only approach that would do so.

ok, sounds like the abi_tag stuff is meant to just affect the mangled named with no code gen implications. Jim should ok this, but I am ok if he is.

jingham edited edge metadata.Nov 27 2017, 1:53 PM

This patch is actually pretty broad, since it will make all C++ declarations coming from DWARF have overridden names so far as clang is concerned. So we are relying on the fact that as far as clang is concerned a method with an asm assigned name is equivalent to method whose name mangles to the same thing.

From what I can tell scanning through the clang sources, this attribute just overrides whatever mangling clang was intending to do for this decl. There shouldn't be any case where we have an incorrect mangled name in the DWARF so at first glance that should be okay.

But since this affects all C++ method names that we pass to clang, I'd feel better if we could get confirmation from a clang expert that this change has no other side-effects.

Hey -- Is there anything I can do to move this patch forward? Would it help to do something like only setting the attribute if the mangled name that *would* be generated doesn't match the one from the DWARF? Any such symbol is already broken for lldb, so such a patch feels perhaps safer and more conservative, at the cost of a bit of added complexity.

Sounds like finding a clang expert to clarify what Jim last asked for is the way forward. Do a source control "blame" command and see who worked on the code in the area of clang and maybe add them as reviewers so they can comment? I agree with Jim that this sounds like a good fix, but we should be careful. Is there no other way to detect this special "asm" name? No extra DWARF attributes?

Added Adrian Prantl as a reviewer in case he has any input. Adrian: is there any way that a DIE is marked up with an extra attribute when the asm name is explicitly set? It would be great to know this from the DWARF so we don't have to end up setting the ASM name for every single possibly affected DIE...

And I do agree with Jim that we don't want to have to mangle the typename to see if it matches, that is too much work.

Added Adrian Prantl as a reviewer in case he has any input. Adrian: is there any way that a DIE is marked up with an extra attribute when the asm name is explicitly set? It would be great to know this from the DWARF so we don't have to end up setting the ASM name for every single possibly affected DIE...

I'm afraid DW_AT_linkage *is* that attribute. If you don't want to explicitly set the mangled name for each entity with a DW_AT_linkage_name, the way to do this is to add a DWARF extension like DW_AT_GNU_abi_tag or something. How does GDB/gcc handle this? Maybe they already invented an encoding for the attribute?

Compiling my test program with g++ and looking at llvm-dwarfdump -all, I can see no reference to the ABI tag other than in the DW_AT_linkage_name. Skimming the gdb source, I see references to it knowing how to *demangle* ABI tags, but no references in the DWARF code or otherwise in the expression-handler. I strongly suspect that GDB implements essentially this solution, using DW_AT_linkage_name to populate linkage names in its symbol table, and uses its C++ demangler to find ABI tags for display purposes.

Further evidence that the ABI tag is never stored separately:

$ strings -a -n4 test_abi_tag | grep cxx11
helloWorld_cxx11
_ZN1A16helloWorld_cxx11B5cxx11Ev
_ZN1A16helloWorld_cxx11B5cxx11Ev

Those three instances are, I presume, the DW_AT_name, and the DW_AT_linkage_name, and the ELF symbol table.

I did a little bit of looking into performance implications today. It looks like DWARFASTParserClang::ParseTypeFromDWARF is only called lazily as symbols are needed, which alleviates some of my concerns, and also makes it a bit trickier to construct a convenient benchmark. Does someone with performance concerns have a specific usage pattern or operation or mode that they're concerned by the performance of for me to dig into?

davide added a subscriber: davide.Jan 20 2018, 9:32 PM

I did a little bit of looking into performance implications today. It looks like DWARFASTParserClang::ParseTypeFromDWARF is only called lazily as symbols are needed, which alleviates some of my concerns, and also makes it a bit trickier to construct a convenient benchmark. Does someone with performance concerns have a specific usage pattern or operation or mode that they're concerned by the performance of for me to dig into?

My take is that we should consider going ahead with this. I tried and I confirm your experience (basically std::string in libstdc++ is undebbuggable).
Please wait a couple of days in case somebody has comments otherwise I'd say this can go in. BTW, do you need somebody to commit this on your behalf?

Thanks!

Davide

Yes, I will need someone else to commit on my behalf -- I don't have commit access.

I am fine with checking this. The only issue on my end is the extra memory that will be needed to store these often huge mangled names in every AST context for the 0.0001% of the time it is actually needed.

davide accepted this revision.Jan 22 2018, 9:20 AM

Thanks Greg. I'll wait another day and check this one in. Nelson, does the plan make sense to you?

This revision is now accepted and ready to land.Jan 22 2018, 9:20 AM

I am fine with checking this. The only issue on my end is the extra memory that will be needed to store these often huge mangled names in every AST context for the 0.0001% of the time it is actually needed.

Well since this is for libstdc++ only, should we make this a platform-specific behavior and, e.g., disable it on Darwin?

I am fine with checking this. The only issue on my end is the extra memory that will be needed to store these often huge mangled names in every AST context for the 0.0001% of the time it is actually needed.

Well since this is for libstdc++ only, should we make this a platform-specific behavior and, e.g., disable it on Darwin?

Could be, but is it really worth the complexity? I'd rather defer this if we find it being a performance problem.
Either way is fine to me, FWIW.

Thanks Greg. I'll wait another day and check this one in. Nelson, does the plan make sense to you?

Yep, sounds great. Thanks so much.

I'd prefer to not gate this on !Darwin, since the added complexity seems unfortunate, and this isn't, strictly-speaking, only about libstdc++ (although in practice that's the ~only place where it's likely to manifest). But I'm happy to write the patch if it turns out to be an issue and the maintainers think that's the way forward.

As Greg pointed out C++ (and Swift) mangled names can be enormous, so it would definitely have an impact on the memory footprint (unless we are only passing StringRefs around. Are we?)

As Greg pointed out C++ (and Swift) mangled names can be enormous, so it would definitely have an impact on the memory footprint (unless we are only passing StringRefs around. Are we?)

Most strings seemed to be created using clang identifiers... Not sure what this API uses for the asm name...

As there are no strong objections, I'm going to check this in tomorrow PST and see how it goes.

Thanks for your contribution.

This revision was automatically updated to reflect the committed changes.