Better scheme to look up alternate mangled names when looking up a function address.

This change is relevant for inferiors compiled with GCC. GCC does not
emit complete debug info for std::basic_string<...>, and consequently, Clang
(the LLDB compiler) does not generate correct mangled names for certain

This change removes the hard-coded alternate names in

Before the hard-coded names were put in ItaniumABILanguageRuntime.cpp, one could
not evaluate std::string methods (e.g. std::string::length). After putting in
the hard-coded names, one could evaluate them. However, that still did not
enable one to call methods on, for example, std::vector<string>.
This change makes that possible.

There is some amount of incompleteness in this change. Consider the
following example:

std::string hello("hello"), world("world");
std::map<std::string, std::string> m;
m[hello] = world;

One still cannot evaluate the expression "m[hello]" in LLDB. I will
address this issue in another pass.

Just a mention: This change does have an impact on expression evaluation performance and memory utilization.

So let me try to understand. When we are asked during expressions to look up a mangled name for "std::string::length()", it doesn't exist in GCC binaries. So we then want to find any alternate manglings, and we do this by asking the symbol file to find alternate manglings for "std::string::length"? I.e. we remove the parens and any arguments and look up just the fully qualified function name?

If so, we should just look up functions using:

Module::FindFunctions (const ConstString &name,
               const CompilerDeclContext *parent_decl_ctx,
               uint32_t name_type_mask, 
               bool symbols_ok,
               bool inlines_ok,
               bool append, 
               SymbolContextList& sc_list);

And then look at all of the symbol contexts in sc_list and find a function that we want? I don't really see the need for any of the new SymbolFile::GetMangledNamesForFunction() functions.
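
As an illustration of the "remove the parens and any arguments" step asked about above, here is a minimal sketch. The helper name StripParameterList is hypothetical and is not an LLDB API. It assumes the demangled name ends with a parameter list, optionally followed by a qualifier such as "const", and it would not handle declarators such as functions returning function pointers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of LLDB): returns the demangled name with its
// trailing parameter list, and anything after it such as " const", removed.
// If no balanced parameter list is found, the input is returned unchanged.
std::string StripParameterList(const std::string &demangled) {
  const std::size_t close = demangled.rfind(')');
  if (close == std::string::npos)
    return demangled;

  // Walk backwards from the last ')' to the '(' that matches it.
  int depth = 0;
  for (std::size_t i = close + 1; i-- > 0;) {
    if (demangled[i] == ')') {
      ++depth;
    } else if (demangled[i] == '(') {
      if (--depth == 0)
        return demangled.substr(0, i);
    }
  }
  return demangled; // Unbalanced parentheses: leave the name alone.
}
```

With this sketch, "std::string::length()" becomes "std::string::length", and the " const" suffix of a const method is dropped along with the parameter list.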

I could very well be missing something obvious. However, let me explain what I am trying to solve here. Let's take the example of the std::vector<string>::size method. The DWARF we get when compiled with GCC is as follows:

< 3><0x000020a3>        DW_TAG_subprogram
                          DW_AT_external              yes(1)
                          DW_AT_name                  "size"
                          DW_AT_decl_file             0x00000003 /usr/include/c++/4.8/bits/stl_vector.h
                          DW_AT_decl_line             0x00000285
                          DW_AT_linkage_name          "_ZNKSt6vectorISsSaISsEE4sizeEv"
                          DW_AT_type                  <0x00001eb1>
                          DW_AT_accessibility         DW_ACCESS_public
                          DW_AT_declaration           yes(1)
                          DW_AT_object_pointer        <0x000020bc>
                          DW_AT_sibling               <0x000020c2>

If we demangle the linkage name from here, we get "std::vector<std::string, std::allocator<std::string> >::size() const". So, the m_function_fullname_index of SymbolFileDWARF will have an entry for this function with this full name.

However, due to missing debug info elsewhere, the IR generated by clang (the LLDB compiler) contains a mangled name like this:
"_ZNKSt6vectorISbIcSt17char_traits<char>St15allocator<char>ESt82allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >E4sizeEv"

This demangles to
"std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::size() const"

Since the clang-generated mangled name is not present in the ELF symtab, and its corresponding demangled name is not present in any of the DWARF indices, the existing FindFunctions will not be helpful. Also, only functions with debug info (those which have an address specified in the DWARF) are indexed.

What I am doing in my change is to use the fact that all methods (and their types) are grokked while creating the AST for clang (the LLDB compiler). So, when a method is grokked, store a map from its scoped name to its DIE. Even if there were any discrepancies in the mangled name in the debug info versus that generated by the LLDB compiler, the fully scoped names should be the same. In which case, use the fully scoped name to get to the DIE and retrieve its "actual" mangled name.
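
A rough sketch of that idea, for illustration only. ScopedNameIndex and its methods are hypothetical names, not the types used in the patch, and the sketch stores the linkage name string directly instead of a DIE to stay self-contained. The exact form of the scoped-name key is also an assumption.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the scoped-name index described above.
// The real change maps to DIEs; this sketch maps straight to linkage names.
class ScopedNameIndex {
public:
  // Called when a method is parsed while building the AST: remember the
  // linkage name the debug info actually carries for this scoped name.
  void Record(const std::string &scoped_name,
              const std::string &linkage_name) {
    m_names[scoped_name].push_back(linkage_name);
  }

  // Returns every linkage name recorded for the scoped name, or an empty
  // vector if the method was never parsed.
  std::vector<std::string> Lookup(const std::string &scoped_name) const {
    auto it = m_names.find(scoped_name);
    return it == m_names.end() ? std::vector<std::string>() : it->second;
  }

private:
  std::unordered_map<std::string, std::vector<std::string>> m_names;
};
```

A vector is used as the value because overloads share a scoped name; a caller would still have to pick the right candidate among the returned names.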

I LOVE the idea of getting rid of those horrid "alternate manglings." We knew what the mangling was during name lookup, we should be able to recognize them later!
As listed in my inline comments, I have some concerns about the scope. This knowledge is built up during expression parsing and used during expression parsing – we're done.
Thanks for working on this, Siva.

This should definitely only be done if we can't find the name the original way. I'm always happy to pay extra runtime to fix an expression that would otherwise not work – but expressions that would work (the >90% case) shouldn't be paying for this.
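
The ordering being asked for can be sketched as follows. All names here (ResolveAddress, AddressLookup, AlternateNames) are hypothetical and only illustrate the control flow: the alternate-name query runs only after the original lookup has failed, so expressions that already work never pay for it.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callbacks standing in for the real symbol lookups.
using AddressLookup =
    std::function<bool(const std::string &name, std::uint64_t &addr)>;
using AlternateNames =
    std::function<std::vector<std::string>(const std::string &name)>;

// Try the name as given first; consult the alternates only on failure.
bool ResolveAddress(const std::string &mangled, const AddressLookup &lookup,
                    const AlternateNames &alternates, std::uint64_t &addr) {
  if (lookup(mangled, addr))
    return true; // Common case: no extra work is done.

  // Slow path, reached only when the original lookup fails.
  for (const std::string &alt : alternates(mangled)) {
    if (lookup(alt, addr))
      return true;
  }
  return false;
}
```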


Why is this attached to the DWARF? I would want to attach this to the ClangExpressionDeclMap because we identify these alternate names during function name lookup, and we just need to remember them when resolving the references in IR. After that, they are no longer needed.

sivachandra added inline comments. Sep 14 2015, 1:44 PM
The "original" way is still attempted first at line 643 above. Lines 669 to 674 below take care of another problem: Since that problem is much rarer than the one solved in this change, I chose to keep this "try" before that "try".


My thinking was, DWARF is the only thing which knows about the correct mangled name, so keep it close to the code dealing with DWARF. Your suggestion also makes sense, but might (I have not yet thought enough about it) require us to expose DIE info into ClangExpressionDeclMap. I will think more about this approach and get back to you.

sivachandra added inline comments. Sep 15 2015, 12:43 PM

I spent some time thinking about this. ClangExpressionDeclMap doesn't really look up method names explicitly. If we have an expression like "v.size()", we look up what "v" is, and that tells Clang about the existence of a method "size" in its class. The requirement for alternate names kicks in (so to say) when we are looking for the address of the method. I am not very clear on how we can cleanly keep track of all the methods parsed, while looking up variables, in ClangExpressionDeclMap and use that knowledge while looking up addresses. Do you have any suggestions?

I agree that it is indeed odd to have a method GetMangledNamesForFunction in SymbolFile which is useful only for expression evaluation. How about having a temp object that ClangExpressionDeclMap registers with SymbolFile, and cleans up after expression evaluation is done? While parsing the DIEs, SymbolFile stuffs into the object all info that ClangExpressionDeclMap could potentially use. ExpressionEvaluationIndex?
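
To make the proposal concrete, here is a minimal sketch of such a temporary object. Only the name ExpressionEvaluationIndex comes from the suggestion above; FakeSymbolFile, ScopedIndexRegistration, and every method on them are hypothetical and do not reflect LLDB's actual SymbolFile interface.

```cpp
#include <string>
#include <unordered_map>

// Temporary object owned by the expression evaluation code.
struct ExpressionEvaluationIndex {
  std::unordered_map<std::string, std::string> scoped_to_mangled;
};

// Hypothetical stand-in for SymbolFile: fills the index only while one is
// registered.
class FakeSymbolFile {
public:
  void SetIndex(ExpressionEvaluationIndex *index) { m_index = index; }
  bool HasIndex() const { return m_index != nullptr; }

  // Called while parsing a method DIE.
  void ParseMethod(const std::string &scoped, const std::string &mangled) {
    if (m_index)
      m_index->scoped_to_mangled[scoped] = mangled;
  }

private:
  ExpressionEvaluationIndex *m_index = nullptr;
};

// RAII registration: registers on construction, unregisters on destruction,
// so the index cannot outlive a single expression evaluation.
class ScopedIndexRegistration {
public:
  ScopedIndexRegistration(FakeSymbolFile &sf, ExpressionEvaluationIndex &index)
      : m_sf(sf) {
    m_sf.SetIndex(&index);
  }
  ~ScopedIndexRegistration() { m_sf.SetIndex(nullptr); }

  ScopedIndexRegistration(const ScopedIndexRegistration &) = delete;
  ScopedIndexRegistration &operator=(const ScopedIndexRegistration &) = delete;

private:
  FakeSymbolFile &m_sf;
};
```

Tying the registration to a scope means the symbol file pays the bookkeeping cost only while an expression is being evaluated.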

I finally got time to work on this. I have now rebased it and updated it to take in the DWO changes.

I understand this change only enhances the debug experience when GCC is used as the DWARF producer (the targeted functionality already works as expected when the producer is clang). However, GCC is still very important for Android development and hence this fix is very useful for us.

A test for this already exists:

The expression parser side fixes look fine to me, and they remove some hacks and make things slightly more elegant.
Obviously the ideal would be to have GCC generate full debug info, but to me this patch looks like a fine next-best thing.

