This is an archive of the discontinued LLVM Phabricator instance.

[NativePDB] Add the ability to create clang record decls from mangled names.
AcceptedPublic

Authored by zturner on Nov 2 2018, 3:01 PM.

Details

Summary

Previously we were not able to accurately represent tag record types with clang record decls. The primary reason for this is that for type information PDB only contains fully instantiated type names. It does not contain rich information about namespaces, declaration contexts (e.g. parent classes, function scopes for local classes, etc), template parameters, or anything else. All you have is a big long string that is a fully instantiated type name. What it does give you, however, is the mangled name of this type. So we can extract all of the relevant information (minus the distinction between outer class and outer namespace) from the mangled name. This patch is the start of this effort.

It uses LLVM's demangler to demangle the name, and treat each component as one component of a namespace chain.

Obviously this is not true in the general case, as something like Foo<int>::Bar(double)::1'::LocalClass<7>::NestedClass` will get treated as if it were in a series of namespaces. However, it's a good start, and clang doesn't actually care for most uses. So we can improve on this incrementally with subsequent patches to make this more accurate.

Diff Detail

Event Timeline

zturner created this revision.Nov 2 2018, 3:01 PM
lemo accepted this revision.Nov 2 2018, 4:40 PM

nice!

lldb/source/Plugins/SymbolFile/NativePDB/MangledAST.cpp
81

what does this mean? pretend that all the members of a class are private?

This revision is now accepted and ready to land.Nov 2 2018, 4:40 PM

nice!

AFAIU this is just the "default" access of the class. I should probably investigate why it's needed in the first place, since it can be deduced from the tag type.

labath added a comment.Nov 3 2018, 3:33 AM

A very nice use of the structured demangler.

I actually found some issues with this, or at least some potential issues which I'm not sure are actual issues. But I'm adding some new functionality to LLDB to help me confirm whether these are real issues or not, and worst case scenario it will open up some new testing possibilities and help us understand the consequences of messing around with clang ASTs in various ways. So I'm not going to commit this quite yet. Stay tuned.

zturner updated this revision to Diff 172511.Nov 3 2018, 4:57 PM

I just added a new test which dumps the clang AST and makes sure no bogus records get introduced. Since this is already LGTM'ed I'm assuming this is good to go, but since it now depends on D54072, I figure I might as well update this diff since I can't commit it yet.

Bleh, I think I found a problem with this approach. Since we assume everything is a namespace, it can lead to ambiguities. I think we need to actually consult the debug info while parsing components of the mangled name. For example, suppose you have this code:

namespace NS1 {
  struct Tag2 {
    class TagRecord {};
  };
}

NS1::Tag2::TagRecord ATR;
NS1::Tag2 T2;

And in LLDB you run the following commands:

(lldb) target variable ATR
(lldb) target variable T2

We demangle the first name and create an AST from it, but we don't know what Tag2 is so we assume it's a namespace and we create a NamespaceDecl. Then we get to the second one we actually know what it is because we have a precise debug info record for it so we know it's a struct so we create a CXXRecordDecl for it. So now we end up creating an invalid AST. Clang will accept it, but it's going to lead to ambiguities on name lookup and create problems down the line. So, I think what we should do instead is ask the debug info "Do you know what the type named NS1 is in the global scope?" If no, it's a namespace. Otherwise we create (or lookup) the appropriate decl. Then we can repeat this at each step along the way.

BTW, when I run those above 2 commands with the patch as it is above, here is the output I get. And you can see that there are two conflicting decls.

D:\src\llvmbuild\ninja-x64>"bin\lldb" "-f" "D:\src\llvmbuild\ninja-x64\tools\lldb\lit\SymbolFile\NativePDB\Output\ast-reconstruction.cpp.tmp.exe"
(lldb) target create "D:\\src\\llvmbuild\\ninja-x64\\tools\\lldb\\lit\\SymbolFile\\NativePDB\\Output\\ast-reconstruction.cpp.tmp.exe"
Current executable set to 'D:\src\llvmbuild\ninja-x64\tools\lldb\lit\SymbolFile\NativePDB\Output\ast-reconstruction.cpp.tmp.exe' (x86_64).
(lldb) target variable ATR
(NS1::Tag2::TagRecord) ATR = {}
(lldb) target variable T2
(NS1::Tag2) T2 = {}
(lldb) target modules dump ast
TranslationUnitDecl 0x2247dff6fd8 <<invalid sloc>> <invalid sloc>
|-NamespaceDecl 0x2247dff7898 <<invalid sloc>> <invalid sloc> NS1
| |-NamespaceDecl 0x2247dff7918 <<invalid sloc>> <invalid sloc> Tag2
| | `-CXXRecordDecl 0x2247dff7998 <<invalid sloc>> <invalid sloc> class TagRecord definition
| |   |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
| |   | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
| |   | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
| |   | |-MoveConstructor exists simple trivial needs_implicit
| |   | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
| |   | |-MoveAssignment exists simple trivial needs_implicit
| |   | `-Destructor simple irrelevant trivial needs_implicit
| |   `-MSInheritanceAttr 0x2247dff7a60 <<invalid sloc>> Implicit __single_inheritance
| `-CXXRecordDecl 0x2247dff7b08 <<invalid sloc>> <invalid sloc> struct Tag2 definition
|   |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
|   | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
|   | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveConstructor exists simple trivial needs_implicit
|   | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
|   | |-MoveAssignment exists simple trivial needs_implicit
|   | `-Destructor simple irrelevant trivial needs_implicit
|   `-MSInheritanceAttr 0x2247dff7bd0 <<invalid sloc>> Implicit __single_inheritance
`-<undeserialized declarations>
Dumping clang ast for 1 modules.
zturner added a comment.EditedNov 4 2018, 11:55 AM

So I tried to go down this route, but alas, I still can't come up with a good way to make color output work, because the Stream is not a StreamFile object, it is a StreamString object, even when the output is going to a terminal. We could have an argument such as --color but perhaps it should be the default and the option should be --no-color.

Edit: Commented on wrong diff

aleksandr.urakov accepted this revision.Nov 6 2018, 1:56 AM

Bleh, I think I found a problem with this approach. Since we assume everything is a namespace, it can lead to ambiguities. I think we need to actually consult the debug info while parsing components of the mangled name.

Yes, it is what actually we have done in PDBASTParser::GetDeclContextContainingSymbol. Also I think that it is worth to remember if we already have created the class / function parent before, and do not create namespaces in this case. For example: N0::N1::CClass::PrivateFunc::__l2::InnerFuncStruct. We have here the PrivateFunc function and the InnerFuncStruct structure inside it. So we do not need to create the __l2 namespace inside the PrivateFunc function (moreover, it is not possible to create a namespace inside a class / function).

There is a similar problem with global and static variables and functions. Are the mangled names presented for them? Can we solve it in the same way?

Anyway, I think that this patch is a great start!

Bleh, I think I found a problem with this approach. Since we assume everything is a namespace, it can lead to ambiguities. I think we need to actually consult the debug info while parsing components of the mangled name.

Yes, it is what actually we have done in PDBASTParser::GetDeclContextContainingSymbol. Also I think that it is worth to remember if we already have created the class / function parent before, and do not create namespaces in this case. For example: N0::N1::CClass::PrivateFunc::__l2::InnerFuncStruct. We have here the PrivateFunc function and the InnerFuncStruct structure inside it. So we do not need to create the __l2 namespace inside the PrivateFunc function (moreover, it is not possible to create a namespace inside a class / function).

There is a similar problem with global and static variables and functions. Are the mangled names presented for them? Can we solve it in the same way?

Anyway, I think that this patch is a great start!

I actually think I have a totally different solution to this that will be more robust. It will handle the nested classes and namespaces perfectly, but not the scoped classes. But we can deal with that later. But it gives us the highest quality correctness level for this case, which I think is ultimately the most common (nested classes are pretty rare). So we can try the best methods first, and fall back to other methods as needed. I'm going to abandon this patch for now, but we'll keep it in mind for later. I'll upload a new patch with the new approach soon.

labath resigned from this revision.Aug 9 2019, 2:03 AM
shafik added a subscriber: shafik.Sep 21 2023, 12:18 PM

Is this PR still relevant or can it be closed?

Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2023, 12:18 PM