This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Restore AST from PDB symbols
ClosedPublic

Authored by aleksandr.urakov on Aug 23 2018, 7:10 AM.

Details

Summary

I work on the evaluation of expressions on Windows now, and currently I am trying to restore the AST info from PDB symbols.

This patch adds an implementation of retrieving of declarations and declaration contexts based on PDB symbols.

PDB has different type symbols for const-qualified types, and this implementation ensures that only one declaration was created for both const and non-const types, but creates different compiler types for them.

The implementation also processes the case when there are two symbols corresponding to a variable. It's possible e.g. for class static variables, they has one global symbol and one symbol belonging to a class.

PDB has no info about namespaces, so this implementation parses the full symbol name and tries to figure out if the symbol belongs to namespace or not, and then creates nested namespaces if necessary.

The patch is big enough, but I can't come up with testing. Have you any ideas about that?

Diff Detail

Repository
rL LLVM

Event Timeline

Ping!

Can you review this, please?

Drop a scope part of a class member function name.

I don’t think I’m really a good person to look at AST stuff. I can look for
general style comments, obvious flaws, and test coverage. but you may be
the best person regarding on the content of the patch. Does that sound ok?

Thanks :) Let's also wait for what Aaron will say? Because the patch is big enough.

As for testing, what do you think if I'll add complex tests for the expressions evaluation, when it will be ready? I have no other idea how to test the changes in this patch...

I think in clang they have a method to dump the AST. Could we add something like that to lldb-test? We're using the symbols subcommand, maybe it would be worth it to add a --dump-ast flag to that subcommand? That seems like a generally useful testing feature.

Yes, I'll try to implement that, thanks for the idea!

Added the dumping AST ability to lldb-test; adding a corresponding test for the patch.

  • Fix a typo bug in AddRecordMethods (use continue instead of break);
  • Do not search function declarations by name during getting of a declaration for a symbol, it may lead to ambiguity.

Lgtm, thanks!

asmith accepted this revision.Sep 8 2018, 5:38 PM

LGTM!

This revision is now accepted and ready to land.Sep 8 2018, 5:38 PM
This revision was automatically updated to reflect the committed changes.

This change is causing three of the symbol file PDB tests to fail:

lldb-Unit :: SymbolFile/PDB/release/SymbolFilePDBTests.exe/SymbolFilePDBTests.TestClassInNamespace
lldb-Unit :: SymbolFile/PDB/release/SymbolFilePDBTests.exe/SymbolFilePDBTests.TestNestedClassTypes
lldb-Unit :: SymbolFile/PDB/release/SymbolFilePDBTests.exe/SymbolFilePDBTests.TestSimpleClassTypes

For example:

e:\_work\36\s\llvm\tools\lldb\unittests\symbolfile\pdb\symbolfilepdbtests.cpp(370): error : Expected: 1u [e:\_work\36\b\LLVMBuild\check-all.vcxproj]
          
                Which is: 1
          
          To be equal to: symfile->FindTypes(sc, ConstString("Class"), nullptr, false, 0, searched_files, results)
          
                Which is: 2

Sorry, I've missed this. I'll fix it tomorrow. Thank you!

ted added a subscriber: ted.Sep 10 2018, 2:55 PM

Another issue:

auto context = symbol.getRawSymbol().getName();
auto context_size = context.rfind("::");

...

auto from = 0;
while (from < context_size) {

context_size is size_t (from std::string::rfind), but on clang 5.01, "auto from = 0" makes from an int. The comparison on the next line generates a warning:
comparison of integers of different signs: 'int' and 'unsigned long' [-Werror,-Wsign-compare] on 64 bit Linux.

The "auto from = 0" should be "size_t from = 0", since auto can't determine the correct type.

zturner added inline comments.Sep 10 2018, 4:36 PM
lldb/trunk/source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
268–269

All file local functions should be marked static.

273–288

This function is a little too general because it hits the IPDBRawSymbol interface which I would strongly discourage, and as a result I think it's probably wrong. class parent id only valid for certain types of symbols in the first place. Likewise, getLexicalParentId() is only valid for certain types of symbols. The full list is:

SymbolTypeLexicalParentClassParent
ArrayCompiland<invalid>
BaseClassCompilandEnclosing Class
BuiltinTypeCompiland<invalid>
EnumCompilandEnclosing Class
FunctionArgCompilandEnclosing Function
FunctionTypeCompilandEnclosing Class
PointerCompiland<invalid>
UDTCompilandClass Parent
VTableCompilandClass Parent
VTableShapeCompiland<invalid>
CompilandExe File<invalid>
CompilandDetailsCompiland<invalid>
CompilandEnvCompiland<invalid>
DataCompiland, Function, or BlockEnclosing Class (or null)
Exe<invalid><invalid>
FunctionCompilandClass Parent (if member function)
FunctionDebugStartFunction<invalid>
FunctionDebugEndFunction<invalid>
PublicSymbolExe<invalid>
ThunkCompiland or Function<invalid>
289

However, this is only for symbols (not types). So we can exclude everything above Compiland. In fact, we should assert that we don't get anything from above Compiland. So I would write this as:

uint32_t ParentId = 0;
switch (raw.getSymTag()) {
case PDB_SymType::Array:
case PDB_SymType::BaseClass:
case PDB_SymType::BuiltinType:
case PDB_SymType::Enum:
case PDB_SymType::FunctionArg:
case PDB_SymType::FunctionType:
case PDB_SymType::PointerType:
case PDB_SymType::UDTType:
case PDB_SymType::VTable:
case PDB_SymType::VTableShape:
  assert(false && "This is a type not a symbol!");
  return nullptr;
default:
  break;
}

And for the rest it's quite simple because you probably really only care about functions, public symbols, and data. In which case you can continue:

template<typename T>
static std::unique_ptr<llvm::pdb::PDBSymbol>
classOrLexicalParent(const T& t, IPDBSession &Session) {

uint32_t parent = t.getClassParentId();
if (parent == 0)
  parent = t.getLexicalParentId();
return Session.getSymbolById(parent);

}

if (auto &F = dyn_cast<PDBSymbolFunc>(symbol)) {
  return classOrLexicalParent(F);
} else if (auto &D = dyn_cast<PDBSymbolData>(symbol)) {
  return classOrLexicalParent(F);
} else if (auto &P = dyn_cast<PDBSymbolPublic>(symbol)) {
  return P.getLexicalParent();
}
In D51162#1229616, @ted wrote:

The "auto from = 0" should be "size_t from = 0", since auto can't determine the correct type.

Yes, I've missed that because MSVC doesn't emit such a warning. Thanks for catching that!

lldb/trunk/source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
268–269

But this function is already belonging to anonymous namespace, what static actually means in C++. Do you think that it's still necessary?

lldb/trunk/source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
289

The problem is that this function is used also for type symbols. When creating a type, we use GetDeclContextContainingSymbol to find a correct context for the type declaration, and call GetClassOrFunctionParent there. So we can find an enclosing class for an inner class for example.

The main reason why I have used the low-level interface here is to avoid extra dynamic casts. But what you are saying about (if I understand correctly) is the type safety, and may be it's worth several dynamic casts. Do you mind if I'll commit the fix for the test and the warning and then will try to figure out (considering the table you have sent) how to make this function more type safe? I think that it requires some more time to find the solution.

This change is causing three of the symbol file PDB tests to fail:

lldb-Unit :: SymbolFile/PDB/release/SymbolFilePDBTests.exe/SymbolFilePDBTests.TestClassInNamespace
lldb-Unit :: SymbolFile/PDB/release/SymbolFilePDBTests.exe/SymbolFilePDBTests.TestNestedClassTypes
lldb-Unit :: SymbolFile/PDB/release/SymbolFilePDBTests.exe/SymbolFilePDBTests.TestSimpleClassTypes

For example:

The interesting moment here is that the test-pdb-types.pdb contains some strange symbols not belonging to the source. For example:

class Class [sizeof = 1] {
  data +0x00 [sizeof=1] bool b
}

class NS::NSClass [sizeof = 16] {
  data +0x00 [sizeof=4] float f
  <padding> (4 bytes)
  data +0x08 [sizeof=8] double d
}
Total padding 4 bytes (25% of class size)
Immediate padding 4 bytes (25% of class size)

class Class::NestedClass [sizeof = 4] {
  data +0x00 [sizeof=4] float f
}

class Class [sizeof = 2] {
  data +0x00 [sizeof=2] ShortEnum se
}

class Class::NestedClass [sizeof = 4] {
  data +0x00 [sizeof=4] Enum e
}

But there's no Class with bool b and there's no Class::NestedClass with float f. So it seems that the test passed before occasionally. I will regenerate this PDB as well.

What do you think about this symbols? Could they appear after changes due to incremental builds?

@stella.stamenova @ted Fixed with rL341942, thanks again!
@zturner I'll rewrite GetClassOrFunctionParent and will create a different review for that, ok?

That’s fine