This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Use the raw PDB symbol interface more accurately
ClosedPublic

Authored by aleksandr.urakov on Sep 12 2018, 1:32 AM.

Details

Summary

This patch adds some symbol tag checks before using the IPDBRawSymbol interface to improve safety and readability.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner accepted this revision.Sep 12 2018, 1:49 PM
zturner added inline comments.
source/Plugins/SymbolFile/PDB/PDBASTParser.cpp
274 ↗(On Diff #165029)

Can you put a comment here?

// For items that are nested inside of a class, return the class that it is nested inside of.  Note that only certain items can be nested inside of classes.
281 ↗(On Diff #165029)

Do you want to handle VTable here?

290–292 ↗(On Diff #165029)
// Otherwise, if it is nested inside of a function, return the function.  Note that only certain items can be nested inside of functions.
This revision is now accepted and ready to land.Sep 12 2018, 1:49 PM

I've been experimenting with DIA locally and after some investigation I'm not sure this is going to be reliable. Let's say we have a class, we want the decl context containing the class. For example, on line 366. So we call GetDeclContextContainingSymbol. Despite what the MSDN documentation states, I'm pretty sure this is going to return a PDBSymbolExe. Worse, there is no guarantee that getLexicalParent() or getClassParent() will return the same thing twice. It all depends on how you obtained the object in the first place.

To make this concrete:

Suppose you find the PDB Global Scope object, enumerate it and you find some function called foo at address 12345. If you call getLexicalParent() on it, it will be the Exe symbol (because the way you obtained it is by calling ExeSymbol->findAllChildren, so the Exe symbol is the parent). Now you take this address 12345 and call Session->findSymbolByAddress(12345); It returns a new symbol with the same uid but now suddenly the lexical parent is the compiland where the symbol is defined. I hacked up llvm-pdbutil to dump some output from a test exe I created, and it outputs this:

Calling Session->findSymbolByAddress(1752)
3 `__raise_securityfailure` [Class: <null>, Lexical: 2 (Compiland 'f:\binaries\Intermediate\vctools\msvcrt.nativeproj_607447030\objd\x86\gs_report.obj')]
Printing function signature
1 `` [Class: <null>, Lexical: 4 (Tag 1)]
Calling Session->getSymbolById(3)
3 `__raise_securityfailure` [Class: <null>, Lexical: 2 (Compiland 'f:\binaries\Intermediate\vctools\msvcrt.nativeproj_607447030\objd\x86\gs_report.obj')]
Enumerating all functions
3 `__raise_securityfailure` [Class: <null>, Lexical: 4 (Tag 1)]
5 `_RTC_CheckEsp` [Class: <null>, Lexical: 4 (Tag 1)]

So notice that __raise_securityfailure has Lexical Parent of 4 when I enumerate functions, but a lexical parent of 2 (the matching compiland) if I obtain it by calling findSymbolByAddress.

Furthermore, for types, it seems that this method will never return anything other than the Exe. I was not able to find any way to get a symbol corresponding to a type record that returned a compiland.

Apply recommended fixes.

I've been experimenting with DIA locally and after some investigation I'm not sure this is going to be reliable. Let's say we have a class, we want the decl context containing the class. For example, on line 366. So we call GetDeclContextContainingSymbol. Despite what the MSDN documentation states, I'm pretty sure this is going to return a PDBSymbolExe.

But it returns the parent UDT symbol for me, when it is called for inner classes... I've added the next assert before return at the line 288:

assert(tag != PDB_SymType::UDT);

and have run the AST test:

llvm-lit.py -v ..\..\tools\lldb\lit\SymbolFile\PDB\ast-restore.test

and have retrieved the crash dump:

Assertion failed: tag != PDB_SymType::UDT, file ..\tools\lldb\source\Plugins\SymbolFile\PDB\PDBASTParser.cpp, line 288
LLVMSymbolizer: error reading file: 'ucrtbased.pdb': no such file or directory
LLVMSymbolizer: error reading file: 'wkernel32.pdb': no such file or directory
LLVMSymbolizer: error reading file: 'wntdll.pdb': no such file or directory
#0 0x0107f359 HandleAbort c:\work\llvm\lib\support\windows\signals.inc:409:0
#1 0x77dbf82b (C:\WINDOWS\SYSTEM32\ucrtbased.dll+0x9f82b)
#2 0x77dc0d72 (C:\WINDOWS\SYSTEM32\ucrtbased.dll+0xa0d72)
#3 0x77dc5124 (C:\WINDOWS\SYSTEM32\ucrtbased.dll+0xa5124)
#4 0x77dc371a (C:\WINDOWS\SYSTEM32\ucrtbased.dll+0xa371a)
#5 0x77dc568a (C:\WINDOWS\SYSTEM32\ucrtbased.dll+0xa568a)
#6 0x01e1c575 GetClassOrFunctionParent c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:288:0
#7 0x01e19c19 PDBASTParser::GetDeclContextContainingSymbol(class llvm::pdb::PDBSymbol const &) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:907:0
#8 0x01e16ee4 PDBASTParser::CreateLLDBTypeFromPDBType(class llvm::pdb::PDBSymbol const &) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:373:0
#9 0x01df71ae SymbolFilePDB::ResolveTypeUID(unsigned __int64) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\symbolfilepdb.cpp:563:0
#10 0x01e1aa2d PDBASTParser::CompleteTypeFromUDT(class lldb_private::SymbolFile &,class lldb_private::CompilerType &,class llvm::pdb::PDBSymbolTypeUDT &) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:1088:0
#11 0x01e19345 PDBASTParser::CompleteTypeFromPDB(class lldb_private::CompilerType &) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:768:0
#12 0x01df73ab SymbolFilePDB::CompleteType(class lldb_private::CompilerType &) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\symbolfilepdb.cpp:585:0
#13 0x014fa4ea lldb_private::Type::ResolveClangType(enum lldb_private::Type::ResolveStateTag) c:\work\llvm\tools\lldb\source\symbol\type.cpp:552:0
#14 0x014f9a08 lldb_private::Type::GetFullCompilerType(void) c:\work\llvm\tools\lldb\source\symbol\type.cpp:593:0
#15 0x01e19566 PDBASTParser::GetDeclForSymbol(class llvm::pdb::PDBSymbol const &) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:798:0
#16 0x01e1a2c3 PDBASTParser::ParseDeclsForDeclContext(class clang::DeclContext const *) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\pdbastparser.cpp:992:0
#17 0x01df78d2 SymbolFilePDB::ParseDeclsForContext(class lldb_private::CompilerDeclContext) c:\work\llvm\tools\lldb\source\plugins\symbolfile\pdb\symbolfilepdb.cpp:665:0
#18 0x00fde8d3 opts::symbols::dumpAST c:\work\llvm\tools\lldb\tools\lldb-test\lldb-test.cpp:537:0
#19 0x00fdf6a5 opts::symbols::dumpSymbols c:\work\llvm\tools\lldb\tools\lldb-test\lldb-test.cpp:710:0
#20 0x00fe0f9d main c:\work\llvm\tools\lldb\tools\lldb-test\lldb-test.cpp:952:0
#21 0x078e219e invoke_main f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:78:0
#22 0x078e2037 _scrt_common_main_seh f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:288:0
#23 0x078e1ecd _scrt_common_main f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:331:0
#24 0x078e2218 mainCRTStartup f:\dd\vctools\crt\vcstartup\src\startup\exe_main.cpp:17:0
#25 0x76558484 (C:\WINDOWS\System32\KERNEL32.DLL+0x18484)
#26 0x770a2fea (C:\WINDOWS\SYSTEM32\ntdll.dll+0x62fea)
#27 0x770a2fba (C:\WINDOWS\SYSTEM32\ntdll.dll+0x62fba)

During debugging I have figured out that it was N0::N1::Class::Inner resolving, and it had retrieved N0::N1::Class as a class parent here (which was completing at the time, and that completion had invoked N0::N1::Class::Inner resolving). So it seems that it works in this case.

Worse, there is no guarantee that getLexicalParent() or getClassParent() will return the same thing twice. It all depends on how you obtained the object in the first place.

Yes, I understand, that sometimes this function does not return a valid parent and returns nullptr. That's why I additionally check the symbol name at lines 913-943. It may fire, for example, for class static variables, for which DIA returns two symbols: one has a class parent (and has a short name e.g. ClassStatic) when another has not and treated as a global (and has a full name e.g. N0::N1::Class::ClassStatic).

For myself, I prove the correctness of GetClassOrFunctionParent from its postconditions. It may return only:

  • nullptr;
  • lexical parent symbol, which is a function (line 306). We check it in the line 305;
  • class parent symbol, which is a class (line 288). It must be a class, because all symbol types allowed in the outer switch must return a class symbol or nullptr as a class parent (if I'm not mistaken). But we filter nullptr in the line 287, so in the line 288 only a class symbol may be returned.

So it returns nullptr or a valid enclosing function or class. In the first case the symbol still can belong to a class or a function (as a class static variables described above), then we try to find a parent with the name. Unfortunately, there is a kind of heuristics here, but I have no idea yet how to do it better...

zturner accepted this revision.Sep 13 2018, 9:53 AM

Maybe I can improve this in the native implementation of our PDB reader which I'm currently working on, so that the results can be more consistent.

That would be great, thank you!

This revision was automatically updated to reflect the committed changes.