This is an archive of the discontinued LLVM Phabricator instance.

Add a function for mapping PDBSymbol index IDs to lldb::LangTypes
AbandonedPublic

Authored by lanza on Mar 26 2019, 7:17 PM.

Details

Summary

In order to get the PDBSymbolFile plugin to be language agnostic we
need a mechanism to map from arbitrary PDBSymbol IDs to the language
of it's parent PDBSymbolCompilandDetails. This patch implements a
DenseMap that maps from the child uid to the CompUnit which you can
then use to get the language.

Event Timeline

lanza created this revision.Mar 26 2019, 7:17 PM
lanza updated this revision to Diff 192408.Mar 26 2019, 9:36 PM

clean up

lanza updated this revision to Diff 192524.Mar 27 2019, 3:07 PM

add default to c++

compnerd added inline comments.Mar 27 2019, 3:32 PM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
568

Why default to C++? Isn't there an unknown language? Or perhaps asm?

lanza updated this revision to Diff 192552.EditedMar 27 2019, 6:31 PM

clean up

lanza marked an inline comment as done.Mar 27 2019, 6:45 PM
lanza updated this revision to Diff 192554.Mar 27 2019, 6:47 PM

fix comment

MaskRay added inline comments.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
551

auto -> LanguageType?

557

insert -> try_emplace

lanza marked 2 inline comments as done.Mar 28 2019, 3:13 PM

@MaskRay 👍🏼

compnerd added inline comments.Apr 1 2019, 12:22 PM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
568

Hmm, with multi-language debugging, Im not sure if this is really correct. A Builtin.int_2048 is not really a C++ type. Also, why C++ and not C?

Do you have a single PDB with multiple source file languages in it? This seems like it is going to consume a ton of memory if there's one of these for each SymUID

lanza added a comment.Apr 1 2019, 3:17 PM

@zturner Yup, the target language is Swift. So you can have C, C++ and Swift. I guess an alternative idea would be to just store every symbol that isn't C++ since the majority of them are. I'll check into that.

Can't you just write a function that, every time you call it, traces the symbol back to its original compile unit (you can get this from the PdbCompilandSymId), get the CompileUnit item, and ask it for its language? The part that seems unnecessary is the cache.

lanza added a comment.Apr 1 2019, 5:51 PM

Can't you just write a function that, every time you call it, traces the symbol back to its original compile unit (you can get this from the PdbCompilandSymId), get the CompileUnit item, and ask it for its language? The part that seems unnecessary is the cache.

For a general PDBSymbol? There's a getCompilandId for PDBSymbolFunction and PDBSymbolData which get the compliand from the DIALines they hold. There's a GetPDBCompilandByUID that accepts an arbitrary ID and dyn_cast_or_null's it to a Compiland. I don't see anything else given that a general PDBSymbol doesn't have any access to it's parents.

lanza marked an inline comment as done.Apr 1 2019, 6:10 PM
lanza added inline comments.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
568

The builtin here is for code view debug info types. Codeview reserves the first 0xfff types. e.g. 0x74 is int.

My initial thought was to default to the old behavior in case of failure. But now thinking more about it, the type should be matched to whatever language used it and not any sort of default. e.g. the return type for int main(); in hw.cpp should map to eLanguageTypeC_plus_plus while the return type for the automatic main in swift should map to eLanguageTypeSwift. I'll have to find a way to fix that.

Can't you just write a function that, every time you call it, traces the symbol back to its original compile unit (you can get this from the PdbCompilandSymId), get the CompileUnit item, and ask it for its language? The part that seems unnecessary is the cache.

For a general PDBSymbol? There's a getCompilandId for PDBSymbolFunction and PDBSymbolData which get the compliand from the DIALines they hold. There's a GetPDBCompilandByUID that accepts an arbitrary ID and dyn_cast_or_null's it to a Compiland. I don't see anything else given that a general PDBSymbol doesn't have any access to it's parents.

Ok I see what the confusion is now. Unfortunately I was on vacation all of last week so I didn't see the initial round of patches, but now that I have a chance to see what's going on I'm a little concerned about the entire direction.

All of the code from this and all other patches is going into SymbolFile/PDB when it should be going into SymbolFile/NativePDB. In fact, SymbolFile/PDB was supposed to be marked for deprecation with the intent to remove. I wanted to remove it several months ago because I knew it would cause exactly this sort of confusion, but there were some objections on the grounds that they weren't at complete feature parity yet.

On the other hand, by not doing so, now we have this problem where the two implementations begin to diverge.

I would really, really, strongly prefer if all work on SymbolFile/PDB can be abandoned and continue in SymbolFile/NativePDB.

It requires a bit of a learning curve because you have to somewhat understand the internals of a PDB file, but given that the future involves deleting SymbolFile/PDB entirely, I think it's the only real path forward.

With that out of the way, my comment had mistakenly thought this was already in SymbolFile/NativePDB. In that plugin, a SymIndexId is essentially a uint64 which can be used to construct an instance of PdbSymUid. From there, you can check what kind of symbol it is (function symbol, variable, etc) and if it's a "compiland symbol" (which is roughly anything that isn't a type, and specifically anything that appears inside of a compiland stream inside the underlying PDB file), you can convert this PdbSymUid to a PdbCompilandSymId, and from there you can get the compiland descriptor which tells you the language.

The difference between the two is that SymbolFile/PDB uses DIA and hence only works on Windows, whereas SymbolFile/NativePDB parses the bytes of the PDB directly and can therefore work on any platform. This is a strong selling point, because it makes previously impossible scenarios such as debugging a Windows minidump on a Linux machine possible.

Hopefully this makes sense.

I agree with Zachary here. I think that implementing this in the native PDB plugin is more preferable. It still has some issues, and unfortunately I can't get into this right now because of another current work, but I'm planning to do it some later.

I think that the original patch will have a negative impact on performance and memory consumption on huge PDBs (e.g. ~500 MB). Is it possible to retrieve a compiland on the fly through the raw method getLexicalParentId in a way similar to PDBASTParser::GetClassOrFunctionParent?

lanza added a comment.Apr 2 2019, 3:06 PM
Ok I see what the confusion is now. Unfortunately I was on vacation all of last week so I didn't see the initial round of patches, but now that I have a chance to see what's going on I'm a little concerned about the entire direction.

All of the code from this and all other patches is going into SymbolFile/PDB when it should be going into SymbolFile/NativePDB. In fact, SymbolFile/PDB was supposed to be marked for deprecation with the intent to remove. I wanted to remove it several months ago because I knew it would cause exactly this sort of confusion, but there were some objections on the grounds that they weren't at complete feature parity yet.

On the other hand, by not doing so, now we have this problem where the two implementations begin to diverge.

I would really, really, strongly prefer if all work on SymbolFile/PDB can be abandoned and continue in SymbolFile/NativePDB.

It requires a bit of a learning curve because you have to somewhat understand the internals of a PDB file, but given that the future involves deleting SymbolFile/PDB entirely, I think it's the only real path forward.

With that out of the way, my comment had mistakenly thought this was already in SymbolFile/NativePDB. In that plugin, a SymIndexId is essentially a uint64 which can be used to construct an instance of PdbSymUid. From there, you can check what kind of symbol it is (function symbol, variable, etc) and if it's a "compiland symbol" (which is roughly anything that isn't a type, and specifically anything that appears inside of a compiland stream inside the underlying PDB file), you can convert this PdbSymUid to a PdbCompilandSymId, and from there you can get the compiland descriptor which tells you the language.

The difference between the two is that SymbolFile/PDB uses DIA and hence only works on Windows, whereas SymbolFile/NativePDB parses the bytes of the PDB directly and can therefore work on any platform. This is a strong selling point, because it makes previously impossible scenarios such as debugging a Windows minidump on a Linux machine possible.

Hopefully this makes sense.

Got ya. From looking at SymbolFileNativePDB it seems like it assumes clang specific functionality a good bit more than SymbolFilePDB does. I'll start playing with it and see what needs to be done. Thanks!

lanza abandoned this revision.Jun 29 2019, 5:46 PM