In order to get the PDBSymbolFile plugin to be language agnostic we
need a mechanism to map from arbitrary PDBSymbol IDs to the language
of it's parent PDBSymbolCompilandDetails. This patch implements a
DenseMap that maps from the child uid to the CompUnit which you can
then use to get the language.
Details
Diff Detail
- Build Status
Buildable 29723 Build 29722: arc lint + arc unit
Event Timeline
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp | ||
---|---|---|
560 | Why default to C++? Isn't there an unknown language? Or perhaps asm? |
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp | ||
---|---|---|
560 | Hmm, with multi-language debugging, Im not sure if this is really correct. A Builtin.int_2048 is not really a C++ type. Also, why C++ and not C? |
Do you have a single PDB with multiple source file languages in it? This seems like it is going to consume a ton of memory if there's one of these for each SymUID
@zturner Yup, the target language is Swift. So you can have C, C++ and Swift. I guess an alternative idea would be to just store every symbol that isn't C++ since the majority of them are. I'll check into that.
Can't you just write a function that, every time you call it, traces the symbol back to its original compile unit (you can get this from the PdbCompilandSymId), get the CompileUnit item, and ask it for its language? The part that seems unnecessary is the cache.
Can't you just write a function that, every time you call it, traces the symbol back to its original compile unit (you can get this from the PdbCompilandSymId), get the CompileUnit item, and ask it for its language? The part that seems unnecessary is the cache.
For a general PDBSymbol? There's a getCompilandId for PDBSymbolFunction and PDBSymbolData which get the compliand from the DIALines they hold. There's a GetPDBCompilandByUID that accepts an arbitrary ID and dyn_cast_or_null's it to a Compiland. I don't see anything else given that a general PDBSymbol doesn't have any access to it's parents.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp | ||
---|---|---|
560 | The builtin here is for code view debug info types. Codeview reserves the first 0xfff types. e.g. 0x74 is int. My initial thought was to default to the old behavior in case of failure. But now thinking more about it, the type should be matched to whatever language used it and not any sort of default. e.g. the return type for int main(); in hw.cpp should map to eLanguageTypeC_plus_plus while the return type for the automatic main in swift should map to eLanguageTypeSwift. I'll have to find a way to fix that. |
Ok I see what the confusion is now. Unfortunately I was on vacation all of last week so I didn't see the initial round of patches, but now that I have a chance to see what's going on I'm a little concerned about the entire direction.
All of the code from this and all other patches is going into SymbolFile/PDB when it should be going into SymbolFile/NativePDB. In fact, SymbolFile/PDB was supposed to be marked for deprecation with the intent to remove. I wanted to remove it several months ago because I knew it would cause exactly this sort of confusion, but there were some objections on the grounds that they weren't at complete feature parity yet.
On the other hand, by not doing so, now we have this problem where the two implementations begin to diverge.
I would really, really, strongly prefer if all work on SymbolFile/PDB can be abandoned and continue in SymbolFile/NativePDB.
It requires a bit of a learning curve because you have to somewhat understand the internals of a PDB file, but given that the future involves deleting SymbolFile/PDB entirely, I think it's the only real path forward.
With that out of the way, my comment had mistakenly thought this was already in SymbolFile/NativePDB. In that plugin, a SymIndexId is essentially a uint64 which can be used to construct an instance of PdbSymUid. From there, you can check what kind of symbol it is (function symbol, variable, etc) and if it's a "compiland symbol" (which is roughly anything that isn't a type, and specifically anything that appears inside of a compiland stream inside the underlying PDB file), you can convert this PdbSymUid to a PdbCompilandSymId, and from there you can get the compiland descriptor which tells you the language.
The difference between the two is that SymbolFile/PDB uses DIA and hence only works on Windows, whereas SymbolFile/NativePDB parses the bytes of the PDB directly and can therefore work on any platform. This is a strong selling point, because it makes previously impossible scenarios such as debugging a Windows minidump on a Linux machine possible.
Hopefully this makes sense.
I agree with Zachary here. I think that implementing this in the native PDB plugin is more preferable. It still has some issues, and unfortunately I can't get into this right now because of another current work, but I'm planning to do it some later.
I think that the original patch will have a negative impact on performance and memory consumption on huge PDBs (e.g. ~500 MB). Is it possible to retrieve a compiland on the fly through the raw method getLexicalParentId in a way similar to PDBASTParser::GetClassOrFunctionParent?
Ok I see what the confusion is now. Unfortunately I was on vacation all of last week so I didn't see the initial round of patches, but now that I have a chance to see what's going on I'm a little concerned about the entire direction. All of the code from this and all other patches is going into SymbolFile/PDB when it should be going into SymbolFile/NativePDB. In fact, SymbolFile/PDB was supposed to be marked for deprecation with the intent to remove. I wanted to remove it several months ago because I knew it would cause exactly this sort of confusion, but there were some objections on the grounds that they weren't at complete feature parity yet. On the other hand, by not doing so, now we have this problem where the two implementations begin to diverge. I would really, really, strongly prefer if all work on SymbolFile/PDB can be abandoned and continue in SymbolFile/NativePDB. It requires a bit of a learning curve because you have to somewhat understand the internals of a PDB file, but given that the future involves deleting SymbolFile/PDB entirely, I think it's the only real path forward. With that out of the way, my comment had mistakenly thought this was already in SymbolFile/NativePDB. In that plugin, a SymIndexId is essentially a uint64 which can be used to construct an instance of PdbSymUid. From there, you can check what kind of symbol it is (function symbol, variable, etc) and if it's a "compiland symbol" (which is roughly anything that isn't a type, and specifically anything that appears inside of a compiland stream inside the underlying PDB file), you can convert this PdbSymUid to a PdbCompilandSymId, and from there you can get the compiland descriptor which tells you the language. The difference between the two is that SymbolFile/PDB uses DIA and hence only works on Windows, whereas SymbolFile/NativePDB parses the bytes of the PDB directly and can therefore work on any platform. This is a strong selling point, because it makes previously impossible scenarios such as debugging a Windows minidump on a Linux machine possible. Hopefully this makes sense.
Got ya. From looking at SymbolFileNativePDB it seems like it assumes clang specific functionality a good bit more than SymbolFilePDB does. I'll start playing with it and see what needs to be done. Thanks!
auto -> LanguageType?