Details

Reviewers

compnerd
zturner
asmith
amccarth
aleksandr.urakov
stella.stamenova

Summary

To make the PDBSymbolFile plugin language-agnostic, we
need a mechanism to map from arbitrary PDBSymbol IDs to the language
of their parent PDBSymbolCompilandDetails. This patch implements a
DenseMap that maps from the child uid to the CompUnit, which you can
then use to get the language.
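
A minimal sketch of the caching idea from the summary, written against the standard library only. `Language`, `CompUnit`, and `LanguageCache` are hypothetical stand-ins rather than LLDB types; the patch itself uses `llvm::DenseMap` and `lldb::CompUnitSP`. On a cache miss the sketch records every child it visits under that child's own ID, so later lookups for sibling symbols are served from the map.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for lldb::LanguageType and lldb_private::CompileUnit.
enum class Language { Unknown, C, CPlusPlus, Swift };

struct CompUnit {
  Language language;
  std::vector<uint32_t> child_ids; // symbol IDs owned by this compile unit
};

class LanguageCache {
public:
  explicit LanguageCache(std::vector<std::shared_ptr<CompUnit>> units)
      : m_units(std::move(units)) {}

  // Returns the language of the compile unit that owns `uid`, or
  // std::nullopt when no compile unit claims the symbol.
  std::optional<Language> Lookup(uint32_t uid) {
    auto it = m_symid_to_cu.find(uid);
    if (it != m_symid_to_cu.end())
      return it->second->language;

    // Cache miss: scan the compile units, recording each visited child
    // under its own ID before checking whether it is the one requested.
    for (const auto &cu : m_units) {
      for (uint32_t child : cu->child_ids) {
        m_symid_to_cu.try_emplace(child, cu);
        if (child == uid)
          return cu->language;
      }
    }
    return std::nullopt;
  }

  std::size_t CachedCount() const { return m_symid_to_cu.size(); }

private:
  std::vector<std::shared_ptr<CompUnit>> m_units;
  std::unordered_map<uint32_t, std::shared_ptr<CompUnit>> m_symid_to_cu;
};
```

Returning `std::optional` instead of a default language leaves the "symbol not found" decision to the caller, which is the question the reviewers raise later in the thread.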

Diff Detail

Build Status

Buildable 29672
Build 29671: arc lint + arc unit

Event Timeline

lanza created this revision. Mar 26 2019, 7:17 PM

Herald added subscribers: jdoerfert, arphaman. Mar 26 2019, 7:17 PM

Harbormaster completed remote builds in B29669: Diff 192401. Mar 26 2019, 7:17 PM

lanza added reviewers: compnerd, zturner. Mar 26 2019, 7:18 PM

clean up

Harbormaster completed remote builds in B29672: Diff 192408. Mar 26 2019, 9:36 PM

add default to c++

Harbormaster completed remote builds in B29713: Diff 192524. Mar 27 2019, 3:09 PM

compnerd added inline comments. Mar 27 2019, 3:32 PM

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
568	Why default to `C++`? Isn't there an unknown language? Or perhaps asm?

clean up

Harbormaster completed remote builds in B29723: Diff 192552. Mar 27 2019, 6:31 PM

lanza marked an inline comment as done. Mar 27 2019, 6:45 PM

fix comment

Harbormaster completed remote builds in B29724: Diff 192554. Mar 27 2019, 6:48 PM

lanza added a reviewer: asmith. Mar 27 2019, 6:50 PM

MaskRay added a subscriber: MaskRay. Mar 28 2019, 2:59 AM

MaskRay added inline comments.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
551	`auto` -> `LanguageType`?
557	`insert` -> `try_emplace`
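
The `insert` versus `try_emplace` suggestion can be illustrated with a small sketch. It uses `std::unordered_map` (C++17) rather than `llvm::DenseMap`, and `Tracked`, `InsertStyle`, and `TryEmplaceStyle` are hypothetical names invented for the example. The counter makes visible that `insert` builds the value before the key check, while `try_emplace` builds it only when the key is absent.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Counts how many times a value is constructed from a name, to make the
// difference between the two insertion styles observable.
struct Tracked {
  static inline int constructed = 0;
  std::string name;
  explicit Tracked(std::string n) : name(std::move(n)) { ++constructed; }
};

// insert: the pair (and therefore the Tracked value) is built by the caller
// before the map checks whether the key already exists.
inline bool InsertStyle(std::unordered_map<uint32_t, Tracked> &map,
                        uint32_t key, const std::string &name) {
  return map.insert(std::make_pair(key, Tracked(name))).second;
}

// try_emplace: the mapped value is constructed in place, and only when the
// key is absent.
inline bool TryEmplaceStyle(std::unordered_map<uint32_t, Tracked> &map,
                            uint32_t key, const std::string &name) {
  return map.try_emplace(key, name).second;
}
```

Both helpers return `true` only when a new entry was added; neither overwrites an existing entry.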

@MaskRay 👍🏼

update

Harbormaster completed remote builds in B29775: Diff 192726. Mar 28 2019, 3:15 PM

compnerd added inline comments. Apr 1 2019, 12:22 PM

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
568	Hmm, with multi-language debugging, I'm not sure if this is really correct. A `Builtin.int_2048` is not really a C++ type. Also, why C++ and not C?

Do you have a single PDB with multiple source file languages in it? This seems like it is going to consume a ton of memory if there's one of these for each SymUID.

@zturner Yup, the target language is Swift. So you can have C, C++ and Swift. I guess an alternative idea would be to just store every symbol that isn't C++ since the majority of them are. I'll check into that.

Can't you just write a function that, every time you call it, traces the symbol back to its original compile unit (you can get this from the PdbCompilandSymId), gets the CompileUnit item, and asks it for its language? The part that seems unnecessary is the cache.

For a general PDBSymbol? There's a getCompilandId for PDBSymbolFunction and PDBSymbolData which gets the compiland from the DIALines they hold. There's a GetPDBCompilandByUID that accepts an arbitrary ID and dyn_cast_or_null's it to a Compiland. I don't see anything else, given that a general PDBSymbol doesn't have any access to its parents.

lanza marked an inline comment as done. Apr 1 2019, 6:10 PM

lanza added inline comments.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
568	The `builtin` here is for CodeView debug info types. CodeView reserves the first `0xfff` types, e.g. `0x74` is `int`. My initial thought was to default to the old behavior in case of failure. But now, thinking more about it, the type should be matched to whatever language used it and not to any sort of default. For example, the return type for `int main();` in `hw.cpp` should map to `eLanguageTypeC_plus_plus`, while the return type for the automatic main in Swift should map to `eLanguageTypeSwift`. I'll have to find a way to fix that.
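
To make the reserved range concrete, here is a small sketch of how a builtin ("simple") CodeView type index can be recognized and split into its parts. The constant names and the helper functions are illustrative; the values are chosen to mirror what LLVM's CodeView `TypeIndex` uses, and should be treated as assumptions rather than a normative description of the format.

```cpp
#include <cstdint>

// Type indices below this value denote CodeView "simple" (builtin) types;
// they are not backed by a record in the PDB's type stream.
constexpr uint32_t kFirstNonSimpleIndex = 0x1000;

// The low byte selects the kind (0x74 is a 32-bit signed int); bits 8-10
// select the pointer mode (0 means the type is used directly).
constexpr uint32_t kSimpleKindMask = 0x00ff;
constexpr uint32_t kSimpleModeMask = 0x0700;

constexpr bool IsSimpleType(uint32_t type_index) {
  return type_index < kFirstNonSimpleIndex;
}

constexpr uint32_t SimpleKind(uint32_t type_index) {
  return type_index & kSimpleKindMask;
}

constexpr uint32_t SimpleMode(uint32_t type_index) {
  return type_index & kSimpleModeMask;
}
```

Because a simple type index is shared by every compiland that uses it, the index alone cannot identify a source language; that information has to come from the symbol that references the type.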

Ok I see what the confusion is now. Unfortunately I was on vacation all of last week so I didn't see the initial round of patches, but now that I have a chance to see what's going on I'm a little concerned about the entire direction.

All of the code from this and all other patches is going into SymbolFile/PDB when it should be going into SymbolFile/NativePDB. In fact, SymbolFile/PDB was supposed to be marked for deprecation with the intent to remove. I wanted to remove it several months ago because I knew it would cause exactly this sort of confusion, but there were some objections on the grounds that they weren't at complete feature parity yet.

On the other hand, by not doing so, now we have this problem where the two implementations begin to diverge.

I would really, really, strongly prefer if all work on SymbolFile/PDB can be abandoned and continue in SymbolFile/NativePDB.

It requires a bit of a learning curve because you have to somewhat understand the internals of a PDB file, but given that the future involves deleting SymbolFile/PDB entirely, I think it's the only real path forward.

With that out of the way: my earlier comment mistakenly assumed this was already in SymbolFile/NativePDB. In that plugin, a SymIndexId is essentially a uint64 which can be used to construct an instance of PdbSymUid. From there, you can check what kind of symbol it is (function symbol, variable, etc.), and if it's a "compiland symbol" (which is roughly anything that isn't a type, and specifically anything that appears inside of a compiland stream inside the underlying PDB file), you can convert this PdbSymUid to a PdbCompilandSymId, and from there you can get the compiland descriptor, which tells you the language.
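
The lookup path just described can be sketched as follows. The bit layout, `UidKind`, `CompilandSymId`, `MakeUid`, and `LanguageFor` are all hypothetical and are not the real `PdbSymUid` encoding; the point is only that a tagged 64-bit ID can carry the owning module index, so no per-symbol cache is needed.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

enum class Language { Unknown, C, CPlusPlus, Swift };

// Hypothetical layout, NOT the real PdbSymUid encoding:
//   bits 56-63: kind, bits 32-47: module index, bits 0-31: record offset.
enum class UidKind : uint8_t { CompilandSym = 1, Type = 2 };

struct CompilandSymId {
  uint16_t modi;   // index of the owning compiland (module)
  uint32_t offset; // offset of the record in that compiland's symbol stream
};

constexpr uint64_t MakeUid(UidKind kind, uint16_t modi, uint32_t offset) {
  return (uint64_t(kind) << 56) | (uint64_t(modi) << 32) | offset;
}

constexpr UidKind KindOf(uint64_t uid) { return UidKind(uid >> 56); }

// Only compiland symbols name their owning module; types do not.
inline std::optional<CompilandSymId> AsCompilandSym(uint64_t uid) {
  if (KindOf(uid) != UidKind::CompilandSym)
    return std::nullopt;
  return CompilandSymId{uint16_t((uid >> 32) & 0xffff),
                        uint32_t(uid & 0xffffffff)};
}

// `module_languages` stands in for the per-compiland descriptors.
inline Language LanguageFor(uint64_t uid,
                            const std::vector<Language> &module_languages) {
  auto id = AsCompilandSym(uid);
  if (!id || id->modi >= module_languages.size())
    return Language::Unknown;
  return module_languages[id->modi];
}
```

In this sketch a type ID yields `Language::Unknown`, which matches the caveat above that types are not compiland symbols.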

The difference between the two is that SymbolFile/PDB uses DIA and hence only works on Windows, whereas SymbolFile/NativePDB parses the bytes of the PDB directly and can therefore work on any platform. This is a strong selling point, because it makes previously impossible scenarios such as debugging a Windows minidump on a Linux machine possible.

Hopefully this makes sense.

zturner added reviewers: amccarth, aleksandr.urakov, stella.stamenova. Apr 1 2019, 8:09 PM

I agree with Zachary here. I think that implementing this in the native PDB plugin is preferable. It still has some issues, and unfortunately I can't get into this right now because of other ongoing work, but I'm planning to do it later.

I think that the original patch will have a negative impact on performance and memory consumption on huge PDBs (e.g. ~500 MB). Is it possible to retrieve a compiland on the fly through the raw method getLexicalParentId in a way similar to PDBASTParser::GetClassOrFunctionParent?
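
A sketch of that on-the-fly alternative: walk lexical parents until a compiland is reached, without storing anything per symbol. `Tag`, `Symbol`, `SymbolTable`, and `FindOwningCompiland` are hypothetical; the table stands in for whatever the session returns for a symbol ID, and `lexical_parent_id` plays the role of the value `getLexicalParentId` would report.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

enum class Tag { Compiland, Function, Block, Data };

struct Symbol {
  Tag tag;
  uint32_t lexical_parent_id; // 0 when the symbol has no lexical parent
};

using SymbolTable = std::unordered_map<uint32_t, Symbol>;

// Returns the ID of the compiland that lexically encloses `uid`, or
// std::nullopt when the chain ends without reaching one.
inline std::optional<uint32_t> FindOwningCompiland(const SymbolTable &symbols,
                                                   uint32_t uid) {
  // Bound the walk so that a malformed parent cycle cannot loop forever.
  for (std::size_t steps = 0; steps <= symbols.size(); ++steps) {
    auto it = symbols.find(uid);
    if (it == symbols.end())
      return std::nullopt;
    if (it->second.tag == Tag::Compiland)
      return uid;
    uid = it->second.lexical_parent_id;
  }
  return std::nullopt;
}
```

The cost is proportional to the nesting depth of the symbol on each call, which trades a little lookup time for not holding a map entry per symbol.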

Got ya. From looking at SymbolFileNativePDB, it seems to assume Clang-specific functionality a good bit more than SymbolFilePDB does. I'll start playing with it and see what needs to be done. Thanks!

lanza abandoned this revision. Jun 29 2019, 5:46 PM

Diff 192408

source/Plugins/SymbolFile/PDB/SymbolFilePDB.h

[213 lines not shown; context: private:]

   lldb_private::Function *
   ParseCompileUnitFunctionForPDBFunc(const llvm::pdb::PDBSymbolFunc &pdb_func,
                                      lldb_private::CompileUnit &comp_unit);

   void GetCompileUnitIndex(const llvm::pdb::PDBSymbolCompiland &pdb_compiland,
                            uint32_t &index);

+  lldb::LanguageType GetLanguageForSymIndexId(uint32_t uid);
+
   PDBASTParser *GetPDBAstParser();

   std::unique_ptr<llvm::pdb::PDBSymbolCompiland>
   GetPDBCompilandByUID(uint32_t uid);

   lldb_private::Mangled
   GetMangledForPDBFunc(const llvm::pdb::PDBSymbolFunc &pdb_func);

   bool ResolveFunction(const llvm::pdb::PDBSymbolFunc &pdb_func,
                        bool include_inlines,
                        lldb_private::SymbolContextList &sc_list);

   bool ResolveFunction(uint32_t uid, bool include_inlines,
                        lldb_private::SymbolContextList &sc_list);

   void CacheFunctionNames();

   bool DeclContextMatchesThisSymbolFile(
       const lldb_private::CompilerDeclContext *decl_ctx);

   uint32_t GetCompilandId(const llvm::pdb::PDBSymbolData &data);

   llvm::DenseMap<uint32_t, lldb::CompUnitSP> m_comp_units;
+  llvm::DenseMap<uint32_t, lldb::CompUnitSP> m_symid_to_cu;
   llvm::DenseMap<uint32_t, lldb::TypeSP> m_types;
   llvm::DenseMap<uint32_t, lldb::VariableSP> m_variables;
   llvm::DenseMap<uint64_t, std::string> m_public_names;

   SecContribsMap m_sec_contribs;

   std::vector<lldb::TypeSP> m_builtin_types;
   std::unique_ptr<llvm::pdb::IPDBSession> m_session_up;

[10 lines not shown]

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp

[541 lines not shown; context: if (sc.function) {]

     // Parse variables in this compiland.
     num_added += ParseVariables(sc, *compiland);
   }

   return num_added;
 }

+lldb::LanguageType SymbolFilePDB::GetLanguageForSymIndexId(uint32_t uid) {
+  auto find_result = m_symid_to_cu.find(uid);
+  if (find_result != m_symid_to_cu.end())
+    return ParseLanguage(*find_result->getSecond().get());
+
+  for (auto& id_cu_pair : m_comp_units) {
+    auto cu_sp = id_cu_pair.getSecond();
+    auto pdb_compiland_up = GetPDBCompilandByUID(cu_sp->GetID());
+    auto children_enumerator_up = pdb_compiland_up->findAllChildren();
+    if (children_enumerator_up && children_enumerator_up->getChildCount() > 0) {
+      while (auto child = children_enumerator_up->getNext()) {
+        m_symid_to_cu.insert(std::make_pair(uid, cu_sp));
+        if (child->getSymIndexId() == uid) {
+          return ParseLanguage(*cu_sp.get());
+        }
+      }
+    }
+  }
+}
+
 lldb_private::Type *SymbolFilePDB::ResolveTypeUID(lldb::user_id_t type_uid) {
   auto find_result = m_types.find(type_uid);
   if (find_result != m_types.end())
     return find_result->second.get();

-  TypeSystem *type_system =
-      GetTypeSystemForLanguage(lldb::eLanguageTypeC_plus_plus);
-  ClangASTContext *clang_type_system =
-      llvm::dyn_cast_or_null<ClangASTContext>(type_system);
-  if (!clang_type_system)
-    return nullptr;
-  PDBASTParser *pdb = clang_type_system->GetPDBParser();
-  if (!pdb)
-    return nullptr;
-
   auto pdb_type = m_session_up->getSymbolById(type_uid);
   if (pdb_type == nullptr)
     return nullptr;

+  auto lang = GetLanguageForSymIndexId(type_uid);
+  TypeSystem *type_system = GetTypeSystemForLanguage(lang);
+  if (!type_system)
+    return nullptr;
+  PDBASTParser *pdb = type_system->GetPDBParser();
+  if (!pdb)
+    return nullptr;
+
   lldb::TypeSP result = pdb->CreateLLDBTypeFromPDBType(*pdb_type);
   if (result) {
     m_types.insert(std::make_pair(type_uid, result));
     auto type_list = GetTypeList();
     if (type_list)
       type_list->Insert(result);
   }
   return result.get();

[1,431 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

Add a function for mapping PDBSymbol index IDs to lldb::LangTypes
AbandonedPublic
