This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Improve performance of the PDB DIA plugin
ClosedPublic

Authored by aleksandr.urakov on Oct 17 2018, 8:04 AM.

Details

Summary

This patch improves performance of SymbolFilePDB on huge executables in two ways:

  • cache names of public symbols by address. When creating variables we are trying to get a mangled name for each one, and in GetMangledForPDBData we are enumerating all public symbols, which takes O(n) for each variable. With the cache we can retrieve a mangled name in O(log(n));
  • cache section contributions. When parsing variables for context we are enumerating all variables and check if the current one is belonging to the current compiland. So we are retrieving a compiland ID for the variable. But in PDBSymbolData::getCompilandId for almost every variable we are enumerating all section contributions to check if the variable is belonging to it, and get a compiland ID from the section contribution if so. It takes O(n) for each variable, but with caching it takes about O(log(n)). I've placed the cache in SymbolFilePDB and have created GetCompilandId there. It actually duplicates PDBSymbolData::getCompilandId except for the cache part. Another option is to support caching in PDBSymbolData::getCompilandId and to place cache in DIASession, but it seems that the last one doesn't imply such functionality, because it's a lightweight wrapper over DIA and whole its state is only a COM pointer to the DIA session. Moreover, PDBSymbolData::getCompilandId is used only inside of SymbolFilePDB, so I think that it's not a bad place to do such things. With this patch PDBSymbolData::getCompilandId is not used at all.

This bottlenecks were found with profiling. I've discovered these on a simple demo project of Unreal Engine (x86 executable ~72M, PDB ~82M).

This patch doesn't change external behavior of the plugin, so I think that there's no need for additional testing (already existing tests should warn us about regress, if any).

Diff Detail

Repository
rLLDB LLDB

Event Timeline

asmith accepted this revision.Oct 19 2018, 12:19 PM

Thanks for adding the cache. We noticed the slowdown also and were discussing the same thing. This LGTM if the other reviewers done have any comments.

This revision is now accepted and ready to land.Oct 19 2018, 12:19 PM
Hui added a subscriber: Hui.Oct 22 2018, 1:35 PM

Thanks for adding the cache. We noticed the slowdown also and were discussing the same thing. This LGTM if the other reviewers done have any comments.

I think parsing types e.g. SymbolFilePDB::ParseTypes also has speed bumps. Will be good to have them cached too.

In D53375#1271336, @Hui wrote:

I think parsing types e.g. SymbolFilePDB::ParseTypes also has speed bumps. Will be good to have them cached too.

We are already caching them, see m_types field.

This revision was automatically updated to reflect the committed changes.