This is an archive of the discontinued LLVM Phabricator instance.

Change TPI Bucket size for PDBs from minimum to maximum
Closed, Public

Authored by CJHebert on Jan 18 2019, 2:50 PM.

Details

Summary

This patch changes the bucket count for the TPI and IPI streams in PDBs generated via LLVM from the minimum value to the maximum value. Changing this value improves symbol lookup performance for PDBs with large numbers of entries in the TPI and IPI streams.

In the microsoft-pdb repo published to support LLVM's implementation of PDB support, the provided code initializes the bucket count for the TPI and IPI streams to the maximum size. This occurs in tpi.cpp L33 and tpi.cpp L398. In the LLVM code for generating PDBs, these streams are created with the minimum number of buckets. This difference makes LLVM-generated PDBs slower when used for debugging.
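To make the performance intuition concrete, here is a minimal, self-contained C++ sketch (not part of the patch; the record count is made up for illustration, and the two constants mirror the minimum/maximum bucket values discussed in this review):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Bucket-count limits as described in the review; treat the exact
  // values as illustrative assumptions here.
  const uint32_t MinTpiHashBuckets = 0x1000;  // 4,096
  const uint32_t MaxTpiHashBuckets = 0x40000; // 262,144
  const uint32_t NumTypeRecords = 1000000;    // a large PDB such as chrome.pdb

  // With chained hashing, the expected number of probes per lookup grows
  // with the average chain length, i.e. records / buckets.
  printf("avg chain, min buckets: %.1f\n",
         (double)NumTypeRecords / MinTpiHashBuckets);       // ~244.1
  printf("avg chain, max buckets: %.1f\n",
         (double)NumTypeRecords / (MaxTpiHashBuckets - 1)); // ~3.8
  return 0;
}
```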

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

CJHebert created this revision. Jan 18 2019, 2:50 PM

For context, this change reduced the time for a cold symbol lookup on chrome.pdb from 10 minutes to 2 minutes.

zturner accepted this revision. Jan 18 2019, 3:23 PM
zturner added subscribers: aganea, rnk.

Excellent find! Alexandre Ganea (cc'ed) reported to me offline that he had seen degraded watch-window performance when using clang-cl generated code, but we only had some guesses as to what was causing it (none of which was this). However, after reading your explanation, I'm pretty certain it had to be this. I don't think we would have discovered this without your patch, so thanks!

Do you have commit access?

This revision is now accepted and ready to land. Jan 18 2019, 3:23 PM

I don't have commit access.

aganea added a comment (edited). Jan 19 2019, 10:34 AM

Thank you for this @CJHebert !

I was wondering:

  1. Can we do an adaptive scheme, based on TypeHashes.size()? Some of us have lots of DLLs loaded in the process; this might needlessly increase the debugger's memory usage.
  2. Is MaxTpiHashBuckets - 1 (why not MaxTpiHashBuckets itself?) really a hard limit, or would VS support higher values?

For context, this change reduced the time for a cold symbol lookup on chrome.pdb from 10 minutes to 2 minutes.

  1. What takes 2 min? For the VS debugger to look up a symbol when you debug break?
  1. Can we do an adaptive scheme, based on TypeHashes.size()? Some of us have lots of DLLs loaded in the process; this might needlessly increase the debugger's memory usage.

I don't know the answer to this one directly. This is the value that the MSVC toolchain appears to use for all PDBs, and it is the default used in the microsoft-pdb GitHub repo when making new PDBs. I have no particular knowledge here, but from the repo I suspect that support for other counts exists only for compatibility with old PDBs that may have used an adaptive algorithm like the one you mention.
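For illustration only, an adaptive scheme along the lines aganea suggests might look like the following sketch. This is NOT what the patch does (the patch uses the fixed maximum); chooseBucketCount is an invented name, and it assumes the number of type hashes is known at layout time:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical: pick a bucket count proportional to the number of type
// hashes, clamped to the documented [min, max) range.
uint32_t chooseBucketCount(size_t NumTypeHashes) {
  const uint32_t MinTpiHashBuckets = 0x1000;
  const uint32_t MaxTpiHashBuckets = 0x40000;
  // Double the bucket count until the average chain length would be
  // about one entry per bucket, then clamp to the legal range.
  uint64_t Buckets = MinTpiHashBuckets;
  while (Buckets < NumTypeHashes && Buckets < MaxTpiHashBuckets - 1)
    Buckets *= 2;
  return (uint32_t)std::min<uint64_t>(Buckets, MaxTpiHashBuckets - 1);
}
```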

  2. Is MaxTpiHashBuckets - 1 (why not MaxTpiHashBuckets itself?) really a hard limit, or would VS support higher values?

This constant comes from the microsoft-pdb GitHub repo and appears to be the cap for these streams. The bound is enforced via count < MaxTpiHashBuckets in LLVM and via cchnMax in microsoft-pdb, so subtracting one produces the largest count that does not violate the check. I don't know how PDBs with higher values would work with existing tools.
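As a hedged paraphrase of the bound being described (bucketCountInRange is an invented name; the real check lives in the LLVM PDB reader, and only the strict upper bound is quoted in this review):

```cpp
#include <cstdint>

const uint32_t MaxTpiHashBuckets = 0x40000;

// A value equal to MaxTpiHashBuckets is rejected by the strict
// comparison, so MaxTpiHashBuckets - 1 is the largest count that passes.
bool bucketCountInRange(uint32_t Count) {
  return Count < MaxTpiHashBuckets;
}
```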

For context, this change reduced the time for a cold symbol lookup on chrome.pdb from 10 minutes to 2 minutes.

  1. What takes 2 min? For the VS debugger to look up a symbol when you debug break?

The specific scenario that improved was cold lookup of symbols when using the dx command in WinDbg. This command and the Visual Studio watch window use Natvis, which appears to cause recursive symbol lookups. The mentioned chrome.pdb is massive at 1.4 GB, but the behavior for smaller PDBs also improved.

When I apply this change locally, I get some test failures in LLD. Essentially, if you run llvm-pdbutil dump -types -type-extras foo.pdb, we do a crude hash verification: we take the hash that is in the PDB, recompute what we think the hash should be, and compare the two; if they're not equal, we indicate that with a message. This message is triggered in the LLD tests, which makes me think that either something is wrong with the test or something is wrong with the way we do the hash verification.

I'll have more time to see which one of these it is tomorrow.
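The verification described in the previous comment has roughly the following shape (a sketch; TypeRecord, hashRecord, and verifyTpiHashes are stand-ins rather than the real llvm-pdbutil internals, and the real code uses a PDB-specific hash, not the toy hash below):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct TypeRecord {
  std::string Bytes;   // serialized record contents (stand-in)
  uint32_t StoredHash; // bucket value read from the TPI hash stream
};

// Toy stand-in for the real record hash; only the shape of the check
// matters here, not the hash function itself.
uint32_t hashRecord(const std::string &Bytes, uint32_t NumHashBuckets) {
  uint32_t H = 0;
  for (unsigned char C : Bytes)
    H = H * 31 + C;
  return H % NumHashBuckets;
}

// Recompute each record's hash and compare it with the stored value,
// printing a message on mismatch; this is the kind of message the LLD
// tests trip over.
void verifyTpiHashes(const std::vector<TypeRecord> &Records,
                     uint32_t NumHashBuckets) {
  for (size_t I = 0; I < Records.size(); ++I) {
    uint32_t Expected = hashRecord(Records[I].Bytes, NumHashBuckets);
    if (Expected != Records[I].StoredHash)
      printf("record %zu: invalid hash (stored %u, expected %u)\n", I,
             Records[I].StoredHash, Expected);
  }
}
```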

This revision was automatically updated to reflect the committed changes.