This is an archive of the discontinued LLVM Phabricator instance.

[PDB] Add the ability to resolve forward references
ClosedPublic

Authored by zturner on Sep 19 2018, 3:08 PM.

Details

Summary

Some records point to an LF_CLASS, LF_UNION, LF_STRUCTURE, or LF_ENUM which is a forward reference and doesn't contain complete debug information. In these cases, we'd like to be able to quickly locate the full record. The TPI stream stores an array of pre-computed record hash values, one for each type record. If we pre-process this on startup, we can build a mapping from hash value -> {list of possible matching type indices}. Since hashes of full records are only based on the name and or unique name and not the full record contents, we can then use forward ref record to compute the hash of what *would* be the full record by just hashing the name, use this to get the list of possible matches, and iterate those looking for a match on name or unique name.

llvm-pdbutil is updated to resolve forward references for the purposes of testing (plus it's just useful).

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Sep 19 2018, 3:08 PM
zturner updated this revision to Diff 166190.Sep 19 2018, 3:09 PM

Remove the exe and obj file.

zturner updated this revision to Diff 166191.Sep 19 2018, 3:14 PM

This was originally a combination of several patches which I tried to split up into smaller functionality. Apparently I didn't do a very good job. Removed a bunch of code that shouldn't have been in this patch.

rnk added inline comments.Sep 19 2018, 4:08 PM
llvm/include/llvm/DebugInfo/PDB/Native/TpiHashing.h
60 ↗(On Diff #166191)

You will probably have to name the union field for this to build with all supported compilers.

llvm/include/llvm/DebugInfo/PDB/Native/TpiStream.h
87 ↗(On Diff #166191)

Nested C++ containers don't usually perform the best. Here come some obnoxious data structure micro-optimization suggestions down below!

llvm/lib/DebugInfo/PDB/Native/TpiStream.cpp
146 ↗(On Diff #166191)

How about this:

  1. Allocate an array of TypeIndex of size TypeIndexEnd - TypeIndexBegin, i.e. an array of all the type indices in the hash table.
  2. Make HashMap a std::vector<std::pair<unsigned, unsigned>>, where these are begin/end indices of ranges into the previous array. We can write a helper that takes a hash value and returns an ArrayRef<TypeIndex> using these two arrays.
  3. Build a temporary array of pairs of HashValue+TypeIndex, and sort it by hash value.
  4. Iterate over the sorted array of hash values and type indices. Push each type index onto the array. When the hash value changes, store a pair of begin/end indices for the type indices that share this hash code into the HashMap.
157 ↗(On Diff #166191)

If HV >= Header->NumHashBuckets, I would report an error or just drop the type. We could hash the type ourselves as a recovery, but it's a corrupt input file, anything but UB or crashing works.

It’s true that nested containers don’t usually perform well, but in this
case the outer vector *never* resizes. So it seems like it should be ok?

rnk added a comment.Sep 19 2018, 4:22 PM

It’s true that nested containers don’t usually perform well, but in this
case the outer vector *never* resizes. So it seems like it should be ok?

I'd be fine with the simple code, sure. We want the PDB to load fast, though, being cache friendly is the best way to do that, IMO. :)

I can probably also just use the pdb::HashTable class where the key is the
actual hash value. I’ll go with the vector first since it’s simple and
obviously correct then try to change it to something more efficient in a
followup

This revision was not accepted when it landed; it landed in state Needs Review.Sep 20 2018, 8:51 AM
This revision was automatically updated to reflect the committed changes.