This is an archive of the discontinued LLVM Phabricator instance.

Parse PDB Name Hash Table
Needs ReviewPublic

Authored by zturner on Apr 29 2016, 11:13 AM.

Details

Reviewers
majnemer
ruiu
Summary
PDB has a lot of similar data structures.  We already have code
for parsing a Name Map, but PDB seems to have a different but
very similar structure that is a hash table.  This is the
beginning of code needed in order to parse the name hash table,
but it is not yet complete.  It parses the basic metadata of
the hash table, the bucket array, and the names buffer, but
doesn't use any of these fields yet as the data structure
requires a non-trivial amount of work to understand.

The parsing code here corresponds to the `NMT` data structure in
the reference implementation, while `NameMap` corresponds to
the `NMTNI` data structure.

Diff Detail

Event Timeline

zturner updated this revision to Diff 55630.Apr 29 2016, 11:13 AM
zturner retitled this revision from to Parse PDB Name Hash Table.
zturner updated this object.
zturner added reviewers: ruiu, majnemer.
zturner added a subscriber: llvm-commits.
zturner updated this revision to Diff 55653.Apr 29 2016, 1:05 PM

I actually managed to figure out the rest of this, so we can now dump all the strings from it as well, and I added tests in llvm-pdbdump to verify.

zturner updated this revision to Diff 55661.Apr 29 2016, 1:48 PM

I had inadvertently written up the V2 hashing algorithm and used it in the case of a V1 header while we were claiming to not support V2. So now I add the V1 hashing function, and remove the version check, supporting both V1 and V2 hash algorithms.

zturner updated this revision to Diff 55663.Apr 29 2016, 2:04 PM

Get rid of the duff's device. I imagine the compiler is pretty good at optimizing this, and it's hard to understand. It should be much more obvious how this version works, and it's equivalent.

majnemer added inline comments.Apr 29 2016, 2:05 PM
lib/DebugInfo/PDB/Raw/NameHashTable.cpp
37–57

Could we make this a simple for-loop and let the compiler unroll it?

99

Hash is a uint32_t, does % 0xFFFFFFFF actually do anything?

154

Returning -1 isn't a great interface, who is supposed to call getIndexForString ?

majnemer added inline comments.Apr 29 2016, 2:07 PM
lib/DebugInfo/PDB/Raw/NameHashTable.cpp
27

Shouldn't this be a size_t ?

38–58

Heh, you just addressed this right as I wrote this comment.

zturner added inline comments.Apr 29 2016, 2:13 PM
lib/DebugInfo/PDB/Raw/NameHashTable.cpp
27

Yea probably.

99

Honestly I don't think so, but I was wondering about the case where the formula there results in precisely 0xFFFFFFFF. Upon closer inspection though, I don't think that's even possible, because 0xFFFFFFFF-1013904223 = 3281063072, which isn't divisible by 5. And

154

I think I can return 0 here instead. 0 looks to be not a valid name index. Technically a name index is not actually an index but rather an offset into the names buffer, which you get by indexing the indices array. offset 0 in the buffer appears to always be a null character.

zturner updated this revision to Diff 55669.Apr 29 2016, 2:19 PM

Address review comments.

I also changed the terminology everywhere from Index to ID so that people don't get confused and think they can iterate over all the names in a for loop.

majnemer added inline comments.Apr 29 2016, 2:40 PM
lib/DebugInfo/PDB/Raw/NameHashTable.cpp
132

If we are going to use 0 as our sentinel, can we assert that NameIndex isn't 0?

zturner added inline comments.Apr 29 2016, 2:50 PM
lib/DebugInfo/PDB/Raw/NameHashTable.cpp
132

I think it would be better to just skip over 0 values. If the number of buckets is larger than the number of strings (because say you added a string and then removed it) then you could have 0 values in the IDs array. So I should probably fix getStringForID to check for an ID of 0 and also fix getIDForString to skip the entry if it finds a 0 id.

zturner added inline comments.Apr 29 2016, 2:58 PM
lib/DebugInfo/PDB/Raw/NameHashTable.cpp
132

So another option, is that it seems like Indicies[0] == 0, and NamesBuffer[0] is the sentinel string. So my hypothesis is that the correct thing to do is treat whatever is in Indices[0] as the "invalid ID". So if I return IDs[0] here, and then check for IDs[0] in getStringForID, that should work and at least not be hardcoded.