PDB has a lot of similar data structures. We already have code for parsing a Name Map, but PDB seems to have a different but very similar structure that is a hash table. This is the beginning of code needed in order to parse the name hash table, but it is not yet complete. It parses the basic metadata of the hash table, the bucket array, and the names buffer, but doesn't use any of these fields yet as the data structure requires a non-trivial amount of work to understand. The parsing code here corresponds to the `NMT` data structure in the reference implementation, while `NameMap` corresponds to the `NMTNI` data structure.
Details
Diff Detail
Event Timeline
I actually managed to figure out the rest of this, so we can now dump all the strings from it as well, and I added tests in llvm-pdbdump to verify.
I had inadvertently written up the V2 hashing algorithm and used it in the case of a V1 header while we were claiming to not support V2. So now I add the V1 hashing function, and remove the version check, supporting both V1 and V2 hash algorithms.
Get rid of the duff's device. I imagine the compiler is pretty good at optimizing this, and it's hard to understand. It should be much more obvious how this version works, and it's equivalent.
lib/DebugInfo/PDB/Raw/NameHashTable.cpp | ||
---|---|---|
27 | Yea probably. | |
99 | Honestly I don't think so, but I was wondering about the case where the formula there results in precisely 0xFFFFFFFF. Upon closer inspection though, I don't think that's even possible, because 0xFFFFFFFF-1013904223 = 3281063072, which isn't divisible by 5. And | |
154 | I think I can return 0 here instead. 0 looks to be not a valid name index. Technically a name index is not actually an index but rather an offset into the names buffer, which you get by indexing the indices array. offset 0 in the buffer appears to always be a null character. |
I also changed the terminology everywhere from Index to ID so that people don't get confused and think they can iterate over all the names in a for loop.
lib/DebugInfo/PDB/Raw/NameHashTable.cpp | ||
---|---|---|
133 | If we are going to use 0 as our sentinel, can we assert that NameIndex isn't 0? |
lib/DebugInfo/PDB/Raw/NameHashTable.cpp | ||
---|---|---|
133 | I think it would be better to just skip over 0 values. If the number of buckets is larger than the number of strings (because say you added a string and then removed it) then you could have 0 values in the IDs array. So I should probably fix getStringForID to check for an ID of 0 and also fix getIDForString to skip the entry if it finds a 0 id. |
lib/DebugInfo/PDB/Raw/NameHashTable.cpp | ||
---|---|---|
133 | So another option, is that it seems like Indicies[0] == 0, and NamesBuffer[0] is the sentinel string. So my hypothesis is that the correct thing to do is treat whatever is in Indices[0] as the "invalid ID". So if I return IDs[0] here, and then check for IDs[0] in getStringForID, that should work and at least not be hardcoded. |
Shouldn't this be a size_t ?