Page MenuHomePhabricator

Raise the PDB Hash Map out of the NameMap class
ClosedPublic

Authored by zturner on Jan 13 2017, 5:06 PM.

Details

Summary
[pdb] Raise out the hash table used in NameMap to its own class.

This data structure is used in other places in the PDB code,
one of which we have immediate need for in the TPI stream.  The
data structure is the exact same as that used by the NameMap.
It is not really so much of a hash table as it is a linear
dictionary where items can be deleted and then later re-used.

In raising this out, I found some bugs in the original
implementation, particularly regarding the deleted and present
bit vectors whose values were improperly being ignored upon
serialization.

In the Microsoft implementation, this is a template class where
key and value can be arbitrary types.  However, in all uses
which seem important to serialize, the key and value are
32-bit integers, so to simplify things I've called this class
LinearIntegerMap.  We can re-generalize later if we find it
necessary to do so.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner updated this revision to Diff 84409.Jan 13 2017, 5:06 PM
zturner retitled this revision from to Raise the PDB Hash Map out of the NameMap class.
zturner updated this object.
zturner added reviewers: inglorion, amccarth, ruiu.
zturner added a subscriber: llvm-commits.
ruiu added inline comments.Jan 13 2017, 5:17 PM
llvm/include/llvm/DebugInfo/PDB/Raw/LinearIntegerMap.h
8 ↗(On Diff #84409)

It doesn't have to be detailed, but this needs a file comment to describe what this is.

69–70 ↗(On Diff #84409)

You can initialize them by = {0};, no?

llvm/lib/DebugInfo/PDB/Raw/LinearIntegerMap.cpp
22 ↗(On Diff #84409)

What does HAH stand for?

32–33 ↗(On Diff #84409)

You are extending these vectors unconditionally. What was the point of resize(1) in the ctor?

67–70 ↗(On Diff #84409)

Does this actually find largest numbers?

153–154 ↗(On Diff #84409)

Remove this.

188 ↗(On Diff #84409)

This needs a comment.

zturner added inline comments.Jan 13 2017, 5:25 PM
llvm/lib/DebugInfo/PDB/Raw/LinearIntegerMap.cpp
22 ↗(On Diff #84409)

Left over from old code where it was more obvious. I can fix this.

32–33 ↗(On Diff #84409)

You can either create one of these and manipulate it in memory then serialize it, or you can load one from a serialized format. If you create one from scratch, the capacity needs to be at least 1 so that you can write an item into it.

67–70 ↗(On Diff #84409)

Yes since it's a bit vector, the values you get from iterating over it are the indices that contain set bits. It iterates over the indices which contain 1 bits, in ascending order. So the last value should be the index of the largest set bit.

188 ↗(On Diff #84409)

Good point. FWIW this comes from the Microsoft code, where they want to maintain < 2/3 load factor (size / capacity ratio)

majnemer added inline comments.
llvm/include/llvm/DebugInfo/PDB/Raw/LinearIntegerMap.h
59 ↗(On Diff #84409)

Capacity ?

79 ↗(On Diff #84409)

= default; ?

llvm/lib/DebugInfo/PDB/Raw/LinearIntegerMap.cpp
187–188 ↗(On Diff #84409)

Capacity ? Also, I'd add parens.

194 ↗(On Diff #84409)

capacity() * 3 can overflow, should there be a std::max with capacity()? Maybe std::max(capacity() * 3 / 2 + 1, capacity() + 1) ?

zturner planned changes to this revision.Jan 14 2017, 11:46 AM

There are some intricacies of the data structure that I didn't properly handle with regards to growing and some interactions between deleted and empty entries.

AFAICT this is basically an open address hash table with lazy deletion where the hash function is the identity. I need to re-work this to properly handle rehashing and lazy deletion.

zturner updated this revision to Diff 84745.Jan 17 2017, 2:27 PM
zturner added a reviewer: rnk.

This patch is entirely re-worked. Support for Present and Deleted fields is correctly implemented, linear probing is correctly implemented, growing is correctly implemented, and some unit tests are added to verify all of this.

majnemer added inline comments.Jan 17 2017, 2:43 PM
llvm/lib/DebugInfo/PDB/Raw/HashTable.cpp
189 ↗(On Diff #84745)

Should we assert that NewCapacity is bigger than capacity()?

zturner added inline comments.Jan 17 2017, 2:47 PM
llvm/lib/DebugInfo/PDB/Raw/HashTable.cpp
189 ↗(On Diff #84745)

Forgot about this, I'm using your suggestion of std::max from the previous comment, will upload a fix and some other minor cleanups shortly.

zturner updated this revision to Diff 84751.Jan 17 2017, 2:52 PM

Fix an unchecked overflow in HashTable::grow() and fix some small typos and cleanups.

This revision is now accepted and ready to land.Jan 19 2017, 3:30 PM
This revision was automatically updated to reflect the committed changes.