Currently nothing uses this, but it at least gets the core algorithm in, and adds some tests to demonstrate correctness.
Event Timeline
Added a minor sanity check to the unit test to make sure that different records in the same stream return different hashes.
Renames WeaklyHashedType to LocallyHashedType and ContentHashedType to GloballyHashedType to better express where each hash's strength and weakness lies.
Updated the interface to allow you to specify previous types and previous ids separately. This isn't used yet, but it should be supported, since one may want to compute global hashes for streams where types and ids are already split (a sketch of such an interface follows below).
Also, +ruiu since rnk is OOO.
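For reference, a hedged sketch of what the split interface mentioned above could look like; the names and signatures here are illustrative, not necessarily what the patch finally lands:

```
#include "llvm/ADT/ArrayRef.h"

namespace llvm {
namespace codeview {

struct GloballyHashedType; // the 20-byte SHA1-based hash from this patch

// Illustrative shape only: when a module's type records and id records live
// in separate streams (TPI and IPI), the hashes of previously-seen types and
// previously-seen ids arrive as two separate arrays, so the hasher takes
// them separately rather than as one merged list.
GloballyHashedType hashType(ArrayRef<uint8_t> RecordData,
                            ArrayRef<GloballyHashedType> PreviousTypes,
                            ArrayRef<GloballyHashedType> PreviousIds);

} // namespace codeview
} // namespace llvm
```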
Inline Comments

llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h

Line 30: Is this used when an object file doesn't have type record hash values? If you use 64-bit values as unique keys and want to maintain a probability of collision lower than 10^-9, for example, the maximum number of type records you can have is 190,000, according to [1]. Is this enough?
[1] https://en.wikipedia.org/wiki/Birthday_problem#Probability_table

Line 42: identify
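For reference, the 190,000 figure follows from the standard birthday approximation p ≈ 1 − e^(−n(n−1)/(2·2^64)). A small standalone check (not part of the patch):

```
#include <cmath>
#include <cstdio>

// Collision probability for n random 64-bit keys, using the usual
// birthday-problem approximation: p ~= 1 - exp(-n*(n-1) / (2 * 2^64)).
int main() {
  const double KeySpace = std::pow(2.0, 64);
  for (double N : {190000.0, 1e6, 1e7, 1e8}) {
    double P = 1.0 - std::exp(-N * (N - 1.0) / (2.0 * KeySpace));
    std::printf("n = %.0f -> p(collision) ~= %.3g\n", N, P);
  }
  // n = 190000 gives p ~= 9.8e-10, just under the 10^-9 bound cited above.
}
```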
LGTM
Let's land this so that we can experiment. I still doubt that a noncryptographic tree hash is faster than SHA1, but that can be experimented with once this patch is in.
We can't compare tree vs. non-tree, only tree vs. tree and non-tree vs. non-tree. In those scenarios, a noncryptographic hash function is always faster than SHA1.

The current algorithm uses a non-cryptographic, non-tree hash function. The new code will use a cryptographic tree hash function. So hash computation gets slower (a tree hash is slower than a non-tree hash, and a cryptographic hash is slower than a non-cryptographic one), but the magic is that the work can be offloaded to the compiler. The other thing that's easy to lose sight of is that it's not just the speed of computing the hash; it's also the cost of probing. Since our hashes are bucketed only on the lower 4 bytes, there are always going to be probes no matter what the hash function is. But with the tree hash, a probe gets much cheaper, because an equality comparison is just a 20-byte tree hash comparison rather than a massive full record comparison. On a large application, that could be several terabytes of memcmps that we can now eliminate.
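To make the tree-hash idea concrete, here is a minimal standalone sketch. The record layout and the byte-mixer are stand-ins invented for this sketch (the real patch hashes actual CodeView records with SHA1); only the recursive structure matches what's described above:

```
#include <array>
#include <cstdint>
#include <vector>

// 20-byte hash, mirroring the size of a SHA1 digest.
using GlobalHash = std::array<uint8_t, 20>;

// Stand-in byte mixer so the sketch is self-contained; the real code would
// use SHA1 here. Not cryptographically sound.
static GlobalHash hashBytes(const std::vector<uint8_t> &Data) {
  uint64_t State = 0xcbf29ce484222325ULL; // FNV-1a offset basis
  for (uint8_t B : Data) {
    State ^= B;
    State *= 0x100000001b3ULL; // FNV-1a prime
  }
  GlobalHash H{};
  for (auto &Byte : H) { // spread the 64-bit state over 20 bytes
    State ^= State >> 33;
    State *= 0xff51afd7ed558ccdULL; // mix step borrowed from MurmurHash3
    Byte = static_cast<uint8_t>(State);
  }
  return H;
}

// A type record: raw bytes plus the indices of earlier records it references.
struct TypeRecord {
  std::vector<uint8_t> Content;   // record bytes, minus the index fields
  std::vector<uint32_t> TypeRefs; // indices of records this one references
};

// The tree hash: hash the record's own bytes together with the *global*
// hashes of everything it references. Equal global hashes then mean equal
// whole type trees, so a hash-table probe is a 20-byte comparison instead
// of a full recursive record comparison.
GlobalHash hashTypeRecord(const TypeRecord &R,
                          const std::vector<GlobalHash> &Previous) {
  std::vector<uint8_t> Buf = R.Content;
  for (uint32_t Ref : R.TypeRefs) {
    const GlobalHash &H = Previous[Ref];
    Buf.insert(Buf.end(), H.begin(), H.end());
  }
  return hashBytes(Buf);
}
```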
Anyway, like you said we'll see after some more experimentation :) Thanks!