This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Add support for type record content hashing
ClosedPublic

Authored by zturner on Dec 1 2017, 10:09 AM.

Details

Summary

Currently nothing uses this, but this at least gets the core algorithm in, and adds some test to demonstrate correctness.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Dec 1 2017, 10:09 AM
zturner updated this revision to Diff 125171.Dec 1 2017, 10:25 AM

Added some doxygen comments explaining what this is for.

zturner updated this revision to Diff 125172.Dec 1 2017, 10:33 AM

Added a minor sanity check to the unit test to make sure that different records in the same stream return different hashes.

zturner updated this revision to Diff 125358.Dec 4 2017, 9:54 AM

Renames WeaklyHashedType to LocallyHashedType and ContentHashedType to GloballyHashedType to better express where the strength and weakness lies in the hash.

zturner updated this revision to Diff 125383.Dec 4 2017, 11:46 AM
zturner added a reviewer: ruiu.

Updated the interface to allow you to specify previous types and previous ids separately. This isn't used yet, but should be supported since it's possible to want to compute global hashes for streams where types and ids are already split.

Also, +ruiu since rnk is OOO.

ruiu added inline comments.Dec 4 2017, 11:12 PM
llvm/include/llvm/DebugInfo/CodeView/TypeHashing.h
29 ↗(On Diff #125383)

Is this used when an object file doesn't have type record hash values?

If you use 64-bit values as unique keys and want to maintain a probability of collision lower than 10^-9, for example, the maximum number of type records you can have is 190,000, according to [1]. Is this enough?

https://en.wikipedia.org/wiki/Birthday_problem#Probability_table

41 ↗(On Diff #125383)

identify

ruiu accepted this revision.Dec 5 2017, 1:42 PM

LGTM

Let's land this so that we can experiment. I still doubt if a noncryptographic tree hash is faster than SHA1, but that can be experimented once this patch is in.

This revision is now accepted and ready to land.Dec 5 2017, 1:42 PM
zturner added a comment.EditedDec 5 2017, 2:02 PM
In D40736#945605, @ruiu wrote:

LGTM

Let's land this so that we can experiment. I still doubt if a noncryptographic tree hash is faster than SHA1, but that can be experimented once this patch is in.

We can't compare tree vs non-tree. Only tree vs tree and non-tree vs non-tree. In those scenarios, noncryptographic hash function is always faster than SHA1.

Current algorithm uses non-cryptographic non-tree hash function. The new code will use cryptographic tree hash function. So hash computation gets slower (tree is slower than non-tree, cryptographic is slower than non-cryptographic), but the magic is that the work can be off-loaded to the compiler. The other thing that's easy to lose sight of is that it's not just the speed of computing the hash. It's also the cost of probing. Since our hashes are bucketed only on the lower 4 bytes, there's always going to be probes no matter what the hash function is. But with the tree hash, a probe gets much cheaper, because an equality comparison is just a 20-byte tree hash comparison, and not a massive full record comparison. On a large application, that could be several terabytes of memcmps which we can now eliminate.

Anyway, like you said we'll see after some more experimentation :) Thanks!

This revision was automatically updated to reflect the committed changes.