This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Fix a hash code truncating issue in ContextTrieNode.
ClosedPublic

Authored by hoy on Nov 11 2021, 9:53 AM.

Details

Summary

std::hash returns a 64bit hash code while previously we were using only lower 32 bits which caused hash collision for large workloads.

Diff Detail

Event Timeline

hoy created this revision.Nov 11 2021, 9:53 AM
hoy requested review of this revision.Nov 11 2021, 9:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2021, 9:53 AM
wlei accepted this revision.Nov 11 2021, 9:58 AM

LGTM, thanks for the fix!

This revision is now accepted and ready to land.Nov 11 2021, 9:58 AM
wenlei added inline comments.Nov 11 2021, 5:09 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
183–186

This nodeHash is the blend function similar to DJB and it's assuming 32-bit length, see the shift 16. If we are fully switching to 64 bit hash, we should change the blend function too.

However, does the collision come from the blend function (nodeHash), or lower part of std::hash? I guess it's the later, then I'm curious what strings actually lead to collision on the lower 32 bit?

hoy added inline comments.Nov 12 2021, 8:51 AM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
183–186

Good point! Is (Callsite.LineOffset << 32) | Callsite.Discriminator; a good form for 64-bit?

Yes, the collision came from a colliding lower-32-bit of std::hash

Same lower 32-bit: 2086825821
Name1:          _ZNSt8__detail15_Hash_code_baseINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS6_PmENS_10_Select1stESt4hashIS6_ENS_18_Mod_range_hashingENS_20_Default_ranged_hashELb1EE10_M_extractEv
                  _
Name2: ZSt12__niter_baseIPN4onnx3UseESt6vectorIS1_SaIS1_EEET_N9__gnu_cxx17__normal_iteratorIS6_T0_EE
wenlei added inline comments.Nov 13 2021, 7:08 PM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
183–186

Is (Callsite.LineOffset << 32) | Callsite.Discriminator; a good form for 64-bit?

I think that should be good.

Though I'd be surprised if we really need 64-bit hash to avoid collision. We just want to avoid collision among all callees of a function, which is a small space.

Not sure what's the hash function used by std::hash, but perhaps that is not a hash with great distribution when truncated. Can you try djbHash from llvm-project/llvm/include/llvm/Support/DJB.h to see if we can make it work without increasing to 64-bit?

hoy added inline comments.Nov 15 2021, 8:53 AM
llvm/lib/Transforms/IPO/SampleContextTracker.cpp
183–186

Looks like 32-bit hash isn't working well:

uint64_t NameHash1 = djbHash("_ZN8facebook18storage_foundation15detail_gen_"
                              "avx226t10x1_95cc5b80e2a6436ca08eEPKPKhPKPhm");
  uint64_t NameHash2 = djbHash(
      "_ZSt4moveIRPN5caffe5LayerIdEEEONSt16remove_referenceIT_E4typeEOS6_");
  outs() << NameHash1 << " " << NameHash2 << "\n";

output:
309151195 309151195
hoy updated this revision to Diff 387287.Nov 15 2021, 8:53 AM

Updating D113688: [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.

wenlei accepted this revision.Nov 15 2021, 10:51 PM

lgtm with a nit, thanks.

llvm/lib/Transforms/IPO/SampleContextTracker.cpp
35

nit: the convention is to use auto when full type name is too long. uint64_t is an unambiguous case for explicit type. same for other places.

183–186

okay, thanks for checking. Using 64-bit hash makes sense then.

hoy updated this revision to Diff 387675.Nov 16 2021, 9:18 AM

Addressing Wenlei's comment.

This revision was automatically updated to reflect the committed changes.