This is an archive of the discontinued LLVM Phabricator instance.

Only hash the start of keys in StringMap
Needs ReviewPublic

Authored by clubby789 on Nov 7 2022, 5:04 PM.

Details

Summary

Hashing large constant blobs results in long compile times - hash only up to 1024 bytes.

Diff Detail

Event Timeline

clubby789 created this revision.Nov 7 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 5:04 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
clubby789 requested review of this revision.Nov 7 2022, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 7 2022, 5:04 PM
clubby789 added inline comments.Nov 7 2022, 5:06 PM
llvm/lib/Support/StringMap.cpp
87

1024 was chosen arbitrarily and different numbers may give a better tradeoff between collisions and hash time.

It may make sense to first/also switch to xxHash (already included here https://github.com/llvm/llvm-project/blob/295861514e0d1e48df2918b630dd692ac27ee0de/llvm/include/llvm/Support/xxhash.h#L45), which should be much faster than DJB, especially for large keys.

OTOH, it appears that some tests depend on StringMap iteration order, so updating them as a result of changing the hash function may be painful. For example, https://reviews.llvm.org/D97396 (which attempted to change the seed value) was reverted for that reason.

Seems problematic if the strings happen to have common prefixes - not sure this is the right direction.

While it might be more work, might be worth considering changing hash function as @erikdesjardins mentioned, if considerations are made to help folks migrate/deal with the fallout of order changing (maybe even worth building in defensive features to reduce order dependence, like Abseil uses for its hash containers - though I'd have thought reverse iteration mode would've helped flush most of this out?)

Kobzol added a subscriber: Kobzol.Nov 10 2022, 7:51 AM