ld.lld SHF_MERGE|SHF_STRINGS duplicate elimination is computation heavy
and utilitizes llvm::xxHash64, a simplified version of XXH64.
Externally many sources confirm that a new variant XXH3 is much faster.
I have picked a few hash implementations and computed the
proportion of time spent on hashing in the overall link time (a debug
build of clang 16 on a machine using AMD Zen 2 architecture):
- llvm::xxHash64: 3.63%
- official XXH64 (#define XXH_VECTOR XXH_SCALAR): 3.53%
- official XXH3_64bits (#define XXH_VECTOR XXH_SCALAR): 1.21%
- official XXH3_64bits (default, essentially XXH_SSE2): 1.22%
- this patch llvm::xxh3_64bits: 1.19%
The remaining part of lld remains unchanged. Consequently, a lower ratio
indicates that hashing is faster. Therefore, it is evident that XXH3 from xxhash
is significantly faster than both the official version and our llvm::xxHash64.
(
string length: count
1-3: 393434
4-8: 2084056
9-16: 2846249
17-128: 5598928
129-240: 1317989
241-: 328058
)
This patch adds heavily simplified https://github.com/Cyan4973/xxHash,
taking account of many simplification ideas from Devin Hussey's xxhash-clean.
Important x86-64 optimization ideas:
- Make XXH3_len_129to240_64b and XXH3_hashLong_64b noinline
- Unroll XXH3_len_17to128_64b
- __restrict does not affect Clang code generation
Beside SHF_MERGE|SHF_STRINGS duplicate elimination, llvm/ADT/StringMap.h
StringMapImpl::LookupBucketFor and a few places in lld can potentially be
accelerated by switching to llvm::xxh3_64bits.
Why exposing this version to the user?