This is an archive of the discontinued LLVM Phabricator instance.


Differential D154812

[Support] Add llvm::xxh3_64bits
Closed · Public

Authored by MaskRay on Jul 9 2023, 8:04 PM.


Details

Reviewers

andrewng
bkramer
peter.smith
PiotrZSL
serge-sans-paille

Commits

rG48e93f57f1ee: [Support] Add llvm::xxh3_64bits

Summary

ld.lld SHF_MERGE|SHF_STRINGS duplicate elimination is computationally heavy
and utilizes llvm::xxHash64, a simplified version of XXH64.
Many external sources confirm that the newer variant XXH3 is much faster.

I have picked a few hash implementations and computed the
proportion of time spent on hashing in the overall link time (linking a debug
build of clang 16 on an AMD Zen 2 machine):

llvm::xxHash64: 3.63%
official XXH64 (#define XXH_VECTOR XXH_SCALAR): 3.53%
official XXH3_64bits (#define XXH_VECTOR XXH_SCALAR): 1.21%
official XXH3_64bits (default, essentially XXH_SSE2): 1.22%
this patch llvm::xxh3_64bits: 1.19%

The rest of lld is unchanged, so a lower ratio indicates faster hashing.
The numbers show that XXH3 from xxhash is significantly faster than both the
official XXH64 and our llvm::xxHash64.
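A minimal sketch of how these ratios translate into speedups, assuming (as stated above) that the non-hashing part of the link takes the same time O in every configuration. If r = H / (H + O) is the hashing share, then H / O = r / (1 - r), so two ratios give a relative hashing speedup, and (1 - r_old) / (1 - r_new) gives the ratio of total link times. The function names are hypothetical helpers, not part of lld or LLVM.

```cpp
#include <cassert>

// Hashing time in units of the (assumed constant) non-hashing time O:
// if r = H / (H + O), then H / O = r / (1 - r).
double hashTimeOverOther(double r) {
  assert(r >= 0.0 && r < 1.0 && "r is a share of total time");
  return r / (1.0 - r);
}

// Factor by which hashing got faster when its share drops from rOld to rNew.
double hashingSpeedup(double rOld, double rNew) {
  return hashTimeOverOther(rOld) / hashTimeOverOther(rNew);
}

// New total link time divided by old total link time, with O held fixed:
// (H_new + O) / (H_old + O) = (1 - rOld) / (1 - rNew).
double linkTimeRatio(double rOld, double rNew) {
  return (1.0 - rOld) / (1.0 - rNew);
}
```

For llvm::xxHash64 (3.63%) versus llvm::xxh3_64bits (1.19%), hashingSpeedup returns about 3.13 and linkTimeRatio about 0.975, i.e. roughly 3x faster hashing and a link about 2.5% shorter under that assumption.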

String length distribution (string length: count):

1-3: 393434
4-8: 2084056
9-16: 2846249
17-128: 5598928
129-240: 1317989
241+: 328058
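The length buckets above line up with the size classes that xxh3_64bits dispatches on in the diff below (len <= 16, <= 128, <= XXH3_MIDSIZE_MAX = 240, and longer). As a hedged illustration, the sketch below tallies the reported counts per dispatch path; Xxh3Path, pathFor, countFor, and totalCount are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size classes used by the xxh3_64bits dispatch in this patch.
enum class Xxh3Path { Len0To16, Len17To128, Len129To240, Long };

// Mirrors the length checks in llvm::xxh3_64bits (hypothetical helper).
Xxh3Path pathFor(size_t len) {
  if (len <= 16)
    return Xxh3Path::Len0To16;
  if (len <= 128)
    return Xxh3Path::Len17To128;
  if (len <= 240)
    return Xxh3Path::Len129To240;
  return Xxh3Path::Long;
}

struct Bucket {
  size_t minLen;
  uint64_t count;
};

// Histogram from the summary. Each bucket is keyed by its smallest length,
// which is enough because no bucket straddles a dispatch boundary.
constexpr Bucket kHistogram[] = {
    {1, 393434},    {4, 2084056},    {9, 2846249},
    {17, 5598928},  {129, 1317989},  {241, 328058},
};

// Number of strings in the histogram that take the given path.
uint64_t countFor(Xxh3Path path) {
  uint64_t n = 0;
  for (const Bucket &b : kHistogram)
    if (pathFor(b.minLen) == path)
      n += b.count;
  return n;
}

uint64_t totalCount() {
  uint64_t n = 0;
  for (const Bucket &b : kHistogram)
    n += b.count;
  return n;
}
```

With these counts, 10,922,667 of 12,568,714 strings (about 87%) take the two always-inline paths, and about 13% take XXH3_len_129to240_64b or XXH3_hashLong_64b, the two functions the summary marks noinline.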

This patch adds a heavily simplified version of https://github.com/Cyan4973/xxHash,
incorporating many simplification ideas from Devin Hussey's xxhash-clean.

Important x86-64 optimization ideas:

- Make XXH3_len_129to240_64b and XXH3_hashLong_64b noinline.
- Unroll XXH3_len_17to128_64b.
- __restrict does not affect Clang code generation.

Besides SHF_MERGE|SHF_STRINGS duplicate elimination, llvm/ADT/StringMap.h
StringMapImpl::LookupBucketFor and a few places in lld could potentially be
accelerated by switching to llvm::xxh3_64bits.

Link: https://github.com/llvm/llvm-project/issues/63750

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision. · Jul 9 2023, 8:04 PM

Herald added a project: Restricted Project. · Jul 9 2023, 8:04 PM

Herald added subscribers: pengfei, hiraditya.

MaskRay requested review of this revision. · Jul 9 2023, 8:04 PM

Herald added a project: Restricted Project. · Jul 9 2023, 8:04 PM

Herald added a subscriber: llvm-commits.

MaskRay edited the summary of this revision. · Jul 9 2023, 8:05 PM

MaskRay mentioned this in D154813: [ELF] Use llvm::xxh3_64bits for MergeInputSection::splitStrings. · Jul 9 2023, 8:18 PM

Harbormaster completed remote builds in B244040: Diff 538498. · Jul 9 2023, 8:56 PM

Looks like a useful optimisation. I'm not in a great position to review the code, but I will try to find some time to run a benchmark on AArch64 to see if I can replicate the results on a non-x86-64 machine.

MaskRay added a child revision: D154813: [ELF] Use llvm::xxh3_64bits for MergeInputSection::splitStrings. · Jul 10 2023, 9:18 AM

Ping:)

LGTM with the minor nits mentioned above.

llvm/include/llvm/Support/xxhash.h
Line 48: Why expose this version to the user?
llvm/lib/Support/xxhash.cpp
Line 3: You should probably update the copyright notice to 2021 to reflect upstream https://github.com/Cyan4973/xxHash/blob/dev/LICENSE#L2
Line 38: Mention the commit number here?
Line 151: Could be `constexpr size_t`.
Line 198: Same here.

This revision is now accepted and ready to land. · Jul 18 2023, 4:39 AM

I haven't looked at the code but I have built ld.lld with this patch and the patch from D154813 on Windows with both VS 2022 cl 19.36.32537 and clang-cl 16.0.6. I have run the usual benchmarks (clang, chrome, mozilla, scylla) and also a UE5 link and I don't see any difference in performance on my Windows 10 PC, i.e. any delta is in the measurement "noise".

In D154812#4511112, @andrewng wrote:

> I haven't looked at the code but I have built ld.lld with this patch and the patch from D154813 on Windows with both VS 2022 cl 19.36.32537 and clang-cl 16.0.6. I have run the usual benchmarks (clang, chrome, mozilla, scylla) and also a UE5 link and I don't see any difference in performance on my Windows 10 PC, i.e. any delta is in the measurement "noise".

Thank you for benchmarking this on Windows. Do these programs have large .debug_str sections? (Windows builds normally use CodeView/PDB, not DWARF.)
I think any effect is likely observable only with large .debug_str sections. To observe the effect of replacing the hash function, I need to compute the proportion of time spent on hashing:

> The remaining part of lld remains unchanged. Consequently, a lower ratio indicates that hashing is faster.

I have observed a ~2% speedup on an x86-64 machine, but the result on an AArch64 machine (Cavium ThunderX2) is less convincing.
In PiotrZSL's setup the speedup seems larger?

I think xxh3 is still worth it, and we can probably drop the xxh64 implementation once in-tree users have migrated.

% hyperfine --warmup 1 --min-runs 16 "numactl -C 16-23 "{/tmp/lld0,/tmp/lld1}" -flavor gnu @response.txt --threads=8"
Benchmark 1: numactl -C 16-23 /tmp/lld0 -flavor gnu @response.txt --threads=8
  Time (mean ± σ):      7.393 s ±  0.089 s    [User: 17.077 s, System: 4.274 s]
  Range (min … max):    7.208 s …  7.561 s    16 runs

Benchmark 2: numactl -C 16-23 /tmp/lld1 -flavor gnu @response.txt --threads=8
  Time (mean ± σ):      7.351 s ±  0.093 s    [User: 16.757 s, System: 4.236 s]
  Range (min … max):    7.165 s …  7.519 s    16 runs
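The hyperfine means above can be turned into relative changes. A minimal sketch, assuming /tmp/lld0 is the baseline build and /tmp/lld1 the build using xxh3 (the comment does not say which is which); relativeReduction and wallDeltaWithinNoise are hypothetical helpers.

```cpp
#include <cassert>

// Fractional reduction from a baseline value to a new value (0.02 means 2%).
double relativeReduction(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before;
}

// Numbers copied from the hyperfine output above, in seconds.
// Assumption: /tmp/lld0 is the baseline, /tmp/lld1 uses xxh3.
constexpr double kWallBefore = 7.393, kWallAfter = 7.351;
constexpr double kWallSigmaBefore = 0.089, kWallSigmaAfter = 0.093;
constexpr double kUserBefore = 17.077, kUserAfter = 16.757;
constexpr double kSysBefore = 4.274, kSysAfter = 4.236;

// True when the difference of the wall-clock means is below both runs'
// standard deviations.
bool wallDeltaWithinNoise() {
  const double delta = kWallBefore - kWallAfter;
  return delta < kWallSigmaBefore && delta < kWallSigmaAfter;
}
```

With these numbers the wall-clock mean drops by about 0.6%, user time by about 1.9%, and system time by about 0.9%. The 0.042 s wall-clock difference is smaller than either run's standard deviation. The comment does not say which of these the ~2% figure refers to.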

address comment

llvm/lib/Support/xxhash.cpp
Line 3: Yes, I'll do that. Changing this in the differential would cause clang-format to reformat all the text below, which I'd want to avoid. I'll change this when landing...
Line 151: Sounds good. I wanted to retain the original code style, but since the original code has been mostly rewritten, changing the macros should be fine as well...

update description

> In PiotrZSL's setup the speedup seems larger?

I was running this on a Ryzen 7 3800X under Ubuntu, single-threaded, linking a 1.3 GB clang binary.
When running with more threads, the gain looks smaller because that heavy chunk is split across multiple threads, so the gain for the application as a whole appears smaller.

This revision was landed with ongoing or failed builds. · Jul 18 2023, 1:36 PM

Closed by commit rG48e93f57f1ee: [Support] Add llvm::xxh3_64bits (authored by MaskRay).

This revision was automatically updated to reflect the committed changes.

MaskRay added a commit: rG48e93f57f1ee: [Support] Add llvm::xxh3_64bits.

Harbormaster completed remote builds in B246332: Diff 541706. · Jul 18 2023, 8:09 PM

> Thank you for benchmarking this on Windows. Do these programs have large .debug_str sections? (Windows builds normally use CodeView/PDB, not DWARF).

The tests that I ran were all ELF links, and one of the clang links included DWARF debug info.

Out of interest, which compiler did you use to build your Linux lld?

MaskRay mentioned this in rG4f608151148d: [ELF] Use llvm::xxh3_64bits for MergeInputSection::splitStrings. · Jul 19 2023, 8:34 AM

In D154812#4514078, @andrewng wrote:

> The tests that I ran were all ELF links and one of the clang links included DWARF debug.
>
> Out of interest, which compiler did you use to build your Linux lld?

I use a close-to-trunk prebuilt Clang from Chromium: `curl -s https://raw.githubusercontent.com/chromium/chromium/main/tools/clang/scripts/update.py | python3 - --output-dir=~/Stable`

Revision Contents

Path                                      Size
llvm/include/llvm/Support/xxhash.h        5 lines
llvm/lib/Support/xxhash.cpp               289 lines
llvm/unittests/Support/xxhashTest.cpp     43 lines

Diff 541712

llvm/include/llvm/Support/xxhash.h

(38 unchanged lines not shown)
 #define LLVM_SUPPORT_XXHASH_H

 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/StringRef.h"

 namespace llvm {
 uint64_t xxHash64(llvm::StringRef Data);
 uint64_t xxHash64(llvm::ArrayRef<uint8_t> Data);
+
+uint64_t xxh3_64bits(ArrayRef<uint8_t> data);
+inline uint64_t xxh3_64bits(StringRef data) {
+  return xxh3_64bits(ArrayRef(data.bytes_begin(), data.size()));
+}
 }

 #endif

llvm/lib/Support/xxhash.cpp

 /*
  * xxHash - Fast Hash algorithm
- * Copyright (C) 2012-2016, Yann Collet
+ * Copyright (C) 2012-2021, Yann Collet
  *
  * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
  *
  * * Redistributions of source code must retain the above copyright
(15 unchanged lines not shown)
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * You can contact the author at :
  * - xxHash homepage: http://www.xxhash.com
  * - xxHash source repository : https://github.com/Cyan4973/xxHash
  */

-/* based on revision d2df04efcbef7d7f6886d345861e5dfda4edacc1 Removed
- * everything but a simple interface for computing XXh64. */
+// xxhash64 is based on commit d2df04efcbef7d7f6886d345861e5dfda4edacc1. Removed
+// everything but a simple interface for computing xxh64.
+
+// xxh3_64bits is based on commit d5891596637d21366b9b1dcf2c0007a3edb26a9e (July
+// 2023).

 #include "llvm/Support/xxhash.h"
+#include "llvm/Support/Compiler.h"
 #include "llvm/Support/Endian.h"

 #include <stdlib.h>

 using namespace llvm;
 using namespace support;

 static uint64_t rotl64(uint64_t X, size_t R) {
   return (X << R) | (X >> (64 - R));
 }

+constexpr uint32_t PRIME32_1 = 0x9E3779B1;
+constexpr uint32_t PRIME32_2 = 0x85EBCA77;
+constexpr uint32_t PRIME32_3 = 0xC2B2AE3D;
+
 static const uint64_t PRIME64_1 = 11400714785074694791ULL;
 static const uint64_t PRIME64_2 = 14029467366897019727ULL;
 static const uint64_t PRIME64_3 = 1609587929392839161ULL;
 static const uint64_t PRIME64_4 = 9650029242287828579ULL;
 static const uint64_t PRIME64_5 = 2870177450012600261ULL;

 static uint64_t round(uint64_t Acc, uint64_t Input) {
   Acc += Input * PRIME64_2;
   Acc = rotl64(Acc, 31);
   Acc *= PRIME64_1;
   return Acc;
 }

 static uint64_t mergeRound(uint64_t Acc, uint64_t Val) {
   Val = round(0, Val);
   Acc ^= Val;
   Acc = Acc * PRIME64_1 + PRIME64_4;
   return Acc;
 }

+static uint64_t XXH64_avalanche(uint64_t hash) {
+  hash ^= hash >> 33;
+  hash *= PRIME64_2;
+  hash ^= hash >> 29;
+  hash *= PRIME64_3;
+  hash ^= hash >> 32;
+  return hash;
+}
+
 uint64_t llvm::xxHash64(StringRef Data) {
   size_t Len = Data.size();
   uint64_t Seed = 0;
   const unsigned char *P = Data.bytes_begin();
   const unsigned char *const BEnd = Data.bytes_end();
   uint64_t H64;

   if (Len >= 32) {
(41 unchanged lines not shown, inside uint64_t llvm::xxHash64(StringRef Data))
   }

   while (P < BEnd) {
     H64 ^= (*P) * PRIME64_5;
     H64 = rotl64(H64, 11) * PRIME64_1;
     P++;
   }

-  H64 ^= H64 >> 33;
-  H64 *= PRIME64_2;
-  H64 ^= H64 >> 29;
-  H64 *= PRIME64_3;
-  H64 ^= H64 >> 32;
-
-  return H64;
+  return XXH64_avalanche(H64);
 }

 uint64_t llvm::xxHash64(ArrayRef<uint8_t> Data) {
   return xxHash64({(const char *)Data.data(), Data.size()});
 }

+constexpr size_t XXH3_SECRETSIZE_MIN = 136;
+constexpr size_t XXH_SECRET_DEFAULT_SIZE = 192;
+
+/* Pseudorandom data taken directly from FARSH */
+// clang-format off
+constexpr uint8_t kSecret[XXH_SECRET_DEFAULT_SIZE] = {
+    0xb8, 0xfe, 0x6c, 0x39, 0x23, 0xa4, 0x4b, 0xbe, 0x7c, 0x01, 0x81, 0x2c, 0xf7, 0x21, 0xad, 0x1c,
+    0xde, 0xd4, 0x6d, 0xe9, 0x83, 0x90, 0x97, 0xdb, 0x72, 0x40, 0xa4, 0xa4, 0xb7, 0xb3, 0x67, 0x1f,
+    0xcb, 0x79, 0xe6, 0x4e, 0xcc, 0xc0, 0xe5, 0x78, 0x82, 0x5a, 0xd0, 0x7d, 0xcc, 0xff, 0x72, 0x21,
+    0xb8, 0x08, 0x46, 0x74, 0xf7, 0x43, 0x24, 0x8e, 0xe0, 0x35, 0x90, 0xe6, 0x81, 0x3a, 0x26, 0x4c,
+    0x3c, 0x28, 0x52, 0xbb, 0x91, 0xc3, 0x00, 0xcb, 0x88, 0xd0, 0x65, 0x8b, 0x1b, 0x53, 0x2e, 0xa3,
+    0x71, 0x64, 0x48, 0x97, 0xa2, 0x0d, 0xf9, 0x4e, 0x38, 0x19, 0xef, 0x46, 0xa9, 0xde, 0xac, 0xd8,
+    0xa8, 0xfa, 0x76, 0x3f, 0xe3, 0x9c, 0x34, 0x3f, 0xf9, 0xdc, 0xbb, 0xc7, 0xc7, 0x0b, 0x4f, 0x1d,
+    0x8a, 0x51, 0xe0, 0x4b, 0xcd, 0xb4, 0x59, 0x31, 0xc8, 0x9f, 0x7e, 0xc9, 0xd9, 0x78, 0x73, 0x64,
+    0xea, 0xc5, 0xac, 0x83, 0x34, 0xd3, 0xeb, 0xc3, 0xc5, 0x81, 0xa0, 0xff, 0xfa, 0x13, 0x63, 0xeb,
+    0x17, 0x0d, 0xdd, 0x51, 0xb7, 0xf0, 0xda, 0x49, 0xd3, 0x16, 0x55, 0x26, 0x29, 0xd4, 0x68, 0x9e,
+    0x2b, 0x16, 0xbe, 0x58, 0x7d, 0x47, 0xa1, 0xfc, 0x8f, 0xf8, 0xb8, 0xd1, 0x7a, 0xd0, 0x31, 0xce,
+    0x45, 0xcb, 0x3a, 0x8f, 0x95, 0x16, 0x04, 0x28, 0xaf, 0xd7, 0xfb, 0xca, 0xbb, 0x4b, 0x40, 0x7e,
+};
+// clang-format on
+
+constexpr uint64_t PRIME_MX1 = 0x165667919E3779F9;
+constexpr uint64_t PRIME_MX2 = 0x9FB21C651E98DF25;
+
+// Calculates a 64-bit to 128-bit multiply, then XOR folds it.
+static uint64_t XXH3_mul128_fold64(uint64_t lhs, uint64_t rhs) {
+#if defined(__SIZEOF_INT128__) || \
+    (defined(_INTEGRAL_MAX_BITS) && _INTEGRAL_MAX_BITS >= 128)
+  __uint128_t product = (__uint128_t)lhs * (__uint128_t)rhs;
+  return uint64_t(product) ^ uint64_t(product >> 64);
+
+#else
+  /* First calculate all of the cross products. */
+  const uint64_t lo_lo = (lhs & 0xFFFFFFFF) * (rhs & 0xFFFFFFFF);
+  const uint64_t hi_lo = (lhs >> 32) * (rhs & 0xFFFFFFFF);
+  const uint64_t lo_hi = (lhs & 0xFFFFFFFF) * (rhs >> 32);
+  const uint64_t hi_hi = (lhs >> 32) * (rhs >> 32);
+
+  /* Now add the products together. These will never overflow. */
+  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFF) + lo_hi;
+  const uint64_t upper = (hi_lo >> 32) + (cross >> 32) + hi_hi;
+  const uint64_t lower = (cross << 32) | (lo_lo & 0xFFFFFFFF);
+
+  return upper ^ lower;
+#endif
+}
+
+constexpr size_t XXH_STRIPE_LEN = 64;
+constexpr size_t XXH_SECRET_CONSUME_RATE = 8;
+constexpr size_t XXH_ACC_NB = XXH_STRIPE_LEN / sizeof(uint64_t);
+
+static uint64_t XXH3_avalanche(uint64_t hash) {
+  hash ^= hash >> 37;
+  hash *= PRIME_MX1;
+  hash ^= hash >> 32;
+  return hash;
+}
+
+static uint64_t XXH3_len_1to3_64b(const uint8_t *input, size_t len,
+                                  const uint8_t *secret, uint64_t seed) {
+  const uint8_t c1 = input[0];
+  const uint8_t c2 = input[len >> 1];
+  const uint8_t c3 = input[len - 1];
+  uint32_t combined = ((uint32_t)c1 << 16) | ((uint32_t)c2 << 24) |
+                      ((uint32_t)c3 << 0) | ((uint32_t)len << 8);
+  uint64_t bitflip =
+      (uint64_t)(endian::read32le(secret) ^ endian::read32le(secret + 4)) +
+      seed;
+  return XXH64_avalanche(uint64_t(combined) ^ bitflip);
+}
+
+static uint64_t XXH3_len_4to8_64b(const uint8_t *input, size_t len,
+                                  const uint8_t *secret, uint64_t seed) {
+  seed ^= (uint64_t)byteswap(uint32_t(seed)) << 32;
+  const uint32_t input1 = endian::read32le(input);
+  const uint32_t input2 = endian::read32le(input + len - 4);
+  uint64_t acc =
+      (endian::read64le(secret + 8) ^ endian::read64le(secret + 16)) - seed;
+  const uint64_t input64 = (uint64_t)input2 | ((uint64_t)input1 << 32);
+  acc ^= input64;
+  // XXH3_rrmxmx(acc, len)
+  acc ^= rotl64(acc, 49) ^ rotl64(acc, 24);
+  acc *= PRIME_MX2;
+  acc ^= (acc >> 35) + (uint64_t)len;
+  acc *= PRIME_MX2;
+  return acc ^ (acc >> 28);
+}
+
+static uint64_t XXH3_len_9to16_64b(const uint8_t *input, size_t len,
+                                   const uint8_t *secret, uint64_t const seed) {
+  uint64_t input_lo =
+      (endian::read64le(secret + 24) ^ endian::read64le(secret + 32)) + seed;
+  uint64_t input_hi =
+      (endian::read64le(secret + 40) ^ endian::read64le(secret + 48)) - seed;
+  input_lo ^= endian::read64le(input);
+  input_hi ^= endian::read64le(input + len - 8);
+  uint64_t acc = uint64_t(len) + byteswap(input_lo) + input_hi +
+                 XXH3_mul128_fold64(input_lo, input_hi);
+  return XXH3_avalanche(acc);
+}
+
+LLVM_ATTRIBUTE_ALWAYS_INLINE
+static uint64_t XXH3_len_0to16_64b(const uint8_t *input, size_t len,
+                                   const uint8_t *secret, uint64_t const seed) {
+  if (LLVM_LIKELY(len > 8))
+    return XXH3_len_9to16_64b(input, len, secret, seed);
+  if (LLVM_LIKELY(len >= 4))
+    return XXH3_len_4to8_64b(input, len, secret, seed);
+  if (len != 0)
+    return XXH3_len_1to3_64b(input, len, secret, seed);
+  return XXH64_avalanche(seed ^ endian::read64le(secret + 56) ^
+                         endian::read64le(secret + 64));
+}
+
+static uint64_t XXH3_mix16B(const uint8_t *input, uint8_t const *secret,
+                            uint64_t seed) {
+  uint64_t lhs = seed;
+  uint64_t rhs = 0U - seed;
+  lhs += endian::read64le(secret);
+  rhs += endian::read64le(secret + 8);
+  lhs ^= endian::read64le(input);
+  rhs ^= endian::read64le(input + 8);
+  return XXH3_mul128_fold64(lhs, rhs);
+}
+
+/* For mid range keys, XXH3 uses a Mum-hash variant. */
+LLVM_ATTRIBUTE_ALWAYS_INLINE
+static uint64_t XXH3_len_17to128_64b(const uint8_t *input, size_t len,
+                                     const uint8_t *secret,
+                                     uint64_t const seed) {
+  uint64_t acc = len * PRIME64_1, acc_end;
+  acc += XXH3_mix16B(input + 0, secret + 0, seed);
+  acc_end = XXH3_mix16B(input + len - 16, secret + 16, seed);
+  if (len > 32) {
+    acc += XXH3_mix16B(input + 16, secret + 32, seed);
+    acc_end += XXH3_mix16B(input + len - 32, secret + 48, seed);
+    if (len > 64) {
+      acc += XXH3_mix16B(input + 32, secret + 64, seed);
+      acc_end += XXH3_mix16B(input + len - 48, secret + 80, seed);
+      if (len > 96) {
+        acc += XXH3_mix16B(input + 48, secret + 96, seed);
+        acc_end += XXH3_mix16B(input + len - 64, secret + 112, seed);
+      }
+    }
+  }
+  return XXH3_avalanche(acc + acc_end);
+}
+
+constexpr size_t XXH3_MIDSIZE_MAX = 240;
+
+LLVM_ATTRIBUTE_NOINLINE
+static uint64_t XXH3_len_129to240_64b(const uint8_t *input, size_t len,
+                                      const uint8_t *secret, uint64_t seed) {
+  constexpr size_t XXH3_MIDSIZE_STARTOFFSET = 3;
+  constexpr size_t XXH3_MIDSIZE_LASTOFFSET = 17;
+  uint64_t acc = (uint64_t)len * PRIME64_1;
+  const unsigned nbRounds = len / 16;
+  for (unsigned i = 0; i < 8; ++i)
+    acc += XXH3_mix16B(input + 16 * i, secret + 16 * i, seed);
+  acc = XXH3_avalanche(acc);
+
+  for (unsigned i = 8; i < nbRounds; ++i) {
+    acc += XXH3_mix16B(input + 16 * i,
+                       secret + 16 * (i - 8) + XXH3_MIDSIZE_STARTOFFSET, seed);
+  }
+  /* last bytes */
+  acc +=
+      XXH3_mix16B(input + len - 16,
+                  secret + XXH3_SECRETSIZE_MIN - XXH3_MIDSIZE_LASTOFFSET, seed);
+  return XXH3_avalanche(acc);
+}
+
+LLVM_ATTRIBUTE_ALWAYS_INLINE
+static void XXH3_accumulate_512_scalar(uint64_t *acc, const uint8_t *input,
+                                       const uint8_t *secret) {
+  for (size_t i = 0; i < XXH_ACC_NB; ++i) {
+    uint64_t data_val = endian::read64le(input + 8 * i);
+    uint64_t data_key = data_val ^ endian::read64le(secret + 8 * i);
+    acc[i ^ 1] += data_val;
+    acc[i] += uint32_t(data_key) * (data_key >> 32);
+  }
+}
+
+LLVM_ATTRIBUTE_ALWAYS_INLINE
+static void XXH3_accumulate_scalar(uint64_t *acc, const uint8_t *input,
+                                   const uint8_t *secret, size_t nbStripes) {
+  for (size_t n = 0; n < nbStripes; ++n)
+    XXH3_accumulate_512_scalar(acc, input + n * XXH_STRIPE_LEN,
+                               secret + n * XXH_SECRET_CONSUME_RATE);
+}
+
+static void XXH3_scrambleAcc(uint64_t *acc, const uint8_t *secret) {
+  for (size_t i = 0; i < XXH_ACC_NB; ++i) {
+    acc[i] ^= acc[i] >> 47;
+    acc[i] ^= endian::read64le(secret + 8 * i);
+    acc[i] *= PRIME32_1;
+  }
+}
+
+static uint64_t XXH3_mix2Accs(const uint64_t *acc, const uint8_t *secret) {
+  return XXH3_mul128_fold64(acc[0] ^ endian::read64le(secret),
+                            acc[1] ^ endian::read64le(secret + 8));
+}
+
+static uint64_t XXH3_mergeAccs(const uint64_t *acc, const uint8_t *key,
+                               uint64_t start) {
+  uint64_t result64 = start;
+  for (size_t i = 0; i < 4; ++i)
+    result64 += XXH3_mix2Accs(acc + 2 * i, key + 16 * i);
+  return XXH3_avalanche(result64);
+}
+
+LLVM_ATTRIBUTE_NOINLINE
+static uint64_t XXH3_hashLong_64b(const uint8_t *input, size_t len,
+                                  const uint8_t *secret, size_t secretSize) {
+  const size_t nbStripesPerBlock =
+      (secretSize - XXH_STRIPE_LEN) / XXH_SECRET_CONSUME_RATE;
+  const size_t block_len = XXH_STRIPE_LEN * nbStripesPerBlock;
+  const size_t nb_blocks = (len - 1) / block_len;
+  alignas(16) uint64_t acc[XXH_ACC_NB] = {
+      PRIME32_3, PRIME64_1, PRIME64_2, PRIME64_3,
+      PRIME64_4, PRIME32_2, PRIME64_5, PRIME32_1,
+  };
+  for (size_t n = 0; n < nb_blocks; ++n) {
+    XXH3_accumulate_scalar(acc, input + n * block_len, secret,
+                           nbStripesPerBlock);
+    XXH3_scrambleAcc(acc, secret + secretSize - XXH_STRIPE_LEN);
+  }
+
+  /* last partial block */
+  const size_t nbStripes = (len - 1 - (block_len * nb_blocks)) / XXH_STRIPE_LEN;
+  assert(nbStripes <= secretSize / XXH_SECRET_CONSUME_RATE);
+  XXH3_accumulate_scalar(acc, input + nb_blocks * block_len, secret, nbStripes);
+
+  /* last stripe */
+  constexpr size_t XXH_SECRET_LASTACC_START = 7;
+  XXH3_accumulate_512_scalar(acc, input + len - XXH_STRIPE_LEN,
+                             secret + secretSize - XXH_STRIPE_LEN -
+                                 XXH_SECRET_LASTACC_START);
+
+  /* converge into final hash */
+  constexpr size_t XXH_SECRET_MERGEACCS_START = 11;
+  return XXH3_mergeAccs(acc, secret + XXH_SECRET_MERGEACCS_START,
+                        (uint64_t)len * PRIME64_1);
+}
+
+uint64_t llvm::xxh3_64bits(ArrayRef<uint8_t> data) {
+  auto *in = data.data();
+  size_t len = data.size();
+  if (len <= 16)
+    return XXH3_len_0to16_64b(in, len, kSecret, 0);
+  if (len <= 128)
+    return XXH3_len_17to128_64b(in, len, kSecret, 0);
+  if (len <= XXH3_MIDSIZE_MAX)
+    return XXH3_len_129to240_64b(in, len, kSecret, 0);
+  return XXH3_hashLong_64b(in, len, kSecret, sizeof(kSecret));
+}
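XXH3_mul128_fold64 above has two branches: a native 128-bit multiply when __SIZEOF_INT128__ (or _INTEGRAL_MAX_BITS >= 128) is available, and a portable fallback built from four 32x32->64 products. A minimal standalone sketch that copies both branches into separately named functions makes it easy to check that they agree; mul128Fold64Portable and mul128Fold64Native are hypothetical names, and the native one is only compiled where __uint128_t exists.

```cpp
#include <cassert>
#include <cstdint>

// Portable branch: build the 128-bit product from four 32x32->64 partial
// products, then XOR its high and low 64-bit halves.
uint64_t mul128Fold64Portable(uint64_t lhs, uint64_t rhs) {
  const uint64_t lo_lo = (lhs & 0xFFFFFFFF) * (rhs & 0xFFFFFFFF);
  const uint64_t hi_lo = (lhs >> 32) * (rhs & 0xFFFFFFFF);
  const uint64_t lo_hi = (lhs & 0xFFFFFFFF) * (rhs >> 32);
  const uint64_t hi_hi = (lhs >> 32) * (rhs >> 32);
  // Each sum stays below 2^64, so no carries are lost.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFF) + lo_hi;
  const uint64_t upper = (hi_lo >> 32) + (cross >> 32) + hi_hi;
  const uint64_t lower = (cross << 32) | (lo_lo & 0xFFFFFFFF);
  return upper ^ lower;
}

#if defined(__SIZEOF_INT128__)
// Native branch: let the compiler produce the full 128-bit product.
uint64_t mul128Fold64Native(uint64_t lhs, uint64_t rhs) {
  __uint128_t product = (__uint128_t)lhs * (__uint128_t)rhs;
  return uint64_t(product) ^ uint64_t(product >> 64);
}
#endif
```

One edge case can be checked by hand: UINT64_MAX * UINT64_MAX = 2^128 - 2^65 + 1, whose high half is 2^64 - 2 and low half is 1, so the folded result is UINT64_MAX.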

llvm/unittests/Support/xxhashTest.cpp

(12 unchanged lines not shown)

 TEST(xxhashTest, Basic) {
   EXPECT_EQ(0xef46db3751d8e999U, xxHash64(StringRef()));
   EXPECT_EQ(0x33bf00a859c4ba3fU, xxHash64("foo"));
   EXPECT_EQ(0x48a37c90ad27a659U, xxHash64("bar"));
   EXPECT_EQ(0x69196c1b3af0bff9U,
             xxHash64("0123456789abcdefghijklmnopqrstuvwxyz"));
 }
+
+TEST(xxhashTest, xxh3) {
+  constexpr size_t size = 2243;
+  uint8_t a[size];
+  uint64_t x = 1;
+  for (size_t i = 0; i < size; ++i) {
+    x ^= x << 13;
+    x ^= x >> 7;
+    x ^= x << 17;
+    a[i] = uint8_t(x);
+  }
+
+#define F(len, expected) \
+  EXPECT_EQ(uint64_t(expected), xxh3_64bits(ArrayRef(a, size_t(len))))
+  F(0, 0x2d06800538d394c2);
+  F(1, 0xd0d496e05c553485);
+  F(2, 0x84d625edb7055eac);
+  F(3, 0x6ea2d59aca5c3778);
+  F(4, 0xbf65290914e80242);
+  F(5, 0xc01fd099ad4fc8e4);
+  F(6, 0x9e3ea8187399caa5);
+  F(7, 0x9da8b60540644f5a);
+  F(8, 0xabc1413da6cd0209);
+  F(9, 0x8bc89400bfed51f6);
+  F(16, 0x7e46916754d7c9b8);
+  F(17, 0xed4be912ba5f836d);
+  F(32, 0xf59b59b58c304fd1);
+  F(33, 0x9013fb74ca603e0c);
+  F(64, 0xfa5271fcce0db1c3);
+  F(65, 0x79c42431727f1012);
+  F(96, 0x591ee0ddf9c9ccd1);
+  F(97, 0x8ffc6a3111fe19da);
+  F(128, 0x06a146ee9a2da378);
+  F(129, 0xbc7138129bf065da);
+  F(403, 0xcefeb3ffa532ad8c);
+  F(512, 0xcdfa6b6268e3650f);
+  F(513, 0x4bb5d42742f9765f);
+  F(2048, 0x330ce110cbb79eae);
+  F(2049, 0x3ba6afa0249fef9a);
+  F(2240, 0xd61d4d2a94e926a8);
+  F(2243, 0x0979f786a24edde7);
+#undef F
+}
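As a hedged cross-check of the first few expected values that can be compiled without LLVM, the sketch below re-implements only the seed-0 paths for 0 to 3 byte inputs exactly as written in the diff (XXH3_len_0to16_64b falling through to XXH3_len_1to3_64b, both finishing with XXH64_avalanche), and regenerates the test bytes with the same xorshift loop. All names here (kSecretPrefix, xxh3Short, fillTestBytes, and the little-endian readers) are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// First 72 bytes of kSecret from the diff: bytes 0..7 feed the 1-3 byte
// path, bytes 56..71 feed the empty-input path.
constexpr uint8_t kSecretPrefix[72] = {
    0xb8, 0xfe, 0x6c, 0x39, 0x23, 0xa4, 0x4b, 0xbe, 0x7c, 0x01, 0x81, 0x2c, 0xf7, 0x21, 0xad, 0x1c,
    0xde, 0xd4, 0x6d, 0xe9, 0x83, 0x90, 0x97, 0xdb, 0x72, 0x40, 0xa4, 0xa4, 0xb7, 0xb3, 0x67, 0x1f,
    0xcb, 0x79, 0xe6, 0x4e, 0xcc, 0xc0, 0xe5, 0x78, 0x82, 0x5a, 0xd0, 0x7d, 0xcc, 0xff, 0x72, 0x21,
    0xb8, 0x08, 0x46, 0x74, 0xf7, 0x43, 0x24, 0x8e, 0xe0, 0x35, 0x90, 0xe6, 0x81, 0x3a, 0x26, 0x4c,
    0x3c, 0x28, 0x52, 0xbb, 0x91, 0xc3, 0x00, 0xcb,
};

// Portable little-endian loads (stand-ins for endian::read32le/read64le).
uint32_t read32le(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}
uint64_t read64le(const uint8_t *p) {
  return uint64_t(read32le(p)) | uint64_t(read32le(p + 4)) << 32;
}

// Same steps as XXH64_avalanche in the diff.
uint64_t xxh64Avalanche(uint64_t h) {
  h ^= h >> 33;
  h *= 14029467366897019727ULL; // PRIME64_2
  h ^= h >> 29;
  h *= 1609587929392839161ULL; // PRIME64_3
  h ^= h >> 32;
  return h;
}

// Seed-0 XXH3 for inputs of 0..3 bytes only.
uint64_t xxh3Short(const uint8_t *input, size_t len) {
  assert(len <= 3 && "only the 0..3 byte paths are ported here");
  const uint8_t *secret = kSecretPrefix;
  if (len == 0)
    return xxh64Avalanche(read64le(secret + 56) ^ read64le(secret + 64));
  const uint8_t c1 = input[0], c2 = input[len >> 1], c3 = input[len - 1];
  const uint32_t combined = (uint32_t(c1) << 16) | (uint32_t(c2) << 24) |
                            uint32_t(c3) | (uint32_t(len) << 8);
  const uint64_t bitflip = uint64_t(read32le(secret) ^ read32le(secret + 4));
  return xxh64Avalanche(uint64_t(combined) ^ bitflip);
}

// Same xorshift generator as the xxh3 unit test.
void fillTestBytes(uint8_t *a, size_t n) {
  uint64_t x = 1;
  for (size_t i = 0; i < n; ++i) {
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    a[i] = uint8_t(x);
  }
}
```

Extending the port to longer inputs would mean copying the remaining size classes from the diff; the sketch stops at 3 bytes to stay short.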
