This is an archive of the discontinued LLVM Phabricator instance.


Differential D132455

[ADT] add ConcurrentHashtable class.
ClosedPublic

Authored by avl on Aug 23 2022, 2:30 AM.


Details

Reviewers

aprantl
JDevlieghere
dblaikie
MaskRay
int3
ikudrin

Commits

rG42058eea7912: [reland][ADT] add ConcurrentHashtable class.
rG8482b238062e: [ADT] add ConcurrentHashtable class.

Summary

ConcurrentHashTable is a resizable concurrent hashtable.
The range of resizing is limited to x2^32. The hashtable supports only concurrent insertions.

Concurrent hashtable is necessary for the D96035 patch.
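
As an illustration of the insert-only interface described in the summary, here is a minimal standalone sketch. It is not the code from this patch: TinyConcurrentStringSet, countFreshInsertions, the use of std::hash, and the 16-bucket layout are all assumptions made for the example. It only shows the general idea of per-bucket locking combined with an insert that reports whether the key was newly added.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical miniature of an insert-only concurrent table: every bucket has
// its own mutex and grows independently of the others.
class TinyConcurrentStringSet {
  struct Bucket {
    std::mutex Guard;                                // one lock per bucket
    std::vector<std::unique_ptr<std::string>> Slots; // open addressing
    size_t Used = 0;
  };
  std::vector<Bucket> Buckets;

  // Low hash bits pick the bucket, so higher bits pick the first probe slot.
  static size_t startIdx(size_t Hash, size_t Size) {
    return (Hash >> 16) & (Size - 1);
  }

  // Doubles one bucket and re-inserts its entries; the caller holds the lock.
  static void grow(Bucket &B) {
    std::vector<std::unique_ptr<std::string>> NewSlots(B.Slots.size() * 2);
    for (auto &Old : B.Slots) {
      if (!Old)
        continue;
      size_t Idx = startIdx(std::hash<std::string>{}(*Old), NewSlots.size());
      while (NewSlots[Idx])
        Idx = (Idx + 1) & (NewSlots.size() - 1);
      NewSlots[Idx] = std::move(Old);
    }
    B.Slots = std::move(NewSlots);
  }

public:
  // Both sizes are assumed to be powers of two.
  explicit TinyConcurrentStringSet(size_t NumBuckets = 16,
                                   size_t BucketSize = 8)
      : Buckets(NumBuckets) {
    for (Bucket &B : Buckets)
      B.Slots.resize(BucketSize);
  }

  // Returns the stored key and true if this call inserted it.
  std::pair<const std::string *, bool> insert(const std::string &Key) {
    size_t Hash = std::hash<std::string>{}(Key);
    Bucket &B = Buckets[Hash & (Buckets.size() - 1)];
    std::lock_guard<std::mutex> Lock(B.Guard);
    size_t Idx = startIdx(Hash, B.Slots.size());
    while (true) {
      if (!B.Slots[Idx]) {
        B.Slots[Idx] = std::make_unique<std::string>(Key);
        const std::string *Result = B.Slots[Idx].get();
        if (++B.Used * 10 >= B.Slots.size() * 9) // grow at 90% fill
          grow(B);
        return {Result, true};
      }
      if (*B.Slots[Idx] == Key)
        return {B.Slots[Idx].get(), false};
      Idx = (Idx + 1) & (B.Slots.size() - 1);
    }
  }
};

// Every thread inserts the decimal strings 0..Count-1; the result is how many
// insert() calls reported a fresh insertion.
size_t countFreshInsertions(size_t Threads, size_t Count) {
  TinyConcurrentStringSet Set;
  std::atomic<size_t> Fresh{0};
  std::vector<std::thread> Workers;
  for (size_t T = 0; T < Threads; ++T)
    Workers.emplace_back([&] {
      for (size_t I = 0; I < Count; ++I)
        if (Set.insert(std::to_string(I)).second)
          ++Fresh;
    });
  for (std::thread &W : Workers)
    W.join();
  return Fresh.load();
}
```

Calling countFreshInsertions(4, 1000) runs four threads that all insert the same 1000 keys. Under the assumptions above, each key should be reported as fresh exactly once, no matter which thread gets there first.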

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

avl created this revision. Aug 23 2022, 2:30 AM

Herald added a project: Restricted Project. Aug 23 2022, 2:30 AM

Herald added subscribers: StephenFan, mgorny.

avl requested review of this revision. Aug 23 2022, 2:30 AM

Herald added a project: Restricted Project. Aug 23 2022, 2:30 AM

Herald added a subscriber: llvm-commits.

avl mentioned this in D125979: [ADT] add FixedConcurrentHashTable class. Aug 23 2022, 2:32 AM

tschuett added a subscriber: tschuett. Aug 23 2022, 2:36 AM

Swift has a ConcurrentMap. IIRC it uses atomics and no locks.
https://github.com/apple/swift/blob/36893aa26041a49bd767acedb99cdd60ad6b3380/include/swift/Runtime/Concurrent.h#L246

Harbormaster completed remote builds in B182788: Diff 454756. Aug 23 2022, 3:44 AM

In D132455#3742117, @tschuett wrote:

Swift has a ConcurrentMap. IIRC it uses atomics and no locks.
https://github.com/apple/swift/blob/36893aa26041a49bd767acedb99cdd60ad6b3380/include/swift/Runtime/Concurrent.h#L246

The comment says that Swift's ConcurrentMap is a binary tree, which is usually slower than a hashmap. Another difference is that it does not support rebalancing, while this patch supports rehashing. I have not compared this patch with Swift's ConcurrentMap, though.

avl added a child revision: D132548: [WIP][ADT] Utility for comparision of hashtables implementation. Aug 24 2022, 4:11 AM

Performance results for this patch: https://reviews.llvm.org/file/data/a36xmbx43mp35xx5rznf/PHID-FILE-lncmoijytnb3gn4gjrii/Performance.pdf (collected by the utility from D132548).

lkail added a subscriber: lkail. Aug 24 2022, 6:32 AM

added possibility to specify allocator as template parameter,
added possibility to change range of resizings,
set default resizing range to x2^32

avl edited the summary of this revision. Dec 20 2022, 2:12 PM

Harbormaster completed remote builds in B204238: Diff 484377. Dec 20 2022, 3:27 PM

removed dependence on <experimental/random>

Harbormaster completed remote builds in B204352: Diff 484524. Dec 21 2022, 4:35 AM

@dexonsmith & co working on the CAS have also proposed a thread safe hash table of sorts ( https://reviews.llvm.org/D133715 )- it's a bit more esoteric/specialized, but I wonder if the use cases overlap enough to be able to unify them?

fixed the Windows build.

Harbormaster completed remote builds in B205159: Diff 485614. Dec 29 2022, 8:24 AM

In D132455#4018472, @dblaikie wrote:

@dexonsmith & co working on the CAS have also proposed a thread safe hash table of sorts ( https://reviews.llvm.org/D133715 )- it's a bit more esoteric/specialized, but I wonder if the use cases overlap enough to be able to unify them?

I won’t have time to take a look myself for a couple of weeks, but adding other interested parties.

Certainly sounds like there’s crossover! The data structure in the other patch supports concurrent insertion and look-up and uses atomics rather than locks. It does not support iteration, although that could be implemented. It does not directly support arbitrary keys, but could be used to implement a more general map; the client is expected to do hashing and decide what to do with collisions. Likewise, it does not support erase, but the client could use a tombstone. Not sure if your use case requires those operations, or if the overhead would be worth it.

In D132455#4018472, @dblaikie wrote:

@dexonsmith & co working on the CAS have also proposed a thread safe hash table of sorts ( https://reviews.llvm.org/D133715 )- it's a bit more esoteric/specialized, but I wonder if the use cases overlap enough to be able to unify them?

David, thank you for pointing out this other patch. It would be good to have a unified solution.

In D132455#4019525, @dexonsmith wrote:

In D132455#4018472, @dblaikie wrote:

@dexonsmith & co working on the CAS have also proposed a thread safe hash table of sorts ( https://reviews.llvm.org/D133715 )- it's a bit more esoteric/specialized, but I wonder if the use cases overlap enough to be able to unify them?

I won’t have time to take a look myself for a couple of weeks, but adding other interested parties.

Certainly sounds like there’s crossover! The data structure in the other patch supports concurrent insertion and look-up and uses atomics rather than locks. It does not support iteration, although that could be implemented. It does not directly support arbitrary keys, but could be used to implement a more general map; the client is expected to do hashing and decide what to do with collisions. Likewise, it does not support erase, but the client could use a tombstone. Not sure if your use case requires those operations, or if the overhead would be worth it.

This hashtable (D132455) is implemented for the https://reviews.llvm.org/D96035 patch.
The main requirement is to be able to store aggregate key/data pairs
(i.e. the key is separate from the data) in parallel. Another requirement is to know
whether the data was inserted by the current call or by a previous one. It should also use a memory pool
(like BumpPtrAllocator).

So far, I created a table comparing patches:

------------------------------------------------------------------------
                    |   HashMappedTrie     |    ConcurrentHashtable    |
------------------------------------------------------------------------
    thread-safe     |         yes          |           yes             |
------------------------------------------------------------------------
 range of resizings |     not limited      |          x2^32            |
                    |                      |     can be increased      |
------------------------------------------------------------------------
    key/data pairs  |         no           |           yes             |
------------------------------------------------------------------------
     lock-free?     |         yes          |           no              |
                    |                      | uses mutexes for locking  |
------------------------------------------------------------------------
    insertions      |         yes          |           yes             |
------------------------------------------------------------------------
      lookups       |         yes          |           no              |
                    |                      |   can be easily added     |
------------------------------------------------------------------------
     deletions      |          no          |           no              |
                    |                      |   can be easily added     |
------------------------------------------------------------------------
     iterations     |          no          |           no              |
                    |  can be easily added |  an ineffective non-thread|
                    |                      |safe solution could be done|
------------------------------------------------------------------------
  hash collisions   |          yes         |      no collisions        |
                    |   should be handled  |                           |
                    |      by client       |                           |                    
------------------------------------------------------------------------

I did some first-look performance comparisons of the patches (using
this utility: https://reviews.llvm.org/D132548).
The numbers might be inaccurate if I used HashMappedTrie incorrectly.
There is a difference - I did not set any initial sizes for HashMappedTrie
while initial sizes for ConcurrentHashtable were set. Also, only the key
was stored for HashMappedTrie while the key and data pair were stored for
ConcurrentHashtable. All runs insert 100000000 strings converted from
the corresponding integer.

------------------------------------------------------------------------------
                          |   HashMappedTrie     |    ConcurrentHashtable    |
------------------------------------------------------------------------------
 --num-threads 1          | time:       62sec    | time:           30sec     |                    
 --initial-size 100000000 | memory:     13.2G    | memory:         16.1G     |                    
------------------------------------------------------------------------------
 --num-threads 1          | time:       62sec    | time:           34sec     |                    
 --initial-size       100 | memory:     13.2G    | memory:         18.1G     |                    
------------------------------------------------------------------------------
 --num-threads 16         | time:       38sec    | time:          3.5sec     |                    
 --initial-size 100000000 | memory:     13.2G    | memory:         16.1G     |                    
------------------------------------------------------------------------------
 --num-threads 16         | time:       38sec    | time:          7.3sec     |                    
 --initial-size       100 | memory:     13.2G    | memory:         18.1G     |                    
------------------------------------------------------------------------------

avl added a child revision: D140841: [DWARFLinkerParallel] Add StringPool class. Jan 2 2023, 4:22 AM

Thanks for doing the comparison with https://reviews.llvm.org/D133715.

There are definitely some crossovers that can help both data structures. For example, I also have a patch for a thread-safe allocator: https://reviews.llvm.org/D133713. HashMappedTrie can definitely store key/data pairs. For example, InMemoryCAS (https://reviews.llvm.org/D133716) stores its data in a HashMappedTrie to implement a CAS.

There are also definitely more differences: one data structure is table-like and the other is tree-like. I am not sure it is worth unifying the interface so that you can switch between them, but we can consider that. I think the biggest difference is how collisions are handled. For a CAS implementation that is intended for caching, a hash collision cannot be accepted. I guess it is possible to extend HashMappedTrie to support collisions and have a mode that errors on collision, but that is not free (though it might not be too costly).

I am super interested in how you do the comparison. Can you post a patch for the code you have for that? The HashMappedTrie hasn't been really tuned for performance/memory usage, it would be interesting to use your tool to do some investigation (for example, the NumRootBits and NumSubtrieBits can be tuned for memory/performance).

I am super interested in how you do the comparison. Can you post a patch for the code you have for that?

Sure. I will update https://reviews.llvm.org/D132548 to support HashMappedTrie in a couple of days.

I am super interested in how you do the comparison. Can you post a patch for the code you have for that? The HashMappedTrie hasn't been really tuned for performance/memory usage, it would be interesting to use your tool to do some investigation (for example, the NumRootBits and NumSubtrieBits can be tuned for memory/performance).

@steven_wu I've updated D132548 so that it can measure HashMappedTrie. It depends on D125979, D132455, D133715.
In its current state the patch does not set NumRootBits or NumSubtrieBits, though such options might be added.
If the Intel Threading Building Blocks hashmap is not necessary, USE_ITBB should be unset.
If the libcuckoo hashmap is not necessary, USE_LIBCUCKOO should be unset.
Command line to run the tool:

/usr/bin/time -f " %E %M " ./check-hashtable --data-set random --num-threads 1 --table-kind hash-mapped-trie --aggregate-data

Any changes/suggestions are welcomed :-)

avl mentioned this in D140841: [DWARFLinkerParallel] Add StringPool class. Jan 4 2023, 10:12 AM

In D132455#4026480, @avl wrote:

I am super interested in how you do the comparison. Can you post a patch for the code you have for that? The HashMappedTrie hasn't been really tuned for performance/memory usage, it would be interesting to use your tool to do some investigation (for example, the NumRootBits and NumSubtrieBits can be tuned for memory/performance).

@steven_wu I've updated D132548 so that it can measure HashMappedTrie. It depends on D125979, D132455, D133715.
In its current state the patch does not set NumRootBits or NumSubtrieBits, though such options might be added.
If the Intel Threading Building Blocks hashmap is not necessary, USE_ITBB should be unset.
If the libcuckoo hashmap is not necessary, USE_LIBCUCKOO should be unset.
Command line to run the tool:

/usr/bin/time -f " %E %M " ./check-hashtable --data-set random --num-threads 1 --table-kind hash-mapped-trie --aggregate-data

Any changes/suggestions are welcomed :-)

Thanks! It works great. The only downside is that the implementations use different memory allocators, so the numbers are not directly comparable, but they could reflect the performance for simple use cases.

It is already good to look at the scaling factor for implementations (size and threads). For example, I see the default configuration for HashMappedTrie scales to about 4 threads, then it just goes really bad (because the number of root bits is only 6, so high contention is expected in the beginning).

improved dumping, did several refactorings, improved resizing code.

Harbormaster completed remote builds in B206063: Diff 486795. Jan 6 2023, 4:17 AM

refactored.
simplified implementation(removed version for integral data).
implemented a couple of space optimizations.

Harbormaster completed remote builds in B207933: Diff 489390. Jan 15 2023, 12:53 PM

ping.

avl edited the summary of this revision. Jan 26 2023, 7:54 AM

ping.

@JDevlieghere @aprantl Do you think it is better to move this ConcurrentHashtable into the DWARFLinkerParallel folder?

rebased. refactored&simplified.

Harbormaster completed remote builds in B215950: Diff 500444. Feb 25 2023, 12:58 PM

fixed build.

Harbormaster completed remote builds in B216025: Diff 500530. Feb 26 2023, 4:25 AM

@aprantl @JDevlieghere @dblaikie @MaskRay Would you mind taking a look at this review, please?

The performance comparison for this hash table when reading strings from the DWARF info of a clang binary (done with the utility from D132548):

--num-threads 16

                             time      memory
1. llvm-concurrent-hashmap: 0.82 sec     3.1G
2. lldb-const-string:       0.98 sec     3.1G


--num-threads 1

                             time      memory
1. llvm-concurrent-hashmap: 5.7 sec     3.1G
2. lldb-const-string:       7.1 sec     3.1G
3. llvm-string-map:         5.7 sec     3.1G

The advantages compared to the lldb-const-string implementation:

ConcurrentHashTableByPtr is general. It might be used not only for strings.
ConcurrentHashTableByPtr has a bit better performance numbers.
ConcurrentHashTableByPtr is more scalable.

ping.

In D132455#4106255, @avl wrote:

ping.

@JDevlieghere @aprantl Do you think it is better to move this ConcurrentHashtable into the DWARFLinkerParallel folder?

I'm fine either way. If someone has concerns about the implementation, it's probably less contentious to land it in the DWARFLinkerParallel first, but so far I've not heard any objections. If we do want to use this from LLDB, then we'll need it to be in ADT eventually, though I'd like to do a comparison with Steven's HashMappedTrie for LLDB's real world usage.

I read through the code and most of it makes sense to me, but I wouldn't mind if someone who deals with data structures on a daily basis has another look. I'll mark this as accepted but would ask you to let it sit here for a few more days for others to take a look.

llvm/include/llvm/ADT/ConcurrentHashtable.h
114–115
223
298	What's the benefit of rehashing at 90% capacity? It seems like this is going to always leave a few empty slots on the table? I understand you always need to have one slot because you rehash after insertion, but it seems like you could rehash here when you've exhausted the bucket?
309

This revision is now accepted and ready to land. Mar 15 2023, 11:38 AM

In D132455#4197342, @JDevlieghere wrote:

In D132455#4106255, @avl wrote:

ping.

@JDevlieghere @aprantl Do you think it is better to move this ConcurrentHashtable into the DWARFLinkerParallel folder?

I'm fine either way. If someone has concerns about the implementation, it's probably less contentious to land it in the DWARFLinkerParallel first, but so far I've not heard any objections. If we do want to use this from LLDB, then we'll need it to be in ADT eventually, though I'd like to do a comparison with Steven's HashMappedTrie for LLDB's real world usage.

I have the utility to do the comparison: https://reviews.llvm.org/D132548. The problem is that HashMappedTrie does not resolve hash collisions; it is assumed that such collisions are resolved by the client of HashMappedTrie. I am not sure what a good way to resolve such collisions is. Thus, I do not know how to make a "fair" comparison at the moment.

llvm/include/llvm/ADT/ConcurrentHashtable.h
298	When the hashtable is nearly 100% full, an insertion has to step over too many entries while searching for a free slot. The worst case is a bucket with room for 1000000 entries that already holds 999999: it might need to enumerate all 999999 entries, which is slow. If the same bucket is 90% full, the number of entries that have to be enumerated is smaller. So wasting 10% of memory buys a 20% performance improvement. The exact value "0.9" was found by experimenting. It would probably be good to make this value configurable (for the case when memory is more important).
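
The effect described in this reply can be observed with a small standalone experiment. The sketch below is not part of the patch and does not reproduce the 20% figure quoted above. The function name averageProbesNearFill, the table size, the fixed seed, and the choice to measure only the last 1% of insertions are assumptions made for illustration. It fills a linear-probing table with random start positions and reports how many slots an insertion visits on average once the table is close to the requested fill ratio.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fills a linear-probing table of TableSize slots (a power of two) up to
// FillRatio (must be below 1.0) and returns the average number of slots
// visited per insertion over the final TableSize / 100 insertions.
double averageProbesNearFill(size_t TableSize, double FillRatio) {
  std::vector<bool> Used(TableSize, false);
  std::mt19937_64 Rng(42); // fixed seed keeps the experiment repeatable
  const size_t Target = static_cast<size_t>(TableSize * FillRatio);
  const size_t Tail = TableSize / 100;
  uint64_t Probes = 0;

  for (size_t N = 0; N < Target; ++N) {
    size_t Idx = Rng() & (TableSize - 1);
    size_t Steps = 1;
    // Walk forward until a free slot is found, wrapping at the end.
    while (Used[Idx]) {
      Idx = (Idx + 1) & (TableSize - 1);
      ++Steps;
    }
    Used[Idx] = true;
    if (N + Tail >= Target) // only count the insertions near the fill level
      Probes += Steps;
  }
  return static_cast<double>(Probes) / static_cast<double>(Tail);
}
```

Comparing averageProbesNearFill(1 << 16, 0.90) with averageProbesNearFill(1 << 16, 0.99) shows the probe count rising sharply as the table approaches full, which is the cost the 0.9 threshold is meant to avoid.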

addressed comments.

Harbormaster completed remote builds in B221130: Diff 507499. Mar 22 2023, 4:08 PM

Thank you for the review!

Closed by commit rG8482b238062e: [ADT] add ConcurrentHashtable class. (authored by avl). Mar 23 2023, 6:35 AM

This revision was automatically updated to reflect the committed changes.

avl added a commit: rG8482b238062e: [ADT] add ConcurrentHashtable class..

avl added a reverting change: rGfd4aeba307ca: Revert "[ADT] add ConcurrentHashtable class.". Mar 23 2023, 6:43 AM

avl added a commit: rG42058eea7912: [reland][ADT] add ConcurrentHashtable class. Mar 27 2023, 6:50 AM

Alexey, not all platforms support thread local storage. Can you wrap the tests up in

#ifdef LLVM_ENABLE_THREADS

#else

#endif

You should also use LLVM_THREAD_LOCAL instead of using thread_local.

Thanks

@SeanP Thank you for catching! It looks like it is not necessary to insert #ifdef LLVM_ENABLE_THREADS. Just using LLVM_THREAD_LOCAL should be enough. Please, consider https://reviews.llvm.org/D147649

I stumbled across a potential integer overflow in this code. A proposed fix is posted as D158117.

andrew.w.kaylor mentioned this in rG6664e80ace08: Fix integer overflow in ConcurrentHashtTableByPtr. Aug 16 2023, 3:43 PM

Revision Contents

Path

Size

llvm/

include/

llvm/

ADT/

ConcurrentHashtable.h

395 lines

unittests/

ADT/

CMakeLists.txt

1 line

ConcurrentHashtableTest.cpp

279 lines

Diff 507723

llvm/include/llvm/ADT/ConcurrentHashtable.h

This file was added.

//===- ConcurrentHashtable.h ------------------------------------*- C++ -*-===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//

#ifndef LLVM_ADT_CONCURRENTHASHTABLE_H
#define LLVM_ADT_CONCURRENTHASHTABLE_H

#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/PointerIntPair.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/Parallel.h"
#include "llvm/Support/WithColor.h"
#include "llvm/Support/xxhash.h"
#include <atomic>
#include <cstddef>
#include <iomanip>
#include <mutex>
#include <sstream>
#include <type_traits>

namespace llvm {

/// ConcurrentHashTable - is a resizeable concurrent hashtable.
/// The number of resizings limited up to x2^32. This hashtable is
/// useful to have efficient access to aggregate data(like strings,
/// type descriptors...) and to keep only single copy of such
/// an aggregate. The hashtable allows only concurrent insertions:
///
/// KeyDataTy* = insert ( const KeyTy& );
///
/// Data structure:
///
/// Inserted value KeyTy is mapped to 64-bit hash value ->
///
///          [------- 64-bit Hash value --------]
///          [  StartEntryIndex ][ Bucket Index ]
///                    |                |
///              points to the     points to
///              first probe       the bucket.
///              position inside
///              bucket entries
///
/// After initialization, all buckets have an initial size. During insertions,
/// buckets might be extended to contain more entries. Each bucket can be
/// independently resized and rehashed(no need to lock the whole table).
/// Different buckets may have different sizes. If the single bucket is full
/// then the bucket is resized.
///
/// BucketsArray keeps all buckets. Each bucket keeps an array of Entries
/// (pointers to KeyDataTy) and another array of entries hashes:
///
/// BucketsArray[BucketIdx].Hashes[EntryIdx]:
/// BucketsArray[BucketIdx].Entries[EntryIdx]:
///
/// [Bucket 0].Hashes -> [uint32_t][uint32_t]
/// [Bucket 0].Entries -> [KeyDataTy*][KeyDataTy*]
///
/// [Bucket 1].Hashes -> [uint32_t][uint32_t][uint32_t][uint32_t]
/// [Bucket 1].Entries -> [KeyDataTy*][KeyDataTy*][KeyDataTy*][KeyDataTy*]
///                      .........................
/// [Bucket N].Hashes -> [uint32_t][uint32_t][uint32_t]
/// [Bucket N].Entries -> [KeyDataTy*][KeyDataTy*][KeyDataTy*]
///
/// ConcurrentHashTableByPtr uses an external thread-safe allocator to allocate
/// KeyDataTy items.

template <typename KeyTy, typename KeyDataTy, typename AllocatorTy>
class ConcurrentHashTableInfoByPtr {
public:
  /// \returns Hash value for the specified \p Key.
  static inline uint64_t getHashValue(const KeyTy &Key) {
    return xxHash64(Key);
  }

  /// \returns true if both \p LHS and \p RHS are equal.
  static inline bool isEqual(const KeyTy &LHS, const KeyTy &RHS) {
    return LHS == RHS;
  }

  /// \returns key for the specified \p KeyData.
  static inline const KeyTy &getKey(const KeyDataTy &KeyData) {
    return KeyData.getKey();
  }

  /// \returns newly created object of KeyDataTy type.
  static inline KeyDataTy *create(const KeyTy &Key, AllocatorTy &Allocator) {
    return KeyDataTy::create(Key, Allocator);
  }
};

template <typename KeyTy, typename KeyDataTy, typename AllocatorTy,
          typename Info =
              ConcurrentHashTableInfoByPtr<KeyTy, KeyDataTy, AllocatorTy>>
class ConcurrentHashTableByPtr {
public:
  ConcurrentHashTableByPtr(
      AllocatorTy &Allocator, size_t EstimatedSize = 100000,
      size_t ThreadsNum = parallel::strategy.compute_thread_count(),
      size_t InitialNumberOfBuckets = 128)
      : MultiThreadAllocator(Allocator) {
    assert((ThreadsNum > 0) && "ThreadsNum must be greater than 0");
    assert((InitialNumberOfBuckets > 0) &&
           "InitialNumberOfBuckets must be greater than 0");

    constexpr size_t UINT64_BitsNum = sizeof(uint64_t) * 8;
    constexpr size_t UINT32_BitsNum = sizeof(uint32_t) * 8;

JDevlieghere (unsubmitted, not done), suggested edit:
-    size_t UINT64_BitsNum = sizeof(uint64_t) * 8;
-    size_t UINT32_BitsNum = sizeof(uint32_t) * 8;
+    constexpr size_t UINT64_BitsNum = sizeof(uint64_t) * 8;
+    constexpr size_t UINT32_BitsNum = sizeof(uint32_t) * 8;

    NumberOfBuckets = ThreadsNum;

    // Calculate number of buckets.
    if (ThreadsNum > 1) {
      NumberOfBuckets *= InitialNumberOfBuckets;
      NumberOfBuckets *= std::max(
          1,
          countr_zero(PowerOf2Ceil(EstimatedSize / InitialNumberOfBuckets)) >>
              2);
    }
    NumberOfBuckets = PowerOf2Ceil(NumberOfBuckets);

    // Allocate buckets.
    BucketsArray = std::make_unique<Bucket[]>(NumberOfBuckets);

    InitialBucketSize = EstimatedSize / NumberOfBuckets;
    InitialBucketSize = std::max((size_t)1, InitialBucketSize);
    InitialBucketSize = PowerOf2Ceil(InitialBucketSize);

    // Initialize each bucket.
    for (size_t Idx = 0; Idx < NumberOfBuckets; Idx++) {
      HashesPtr Hashes = new ExtHashBitsTy[InitialBucketSize];
      memset(Hashes, 0, sizeof(ExtHashBitsTy) * InitialBucketSize);

      DataPtr Entries = new EntryDataTy[InitialBucketSize];
      memset(Entries, 0, sizeof(EntryDataTy) * InitialBucketSize);

      BucketsArray[Idx].Size = InitialBucketSize;
      BucketsArray[Idx].Hashes = Hashes;
      BucketsArray[Idx].Entries = Entries;
    }

    // Calculate masks.
    HashMask = NumberOfBuckets - 1;

    size_t LeadingZerosNumber = countl_zero(HashMask);
    HashBitsNum = UINT64_BitsNum - LeadingZerosNumber;

    // We keep only high 32-bits of hash value. So bucket size cannot
    // exceed 2^32. Bucket size is always power of two.
    MaxBucketSize = 1Ull << (std::min(UINT32_BitsNum, LeadingZerosNumber));

    // Calculate mask for extended hash bits.
    ExtHashMask = (NumberOfBuckets * MaxBucketSize) - 1;
  }

  virtual ~ConcurrentHashTableByPtr() {
    // Deallocate buckets.
    for (size_t Idx = 0; Idx < NumberOfBuckets; Idx++) {
      delete[] BucketsArray[Idx].Hashes;
      delete[] BucketsArray[Idx].Entries;
    }
  }

  /// Insert new value \p NewValue or return already existing entry.
  ///
  /// \returns entry and "true" if an entry is just inserted or
  /// "false" if an entry already exists.
  std::pair<KeyDataTy *, bool> insert(const KeyTy &NewValue) {
    // Calculate bucket index.
    uint64_t Hash = Info::getHashValue(NewValue);
    Bucket &CurBucket = BucketsArray[getBucketIdx(Hash)];
    uint32_t ExtHashBits = getExtHashBits(Hash);

    // Lock bucket.
    CurBucket.Guard.lock();

    HashesPtr BucketHashes = CurBucket.Hashes;
    DataPtr BucketEntries = CurBucket.Entries;
    size_t CurEntryIdx = getStartIdx(ExtHashBits, CurBucket.Size);

    while (true) {
      uint32_t CurEntryHashBits = BucketHashes[CurEntryIdx];

      if (CurEntryHashBits == 0 && BucketEntries[CurEntryIdx] == nullptr) {
        // Found empty slot. Insert data.
        KeyDataTy *NewData = Info::create(NewValue, MultiThreadAllocator);
        BucketEntries[CurEntryIdx] = NewData;
        BucketHashes[CurEntryIdx] = ExtHashBits;

        CurBucket.NumberOfEntries++;
        RehashBucket(CurBucket);

        CurBucket.Guard.unlock();

        return {NewData, true};
      }

      if (CurEntryHashBits == ExtHashBits) {
        // Hash matched. Check value for equality.
        KeyDataTy *EntryData = BucketEntries[CurEntryIdx];
        if (Info::isEqual(Info::getKey(*EntryData), NewValue)) {
          // Already existed entry matched with inserted data is found.
          CurBucket.Guard.unlock();

          return {EntryData, false};
        }
      }

      CurEntryIdx++;
      CurEntryIdx &= (CurBucket.Size - 1);
    }

    llvm_unreachable("Insertion error.");
    return {};
  }

JDevlieghere (unsubmitted, not done), suggested edit:
-    return std::pair<KeyDataTy *, bool>();
+    return {};

  /// Print information about current state of hash table structures.
  void printStatistic(raw_ostream &OS) {
    OS << "\n--- HashTable statistic:\n";
    OS << "\nNumber of buckets = " << NumberOfBuckets;
    OS << "\nInitial bucket size = " << InitialBucketSize;

    uint64_t NumberOfNonEmptyBuckets = 0;
    uint64_t NumberOfEntriesPlusEmpty = 0;
    uint64_t OverallNumberOfEntries = 0;
    uint64_t OverallSize = sizeof(*this) + NumberOfBuckets * sizeof(Bucket);

    DenseMap<size_t, size_t> BucketSizesMap;

    // For each bucket...
    for (size_t Idx = 0; Idx < NumberOfBuckets; Idx++) {
      Bucket &CurBucket = BucketsArray[Idx];

      BucketSizesMap[CurBucket.Size]++;

      if (CurBucket.NumberOfEntries != 0)
        NumberOfNonEmptyBuckets++;
      NumberOfEntriesPlusEmpty += CurBucket.Size;
      OverallNumberOfEntries += CurBucket.NumberOfEntries;
      OverallSize +=
          (sizeof(ExtHashBitsTy) + sizeof(EntryDataTy)) * CurBucket.Size;
    }

    OS << "\nOverall number of entries = " << OverallNumberOfEntries;
    OS << "\nOverall number of non empty buckets = " << NumberOfNonEmptyBuckets;
    for (auto &BucketSize : BucketSizesMap)
      OS << "\n Number of buckets with size " << BucketSize.first << ": "
         << BucketSize.second;

    std::stringstream stream;
    stream << std::fixed << std::setprecision(2)
           << ((float)OverallNumberOfEntries / (float)NumberOfEntriesPlusEmpty);
    std::string str = stream.str();

    OS << "\nLoad factor = " << str;
    OS << "\nOverall allocated size = " << OverallSize;
  }

protected:
  using ExtHashBitsTy = uint32_t;
  using EntryDataTy = KeyDataTy *;

  using HashesPtr = ExtHashBitsTy *;
  using DataPtr = EntryDataTy *;

  // Bucket structure. Keeps bucket data.
  struct Bucket {
    Bucket() = default;

    // Size of bucket.
    uint32_t Size = 0;

    // Number of non-null entries.
    size_t NumberOfEntries = 0;

    // Hashes for [Size] entries.
    HashesPtr Hashes = nullptr;

    // [Size] entries.
    DataPtr Entries = nullptr;

    // Mutex for this bucket.
    std::mutex Guard;
  };

// Reallocate and rehash bucket if this is full enough.

void RehashBucket(Bucket &CurBucket) {

assert((CurBucket.Size > 0) && "Uninitialised bucket");

if (CurBucket.NumberOfEntries < CurBucket.Size * 0.9)

return;

if (CurBucket.Size >= MaxBucketSize)

JDevlieghereUnsubmitted

Not Done

What's the benefit of rehashing at 90% capacity? It seems like this is going to always leave a few empty slots on the table? I understand you always need to have one slot because you rehash after insertion, but it seems like you could rehash here rehash when you've exhausted the bucket?

JDevlieghere: What's the benefit of rehashing at 90% capacity? It seems like this is going to always leave a…

avlAuthorUnsubmitted

Done

When the hashtable is nearly 100% full then it needs to pass too many entries while searching for the free slot. the worst scenario is if the bucket is of 1000000 entries size and it already has 999999 entries then it might need to enumerate all 999999 entries, which is slow.
In case bucket is of 1000000 entries size and it is 90% full the number of entries which should be enumerated is smaller.

So wasting 10% of memory allows to have 20% performance improvement. The exact value "0.9" is received while experimenting. Probably it would be good to have a possibility to change this value (for the case when memory is more important).

avl: When the hashtable is nearly 100% full then it needs to pass too many entries while searching…

report_fatal_error("ConcurrentHashTable is full");

size_t NewBucketSize = CurBucket.Size << 1;

assert((NewBucketSize <= MaxBucketSize) && "New bucket size is too big");

assert((CurBucket.Size < NewBucketSize) &&

"New bucket size less than size of current bucket");

// Store old entries & hashes arrays.

HashesPtr SrcHashes = CurBucket.Hashes;

DataPtr SrcEntries = CurBucket.Entries;

JDevlieghereUnsubmitted

Not Done

"New bucket size less than size of current bucket");

- // Store old entries&hashes arrays.

+ // Store old entries & hashes arrays.

HashesPtr SrcHashes = CurBucket.Hashes;

JDevlieghere:

// Allocate new entries&hashes arrays.

HashesPtr DestHashes = new ExtHashBitsTy[NewBucketSize];

memset(DestHashes, 0, sizeof(ExtHashBitsTy) * NewBucketSize);

DataPtr DestEntries = new EntryDataTy[NewBucketSize];

memset(DestEntries, 0, sizeof(EntryDataTy) * NewBucketSize);

// For each entry in source arrays...

for (size_t CurSrcEntryIdx = 0; CurSrcEntryIdx < CurBucket.Size;

CurSrcEntryIdx++) {

uint32_t CurSrcEntryHashBits = SrcHashes[CurSrcEntryIdx];

// Check for null entry.

if (CurSrcEntryHashBits == 0 && SrcEntries[CurSrcEntryIdx] == nullptr)

continue;

size_t StartDestIdx = getStartIdx(CurSrcEntryHashBits, NewBucketSize);

// Insert non-null entry into the new arrays.

while (true) {

uint32_t CurDestEntryHashBits = DestHashes[StartDestIdx];

if (CurDestEntryHashBits == 0 && DestEntries[StartDestIdx] == nullptr) {

// Found empty slot. Insert data.

DestHashes[StartDestIdx] = CurSrcEntryHashBits;

DestEntries[StartDestIdx] = SrcEntries[CurSrcEntryIdx];

break;

}

StartDestIdx++;

StartDestIdx = StartDestIdx & (NewBucketSize - 1);

}

}

// Update bucket fields.

CurBucket.Hashes = DestHashes;

CurBucket.Entries = DestEntries;

CurBucket.Size = NewBucketSize;

// Delete old bucket entries.

if (SrcHashes != nullptr)

delete[] SrcHashes;

if (SrcEntries != nullptr)

delete[] SrcEntries;

}

size_t getBucketIdx(hash_code Hash) { return Hash & HashMask; }

uint32_t getExtHashBits(uint64_t Hash) {

return (Hash & ExtHashMask) >> HashBitsNum;

}

size_t getStartIdx(uint32_t ExtHashBits, size_t BucketSize) {

assert((BucketSize > 0) && "Empty bucket");

return ExtHashBits & (BucketSize - 1);

}

// Number of bits in hash mask.

uint64_t HashBitsNum = 0;

// Hash mask.

uint64_t HashMask = 0;

// Hash mask for the extended hash bits.

uint64_t ExtHashMask = 0;

// The maximal bucket size.

size_t MaxBucketSize = 0;

// Initial size of bucket.

size_t InitialBucketSize = 0;

// The number of buckets.

size_t NumberOfBuckets = 0;

// Array of buckets.

std::unique_ptr<Bucket[]> BucketsArray;

// Used for allocating KeyDataTy values.

AllocatorTy &MultiThreadAllocator;

};

} // end namespace llvm

#endif // LLVM_ADT_CONCURRENTHASHTABLE_H
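The index helpers and the rehash loop in the listing above can be condensed into a single-threaded sketch. This is an illustration under assumptions, not the patch's implementation: `SketchBucket`, `place`, `find` and `rehash` are hypothetical names, storage uses `std::vector` instead of raw arrays, entries are `const int *`, and an empty slot is recognised by a null entry pointer alone. The masks are passed in as parameters because their computation is not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical bucket: parallel arrays of stored hash bits and entries.
struct SketchBucket {
  std::vector<std::uint32_t> Hashes;
  std::vector<const int *> Entries;
  explicit SketchBucket(std::size_t Size)
      : Hashes(Size, 0), Entries(Size, nullptr) {
    assert(Size > 0 && (Size & (Size - 1)) == 0 && "power of two expected");
  }
  std::size_t size() const { return Hashes.size(); }
};

// Low hash bits select the bucket.
inline std::size_t bucketIdx(std::uint64_t Hash, std::uint64_t HashMask) {
  return Hash & HashMask;
}

// The bits above them are kept per entry and select the start slot.
inline std::uint32_t extHashBits(std::uint64_t Hash, std::uint64_t ExtHashMask,
                                 unsigned HashBitsNum) {
  return static_cast<std::uint32_t>((Hash & ExtHashMask) >> HashBitsNum);
}

inline std::size_t startIdx(std::uint32_t ExtHashBits, std::size_t BucketSize) {
  return ExtHashBits & (BucketSize - 1);
}

// Linear probing insert; the bucket must contain at least one free slot.
inline void place(SketchBucket &B, std::uint32_t HashBits, const int *Entry) {
  std::size_t Idx = startIdx(HashBits, B.size());
  while (B.Entries[Idx] != nullptr)
    Idx = (Idx + 1) & (B.size() - 1);
  B.Hashes[Idx] = HashBits;
  B.Entries[Idx] = Entry;
}

// Linear probing lookup; stops at the first free slot.
inline const int *find(const SketchBucket &B, std::uint32_t HashBits, int Key) {
  std::size_t Idx = startIdx(HashBits, B.size());
  while (B.Entries[Idx] != nullptr) {
    if (B.Hashes[Idx] == HashBits && *B.Entries[Idx] == Key)
      return B.Entries[Idx];
    Idx = (Idx + 1) & (B.size() - 1);
  }
  return nullptr;
}

// Double the bucket and re-insert every live entry from its stored hash bits.
inline void rehash(SketchBucket &B) {
  SketchBucket Bigger(B.size() << 1);
  for (std::size_t I = 0; I < B.size(); ++I)
    if (B.Entries[I] != nullptr)
      place(Bigger, B.Hashes[I], B.Entries[I]);
  B = std::move(Bigger);
}
```

Storing the hash bits next to each entry means rehashing never has to recompute a key's hash. Because bucket sizes are powers of two, doubling the bucket exposes one more stored bit to `startIdx`, so entries that collided in the small bucket can land in different slots in the larger one.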

llvm/unittests/ADT/CMakeLists.txt

(11 unchanged lines not shown)
add_llvm_unittest(ADTTests
  BitFieldsTest.cpp
  BitmaskEnumTest.cpp
  BitTest.cpp
  BitVectorTest.cpp
  BreadthFirstIteratorTest.cpp
  BumpPtrListTest.cpp
  CoalescingBitVectorTest.cpp
  CombinationGeneratorTest.cpp
+ ConcurrentHashtableTest.cpp
  DAGDeltaAlgorithmTest.cpp
  DeltaAlgorithmTest.cpp
  DenseMapTest.cpp
  DenseSetTest.cpp
  DepthFirstIteratorTest.cpp
  DirectedGraphTest.cpp
  EditDistanceTest.cpp
  EnumeratedArrayTest.cpp
(62 unchanged lines not shown)

llvm/unittests/ADT/ConcurrentHashtableTest.cpp

This file was added.

				//===- ConcurrentHashtableTest.cpp ----------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/ConcurrentHashtable.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/FormatVariadic.h"
				#include "llvm/Support/Parallel.h"
				#include "gtest/gtest.h"
				#include <limits>
				#include <random>
				#include <vector>
				using namespace llvm;

				namespace {
				class String {
				public:
				String() {}
				const std::string &getKey() const { return Data; }

				template <typename AllocatorTy>
				static String *create(const std::string &Num, AllocatorTy &Allocator) {
				String *Result = Allocator.template Allocate<String>();
				new (Result) String(Num);
				return Result;
				}

				protected:
				String(const std::string &Num) { Data += Num; }

				std::string Data;
				std::array<char, 0x20> ExtraData;
				};

				static LLVM_THREAD_LOCAL BumpPtrAllocator ThreadLocalAllocator;
				class PerThreadAllocator : public AllocatorBase<PerThreadAllocator> {
				public:
				inline LLVM_ATTRIBUTE_RETURNS_NONNULL void *Allocate(size_t Size,
				size_t Alignment) {
				return ThreadLocalAllocator.Allocate(Size, Align(Alignment));
				}
				inline size_t getBytesAllocated() const {
				return ThreadLocalAllocator.getBytesAllocated();
				}

				// Pull in base class overloads.
				using AllocatorBase<PerThreadAllocator>::Allocate;
				} Allocator;

				TEST(ConcurrentHashTableTest, AddStringEntries) {
				ConcurrentHashTableByPtr<
				std::string, String, PerThreadAllocator,
				ConcurrentHashTableInfoByPtr<std::string, String, PerThreadAllocator>>
				HashTable(Allocator, 10);

				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::pair<String *, bool> res1 = HashTable.insert("1");
				// Check entry is inserted.
				EXPECT_TRUE(res1.first->getKey() == "1");
				EXPECT_TRUE(res1.second);

				std::pair<String *, bool> res2 = HashTable.insert("2");
				// Check old entry is still valid.
				EXPECT_TRUE(res1.first->getKey() == "1");
				// Check new entry is inserted.
				EXPECT_TRUE(res2.first->getKey() == "2");
				EXPECT_TRUE(res2.second);
				// Check new and old entries use different memory.
				EXPECT_TRUE(res1.first != res2.first);

				std::pair<String *, bool> res3 = HashTable.insert("3");
				// Check one more entry is inserted.
				EXPECT_TRUE(res3.first->getKey() == "3");
				EXPECT_TRUE(res3.second);

				std::pair<String *, bool> res4 = HashTable.insert("1");
				// Check duplicated entry is not inserted again.
				EXPECT_TRUE(res4.first->getKey() == "1");
				EXPECT_FALSE(res4.second);
				// Check duplicated entry uses the same memory.
				EXPECT_TRUE(res1.first == res4.first);

				// Check first entry is still valid.
				EXPECT_TRUE(res1.first->getKey() == "1");

				// Check data was allocated by allocator.
				EXPECT_TRUE(Allocator.getBytesAllocated() > AllocatedBytesAtStart);

				// Check statistic.
				std::string StatisticString;
				raw_string_ostream StatisticStream(StatisticString);
				HashTable.printStatistic(StatisticStream);

				EXPECT_TRUE(StatisticString.find("Overall number of entries = 3\n") !=
				std::string::npos);
				}

				TEST(ConcurrentHashTableTest, AddStringMultiplueEntries) {
				const size_t NumElements = 10000;
				ConcurrentHashTableByPtr<
				std::string, String, PerThreadAllocator,
				ConcurrentHashTableInfoByPtr<std::string, String, PerThreadAllocator>>
				HashTable(Allocator);

				// Check insertion.
				for (size_t I = 0; I < NumElements; I++) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0}", I);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_TRUE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				EXPECT_TRUE(Allocator.getBytesAllocated() > AllocatedBytesAtStart);
				}

				std::string StatisticString;
				raw_string_ostream StatisticStream(StatisticString);
				HashTable.printStatistic(StatisticStream);

				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 10000\n") !=
				std::string::npos);

				// Check insertion of duplicates.
				for (size_t I = 0; I < NumElements; I++) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0}", I);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_FALSE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				// Check no additional bytes were allocated for duplicate.
				EXPECT_TRUE(Allocator.getBytesAllocated() == AllocatedBytesAtStart);
				}

				// Check statistic.
				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 10000\n") !=
				std::string::npos);
				}

				TEST(ConcurrentHashTableTest, AddStringMultiplueEntriesWithResize) {
				// Number of elements exceeds original size, thus hashtable should be resized.
				const size_t NumElements = 20000;
				ConcurrentHashTableByPtr<
				std::string, String, PerThreadAllocator,
				ConcurrentHashTableInfoByPtr<std::string, String, PerThreadAllocator>>
				HashTable(Allocator, 100);

				// Check insertion.
				for (size_t I = 0; I < NumElements; I++) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0} {1}", I, I + 100);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_TRUE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				EXPECT_TRUE(Allocator.getBytesAllocated() > AllocatedBytesAtStart);
				}

				std::string StatisticString;
				raw_string_ostream StatisticStream(StatisticString);
				HashTable.printStatistic(StatisticStream);

				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 20000\n") !=
				std::string::npos);

				// Check insertion of duplicates.
				for (size_t I = 0; I < NumElements; I++) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0} {1}", I, I + 100);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_FALSE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				// Check no additional bytes were allocated for duplicate.
				EXPECT_TRUE(Allocator.getBytesAllocated() == AllocatedBytesAtStart);
				}

				// Check statistic.
				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 20000\n") !=
				std::string::npos);
				}

				TEST(ConcurrentHashTableTest, AddStringEntriesParallel) {
				const size_t NumElements = 10000;
				ConcurrentHashTableByPtr<
				std::string, String, PerThreadAllocator,
				ConcurrentHashTableInfoByPtr<std::string, String, PerThreadAllocator>>
				HashTable(Allocator);

				// Check parallel insertion.
				parallelFor(0, NumElements, [&](size_t I) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0}", I);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_TRUE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				EXPECT_TRUE(Allocator.getBytesAllocated() > AllocatedBytesAtStart);
				});

				std::string StatisticString;
				raw_string_ostream StatisticStream(StatisticString);
				HashTable.printStatistic(StatisticStream);

				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 10000\n") !=
				std::string::npos);

				// Check parallel insertion of duplicates.
				parallelFor(0, NumElements, [&](size_t I) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0}", I);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_FALSE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				// Check no additional bytes were allocated for duplicate.
				EXPECT_TRUE(Allocator.getBytesAllocated() == AllocatedBytesAtStart);
				});

				// Check statistic.
				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 10000\n") !=
				std::string::npos);
				}

				TEST(ConcurrentHashTableTest, AddStringEntriesParallelWithResize) {
				const size_t NumElements = 20000;
				ConcurrentHashTableByPtr<
				std::string, String, PerThreadAllocator,
				ConcurrentHashTableInfoByPtr<std::string, String, PerThreadAllocator>>
				HashTable(Allocator, 100);

				// Check parallel insertion.
				parallelFor(0, NumElements, [&](size_t I) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0}", I);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_TRUE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				EXPECT_TRUE(Allocator.getBytesAllocated() > AllocatedBytesAtStart);
				});

				std::string StatisticString;
				raw_string_ostream StatisticStream(StatisticString);
				HashTable.printStatistic(StatisticStream);

				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 20000\n") !=
				std::string::npos);

				// Check parallel insertion of duplicates.
				parallelFor(0, NumElements, [&](size_t I) {
				size_t AllocatedBytesAtStart = Allocator.getBytesAllocated();
				std::string StringForElement = formatv("{0}", I);
				std::pair<String *, bool> Entry = HashTable.insert(StringForElement);
				EXPECT_FALSE(Entry.second);
				EXPECT_TRUE(Entry.first->getKey() == StringForElement);
				// Check no additional bytes were allocated for duplicate.
				EXPECT_TRUE(Allocator.getBytesAllocated() == AllocatedBytesAtStart);
				});

				// Check statistic.
				// Verifying that the table contains exactly the number of elements we
				// inserted.
				EXPECT_TRUE(StatisticString.find("Overall number of entries = 20000\n") !=
				std::string::npos);
				}

				} // namespace
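The contract these tests exercise is that `insert` returns a pointer plus a flag, that a duplicate key yields the original pointer with the flag set to false, and that this holds under parallel insertion. It can be mimicked with the standard library alone. The sketch below is a hypothetical stand-in, not the LLVM class: `ShardedStringSet` and `insertRangeInParallel` are invented names, and it uses one mutex per shard with `std::unordered_map` instead of the open-addressing buckets and allocator of the patch.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insert-only string set split into independently locked shards.
class ShardedStringSet {
public:
  // Returns the stored string and true if this call inserted it, or the
  // previously stored string and false if the key was already present.
  std::pair<const std::string *, bool> insert(const std::string &Key) {
    Shard &S = Shards[std::hash<std::string>{}(Key) % NumShards];
    std::lock_guard<std::mutex> Guard(S.Lock);
    auto It = S.Index.find(Key);
    if (It != S.Index.end())
      return {It->second, false};
    // std::deque::push_back keeps references to existing elements valid.
    S.Storage.push_back(Key);
    const std::string *Stored = &S.Storage.back();
    S.Index.emplace(Key, Stored);
    return {Stored, true};
  }

  // Total number of stored entries across all shards.
  std::size_t size() {
    std::size_t Total = 0;
    for (Shard &S : Shards) {
      std::lock_guard<std::mutex> Guard(S.Lock);
      Total += S.Storage.size();
    }
    return Total;
  }

private:
  static constexpr std::size_t NumShards = 16;
  struct Shard {
    std::mutex Lock;
    std::deque<std::string> Storage;
    std::unordered_map<std::string, const std::string *> Index;
  };
  std::array<Shard, NumShards> Shards;
};

// Every thread tries to insert all of "0".."Count-1"; returns how many calls
// across all threads reported a fresh insertion.
std::size_t insertRangeInParallel(ShardedStringSet &Set, std::size_t Count,
                                  unsigned NumThreads) {
  std::vector<std::size_t> Fresh(NumThreads, 0);
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T)
    Workers.emplace_back([&Set, &Fresh, Count, T] {
      for (std::size_t I = 0; I < Count; ++I)
        if (Set.insert(std::to_string(I)).second)
          ++Fresh[T]; // Each thread writes only its own counter.
    });
  for (std::thread &W : Workers)
    W.join();
  std::size_t Total = 0;
  for (std::size_t F : Fresh)
    Total += F;
  return Total;
}
```

Since every thread races to insert the same keys, the number of fresh insertions should equal the number of keys that were not yet in the set, which is the same property the parallel tests check through the `second` member of the returned pair.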

Revision Contents

Diff 507723
