This is an archive of the discontinued LLVM Phabricator instance.

[ADT] add FixedConcurrentHashTable class.
AbandonedPublic

Authored by avl on May 19 2022, 8:15 AM.

Details

Summary

This review is extracted from D96035.

It adds a hash table that can be filled concurrently.
A possible use case for that kind of hash table is keeping strings
(with possible additional data) and comparing them by
pointer comparison. E.g. it might be used instead of
NonRelocatableStringpool, which cannot be used in parallel.

The implementation of the hash map is copied (with modifications)
from lld/COFF/DebugTypes.cpp.

Diff Detail

Event Timeline

avl created this revision.May 19 2022, 8:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 8:15 AM
Herald added a subscriber: mgorny. · View Herald Transcript
avl requested review of this revision.May 19 2022, 8:15 AM
rnk added a subscriber: MaskRay.May 20 2022, 11:16 AM

@MaskRay, can you review this? I think you have been thinking the most about how to parallelize linkers, and what fundamental data structures are most important.

@MaskRay, can you review this? I think you have been thinking the most about how to parallelize linkers, and what fundamental data structures are most important.

Will try to do my best. I'll also need to spend some time on reading D96035...

MaskRay added a comment.EditedMay 21 2022, 2:39 PM

LockFreeDataPool

The name is not descriptive. It misses the important words about "hash map". ConcurrentHashMap may be a better name.

The slab system appears to be the modification to lld/COFF/DebugTypes.cpp.
It provides limited growing capability when the capacity is hit.
There is a hard limit of 6 slabs, and report_fatal_error may eventually be called. Then why is reportWarning needed?
I think we should either implement a proper growing mechanism or just crash.

Does your application have a reasonable way of estimating the capacity? Does it help to use HyperLogLog?
(There is a poor way to not implement a growing version: If we want to prevent a crash in a pathological case where the estimated capacity is insufficient and causes the concurrent hash map to crash, we can use something like lld::safeLldMain (llvm::CrashRecoveryContext).)

llvm/include/llvm/ADT/LockFreeDataPool.h
26 ↗(On Diff #430692)

I think mentioning the origin in the summary is sufficient, no need to keep it in the comment.

58 ↗(On Diff #430692)

Why so large?

159 ↗(On Diff #430692)

relaxed load

229 ↗(On Diff #430692)

MaxBucketLength needs an explanation.

236 ↗(On Diff #430692)

Is this useful? See my main comment.

MaskRay added inline comments.May 21 2022, 2:46 PM
llvm/include/llvm/ADT/LockFreeDataPool.h
113 ↗(On Diff #430692)

relaxed load on NextSlab and save it into a local variable.

(any chance we can avoid duplicating this, and instead refactor the lld one into LLVM and have lld use that one instead of using its own local copy?)

avl added a comment.May 23 2022, 11:31 AM

The name is not descriptive. It misses the important words about "hash map". ConcurrentHashMap may be a better name.

The data structure I need is a concurrent data pool of unique objects.
The pool should keep objects, and matching objects should share the same entry.

The hash map is an implementation detail: the pool uses a hash map to control the data entries, but another structure (other than a hash map) might be used to control the pool entries.

It might be possible to split this LockFreeDataPool into two classes: a data pool and a hash map.

The slab system appears to be the modification to lld/COFF/DebugTypes.cpp.
It provides limited growing capability when the capacity is hit.
There is a hard limit of 6 slabs, and report_fatal_error may eventually be called. Then why is reportWarning needed?
I think we should either implement a proper growing mechanism or just crash.

The best use case for this hash map is when it is allocated once and no growing is necessary. But there are use cases where it is hard to estimate the initial size. In such cases, without resizing, the application would either crash or waste memory (by allocating a huge amount of memory to cover exceptional cases).

The slab system in this patch allows the hash map to be resized at the price of a slower search (which is why the number of resizes is restricted). This makes it possible to allocate a smaller hash map initially and reallocate it for exceptionally large cases. The purpose of the warning is to inform the user that an exceptional case was hit.

Does your application have a reasonable way estimating the capacity?

I use a coefficient for the incoming DWARF to calculate the size of the hash map. In most cases it is quite accurate, but there are some exceptional cases where the estimation is inaccurate, and then the possibility to resize helps.

Does it help to use HyperLogLog?

Let me examine this, please.

(There is a poor way to not implement a growing version: If we want to prevent a crash in a pathological case where the estimated capacity is insufficient and causes the concurrent hash map to crash, we can use something like lld::safeLldMain (llvm::CrashRecoveryContext).)

I will check it, thanks. Do you think it would be better than the possibility to resize?

llvm/include/llvm/ADT/LockFreeDataPool.h
58 ↗(On Diff #430692)

To avoid resizing for small sizes. It does not look like too big a value, but it makes working with small sizes smoother.

113 ↗(On Diff #430692)

something like this?:

Slab *Slab = CurSlab->NextSlab.load(std::memory_order_relaxed);

if (Slab == nullptr) {
  const std::lock_guard<std::mutex> lock(SlabsMutex);

  if (Slab == nullptr)   <<<<<<<<<<<
    CurSlab->NextSlab = allocateNextSlab(Allocator);
}

It looks like we cannot save NextSlab into a local variable, because CurSlab->NextSlab can change its value by the time we arrive at the marked line.
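For reference, the usual double-checked locking shape resolves this by re-loading the atomic under the lock rather than re-testing the stale pre-lock local. A minimal sketch with illustrative names (Slab, NextSlab, SlabsMutex mirror the discussion; allocateNextSlab is stood in for by a plain new):

```cpp
#include <atomic>
#include <mutex>

// Double-checked locking for lazily allocating the next slab. The key
// point: after taking the lock, NextSlab is loaded again -- re-testing
// the local read taken before the lock would miss a concurrent update.
struct Slab {
  std::atomic<Slab *> NextSlab{nullptr};
};

inline Slab *getOrAllocateNext(Slab *CurSlab, std::mutex &SlabsMutex) {
  Slab *Next = CurSlab->NextSlab.load(std::memory_order_acquire);
  if (Next == nullptr) {
    std::lock_guard<std::mutex> Lock(SlabsMutex);
    // Second check re-loads the atomic: another thread may have
    // allocated the slab while we were waiting for the lock.
    Next = CurSlab->NextSlab.load(std::memory_order_relaxed);
    if (Next == nullptr) {
      Next = new Slab(); // stands in for allocateNextSlab(Allocator)
      CurSlab->NextSlab.store(Next, std::memory_order_release);
    }
  }
  return Next;
}
```

Only one thread allocates; all others observe the published pointer.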

avl added a comment.May 23 2022, 11:55 AM

(any chance we can avoid duplicating this, and instead refactor the lld one into LLVM and have lld use that one instead of using its own local copy?)

I thought about that. But the implementation from lld looks too specific for a general implementation.
It uses knowledge about the stored data:

  1. It keeps the data together with the hash inside the hash table. This is OK for a 32-bit integer, but it would not be OK for a larger type.
  2. It compares data to select which version should be stored. We need a way to do this, or to avoid it, without introducing more checks:
if (!oldCell.isEmpty() >>>&& oldCell < newCell<<<)
  return idx;

I do not have an effective way to put that behaviour into a general implementation at the moment.
Perhaps it could be done via inheritance and redefinition in child implementations.

(any chance we can avoid duplicating this, and instead refactor the lld one into LLVM and have lld use that one instead of using its own local copy?)

I thought about that. But the implementation from lld looks too specific for a general implementation.
It uses knowledge about the stored data:

  1. It keeps the data together with the hash inside the hash table. This is OK for a 32-bit integer, but it would not be OK for a larger type.

Other LLVM data structures make different choices based on the size of the content (eg: the default size of SmallVector varies depending on the size of the elements) - using traits or other template techniques seems suitable.

  2. It compares data to select which version should be stored. We need a way to do this, or to avoid it, without introducing more checks:
if (!oldCell.isEmpty() >>>&& oldCell < newCell<<<)
  return idx;

Yeah, some sort of trait could be used to implement that (but maybe there are other broader abstractions) - like maybe that functionality should stay out in lld, even if the data structure's moved into LLVM - lld could "insert, check that the thing was already present, then do the comparison/decide whether to overwrite" - doesn't sound like it'd be a performance issue.
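The trait idea could look roughly like this. This is a sequential sketch with hypothetical names (KeepSmallerTraits, takesPrecedence, TinyTable); it only shows the customization hook for the "which cell wins" policy, not the concurrent table itself:

```cpp
#include <cstdint>
#include <map>

// The generic table stays policy-free; the trait decides which value
// stays when a key is already present. All names are hypothetical.
struct KeepSmallerTraits {
  static bool takesPrecedence(uint32_t NewCell, uint32_t OldCell) {
    return NewCell < OldCell; // deterministic "smaller cell wins" policy
  }
};

template <typename Traits> class TinyTable {
public:
  // Insert Value for Key; on collision the trait decides which value stays.
  void insert(uint32_t Key, uint32_t Value) {
    auto It = Data.find(Key);
    if (It == Data.end())
      Data.emplace(Key, Value);
    else if (Traits::takesPrecedence(Value, It->second))
      It->second = Value;
  }
  uint32_t lookup(uint32_t Key) const { return Data.at(Key); }

private:
  std::map<uint32_t, uint32_t> Data; // stand-in for the concurrent table
};
```

Alternatively, as suggested above, the policy can stay entirely in lld: insert, check whether the key was already present, then compare and overwrite from the caller's side.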

avl added a comment.May 24 2022, 7:21 AM

Ok, thanks. Will change review to use a general implementation in lld and DWARFLinker.

avl updated this revision to Diff 432913.May 30 2022, 6:54 AM

addressed comments:

  1. renamed LockFreeDataPool -> ConcurrentHashTable.
  2. implemented support for both lld and DWARFLinker implementations.
avl retitled this revision from [ADT] add LockFreeDataPool class. to [ADT] add ConcurrentHashTable class..May 30 2022, 6:55 AM
dblaikie added inline comments.May 30 2022, 4:24 PM
llvm/include/llvm/ADT/ConcurrentHashTable.h
83

(similarly below/for the other ctor here)

117–118

It's pretty weird to have warnings coming out of data structures. @MaskRay would it be reasonable to remove these/replace them with some other tool if we're promoting this to a more general purpose data structure rather than something more special-case in lld?

315–316

This should probably be provided via specialization, rather than as a separately named type?

I guess it has different requirements for the type, by the looks of it? That's curious/maybe enough to warrant a separate name, I guess, though I'm not 100% sure.

324

generally operator== should be implemented as a non-member (so that implicit conversions are equally applied to the left and right hand operands) - maybe this whole specification area should be written less in terms of specific declarations that should be provided, and more in terms of code snippets that should be valid/have certain semantics? (that's how the C++ standard documents requirements on things like value types, iterator types, etc)
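The implicit-conversion point can be illustrated with a generic example, unrelated to the patch's types. With a member operator==, only the right-hand operand can be implicitly converted; as a non-member, both sides can:

```cpp
// A type with an implicit converting constructor from int. Because
// operator== is a non-member, implicit conversions apply to BOTH
// operands, so `1 == Meters(1)` compiles as well as `Meters(1) == 1`.
// (With a member operator==, `1 == Meters(1)` would not compile.)
struct Meters {
  int Value;
  Meters(int V) : Value(V) {} // implicit conversion from int
};

inline bool operator==(const Meters &L, const Meters &R) {
  return L.Value == R.Value;
}
```

This is also why the standard specifies requirements on value types as "code snippets that must be valid" rather than exact declarations.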

llvm/unittests/ADT/ConcurrentHashTableTest.cpp
44–46
avl added inline comments.May 31 2022, 7:00 AM
llvm/include/llvm/ADT/ConcurrentHashTable.h
117–118

Probably, instead of the warning, we could introduce a method:

bool IsInGoodState() const {
  return CurSlabNum <= MaxNumberOfSlabs && !hasFullSlab;
}

The consumer could call this method somewhere and inform the user:

if (!HashTable.IsInGoodState()) {
  outs() << "Hash table is in suboptimal state";
}

?

315–316

They have different interfaces:

std::pair<KeyDataTy, bool> insert(KeyDataTy NewData)

and

std::pair<KeyDataTy *, bool> insert(AllocatorTy &Allocator, KeyTy Key)

Implementing this through specialization looks more complex, and it does not seem to have benefits compared to the variant with separately named types?

dblaikie added inline comments.May 31 2022, 8:50 AM
llvm/include/llvm/ADT/ConcurrentHashTable.h
117–118

Maybe? I'd be curious to hear from @MaskRay here, since I guess he wrote the original/had the original motivation for the warning.

315–316

Fair enough. I wonder if it'd be acceptable to have the hash table take the allocator as a ctor parameter (& then keep the same API otherwise) and store it, or if there are too many of these that the extra pointer member would be too costly.

Though coming back to the original comment - why is external allocation usage/recommendation related to the size of the keys/size of a pointer?

avl added inline comments.May 31 2022, 10:02 AM
llvm/include/llvm/ADT/ConcurrentHashTable.h
117–118

It was me who introduced the warning; the original lld implementation does not have it.

My motivation is that it should somehow be known that the hash table is not in a good state and that the initial size needs to be revised.

315–316

Fair enough. I wonder if it'd be acceptable to have the hash table take the allocator as a ctor parameter (& then keep the same API otherwise) and store it, or if there are too many of these that the extra pointer member would be too costly.

The reason is the necessity to synchronize the allocator.

The allocator could be passed as a ctor parameter if it were able to work in multi-threaded mode. This implementation assumes the allocator is not able to work in multi-threaded mode; to work properly, it requires that the same allocator is always used from the same thread. In that case there is no need to synchronize the allocator.

  1. std::pair<KeyDataTy *, bool> insert(KeyTy Key)

The insert method might be called from different threads. If it called an allocator stored at construction time, we would need to use a mutex, or a special multi-threaded implementation of the allocator.

  2. std::pair<KeyDataTy *, bool> insert(AllocatorTy &Allocator, KeyTy Key)

This solution assumes that the same thread uses the same allocator, which allows allocators not to be synchronized.

Though coming back to the original comment - why is external allocation usage/recommendation related to the size of the keys/size of a pointer?

The entry size for the solution with external allocation cannot be less than a pointer (since each entry is a pointer to the externally allocated data).

The entry size for the solution that allocates inside the hash table can be smaller, so the table may have a smaller size. It also does not need a special allocator, since the data is kept inside the hash table.

We cannot always keep data inside the hash table: if the data size is greater than a pointer, then each entry (even a null one) would occupy the bigger size. That would result in a bigger hash table. It would also result in copying extra data, since data entries are passed by value in that case.

So if the entry size is less than a pointer, ConcurrentHashTable requires less space and no additional allocator. If the entry size is greater than a pointer, ConcurrentHashTableExtAlloc uses less space (no need to allocate full data for null entries) and data copying is faster.
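The size reasoning above can be made concrete with a compile-time selection. This is a sketch of the trade-off only; the std::conditional mechanism and type names are my illustration, not the patch's API:

```cpp
#include <cstdint>
#include <type_traits>

// Store the data inline when it is no larger than a pointer; otherwise
// store a pointer to externally allocated data, so empty cells stay
// pointer-sized and never pay for the full payload.
template <typename DataTy>
using EntryTy =
    typename std::conditional<sizeof(DataTy) <= sizeof(void *), DataTy,
                              DataTy *>::type;

struct SmallData { uint32_t Hash; };                // fits in a pointer
struct BigData   { uint32_t Hash; char Name[64]; }; // does not
```

With such a selection, a table of empty BigData cells costs one pointer per cell instead of 68+ bytes per cell.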

MaskRay added inline comments.May 31 2022, 6:50 PM
llvm/include/llvm/ADT/ConcurrentHashTable.h
117–118

The root issue is that the patch does not implement proper grow/resize, so setting a capacity and having oversize warnings here looks like a hack. For many general-purpose usages it's difficult to estimate the size beforehand, or only possible after heavily refactoring the usage pattern. So I wish we implemented a proper resizable hashtable with amortized O(1) time complexity, not the current version, which will be slow if the actual size is much larger than the reserved capacity.

avl added inline comments.Jun 1 2022, 7:49 AM
llvm/include/llvm/ADT/ConcurrentHashTable.h
117–118

This is correct. This implementation is not a good choice when it is difficult to estimate the size beforehand.

But it might be a good choice for cases where such an estimation is possible. E.g. lld/COFF already uses
this implementation, even without the resizing feature. I used it for DWARF files: all clang/llvm binaries are processed
without the warning (and far from the resizing limits), which means the estimations are quite accurate. Still, the resizing feature might be helpful for exceptional cases.

The implementation from this patch has several advantages: it is fast and lightweight, it does not require extra memory for buckets, it does not require locking the table or buckets in most cases, and it does not require any additional memory management in most cases.

For tasks where it is not possible to estimate the size beforehand, another implementation should be used.

avl updated this revision to Diff 433771.Jun 2 2022, 9:26 AM

addressed comments: removed warningHandler (added IsInSuboptimalState instead),
changed specification comments.

avl updated this revision to Diff 454533.Aug 22 2022, 9:10 AM

removed resizing (as I am going to submit another resizeable solution),
refactored to have a single "insert" method.

avl retitled this revision from [ADT] add ConcurrentHashTable class. to [ADT] add FixedConcurrentHashTable class..Aug 22 2022, 9:12 AM
avl edited the summary of this revision. (Show Details)
avl added a comment.Aug 23 2022, 2:32 AM

the patch with resizeable hashtable D132455

avl abandoned this revision.Feb 26 2023, 7:53 AM

closed in favor of D132455