Implement TrieRawHashMap which stores objects into a Trie based on the
hash of the object.
User needs to supply the hashing function and guarantees the uniqueness of
the hash for the objects to be inserted. Hash collision is not
supported
Differential D133715
[ADT] Add TrieRawHashMap steven_wu on Sep 12 2022, 10:41 AM. Authored by
Details
Implement TrieRawHashMap which stores objects into a Trie based on the User needs to supply the hashing function and guarantees the uniqueness of
Diff Detail
Unit Tests Event TimelineComment Actions Not sure I'll be the most detailed reviewer here, if anyone else wants to chime in. I've a generally understanding of what a trie is, but at my current level of detail looking at this code it's not entirely clear to me what this particular data structure is - maybe some more comments both in the source and in the patch description? simple example usage (in a test and/or in those comments/patch description) migth be helpful
Comment Actions [Resigning as reviewer, since I was the original author, but feel free ping me explicitly if I can be helpful for something and I'm not volunteering anything...] Yeah, probably more docs here would be good. Some notes that Steven might be able to use as a starting point (although feel free to ignore this an do your own thing!): At a high-level, this is a set/map data structure that supports fast thread-safe insertion/lookup and does not support erase. (IIRC, the interface similar to a std::map, where the key is a hash?) A hash-mapped trie is a tree of arrays where the prefix of the hash is consumed as an "index" into the array at each level of the tree. IIRC, this data structure as implemented:
Insertion (and lookup) works basically like the following (skipping over details that make the lock-free-thead-safety work):
Comment Actions Oh, this is all really useful info I didn't understand from the code - I'd assumed it was a user-visible trie (eg: something you could insert entries into, then walk the trie) - if the trie-ness is purely an implementation detail, maybe it shouldn't be tested/dumped? (like we don't test the layout of hash buckets in DenseMap tests) if it does need to be tested, maybe having a test friend in some way. As for naming, given the functionality. I'd consider putting Trie earlier in the name rather than later, same as DenseMap - it's not a trie data structure, it's a Set data structure implemented using a trie. So TrieSet. Though adding all the features might be a bit of a mouthful - ThreadSafeHashedTrieSet?... Not sure what the right answer is here (two hard problems in computer science, amirite?). Is it a set or a map? Not sure how to capture all the other fairly esoteric requirements (no removal, needs perfect hashing, lock free/thread safe, - I guess that's most of/all the user-visible major requirements (as you say, non-iterable is just a minor artifact of the current implementation, not a major defining feature of the interface)). PerfectHashMap? (if we were going verbose, ThreadSafePerfectHashMap... ) Comment Actions The interface is a map, just that the key type is restricted to look like a hash, and the client is expected to ensure the keys are well-distributed.
Both names make sense to me. "This is a map from a perfect hash to the data." Issues I see with those names:
Regardless of the name, maybe the programmer's guide could use an explainer. Note that there's an example data structure called ThreadSafeHashMappedTrieSet buried in the unit tests, which has an interface that hides the hash. Could be useful for some clients.
Maybe this could be PerfectHashSetImpl?
Comment Actions Hmm, right.
I think given the number of things we're trying to capture in the name, and the nuance here - not sure we can capture that in the name (even if that was the only thing we wanted to capture in the name I can't think of a punchy name - InsertOnlyFastConcurrentHashMap? :/)
Yeah, fair - I guess maybe that tends towards ConcurrentPerfectHashMap (open to Concurrent or ThreadSafe)
Might be this is esoteric enough not to be worth Programmer's Manual space, but certainly some good doc comments - but either/both I'm OK with.
I take it the current use cases you have in mind/lined up in the CAS use this directly? Maybe a Raw prefix? RawPerfectHashSet, RawConcurrentHashSet, etc... some combination/choice of those sort of things?
Comment Actions @steven_wu can confirm, but that's what I remember. (There was an early branch that used the hash-included-client-friendly set, but the use case for it was refactored IIRC.) To summarize, seems like the patch needs:
Comment Actions +1
+1
This bit I'm not so sure about - we don't test the bucketing behavior of DenseHashMap so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?
+1 Comment Actions IMO the layout is more complex than a flat array with quadratic probing. The logic for equal and/or nearly-equal hashes is a bit subtle, and there were hard-to-reason about bugs originally. The layout-stringification was necessary to understand what was going wrong, and the tests that use it help ensure future refactorings don't get it wrong. I don't remember the bugs, but two examples of subtleties:
I'd be a bit uneasy with the layout tests being dropped altogether. Maybe an alternative to testing the layout directly would be to add a verification member function that iterated through the data structure and ensured everything was self-consistent (else crash? else return false?). Then the tests could call the member function after a series of insertions that might trigger a "bad" layout. Comment Actions Fair enough - if it's sufficient to have a verify operation (maybe "assertValid" - so, yeah, crash when not valid) I'd go with that, but given the argument you've made, if you think verifying the specific structure is significantly more valuable than that, I'd be OK with some private/test-friended introspection API. Comment Actions IMO it's worth it; @steven_wu, if you disagree, I could live with assertValid(). (BTW, I remembered another justification. The test case inputs/setups are a bit subtle; checking the layout ensures you've covered the corner case you think you've covered.) Comment Actions I prefer to keep those tests as it still provides value and also insights into underlying implementation. If someone got around to do a new and better implementation, we can drop them by then. Comment Actions (avodiing quoting a lot) I think the stringification's especially/sort of an unfortunate way to test this - that stringified version probably isn't how a user would want to dump the data structure (they'd want to see a user-centric view of the data structure, not the implementation details). Is it practical to at least move to friending the test from the data structure and have the test poke around in the internals to inspect the structure & assert/expect/check various things about it? Comment Actions eh, maybe string's easier to read in expect diagnostics anyway... just seems a bit awkward/circuitous/unfortunate :/ (I guess the stringification could move into the test code, implemented via such a friend relationship) Comment Actions I do remember the string being easy to read in expect diagnostics, FWIW. What about renaming the methods to printLayout() and dumpLayout()? Then print() and dump() would at least still be available for something user-centric. Comment Actions Workable, though I'd still rather it be moved to the test file if it's not too inconvenient (with some friendship, probably). Avoid muddying the implementation with test-only features.
Comment Actions I find stringfication quite useful for debugging or inspecting the state of the CAS. Our branch (to be upstream later) has the OnDiskCAS, and we have a tool that can dump the CAS state (mostly the Trie information) in string format when pointed to the on disk CAS. My point is that even it is a testing/debugging tool, it is not limited to unit test only. I am ok with either names. But I think we can expose some private methods for debugging to avoid string comparison in unit test and hopefully that is more readable.
Comment Actions Address review feedback and rebase the patch using std::optional Still thinking how to unittest without string compare. The only way I can think is to implement iteration on HashMappedTrie so we can have API to check the state of SubTrie. That will expose more types to user, which will make API more complicated. Also for discussion of names, do we have other candidates to replace "HashMappedTrie" that is better and clearer?
Comment Actions Update tests to avoid string comparsion. It is not easy to add an iterator that can triversal Trie in a certain order but most test cases, we only care about the root node and the last sub-trie allocated. Those can be cheaply located just from the allocation chain, even it is not currently possible to construct the hash prefix if just walking the allocation chain. Comment Actions Ping. All feedbacks are addressed. Additional notes: I dig a bit deeper into the benchmark from https://reviews.llvm.org/D132548 as it shows bad scaling during parallel insertion tests (stop scaling to 8 threads). That is actually caused by our ThreadSafeBumpPtrAllocator that currently takes a lock. We can improve it in future since our use case doesn't expect heavy parallel insertions. However, it is quite obvious we should tune to RootBits and SubTrieBits. Increasing RootBits can significantly decrease the contention. A better strategy might be something like starting with something like 10 bits, then 4 bits, 2 bits and 1 bit. Shrinking number of bits can lead to better memory usage since the slots usage in the deep nodes are very low. Comment Actions The ability to dump the data structure I'm all for - like DenseMap, etc, but I think that dumping should be the user-visible state (it's a map, the trie part is an implementation detail that ideally shouldn't be exposed in the dump developers use regularly - presumably it's not necessary except when debugging the data structure itself, which should be rare (we don't usually need to look at DenseMap's internal bucketing, for instance)).
Can the unit test be a friend to the data structure and use the data structure's internal APIs?
I'd really like to avoid "trie" in the name, though I guess it's as much an implementation detail as "Dense" is in DenseMap, so might be deserving to be in the name. I think this is a map, so the name should probably end in 'map' not in 'trie'. I'd guess TrieHashMap - it's a type of hash map that uses a trie? Maybe make a shortlist of names to consider, etc? Comment Actions Could ThreadSafeBumpPtrAllocator could be made lock-free? I think at least it would be possible to implement one that only locked when a new allocation is needed, instead of every time the ptr is bumped as now. (I’ll think about it a bit.) Note that in the CAS use case it’s ideally true that most insertions are duplicates and don’t need to call the allocator at all. This is why we’ve been able to get away with a lock on each allocation. Comment Actions
Correct. The current dump method is not designed to dump user-visible state. The goal is to dump how hash/key is stored in trie, and it doesn't even dump any user stored value. To dump user visible state from DenseMap, it needs to support iteration first but HashMappedTrie currently doesn't support iteration (and the CAS/ObjectStore it implements cannot have iteration for security reason). The only useful dump method can be implemented without iteration is what it has now. I am ok to remove dump method from this patch since I already rewrite the tests without using dump. The string dump for the trie structure can be useful to debug full CAS in the future. Since CAS cannot have iteration, a separate tool that can dump the entire trie structure can be helpful to look inside CAS. In that case, dump is a public API that needs to be used by this CAS inspection tool, not a private method that expose to test only.
It is already the case now. I find a way to write unit-test with no string comparison and extract data from the trie without supporting real iteration. Supporting full iteration is not trivial since it will either involve a very complicated iterator or adding back edge in the trie or both. Currently, it has a very non-trivial way to iterate the subtrie in an order that is not really interesting to users, and the way it finds the hash prefix is quite expensive too. Thus, this is only useful for unit-tests and they are using private APIs that are friends to test only. A real iteration needs to do a tree walk from the root and store information it acquires along the way (like dump method).
I don't have candidates. TrieHashMap sounds good to me. Comment Actions Yes, the FIXME for a ThreadSafeBumpPtrAllocator is still there. Currently, I don't think it is urgent to fix. It is not expected to have someone to use Trie as a high performance thread safe set/map. Comment Actions Here's a sketch that I think mostly works? // Block for bump allocations. struct BumpBlock { std::atomic<char *> Ptr; std::atomic<BumpBlock *> Next; char Bytes[4084]; BumpBlock() : Ptr{Bytes}, Next{nullptr} {} // Compute new "Next", try to bump if there's space, else return nullptr. void *tryAllocate(size_t N, size_t Align = 1); }; // Tail-allocated data for "big" allocations. struct BumpSeparate { std::atomic<BumpSeparate *> Next; // tail-allocated data of appropriate and alignment, using malloc.... }; // Allocator. struct BumpAllocator { std::atomic<BumpBlock *> CurrentBlock; std::atomic<BumpSeparate *> LastAllocSeparately; // Delete everything since there's no ownership here... ~BumpAllocator(); void *allocate(size_t N, size_t Align = 1) { if (N > 2048) return allocateSeparately(N, Align); BumpBlock *B = CurrentBlock; std::unique_ptr<BumpBlock> New; void *NewAlloc = nullptr; while (true) { if (LLVM_LIKELY(B)) if (void *Alloc = B->tryAllocate(N, Align)) return Alloc; if (!New) { New = new BumpBlock; NewAlloc = New->tryAllocate(N, Align); assert(NewAlloc && "Empty block doesn't have space!"); } if (!CurrentBlock.compare_exchange_weak(B, New.get())) continue; // New was saved in CurrentBlock. Fix its "Next" pointer and release it // so it's not deallocated. New->Next = B; New.release(); return NewAlloc; } } private: // Maintain a list of "big" allocations, similar to above. void *allocateSeparately(size_t N, size_t Align); }; Not saying it needs to block this review... But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.
The immediate use case is as a high performance thread-safe data store. In the CAS use case we're expecting a lot of duplicate insertions which don't happen to hit the allocator, but I'm not sure that's really been checked/measured. Comment Actions Not sure I follow why it'd need to support iteration to support dumping of user-centric state. Supporting iteration (using an iterator abstraction) is a fair bit more complicated than walking the data structure directly to dump out its contents, I'd think.
Yeah, I'm generally OK with a public dump method that gives a user-centric result, like DenseMap's dump... oh, my mistake, DenseMap doesn't have a dump method. I guess we rely on debugger pretty printers for that (& as the person who wrote the gdb one at least, it's not great - because it's hard to walk the user-visible state correctly/skipping internal implementation details - so it does end up printing the buckets, etc). But, sure, doesn't need to be in this patch - I guess if we don't have it for other data structures, seems fair/reasonable/OK that we don't have it for this one.
Ah, OK, great.
Sounds good to me, then.
Comment Actions
+1 for fast concurrent ThreadSafeBumpPtrAllocator. What do you think about following alternative implementation? class ThreadSafeBumpPtrAllocator { ThreadSafeBumpPtrAllocator() { size_t ThreadsNum = ThreadPoolStrategy.compute_thread_count(); allocators.resize(ThreadsNum); } void* Allocate (size_t Num) { size_t AllocatorIdx = getThreadIdx(); return allocators[AllocatorIdx].Allocate(Num); } std::vector<BumpPtrAllocator> allocators; }; static thread_local ThreadIdx; size_t getThreadIdx() { return ThreadIdx; } This implementation uses the fact that ThreadPoolExecutor creates a fixed number Comment Actions @steven_wu To have some interface compatibility and to make it possible to use HashMappedTrie for use case from D96035 probably, std::pair<pointer, bool> insert(const_pointer Hint, value_type &&HashedData) The use case for this: std::pair<pointer, bool> Result = HashMappedTrie.insert(Data); if (Result.second) { // initialize Result.first data Result.first->field = data; } Comment Actions Let's move the discussion of ThreadSafeAllocator to https://reviews.llvm.org/D133713 since this patch just uses it and the implementation is over there. The background of this data structure is to use by a CAS, so it is ideal that the CAS doesn't need to lock to the amount of threads that is going to be spawned or rely on the thread id. Comment Actions +1; seems worth experimenting with (downside is you have at least as many allocations as active threads, but maybe that’s fine); IIRC C++11 thread-local initialization is slow on Darwin so we might want __thread here, or maybe the static makes it fast; +1 to Steven’s comment this should move to the other review (also IMO this could be an incremental improvement that lands later). Comment Actions @dexonsmith @benlangmuir @akyrtzi Locally, I updated to use std::pair as return value to make it more map like, and remove the dump method. I will update the patch once we have the name we agreed on. Comment Actions I guess one concern with TrieHashMap is that if this is the lower level implementation, and someone might implement a more map-like API on top of this, we might not want to take the "better" name for the data structure that'll be less directly used? Could prefix with "Raw" or maybe TrieRawHashMap? (since it's the hashing part that's particularly "raw" - relying on the hash being unique, etc) Comment Actions Maybe RawHashTrieMap? It reads better when Raw is in the front, and it contains hash-trie and trie-map, which are both terms describing data structures similar to this but this is much simpler, thus raw.
Comment Actions I think from a client perspective, this is some sort of variant of a HashMap, so ending with that makes sense to me. It’s an implementation detail that it’s a trie so that seems better in the prefix.
Comment Actions Update unittest:
|
This seems like a fair bit of text to wrap the trie - what's the value/extra test coverage this is providingg? (sorry I'm not following too clearly) I'd have hoped/expected this API could be tested more directly, without needing this wrapper and/or without needing to involve something as non-trivial as std::string (instead using simple user defined types in the test)?