This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add TrieRawHashMap
Needs ReviewPublic

Authored by steven_wu on Sep 12 2022, 10:41 AM.

Details

Summary

Implement TrieRawHashMap which stores objects into a Trie based on the
hash of the object.

User needs to supply the hashing function and guarantees the uniqueness of
the hash for the objects to be inserted. Hash collision is not
supported

Diff Detail

Event Timeline

steven_wu created this revision.Sep 12 2022, 10:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 10:41 AM
steven_wu requested review of this revision.Sep 12 2022, 10:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 10:41 AM
aganea added a subscriber: aganea.Sep 27 2022, 8:14 AM
steven_wu updated this revision to Diff 481059.Dec 7 2022, 2:08 PM

Update missing decl due to header cleanup in other files.

Not sure I'll be the most detailed reviewer here, if anyone else wants to chime in.

I've a generally understanding of what a trie is, but at my current level of detail looking at this code it's not entirely clear to me what this particular data structure is - maybe some more comments both in the source and in the patch description? simple example usage (in a test and/or in those comments/patch description) migth be helpful

llvm/include/llvm/ADT/HashMappedTrie.h
26

I think we can use static constexpr implicitly inline constants now, rather than enums?

35–37

Guess these could be constexpr variable templates rather than functions? But probably not a big deal either way.

64
157–161

This sort of feels a bit confusing - why are we hashing a hash?

llvm/unittests/ADT/HashMappedTrieTest.cpp
72–76

Seems unfortunate to do significant amounts of testing via the stringification of an object - any chance this could be done in a more API-centric way?

200

This is specifically to make it non-trivially destructible? should this record the execution of the dtor and check that the right number (or even instance) of destructions occur?

dexonsmith resigned from this revision.Dec 7 2022, 5:46 PM
dexonsmith added a subscriber: dexonsmith.

[Resigning as reviewer, since I was the original author, but feel free ping me explicitly if I can be helpful for something and I'm not volunteering anything...]

Not sure I'll be the most detailed reviewer here, if anyone else wants to chime in.

I've a generally understanding of what a trie is, but at my current level of detail looking at this code it's not entirely clear to me what this particular data structure is - maybe some more comments both in the source and in the patch description? simple example usage (in a test and/or in those comments/patch description) migth be helpful

Yeah, probably more docs here would be good. Some notes that Steven might be able to use as a starting point (although feel free to ignore this an do your own thing!):

At a high-level, this is a set/map data structure that supports fast thread-safe insertion/lookup and does not support erase. (IIRC, the interface similar to a std::map, where the key is a hash?)

A hash-mapped trie is a tree of arrays where the prefix of the hash is consumed as an "index" into the array at each level of the tree. IIRC, this data structure as implemented:

  • is lock-free and thread-safe
  • supports "insert" and "lookup"
  • does NOT detect a hash collision; first insertion wins; you want a strong-enough hash for your data set that there are no collisions, or handle collisions in the value_type somehow
  • does NOT support "erase"; IIRC, that'd be hard to do in a thread-safe way, but I didn't think hard about it
  • does NOT support "iteration"; IIRC, it'd be easy to implement iteration that exposed the internal layout
  • array sizes are configurable (root array can be a different size from sub-trie arrays)
  • each slot in an array is either empty, a sub-trie, or a value

Insertion (and lookup) works basically like the following (skipping over details that make the lock-free-thead-safety work):

  1. start with the root trie's array
  2. convert a prefix of the hash into an index into the current array
  3. if the slot is a value with the same hash, return the existing value
  4. if the slot is a value with a different hash, create a new sub-trie that contains the existing value, put the sub-trie in the slot, and continue with step (5)
  5. if the slot is a sub-trie, descend and go back to step (2) with the unused suffix of the hash on the sub-trie's array
  6. if the slot is empty, insert and return the new value
llvm/include/llvm/ADT/HashMappedTrie.h
157–161

This is creating a std::array<> out of an ArrayRef. std::array's constructors don't make this easy, so a wrapper function is helpful. Probably copyHash is a better name, or maybe there's a better way to do this!

llvm/unittests/ADT/HashMappedTrieTest.cpp
72–76

This is a good point. It was hard to test this way, and it's probably hard to maintain the tests.

The motivating thought was to avoid adding APIs that exposed the layout so that clients couldn't inspect/rely on it (i.e., there are no APIs for iterating through all values). But maybe that is unnecessarily restrictive.

IIRC, it would be fairly easy to change Trie::pointer to a recursive iterator, or to expose an iterator that wrapped it, or something. operator++ might be quite slow and/or be thread-unsafe, but maybe it's worth opening up just for the testing benefit.

(I'm happy either way; just chiming in in case I have more context about motivation for the currently-sparse API.)

alexander-shaposhnikov added inline comments.
llvm/include/llvm/ADT/HashMappedTrie.h
323

constexpr

[Resigning as reviewer, since I was the original author, but feel free ping me explicitly if I can be helpful for something and I'm not volunteering anything...]

Not sure I'll be the most detailed reviewer here, if anyone else wants to chime in.

I've a generally understanding of what a trie is, but at my current level of detail looking at this code it's not entirely clear to me what this particular data structure is - maybe some more comments both in the source and in the patch description? simple example usage (in a test and/or in those comments/patch description) migth be helpful

Yeah, probably more docs here would be good. Some notes that Steven might be able to use as a starting point (although feel free to ignore this an do your own thing!):

At a high-level, this is a set/map data structure that supports fast thread-safe insertion/lookup and does not support erase. (IIRC, the interface similar to a std::map, where the key is a hash?)

A hash-mapped trie is a tree of arrays where the prefix of the hash is consumed as an "index" into the array at each level of the tree. IIRC, this data structure as implemented:

  • is lock-free and thread-safe
  • supports "insert" and "lookup"
  • does NOT detect a hash collision; first insertion wins; you want a strong-enough hash for your data set that there are no collisions, or handle collisions in the value_type somehow
  • does NOT support "erase"; IIRC, that'd be hard to do in a thread-safe way, but I didn't think hard about it
  • does NOT support "iteration"; IIRC, it'd be easy to implement iteration that exposed the internal layout
  • array sizes are configurable (root array can be a different size from sub-trie arrays)
  • each slot in an array is either empty, a sub-trie, or a value

Insertion (and lookup) works basically like the following (skipping over details that make the lock-free-thead-safety work):

  1. start with the root trie's array
  2. convert a prefix of the hash into an index into the current array
  3. if the slot is a value with the same hash, return the existing value
  4. if the slot is a value with a different hash, create a new sub-trie that contains the existing value, put the sub-trie in the slot, and continue with step (5)
  5. if the slot is a sub-trie, descend and go back to step (2) with the unused suffix of the hash on the sub-trie's array
  6. if the slot is empty, insert and return the new value

Oh, this is all really useful info I didn't understand from the code - I'd assumed it was a user-visible trie (eg: something you could insert entries into, then walk the trie) - if the trie-ness is purely an implementation detail, maybe it shouldn't be tested/dumped? (like we don't test the layout of hash buckets in DenseMap tests) if it does need to be tested, maybe having a test friend in some way.

As for naming, given the functionality. I'd consider putting Trie earlier in the name rather than later, same as DenseMap - it's not a trie data structure, it's a Set data structure implemented using a trie. So TrieSet. Though adding all the features might be a bit of a mouthful - ThreadSafeHashedTrieSet?... Not sure what the right answer is here (two hard problems in computer science, amirite?).

Is it a set or a map?

Not sure how to capture all the other fairly esoteric requirements (no removal, needs perfect hashing, lock free/thread safe, - I guess that's most of/all the user-visible major requirements (as you say, non-iterable is just a minor artifact of the current implementation, not a major defining feature of the interface)). PerfectHashMap? (if we were going verbose, ThreadSafePerfectHashMap... )

Is it a set or a map?

The interface is a map, just that the key type is restricted to look like a hash, and the client is expected to ensure the keys are well-distributed.

Not sure how to capture all the other fairly esoteric requirements (no removal, needs perfect hashing, lock free/thread safe, - I guess that's most of/all the user-visible major requirements (as you say, non-iterable is just a minor artifact of the current implementation, not a major defining feature of the interface)). PerfectHashMap? (if we were going verbose, ThreadSafePerfectHashMap... )

Both names make sense to me. "This is a map from a perfect hash to the data."

Issues I see with those names:

  • I worry it's not obvious from either name *when* you'd want to use this. (The answer is, I think, you want this if you are building a set shared between multiple threads, you expect lots of concurrent lookup/insertion, and you want fast insertion/lookup nevertheless.)
  • It might sound like the data structure does hashing. (It doesn't. The client is expected to provide the hash.)
  • It steals the turf for more general map built on top of this. More later.

Regardless of the name, maybe the programmer's guide could use an explainer.

Note that there's an example data structure called ThreadSafeHashMappedTrieSet buried in the unit tests, which has an interface that hides the hash. Could be useful for some clients.

  • Maybe it would make sense to lift up? Name could be PerfectHashSet
  • You might want to build a map that also handles the hashing; name could potentially be PerfectHashMap

Maybe this could be PerfectHashSetImpl?

  • Has the "guts" for implementing a client-friendly PerfectHashSet or PerfectHashMap?
  • ... but can be used directly if you want to manage the hashing yourself?
llvm/unittests/ADT/HashMappedTrieTest.cpp
242–246

FYI, here, buried in the unit tests (demoted due to lack of immediate use cases), this is a set data structure built on top of the trie.

If this were lifted up again for actual use, as opposed to just a test, you'd want to template it on HasherT and then use HashBuilder (from llvm/include/llvm/Support/HashBuilder.h) to hash the value.

Is it a set or a map?

The interface is a map, just that the key type is restricted to look like a hash, and the client is expected to ensure the keys are well-distributed.

Hmm, right.

Not sure how to capture all the other fairly esoteric requirements (no removal, needs perfect hashing, lock free/thread safe, - I guess that's most of/all the user-visible major requirements (as you say, non-iterable is just a minor artifact of the current implementation, not a major defining feature of the interface)). PerfectHashMap? (if we were going verbose, ThreadSafePerfectHashMap... )

Both names make sense to me. "This is a map from a perfect hash to the data."

Issues I see with those names:

  • I worry it's not obvious from either name *when* you'd want to use this. (The answer is, I think, you want this if you are building a set shared between multiple threads, you expect lots of concurrent lookup/insertion, and you want fast insertion/lookup nevertheless.)

I think given the number of things we're trying to capture in the name, and the nuance here - not sure we can capture that in the name (even if that was the only thing we wanted to capture in the name I can't think of a punchy name - InsertOnlyFastConcurrentHashMap? :/)

  • It might sound like the data structure does hashing. (It doesn't. The client is expected to provide the hash.)

Yeah, fair - I guess maybe that tends towards ConcurrentPerfectHashMap (open to Concurrent or ThreadSafe)

  • It steals the turf for more general map built on top of this. More later.

Regardless of the name, maybe the programmer's guide could use an explainer.

Might be this is esoteric enough not to be worth Programmer's Manual space, but certainly some good doc comments - but either/both I'm OK with.

Note that there's an example data structure called ThreadSafeHashMappedTrieSet buried in the unit tests, which has an interface that hides the hash. Could be useful for some clients.

  • Maybe it would make sense to lift up? Name could be PerfectHashSet
  • You might want to build a map that also handles the hashing; name could potentially be PerfectHashMap

Maybe this could be PerfectHashSetImpl?

  • Has the "guts" for implementing a client-friendly PerfectHashSet or PerfectHashMap?
  • ... but can be used directly if you want to manage the hashing yourself?

I take it the current use cases you have in mind/lined up in the CAS use this directly? Maybe a Raw prefix? RawPerfectHashSet, RawConcurrentHashSet, etc... some combination/choice of those sort of things?

llvm/unittests/ADT/HashMappedTrieTest.cpp
242–246

Hmm, yeah, this looks like overkill for the test - given the data structure as-is, we don't need any hash algorithm here - the test, I would expect, would use unique array values/"hash" values hardcoded/directly.

If you've got a use for this data structure, it could come along in a subsequent patch?

Note that there's an example data structure called ThreadSafeHashMappedTrieSet buried in the unit tests, which has an interface that hides the hash. Could be useful for some clients.

  • Maybe it would make sense to lift up? Name could be PerfectHashSet
  • You might want to build a map that also handles the hashing; name could potentially be PerfectHashMap

Maybe this could be PerfectHashSetImpl?

  • Has the "guts" for implementing a client-friendly PerfectHashSet or PerfectHashMap?
  • ... but can be used directly if you want to manage the hashing yourself?

I take it the current use cases you have in mind/lined up in the CAS use this directly? Maybe a Raw prefix? RawPerfectHashSet, RawConcurrentHashSet, etc... some combination/choice of those sort of things?

@steven_wu can confirm, but that's what I remember. (There was an early branch that used the hash-included-client-friendly set, but the use case for it was refactored IIRC.)

To summarize, seems like the patch needs:

  • a new name (RawConcurrentHashSet SGTM);
  • some header docs explaining how to use it / design motivation;
  • a test support friend class that can inspect/test the trie layout without relying on stringification;
  • unit tests rewritten to use the friend (and probably dropping the example higher-level data structure).

Note that there's an example data structure called ThreadSafeHashMappedTrieSet buried in the unit tests, which has an interface that hides the hash. Could be useful for some clients.

  • Maybe it would make sense to lift up? Name could be PerfectHashSet
  • You might want to build a map that also handles the hashing; name could potentially be PerfectHashMap

Maybe this could be PerfectHashSetImpl?

  • Has the "guts" for implementing a client-friendly PerfectHashSet or PerfectHashMap?
  • ... but can be used directly if you want to manage the hashing yourself?

I take it the current use cases you have in mind/lined up in the CAS use this directly? Maybe a Raw prefix? RawPerfectHashSet, RawConcurrentHashSet, etc... some combination/choice of those sort of things?

@steven_wu can confirm, but that's what I remember. (There was an early branch that used the hash-included-client-friendly set, but the use case for it was refactored IIRC.)

To summarize, seems like the patch needs:

  • a new name (RawConcurrentHashSet SGTM);

+1

  • some header docs explaining how to use it / design motivation;

+1

  • a test support friend class that can inspect/test the trie layout without relying on stringification;

This bit I'm not so sure about - we don't test the bucketing behavior of DenseHashMap so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?

  • unit tests rewritten to use the friend (and probably dropping the example higher-level data structure).

+1

  • a test support friend class that can inspect/test the trie layout without relying on stringification;

This bit I'm not so sure about - we don't test the bucketing behavior of DenseHashMap so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?

IMO the layout is more complex than a flat array with quadratic probing. The logic for equal and/or nearly-equal hashes is a bit subtle, and there were hard-to-reason about bugs originally. The layout-stringification was necessary to understand what was going wrong, and the tests that use it help ensure future refactorings don't get it wrong.

I don't remember the bugs, but two examples of subtleties:

  • On an exact match you don't want to "sink" the existing entry down a level to a new sub-trie (you need to detect "exact match" before sinking). Getting this wrong will affect performance but not otherwise be user-visible.
  • The deepest sub-trie might be a different/smaller size than the others because there are only a few bits left-over, and the handling needs to be right. It's simpler to check for correct layout directly than to guess about what user-visible effects there might be for errors.

I'd be a bit uneasy with the layout tests being dropped altogether.

Maybe an alternative to testing the layout directly would be to add a verification member function that iterated through the data structure and ensured everything was self-consistent (else crash? else return false?). Then the tests could call the member function after a series of insertions that might trigger a "bad" layout.

  • a test support friend class that can inspect/test the trie layout without relying on stringification;

This bit I'm not so sure about - we don't test the bucketing behavior of DenseHashMap so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?

IMO the layout is more complex than a flat array with quadratic probing. The logic for equal and/or nearly-equal hashes is a bit subtle, and there were hard-to-reason about bugs originally. The layout-stringification was necessary to understand what was going wrong, and the tests that use it help ensure future refactorings don't get it wrong.

I don't remember the bugs, but two examples of subtleties:

  • On an exact match you don't want to "sink" the existing entry down a level to a new sub-trie (you need to detect "exact match" before sinking). Getting this wrong will affect performance but not otherwise be user-visible.
  • The deepest sub-trie might be a different/smaller size than the others because there are only a few bits left-over, and the handling needs to be right. It's simpler to check for correct layout directly than to guess about what user-visible effects there might be for errors.

I'd be a bit uneasy with the layout tests being dropped altogether.

Maybe an alternative to testing the layout directly would be to add a verification member function that iterated through the data structure and ensured everything was self-consistent (else crash? else return false?). Then the tests could call the member function after a series of insertions that might trigger a "bad" layout.

Fair enough - if it's sufficient to have a verify operation (maybe "assertValid" - so, yeah, crash when not valid) I'd go with that, but given the argument you've made, if you think verifying the specific structure is significantly more valuable than that, I'd be OK with some private/test-friended introspection API.

  • a test support friend class that can inspect/test the trie layout without relying on stringification;

This bit I'm not so sure about - we don't test the bucketing behavior of DenseHashMap so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?

IMO the layout is more complex than a flat array with quadratic probing. The logic for equal and/or nearly-equal hashes is a bit subtle, and there were hard-to-reason about bugs originally. The layout-stringification was necessary to understand what was going wrong, and the tests that use it help ensure future refactorings don't get it wrong.

I don't remember the bugs, but two examples of subtleties:

  • On an exact match you don't want to "sink" the existing entry down a level to a new sub-trie (you need to detect "exact match" before sinking). Getting this wrong will affect performance but not otherwise be user-visible.
  • The deepest sub-trie might be a different/smaller size than the others because there are only a few bits left-over, and the handling needs to be right. It's simpler to check for correct layout directly than to guess about what user-visible effects there might be for errors.

I'd be a bit uneasy with the layout tests being dropped altogether.

Maybe an alternative to testing the layout directly would be to add a verification member function that iterated through the data structure and ensured everything was self-consistent (else crash? else return false?). Then the tests could call the member function after a series of insertions that might trigger a "bad" layout.

Fair enough - if it's sufficient to have a verify operation (maybe "assertValid" - so, yeah, crash when not valid) I'd go with that, but given the argument you've made, if you think verifying the specific structure is significantly more valuable than that, I'd be OK with some private/test-friended introspection API.

IMO it's worth it; @steven_wu, if you disagree, I could live with assertValid(). (BTW, I remembered another justification. The test case inputs/setups are a bit subtle; checking the layout ensures you've covered the corner case you think you've covered.)

  • a test support friend class that can inspect/test the trie layout without relying on stringification;

This bit I'm not so sure about - we don't test the bucketing behavior of DenseHashMap so far as I know (or other similar implementation details of the various other hashing data structures - (since they're the ones with fairly complicated implementation details, they seem the best comparison)), for instance, we only test its interface - why would we do differently for this data structure?

IMO the layout is more complex than a flat array with quadratic probing. The logic for equal and/or nearly-equal hashes is a bit subtle, and there were hard-to-reason about bugs originally. The layout-stringification was necessary to understand what was going wrong, and the tests that use it help ensure future refactorings don't get it wrong.

I don't remember the bugs, but two examples of subtleties:

  • On an exact match you don't want to "sink" the existing entry down a level to a new sub-trie (you need to detect "exact match" before sinking). Getting this wrong will affect performance but not otherwise be user-visible.
  • The deepest sub-trie might be a different/smaller size than the others because there are only a few bits left-over, and the handling needs to be right. It's simpler to check for correct layout directly than to guess about what user-visible effects there might be for errors.

I'd be a bit uneasy with the layout tests being dropped altogether.

Maybe an alternative to testing the layout directly would be to add a verification member function that iterated through the data structure and ensured everything was self-consistent (else crash? else return false?). Then the tests could call the member function after a series of insertions that might trigger a "bad" layout.

Fair enough - if it's sufficient to have a verify operation (maybe "assertValid" - so, yeah, crash when not valid) I'd go with that, but given the argument you've made, if you think verifying the specific structure is significantly more valuable than that, I'd be OK with some private/test-friended introspection API.

IMO it's worth it; @steven_wu, if you disagree, I could live with assertValid(). (BTW, I remembered another justification. The test case inputs/setups are a bit subtle; checking the layout ensures you've covered the corner case you think you've covered.)

I prefer to keep those tests as it still provides value and also insights into underlying implementation. If someone got around to do a new and better implementation, we can drop them by then.

(avodiing quoting a lot)

I think the stringification's especially/sort of an unfortunate way to test this - that stringified version probably isn't how a user would want to dump the data structure (they'd want to see a user-centric view of the data structure, not the implementation details). Is it practical to at least move to friending the test from the data structure and have the test poke around in the internals to inspect the structure & assert/expect/check various things about it?

eh, maybe string's easier to read in expect diagnostics anyway... just seems a bit awkward/circuitous/unfortunate :/ (I guess the stringification could move into the test code, implemented via such a friend relationship)

eh, maybe string's easier to read in expect diagnostics anyway... just seems a bit awkward/circuitous/unfortunate :/ (I guess the stringification could move into the test code, implemented via such a friend relationship)

I do remember the string being easy to read in expect diagnostics, FWIW.

What about renaming the methods to printLayout() and dumpLayout()? Then print() and dump() would at least still be available for something user-centric.

eh, maybe string's easier to read in expect diagnostics anyway... just seems a bit awkward/circuitous/unfortunate :/ (I guess the stringification could move into the test code, implemented via such a friend relationship)

I do remember the string being easy to read in expect diagnostics, FWIW.

What about renaming the methods to printLayout() and dumpLayout()? Then print() and dump() would at least still be available for something user-centric.

Workable, though I'd still rather it be moved to the test file if it's not too inconvenient (with some friendship, probably). Avoid muddying the implementation with test-only features.

tschuett added inline comments.
llvm/lib/Support/HashMappedTrieIndexGenerator.h
14

Please use std::optional.

eh, maybe string's easier to read in expect diagnostics anyway... just seems a bit awkward/circuitous/unfortunate :/ (I guess the stringification could move into the test code, implemented via such a friend relationship)

I do remember the string being easy to read in expect diagnostics, FWIW.

What about renaming the methods to printLayout() and dumpLayout()? Then print() and dump() would at least still be available for something user-centric.

Workable, though I'd still rather it be moved to the test file if it's not too inconvenient (with some friendship, probably). Avoid muddying the implementation with test-only features.

I find stringfication quite useful for debugging or inspecting the state of the CAS. Our branch (to be upstream later) has the OnDiskCAS, and we have a tool that can dump the CAS state (mostly the Trie information) in string format when pointed to the on disk CAS. My point is that even it is a testing/debugging tool, it is not limited to unit test only. I am ok with either names.

But I think we can expose some private methods for debugging to avoid string comparison in unit test and hopefully that is more readable.

llvm/unittests/ADT/HashMappedTrieTest.cpp
242–246

Agree. I don't think the actual data structure is useful outside the context of this unit test. I will simply it.

steven_wu updated this revision to Diff 486055.Jan 3 2023, 12:42 PM

Address review feedback and rebase the patch using std::optional

Still thinking how to unittest without string compare. The only way I can think is to implement iteration on HashMappedTrie so we can have API to check the state of SubTrie. That will expose more types to user, which will make API more complicated.

Also for discussion of names, do we have other candidates to replace "HashMappedTrie" that is better and clearer?

steven_wu marked 9 inline comments as done.Jan 3 2023, 1:01 PM
steven_wu added inline comments.
llvm/unittests/ADT/HashMappedTrieTest.cpp
200

The test is for making sure no recursion happening when destroying a large Trie with objects inside (with or without destructor). The order doesn't matter. I added comment to clarify.

steven_wu updated this revision to Diff 486102.Jan 3 2023, 2:57 PM

Add some documentation about HashMappedTrie

steven_wu updated this revision to Diff 486409.Jan 4 2023, 3:30 PM

Update tests to avoid string comparsion.

It is not easy to add an iterator that can triversal Trie in a certain order but most test cases, we only care about the root node and the last sub-trie allocated. Those can be cheaply located just from the allocation chain, even it is not currently possible to construct the hash prefix if just walking the allocation chain.

steven_wu updated this revision to Diff 486411.Jan 4 2023, 3:32 PM

Fix typo in the comments

steven_wu updated this revision to Diff 486632.Jan 5 2023, 10:53 AM

Allow checking prefix of the sub-trie

Ping. All feedbacks are addressed.

Additional notes: I dig a bit deeper into the benchmark from https://reviews.llvm.org/D132548 as it shows bad scaling during parallel insertion tests (stop scaling to 8 threads). That is actually caused by our ThreadSafeBumpPtrAllocator that currently takes a lock. We can improve it in future since our use case doesn't expect heavy parallel insertions. However, it is quite obvious we should tune to RootBits and SubTrieBits. Increasing RootBits can significantly decrease the contention. A better strategy might be something like starting with something like 10 bits, then 4 bits, 2 bits and 1 bit. Shrinking number of bits can lead to better memory usage since the slots usage in the deep nodes are very low.

eh, maybe string's easier to read in expect diagnostics anyway... just seems a bit awkward/circuitous/unfortunate :/ (I guess the stringification could move into the test code, implemented via such a friend relationship)

I do remember the string being easy to read in expect diagnostics, FWIW.

What about renaming the methods to printLayout() and dumpLayout()? Then print() and dump() would at least still be available for something user-centric.

Workable, though I'd still rather it be moved to the test file if it's not too inconvenient (with some friendship, probably). Avoid muddying the implementation with test-only features.

I find stringfication quite useful for debugging or inspecting the state of the CAS. Our branch (to be upstream later) has the OnDiskCAS, and we have a tool that can dump the CAS state (mostly the Trie information) in string format when pointed to the on disk CAS. My point is that even it is a testing/debugging tool, it is not limited to unit test only. I am ok with either names.

The ability to dump the data structure I'm all for - like DenseMap, etc, but I think that dumping should be the user-visible state (it's a map, the trie part is an implementation detail that ideally shouldn't be exposed in the dump developers use regularly - presumably it's not necessary except when debugging the data structure itself, which should be rare (we don't usually need to look at DenseMap's internal bucketing, for instance)).

But I think we can expose some private methods for debugging to avoid string comparison in unit test and hopefully that is more readable.

Address review feedback and rebase the patch using std::optional

Still thinking how to unittest without string compare. The only way I can think is to implement iteration on HashMappedTrie so we can have API to check the state of SubTrie. That will expose more types to user, which will make API more complicated.

Can the unit test be a friend to the data structure and use the data structure's internal APIs?

Also for discussion of names, do we have other candidates to replace "HashMappedTrie" that is better and clearer?

I'd really like to avoid "trie" in the name, though I guess it's as much an implementation detail as "Dense" is in DenseMap, so might be deserving to be in the name.

I think this is a map, so the name should probably end in 'map' not in 'trie'. I'd guess TrieHashMap - it's a type of hash map that uses a trie? Maybe make a shortlist of names to consider, etc?

Ping. All feedbacks are addressed.

Additional notes: I dig a bit deeper into the benchmark from https://reviews.llvm.org/D132548 as it shows bad scaling during parallel insertion tests (stop scaling to 8 threads). That is actually caused by our ThreadSafeBumpPtrAllocator that currently takes a lock. We can improve it in future since our use case doesn't expect heavy parallel insertions. However, it is quite obvious we should tune to RootBits and SubTrieBits. Increasing RootBits can significantly decrease the contention. A better strategy might be something like starting with something like 10 bits, then 4 bits, 2 bits and 1 bit. Shrinking number of bits can lead to better memory usage since the slots usage in the deep nodes are very low.

Could ThreadSafeBumpPtrAllocator could be made lock-free? I think at least it would be possible to implement one that only locked when a new allocation is needed, instead of every time the ptr is bumped as now. (I’ll think about it a bit.)

Note that in the CAS use case it’s ideally true that most insertions are duplicates and don’t need to call the allocator at all. This is why we’ve been able to get away with a lock on each allocation.

The ability to dump the data structure I'm all for - like DenseMap, etc, but I think that dumping should be the user-visible state (it's a map, the trie part is an implementation detail that ideally shouldn't be exposed in the dump developers use regularly - presumably it's not necessary except when debugging the data structure itself, which should be rare (we don't usually need to look at DenseMap's internal bucketing, for instance)).

Correct. The current dump method is not designed to dump user-visible state. The goal is to dump how hash/key is stored in trie, and it doesn't even dump any user stored value. To dump user visible state from DenseMap, it needs to support iteration first but HashMappedTrie currently doesn't support iteration (and the CAS/ObjectStore it implements cannot have iteration for security reason). The only useful dump method can be implemented without iteration is what it has now. I am ok to remove dump method from this patch since I already rewrite the tests without using dump. The string dump for the trie structure can be useful to debug full CAS in the future. Since CAS cannot have iteration, a separate tool that can dump the entire trie structure can be helpful to look inside CAS. In that case, dump is a public API that needs to be used by this CAS inspection tool, not a private method that expose to test only.

Can the unit test be a friend to the data structure and use the data structure's internal APIs?

It is already the case now. I find a way to write unit-test with no string comparison and extract data from the trie without supporting real iteration. Supporting full iteration is not trivial since it will either involve a very complicated iterator or adding back edge in the trie or both. Currently, it has a very non-trivial way to iterate the subtrie in an order that is not really interesting to users, and the way it finds the hash prefix is quite expensive too. Thus, this is only useful for unit-tests and they are using private APIs that are friends to test only. A real iteration needs to do a tree walk from the root and store information it acquires along the way (like dump method).

I'd really like to avoid "trie" in the name, though I guess it's as much an implementation detail as "Dense" is in DenseMap, so might be deserving to be in the name.

I think this is a map, so the name should probably end in 'map' not in 'trie'. I'd guess TrieHashMap - it's a type of hash map that uses a trie? Maybe make a shortlist of names to consider, etc?

I don't have candidates. TrieHashMap sounds good to me.

Ping. All feedbacks are addressed.

Additional notes: I dig a bit deeper into the benchmark from https://reviews.llvm.org/D132548 as it shows bad scaling during parallel insertion tests (stop scaling to 8 threads). That is actually caused by our ThreadSafeBumpPtrAllocator that currently takes a lock. We can improve it in future since our use case doesn't expect heavy parallel insertions. However, it is quite obvious we should tune to RootBits and SubTrieBits. Increasing RootBits can significantly decrease the contention. A better strategy might be something like starting with something like 10 bits, then 4 bits, 2 bits and 1 bit. Shrinking number of bits can lead to better memory usage since the slots usage in the deep nodes are very low.

Could ThreadSafeBumpPtrAllocator could be made lock-free? I think at least it would be possible to implement one that only locked when a new allocation is needed, instead of every time the ptr is bumped as now. (I’ll think about it a bit.)

Note that in the CAS use case it’s ideally true that most insertions are duplicates and don’t need to call the allocator at all. This is why we’ve been able to get away with a lock on each allocation.

Yes, the FIXME for a ThreadSafeBumpPtrAllocator is still there. Currently, I don't think it is urgent to fix. It is not expected to have someone to use Trie as a high performance thread safe set/map.

dexonsmith added a comment.EditedJan 5 2023, 2:45 PM

Ping. All feedbacks are addressed.

Additional notes: I dig a bit deeper into the benchmark from https://reviews.llvm.org/D132548 as it shows bad scaling during parallel insertion tests (stop scaling to 8 threads). That is actually caused by our ThreadSafeBumpPtrAllocator that currently takes a lock. We can improve it in future since our use case doesn't expect heavy parallel insertions. However, it is quite obvious we should tune to RootBits and SubTrieBits. Increasing RootBits can significantly decrease the contention. A better strategy might be something like starting with something like 10 bits, then 4 bits, 2 bits and 1 bit. Shrinking number of bits can lead to better memory usage since the slots usage in the deep nodes are very low.

Could ThreadSafeBumpPtrAllocator could be made lock-free? I think at least it would be possible to implement one that only locked when a new allocation is needed, instead of every time the ptr is bumped as now. (I’ll think about it a bit.)

Note that in the CAS use case it’s ideally true that most insertions are duplicates and don’t need to call the allocator at all. This is why we’ve been able to get away with a lock on each allocation.

Here's a sketch that I think mostly works?

// Block for bump allocations.
struct BumpBlock {
  std::atomic<char *> Ptr;
  std::atomic<BumpBlock *> Next;
  char Bytes[4084];
  BumpBlock() : Ptr{Bytes}, Next{nullptr} {}

  // Compute new "Next", try to bump if there's space, else return nullptr.
  void *tryAllocate(size_t N, size_t Align = 1);
};

// Tail-allocated data for "big" allocations.
struct BumpSeparate {
  std::atomic<BumpSeparate *> Next;
  // tail-allocated data of appropriate and alignment, using malloc....
};

// Allocator.
struct BumpAllocator {
  std::atomic<BumpBlock *> CurrentBlock;
  std::atomic<BumpSeparate *> LastAllocSeparately;

  // Delete everything since there's no ownership here...
  ~BumpAllocator();

  void *allocate(size_t N, size_t Align = 1) {
    if (N > 2048)
      return allocateSeparately(N, Align);

    BumpBlock *B = CurrentBlock;
    std::unique_ptr<BumpBlock> New;
    void *NewAlloc = nullptr;
    while (true) {
      if (LLVM_LIKELY(B))
        if (void *Alloc = B->tryAllocate(N, Align))
          return Alloc;

      if (!New) {
        New = new BumpBlock;
        NewAlloc = New->tryAllocate(N, Align);
        assert(NewAlloc && "Empty block doesn't have space!");
      }
      if (!CurrentBlock.compare_exchange_weak(B, New.get()))
        continue;

      // New was saved in CurrentBlock. Fix its "Next" pointer and release it
      // so it's not deallocated.
      New->Next = B;
      New.release();
      return NewAlloc;
    }
  }

private:
  // Maintain a list of "big" allocations, similar to above.
  void *allocateSeparately(size_t N, size_t Align);
};

Not saying it needs to block this review...

But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.

Yes, the FIXME for a ThreadSafeBumpPtrAllocator is still there. Currently, I don't think it is urgent to fix. It is not expected to have someone to use Trie as a high performance thread safe set/map.

The immediate use case is as a high performance thread-safe data store. In the CAS use case we're expecting a lot of duplicate insertions which don't happen to hit the allocator, but I'm not sure that's really been checked/measured.

The ability to dump the data structure I'm all for - like DenseMap, etc, but I think that dumping should be the user-visible state (it's a map, the trie part is an implementation detail that ideally shouldn't be exposed in the dump developers use regularly - presumably it's not necessary except when debugging the data structure itself, which should be rare (we don't usually need to look at DenseMap's internal bucketing, for instance)).

Correct. The current dump method is not designed to dump user-visible state. The goal is to dump how hash/key is stored in trie, and it doesn't even dump any user stored value. To dump user visible state from DenseMap, it needs to support iteration first but HashMappedTrie currently doesn't support iteration (and the CAS/ObjectStore it implements cannot have iteration for security reason).

Not sure I follow why it'd need to support iteration to support dumping of user-centric state. Supporting iteration (using an iterator abstraction) is a fair bit more complicated than walking the data structure directly to dump out its contents, I'd think.

The only useful dump method can be implemented without iteration is what it has now. I am ok to remove dump method from this patch since I already rewrite the tests without using dump. The string dump for the trie structure can be useful to debug full CAS in the future. Since CAS cannot have iteration, a separate tool that can dump the entire trie structure can be helpful to look inside CAS. In that case, dump is a public API that needs to be used by this CAS inspection tool, not a private method that expose to test only.

Yeah, I'm generally OK with a public dump method that gives a user-centric result, like DenseMap's dump... oh, my mistake, DenseMap doesn't have a dump method. I guess we rely on debugger pretty printers for that (& as the person who wrote the gdb one at least, it's not great - because it's hard to walk the user-visible state correctly/skipping internal implementation details - so it does end up printing the buckets, etc).

But, sure, doesn't need to be in this patch - I guess if we don't have it for other data structures, seems fair/reasonable/OK that we don't have it for this one.

Can the unit test be a friend to the data structure and use the data structure's internal APIs?

It is already the case now. I find a way to write unit-test with no string comparison and extract data from the trie without supporting real iteration. Supporting full iteration is not trivial since it will either involve a very complicated iterator or adding back edge in the trie or both. Currently, it has a very non-trivial way to iterate the subtrie in an order that is not really interesting to users, and the way it finds the hash prefix is quite expensive too. Thus, this is only useful for unit-tests and they are using private APIs that are friends to test only. A real iteration needs to do a tree walk from the root and store information it acquires along the way (like dump method).

Ah, OK, great.

I'd really like to avoid "trie" in the name, though I guess it's as much an implementation detail as "Dense" is in DenseMap, so might be deserving to be in the name.

I think this is a map, so the name should probably end in 'map' not in 'trie'. I'd guess TrieHashMap - it's a type of hash map that uses a trie? Maybe make a shortlist of names to consider, etc?

I don't have candidates. TrieHashMap sounds good to me.

Sounds good to me, then.

llvm/unittests/ADT/HashMappedTrieTest.cpp
19

Rather than having to indirect everything through this class helper - I think it's possible to name the class of the test fixture itself, and then you could friend that, so the test fixture would have direct access? Might be simpler that way.

avl added a subscriber: avl.Jan 6 2023, 6:28 AM

But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.

+1 for fast concurrent ThreadSafeBumpPtrAllocator.

What do you think about following alternative implementation?

class ThreadSafeBumpPtrAllocator {
  ThreadSafeBumpPtrAllocator() {
    size_t ThreadsNum = ThreadPoolStrategy.compute_thread_count();
    allocators.resize(ThreadsNum);
  }
  
  void* Allocate (size_t Num) {
      size_t AllocatorIdx = getThreadIdx();
      
      return allocators[AllocatorIdx].Allocate(Num);
  }

  std::vector<BumpPtrAllocator> allocators;
};

static thread_local ThreadIdx;

size_t getThreadIdx() {
  return ThreadIdx;
}

This implementation uses the fact that ThreadPoolExecutor creates a fixed number
of threads(ThreadPoolStrategy.compute_thread_count()) and keeps them until destructed
. ThreadPoolExecutor can initialise thread local field ThreadIdx to the proper thread index.
The getThreadIdx() could return index of thread inside ThreadPoolExecutor.Threads.
ThreadSafeBumpPtrAllocator keeps separate allocator for each thread. In this case each thread would
always use separate allocator. No neccessary to have locks, cas operations, no races...

avl added a comment.Jan 6 2023, 6:45 AM

@steven_wu To have some interface compatibility and to make it possible to use HashMappedTrie for use case from D96035 probably,
Could it be possible to add boolean value to the result indicating whether data has just inserted?

std::pair<pointer, bool> insert(const_pointer Hint, value_type &&HashedData)

The use case for this:

std::pair<pointer, bool> Result = HashMappedTrie.insert(Data);
if (Result.second) {
  // initialize Result.first data
  Result.first->field = data;
}

But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.

+1 for fast concurrent ThreadSafeBumpPtrAllocator.

What do you think about following alternative implementation?

class ThreadSafeBumpPtrAllocator {
  ThreadSafeBumpPtrAllocator() {
    size_t ThreadsNum = ThreadPoolStrategy.compute_thread_count();
    allocators.resize(ThreadsNum);
  }
  
  void* Allocate (size_t Num) {
      size_t AllocatorIdx = getThreadIdx();
      
      return allocators[AllocatorIdx].Allocate(Num);
  }

  std::vector<BumpPtrAllocator> allocators;
};

static thread_local ThreadIdx;

size_t getThreadIdx() {
  return ThreadIdx;
}

This implementation uses the fact that ThreadPoolExecutor creates a fixed number
of threads(ThreadPoolStrategy.compute_thread_count()) and keeps them until destructed
. ThreadPoolExecutor can initialise thread local field ThreadIdx to the proper thread index.
The getThreadIdx() could return index of thread inside ThreadPoolExecutor.Threads.
ThreadSafeBumpPtrAllocator keeps separate allocator for each thread. In this case each thread would
always use separate allocator. No neccessary to have locks, cas operations, no races...

Let's move the discussion of ThreadSafeAllocator to https://reviews.llvm.org/D133713 since this patch just uses it and the implementation is over there.

The background of this data structure is to use by a CAS, so it is ideal that the CAS doesn't need to lock to the amount of threads that is going to be spawned or rely on the thread id.
You can have a thread local allocator that allocates the value to be stored, you just need to do the allocation in insertLazy with your own allocator and manage its life time.

But having a fast concurrent BumpPtrAllocator would be independently useful, and I'd suggest optimizing the allocator before bloating the default trie size.

+1 for fast concurrent ThreadSafeBumpPtrAllocator.

What do you think about following alternative implementation?

class ThreadSafeBumpPtrAllocator {
  ThreadSafeBumpPtrAllocator() {
    size_t ThreadsNum = ThreadPoolStrategy.compute_thread_count();
    allocators.resize(ThreadsNum);
  }
  
  void* Allocate (size_t Num) {
      size_t AllocatorIdx = getThreadIdx();
      
      return allocators[AllocatorIdx].Allocate(Num);
  }

  std::vector<BumpPtrAllocator> allocators;
};

static thread_local ThreadIdx;

size_t getThreadIdx() {
  return ThreadIdx;
}

This implementation uses the fact that ThreadPoolExecutor creates a fixed number
of threads(ThreadPoolStrategy.compute_thread_count()) and keeps them until destructed
. ThreadPoolExecutor can initialise thread local field ThreadIdx to the proper thread index.
The getThreadIdx() could return index of thread inside ThreadPoolExecutor.Threads.
ThreadSafeBumpPtrAllocator keeps separate allocator for each thread. In this case each thread would
always use separate allocator. No neccessary to have locks, cas operations, no races...

+1; seems worth experimenting with (downside is you have at least as many allocations as active threads, but maybe that’s fine); IIRC C++11 thread-local initialization is slow on Darwin so we might want __thread here, or maybe the static makes it fast; +1 to Steven’s comment this should move to the other review (also IMO this could be an incremental improvement that lands later).

@dexonsmith @benlangmuir @akyrtzi
Does TrieHashMap sound good to you?

Locally, I updated to use std::pair as return value to make it more map like, and remove the dump method. I will update the patch once we have the name we agreed on.

I guess one concern with TrieHashMap is that if this is the lower level implementation, and someone might implement a more map-like API on top of this, we might not want to take the "better" name for the data structure that'll be less directly used?

Could prefix with "Raw" or maybe TrieRawHashMap? (since it's the hashing part that's particularly "raw" - relying on the hash being unique, etc)

I guess one concern with TrieHashMap is that if this is the lower level implementation, and someone might implement a more map-like API on top of this, we might not want to take the "better" name for the data structure that'll be less directly used?

Could prefix with "Raw" or maybe TrieRawHashMap? (since it's the hashing part that's particularly "raw" - relying on the hash being unique, etc)

Either of those “raw” names SGTM.

Maybe RawHashTrieMap? It reads better when Raw is in the front, and it contains hash-trie and trie-map, which are both terms describing data structures similar to this but this is much simpler, thus raw.

llvm/unittests/ADT/HashMappedTrieTest.cpp
19

I didn't see this comment.

Can you elaborate more how this work? Since friend doesn't inherit and TEST_F are subclasses of the test fixture, the best I can think is to create forwarding in the test fixture, then it is not that much different from what it is now.

Maybe RawHashTrieMap? It reads better when Raw is in the front, and it contains hash-trie and trie-map, which are both terms describing data structures similar to this but this is much simpler, thus raw.

I think from a client perspective, this is some sort of variant of a HashMap, so ending with that makes sense to me. It’s an implementation detail that it’s a trie so that seems better in the prefix.

dexonsmith added inline comments.Jan 6 2023, 6:06 PM
llvm/unittests/ADT/HashMappedTrieTest.cpp
19

I think:

namespace llvm::testing {
class HashMappedTrieTest;
}

namespace llvm {
class HashMappedTrie {
  friend class HashMappedTrieTest;
}
}

And then use the llvm::testing namespace to implement the test.

steven_wu updated this revision to Diff 487489.Jan 9 2023, 9:55 AM

Rename to TrieRawHashMap.

Update tests to use test fixture.

steven_wu retitled this revision from [ADT] Add HashMappedTrie to [ADT] Add TrieRawHashMap.Jan 9 2023, 9:56 AM
steven_wu edited the summary of this revision. (Show Details)
dblaikie added inline comments.Jan 10 2023, 12:32 PM
llvm/unittests/ADT/HashMappedTrieTest.cpp
19

https://stackoverflow.com/questions/2396370/how-to-make-google-test-classes-friends-with-my-classes seems to discuss how to friend a fixture directly, though may require gtest support/inclusion in the header.

llvm/unittests/ADT/TrieRawHashMapTest.cpp
293 ↗(On Diff #487489)

This seems like a fair bit of text to wrap the trie - what's the value/extra test coverage this is providingg? (sorry I'm not following too clearly) I'd have hoped/expected this API could be tested more directly, without needing this wrapper and/or without needing to involve something as non-trivial as std::string (instead using simple user defined types in the test)?

steven_wu added inline comments.Jan 10 2023, 2:11 PM
llvm/unittests/ADT/HashMappedTrieTest.cpp
19

I am not sure about adding friend class in library header that references tests (and for each tests added).

What do you think about the current implementation? There is an extra forwarding in the test helper but not too bad.

llvm/unittests/ADT/TrieRawHashMapTest.cpp
293 ↗(On Diff #487489)

This a simplified version of the original test from the original patch. The original patch implements a generic set data structure in this test file and tests how it trie can be used like a set. Now it is simplified to be a stringset with a fake hash algorithm because there is no current value to put TrieSet in ADT.

dblaikie added inline comments.Jan 10 2023, 4:28 PM
llvm/unittests/ADT/HashMappedTrieTest.cpp
19

It'd only be for the one test fixture here (HashMappedTrieTest), I think. Feels like it'd be nice to avoid the indirection through the HashMappedTrieTestHelper.

llvm/unittests/ADT/TrieRawHashMapTest.cpp
293 ↗(On Diff #487489)

What extra test coverage does it provide? How bad/what would be the tradeoff of testing that functionality more directly without the mock?

steven_wu added inline comments.Jan 13 2023, 10:00 AM
llvm/unittests/ADT/HashMappedTrieTest.cpp
19

Sorry I still don't quite get it. Current TrieRawHashMapTestHelper is the only test fixture, but I have to put the indirection in there unless I want to friend every single tests (currently 3) in llvm/ADT/TrieRawHashMap.h using FRIEND_TEST. Am I missing something?

llvm/unittests/ADT/TrieRawHashMapTest.cpp
293 ↗(On Diff #487489)

The extra test coverage it provides is the ability to store a non-POD type data into the Trie. where insertLazy is tested for lazy construction. I guess we can also tested it with the naive uint64_t.

The alternative is might be just put ThreadSafeHashMappedTrieSet from the original patch into a header in ADT but there isn't a use case for that yet. So maybe testing insertLazy with uint64_t will make this cleaner.

Update unittest:

  • Simplify the test of StringSet by explicitly calling insertLazy on TrieHashMap.
  • Unify all the tests under the same template test fixture. Since we create this indirection, might as well use it to the full potential.
steven_wu updated this revision to Diff 489861.Jan 17 2023, 9:40 AM

Update after makeArrayRef deprecation