Utilize atomics to update linked lists without requiring locks. Reduces the number of calls to compute the hash by not having to compute it once to determine the bucket and then again inside a separate hashtable implementation. Use xxhash for better performance, especially with longer strings. Eliminates calls to hash when just updating the value by being able to use atomics instead of locks.
Note, I don't expect you guys to want this as is, since the # of buckets can't grow, and thus for very large applications this will behave O(n) instead of O(1). Before I bother to convert this to 1M skip lists (instead of 1M singly linked lists), is this something you're at all interested in? It brings execution time of my test down from ~4.5s to ~3.5s (lldb -b -o 'b main' -o 'run' my_test_program).
Do you have performance numbers here? I am all for it if it improves performance. If this is faster than llvm::StringMap why not just fix it there? Seems like everyone could benefit if you fix it in LLVM?
Sorry missed your first note about perf. This has to be able to handle any number of strings efficiently and growing buckets is probably needed. How many buckets are you starting with here? The design probably shouldn't take up gobs of memory if it just has a few strings as well.
Do we really need to use a homegrown hash table here? Can we not leverage std::unordered_map or some other hash table in LLVM here?
A proper lockfree hashtable that allows deletions is complicated and slow for simple cases, IME.
ConstString is a special case, because
- It's big
- It's insert-only
- It's highly concurrent
I think a proper lock free SkipList would be a better fit for LLVM though, and this code could then build upon it, which would solve it's existing scalability issue.
Truth is this isn't a huge deal (ignoring the HashString->xxHash change and assuming tcmalloc, it's saving 1+ seconds, out of 4.5-5ish). It might lead itself to a custom allocator, though, which may make linking with tcmalloc unnecessary (something like BumpPtrAllocator, but threadsafe, or maybe just thread local). Again, a trick you can get away with since this is insert-only.
So to break it down:
- the HashString->xxHash change could be made separately if there was a version of StringMap that took the hash as a parameter, rather than running the hash function itself. That would reduce the # of calls from 2 to 1, and would allow the use of a function that is better suited to lldb (I ran into resistance changing llvm::HashString since clang depends on it, and I couldn't prove it wouldn't hurt.
- 64-bit hashes for fewer collisions - StringMap uses 32-bit hashes, which makes sense for smaller hash tables. This may or may not matter much.
- no lock for reading or updating value - GetMangledCounterpart, SetMangledCounterparts, and half of GetConstCStringAndSetMangledCountrpart no longer compute a hash, and don't need a lock. You can actually do this with StringMap by using a struct as the value side that has std::atomic in it
- much finer granularity - I tried increasing the # of buckets in the old ConstString but could never improve performance (though that was without tcmalloc, maybe it'd work now)
- lockfree insertion
and possibly in the future:
- BumpPtrAllocator for struct Entry - maybe eliminate the need for tcmalloc (well, assuming the demangler used an arena allocator, something else I gave up rather than trying to convince libcxxabi to change, then having to port that back to llvm).
I haven't measured how much each of these changes matter. This change is small enough and in a file that changes infrequently, so we can keep this as a private patch if we have to.
I have feeling you gave up on the llvm change too quickly. My interpretation of that thread was that there was general support for the hash function switch, and people only wanted some confirmation it will not regress.
However, I do believe that this can be made faster than any solution based on a generic string map (but I don't like hardcoding bucket numbers either). For a change of this depth, I also think it would be fair to ask for some ConstString unit tests, so we have something we can run thread sanitizer on, instead of tracking down obscure debugger failures.
s/StringPoolEntryType/Pool::Entry/ instead of the typedef?