This is an archive of the discontinued LLVM Phabricator instance.

[esan] Add generic resizing hashtable
ClosedPublic

Authored by bruening on Jul 22 2016, 8:19 AM.

Details

Summary

Adds a new, generic, resizing hashtable data structure for use by esan
tools. No existing sanitizer hashtable is suitable for the use case for
most esan tools: we need non-fixed-size tables, parameterized keys and
payloads, and write access to payloads. The new hashtable uses either
simple internal or external mutex locking and supports pointer types using
overloads to de-reference and allow for custom hash and comparision
operators. The focus is on functionality, not performance, to catalyze
creation of a variety of tools. We can optimize the more successful tools
later.

Adds a test of the data structure.

Diff Detail

Repository
rL LLVM

Event Timeline

bruening updated this revision to Diff 65080.Jul 22 2016, 8:19 AM
bruening retitled this revision from to [esan] Add generic resizing hashtable.
bruening updated this object.
bruening added a reviewer: aizatsky.
bruening added subscribers: llvm-commits, eugenis, kcc and 2 others.
bruening removed a subscriber: zhaoqin.
aizatsky requested changes to this revision.Jul 27 2016, 2:30 PM
aizatsky edited edge metadata.
aizatsky added inline comments.
lib/esan/esan_hashtable.h
24 ↗(On Diff #65080)

Key pointer types...

24 ↗(On Diff #65080)

I actually wonder if this is a functionality you want to build in rather then provide through type traits.

Like this:

template <typename KeyTy, typename DataTy, typename KeyTraits = DefaultTraits<KeyTy>>

And later something like:

KeyTraits::hash(key)
KeyTraits::eq(key1, key2).

42 ↗(On Diff #65080)

any particular reason why not

DataTy* lookup(KeyTy Key) ?

Seems to me it would always save a declaration line for clients.

42 ↗(On Diff #65080)

const

45 ↗(On Diff #65080)

const

66 ↗(On Diff #65080)

const this and other non-changed fields.

This revision now requires changes to proceed.Jul 27 2016, 2:30 PM
bruening marked 2 inline comments as done.Aug 1 2016, 12:22 PM
bruening added inline comments.
lib/esan/esan_hashtable.h
24 ↗(On Diff #65080)

Could you elaborate on what the pros and cons of your proposal are? I assume you mean that we define DefaultTraits to invoke operator== and operator size_t() and the point here is to let the user separate hashing from casting and key equality from some other equality -- though the existing constructor takes in an optional hash function for that purpose, and I'm not sure equality would need to be separated.

42 ↗(On Diff #65080)

any particular reason why not
DataTy* lookup(KeyTy Key) ?

Returning a pointer requires external synchronization. An instantiation with DataTy==int will want an int to be returned by value, and there's no sentinel we can use to indicate failure.

42 ↗(On Diff #65080)

const

I ended up removing my initial const because of the potential lock acquisition: the mutex will change state. I don't think LLVM supports the C++11 mutable keyword?

45 ↗(On Diff #65080)

Same thing here. I can at least put a comment about it.

aizatsky added inline comments.Aug 1 2016, 12:58 PM
lib/esan/esan_hashtable.h
24 ↗(On Diff #65080)

Pros:

  • makes HashTable simpler
  • moves type-based decision into a much-much simpler class
  • type-based decision can be made explicit. Consider:

    HashTable<int*, int> vs HashTable<int*, int, DerefPointers>

I would also suggest to move HashFunc to the trait as well. This would improve performance of the hashing.

The only const that I see is that you need to define a default trait struct. But that looks more like a pro to me :)

28 ↗(On Diff #65080)

Again, I don't like that there's a conditional branch throughout the class based on the field.

The most logical way to me would be to define HashTable without any synchronization (i.e. ExternalLock = true), and create a SynchronizedHashTable : public HashTable with corresponding overrides.

bruening added inline comments.Aug 2 2016, 12:15 PM
lib/esan/esan_hashtable.h
24 ↗(On Diff #65080)

I agree that the table would be improved to move the hash function and comparison function into template parameters, but I'd prefer to not combine them, which forces a user who wants to set just one to define both.

I still like a default that de-references pointers, but after removing the payload-freeing function parameter and relying on the user using a smart pointer to free, and considering that in C++ the payload is easily copied into the hashtable entry structure via templates (unlike C where indirection via a pointer into the heap is the typical solution), which means that the payload is less likely to be a pointer, and references means that the key is also less likely to be a pointer (yes still more of a C programmer than a C++ programmer), I'm fine removing it.

28 ↗(On Diff #65080)

The problem is that inheritance means we pay virtual dispatch costs, while a container (i.e., SynchronizedHashTable contains a HashTable inside it) just means a lot of duplicated boilerplate to wrap everything. I instead made the ExternalLock a template parameter and thus constant at compile time, which eliminates the conditional branches.

bruening updated this revision to Diff 66520.Aug 2 2016, 12:17 PM
bruening edited edge metadata.

Makes the table more like a classic C++ container:
+ Moves the hash function pointer to be a functor template parameter
+ Adds a functor template parameter for the comparision function
+ Moves ExternalLock to be a template paramter to eliminate conditional branches
+ Eliminates the default pointer deref
+ Removes the payload-freeing function pointer in favor of calling the destructor

aizatsky accepted this revision.Aug 5 2016, 12:45 PM
aizatsky edited edge metadata.
This revision is now accepted and ready to land.Aug 5 2016, 12:45 PM
This revision was automatically updated to reflect the committed changes.