This is an archive of the discontinued LLVM Phabricator instance.

sanitizer_common: add HashMap class
AbandonedPublic

Authored by dvyukov on Nov 16 2021, 6:56 AM.

Details

Reviewers
melver
vitalybuka
Summary

HashMap is a single-threaded, dynamically-resized hashmap
with open addressing and linear probing.
Will be used in subsequent commits.

Depends on D113922.

Diff Detail

Event Timeline

dvyukov created this revision.Nov 16 2021, 6:56 AM
dvyukov requested review of this revision.Nov 16 2021, 6:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2021, 6:56 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
melver added inline comments.Nov 16 2021, 8:03 AM
compiler-rt/lib/sanitizer_common/sanitizer_hashmap.h
104

sanitizer_common.h has IsPowerOfTwo() which is equivalent

compiler-rt/lib/sanitizer_common/tests/sanitizer_hashmap_test.cpp
70

For new datastructures like this I find a random test that does differential fuzzing of sorts is quite helpful.

Something like:

RandOp() // returns random op
RandKey() // returns random key in range [A;B]
RandVal() // returns random val in range [X;Y]

std::unordered_map<int, long> reference; // assumes it's correct
HashMap<int, long> under_test;

for (int iter = 0; iter < 10000; iter++) {
  switch(RandOp()) {
    case kInsert: {
      int k = RandKey();
      int v = RandVal();
      under_test.Insert(k, v);
      reference.emplace_back(k, v);
    }
    case kRemove: // check that removal matches
    case kFind: // check that found values match
  }
  // check that various properties match, in this case probably just that size matches.
}
...
melver added inline comments.Nov 16 2021, 8:20 AM
compiler-rt/lib/sanitizer_common/sanitizer_hashmap.h
39

I think this pre-condition is excessive and might actually cause subtle mistakes by callers that don't expect this or don't care.

Right now I see you use DCHECK() everywhere to check that the key isn't already present, but then Insert does another CHECK.

I think a cleaner API would be to mark this function [[nodiscard]] (do we already have a macro?) and have it return a bool if insertion succeeded or not. That way the decision if the key should not be present is pushed to the caller, for who it might matter or not.

But there's no fundamental limitation to why Insert can't be called with the key already present. I'd make the behaviour similar to unordered_map::emplace().

dvyukov updated this revision to Diff 387676.Nov 16 2021, 9:19 AM
dvyukov marked 2 inline comments as done.

return bool from Insert
add random test

dvyukov marked an inline comment as done.Nov 16 2021, 9:20 AM

PTAL

melver accepted this revision.Nov 16 2021, 9:29 AM

Nice.

This revision is now accepted and ready to land.Nov 16 2021, 9:29 AM

How about I upload patch with DenseMap forked from LLVM? it needs just a few changes for compiler-rt?

How about I upload patch with DenseMap forked from LLVM? it needs just a few changes for compiler-rt?

Hard to say, DenseMap*.h is 1000+ LOC... what runtime difference does it make?
Can it be made to use a single large mmap? I think that' the only solution that will work well for sanitizer runtimes (doing lots of small allocations will require passing in allocator cache, and increasing in size allocations won't work well because we don't reuse memory between size classes, so it will be quadratic memory w/o mmap).
Overall I am happy to use anything that allows me to write Map<K,V>.

How about I upload patch with DenseMap forked from LLVM? it needs just a few changes for compiler-rt?

Hard to say, DenseMap*.h is 1000+ LOC... what runtime difference does it make?

few changes (without counting trowing away most of the stuff, like iterators and small set)
It's not about runtime, I already did that for my current work, in local checkout.

Can it be made to use a single large mmap? I think that' the only solution that will work well for sanitizer runtimes (doing lots of small allocations will require passing in allocator cache, and increasing in size allocations won't work well because we don't reuse memory between size classes, so it will be quadratic memory w/o mmap).

Yes. That's how I use it.

Overall I am happy to use anything that allows me to write Map<K,V>.

The same for me. I guess I can prefectly use HashMap as well. Quadratic probing is nice but is not requirement for me.
I'll cleanup my stuff and upload. I did that patch about a month ago, maybe it looks worse than I remember.

compiler-rt/lib/sanitizer_common/sanitizer_hashmap.h
25

constexpr constructor would be nice, but I can fix it later for my needs.

39

+1 it would be nice to have more conventional interface where it returns <KV*, bool new_inserted>

40

const K &k, const V &v and below?

53

this one a little bit scary.
presence of a single cluster with 8 elements will trigger grow. Then Grow will redistributed them into precisely two hashes with high prob to create another large cluster, trigger another Grow, making load factor of the table quite low.

Can be fixed in followup patches if needed.

74

can you use sentinel values here instead bool flag which will add padding?

vitalybuka resigned from this revision.Jan 6 2022, 4:08 PM

I amuse this can be abandoned?

dvyukov abandoned this revision.Jan 7 2022, 5:42 AM