This is an archive of the discontinued LLVM Phabricator instance.

Adding reserve and capacity methods to FoldingSet
ClosedPublic

Authored by bcraig on Jun 2 2016, 2:31 PM.

Details

Summary

When a hash table becomes too heavily loaded, it needs to grow and rebucket everything. This is very expensive for large tables, as it completely trashes the cache. Rehashing everything isn't cheap either.

This patch makes it possible to reserve space in FoldingSet, much as one would do with a std::vector. New unit tests were also added.

The clang static analyzer will likely be the first client of this change. Reserving nodes up front yields a 6.4% analysis speedup on a large .C file.

Diff Detail

Event Timeline

bcraig updated this revision to Diff 59456.Jun 2 2016, 2:31 PM
bcraig retitled this revision from to Adding reserve and capacity methods to FoldingSet.
bcraig updated this object.
bcraig added reviewers: bkramer, chandlerc, mehdi_amini.
bcraig added a subscriber: llvm-commits.
bcraig added a comment.Jun 2 2016, 2:43 PM

Related clang static analyzer review: http://reviews.llvm.org/D20933

bkramer added inline comments.Jun 3 2016, 3:36 AM
lib/Support/FoldingSet.cpp
304

So the capacity() is NumBuckets*2 and we're growing to that capacity? I'm confused. Either one of those interfaces is wrong or we're just missing docs.

bcraig added inline comments.Jun 3 2016, 6:24 AM
lib/Support/FoldingSet.cpp
304

GrowHashTable() is private. It has historically been called without a parameter to double the size of the hash table.
GrowHashTable(unsigned NewBucketCount) is also private. It operates in terms of bucket count.

reserve and capacity are public. They operate in terms of elements.

FoldingSet allows for twice as many elements as buckets, both before and after this patch.

This means that the math for doubling the size of the hash table (NumBuckets * 2) is the same math required to convert from buckets to element capacity (NumBuckets * 2).

bkramer accepted this revision.Jun 3 2016, 6:27 AM
bkramer edited edge metadata.

I see. So capacity() is already taking the chaining into account, while GrowHashTable does not. That makes sense. It would be nice if we had a comment describing this behavior somewhere, but this patch is fine either way.

This revision is now accepted and ready to land.Jun 3 2016, 6:27 AM
bcraig updated this revision to Diff 59544.Jun 3 2016, 6:36 AM
bcraig edited edge metadata.

Changed new GrowHashTable overload name to a non-overloaded GrowBucketCount.
Added a comment about load factor to capacity.

bcraig closed this revision.Jun 3 2016, 7:04 AM

r271669

Ben,

It would be a bit easier to review if you upload the diff with full context.

The patch LGTM.

include/llvm/ADT/FoldingSet.h
184

Please, add a comment describing the capacity() API.

bcraig added inline comments.Jun 6 2016, 11:14 AM
include/llvm/ADT/FoldingSet.h
184

reserve and capacity documentation added in r271694.