This is an archive of the discontinued LLVM Phabricator instance.

Fix two sources of UB in __next_hash_pow2 (from __hash_table)
ClosedPublic

Authored by vsk on May 26 2017, 2:14 AM.

Details

Summary

Fix two sources of UB in next_hash_pow2 (from hash_table)

There are two sources of undefined behavior in next_hash_pow2: an
invalid shift (occurs when
n is 0 or 1), and an invalid call to CLZ
(occurs when __n is 1).

This patch corrects both issues. It's NFC for all values of n which do
not trigger UB, and leads to defined results when
n is 0 or 1.

Minimal reproducer:

unordered_map<uint64_t, unsigned long> m;
m.reserve(4);
m.reserve(1);

rdar://problem/32407328

Diff Detail

Event Timeline

vsk created this revision.May 26 2017, 2:14 AM
vsk edited the summary of this revision. (Show Details)May 26 2017, 2:18 AM
mclow.lists edited edge metadata.May 26 2017, 8:20 AM

I can reproduce this, but I'd rather figure out why we're calling __next_hash_pow2(0) or (1) before deciding how to fix it.

vsk added a comment.May 26 2017, 9:26 AM

I can reproduce this, but I'd rather figure out why we're calling __next_hash_pow2(0) or (1) before deciding how to fix it.

It looks like we hit the UB while attempting to shrink a hash table during a rehash. If the current bucket count is a power of two, we try and find a smaller bucket count (also a power of two) large enough to accommodate all entries in the table.

The argument passed in to next_hash_pow2 from hash_table::rehash is __n = size_t(ceil(float(size()) / max_load_factor())). I think __n = 0 if the table is empty. And __n = 1 when the maximum load factor is (roughly) equal to the table's size.

As an alternate fix, it might be worth considering changing the rehashing algorithm. But I'd like to start with a more conservative fix for the UB issue, at least initially.

EricWF edited edge metadata.May 26 2017, 10:22 AM

@vsk: I would include your fuzzing test in this patch. Simply put it somewhere under test/libcxx/containers/unordered.

vsk updated this revision to Diff 100461.May 26 2017, 1:20 PM
vsk edited the summary of this revision. (Show Details)

Thanks @EricWF for pointing me to the right place to add a test. I've tried to follow the style used by the existing tests. PTAL.

EricWF added inline comments.May 31 2017, 5:59 PM
include/__hash_table
139

Shouldn't this return __n + 1 when __n <= 1, or even 2 in both cases?

vsk added inline comments.Jun 1 2017, 10:27 AM
include/__hash_table
139

I thought "next_hash_pow2(n)" meant "a hash based on n or the first power-of-two GTE n", but I suppose it's actually "first power-of-two GTE than n, for hash tables". In this case, returning 1 when __n <= 1 makes the most sense to me, since it's the first power of two GTE 0 and 1. What do you think?

mclow.lists accepted this revision.Jun 2 2017, 5:09 PM
mclow.lists added inline comments.
include/__hash_table
140

I turned the condition around, and commtted this as r304617:

return __n < 2 ? __n : (size_t(1) << (std::numeric_limits<size_t>::digits - __clz(__n-1)));
This revision is now accepted and ready to land.Jun 2 2017, 5:09 PM
mclow.lists closed this revision.Jun 7 2017, 9:31 AM