This is an archive of the discontinued LLVM Phabricator instance.

Differential D33588

Fix two sources of UB in __next_hash_pow2 (from __hash_table)
Closed, Public

Authored by vsk on May 26 2017, 2:14 AM.

Details

Reviewers

mclow.lists
EricWF

Summary

Fix two sources of UB in __next_hash_pow2 (from __hash_table)

There are two sources of undefined behavior in __next_hash_pow2: an
invalid shift (occurs when __n is 0 or 1), and an invalid call to CLZ
(occurs when __n is 1).

This patch corrects both issues. It is NFC (no functional change) for all
values of __n which do not trigger UB, and leads to defined results when
__n is 0 or 1.
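
To make the two failure modes concrete, here is a minimal sketch of the guarded expression outside of libc++. The names clz_checked and next_pow2_guarded are hypothetical stand-ins (they are not libc++ functions), the code assumes a GCC- or Clang-compatible compiler for __builtin_clzll, and it works on unsigned long long rather than size_t.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for libc++'s internal __clz helper.
// __builtin_clzll (a GCC/Clang builtin) has an undefined result for 0,
// so the precondition is checked explicitly here.
inline int clz_checked(unsigned long long x) {
    assert(x != 0 && "count-leading-zeros of 0 is undefined");
    return __builtin_clzll(x);
}

// Guarded form of the expression in the patch, written for unsigned long long.
// Precondition (not checked): n <= 2^(digits - 1); above that the shift count
// would again reach `digits`.
inline unsigned long long next_pow2_guarded(unsigned long long n) {
    const int digits = std::numeric_limits<unsigned long long>::digits;
    // n == 0: n - 1 wraps to all ones, clz is 0, and the shift count equals
    //         `digits`, which is an invalid shift.
    // n == 1: n - 1 is 0, so the clz call itself is invalid.
    if (n <= 1)
        return n;
    return 1ULL << (digits - clz_checked(n - 1));
}
```

For inputs of 2 or more the guarded function evaluates the same shift expression as before, which is what the NFC claim in the summary refers to.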

Minimal reproducer:

unordered_map<uint64_t, unsigned long> m;
m.reserve(4);
m.reserve(1);

rdar://problem/32407328

Event Timeline

vsk created this revision. May 26 2017, 2:14 AM

vsk edited the summary of this revision. May 26 2017, 2:18 AM

I can reproduce this, but I'd rather figure out why we're calling __next_hash_pow2(0) or (1) before deciding how to fix it.

In D33588#765768, @mclow.lists wrote:

I can reproduce this, but I'd rather figure out why we're calling __next_hash_pow2(0) or (1) before deciding how to fix it.

It looks like we hit the UB while attempting to shrink a hash table during a rehash. If the current bucket count is a power of two, we try and find a smaller bucket count (also a power of two) large enough to accommodate all entries in the table.

The argument passed to __next_hash_pow2 from __hash_table::rehash is __n = size_t(ceil(float(size()) / max_load_factor())). I think __n = 0 if the table is empty, and __n = 1 when the table is non-empty but holds no more elements than the maximum load factor.
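
As a rough illustration of how that argument reaches 0 or 1, the expression can be evaluated on its own. The helper name min_buckets_for is hypothetical and only mirrors the formula quoted above; it is not part of libc++.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper mirroring the quoted expression:
// size_t(ceil(float(size()) / max_load_factor()))
inline std::size_t min_buckets_for(std::size_t size, float max_load_factor) {
    return static_cast<std::size_t>(
        std::ceil(static_cast<float>(size) / max_load_factor));
}

// A few sample evaluations of the formula.
inline void demo_min_buckets() {
    assert(min_buckets_for(0, 1.0f) == 0);  // empty table
    assert(min_buckets_for(1, 1.0f) == 1);  // one element, load factor 1
    assert(min_buckets_for(3, 4.0f) == 1);  // ceil(0.75) is 1
    assert(min_buckets_for(5, 1.0f) == 5);  // neither problematic value
}
```

The first three cases produce exactly the arguments (0 and 1) that triggered the undefined behavior.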

As an alternate fix, it might be worth considering changing the rehashing algorithm. But I'd like to start with a more conservative fix for the UB issue, at least initially.

@vsk: I would include your fuzzing test in this patch. Simply put it somewhere under test/libcxx/containers/unordered.

Thanks @EricWF for pointing me to the right place to add a test. I've tried to follow the style used by the existing tests. PTAL.

EricWF added inline comments. May 31 2017, 5:59 PM

include/__hash_table
Line 139: Shouldn't this return `__n + 1` when `__n <= 1`, or even 2 in both cases?

vsk added inline comments. Jun 1 2017, 10:27 AM

include/__hash_table
Line 139: I thought "__next_hash_pow2(n)" meant "a hash based on n or the first power-of-two GTE n", but I suppose it's actually "first power-of-two GTE n, for hash tables". In this case, returning `1` when `__n <= 1` makes the most sense to me, since it's the first power of two GTE 0 and 1. What do you think?
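
For comparison, the "first power of two GTE n" reading can be written as a plain doubling loop. This is only a reference sketch with a hypothetical name (first_pow2_gte); it is not the code under review and makes no use of CLZ.

```cpp
#include <cassert>
#include <limits>

// Hypothetical reference implementation of "first power of two >= n".
// Precondition: n <= 2^(digits - 1), so the doubling below cannot overflow.
inline unsigned long long first_pow2_gte(unsigned long long n) {
    const unsigned long long top =
        1ULL << (std::numeric_limits<unsigned long long>::digits - 1);
    assert(n <= top);
    unsigned long long p = 1;
    while (p < n)
        p <<= 1;  // double until p reaches or passes n
    return p;
}
```

Under this reading both 0 and 1 map to 1. The patch as posted returns __n itself when __n <= 1, so the two agree everywhere except at 0.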

mclow.lists accepted this revision. Jun 2 2017, 5:09 PM

mclow.lists added inline comments.

include/__hash_table
Line 140: I turned the condition around, and committed this as r304617: return __n < 2 ? __n : (size_t(1) << (std::numeric_limits<size_t>::digits - __clz(__n-1)));

This revision is now accepted and ready to land. Jun 2 2017, 5:09 PM

mclow.lists closed this revision. Jun 7 2017, 9:31 AM

Revision Contents

Path                                               Size
include/__hash_table                               2 lines
test/libcxx/containers/unord/next_pow2.pass.cpp    80 lines

Diff 100461

include/__hash_table

Context: __constrain_hash(size_t __h, size_t __bc)

     return !(__bc & (__bc - 1)) ? __h & (__bc - 1) :
         (__h < __bc ? __h : __h % __bc);
 }

 inline _LIBCPP_INLINE_VISIBILITY
 size_t
 __next_hash_pow2(size_t __n)
 {
-    return size_t(1) << (std::numeric_limits<size_t>::digits - __clz(__n-1));
+    return (__n > 1) ? (size_t(1) << (std::numeric_limits<size_t>::digits - __clz(__n-1))) : __n;
 }

 template <class _Tp, class _Hash, class _Equal, class _Alloc> class __hash_table;

 template <class _NodePtr> class _LIBCPP_TEMPLATE_VIS __hash_iterator;
 template <class _ConstNodePtr> class _LIBCPP_TEMPLATE_VIS __hash_const_iterator;
 template <class _NodePtr> class _LIBCPP_TEMPLATE_VIS __hash_local_iterator;
 template <class _ConstNodePtr> class _LIBCPP_TEMPLATE_VIS __hash_const_local_iterator;

test/libcxx/containers/unord/next_pow2.pass.cpp

This file was added.

//===----------------------------------------------------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// REQUIRES: long_tests

// Not a portable test

// <__hash_table>

// size_t __next_hash_pow2(size_t n);

// If n <= 1, return n. If n is a power of 2, return n.
// Otherwise, return the next power of 2.

#include <__hash_table>
#include <unordered_map>
#include <cassert>
#include <cstdint>

#include <iostream>

bool
is_power_of_two(unsigned long n)
{
    // Use the unsigned long builtin so wide values are not truncated to unsigned int.
    return __builtin_popcountl(n) == 1;
}

void
test_next_pow2()
{
    assert(!is_power_of_two(0));
    assert(is_power_of_two(1));
    assert(is_power_of_two(2));
    assert(!is_power_of_two(3));

    assert(std::__next_hash_pow2(0) == 0);
    assert(std::__next_hash_pow2(1) == 1);

    for (std::size_t n = 2; n < (sizeof(std::size_t) * 8 - 1); ++n)
    {
        std::size_t pow2 = 1ULL << n;
        assert(std::__next_hash_pow2(pow2) == pow2);
    }

    for (std::size_t n : {3, 7, 9, 15, 127, 129})
    {
        std::size_t npow2 = std::__next_hash_pow2(n);
        assert(is_power_of_two(npow2) && npow2 > n);
    }
}

// Note: this is only really useful when run with -fsanitize=undefined.
void
fuzz_unordered_map_reserve(unsigned num_inserts,
                           unsigned num_reserve1,
                           unsigned num_reserve2)
{
    std::unordered_map<uint64_t, unsigned long> m;
    m.reserve(num_reserve1);
    for (unsigned I = 0; I < num_inserts; ++I) m[I] = 0;
    m.reserve(num_reserve2);
    assert(m.bucket_count() >= num_reserve2);
}

int main()
{
    test_next_pow2();

    for (unsigned num_inserts = 0; num_inserts <= 64; ++num_inserts)
        for (unsigned num_reserve1 = 1; num_reserve1 <= 64; ++num_reserve1)
            for (unsigned num_reserve2 = 1; num_reserve2 <= 64; ++num_reserve2)
                fuzz_unordered_map_reserve(num_inserts, num_reserve1, num_reserve2);

    return 0;
}