There are a couple of optimizations in __hash_table::find which are also
applicable to other places, such as __hash_table::__node_insert_unique_prepare
and __hash_table::__emplace_unique_key_args. The loop in find looks like this:
for (__nd = __nd->__next_;
     __nd != nullptr && (__nd->__hash() == __hash
                      // ^^^^^^^^^^^^^^^^^^^^^^ (1)
                         || std::__constrain_hash(__nd->__hash(), __bc) == __chash);
     __nd = __nd->__next_) {
  if ((__nd->__hash() == __hash)
   // ^^^^^^^^^^^^^^^^^^^^^^^^^^ (2)
      && key_eq()(__nd->__upcast()->__value_, __k))
    return iterator(__nd, this);
}
(1): We can avoid the expensive modulo operation in std::__constrain_hash
when the hashes match. This one is from commit 6a411472e3c4.
(2): We can avoid calling key_eq when the hashes don't match. Commit:
318d35a7bca6c4e5.
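
For context, std::__constrain_hash is what maps a hash value to a bucket
index. A rough sketch of its logic (not the verbatim libc++ code; details
may vary across versions):

#include <cstddef>

// Rough sketch of std::__constrain_hash: reduce a hash to a bucket index.
// When the bucket count is a power of two this is a cheap mask; otherwise
// it needs an integer modulo, which is the cost that optimization (1)
// avoids when the cached hashes already match.
inline std::size_t constrain_hash_sketch(std::size_t h, std::size_t bc) {
  return !(bc & (bc - 1)) ? h & (bc - 1)
                          : (h < bc ? h : h % bc);
}

Equal hashes always land in the same bucket, which is why comparing the
cached hashes first can short-circuit the bucket computation in (1); and
since a given key always hashes to the same value, differing cached hashes
mean key_eq can never succeed, which is what makes (2) safe.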
Both of them are applicable to the insert and emplace methods.
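
To illustrate how the same scan could look in the unique-insert path, here
is a minimal standalone sketch; Node, constrain_hash, and bucket_contains
are hypothetical stand-ins for the libc++ internals, not the actual
implementation:

#include <cstddef>
#include <string>

// Hypothetical node with a cached hash, standing in for libc++'s
// __hash_node. In libc++ all nodes live on one global linked list, so the
// scan must stop once it walks past the current bucket.
struct Node {
  std::size_t hash_;   // cached hash of value_
  std::string value_;
  Node* next_;
};

// Same reduction as the constrain_hash sketch above.
inline std::size_t constrain_hash(std::size_t h, std::size_t bc) {
  return !(bc & (bc - 1)) ? h & (bc - 1) : (h < bc ? h : h % bc);
}

// Scan one bucket's collision chain before inserting a unique key, with
// both optimizations applied:
//   (1) equal cached hashes short-circuit the modulo in constrain_hash;
//   (2) unequal cached hashes skip the potentially expensive key compare
//       (key_eq() in the real code; operator== here).
bool bucket_contains(Node* nd, std::size_t hash, std::size_t bc,
                     const std::string& k) {
  const std::size_t chash = constrain_hash(hash, bc);
  for (; nd != nullptr &&
         (nd->hash_ == hash ||                  // (1)
          constrain_hash(nd->hash_, bc) == chash);
       nd = nd->next_) {
    if (nd->hash_ == hash && nd->value_ == k)   // (2)
      return true;
  }
  return false;
}

An insert path built on this would allocate and link a new node only when
bucket_contains returns false.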
Results of the unordered_set_operations benchmark:
Before:
------------------------------------------------------------------------------------------------
Benchmark                                                       Time            CPU   Iterations
------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_set_uint32/1024                        38521 ns       38520 ns        18174
BM_InsertValue/unordered_set_uint32_sorted/1024                 29297 ns       29296 ns        23804
BM_InsertValue/unordered_set_top_bits_uint32/1024               32547 ns       32546 ns        21655
BM_InsertValueRehash/unordered_set_top_bits_uint32/1024         56896 ns       56895 ns        12334
BM_InsertValue/unordered_set_string/1024                       277956 ns      277952 ns         2487
BM_InsertValueRehash/unordered_set_string/1024                 578982 ns      578912 ns         1224
BM_InsertDuplicate/unordered_set_int/1024                       11793 ns       11793 ns        61427
BM_InsertDuplicate/unordered_set_string/1024                   164009 ns      164005 ns         4339
BM_EmplaceDuplicate/unordered_set_int/1024                      12728 ns       12728 ns        57487
BM_EmplaceDuplicate/unordered_set_string/1024                  164516 ns      164512 ns         4251
BM_InsertDuplicate/unordered_set_int_insert_arg/1024            11366 ns       11365 ns        62265
BM_InsertDuplicate/unordered_set_string_insert_arg/1024        165776 ns      165772 ns         4225
BM_EmplaceDuplicate/unordered_set_int_insert_arg/1024           42768 ns       42765 ns        16301
BM_EmplaceDuplicate/unordered_set_string_arg/1024              371531 ns      371524 ns         1885

After:
------------------------------------------------------------------------------------------------
Benchmark                                                       Time            CPU   Iterations
------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_set_uint32/1024                        38988 ns       38985 ns        17789
BM_InsertValue/unordered_set_uint32_sorted/1024                 28707 ns       28707 ns        24285
BM_InsertValue/unordered_set_top_bits_uint32/1024               32790 ns       32788 ns        21272
BM_InsertValueRehash/unordered_set_top_bits_uint32/1024         58293 ns       58281 ns        12329
BM_InsertValue/unordered_set_string/1024                       303982 ns      303894 ns         2352
BM_InsertValueRehash/unordered_set_string/1024                 608259 ns      608121 ns         1133
BM_InsertDuplicate/unordered_set_int/1024                       12377 ns       12375 ns        56721
BM_InsertDuplicate/unordered_set_string/1024                   167777 ns      167752 ns         4167
BM_EmplaceDuplicate/unordered_set_int/1024                      11469 ns       11468 ns        61094
BM_EmplaceDuplicate/unordered_set_string/1024                  165482 ns      165450 ns         4245
BM_InsertDuplicate/unordered_set_int_insert_arg/1024            11945 ns       11945 ns        57164
BM_InsertDuplicate/unordered_set_string_insert_arg/1024        164507 ns      164482 ns         4294
BM_EmplaceDuplicate/unordered_set_int_insert_arg/1024           41152 ns       41151 ns        17134
BM_EmplaceDuplicate/unordered_set_string_arg/1024              392580 ns      392529 ns         1808
The time differences look more like noise to me. But if we want these
optimizations in find, we probably want them in insert and emplace as
well.