Page MenuHomePhabricator

Use hash value checks optimizations consistently
Needs ReviewPublic

Authored by ilvokhin on Dec 30 2022, 6:07 AM.

Details

Reviewers
ldionne
EricWF
Group Reviewers
Restricted Project
Summary

There are couple of optimizations of __hash_table::find which are applicable
to other places like __hash_table::__node_insert_unique_prepare and
__hash_table::__emplace_unique_key_args.

for (__nd = __nd->__next_; __nd != nullptr &&
    (__nd->__hash() == __hash
    // ^^^^^^^^^^^^^^^^^^^^^^
    //         (1)
      || std::__constrain_hash(__nd->__hash(), __bc) == __chash);
                                               __nd = __nd->__next_)
{   
    if ((__nd->__hash() == __hash)
    // ^^^^^^^^^^^^^^^^^^^^^^^^^^
    //           (2)
        && key_eq()(__nd->__upcast()->__value_, __k))
        return iterator(__nd, this);
}

(1): We can avoid expensive modulo operations from std::__constrain_hash if
hashes matched. This one is from commit 6a411472e3c4.
(2): We can avoid key_eq calls if hashes didn't match. Commit:
318d35a7bca6c4e5.

Both of them are applicable for inser and emplace methods.

Results of unordered_set_operations benchmark:

Before:
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_set_uint32/1024                     38521 ns        38520 ns        18174
BM_InsertValue/unordered_set_uint32_sorted/1024              29297 ns        29296 ns        23804
BM_InsertValue/unordered_set_top_bits_uint32/1024            32547 ns        32546 ns        21655
BM_InsertValueRehash/unordered_set_top_bits_uint32/1024      56896 ns        56895 ns        12334
BM_InsertValue/unordered_set_string/1024                    277956 ns       277952 ns         2487
BM_InsertValueRehash/unordered_set_string/1024              578982 ns       578912 ns         1224
BM_InsertDuplicate/unordered_set_int/1024                    11793 ns        11793 ns        61427
BM_InsertDuplicate/unordered_set_string/1024                164009 ns       164005 ns         4339
BM_EmplaceDuplicate/unordered_set_int/1024                   12728 ns        12728 ns        57487
BM_EmplaceDuplicate/unordered_set_string/1024               164516 ns       164512 ns         4251
BM_InsertDuplicate/unordered_set_int_insert_arg/1024         11366 ns        11365 ns        62265
BM_InsertDuplicate/unordered_set_string_insert_arg/1024     165776 ns       165772 ns         4225
BM_EmplaceDuplicate/unordered_set_int_insert_arg/1024        42768 ns        42765 ns        16301
BM_EmplaceDuplicate/unordered_set_string_arg/1024           371531 ns       371524 ns         1885

After:
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BM_InsertValue/unordered_set_uint32/1024                     38988 ns        38985 ns        17789
BM_InsertValue/unordered_set_uint32_sorted/1024              28707 ns        28707 ns        24285
BM_InsertValue/unordered_set_top_bits_uint32/1024            32790 ns        32788 ns        21272
BM_InsertValueRehash/unordered_set_top_bits_uint32/1024      58293 ns        58281 ns        12329
BM_InsertValue/unordered_set_string/1024                    303982 ns       303894 ns         2352
BM_InsertValueRehash/unordered_set_string/1024              608259 ns       608121 ns         1133
BM_InsertDuplicate/unordered_set_int/1024                    12377 ns        12375 ns        56721
BM_InsertDuplicate/unordered_set_string/1024                167777 ns       167752 ns         4167
BM_EmplaceDuplicate/unordered_set_int/1024                   11469 ns        11468 ns        61094
BM_EmplaceDuplicate/unordered_set_string/1024               165482 ns       165450 ns         4245
BM_InsertDuplicate/unordered_set_int_insert_arg/1024         11945 ns        11945 ns        57164
BM_InsertDuplicate/unordered_set_string_insert_arg/1024     164507 ns       164482 ns         4294
BM_EmplaceDuplicate/unordered_set_int_insert_arg/1024        41152 ns        41151 ns        17134
BM_EmplaceDuplicate/unordered_set_string_arg/1024           392580 ns       392529 ns         1808

Time differences looks more like noise to me. But if we want to have this
optimizations in find, we probably want them in insert and emplace as
well.

Diff Detail

Event Timeline

ilvokhin created this revision.Dec 30 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 6:07 AM
ilvokhin requested review of this revision.Dec 30 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 6:07 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript

Friendly ping: @ldionne, @EricWF I wonder if you have any thoughts on the change.