This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Don't call key_eq in unordered_map/set rehashing routine
ClosedPublic

Authored by itrofimow on Jun 16 2022, 7:24 PM.

Details

Summary

As of now containers key_eq might get called when rehashing happens, which is redundant for unique keys containers.

Diff Detail

Event Timeline

itrofimow created this revision.Jun 16 2022, 7:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2022, 7:24 PM
itrofimow requested review of this revision.Jun 16 2022, 7:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2022, 7:24 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
itrofimow updated this revision to Diff 437775.Jun 16 2022, 7:43 PM

+ clang-format

itrofimow updated this revision to Diff 437781.Jun 16 2022, 8:13 PM
itrofimow updated this revision to Diff 437787.Jun 16 2022, 8:59 PM
jloser added a subscriber: jloser.Jun 16 2022, 9:15 PM

I recommend against clang-format everywhere here. When uploading the patch, you can specify --nolint to arc so it won't complain about the lack of clang-formatting a particular changeset. The noise ratio compared with the actual content change here is high in my opinion.

itrofimow updated this revision to Diff 437793.Jun 16 2022, 9:57 PM

I recommend against clang-format everywhere here. When uploading the patch, you can specify --nolint to arc so it won't complain about the lack of clang-formatting a particular changeset. The noise ratio compared with the actual content change here is high in my opinion.

Thanks for the suggestion, i definitely will follow - clang-format goes crazy with these files

philnik requested changes to this revision.Jun 17 2022, 3:42 AM
philnik added a subscriber: philnik.

I agree with @jloser that this is a lot of noise. Please either ignore clang-format for now or provide a patch to only reformat the parts you change here and rebase this patch. More important for now: Please provide appropriate benchmarks in libcxx/benchmarks for this if we don't have any and provide the benchmark results.

This revision now requires changes to proceed.Jun 17 2022, 3:42 AM

Thanks for your contribution! I'll have another look after all comment have been addressed.

libcxx/include/__hash_table
863

Can you add some comment why these traits are needed? The reason is in the commit message, but not in the code.

865

Nowadays we tend to use using instead of typedef. This works for all supported compilers in C++98/C++03 mode.

868

I'm not fond of name _Traits it very generic. How about _UniqueKeys?

libcxx/include/unordered_map
1034

As an extension we add unordered_map to C++03.

itrofimow updated this revision to Diff 438077.Jun 17 2022, 5:44 PM

cleanup, + benchmarks

jloser added inline comments.Jun 17 2022, 5:58 PM
libcxx/include/__hash_table
863

Nit: s/reduntand/redundant

868

FWIW, I personally preferred the traits class name holding a (member or type) to denote the one trait of interest: unique_keys. I don't feel too strongly, but I feel it leads to a more open design generally.

2099–2100

Question: what's the libc++ stance for something like this which could be if constexpr in C++17 or later? Is it worth writing

#if _LIBCPP_STD_VER >= 17
if constexpr
#else
if
#endif
(!UniqueKeys::Value) {

I suspect the answer is just writing if for all standards modes and just letting the compiler realize it's always a constant value now up-front.

itrofimow added inline comments.Jun 17 2022, 5:59 PM
libcxx/benchmarks/unordered_set_operations.bench.cpp
311 ↗(On Diff #438077)

This is with the patch:
BM_Rehash/unordered_set_string_arg/1024 44848 ns 44846 ns 96172

This is on trunk:
BM_Rehash/unordered_set_string_arg/1024 55711 ns 55706 ns 74198

full output: https://pastebin.com/7DMcNTna

This benchmark doesn't mean there is an immense speedup in rehashing, it just shows that there is a speedup - max_load_factor(3) was used to be sure that there will be collisions and key_eq branch in rehashing would be hit. So this just measures how long it takes to compare long random strings, but with the patch we don't compare them at all.

Can some please help me with clang-format builds failing?
This is what i used to update the patch arc diff HEAD~ --update D128021 --nolint, but format build are still failing.
What should i do to fix that?

This looks pretty good so far, but I'd like to see a full set of benchmarks before approving. BTW it's really stupid that we have all the member function definitions out of line. It makes this diff so much larger than it needs to be.

Can some please help me with clang-format builds failing?
This is what i used to update the patch arc diff HEAD~ --update D128021 --nolint, but format build are still failing.
What should i do to fix that?

You can ignore the clang-format runner. It's just soft-failing. Your problem with the CI currently is that you've got some non-ASCII symbols in your patch. That's why Generated output fails.

libcxx/benchmarks/ContainerBenchmarks.h
143 ↗(On Diff #438077)

benchmark::DoNotOptimize(c) should be more than enough. This doesn't allow the compiler to optimize anything from the container.

libcxx/benchmarks/unordered_set_operations.bench.cpp
311 ↗(On Diff #438077)

Could you provide the results of a full set of input data (i.e. elements other than string and other sizes)? I don't think you cherry-picked, but having only a single data point smells like it. Since this patch also affects unordered_map, unordered_multimap and unordered_multiset I'd also like to see benchmarks for those (the multi-version are optional, but a nice to have).

libcxx/include/__hash_table
862–863

I don't think this is necessary. Just use integral_constant<bool, {true, false}> where __hash_table is used or use a NTTP instead.

2099–2100

It's definitely not worth it. This makes the code a lot less readable for a minor to non-existent improvement in compile times.

2099–2100

Could you indent this properly?

One request, when you address a review comment, can you check the "done" button this makes reviewing easier.

I had a look at the changes and I'm wondering about the impact on the size of the binary. You add one template argument to "only" adjust the __rehash function. This means the __hash_table for unordered_map<K, V> and unordered_multimap<K, V> will be different after this change. Could you provide some numbers regarding how much the binary will grow when an applications uses both unordered_map<K, V> and unordered_multimap<K, V>`?

libcxx/include/__hash_table
2099–2100

Question: what's the libc++ stance for something like this which could be if constexpr in C++17 or later? Is it worth writing

#if _LIBCPP_STD_VER >= 17
if constexpr
#else
if
#endif
(!UniqueKeys::Value) {

I suspect the answer is just writing if for all standards modes and just letting the compiler realize it's always a constant value now up-front.

There's no need for the #if LIBCPP_STD_VER >= 17, if _LIBCPP_CONSTEXPR_AFTER_CXX14 (!UniqueKeys::Value) { should work.

libcxx/include/ext/hash_map
489–490

There's no need for std:: here.

Thank you guys for the valuable feedback. I'll get back to this near the end of the upcoming week

itrofimow updated this revision to Diff 439213.Jun 22 2022, 4:45 PM

code review fixes

itrofimow marked 12 inline comments as done.Jun 22 2022, 5:10 PM

So i got some benchmarks: unordered_set_benches

And they didn't make any sense for BM_FindRehash: how could my changes affect it that much, when they don't even touch ::find?
After some profiling i found out that branch misprediction in std::equal_to increases up to x2 from ~11% to 23%, with overall mispediction being close to 4%.

Changing initial rehash in BM_FindRehash from 8 to 7 leads to this: unordered_set_benches, which is what i expected to be,
but i still have no idea what happens in original version.

libcxx/include/__hash_table
862–863

I feel like that would make __hash_table usage way less readable: is true self explanatory enough in __hash_table<Key, Hash, Eq, Alloc, true>?

libcxx/include/ext/hash_map
489–490

doesn't compile without it

itrofimow marked an inline comment as done.Jun 22 2022, 7:04 PM

One request, when you address a review comment, can you check the "done" button this makes reviewing easier.

I had a look at the changes and I'm wondering about the impact on the size of the binary. You add one template argument to "only" adjust the __rehash function. This means the __hash_table for unordered_map<K, V> and unordered_multimap<K, V> will be different after this change. Could you provide some numbers regarding how much the binary will grow when an applications uses both unordered_map<K, V> and unordered_multimap<K, V>`?

Would this example be enough?
I could've messed up compile flags, and guidance would be appreciated if i did so

Would this example be enough?
I could've messed up compile flags, and guidance would be appreciated if i did so

Thanks! Can you provide the output of the size command instead of the ls command for both binaries?

Would this example be enough?
I could've messed up compile flags, and guidance would be appreciated if i did so

Thanks! Can you provide the output of the size command instead of the ls command for both binaries?

root@adelaide:~/tmp# size binary_size_*

 text	   data	    bss	    dec	    hex	filename
10109	    744	      8	  10861	   2a6d	binary_size_main
11334	    744	      8	  12086	   2f36	binary_size_patch

Gently pinging this

Would this example be enough?
I could've messed up compile flags, and guidance would be appreciated if i did so

Thanks! Can you provide the output of the size command instead of the ls command for both binaries?

root@adelaide:~/tmp# size binary_size_*

 text	   data	    bss	    dec	    hex	filename
10109	    744	      8	  10861	   2a6d	binary_size_main
11334	    744	      8	  12086	   2f36	binary_size_patch

Thanks. This gives more insight. For the ls the patched version is smaller, but here patched version is bigger; which matches my expectations. I'm not happy with the 1 KB increment of the text size. This price will be paid for every type.

I want to have a closer look at this myself and see whether there's a better solution that gives the performance improvement with a smaller text size overhead.

libcxx/benchmarks/ContainerBenchmarks.h
140 ↗(On Diff #439213)

Please don't use auto when the type isn't clear, same for the bucket_count.

I had a look at the code and I wonder whether a different approach might be better. Instead of adding a template argument we change __rehash to two function __rehash_unique and __rehash_multi. This seems the approach already taken for other functions in the __hash_table class.

This means the public functions rehash and resize also need two versions. The other functions calling rehash are already specific for unique and multi. I think we can keep the unique key argument on these rehash functions. So I would suggest something along the lines of.

// These were rehash, keep them public and adjust the callers
_LIBCPP_INLINE_VISIBILITY void __rehash_unique(size_type __n) { __rehash<true>(__n); }
_LIBCPP_INLINE_VISIBILITY void __rehash_multi(size_type __n) { __rehash<false>(__n); }

// These were reserve, keep them public and adjust the callers
_LIBCPP_INLINE_VISIBILITY void __reserve_unique(size_type __n)
      {__rehash_unique(static_cast<size_type>(ceil(__n / max_load_factor())));}                                              
_LIBCPP_INLINE_VISIBILITY void __reserve_multi(size_type __n)
      {__rehash_multi(static_cast<size_type>(ceil(__n / max_load_factor())));}


// rename __rehash to __do_rehash and make it templated
template<class _UniqueKeys>
void __do_rehash(size_type __n);

// rename rehash to __rehash, make it private and make it templated
template<class _UniqueKeys>
void __rehash(size_type __n);

Then we can keep the templated rehashing you added but only have the template arguments on these two function instead of the entire class.

WDYT?

I had a look at the code and I wonder whether a different approach might be better. Instead of adding a template argument we change __rehash to two function __rehash_unique and __rehash_multi. This seems the approach already taken for other functions in the __hash_table class.

This means the public functions rehash and resize also need two versions. The other functions calling rehash are already specific for unique and multi. I think we can keep the unique key argument on these rehash functions. So I would suggest something along the lines of.

// These were rehash, keep them public and adjust the callers
_LIBCPP_INLINE_VISIBILITY void __rehash_unique(size_type __n) { __rehash<true>(__n); }
_LIBCPP_INLINE_VISIBILITY void __rehash_multi(size_type __n) { __rehash<false>(__n); }

// These were reserve, keep them public and adjust the callers
_LIBCPP_INLINE_VISIBILITY void __reserve_unique(size_type __n)
      {__rehash_unique(static_cast<size_type>(ceil(__n / max_load_factor())));}                                              
_LIBCPP_INLINE_VISIBILITY void __reserve_multi(size_type __n)
      {__rehash_multi(static_cast<size_type>(ceil(__n / max_load_factor())));}


// rename __rehash to __do_rehash and make it templated
template<class _UniqueKeys>
void __do_rehash(size_type __n);

// rename rehash to __rehash, make it private and make it templated
template<class _UniqueKeys>
void __rehash(size_type __n);

Then we can keep the templated rehashing you added but only have the template arguments on these two function instead of the entire class.

WDYT?

If binary size is a concern i definitely like this approach more

itrofimow updated this revision to Diff 441568.Jun 30 2022, 6:33 PM

review fixes, updated benchmarks (https://pastebin.com/raw/AF0fa9VJ)

itrofimow added inline comments.Jun 30 2022, 6:52 PM
libcxx/benchmarks/ContainerBenchmarks.h
140 ↗(On Diff #439213)

Type here differs for different benchmark registrations, so it would be some decltype otherwise.

I could change bucket_count to be a size_t if you insist, but i feel like it's not only clearly size_t, but also the same type rehash takes anyway

libcxx/benchmarks/unordered_set_operations.bench.cpp
311 ↗(On Diff #438077)

Strangely enough i couldn't find any unordered_map/multimap/multiset benchmarks, and i'm not sure i am familiar enough with this codebase to write a comprehensive ones from scratch

itrofimow updated this revision to Diff 441901.Jul 2 2022, 1:09 PM

Trying to force a rebuild

Mordante requested changes to this revision.Jul 3 2022, 4:28 AM

I think this patch is getting close to be done!

I see some small issues.
Can you provide the new binary sizes, I would like to see the overhead of this new approach.
Can you upload the patch with the complete context? Without context it's hard to validate whether the changes to unordered_set and unordered_map are correct. (I assume they are, but I want to validate.)

libcxx/benchmarks/unordered_set_operations.bench.cpp
110 ↗(On Diff #441901)

This function is confusion. It's called slow, but when the sizes don't match they function bails out quickly. Else it compares all elements. Some comment would be helpful.

114 ↗(On Diff #441901)

please put eq &= lhs[i] == rhs[i]; on its own line.

libcxx/include/__hash_table
1155

please us __ugly function names.

This revision now requires changes to proceed.Jul 3 2022, 4:28 AM

binary size after changes: https://pastebin.com/raw/z23mLN93

 text	   data	    bss	    dec	    hex	filename
10323	    744	      8	  11075	   2b43	binary_size_main
10996	    744	      8	  11748	   2de4	binary_size_patch

Thanks! I like these numbers better! This about half of the size overhead of the previous version.
I assume the main values have changed due to rebasing, right?

binary size after changes: https://pastebin.com/raw/z23mLN93

 text	   data	    bss	    dec	    hex	filename
10323	    744	      8	  11075	   2b43	binary_size_main
10996	    744	      8	  11748	   2de4	binary_size_patch

Thanks! I like these numbers better! This about half of the size overhead of the previous version.
I assume the main values have changed due to rebasing, right?

I did rebase, however i think main values changed due to me compiling on a different machine.
Not sure if i understand the size command output completely, but this time binary takes only ~150 bytes more on disk versus ~4.5Kb in the previous version

Smh about losing context - updated the patch via web interface, will fix remaining issues and upload the patch via arc this time

itrofimow marked 5 inline comments as done.Jul 5 2022, 8:37 PM
itrofimow updated this revision to Diff 442518.Jul 6 2022, 4:45 AM

rebase on main

Mordante accepted this revision as: Mordante.Jul 6 2022, 11:48 AM

binary size after changes: https://pastebin.com/raw/z23mLN93

 text	   data	    bss	    dec	    hex	filename
10323	    744	      8	  11075	   2b43	binary_size_main
10996	    744	      8	  11748	   2de4	binary_size_patch

Thanks! I like these numbers better! This about half of the size overhead of the previous version.
I assume the main values have changed due to rebasing, right?

I did rebase, however i think main values changed due to me compiling on a different machine.
Not sure if i understand the size command output completely, but this time binary takes only ~150 bytes more on disk versus ~4.5Kb in the previous version

The size command gives more insight in the real binary size. The file on disk contains a lot of additional information, like debug information, information needed for the dynamic linker, etc.

LGTM modulo some small nits. I'll leave the final approval to @philnik.

libcxx/benchmarks/ContainerBenchmarks.h
140 ↗(On Diff #439213)

I think it's good to use real types when possible.

libcxx/benchmarks/unordered_set_operations.bench.cpp
113 ↗(On Diff #442518)

Thanks this clarifies it!
You could consider adding an assert that the sizes are equal.

philnik accepted this revision.Jul 7 2022, 2:41 PM

LGTM.

This revision is now accepted and ready to land.Jul 7 2022, 2:41 PM