As of now containers key_eq might get called when rehashing happens, which is redundant for unique keys containers.
Details
- Reviewers
philnik Mordante - Group Reviewers
Restricted Project - Commits
- rG3085e42f80ac: [libc++] Don't call key_eq in unordered_map/set rehashing routine
Diff Detail
Unit Tests
Event Timeline
I recommend against clang-format everywhere here. When uploading the patch, you can specify --nolint to arc so it won't complain about the lack of clang-formatting a particular changeset. The noise ratio compared with the actual content change here is high in my opinion.
Thanks for the suggestion, i definitely will follow - clang-format goes crazy with these files
I agree with @jloser that this is a lot of noise. Please either ignore clang-format for now or provide a patch to only reformat the parts you change here and rebase this patch. More important for now: Please provide appropriate benchmarks in libcxx/benchmarks for this if we don't have any and provide the benchmark results.
Thanks for your contribution! I'll have another look after all comment have been addressed.
libcxx/include/__hash_table | ||
---|---|---|
863 | Can you add some comment why these traits are needed? The reason is in the commit message, but not in the code. | |
865 | Nowadays we tend to use using instead of typedef. This works for all supported compilers in C++98/C++03 mode. | |
868 | I'm not fond of name _Traits it very generic. How about _UniqueKeys? | |
libcxx/include/unordered_map | ||
1034 | As an extension we add unordered_map to C++03. |
libcxx/include/__hash_table | ||
---|---|---|
863 | Nit: s/reduntand/redundant | |
868 | FWIW, I personally preferred the traits class name holding a (member or type) to denote the one trait of interest: unique_keys. I don't feel too strongly, but I feel it leads to a more open design generally. | |
2099–2100 | Question: what's the libc++ stance for something like this which could be if constexpr in C++17 or later? Is it worth writing #if _LIBCPP_STD_VER >= 17 if constexpr #else if #endif (!UniqueKeys::Value) { I suspect the answer is just writing if for all standards modes and just letting the compiler realize it's always a constant value now up-front. |
libcxx/benchmarks/unordered_set_operations.bench.cpp | ||
---|---|---|
311 ↗ | (On Diff #438077) | This is with the patch: This is on trunk: full output: https://pastebin.com/7DMcNTna This benchmark doesn't mean there is an immense speedup in rehashing, it just shows that there is a speedup - max_load_factor(3) was used to be sure that there will be collisions and key_eq branch in rehashing would be hit. So this just measures how long it takes to compare long random strings, but with the patch we don't compare them at all. |
Can some please help me with clang-format builds failing?
This is what i used to update the patch arc diff HEAD~ --update D128021 --nolint, but format build are still failing.
What should i do to fix that?
This looks pretty good so far, but I'd like to see a full set of benchmarks before approving. BTW it's really stupid that we have all the member function definitions out of line. It makes this diff so much larger than it needs to be.
You can ignore the clang-format runner. It's just soft-failing. Your problem with the CI currently is that you've got some non-ASCII symbols in your patch. That's why Generated output fails.
libcxx/benchmarks/ContainerBenchmarks.h | ||
---|---|---|
143 ↗ | (On Diff #438077) | benchmark::DoNotOptimize(c) should be more than enough. This doesn't allow the compiler to optimize anything from the container. |
libcxx/benchmarks/unordered_set_operations.bench.cpp | ||
311 ↗ | (On Diff #438077) | Could you provide the results of a full set of input data (i.e. elements other than string and other sizes)? I don't think you cherry-picked, but having only a single data point smells like it. Since this patch also affects unordered_map, unordered_multimap and unordered_multiset I'd also like to see benchmarks for those (the multi-version are optional, but a nice to have). |
libcxx/include/__hash_table | ||
862–863 | I don't think this is necessary. Just use integral_constant<bool, {true, false}> where __hash_table is used or use a NTTP instead. | |
2099–2100 | It's definitely not worth it. This makes the code a lot less readable for a minor to non-existent improvement in compile times. | |
2099–2100 | Could you indent this properly? |
One request, when you address a review comment, can you check the "done" button this makes reviewing easier.
I had a look at the changes and I'm wondering about the impact on the size of the binary. You add one template argument to "only" adjust the __rehash function. This means the __hash_table for unordered_map<K, V> and unordered_multimap<K, V> will be different after this change. Could you provide some numbers regarding how much the binary will grow when an applications uses both unordered_map<K, V> and unordered_multimap<K, V>`?
libcxx/include/__hash_table | ||
---|---|---|
2099–2100 |
There's no need for the #if LIBCPP_STD_VER >= 17, if _LIBCPP_CONSTEXPR_AFTER_CXX14 (!UniqueKeys::Value) { should work. | |
libcxx/include/ext/hash_map | ||
489–490 | There's no need for std:: here. |
Thank you guys for the valuable feedback. I'll get back to this near the end of the upcoming week
So i got some benchmarks: unordered_set_benches
And they didn't make any sense for BM_FindRehash: how could my changes affect it that much, when they don't even touch ::find?
After some profiling i found out that branch misprediction in std::equal_to increases up to x2 from ~11% to 23%, with overall mispediction being close to 4%.
Changing initial rehash in BM_FindRehash from 8 to 7 leads to this: unordered_set_benches, which is what i expected to be,
but i still have no idea what happens in original version.
libcxx/include/__hash_table | ||
---|---|---|
862–863 | I feel like that would make __hash_table usage way less readable: is true self explanatory enough in __hash_table<Key, Hash, Eq, Alloc, true>? | |
libcxx/include/ext/hash_map | ||
489–490 | doesn't compile without it |
Would this example be enough?
I could've messed up compile flags, and guidance would be appreciated if i did so
Thanks! Can you provide the output of the size command instead of the ls command for both binaries?
root@adelaide:~/tmp# size binary_size_*
text data bss dec hex filename 10109 744 8 10861 2a6d binary_size_main 11334 744 8 12086 2f36 binary_size_patch
Thanks. This gives more insight. For the ls the patched version is smaller, but here patched version is bigger; which matches my expectations. I'm not happy with the 1 KB increment of the text size. This price will be paid for every type.
I want to have a closer look at this myself and see whether there's a better solution that gives the performance improvement with a smaller text size overhead.
libcxx/benchmarks/ContainerBenchmarks.h | ||
---|---|---|
140 ↗ | (On Diff #439213) | Please don't use auto when the type isn't clear, same for the bucket_count. |
I had a look at the code and I wonder whether a different approach might be better. Instead of adding a template argument we change __rehash to two function __rehash_unique and __rehash_multi. This seems the approach already taken for other functions in the __hash_table class.
This means the public functions rehash and resize also need two versions. The other functions calling rehash are already specific for unique and multi. I think we can keep the unique key argument on these rehash functions. So I would suggest something along the lines of.
// These were rehash, keep them public and adjust the callers _LIBCPP_INLINE_VISIBILITY void __rehash_unique(size_type __n) { __rehash<true>(__n); } _LIBCPP_INLINE_VISIBILITY void __rehash_multi(size_type __n) { __rehash<false>(__n); } // These were reserve, keep them public and adjust the callers _LIBCPP_INLINE_VISIBILITY void __reserve_unique(size_type __n) {__rehash_unique(static_cast<size_type>(ceil(__n / max_load_factor())));} _LIBCPP_INLINE_VISIBILITY void __reserve_multi(size_type __n) {__rehash_multi(static_cast<size_type>(ceil(__n / max_load_factor())));} // rename __rehash to __do_rehash and make it templated template<class _UniqueKeys> void __do_rehash(size_type __n); // rename rehash to __rehash, make it private and make it templated template<class _UniqueKeys> void __rehash(size_type __n);
Then we can keep the templated rehashing you added but only have the template arguments on these two function instead of the entire class.
WDYT?
libcxx/benchmarks/ContainerBenchmarks.h | ||
---|---|---|
140 ↗ | (On Diff #439213) | Type here differs for different benchmark registrations, so it would be some decltype otherwise. I could change bucket_count to be a size_t if you insist, but i feel like it's not only clearly size_t, but also the same type rehash takes anyway |
libcxx/benchmarks/unordered_set_operations.bench.cpp | ||
311 ↗ | (On Diff #438077) | Strangely enough i couldn't find any unordered_map/multimap/multiset benchmarks, and i'm not sure i am familiar enough with this codebase to write a comprehensive ones from scratch |
I think this patch is getting close to be done!
I see some small issues.
Can you provide the new binary sizes, I would like to see the overhead of this new approach.
Can you upload the patch with the complete context? Without context it's hard to validate whether the changes to unordered_set and unordered_map are correct. (I assume they are, but I want to validate.)
libcxx/benchmarks/unordered_set_operations.bench.cpp | ||
---|---|---|
110 ↗ | (On Diff #441901) | This function is confusion. It's called slow, but when the sizes don't match they function bails out quickly. Else it compares all elements. Some comment would be helpful. |
114 ↗ | (On Diff #441901) | please put eq &= lhs[i] == rhs[i]; on its own line. |
libcxx/include/__hash_table | ||
1155 | please us __ugly function names. |
text data bss dec hex filename 10323 744 8 11075 2b43 binary_size_main 10996 744 8 11748 2de4 binary_size_patch
Thanks! I like these numbers better! This about half of the size overhead of the previous version.
I assume the main values have changed due to rebasing, right?
I did rebase, however i think main values changed due to me compiling on a different machine.
Not sure if i understand the size command output completely, but this time binary takes only ~150 bytes more on disk versus ~4.5Kb in the previous version
Smh about losing context - updated the patch via web interface, will fix remaining issues and upload the patch via arc this time
The size command gives more insight in the real binary size. The file on disk contains a lot of additional information, like debug information, information needed for the dynamic linker, etc.
LGTM modulo some small nits. I'll leave the final approval to @philnik.
libcxx/benchmarks/ContainerBenchmarks.h | ||
---|---|---|
140 ↗ | (On Diff #439213) | I think it's good to use real types when possible. |
libcxx/benchmarks/unordered_set_operations.bench.cpp | ||
113 ↗ | (On Diff #442518) | Thanks this clarifies it! |
Can you add some comment why these traits are needed? The reason is in the commit message, but not in the code.