This is an archive of the discontinued LLVM Phabricator instance.

Comparing Unordered Containers (P0809)
ClosedPublic

Authored by ldionne on May 9 2019, 5:43 PM.

Details

Summary

This patch implements P0809 to add support for comparing unordered containers. It also fixes issues with the current implementations.

Diff Detail

Event Timeline

zoecarver created this revision.May 9 2019, 5:43 PM
zoecarver marked 2 inline comments as done.May 9 2019, 5:48 PM

Maybe I should add some fail tests for pre-c++20?

include/unordered_map
1641 ↗(On Diff #198960)

Was it ever necessary for the two maps to use the same allocator type (and is it now)? Furthermore, is it even necessary for them to have the same key/value type (I don't see anything in the standard that explicitly says they need to be)?

1690 ↗(On Diff #198960)

I can add some tests for this. Wanted to make sure it is correct first.

mclow.lists added inline comments.May 10 2019, 7:19 AM
include/unordered_map
1641 ↗(On Diff #198960)

All the container comparisons in the standard are defined in terms of comparing two containers of the same type. For allocator-aware containers, this includes the allocators. There is a proposal to relax that in flight (https://wg21.link/P0805), but I don't think that will be part of C++20.

My suggestion is to not support heterogenous comparisons at this time.
Note that this includes the hasher as well as the predicate and the allocator.

test/std/containers/unord/unord.map/comp_different_hash.pass.cpp
34 ↗(On Diff #198960)

I don't think that this is what P0809 meant. "Hashes with different behavior", yes.
"Hashes of different types", no.

mclow.lists added inline comments.May 10 2019, 7:22 AM
include/unordered_map
1690 ↗(On Diff #198960)

I like this implementation. It's straightforward, easy to read, and looks correct.
I'm still working through the complexity, though.

mclow.lists added inline comments.May 10 2019, 7:30 AM
include/unordered_map
1656 ↗(On Diff #198960)

This can be simplified for unordered_map, because unordered_map requires unique keys.

You can just use __y.find instead of equal_range and a single (Still need equal_range etc for unordered_multimap, though)

include/unordered_set
968 ↗(On Diff #198960)

Like the unordered_map code, this can be made much simpler because there are unique keys.

1006 ↗(On Diff #198960)

No need for is_permutation, again.

zoecarver marked 5 inline comments as done.May 10 2019, 7:47 AM
zoecarver added inline comments.
include/unordered_map
1641 ↗(On Diff #198960)

I will keep an eye on that paper. As per your below comment, I will get rid of this.

1656 ↗(On Diff #198960)

Good idea! I will change this.

include/unordered_set
968 ↗(On Diff #198960)

Good catch. Will update (same with plain unordered_set).

1006 ↗(On Diff #198960)

The standard specifically says to use is_permutation (for both set and multiset, etc.).

test/std/containers/unord/unord.map/comp_different_hash.pass.cpp
34 ↗(On Diff #198960)

While the paper doesn't explain it specifically, I think you are right. Because of the part that is removed which defines the behavior and not type. I can update it :P

mclow.lists added inline comments.May 10 2019, 8:02 AM
include/unordered_set
1006 ↗(On Diff #198960)

Yeah, I see that. I'm reviewing that and the original papers. I'll get back to you.

I (now) realize that I'm giving you inconsistent advice. (re is_permutation and the non-multi containers).
Sorry about that; I'll get back to you on that after some research.

  • remove paper changes (until I re-implement them)
  • use find instead of is_permutation
zoecarver updated this revision to Diff 199147.May 11 2019, 9:35 AM
  • use plain is_permutation
  • fix template name
zoecarver marked an inline comment as done.May 11 2019, 9:39 AM
zoecarver added inline comments.
include/unordered_map
2274 ↗(On Diff #199147)

From my tests, simply using is_permutation here is quite a bit faster than the previous method of comparison. I think that both bits of code are functionally equivalent (but let me know if that is not the case).

zoecarver updated this revision to Diff 199353.May 13 2019, 6:14 PM
  • remove all modification to unordered_{set,map}
  • add tests

Thank you @mclow.lists for writing most of the tests below.

I know that Louis and Eric disagree, but I'd rather have four tests, one for each of the unordered containers, rather than just one.
(unless someone wants to step up and re-do all the container tests, and I'm not hearing any enthusiasm for that).

test/std/containers/unord/comp_different_hash.pass.cpp
49 ↗(On Diff #199353)

I don't think that this does what what you want anymore.

71 ↗(On Diff #199353)

The first time through the loop you are dereferencing p2, which is equal to end.

zoecarver marked 2 inline comments as done.May 14 2019, 9:08 AM

I can do that so all the tests are "uniform".

test/std/containers/unord/comp_different_hash.pass.cpp
49 ↗(On Diff #199353)

You meant I should use std::numeric_limits<T>::max()?

71 ↗(On Diff #199353)

Good catch. Fixed.

zoecarver updated this revision to Diff 199468.May 14 2019, 9:11 AM
  • split up tests
  • fix test issues
zoecarver updated this revision to Diff 215799.Aug 18 2019, 2:33 PM

Updated status. Let's land this :)

Ping. This is a NFC.

ldionne added a reviewer: Restricted Project.Nov 2 2020, 2:26 PM

It would be cool to land it. I know that Arthur marked it as "Nothing To Do" already, but to the best of my knowledge there are no tests for that. And your tests seem exhaustive.

Nit: in the summary, the phrase "It also fixes issues with the current implementations." seems outdated.

libcxx/test/std/containers/unord/unord.set/comp_different_hash.pass.cpp
106 ↗(On Diff #215799)

For completeness, you should test operator!= as well (here and in other tests).

Quuxplusone added inline comments.
libcxx/test/std/containers/unord/unord.set/comp_different_hash.pass.cpp
64 ↗(On Diff #215799)

FWIW, I'd rather see something simpler here. As simple as providing hash_mod_two { return val % 2; } and hash_mod_three { return val % 3; }. I don't think it buys anything to have 5 different hashes as opposed to 2, nor to have partial specializations for T* of which the only one that's used is T=char.
I don't even see any reason to test char*, so I'd remove all that; but if there's any value in keeping it, then there should be a reason for c_vals[2] to be "cde" and not just "c".

In fact I think the whole void test() could be just [WARNING: UNTESTED CODE]

using Hash = int(*)(T);
std::unordered_set<int, Hash> c1({1, 2, 3, 4, 6, 7, 8}, 0, +[](int x) { return x % 2; });
std::unordered_set<int, Hash> c2({1, 2, 3, 4, 6, 7, 8}, 0, +[](int x) { return x % 3; });
std::unordered_set<int, Hash> c3({1, 2, 3, 5, 6, 7, 8}, 0, +[](int x) { return x % 3; });
assert(!std::equal(c1.begin(), c1.end(), c2.begin(), c2.end()));
assert(std::is_permutation(c1.begin(), c1.end(), c2.begin(), c2.end()));
assert(c1 == c2);
assert(!(c1 != c2));
assert(!std::is_permutation(c1.begin(), c1.end(), c3.begin(), c3.end()));
assert(c1 != c3);
assert(!(c1 == c3));

followed by a test for non-default equivalence predicate:

struct EqualMod6 { bool operator(int x, int y) const { return x % 6 == y % 6; } };
using Hash = int(*)(T);
std::unordered_set<int, Hash, EqualMod6> c1({1, 2, 3, 4, 6}, 0, +[](int x) { return x % 2; });
std::unordered_set<int, Hash, EqualMod6> c2({1, 8, 9, 4, 6}, 0, +[](int x) { return x % 3; });
assert(!std::equal(c1.begin(), c1.end(), c2.begin(), c2.end()));
assert(!std::is_permutation(c1.begin(), c1.end(), c2.begin(), c2.end()));
assert(!std::equal(c1.begin(), c1.end(), c2.begin(), c2.end(), EqualMod6()));
assert(std::is_permutation(c1.begin(), c1.end(), c2.begin(), c2.end(), EqualMod6()));
assert(c1 == c2);
assert(!(c1 != c2));

( https://en.cppreference.com/w/cpp/container/unordered_set/operator_cmp is pretty ugly but I think it makes sense what it's saying.)

ldionne commandeered this revision.Sep 5 2023, 10:45 AM
ldionne edited reviewers, added: zoecarver; removed: ldionne.
Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2023, 10:45 AM
ldionne updated this revision to Diff 555900.Sep 5 2023, 10:48 AM
ldionne edited the summary of this revision. (Show Details)

Rebase and fix a few things.

Herald added a project: Restricted Project. · View Herald TranscriptSep 5 2023, 10:48 AM
ldionne updated this revision to Diff 556028.Sep 6 2023, 6:45 AM

Fix >> issue in C++03 mode. Small consistency fixes.

ldionne updated this revision to Diff 556037.Sep 6 2023, 7:55 AM

Poke CI.

ldionne accepted this revision.Sep 6 2023, 4:36 PM

CI passed, but the Phab reporting is still broken.

This revision is now accepted and ready to land.Sep 6 2023, 4:36 PM
This revision was landed with ongoing or failed builds.Sep 6 2023, 4:38 PM
This revision was automatically updated to reflect the committed changes.