This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][c++17] P0083R5: Splicing Maps and Sets Part 2: merge
ClosedPublic

Authored by erik.pilkington on Jul 3 2018, 1:59 PM.

Details

Summary

This was originally part of https://reviews.llvm.org/D46845, but I decided to split it out to clean up the diff. From that patch's description:

In __hash_table, I split up the functions __node_insert_unqiue and __node_insert_multi into 2 parts in order to implement merge. The first part, __node_insert_{multi,unique}_prepare checks to see if a node already is present in the container (as needed), and rehashes the container. This function can forward exceptions, but doesn't mutate the node it's preparing to insert. the __node_insert_{multi,unique}_perform is noexcept, but mutates the pointers in the node to actually insert it into the container. This allows merge to call a _prepare function on a node while it is in the source container, without invalidating anything or losing nodes if an exception is thrown.

Thanks for taking a look!
Erik

Diff Detail

Event Timeline

erik.pilkington created this revision.Jul 3 2018, 1:59 PM

In this new patch:

Ping! In the new patch:

  • Uncomment __cpp_lib_node_extract, reverting r340544
  • Rebase/Clean up the diff a bit
ldionne requested changes to this revision.Oct 24 2018, 12:30 PM

When merging a multi-container into a unique-container (e.g. map.merge(multimap) or set.merge(multiset)), I think we could do the following optimization: Once we've inserted an element from the multimap into the map, we could skip all the following equivalent elements in the multimap without having to perform a search in the map each time (since we know we won't insert the element anyway). Actually, I think we could do the same when we've found an element in the map and don't insert anything too. Basically something like this:

template <class _Tp, class _Compare, class _Allocator>
template <class _Tree>
_LIBCPP_INLINE_VISIBILITY
void
__tree<_Tp, _Compare, _Allocator>::__node_handle_merge_unique(_Tree& __source)
{
    static_assert(is_same<typename _Tree::__node_pointer, __node_pointer>::value, "");

    for (typename _Tree::iterator __i = __source.begin();
         __i != __source.end();)
    {
        __node_pointer __src_ptr = __i.__get_np();
        __parent_pointer __parent;
        __node_base_pointer& __child = __find_equal(
            __parent, _NodeTypes::__get_key(__src_ptr->__value_));
        ++__i;
        if (__child != nullptr) {
            // Here, increment __i until the key in the source is not equivalent to the key we found in *this.
            continue;
        } else {
            __source.__remove_node_pointer(__src_ptr);
            __insert_node_at(__parent, __child,
                             static_cast<__node_base_pointer>(__src_ptr));
            // Here, increment __i until the key in the source is not equivalent to the key we just inserted in *this.
        }
    }
}

I guess the underlying idea is to move forward as much as possible in __source once we've put our hands on an element in *this, since we don't have to re-lookup in *this as long as the keys in __source are equivalent to the one we just processed. Do you think this optimization is (1) valid and (2) worth trying out?

libcxx/include/__hash_table
1879

Can you add a quick comment that documents what this does, and what it returns?

1910

Ditto for documentation.

1940

Consider using a more descriptive name for __ndptr.

1958

Ditto for documentation.

2303

Consider using __source.remove(__it).release(); ++it; instead -- it's a bit more straightforward. Also, I think you could hoist the ++__it outside of the if altogether and get rid of the else branch.

2356

Same comment as above for __it++.

libcxx/include/__tree
2500

Do I understand correctly that __child != nullptr if and only if the key is already in *this, in which case we continue to skip the insertion (because that is a container with unique keys)?

The comment in __find_equal says

// Find place to insert if __v doesn't exist
// Set __parent to parent of null leaf
// Return reference to null leaf
// If __v exists, set parent to node of __v and return reference to node of __v

It looks like the "more correct" check to see whether the key already exists in *this would have been if (__child == __parent) -- is that correct?

2502

This doesn't throw, right? Would it make sense to mark __remove_node_pointer as noexcept? __tree_remove already is.

2503

This doesn't throw, right? Would it make sense to mark __insert_node_at as noexcept? __tree_balance_after_insert already is.

2557

So this always returns a reference to the null leaf that should be replaced by the new node, correct?

2562

Regarding static casting from __node_pointer to __node_base_pointer -- is this actually guaranteed to work for fancy pointer types? I don't think it is -- I think only casting to void_ptr and then back to __node_base_pointer would work per the Allocator concept requirements.

Note that in practice, a fancy pointer that would not support that would be misbehaved. Also, this problem already exists elsewhere in the code. I'm asking just for curiosity, I don't expect you to act on this comment in any way.

libcxx/include/map
946

Does this have to be public? Would doing otherwise require adding awkward friend declarations?

libcxx/test/std/containers/associative/map/map.modifiers/merge.pass.cpp
18

I think you mean map<key_type, value_type, C2, allocator_type>& source);. Also, you should mention the && overload.

Can you also please add tests that perform dst.merge(std::move(src)) before having done dst.merge(src) (the non-move version). Right now, we only seem to test that dst.merge(std::move(src)) works when the operation is a no-op anyway.

This comment applies to the other containers as well.

This revision now requires changes to proceed.Oct 24 2018, 12:30 PM
rsmith added a subscriber: rsmith.Oct 24 2018, 2:00 PM

When merging a multi-container into a unique-container (e.g. map.merge(multimap) or set.merge(multiset)), I think we could do the following optimization: Once we've inserted an element from the multimap into the map, we could skip all the following equivalent elements in the multimap without having to perform a search in the map each time (since we know we won't insert the element anyway).

We can only do that if the comparators divide elements into the same equivalence classes, which we cannot know in general. (We can do this optimization if we can prove the comparator is a pure stateless function, but we probably can't detect that -- even an empty class type could store its state outside the class.) However, if the source and destination have the same comparator, and that comparator is std::less<T>, then I think it's correct to do this optimization.

When merging a multi-container into a unique-container (e.g. map.merge(multimap) or set.merge(multiset)), I think we could do the following optimization: Once we've inserted an element from the multimap into the map, we could skip all the following equivalent elements in the multimap without having to perform a search in the map each time (since we know we won't insert the element anyway).

We can only do that if the comparators divide elements into the same equivalence classes, which we cannot know in general. (We can do this optimization if we can prove the comparator is a pure stateless function, but we probably can't detect that -- even an empty class type could store its state outside the class.) However, if the source and destination have the same comparator, and that comparator is std::less<T>, then I think it's correct to do this optimization.

Hmm, actually, we can probably do this in more cases than that. If we use the *destination's* comparator to check for equivalent elements instead of the source's comparator (under the assumption that the destination order is usually the same as the source order), this would work. But more generally, can we just provide the location of the most-recently-inserted element as an insertion hint and get this optimization as a special case of a more general optimization? But maybe that's actually what you were suggesting (the difference is hidden behind the "key in the source is not equivalent to the [other] key" comments -- equivalent according to which comparator?).

When merging a multi-container into a unique-container (e.g. map.merge(multimap) or set.merge(multiset)), I think we could do the following optimization: Once we've inserted an element from the multimap into the map, we could skip all the following equivalent elements in the multimap without having to perform a search in the map each time (since we know we won't insert the element anyway).

We can only do that if the comparators divide elements into the same equivalence classes, which we cannot know in general. (We can do this optimization if we can prove the comparator is a pure stateless function, but we probably can't detect that -- even an empty class type could store its state outside the class.) However, if the source and destination have the same comparator, and that comparator is std::less<T>, then I think it's correct to do this optimization.

Ah, so you mean if the two comparators were both valid but different strict weak orderings? Yes, I guess that kills my optimization in the general case. I think it would be possible to enable the optimization whenever the comparators for the map and the multimap are both stateless (std::is_empty) and the same. Then, I don't see a way they could be different orderings. (Note that we'd have to prove that a stateless comparator can't break this optimization by using global variables, which I think can be proven)

We can only do that if the comparators divide elements into the same equivalence classes, which we cannot know in general. (We can do this optimization if we can prove the comparator is a pure stateless function, but we probably can't detect that -- even an empty class type could store its state outside the class.) However, if the source and destination have the same comparator, and that comparator is std::less<T>, then I think it's correct to do this optimization.

Hmm, actually, we can probably do this in more cases than that. If we use the *destination's* comparator to check for equivalent elements instead of the source's comparator (under the assumption that the destination order is usually the same as the source order), this would work.

Using the destination's comparator to perform this optimization is clever, and I believe it would work. This is much better than using the source's comparator when std::is_empty<SourceComparator> and std::is_same<SourceComparator, DestinationComparator> are both true.

But more generally, can we just provide the location of the most-recently-inserted element as an insertion hint and get this optimization as a special case of a more general optimization? But maybe that's actually what you were suggesting (the difference is hidden behind the "key in the source is not equivalent to the [other] key" comments -- equivalent according to which comparator?).

The insertion hint idea did cross my mind when I was thinking about this but I did not pursue it further. I think it may be a cleaner way to achieve what I'm suggesting.

erik.pilkington marked 13 inline comments as done.

Address review comments. Thanks!

We can only do that if the comparators divide elements into the same equivalence classes, which we cannot know in general. (We can do this optimization if we can prove the comparator is a pure stateless function, but we probably can't detect that -- even an empty class type could store its state outside the class.) However, if the source and destination have the same comparator, and that comparator is std::less<T>, then I think it's correct to do this optimization.

Hmm, actually, we can probably do this in more cases than that. If we use the *destination's* comparator to check for equivalent elements instead of the source's comparator (under the assumption that the destination order is usually the same as the source order), this would work.

Using the destination's comparator to perform this optimization is clever, and I believe it would work. This is much better than using the source's comparator when std::is_empty<SourceComparator> and std::is_same<SourceComparator, DestinationComparator> are both true.

But more generally, can we just provide the location of the most-recently-inserted element as an insertion hint and get this optimization as a special case of a more general optimization? But maybe that's actually what you were suggesting (the difference is hidden behind the "key in the source is not equivalent to the [other] key" comments -- equivalent according to which comparator?).

The insertion hint idea did cross my mind when I was thinking about this but I did not pursue it further. I think it may be a cleaner way to achieve what I'm suggesting.

I think the insertion hint idea is really slick. I tried it out, and it does provide some pretty significant improvements, particularly when merging multi-containers that have a lot of duplicate keys. I would probably prefer doing it in a follow-up though, rather than complicating this "just get it working naively" patch.

libcxx/include/__hash_table
1879

Sure, sorry, I was just trying to conform to libcxx's "no comments" rule ;)

1940

Sure, new patch calls this __existing_node

2303

Sure, I agree that simplifies this. The new patch hoists the ++__it out of the if.

2356

I don't think that __source.remove(__it).release(); ++it; works here because after __source.remove(__it), __it is no longer a valid iterator, so you can't increment it.

libcxx/include/__tree
2500

Do I understand correctly that __child != nullptr if and only if the key is already in *this, in which case we continue to skip the insertion (because that is a container with unique keys)?

Yep, exactly.

It looks like the "more correct" check to see whether the key already exists in *this would have been if (child == parent) -- is that correct?

__child == __parent and __child != nullptr should be equivalent. It looks like other users of __find_equal check __child != nullptr, why do you say that __child == __parent is "more correct"?

2502

Yep, I agree. New patch adds _NOEXCEPT.

2557

Yep, thats my understanding as well.

2562

Yep, I think you're analysis is right. Every (fancy) pointer conversion needs to go through void, but this doesn't. We actually do this a lot in this file. Do you think this actually matters in practice? Maybe @mclow.lists @EricWF have some thoughts?

libcxx/include/map
946

Yep, in order for m1.merge(m2) to work, someone needs access to both m1 and m2's __tree_s, which are currently private. On second thought, I think I'll just add the friend declarations, since it makes this a bit more explicit.

libcxx/test/std/containers/associative/map/map.modifiers/merge.pass.cpp
18

Sure, done in the new patch.

I think the insertion hint idea is really slick. I tried it out, and it does provide some pretty significant improvements, particularly when merging multi-containers that have a lot of duplicate keys. I would probably prefer doing it in a follow-up though, rather than complicating this "just get it working naively" patch.

Works for me.

ldionne accepted this revision.Oct 31 2018, 7:29 AM

LGTM after rewording the comment. Please do not forget to submit a follow-up for the optimization.

Thanks for the great work and for the patience in getting this reviewed.

libcxx/include/__hash_table
1356

Please reword this comment in terms of "assumes that rehashing has already occurred and that no element with the same key exists in the map" (or something like that). Don't assume people have called __node_insert_unique_prepare before -- this sort of comment can get stale.

2356

Good catch.

libcxx/include/__tree
2500

I found it matched the comment in __find_equal more closely, but that's subjective. Leave as-is.

2562

It probably doesn't matter in practice. My experience using fancy pointers is that none of the standard libraries work with them. Only Boost.Container does.

This revision is now accepted and ready to land.Oct 31 2018, 7:29 AM
erik.pilkington marked an inline comment as done.Oct 31 2018, 10:33 AM

Thanks for reviewing!

libcxx/include/__node_handle
29 ↗(On Diff #171864)

(FYI: I moved this to <version> in the final commit)

This revision was automatically updated to reflect the committed changes.
mstorsjo added inline comments.
libcxx/trunk/include/set
710 ↗(On Diff #171960)

This broke certain cases of compilation with mingw-w64 headers, e.g. cases if <random> was included before <set>.

The root cause turned out to be that the mingw-w64 header yvals.h contains #define _C2 1: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-headers/crt/yvals.h#l179

Similar versions of yvals.h from MSVC also contain a similar definition up to as recent as MSVC 2015 (no longer in 2017 though), where it is defined like this:

#define _C2                     1       /* 0 if not 2's complement */

Given this predecent and the fact that both libcxx and the other platform headers need to cooperate within the namespace of reserved identifiers starting with an underscore, would you consider using a different identifier here?

ldionne added inline comments.Nov 1 2018, 7:44 AM
libcxx/trunk/include/set
710 ↗(On Diff #171960)

Fixed in r345834.

mstorsjo added inline comments.Nov 1 2018, 1:24 PM
libcxx/trunk/include/set
710 ↗(On Diff #171960)

Thank you!