[libcxx][c++17] P0083R5: Splicing Maps and Sets
ClosedPublic

Authored by erik.pilkington on May 14 2018, 1:21 PM.

Details

Summary

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0083r3.pdf

This patch adds support for p0083, splicing maps and sets. This feature allows users to extract a container node from a unordered or associative container, then insert it into another container without having to copy the node. The container_type::node_type member is a smart pointer that holds an extracted node. This proposal also includes a merge member function that can extract and insert nodes from a source container without losing nodes if an exception is thrown (from, i.e., a comparator).

This implementation for this is pretty straightforward, most of the operations necessary to write this API are already part of __tree and __hash_table. In __hash_table, I split up the functions __node_insert_unqiue and __node_insert_multi into 2 parts in order to implement merge. The first part, __node_insert_{multi,unique}_prepare checks to see if a node already is present in the container (as needed), and rehashes the container. This function can forward exceptions, but doesn't mutate the node it's preparing to insert. the __node_insert_{multi,unique}_perform is noexcept, but mutates the pointers in the node to actually insert it into the container. This allows merge to call a _prepare function on a node while it is in the source container, without invalidating anything or losing nodes if an exception is thrown. Another quirk is the __generic_container_node_destructor templated struct, this is forward-declared (and used, in a dependent context) in <__node_handle> and specialized in <__tree> and <__hash_table>. This allows <__node_handle> to call into __hash_node_destructor and __tree_node_destructor without including those headers.

As far as testing goes, I added some tests and ran them with msan, ubsan, and asan. I also ran the libstdc++ tests for this with the same sanitizers on linux and macos.

I'm still pretty new to libc++, so take this patch with a grain of salt!
Thanks for taking a look,
Erik

Diff Detail

Repository
rCXX libc++
erik.pilkington edited the summary of this revision. (Show Details)May 14 2018, 1:23 PM
rsmith added a subscriber: rsmith.May 14 2018, 3:43 PM

How do you deal with the fact that node handles for map-like types need to be able to mutate a const member of a pair in-place? Are you assuming/hoping the compiler won't do bad things to the program as a result, or is there some reason why it would be unlikely to do so / disallowed from doing so? If you're just casting pair<const K, V> to / from pair<K, V> (and I think that's what's happening), there is a very real risk that struct-path-sensitive TBAA will decide that pair<const K, V>::first and pair<K, V>::first cannot alias (and likewise for ::second) and then miscompile code using libc++'s node pointers.

One way we could deal with this is by adding an attribute to the compiler to indicate "the const is a lie", that we can apply to std::pair::first, with the semantics being that a top-level const is ignored when determining the "real" type of the member (and so mutating the member after a const_cast has defined behavior). This would apply to all std::pairs, not just the node_handle case, but in practice we don't optimize on the basis of a member being declared const *anyway*, so this isn't such a big deal right now.

Another possibility would be a compiler intrinsic to change the type of the pair, in-place, from a std::pair<const K, V> to a std::pair<K, V>, without changing anything about the members -- except that the first member changes type.

Yet another possibility would be to only ever access the key field through a type annotated with __attribute__((may_alias)), and then launder the node pointer when we're ready to insert it back into the container (to "pick up" the new value of the const member). That's formally breaking the rules (a launder isn't enough, because we didn't actually create a new object, and in any case we still mutated the value of a const subobject), but it'll probably work fine across compilers that support the attribute.

libcxx/include/__node_handle
148 ↗(On Diff #146659)

using __base::__base; would be a much more reasonable way to write an inheriting constructor declaration here.

erik.pilkington marked an inline comment as done.

Use the simpler inherited constructor spelling @rsmith suggested.

One way we could deal with this is by adding an attribute to the compiler to indicate "the const is a lie", that we can apply to std::pair::first, with the semantics being that a top-level const is ignored when determining the "real" type of the member (and so mutating the member after a const_cast has defined behavior). This would apply to all std::pairs, not just the node_handle case, but in practice we don't optimize on the basis of a member being declared const *anyway*, so this isn't such a big deal right now.

I'm a fan of this idea, this would also have the bonus of fixing some pre-existing type-punning between pair<const K, V> and pair<K, V> (see __hash_value_type and __value_type in <unordered_map> and <map>).

Another possibility would be a compiler intrinsic to change the type of the pair, in-place, from a std::pair<const K, V> to a std::pair<K, V>, without changing anything about the members -- except that the first member changes type.

How could this work? It would still be possible for TBAA to assume that void m(std::pair<const K, V>*, std::pair<K, V>*);'s parameters don't alias when optimizing m, regardless of how we form the non-const pair, right?

Yet another possibility would be to only ever access the key field through a type annotated with __attribute__((may_alias)), and then launder the node pointer when we're ready to insert it back into the container (to "pick up" the new value of the const member). That's formally breaking the rules (a launder isn't enough, because we didn't actually create a new object, and in any case we still mutated the value of a const subobject), but it'll probably work fine across compilers that support the attribute.

I think we could fall back to this in cases where the attribute from idea (1) isn't available. There, we could mark std::pair as may_alias and still fix __hash_value_type and __value_type. This would pessimize std::pair, but only for "older" compilers.

So unless I've gotten this all wrong, I'll write a patch to support that new attribute and come back to this patch if/when that lands. Sound good?

Thanks for the feedback!
Erik

One way we could deal with this is by adding an attribute to the compiler to indicate "the const is a lie", that we can apply to std::pair::first, with the semantics being that a top-level const is ignored when determining the "real" type of the member (and so mutating the member after a const_cast has defined behavior). This would apply to all std::pairs, not just the node_handle case, but in practice we don't optimize on the basis of a member being declared const *anyway*, so this isn't such a big deal right now.

I'm a fan of this idea, this would also have the bonus of fixing some pre-existing type-punning between pair<const K, V> and pair<K, V> (see hash_value_type and value_type in <unordered_map> and <map>).

This was supposed to be taken care of by [class.mem]/21 (common initial sequence), but that foundered on "standard-layout".

One way we could deal with this is by adding an attribute to the compiler to indicate "the const is a lie", that we can apply to std::pair::first, with the semantics being that a top-level const is ignored when determining the "real" type of the member (and so mutating the member after a const_cast has defined behavior). This would apply to all std::pairs, not just the node_handle case, but in practice we don't optimize on the basis of a member being declared const *anyway*, so this isn't such a big deal right now.

I'm a fan of this idea, this would also have the bonus of fixing some pre-existing type-punning between pair<const K, V> and pair<K, V> (see hash_value_type and value_type in <unordered_map> and <map>).

This was supposed to be taken care of by [class.mem]/21 (common initial sequence), but that foundered on "standard-layout".

Also, the common initial sequence rule only allows loads through the wrong type, not stores, and in any case does not alter the rule that modifying a const object results in UB.

One way we could deal with this is by adding an attribute to the compiler to indicate "the const is a lie", that we can apply to std::pair::first, with the semantics being that a top-level const is ignored when determining the "real" type of the member (and so mutating the member after a const_cast has defined behavior). This would apply to all std::pairs, not just the node_handle case, but in practice we don't optimize on the basis of a member being declared const *anyway*, so this isn't such a big deal right now.

Would you mind elaborating on this? If we don't optimize on a member being const, then what should this attribute actually do? The only thing I can think of is making a field with type const T with the attribute claim it's a T to TBAA, but that would be a no-op because loads and stores through those types are compatible anyways.

Yet another possibility would be to only ever access the key field through a type annotated with __attribute__((may_alias)), and then launder the node pointer when we're ready to insert it back into the container (to "pick up" the new value of the const member). That's formally breaking the rules (a launder isn't enough, because we didn't actually create a new object, and in any case we still mutated the value of a const subobject), but it'll probably work fine across compilers that support the attribute.

What about instead of type-punning between pair<const K, V> and pair<K, V> (with may_alias) we just did a const_cast in key, and launder the pair when finished? Because loads and stores to const K and K are already assumed to alias each other, we should be able to omit the may_alias attribute (right?). Would that be enough to placate the compiler? If it is, then I think we should just do this and avoid adding another attribute to clang. I'll upload a patch with this to show what I mean.

Thanks!
Erik

make __map_node_handle use a const_cast on the key instead of type-punning between pair<const K, V> and pair<K, V>. Add calls to std::launder in key() and before inserting a node handle.

Add _LIBCPP_ASSERT calls to all the entry points, friendly ping!

make __map_node_handle use a const_cast on the key instead of type-punning between pair<const K, V> and pair<K, V>. Add calls to std::launder in key() and before inserting a node handle.

Can you do this as a separate change, prior to adding node_handle? I would in particular expect this to result in the removal of the union __value_type from <map>.

Update this patch to depend on https://reviews.llvm.org/D47607, which fixes __value_type in map and unordered_map. Thanks!

EricWF added inline comments.May 31 2018, 11:05 PM
libcxx/include/__hash_table
2261 ↗(On Diff #149362)

If I'm not mistaken, __node_handle_extract_unique and __node_handle_extract_multi have the exact same implementation. This is intentional, no? If so can't we just use one implementation?

libcxx/include/__node_handle
1 ↗(On Diff #149362)

This header needs to be added to the module.modulemap.

82 ↗(On Diff #149362)

I'm not sure we should by trying to move a pointer. It's misleading, and we shouldn't expect it to act any differently.

118 ↗(On Diff #149362)

Missing _LIBCPP_NODISCARD_AFTER_CXX17

libcxx/test/std/containers/associative/set/merge.pass.cpp
20 ↗(On Diff #149362)

Please include test_macros.h directly.

erik.pilkington marked 5 inline comments as done.

Address @EricWF's comments.

libcxx/include/__hash_table
2261 ↗(On Diff #149362)

Yep, good point. The new patch just has __node_handle_extract.

Ping! If it'd make this easier to review, I'd be happy to split this up a bit.

Split this up. I'm moving the merge stuff to a follow-up, that makes this diff a lot easier to read.

I'm working on this! I'll be taking what is hopefully a final pass at the tests today.

ldionne requested changes to this revision.Jul 11 2018, 2:08 PM

Generally looks pretty good, but I have a couple questions/suggestions.

libcxx/include/__hash_table
2165 ↗(On Diff #153964)

When a function is declared with a visibility macro (_LIBCPP_INLINE_VISIBILITY in this case), I think it is customary in libc++ to repeat the visibility macro on the definition as well. If that's correct, apply here and to other similar functions below, and also in __tree.

2212 ↗(On Diff #153964)

Just a quick Q: am I mistaken or remove() is not part of the unordered_map API? We're not naming it __remove() just because we already have other functions called remove(), so any macro redefining remove would already break things -- is that the idea?

libcxx/include/__node_handle
33 ↗(On Diff #153964)

Clever way of reducing header dependencies. I like it.

36 ↗(On Diff #153964)

I'd like to suggest a different approach for implementing the node handles for sets and maps. I believe this would simplify things slightly:

template <class _NodeType, class _Alloc, template <class...> class _MapOrSetSpecifics>
class __basic_node_handle 
    : public _MapOrSetSpecifics<_NodeType, __basic_node_handle<_NodeType, _Alloc, _MapOrSetSpecifics>>
{
    // same as right now
};

Then, you can get the two different node handles by simply providing the methods in the base class:

template <class _NodeType, class _Derived>
struct __map_node_handle_specifics {
    using key_type = typename _NodeType::__node_value_type::key_type;
    using mapped_type = typename _NodeType::__node_value_type::mapped_type;

    _LIBCPP_INLINE_VISIBILITY
    key_type& key() const
    {
        return static_cast<_Derived const*>(this)->__ptr_->__value_.__ref().first;
    }

    _LIBCPP_INLINE_VISIBILITY
    mapped_type& mapped() const
    {
        return static_cast<_Derived const*>(this)->__ptr_->__value_.__ref().second;
    }
};

template <class _NodeType, class _Derived>
struct __set_node_handle_specifics {
    using value_type = typename _NodeType::__node_value_type;

    _LIBCPP_INLINE_VISIBILITY
    value_type& value() const
    {
        return static_cast<_Derived const*>(this)->__ptr_->__value_;
    }
};

template <class _NodeType, class _Alloc>
using __map_node_handle = __basic_node_handle<_NodeType, _Alloc, __map_node_handle_specifics>;

template <class _NodeType, class _Alloc>
using __set_node_handle = __basic_node_handle<_NodeType, _Alloc, __set_node_handle_specifics>;

This is a classic application of the CRTP. I believe it would reduce the amount of boilerplate currently required for special member functions. It's possible that you thought of this and realized it didn't work for a reason I'm missing right now, too, but the idea seems to work on the surface: https://wandbox.org/permlink/nMaKEg43PVJpA0ld.

You don't have to do this, but please consider it if you think it would simplify the code.

libcxx/include/map
915 ↗(On Diff #153964)

I believe this function should be marked with _LIBCPP_INLINE_VISIBILITY -- even though in practice the compiler will probably always inline it.

Also, you don't seem to be using that function at all in the diff -- is that right?

So please either mark with _LIBCPP_INLINE_VISIBILITY and use it, or remove it altogether. This comment applies to all 8 containers to which similar methods were added.

libcxx/include/set
399 ↗(On Diff #153964)

Is this forward declaration necessary?

libcxx/include/unordered_set
569 ↗(On Diff #153964)

Why don't you use

return __table.template __node_handle_insert_unique<node_type>(__hint, _VSTD::move(__nh));

instead? It seems like this way, if __node_handle_insert_unique(const_iterator, node_type) eventually starts taking advantage of the iterator hint, you'd get the improvement here for free.

1069 ↗(On Diff #153964)

For the other containers, the two insert methods come before the two extract methods. This is very nitpicky, but you should consider keeping the same order here too for consistency.

libcxx/test/std/containers/unord/unord.map/unord.map.modifiers/insert_extract.pass.cpp
14 ↗(On Diff #153964)

Please be more specific about which functions you're testing. I find it useful to look at those comments to quickly see what exact signatures are being tested when I'm looking for tests that exercice some specific piece of code.

Actually, perhaps you should extract those tests into multiple files:
unord.map/unord.map.modifiers/insert.node_handle.pass.cpp => test overload of insert on node_type
unord.map/unord.map.modifiers/insert.hint_node_handle.pass.cpp => test overload of insert with a hint and a node_type
unord.map/unord.map.modifiers/extract.node_handle.pass.cpp => test overload of extract on node_type
unord.map/unord.map.modifiers/extract.iterator.pass.cpp => test overload of extract on iterator
unord.map/insert_return_type.pass.cpp => test insert_return_type
unord.map/node_type.pass.cpp => test node handle operations

I believe this would be more consistent with the way things are currently organized for other functions. This comment applies to all containers touched by this review.

23 ↗(On Diff #153964)

Consider using using XXX = YYY consistently.

33 ↗(On Diff #153964)

You don't need spaces between consecutive >'s since those tests are not supported in C++03. Feel free to keep them or remove them, I'm just pointing it out.

99–101 ↗(On Diff #153964)

Why is this test useful?

This revision now requires changes to proceed.Jul 11 2018, 2:08 PM

I forgot to mention it in my initial review, but it also seems like you forgot to update all the synopsis in the comments at the beginning of headers.

erik.pilkington marked 12 inline comments as done.

Address @ldionne's comments.

libcxx/include/__hash_table
2165 ↗(On Diff #153964)

I can't say I know libc++'s policy here either, but it couldn't hurt, so I added the attributes to the new patch. Maybe @EricWF or @mclow.lists could weigh in here?

2212 ↗(On Diff #153964)

Yep, I think that was the thinking. IMO it would be nice to just call it __remove to avoid any confusion.

libcxx/include/__node_handle
36 ↗(On Diff #153964)

Oh, neat! That's a great idea, I didn't even consider CRTP here for some reason. Makes this simpler though. I added this to the new patch.

libcxx/include/map
915 ↗(On Diff #153964)

Oh, sorry! This was used in the implementation of merge, which I moved to a follow-up. I'll sink it down there and add the visibility attribute.

libcxx/include/set
399 ↗(On Diff #153964)

No, nice catch! This was also supposed to be a part of the follow-up, I'll move it there.

libcxx/test/std/containers/unord/unord.map/unord.map.modifiers/insert_extract.pass.cpp
14 ↗(On Diff #153964)

Okay, the new patch divides up the tests for each container into 4 files: extract_iterator.pass.cpp, extract_key.pass.cpp, insert_node_type.pass.cpp, and insert_node_type_hint.pass.cpp. I decided to move the insert_return_type and node_type testing to container.node/node_handle.pass.cpp because they don't really depend on any specific container, and it matches how the standard defines these.

23 ↗(On Diff #153964)

done!

33 ↗(On Diff #153964)

Sure, sorry! clang-format adds these in because it isn't smart enough to know when we're in a C++11 context.

99–101 ↗(On Diff #153964)

I guess it isn't really, it was more of a sanity check.

ldionne added inline comments.Jul 13 2018, 9:33 AM
libcxx/include/map
43 ↗(On Diff #155276)

You are missing // since C++17 comments here and elsewhere.

erik.pilkington marked an inline comment as done.

Add // C++17 to the header comments.

ldionne accepted this revision.Jul 13 2018, 12:56 PM

LGTM -- great! Just make sure the commit message does not pretend the change includes merge.

This may be obvious, but my LGTM alone should probably not be enough to submit.

This revision is now accepted and ready to land.Jul 13 2018, 12:56 PM

One more thing -
When you add a new header file, you need to update include/module.modulemap, test/libcxx/double_include.sh.cpp, and include/CMakeLists.txt.
Take a look at D49338 for an example.

Address @mclow.lists's comment. Thanks!

One more thing -
When you add a new header file, you need to update include/module.modulemap, test/libcxx/double_include.sh.cpp, and include/CMakeLists.txt.
Take a look at D49338 for an example.

Are you sure we have to update test/libcxx/double_include.sh.cpp? Looks like that file only has public headers, but this patch only adds __node_handle, which is internal. The new patch adds this header to include/CMakeLists.txt though (__node_handle already appeared in module.modulemap)

mclow.lists accepted this revision.Jul 31 2018, 4:09 PM

One more thing -
When you add a new header file, you need to update include/module.modulemap, test/libcxx/double_include.sh.cpp, and include/CMakeLists.txt.
Take a look at D49338 for an example.

Are you sure we have to update test/libcxx/double_include.sh.cpp? Looks like that file only has public headers, but this patch only adds __node_handle, which is internal. The new patch adds this header to include/CMakeLists.txt though (__node_handle already appeared in module.modulemap)

D'oh! You're right. I didn't see that you had a patch for module.modulemap.
Sorry for the noise.

Go ahead and commit (and thanks for doing this!)

This revision was automatically updated to reflect the committed changes.