This is an archive of the discontinued LLVM Phabricator instance.

unordered_map: Avoid unnecessary mallocs when no insert occurs
AbandonedPublic

Authored by dexonsmith on Jan 20 2016, 9:43 AM.

Details

Reviewers
mclow.lists
Summary

(Repost of:
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160118/147379.html
on Phabricator as requested by Eric.)

This is a follow-up to r239666: "Fix PR12999 - unordered_set::insert
calls operator new when no insert occurs". That fix didn't apply to

unordered_map because unordered_map::value_type gets packed inside:

union __value_type {

pair<key_type, mapped_type> __nc;       // Only C++11 or higher.
pair<const key_type, mapped_type> __cc; // Always.
// Constructors...

};

and the underlying hash_table only knows about value_type.

This patch should avoid unnecessary mallocs whenever the caller passes
in a pair<T, U> such that T is trivially convertible to key_type.

Since __hash_table's value_type is really *never* a pair (for
unordered_map, it's a union of two pairs) the static dispatch is all in
unordered_map. It's doing this:

  • If the argument isn't a pair<>, alloc.
  • If argument.first can be referenced as const key_type&, don't alloc.
  • If argument.first can be trivially converted to key_type, don't alloc.
  • Else alloc.

In the pre-C++11 world the caller has already converted to
unordered_map::value_type. We can always avoid the alloc.

To support all of this:

  • In C++03, unordered_map_equal and unordered_map_hasher need to handle unordered_map::value_type.
  • In C++03, hash_table::insert_unique_value() now takes its argument by template.
  • In C++11, hash_table::insert_unique_value() is now a one-liner that forwards to __insert_unique_key_value() for the real work.
  • The versions of hash_table::construct_node() that take a pre-computed hash have been renamed to __construct_node_hash(), and these versions use perfect forwarding.

This is one of my first patches for libc++, so I may need some extra
guidance:

  • Did I successfully match the coding style? (I'm kind of lost without clang-format TBH.)
  • Should I separate the change to __construct_node_hash() into a separate prep commit? (I would if this were LLVM, but I'm not sure if the common practice is different for libc++.)

After this I'll fix the same performance issue in std::map (and I
assume std::set?).

Diff Detail

Event Timeline

dexonsmith updated this revision to Diff 45402.Jan 20 2016, 9:43 AM
dexonsmith retitled this revision from to unordered_map: Avoid unnecessary mallocs when no insert occurs.
dexonsmith updated this object.
dexonsmith added a reviewer: EricWF.
dexonsmith added a subscriber: cfe-commits.
EricWF edited edge metadata.Jan 21 2016, 11:12 PM

Overall the patch looks good but I have a few concerns.

  • If argument.first can be trivially converted to key_type, don't alloc.

I'm concerned with this part of the change because:

  • The is_trivially_* traits are often not available and can sometimes blow up.
  • It also makes the implementation of __insert_extract_key_if_pair a lot more complicated.
  • It is technically non-standard because the standard says "insert(P&&)` should be handled as-if "emplace(P&&").

I would like to put a lot more thought into this part of your patch. Could you remove it and add it in a follow up?

I would also love if we applied this optimization to single argument "emplace" calls.

include/__hash_table
867

I really don't like how the name and signature of the function changes between C++03 and C++11. It's confusing me a lot while reviewing the patch. I would much rather it always accept two parameters and be called __insert_unique_key_value.

We can add a C++03 __insert_unique_value that dispatches like you do above if needed.

include/unordered_map
403

This part LGTM so long as instantiating _Cp in the function signature doesn't prevent unordered_map from being used with incomplete types.

1017

We don't want to decay because we don't want "array-to-pointer" and "function-to-pointer" conversions. We just want to remove the cv and ref qualifiers.

I *just* added a "__uncvref"trait in <type_traits>. Please use that instead.

On second thought, I don't think we want to remove the volatile qualifiers.

1022

I'm pretty concerned about the trivially_constructible part of the patch/optimization. Can we handle this in a follow up revision instead?

  • Did I successfully match the coding style? (I'm kind of lost without clang-format TBH.)

The style looks pretty good. I'll comment on any nits I have.

  • Should I separate the change to __construct_node_hash() into a separate prep commit? (I would if this were LLVM, but I'm not sure if the common practice is different for libc++.)

Yes please!

EricWF added a subscriber: mclow.lists.

Adding @mclow.lists as a reviewer.

I'll upload a new patch in a moment. Replies inline below.

dexonsmith updated this revision to Diff 45758.Jan 22 2016, 3:04 PM
dexonsmith edited edge metadata.

Committed r258511 and r258575 as preps. Updated patch addresses review comments (and skips the trivially constructible parts).

Alright, so this is looking really good. Once we land this would you be interested in applying the same optimization to emplace calls?

include/__hash_table
863

Woops, I know this was my suggestion, but I don't see any reason to have this function.

Please remove __insert_unique_value. Callers can call __insert_unique_key_value directly.

875

Please use the following signature for this overload:

template <class _KeyLike, class _ValueTp>
pair<iterator, bool> __insert_unique_key_value(const _KeyTp& __key, const _ValueTp& __x);

Using _ValueTp twice in C++03, but _KeyTp in C++11 is confusing.
I understand that in C++03 _KeyTp should always be _ValueTp but it took me 10 minutes to realize that.

include/unordered_map
401

Is there a reason your hiding these overloads in C++11 and beyond?

I would prefer as few conditional compilation blocks as possible.

1004

Overall the dispatch looks good, but it could be a lot simpler. Instead of multi-level dispatch I would write a single trait to computer if "_Pp" is a keylike pair. Then insert(_Pp&&) just use that to call __insert_extract_key_if_referenceable directly using that trait.

For example:

template <class _Key, class _Pair, class _RawPair = typename __uncvref<_RawPair>::type>
struct __is_keylike_pair : false_type {};

template <class _Key, class _Pair, class _First, class _Second>
struct __is_keylike_pair<_Key, _Pair, pair<_First, _Second> > : is_same<typename remove_const<_First>::type, _Key> {};

We also should handle pair<...>& which you previously didn't have.

Great, I should have time to clean this up tomorrow afternoon.

Regarding emplace, this patch as is has tests for emplace, but
they depend on r258575 being in tree. You asked me to revert
that... I'll wait for your response over in that thread:
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160118/147795.html

I have a few other questions inline below.

Great, I should have time to clean this up tomorrow afternoon.

Regarding emplace, this patch as is has tests for emplace, but
they depend on r258575 being in tree. You asked me to revert
that... I'll wait for your response over in that thread:
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160118/147795.html

Lets talk about that here to keep it together.

I agree with your approach where insert(_Pp&&) dispatchs to emplace(). The single argument emplace call should then call __insert_extract_key_if_pair.

mclow.lists edited edge metadata.Jan 25 2016, 7:54 AM

I don't have any comments on this code at this time, but I want to caution people that this part of the standard library is extremely carefully specified, and meeting all the requirements is a fiddly bit of work.

For an example of this, look at LWG issue #2464, which has been added to the draft C++17 standard.

dexonsmith updated this revision to Diff 46603.Feb 1 2016, 5:52 PM
dexonsmith edited edge metadata.

Eric, I think this addresses all of your review comments.

There are a couple of prep NFC commits that I sent for review:
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160201/148661.html
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160201/148662.html

What's changed here:

  • insert() now calls emplace() -- this could be split out (it is locally).
  • Renamed hash_table::insert_unique_value to __insert_unique_key_value and updated callers -- this could also be split out (it is locally).
  • Moved the dispatch logic to emplace().
  • Changed the dispatch logic to use a type trait (although it's still multi-stage to find the number of arguments).
  • Removed the unnecessary #ifdefs in the hasher (etc.).

Marshall, thanks for the link to #2464. That does look scary.

Nevertheless, I think this -- and the other over-eager allocations
in {unordered_,}{multi,}{map,set} -- are worth optimizing as long
as we can do it in a standards-compliant way. It's a pretty serious
regression in performance compared to expectations from C++03's
insert().

Let me know if you can think of some testing I can add to be sure
I'm not breaking anything (maybe test/std/.../ doesn't have enough
coverage?).

dexonsmith updated this revision to Diff 47936.Feb 14 2016, 3:00 PM

Updated the patch to apply against r260513. The logic is much simpler now; thanks Eric for the cleanup.

Let me know if there's anything else to do!

Overall this patch is *almost* there. My only objection is that this optimization neglects "unordered_set". Optimally we would also catch "unordered_set<int>.emplace(42)"

include/__hash_table
79

The traits LGTM.

860

It took me a minute to understand why this doesn't break std::unordered_set<std::pair<T, U>>, could you make either the trait name, or implementation more explicit about the fact that it only ever returns true for "unordered_map"?

test/libcxx/containers/unord/unord.map/insert_dup_alloc.pass.cpp
24

I don't think we need to test 100 different values for the mapped type. Also you should consider using 'DisableAllocationGuard' instead of 'globalMemCounter` directly. For example:

Container c;
c.insert(std::make_pair(1, 1)); // Don't worry about the allocation behavior here.
{
  DisableAllocationGuard g;
  // test duplicates that should not allocate in this scope.
}
30

Please use "TEST_STD_VER >= 11" instead. The macro is defined it "test_macros.h".

@dexonsmith Actually is it OK if I contribute the tests for this patch? Your's are in no way bad, but I want to test this in a similar manner to unord.map.modifiers/insert_allocator_requirments.pass.cpp and it's not fair to ask you to do that.

However if your willing to learn how to use support/container_test_types.h that even better :-)

Honestly I didn't think much about the testing style. I used r239666
as a reference and modified it for std::unordered_map.

I'll redo the tests when I have a moment, based on the newer examples.
Probably next week some time?

dexonsmith updated this revision to Diff 50514.Mar 11 2016, 7:27 PM

A few updates:

  • Rebased on top of http://reviews.llvm.org/D18115.
  • Rewrote the test to match suggested style.
  • Updated the code to catch unordered_set::emplace as well, using a can_extract_key trait and a companion extract_key functor.

Sorry for the latency reposting; let me know how this looks.

No worries, I'll try and take a look this weekend. Thanks.

Adding inline comments for the implementation. Comments on the tests to follow shortly.

include/__hash_table
79

Could you make __extract_key behave the same way as __can_extract_key where we apply the __uncvref<ValTy> instead of expecting the user to?

86

Assuming we can't be passed volatile types (I'll double check that). Then we should just use is_same<_RawValTy, _Key>

89

Please keep the __can_extract_key and __extract_key specializations together.

100

Add a newline before this class definition.

dexonsmith updated this revision to Diff 50869.Mar 16 2016, 1:59 PM

Eric and I had a quick chat on IRC.

  • The asymmetric usage of extract_key and can_extract_key was awkward, as Eric pointed out already.
  • However, a symmetric __extract_key would have caused Clang (and other compilers) to IRGen extra copies of operator().
  • Eric suggested using a custom tag type (instead of true_type/false_type) to avoid the problem entirely.

I implemented that in this updated patch, and it's way cleaner. I'm using:

  • __extract_key_fail_tag: cannot extract key.
  • __extract_key_self_tag: use the argument itself as the key.
  • __extract_key_first_tag: use ".first" on the argument to grab the key.
dexonsmith updated this revision to Diff 50883.Mar 16 2016, 3:33 PM

Eric sent me his preferred tests, which look fine to me. I've applied them to this patch.

EricWF accepted this revision.Mar 16 2016, 3:40 PM
EricWF edited edge metadata.

LGTM after change in inline comment.

include/__hash_table
90

>> needs to be > > for C++03.

This revision is now accepted and ready to land.Mar 16 2016, 3:40 PM
EricWF resigned from this revision.Apr 19 2016, 7:28 PM
EricWF removed a reviewer: EricWF.

This is done.

This revision now requires review to proceed.Apr 19 2016, 7:28 PM
dexonsmith abandoned this revision.Oct 6 2020, 2:58 PM