Page MenuHomePhabricator

P1144 "Trivially relocatable" (3/3): optimize std::vector and std::swap for trivially relocatable types
Needs ReviewPublic

Authored by Quuxplusone on Sep 12 2019, 2:51 PM.

Details

Summary

Part 0/3 is D50119: it adds __has_extension(__is_trivially_relocatable) to the compiler.
Part 1/3 is D61761: it implements the library traits and algorithms mandated by P1144.
Part 2/3 is D63620: it implements Quality-of-Implementation features to warrant certain std library types as "trivially relocatable."
Part 3/3 is D67524: it implements Quality-of-Implementation features to optimize certain std library functions for better performance on types that have been warranted "trivially relocatable."

This patch has no ABI implications (or if it does, that's a bug, please tell me about it).

Notes on optimizations that I've made for https://p1144.godbolt.org/z/q2N_Ln but which are not part of this patch:

  • std::any can be made trivially relocatable, by changing _IsSmallObject to be false for non-trivially-relocatable types. However, that would break ABI.

Diff Detail

Repository
rCXX libc++

Event Timeline

Quuxplusone created this revision.Sep 12 2019, 2:51 PM

Note: std::is_constant_evaluated() is implemented and you can use __libcpp_is_constant_evaluated before C++2a.

I'm going to play around with this patch (and the others) a bit. Then, I can give a more full review.

include/memory
1614

Maybe leave a comment about what the various overloads mean.

Also, maybe add a default value (in case we want to use this outside of std::vector in the future).

5760

Nit: it might be less confusing to just use ::allocator_type.

5770

The more we can use _And the better. That way, at some point in the future, we can create an __and builtin type trait and improve build performance a bit.

include/vector
479

We don't need to do this anymore! Every compiler that we support supports rvalues (I think).

Quuxplusone marked 3 inline comments as done.Sep 12 2019, 7:32 PM
Quuxplusone added inline comments.
include/memory
1614

Defaulted function arguments are the devil.
I consider it a benefit that now it's impossible to call __construct_forward without explicitly saying whether you want to use memcpy or not; this is because the call to __construct_forward will usually be followed by a call to __destroy_at_end, and the final parameter to both calls must agree (or else you might call a constructor without the matching destructor, or vice versa).

Adding a comment is a good idea, but I'm not sure the best place to put it. Incidentally, in a previous revision I'd had something like

template<class X, class UseMemcpy>
static void foo(X x, UseMemcpy) { ... }
template<class X>
static void foo(X x, false_type) { ... }

where the first overload matches true_type; but that was unnecessary, error-prone, and not really any less confusing, so I trashed it.

5760

This is actually doing SFINAE on the value of __is_uncustomized — which in user-defined specializations of allocator_traits will either be absent (if they went from scratch) or have the wrong value (if they derived from allocator_traits<allocator<T>> to fill in the boilerplate).

The __has_trivial_construct part of this patch is a watered-down version of D49317. For now and maybe forever, libc++ wisely doesn't permit users to specialize allocator_traits. (See https://quuxplusone.github.io/blog/2018/07/14/traits-classes/ .)
So I guess it would be fine for me to eliminate __has_customized_allocator_traits from this patch.

include/vector
479

I think you're right, but then I'm confused by the similar code on the return type of move_if_noexcept (in trunk). What does _Tp&& mean, in C++03 mode? Is it a synonym for const _Tp&? If so, why does move_if_noexcept need that ifndef?

template <class _Tp>
inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
#ifndef _LIBCPP_CXX03_LANG
typename conditional
<
    !is_nothrow_move_constructible<_Tp>::value && is_copy_constructible<_Tp>::value,
    const _Tp&,
    _Tp&&
>::type
#else  // _LIBCPP_CXX03_LANG
const _Tp&
#endif
move_if_noexcept(_Tp& __x) _NOEXCEPT
{
    return _VSTD::move(__x);
}
zoecarver added inline comments.Sep 13 2019, 9:38 AM
include/vector
479

Clang supports rvalues in C++03 mode. So it will work just like in C++11 mode. move_if_noexcept probably just hasn't been updated. Example: https://godbolt.org/z/L2CJQa

Address most of @zoecarver's review comments. (Haven't looked into _And yet.)
Since __libcpp_is_constant_evaluated() is usable, use it in std::swap. (This passes all the test cases for swap, including the constexpr ones! Hooray!)

Quuxplusone retitled this revision from P1144 "Trivially relocatable" (3/3): optimize std::vector for trivially relocatable types to P1144 "Trivially relocatable" (3/3): optimize std::vector and std::swap for trivially relocatable types.Sep 13 2019, 4:24 PM
Quuxplusone edited the summary of this revision. (Show Details)
EricWF added a subscriber: EricWF.Sep 13 2019, 9:16 PM

How were the performance benefits measured? Are they visible in the existing benchmarks?

Add a new benchmark, and add a TriviallyRelocatableType to the matrix in algorithms.bench.cpp.

My expectation is that swap.bench.cpp should get noticeably faster after this patch.

My less confident expectation is that the new .*trivre.* test cases in algorithms.bench.cpp should initially be noticeably worse than the .*string.* cases, but after this patch they should catch up. What's happening there is that when you swap strings, you're calling string's own ADL overload of swap which is very fast, but when you swap TriviallyRelocatableTypes, you're calling the generic std::swap<TriviallyRelocatableType> which is not quite so fast until this patch makes it do memcpy.

I'm putting the benchmarks right here in this patch so that @EricWF can run them. (Zoe and I produced some good-looking benchmark numbers earlier tonight, with code similar to this code; but my build tree is too screwed up to reproduce them on demand.)

Fix a bug in my vector::insert; it wasn't exception-safe.

I'm putting the benchmarks right here in this patch so that @EricWF can run them. (Zoe and I produced some good-looking benchmark numbers earlier tonight, with code similar to this code; but my build tree is too screwed up to reproduce them on demand.)

@EricWF ping!

I'm putting the benchmarks right here in this patch so that @EricWF can run them. (Zoe and I produced some good-looking benchmark numbers earlier tonight, with code similar to this code; but my build tree is too screwed up to reproduce them on demand.)

@EricWF ping!

Rebased on master, and fixed an exception-safety bug in my patch to vector::insert/erase.

Quuxplusone marked 4 inline comments as done.Oct 29 2019, 2:35 PM

@EricWF ping!