This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Optimize vectors construction of trivial types from an iterator range with const-ness mismatch.
ClosedPublic

Authored by vsapsai on Jun 19 2018, 5:29 PM.

Details

Summary

We already have a specialization that will use memcpy for construction
of trivial types from an iterator range like

std::vector<int>(int *, int *);

But if we have const-ness mismatch like

std::vector<int>(const int *, const int *);

we would use a slow path that copies each element individually. This change
enables the optimal specialization for const-ness mismatch. Fixes PR37574.

rdar://problem/40485845

Diff Detail

Repository
rL LLVM

Event Timeline

vsapsai created this revision.Jun 19 2018, 5:29 PM

Related change is https://reviews.llvm.org/D8109 For me the performance improvement was a twice faster execution on a dirty benchmark that doesn't exclude set up and tear down.

Hi Volodymyr, thanks for working on this!

libcxx/include/memory
1479 ↗(On Diff #152002)

Shouldn't this be true_type?

1673–1677 ↗(On Diff #152002)

I'm not sure if this is correct. Previously, we only selected this overload if the allocator didn't have a construct() member, or if it was a std::allocator, in which case we know construct() just called in-place new. With this patch, we would select this overload for a custom allocator that overrode construct() so long as the value_types matched. I think the right behaviour for the custom allocator case would be to use the construct() member instead of memcpying.

vsapsai planned changes to this revision.Jun 26 2018, 11:48 AM
vsapsai added inline comments.
libcxx/include/memory
1479 ↗(On Diff #152002)

I see this is confusing and I'm still struggling how to express it. The issue is that in C++03 __has_construct should be something like unknown, so that neither __has_construct nor ! __has_construct evaluate to true because we don't really know if allocator has construct. This case is covered by the added test, in C++03 the memcpy specialization was enabled when

is_same<allocator_type, allocator<_Tp> >
  || !false_type

So is_same check had no effect and we were using memcpy to convert between int and float.

I was considering using something like

        typename enable_if
        <
            (is_same
             <
                typename _VSTD::remove_const<typename allocator_type::value_type>::type,
                typename _VSTD::remove_const<_SourceTp>::type
             >::value
#ifndef _LIBCPP_CXX03_LANG
                || !__has_construct<allocator_type, _DestTp*, _SourceTp>::value
#endif
            ) &&
             is_trivially_move_constructible<_DestTp>::value,
            void
        >::type

But that doesn't look readable to me, so I've introduced ad-hoc ternary logic with __has_construct_missing.

1673–1677 ↗(On Diff #152002)

Thanks for pointing out the case with custom allocator. My expectation is that it should use construct() instead of memcpy, I agree with you. Will check further and plan to add a test.

vsapsai updated this revision to Diff 152990.Jun 26 2018, 4:45 PM
  • Don't use memcpy specialization with custom allocators.

Not entirely satisfied with comparing allocator to non-const and const, don't
know if there is a more elegant way to do the same.

libcxx/include/memory
1479 ↗(On Diff #152002)

Oh I see, yikes! That's a pretty bad bug. I agree that this fix is best then, but can you add a comment explaining this to __has_construct_missing for future casual readers? Also, I think we should move the __has_construct_missing bugfix into a different (prerequisite) patch. Seems unrelated to the const related optimization below.

vsapsai added inline comments.Jun 27 2018, 11:30 AM
libcxx/include/memory
1479 ↗(On Diff #152002)

The bug as I described isn't really present now because function signature

__construct_range_forward(allocator_type&, _Tp* __begin1, _Tp* __end1, _Tp*& __begin2)

works as implicit is_same for __begin1 and __begin2 types. I think it is worth fixing separately and there is a bug with C++03 and custom allocators.

vsapsai updated this revision to Diff 153430.Jun 28 2018, 5:24 PM
  • Don't check !__has_construct for __construct_range_forward.

Incompatible types will cause a lack of construct but it doesn't mean we
should use memcpy instead. And missing _Alloc::construct doesn't mean
alloc_traits::construct will fail, so falling back on memcpy can be
premature.

vsapsai updated this revision to Diff 153431.Jun 28 2018, 5:30 PM

Try to exclude changes available in parent review from this review.

vsapsai added inline comments.Jun 28 2018, 5:40 PM
libcxx/include/memory
1479 ↗(On Diff #152002)

Instead of __has_construct_missing I've implemented real __has_construct in D48753. But it is stricter in C++03 than in C++11 and later. So it made me think that absence of construct with exact signature isn't a good reason to use memcpy.

I want to point out that this code (not -necessarily- this patch, but where it lives) needs to be rewritten.

There is no prohibition on users specializing allocator_traits for their allocators, and yet libc++'s vector depends on the existence of allocator_traits<Allocator>::__construct_range_forward (among other things).

I want to point out that this code (not -necessarily- this patch, but where it lives) needs to be rewritten.

There is no prohibition on users specializing allocator_traits for their allocators, and yet libc++'s vector depends on the existence of allocator_traits<Allocator>::__construct_range_forward (among other things).

Marshall, do you agree to have such rewrite to be done separately? It seems to me fairly involved and I don't want it to block other changes. And for this change I suggest observing how it develops. If it is fairly small and isolated, I think it would be useful to have it before the more substantial allocator_traits rewrite.

vsapsai planned changes to this revision.Jul 2 2018, 12:21 PM
vsapsai added inline comments.
libcxx/include/memory
1479 ↗(On Diff #152002)

I was wrong. Now I think the logic for using memcpy should be

if types are the same modulo constness
and
(
    using default allocator
    or
    using custom allocator without `construct`
)
and
is_trivially_move_constructible

The purpose of the allocator check is to cover cases when static construct would end up calling not user's code but libc++ code that we know can be replaced with memcpy.

Quuxplusone added inline comments.
libcxx/include/memory
1479 ↗(On Diff #152002)

I'd like to request the spelling __has_trivial_construct_and_destroy<A, T, T&&> as described here: https://www.youtube.com/watch?v=MWBfmmg8-Yo&t=21m45s
I have an implementation in my fork that might be useful for comparison, although even it doesn't add that last T&& parameter.
https://github.com/Quuxplusone/libcxx/commit/34eb0b5c8f03880b94d53b877cbca384783ad65a

What we're *really* interested in, in most cases, is __has_trivial_construct<A, T, T&&> && __has_trivial_destroy<A, T>. We don't care if there happens to exist an obscure overload such as A::construct(T*, Widget, Gadget) as long as it is not selected. This is particularly relevant to scoped_allocator_adaptor, whose construct is trivial for primitive types but non-trivial for allocator-aware types.

Also, when we write out the template type parameters we care about, we can do the decltype stuff really easily, without having to "fudge" the metaprogramming and possibly get it wrong. For example, if A::construct is an overload set, in which case the &_Xp::construct on this patch's line 1492 will be ill-formed even though a non-trivial construct does actually exist.

vsapsai updated this revision to Diff 154021.Jul 3 2018, 4:52 PM
  • Proper support for custom allocators without construct.
vsapsai added inline comments.Jul 3 2018, 5:06 PM
libcxx/include/memory
1479 ↗(On Diff #152002)

I agree with benefits of using __has_trivial_construct_and_destroy in general but in this case I don't see a need for __has_trivial_destroy. __construct_range_forward only copies new elements and it isn't concerned with destruction. Probably for some cases we can meld destroy+construct together but I think that is out of scope for the current patch.

Arthur, can you please give some examples of possible problems with scoped_allocator_adaptor. I didn't work with it closely and don't really understand your concern.

I wish I could avoid "fudging" the metaprogramming but we need to support C++03 and I don't see other alternatives. And in C++03 A::construct shouldn't be an overload set so my understanding is that it is OK to use &_Xp::construct.

Quuxplusone added inline comments.Jul 3 2018, 9:03 PM
libcxx/include/memory
1479 ↗(On Diff #152002)

in this case I don't see a need for __has_trivial_destroy

Agreed. I'm happy to split up my proposed thing into __has_trivial_construct<Alloc, T, Args...> (which you need for the C++11 version of this patch) and __has_trivial_destroy<Alloc, T> (which you do not need, so let's not add it).

And in C++03 A::construct shouldn't be an overload set so my understanding is that it is OK to use &_Xp::construct.

N1905, Table 34, expresses the syntactic constraint in the usual way: a.construct(p,t) must be a well-formed expression. It doesn't say anything about "...and it can't be a template or an overload set."
My understanding is that libc++ allows you to say decltype even in C++03; they emulate it using __typeof and so on. So I'm thinking it should be fine to SFINAE on decltype(a.construct(p,t)) directly and avoid the "fudgy" use of &_Xp::construct.

can you please give some examples of possible problems with scoped_allocator_adaptor

It's not a "problem" per se, but a missed optimization. Consider this test case:
https://wandbox.org/permlink/YbFfJRWvS4W9OxJ5

using SAA = std::scoped_allocator_adaptor<std::allocator<T>>;
SAA alloc;
std::vector<T, SAA> vec(alloc);
vec.emplace_back(std::move(t));

When "T" is a uses-allocator type, __has_trivial_construct<SAA, T> is false. But when "T" is *not* a uses-allocator type (such as my class Unaware in the wandbox), then we would really like __has_trivial_construct<SAA, T> to come out as "true", because

  • then we could skip SAA::construct and directly call T's constructor; and if T was trivially constructible,
  • then we could further optimize that.

The scoped_allocator_adaptor::construct method always exists. It is sometimes trivial and sometimes not trivial, depending on T.
Here's an optimization-minded example, building on the previous example:
https://wandbox.org/permlink/jtByy3fzIOWZAVu2
Both Aware<true> and Aware<false> are trivially move-constructible. vector is allowed to as-if-memcpy Aware<false> wherever it wants. vector is *definitely not* allowed to as-if-memcpy Aware<true> wherever it wants. vector's allocator type SAA is the same in both cases; the variable that changes is the type of T. So therefore, if we want to distinguish these two cases, we must make our __has_trivial_construct<Alloc, T> trait take T as a template parameter. If we go with __has_trivial_construct<Alloc>, we lose some potential for optimization.

I believe the same argument applies to get us all the way to __has_trivial_construct<Alloc, T, Args...>, but I'm too tired to think about that right now. :)

EricWF added a comment.Jul 4 2018, 4:02 AM

Are there any tests which actually exercise the new behavior?

libcxx/include/memory
1665 ↗(On Diff #154021)

I'm not sure we should care about allocators for T const. The're all but an abomination.

libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter_different_value_type.pass.cpp
13 ↗(On Diff #154021)

Can this be folded into an existing test file for the constructor it's targeting?

Are there any tests which actually exercise the new behavior?

Added tests only verify we don't use memcpy erroneously. And existing tests make sure there are no functionality regressions. But there is nothing to test the performance improvement. Are there any recommendations for that?

libcxx/include/memory
1665 ↗(On Diff #154021)

My main goal was to avoid performance regression for

std::vector<const int> v(const_raw, const_raw + SIZE);

I'm not protecting allocators for T const, it just seems cheap to support them anyway.

libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter_different_value_type.pass.cpp
13 ↗(On Diff #154021)

Will move to construct_iter_iter.pass.cpp. Which reminded me that I need to make construct_iter_iter_alloc.pass.cpp and construct_iter_iter.pass.cpp more in sync.

vsapsai added inline comments.Jul 4 2018, 5:27 PM
libcxx/include/memory
1479 ↗(On Diff #152002)

And in C++03 A::construct shouldn't be an overload set so my understanding is that it is OK to use &_Xp::construct.

N1905, Table 34, expresses the syntactic constraint in the usual way: a.construct(p,t) must be a well-formed expression. It doesn't say anything about "...and it can't be a template or an overload set."

*sigh* The cheapest fix is to make _Alloc::construct required and don't check it's presence. But I don't want to do that as it might break existing code. So I prefer not ideal but working in most cases approach.

My understanding is that libc++ allows you to say decltype even in C++03; they emulate it using __typeof and so on. So I'm thinking it should be fine to SFINAE on decltype(a.construct(p,t)) directly and avoid the "fudgy" use of &_Xp::construct.

Yes, decltype is defined in __config if necessary and I'm using it. But it's not the same as in C++11. Also declval is nominally C++11 feature and you cannot use common idiom with decltype in trailing return type. I'll try and see how far I can get with C++03.

I agree that opening a way for more optimizations like with scoped_allocator_adaptor is a great goal but at the moment I'd like to fix a simpler case.

vsapsai updated this revision to Diff 154327.Jul 5 2018, 3:05 PM
  • Clean up tests according to review. We don't need a new test for custom allocators, parent patch covers that.
ldionne requested changes to this revision.Dec 7 2018, 8:41 AM

I believe this patch fixes an important QOI bug: see http://llvm.org/PR37574. I wholeheartedly agree with Eric that allocators-over-const are an abomination, however pointers-to-const are a fine thing and our implementation should handle them gracefully.

Please rebase on top of master -- there's another function that does not appear in this diff that should be fixed too (__construct_forward).

libcxx/include/memory
1645 ↗(On Diff #154327)

Coming at it from a slightly different angle, I would think this is what we want:

template <class _SourceTp, class _DestTp, 
          class _RawSourceTp = typename remove_const<_SourceTp>::type,
          class _RawDestTp = typename remove_const<_DestTp>::type>
_LIBCPP_INLINE_VISIBILITY static typename enable_if<
                                                                           // We can use memcpy instead of a loop with construct if...
    is_trivially_move_constructible<_DestTp>::value &&                     // - the Dest is trivially move constructible, and
    is_same<_RawSourceTp, _RawDestTp>::value &&                            // - both types are the same modulo constness, and either
    (__is_default_allocator<allocator_type>::value ||                      //   + the allocator is the default allocator (and we know `construct` is just placement-new), or
     !__has_construct<allocator_type, _DestTp*, _SourceTp const&>::value), //   + the allocator does not provide a custom `construct` method (so we'd fall back to placement-new)
void>::type
__construct_range_forward(allocator_type&, _SourceTp* __begin1, _SourceTp* __end1, _DestTp*& __begin2)
{
    ptrdiff_t _Np = __end1 - __begin1;
    if (_Np > 0)
    {
        _VSTD::memcpy(const_cast<_RawDestTp*>(__begin2), __begin1, _Np * sizeof(_DestTp));
        __begin2 += _Np;
    }
}

And then we should have

template <class _Tp>
struct __is_default_allocator : false_type { };

template <class _Tp>
struct __is_default_allocator<_VSTD::allocator<_Tp> > : true_type { };

Does this make sense?

Also, I'm not sure I understand why we use const_cast on the destination type. It seems like we should instead enforce that it is non-const? But this is a pre-existing thing in the code, this doesn't affect this review.

1690 ↗(On Diff #154327)

This should be fixed in a similar way.

libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
186 ↗(On Diff #154327)

I do not understand this test, can you please explain?

This revision now requires changes to proceed.Dec 7 2018, 8:41 AM

Thanks for the feedback. Answered one simple question, the rest needs more time.

libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
186 ↗(On Diff #154327)

At some point I had implementation that was using memcpy for initialization in this case. But in memory {0, 1, 2} != {0.0f, 1.0f, 2.0f}. I think I'll change the test to initialize vector<int> with float[] and it will simplify asserts. Because currently those fabs and FLT_EPSILON are noisy and add unnecessary cognitive load.

If you use some of my suggestions, please make sure it compiles in C++03 mode. I might be using some C++11 features (not sure whether default argument for template parameters is C++03).

libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
186 ↗(On Diff #154327)

Ok.

scanon added a subscriber: scanon.Dec 7 2018, 12:04 PM
scanon added inline comments.
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
187 ↗(On Diff #154327)

These comparisons with FLT_EPSILON are doing nothing but adding noise. There is no value other than 2 that can possibly satisfy fabs(v[2] - 2.0f) < FLT_EPSILON, so this test can simply be v[2] == 2, for example.

If you find yourself using FLT_EPSILON as a tolerance, it's almost always wrong.

scanon requested changes to this revision.Dec 7 2018, 12:04 PM
Quuxplusone added inline comments.Dec 8 2018, 8:40 AM
libcxx/include/memory
1645 ↗(On Diff #154327)

I agree that it is wrong to express the check in terms of is_same<allocator_type, allocator<...>>; it should be expressed in terms of a trait which is satisfied by std::allocator<T>-for-any-T. @ldionne expressed it in terms of __is_default_allocator<A>. I continue to ask that it be expressed in terms of __has_trivial_construct<A, _DestTp*, _SourceTp&>, where libc++ specializes __has_trivial_construct<std::allocator<_Tp>, ...> if need be.

Orthogonally, the condition __has_construct<allocator_type, _DestTp*, _SourceTp const&> is wrong because it has an extra const. It is conceivable — though of course implausible/pathological — for construct(T*, T&) to exist and do something different from construct(T*, const T&).

libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
196 ↗(On Diff #154327)

I suggest that interesting test cases include "array of int to vector of unsigned int" (trivial, but unimplemented in this patch) and "array of iostream* to vector of ostream*" (non-trivial because each pointer must be adjusted).

ldionne added inline comments.Dec 10 2018, 11:08 AM
libcxx/include/memory
1645 ↗(On Diff #154327)

I continue to ask that it be expressed in terms of __has_trivial_construct<A, _DestTp*, _SourceTp&>, where libc++ specializes __has_trivial_construct<std::allocator<_Tp>, ...> if need be.

Would you be OK with us applying this fix and then generalizing __is_default_allocator into __has_trivial_construct as a followup? I suspect we'll have more discussion around that generalization and I'd like for us to fix this bug because I find PR37574 somewhat concerning and I'd like for it to be fixed soon (like within a couple of days).

Orthogonally, the condition __has_construct<allocator_type, _DestTp*, _SourceTp const&> is wrong because it has an extra const. It is conceivable — though of course implausible/pathological — for construct(T*, T&) to exist and do something different from construct(T*, const T&).

Good catch. IIUC, __has_construct<allocator_type, _DestTp*, _SourceTp&> would work?

libcxx/include/memory
1645 ↗(On Diff #154327)

Would you be OK with us applying this fix and then generalizing __is_default_allocator into __has_trivial_construct as a followup?

Yes, I would; although I am somewhat cynical about the followup ever actually happening. :P

IIUC, __has_construct<allocator_type, _DestTp*, _SourceTp&> would work?

Yes.

1658 ↗(On Diff #154327)

Shouldn't this be something like is_trivially_constructible<_DestTp, _SourceTp&>? I mean, we're never doing move-construction here. We're constructing _DestTp from _SourceTp&, which is either copy-construction or non-const-copy-construction, depending on the constness of _SourceTp.

is_trivially_constructible has pitfalls in general — https://quuxplusone.github.io/blog/2018/07/03/trivially-constructible-from/ — but I think we won't fall into any of those pits as long as we're testing that _SourceTp is cv-qualified _DestTp.

ldionne added inline comments.Dec 10 2018, 1:29 PM
libcxx/include/memory
1658 ↗(On Diff #154327)

Shouldn't this be something like is_trivially_constructible<_DestTp, _SourceTp&>?

I think you're right. We should also fix __construct_forward and __construct_backward, but maybe as part of a separate change since the three are consistently using is_trivially_move_constructible (and so there might be a reason for this).

vsapsai updated this revision to Diff 180316.Jan 4 2019, 1:48 PM
  • Tweak the test.
  • Use for __construct_range_forward approach recommended by @ldionne (it works with C++03).
  • Use __is_default_allocator for __construct_forward and __construct_backward.

Didn't introduce remove_const to __construct_forward and
__construct_backward because they don't allow mixing types.

vsapsai added inline comments.Jan 4 2019, 1:58 PM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
196 ↗(On Diff #154327)

What is that supposed to test? My float/int test is to make sure we have is_same<_RawSourceTp, _RawDestTp>::value and don't try to memcpy unrelated types. I've chosen float and int because it is easier for a human to reason about them.

int and unsigned int are interested for testing for values that are outside of common range. But in this case you pay more attention to conversion between ints, not to the behaviour of the constructor. That's my interpretation but maybe I've missed some of your intentions.

Quuxplusone added inline comments.Jan 5 2019, 9:16 AM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
196 ↗(On Diff #154327)

My float/int test is to make sure we [...] don't try to memcpy unrelated types [when it's unsafe to do so].

Right. I suggest two additional tests. The first additional test, int/unsigned int, is to verify whether we memcpy unrelated types when it is safe to do so. The second test, iostream*/ostream*, is to verify whether we memcpy unrelated types when it is unsafe to memcpy them even though they are both of the same "kind" (both pointer types).

These will act as regression tests, to make sure that future changes to the memcpy criteria don't break these cases' behavior (although the first additional test is expected to get more efficient).

ldionne accepted this revision.Jan 7 2019, 7:29 AM

LGTM, but I suggest you address @Quuxplusone 's concerns regarding the tests.

vsapsai updated this revision to Diff 180556.Jan 7 2019, 1:49 PM
  • Test initializing vector of unsigned int with array of int.
Quuxplusone added inline comments.Jan 7 2019, 1:56 PM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
161 ↗(On Diff #180556)

I might add "can be, but currently isn't, done with memcpy."

Here's my other suggested test:

struct X { int x; };
struct Y { int y; };
struct Z : X, Y { int z; };
{
    Z z;
    Z *array[1] = { &z };
    // Though the types Z* and Y* are very similar, initialization still cannot be done with memcpy.
    std::vector<Y*> v(array, array + 1);
    assert(v[0] == &z);
}
vsapsai added inline comments.Jan 7 2019, 1:57 PM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
196 ↗(On Diff #154327)

Added test for int32_t/uint32_t. Size is mentioned explicitly to avoid surprises with testing on different architectures. Tested that such initialization isn't using memcpy and still slow.

Do you know ways to verify pointer adjustment for iostream*/ostream* using their public API? Because I found a way to do that with accessing vector::data() and mucking with pointers. But I don't like this approach because it seems brittle and specific to vector implementation. I'd rather use stable public API. Currently I don't plan to invest much effort into iostream*/ostream* test because I agree it is nice to have but is_same part is already covered.

vsapsai added inline comments.Jan 7 2019, 1:59 PM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
161 ↗(On Diff #180556)

Thanks, I'll try your XYZ test, looks like it should help with my iostream*/ostream* struggles.

vsapsai updated this revision to Diff 180577.Jan 7 2019, 3:20 PM
  • Test initializing vector of pointers with array of pointers of convertible type.
Quuxplusone added inline comments.Jan 7 2019, 3:24 PM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
161 ↗(On Diff #180556)

LGTM! Ship it!
(I'm surprised the structs can be defined inside the body of main() like that, but I have no objection to doing so.)

This revision was not accepted when it landed; it landed in state Needs Review.Jan 7 2019, 4:07 PM
This revision was automatically updated to reflect the committed changes.
vsapsai added inline comments.Jan 7 2019, 4:09 PM
libcxx/test/std/containers/sequences/vector/vector.cons/construct_iter_iter.pass.cpp
161 ↗(On Diff #180556)

I've decided not to add "can be, but currently isn't, done with memcpy" because I don't believe that comment will be kept up to date.

Thanks for all the rounds of review, Arthur.