This is an archive of the discontinued LLVM Phabricator instance.

Move __construct_forward (etc.) out of std::allocator_traits.
AbandonedPublic

Authored by Quuxplusone on Jul 13 2018, 1:06 PM.

Details

Summary

Inspired by Volodymyr's work on D48753, I've taken it upon myself to refactor the static member functions std::allocator_traits<A>::__construct_forward, __construct_backward, and __construct_range_forward into non-member functions. I think this is reasonable just in terms of code-cleanliness — they don't *have* to be member functions for any reason I can think of — and then it also permits a suitably sadomasochistic programmer to define his own specialization of std::allocator_traits without causing compiler errors in <vector>.

I have added a test case in test/libcxx/ for the sadomasochistic case, which I describe as "arguably ill-formed." I would be very very happy to see WG21 agree that specializing traits classes (pointer_traits, allocator_traits, iterator_traits) *is* ill-formed; I believe there's some disagreement on the subject at the moment. In the meantime, I think this would be a nice patch just on code-cleanliness grounds.

This patch is also groundwork for the "trivially relocatable" fork that I'm building on my GitHub; we'd need an architecture something like this in order to easily drop in support for trivial relocatability.

UPDATE: I suppose I should point out for context that a draft of the "trivially relocatable" proposal is now public, and a fork of libc++ incorporating the proposed feature is available on Compiler Explorer with vastly improved codegen compared to vanilla libc++.

Diff Detail

Repository
rCXX libc++

Event Timeline

Move the functions from <memory> to <vector>, since that's their only caller.
Uniform treatment of the pointer/iterator parameters; discover that the difference between "copy_forward" and "copy_range_forward" was that the former did moves and the latter did copies. Rename accordingly.

My review is incomplete, especially I cannot say with confidence if the proposed change is entirely free from unintended consequences that might break code not covered by the test suite. So other reviewers are welcome to chime in.

include/vector
298

Why does this function use _CopyViaMemcpy and not false_type like other functions?

300

Have you checked why using is accepted in C++03 mode? The tests are passing but I expected a compiler warning and didn't investigate further.

366–367

Good. I think decrementing __end2 after _Np check is better than what we had before.

542

I think the name __vector_constructable_via_memcpy better reflects the meaning. It detects cases when individual element construction can be safely replaced with memcpy, so it feels more about construct than about copy. And copy_via_memcpy is too imperative as for me, not really conveying it has boolean semantic.

1015

It's not immediately obvious why there is no check like is_same<_ForwardIterator, _Tp*> here. My guess is that we are using variables like this->__end_, v.__begin_ that we know are pointers. Don't think it's really a problem and not suggesting any changes, decided to mention it's a little bit tricky to understand.

Quuxplusone marked 4 inline comments as done.

Address @vsapsai's review comments.

include/vector
298

Oops, that's totally cruft left over from an earlier revision. Fixed!

300

I talked with Glen Fernandes about this on Slack the other day. I think the deal is that make check-cxx runs only the -std=c++2a tests, and if you want -std=c++03 you have to run them manually with llvm-lit --param=-std=c++03 -sv path/to/tests. Which of course I didn't do. :)
If there's a more foolproof way of automatically testing libc++ in *all* compiler modes, I'd like to know about it.

Fixed!

542

copy_via_memcpy is too imperative for me

I see your point. However, for background... in my other branch, this trait is joined by two companions:

struct __vector_relocate_via_memcpy
struct __vector_destroy_via_noop

So I'd like a naming scheme that fits all three use-cases comfortably.

How about just adding the word "should"? __vector_should_construct_via_memcpy, __vector_should_destroy_via_noop, etc? Would that sufficiently address the "too imperative" issue?

1015

Your guess is 100% correct, AFAIK. All we're doing here is copying from one __split_buffer to another, so both sides are always a contiguous range.

Quuxplusone marked 3 inline comments as done.Jul 16 2018, 5:49 PM
vsapsai added inline comments.Jul 17 2018, 11:18 AM
include/vector
300

The test suite didn't detect anything even in C++03 mode because of [-Wno-c++11-extensions](https://github.com/llvm-mirror/libcxx/blob/ffbb91bb640b1b0425a91aa70e2a6a2e0f7244e0/utils/libcxx/test/config.py#L922). Thanks for using typedef instead.

542

Yes, "should" is fine as it implies yes/no answer.

Quuxplusone marked 4 inline comments as done.Jul 17 2018, 11:24 AM

It would be nice if all the TMP required to determine whether to call __move_construct_forward(..., true_type) or __move_construct_forward(..., false_type) was done in __move_construct_forward itself (or a helper). This way, callers wouldn't have to do it themselves. For example, vector currently needs

typedef integral_constant<bool,
        __vector_should_construct_via_memcpy<_Tp, _Allocator>::value &&
        (is_same<_ForwardIterator, _Tp*>::value ||
         is_same<_ForwardIterator, const _Tp*>::value ||
         is_same<_ForwardIterator, pointer>::value)
    > __copy_via_memcpy;
...
_VSTD::__copy_construct_forward(__a, __first, __last, this->__end_, __copy_via_memcpy());

It would be neat if we could just do

VSTD::__copy_construct_forward(__a, __first, __last, this->__end_);

and have it dispatched correctly from there. That would make those functions potentially useful elsewhere. Does that make sense? Otherwise this LGTM.

include/vector
296

Do you really need inline here?

Quuxplusone marked an inline comment as done.Jul 27 2018, 1:25 PM
Quuxplusone added inline comments.
include/vector
296

I'm actually not sure — and also suddenly not sure if the visibility attribute should be _LIBCPP_INLINE_VISIBILITY or _LIBCPP_TEMPLATE_VIS. (I *think* the latter is only for type templates, though, not function templates?) However, this is exactly parallel to what we do for operator<, so I think changing it would be gratuitous. If someone wants to remove inline from a bunch of templates, I won't complain, but I also don't want this PR to be the one that initiates it.

template <class _Tp, class _Allocator>
inline _LIBCPP_INLINE_VISIBILITY
bool
operator< (const vector<_Tp, _Allocator>& __x, const vector<_Tp, _Allocator>& __y)
{
    return _VSTD::lexicographical_compare(__x.begin(), __x.end(), __y.begin(), __y.end());
}
545

Louis writes:

It would be nice if all the TMP required to determine whether to call __move_construct_forward(..., true_type) or __move_construct_forward(..., false_type) was done in __move_construct_forward itself (or a helper). This way, callers wouldn't have to do it themselves.

I know where you're coming from, but I believe that in this case we definitely *can't* do that, because the whole point of these routines is that the routine itself can't always tell whether it's supposed to memcpy or not; the *caller* is the only one with the power to decide that. The decision (in theory, though not yet in practice, because this particular PR is a pure refactoring) depends not only on details of _Tp and _Allocator but also on the specific call-site: we can memcpy more aggressively at some call-sites than others, because of information available only to the caller (such as "this is a relocation operation").

See https://github.com/Quuxplusone/libcxx/commit/e7e5999b01#diff-07c2b769648850d040dcbb07754e5f2fR1076 , lines 1076 et seq., for how I envision some future caller making the decisions on a callsite-by-callsite basis.

Quuxplusone marked 2 inline comments as done.Jul 27 2018, 1:25 PM
ldionne accepted this revision.Jul 27 2018, 1:28 PM

LGTM

include/vector
296

Sure. Then, the current one is correct. You want to be using _LIBCPP_INLINE_VISIBILITY here. Actually, you want to be using _LIBCPP_HIDE_FROM_ABI, but don't start doing this in this commit -- I'll do a bulk replacement later.

545

Got it.

This revision is now accepted and ready to land.Jul 27 2018, 1:28 PM

@ldionne: I don't know if your "LGTM" is necessarily sufficient to commit this or not; but either way, I don't have commit privs, so could I ask you (or someone else) to commit this on my behalf? Thanks!

@ldionne: I don't know if your "LGTM" is necessarily sufficient to commit this or not; but either way, I don't have commit privs, so could I ask you (or someone else) to commit this on my behalf? Thanks!

I would not dare say that my LGTM is sufficient. My goal in reviewing this was to lower the barrier for a more senior contributor (Eric/Marshall) to give a definitive LGTM.

I am not in favor of this patch.

I'm in favor of fixing the problem that Arthur has described, but not like this, for the following reasons:

  • Conceptually, these are (similar to) "Allocator-based versions of the algorithms proposed in P0040", and should (probably? possibly?) look more like them.
  • Mainly, though, I think that the goal of this patch (which is see as 'getting to memcpy') is not the direction that libc++ should take. Instead, we should be writing simple loops that the compiler can optimize into a call to memcpy if it chooses. Having calls to memcpy in the code paths makes it impossible to "constexp-ify" this code. (See https://libcxx.llvm.org/cxx2a_status.html (comments on std::copy and https://bugs.llvm.org/show_bug.cgi?id=25165).

I am not in favor of this patch.

I'm in favor of fixing the problem that Arthur has described, but not like this, for the following reasons:

  • Conceptually, these are (similar to) "Allocator-based versions of the algorithms proposed in P0040", and should (probably? possibly?) look more like them.
  • Mainly, though, I think that the goal of this patch (which is see as 'getting to memcpy') is not the direction that libc++ should take. Instead, we should be writing simple loops that the compiler can optimize into a call to memcpy if it chooses. Having calls to memcpy in the code paths makes it impossible to "constexp-ify" this code. (See https://libcxx.llvm.org/cxx2a_status.html (comments on std::copy and https://bugs.llvm.org/show_bug.cgi?id=25165).

Marshall makes a great point about memcpy and constexpr... We're trying to make the default allocator constexpr-friendly for C++20, and this doesn't play very nicely with that.

include/vector
318

Marshall writes:

Instead, we should be writing simple loops that the compiler can optimize into a call to memcpy if it chooses. Having calls to memcpy in the code paths makes it impossible to "constexp-ify" this code.

Well, I have three thoughts on that.

(A), "removing the calls to memcpy" sounds like you want to just call the *actual* move-constructor in a loop, and then later call the actual destructor in a loop. Which is to say, you don't want libc++ to have a codepath for this speed optimization at all. That's just leaving a ton of performance on the table, and I strongly disagree with that idea.

(B), regardless, couldn't you achieve that goal simply by taking this patch almost exactly as it is except removing the overloads that take true_type? If you want constexpr-friendliness badly enough that you're willing to call the move-constructor and destructor even of trivially copyable types, then you can still use this framework; you just have to remove the overloads that call memcpy. That wouldn't be a major refactoring.

(C), surely if you want the best of both worlds, you should be pushing someone to invent a constexpr memcpy and/or a way to detect constexpr-context at compile time? I don't think it makes sense to pessimize existing (non-constexpr) users in C++03-through-C++17 just because someone hypothetically might in C++2a-or-later want to mutate a std::vector in a constexpr context.

Quuxplusone edited the summary of this revision. (Show Details)Jul 30 2018, 10:20 AM
mclow.lists added inline comments.Jul 30 2018, 10:40 AM
include/vector
318

Which is to say, you don't want libc++ to have a codepath for this speed optimization at all.

You're completely correct. I don't want libc++ to have such a code path.
I want clang to generate a memcpy from the code w/o ever mentioning memcpy in the source.

include/vector
318

@mclow.lists: So would you accept a version of this patch that simply removed the true_type overloads? That would change this from a pure refactoring to a performance regression, but it would still reduce the overall diff between libc++ master and libc++ trivially-relocatable.

(Maybe it's no longer clear and needs restating: This patch is currently a pure refactoring. All I'm doing is moving the existing helper functions out of allocator_traits. IIUC, your objections apply to the existence of these existing helper functions just as much as to the refactored versions.)

I don't think it makes sense to pessimize existing (non-constexpr) users in C++03-through-C++17 just because someone hypothetically might in C++2a-or-later want to mutate a std::vector in a constexpr context.

That's not the right (implied) question.

The correct question is:

Will libc++pessimize existing (non-constexpr) users in C++03-through-C++17 *who are using old compilers* in order to support new constexpr features that come down the pike?

And the answer to that is yes - eventually.
I don't know when that will be, since the new compilers don't yet exist.
That's the point of https://bugs.llvm.org/show_bug.cgi?id=25165.

@mclow.lists: Well, anyway, is this pure refactoring acceptable, or would you prefer to keep these helpers inside allocator_traits until a better solution to constexpr-memcpy can be invented?

ldionne requested changes to this revision.Jul 30 2018, 12:44 PM

After thinking about this for some more, I'm not sure this patch is worth doing in its current form. The minimal patch for allowing specializations of allocator_traits would be:

  1. move the __move_construct_forward & friends functions from allocator_traits to private static member functions of std::vector (because they're only used in std::vector right now).
  2. keep the SFINAE on the allocator and avoid encoding any memcpy decision at the call site.

However, an even better alternative would be to look into adding an overload to uninitialized_move & friends that takes an allocator. We could then be clever in how this is implemented. The major benefit I see here is that there would be one common code path to optimize, as opposed to a std::vector-specific code path.

Given the small benefit provided by this patch, my opinion is that it's not worth moving forward with it as-is. However, I believe either of the two alternatives suggested above would be welcome, with a preference for the more comprehensive second approach, which requires a paper.

Arthur, what do you think? Do you think the second approach can work?

This revision now requires changes to proceed.Jul 30 2018, 12:44 PM

After thinking about this for some more, I'm not sure this patch is worth doing in its current form. The minimal patch for allowing specializations of allocator_traits would be:

  1. move the __move_construct_forward & friends functions from allocator_traits to private static member functions of std::vector (because they're only used in std::vector right now).
  2. keep the SFINAE on the allocator and avoid encoding any memcpy decision at the call site.

FWLIW, I approve of (1) but not (2), for the previously stated reason that the optimal path is known only at the call-site; the callee doesn't have enough information to know whether memcpy is appropriate. (But it sounds like Marshall doesn't want any memcpy happening at all, so maybe it's moot?)

However, an even better alternative would be to look into adding an overload to uninitialized_move & friends that takes an allocator. We could then be clever in how this is implemented. The major benefit I see here is that there would be one common code path to optimize, as opposed to a std::vector-specific code path.

Yes, when I implemented https://github.com/Quuxplusone/from-scratch/, one of the many things I noticed was that none of the uninitialized_foo algorithms were useful out of the box; every one of them needed to be reimplemented to take an allocator parameter. (A.k.a., "scoped_allocator_adaptor is why we can't have nice things.") However, as you point out, this is a long-standing problem and would require a library paper to do "right." (It would still be easy enough to add the needed algorithms with uglified names, e.g. __uninitialized_copy_a, __destroy_a, etc. This is exactly what libstdc++ does, and libc++ might be wise to copy its approach.)

I'd be happy to throw together a patch for __uninitialized_copy_a etc., since I think that would improve libc++ in general; but I don't see how that would directly help any specific short-term problem in libc++. This patch as it is helps two specific short-term problems:
(1) that user specializations of allocator_traits don't work (but, as the test case comments, this is arguably not a good idea anyway; see also https://quuxplusone.github.io/blog/2018/07/14/traits-classes/ )
(2) that the diff between libc++ trunk and libc++ trivially-relocatable is unnecessarily large
Messing with the uninitialized_foo algorithms would not directly help either of these problems, so we'd have to come up with some other rationale for it.

Quuxplusone marked 5 inline comments as done.Aug 14 2018, 11:03 AM
Quuxplusone abandoned this revision.Nov 19 2020, 4:40 PM

Abandoning, as this has been done in d9a4f936d05.