This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Optimize algorithms on uninitialized memory for trivial types.
AbandonedPublicDraft

Authored by var-const on Jan 18 2022, 8:58 PM.

Details

Reviewers
None
Group Reviewers
Restricted Project

Diff Detail

Event Timeline

var-const created this revision.Jan 18 2022, 8:58 PM
var-const added inline comments.Jan 18 2022, 9:00 PM
libcxx/include/__memory/uninitialized_algorithms.h
135

This is to simplify the initial implementation.

ldionne added inline comments.
libcxx/include/__memory/uninitialized_algorithms.h
139–140

Why is is_trivially_constructible_v<_ValueType, unsigned char> not sufficient?

146

I don't understand why this needs to be different from __can_use_memset_fill. Is the 0 value special?

153

Do you mean to use the new C++20 is_layout_compatible here, or is it something else?

161

I would simply use is_trivially_constructible<_To, _From> here. Do you see a reason why that wouldn't work?

187–190

IMO it would be cleaner to use std::enable_if instead -- we generally stay away from specializing function templates (we're happy overloading them though).

195

Perhaps we don't need those at all then, and we can always use ::value?

208

Why do we static_cast<unsigned char> and then pass it to memset(void*, int, ...), only for it to then cast again to unsigned char? Shouldn't we cast to int instead?

215

Why does this need to be a different function of its own? I'd understand if we used something like bzero (I think??) where there's OS-level optimizations with copy-on-write, but here I don't see the point.

The bzero stuff is far in my memory, but I think I remember there were optimizations around that.

225
226

My initial reaction to this is "we shouldn't try to guess whether a type is represented as zero". Indeed, that means that we need to pay for an additional memset and memcmp every time, and that could be pretty costly for large-ish types. Imagine arrays, for example.

var-const updated this revision to Diff 401510.Jan 19 2022, 9:56 PM
var-const marked 4 inline comments as done.
  • address feedback;
  • implement the optimization for the destroy family of algorithms;
  • add partial tests (missing copy and move).
var-const added inline comments.Jan 19 2022, 9:57 PM
libcxx/include/__memory/uninitialized_algorithms.h
139–140

Consider:

unsigned char x = 2;
int y = 0;
static_assert(std::is_trivially_constructible_v<decltype(y), decltype(x)>);
std::memset(&y, x, sizeof(y));
std::cout << y << std::endl; // 33686018

The intention of the check is to ensure memset would have the same effect as copying through the normal language means.

146

I've written a bit about this in the comment thread below regarding __memset_zero -- briefly, my understanding is that _any_ trivial object (including a POD struct) can be meaningfully set to all zeroes. However, because memset sets every byte in the output to the same byte value, it can only be used for filling if the type of the output fits into a byte (e.g. memsetting an array of chars to 10 is meaningful, producing [10 10 10...], whereas memsetting an array of ints to 10 would produce [168430090 168430090 168430090...]).

148

I'm actually not sure if this is addressed by the text of the Standard -- specifically, that memsetting all bytes of an arbitrary trivial struct to zero is a valid operation. It seems logical (a trivial type consists of trivial types recursively, meaning that a) it can be reduced to a sequence of primitive types and b) neither the type nor any nested types imposes any invariant on the values in its constructors), but AFAICT other implementations limit this optimization to built-in types.

153

The idea here is that all integral types are bitwise-compatible with each other, and all floating-point types are bitwise-compatible with each other, but integral types are _not_ bitwise-compatible with floating-point types; i.e., this is okay:

int x = ...;
long long y = ...;
memcpy(&x, &y, sizeof(x));

but this isn't:

int x = ...;
double y = ...;
memcpy(&x, &y, sizeof(x));

I don't think is_layout_compatible can help here -- for example, it wouldn't consider int and long bitwise-compatible.

This also leads to the question of whether we should allow truncation, e.g. using memcpy to initialize an int from long long, an unsigned int from int, or a float from a double. FWIW, some simple experiments show that, for Clang and GCC at least, memcpying and using normal language syntax produce the same truncating behavior for integral types, but not for floating-point types. To illustrate the last part:

{
  double x = std::numeric_limits<double>::max();
  float y = 0;
  y = x;
  std::cout << x << " " << y << std::endl;
  // 1.79769e+308 inf
}
{
  double x = std::numeric_limits<double>::max();
  float y = 0;
  std::memcpy(&y, &x, sizeof(y));
  std::cout << x << " " << y << std::endl;
  // 1.79769e+308 -nan
}
161

One example is converting between built-in floating-point and integer types -- is_trivially_constructible_v<long long, double> is true, but using memcpy would produce different (and incorrect) results compared to copying a double into a long long.

187–190

No longer applicable.

195

I don't have a very strong stance on this. Currently, there are more C++11-and-later algorithms (default_construct, value_construct, move) than C++03 algorithms (copy, fill) in this file, so it seemed reasonable to allow using the nicer syntax where possible. However, the gains are minimal, so I'm absolutely open to removing these.

208

memset is documented as converting to a unsigned char in its implementation (it accepts an int parameter for historical reasons), and I wanted to make that visible at the call site. If you feel this is confusing, I'm happy to change to an int cast like you mentioned (but I'd still prefer to add a comment that all possible values of __x should fit in a byte to avoid truncation).

215

I think zero is special in that, for a trivial but potentially large struct, it is the only value that can be used to set every byte in that struct and still get a meaningful result. __memset_fill presumes it is given a scalar __x that fits in a char. __memset_zero can work with a trivial _ValueType of an arbitrary size. I don't see a great way to use __memset_fill in that situation -- at the very least, in its current form it relies on being able to cast __x to int, whereas __memset_zero doesn't need to have that restriction.

226

Agreed -- I also feel like this optimization probably isn't worth it. Removed now.

var-const updated this revision to Diff 401511.Jan 19 2022, 9:57 PM

Remove a leftover debug comment

ldionne added inline comments.Jan 20 2022, 11:30 AM
libcxx/include/__memory/construct_at.h
30

This namespacing is unusual and IMO looks weird since __next_n has nothing to do specifically with __construct_at.

I would suggest instead using std::next, which is normally C++11 but we support in all Standard modes.

87–89

IMO we should not try to make the optimization here. I am fairly confident that the optimizer is going to remove the ~_Tp() call down in __destroy_at for trivially destructible types, however we still want to hit the _LIBCPP_ASSERT in there in case a nullptr is passed.

libcxx/include/__memory/uninitialized_algorithms.h
119

How about moving this to __iterator/next.h alongside std::next?

Also, I think this basically needs to do the same optimizations that std::ranges::next does through using std::ranges::advance. For example, if we have a random access iterator and we can assign the sentinel to the iterator, why would we increment it N times?

Alternatively, per what you just said offline, perhaps it's not worth trying to pretend that these utilities can work on types other than pointers?

139–140

Good point.

I think what we should do is this:

is_pointer<...> &&
is_trivially_constructible_v<_To, _From> &&
sizeof(_From) == sizeof(unsigned char) &&
sizeof(_To) == sizeof(unsigned char) &&
! is_volatile<...>

And then, what we should do in __memset_fill is use std::bit_cast<unsigned char>(__x) instead of static_cast. That way, this optimization can kick in for something like struct Foo { char c; }; (which is a valid case for this optimization).

148

Given how this is used (for uninitialized_value_construct), I think the right criteria here is just is_pointer && is_trivially_default_constructible && !is_volatile. In other words, I don't think it matters whether the type is trivially copyable (which is what is_trivial adds IIUC).

  • add more tests;
  • fix handling of volatile;
  • for memcpy, support using the unreachable sentinel.
var-const added inline comments.Jan 25 2022, 12:51 AM
libcxx/include/__memory/uninitialized_algorithms.h
148

Done.

I based this on this remark in [cppreference about memset](https://en.cppreference.com/w/cpp/string/byte/memset):

If the object is a potentially-overlapping subobject or is not `TriviallyCopyable` (e.g., scalar, C-compatible struct, or an array of trivially copyable type), the behavior is undefined.

Now that I think about it, however, that probably applies to already-initialized objects (in which case using memset is akin to reassigning) and making sure the class is trivially copyable makes sense. Here, however, we're essentially creating a new object, so whether it is trivially copyable probably doesn't matter.

Remove commented-out code.

Per our offline discussion, I think we can simplify a lot if we have some fancy internal unreachable_sentinel that is also a "sized sentinel" for pointers, and which returns INT_MAX or something when subtracted.

libcxx/include/__iterator/next.h
42 ↗(On Diff #402788)

We don't remove existing inlines until we've figured out the situation with code size regressions, but I would suggest not using it for new code unless it is needed semantically (or has been measured to be better).

libcxx/include/__memory/uninitialized_algorithms.h
136

I feel like it wouldn't add that much complexity to use __enable_if_t right away in this patch. In fact, having a separate function might give you a place to put the logic that is currently in __memcpy & friends, allowing you to remove these small helpers.

146

I think we should simplify this by inlining it into __can_memcpy_between. Also, here's a weak name suggestion that IMO is a bit clearer than __are_bitwise_compatible and __can_memcpy_between: __is_bitwise_copyable.

__are_bitwise_compatible: "compatible" is a bit vague IMO, it's not clear what it means until I read the definition
__can_memcpy_between: this is very clear, however I find it confusing with the other traits that check whether we can do memcpy, i.e. __can_use_memcpy_impl. It requires some cycles to think that those are entirely different properties.

154

I think this is_trivial could be replaced by is_trivially_copyable.

159

Unless I missed something subtle, you should be able to get rid of __can_use_memset_fill by doing:

template <class _InputIterator, class _OutputIterator, 
                  class _From = typename __iter_cv_value_type<_InputIterator>::type, 
                  class _To = typename __iter_cv_value_type<_OutputIterator>::type>
struct __can_use_memcpy_impl { ... };

Alternatively, you could simply do:

template <class _InputIterator, class _OutputIterator>
struct __can_use_memcpy_impl {
  using _From = typename __iter_cv_value_type<_InputIterator>::type;
  using _To = typename __iter_cv_value_type<_OutputIterator>::type;
  static constexpr bool value = ...;
};
197

You had the right reflex to use __value instead of value because this is an internal helper, however this (and ::type) are the only two places where we use non-internal names even in internal-only code. If someone were to define a macro named type or value, everything would break anyway. I think we should go for consistency with the rest of our code here and use value instead.

204

If there is only a few uses of these type traits, the cost of conditionally defining their _v counterparts is usually not worth it, and we generally don't bother doing it. It seems like the case here -- IMO the patch would be simpler if we didn't have those, even though I understand it means a couple more characters every time we use those traits.

223

I'm really curious to know whether/how other implementations are doing the bit_cast part. I think other implementations have been performing these optimizations for a long time, way before bit_cast was available. Do you know what they do?

299

I don't *think* we're allowed to perform this optimization as-is, because it wouldn't be equivalent in case the iterator throws an exception. Without the optimization, we'd normally call __destroy, which could be meaningful if the _ValueType has a non-trivial destructor.

To be safe, we could add && is_trivially_destructible. This might apply elsewhere too.

libcxx/test/std/utilities/memory/specialized.algorithms/buffer.h
40

Any reason why this isn't called iterator (and then we don't need the iterator_t typedef)?

libcxx/test/std/utilities/memory/specialized.algorithms/specialized.destroy/destroy.pass.cpp
115–127

You probably want to move this to tests() below directly, cause nothing here depends on It.

libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default/ranges_uninitialized_default_construct.pass.cpp
198

Consider inlining N for consistency with other test cases (here and elsewhere, in some other files too I think).

libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.fill.n/uninitialized_fill_n.pass.cpp
103

You can indent this (and all the new code you are adding too).

Address feedback

ldionne added inline comments.Jan 26 2022, 11:02 AM
libcxx/include/__bit/bit_cast.h
22
libcxx/include/__memory/uninitialized_algorithms.h
29

Please also comment about the added operator- and its purpose, since that's a significant difference (and a reason to use this over unreachable_sentinel for internal code, I guess?).

64–65
79
86–88

And actually, as we just discussed: The floating point part here assumes that if two floating point types T and U have the same size, then they have the same representation. I don't think there are two floating point types with the same size and the same representation that are *not* the same type, and that would be unlikely. In other words, I think the only time when we want the floating point criteria to kick in is already covered by is_same<From, To>, and so we should just remove the floating point criteria.

129

Another pattern we sometimes use is __enable_if_t<condition>* = nullptr, I think it's a tiny bit shorter. Whichever you prefer.

299

Here you don't guard against an iterator throwing and destruction not being trivial unless I missed something. In all places, you need to make sure that if you skip initialization, it's safe to also skip destruction in case of an exception thrown by the iterator.

var-const marked 14 inline comments as done.Jan 26 2022, 5:33 PM
var-const marked 6 inline comments as done.
var-const marked 3 inline comments as done.Jan 26 2022, 5:36 PM
var-const added inline comments.
libcxx/include/__memory/uninitialized_algorithms.h
223

Both GCC and MSVC use static_cast<unsigned char>.

GCC:

__builtin_memset(__first, static_cast<unsigned char>(__tmp), __len);

MSVC:

_CSTD memset(_To_address(_Dest), static_cast<unsigned char>(_Dest_val), _Count);
var-const marked an inline comment as done.Jan 26 2022, 5:36 PM
var-const updated this revision to Diff 403492.Jan 26 2022, 9:32 PM
var-const marked 2 inline comments as done.

Address feedback, implement _copy and _move tests, handle pointers to members correctly.

var-const marked 3 inline comments as done.Jan 26 2022, 9:39 PM
var-const added inline comments.
libcxx/include/__memory/uninitialized_algorithms.h
154

Hmm, I think so.

299

Done.

Interestingly enough, from what I could find in practice is_trivially_default_constructible is only true if is_trivially_destructible is also true -- see https://wandbox.org/permlink/OsAl5tGsIJJNI9cO.

It looks like Clang implements the former trait using a compiler function __is_trivially_constructible, so I can't quickly check how it's defined. However, I think that even if adding is_trivially_destructible may have no effect, it still makes things a little clearer.

libcxx/test/std/utilities/memory/specialized.algorithms/uninitialized.construct.default/ranges_uninitialized_default_construct.pass.cpp
198

I'd rather keep the constant unless you feel strongly about it. The code above mostly uses it, and I slightly prefer to keep it.

var-const marked 2 inline comments as done.Jan 26 2022, 9:39 PM

Missing tests

var-const abandoned this revision.Jan 27 2022, 12:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2022, 12:38 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript