This will also be used in some PSTL backends.
Details
- Reviewers: ldionne, jdoerfert, Mordante
- Commits: rG31eeba3f7c0e: [libc++] Introduce __make_uninitialized_buffer and use it instead of…

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
libcxx/include/__algorithm/stable_partition.h
295: I think __return_temporary_buffer is not needed anymore.
libcxx/include/__memory/uninitialized_buffer.h
2: Per our discussion just now, this might be worth replacing by:

    template <class _Destructor>
    struct __uninitialized_buffer_deleter {
      size_t __count_;
      _Destructor __dtor_;

      template <class _Tp>
      void operator()(_Tp* __ptr) const {
        __dtor_(__ptr, __count_);
    #ifdef _LIBCPP_HAS_NO_ALIGNED_ALLOCATION
        ::operator delete(__ptr, __count_ * sizeof(_Tp));
    #else
        ::operator delete(__ptr, __count_ * sizeof(_Tp), align_val_t(_LIBCPP_ALIGNOF(_Tp)));
    #endif
      }
    };

    template <class _Array, class _Destructor>
    unique_ptr<_Array, __uninitialized_buffer_deleter<_Destructor> >
    __get_uninitialized_buffer(size_t __n, _Destructor __destroy) {
      static_assert(is_array_v<_Array>);
      using _Tp = remove_extent_t<_Array>;
    #ifdef _LIBCPP_HAS_NO_ALIGNED_ALLOCATION
      _Tp* __ptr = static_cast<_Tp*>(::operator new(sizeof(_Tp) * __n));
    #else
      _Tp* __ptr = static_cast<_Tp*>(::operator new(sizeof(_Tp) * __n, align_val_t(_LIBCPP_ALIGNOF(_Tp))));
    #endif
      using _Deleter = __uninitialized_buffer_deleter<_Destructor>;
      _Deleter __deleter{__n, __destroy};
      return unique_ptr<_Array, _Deleter>(__ptr, __deleter);
    }
Would it make sense to add some internal tests for this class?
libcxx/include/__algorithm/stable_partition.h
19: Can this be removed?
142–143: Is never correct when the type does things in its destructor?
Since I changed this to use a unique_ptr, I don't think it makes a lot of sense to add any internal tests.
libcxx/include/__algorithm/stable_partition.h
142–143: All the elements get destructed by __stable_partition_impl, so the buffer doesn't have to do anything other than deleting the allocation.
libcxx/include/__memory/uninitialized_buffer.h
27: Can you please document what this class does. In particular, please document how the destructor function gets called (and include what the count represents -- bytes or number of elements). This is pretty obvious if you stop to think about it, but it should be documented.
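For illustration, a hedged sketch of the kind of documentation being asked for here, with the semantics inferred from the snippet suggested earlier in this review (wording and names assumed, not the final patch text):

    // __make_uninitialized_buffer(__n, __destroy) allocates uninitialized storage
    // for __n objects of _Tp and returns it owned by a unique_ptr. When that
    // unique_ptr is destroyed, the deleter first calls __destroy(__ptr, __n),
    // where __n is the number of *elements* (not bytes); the callback is expected
    // to end the lifetime of whatever objects the caller constructed in the
    // buffer. The deleter then deallocates the storage with the matching
    // operator delete.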
libcxx/test/std/algorithms/alg.modifying.operations/alg.partitions/stable_partition.not_enough_memory.pass.cpp
35 (On Diff #529102): You should check that the range has been partitioned. And actually maybe this should be in the general test for stable_partition? Same comments below.
libcxx/test/std/algorithms/alg.sorting/alg.sort/stable.sort/stable_sort.pass.cpp
159–165: Here and everywhere -- you used 2-space indentation instead of 4 spaces as in the existing file.
We're hitting CFI errors in Chromium after this, see https://github.com/llvm/llvm-project/issues/63523
libcxx/include/__memory/uninitialized_buffer.h
72–74: Hi, a regression is observed in a benchmark that uses std::stable_sort intensively. Comparing the baseline and the regressed case, the baseline (using get_temporary_buffer) uses the unaligned alloc (void* operator new[](size_t size)) when objects are not overaligned, while the regressed case uses the aligned alloc for all objects. This may increase memory usage as well. Given that std::stable_sort is pretty important, I wonder if this regression warrants a fix to have consistent usage of the (aligned or unaligned) new operator?
libcxx/include/__memory/uninitialized_buffer.h
72–74: I don't understand how using the aligned alloc would increase memory usage. void* operator new[](size_t size) is basically void* operator new[](size_t size, align_val_t{__STDCPP_DEFAULT_NEW_ALIGNMENT__}). So unless you have a bad implementation it shouldn't make any difference.
libcxx/include/__memory/uninitialized_buffer.h
72–74: Thanks for the quick response! To share more context, the malloc implementation is tcmalloc (https://github.com/google/tcmalloc), and the aligned alloc exercises more instructions than the unaligned one. Considering that the affected algorithm is std::stable_sort, the benefit of keeping the unaligned alloc is not small.
Comparing the llvm-objdump output below, the aligned operator new has a longer instruction sequence (0x154 in the regressed case vs. 0xec in the baseline), which increases the number of instructions executed on a hot path (std::stable_sort).

    000000000b4702c0 g F section-name 00000000000000ec tcmalloc_size_returning_operator_new                 # new operator used in BASELINE
    000000000b4703c0 g F section-name 0000000000000154 tcmalloc_size_returning_operator_new_aligned
    000000000b470540 g F section-name 00000000000001ec tcmalloc_size_returning_operator_new_hot_cold
    000000000b470740 g F section-name 00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold
    000000000b470a00 g F section-name 00000000000000ec tcmalloc_size_returning_operator_new_nothrow
    000000000b470b00 g F section-name 0000000000000154 tcmalloc_size_returning_operator_new_aligned_nothrow # new operator in EXPERIMENT
    000000000b470c80 g F section-name 00000000000001ec tcmalloc_size_returning_operator_new_hot_cold_nothrow
    000000000b470e80 g F section-name 00000000000002ac tcmalloc_size_returning_operator_new_aligned_hot_cold_nothrow
libcxx/include/__memory/uninitialized_buffer.h
72–74: How exactly does that correlate to memory use? I have no idea how TCMalloc is implemented, so your links don't really help me. It could just be the case that the aligned path isn't as well optimized as the non-aligned one, resulting in slightly worse performance. I don't see any theoretical reason the aligned path should be any slower than the non-aligned one. Worst-case should be a single condition to see whether the non-aligned path should be taken if it's actually faster.
libcxx/include/__memory/uninitialized_buffer.h
72–74: I consulted tcmalloc experts and got confirmation that this won't increase memory usage; however, using aligned alloc where it's not necessary is going to slow down the execution. I'm not very familiar with the implementation either and misread how alignment affects the size-class calculation; sorry about the wrong statement on the memory increase. In the regressed case, the aligned operator new does extra work to achieve the same results. To put it the other way, the unaligned operator new can use a compile-time constant in its calculations, which saves the runtime cost of choosing a size class.
At a large scale, a single extra if statement here adds up across multiple binaries since stable_sort is used widely. Could the original unaligned new operator be added back for parity?
Should we partially revert this change? Instead of using the new interface for stable_sort, the old get_temporary_buffer could be used (as an internal API) until performance parity is reached.
I'm working on adding a CFI run to our CI right now. It looks like I just have to add _LIBCPP_NO_CFI to __make_uninitialized_buffer, given that get_temporary_buffer also has that attribute. I hope to get the patch completed by tomorrow.
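For reference, a hedged sketch of what that attribute change could look like; the declaration below is assumed for illustration only (the return type and parameters are borrowed from the deleter snippet earlier in this review), not the actual signature in uninitialized_buffer.h:

    // Hypothetical declaration -- only the attribute placement matters here.
    // _LIBCPP_NO_CFI suppresses control-flow-integrity checks inside the function,
    // matching the attribute that get_temporary_buffer already carries.
    template <class _Tp, class _Destructor>
    _LIBCPP_NO_CFI _LIBCPP_HIDE_FROM_ABI
    unique_ptr<_Tp[], __uninitialized_buffer_deleter<_Destructor> >
    __make_uninitialized_buffer(size_t __n, _Destructor __destroy);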
I don't see why. If the allocation is actually so costly that it dominates stable_sort, you should either look into whether you actually need stable_sort for your workload or look into getting a better allocator (TCMalloc is probably not bad, so I suspect that's not the problem). Sorting doesn't seem like a task where a single allocation should ever dominate. I'd also like to see the actual use case where a single branch is so problematic that it warrants a revert before doing so. I don't even know how much of an impact this patch actually has in your specific use case.
My understanding is that the problem is caused by the more aggressive use of aligned allocation with this patch (not by increased memory usage). The old interface has special code (an additional branch) to check for overalignment and specialize based on that. Note that those checks are compile-time constants and will be optimized away. In that regard, the new implementation misses out on that optimization (specialization), so I believe it is a regression.
The problem happens to be caught by using tcmalloc because it has a very well tuned code path for the unaligned case.
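A minimal standalone sketch of the compile-time specialization being described (not the actual get_temporary_buffer code; the function name is hypothetical): the condition depends only on the element type, so only one branch survives codegen, and ordinarily-aligned types keep using the unaligned operator new.

    #include <cstddef>
    #include <new>

    // Sketch: pick the unaligned operator new whenever the default new alignment
    // already satisfies _Tp; the check is a compile-time constant, so the aligned
    // branch is removed entirely for ordinarily-aligned element types.
    template <class _Tp>
    _Tp* allocate_buffer(std::size_t __n) {
      if constexpr (alignof(_Tp) > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
        return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp), std::align_val_t(alignof(_Tp))));
      else
        return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
    }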
I see the patch landed as https://github.com/llvm/llvm-project/commit/420a204d52205f1277a8d5df3dbafac6082e02e2 thanks!
First, I appreciate the energy you're putting into this project. Your work doesn't go unnoticed.
As for this std::stable_sort topic, I get your position. Ideally, the performance difference between these
two approaches wouldn't be worth discussing. But reality often diverges from theory.
The folks who raised concerns are experiencing substantial performance impacts.
They've presented data to support this - the kind of proactive involvement we strive to cultivate in our community.
However, some of the discourse seems a bit hasty.
Instead of outright dismissing these concerns, let's take a step back and understand the nuances.
If clarity is missing, ask for more details.
It's also worth mentioning that referring to someone's implementation as "bad" may not be the best choice of words.
A more understanding tone can make a difference in promoting a collaborative and respectful community.
Your dedication to this project is appreciated. Let's keep improving it together.
Forgot to answer the comment on the use of stable_sort:
The stable sort is called by another common library not controlled directly by the user. Besides, the behavior depends on the input. The problem manifests when the array to be sorted is small.
or look into getting a better allocator (TCMalloc is probably not bad, so I suspect that's not the problem). Sorting doesn't seem like a task where a single allocation should ever dominate. I'd also like to see the actual use case where a single branch is so problematic that it warrants a revert before doing so. I don't even know how much of an impact this patch actually has in your specific use case.
Maybe I've missed it, but I didn't see any data other than "performance got worse".
They've presented data to support this - the kind of proactive involvement we strive to cultivate in our community.
However, some of the discourse seems a bit hasty.
Instead of outright dismissing these concerns, let's take a step back and understand the nuances.
My intention was never to dismiss the concern, but I'd like a specific use-case so I can understand what the problem is.
If clarity is missing, ask for more details.
It's also worth mentioning that referring to someone's implementation as "bad" may not be the best choice of words.
A more understanding tone can make a difference in promoting a collaborative and respectful community. Your dedication to this project is appreciated. Let's keep improving it together.
Yeah, that might not have been the best choice of words.
Maybe the better approach would be to improve our __stable_sort_switch heuristic. Would it be possible for you to share the problematic use-case? I suspect you might be using a non-trivially-copy-assignable type which isn't that costly to copy, resulting in small allocations that don't make any sense.
This patch affects multiple benchmarks. I checked one of them again, and it turned out the new calls to the aligned new come from inplace_merge, which is also modified in this revision. Coincidentally, stable_sort and inplace_merge are called back-to-back in the caller.
IMO I agree that if we were using aligned allocation and we're not anymore, this is simply something that went unnoticed in this patch and we should fix it.
Should stable_sort be dominated by an allocation? Probably not, that seems a bit crazy. But nonetheless it seems like we did introduce an unintended difference in this patch and it should be pretty easy to fix. Let's follow up on D152208.
Looks like I read this issue the wrong way around. So we were doing unaligned allocation before, and now we're using aligned allocation (which should be more correct, no?) and it's slower. That's a lot more curious, and I think it's worth investigating why this caused a regression, because there might be some sort of perf issue in how your allocator deals with aligned allocation (or maybe on the libc++ side). Anyway, let's follow up in D152208 to avoid forking the discussion too much.
As I mentioned in the previous reply, the performance regression comes from additional runtime cost (an allocation) in the context of the inplace_merge call. A little more debugging shows that in the old version the allocation can be completely elided when buf_size is 0 (see the get_temporary_buffer code), but make_uninitialized_buffer will try to invoke operator new regardless.
    // inplace_merge code
    difference_type __len1 = _IterOps<_AlgPolicy>::distance(__first, __middle);
    difference_type __len2 = _IterOps<_AlgPolicy>::distance(__middle, __last);
    difference_type __buf_size = _VSTD::min(__len1, __len2);
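A minimal standalone sketch of the zero-size guard being described (the function is hypothetical, not the libc++ implementation): the old get_temporary_buffer returned a null buffer for a zero-sized request without touching operator new, and the same guard would let the back-to-back stable_sort/inplace_merge pattern above avoid paying for an allocation when min(__len1, __len2) is 0.

    #include <cstddef>
    #include <new>

    // Sketch: skip the call to operator new entirely when no storage is needed,
    // mirroring the zero-size behavior of the old get_temporary_buffer.
    void* allocate_if_needed(std::size_t __n) {
      if (__n == 0)
        return nullptr;            // elide the allocation on the zero-size path
      return ::operator new(__n);  // only reached when storage is actually needed
    }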
Ok, so this whole thing has nothing to do with the fact that we are using aligned allocation vs not using aligned allocation. It has to do with the fact that we're allocating, period.
I'm going to revert this patch for now, because @EricWF pointed out that we actually had an issue with the class itself -- if constructing any of the elements in the buffer throws, the destruction is not going to happen properly (we'll over-destroy). This wasn't a problem with the use case that this was meant for (PSTL), since we std::terminate() if the construction of any element would throw. But I agree that is a poor general-purpose API to have. We can take another stab at this next week.
Correct.
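To illustrate the over-destroy hazard mentioned in the revert note above, a minimal standalone sketch (names hypothetical, not libc++ code): if a constructor throws partway through filling the buffer, only the already-constructed prefix may be destroyed, whereas a deleter that unconditionally destroys the full element count would also run destructors on the never-constructed tail.

    #include <cstddef>
    #include <new>

    // Sketch: construct __n default-constructed objects in raw storage, unwinding
    // only the prefix that was actually constructed if a constructor throws.
    template <class _Tp>
    void construct_all(_Tp* __ptr, std::size_t __n) {
      std::size_t __i = 0;
      try {
        for (; __i < __n; ++__i)
          ::new (static_cast<void*>(__ptr + __i)) _Tp();
      } catch (...) {
        while (__i > 0)
          (__ptr + --__i)->~_Tp(); // destroy only the constructed prefix
        throw;
      }
    }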
thanks for debugging this! I wish I had continued to collect a detailed call graph when I saw the common caller's instruction cycles increase, rather than guessing it was stable_sort :(