This is an archive of the discontinued LLVM Phabricator instance.

Cleanup __uninitialized_temporary_buffer internals.
AbandonedPublic

Authored by EricWF on Jun 28 2023, 4:44 PM.

Details

Reviewers
philnik
davidxl
ldionne
Group Reviewers
Restricted Project
Summary

There are a few improvements this patch makes. First, it only
uses aligned allocation when the type actually requires it. This
addresses a sizable user-reported performance regression in some of the
algorithms.

Second, it removes a significant amount unused template & function parameters,
which improves code size and makes the type less error prone and more readable.

And finally, it fixes a bug where operator new/delete were being called
directly, rather than as a builtin -- preventing the compiler from
eliding the allocation.

Diff Detail

Event Timeline

EricWF created this revision.Jun 28 2023, 4:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2023, 4:44 PM
EricWF requested review of this revision.Jun 28 2023, 4:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2023, 4:44 PM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
EricWF updated this revision to Diff 535565.Jun 28 2023, 4:48 PM

Fix C++03 build

MaskRay added a subscriber: MaskRay.EditedJun 28 2023, 4:55 PM

Thanks for posting the fix (the lost !__is_overaligned_for_new branch).

Switching to __libcpp_allocate removes some duplicate code, which looks good to me. https://clang.llvm.org/docs/LanguageExtensions.html#builtin-operator-new-and-builtin-operator-delete

If you don't use arc diff, there needs to be some context. https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

uses aligned allocation when the type actually requires it. [...]

You may accidentally indent the paragraph by 4 spaces, so it was detected as a code block in Markdown.

Thanks for working on this. In general looks good, I like to see a green CI.

libcxx/include/__memory/uninitialized_buffer.h
38

pre-existing nit: sizeof(_Tp) * __count_ seems more natural and matches line 48.

libcxx/include/new
297

s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/

300

Mainly curious, but would it make sense to have this check in the Clang builtin?

MaskRay added inline comments.Jun 28 2023, 11:35 PM
libcxx/include/new
300

Mainly curious, but would it make sense to have this check in the Clang builtin?

Which check?

ldionne requested changes to this revision.Jun 29 2023, 6:36 AM
ldionne added inline comments.
libcxx/include/__algorithm/inplace_merge.h
228

We were using nothrow to make it clear that this will return nullptr instead of throwing if the allocation fails. This was initially designed in the context of making sure that exception safety is really easy to determine since this will be used inside PSTL, where we want to have strict control over what can and can't throw (because we need to exclusively propagate exceptions that originate from the library "setup" code, but terminate for any other user-thrown exception).

I still feel like nothrow has its place here in that context.

libcxx/include/__memory/uninitialized_buffer.h
33

I don't think we need the default argument here since you're always passing an argument to the constructor below.

35

IIUC, you're not destroying the elements in the buffer anymore, which seems like a bug.

48

__libcpp_allocate vs operator new is a great catch, thanks for catching that.

66–67

We were using the _Tp[] syntax so that you would get an array specialization of unique_ptr<_Tp[]>, which then supports nice things like operator[]. I still think it makes sense to do that.

This revision now requires changes to proceed.Jun 29 2023, 6:36 AM
ldionne added a comment.EditedJun 29 2023, 6:49 AM

@davidxl @mingmingl Did you folks notice the perf regression through libc++'s own benchmarks? If not, can you please share which benchmarks you used to see the difference? I'd like to add something to libc++'s benchmark suite alongside with this patch so we can have a concrete "regression test" and verify that this is making things better.

Edit: We do have an ongoing problem in libc++ that we have some benchmarks and we run them on a regular basis in the CI, but we don't have any infrastructure that allows comparing benchmark results and detecting regressions. This makes these benchmarks a lot less useful than they could be.

libcxx/include/__memory/uninitialized_buffer.h
10

Not attached to anything: would it be possible to upload the patch with context? This makes it easier to review. arc diff does that automatically, normally.

Also see https://github.com/llvm/llvm-project/issues/63603 for a related issue that could actually improve the perf potentially more than this patch for small number of elements.

var-const edited the summary of this revision. (Show Details)Jun 29 2023, 9:32 AM
var-const edited the summary of this revision. (Show Details)
EricWF marked an inline comment as done.Jun 29 2023, 11:11 AM
EricWF added inline comments.
libcxx/include/__algorithm/inplace_merge.h
228

I strongly disagree. It's very confusing to introduce a novel use of a tag type, to document rather than dispatch, on an API that's purpose is to _maybe_ provide memory _if_ it's available.

If we're looking to document the semantics of get_temporary_buffer further, that's OK. If we're looking to improve the usability & safety of the of the API, that's OK too.
But let's proceed with that some other way.

I'll change the name to get __maybe_get_temporary_buffer.

libcxx/include/__memory/uninitialized_buffer.h
33

I'm with you. I tried to make this not have a default constructor, but there are a bunch of uses in the algorithms header where the type is default constructod.

libcxx/include/new
297

Do we have a doc describing what the difference is and providing top? That way I can stop making the mistake in the future?

300

I think they mean dispatching the call to overaligned new. And I don't think so.
The clang builtin is a very thin wrapper whose only job is to annotate the function call as "builtin" so it can be elided.

Plus, I think the library has more knowledge about the availability of these functions, since we provide them.

EricWF added inline comments.Jun 29 2023, 11:14 AM
libcxx/include/__memory/uninitialized_buffer.h
35

We never we destroying the elements in the buffer.

The deleter template parameter was always nop, but the complexity hid this issue. It's also not possible for the unique_ptr to know how many elements were actually created when it gets deleted, so the API was inherently unsafe from the start. I had assumed the algorithms were doing it because unique_ptr certainly wasn't .

EricWF updated this revision to Diff 535944.Jun 29 2023, 11:52 AM

Add documentation & rename __make_uninitialized_buffer to makes the non-throwing allocation semantics more readily apparent.

@davidxl @mingmingl Did you folks notice the perf regression through libc++'s own benchmarks? If not, can you please share which benchmarks you used to see the difference? I'd like to add something to libc++'s benchmark suite alongside with this patch so we can have a concrete "regression test" and verify that this is making things better.

Edit: We do have an ongoing problem in libc++ that we have some benchmarks and we run them on a regular basis in the CI, but we don't have any infrastructure that allows comparing benchmark results and detecting regressions. This makes these benchmarks a lot less useful than they could be.

Yeah, benchmarking infrastructure is hard to set up, especially one that can largely go unsupervised. My understanding is that the benchmarks we have are more macroscale. I believe the benchmarks which caught this internally are closer to a macro-scale benchmark and are dependent on a ton of infrastructure to gather and report macro-scale performance data, which makes it non-trivial to share externally.

(Also it's an unfortunate truth of benchmarks that macroscale ones seem to be much better at detecting regressions than nice & tidy microbenchmarks).

If somebody else knows more about the benchmarks that detected this, please chime in or correct me.

libcxx/include/__memory/uninitialized_buffer.h
10

Yeah, my arc install is a bit broken - and haven't had the time to fix it.

If you're looking to apply the patch, you can do so by downloading the raw patch, cd'ing into the libcxx directory, and running git apply <raw-patch-file>.

ldionne added inline comments.Jun 29 2023, 2:36 PM
libcxx/include/__memory/uninitialized_buffer.h
46

Nit: .. => .

51–53

The ability to have a non-trivial destructor was the reason why we created this class and refactored the existing uses of get_temporary_buffer() in the first place. We can't remove this functionality, this is needed for implementing the GCD backend in the PSTL.

61

__try_allocate_uninitialized_buffer?

libcxx/include/new
297

From __config:

// Just so we can migrate to the new macros gradually.
#  define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
MaskRay added inline comments.Jun 29 2023, 2:36 PM
libcxx/include/new
297

https://libcxx.llvm.org/DesignDocs/VisibilityMacros.html

_LIBCPP_INLINE_VISIBILITY - Historical predecessor of _LIBCPP_HIDE_FROM_ABI – please use _LIBCPP_HIDE_FROM_ABI instead.

@davidxl @mingmingl Did you folks notice the perf regression through libc++'s own benchmarks? If not, can you please share which benchmarks you used to see the difference? I'd like to add something to libc++'s benchmark suite alongside with this patch so we can have a concrete "regression test" and verify that this is making things better.

Edit: We do have an ongoing problem in libc++ that we have some benchmarks and we run them on a regular basis in the CI, but we don't have any infrastructure that allows comparing benchmark results and detecting regressions. This makes these benchmarks a lot less useful than they could be.

Yeah, benchmarking infrastructure is hard to set up, especially one that can largely go unsupervised. My understanding is that the benchmarks we have are more macroscale. I believe the benchmarks which caught this internally are closer to a macro-scale benchmark and are dependent on a ton of infrastructure to gather and report macro-scale performance data, which makes it non-trivial to share externally.

(Also it's an unfortunate truth of benchmarks that macroscale ones seem to be much better at detecting regressions than nice & tidy microbenchmarks).

If somebody else knows more about the benchmarks that detected this, please chime in or correct me.

I have only read some mimalloc and jemalloc code and I suspect they are less affected by omitting if (__is_overaligned_for_new(__align)).
(For mimalloc, it's almost not affected except a negligible overhead due to checking whether the alignment is a power of 2.)
tcmalloc seems to be affected more, but I have no insight into it. class_to_size

EricWF added inline comments.Jun 29 2023, 2:42 PM
libcxx/include/__memory/uninitialized_buffer.h
51–53

We can cross that bridge and add that functionality when it's needed.

As it stands now, the features your referring to are both unused and broken.
I'm happy to have a more throughout conversation on API design later, but right now we're trying to fix forward to avoid having to revert even further.

It would be better & safer to check in functionality only once it's going to be used. Otherwise, we get untested dead code with bugs in it waiting to be discovered.

61

I have no preference on the name, but try to me implies that it can fail and throw. Where maybe seems like failing is a non-exceptional condition, as it is here.
Your call.

Mordante added inline comments.Jun 29 2023, 2:53 PM
libcxx/include/new
300

I think they mean dispatching the call to overaligned new. And I don't think so.

Yes that check.

The clang builtin is a very thin wrapper whose only job is to annotate the function call as "builtin" so it can be elided.

Plus, I think the library has more knowledge about the availability of these functions, since we provide them.

I assumed the compiler knows more about the required alignment. But as said, mainly curious so no request to change anything.

EricWF updated this revision to Diff 536047.Jun 29 2023, 4:01 PM
EricWF retitled this revision from Cleanup __uninitialized_temporary_buffer internals. to Cleanup __uninitialized_temporary_buffer internals..
EricWF edited the summary of this revision. (Show Details)

Uploading with context

__uninitialized_buffer was reverted, so I think this can be closed.

EricWF abandoned this revision.Sep 7 2023, 10:08 AM