Diff Detail
- Repository
- rCXX libc++
- Build Status
Buildable 25900 Build 25899: arc lint + arc unit
Event Timeline
I'd like to see benchmarking results that show the benefit of this approach.
Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).
The benchmarks are already present. We should add the results to this review to make it easier to view, but it's also possible to verify yourself.
Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).
Well we have to continue supporting the old ABI, and I don't want yet another std::function to maintain; maintaining 4 separate versions is already a pain (the C++11 one, and three C++03 versions).
I much prefer having the two sets of code interleaved, so when std::function::foo needs a change or a bugfix, both sets of code are next to each other.
Addressing some comments, using explicit bools for 'trivial' and 'is_small' again (these perform better for some reason?).
Current benchmark output with clang-7:
And a paste of it for easy viewing: https://pastebin.com/raw/5P5zqtuy
One of the most important parts of this change in system-level benchmarks is eliminating the 3 levels of pointer / vtable indirection (__f_ -> vtable -> operator()) and 1 conditional in std::function::operator() (which doesn't really show up in the micro-benchmark, presumably because the cache stays hot or the compiler can optimize it very easily).
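To make the indirection difference concrete, here is a minimal, illustrative sketch (not the patch's actual code; `callable_base` and `policy_fn` are invented names) contrasting the classic vtable-based dispatch with a policy-style dispatch that stores the invoker function pointer inline:

```cpp
#include <cassert>

// Classic scheme: __f_ points at a heap object whose vtable holds the call
// operator. Invoking goes __f_ -> vtable -> operator(): dependent loads
// before the indirect call.
struct callable_base {
    virtual int call(int) const = 0;
    virtual ~callable_base() {}
};

template <class F>
struct callable : callable_base {
    F f;
    explicit callable(F fn) : f(fn) {}
    int call(int x) const override { return f(x); }
};

// Policy-style scheme: the invoker is a plain function pointer stored
// directly in the wrapper, so the call is a single indirect jump with no
// vtable hop.
struct policy_fn {
    void* target;
    int (*invoke)(void*, int);

    template <class F>
    static int invoke_impl(void* t, int x) {
        return (*static_cast<F*>(t))(x);
    }

    template <class F>
    explicit policy_fn(F* f) : target(f), invoke(&invoke_impl<F>) {}

    int operator()(int x) const { return invoke(target, x); }
};
```

Both wrappers call the same functor; the difference is purely in how many loads sit between `operator()` and the target.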
There are some alternative versions of this change that are worth considering, for example we could only eliminate the conditional in operator() and only eliminate one level of pointer indirection, which would allow for larger inline captures (up to sizeof(void*)*3, which happens to be large enough to accommodate most containers captured by value).
Another possibility is always storing non-trivial data out-of-line, which makes move operations really fast at the expense of an extra allocation for small non-trivial objects (in some of our codebases moving std::functions tends to be done in critical regions, like popping off of a thread-safe work queue, so the extra allocation may be worth it for some workloads).
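The "trivial" and "is_small" criteria behind these trade-offs can be sketched as a single trait. This is an illustrative reconstruction (the 3 * sizeof(void*) buffer is the alternative discussed above, not necessarily what the patch uses):

```cpp
#include <cstddef>
#include <type_traits>

// A functor qualifies for inline storage only if it fits the buffer, is not
// over-aligned, and is trivially copyable/destructible, so that moving the
// wrapper reduces to a plain copy of the buffer.
const std::size_t __buffer_size = 3 * sizeof(void*);

template <class _Fp>
struct __use_small_storage
    : std::integral_constant<
          bool, sizeof(_Fp) <= __buffer_size &&
                    alignof(_Fp) <= alignof(void*) &&
                    std::is_trivially_copy_constructible<_Fp>::value &&
                    std::is_trivially_destructible<_Fp>::value> {};
```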
I know, I wanted the author to do the work.
Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).
Well we have to continue supporting the old ABI, and I don't want yet another std::function to maintain; maintaining 4 separate versions is already a pain (the C++11 one, and three C++03 versions).
I much prefer having the two sets of code interleaved, so when std::function::foo needs a change or a bugfix, both sets of code are next to each other.
The thing is that we're making a choice here: it's possible that we could have more implementations of std::function coming down the line, and in that case interleaving the code wouldn't be a good idea. Instead, having a way to switch between several implementations more easily (understanding we can only pick one for a given ABI) would be better. We could experiment with different implementations and then settle on the one for the next ABI version. This is why I would like for us to plan on a solution that is more easily maintainable in the face of N implementations, not just 2 implementations (disregarding the C++03 implementations). Defining this new std::function separately is one way of doing it, and defining some "hooks" that can be used to tweak storage and virtual dispatch (like https://github.com/ldionne/dyno) is another better but more complicated way.
Thanks! What is this output? Is this some "diff" view between the current and the proposed patch? The time column with floating points represents a change in percentage from the baseline to the candidate (e.g. +0.13 == +13%)? I've used Google Benchmark in the past but I've never seen output exactly like that.
There are some alternative versions of this change that are worth considering, for example we could only eliminate the conditional in operator() and only eliminate one level of pointer indirection, which would allow for larger inline captures (up to sizeof(void*)*3, which happens to be large enough to accommodate most containers captured by value).
Another possibility is always storing non-trivial data out-of-line, which makes move operations really fast at the expense of an extra allocation for small non-trivial objects (in some of our codebases moving std::functions tends to be done in critical regions, like popping off of a thread-safe work queue, so the extra allocation may be worth it for some workloads).
We could also experiment with different sizes of SBO and see what's best. I don't think it's a one-size-fits-all problem (unless 3*sizeof(void*) is magic for some reason?).
Thanks! What is this output? Is this some "diff" view between the current and the proposed patch? The time column with floating points represents a change in percentage from the baseline to the candidate (e.g. +0.13 == +13%)? I've used Google Benchmark in the past but I've never seen output exactly like that.
The output is the time delta, not a fraction. It's the output of libcxx/utils/google-benchmark/tools/compare.py. Some of the operations are sub-nanosecond, so it's handy for that. The times are so low here that we may want to time ~10 operations or something instead of just one, but the comparison seems roughly in line with what I've seen with our internal tools.
We could also experiment with different sizes of SBO and see what's best. I don't think it's a one-size-fits-all problem (unless 3*sizeof(void*) is magic for some reason?).
I could probably instrument the code to get a histogram of capture sizes we see in some real codebases, that could help inform any decisions here.
Added benchmark that creates many different types of functors to avoid branch prediction.
The output is the time delta, not a fraction. It's the output of libcxx/utils/google-benchmark/tools/compare.py. Some of the operations are sub-nanosecond, so it's handy for that. The times are so low here that we may want to time ~10 operations or something instead of just one, but the comparison seems roughly in line with what I've seen with our internal tools.
I think I may have been wrong about this... I haven't really used this tool much, it might be nice if it showed % or something.
I added a benchmark that instantiates many different types of functors to get a feel for how this would behave in a more realistic setting:
Benchmark Time CPU Time Old Time New CPU Old CPU New
-------------------------------------------------------------------------------------------------------------------
BM_MixedFunctorTypes -0.3121 -0.3122 19677 13535 19677 13534
Looking at that it seems like the -0.3121 must be a fraction... Not sure why I thought it was a delta.
I need to statically generate a bunch of different types in this new benchmark, which makes compilation really slow. Interested in suggestions on how to improve that.
FWIW, I'm slightly in favor of using different headers. For example, I just realized that this change currently leaves a bunch of baggage symbols laying around that the new code won't use; it would be easier to see that kind of thing if the internals of each function were just in its own file.
include/functional | ||
---|---|---|
1476 | Note to self: Remove this before trying to submit. I'm only keeping it here during development and benchmarking because it makes it easier to copy / paste the code into our internal toolchain. |
Bah, I just realized I still have a bunch of local variables that don't use reserved names. Fixing.
Switching to a version of the optimized function that doesn't include the small-object optimization for small non-trivial objects.
This seems to perform better in the new benchmark I added:
Benchmark Time CPU Time Old Time New CPU Old CPU New
BM_MixedFunctorTypes -0.3794 -0.3794 19661 12202 19660 12202
Here's another version that might be worth discussing. It's a much simpler implementation because it doesn't attempt small value optimization for non-trivial objects, which makes operations like move-construction very fast.
Here is a paste of the benchmark comparison: https://pastebin.com/raw/pLrevUAc
I've had both implementations laying around for a while, but some analysis that I did today suggests that the current version (without the non-trivial SOO) might be better for some important workloads.
I'm strongly against it. The harder it is to see all the implementations of a given function f, the harder it is to know they exist and keep them in sync. In my experience spreading these things out leads to real bugs.
I would like that too, but that's not a blocking issue on this review.
I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does change std::function, it won't require another version of it.
Maybe you were thinking about things like std::unique_function?
Note that having the C++03 functions be in a separate file with no relation or interleaving with the C++11 implementation has caused plenty of bugs in the past where they diverge.
include/functional | ||
---|---|---|
1677 | This implementation must work in all dialects (likely including C++03 w/ Clang extensions :-S ). So it has to support allocators. At minimum it needs to support calling the allocator constructors. We may be able to get away with ignoring the allocator though. I'll have to ask my other maintainers. Let me know if you want some help adding allocator support. | |
1798 | _LIBCPP_CONSTEXPR |
include/functional | ||
---|---|---|
1677 | Done, but I think I should add some test coverage before declaring triumph. I wouldn't mind ignoring the arguments though if that's preferable for some reason, a few code paths already do (copying from an existing std::function for example). |
I don't see the issue with breaking an ABI that's designated as unstable, as long as there's some form of announcement to avoid any confusion. Fundamentally, that's the entire point of having unstable ABI versions: some changes are just not possible to make without ABI breakage. Most consumers use the stable ABI, while the unstable ABI is not backwards compatible; if it had to be, a whole range of changes to libc++ and/or Clang would be impossible.
If you can test both implementations against the same set of tests, it shouldn't be hard to catch those bugs. Actually, that would be better than having only one version of the code tested (because let's face it, most people are not going to run the tests with both configurations, they're only going to use the default one).
I would like that too, but that's not a blocking issue on this review.
It is a blocking issue if we're going to interleave the code with #ifdefs, as it makes the transition to a solution where we can maintain N implementations more difficult.
I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does change std::function, it won't require another version of it.
Maybe you were thinking about things like std::unique_function?
No, what I mean is that libc++ might want to provide different implementations of std::function, for example with different SBO sizes. The fact that @jsoyke seems to have many different implementations of std::function laying around is a validation of my hypothesis that we may want to have more than one std::function implementation.
I don't see a problem breaking the unstable ABI either. But we don't normally standardize ABI breaking changes. I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.
@jsoyke, I spoke to the other maintainers, and it's cool if we entirely ignore the allocators. So feel free to use that to squeeze out w/e benefits you can.
Otherwise, this is looking pretty good to me. Thanks for all the hard work.
include/functional | ||
---|---|---|
1834 | Extra ;. |
@jsoyke, I spoke to the other maintainers, and it's cool if we entirely ignore the allocators. So feel free to use that to squeeze out w/e benefits you can.
It didn't cost anything in terms of performance, it just cost some code.
I fully agree we don't care about breaking an unstable ABI (since vendors won't ship it if they care about ABI stability). I think everyone is on the same page here.
I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.
This is where we disagree -- I don't think there's a universally better implementation of std::function. One implementation might be better in some circumstances and another implementation might be better in other circumstances, hence my claim that we should not go for a solution that assumes at most two implementations.
On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?
Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.
Our job is to provide a universal implementation of std::function, and hence pick one implementation which weights the different alternatives and chooses what we believe to be best in the majority of cases.
On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?
Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.
Just above you suggested we get ready to add more than two implementations, but here you're setting a very high bar for adding one more?
I read this as a seeming contradiction.
I agree, and that's why we need to select the best possible default implementation. I don't think that precludes giving users more choices. The current patch makes it harder to do that.
Similarly, if we were to stabilize ABI v2 with this new std::function, and if we eventually wanted to use yet another implementation for ABI v3, we'd now have three #if branches in each member function of std::function. I think it would be better to setup something better right now.
On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?
Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.
Just above you suggested we get ready to add more than two implementations, but here you're setting a very high bar for adding one more?
I read this as a seeming contradiction.
In the current state of the patch, we're adding exactly one new implementation of std::function, and we're doing it in a way that it's "the one new better std::function". In that case, yes, I do want to understand what this new (better) std::function brings to the table. If we had a way of switching between multiple implementations more easily and this was just another implementation that users can pick if they wish to, I would almost certainly care a bit less.
If we had a way of switching between multiple implementations more easily and this was just another implementation that users can pick if they wish to, I would almost certainly care a bit less.
Switching at build time? Or do you mean switching in a way that doesn't break ABI compatibility?
benchmarks/function.bench.cpp | ||
---|---|---|
222 ↗ | (On Diff #176436) | These types already exist above. |
include/functional | ||
1715 | static_cast<void*>(__hold.get()) | |
1737 | _VSTD. | |
1747 | T&& is a pessimization for fundamental types. std::conditional_t< !_VSTD::is_reference_v<_ArgTypes> && sizeof(_ArgTypes) < sizeof(void*) && _VSTD::is_trivially_copy_constructible<_ArgTypes> && _VSTD::is_trivially_destructible<_ArgTypes>, _ArgTypes, _ArgTypes&&> Basically, small trivially copyable types should be passed by value to avoid the indirection. This potentially saves stack space (if the input was not an lvalue already) and indirections in the callee. | |
1858 | If _LIBCPP_NO_RTTI, we could optimize this by having a single global policy for all small objects. | |
2269 | Do we want to unconditionally copy __buf_ even when __clone is called? | |
2281 | I don't think we should check. | |
2290 | Do we need to do this here? | |
2325 | It is not clear if __construct_empty will overwrite __buf_ just by reading this. |
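The pass-small-trivial-types-by-value suggestion from the inline comment above (1747) can be expanded into a self-contained alias template. This is an illustrative sketch; the name `__fast_forward_t` is invented, and (as noted later in the review) `sizeof` here prevents use with predeclared argument types:

```cpp
#include <type_traits>

// A parameter of type _Ap is forwarded as plain _Ap when it is a small,
// trivially copyable, non-reference type (pass by value, no indirection),
// and as _Ap&& otherwise.
template <class _Ap>
using __fast_forward_t = typename std::conditional<
    !std::is_reference<_Ap>::value &&
        sizeof(_Ap) <= sizeof(void*) &&
        std::is_trivially_copy_constructible<_Ap>::value &&
        std::is_trivially_destructible<_Ap>::value,
    _Ap,          // small and trivial: pass by value
    _Ap&&>::type; // everything else: forward as an rvalue reference
```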
include/functional | ||
---|---|---|
1737 | I actually have no idea why _VSTD:: exists, or why we would ever need it. One day I hope to kill it. I guess we should do w/e the surrounding code does. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/copy_move.pass.cpp | ||
162 ↗ | (On Diff #176436) | Just remove these asserts. I'm not sure how useful they are. I guess it means we're allowing ourselves to break existing code, but that existing code depended on UB. |
Eric and I just spoke. My point of view:
We should investigate factoring out the functions that differ in implementation into separate classes with a common API. This way, we could have something like this:
template<class _Rp, class ..._ArgTypes>
class function<_Rp(_ArgTypes...)>
{
#ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION
    typedef __function::__abi_v1_impl<_Rp(_ArgTypes...)> __implementation;
#else
    typedef __function::__abi_optimized_impl<_Rp(_ArgTypes...)> __implementation;
#endif
    __implementation __impl_;
    ...
};
Then, in the copy constructor, something like:
template<class _Rp, class ..._ArgTypes>
function<_Rp(_ArgTypes...)>::function(const function& __f)
{
    __impl_.copy_construct(__f.__impl_);
}
And something analogous for other operations. This way, the implementation of each storage/dispatching mechanism would be collocated with itself and we could conceivably add new ones or introduce a new one in a future ABI version. There's also a possibility that we could reuse some of this logic (the SBO logic) in other components like std::any. We could also test those two implementations individually in some libc++-specific tests instead of letting the new implementation go largely untested. I also think it would make the code cleaner than using #ifdefs all over the place, but that's subjective.
I'd like this avenue to be explored -- if it turns out not to be a fruitful approach, then I can live with the current approach.
Refactoring implementation of std::function into a few different classes to minimize the number of ifdefs.
Your suggestion is pretty close to what I came up with. Let me know what you think, I prefer this approach myself.
benchmarks/function.bench.cpp | ||
---|---|---|
222 ↗ | (On Diff #176436) | I wanted something slightly different here, I want the compiler to actually generate different call operators, hence the GeneratedFunctor below. |
include/functional | ||
1747 | I agree, this worked well for some internal optimizations we did, @EricWF objected at first, but maybe we should discuss it more. | |
1858 | Is that worth it? I wouldn't think disabling RTTI is common. | |
2269 | It's not worth the extra branch instruction based on benchmarks. | |
2281 | This actually does perform better (at least on my workstation). Presumably because writes are more expensive than a conditional. | |
2290 | Last time I looked it got compiled out anyway since the members are trivial, unread, and unconditionally set below. | |
2325 | This should be clearer in the new version of the code which sets these fields explicitly. |
include/functional | ||
---|---|---|
1554 | No strong feelings on any of the class names I've used here. I was thinking vfunc = vtable based and pfunc = policy based, but I'd be fine with something more descriptive as well. | |
1566 | Let me know if I'm using this macro correctly. It seems like it should be present on all non-public symbols? |
include/functional | ||
---|---|---|
2313 | This causes a regression in the non-optimized version, presumably because swap() has a lot of conditionals and potential out-of-line calls, I'll fix this if the new code structure looks good otherwise. |
I think we need to use std::launder when accessing the function object through the small buffer. This is a problem we seem to have in the current implementation too.
include/__config | ||
---|---|---|
98 | Instead, I think this should be described as an "Unstable attempt to provide a more optimized std::function" or something along those lines. My understanding is that we're not bound to your current implementation for the new std::function -- we should figure out whatever is fastest and use that. | |
include/functional | ||
1554 | I'm trying to ponder whether changing this name would break the ABI. If someone explicitly instantiates a std::function in a shared object with default visibility, I think this is an ABI break because the dylib (if not recompiled) will export the vtable for __base, but the application (when recompiled against new headers) will try to find a definition for the vtable of __vfunc. Now, I'm willing to assume that nobody did something like that. But even then, is there a possibility for an ABI break? @EricWF can you think of something? | |
1563 | _LIBCPP_INLINE_VISIBILITY here and on allocator() too. Also, those should be called __target() and __allocator() since they are internal, no? | |
1566 | Roughly speaking, it needs to appear on everything that we don't want to export from the dylib. It's a bit more complicated, but that's the general idea. | |
1691 | Typo: underlying | |
1701 | Mangle to __type_info or similar. I know it's not necessary because users can't #define type_info, but I find it better to be consistent and mangle everything that's not exposed to users. | |
2244–2246 | static_cast<bool>? Just a matter of style. | |
2311–2312 | We didn't check for that before, did we? IOW the implementation worked transparently even in the case of a self move-assignment. Would it be possible to retain that property (and save this check in the current implementation)? |
include/functional | ||
---|---|---|
1556 | We already have one of those in __functional_03, right? I understand both files can't be included at the same time, but I would consider changing the name to avoid confusion. | |
1698 | I think this can be made const. | |
1792 | This would be more readable: __invoker(__use_small_storage<_Fun>() ? __call_small<_Fun> : __call_large<_Fun>). | |
2098 | nullptr? | |
2100 | reinterpret_cast |
benchmarks/function.bench.cpp | ||
---|---|---|
222 ↗ | (On Diff #176436) | Add a comment of some kind or rename the classes. |
include/functional | ||
1563 | I think catching these unreserved names should be done with tooling. | |
1698 | type_info too | |
1792 | This would instantiate both functions when only one is needed. | |
1858 | No idea. Projects that disable RTTI might like the extra saving? | |
2049 | We don't need a branch here. | |
2077 | Is this necessary? | |
2097 | Shouldn't this be guarded for _LIBCPP_NO_RTTI? | |
2311–2312 | Without this check the code is equally correct, as in the moved-from value has a valid-but-unspecified state. The lhs happens to also be the moved-from value, but that is fine. | |
2313 | If we are going to use the swap trick, then we don't need to do =nullptr in the line before. |
include/functional | ||
---|---|---|
1563 | Technically, allocator and target are "reserved" because they are used elsewhere in the library. If a user were to #define them, they'd be in trouble already. For that reason, a Clang Tidy check would have to know the interface of everything in the standard library, and enforce that anything not part of such an interface in namespace std be mangled. A tool like this can definitely be written, but it may be tedious and difficult to keep up to date. I could be missing another (simpler) solution to implement this, though. | |
1792 | I know, but what's the concern here? Compile-times or code bloat? If it's compile-time, we are instantiating a function just to do the dispatching. If it's code bloat, I'd be _very_ surprised if both definitions actually made it into the final executable, since one of them is obviously not used. This is not a blocking comment, but I'm not sure we're optimizing for the right thing here. Readability _is_ important, even in a standard library. | |
2311–2312 | Rephrasing to make sure I understand: If this == &__f, then after the assignment __f is in a valid-but-unspecified state (in this case it is empty). Since this == &__f, *this is also in the same valid-but-unspecified state. I don't think this makes sense for self-assignment, because the lhs is usually understood to contain the value of the rhs after a move-assignment. IOW, the following should hold: T x = ...; T copy = x; T y = std::move(x); assert(y == copy); This is not the case here. |
include/functional | ||
---|---|---|
1563 | Yeah, I was thinking of a tool that has a whitelist of identifier names and anything else should be a reserved identifier. | |
1792 | tbh, I think we can make the function be like: template <typename _Fun> static _Rp __call_impl(const __storage* __buf, _ArgTypes&&... __args) { _Fun* __f = reinterpret_cast<_Fun*>( __use_small_storage<_Fun>::value ? &__buf->__small : __buf->__large); return (*__f)(_VSTD::forward<_ArgTypes>(__args)...); } Then there is a single function that handles both. | |
2311–2312 | The example above does not have any self-assignment (or any assignment). My argument is that y = std::move(y); leaves y in an unspecified state. It is not required to be a noop. This is how it behaves today and it is up to spec. |
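The single-trampoline idea from the 1792 comment can be written out as a runnable sketch. The union layout, buffer size, and helper names below are hypothetical, modeled on the suggestion rather than taken from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// One __call_impl serves both storage modes by selecting the object pointer
// before the cast: the inline buffer's address for small functors, the
// stored heap pointer for large ones.
const std::size_t __buf_size = 3 * sizeof(void*);

union __storage {
    void* __large;                            // functor allocated on the heap
    alignas(void*) char __small[__buf_size];  // functor stored inline
};

template <class _Fun>
bool __fits_small() {
    return sizeof(_Fun) <= __buf_size && alignof(_Fun) <= alignof(void*);
}

template <class _Fun>
int __call_impl(__storage* __buf, int __arg) {
    void* __p = __fits_small<_Fun>() ? static_cast<void*>(&__buf->__small)
                                     : __buf->__large;
    return (*static_cast<_Fun*>(__p))(__arg);
}
```

A single template instantiation per functor type handles both paths, which addresses the code-bloat concern raised about instantiating separate `__call_small` and `__call_large` functions.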
Some tests are failing at this new version. Some seem to be because the tests depend on under-specified behavior; some might be legitimate, so I'll dig into those more tomorrow.
include/functional | ||
---|---|---|
1554 | Renamed back to the original class name in case this is an issue. Would be fine since the ctor (which initializes the vtable pointer) is inline? Seems like everything will get folded into the std::function ctor, which should produce a layout compatible object that only references static symbols. I'd slightly prefer changing the names to be more descriptive now that there are many classes that do the same job, but your concern makes sense so I'll wait till we have a clear answer about this. | |
1556 | I reverted names of existing classes due to ABI concerns mentioned elsewhere. | |
1563 | I wonder if parsing the synopsis comments would be reasonable? Anyway, it sounds like there is nothing to change other than visibility here since the name is already "reserved"? | |
1566 | Think I got them all... | |
1792 | Thanks, that's way cleaner. | |
2100 | FWIW, this ends up casting 'const void*' to 'void (*)()' in some cases, which I think is undefined, but the existing implementation relies on this so I didn't worry about fixing it here. | |
2311–2312 | Self move assignment left the function empty before, preserving that behavior now. |
include/functional | ||
---|---|---|
2311–2312 | Yeah, my example is obviously broken. What I meant is just that one would expect the lhs to contain the value of the rhs, not for that value to be nuked. I understand the current implementation is valid in terms of the specification. I'm saying the specification does not satisfy the principle of least surprise. Leave it as-is. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_F.pass.cpp | ||
68 ↗ | (On Diff #176760) | Should these checks be made to depend on #ifdef _LIBCPP_ABI_OPTIMIZED_FUNCTION? |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_rfunction.pass.cpp | ||
92 ↗ | (On Diff #176760) | Those asserts should be removed too. |
Addressing some review comments.
Reformatting for consistency.
More explicit class names.
Fixing an expected performance regression.
include/functional | ||
---|---|---|
1747 | I added this optimization here, at least partially. There are tests that make sure that std::function<void(SomePredeclaredThing)> works, so unfortunately I cannot use sizeof(T) here. | |
2313 | There is a test for "re-entrant" destruction, in other words the functor's destructor sets the std::function's value to something else during destruction. Not nulling this out before would cause that test to fail. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_F.pass.cpp | ||
68 ↗ | (On Diff #176760) | No strong opinions from me, but I have a slight preference for removing it since it's basically testing an implementation detail. |
include/functional | ||
---|---|---|
1779 | This optimization wasn't in the last version, sbenza@ mentioned we might want to do it. It helps with the Invoke benchmarks, as much as 10-20%. It's probably worth mentioning that this can hurt performance in cases where the caller is using std::forward, or passing small value types by reference, but these are very uncommon (at least in my experience). |
include/functional | ||
---|---|---|
1678 | Yesterday richardsmith was telling us that we should not need or want to use aligned_storage for pretty much anything. alignas(8) char __small[2 * sizeof(void*)]; btw, I don't think we want the default alignment from aligned_storage here. That can be 16, which is overkill and will propagate to any other class that contains a std::function. We know that some allocators will be slower when requested alignment is larger than 8. I don't mind if some functors end up in the heap because of our tighter alignment requirements. It will still be a win for all the other functors that have smaller alignment. | |
1779 | Can you show how this can hurt? This will only trigger if the signature of the function itself takes by value. We will need to read that data and pass it in by value anyway. We are only moving that lvalue-to-rvalue decay before the vtable instead of in the vtable. |
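The alignment concern from the 1678 comment can be checked directly. A small sketch (names invented) comparing the default aligned_storage alignment against the explicit alignas(8) array it suggests:

```cpp
#include <type_traits>

// std::aligned_storage with a defaulted alignment for a two-pointer buffer
// may demand the strictest fundamental alignment (16 on common 64-bit ABIs),
// and that requirement propagates to every class embedding the std::function.
// The explicit alignas(8) array caps the requirement at 8.
typedef std::aligned_storage<2 * sizeof(void*)>::type __default_buf;

struct __explicit_buf {
    alignas(8) char __data[2 * sizeof(void*)];
};
```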
include/functional | ||
---|---|---|
1779 | Never-mind, my reasoning was flawed. I was basing my experience on a piece of software that did something like fast_forward but internally dispatched to std::function. It had to move register objects to the stack and made the fast_forward code perform worse, but we're fixing that problem here. |
Thanks for the review! I don't have commit access so I think someone else will have to commit this for me?
Dropping comment that doesn't apply now that __policy_storage doesn't use aligned_storage.
include/functional | ||
---|---|---|
1703 | Nit: I don't think the std:: qualifier is needed. | |
1717 | I wouldn't mind naming these magic constants like: static const _LIBCPP_CONSTEXPR __policy __policy_ = { /* __clone */ nullptr, /* __destroy */ nullptr, /* __is_null */ true, }; | |
1718 | Small nit. I hate conditional compilation blocks, and would like as few as possible. I would write this as: template <class _Tp> _LIBCPP_INLINE_VISIBILITY static constexpr const type_info* __get_typeid() { #ifndef _LIBCPP_NO_RTTI return &typeid(_Tp); #else return nullptr; #endif } and then just use that function everywhere else. | |
1876 | Nit: The _VSTD:: is unnecessary when referencing types. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/copy_move.pass.cpp | ||
162 ↗ | (On Diff #176436) | Actually I take this back. Hyrum's law leads me to believe that if we regress this behavior in ABI v1, someone will notice. I think maybe we should still have these asserts for the old ABI (guarded by #ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION). Sorry. |
Just a heads up, this change broke the lldb std::function formatter; see the logs here:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/13996/
I have not looked at all the details yet, so it's not clear how much work is involved in patching on our side.
I can take a stab at fixing that if you can point me to the code that does that formatting? I plan on making another change here so it would be good if I could fix the problem in advance.
The gist of the change is that the members you would depend on are now nested inside the __f_ member, but those members may change depending on how libc++ is configured.
The fix was not too bad, I landed the fix and the bots just turned green http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/14015/
If you run the lldb test suite we have test for the formatter and for stepping into std::function so those _should_ catch any breaking changes.