This is an archive of the discontinued LLVM Phabricator instance.

Add a version of std::function that includes a few optimizations.
ClosedPublic

Authored by jsoyke on Nov 28 2018, 9:07 PM.

Details

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ldionne requested changes to this revision.Nov 29 2018, 9:03 AM

I'd like to see benchmarking results that show the benefit of this approach.

Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).

This revision now requires changes to proceed.Nov 29 2018, 9:03 AM

I'd like to see benchmarking results that show the benefit of this approach.

The benchmarks are already present. We should add the results to this review to make it easier to view, but it's also possible to verify yourself.

Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).

Well we have to continue supporting the old ABI, and I don't want yet another std::function to maintain; maintaining 4 separate versions is already a pain (the C++11 one, and three C++03 versions).
I much prefer having the two sets of code interleaved, so when std::function::foo needs a change or a bugfix, both sets of code are next to each other.

jsoyke updated this revision to Diff 176031.Nov 29 2018, 8:09 PM
jsoyke marked 3 inline comments as done.

Addressing some comments, using explicit bools for 'trivial' and 'is_small' again (these perform better for some reason?).

Current benchmark output with clang-7:

And a paste of it for easy viewing: https://pastebin.com/raw/5P5zqtuy

One of the most important parts of this change in system-level benchmarks is eliminating the 3 levels of pointer / vtable indirection (__f_ -> vtable -> operator()) and 1 conditional in std::function::operator() (which doesn't really show up in the micro-benchmark, presumably because the cache stays hot or the compiler can optimize it very easily).

There are some alternative versions of this change that are worth considering, for example we could only eliminate the conditional in operator() and only eliminate one level of pointer indirection, which would allow for larger inline captures (up to sizeof(void*)*3, which happens to be large enough to accommodate most containers captured by value).

Another possibility is always storing non-trivial data out-of-line, which makes move operations really fast at the expense of an extra allocation for small non-trivial objects (in some of our codebases moving std::functions tends to be done in critical regions, like popping off of a thread-safe work queue, so the extra allocation may be worth it for some workloads).

ldionne requested changes to this revision.Nov 30 2018, 3:39 AM

I'd like to see benchmarking results that show the benefit of this approach.

The benchmarks are already present. We should add the results to this review to make it easier to view, but it's also possible to verify yourself.

I know, I wanted the author to do the work.

Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).

Well we have to continue supporting the old ABI, and I don't want yet another std::function to maintain; maintaining 4 separate versions is already a pain (the C++11 one, and three C++03 versions).
I much prefer having the two sets of code interleaved, so when std::function::foo needs a change or a bugfix, both sets of code are next to each other.

The thing is that we're making a choice here: it's possible that we could have more implementations of std::function coming down the line, and in that case interleaving the code wouldn't be a good idea. Instead, having a way to switch between several implementations more easily (understanding we can only pick one for a given ABI) would be better. We could experiment with different implementations and then settle on the one for the next ABI version. This is why I would like for us to plan on a solution that is more easily maintainable in the face of N implementations, not just 2 implementations (disregarding the C++03 implementations). Defining this new std::function separately is one way of doing it, and defining some "hooks" that can be used to tweak storage and virtual dispatch (like https://github.com/ldionne/dyno) is another better but more complicated way.

Current benchmark output with clang-7:

And a paste of it for easy viewing: https://pastebin.com/raw/5P5zqtuy

Thanks! What is this output? Is this some "diff" view between the current and the proposed patch? The time column with floating points represents a change in percentage from the baseline to the candidate (e.g. +0.13 == +13%)? I've used Google Benchmark in the past but I've never seen output exactly like that.

There are some alternative versions of this change that are worth considering, for example we could only eliminate the conditional in operator() and only eliminate one level of pointer indirection, which would allow for larger inline captures (up to sizeof(void*)*3, which happens to be large enough to accommodate most containers captured by value).

Another possibility is always storing non-trivial data out-of-line, which makes move operations really fast at the expense of an extra allocation for small non-trivial objects (in some of our codebases moving std::functions tends to be done in critical regions, like popping off of a thread-safe work queue, so the extra allocation may be worth it for some workloads).

We could also experiment with different sizes of SBO and see what's best. I don't think it's a one-size-fits-all problem (unless 3*sizeof(void) is magic for some reason?).

This revision now requires changes to proceed.Nov 30 2018, 3:39 AM

Thanks! What is this output? Is this some "diff" view between the current and the proposed patch? The time column with floating points represents a change in percentage from the baseline to the candidate (e.g. +0.13 == +13%)? I've used Google Benchmark in the past but I've never seen output exactly like that.

The output is the time delta, not a fraction. It's the output of libcxx/utils/google-benchmark/tools/compare.py. Some of the operations are sub-nano-second so it's handy for that. The times are so low here we may want to time ~10 operations or something instead of just one, but the comparison seems roughly inline with what I've seen with our internal tools.

We could also experiment with different sizes of SBO and see what's best. I don't think it's a one-size-fits-all problem (unless 3*sizeof(void) is magic for some reason?).

I could probably instrument the code to get a histogram of capture sizes we see in some real codebases, that could help inform any decisions here.

jsoyke updated this revision to Diff 176117.Nov 30 2018, 7:18 AM

Added benchmark that creates many different types of functors to avoid branch prediction.

The output is the time delta, not a fraction. It's the output of libcxx/utils/google-benchmark/tools/compare.py. Some of the operations are sub-nano-second so it's handy for that. The times are so low here we may want to time ~10 operations or something instead of just one, but the comparison seems roughly inline with what I've seen with our internal tools.

I think I may have been wrong about this... I haven't really used this tool much, it might be nice if it showed % or something.

I added a benchmark that instantiates many different types of functors to get a feel for how this would behave in a more realistic setting:

Benchmark                              Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------
BM_MixedFunctorTypes                -0.3121         -0.3122         19677         13535         19677         13534

Looking at that it seems like the -0.3121 must be a fraction... Not sure why I thought it was a delta.

I need to statically generate a bunch of different types in this new benchmark, which makes compilation really slow. Interested in suggestions on how to improve that.

jsoyke updated this revision to Diff 176122.Nov 30 2018, 7:45 AM

Adding __ to all internal symbols.

FWIW, I'm slightly in favor of using different headers. For example, I just realized that this change currently leaves a bunch of baggage symbols laying around that the new code won't use, it would be easier to see that kind of thing if the internals of each function were just in it's own file.

jsoyke marked an inline comment as done.Nov 30 2018, 7:50 AM
jsoyke added inline comments.
include/functional
1476

Note to self: Remove this before trying to submit. I'm only keeping it here during development and benchmarking because it makes it easier to copy / paste the code into our internal toolchain.

Bah, I just realized I still have a bunch of local variables that don't use reserved names. Fixing.

jsoyke updated this revision to Diff 176139.Nov 30 2018, 8:41 AM

Fixing local variables and template paramters that didn't use reserved names.

jsoyke updated this revision to Diff 176202.Nov 30 2018, 2:09 PM

Switching to version of optimized function that doesn't include small-object-optimization for small non-trivial objects.

This seems to perform better in the new benchmark I added:

Benchmark Time CPU Time Old Time New CPU Old CPU New

BM_MixedFunctorTypes -0.3794 -0.3794 19661 12202 19660 12202

Heres another version that might be worth discussing. It's a much simpler implementation because it doesn't attempt small value optimization for non-trivial objects, this makes operations like move-construction very fast.

Here is a paste of the benchmark comparison: https://pastebin.com/raw/pLrevUAc

I've had both implementations laying around for a while, but some analysis that I did today suggests that the current version (without the non-trivial SOO) might be better for some important workloads.

FWIW, I'm slightly in favor of using different headers. For example, I just realized that this change currently leaves a bunch of baggage symbols laying around that the new code won't use, it would be easier to see that kind of thing if the internals of each function were just in it's own file.

I'm strongly against it. The harder it is to see all the implementations of a given function f, the harder it is to know they exist and keep them in sync. In my experience spreading these things out leads to real bugs.

@ldionne wrote:

This is why I would like for us to plan on a solution that is more easily maintainable in the face of N implementations, not just 2 implementations (disregarding the C++03 implementations).

I would like that too. but that's not a blocking issue on this review.
I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does function it'll won't require another version of it.
Maybe you were thinking about things like std::unique_function?

Note that having the C++03 functions be in a separate file with no relation or interleaving with the C++11 implementation has caused plenty of bugs in the past where they diverge.

EricWF added inline comments.Nov 30 2018, 5:48 PM
include/functional
1677

This implementation must work in all dialects (likely including C++03 w/ Clang extensions :-S ). So it has to support allocators.

At minimum it needs to support calling the allocator constructors. We may be able to get away with ignoring the allocator though. I'll have to ask my other maintainers.

Let me know if you want some help adding allocator support.

1798

_LIBCPP_CONSTEXPR

jsoyke updated this revision to Diff 176239.Nov 30 2018, 9:01 PM

Adding allocator support.

jsoyke updated this revision to Diff 176240.Nov 30 2018, 9:06 PM
jsoyke marked 2 inline comments as done.

Use _LIBCPP_CONSTEXPR

jsoyke added inline comments.Nov 30 2018, 9:07 PM
include/functional
1677

Done, but I think I should add some test coverage before declaring triumph.

I wouldn't mind ignoring the arguments though if that's preferable for some reason, a few code paths already do (copying from an existing std::function for example).

FWIW, I'm slightly in favor of using different headers. For example, I just realized that this change currently leaves a bunch of baggage symbols laying around that the new code won't use, it would be easier to see that kind of thing if the internals of each function were just in it's own file.

I'm strongly against it. The harder it is to see all the implementations of a given function f, the harder it is to know they exist and keep them in sync. In my experience spreading these things out leads to real bugs.

@ldionne wrote:

This is why I would like for us to plan on a solution that is more easily maintainable in the face of N implementations, not just 2 implementations (disregarding the C++03 implementations).

I would like that too. but that's not a blocking issue on this review.
I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does function it'll won't require another version of it.
Maybe you were thinking about things like std::unique_function?

Note that having the C++03 functions be in a separate file with no relation or interleaving with the C++11 implementation has caused plenty of bugs in the past where they diverge.

I don't see the issue with breaking ABI that's designated as unstable as long as there's some form of announcement to avoid any confusion. Fundamentally that's the entire point of having unstable ABI versions, some changes are just not really possible to make without ABI breakages, most consumers just use the stable ABI, while unstable ABI is not backwards compatible otherwise it would make a whole range of changes to libc++ and/or Clang impossible.

FWIW, I'm slightly in favor of using different headers. For example, I just realized that this change currently leaves a bunch of baggage symbols laying around that the new code won't use, it would be easier to see that kind of thing if the internals of each function were just in it's own file.

I'm strongly against it. The harder it is to see all the implementations of a given function f, the harder it is to know they exist and keep them in sync. In my experience spreading these things out leads to real bugs.

If you can test both implementations against the same set of tests, it shouldn't be hard to catch those bugs. Actually, that would be better than having only one version of the code tested (because let's face it, most people are not going to run the tests with both configurations, they're only going to use the default one).

@ldionne wrote:

This is why I would like for us to plan on a solution that is more easily maintainable in the face of N implementations, not just 2 implementations (disregarding the C++03 implementations).

I would like that too. but that's not a blocking issue on this review.

It is a blocking issue if we're going to interleave the code with #ifdefs, as it makes the transition to a solution where we can maintain N implementations more difficult.

I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does function it'll won't require another version of it.
Maybe you were thinking about things like std::unique_function?

No, what I mean is that libc++ might want to provide different implementations of std::function, for example with different SBO sizes. The fact that @jsoyke seems to have many different implementations of std::function laying around is a validation of my hypothesis that we may want to have more than one std::function implementation.

I don't see the issue with breaking ABI that's designated as unstable as long as there's some form of announcement to avoid any confusion. Fundamentally that's the entire point of having unstable ABI versions, some changes are just not really possible to make without ABI breakages, most consumers just use the stable ABI, while unstable ABI is not backwards compatible otherwise it would make a whole range of changes to libc++ and/or Clang impossible.

I don't see a problem breaking the unstable ABI either. But we don't normally standardize ABI breaking changes. I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.

@jsoyke, I spoke to the other maintainers, and it's cool if we entirely ignore the allocators. So feel free to use that to squeeze out w/e benefits you can.

Otherwise, this is looking pretty good to me. Thanks for all the hard work.

include/functional
1834

Extra ;.

jsoyke marked an inline comment as done.Dec 3 2018, 11:17 AM

@jsoyke, I spoke to the other maintainers, and it's cool if we entirely ignore the allocators. So feel free to use that to squeeze out w/e benefits you can.

It didn't cost anything in terms of performance, it just cost some code.

jsoyke updated this revision to Diff 176436.Dec 3 2018, 11:33 AM

Remove extra ';'

I don't see the issue with breaking ABI that's designated as unstable as long as there's some form of announcement to avoid any confusion. Fundamentally that's the entire point of having unstable ABI versions, some changes are just not really possible to make without ABI breakages, most consumers just use the stable ABI, while unstable ABI is not backwards compatible otherwise it would make a whole range of changes to libc++ and/or Clang impossible.

I don't see a problem breaking the unstable ABI either. But we don't normally standardize ABI breaking changes.

I fully agree we don't care about breaking an unstable ABI (since vendors won't ship it if they care about ABI stability). I think everyone is on the same page here.

I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.

This is where we disagree -- I don't think there's a universally better implementation of std::function. One implementation might be better in some circumstances and another implementation might be better in other circumstances, hence my claim that we should not go for a solution that assumes at most two implementations.

On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?

Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.

I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.

This is where we disagree -- I don't think there's a universally better implementation of std::function. One implementation might be better in some circumstances and another implementation might be better in other circumstances, hence my claim that we should not go for a solution that assumes at most two implementations.

Our job is to provide a universal implementation of std::function, and hence pick one implementation which weights the different alternatives and chooses what we believe to be best in the majority of cases.

On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?

Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.

Just above you suggested we get ready to add more than two implementations, but here you're setting a very high bar for adding one more?
I read this as a seeming contradiction.

I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.

This is where we disagree -- I don't think there's a universally better implementation of std::function. One implementation might be better in some circumstances and another implementation might be better in other circumstances, hence my claim that we should not go for a solution that assumes at most two implementations.

Our job is to provide a universal implementation of std::function, and hence pick one implementation which weights the different alternatives and chooses what we believe to be best in the majority of cases.

I agree, and that's why we need to select the best possible default implementation. I don't think that precludes giving users more choices. The current patch makes it harder to do that.

Similarly, if we were to stabilize ABI v2 with this new std::function, and if we eventually wanted to use yet another implementation for ABI v3, we'd now have three #if branches in each member function of std::function. I think it would be better to setup something better right now.

On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?

Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.

Just above you suggested we get ready to add more than two implementations, but here you're setting a very high bar for adding one more?
I read this as a seeming contradiction.

In the current state of the patch, we're adding exactly one new implementation of std::function, and we're doing it in a way that it's "the one new better std::function". In that case, yes, I do want to understand what this new (better) std::function brings to the table. If we had a way of switching between multiple implementations more easily and this was just another implementation that users can pick if they wish to, I would almost certainly care a bit less.

If we had a way of switching between multiple implementations more easily and this was just another implementation that users can pick if they wish to, I would almost certainly care a bit less.

Switching at build time? Or do you mean switching in a way that doesn't break ABI compatibility?

sbenza added a subscriber: sbenza.Dec 3 2018, 12:32 PM
sbenza added inline comments.
benchmarks/function.bench.cpp
222 ↗(On Diff #176436)

These types already exist above.

include/functional
1715

static_cast<void*>(__hold.get())

1737

_VSTD.
There are other uses of std:: still.

1747

T&& is a pessimization for fundamental types.
This should be something like:

std::conditional_t<
  !_VSTD::is_reference_v<_ArgTypes> &&
  sizeof(_ArgTypes) < sizeof(void*) &&
  _VSTD::is_trivially_copy_constructible<_ArgTypes> &&
  _VSTD::is_trivially_destructible<_ArgTypes>,
  _ArgTypes, _ArgTypes&&>

Basically, small trivially copyable types should be passed by value to avoid the indirection. This potentially saves stack space (if the input was not an lvalue already) and indirections in the the callee.

1858

If _LIBCPP_NO_RTTI, we could optimize this by having a single global policy for all small objects.

2269

Do we want to unconditionally copy __buf_ even when __clone is called?

2281

I don't think we should check.
Setting the source to empty unconditionally should be faster than checking.
(same on the other move constructor below)

2290

Do we need to do this here?
Can't we do it on an else from the __not_null?
That way it won't even exist unless _Fp is a pointer.

2325

It is not clear if __construct_empty will overwrite __buf_ just by reading this.
I would read __buf_.__large into a local variable before calling __construct_empty()

EricWF added inline comments.Dec 3 2018, 12:39 PM
include/functional
1737

I actually have no idea why _VSTD:: exists, or why we would ever need it. One day I hope to kill it.

I'm guess we should do w/e the surrounding code does.

test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/copy_move.pass.cpp
162 ↗(On Diff #176436)

Just remove these asserts. I'm not sure how useful they are.

I guess it means we're allowing ourselves to break existing code, but that existing code depended on UB.
I think maybe the asserts were an attempt to establish that the SSO was actually being hit.

Eric and I just spoke. My point of view:

We should investigate factoring out the functions that differ in implementation into separate classes with a common API. This way, we could have something like this:

template<class _Rp, class ..._ArgTypes>
class function<_Rp(_ArgTypes...)> {
#ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION
    typedef __function::__abi_v1_impl<_Rp(_ArgTypes...)> __implementation;
#else
    typedef __function::__abi_optimized_impl<_Rp(_ArgTypes...)> __implementation;
#endif

    __implementation __impl_;

    ...
};

Then, in the copy constructor, something like:

template<class _Rp, class ..._ArgTypes>
function<_Rp(_ArgTypes...)>::function(const function& __f) {
    __impl_.copy_construct(__f.__impl_);
}

And something analogous for other operations. This way, the implementation of each storage/dispatching mechanism would be collocated with itself and we could conceivably add new ones or introduce a new one in a future ABI version. There's also a possibility that we could reuse some of this logic (the SBO logic) in other components like std::any. We could also test those two implementations individually in some libc++-specific tests instead of leaving the new implementation go largely untested. I also think it would make the code cleaner than using #ifdefs all over the place, but that's subjective.

I'd like this avenue to be explored -- if it turns out not to be a fruitful approach, then I can live with the current approach.

jsoyke updated this revision to Diff 176546.Dec 3 2018, 9:33 PM
jsoyke marked 17 inline comments as done.

Refactoring implementation of std::function into a few different classes to minimize the number of ifdefs.

jsoyke added a comment.Dec 3 2018, 9:38 PM

Eric and I just spoke. My point of view:

We should investigate factoring out the functions that differ in implementation into separate classes with a common API. This way, we could have something like this:

Your suggestion is pretty close to what I came up with. Let me know what you think, I prefer this approach myself.

benchmarks/function.bench.cpp
222 ↗(On Diff #176436)

I wanted something slightly different here, I want the compiler to actually generate different call operators, hence the GeneratedFunctor below.

include/functional
1747

I agree, this worked well for some internal optimizations we did, @EricWF objected at first, but maybe we should discuss it more.

1858

Is that worth it? I wouldn't think disabling RTTI is common.

2269

It's not worth the extra branch instruction based on benchmarks.

2281

This actually does perform better (at least on my workstation). Presumably because writes are more expensive than a conditional.

2290

Last time I looked it got compiled out anyway since the members are trivial, unread, and unconditionally set below.

2325

This should be clearer in the new version of the code which sets these fields explicitly.

jsoyke updated this revision to Diff 176547.Dec 3 2018, 9:44 PM

Adding some comments to new classes.

jsoyke marked 2 inline comments as done.Dec 3 2018, 9:48 PM
jsoyke added inline comments.
include/functional
1554

No strong feelings on any of the class names I've used here. I was thinking vfunc = vtable based and pfunc = policy based, but I'd be fine with something more descriptive as well.

1566

Let me know if I'm using this macro correctly. It seems like it should be present on all non-public symbols?

jsoyke marked an inline comment as done.Dec 3 2018, 9:53 PM
jsoyke added inline comments.
include/functional
2313

This causes a regression in the non-optimized version, presumably because swap() has a lot of conditionals and potential out-of-line calls, I'll fix this if the new code structure looks good otherwise.

ldionne requested changes to this revision.Dec 4 2018, 7:32 AM

I think we need to use std::launder when accessing the function object through the small buffer. This is a problem we seem to have in the current implementation too.

include/__config
98

Instead, I think this should be described as an "Unstable attempt to provide a more optimized std::function" or something along those lines. My understanding is that we're not bound to your current implementation for the new std::function -- we should figure out whatever is fastest and use that.

include/functional
1554

I'm trying to ponder whether changing this name would break the ABI. If someone explicitly instantiates a std::function in a shared object with default visibility, I think this is an ABI break because the dylib (if not recompiled) will export the vtable for __base, but the application (when recompiled against new headers) will try to find a definition for the vtable of __vfunc.

Now, I'm willing to assume that nobody did something like that. But even then, is there a possibility for an ABI break? @EricWF can you think of something?

1563

_LIBCPP_INLINE_VISIBILITY here and on allocator() too. Also, those should be called __target() and __allocator() since they are internal, no?

1566

Roughly speaking, it needs to appear on everything that we don't want to export from the dylib. It's a bit more complicated, but that's the general idea.

1691

Typo: underlying

1701

Mangle to __type_info or similar. I know it's not necessary because users can't #define type_info, but I find it better to be consistent and mangle everything that's not exposed to users.

2244–2246

static_cast<bool>? Just a matter of style.

2311–2312

We didn't check for that before, did we? IOW the implementation worked transparently even in the case of a self move-assignment. Would it be possible to retain that property (and save this check in the current implementation)?

ldionne added inline comments.Dec 4 2018, 7:32 AM
include/functional
1556

We already have one of those in __functional_03, right? I understand both files can't be included at the same time, but I would consider changing the name to avoid confusion.

1698

I think this can be made const.

1792

This would be more readable: __invoker(__use_small_storage<_Fun>() ? __call_small<_Fun> : __call_large<_Fun>).

2098

nullptr?

2100

reinterpret_cast

This revision now requires changes to proceed.Dec 4 2018, 7:32 AM
sbenza added inline comments.Dec 4 2018, 7:59 AM
benchmarks/function.bench.cpp
222 ↗(On Diff #176436)

Add a comment of some kind or rename the classes.
Otherwise it can be confusing for a maintainer.

include/functional
1563

I think catching these unreserved names should be done with tooling.
It is much easier for a ClangTidy tool to detect all incorrect declarations of unreserved names.

1698

type_info too

1792

This would instantiate both functions when only one is needed.

1858

No idea. Projects that disable RTTI might like the extra saving?

2049

We don't need a branch here.
Also, unconditionally making the moved-from empty can improve optimizations.

2077

Is this necessary?

2097

Shouldn't this be guarded for _LIBCPP_NO_RTTI?
You can't use typeid

2311–2312

Without this check the code is equally correct, as in the moved-from value has a valid-but-unspecified state. The lhs happens to also be the moved-from value, but that is fine.

2313

If we are going to use the swap trick, then we don't need to do =nullptr in the line before.

ldionne added inline comments.Dec 4 2018, 11:54 AM
include/functional
1563

Technically, allocator and target are "reserved" because they are used elsewhere in the library. If a user were to #define them, they'd be in trouble already.

For that reason, a Clang Tidy check would have to know the interface of everything in the standard library, and enforce that anything not part of such an interface in namespace std be mangled. A tool like this can definitely be written, but it may be tedious and difficult to keep up to date. I could be missing another (simpler) solution to implement this, though.

1792

I know, but what's the concern here? Compile-times or code bloat? If it's compile-time, we are instantiating a function just to do the dispatching. If it's code bloat, I'd be _very_ surprised if both definitions actually made it into the final executable, since one of them is obviously not used.

This is not a blocking comment, but I'm not sure we're optimizing for the right thing here. Readability _is_ important, even in a standard library.

2311–2312

Rephrasing to make sure I understand: If this == &__f, then after the assignment __f is in a valid-but-unspecified state (in this case it is empty). Since this == &__f, *this is also in the same valid-but-unspecified state.

I don't think this makes sense for self-assignment, because the lhs is usually understood to contain the value of the rhs after a move-assignment. IOW, the following should hold:

T x = ...;
T copy = x;
T y = std::move(x);
assert(y == copy);

This is not the case here.

sbenza added inline comments.Dec 4 2018, 12:22 PM
include/functional
1563

Yeah, I was thinking of a tool that has a whitelist of identifier names and anything else should be a reserved identifier.

1792

tbh, I think we can make the function be like:

template <typename _Fun>
static _Rp __call_impl(const __storage* __buf, _ArgTypes&&... __args)
{
  _Fun* __f = reinterpret_cast<_Fun*>(
      __use_small_storage<_Fun>::value ? __buf->__large : &__buf->__small);
  return (*__f)(_VSTD::forward<_ArgTypes>(__args)...);
}

Then there is a single function that handles both.

2311–2312

The example above does not have any self-assignment (or any assignment).

My argument is that y = std::move(y); leaves y in an unspecified state. It is not required to be a noop. This is how it behaves today and it is up to spec.

jsoyke updated this revision to Diff 176760.Dec 4 2018, 8:11 PM
jsoyke marked 33 inline comments as done.

Addressing recent review comments.

jsoyke added a comment.Dec 4 2018, 8:12 PM

Some tests are failing at this new version. Some seem because the tests depend on under-specified behavior, some might be legitimate so I'll dig into those more tomorrow.

include/functional
1554

Renamed back to the original class name in case this is an issue.

Would be fine since the ctor (which initializes the vtable pointer) is inline? Seems like everything will get folded into the std::function ctor, which should produce a layout compatible object that only references static symbols.

I'd slightly prefer changing the names to be more descriptive now that there are many classes that do the same job, but your concern makes sense so I'll wait till we have a clear answer about this.

1556

I reverted names of existing classes due to ABI concerns mentioned elsewhere.

1563

I wonder if parsing the synopsis comments would be reasonable? Anyway, it sounds like there is nothing to change other than visibility here since the name is already "reserved"?

1566

Think I got them all...

1792

Thanks, that's way cleaner.

2100

FWIW, this ends up casting 'const void*' to 'void (*)()' in some cases, which I think is undefined, but the existing implementation relies on this so I didn't worry about fixing it here.

2311–2312

Self move assignment left the function empty before, preserving that behavior now.

ldionne added inline comments.Dec 5 2018, 6:50 AM
include/functional
2311–2312

Yeah, my example is obviously broken. What I meant is just that one would expect the lhs to contain the value of the rhs, not for that value to be nuked.

I understand the current implementation is valid in terms of the specification. I'm saying the specification does not satisfy the principle of least surprise. Leave it as-is.

test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_F.pass.cpp
68 ↗(On Diff #176760)

Should these checks be made to depend on #ifdef _LIBCPP_ABI_OPTIMIZED_FUNCTION?

test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_rfunction.pass.cpp
92 ↗(On Diff #176760)

Those asserts should be removed too.

jsoyke updated this revision to Diff 177106.Dec 6 2018, 8:27 PM
jsoyke marked 4 inline comments as done.

Addressing some review comments.
Reformatting for consistency.
More explicit class names.
Fixing an expected performance regression.

jsoyke marked 2 inline comments as done.Dec 6 2018, 8:33 PM
jsoyke added inline comments.
include/functional
1747

I added this optimization here, at least partially.

There are tests that make sure that std::function<void(SomePredeclaredThing)> works, so unfortunately I cannot use sizeof(T) here.

2313

There is a test for "re-entrant" destruction, in other words the functor's destructor sets the std::function's value to something else during destruction. Not nulling this out before would cause that test to fail.

test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_F.pass.cpp
68 ↗(On Diff #176760)

No strong opinions from me, but I have a slight preference for removing it since it's basically testing an implementation detail.

jsoyke marked an inline comment as done.Dec 6 2018, 8:58 PM
jsoyke added inline comments.
include/functional
1779

This optimization wasn't in the last version, sbenza@ mentioned we might want to do it. It helps with the Invoke benchmarks, as much as 10-20%.

It's probably worth mentioning that this can hurt performance in cases where the caller is using std::forward, or passing small value types by reference, but these are very uncommon (at least in my experience).

sbenza added inline comments.Dec 7 2018, 10:10 AM
include/functional
1678

Yesterday richardsmith was telling us that we should not need or want to use aligned_storage for pretty much anything.
That we should char arrays directly, like

alignas(8) char __small[2 * sizeof(void*)];

btw, I don't think we want the default alignment from aligned_storage here. That can be 16, which is an overkill and will propagate to any other class that contains a std::function. We know that some allocators will be slower when requested alignment is larger than 8.

I don't mind if some functors end up in the heap because of our tighter alignment requirements. It will still be a win for all the other functors that have smaller alignment.

1779

Can you show how this can hurt?
Do you have a benchmark that gets better when this is removed?

This will only trigger if the signature of the function itself takes by value.
std::function::operator() will have it by value already, and the user callback will take it by value.
If we pass by ref we will be forced to put it on the stack on a different address even if the caller passed an lvalue because operator()'s arguments forced a copy.

We will need to read that data and pass it in by value anyway. We are only moving that lvalue-to-rvalue decay before the vtable instead of in the vtable.

https://gcc.godbolt.org/z/R7JBq2

jsoyke marked 3 inline comments as done.Dec 7 2018, 11:41 AM
jsoyke added inline comments.
include/functional
1779

Never-mind, my reasoning was flawed. I was basing my experience on a piece of software that did something like fast_forward but internally dispatched to std::function. It had to move register objects to the stack and made the fast_forward code perform worse, but we're fixing that problem here.

jsoyke updated this revision to Diff 177267.Dec 7 2018, 11:42 AM

Avoid aligned_storage.

ldionne accepted this revision.Dec 7 2018, 12:09 PM

I think I'm fine with this, but I'd still like to better understand the positive performance impact of the change. When we spoke @EricWF said he could provide an interpretation.

Thanks for the work!

include/functional
2169–2172

Please remove. :-)

This revision is now accepted and ready to land.Dec 7 2018, 12:09 PM
jsoyke marked an inline comment as done.Dec 7 2018, 12:12 PM
jsoyke updated this revision to Diff 177274.Dec 7 2018, 12:13 PM

Remove hardcoded config option.

Thanks for the review! I don't have commit access so I think someone else will have to commit this for me?

jsoyke updated this revision to Diff 177283.Dec 7 2018, 12:36 PM

Sync to head.

jsoyke updated this revision to Diff 177556.Dec 10 2018, 10:24 AM

Sync to head.

jsoyke updated this revision to Diff 177557.Dec 10 2018, 10:26 AM

Dropping comment that doesn't applit now that __policy_storage doesn't use alligned_storage.

EricWF added inline comments.Dec 10 2018, 10:29 AM
include/functional
1703

Nit: I don't think the std:: qualifier is needed.

1717

I wouldn't mind naming these magic constants like:

static const _LIBCPP_CONSTEXPR __policy __policy_ = {
  /* __clone */ nullptr,  /* __destroy */ nullptr, /* __is_null */ true,
};
1718

Small nit. I hate conditional compilation blocks, and would like as few as possible. I would write this as:

template <class _Tp>
_LIBCPP_INLINE_VISIBILITY
static constexpr type_info *__get_typeid() {
#ifndef _LIBCPP_NO_RTTI
  return &typeid(_Tp);
#else
  return nullptr;
#endif

and then just use that function everywhere else.

1876

Nit: The _VSTD:: is unnecessarily when referencing types.

test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/copy_move.pass.cpp
162 ↗(On Diff #176436)

Actually I take this back. Hyrums law leads me to believe if we regress this behavior in ABI v1, someone will notice.

I think maybe we should still have these asserts for the old ABI (guarded by #ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION).

Sorry.

jsoyke updated this revision to Diff 177558.Dec 10 2018, 10:31 AM

Drop duplicate definition of __value_func.

LGTM after addressing the inline comments. We can further iterate after landing this if needed.

include/functional
2311

std::addressof just in case?

2324

s/_LIBCPP_EXPLICIT/explicit

jsoyke updated this revision to Diff 177568.Dec 10 2018, 11:15 AM

Addressing review comments.

EricWF accepted this revision.Dec 10 2018, 11:16 AM
jsoyke marked 2 inline comments as done.Dec 10 2018, 11:16 AM
shafik added a subscriber: shafik.Dec 10 2018, 12:42 PM

Just a heads up this change broke the lldb std::function formatter see the logs here:

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/13996/

I have not look at all the details yet so not clear how much work is involved in patching on out side yet.

I can take a stab at fixing that if you can point me to the code that does that formatting? I plan on making another change here so it would be good if I could fix the problem in advance.

The gist of the change is that the members you would depend on are now nested inside the __f_ member, but those members may change depending on how libc++ is configured.

The fix was not too bad, I landed the fix and the bots just turned green http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/14015/

If you run the lldb test suite we have test for the formatter and for stepping into std::function so those _should_ catch any breaking changes.

See changes here

https://reviews.llvm.org/rL348810
EricWF closed this revision.Dec 10 2018, 4:18 PM

Committed in r348812. Thanks for all your hard work @jsoyke!