Diff Detail
- Repository
- rCXX libc++
- Build Status
Buildable 25900 Build 25899: arc lint + arc unit
Event Timeline
I'd like to see benchmarking results that show the benefit of this approach.
Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).
The benchmarks are already present. We should add the results to this review to make it easier to view, but it's also possible to verify yourself.
Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).
Well we have to continue supporting the old ABI, and I don't want yet another std::function to maintain; maintaining 4 separate versions is already a pain (the C++11 one, and three C++03 versions).
I much prefer having the two sets of code interleaved, so when std::function::foo needs a change or a bugfix, both sets of code are next to each other.
Addressing some comments, using explicit bools for 'trivial' and 'is_small' again (these perform better for some reason?).
Current benchmark output with clang-7:
And a paste of it for easy viewing: https://pastebin.com/raw/5P5zqtuy
One of the most important parts of this change in system-level benchmarks is eliminating the 3 levels of pointer / vtable indirection (__f_ -> vtable -> operator()) and 1 conditional in std::function::operator() (which doesn't really show up in the micro-benchmark, presumably because the cache stays hot or the compiler can optimize it very easily).
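To make the indirection difference concrete, here is a minimal, illustrative sketch (not the patch's actual code; `callable_base` and `policy_fn` are invented names) contrasting the classic vtable-based dispatch with a policy-style dispatch that stores the invoker function pointer inline:

```cpp
#include <cassert>

// Classic scheme: __f_ points at a heap object whose vtable holds the call
// operator. Invoking goes __f_ -> vtable -> operator(): dependent loads
// before the indirect call.
struct callable_base {
    virtual int call(int) const = 0;
    virtual ~callable_base() {}
};

template <class F>
struct callable : callable_base {
    F f;
    explicit callable(F fn) : f(fn) {}
    int call(int x) const override { return f(x); }
};

// Policy-style scheme: the invoker is a plain function pointer stored
// directly in the wrapper, so the call is a single indirect jump with no
// vtable hop.
struct policy_fn {
    void* target;
    int (*invoke)(void*, int);

    template <class F>
    static int invoke_impl(void* t, int x) {
        return (*static_cast<F*>(t))(x);
    }

    template <class F>
    explicit policy_fn(F* f) : target(f), invoke(&invoke_impl<F>) {}

    int operator()(int x) const { return invoke(target, x); }
};
```

Both wrappers call the same functor; the difference is purely in how many loads sit between `operator()` and the target.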
There are some alternative versions of this change that are worth considering, for example we could only eliminate the conditional in operator() and only eliminate one level of pointer indirection, which would allow for larger inline captures (up to sizeof(void*)*3, which happens to be large enough to accommodate most containers captured by value).
Another possibility is always storing non-trivial data out-of-line, which makes move operations really fast at the expense of an extra allocation for small non-trivial objects (in some of our codebases moving std::functions tends to be done in critical regions, like popping off of a thread-safe work queue, so the extra allocation may be worth it for some workloads).
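The "trivial" and "is_small" criteria behind these trade-offs can be sketched as a single trait. This is an illustrative reconstruction (the 3 * sizeof(void*) buffer is the alternative discussed above, not necessarily what the patch uses):

```cpp
#include <cstddef>
#include <type_traits>

// A functor qualifies for inline storage only if it fits the buffer, is not
// over-aligned, and is trivially copyable/destructible, so that moving the
// wrapper reduces to a plain copy of the buffer.
const std::size_t __buffer_size = 3 * sizeof(void*);

template <class _Fp>
struct __use_small_storage
    : std::integral_constant<
          bool, sizeof(_Fp) <= __buffer_size &&
                    alignof(_Fp) <= alignof(void*) &&
                    std::is_trivially_copy_constructible<_Fp>::value &&
                    std::is_trivially_destructible<_Fp>::value> {};
```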
I know, I wanted the author to do the work.
Also, it's not clear to me that trying to "merge" the two implementations is the right choice. Maybe we want a straight up different implementation (i.e. replace std::function as a whole instead of having many #ifdefs in the existing std::function).
Well we have to continue supporting the old ABI, and I don't want yet another std::function to maintain; maintaining 4 separate versions is already a pain (the C++11 one, and three C++03 versions).
I much prefer having the two sets of code interleaved, so when std::function::foo needs a change or a bugfix, both sets of code are next to each other.
The thing is that we're making a choice here: it's possible that we could have more implementations of std::function coming down the line, and in that case interleaving the code wouldn't be a good idea. Instead, having a way to switch between several implementations more easily (understanding we can only pick one for a given ABI) would be better. We could experiment with different implementations and then settle on the one for the next ABI version. This is why I would like for us to plan on a solution that is more easily maintainable in the face of N implementations, not just 2 implementations (disregarding the C++03 implementations). Defining this new std::function separately is one way of doing it, and defining some "hooks" that can be used to tweak storage and virtual dispatch (like https://github.com/ldionne/dyno) is another better but more complicated way.
Thanks! What is this output? Is this some "diff" view between the current and the proposed patch? The time column with floating points represents a change in percentage from the baseline to the candidate (e.g. +0.13 == +13%)? I've used Google Benchmark in the past but I've never seen output exactly like that.
There are some alternative versions of this change that are worth considering, for example we could only eliminate the conditional in operator() and only eliminate one level of pointer indirection, which would allow for larger inline captures (up to sizeof(void*)*3, which happens to be large enough to accommodate most containers captured by value).
Another possibility is always storing non-trivial data out-of-line, which makes move operations really fast at the expense of an extra allocation for small non-trivial objects (in some of our codebases moving std::functions tends to be done in critical regions, like popping off of a thread-safe work queue, so the extra allocation may be worth it for some workloads).
We could also experiment with different sizes of SBO and see what's best. I don't think it's a one-size-fits-all problem (unless 3*sizeof(void*) is magic for some reason?).
Thanks! What is this output? Is this some "diff" view between the current and the proposed patch? The time column with floating points represents a change in percentage from the baseline to the candidate (e.g. +0.13 == +13%)? I've used Google Benchmark in the past but I've never seen output exactly like that.
The output is the time delta, not a fraction. It's the output of libcxx/utils/google-benchmark/tools/compare.py. Some of the operations are sub-nanosecond, so it's handy for that. The times are so low here that we may want to time ~10 operations or something instead of just one, but the comparison seems roughly in line with what I've seen with our internal tools.
We could also experiment with different sizes of SBO and see what's best. I don't think it's a one-size-fits-all problem (unless 3*sizeof(void*) is magic for some reason?).
I could probably instrument the code to get a histogram of capture sizes we see in some real codebases, that could help inform any decisions here.
Added benchmark that creates many different types of functors to avoid branch prediction.
The output is the time delta, not a fraction. It's the output of libcxx/utils/google-benchmark/tools/compare.py. Some of the operations are sub-nanosecond, so it's handy for that. The times are so low here that we may want to time ~10 operations or something instead of just one, but the comparison seems roughly in line with what I've seen with our internal tools.
I think I may have been wrong about this... I haven't really used this tool much, it might be nice if it showed % or something.
I added a benchmark that instantiates many different types of functors to get a feel for how this would behave in a more realistic setting:
Benchmark Time CPU Time Old Time New CPU Old CPU New
-------------------------------------------------------------------------------------------------------------------
BM_MixedFunctorTypes -0.3121 -0.3122 19677 13535 19677 13534
Looking at that it seems like the -0.3121 must be a fraction... Not sure why I thought it was a delta.
I need to statically generate a bunch of different types in this new benchmark, which makes compilation really slow. Interested in suggestions on how to improve that.
FWIW, I'm slightly in favor of using different headers. For example, I just realized that this change currently leaves a bunch of baggage symbols laying around that the new code won't use; it would be easier to see that kind of thing if the internals of each function were just in its own file.
include/functional | ||
---|---|---|
1476 | Note to self: Remove this before trying to submit. I'm only keeping it here during development and benchmarking because it makes it easier to copy / paste the code into our internal toolchain. |
Bah, I just realized I still have a bunch of local variables that don't use reserved names. Fixing.
Switching to a version of the optimized function that doesn't include the small-object optimization for small non-trivial objects.
This seems to perform better in the new benchmark I added:
Benchmark Time CPU Time Old Time New CPU Old CPU New
BM_MixedFunctorTypes -0.3794 -0.3794 19661 12202 19660 12202
Here's another version that might be worth discussing. It's a much simpler implementation because it doesn't attempt small value optimization for non-trivial objects, which makes operations like move-construction very fast.
Here is a paste of the benchmark comparison: https://pastebin.com/raw/pLrevUAc
I've had both implementations laying around for a while, but some analysis that I did today suggests that the current version (without the non-trivial SOO) might be better for some important workloads.
I'm strongly against it. The harder it is to see all the implementations of a given function f, the harder it is to know they exist and keep them in sync. In my experience spreading these things out leads to real bugs.
I would like that too, but that's not a blocking issue on this review.
I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does change std::function, it won't require another version of it.
Maybe you were thinking about things like std::unique_function?
Note that having the C++03 functions be in a separate file with no relation or interleaving with the C++11 implementation has caused plenty of bugs in the past where they diverge.
include/functional | ||
---|---|---|
1677 | This implementation must work in all dialects (likely including C++03 w/ Clang extensions :-S ). So it has to support allocators. At minimum it needs to support calling the allocator constructors. We may be able to get away with ignoring the allocator though. I'll have to ask my other maintainers. Let me know if you want some help adding allocator support. | |
1798 | _LIBCPP_CONSTEXPR |
include/functional | ||
---|---|---|
1677 | Done, but I think I should add some test coverage before declaring triumph. I wouldn't mind ignoring the arguments though if that's preferable for some reason, a few code paths already do (copying from an existing std::function for example). |
I don't see the issue with breaking an ABI that's designated as unstable, as long as there's some form of announcement to avoid any confusion. Fundamentally, that's the entire point of having unstable ABI versions: some changes are just not possible to make without ABI breakage. Most consumers use the stable ABI, while the unstable ABI is not backwards compatible; if it had to be, a whole range of changes to libc++ and/or Clang would be impossible.
If you can test both implementations against the same set of tests, it shouldn't be hard to catch those bugs. Actually, that would be better than having only one version of the code tested (because let's face it, most people are not going to run the tests with both configurations, they're only going to use the default one).
I would like that too, but that's not a blocking issue on this review.
It is a blocking issue if we're going to interleave the code with #ifdefs, as it makes the transition to a solution where we can maintain N implementations more difficult.
I also don't agree that we'll see more std::function implementations in the future. We won't. We're not going to standardize additional features to std::function unless they're not ABI breaking. That is, if the standard does change std::function, it won't require another version of it.
Maybe you were thinking about things like std::unique_function?
No, what I mean is that libc++ might want to provide different implementations of std::function, for example with different SBO sizes. The fact that @jsoyke seems to have many different implementations of std::function laying around is a validation of my hypothesis that we may want to have more than one std::function implementation.
I don't see a problem breaking the unstable ABI either. But we don't normally standardize ABI breaking changes. I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.
@jsoyke, I spoke to the other maintainers, and it's cool if we entirely ignore the allocators. So feel free to use that to squeeze out w/e benefits you can.
Otherwise, this is looking pretty good to me. Thanks for all the hard work.
include/functional | ||
---|---|---|
1834 | Extra ;. |
@jsoyke, I spoke to the other maintainers, and it's cool if we entirely ignore the allocators. So feel free to use that to squeeze out w/e benefits you can.
It didn't cost anything in terms of performance, it just cost some code.
I fully agree we don't care about breaking an unstable ABI (since vendors won't ship it if they care about ABI stability). I think everyone is on the same page here.
I think this means we should only ever have at most 2 versions of std::function. One for the stable ABI and potentially an improved version for the unstable ABI.
This is where we disagree -- I don't think there's a universally better implementation of std::function. One implementation might be better in some circumstances and another implementation might be better in other circumstances, hence my claim that we should not go for a solution that assumes at most two implementations.
On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?
Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.
Our job is to provide a universal implementation of std::function, and hence pick one implementation which weights the different alternatives and chooses what we believe to be best in the majority of cases.
On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?
Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.
Just above you suggested we get ready to add more than two implementations, but here you're setting a very high bar for adding one more?
I read this as a seeming contradiction.
I agree, and that's why we need to select the best possible default implementation. I don't think that precludes giving users more choices. The current patch makes it harder to do that.
Similarly, if we were to stabilize ABI v2 with this new std::function, and if we eventually wanted to use yet another implementation for ABI v3, we'd now have three #if branches in each member function of std::function. I think it would be better to setup something better right now.
On a different note, I still don't understand why this std::function is better from a benchmark perspective. Some interpretation of those pastes would be appreciated, and we still don't seem to be sure what the output is (% or delta?). @EricWF should know, since he wrote GoogleBenchmark?
Sorry if I seem to be setting the bar a bit high, but I think we need this before we basically double the maintenance burden of std::function.
Just above you suggested we get ready to add more than two implementations, but here you're setting a very high bar for adding one more?
I read this as a seeming contradiction.
In the current state of the patch, we're adding exactly one new implementation of std::function, and we're doing it in a way that it's "the one new better std::function". In that case, yes, I do want to understand what this new (better) std::function brings to the table. If we had a way of switching between multiple implementations more easily and this was just another implementation that users can pick if they wish to, I would almost certainly care a bit less.
If we had a way of switching between multiple implementations more easily and this was just another implementation that users can pick if they wish to, I would almost certainly care a bit less.
Switching at build time? Or do you mean switching in a way that doesn't break ABI compatibility?
benchmarks/function.bench.cpp | ||
---|---|---|
222 ↗ | (On Diff #176436) | These types already exist above. |
include/functional | ||
1715 | static_cast<void*>(__hold.get()) | |
1737 | _VSTD. | |
1747 | T&& is a pessimization for fundamental types. std::conditional_t< !_VSTD::is_reference_v<_ArgTypes> && sizeof(_ArgTypes) < sizeof(void*) && _VSTD::is_trivially_copy_constructible<_ArgTypes> && _VSTD::is_trivially_destructible<_ArgTypes>, _ArgTypes, _ArgTypes&&> Basically, small trivially copyable types should be passed by value to avoid the indirection. This potentially saves stack space (if the input was not an lvalue already) and indirections in the callee. | |
1858 | If _LIBCPP_NO_RTTI, we could optimize this by having a single global policy for all small objects. | |
2269 | Do we want to unconditionally copy __buf_ even when __clone is called? | |
2281 | I don't think we should check. | |
2290 | Do we need to do this here? | |
2325 | It is not clear if __construct_empty will overwrite __buf_ just by reading this. |
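The pass-small-trivial-types-by-value suggestion from the inline comment above (1747) can be expanded into a self-contained alias template. This is an illustrative sketch; the name `__fast_forward_t` is invented, and (as noted later in the review) `sizeof` here prevents use with predeclared argument types:

```cpp
#include <type_traits>

// A parameter of type _Ap is forwarded as plain _Ap when it is a small,
// trivially copyable, non-reference type (pass by value, no indirection),
// and as _Ap&& otherwise.
template <class _Ap>
using __fast_forward_t = typename std::conditional<
    !std::is_reference<_Ap>::value &&
        sizeof(_Ap) <= sizeof(void*) &&
        std::is_trivially_copy_constructible<_Ap>::value &&
        std::is_trivially_destructible<_Ap>::value,
    _Ap,          // small and trivial: pass by value
    _Ap&&>::type; // everything else: forward as an rvalue reference
```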
include/functional | ||
---|---|---|
1737 | I actually have no idea why _VSTD:: exists, or why we would ever need it. One day I hope to kill it. I guess we should do w/e the surrounding code does. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/copy_move.pass.cpp | ||
162 ↗ | (On Diff #176436) | Just remove these asserts. I'm not sure how useful they are. I guess it means we're allowing ourselves to break existing code, but that existing code depended on UB. |
Eric and I just spoke. My point of view:
We should investigate factoring out the functions that differ in implementation into separate classes with a common API. This way, we could have something like this:
template<class _Rp, class ..._ArgTypes>
class function<_Rp(_ArgTypes...)>
{
#ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION
    typedef __function::__abi_v1_impl<_Rp(_ArgTypes...)> __implementation;
#else
    typedef __function::__abi_optimized_impl<_Rp(_ArgTypes...)> __implementation;
#endif
    __implementation __impl_;
    ...
};
Then, in the copy constructor, something like:
template<class _Rp, class ..._ArgTypes>
function<_Rp(_ArgTypes...)>::function(const function& __f)
{
    __impl_.copy_construct(__f.__impl_);
}
And something analogous for other operations. This way, the implementation of each storage/dispatching mechanism would be collocated with itself and we could conceivably add new ones or introduce a new one in a future ABI version. There's also a possibility that we could reuse some of this logic (the SBO logic) in other components like std::any. We could also test those two implementations individually in some libc++-specific tests instead of letting the new implementation go largely untested. I also think it would make the code cleaner than using #ifdefs all over the place, but that's subjective.
I'd like this avenue to be explored -- if it turns out not to be a fruitful approach, then I can live with the current approach.
Refactoring implementation of std::function into a few different classes to minimize the number of ifdefs.
Your suggestion is pretty close to what I came up with. Let me know what you think, I prefer this approach myself.
benchmarks/function.bench.cpp | ||
---|---|---|
222 ↗ | (On Diff #176436) | I wanted something slightly different here, I want the compiler to actually generate different call operators, hence the GeneratedFunctor below. |
include/functional | ||
1747 | I agree, this worked well for some internal optimizations we did, @EricWF objected at first, but maybe we should discuss it more. | |
1858 | Is that worth it? I wouldn't think disabling RTTI is common. | |
2269 | It's not worth the extra branch instruction based on benchmarks. | |
2281 | This actually does perform better (at least on my workstation). Presumably because writes are more expensive than a conditional. | |
2290 | Last time I looked it got compiled out anyway since the members are trivial, unread, and unconditionally set below. | |
2325 | This should be clearer in the new version of the code which sets these fields explicitly. |
include/functional | ||
---|---|---|
1554 | No strong feelings on any of the class names I've used here. I was thinking vfunc = vtable based and pfunc = policy based, but I'd be fine with something more descriptive as well. | |
1566 | Let me know if I'm using this macro correctly. It seems like it should be present on all non-public symbols? |
include/functional | ||
---|---|---|
2313 | This causes a regression in the non-optimized version, presumably because swap() has a lot of conditionals and potential out-of-line calls, I'll fix this if the new code structure looks good otherwise. |
I think we need to use std::launder when accessing the function object through the small buffer. This is a problem we seem to have in the current implementation too.
include/__config | ||
---|---|---|
98 | Instead, I think this should be described as an "Unstable attempt to provide a more optimized std::function" or something along those lines. My understanding is that we're not bound to your current implementation for the new std::function -- we should figure out whatever is fastest and use that. | |
include/functional | ||
1554 | I'm trying to ponder whether changing this name would break the ABI. If someone explicitly instantiates a std::function in a shared object with default visibility, I think this is an ABI break because the dylib (if not recompiled) will export the vtable for __base, but the application (when recompiled against new headers) will try to find a definition for the vtable of __vfunc. Now, I'm willing to assume that nobody did something like that. But even then, is there a possibility for an ABI break? @EricWF can you think of something? | |
1563 | _LIBCPP_INLINE_VISIBILITY here and on allocator() too. Also, those should be called __target() and __allocator() since they are internal, no? | |
1566 | Roughly speaking, it needs to appear on everything that we don't want to export from the dylib. It's a bit more complicated, but that's the general idea. | |
1691 | Typo: underlying | |
1701 | Mangle to __type_info or similar. I know it's not necessary because users can't #define type_info, but I find it better to be consistent and mangle everything that's not exposed to users. | |
2244–2246 | static_cast<bool>? Just a matter of style. | |
2311–2312 | We didn't check for that before, did we? IOW the implementation worked transparently even in the case of a self move-assignment. Would it be possible to retain that property (and save this check in the current implementation)? |
include/functional | ||
---|---|---|
1556 | We already have one of those in __functional_03, right? I understand both files can't be included at the same time, but I would consider changing the name to avoid confusion. | |
1698 | I think this can be made const. | |
1792 | This would be more readable: __invoker(__use_small_storage<_Fun>() ? __call_small<_Fun> : __call_large<_Fun>). | |
2098 | nullptr? | |
2100 | reinterpret_cast |
benchmarks/function.bench.cpp | ||
---|---|---|
222 ↗ | (On Diff #176436) | Add a comment of some kind or rename the classes. |
include/functional | ||
1563 | I think catching these unreserved names should be done with tooling. | |
1698 | type_info too | |
1792 | This would instantiate both functions when only one is needed. | |
1858 | No idea. Projects that disable RTTI might like the extra saving? | |
2049 | We don't need a branch here. | |
2077 | Is this necessary? | |
2097 | Shouldn't this be guarded for _LIBCPP_NO_RTTI? | |
2311–2312 | Without this check the code is equally correct, as in the moved-from value has a valid-but-unspecified state. The lhs happens to also be the moved-from value, but that is fine. | |
2313 | If we are going to use the swap trick, then we don't need to do =nullptr in the line before. |
include/functional | ||
---|---|---|
1563 | Technically, allocator and target are "reserved" because they are used elsewhere in the library. If a user were to #define them, they'd be in trouble already. For that reason, a Clang Tidy check would have to know the interface of everything in the standard library, and enforce that anything not part of such an interface in namespace std be mangled. A tool like this can definitely be written, but it may be tedious and difficult to keep up to date. I could be missing another (simpler) solution to implement this, though. | |
1792 | I know, but what's the concern here? Compile-times or code bloat? If it's compile-time, we are instantiating a function just to do the dispatching. If it's code bloat, I'd be _very_ surprised if both definitions actually made it into the final executable, since one of them is obviously not used. This is not a blocking comment, but I'm not sure we're optimizing for the right thing here. Readability _is_ important, even in a standard library. | |
2311–2312 | Rephrasing to make sure I understand: If this == &__f, then after the assignment __f is in a valid-but-unspecified state (in this case it is empty). Since this == &__f, *this is also in the same valid-but-unspecified state. I don't think this makes sense for self-assignment, because the lhs is usually understood to contain the value of the rhs after a move-assignment. IOW, the following should hold: T x = ...; T copy = x; T y = std::move(x); assert(y == copy); This is not the case here. |
include/functional | ||
---|---|---|
1563 | Yeah, I was thinking of a tool that has a whitelist of identifier names and anything else should be a reserved identifier. | |
1792 | tbh, I think we can make the function be like: template <typename _Fun> static _Rp __call_impl(const __storage* __buf, _ArgTypes&&... __args) { _Fun* __f = reinterpret_cast<_Fun*>( __use_small_storage<_Fun>::value ? &__buf->__small : __buf->__large); return (*__f)(_VSTD::forward<_ArgTypes>(__args)...); } Then there is a single function that handles both. | |
2311–2312 | The example above does not have any self-assignment (or any assignment). My argument is that y = std::move(y); leaves y in an unspecified state. It is not required to be a noop. This is how it behaves today and it is up to spec. |
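The single-trampoline idea from the 1792 comment can be written out as a runnable sketch. The union layout, buffer size, and helper names below are hypothetical, modeled on the suggestion rather than taken from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// One __call_impl serves both storage modes by selecting the object pointer
// before the cast: the inline buffer's address for small functors, the
// stored heap pointer for large ones.
const std::size_t __buf_size = 3 * sizeof(void*);

union __storage {
    void* __large;                            // functor allocated on the heap
    alignas(void*) char __small[__buf_size];  // functor stored inline
};

template <class _Fun>
bool __fits_small() {
    return sizeof(_Fun) <= __buf_size && alignof(_Fun) <= alignof(void*);
}

template <class _Fun>
int __call_impl(__storage* __buf, int __arg) {
    void* __p = __fits_small<_Fun>() ? static_cast<void*>(&__buf->__small)
                                     : __buf->__large;
    return (*static_cast<_Fun*>(__p))(__arg);
}
```

A single template instantiation per functor type handles both paths, which addresses the code-bloat concern raised about instantiating separate `__call_small` and `__call_large` functions.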
Some tests are failing at this new version. Some seem to be because the tests depend on under-specified behavior; some might be legitimate, so I'll dig into those more tomorrow.
include/functional | ||
---|---|---|
1554 | Renamed back to the original class name in case this is an issue. Would be fine since the ctor (which initializes the vtable pointer) is inline? Seems like everything will get folded into the std::function ctor, which should produce a layout compatible object that only references static symbols. I'd slightly prefer changing the names to be more descriptive now that there are many classes that do the same job, but your concern makes sense so I'll wait till we have a clear answer about this. | |
1556 | I reverted names of existing classes due to ABI concerns mentioned elsewhere. | |
1563 | I wonder if parsing the synopsis comments would be reasonable? Anyway, it sounds like there is nothing to change other than visibility here since the name is already "reserved"? | |
1566 | Think I got them all... | |
1792 | Thanks, that's way cleaner. | |
2100 | FWIW, this ends up casting 'const void*' to 'void (*)()' in some cases, which I think is undefined, but the existing implementation relies on this so I didn't worry about fixing it here. | |
2311–2312 | Self move assignment left the function empty before, preserving that behavior now. |
include/functional | ||
---|---|---|
2311–2312 | Yeah, my example is obviously broken. What I meant is just that one would expect the lhs to contain the value of the rhs, not for that value to be nuked. I understand the current implementation is valid in terms of the specification. I'm saying the specification does not satisfy the principle of least surprise. Leave it as-is. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_F.pass.cpp | ||
68 ↗ | (On Diff #176760) | Should these checks be made to depend on #ifdef _LIBCPP_ABI_OPTIMIZED_FUNCTION? |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_rfunction.pass.cpp | ||
92 ↗ | (On Diff #176760) | Those asserts should be removed too. |
Addressing some review comments.
Reformatting for consistency.
More explicit class names.
Fixing an expected performance regression.
include/functional | ||
---|---|---|
1747 | I added this optimization here, at least partially. There are tests that make sure that std::function<void(SomePredeclaredThing)> works, so unfortunately I cannot use sizeof(T) here. | |
2313 | There is a test for "re-entrant" destruction, in other words the functor's destructor sets the std::function's value to something else during destruction. Not nulling this out before would cause that test to fail. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/alloc_F.pass.cpp | ||
68 ↗ | (On Diff #176760) | No strong opinions from me, but I have a slight preference for removing it since it's basically testing an implementation detail. |
include/functional | ||
---|---|---|
1779 | This optimization wasn't in the last version, sbenza@ mentioned we might want to do it. It helps with the Invoke benchmarks, as much as 10-20%. It's probably worth mentioning that this can hurt performance in cases where the caller is using std::forward, or passing small value types by reference, but these are very uncommon (at least in my experience). |
include/functional | ||
---|---|---|
1678 | Yesterday richardsmith was telling us that we should not need or want to use aligned_storage for pretty much anything. alignas(8) char __small[2 * sizeof(void*)]; btw, I don't think we want the default alignment from aligned_storage here. That can be 16, which is overkill and will propagate to any other class that contains a std::function. We know that some allocators will be slower when requested alignment is larger than 8. I don't mind if some functors end up in the heap because of our tighter alignment requirements. It will still be a win for all the other functors that have smaller alignment. | |
1779 | Can you show how this can hurt? This will only trigger if the signature of the function itself takes by value. We will need to read that data and pass it in by value anyway. We are only moving that lvalue-to-rvalue decay before the vtable instead of in the vtable. |
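The alignment concern from the 1678 comment can be checked directly. A small sketch (names invented) comparing the default aligned_storage alignment against the explicit alignas(8) array it suggests:

```cpp
#include <type_traits>

// std::aligned_storage with a defaulted alignment for a two-pointer buffer
// may demand the strictest fundamental alignment (16 on common 64-bit ABIs),
// and that requirement propagates to every class embedding the std::function.
// The explicit alignas(8) array caps the requirement at 8.
typedef std::aligned_storage<2 * sizeof(void*)>::type __default_buf;

struct __explicit_buf {
    alignas(8) char __data[2 * sizeof(void*)];
};
```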
include/functional | ||
---|---|---|
1779 | Never-mind, my reasoning was flawed. I was basing my experience on a piece of software that did something like fast_forward but internally dispatched to std::function. It had to move register objects to the stack and made the fast_forward code perform worse, but we're fixing that problem here. |
Thanks for the review! I don't have commit access so I think someone else will have to commit this for me?
Dropping comment that doesn't apply now that __policy_storage doesn't use aligned_storage.
include/functional | ||
---|---|---|
1703 | Nit: I don't think the std:: qualifier is needed. | |
1717 | I wouldn't mind naming these magic constants like: static const _LIBCPP_CONSTEXPR __policy __policy_ = { /* __clone */ nullptr, /* __destroy */ nullptr, /* __is_null */ true, }; | |
1718 | Small nit. I hate conditional compilation blocks, and would like as few as possible. I would write this as: template <class _Tp> _LIBCPP_INLINE_VISIBILITY static constexpr const type_info* __get_typeid() { #ifndef _LIBCPP_NO_RTTI return &typeid(_Tp); #else return nullptr; #endif } and then just use that function everywhere else. | |
1876 | Nit: The _VSTD:: is unnecessary when referencing types. | |
test/std/utilities/function.objects/func.wrap/func.wrap.func/func.wrap.func.con/copy_move.pass.cpp | ||
162 ↗ | (On Diff #176436) | Actually I take this back. Hyrum's law leads me to believe that if we regress this behavior in ABI v1, someone will notice. I think maybe we should still have these asserts for the old ABI (guarded by #ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION). Sorry. |
Just a heads up, this change broke the lldb std::function formatter; see the logs here:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/13996/
I have not looked at all the details yet, so it's not clear how much work is involved in patching on our side.
I can take a stab at fixing that if you can point me to the code that does that formatting? I plan on making another change here so it would be good if I could fix the problem in advance.
The gist of the change is that the members you would depend on are now nested inside the __f_ member, but those members may change depending on how libc++ is configured.
The fix was not too bad, I landed the fix and the bots just turned green http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/14015/
If you run the lldb test suite we have test for the formatter and for stepping into std::function so those _should_ catch any breaking changes.