Page MenuHomePhabricator

Add a version of std::function that includes a few optimizations.

Authored by jsoyke on Nov 28 2018, 9:07 PM.


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ldionne added inline comments.Dec 4 2018, 7:32 AM

We already have one of those in __functional_03, right? I understand both files can't be included at the same time, but I would consider changing the name to avoid confusion.


I think this can be made const.


This would be more readable: __invoker(__use_small_storage<_Fun>() ? __call_small<_Fun> : __call_large<_Fun>).





This revision now requires changes to proceed.Dec 4 2018, 7:32 AM
sbenza added inline comments.Dec 4 2018, 7:59 AM
222 ↗(On Diff #176436)

Add a comment of some kind or rename the classes.
Otherwise it can be confusing for a maintainer.


I think catching these unreserved names should be done with tooling.
It is much easier for a ClangTidy tool to detect all incorrect declarations of unreserved names.


type_info too


This would instantiate both functions when only one is needed.


No idea. Projects that disable RTTI might like the extra saving?


We don't need a branch here.
Also, unconditionally making the moved-from empty can improve optimizations.


Is this necessary?


Shouldn't this be guarded for _LIBCPP_NO_RTTI?
You can't use typeid


Without this check the code is equally correct, as in the moved-from value has a valid-but-unspecified state. The lhs happens to also be the moved-from value, but that is fine.


If we are going to use the swap trick, then we don't need to do =nullptr in the line before.

ldionne added inline comments.Dec 4 2018, 11:54 AM

Technically, allocator and target are "reserved" because they are used elsewhere in the library. If a user were to #define them, they'd be in trouble already.

For that reason, a Clang Tidy check would have to know the interface of everything in the standard library, and enforce that anything not part of such an interface in namespace std be mangled. A tool like this can definitely be written, but it may be tedious and difficult to keep up to date. I could be missing another (simpler) solution to implement this, though.


I know, but what's the concern here? Compile-times or code bloat? If it's compile-time, we are instantiating a function just to do the dispatching. If it's code bloat, I'd be _very_ surprised if both definitions actually made it into the final executable, since one of them is obviously not used.

This is not a blocking comment, but I'm not sure we're optimizing for the right thing here. Readability _is_ important, even in a standard library.


Rephrasing to make sure I understand: If this == &__f, then after the assignment __f is in a valid-but-unspecified state (in this case it is empty). Since this == &__f, *this is also in the same valid-but-unspecified state.

I don't think this makes sense for self-assignment, because the lhs is usually understood to contain the value of the rhs after a move-assignment. IOW, the following should hold:

T x = ...;
T copy = x;
T y = std::move(x);
assert(y == copy);

This is not the case here.

sbenza added inline comments.Dec 4 2018, 12:22 PM

Yeah, I was thinking of a tool that has a whitelist of identifier names and anything else should be a reserved identifier.


tbh, I think we can make the function be like:

template <typename _Fun>
static _Rp __call_impl(const __storage* __buf, _ArgTypes&&... __args)
  _Fun* __f = reinterpret_cast<_Fun*>(
      __use_small_storage<_Fun>::value ? __buf->__large : &__buf->__small);
  return (*__f)(_VSTD::forward<_ArgTypes>(__args)...);

Then there is a single function that handles both.


The example above does not have any self-assignment (or any assignment).

My argument is that y = std::move(y); leaves y in an unspecified state. It is not required to be a noop. This is how it behaves today and it is up to spec.

jsoyke updated this revision to Diff 176760.Dec 4 2018, 8:11 PM
jsoyke marked 33 inline comments as done.

Addressing recent review comments.

jsoyke added a comment.Dec 4 2018, 8:12 PM

Some tests are failing at this new version. Some seem because the tests depend on under-specified behavior, some might be legitimate so I'll dig into those more tomorrow.


Renamed back to the original class name in case this is an issue.

Would be fine since the ctor (which initializes the vtable pointer) is inline? Seems like everything will get folded into the std::function ctor, which should produce a layout compatible object that only references static symbols.

I'd slightly prefer changing the names to be more descriptive now that there are many classes that do the same job, but your concern makes sense so I'll wait till we have a clear answer about this.


I reverted names of existing classes due to ABI concerns mentioned elsewhere.


I wonder if parsing the synopsis comments would be reasonable? Anyway, it sounds like there is nothing to change other than visibility here since the name is already "reserved"?


Think I got them all...


Thanks, that's way cleaner.


FWIW, this ends up casting 'const void*' to 'void (*)()' in some cases, which I think is undefined, but the existing implementation relies on this so I didn't worry about fixing it here.


Self move assignment left the function empty before, preserving that behavior now.

ldionne added inline comments.Dec 5 2018, 6:50 AM

Yeah, my example is obviously broken. What I meant is just that one would expect the lhs to contain the value of the rhs, not for that value to be nuked.

I understand the current implementation is valid in terms of the specification. I'm saying the specification does not satisfy the principle of least surprise. Leave it as-is.

68 ↗(On Diff #176760)

Should these checks be made to depend on #ifdef _LIBCPP_ABI_OPTIMIZED_FUNCTION?

92 ↗(On Diff #176760)

Those asserts should be removed too.

jsoyke updated this revision to Diff 177106.Dec 6 2018, 8:27 PM
jsoyke marked 4 inline comments as done.

Addressing some review comments.
Reformatting for consistency.
More explicit class names.
Fixing an expected performance regression.

jsoyke marked 2 inline comments as done.Dec 6 2018, 8:33 PM
jsoyke added inline comments.

I added this optimization here, at least partially.

There are tests that make sure that std::function<void(SomePredeclaredThing)> works, so unfortunately I cannot use sizeof(T) here.


There is a test for "re-entrant" destruction, in other words the functor's destructor sets the std::function's value to something else during destruction. Not nulling this out before would cause that test to fail.

68 ↗(On Diff #176760)

No strong opinions from me, but I have a slight preference for removing it since it's basically testing an implementation detail.

jsoyke marked an inline comment as done.Dec 6 2018, 8:58 PM
jsoyke added inline comments.

This optimization wasn't in the last version, sbenza@ mentioned we might want to do it. It helps with the Invoke benchmarks, as much as 10-20%.

It's probably worth mentioning that this can hurt performance in cases where the caller is using std::forward, or passing small value types by reference, but these are very uncommon (at least in my experience).

sbenza added inline comments.Dec 7 2018, 10:10 AM

Yesterday richardsmith was telling us that we should not need or want to use aligned_storage for pretty much anything.
That we should char arrays directly, like

alignas(8) char __small[2 * sizeof(void*)];

btw, I don't think we want the default alignment from aligned_storage here. That can be 16, which is an overkill and will propagate to any other class that contains a std::function. We know that some allocators will be slower when requested alignment is larger than 8.

I don't mind if some functors end up in the heap because of our tighter alignment requirements. It will still be a win for all the other functors that have smaller alignment.


Can you show how this can hurt?
Do you have a benchmark that gets better when this is removed?

This will only trigger if the signature of the function itself takes by value.
std::function::operator() will have it by value already, and the user callback will take it by value.
If we pass by ref we will be forced to put it on the stack on a different address even if the caller passed an lvalue because operator()'s arguments forced a copy.

We will need to read that data and pass it in by value anyway. We are only moving that lvalue-to-rvalue decay before the vtable instead of in the vtable.

jsoyke marked 3 inline comments as done.Dec 7 2018, 11:41 AM
jsoyke added inline comments.

Never-mind, my reasoning was flawed. I was basing my experience on a piece of software that did something like fast_forward but internally dispatched to std::function. It had to move register objects to the stack and made the fast_forward code perform worse, but we're fixing that problem here.

jsoyke updated this revision to Diff 177267.Dec 7 2018, 11:42 AM

Avoid aligned_storage.

ldionne accepted this revision.Dec 7 2018, 12:09 PM

I think I'm fine with this, but I'd still like to better understand the positive performance impact of the change. When we spoke @EricWF said he could provide an interpretation.

Thanks for the work!


Please remove. :-)

This revision is now accepted and ready to land.Dec 7 2018, 12:09 PM
jsoyke marked an inline comment as done.Dec 7 2018, 12:12 PM
jsoyke updated this revision to Diff 177274.Dec 7 2018, 12:13 PM

Remove hardcoded config option.

Thanks for the review! I don't have commit access so I think someone else will have to commit this for me?

jsoyke updated this revision to Diff 177283.Dec 7 2018, 12:36 PM

Sync to head.

jsoyke updated this revision to Diff 177556.Dec 10 2018, 10:24 AM

Sync to head.

jsoyke updated this revision to Diff 177557.Dec 10 2018, 10:26 AM

Dropping comment that doesn't applit now that __policy_storage doesn't use alligned_storage.

EricWF added inline comments.Dec 10 2018, 10:29 AM

Nit: I don't think the std:: qualifier is needed.


I wouldn't mind naming these magic constants like:

static const _LIBCPP_CONSTEXPR __policy __policy_ = {
  /* __clone */ nullptr,  /* __destroy */ nullptr, /* __is_null */ true,

Small nit. I hate conditional compilation blocks, and would like as few as possible. I would write this as:

template <class _Tp>
static constexpr type_info *__get_typeid() {
  return &typeid(_Tp);
  return nullptr;

and then just use that function everywhere else.


Nit: The _VSTD:: is unnecessarily when referencing types.

162 ↗(On Diff #176436)

Actually I take this back. Hyrums law leads me to believe if we regress this behavior in ABI v1, someone will notice.

I think maybe we should still have these asserts for the old ABI (guarded by #ifndef _LIBCPP_ABI_OPTIMIZED_FUNCTION).


jsoyke updated this revision to Diff 177558.Dec 10 2018, 10:31 AM

Drop duplicate definition of __value_func.

LGTM after addressing the inline comments. We can further iterate after landing this if needed.


std::addressof just in case?



jsoyke updated this revision to Diff 177568.Dec 10 2018, 11:15 AM

Addressing review comments.

EricWF accepted this revision.Dec 10 2018, 11:16 AM
jsoyke marked 2 inline comments as done.Dec 10 2018, 11:16 AM
shafik added a subscriber: shafik.Dec 10 2018, 12:42 PM

Just a heads up this change broke the lldb std::function formatter see the logs here:

I have not look at all the details yet so not clear how much work is involved in patching on out side yet.

I can take a stab at fixing that if you can point me to the code that does that formatting? I plan on making another change here so it would be good if I could fix the problem in advance.

The gist of the change is that the members you would depend on are now nested inside the __f_ member, but those members may change depending on how libc++ is configured.

The fix was not too bad, I landed the fix and the bots just turned green

If you run the lldb test suite we have test for the formatter and for stepping into std::function so those _should_ catch any breaking changes.

See changes here
EricWF closed this revision.Dec 10 2018, 4:18 PM

Committed in r348812. Thanks for all your hard work @jsoyke!