This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement P0288R9 (move_only_function)
Needs ReviewPublic

Authored by philnik on Jul 27 2022, 5:57 AM.


Group Reviewers
Restricted Project

Diff Detail

Event Timeline

philnik created this revision.Jul 27 2022, 5:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 5:57 AM
Herald added a subscriber: mgorny. · View Herald Transcript
philnik requested review of this revision.Jul 27 2022, 5:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 5:57 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik planned changes to this revision.Jul 28 2022, 11:36 AM

Thanks for looking into this :) std::move_only_function is very high on the list of my favorite new C++23 features!


Is it more efficient to only set __status_ = NotEngaged? Afaict, this would overwrite only 1 instead of 48 bytes

(here and a couple of other places which set __storage_ = {}


redundant return before else?


std::bit_cast<__ReturnT (*)(...)> instead of std::bit_cast<void (*)(...)>?

Also, it seems there are no test cases which instantiate move_only_function with a return type != void?


I think this should also take _LIBCPP_MOVE_ONLY_FUNCTION_NOEXCEPT into account when casting to the function pointer.

As written now, for move_only_function<void () noexcept>, this operator() will contain a call to a non-noexcept function. The compiler will hence generate additional exception handling code. Also, this blocks tail call optimization (


I think this can be implemented more efficiently by folding this switch with the indirect function call we are doing here anyway.

Proposed changes

Move this switch into __function_wrappers_impl::__call, taking _Store::_Status as a template parameter:

template<_Store::_Status status>
__call(_Store* store, _ArgTypes&&... __args) noexcept(_LIBCPP_MOVE_ONLY_FUNCTION_NOEXCEPT) {
    auto __doCall = [&](void* __functor) {
        return std::invoke(static_cast<_Functor _LIBCPP_MOVE_ONLY_FUNCTION_INV_QUALS>(*static_cast<_Functor*>(__functor)),
    using enum _Store::_Status;
    if constexpr (status == _NotEngaged) {
      _LIBCPP_ASSERT(false, "Tried to call move_only_function which doesn't hold a callable object");
    }  else if constexpr (status == _TriviallyDestructible) {
        return __doCall(&__storage_.__trivially_destructible_.__data_);
    }  else if constexpr (status == _TriviallyRelocatable) {
        return __doCall(&__storage_.__trivially_relocatable_.__data_);
    }  else if constexpr (status == _Heap) {
        return __doCall(&__storage_.__large_.__data_);

In __construct, the __storage.__large_.__call would then be assigned similar to

auto& __data   = __storage_.__large_;
__data.__call_ = std::bit_cast<void*>(&_FuncWraps::__call<_Store::_Status::_Heap>);

Because the __call_ member is always at the same position inside the __move_only_function_storage, we can then implement operator() as a simple

_LIBCPP_HIDE_FROM_ABI _ReturnT operator()(_ArgTypes... __args) _LIBCPP_MOVE_ONLY_FUNCTION_CV_REF
  return std::bit_cast<ReturnT (*)(_LIBCPP_MOVE_ONLY_FUNCTION_CV  _Store*, _ArgTypes...)>(__storage_.__large_.__call_)(
      &__storage, std::forward<_ArgTypes>(__args)...);

Why I think this is a good idea

Replacing the switch with a if constexpr chain guarantees that this switch is completely optimized away.

Futhermore for usages like

void setCallback(std::move_only_function<void()> cb);

void setCallbackWithGreetings(bool bye, const char* whom) {
     takesSomeCallback([=]() {
          std::cout << (bye ? "Goodbye " : "Hello ") << whom << "\n";

the compiler can inline the lambda body into the __function_wrappers_impl::__call. The compiler can then fold the offset computation for __storage_.__large_.__data_ with the offset computations for whom and bye. With the previous implementation of operator() this would not have been possible since the __storage_.__large_.__data_ happens before the indirect call, while the whom & bye offset computations happened inside the indirect call.

avogelsgesang added inline comments.Jul 29 2022, 9:00 PM

__functor should probably be the last argument.

This function forwards all its arguments except for __functor.
Removing an argument from the front of the argument list means that all parameters must be "shifted" between the registers/on the stack.
If instead functor is the last argument, no arguments need to be moved around.


This is also beneficial for operator(): operator() has to add the functor to the argument list. Appending an argument is trivial, but prepending is expensive...


Not sure if this idea is blessed by the standard or undefined behavior, but:

After changing operator() to append instead of prepend the pointer to the functor, this could be rewritten to std::bit_cast<void*>(static_cast<UnrefFunc>(_func));, i.e. we could directly store the function pointer inside __call_ instead of a pointer to the _FuncWraps::__call dispatch function.

The operator() would call this function pointer with one additional argument. But at on the x64 calling conventions, it is fine to call a function with additional parameters. Additional parameters are simply ignored...

avogelsgesang added inline comments.Aug 2 2022, 4:02 AM

nevermind - on 2nd thought: this code is only removing the first argument if __function is a function pointer. If it is, e.g., a lambda then the first argument, i.e. the this pointer, will be replaced but all other arguments stay in their original position.

Similarly, operator() does not actually prepend an argument, but just replaces the first argument, i.e. the this pointer


nevermind - see other comment why this makes no sense

lichray requested changes to this revision.Aug 10 2022, 12:17 PM
lichray added a subscriber: lichray.
lichray added inline comments.

We should aim for a macro-free implementation to benefit users.


In his CppCon 2017 talk, Louis @ldionne showed that embedding a vtable in the object storage is not an optimization compared to using a vptr even if we ignore the fact that embedded vtable uses more space. Also, mo-func of 6 x pointers in size may be too large. It's even larger than Swift's existential container (3-pointers + metadata + vptr).


Similarly, operator() does not actually prepend an argument, but just replaces the first argument, i.e. the this pointer


  1. _ArgTypes&& - should consider passing objects by copy rather than references as an optimization in certain cases.
  2. Please make sure that we don't have a design that works like the following: when move_only_function is initialized from a pointer to function, we need to dereference __functor first to reach the function pointer, and then make a call via that function pointer.

It seems that some logic is embedded in the mo-func body, rather than dispatched. __destroy_ is not sufficiently used.


Please make sure the type is ABI-compatible after adopting It is on the way to C++26.


We should test against behaviors, not guts. If the difference is not observable according to the standard, consider observing some extension such as a more strict noexcept.

philnik updated this revision to Diff 463895.Sep 29 2022, 7:46 AM
philnik added a subscriber: Quuxplusone.

New approach based on @Quuxplusone's feedback.

philnik marked 9 inline comments as done.Sep 29 2022, 8:00 AM
philnik added inline comments.

If you have an idea on how to do that I'd be happy to try it, but I don't think writing the implementation 12 times is worth removing the macros.


Do you have any reasoning how large it should be? It's simple to change, since I don't know what the size should be.


Can the indirection be avoided for function pointers?


I think the only things still relevant from this thread are P2511 and avoiding the indirection for function pointers. To make it simpler I added comments independent from this thread for these things.

ldionne requested changes to this revision.Sep 29 2022, 10:01 AM
ldionne added subscribers: jwakely, fhahn, rsmith.

In the interest of making progress on this, let's do the following concrete action items in addition to what I left in the comments:

  1. Extract a separate class to handle the SBO.
  2. Survey other implementations and also other open source projects to see what approach they've taken.

The tricky thing with this patch is that several design questions are intertwined: support for trivial-relocation, size of the SBO and how to store the "virtual functions" are all interdependent.


Since move_only_function is new, why do we need an ABI macro to control this?

Edit: @philnik tells me that's because GCC doesn't support the attribute. What a shame. Since we want to keep code ABI compatible between Clang and GCC (when compiled using libc++), I guess we do need to hide that.

@philnik Can you please file a bug against GCC to start supporting the trivial_abi attribute and paste a link here? I think this is a pretty serious missed optimization that would be so easy to get. CC @jwakely for awareness.


General comment 1:

I think we should extract the logic to implement the small buffer optimization into its own piece instead of implementing it inside move_only_function. This isn't the last time we'll have to do this kind of optimization, and in fact it would have been nice to have this when std::function was written too. I can help with that, I have a local copy of an implementation of a __sbo class I had written when I did my own stab at any_invocable (which is now move_only_function), a long time ago.


The only way I can see that working is to reinterpret_cast the function pointer itself to the type that __call_ expects, and assign it to __call_ directly. However, that would assume that the __function_wrappers<_Func>::__call thunk is truly a no-op, which I think is always the case.

I think this is technically UB though, because you have to cast the function pointer back to its real type before calling it (which is exactly the point of the thunk). This is something that's extremely hard to express within the language, however it's also something I would perhaps expect to be trivial to optimize for the compiler. After all, the compiler will basically see a thunk that does absolutely nothing except jump somewhere else. I would expect that the compiler can just hoist one level of indirection.

Note: As pointed out by @philnik in live review, indeed functions can't have the same address, so I don't think this is an optimizations that compilers can perform. That being said, I think this would be doable if we had a way to tell the compiler that we don't care about the address of that object. I'm just thinking out loud, but should be we allowed to apply [[no_unique_address]] to functions? @rsmith @fhahn Any thoughts?

ldionne added inline comments.Sep 29 2022, 10:01 AM

General comment 2:

Fundamentally, we need to decide whether we're going to implement the function call dispatching using a classic vtable, or generate a "vtable" ourselves (either inlined in the object like you do right now, or not). This has to be thought through, but my initial preference would be to go for a manual vtable with one layer of indirection, since IIRC that's what I had the best results with when I investigated this with Dyno. IIRC, the observation was that you don't really pay for the additional level of indirection if you're calling the function over and over again, since it's hot in the cache anyway. Instead, you're better to just reduce the size of the object. This is a really tough call to make though, because we're operating with a lot of assumptions and we have to make this work well for all possible workloads, which is simply impossible.

Probably relevant things to look at (sorry for the plug): and It's been a while but I had done a survey of various techniques in that area and I think the conclusions might still be relevant, even if that's a little bit dated.


General comment: Please make sure you have _LIBCPP_ASSERTs in place for UB in this class, and tests that these assertions work as intended.


IMO this is too large. I believe that types with sizeof(T) == 32 AND that are trivially move constructible/destructible AND that we want to type erase are going to be quite rare. I think that most types will be either smaller, or will be non-trivial. Shooting from the hip, I would use something like 2*sizeof(void*) at most, but we definitely need to look at what other implementations are doing, and also what other common utilities like this do (let's do a quick survey in open source libraries, e.g. have Folly or Abseil already shipped something like this, and if so, is there something we can learn from their experience?).


I think we should make it possible to store non-trivially-fooable types in the small buffer. IIUC you only need that constraint in order to implement swap, but you should be able to handle that by adding yet another function to the vtable. This would create a size issue right now, but wouldn't be a concern anymore if we move to a remote vtable.

This revision now requires changes to proceed.Sep 29 2022, 10:01 AM
  1. Survey other implementations and also other open source projects to see what approach they've taken.

GCC's implementation (which could change before we declare C++23 support non-experimental) has a SBO buffer that is big enough to store a pointer to member function and an object pointer, i.e. a simple delegate.

Please do file a GCC request for trivial_abi, I don't think that's on anybody's radar yet.

lichray added inline comments.Sep 29 2022, 1:50 PM

No macro and maintainable. Here is an example:
With C++ explicit object member function, you can also turn the six operator() into one.


According to STL the person, it is reasonable to "store a std::string locally." To libc++, that's 24 bytes; 32 bytes for the whole mv-func object with a vptr.

philnik updated this revision to Diff 467116.Oct 12 2022, 6:27 AM
philnik marked 4 inline comments as done.
  • Address some of the comments

I don't think that implementation is conforming. The standard explicitly states that there have to be partial specializations for each combination of cv, ref and noex ( I didn't ask before, but how exactly does it affect users whether the implementation is macro-free or not?

librarynamesizebuffer size
MSVC STLstd::move_only_function6456

What are you talking about?

ldionne requested changes to this revision.Oct 13 2022, 9:36 AM
ldionne added a subscriber: EricWF.
ldionne added inline comments.

I agree with @philnik that this isn't conforming, although I think it's only pedantically non-conforming. In practice, I would expect that it works for most users.

Most importantly, it seems to me like the spec is too narrow here. What the standard *really* wants to say is that move_only_function can handle any combination of cv noexcept ref, but I don't think it was ever an explicit goal to say that it would be achieved via specialization specifically.

@jwakely @lichray Do you agree this is overspecified? If so, I could write a NB comment about it and we could relax the specification. Otherwise, I would argue that the current approach taken by @philnik is probably the right one, even though I agree it's kind of lame to be required by the Standard to use the X macro idiom.


Let's call this in a way that it's move_only_function specific. Maybe _MoveOnlyFunctionVtable or whatever. Just std::_VTable is taking a pretty general name and assigning it specific meaning, which is weird.


If we do like llvm::unique_function (and I think we should), then we would have two sorts of vtables (pseudocode obviously):

struct _TrivialMoveOnlyFunctionVTable {
  auto __call_;

struct _NonTrivialMoveOnlyFunctionVTable : _TrivialMoveOnlyFunctionVTable {
  auto __destroy_;
  auto __move_;

We could use the pointer union trick to store whether the vtable is for a trivial type or not, which would replace your check for __destroy_ == nullptr in __reset(). IMO that would be slightly better.

We should implement a pointer union like type in libc++, that's something we should have had for a while. Give me a few days to get to it, and if I don't manage to I'll let you know and you can treat yourself.


How do you know that __func.__vtable_ is compatible with this function's vtable? And by that I mean precisely: how do you know that this function's vtable.__call_ (and __destroy_ and __move_) have exactly the same signature as __func.__vtable_'s? If they are not exactly the same, it's going to be UB to call them through pointers to different types.

From live review: you say this is definitely UB. First, please ensure to call out things like that explicitly. Second, I don't think we should perform this optimization unless we have some sort of blessing from the compiler (through a builtin or something like that).


Shooting from the hip, I would go for an implementation like llvm::unique_function, with 32 total size and 24 SBO. But it would be great to have more data about how these types are used. Unfortunately that's pretty difficult to get.

@EricWF Do you have any way to gather some data on your code base that would help us pick something here?


I think we should allow storing non-trivially movable types in the SBO. This would require another "virtual" function to perform the move, but I think that's worth the trouble. Concretely, the size of the vtable isn't a huge issue.


IMO it would be nice if __small_buffer had a constructor that took an object. It would then encapsulate way more logic. That does mean that __small_buffer would have to know how to allocate and deallocate, but that can be done via template parameters. I think that would simplify the implementation of move_only_function quite a bit.

It would also make sense (?) if __small_buffer could be swapped, IMO. That also requires additional knowledge that should be parameterizable.


Do we need launder?


You'll need to guard this for exceptions once you store non-trivial types in the buffer.


This allows dropping the dependency on std::array, which doesn't add much. The _BufferSize is always > 0.

1 ↗(On Diff #467116)

Can you call this assert.<whatever> for consistency?


Please keep this in each file, even if that means duplicating it. Seeing a variable named called in the tests coming from seamingly nowhere is really weird IMO. And the duplication isn't much.


This is weird too - this has a different behavior depending on whether the function returns void or not. I understand you were maybe trying to reduce code duplication, but this is kind of spaghetti in return, IMO. Instead, consider always setting a bool to express that the function was called. The only difference will then become whether you're taking an argument or not and what you're returning, which seems better:

struct NonTrivial {
  NonTrivial() = default;
  NonTrivial(MoveCounter, bool* set_on_call) {}
  NonTrivial(std::initializer_list<int>&, MoveCounter, bool* set_on_call) {}
  NonTrivial(NonTrivial&&) noexcept(false) {} // todo
  ~NonTrivial() {}

  void operator()() const noexcept { *called_ = true; }
  int operator()(int i) const noexcept { *called_ = true; return i; }

Also, I would suggest not using a global here. It's not constexpr friendly and it's also not the best style, even though I understand it results in more "compact" code.

This revision now requires changes to proceed.Oct 13 2022, 9:36 AM
ldionne added inline comments.Nov 11 2022, 7:11 PM
62 copyable_function is making progress through the committee. We should discuss how these two types are going to interoperate and whether that should inform the representation we choose for move_only_function. In particular, it would be a nice QOI property to have move_only_function(copyable_function&&) just steal the guts of the copyable_function and make no allocation.

philnik updated this revision to Diff 479775.Dec 2 2022, 4:14 PM
philnik marked 10 inline comments as done.

Address some more comments


We've got multiple threads discussing the this, so I'm closing some.


The question of using the SBO for non-trivially relocatable types is spanning multiple threads, so I'm closing this one.


This would also make move_only_function not trivially relocatable anymore.


I think the answer is yes, because we might never have destroyed a trivial object that was in __buffer_ before, but I'm not familiar enough with the technicalities of object lifetime to be confident in answering the question.

philnik updated this revision to Diff 479830.Dec 3 2022, 4:55 AM

Generate files

philnik updated this revision to Diff 479835.Dec 3 2022, 6:01 AM

Try to fix CI

ldionne requested changes to this revision.Dec 15 2022, 10:53 AM
ldionne added inline comments.

Is there a reason why you allow these macros to *not* be defined? I would simply document the macros that this file needs in a comment and also have a #ifndef #error #endif for each of them.


I think the question here is basically whether we will want to have non-trivially-movable types in the SBO of copyable_function. If we do, then we won't be able to steal the guts, otherwise we will (assuming we use a SBO size that is no larger than that of move-only-function).


Please add documentation explaining the design choices we made and why. Basically a summary of this review. You'll thank yourself in a few months/years -- it's much much easier to do it right now while it's fresh.


This doesn't look super useful anymore.


I would call those __trivial_vtable and __nontrivial_vtable instead, since they are vtables and not vpointers.


_LIBCPP_HIDE_FROM_ABI here and wherever else you need it -- please make a pass.


You could use an immediately-invoked lambda here to avoid defining a separate function, since it's used exactly once.


This one in a separate patch too, please!


We should implement _PointerLikeTraits for __pointer_int_pair so that we can chain __pointer_int_pairs.


Let's require std::is_signed_v unless you want to have some fun making sure it works with signed types.


Can you please make sure that you have tests with a fancy pointer class?


Lets explicitly do __pointer_int_pair(__pointer_int_pair const&) = default, and so on for the move constructor and the assignment operators. It's nice documentation to state that it's trivially whatever-able just like one would expect.


constexpr? That will make it constinit-able.


I think this assertion is incorrect. Let's say you have something like this:

// int value: 15 aka 1111 binary
// number of bits free in the pointer: 4
// number of bits in use: 2
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0000 // before you store anything into the pointer
xxxxxxxxxxxxxxxxxxxxxxxxxxxx11 1100 // after you store 15 into the pointer (it overflowed)
xxxxxxxxxxxxxxxxxxxxxxxxxxxx11 1111 // assuming you were using the 2 lower bits of the pointer to store something like int-value=3

Your assertion will pass but you still overflowed.




This could be constexpr?


Looking at this, I am wondering whether we actually want to give it a pointer-like interface at all. This is a pair, not a pointer. Maybe we just want __first() and __second()? That would simplify the API.

And we should also make it work with structured bindings by making it a tuple-like class.


Can you split this class off into its own patch?


That way we can decide from move_only_function the additional criteria for storing the type in the SBO.

Nevermind, this adds too much complexity in the move constructor for little benefit since the only user (for now) would have it be trivially relocatable anyways.


Please document the design of the class, in particular the fact that it only ever stores trivially movable objects in the buffer, which allows it to be trivially relocatable. Also mention that the users of the class are expected to manage the lifetime of the object inside the buffer.


So the tradeoff here is that either we accept non-trivially movable types in the SBO, or we only accept trivially movable types and then we can guarantee that the SBO is trivially relocatable.

I think this decision boils down to what's the most important use case for move_only_function. In live review, I said I thought it would be used mostly as a function parameter, however you rightly point out that we'll have function_ref for that. So for times where we don't want to actually store the move_only_function, we can just use function_ref and there's never an allocation.

So I think it would make sense to keep it trivially relocatable, as in the current patch.


I think we need to launder for this case:

struct Foo { };
struct Bar { };

Foo* foo = buf.__get<Foo>();

Bar* bar = buf.__get<Bar>();

// somewhere else
void use(__small_buffer& buf) {

Maybe there should be more tests? This is a pretty fundamental utility.

Also please test chaining, etc.


return 0 everywhere please!


Maybe this should take an enum that you can bit-or to say which combinations you want?

This revision now requires changes to proceed.Dec 15 2022, 10:53 AM
philnik updated this revision to Diff 483946.Dec 19 2022, 6:26 AM
philnik marked 25 inline comments as done.

Address some more comments


There are a two reasons:

  1. It's a lot harder to define _LIBCPP_MOVE_ONLY_FUNCTION_INVOKE_QUALS
  2. Having defaults for them allows editors to parse the file properly, instead of generating errors all over the place.

Doesn't apply anymore.


I guess so, but I don't think it makes a lot of sense to mark constexpr since nothing else is constexpr, or is there some benefit?

ldionne added inline comments.Dec 19 2022, 12:11 PM

I would add

// This is also a nice place to document how we're storing the vtable, i.e. explain that we're either pointing to a TrivialVTable or a non-trivial one based on the bool.
using _VTableStorage = __pointer_int_pair<const _TrivialVTable*, bool, bitfield_width(1)>;

Can you use a more descriptive name here?