This is an archive of the discontinued LLVM Phabricator instance.

[libc++] LWG2070: Use Allocator construction for objects created with allocate_shared
ClosedPublic

Authored by ldionne on Nov 10 2020, 12:25 PM.

Details

Summary

This patch updates allocate_shared to call allocator_traits::construct when creating the object held inside the shared_pointer, and allocator_traits::destroy when destroying it. This resolves the part of P0674R1 that was originally filed as LWG2070.

This change is landed separately from the rest of P0674R1 because it is incredibly tricky from an ABI perspective.

This is the reason why this change is so tricky, as explained in the code:

Prior to C++20 (or if the compiler doesn't support [[no_unique_address]]),
we use an implementation that performs EBO for both the allocator and the
object type stored in the `shared_ptr`. This is the original implementation
that has been used in libc++, and we must be ABI compatible with it.

Starting in C++20, P0674 requires us to use Allocator construction for
initializing the object type. That requirement rules out the use of the
EBO for the object type, since using the EBO implies that the base will be
initialized when the control block is initialized (and hence we can't do it
through Allocator construction). Hence, supporting P0674 requires changing
how we store the object type inside the control block, which we do while
being ABI compatible by using [[no_unique_address]].

To achieve ABI compatibility, this patch implements an ABI-compatible type equivalent to __compressed_pair, but where the data member is stored in a union, and hence available to be initialized via Allocator construction. The tricky part with this emulation is that it needs to be 100% ABI compatible with __compressed_pair, which means also emulating its drawbacks, such as the inability to elide storage for final classes.

Fixes https://llvm.org/PR41900
Supersedes https://llvm.org/D62760

Diff Detail

Event Timeline

ldionne created this revision.Nov 10 2020, 12:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2020, 12:25 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne requested review of this revision.Nov 10 2020, 12:25 PM
zoecarver added inline comments.Nov 10 2020, 1:05 PM
libcxx/include/memory
3369

I don't think this works. We need to call allocator_traits<_Tp>::construct which requires us to have a pointer of type _Tp* which is why D62760 has to use __shared_ptr_pointer instead of __shared_ptr_emplace.

allocator_traits<_ControlBlockAllocator>::construct isn't sufficient, see the relevant part of the standard: https://eel.is/c++draft/util.smartptr.shared#create-7.5

This is my bad; I should have made this more clear or changed the order of the patches. One of the primary reasons I'm pushing on D62760 is because it adds __shared_ptr_aligned_block and the preliminary logic to use it. This is necessary to support array types in make_shared and allocate_shared, and I thought it would be easier if there was a separate patch to implement __shared_ptr_aligned_block.

That being said, make_shared and allocate_shared will have different overloads for array types, so if we can get away with using __shared_ptr_emplace in the non-array overloads, we should 100% try to do that to simplify the implementation (as is done in this patch).

libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/allocate_shared_construct.pass.cpp
29

In either patch, it would probably be a good idea to move these into MyAllocator so we can make sure that the correct specialization is setting them.

ldionne updated this revision to Diff 311732.Dec 14 2020, 3:41 PM

Update with a new approach entirely.

ldionne retitled this revision from [libc++] Call allocator_traits construct() and destroy() in allocate_shared to [libc++] LWG2070: Use Allocator construction for objects created with allocate_shared.Dec 14 2020, 3:43 PM
ldionne edited the summary of this revision. (Show Details)
ldionne updated this revision to Diff 311734.Dec 14 2020, 3:44 PM
ldionne edited the summary of this revision. (Show Details)

Reupload as patch application failed

Another approach, which Zoe thought about just now and told me on Slack, would be to implement the storage like this instead (untested, I just wrote this up):

struct _Storage {
    _LIBCPP_HIDE_FROM_ABI explicit _Storage(_Alloc&& __a) {
        ::new ((void*)__get_alloc()) _Alloc(_VSTD::move(__a));
    }
    _LIBCPP_HIDE_FROM_ABI ~_Storage() {
        __get_alloc()->~_Alloc();
    }
    _Alloc* __get_alloc() _NOEXCEPT { /* compute offset */ }
    _Tp* __get_elem() _NOEXCEPT { /* compute offset */ }
    char alignas(__compressed_pair<_Alloc, _Tp>) __blob_[sizeof(__compressed_pair<_Alloc, _Tp>)];
};

I think we'd be able to compute the offsets with a bit of help from the __compressed_pair, and then I'm pretty sure it would be valid to placement-new into the blob as we like. I'll explore this approach too to see what it gives - the benefit would be not needing [[no_unique_address]] and potentially simpler code.

Let me start by saying that I haven't spent a lot of time looking at it but I really like this implementation.


Another approach, which Zoe thought about just now and told me on Slack, would be to implement the storage like this instead (untested, I just wrote this up):

struct _Storage {
    _LIBCPP_HIDE_FROM_ABI explicit _Storage(_Alloc&& __a) {
        ::new ((void*)__get_alloc()) _Alloc(_VSTD::move(__a));
    }
    _LIBCPP_HIDE_FROM_ABI ~_Storage() {
        __get_alloc()->~_Alloc();
    }
    _Alloc* __get_alloc() _NOEXCEPT { /* compute offset */ }
    _Tp* __get_elem() _NOEXCEPT { /* compute offset */ }
    char alignas(__compressed_pair<_Alloc, _Tp>) __blob_[sizeof(__compressed_pair<_Alloc, _Tp>)];
};

I think we'd be able to compute the offsets with a bit of help from the __compressed_pair, and then I'm pretty sure it would be valid to placement-new into the blob as we like. I'll explore this approach too to see what it gives - the benefit would be not needing [[no_unique_address]] and potentially simpler code.

If the approach I'm talking about is simpler in this patch, it might not be simpler in absolute terms. If we're going to implement _Storage anyway as a more general thing, I think this implementation makes a lot of sense.

I think the code snippet I quoted is kind of missing my point. Using aligned storage, or a union, or anything similar is a bit irrelevant to what I was trying to say. I was highlighting a good way to do the "compute offset" part. The fundamental problem that this patch is trying to solve is that we need a pointer to the second element of __data_ in __shared_ptr_emplace, and we need that pointer before __shared_ptr_emplace is initialized. One way to do that (which you implement in this patch) is to make the second member a union and take its address. Another solution (what I was proposing) is we could simply static cast a pointer to __data_ to the correct __compressed_pair_elem type. Because __compressed_pair inherits from both of its elements, this should Just Work™. The latter solution would require no changes to the layout of __shared_ptr_emplace (and depending on how you look at it, that might be positive).

libcxx/include/memory
2611

This reminds me that we should support/test move-only allocator types.

2634

Do we really need __tmp allocators everywhere?

2639

IMHO we should make _Storage a more general utility.

Also can you please static assert that the alignment and size is the same as an equivilant __compressed_pair.

2643

This class is a work of art :)

ldionne updated this revision to Diff 312624.Dec 17 2020, 2:50 PM

Add static_assert

libcxx/include/memory
2634

We need this one because we destroy the original one when we call ~Storage(), and then we need to deallocate this using the allocator. So I really don't see a way to get around this.

2639

IMHO we should make _Storage a more general utility.

I think I agree, but I'd like to do that in a separate patch. Making it generic will add complexity, like the ability to skip initialization of either member, etc. I'd like to keep this change small because of the trickiness.

Also can you please static assert that the alignment and size is the same as an equivilant __compressed_pair.

Done.

2643

I'm not sure whether I want to say thank you or run away and hide :-). I guess the problem we're trying to solve is pretty terrible, but the solution isn't too bad.

ldionne added inline comments.Dec 17 2020, 2:52 PM
libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/allocate_shared_construct.pass.cpp
29

I don't think we can, because the allocator is going to be rebound to allocate an unknown object (ie the shared pointer control block).

zoecarver added inline comments.Dec 17 2020, 2:54 PM
libcxx/include/memory
2648

The above comment got me thinking... why can't this be a default destructor?

ldionne added inline comments.Dec 17 2020, 3:05 PM
libcxx/include/memory
2648

Frankly, I was getting an error about the compiler being unable to generate a destructor because the type in the union is non trivial, and this stopped the error.

I imagine the compiler is saying: you have a union with a non-trivial destructor so I want you to write your cleanup explicitly if you need some. But I could be wrong.

This LGTM. Not only to fix the LWG issue, but this is a great cleanup to this part of the codebase. So, thanks Louis.

libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/libcxx.control_block_layout.pass.cpp
39

Maybe this should be a libcxx test only? I see "libcxx" in the name, but maybe it should be moved to the "libcxx" test directory.

138

This test makes me feel really confident that there won't be any ABI issues. Thanks!

libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/util.smartptr.shared.create/allocate_shared_construct.pass.cpp
29

Fair enough. Makes sense.

zoecarver accepted this revision.Dec 17 2020, 3:20 PM
ldionne updated this revision to Diff 314994.Jan 6 2021, 2:43 PM

Use alternative implementation that's much simpler than [[no_unique_address]].

ldionne added inline comments.Jan 6 2021, 2:46 PM
libcxx/test/std/utilities/memory/util.smartptr/util.smartptr.shared/libcxx.control_block_layout.pass.cpp
39

It is a libc++ test only, since it has // REQUIRES: libc++. I've actually been thinking about not having separate directories for libcxx specific tests and other tests, since we have lit markup that can handle it.

I believe one of the reasons was to ease the life of MSVC and libstdc++ using our test suite, however they now run using lit and so they do have the features available.

zoecarver added inline comments.Jan 6 2021, 4:52 PM
libcxx/include/memory
2635

Alternatively, you could make __blob_ a std::aligned_storage.

2639

(This is a new comment, I'm not replying to an old comment, even though that's what it looks like.)

Isn't this UB? __blob_ is uninitialized when you call .first() on it in __get_alloc().

ldionne added inline comments.Jan 7 2021, 2:35 PM
libcxx/include/memory
2635

I used to love aligned_storage, and then I read http://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1413r2.pdf. I'm not sure I agree entirely with the paper, but I've been moving away from the type since it seemed like there were efforts to deprecate it. I'm not sure where those efforts ended up, though.

2639

Yes, it is indeed UB (hopefully benign UB). I ran it through ubsan before updating the patch and it didn't complain, so I thought to myself "it's good enough". But you're entirely right, it *is* plain and simple UB.

The issue is that I haven't been able to find a way to implement it without that. Simply, what we want is this:

_Alloc* __get_alloc() _NOEXCEPT {
    std::ptrdiff_t __offset = _CompressedPair::__first_offset(); // how to compute this?
    _Alloc* __alloc = static_cast<_Alloc*>(&__blob_ + __offset);
    return __alloc;
}

I wasn't able to find an implementation that didn't actually require an object to be alive:

_LIBCPP_INLINE_VISIBILITY
static std::ptrdiff_t __first_offset() _NOEXCEPT {
    __compressed_pair __p; // also requires default-constructing, which doesn't work
    _T1 const* __first = _VSTD::addressof(__p.first());
    return static_cast<char const*>(__first) - static_cast<char const*>(&__p);
}

I tried something with offsetof inside __compressed_pair_elem, but I couldn't compute the offset of the second element of the compressed pair. I don't know whether it's impossible or whether I'm just not imaginative enough. If we can't think of something, I can go back to the no_unique_address implementation (it's just a bit more complicated, but it's correct AFAICT).

zoecarver added inline comments.Jan 7 2021, 2:45 PM
libcxx/include/memory
2639

I actually really liked the no_unique_address implementation. And I think at this point we only support (or should only support) compiler that have implemented no_unique_address all the way back to C++03.

That being said, there is a way to do what you're thinking. We discussed this earlier, but I don't think I did a good job explaining. Basically, here's the idea: __compressed_pair *always* inherits from two __compressed_pair_elems. Always. We can simply static_cast a __compressed_pair pointer to both of it's base classes. That will give you two pointers which you can use to do placement new on both members.

(I think that might still technically be UB, because you never actually construct the object, but it seems like better UB then calling a member on an uninitialized object, which seems fairly dangerous IMHO.)

zoecarver added inline comments.Jan 7 2021, 2:52 PM
libcxx/include/memory
2639

Here's a pseudo-code example:

_Alloc* __get_alloc() _NOEXCEPT {
    _CompressedPair *__as_pair = reinterpret_cast<_CompressedPair*>(__blob_);
    _CompressedPair::_Base1* __first = static_cast<_CompressedPair::_Base1*>(__as_pair);
    _Alloc *__alloc = reinterpret_cast<_Alloc*>(__first);
    return __alloc;
}
ldionne updated this revision to Diff 315421.Jan 8 2021, 9:08 AM

Remove UB!

libcxx/include/memory
2639

Hmm, that's clever, and actually very simple. Do you agree to go forward with that implementation, then?

Working with you is very humbling :-).

zoecarver accepted this revision.Jan 8 2021, 9:18 AM

This looks great. Thanks for spending so much time iterating on this patch!

libcxx/include/memory
2635

The reason I liked using it in libc++ is that it works in C++03 mode (and alignas / alignof didn't use to). But now that we don't need to use it, I'm on board with moving away.

Thanks for the paper link :)

2639

Yes, I'm happy to go forward with that.

Thanks, you're too kind :)

ldionne accepted this revision as: Restricted Project.Jan 8 2021, 10:03 AM

CI is passing, I'm going to ship this.

This revision is now accepted and ready to land.Jan 8 2021, 10:03 AM

@ldionne, shouldn't the related bug be updated?
https://llvm.org/PR41900

@ldionne, shouldn't the related bug be updated?
https://llvm.org/PR41900

Yes, thanks for the reminder. Done now.