This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Remove vector base class
ClosedPublic

Authored by philnik on Jan 12 2022, 5:52 AM.

Details

Reviewers
ldionne
Quuxplusone
Mordante
Group Reviewers
Restricted Project
Commits
rGb82da8b55560: [libc++] Remove vector base class
Summary

Remove the vector base class as suggested by @ldionne

Diff Detail

Event Timeline

philnik requested review of this revision.Jan 12 2022, 5:52 AM
philnik created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2022, 5:52 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne requested changes to this revision.Jan 24 2022, 12:33 PM

Given that we end up with a lot of conditionals when removing this, I don't think this is worth doing. We don't gain anything performance wise, and we actually lose something clarity wise because this makes the code more complicated. If we had been able to move the members out of __vector_base without breaking the ABI, I think this would have been compelling, but now I'm not sure. WDYT?

libcxx/include/vector
304–305

I was trying to convince myself that it wouldn't be ABI breaking to do the following:

template <bool> struct __vector_base_common { ... };
template <class, class> struct __vector_base : private __vector_base_common<true> { /* nothing here */ };

template <class _Tp, class _Allocator>
class vector
#if !defined(_LIBCPP_ABI_NO_VECTOR_BASE_CLASS)
  : private __vector_base<_Tp, _Allocator>
#endif
{
  pointer                                                      __begin_;
  pointer                                                      __end_;
  __compressed_pair<pointer, allocator_type>                   __end_cap_;

  // use members normally here, unconditionally
};

However, the usual pitfall of multiply inheritting from vector<T> applies here too. Let's assume we have:

struct IntVector1 : std::vector<int> { };
struct IntVector2 : std::vector<int> { };
struct Foo : IntVector1, IntVector2 { };

Before the change, the layout should be:

+---------------------------------+
| __vector_base_common<true>      | 0 bytes, EBO'd
| __vector_base<int>::__begin_    | 8 bytes
| __vector_base<int>::__end_      | 8 bytes
| __vector_base<int>::__end_cap_  | 8 bytes (disregarding allocator)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| __vector_base_common<true>      | 1 byte, can't EBO
| __vector_base<int>::__begin_    | 8 bytes
| __vector_base<int>::__end_      | 8 bytes
| __vector_base<int>::__end_cap_  | 8 bytes (disregarding allocator)
+---------------------------------+

After the change, the layout should be:

+---------------------------------+
| __vector_base_common<true>      | 0 bytes, EBO'd
| __vector_base<int>              | 0 bytes, EBO'd
| vector<int>::__begin_           | 8 bytes
| vector<int>::__end_             | 8 bytes
| vector<int>::__end_cap_         | 8 bytes (disregarding allocator)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| __vector_base_common<true>      | 1 byte, can't EBO
| __vector_base<int>              | 1 byte, can't EBO <<-- this is the ABI break
| vector<int>::__begin_           | 8 bytes
| vector<int>::__end_             | 8 bytes
| vector<int>::__end_cap_         | 8 bytes (disregarding allocator)
+---------------------------------+

Since __vector_base<int> would now be empty, it would get EBO'd the first time it's a base class, but not the second time, so there would be one extra byte because of that "spurious" base class.

Long story short, my suggestion doesn't work because it would break multiple inheritance of std::vector.

304–305

vector.cpp should not export the functions if _LIBCPP_ABI_NO_VECTOR_BASE_CLASS is not defined.

This revision now requires changes to proceed.Jan 24 2022, 12:33 PM
philnik updated this revision to Diff 402655.Jan 24 2022, 1:50 PM
philnik marked an inline comment as done.
  • Address comments
libcxx/include/vector
304–305

https://godbolt.org/z/rdMfbchKq seems to suggest that something with your tables is wrong. My current diff has still the same size with your example.

ldionne added inline comments.Feb 1 2022, 6:53 AM
libcxx/include/vector
304–305

Of course!!!! Thanks for challenging this, my tables were wrong indeed. See for example https://godbolt.org/z/Wdz1Kasxn.

What I did wrong is that I blindly did as if the EBO would stop kicking in for a duplicate base class. In reality, EBO doesn't kick in if it would cause the two base classes (or base class and first member) to have the same address. Basically, EBO always kicks in except if it would cause two objects to have the same address.

This is an issue with e.g. std::reverse_iterator (as explained in https://quuxplusone.github.io/blog/2021/05/07/std-iterator-as-a-base-class/) because std::reverse_iterator inherits from std::iterator (which is empty and hence EBO'd), and then its first member is also something that might inherit from std::iterator (making the two std::iterator subobjects overlap if EBO is used).

However, since std::vector is never an empty class, the address of __vector_base_common is never going to overlap, because there's always a couple of pointers between one __vector_base and the next. So EBO can always be used.

Hence, I don't think this is an ABI break at all, and in fact I now believe we can even remove the base class altogether, without any check. In that case, I would move

template <bool>
struct __vector_base_common;

template <>
struct __vector_base_common<true> {
    // Both are defined in vector.cpp
    _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_length_error() const;
    _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_out_of_range() const;
};

To the .cpp file and wouldn't define it in the headers at all. Then, we can rename _LIBCPP_ABI_NO_VECTOR_BASE_CLASS to _LIBCPP_ABI_NO_VECTOR_BASE_COMMON and have it control only whether the library provides those symbols.

I'd like to have the logic above counter checked, @Quuxplusone maybe?

Also, the same logic should apply to std::basic_string.

Quuxplusone added inline comments.Feb 1 2022, 8:22 AM
libcxx/include/vector
304–305

That sounds right to me.
So the new <vector> header would unconditionally not call into the dylib to throw exceptions; but we'd keep the exception-throwing functions present in the dylib (under !defined(_LIBCPP_ABI_NO_VECTOR_BASE_COMMON)) for the benefit of old .o files that were compiled against the previous release's <vector> header.

Also, the same logic should apply to std::basic_string.

So I'd say the name of the ABI macro should be more general, and control both base class types together? Right now we have _LIBCPP_ABI_NO_BASIC_STRING_BASE_CLASS and _LIBCPP_ABI_NO_VECTOR_BASE_CLASS. We could make it _LIBCPP_ABI_NO_COMMON_CONTAINER_BASE_EXPORTS or something like that.

philnik updated this revision to Diff 404997.Feb 1 2022, 10:34 AM
  • Remove all of vector base
libcxx/include/vector
304–305

I would keep them seperate. There is no technical reason, but I think _LIBCPP_ABI_NO_COMMON_CONTAINER_BASE_EXPORTS sounds too much like 'there are no more common base classes in containers'. That might be true, but I didn't check and don't want to guarantee it. There might be an actual use case somewhere, and having two macros doesn't really hurt anything.

philnik updated this revision to Diff 405008.Feb 1 2022, 10:53 AM
  • Rebased
  • Fix CI

LGTM but possibly significant comment.

libcxx/include/vector
649–652

I'd feel slightly better if this unrelated NSDMI-ification were done in a separate commit; but if CI is happy then I'm happy.

libcxx/src/vector.cpp
18–30

These member functions are now implicitly inline, because they appear in the body of a class. FWLIW I'd feel vastly safer if they appeared outside the body of the class, so that they would not be inline.
I'm guessing that _LIBCPP_EXPORTED_FROM_ABI is some kind of magic "anti-inline" keyword that will cause them to be generated anyway, on relevant platforms? @ldionne you think this is OK (and better than defining them out-of-line)?

ldionne requested changes to this revision.Feb 3 2022, 1:52 PM
ldionne added inline comments.
libcxx/include/vector
304–305

Thanks for counter checking the logic, @Quuxplusone. I am neutral on renaming the macros, but I suggest we keep them separate at this point because it doesn't really hurt, the patches are written, and I'd like to ship + cherry-pick them.

649–652

I don't mind doing it in this patch, however I do wonder why we're using __compressed_pair<pointer, allocator_type>(nullptr, allocator_type()) instead of doing

__compressed_pair<pointer, allocator_type> __end_cap_(nullptr, __default_init_tag());

which is what we used to do in the default constructor of the base class. @philnik any reason?

libcxx/src/vector.cpp
18–30

_LIBCPP_EXPORTED_FROM_ABI doesn't do anything special with respect to the inline-ness of the function. So indeed, they are not getting generated and that's why the CI is failing. I suggest we be consistent with D118733 and instead do:

template <bool>
struct __vector_base_common {
  _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_length_error() const;
  _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_out_of_range() const;
};

template <>
void __vector_base_common<true>::__throw_length_error() const {
  _VSTD::__throw_length_error("vector");
}

template <>
void __vector_base_common<true>::__throw_out_of_range() const {
  _VSTD::__throw_out_of_range("vector");
}
This revision now requires changes to proceed.Feb 3 2022, 1:52 PM
philnik updated this revision to Diff 405784.Feb 3 2022, 2:13 PM
philnik marked 4 inline comments as done.
  • Rebased
  • Address comments
  • Rename macro to be consistent with basic_string
ldionne accepted this revision.Feb 3 2022, 2:41 PM

Thanks a lot, this LGTM.

libcxx/include/vector
1048

I think it's weird that we're default initializing the allocator instead of value initializing it here, but it's definitely consistent with what we did before this patch.

Actually, this leads me to believe that we don't implement the C++20 vector properly. Indeed, in C++20, this constructor is specified as:

constexpr explicit vector(size_type n, const Allocator& = Allocator());

Unless mistake, this means that we should be value-initializing the allocator here to be equivalent to the default argument. This seems to have changed in C++14 (https://en.cppreference.com/w/cpp/container/vector/vector). @Quuxplusone, do you happen to know what paper changed this? Does this look suspicious to you too?

This comment doesn't affect this review since it's not a behavior change with this patch, but we should rectify the situation in a follow-up patch if my suspicion above is confirmed.

This revision is now accepted and ready to land.Feb 3 2022, 2:41 PM
Quuxplusone added inline comments.Feb 3 2022, 7:36 PM
libcxx/include/vector
1048

This seems to have changed in C++14. @Quuxplusone, do you happen to know what paper changed this? Does this look suspicious to you too?

Whenever you see default arguments (which are the devil) and the shuffling-around of ctor overload sets, it's almost certainly due to p1163... but maybe actually not, in this case. :) According to timsong-cpp.github.io, C++11 had

explicit vector(const Allocator& = Allocator());
explicit vector(size_type n);
vector(size_type n, const T& value, const Allocator& = Allocator());

and C++14 had

vector();
explicit vector(const Allocator&);
explicit vector(size_type n, const Allocator& = Allocator());
vector(size_type n, const T& value, const Allocator& = Allocator());

That (n, alloc) ctor was added to C++14 via LWG2210.

Anyway, yeah, I think "value-initialize" is the correct answer. And I assume we agree that the extra move-construct is not observable.

If you're going to mess with the overload set here, now would be a great time to eliminate the default arguments entirely. ;)

This revision was automatically updated to reflect the committed changes.