Remove the vector base class as suggested by @ldionne
Details
- Reviewers
ldionne • Quuxplusone Mordante - Group Reviewers
Restricted Project - Commits
- rGb82da8b55560: [libc++] Remove vector base class
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Given that we end up with a lot of conditionals when removing this, I don't think this is worth doing. We don't gain anything performance wise, and we actually lose something clarity wise because this makes the code more complicated. If we had been able to move the members out of __vector_base without breaking the ABI, I think this would have been compelling, but now I'm not sure. WDYT?
libcxx/include/vector | ||
---|---|---|
304–305 | I was trying to convince myself that it wouldn't be ABI breaking to do the following: template <bool> struct __vector_base_common { ... }; template <class, class> struct __vector_base : private __vector_base_common<true> { /* nothing here */ }; template <class _Tp, class _Allocator> class vector #if !defined(_LIBCPP_ABI_NO_VECTOR_BASE_CLASS) : private __vector_base<_Tp, _Allocator> #endif { pointer __begin_; pointer __end_; __compressed_pair<pointer, allocator_type> __end_cap_; // use members normally here, unconditionally }; However, the usual pitfall of multiply inheritting from vector<T> applies here too. Let's assume we have: struct IntVector1 : std::vector<int> { }; struct IntVector2 : std::vector<int> { }; struct Foo : IntVector1, IntVector2 { }; Before the change, the layout should be: +---------------------------------+ | __vector_base_common<true> | 0 bytes, EBO'd | __vector_base<int>::__begin_ | 8 bytes | __vector_base<int>::__end_ | 8 bytes | __vector_base<int>::__end_cap_ | 8 bytes (disregarding allocator) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | __vector_base_common<true> | 1 byte, can't EBO | __vector_base<int>::__begin_ | 8 bytes | __vector_base<int>::__end_ | 8 bytes | __vector_base<int>::__end_cap_ | 8 bytes (disregarding allocator) +---------------------------------+ After the change, the layout should be: +---------------------------------+ | __vector_base_common<true> | 0 bytes, EBO'd | __vector_base<int> | 0 bytes, EBO'd | vector<int>::__begin_ | 8 bytes | vector<int>::__end_ | 8 bytes | vector<int>::__end_cap_ | 8 bytes (disregarding allocator) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | | __vector_base_common<true> | 1 byte, can't EBO | __vector_base<int> | 1 byte, can't EBO <<-- this is the ABI break | vector<int>::__begin_ | 8 bytes | vector<int>::__end_ | 8 bytes | vector<int>::__end_cap_ | 8 bytes (disregarding allocator) +---------------------------------+ Since __vector_base<int> would now be empty, it would get EBO'd the first time it's a base class, but not the second time, so there would be one extra byte because of that "spurious" base class. Long story short, my suggestion doesn't work because it would break multiple inheritance of std::vector. | |
304–305 | vector.cpp should not export the functions if _LIBCPP_ABI_NO_VECTOR_BASE_CLASS is not defined. |
- Address comments
libcxx/include/vector | ||
---|---|---|
304–305 | https://godbolt.org/z/rdMfbchKq seems to suggest that something with your tables is wrong. My current diff has still the same size with your example. |
libcxx/include/vector | ||
---|---|---|
304–305 | Of course!!!! Thanks for challenging this, my tables were wrong indeed. See for example https://godbolt.org/z/Wdz1Kasxn. What I did wrong is that I blindly did as if the EBO would stop kicking in for a duplicate base class. In reality, EBO doesn't kick in if it would cause the two base classes (or base class and first member) to have the same address. Basically, EBO always kicks in except if it would cause two objects to have the same address. This is an issue with e.g. std::reverse_iterator (as explained in https://quuxplusone.github.io/blog/2021/05/07/std-iterator-as-a-base-class/) because std::reverse_iterator inherits from std::iterator (which is empty and hence EBO'd), and then its first member is also something that might inherit from std::iterator (making the two std::iterator subobjects overlap if EBO is used). However, since std::vector is never an empty class, the address of __vector_base_common is never going to overlap, because there's always a couple of pointers between one __vector_base and the next. So EBO can always be used. Hence, I don't think this is an ABI break at all, and in fact I now believe we can even remove the base class altogether, without any check. In that case, I would move template <bool> struct __vector_base_common; template <> struct __vector_base_common<true> { // Both are defined in vector.cpp _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_length_error() const; _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_out_of_range() const; }; To the .cpp file and wouldn't define it in the headers at all. Then, we can rename _LIBCPP_ABI_NO_VECTOR_BASE_CLASS to _LIBCPP_ABI_NO_VECTOR_BASE_COMMON and have it control only whether the library provides those symbols. I'd like to have the logic above counter checked, @Quuxplusone maybe? Also, the same logic should apply to std::basic_string. |
libcxx/include/vector | ||
---|---|---|
304–305 | That sounds right to me.
So I'd say the name of the ABI macro should be more general, and control both base class types together? Right now we have _LIBCPP_ABI_NO_BASIC_STRING_BASE_CLASS and _LIBCPP_ABI_NO_VECTOR_BASE_CLASS. We could make it _LIBCPP_ABI_NO_COMMON_CONTAINER_BASE_EXPORTS or something like that. |
- Remove all of vector base
libcxx/include/vector | ||
---|---|---|
304–305 | I would keep them seperate. There is no technical reason, but I think _LIBCPP_ABI_NO_COMMON_CONTAINER_BASE_EXPORTS sounds too much like 'there are no more common base classes in containers'. That might be true, but I didn't check and don't want to guarantee it. There might be an actual use case somewhere, and having two macros doesn't really hurt anything. |
LGTM but possibly significant comment.
libcxx/include/vector | ||
---|---|---|
649–652 | I'd feel slightly better if this unrelated NSDMI-ification were done in a separate commit; but if CI is happy then I'm happy. | |
libcxx/src/vector.cpp | ||
18–30 | These member functions are now implicitly inline, because they appear in the body of a class. FWLIW I'd feel vastly safer if they appeared outside the body of the class, so that they would not be inline. |
libcxx/include/vector | ||
---|---|---|
304–305 | Thanks for counter checking the logic, @Quuxplusone. I am neutral on renaming the macros, but I suggest we keep them separate at this point because it doesn't really hurt, the patches are written, and I'd like to ship + cherry-pick them. | |
649–652 | I don't mind doing it in this patch, however I do wonder why we're using __compressed_pair<pointer, allocator_type>(nullptr, allocator_type()) instead of doing __compressed_pair<pointer, allocator_type> __end_cap_(nullptr, __default_init_tag()); which is what we used to do in the default constructor of the base class. @philnik any reason? | |
libcxx/src/vector.cpp | ||
18–30 | _LIBCPP_EXPORTED_FROM_ABI doesn't do anything special with respect to the inline-ness of the function. So indeed, they are not getting generated and that's why the CI is failing. I suggest we be consistent with D118733 and instead do: template <bool> struct __vector_base_common { _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_length_error() const; _LIBCPP_NORETURN _LIBCPP_EXPORTED_FROM_ABI void __throw_out_of_range() const; }; template <> void __vector_base_common<true>::__throw_length_error() const { _VSTD::__throw_length_error("vector"); } template <> void __vector_base_common<true>::__throw_out_of_range() const { _VSTD::__throw_out_of_range("vector"); } |
Thanks a lot, this LGTM.
libcxx/include/vector | ||
---|---|---|
1048 | I think it's weird that we're default initializing the allocator instead of value initializing it here, but it's definitely consistent with what we did before this patch. Actually, this leads me to believe that we don't implement the C++20 vector properly. Indeed, in C++20, this constructor is specified as: constexpr explicit vector(size_type n, const Allocator& = Allocator()); Unless mistake, this means that we should be value-initializing the allocator here to be equivalent to the default argument. This seems to have changed in C++14 (https://en.cppreference.com/w/cpp/container/vector/vector). @Quuxplusone, do you happen to know what paper changed this? Does this look suspicious to you too? This comment doesn't affect this review since it's not a behavior change with this patch, but we should rectify the situation in a follow-up patch if my suspicion above is confirmed. |
libcxx/include/vector | ||
---|---|---|
1048 |
Whenever you see default arguments (which are the devil) and the shuffling-around of ctor overload sets, it's almost certainly due to p1163... but maybe actually not, in this case. :) According to timsong-cpp.github.io, C++11 had explicit vector(const Allocator& = Allocator()); explicit vector(size_type n); vector(size_type n, const T& value, const Allocator& = Allocator()); and C++14 had vector(); explicit vector(const Allocator&); explicit vector(size_type n, const Allocator& = Allocator()); vector(size_type n, const T& value, const Allocator& = Allocator()); That (n, alloc) ctor was added to C++14 via LWG2210. Anyway, yeah, I think "value-initialize" is the correct answer. And I assume we agree that the extra move-construct is not observable. If you're going to mess with the overload set here, now would be a great time to eliminate the default arguments entirely. ;) |
I was trying to convince myself that it wouldn't be ABI breaking to do the following:
However, the usual pitfall of multiply inheritting from vector<T> applies here too. Let's assume we have:
Before the change, the layout should be:
After the change, the layout should be:
Since __vector_base<int> would now be empty, it would get EBO'd the first time it's a base class, but not the second time, so there would be one extra byte because of that "spurious" base class.
Long story short, my suggestion doesn't work because it would break multiple inheritance of std::vector.