Details
- Reviewers
ldionne Mordante var-const - Group Reviewers
Restricted Project - Commits
- rGf4fb72e6d4ce: [libc++] Use uninitialized algorithms for vector
rG23cf42e706fb: [libc++] Use uninitialized algorithms for vector
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libcxx/include/__memory/swap_allocator.h | ||
---|---|---|
20 | This is just copied and re-formatted. |
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
501 | Is this moved too? For reviewing I prefer a stack of 2 (or more) commits where the moving of code is separated from the real changes. | |
libcxx/include/__utility/move.h | ||
31 | Is this needed? I assume the original code also resulted in this type when exceptions were disabled. | |
libcxx/test/libcxx/containers/sequences/vector/asan_throw.pass.cpp | ||
102 | Is this a behaviour change or a bug fix? I don't understand this change. Maybe add a description to the patch that explains this behaviour change. That way when we look at the history we know why this was done. |
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
501 | No, this isn't just moved. Here the names and behaviour changed. The algorithms now destroy the elements again if an exception has been thrown. | |
libcxx/include/__utility/move.h | ||
31 | https://godbolt.org/z/vsPqYa99h I also found it weird, but noexcept is still correctly checked with -fno-exceptions. | |
libcxx/test/libcxx/containers/sequences/vector/asan_throw.pass.cpp | ||
102 | https://eel.is/c++draft/vector#modifiers-2 is the relevant paragraph I think. I'm not entirely sure if this is just a behaviour change or if it's a bugfix. I think it's a bugfix. |
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
503 | Can you please add a comment explaining what __uninitialized_allocator_copy & friends do similar to what we do in __uninitialized_allocator_fill_n (and others)? In particular, I think that explaining the exception safety guarantee offered by each algorithm added here is important. | |
506 | Should we have a static_assert(__is_cpp17_copy_insertable<...>) here? | |
525 | Let's introduce RawType2 for symmetry? | |
568 | If you used std::copy (or std::copy_n), I think you could simplify this and you wouldn't have to handle reverse_iterator specially below. | |
libcxx/include/__utility/move.h | ||
30 | We shouldn't do this. It makes us non-conforming under -fno-exceptions. We shouldn't try to do as-if noexcept(<anything>) == true *in the library* when -fno-exceptions is used. If we wanted to have that behavior, it should be achieved through the compiler. Also relevant as background: D62228 | |
libcxx/include/vector | ||
905 | The code didn't destroy the new elements before in case of a failure, but now it does. I wonder whether the original code was written that way on purpose? | |
libcxx/test/libcxx/containers/sequences/vector/asan_throw.pass.cpp | ||
102 | My reading is that we are fixing a bug, since this line of the spec should apply:
Indeed, the exception is thrown by X(char), which is none-of-the-above, and so there should be no effects. Previously, the size of the vector would have been modified, and that's wrong. This mandates a test in libcxx/test/std -- it's a pretty serious bug since exception guarantees in push_back & friends are a big deal. | |
106 | We should also have a test that ensures that we destroy the newly created elements if an exception is thrown. I think it was wrong to skip that in the code previously, since that means that we'd have been potentially leaking stuff if an exception was thrown. |
- Address comments
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
506 | I'm not sure. I'll investigate it later, since it breaks a lot of tests. I think it's either not applicable or the trait is currently broken. It's not used anywhere right now. |
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
362–365 | Let's use the same pattern for __enable_if_t here, i.e. use a non-type template parameter like you do below. | |
390–391 | We should figure out what clang-format does wrong here, but in the meantime I would rather use the same formatting as line 364-366. | |
503 | Please also add a note that array elements are NOT treated specially by this function. We may also want to differentiate between functions in this file that handle arrays vs those that don't, since it's really not obvious from their current names. | |
512 | I think you either need to construct array elements recursively *and* destroy them recursively, or not. But construction and destruction has to be consistent w.r.t. how it handles array elements. Otherwise, you'll get a mismatching number of calls to allocator_traits::construct and allocator_traits::destroy. Concretely, I think for std::vector you don't want to treat array elements specially. So I would add a std::__allocator_destroy(_Alloc&, _Iter, _Sent) function and call that in the catch (...) instead. This should be tested by ensuring that we have a matching number of calls to construct and destroy when we use this algorithm with array elements. | |
537 | Here, I suggest this instead: template <class _Alloc, class _Type, class _RawType = typename remove_const<_Type>::type, __enable_if_t< // using _RawType because of the allocator<T const> extension is_trivially_copy_constructible<_RawType>::value && is_trivially_copy_assignable<_RawType>::value && (__is_default_allocator<_Alloc>::value || !__has_construct<_Alloc, _RawType*, _Type const&>::value) >* = nullptr> _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_AFTER_CXX17 _Type* __uninitialized_allocator_copy(_Alloc&, _Type const* __first1, _Type const* __last1, _Type* __first2) { // TODO: Remove the const_cast once we drop support for std::allocator<T const> return std::copy(__first1, __last1, const_cast<_RawType*>(__first2)); } In particular, note the tweaked enable_if conditions -- I think this is what we need to be safe here. If the type is not trivially copy constructible, we MUST call its copy constructor (instead of the assignment in std::copy), otherwise the optimization is not transparent. If the type is not trivially assignable, then we can also notice that we are doing an assignment instead of a copy-construction here, and so it's not transparent. | |
543 | ||
564 | Same comment, we shouldn't treat array types specially here. Treating array types specially was only meaningful for the std::make_shared<T[N]>(...) functions because they were specified that way. Otherwise, I would have never bothered to handle array types specially :-). |
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
356 | Can you please add a TODO to switch this to a normal left-to-right algorithm and to use reverse_iterator from callers? | |
502–503 | Please comment on what this function does. | |
510 | We should destroy in reverse order of construction, it's usually what's expected. Applies everywhere. | |
515–516 | Instead of creating a separate __transaction class for C++03, I would do this: template <class _Alloc, class _Iter> struct _AllocatorDestroyRange { _LIBCPP_CONSTEXPR_AFTER_CXX11 void operator()() const { std::__allocator_destroy(__alloc_, __first, __last); } _Alloc& __alloc_; _Iter& __first; _Iter& __last; }; And then I'd use this from the __uninitialized_FOO functions as: __transaction<_AllocatorDestroyRange<_Alloc, _Iter1> > __guard(_AllocatorDestroyRange<_Alloc, _Iter1>(__allloc, __destruct_first, __first2)); Basically, I don't like that we are creating our own local emulation of std::bind/std::bind_back just for __transaction. Another option would be something like auto __guard = std::__make_transaction(std::bind_back(&__allocator_destroy<_Alloc, _Iter2, _Iter2>, __alloc, __destruct_first, __first2)); However, bind_back is not available in C++03 and I'm not 100% sure it would be a good idea to drag in that dependency. | |
526–530 | Perhaps this should be called __allocator_has_trivial_copy_construct instead? |
LGTM with comments applied.
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
31–32 | I don't think we need to add this. We don't use min() or max() in this file. | |
501 | Nitpick. | |
525–527 | ||
571–573 | ||
589 | Same, can be removed. | |
libcxx/include/__utility/transaction.h | ||
89–90 ↗ | (On Diff #443493) | _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR ? |
91 ↗ | (On Diff #443493) |
@ldionne looks like this broke the lldb test suite (https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/). Could you revert this/fix the test?
Could you maybe give me some more context? Right now I have no idea what the breakage even is, let alone how to fix it.
From the expanded "All failed tests" section on https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45513/testReport/
Assertion failed: (isa<InjectedClassNameType>(Decl->TypeForDecl)), function getInjectedClassNameType, file /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/AST/ASTContext.cpp, line 4588. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. HandleCommand(command = "expr s_vector.push({4})") 1. <eof> parser at end of file 2. /Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/include/c++/v1/stack:236:10: instantiating function definition 'std::stack<C, std::vector<C>>::push' 3. /Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/include/c++/v1/vector:577:36: instantiating function definition 'std::vector<C>::push_back' 4. /Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/include/c++/v1/vector:718:17: instantiating function definition 'std::vector<C>::__push_back_slow_path<C>' 5. /Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/include/c++/v1/vector:700:10: instantiating function definition 'std::vector<C>::__swap_out_circular_buffer' 6. /Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/include/c++/v1/__memory/uninitialized_algorithms.h:614:1: instantiating function definition 'std::__uninitialized_allocator_move_if_noexcept<std::allocator<C>, std::reverse_iterator<C *>, std::reverse_iterator<C *>, C, void>'
It doesn't look like libc++ is at fault here. If the goal is to get the test green again ASAP, I would XFAIL the test with a TODO and try to get someone from Clang to take a look, since it seems to be crashing inside Clang.
Reverting the libc++ patch will only hide the issue (like a XFAIL), but it will also be pretty disruptive for us.
Is there any way to convert TestStackFromStdModule.py to a normal reproducer? Or is this specific to LLDB in some way? Sorry, I don't understand what the test is trying to achieve and what it does to do that. I guess it somehow creates a std::stack<C, std::vector<C>> and then does some operations on them? If the problem isn't the code itself I guess I have no way to fix it in this patch.
I tried
#include <cassert> #include <stack> struct C { C(int i_) : i{i_} {} int i; }; int main() { std::stack<C, std::vector<C>> s_vector; s_vector.push({4}); s_vector.pop(); s_vector.size(); s_vector.top(); s_vector.emplace(5); assert(s_vector.top().i == 5); }
but that is happy.
@ldionne just by looking at the patch, I can't tell if it's clang or libcxx who is doing the wrong thing (I do see that the stack trace includes this new __uninitialized_allocator_move_if_noexcept function, which is hidden from the ABI, maybe that's part of the problem?) . It shouldn't be LLDB's responsibility to track down who is responsible after and it's not healthy for the project to xfail tests that are broken by one of its dependencies and investigate later on, which would just lead to more and more xfails. The LLVM policy states that we should revert patches "If you break a buildbot in a way which can’t be quickly fixed, please revert.". So I think we should do that.
It also states that there should be a reproducer (ideally) and more generally a way for the author to debug the issue. It should should be simple to obtain a reproducer if it's a libc++ problem. Otherwise the patch is reverted and I still don't know what the bug is; assuming there even is one in this code. If you don't give me that I'll re-land the patch without any meaningful progress, which is useless.
I have literally no idea what the test is even doing (as I said before). I also don't know how to run it. You have given me literally 0 information regarding the test other that "it fails". If it's a libc++ issue it should be possible to reproduce it without having to run LLDB tests.
The test compiles this code:
#include <list> #include <stack> #include <vector> struct C { // Constructor for testing emplace. C(int i) : i(i) {}; int i; }; int main(int argc, char **argv) { // std::deque is the default container. std::stack<C> s_deque({{1}, {2}, {3}}); std::stack<C, std::vector<C>> s_vector({{1}, {2}, {3}}); std::stack<C, std::list<C>> s_list({{1}, {2}, {3}}); return 0; // Set break point at this line. }
Runs lldb and attaches at the line with the " // Set break point at this line." comment. From the lldb console it (among other things):
- Turns on importing the std module (settings set target.import-std-module true) - "Import the 'std' C++ module to improve expression parsing involving C++ standard library types."
- Evaluates the following expressions: "expr s_vector.push({4})" which constructs a new vector to add to the stack.
You can find the cpp file in the same directory as the test, there's also a Makefile there that specifies how the test is compiled.
If you'd like to experiment manually, you can:
- Delete the lldb-test-build.noindex folder in your build directory.
- Run "bin/lldb-dotest ../llvm-project/lldb -p TestStackFromStdModule.py"
- Navigate to "build/lldb-test-build.noindex/commands/expression/import-std-module/stack/TestStackFromStdModule.test_dsym"
There should be a a.out file there that you can attach with your built lldb and test with.
@augusto2112 Thank you for the information. I've run lldb locally with my patch and it seems to be fine. I ran everything in the test and AFAICT everything ran and produced the correct results. My lldb is compiled from 36c9e9968affac543952e81637a0584a4b708597 without assertions enabled. So the bug either doesn't show in the output (maybe the assertion is just wrong?), or the bug has been introduced within the last 3 weeks. Either way it's pretty clear to me that my code is fine. I'd like to re-land this patch with the LLDB test disabled, since D68365 depends on it and I'd like to get that into LLVM 15. Do you have any objections to this?
Ok, I tested your change and removing the assert and it seems to work fine. It's not great that the assertion is failing though, the ASTContext::getInjectedClassNameType code hasn't been touched since 2010 so I doubt that it's broken (the assert checks the of the decl is an InjectedClassNameType. Before your change it's an InjectedClassName after your change it somehow becomes a Record, given the function is called getInjectedClassNameType that assert seems correct to me). If this is urgent you can xfail the test, and if you could include a bug report number with your xfail that'd be great.
In Chromium we noticed that this adds 65 MB of debug info to one of our binaries (we noticed because that pushed it over the 4GB limit, so we'll need to do something about that anyway).
I noticed that the commit message has the "what" but not the "why" -- is using "uninitialized algorithms for vector" something mandated by the standard, some sort of optimization, or something else? Any chance vector could be made more lean instead? :-)
That is only true if those XFAILs are not investigated and fixed in a timely fashion. And in that case, I would say there is a larger issue that nobody's responsible for fixing those. If we have bugs in Clang/LLDB, we should have folks ready to jump in and investigate them -- otherwise, that is what's unhealthy for the project.
The point we are trying to make is that the libc++ code itself is correct. The tests are passing, it's valid C++ and it does what it should. The fact that it happens to start crashing Clang is an issue, but it doesn't mean that we should prevent ourselves from using this construct. For instance, this is something that users in the wild could write themselves and it would crash Clang just the same. So instead of bending backwards or reverting the patch in libc++, the correct thing to do is to acknowledge that there's a Clang (or LLDB) bug and try to fix it as soon as possible. And in the meantime, to ensure that the CI stays meaningful, mark the test as XFAILing due to that bug.
If this were a minor unimportant patch, I wouldn't be spending so much time arguing -- we all have more important stuff to do. However, this happens to be an incredibly important patch if we want to add support for constexpr std:::vector, which we are aiming to land in time for LLVM 15.
Now, that is an actual issue that mandates our attention. Thanks for the heads up. I suspect it has to do with the fact that we instantiate more templates before we get to the memmove optimization in std::copy. @hans Do you have any way to tell what's in the 65 MB of debug information? Or a reproducer so we can try a few things out and see what impact it has on the size of the debug info? I assume this problem wouldn't be visible on a simple reproducer hand-written from scratch.
@philnik Let's re-land this with an XFAIL for the LLDB test and work with the Chromium folks to decrease the amount of debug information this creates. We can do this after the release and cherry-pick back the improvements.
It's not just Chromium, btw. We're seeing more fallout from to this commit. My teammates will post specific findings a bit later today.
Hi folks,
We have some code which compiles fine with the version previous to this patch and fails after.
The code compiles fine with godbolt: https://gcc.godbolt.org/z/ToPGG5cMb
But fails when built with clang containing this revision.
Repro compilation command:
clang -stdlib=libc++ -std=gnu++17 \ -c /tmp/test.cc \ -o /tmp/test.o
Compiler output:
/tmp/test.cc:5:17: error: assigning to 'float' from incompatible type 'Vx<float, 2>' data[Index] = std::forward<LastArg>(last_arg); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/test.cc:22:5: note: in instantiation of function template specialization 'SetData<0, 2, float, Vx<float, 2> &>' requested here SetData<0, Length, Element>(data_, std::forward<Args>(args)...); ^ [redacted]/include/c++/v1/__memory/allocator.h:165:28: note: in instantiation of function template specialization 'Vx<float, 2>::Vx<Vx<float, 2> &>' requested here ::new ((void*)__p) _Up(_VSTD::forward<_Args>(__args)...); ^ [redacted]/include/c++/v1/__memory/allocator_traits.h:290:13: note: in instantiation of function template specialization 'std::allocator<Vx<float, 2>>::construct<Vx<float, 2>, Vx<float, 2> &>' requested here __a.construct(__p, _VSTD::forward<_Args>(__args)...); ^ [redacted]/include/c++/v1/__memory/uninitialized_algorithms.h:536:31: note: in instantiation of function template specialization 'std::allocator_traits<std::allocator<Vx<float, 2>>>::construct<Vx<float, 2>, Vx<float, 2> &, void>' requested here allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), *__first1); ^ [redacted]/include/c++/v1/vector:1012:22: note: in instantiation of function template specialization 'std::__uninitialized_allocator_copy<std::allocator<Vx<float, 2>>, Vx<float, 2> *, Vx<float, 2> *, Vx<float, 2> *>' requested here __tx.__pos_ = std::__uninitialized_allocator_copy(__alloc(), __first, __last, __tx.__pos_); ^ [redacted]/include/c++/v1/vector:1162:9: note: in instantiation of function template specialization 'std::vector<Vx<float, 2>>::__construct_at_end<Vx<float, 2> *>' requested here __construct_at_end(__x.__begin_, __x.__end_, __n); ^ /tmp/test.cc:35:16: note: in instantiation of member function 'std::vector<Vx<float, 2>>::vector' requested here do_something(vertices); ^ 1 error generated.
Hi,
We are seeing some binary size increase of 2-3% because of this commit. The binary size increases are due to debug information section increases and also text section increases.
Text section sizes increases between 1-2%.
I think this is a bug in your code. https://godbolt.org/z/sv8YehKhY fails the same way, but without std::vector. BTW the simplest fix would be to add Vx(Vx& v) : Vx(std::as_const(v)) {}.
@augusto2112 Could you help me with the XFAIL? I guess we don't want to # XFAIL: *, but only something like # XFAIL: clang-assertions. Do you know what exactly I should check in the XFAIL for?
@joanahalili @hans @alexfh Could you check whether the current patch fixes the binary size problems?
Unfortunately I'm not aware of anyway of xfailing a test only when assertions are enabled.
@philnik , can you please help me understand what is going on with the original example posted by @bgraur ?
The fact is that the example compiled before this patch and failed with this patch. Even if the example is buggy, I still need to understand what the semantics were when it worked. Was it using the wrong ctor? Why did it change with this patch?
I'm seeing a number of tests failing with this patch, so understanding the example might give me a clue where to look.
@philnik , can you please help me understand what is going on with the original example posted by @bgraur ?
The fact is that the example compiled before this patch and failed with this patch. Even if the example is buggy, I still need to understand what the semantics were when it worked. Was it using the wrong ctor? Why did it change with this patch?
I'm seeing a number of tests failing with this patch, so understanding the example might give me a clue where to look.
In the example the constructor
template <typename... Args> explicit Vx(Args&&... args)
is shadowing the default generated copy constructor Vx(const Vx&) = default when the rhs is not const
See https://gcc.godbolt.org/z/3GYe9444P
If you have
Vx<int, 5> v1{}; Vx<int, 5> v2{v1};
The reason is that v1 is not const, on line 2, your constructor which takes Args&& is a better match then the cosnt Vx&, simply because the argument is not const.
And obviously your constructor cannot construct Vx<int, 5> with a Vx<int, 5>
However, if the caller uses a different syntax to trigger copy constructor
Vx<int, 5> v1{}; Vx<int, 5> v2 = v1;
https://gcc.godbolt.org/z/xffnrfWao
This works. Because unlike the previous way Vx<int, 5> v2{v1}; explicitly call the copy constructor, this Vx<int, 5> v2 = v1 implicitly calls the copy constructor.
You constructor is marked as explicit so in this case your constructor is not considered and the compiler will use the default generated copy constructor Vx(const Vx&) = default.
So it might be that @philnik 's patch somehow changes the way to invoke copy constructor. In theory he could stick to original way, but relying on that sounds very fragile.
Usually if you need to write constructor that takes Args&&..., it is best practice to SFINAE out when sizeof... == 1 and remove_cvref_t<Arg> is the class itself
Thanks a lot, @huixie90!
Then it might be, that the tests I'm seeing failing were calling one ctor before this patch and another ctor with this patch.
Maybe now this ctor resolution flakiness happens where memcpy versions were called before? Thinking about how to check this...
The problem is exactly as @huixie90 described. The reason this issue became apparent with this patch is that we call the constructor during constant evaluation. During runtime the calls still get forwarded to memmove. The interesting part for you is uninitialized_algorithms.h:566-578.
It helps a little. With the original version of this (23cf42e706fb) we got:
ld.lld: error: output file too large: 4306465056 bytes
With the new version:
ld.lld: error: output file too large: 4305167360 bytes
Without any of those (at b4722cc4c96e) the binary is 4244939036 bytes
which is very close to the 4 GB limit, so we have to address that anyway, I just wanted to flag this so the growth didn't go unnoticed.
@philnik Made some experiments and the current theory is that the bulk of the increase in debug information is caused by the fact that we now destroy the elements when an exception is thrown. That was previously a bug and it is being fixed by this patch.
Given that the 4gb limit will have to be addressed regardless, we would like to land this patch (which will unblock constexpr std::vector) and tackle improvements to debug information after the LLVM 15 branch has been created. Depending on the nature of the fix, we may be able to cherry-pick back to LLVM 15.
@hans @joanahalili @alexfh Does that sound reasonable? By the way, thanks for the heads up, this sort of input is super useful in finding things that we would not be able to easily see otherwise.
Ping -- we'd like to merge this today in time for LLVM 15.
EDIT: Sorry, got into a race condition with @hans's comment.
We build with -fno-exceptions though, though I'm not sure if that affects this?
Given that the 4gb limit will have to be addressed regardless, we would like to land this patch (which will unblock constexpr std::vector) and tackle improvements to debug information after the LLVM 15 branch has been created. Depending on the nature of the fix, we may be able to cherry-pick back to LLVM 15.
@hans @joanahalili @alexfh Does that sound reasonable? By the way, thanks for the heads up, this sort of input is super useful in finding things that we would not be able to easily see otherwise.
Sounds reasonable to me, though I can't speak for @joanahalili and @alexfh.
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
533–534 | @philnik Since they say they use -fno-exceptions, I assume the issue is that std::__transaction doesn't get optimized away for when exceptions are disabled (which would make sense because they are probably compiling with lower optimization levels too). Instead, we could use #ifndef _LIBCPP_NO_EXCEPTIONS try { #endif while (__first1 != __last1) { allocator_traits<_Alloc>::construct(__alloc, std::__to_address(__first2), *__first1); ++__first1; ++__first2; } #ifndef _LIBCPP_NO_EXCEPTIONS } catch (...) { std::__allocator_destroy(__alloc_, std::reverse_iterator<_Iter>(__first2), std::reverse_iterator<_Iter>(__destruct_first)); throw; } #endif And ditch __transaction altogether. We can then look into improving codegen with __transaction separately. Under -fno-exception, the code above should be reallly close to what we had before. |
This LGTM, this should address the debug information issues at least when compiling with -fno-exceptions.
@hans Are you able to confirm this by applying the patch locally?
The CI is green -- ignore the failure on AIX, it's unrelated and it has been fixed on main.
I applied the patch locally and checked again the binary sizes. There are only mild increases which is fine on our end.
Awesome, thanks. We'll look into __transaction and how we can solve these codegen problems.
Hello, this changes behavior for the following program. Is that intentional? (It has an easy workaround -- just remove that defaulted ctor, and we only hit it in a single place as far as I know, so it's not a big problem for us -- but I thought I'd check if the change in behavior is intentional)
$ cat test.cc #include <vector> class Instruction { public: int i; }; class StructuredControlState { public: StructuredControlState(Instruction* break_merge, Instruction* merge) : break_merge_(break_merge), current_merge_(merge) {} StructuredControlState(const StructuredControlState&) = default; bool InBreakable() const { return break_merge_; } bool InStructuredFlow() const { return CurrentMergeId() != 0; } uint32_t CurrentMergeId() const; uint32_t CurrentMergeHeader() const; uint32_t BreakMergeId() const; Instruction* BreakMergeInst() const { return break_merge_; } private: Instruction* break_merge_; Instruction* current_merge_; }; class MergeReturnPass { void GenerateState() { Instruction inst; v.emplace_back(&inst, &inst); } std::vector<StructuredControlState> v; };
Before this change:
$ third_party/llvm-build/Release+Asserts/bin/clang -I buildtools/third_party/libc++/trunk/include/ -c test.cc -I buildtools/third_party/libc++ -nostdinc++ -std=c++17 -Wall -Wdeprecated-copy # fine
After this change:
$ third_party/llvm-build/Release+Asserts/bin/clang -I buildtools/third_party/libc++/trunk/include/ -c test.cc -I buildtools/third_party/libc++ -nostdinc++ -std=c++17 -Wall -Wdeprecated-copy test.cc:11:3: warning: definition of implicit copy assignment operator for 'StructuredControlState' is deprecated because it has a user-declared copy constructor [-Wdeprecated-copy] StructuredControlState(const StructuredControlState&) = default; ^ buildtools/third_party/libc++/trunk/include/__algorithm/move.h:33:15: note: in implicit copy assignment operator for 'StructuredControlState' first required here *__result = std::move(*__first); ^ buildtools/third_party/libc++/trunk/include/__algorithm/move.h:52:17: note: in instantiation of function template specialization 'std::__move_impl<StructuredControlState *, StructuredControlState *, StructuredControlState *>' requested here return std::__move_impl<_InType*, _InType*, _OutType*>(__first, __last, __result); ^ buildtools/third_party/libc++/trunk/include/__algorithm/move.h:84:8: note: in instantiation of function template specialization 'std::__move_impl<StructuredControlState, StructuredControlState, void>' requested here std::__move_impl(__last_base, __first_base, __result_first); ^ buildtools/third_party/libc++/trunk/include/__algorithm/move.h:94:21: note: in instantiation of function template specialization 'std::__move_impl<StructuredControlState *, StructuredControlState *, 0>' requested here auto __ret = std::__move_impl(std::__unwrap_iter(__first), std::__unwrap_iter(__last), std::__unwrap_iter(__result)); ^ buildtools/third_party/libc++/trunk/include/__algorithm/move.h:110:15: note: in instantiation of function template specialization 'std::__move<std::reverse_iterator<StructuredControlState *>, std::reverse_iterator<StructuredControlState *>, std::reverse_iterator<StructuredControlState *>>' requested here return std::__move(__first, __last, __result).second; ^ buildtools/third_party/libc++/trunk/include/__memory/uninitialized_algorithms.h:635:17: note: in instantiation of function template specialization 'std::move<std::reverse_iterator<StructuredControlState *>, std::reverse_iterator<StructuredControlState *>>' requested here return std::move(__first1, __last1, __first2); ^ buildtools/third_party/libc++/trunk/include/vector:914:27: note: in instantiation of function template specialization 'std::__uninitialized_allocator_move_if_noexcept<std::allocator<StructuredControlState>, std::reverse_iterator<StructuredControlState *>, std::reverse_iterator<StructuredControlState *>, StructuredControlState, void>' requested here __v.__begin_ = std::__uninitialized_allocator_move_if_noexcept( ^ buildtools/third_party/libc++/trunk/include/vector:1581:5: note: in instantiation of member function 'std::vector<StructuredControlState>::__swap_out_circular_buffer' requested here __swap_out_circular_buffer(__v); ^ buildtools/third_party/libc++/trunk/include/vector:1600:9: note: in instantiation of function template specialization 'std::vector<StructuredControlState>::__emplace_back_slow_path<Instruction *, Instruction *>' requested here __emplace_back_slow_path(_VSTD::forward<_Args>(__args)...); ^ test.cc:25:7: note: in instantiation of function template specialization 'std::vector<StructuredControlState>::emplace_back<Instruction *, Instruction *>' requested here v.emplace_back(&inst, &inst); ^
@thakis I wouldn't say the change is intentional, but the change is definitely expected. Especially, since it looks like -Wdeprecated-copy is basically a rule-of-{0, 3, 5} warning, which you violated by having the copy constructor explicitly defaulted. Although I'm not sure why clang only warns when the constructor is used instead of when the class is declared. If you want you can use clang-tidy with cppcoreguidelines-special-member-functions to ensure that all your classes follow the rule-of-0-or-5.
Thanks for the reply!
As for the question, I'm guessing it's due to false positive rate of useful instances of the warning. Clang often warns on use instead of on declaration for that reason.
Here is a sample that was compiling Ok before the change and is now broken: https://godbolt.org/z/djPG94f69
#include <vector> template <typename B> struct REAL_TYPEDEF { typedef B base_type; B v; REAL_TYPEDEF() : v(0){} explicit REAL_TYPEDEF(B v) : v(v){} inline bool operator==(const REAL_TYPEDEF<B>& rhs) const { return v == rhs.v; } }; template <typename T> inline bool operator!=(const T& lhs, const T& rhs) { return !(lhs == rhs); } namespace zim { typedef int offset_type; #define TYPEDEF(NAME, TYPE) \ struct NAME : public REAL_TYPEDEF<TYPE> { \ explicit NAME(TYPE v = 0) : REAL_TYPEDEF<TYPE>(v){} \ }; \ static_assert(sizeof(NAME) == sizeof(TYPE), ""); TYPEDEF(offset_t, offset_type) int foo(int n) { std::vector<offset_t> b; b.reserve(n); return b.size(); } }; // namespace zim
New output:
In file included from <source>:1: In file included from /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/vector:296: In file included from /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/__split_buffer:24: In file included from /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/memory:858: In file included from /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/__memory/ranges_uninitialized_algorithms.h:22: /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/__memory/uninitialized_algorithms.h:628:21: error: use of overloaded operator '!=' is ambiguous (with operand types 'std::reverse_iterator<zim::offset_t *>' and 'std::reverse_iterator<zim::offset_t *>') while (__first1 != __last1) { ~~~~~~~~ ^ ~~~~~~~ /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/vector:914:27: note: in instantiation of function template specialization 'std::__uninitialized_allocator_move_if_noexcept<std::allocator<zim::offset_t>, std::reverse_iterator<zim::offset_t *>, std::reverse_iterator<zim::offset_t *>, zim::offset_t, void>' requested here __v.__begin_ = std::__uninitialized_allocator_move_if_noexcept( ^ /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/vector:1501:9: note: in instantiation of member function 'std::vector<zim::offset_t>::__swap_out_circular_buffer' requested here __swap_out_circular_buffer(__v); ^ <source>:35:5: note: in instantiation of member function 'std::vector<zim::offset_t>::reserve' requested here b.reserve(n); ^ /opt/compiler-explorer/clang-trunk-20220802/bin/../include/c++/v1/__iterator/reverse_iterator.h:233:1: note: candidate function [with _Iter1 = zim::offset_t *, _Iter2 = zim::offset_t *] operator!=(const reverse_iterator<_Iter1>& __x, const reverse_iterator<_Iter2>& __y) ^ <source>:16:13: note: candidate function [with T = std::reverse_iterator<zim::offset_t *>] inline bool operator!=(const T& lhs, const T& rhs) { ^ 1 error generated.
If I run the following with -O2 -fno-inline, after the patch is ~50% slower
It's even worse with Asan, Msan Tsan (no -fno-inline is needed), New code calls memcpy for each P, which is intercepted.
Somehow if compiled without fno-inline and no sanitizers, performance is unaffected
Is this expected?
#include <vector> using NodeIndex = size_t; class N { public: N(NodeIndex n) : v(n) {} private: struct P { size_t a; size_t b; size_t c; }; std::vector<P> v; }; int main() { for (int i = 0; i < 1000; ++i) { N b(i*10); std::vector<N> v(i * 2, b); } }
@eaeltsin I'm not sure your code is supposed to work. It breaks every iterator wrapper AFAICT. The correct way to add an operator!= is to
- change lines 15-18 to
template <typename T> inline bool operator!=(const REAL_TYPEDEF<T>& lhs, const REAL_TYPEDEF<T>& rhs) { return !(lhs == rhs); }
- add
inline bool operator!=(const REAL_TYPEDEF<B>& rhs) const { return !(*this == rhs); }
to REAL_TYPEDEF
- use some library that adds it through CRTP, or
- use C++20
@vitalybuka This isn't unexpected. -fno-inline disables inlining, which is essential for a lot of other optimizations. Using -fno-inline pretty much defeats the optimizer: https://godbolt.org/z/zrE5o1WK1.
Thanks @philnik!
The code is from libzim - https://github.com/openzim/libzim/blob/966f7b217e9bc36dc30be6d9e46d51a2bfb7091c/src/zim_types.h#L36 . It doesn't look nice to me, and there definitely are multiple ways to make it work.
My question is more about the change in the compiler behavior when selecting the correct overloaded operator. Was the code clearly non standard-compliant before?
IIUC the code is not standards-compliant. For example it breaks https://godbolt.org/z/8jbqaY45b. I think http://eel.is/c++draft/constraints#namespace.std-7 is the interesting paragraph here, specifically (a) the overload's declaration depends on at least one user-defined type.
@vitalybuka This isn't unexpected. -fno-inline disables inlining, which is essential for a lot of other optimizations. Using -fno-inline pretty much defeats the optimizer: https://godbolt.org/z/zrE5o1WK1.
I am more concerned about sanitizers
https://godbolt.org/z/1x9qjGG19 Near LBB11_5 we have now __asan_memcpy per every "P", before it was for entire vector.
I assume some additional improvement in instrumentation are possible, maybe replacing fixed short asan_memcpy with check/load/store. Or even optimizing asan_memcpy itself.
But still maybe some ideas if it's solvable on libc++ level so we rely less on optimizations?
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
634 | Ah, this is another source of debug info growth - the std::move(iter, iter, iter) implementation instantiates std::pair (so this change added an extra 2,000 instantiations of std::pair to a clang dbg build - and std::pair isn't especially light weight with the compressed pair types, all the members, etc. Any chance the implementation of std::move could be changed to avoid using std::pair - while I realize that might involve some code duplication if the underlying helpers are used in a few places that want both parts of the paired result, it might still be worthwhile. I'll look into making a prototype. Oh, I didn't send this, and did the prototype... so results: It looks like the __move_impl doesn't actually use the pair result at all (the ranges-based move does use it, but the code isn't shared - maybe it was at some point) so I removed it, and that got the total .dwp change for this uninitialized patch + that move(iter, iter) patch be slightly negative: FILE SIZE VM SIZE -------------- -------------- +0.2% +571Ki [ = ] 0 .debug_str.dwo +0.2% +26.3Ki [ = ] 0 .debug_rnglists.dwo +0.0% +15.5Ki [ = ] 0 .debug_str_offsets.dwo +0.3% +1.67Ki [ = ] 0 .debug_loclists.dwo -0.2% -17.0Ki [ = ] 0 .debug_abbrev.dwo -0.2% -697Ki [ = ] 0 .debug_info.dwo -0.0% -99.5Ki [ = ] 0 TOTAL Net reduction of about 2k std::pair instantiations, rather than a net increase of about 2k with only the uninitialized patch. though __compressed_pair/__compressed_pair_elem instances didn't change by much - *shrug*. 25% (3k down from 4k) fewer make_pair instantiations. Few other minor things of note. the increases in construct (5933 -> 6229) and reverse_iterator (7827 -> 8786) are probably somewhat unavoidable/expected. (well, I understand the construct ones - oh, the reverse iterator ones might be for the destruction codepath that this patch mentions is a bugfix/intentionally added) I'll send out the __move pair removal cleanup shortly. |
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
634 | Yeah, I didn't look closely enough. So this'd require some code duplication or conditionality - I'm open to ideas - I'll throw up a straw-man patch at least. |
I suspect this might go away if we manually lowered std::uninitialized_foo to memcpy like we do for std::copy and std::move.
(Just in case, I had a patch to do that a while ago: https://reviews.llvm.org/D118329)
libcxx/include/__memory/uninitialized_algorithms.h | ||
---|---|---|
634 | After fixing up a few things (there were still a few mentions of std::pair in move.h and uses of first and second instead of __first_` and __second_, make_pair replaced with {}, but got the general idea) I ran that and got this comparison: FILE SIZE VM SIZE -------------- -------------- +0.6% +1.48Mi [ = ] 0 .debug_str.dwo +0.1% +513Ki [ = ] 0 .debug_info.dwo +0.2% +126Ki [ = ] 0 .debug_str_offsets.dwo +0.3% +34.5Ki [ = ] 0 .debug_rnglists.dwo +0.3% +1.67Ki [ = ] 0 .debug_loclists.dwo -0.2% -16.0Ki [ = ] 0 .debug_abbrev.dwo +0.3% +2.13Mi [ = ] 0 TOTAL (this is the diff/growth in .dwp between building clang with two different compilers, one that includes the uninitialized patch + the __libcxx_pair_ (along with a bunch of other changes - whatever's changed between our release and testing-the-next-release compiler - I can get specific versions if it's useful, but it looks like most of the growth is due to this uninitialized patch) and one that includes neither) Looking at the string count diffs - 2473 strings starting with "__libcpp_pair", 4133 -> 3309 instances of "make_pair" So, it seems it does certainly lower the cost, but not as low as not using a pair here at all/possibly duplicating the code so there's a separate copy for range-based-move. |
I'm curious to check if this resolves some of the problems we are having now... @var-const, is your patch still applicable at head?
Unfortunately, the patch hasn't been rebased for a few months. That part of the code base hasn't changed much in the meantime, but it's likely the patch won't apply cleanly.
This is just copied and re-formatted.