Background: __clear_and_shrink() (added in https://reviews.llvm.org/D41976) is used in the copy assignment of allocators. It was added as an optimization to prevent calling __shrink_to_fit(), when we have just called clear() on the string as it makes an additional allocation. Instead, it deallocates the allocated string and then sets the short string size to 0, making the active member of the union in __rep, __s I.e. a short string. It changes the representation to a short string to satisfy the invariant data()[size()] != value_type() and to prevent deallocation of the heap allocated long string a second time in the string destructor.
Issue: Since C++20, it is legal to change the active member of a union within a constant expression. However, it is not legal to read from a non-active member of a union. During constant evaluation, short string optimisation is not used so the string is always a long string. After calling __clear_and_shrink, the active member is changed to a short string. Therefore, any operations which attempt to read from __rep will cause a compiler error. So this function has the requirement that no read operations are performed on __rep after it is called.
Reproducing: The following code demonstrates the issue:
#include <cassert> #include <string> constexpr bool test() { std::string l = "Long string so that allocation definitely, for sure, absolutely happens. Probably."; l.__clear_and_shrink(); return true; } int main(int, char**) { static_assert(test()); return 0; }
This code fails with the error:
../src/clear_and_shrink_test.cpp:14:19: error: static assertion expression is not an integral constant expression static_assert(test()); ^~~~~~ /home/brendanemery/work_esr/llvm-project/build/include/c++/v1/string:1618:17: note: read of member '__l' of union with active member '__s' is not allowed in a constant expression {return __r_.first().__l.__data_;} ^ /home/brendanemery/work_esr/llvm-project/build/include/c++/v1/string:2347:47: note: in call to '&l->__get_long_pointer()' __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap()); ^ ../src/clear_and_shrink_test.cpp:6:17: note: in call to '&l->~basic_string()' std::string l = "Long string so that allocation definitely, for sure, absolutely happens. Probably."; ^ ../src/clear_and_shrink_test.cpp:14:19: note: in call to 'test()' static_assert(test());
Solution: Since __clear_and_shrink() is only used in one place, my proposed solution is to inline the contents of the function in the calling code. This (1) maintains the intended optimisation i.e. not calling shrink_to_fit() in the copy assignment of allocators. (2) Prevents the string left being in a dangerous state after calling __clear_and_shrink() i.e. causing a compiler error if a read operation is performed on __rep.
Please run the string benchmarks to make sure this change doesn't cause performance regressions.