This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
libcxx/
-
include/
2/5
string_view
-
test/std/strings/string.view/string.view.capacity/
-
std/
-
strings/
-
string.view/
-
string.view.capacity/
2/2
capacity.pass.cpp

Differential D114395

[libc++] Fix the return value of max_size()
ClosedPublic

Authored by ldionne on Nov 22 2021, 1:47 PM.

Download Raw Diff

Details

Reviewers

None

Group Reviewers

Restricted Project

Commits

rG2db67e97712e: [libc++] Fix the return value of max_size()

Summary

I assume nobody ever uses std::string_view::max_size() outside of
testing. However, we should still return a value that is based on
something with a reasonable rationale. Previously, we would forget
to take into account the size of the character type stored in the
string. This patch factors both that and the size of the string_view
itself into account.

Thanks to @mclow.lists for pointing out this issue.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ldionne requested review of this revision.Nov 22 2021, 1:47 PM

ldionne created this revision.

Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 1:47 PM

Herald added a reviewer: Restricted Project. · View Herald Transcript

Herald added a subscriber: libcxx-commits. · View Herald Transcript

• Quuxplusone added a subscriber: • Quuxplusone.Nov 22 2021, 2:45 PM

• Quuxplusone added inline comments.

libcxx/include/string_view
334	(1) This seems unnecessarily fiddly; I don't really see anything wrong with the original. (2) If we're going to do this, let's add a line break somewhere. (3) Since `sv.size()` cannot physically return anything bigger than `PTRDIFF_MAX`, you probably want to use `ptrdiff_t` instead of `size_type` here... or maybe return `std::min(PTRDIFF_MAX, SIZE_MAX / sizeof(value_type))`? But again, this is needlessly fiddly and I see nothing wrong with just returning `SIZE_MAX` unconditionally. Literally nobody should ever be calling this method. Do any of the other methods (e.g. the constructor) need to do something special to preserve the postcondition that `sv.size() <= sv.max_size()`? Before this change, the postcondition was vacuously true; now, it could conceivably be false, and maybe we need to check for that?

Harbormaster completed remote builds in B135504: Diff 389034.Nov 22 2021, 4:29 PM

mclow.lists added inline comments.Nov 22 2021, 4:31 PM

libcxx/include/string_view
334	My preference is `return (numeric_limits<size_type>::max() / sizeof(value_type)` since this is the max number of elements that a `string_view` can hold. As for "literally nobody should be calling this method", I agree, with the exception of people writing test suites.
libcxx/test/std/strings/string.view/string.view.capacity/capacity.pass.cpp
51	this is just not true, IMHO. The sequence of `value_type`s that a `string_view`s pointer points at can (theoretically) be up to `size_t` bytes long.

ldionne marked 2 inline comments as done.Nov 24 2021, 2:40 PM

ldionne added inline comments.

libcxx/include/string_view
334	@Quuxplusone (1) This seems unnecessarily fiddly; I don't really see anything wrong with the original. If the character type is more than 1 byte long, the original is wrong. @mclow.lists My preference is return `(numeric_limits<size_type>::max() / sizeof(value_type)` since this is the max number of elements that a string_view can hold. Yeah, I agree now. I was using `- sizeof(*this)` as some sort of poor man's way of removing the current `string_view`'s memory usage from the lot. However, this is entirely misguided -- re-reading the definition of `size_t`: `std::size_t` can store the maximum size of a theoretically possible object of any type (including array). It doesn't have anything to do with the amount of memory usable on a system, it's only the maximum size of a single object. So yeah, I agree with your suggested definition.
libcxx/test/std/strings/string.view/string.view.capacity/capacity.pass.cpp
51	Yeah, agreed too -- this was basically the same misconception I explained above. Will fix this, thanks for indirectly teaching me something :-).

Apply reviewer feedback. Thanks!

Harbormaster completed remote builds in B135949: Diff 389624.Nov 24 2021, 4:00 PM

FWIW, I'm still of the (very firmly held) opinion that this patch is unnecessary. max_size() is always just an upper bound on the size of the sequence. Lowering that upper bound

might introduce actual factual errors, e.g. if we get the bound wrong — whereas size_t(-1) can never be wrong
will never help any user of libc++, because a loose upper bound (as returned by max_size) is never useful and always useless
is no closer to being "accurate," because it is just as impossible in practice to have a string_view of length 9223372036854775807 or 4611686018427387903, as to have a string_view of length 18446744073709551615. These are all equally lying numbers, and there's no point pretending that any one of them is "more useful" or "more accurate" than any other.

In D114395#3159050, @Quuxplusone wrote:

FWIW, I'm still of the (very firmly held) opinion that this patch is unnecessary. max_size() is always just an upper bound on the size of the sequence. Lowering that upper bound

might introduce actual factual errors, e.g. if we get the bound wrong — whereas size_t(-1) can never be wrong

will never help any user of libc++, because a loose upper bound (as returned by max_size) is never useful and always useless

is no closer to being "accurate," because it is just as impossible in practice to have a string_view of length 9223372036854775807 or 4611686018427387903, as to have a string_view of length 18446744073709551615. These are all equally lying numbers, and there's no point pretending that any one of them is "more useful" or "more accurate" than any other.

Consider a 32 bit system. numeric_limits<size_t>::max() is 4294967295.
Now consider: string_view(p, 3'000'000'000); i.e, a string view that encompasses 3GB of RAM.

That works today. I will concede that if you try to calculate distance(begin(), end()) you get an overflow, but all the other calls work fine. You can enumerate the elements, you can resize, etc.

In D114395#3159050, @Quuxplusone wrote:

FWIW, I'm still of the (very firmly held) opinion that this patch is unnecessary. max_size() is always just an upper bound on the size of the sequence. Lowering that upper bound

might introduce actual factual errors, e.g. if we get the bound wrong — whereas size_t(-1) can never be wrong

Which kinds of errors would you be worried about? The sizeof division itself isn't so complicated that I'd be worried about it being wrong.

is no closer to being "accurate," because it is just as impossible in practice to have a string_view of length 9223372036854775807 or 4611686018427387903, as to have a string_view of length 18446744073709551615. These are all equally lying numbers, and there's no point pretending that any one of them is "more useful" or "more accurate" than any other.

It's a bit unfortunate though that a basic_string_view<char16_t> of size 9223372036854775808 has a memory size of 0 bytes on a 64-bit system. (Or does it? The problem is that unsigned overflow is well-defined.)

Sure, the other numbers are also highly suspicious, but there are different kinds of wrong.

libcxx/include/string_view
334	Isn't the point of `max_size` to return the maximal size so that address or memory consumption computation doesn't wrap around or overflow? So I think this change is correct. Surely `max_size` is more relevant in (owning) containers, but staying below should guarantee that `size() * sizeof(value_type)` doesn't overflow. Do any of the other methods (e.g. the constructor) need to do something special to preserve the postcondition that `sv.size() <= sv.max_size()`? Before this change, the postcondition was vacuously true; now, it could conceivably be false, and maybe we need to check for that? One might add an assertion to the `basic_string_view(const _CharT*, size_type)` constructor, but it'll probably not find anything relevant.

@aaronpuchert wrote:

Which kinds of errors would you be worried about? The sizeof division itself isn't so complicated that I'd be worried about it being wrong.

Well, the original PR had had a division-and-subtraction, and then was updated to just a division when the original math turned out to have been factually wrong. https://reviews.llvm.org/D114395?vs=389034&id=389624#toc This follows the "Less code = less bugs" principle: division+subtraction turned out to have been wronger than division, and I'm just saying that division in turn is potentially wronger than no-division (and cannot possibly be righter than no-division, since no-division is a factually correct upper bound by definition).

@mclow.lists writes:

Consider a 32 bit system... string_view(p, 3'000'000'000) ... That works today.

Yes, and it will keep working after any of these proposed changes, right? We just need to maintain string_view::max_size() > 3000000000.
The interesting case for this PR is u16string_view(p, 1'500'000'000)... which is still satisfied by both the existing size_t(-1) and the proposed size_t(-1) / sizeof(char16_t). So the question is just whether any libc++ user will be looking at u16string_view::max_size() and making decisions based on its value, and if so, what value will lead to them making better decisions — size_t(-1), or size_t(-1)-divided-by-something.

• Quuxplusone added inline comments.Nov 29 2021, 2:00 PM

libcxx/include/string_view
334	Do any of the other methods (e.g. the constructor) need to do something special to preserve the postcondition that `sv.size() <= sv.max_size()`? Actually, I'm now reasonably confident that we should not assert or throw in this situation. It would even be worthwhile to add a unit test for this: std::u16string_view sv(nullptr, size_t(-1)); // all preconditions are met; this must work assert(sv.size() == size_t(-1)); It's true that accessing `sv[42]` would be UB at this point, but I think the Standard is 100% clear about the values that `sv`'s data members must have, after that constructor call. However, I admit that this is irrelevant to what value we return for `max_size()`. It's OK for `sv.size() > sv.max_size()`. (`string_view` is different from `string` in this respect.)

The definition of max_size() is the maximum number of elements that a string_view can hold.
(Actual definition: Returns: The largest possible number of char-like objects that can be referred to by a basic_string_view.)

Also, numeric_limits<size_t>::max() is the maximum size (in bytes) of any data structure.

My argument is that those two imply that sv.max_size() * sizeof(sv::value_type) (pardon the fak-ey code) should:

Not overflow.
Be no larger than numeric_limits<size_t>::max()

Interestingly enough:

libstdc++ returns: (numeric_limits<size_t>::max() - epsilon)/sizeof(Elem)
MSVC returns: min(numeric_limits<ptrdiff_t>::max(), numeric_limits<size_t>::max()/sizeof(Elem))

In D114395#3159527, @mclow.lists wrote:

The definition of max_size() is the maximum number of elements that a string_view can hold.
(Actual definition: Returns: The largest possible number of char-like objects that can be referred to by a basic_string_view.)

Also, numeric_limits<size_t>::max() is the maximum size (in bytes) of any data structure.

My argument is that those two imply that sv.max_size() * sizeof(sv::value_type) (pardon the fak-ey code) should:

Not overflow.

Be no larger than numeric_limits<size_t>::max()

This is what sways me. I think we all agree that max_size() is kind of a ridiculous method. However, the current definition is more wrong than the one proposed in this patch based on the above criteria, which I think is good. Well, I don't really care that much, but I think it's no wronger than what we are doing anyways.

Accepting as libc++ based on Marshall's review.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 6 2021, 10:53 AM

Closed by commit rG2db67e97712e: [libc++] Fix the return value of max_size() (authored by ldionne). · Explain Why

This revision was automatically updated to reflect the committed changes.

ldionne added a commit: rG2db67e97712e: [libc++] Fix the return value of max_size().

Revision Contents

Path

Size

libcxx/

include/

string_view

2 lines

test/

std/

strings/

string.view/

string.view.capacity/

capacity.pass.cpp

11 lines

Diff 389624

libcxx/include/string_view

Show First 20 Lines • Show All 325 Lines • ▼ Show 20 Lines	#endif
// [string.view.capacity], capacity		// [string.view.capacity], capacity
_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY		_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY
size_type size() const _NOEXCEPT { return __size; }		size_type size() const _NOEXCEPT { return __size; }

_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY		_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY
size_type length() const _NOEXCEPT { return __size; }		size_type length() const _NOEXCEPT { return __size; }

_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY		_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY
size_type max_size() const _NOEXCEPT { return numeric_limits<size_type>::max(); }		size_type max_size() const _NOEXCEPT { return numeric_limits<size_type>::max() / sizeof(value_type); }
		QuuxplusoneUnsubmitted Not Done Reply Inline Actions (1) This seems unnecessarily fiddly; I don't really see anything wrong with the original. (2) If we're going to do this, let's add a line break somewhere. (3) Since `sv.size()` cannot physically return anything bigger than `PTRDIFF_MAX`, you probably want to use `ptrdiff_t` instead of `size_type` here... or maybe return `std::min(PTRDIFF_MAX, SIZE_MAX / sizeof(value_type))`? But again, this is needlessly fiddly and I see nothing wrong with just returning `SIZE_MAX` unconditionally. Literally nobody should ever be calling this method. Do any of the other methods (e.g. the constructor) need to do something special to preserve the postcondition that `sv.size() <= sv.max_size()`? Before this change, the postcondition was vacuously true; now, it could conceivably be false, and maybe we need to check for that? Quuxplusone: (1) This seems unnecessarily fiddly; I don't really see anything wrong with the original. (2)…
		mclow.listsUnsubmitted Done Reply Inline Actions My preference is `return (numeric_limits<size_type>::max() / sizeof(value_type)` since this is the max number of elements that a `string_view` can hold. As for "literally nobody should be calling this method", I agree, with the exception of people writing test suites. mclow.lists: My preference is `return (numeric_limits<size_type>::max() / sizeof(value_type)` since this is…
		ldionneAuthorUnsubmitted Done Reply Inline Actions @Quuxplusone (1) This seems unnecessarily fiddly; I don't really see anything wrong with the original. If the character type is more than 1 byte long, the original is wrong. @mclow.lists My preference is return `(numeric_limits<size_type>::max() / sizeof(value_type)` since this is the max number of elements that a string_view can hold. Yeah, I agree now. I was using `- sizeof(this)` as some sort of poor man's way of removing the current `string_view`'s memory usage from the lot. However, this is entirely misguided -- re-reading the definition of `size_t`: `std::size_t` can store the maximum size of a theoretically possible object of any type (including array). It doesn't have anything to do with the amount of memory usable on a system, it's only the maximum size of a single object. So yeah, I agree with your suggested definition. ldionne:* @Quuxplusone > (1) This seems unnecessarily fiddly; I don't really see anything wrong with…
		aaronpuchertUnsubmitted Not Done Reply Inline Actions Isn't the point of `max_size` to return the maximal size so that address or memory consumption computation doesn't wrap around or overflow? So I think this change is correct. Surely `max_size` is more relevant in (owning) containers, but staying below should guarantee that `size() * sizeof(value_type)` doesn't overflow. Do any of the other methods (e.g. the constructor) need to do something special to preserve the postcondition that `sv.size() <= sv.max_size()`? Before this change, the postcondition was vacuously true; now, it could conceivably be false, and maybe we need to check for that? One might add an assertion to the `basic_string_view(const _CharT, size_type)` constructor, but it'll probably not find anything relevant. aaronpuchert:* Isn't the point of `max_size` to return the maximal size so that address or memory consumption…
		QuuxplusoneUnsubmitted Not Done Reply Inline Actions Do any of the other methods (e.g. the constructor) need to do something special to preserve the postcondition that `sv.size() <= sv.max_size()`? Actually, I'm now reasonably confident that we should not assert or throw in this situation. It would even be worthwhile to add a unit test for this: std::u16string_view sv(nullptr, size_t(-1)); // all preconditions are met; this must work assert(sv.size() == size_t(-1)); It's true that accessing `sv[42]` would be UB at this point, but I think the Standard is 100% clear about the values that `sv`'s data members must have, after that constructor call. However, I admit that this is irrelevant to what value we return for `max_size()`. It's OK for `sv.size() > sv.max_size()`. (`string_view` is different from `string` in this respect.) Quuxplusone: > Do any of the other methods (e.g. the constructor) need to do something special to preserve…

_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR		_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
bool empty() const _NOEXCEPT { return __size == 0; }		bool empty() const _NOEXCEPT { return __size == 0; }

// [string.view.access], element access		// [string.view.access], element access
_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY		_LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY
const_reference operator[](size_type __pos) const _NOEXCEPT {		const_reference operator[](size_type __pos) const _NOEXCEPT {
return _LIBCPP_ASSERT(__pos < size(), "string_view[] index out of bounds"), __data[__pos];		return _LIBCPP_ASSERT(__pos < size(), "string_view[] index out of bounds"), __data[__pos];
▲ Show 20 Lines • Show All 582 Lines • Show Last 20 Lines

libcxx/test/std/strings/string.view/string.view.capacity/capacity.pass.cpp

Show All 10 Lines
// [string.view.capacity], capacity		// [string.view.capacity], capacity
// constexpr size_type size() const noexcept;		// constexpr size_type size() const noexcept;
// constexpr size_type length() const noexcept;		// constexpr size_type length() const noexcept;
// constexpr size_type max_size() const noexcept;		// constexpr size_type max_size() const noexcept;
// constexpr bool empty() const noexcept;		// constexpr bool empty() const noexcept;

#include <string_view>		#include <string_view>
#include <cassert>		#include <cassert>
		#include <limits>

#include "test_macros.h"		#include "test_macros.h"

template<typename SV>		template<typename SV>
void test1 () {		void test1 () {
#if TEST_STD_VER > 11		#if TEST_STD_VER > 11
{		{
constexpr SV sv1;		constexpr SV sv1;
Show All 10 Lines	#endif
ASSERT_NOEXCEPT(sv1.empty());		ASSERT_NOEXCEPT(sv1.empty());
ASSERT_NOEXCEPT(sv1.max_size());		ASSERT_NOEXCEPT(sv1.max_size());
ASSERT_NOEXCEPT(sv1.length());		ASSERT_NOEXCEPT(sv1.length());
assert ( sv1.size() == 0 );		assert ( sv1.size() == 0 );
assert ( sv1.empty());		assert ( sv1.empty());
assert ( sv1.size() == sv1.length());		assert ( sv1.size() == sv1.length());
assert ( sv1.max_size() > sv1.size());		assert ( sv1.max_size() > sv1.size());
}		}

		// Sanity check max_size() -- a string_view can't store more bytes than a single object
		// can contain. Any implementation that fails this check is certainly lying.
		{
		typedef typename SV::value_type CharT;
		typedef typename SV::size_type Size;
		mclow.listsUnsubmitted Done Reply Inline Actions this is just not true, IMHO. The sequence of `value_type`s that a `string_view`s pointer points at can (theoretically) be up to `size_t` bytes long. mclow.lists: this is just not true, IMHO. The sequence of `value_type`s that a `string_view`s pointer…
		ldionneAuthorUnsubmitted Done Reply Inline Actions Yeah, agreed too -- this was basically the same misconception I explained above. Will fix this, thanks for indirectly teaching me something :-). ldionne: Yeah, agreed too -- this was basically the same misconception I explained above. Will fix this…
		SV sv;
		assert(sv.max_size() <= std::numeric_limits<Size>::max() / sizeof(CharT));
		LIBCPP_ASSERT(sv.max_size() == std::numeric_limits<Size>::max() / sizeof(CharT));
		}
}		}

template<typename CharT>		template<typename CharT>
void test2 ( const CharT *s, size_t len ) {		void test2 ( const CharT *s, size_t len ) {
{		{
std::basic_string_view<CharT> sv1 ( s );		std::basic_string_view<CharT> sv1 ( s );
assert ( sv1.size() == len );		assert ( sv1.size() == len );
assert ( sv1.data() == s );		assert ( sv1.data() == s );
▲ Show 20 Lines • Show All 55 Lines • Show Last 20 Lines