This is an archive of the discontinued LLVM Phabricator instance.

Differential D143349

[libc++] Fix UTF-8 decoding in codecvts. Fix #60177.
Abandoned · Public

Authored by dimztimz on Feb 5 2023, 2:09 PM.

Details

Reviewers

ldionne
Mordante

Group Reviewers

Restricted Project

Summary

This patch fixes one case where the decoding member function `in()` returned `partial` instead of `error`. It also adds a large test suite that tests conversions between UTF-8 and other encodings; the test suite covers this bug.

Diff Detail

Event Timeline

dimztimz created this revision. Feb 5 2023, 2:09 PM

Herald added a project: Restricted Project. Feb 5 2023, 2:09 PM

dimztimz requested review of this revision. Feb 5 2023, 2:09 PM

Herald added 1 blocking reviewer(s): Restricted Project. Feb 5 2023, 2:09 PM

Herald added a subscriber: libcxx-commits.

Harbormaster completed remote builds in B211956: Diff 494946. Feb 5 2023, 2:17 PM

Apply clang-format.

Harbormaster completed remote builds in B211959: Diff 494951. Feb 5 2023, 2:46 PM

Apply clang-format a second time.

Harbormaster completed remote builds in B211962: Diff 494953. Feb 5 2023, 3:11 PM

Replace non-ASCII characters in strings with escape codes.

Harbormaster completed remote builds in B211969: Diff 494960. Feb 5 2023, 4:08 PM

Thanks for working on this! I just did a quick scan over the code. I really want to review it after the formatting changes are undone and the CI passes.

libcxx/src/locale.cpp
2024–2052	Can you undo the formatting changes in this hunk? It makes finding the real changes quite hard. (I know the format CI will probably complain about it, but you can ignore that. We are tuning the CI to not give these unwanted messages in the future.)
libcxx/test/std/localization/codecvt_unicode.h
9	Please add include guards.
30	There's no real reason to use trailing return types here.
36	We normally don't do this, it doesn't improve the readability of the code.
37	Just to improve the readability.
55	Please don't use `auto` here, this does not match the LLVM coding style.
libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char16_t_out.pass.cpp
32	I'm not fond of this include path; it feels quite fragile. I think it would be better to move the code to the `test/support` directory; then the suggestion above works. (This is the same location where `test_macros.h` resides.)
libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_in.pass.cpp
275	This is the preferred style. For the compilers we support, this works in C++03 mode. You could even consider removing the entire `typedef`, since it's only used once.

tahonermann added a subscriber: tahonermann. Feb 6 2023, 6:17 AM

Fix tests for C++03.

dimztimz marked 4 inline comments as done. Feb 6 2023, 6:56 AM

This comment was removed by dimztimz.

libcxx/test/std/localization/codecvt_unicode.h
37	Just to improve the readability.

dimztimz marked an inline comment as done. Feb 6 2023, 7:00 AM

Harbormaster completed remote builds in B212085: Diff 495109. Feb 6 2023, 9:38 AM

Resolve cosmetic issues.

dimztimz marked 3 inline comments as done. Feb 6 2023, 10:34 AM

@Mordante I think now it is ready to be reviewed. I've undone the clang-format. As for the CI, everything passes except for "Apple back deployment" which I don't know what it is.

Patch with full context. I forgot the CLI parameter -U999999 when I was generating my previous patch.

dimztimz marked an inline comment as done. Feb 6 2023, 1:28 PM

Harbormaster completed remote builds in B212199: Diff 495262. Feb 6 2023, 3:22 PM

Test codecvts with char8_t, too. Deal with Apple back-deployment and properly mark the test with XFAIL.

Harbormaster completed remote builds in B212884: Diff 496223. Feb 9 2023, 1:26 PM

Non-ASCII chars.

Fix for Windows.

Harbormaster completed remote builds in B212890: Diff 496235. Feb 9 2023, 3:39 PM

Sorry for the late review, but I was quite busy last week.
Thanks a lot for the fixes. I really like the additional unit tests!

Several minor issues with the patch.

libcxx/test/std/localization/codecvt_unicode.pass.cpp
21 ↗	(On Diff #496235)	Please have one declaration per line.
34 ↗	(On Diff #496235)	Can you use `std::array` instead? This is available on all platforms where we support C++03.
70 ↗	(On Diff #496235)	What is the difference between this part of the test and the one on line 52? Please add some comments.
111–115 ↗	(On Diff #496235)	I think this order of the tests improves readability; the same goes for the other tests. Now the "too small buffer" case gradually grows to the proper size, and then we make the "input too small" case. Maybe even more readable would be a test case where the third CP exactly fits in the output.
148 ↗	(On Diff #496235)	I think it would be good to test a few more corner cases in this test: input values in the surrogate range (U+D800 to U+DBFF and U+DC00 to U+DFFF), and input values outside the valid range (> U+10FFFF).
175 ↗	(On Diff #496235)	What's the difference between an ASCII byte and an invalid byte? Both are just invalid due to not having the bit pattern `10xxxxxx`, right?
706 ↗	(On Diff #496235)	Can you make sure all these blocks have comments? The tests are not too easy to read; without comments I really have a hard time validating the test. Especially since you use the surrogate values here: are you testing that the surrogate values fail, or that the input is malformed in other ways?
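
The corner cases requested in the comment on line 148 (surrogates and values above U+10FFFF) can be stated as a single predicate. The following is only a sketch for orientation; `is_unicode_scalar_value` is a hypothetical helper name and is not part of the patch or of libc++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): true if `cp` is a Unicode scalar
// value, i.e. at most U+10FFFF and not in the surrogate range U+D800..U+DFFF.
constexpr bool is_unicode_scalar_value(std::uint32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

A UTF-32 test along these lines would presumably feed values on both sides of each limit (U+D7FF, U+D800, U+DFFF, U+E000, U+10FFFF, U+110000) to `out()` and expect `error` exactly where the predicate is false.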

dimztimz added inline comments. Feb 13 2023, 4:16 AM

libcxx/test/std/localization/codecvt_unicode.pass.cpp
34 ↗	(On Diff #496235)	`std::array` is not a good fit in this case for three reasons: there is no inference of size; it does not play as well with string literals; and, most importantly, in C++03 the member function `size()` is not constexpr.
70 ↗	(On Diff #496235)	This one calls with the full out-buffer; see below: `out, std::end(out)`.
111–115 ↗	(On Diff #496235)	I think this is subjective; you can give arguments for a few different orderings.
175 ↗	(On Diff #496235)	Well, in this test case there is no difference. But in general, if your aim is to fully decode a UTF-8 string, then all valid sequences must be treated as valid, and any erroneous bytes between them should be either skipped, replaced with a replacement character, or reported upwards in the call chain (or some combination of these). The ASCII byte breaks the original sequence but creates a new, smaller valid sequence. To reach it, once you receive `error`, you can advance your input pointer by one and call `in()` again to check whether there is another valid sequence further in the string.
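
The recovery strategy described in the reply on line 175 (advance by one byte after `error` and call `in()` again) might look roughly like the sketch below. The names `utf8_to_utf32` and `decode_lossy` are hypothetical and introduced only for illustration; the sketch assumes the standard `std::codecvt<char32_t, char, std::mbstate_t>` facet (deprecated since C++20) and simply skips erroneous bytes rather than replacing them.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// std::codecvt has a protected destructor; this hypothetical wrapper makes the
// facet usable as a plain local object.
struct utf8_to_utf32 : std::codecvt<char32_t, char, std::mbstate_t> {
  ~utf8_to_utf32() {}
};

// Hypothetical helper: decodes `input` as UTF-8 and skips one byte after every
// `error` result. `skipped` receives the number of bytes skipped this way.
inline std::u32string decode_lossy(const std::string& input, std::size_t& skipped) {
  utf8_to_utf32 cvt;
  // One code point per input byte is an upper bound for the output size.
  std::u32string out(input.size(), U'\0');
  const char* in_next = input.data();
  const char* const in_end = input.data() + input.size();
  char32_t* out_next = &out[0];
  char32_t* const out_end = &out[0] + out.size();
  skipped = 0;
  while (in_next != in_end) {
    std::mbstate_t state = std::mbstate_t();
    const char* from = in_next;
    char32_t* to = out_next;
    std::codecvt_base::result res =
        cvt.in(state, from, in_end, in_next, to, out_end, out_next);
    if (res != std::codecvt_base::error)
      break; // ok, or partial because the input ends in a truncated sequence
    ++in_next; // step over the offending byte and try again
    ++skipped;
  }
  out.resize(static_cast<std::size_t>(out_next - &out[0]));
  return out;
}
```

For the input bytes `EA AA 62`, the first call should report `error` at `EA`, the second at the stray continuation byte `AA`, and the third should decode the ASCII byte, which is the "new, smaller valid sequence" the reply refers to.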

Add more tests and comments

dimztimz marked 6 inline comments as done. Feb 17 2023, 11:19 AM

The changes look good to me. It took me a while to convince myself that the intended behavioral change is correct, but I eventually concluded that the changes match the intent in [locale.codecvt.virtuals]p5 (http://eel.is/c++draft/locale.codecvt#virtuals-5).

The tests and test methodology likewise look good to me. One suggestion to consider: In my own testing, I like to test the boundaries of each valid encoding range (see https://github.com/tahonermann/text_view/blob/master/test/test-encodings.cpp#L1333-L1357) to ensure coverage for all well-formed code unit sequences. Likewise, it can be useful to exercise that an error is produced for ill-formed code unit sequences just outside each of those boundaries.
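
As an illustration of the boundary-testing idea, the sketch below lists the first and last code point of each UTF-8 length class (plus the edges of the surrogate gap) and a small encoder that produces the corresponding byte sequences. It is not taken from text_view or from this patch; `encode_utf8` and `utf8_range_boundaries` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical table: boundaries of the well-formed UTF-8 ranges.
const std::uint32_t utf8_range_boundaries[] = {
    0x0000,  0x007F,   // 1-byte sequences
    0x0080,  0x07FF,   // 2-byte sequences
    0x0800,  0xD7FF,   // 3-byte sequences below the surrogate range
    0xE000,  0xFFFF,   // 3-byte sequences above the surrogate range
    0x10000, 0x10FFFF, // 4-byte sequences
};

// Hypothetical helper: encodes a Unicode scalar value as UTF-8. Returns an
// empty string for surrogates and for values above U+10FFFF.
inline std::string encode_utf8(std::uint32_t cp) {
  std::string s;
  if (cp < 0x80) {
    s += static_cast<char>(cp);
  } else if (cp < 0x800) {
    s += static_cast<char>(0xC0 | (cp >> 6));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return s; // surrogates are not encodable
    s += static_cast<char>(0xE0 | (cp >> 12));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp <= 0x10FFFF) {
    s += static_cast<char>(0xF0 | (cp >> 18));
    s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return s;
}
```

A test could run each encoded boundary through `in()` expecting `ok`, and then perturb one byte to step just outside the range (for example `E0 9F BF` or `F4 90 80 80`) expecting `error`.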

Harbormaster completed remote builds in B214470: Diff 498456. Feb 17 2023, 4:08 PM

I like @tahonermann's suggestion to test the edge cases.

libcxx/test/std/localization/codecvt_unicode.pass.cpp
111–115 ↗	(On Diff #496235)	I agree it's subjective, it's just what feels easier for me. I'm concerned that the test is not easy to understand, even with the comments. I'm aware of the problem domain and what you are testing. Even with that knowledge I had issues understanding the test. So I fear it will be worse for people not too familiar with UTF-8 encoding.
175 ↗	(On Diff #496235)	Fair point. I think it would be good to mention that the ASCII byte is a valid code point encoded as a single code unit, since that is what actually matters. The test would give the same result if the code unit were the start of a multibyte sequence, right? (Except then the next code unit might be invalid again.)

In D143349#4137640, @Mordante wrote:

> I like @tahonermann's suggestion to test the edge cases.

That can be done as a separate patch after this one gets accepted. One has to think about how to incorporate that testing framework into this one; there is no straightforward way. That takes time. This testsuite is pretty comprehensive on its own. We should massage this one until it's ready to be merged, and after that larger changes can be made.

libcxx/test/std/localization/codecvt_unicode.pass.cpp
111–115 ↗	(On Diff #496235)	The problem lies in the specification for `std::codecvt`: it is underspecified and hard to understand. Everyone will have the same hard time and there is no way around it. One has to reread the specs multiple times, and after that the tests should be easier to read. Maybe I can add more comments here; what do you think? Or do you want me to change the order of the test cases?
175 ↗	(On Diff #496235)	I did not understand you here.

> That can be done as a separate patch after this one gets accepted. One has to think about how to incorporate that testing framework into this one; there is no straightforward way. That takes time. This testsuite is pretty comprehensive on its own. We should massage this one until it's ready to be merged, and after that larger changes can be made.

That rationale produces a different response for me. Changing testing frameworks is tricky, as it is easy to inadvertently lose coverage in the process. I see that as a reason to design the testing framework to suit the eventual needs (when known) up front.

In D143349#4142982, @tahonermann wrote:

>> That can be done as a separate patch after this one gets accepted. One has to think about how to incorporate that testing framework into this one; there is no straightforward way. That takes time. This testsuite is pretty comprehensive on its own. We should massage this one until it's ready to be merged, and after that larger changes can be made.

> That rationale produces a different response for me. Changing testing frameworks is tricky, as it is easy to inadvertently lose coverage in the process. I see that as a reason to design the testing framework to suit the eventual needs (when known) up front.

I find your concerns completely unjustified. You can always send a patch with tests in a completely separate file. I encourage you to do it. I can't do your work.

This patch is supposed to be a bugfix first, and a testsuite second, and a pretty good one too.

In D143349#4143309, @dimztimz wrote:

> I find your concerns completely unjustified.

You are under no obligation to agree with them.

> You can always send a patch with tests in a completely separate file. I encourage you to do it. I can't do your work.

I'm not sure what you are attributing as being "my work", nor why you would consider it my obligation. Code review is motivated by a desire to maximize quality. If you think a suggestion is a bad idea, out of scope, something you don't have time for, or just something you don't want to do, that is certainly OK.

> This patch is supposed to be a bugfix first, and a testsuite second, and a pretty good one too.

Indeed, and thank you for it. The bug that you have proposed a fix for might have been avoided had more extensive test coverage been in place. I presume you are a user of these interfaces (I am not) and therefore have a desire for them to work reliably. My suggestion was meant to help fill in additional testing gaps using the infrastructure you are now offering, in the hope that doing so would identify additional defects or prevent regressions in the future (which I presume you would benefit from). If you don't find that motivating, that is OK. The libc++ maintainers can (and will) determine whether they are sufficiently motivated to accept any future burden of maintaining and/or improving what you have offered, or whether they would like additional changes before accepting (I am not a libc++ maintainer).

More comments. Test for surrogates in UTF-32.

dimztimz marked 3 inline comments as done. Mar 1 2023, 11:23 AM

dimztimz added inline comments.

libcxx/test/std/localization/codecvt_unicode.pass.cpp
175 ↗	(On Diff #496235)	I added additional comments here with my latest patch and I think it explains the situation much better.

Harbormaster completed remote builds in B216784: Diff 501599. Mar 1 2023, 11:33 AM

Someone should process the issue on GitHub (https://github.com/llvm/llvm-project/issues/60177); it's still sitting there tagged as a new issue.

Improve surrogate test for UTF-32

Harbormaster completed remote builds in B216850: Diff 501686. Mar 1 2023, 5:18 PM

I added some minor suggested edits, but otherwise, I think this is fine to accept.

libcxx/test/std/localization/codecvt_unicode.pass.cpp
46 ↗	(On Diff #501686)	This depends on the ordinary literal encoding being UTF-8 and that is not guaranteed (note that people are working on Clang's support for non-ASCII based operating systems). The suggested edit avoids that dependency.
100 ↗	(On Diff #501686)
158 ↗	(On Diff #501686)
174 ↗	(On Diff #501686)
301 ↗	(On Diff #501686)
336 ↗	(On Diff #501686)
387 ↗	(On Diff #501686)
454 ↗	(On Diff #501686)
507 ↗	(On Diff #501686)
570 ↗	(On Diff #501686)
713 ↗	(On Diff #501686)
748 ↗	(On Diff #501686)
806 ↗	(On Diff #501686)
889 ↗	(On Diff #501686)
942 ↗	(On Diff #501686)
991 ↗	(On Diff #501686)
1145 ↗	(On Diff #501686)
1180 ↗	(On Diff #501686)
1225 ↗	(On Diff #501686)
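
The suggested edits themselves are not preserved in this archive. One way to avoid the dependency on the ordinary literal encoding mentioned in the comment on line 46 would be to spell the UTF-8 input as explicit code units, as in the sketch below. This is an assumption about the intent, not the reviewer's actual suggestion; `in_bytes` is a hypothetical name, and `array_size` is redefined here only to keep the snippet self-contained.

```cpp
#include <cassert>
#include <cstddef>

template <class T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) {
  return N;
}

// U+0062, U+0448, U+AAAA and U+10AAAA written as explicit UTF-8 code units,
// so the bytes are the same regardless of the ordinary literal encoding.
const char in_bytes[] = "\x62\xD1\x88\xEA\xAA\xAA\xF4\x8A\xAA\xAA";

// 1 + 2 + 3 + 4 code units plus the terminating null character.
static_assert(array_size(in_bytes) == 11, "");
```

The expected UTF-32 side can stay a `U"..."` literal with universal character names, since its code unit values do not depend on the ordinary literal encoding.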

@dimztimz do you need anything in order to make progress on this? It looks like there are a few comments to address; then this can be rebased, and it seems like folks were happy with the patch.

dimztimz abandoned this revision. Oct 1 2023, 4:34 AM

Note this was moved to https://github.com/llvm/llvm-project/pull/68442

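Revision Contents

Path	Size
libcxx/src/locale.cpp	283 lines
libcxx/test/std/localization/codecvt_unicode.h	1155 lines
libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char16_t_in.pass.cpp	5 lines
libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char16_t_out.pass.cpp	4 lines
libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char32_t_in.pass.cpp	5 lines
libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char32_t_out.pass.cpp	4 lines
libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_in.pass.cpp	24 lines
libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_out.pass.cpp	11 lines
libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_utf16_in.pass.cpp	11 lines
libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_utf16_out.pass.cpp	11 lines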

Diff 494951

libcxx/src/locale.cpp

@@ 2,015 lines hidden @@ for (; frm_nxt < frm_end && to_nxt < to_end; ++to_nxt)
          return codecvt_base::error;
      uint16_t t = static_cast<uint16_t>(((c1 & 0x1F) << 6) | (c2 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;
      frm_nxt += 2;
  }
  else if (c1 < 0xF0)
  {
-     if (frm_end-frm_nxt < 3)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xE0:
          if ((c2 & 0xE0) != 0xA0)
              return codecvt_base::error;
          break;
      case 0xED:
          if ((c2 & 0xE0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
      if ((c3 & 0xC0) != 0x80)
          return codecvt_base::error;
      uint16_t t = static_cast<uint16_t>(((c1 & 0x0F) << 12)
                                       | ((c2 & 0x3F) << 6)
                                       | (c3 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;

Mordante (inline comment, Done): Can you undo the formatting changes in this hunk? It makes finding the real changes quite hard. (I know the format CI will probably complain about it, but you can ignore that. We are tuning the CI to not give these unwanted messages in the future.)

      frm_nxt += 3;
  }
  else if (c1 < 0xF5)
  {
-     if (frm_end-frm_nxt < 4)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     uint8_t c4 = frm_nxt[3];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xF0:
          if (!(0x90 <= c2 && c2 <= 0xBF))
              return codecvt_base::error;
          break;
      case 0xF4:
          if ((c2 & 0xF0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
-     if ((c3 & 0xC0) != 0x80 || (c4 & 0xC0) != 0x80)
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
+     if ((c3 & 0xC0) != 0x80)
+         return codecvt_base::error;
+     if (frm_end - frm_nxt < 4)
+         return codecvt_base::partial;
+     uint8_t c4 = frm_nxt[3];
+     if ((c4 & 0xC0) != 0x80)
          return codecvt_base::error;
      if (to_end-to_nxt < 2)
          return codecvt_base::partial;
      if ((((c1 & 7UL) << 18) +
          ((c2 & 0x3FUL) << 12) +
          ((c3 & 0x3FUL) << 6) + (c4 & 0x3F)) > Maxcode)
          return codecvt_base::error;
      *to_nxt = static_cast<uint16_t>(
              0xD800
@@ 52 lines hidden @@ for (; frm_nxt < frm_end && to_nxt < to_end; ++to_nxt)
      uint16_t t = static_cast<uint16_t>(((c1 & 0x1F) << 6) | (c2 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = static_cast<uint32_t>(t);
      frm_nxt += 2;
  }
  else if (c1 < 0xF0)
  {
-     if (frm_end-frm_nxt < 3)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xE0:
          if ((c2 & 0xE0) != 0xA0)
              return codecvt_base::error;
          break;
      case 0xED:
          if ((c2 & 0xE0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
      if ((c3 & 0xC0) != 0x80)
          return codecvt_base::error;
      uint16_t t = static_cast<uint16_t>(((c1 & 0x0F) << 12)
                                       | ((c2 & 0x3F) << 6)
                                       | (c3 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = static_cast<uint32_t>(t);
      frm_nxt += 3;
  }
  else if (c1 < 0xF5)
  {
-     if (frm_end-frm_nxt < 4)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     uint8_t c4 = frm_nxt[3];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xF0:
          if (!(0x90 <= c2 && c2 <= 0xBF))
              return codecvt_base::error;
          break;
      case 0xF4:
          if ((c2 & 0xF0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
-     if ((c3 & 0xC0) != 0x80 || (c4 & 0xC0) != 0x80)
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
+     if ((c3 & 0xC0) != 0x80)
+         return codecvt_base::error;
+     if (frm_end - frm_nxt < 4)
+         return codecvt_base::partial;
+     uint8_t c4 = frm_nxt[3];
+     if ((c4 & 0xC0) != 0x80)
          return codecvt_base::error;
      if (to_end-to_nxt < 2)
          return codecvt_base::partial;
      if ((((c1 & 7UL) << 18) +
          ((c2 & 0x3FUL) << 12) +
          ((c3 & 0x3FUL) << 6) + (c4 & 0x3F)) > Maxcode)
          return codecvt_base::error;
      *to_nxt = static_cast<uint32_t>(
              0xD800
@@ 209 lines hidden @@ for (; frm_nxt < frm_end && to_nxt < to_end; ++to_nxt)
                                       | (c2 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;
      frm_nxt += 2;
  }
  else if (c1 < 0xF0)
  {
-     if (frm_end-frm_nxt < 3)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xE0:
          if ((c2 & 0xE0) != 0xA0)
              return codecvt_base::error;
          break;
      case 0xED:
          if ((c2 & 0xE0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
      if ((c3 & 0xC0) != 0x80)
          return codecvt_base::error;
      uint32_t t = static_cast<uint32_t>(((c1 & 0x0F) << 12)
                                       | ((c2 & 0x3F) << 6)
                                       | (c3 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;
      frm_nxt += 3;
  }
  else if (c1 < 0xF5)
  {
-     if (frm_end-frm_nxt < 4)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     uint8_t c4 = frm_nxt[3];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xF0:
          if (!(0x90 <= c2 && c2 <= 0xBF))
              return codecvt_base::error;
          break;
      case 0xF4:
          if ((c2 & 0xF0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
-     if ((c3 & 0xC0) != 0x80 || (c4 & 0xC0) != 0x80)
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
+     if ((c3 & 0xC0) != 0x80)
+         return codecvt_base::error;
+     if (frm_end - frm_nxt < 4)
+         return codecvt_base::partial;
+     uint8_t c4 = frm_nxt[3];
+     if ((c4 & 0xC0) != 0x80)
          return codecvt_base::error;
      uint32_t t = static_cast<uint32_t>(((c1 & 0x07) << 18)
                                       | ((c2 & 0x3F) << 12)
                                       | ((c3 & 0x3F) << 6)
                                       | (c4 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;
      frm_nxt += 4;
@@ 189 lines hidden @@ for (; frm_nxt < frm_end && to_nxt < to_end; ++to_nxt)
                                       | (c2 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;
      frm_nxt += 2;
  }
  else if (c1 < 0xF0)
  {
-     if (frm_end-frm_nxt < 3)
+     if (frm_end - frm_nxt < 2)
          return codecvt_base::partial;
      uint8_t c2 = frm_nxt[1];
-     uint8_t c3 = frm_nxt[2];
-     switch (c1)
-     {
+     switch (c1) {
      case 0xE0:
          if ((c2 & 0xE0) != 0xA0)
              return codecvt_base::error;
          break;
      case 0xED:
          if ((c2 & 0xE0) != 0x80)
              return codecvt_base::error;
          break;
      default:
          if ((c2 & 0xC0) != 0x80)
              return codecvt_base::error;
          break;
      }
+     if (frm_end - frm_nxt < 3)
+         return codecvt_base::partial;
+     uint8_t c3 = frm_nxt[2];
      if ((c3 & 0xC0) != 0x80)
          return codecvt_base::error;
      uint16_t t = static_cast<uint16_t>(((c1 & 0x0F) << 12)
                                       | ((c2 & 0x3F) << 6)
                                       | (c3 & 0x3F));
      if (t > Maxcode)
          return codecvt_base::error;
      *to_nxt = t;
@@ 3,898 lines hidden @@
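
The behavioral change in the hunks above is that each trailing byte is validated as soon as it is available, so an ill-formed prefix yields `error` even when the rest of the sequence has not arrived yet. The standalone sketch below isolates that pattern for the three-byte case. It is not libc++ code: `decode_result` and `check_three_byte` are hypothetical names, and the sketch only classifies the input without producing a code point.

```cpp
#include <cassert>
#include <cstdint>

enum class decode_result { ok, partial, error };

// Hypothetical helper: classifies the 3-byte UTF-8 sequence starting at `p`,
// checking every byte as soon as it is available instead of first waiting for
// all three bytes.
inline decode_result check_three_byte(const unsigned char* p, const unsigned char* end) {
  if (p == end)
    return decode_result::partial;
  const std::uint8_t c1 = p[0];
  if (c1 < 0xE0 || c1 > 0xEF)
    return decode_result::error; // not the lead byte of a 3-byte sequence
  if (end - p < 2)
    return decode_result::partial;
  const std::uint8_t c2 = p[1];
  bool c2_ok;
  if (c1 == 0xE0)
    c2_ok = (c2 & 0xE0) == 0xA0; // 0xA0..0xBF, rejects overlong encodings
  else if (c1 == 0xED)
    c2_ok = (c2 & 0xE0) == 0x80; // 0x80..0x9F, rejects surrogates
  else
    c2_ok = (c2 & 0xC0) == 0x80; // any continuation byte
  if (!c2_ok)
    return decode_result::error; // reported even though byte 3 is missing
  if (end - p < 3)
    return decode_result::partial;
  if ((p[2] & 0xC0) != 0x80)
    return decode_result::error;
  return decode_result::ok;
}
```

With the two bytes `E0 41` as input, the old ordering (length check for three bytes first) would report `partial`, whereas this ordering reports `error`, which matches the case described in the summary.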

libcxx/test/std/localization/codecvt_unicode.h

This file was added.

//===----------------------------------------------------------------------===//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//===----------------------------------------------------------------------===//
#include <algorithm>

Mordante (inline comment, Done): Please add include guards.

#include <locale>
#include <string>
#include <cassert>

struct test_offsets_ok {
  size_t in_size, out_size;
};

struct test_offsets_partial {
  size_t in_size, out_size, expected_in_next, expected_out_next;
};

template <class CharT>
struct test_offsets_error {
  size_t in_size, out_size, expected_in_next, expected_out_next;
  CharT replace_char;
  size_t replace_pos;
};

template <class T, size_t N>
auto constexpr array_size(const T (&)[N]) -> size_t {
  return N;
}

Mordante (inline comment, Done): There's no real reason to use trailing return types here. Suggested edit:

  template <class T, size_t N>
- auto constexpr array_size(const T (&)[N]) -> size_t {
+ size_t constexpr array_size(const T (&)[N]) {
    return N;

template <class CharT>
void utf8_to_utf32_in_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {
  using namespace std;
  // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP

Mordante (inline comment, Done): We normally don't do this; it doesn't improve the readability of the code. Suggested edit:

  void utf8_to_utf32_in_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {
-   using namespace std;
    // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP

  const char in[] = "bш\uAAAA\U0010AAAA";

Mordante (inline comment, Done): Just to improve the readability. Suggested edit:

    using namespace std;
-   // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+   // UTF-8 string of 1-byte code point (CP), 2-byte CP, 3-byte CP and 4-byte CP
    const char in[] = "b\u0448\uAAAA\U0010AAAA";

dimztimz (author, inline reply, Done): > Just to improve the readability.

  const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA";
  CharT exp[array_size(exp_literal)] = {};
  std::copy(begin(exp_literal), end(exp_literal), begin(exp));

  static_assert(array_size(in) == 11, "");
  static_assert(array_size(exp_literal) == 5, "");
  static_assert(array_size(exp) == 5, "");
  assert(char_traits<char>::length(in) == 10);
  assert(char_traits<char32_t>::length(exp_literal) == 4);
  assert(char_traits<CharT>::length(exp) == 4);

  test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 4}};
  for (auto t : offsets) {
    CharT out[array_size(exp) - 1] = {};
    assert(t.in_size <= array_size(in));
    assert(t.out_size <= array_size(out));
    auto state = mbstate_t{};
    auto in_next = (const char*)nullptr;

Mordante (inline comment, Done): Please don't use `auto` here, this does not match the LLVM coding style.

    auto out_next = (CharT*)nullptr;
    auto res = codecvt_base::result();

    res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);
    assert(res == cvt.ok);
    assert(in_next == in + t.in_size);
    assert(out_next == out + t.out_size);
    assert(char_traits<CharT>::compare(out, exp, t.out_size) == 0);
    if (t.out_size < array_size(out))
      assert(out[t.out_size] == 0);
  }

  for (auto t : offsets) {
    CharT out[array_size(exp)] = {};
    assert(t.in_size <= array_size(in));
    assert(t.out_size <= array_size(out));
    auto state = mbstate_t{};
    auto in_next = (const char*)nullptr;
    auto out_next = (CharT*)nullptr;
    auto res = codecvt_base::result();

    res = cvt.in(state, in, in + t.in_size, in_next, out, end(out), out_next);
    assert(res == cvt.ok);
    assert(in_next == in + t.in_size);
    assert(out_next == out + t.out_size);
    assert(char_traits<CharT>::compare(out, exp, t.out_size) == 0);
    if (t.out_size < array_size(out))
      assert(out[t.out_size] == 0);
  }
}

template <class CharT>

void utf8_to_utf32_in_partial(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP

const char in[] = "bш\uAAAA\U0010AAAA";

const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA";

CharT exp[array_size(exp_literal)] = {};

std::copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(in) == 11, "");

static_assert(array_size(exp_literal) == 5, "");

static_assert(array_size(exp) == 5, "");

assert(char_traits<char>::length(in) == 10);

assert(char_traits<char32_t>::length(exp_literal) == 4);

assert(char_traits<CharT>::length(exp) == 4);

test_offsets_partial offsets[] = {

{1, 0, 0, 0}, // no space for first CP

{3, 1, 1, 1}, // no space for second CP

{2, 2, 1, 1}, // incomplete second CP

{2, 1, 1, 1}, // incomplete second CP, and no space for it

{6, 2, 3, 2}, // no space for third CP

{4, 3, 3, 2}, // incomplete third CP

{5, 3, 3, 2}, // incomplete third CP

{4, 2, 3, 2}, // incomplete third CP, and no space for it

{5, 2, 3, 2}, // incomplete third CP, and no space for it

{10, 3, 6, 3}, // no space for fourth CP

{7, 4, 6, 3}, // incomplete fourth CP

{8, 4, 6, 3}, // incomplete fourth CP

{9, 4, 6, 3}, // incomplete fourth CP

{7, 3, 6, 3}, // incomplete fourth CP, and no space for it

{8, 3, 6, 3}, // incomplete fourth CP, and no space for it

{9, 3, 6, 3}, // incomplete fourth CP, and no space for it

};

for (auto t : offsets) {

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.partial);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<CharT>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf8_to_utf32_in_error(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP

const char valid_in[] = "bш\uAAAA\U0010AAAA";

const char32_t exp_literal[] = U"bш\uAAAA\U0010AAAA";

CharT exp[array_size(exp_literal)] = {};

std::copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(valid_in) == 11, "");

static_assert(array_size(exp_literal) == 5, "");

static_assert(array_size(exp) == 5, "");

assert(char_traits<char>::length(valid_in) == 10);

assert(char_traits<char32_t>::length(exp_literal) == 4);

assert(char_traits<CharT>::length(exp) == 4);

test_offsets_error<char> offsets[] = {

// replace leading byte with invalid byte

{1, 4, 0, 0, '\xFF', 0},

{3, 4, 1, 1, '\xFF', 1},

{6, 4, 3, 2, '\xFF', 3},

{10, 4, 6, 3, '\xFF', 6},

// replace first trailing byte with ASCII byte

{3, 4, 1, 1, 'z', 2},

{6, 4, 3, 2, 'z', 4},

{10, 4, 6, 3, 'z', 7},

// replace first trailing byte with invalid byte

{3, 4, 1, 1, '\xFF', 2},

{6, 4, 3, 2, '\xFF', 4},

{10, 4, 6, 3, '\xFF', 7},

// replace second trailing byte with ASCII byte

{6, 4, 3, 2, 'z', 5},

{10, 4, 6, 3, 'z', 8},

// replace second trailing byte with invalid byte

{6, 4, 3, 2, '\xFF', 5},

{10, 4, 6, 3, '\xFF', 8},

// replace third trailing byte

{10, 4, 6, 3, 'z', 9},

{10, 4, 6, 3, '\xFF', 9},

// replace first trailing byte with ASCII byte, also incomplete at end

{5, 4, 3, 2, 'z', 4},

{8, 4, 6, 3, 'z', 7},

{9, 4, 6, 3, 'z', 7},

// replace first trailing byte with invalid byte, also incomplete at end

{5, 4, 3, 2, '\xFF', 4},

{8, 4, 6, 3, '\xFF', 7},

{9, 4, 6, 3, '\xFF', 7},

// replace second trailing byte with ASCII byte, also incomplete at end

{9, 4, 6, 3, 'z', 8},

// replace second trailing byte with invalid byte, also incomplete at end

{9, 4, 6, 3, '\xFF', 8},

};

for (auto t : offsets) {

char in[array_size(valid_in)] = {};

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

char_traits<char>::copy(in, valid_in, array_size(valid_in));

in[t.replace_pos] = t.replace_char;

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.error);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<CharT>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf8_to_utf32_in(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf8_to_utf32_in_ok(cvt);

utf8_to_utf32_in_partial(cvt);

utf8_to_utf32_in_error(cvt);

}

template <class CharT>

void utf32_to_utf8_out_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-32 string of CPs that encode to 1, 2, 3 and 4 bytes in UTF-8

const char32_t in_literal[] = U"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

CharT in[array_size(in_literal)] = {};

copy(begin(in_literal), end(in_literal), begin(in));

static_assert(array_size(in_literal) == 5, "");

static_assert(array_size(in) == 5, "");

static_assert(array_size(exp) == 11, "");

assert(char_traits<char32_t>::length(in_literal) == 4);

assert(char_traits<CharT>::length(in) == 4);

assert(char_traits<char>::length(exp) == 10);

const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {4, 10}};

for (auto t : offsets) {

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<char>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

}

template <class CharT>

void utf32_to_utf8_out_partial(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-32 string of CPs that encode to 1, 2, 3 and 4 bytes in UTF-8

const char32_t in_literal[] = U"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

CharT in[array_size(in_literal)] = {};

copy(begin(in_literal), end(in_literal), begin(in));

static_assert(array_size(in_literal) == 5, "");

static_assert(array_size(in) == 5, "");

static_assert(array_size(exp) == 11, "");

assert(char_traits<char32_t>::length(in_literal) == 4);

assert(char_traits<CharT>::length(in) == 4);

assert(char_traits<char>::length(exp) == 10);

const test_offsets_partial offsets[] = {

{1, 0, 0, 0}, // no space for first CP

{2, 1, 1, 1}, // no space for second CP

{2, 2, 1, 1}, // no space for second CP

{3, 3, 2, 3}, // no space for third CP

{3, 4, 2, 3}, // no space for third CP

{3, 5, 2, 3}, // no space for third CP

{4, 6, 3, 6}, // no space for fourth CP

{4, 7, 3, 6}, // no space for fourth CP

{4, 8, 3, 6}, // no space for fourth CP

{4, 9, 3, 6}, // no space for fourth CP

};

for (auto t : offsets) {

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.partial);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<char>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf32_to_utf8_out_error(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

const char32_t valid_in[] = U"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

static_assert(array_size(valid_in) == 5, "");

static_assert(array_size(exp) == 11, "");

assert(char_traits<char32_t>::length(valid_in) == 4);

assert(char_traits<char>::length(exp) == 10);

test_offsets_error<CharT> offsets[] = {

// replace one CP with 0x110000, the first value above the Unicode maximum U+10FFFF

{4, 10, 0, 0, 0x00110000, 0},

{4, 10, 1, 1, 0x00110000, 1},

{4, 10, 2, 3, 0x00110000, 2},

{4, 10, 3, 6, 0x00110000, 3}};

for (auto t : offsets) {

CharT in[array_size(valid_in)] = {};

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

copy(begin(valid_in), end(valid_in), begin(in));

in[t.replace_pos] = t.replace_char;

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.error);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<char>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf32_to_utf8_out(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf32_to_utf8_out_ok(cvt);

utf32_to_utf8_out_partial(cvt);

utf32_to_utf8_out_error(cvt);

}

template <class CharT>

void test_utf8_utf32_codecvts(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf8_to_utf32_in(cvt);

utf32_to_utf8_out(cvt);

}
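
The `{in_size, out_size}` pairs in the offset tables above follow from how many UTF-8 bytes each code point of the test string occupies. As a rough sketch, the cumulative byte offsets 1, 3, 6 and 10 can be derived as below. The helper names `utf8_len` and `utf8_prefix_bytes` and the array `cps` are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the patch: number of UTF-8 bytes needed
// for one code point (assumes cp <= 0x10FFFF and cp is not a surrogate).
constexpr std::size_t utf8_len(char32_t cp) {
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Total number of UTF-8 bytes occupied by the first n code points of s.
constexpr std::size_t utf8_prefix_bytes(const char32_t* s, std::size_t n) {
  return n == 0 ? 0 : utf8_len(s[n - 1]) + utf8_prefix_bytes(s, n - 1);
}

// The same four code points as the test string, written as numeric values.
constexpr char32_t cps[] = {0x62, 0x448, 0xAAAA, 0x10AAAA};

// These match the in_size column of the UTF-8 -> UTF-32 "ok" table.
static_assert(utf8_prefix_bytes(cps, 0) == 0, "");
static_assert(utf8_prefix_bytes(cps, 1) == 1, "");
static_assert(utf8_prefix_bytes(cps, 2) == 3, "");
static_assert(utf8_prefix_bytes(cps, 3) == 6, "");
static_assert(utf8_prefix_bytes(cps, 4) == 10, "");
```

Any `in_size` that falls strictly between two of these offsets cuts a code point in half, which is what the "incomplete CP" rows of the partial tables exercise.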

template <class CharT>

void utf8_to_utf16_in_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP

const char in[] = "bш\uAAAA\U0010AAAA";

const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";

CharT exp[array_size(exp_literal)] = {};

copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(in) == 11, "");

static_assert(array_size(exp_literal) == 6, "");

static_assert(array_size(exp) == 6, "");

assert(char_traits<char>::length(in) == 10);

assert(char_traits<char16_t>::length(exp_literal) == 5);

assert(char_traits<CharT>::length(exp) == 5);

test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 5}};

for (auto t : offsets) {

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<CharT>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

for (auto t : offsets) {

CharT out[array_size(exp)] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, end(out), out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<CharT>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

}

template <class CharT>

void utf8_to_utf16_in_partial(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP

const char in[] = "bш\uAAAA\U0010AAAA";

const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";

CharT exp[array_size(exp_literal)] = {};

copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(in) == 11, "");

static_assert(array_size(exp_literal) == 6, "");

static_assert(array_size(exp) == 6, "");

assert(char_traits<char>::length(in) == 10);

assert(char_traits<char16_t>::length(exp_literal) == 5);

assert(char_traits<CharT>::length(exp) == 5);

test_offsets_partial offsets[] = {

{1, 0, 0, 0}, // no space for first CP

{3, 1, 1, 1}, // no space for second CP

{2, 2, 1, 1}, // incomplete second CP

{2, 1, 1, 1}, // incomplete second CP, and no space for it

{6, 2, 3, 2}, // no space for third CP

{4, 3, 3, 2}, // incomplete third CP

{5, 3, 3, 2}, // incomplete third CP

{4, 2, 3, 2}, // incomplete third CP, and no space for it

{5, 2, 3, 2}, // incomplete third CP, and no space for it

{10, 3, 6, 3}, // no space for fourth CP

{10, 4, 6, 3}, // no space for fourth CP

{7, 5, 6, 3}, // incomplete fourth CP

{8, 5, 6, 3}, // incomplete fourth CP

{9, 5, 6, 3}, // incomplete fourth CP

{7, 3, 6, 3}, // incomplete fourth CP, and no space for it

{8, 3, 6, 3}, // incomplete fourth CP, and no space for it

{9, 3, 6, 3}, // incomplete fourth CP, and no space for it

{7, 4, 6, 3}, // incomplete fourth CP, and no space for it

{8, 4, 6, 3}, // incomplete fourth CP, and no space for it

{9, 4, 6, 3}, // incomplete fourth CP, and no space for it

};

for (auto t : offsets) {

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.partial);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<CharT>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf8_to_utf16_in_error(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

const char valid_in[] = "bш\uAAAA\U0010AAAA";

const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";

CharT exp[array_size(exp_literal)] = {};

copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(valid_in) == 11, "");

static_assert(array_size(exp_literal) == 6, "");

static_assert(array_size(exp) == 6, "");

assert(char_traits<char>::length(valid_in) == 10);

assert(char_traits<char16_t>::length(exp_literal) == 5);

assert(char_traits<CharT>::length(exp) == 5);

test_offsets_error<char> offsets[] = {

// replace leading byte with invalid byte

{1, 5, 0, 0, '\xFF', 0},

{3, 5, 1, 1, '\xFF', 1},

{6, 5, 3, 2, '\xFF', 3},

{10, 5, 6, 3, '\xFF', 6},

// replace first trailing byte with ASCII byte

{3, 5, 1, 1, 'z', 2},

{6, 5, 3, 2, 'z', 4},

{10, 5, 6, 3, 'z', 7},

// replace first trailing byte with invalid byte

{3, 5, 1, 1, '\xFF', 2},

{6, 5, 3, 2, '\xFF', 4},

{10, 5, 6, 3, '\xFF', 7},

// replace second trailing byte with ASCII byte

{6, 5, 3, 2, 'z', 5},

{10, 5, 6, 3, 'z', 8},

// replace second trailing byte with invalid byte

{6, 5, 3, 2, '\xFF', 5},

{10, 5, 6, 3, '\xFF', 8},

// replace third trailing byte

{10, 5, 6, 3, 'z', 9},

{10, 5, 6, 3, '\xFF', 9},

// replace first trailing byte with ASCII byte, also incomplete at end

{5, 5, 3, 2, 'z', 4},

{8, 5, 6, 3, 'z', 7},

{9, 5, 6, 3, 'z', 7},

// replace first trailing byte with invalid byte, also incomplete at end

{5, 5, 3, 2, '\xFF', 4},

{8, 5, 6, 3, '\xFF', 7},

{9, 5, 6, 3, '\xFF', 7},

// replace second trailing byte with ASCII byte, also incomplete at end

{9, 5, 6, 3, 'z', 8},

// replace second trailing byte with invalid byte, also incomplete at end

{9, 5, 6, 3, '\xFF', 8},

};

for (auto t : offsets) {

char in[array_size(valid_in)] = {};

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

char_traits<char>::copy(in, valid_in, array_size(valid_in));

in[t.replace_pos] = t.replace_char;

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.error);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<CharT>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf8_to_utf16_in(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf8_to_utf16_in_ok(cvt);

utf8_to_utf16_in_partial(cvt);

utf8_to_utf16_in_error(cvt);

}

template <class CharT>

void utf16_to_utf8_out_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-16 string of CPs that encode to 1, 2, 3 and 4 bytes in UTF-8

const char16_t in_literal[] = u"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

CharT in[array_size(in_literal)] = {};

copy(begin(in_literal), end(in_literal), begin(in));

static_assert(array_size(in_literal) == 6, "");

static_assert(array_size(exp) == 11, "");

static_assert(array_size(in) == 6, "");

assert(char_traits<char16_t>::length(in_literal) == 5);

assert(char_traits<char>::length(exp) == 10);

assert(char_traits<CharT>::length(in) == 5);

const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {5, 10}};

for (auto t : offsets) {

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<char>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

}

template <class CharT>

void utf16_to_utf8_out_partial(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-16 string of CPs that encode to 1, 2, 3 and 4 bytes in UTF-8

const char16_t in_literal[] = u"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

CharT in[array_size(in_literal)] = {};

copy(begin(in_literal), end(in_literal), begin(in));

static_assert(array_size(in_literal) == 6, "");

static_assert(array_size(exp) == 11, "");

static_assert(array_size(in) == 6, "");

assert(char_traits<char16_t>::length(in_literal) == 5);

assert(char_traits<char>::length(exp) == 10);

assert(char_traits<CharT>::length(in) == 5);

const test_offsets_partial offsets[] = {

{1, 0, 0, 0}, // no space for first CP

{2, 1, 1, 1}, // no space for second CP

{2, 2, 1, 1}, // no space for second CP

{3, 3, 2, 3}, // no space for third CP

{3, 4, 2, 3}, // no space for third CP

{3, 5, 2, 3}, // no space for third CP

{5, 6, 3, 6}, // no space for fourth CP

{5, 7, 3, 6}, // no space for fourth CP

{5, 8, 3, 6}, // no space for fourth CP

{5, 9, 3, 6}, // no space for fourth CP

{4, 10, 3, 6}, // incomplete fourth CP

{4, 6, 3, 6}, // incomplete fourth CP, and no space for it

{4, 7, 3, 6}, // incomplete fourth CP, and no space for it

{4, 8, 3, 6}, // incomplete fourth CP, and no space for it

{4, 9, 3, 6}, // incomplete fourth CP, and no space for it

};

for (auto t : offsets) {

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.partial);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<char>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf16_to_utf8_out_error(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

const char16_t valid_in[] = u"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

static_assert(array_size(valid_in) == 6, "");

static_assert(array_size(exp) == 11, "");

assert(char_traits<char16_t>::length(valid_in) == 5);

assert(char_traits<char>::length(exp) == 10);

test_offsets_error<CharT> offsets[] = {

// replace a BMP char with a lone surrogate (leading: 0xD800-0xDBFF, trailing: 0xDC00-0xDFFF)

{5, 10, 0, 0, 0xD800, 0},

{5, 10, 0, 0, 0xDBFF, 0},

{5, 10, 0, 0, 0xDC00, 0},

{5, 10, 0, 0, 0xDFFF, 0},

{5, 10, 1, 1, 0xD800, 1},

{5, 10, 1, 1, 0xDBFF, 1},

{5, 10, 1, 1, 0xDC00, 1},

{5, 10, 1, 1, 0xDFFF, 1},

{5, 10, 2, 3, 0xD800, 2},

{5, 10, 2, 3, 0xDBFF, 2},

{5, 10, 2, 3, 0xDC00, 2},

{5, 10, 2, 3, 0xDFFF, 2},

// make the leading surrogate a trailing one

{5, 10, 3, 6, 0xDC00, 3},

{5, 10, 3, 6, 0xDFFF, 3},

// make the trailing surrogate a leading one

{5, 10, 3, 6, 0xD800, 4},

{5, 10, 3, 6, 0xDBFF, 4},

// make the trailing surrogate a BMP char

{5, 10, 3, 6, u'z', 4},

};

for (auto t : offsets) {

CharT in[array_size(valid_in)] = {};

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

copy(begin(valid_in), end(valid_in), begin(in));

in[t.replace_pos] = t.replace_char;

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.error);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<char>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf16_to_utf8_out(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf16_to_utf8_out_ok(cvt);

utf16_to_utf8_out_partial(cvt);

utf16_to_utf8_out_error(cvt);

}

template <class CharT>

void test_utf8_utf16_cvts(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf8_to_utf16_in(cvt);

utf16_to_utf8_out(cvt);

}
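
The UTF-16 expectations above have five code units for four code points because U+10AAAA lies outside the BMP and needs a surrogate pair. A minimal sketch of that arithmetic follows. The helper names `lead_surrogate`, `trail_surrogate` and `is_surrogate` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>

// Hypothetical helpers; both assume 0x10000 <= cp <= 0x10FFFF.
constexpr char16_t lead_surrogate(char32_t cp) {
  return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

constexpr char16_t trail_surrogate(char32_t cp) {
  return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}

// True for any code unit in the surrogate range 0xD800..0xDFFF.
constexpr bool is_surrogate(char16_t u) {
  return u >= 0xD800 && u <= 0xDFFF;
}

// U+10AAAA, the fourth code point of the test string, as a surrogate pair.
static_assert(lead_surrogate(0x10AAAA) == 0xDBEA, "");
static_assert(trail_surrogate(0x10AAAA) == 0xDEAA, "");

// The boundary values used as replacements in the error table are surrogates,
// while the BMP replacement u'z' is not.
static_assert(is_surrogate(0xD800) && is_surrogate(0xDBFF), "");
static_assert(is_surrogate(0xDC00) && is_surrogate(0xDFFF), "");
static_assert(!is_surrogate(u'z'), "");
```

This is why the error table for `out()` swaps in 0xD800, 0xDBFF, 0xDC00 and 0xDFFF: a surrogate that is not part of a well-ordered pair has no UTF-8 encoding.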

template <class CharT>

void utf8_to_ucs2_in_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP

const char in[] = "bш\uAAAA";

const char16_t exp_literal[] = u"bш\uAAAA";

CharT exp[array_size(exp_literal)] = {};

copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(in) == 7, "");

static_assert(array_size(exp_literal) == 4, "");

static_assert(array_size(exp) == 4, "");

assert(char_traits<char>::length(in) == 6);

assert(char_traits<char16_t>::length(exp_literal) == 3);

assert(char_traits<CharT>::length(exp) == 3);

test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}};

for (auto t : offsets) {

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<CharT>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

for (auto t : offsets) {

CharT out[array_size(exp)] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, end(out), out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<CharT>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

}

template <class CharT>

void utf8_to_ucs2_in_partial(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UTF-8 string of 1-byte CP, 2-byte CP and 3-byte CP

const char in[] = "bш\uAAAA";

const char16_t exp_literal[] = u"bш\uAAAA";

CharT exp[array_size(exp_literal)] = {};

copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(in) == 7, "");

static_assert(array_size(exp_literal) == 4, "");

static_assert(array_size(exp) == 4, "");

assert(char_traits<char>::length(in) == 6);

assert(char_traits<char16_t>::length(exp_literal) == 3);

assert(char_traits<CharT>::length(exp) == 3);

test_offsets_partial offsets[] = {

{1, 0, 0, 0}, // no space for first CP

{3, 1, 1, 1}, // no space for second CP

{2, 2, 1, 1}, // incomplete second CP

{2, 1, 1, 1}, // incomplete second CP, and no space for it

{6, 2, 3, 2}, // no space for third CP

{4, 3, 3, 2}, // incomplete third CP

{5, 3, 3, 2}, // incomplete third CP

{4, 2, 3, 2}, // incomplete third CP, and no space for it

{5, 2, 3, 2}, // incomplete third CP, and no space for it

};

for (auto t : offsets) {

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.partial);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<CharT>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf8_to_ucs2_in_error(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

const char valid_in[] = "bш\uAAAA\U0010AAAA";

const char16_t exp_literal[] = u"bш\uAAAA\U0010AAAA";

CharT exp[array_size(exp_literal)] = {};

copy(begin(exp_literal), end(exp_literal), begin(exp));

static_assert(array_size(valid_in) == 11, "");

static_assert(array_size(exp_literal) == 6, "");

static_assert(array_size(exp) == 6, "");

assert(char_traits<char>::length(valid_in) == 10);

assert(char_traits<char16_t>::length(exp_literal) == 5);

assert(char_traits<CharT>::length(exp) == 5);

test_offsets_error<char> offsets[] = {

// replace leading byte with invalid byte

{1, 5, 0, 0, '\xFF', 0},

{3, 5, 1, 1, '\xFF', 1},

{6, 5, 3, 2, '\xFF', 3},

{10, 5, 6, 3, '\xFF', 6},

// replace first trailing byte with ASCII byte

{3, 5, 1, 1, 'z', 2},

{6, 5, 3, 2, 'z', 4},

{10, 5, 6, 3, 'z', 7},

// replace first trailing byte with invalid byte

{3, 5, 1, 1, '\xFF', 2},

{6, 5, 3, 2, '\xFF', 4},

{10, 5, 6, 3, '\xFF', 7},

// replace second trailing byte with ASCII byte

{6, 5, 3, 2, 'z', 5},

{10, 5, 6, 3, 'z', 8},

// replace second trailing byte with invalid byte

{6, 5, 3, 2, '\xFF', 5},

{10, 5, 6, 3, '\xFF', 8},

// replace third trailing byte

{10, 5, 6, 3, 'z', 9},

{10, 5, 6, 3, '\xFF', 9},

// A 4-byte CP cannot be represented in UCS-2, so once its leading byte is

// seen the result must be error, whether the sequence is complete,

// incomplete at the end, or has invalid trailing bytes.

// No-op replacement ('b' at index 0 is already 'b'): full 4-byte CP present

{10, 4, 6, 3, 'b', 0},

{10, 5, 6, 3, 'b', 0},

// Don't replace anything, show incomplete 4-byte CP at the end

{7, 4, 6, 3, 'b', 0}, // incomplete fourth CP

{8, 4, 6, 3, 'b', 0}, // incomplete fourth CP

{9, 4, 6, 3, 'b', 0}, // incomplete fourth CP

{7, 5, 6, 3, 'b', 0}, // incomplete fourth CP

{8, 5, 6, 3, 'b', 0}, // incomplete fourth CP

{9, 5, 6, 3, 'b', 0}, // incomplete fourth CP

// replace first trailing byte with ASCII byte, also incomplete at end

{5, 5, 3, 2, 'z', 4},

// replace first trailing byte with invalid byte, also incomplete at end

{5, 5, 3, 2, '\xFF', 4},

// replace first trailing byte with ASCII byte, also incomplete at end

{8, 5, 6, 3, 'z', 7},

{9, 5, 6, 3, 'z', 7},

// replace first trailing byte with invalid byte, also incomplete at end

{8, 5, 6, 3, '\xFF', 7},

{9, 5, 6, 3, '\xFF', 7},

// replace second trailing byte with ASCII byte, also incomplete at end

{9, 5, 6, 3, 'z', 8},

// replace second trailing byte with invalid byte, also incomplete at end

{9, 5, 6, 3, '\xFF', 8},

};

for (auto t : offsets) {

char in[array_size(valid_in)] = {};

CharT out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

char_traits<char>::copy(in, valid_in, array_size(valid_in));

in[t.replace_pos] = t.replace_char;

auto state = mbstate_t{};

auto in_next = (const char*)nullptr;

auto out_next = (CharT*)nullptr;

auto res = codecvt_base::result();

res = cvt.in(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.error);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<CharT>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void utf8_to_ucs2_in(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf8_to_ucs2_in_ok(cvt);

utf8_to_ucs2_in_partial(cvt);

utf8_to_ucs2_in_error(cvt);

}
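
The 4-byte rows in the UCS-2 error table above rely on the fact that a UTF-8 lead byte already announces the sequence length, so a UCS-2 decoder can reject a supplementary-plane code point before looking at its trailing bytes. A rough sketch of that classification is below. The names `utf8_seq_len` and `ucs2_rejects_lead` are hypothetical, and this is not the libc++ implementation.

```cpp
#include <cassert>

// Sequence length announced by the bit pattern of a UTF-8 lead byte.
// Returns 0 when the byte cannot start a sequence (continuation bytes
// 0x80..0xBF and the bytes 0xF8..0xFF). Simplification: the overlong leads
// 0xC0/0xC1 and the out-of-range leads 0xF5..0xF7 are not rejected here.
constexpr int utf8_seq_len(unsigned char b) {
  return b < 0x80 ? 1 : b < 0xC0 ? 0 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : b < 0xF8 ? 4 : 0;
}

// UCS-2 cannot hold code points above U+FFFF, so a lead byte announcing a
// 4-byte sequence is an error no matter what follows it.
constexpr bool ucs2_rejects_lead(unsigned char b) {
  return utf8_seq_len(b) == 0 || utf8_seq_len(b) == 4;
}

static_assert(utf8_seq_len(0x62) == 1, ""); // 'b'
static_assert(utf8_seq_len(0xD1) == 2, ""); // lead byte of U+0448
static_assert(utf8_seq_len(0xEA) == 3, ""); // lead byte of U+AAAA
static_assert(utf8_seq_len(0xF4) == 4, ""); // lead byte of U+10AAAA
static_assert(ucs2_rejects_lead(0xF4), "");
static_assert(ucs2_rejects_lead(0xFF), ""); // the invalid byte used in the tables
static_assert(!ucs2_rejects_lead(0xEA), "");
```

Under this reading, the rows with `in_size` 7, 8 and 9 expect `error` rather than `partial` even though the fourth code point is truncated, because reading more input could never make it representable.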

template <class CharT>

void ucs2_to_utf8_out_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UCS-2 string of CPs that encode to 1, 2 and 3 bytes in UTF-8

const char16_t in_literal[] = u"bш\uAAAA";

const char exp[] = "bш\uAAAA";

CharT in[array_size(in_literal)] = {};

copy(begin(in_literal), end(in_literal), begin(in));

static_assert(array_size(in_literal) == 4, "");

static_assert(array_size(exp) == 7, "");

static_assert(array_size(in) == 4, "");

assert(char_traits<char16_t>::length(in_literal) == 3);

assert(char_traits<char>::length(exp) == 6);

assert(char_traits<CharT>::length(in) == 3);

const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}};

for (auto t : offsets) {

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.ok);

assert(in_next == in + t.in_size);

assert(out_next == out + t.out_size);

assert(char_traits<char>::compare(out, exp, t.out_size) == 0);

if (t.out_size < array_size(out))

assert(out[t.out_size] == 0);

}

}

template <class CharT>

void ucs2_to_utf8_out_partial(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

// UCS-2 string of CPs that encode to 1, 2 and 3 bytes in UTF-8

const char16_t in_literal[] = u"bш\uAAAA";

const char exp[] = "bш\uAAAA";

CharT in[array_size(in_literal)] = {};

copy(begin(in_literal), end(in_literal), begin(in));

static_assert(array_size(in_literal) == 4, "");

static_assert(array_size(exp) == 7, "");

static_assert(array_size(in) == 4, "");

assert(char_traits<char16_t>::length(in_literal) == 3);

assert(char_traits<char>::length(exp) == 6);

assert(char_traits<CharT>::length(in) == 3);

const test_offsets_partial offsets[] = {

{1, 0, 0, 0}, // no space for first CP

{2, 1, 1, 1}, // no space for second CP

{2, 2, 1, 1}, // no space for second CP

{3, 3, 2, 3}, // no space for third CP

{3, 4, 2, 3}, // no space for third CP

{3, 5, 2, 3}, // no space for third CP

};

for (auto t : offsets) {

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.partial);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<char>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void ucs2_to_utf8_out_error(const std::codecvt<CharT, char, mbstate_t>& cvt) {

using namespace std;

const char16_t valid_in[] = u"bш\uAAAA\U0010AAAA";

const char exp[] = "bш\uAAAA\U0010AAAA";

static_assert(array_size(valid_in) == 6, "");

static_assert(array_size(exp) == 11, "");

assert(char_traits<char16_t>::length(valid_in) == 5);

assert(char_traits<char>::length(exp) == 10);

test_offsets_error<CharT> offsets[] = {

{5, 10, 0, 0, 0xD800, 0},

{5, 10, 0, 0, 0xDBFF, 0},

{5, 10, 0, 0, 0xDC00, 0},

{5, 10, 0, 0, 0xDFFF, 0},

{5, 10, 1, 1, 0xD800, 1},

{5, 10, 1, 1, 0xDBFF, 1},

{5, 10, 1, 1, 0xDC00, 1},

{5, 10, 1, 1, 0xDFFF, 1},

{5, 10, 2, 3, 0xD800, 2},

{5, 10, 2, 3, 0xDBFF, 2},

{5, 10, 2, 3, 0xDC00, 2},

{5, 10, 2, 3, 0xDFFF, 2},

// don't replace anything, just show the surrogate pair

{5, 10, 3, 6, u'b', 0},

// make the leading surrogate a trailing one

{5, 10, 3, 6, 0xDC00, 3},

{5, 10, 3, 6, 0xDFFF, 3},

// make the trailing surrogate a leading one

{5, 10, 3, 6, 0xD800, 4},

{5, 10, 3, 6, 0xDBFF, 4},

// make the trailing surrogate a BMP char

{5, 10, 3, 6, u'z', 4},

{5, 7, 3, 6, u'b', 0}, // no space for fourth CP

{5, 8, 3, 6, u'b', 0}, // no space for fourth CP

{5, 9, 3, 6, u'b', 0}, // no space for fourth CP

{4, 10, 3, 6, u'b', 0}, // incomplete fourth CP

{4, 7, 3, 6, u'b', 0}, // incomplete fourth CP, and no space for it

{4, 8, 3, 6, u'b', 0}, // incomplete fourth CP, and no space for it

{4, 9, 3, 6, u'b', 0}, // incomplete fourth CP, and no space for it

};

for (auto t : offsets) {

CharT in[array_size(valid_in)] = {};

char out[array_size(exp) - 1] = {};

assert(t.in_size <= array_size(in));

assert(t.out_size <= array_size(out));

assert(t.expected_in_next <= t.in_size);

assert(t.expected_out_next <= t.out_size);

copy(begin(valid_in), end(valid_in), begin(in));

in[t.replace_pos] = t.replace_char;

auto state = mbstate_t{};

auto in_next = (const CharT*)nullptr;

auto out_next = (char*)nullptr;

auto res = codecvt_base::result();

res = cvt.out(state, in, in + t.in_size, in_next, out, out + t.out_size, out_next);

assert(res == cvt.error);

assert(in_next == in + t.expected_in_next);

assert(out_next == out + t.expected_out_next);

assert(char_traits<char>::compare(out, exp, t.expected_out_next) == 0);

if (t.expected_out_next < array_size(out))

assert(out[t.expected_out_next] == 0);

}

}

template <class CharT>

void ucs2_to_utf8_out(const std::codecvt<CharT, char, mbstate_t>& cvt) {

ucs2_to_utf8_out_ok(cvt);

ucs2_to_utf8_out_partial(cvt);

ucs2_to_utf8_out_error(cvt);

}

template <class CharT>

void test_utf8_ucs2_cvts(const std::codecvt<CharT, char, mbstate_t>& cvt) {

utf8_to_ucs2_in(cvt);

ucs2_to_utf8_out(cvt);

}
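
The partial and error tables in ucs2_to_utf8_out_partial and ucs2_to_utf8_out_error can be tried outside the test suite. The following is a minimal sketch and not part of the patch: `OutOutcome` and `encode_ucs2` are hypothetical names introduced here, and it assumes a standard library that still ships the `<codecvt>` header (deprecated since C++17). It calls `out()` on `std::codecvt_utf8<char16_t>` with a caller-chosen output capacity and reports the result code together with how far each pointer advanced.

```cpp
#include <cassert>
#include <codecvt>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical record of one out() call: the status code plus how far
// in_next and out_next moved from the start of their buffers.
struct OutOutcome {
  std::codecvt_base::result result;
  std::size_t consumed; // UCS-2 code units read from the input
  std::size_t written;  // UTF-8 bytes stored in the output
};

// Hypothetical helper: encode `in` as UTF-8 into at most `capacity` bytes.
inline OutOutcome encode_ucs2(const std::u16string& in, std::size_t capacity) {
  char out[16] = {};
  assert(capacity <= sizeof(out)); // same style of guard as the tests use

  std::codecvt_utf8<char16_t> cvt; // UCS-2 <-> UTF-8, default mode
  std::mbstate_t state = std::mbstate_t();
  const char16_t* in_next = nullptr;
  char* out_next = nullptr;

  const char16_t* first = in.data();
  std::codecvt_base::result res =
      cvt.out(state, first, first + in.size(), in_next,
              out, out + capacity, out_next);

  return {res,
          static_cast<std::size_t>(in_next - first),
          static_cast<std::size_t>(out_next - out)};
}
```

With the input `u"b\u0448\uAAAA"`, a capacity of 2 leaves room for `b` but not for the two-byte U+0448, which is the situation described by the `{2, 2, 1, 1}` row. Replacing the second code unit with `0xD800` mirrors the `{5, 10, 1, 1, 0xD800, 1}` row, where a lone surrogate is expected to produce `error` after the first code point has been written.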

libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char16_t_in.pass.cpp

	[21 lines not shown]
	// value different from ASCII character.			// value different from ASCII character.
	// UNSUPPORTED: target={{.+}}-zos{{.*}}			// UNSUPPORTED: target={{.+}}-zos{{.*}}

	#include <locale>			#include <locale>
	#include <string>			#include <string>
	#include <vector>			#include <vector>
	#include <cassert>			#include <cassert>

				#include "../../../../codecvt_unicode.h"
	#include "test_macros.h"			#include "test_macros.h"

	typedef std::codecvt<char16_t, char, std::mbstate_t> F;			typedef std::codecvt<char16_t, char, std::mbstate_t> F;

	int main(int, char**)			int main(int, char**)
	{			{
	std::locale l = std::locale::classic();			std::locale l = std::locale::classic();
	const char from[] = "some text";			const char from[] = "some text";
	F::intern_type to[9];			F::intern_type to[9];
	const F& f = std::use_facet<F>(l);			const F& f = std::use_facet<F>(l);
	std::mbstate_t mbs = {};			std::mbstate_t mbs = {};
	const char* from_next = 0;			const char* from_next = 0;
	F::intern_type* to_next = 0;			F::intern_type* to_next = 0;
	assert(f.in(mbs, from, from + 9, from_next,			assert(f.in(mbs, from, from + 9, from_next,
	to, to + 9, to_next) == F::ok);			to, to + 9, to_next) == F::ok);
	assert(from_next - from == 9);			assert(from_next - from == 9);
	assert(to_next - to == 9);			assert(to_next - to == 9);
	for (unsigned i = 0; i < 9; ++i)			for (unsigned i = 0; i < 9; ++i)
	assert(to[i] == from[i]);			assert(to[i] == from[i]);

				utf8_to_utf16_in(f);

	return 0;			return 0;
	}			}

libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char16_t_out.pass.cpp

[23 lines not shown]

#include <locale> #include <locale>

#include <string> #include <string>

#include <vector> #include <vector>

#include <cassert> #include <cassert>

#include <stdio.h> #include <stdio.h>

#include "../../../../codecvt_unicode.h"

Mordante (Unsubmitted, Done):

#include <stdio.h>

- #include "../../../../codecvt_unicode.h"

+ #include "codecvt_unicode.h"

#include "test_macros.h"

I'm not fond of this include path; it feels quite fragile. I think it would be better to move the code to the test/support directory, after which the suggestion above works. (This is the same location where test_macros.h resides.)


#include "test_macros.h" #include "test_macros.h"

typedef std::codecvt<char16_t, char, std::mbstate_t> F; typedef std::codecvt<char16_t, char, std::mbstate_t> F;

int main(int, char**) int main(int, char**)

{ {

std::locale l = std::locale::classic(); std::locale l = std::locale::classic();

const F& f = std::use_facet<F>(l); const F& f = std::use_facet<F>(l);

{ {

F::intern_type from[9] = {'s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't'}; F::intern_type from[9] = {'s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't'};

char to[9] = {0}; char to[9] = {0};

std::mbstate_t mbs = {}; std::mbstate_t mbs = {};

const F::intern_type* from_next = 0; const F::intern_type* from_next = 0;

char* to_next = 0; char* to_next = 0;

F::result r = f.out(mbs, from, from + 9, from_next, F::result r = f.out(mbs, from, from + 9, from_next,

to, to + 9, to_next); to, to + 9, to_next);

assert(r == F::ok); assert(r == F::ok);

assert(from_next - from == 9); assert(from_next - from == 9);

assert(to_next - to == 9); assert(to_next - to == 9);

for (unsigned i = 0; i < 9; ++i) for (unsigned i = 0; i < 9; ++i)

assert(to[i] == from[i]); assert(to[i] == from[i]);

} }

utf16_to_utf8_out(f);

return 0; return 0;

} }

libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char32_t_in.pass.cpp

	[21 lines not shown]
	// value different from ASCII character.			// value different from ASCII character.
	// UNSUPPORTED: target={{.+}}-zos{{.*}}			// UNSUPPORTED: target={{.+}}-zos{{.*}}

	#include <locale>			#include <locale>
	#include <string>			#include <string>
	#include <vector>			#include <vector>
	#include <cassert>			#include <cassert>

				#include "../../../../codecvt_unicode.h"
	#include "test_macros.h"			#include "test_macros.h"

	typedef std::codecvt<char32_t, char, std::mbstate_t> F;			typedef std::codecvt<char32_t, char, std::mbstate_t> F;

	int main(int, char**)			int main(int, char**)
	{			{
	std::locale l = std::locale::classic();			std::locale l = std::locale::classic();
	const char from[] = "some text";			const char from[] = "some text";
	F::intern_type to[9];			F::intern_type to[9];
	const F& f = std::use_facet<F>(l);			const F& f = std::use_facet<F>(l);
	std::mbstate_t mbs = {};			std::mbstate_t mbs = {};
	const char* from_next = 0;			const char* from_next = 0;
	F::intern_type* to_next = 0;			F::intern_type* to_next = 0;
	assert(f.in(mbs, from, from + 9, from_next,			assert(f.in(mbs, from, from + 9, from_next,
	to, to + 9, to_next) == F::ok);			to, to + 9, to_next) == F::ok);
	assert(from_next - from == 9);			assert(from_next - from == 9);
	assert(to_next - to == 9);			assert(to_next - to == 9);
	for (unsigned i = 0; i < 9; ++i)			for (unsigned i = 0; i < 9; ++i)
	assert(to[i] == static_cast<char32_t>(from[i]));			assert(to[i] == static_cast<char32_t>(from[i]));

				utf8_to_utf32_in(f);

	return 0;			return 0;
	}			}

libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char32_t_out.pass.cpp

	[23 lines not shown]

	#include <locale>			#include <locale>
	#include <string>			#include <string>
	#include <vector>			#include <vector>
	#include <cassert>			#include <cassert>

	#include <stdio.h>			#include <stdio.h>

				#include "../../../../codecvt_unicode.h"
	#include "test_macros.h"			#include "test_macros.h"

	typedef std::codecvt<char32_t, char, std::mbstate_t> F;			typedef std::codecvt<char32_t, char, std::mbstate_t> F;

	int main(int, char**)			int main(int, char**)
	{			{
	std::locale l = std::locale::classic();			std::locale l = std::locale::classic();
	const F& f = std::use_facet<F>(l);			const F& f = std::use_facet<F>(l);
	{			{
	F::intern_type from[9] = {'s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't'};			F::intern_type from[9] = {'s', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't'};
	char to[9] = {0};			char to[9] = {0};
	std::mbstate_t mbs = {};			std::mbstate_t mbs = {};
	const F::intern_type* from_next = 0;			const F::intern_type* from_next = 0;
	char* to_next = 0;			char* to_next = 0;
	F::result r = f.out(mbs, from, from + 9, from_next,			F::result r = f.out(mbs, from, from + 9, from_next,
	to, to + 9, to_next);			to, to + 9, to_next);
	assert(r == F::ok);			assert(r == F::ok);
	assert(from_next - from == 9);			assert(from_next - from == 9);
	assert(to_next - to == 9);			assert(to_next - to == 9);
	for (unsigned i = 0; i < 9; ++i)			for (unsigned i = 0; i < 9; ++i)
	assert(static_cast<char32_t>(to[i]) == from[i]);			assert(static_cast<char32_t>(to[i]) == from[i]);
	}			}
				utf32_to_utf8_out(f);

	return 0;			return 0;
	}			}

libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_in.pass.cpp

[20 lines not shown]

// result // result

// in(stateT& state, // in(stateT& state,

// const externT* from, const externT* from_end, const externT*& from_next, // const externT* from, const externT* from_end, const externT*& from_next,

// internT* to, internT* to_end, internT*& to_next) const; // internT* to, internT* to_end, internT*& to_next) const;

#include <codecvt> #include <codecvt>

#include <cassert> #include <cassert>

#include "../codecvt_unicode.h"

#include "test_macros.h" #include "test_macros.h"

int main(int, char**) int main(int, char**)

{ {

typedef std::codecvt_utf8<char32_t> C; typedef std::codecvt_utf8<char32_t> C;

C c; C c;

char32_t w = 0; char32_t w = 0;

[229 lines not shown, inside int main(int, char**)]

n[0] = char(0x56); n[0] = char(0x56);

r = c.in(m, n, n+1, np, &w, &w+1, wp); r = c.in(m, n, n+1, np, &w, &w+1, wp);

assert(r == std::codecvt_base::ok); assert(r == std::codecvt_base::ok);

assert(wp == &w+1); assert(wp == &w+1);

assert(np == n+1); assert(np == n+1);

assert(w == 0x56); assert(w == 0x56);

} }

{ {

typedef std::codecvt_utf8<char32_t> C;

Mordante (Unsubmitted, Done):

assert(w == 0x56);

}

{

- typedef std::codecvt_utf8<char32_t> C;

+ using C = std::codecvt_utf8<char32_t>;

C c;

This is the preferred style. For the compilers we support, this works in C++03 mode.
You could even consider removing the typedef entirely, since it's only used once.
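
As an illustration of the spellings discussed in this comment, here is a sketch that is not code from the patch; `C_typedef`, `C_alias`, and `decode_single_byte` are hypothetical names, and the example is written for C++11 or later rather than the C++03 extension mode the comment refers to. It shows that the typedef and the alias declaration name the same type, and that the converter can also be declared directly when the name is used only once.

```cpp
#include <cassert>
#include <codecvt>
#include <cwchar>
#include <locale>
#include <type_traits>

typedef std::codecvt_utf8<char32_t> C_typedef; // spelling used in the patch
using C_alias = std::codecvt_utf8<char32_t>;   // spelling the reviewer prefers

// Both declarations introduce a name for exactly the same type.
static_assert(std::is_same<C_typedef, C_alias>::value,
              "typedef and alias name the same type");

// Hypothetical helper: decode the single UTF-8 byte 0x56, declaring the
// converter directly instead of going through a typedef or alias.
inline char32_t decode_single_byte() {
  std::codecvt_utf8<char32_t> c;
  char32_t w = 0;
  const char n[1] = {char(0x56)};
  std::mbstate_t m = std::mbstate_t();
  const char* np = nullptr;
  char32_t* wp = nullptr;

  std::codecvt_base::result r = c.in(m, n, n + 1, np, &w, &w + 1, wp);
  assert(r == std::codecvt_base::ok); // one ASCII byte decodes completely
  assert(np == n + 1);
  assert(wp == &w + 1);
  return w;
}
```

Dropping the alias keeps the block shorter at the cost of repeating the full type if a second use is added later.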


C c;

utf8_to_utf32_in(c);

}

{

typedef std::codecvt_utf8<char16_t> C; typedef std::codecvt_utf8<char16_t> C;

C c; C c;

char16_t w = 0; char16_t w = 0;

char n[3] = {char(0xE1), char(0x80), char(0x85)}; char n[3] = {char(0xE1), char(0x80), char(0x85)};

char16_t* wp = nullptr; char16_t* wp = nullptr;

std::mbstate_t m; std::mbstate_t m;

const char* np = nullptr; const char* np = nullptr;

std::codecvt_base::result r = c.in(m, n, n+3, np, &w, &w+1, wp); std::codecvt_base::result r = c.in(m, n, n+3, np, &w, &w+1, wp);

[73 lines not shown, inside int main(int, char**)]

w = 0x56; w = 0x56;

n[0] = char(0x56); n[0] = char(0x56);

r = c.in(m, n, n+1, np, &w, &w+1, wp); r = c.in(m, n, n+1, np, &w, &w+1, wp);

assert(r == std::codecvt_base::ok); assert(r == std::codecvt_base::ok);

assert(wp == &w+1); assert(wp == &w+1);

assert(np == n+1); assert(np == n+1);

assert(w == 0x56); assert(w == 0x56);

} }

{

typedef std::codecvt_utf8<char16_t> C;

C c;

utf8_to_ucs2_in(c);

}

#ifndef TEST_HAS_NO_WIDE_CHARACTERS

{

typedef std::codecvt_utf8<wchar_t> C;

C c;

# if __SIZEOF_WCHAR_T__ == 2

utf8_to_ucs2_in(c);

# elif __SIZEOF_WCHAR_T__ == 4

utf8_to_utf32_in(c);

# endif

}

#endif

return 0; return 0;

} }

libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_out.pass.cpp

[20 lines not shown]
// result		// result
// out(stateT& state,		// out(stateT& state,
// const internT* from, const internT* from_end, const internT*& from_next,		// const internT* from, const internT* from_end, const internT*& from_next,
// externT* to, externT* to_end, externT*& to_next) const;		// externT* to, externT* to_end, externT*& to_next) const;

#include <codecvt>		#include <codecvt>
#include <cassert>		#include <cassert>

		#include "../codecvt_unicode.h"
#include "test_macros.h"		#include "test_macros.h"

template <class CharT, size_t = sizeof(CharT)>		template <class CharT, size_t = sizeof(CharT)>
struct TestHelper;		struct TestHelper;

template <class CharT>		template <class CharT>
struct TestHelper<CharT, 2> {		struct TestHelper<CharT, 2> {
static void test();		static void test();
[121 lines not shown, inside void TestHelper<CharT, 2>::test()]
assert(n[0] == char(0xEF));		assert(n[0] == char(0xEF));
assert(n[1] == char(0xBB));		assert(n[1] == char(0xBB));
assert(n[2] == char(0xBF));		assert(n[2] == char(0xBF));
assert(n[3] == char(0x56));		assert(n[3] == char(0x56));
assert(n[4] == char(0x93));		assert(n[4] == char(0x93));
assert(n[5] == char(0x85));		assert(n[5] == char(0x85));
assert(n[6] == char(0));		assert(n[6] == char(0));
}		}
		{
		typedef std::codecvt_utf8<CharT> C;
		C c;
		ucs2_to_utf8_out(c);
		}
}		}

template <class CharT>		template <class CharT>
void TestHelper<CharT, 4>::test() {		void TestHelper<CharT, 4>::test() {
{		{
typedef std::codecvt_utf8<CharT> C;		typedef std::codecvt_utf8<CharT> C;
C c;		C c;
CharT w = 0x40003;		CharT w = 0x40003;
[141 lines not shown, inside void TestHelper<CharT, 4>::test()]
assert(n[0] == char(0xEF));		assert(n[0] == char(0xEF));
assert(n[1] == char(0xBB));		assert(n[1] == char(0xBB));
assert(n[2] == char(0xBF));		assert(n[2] == char(0xBF));
assert(n[3] == char(0x56));		assert(n[3] == char(0x56));
assert(n[4] == char(0x93));		assert(n[4] == char(0x93));
assert(n[5] == char(0x85));		assert(n[5] == char(0x85));
assert(n[6] == char(0x83));		assert(n[6] == char(0x83));
}		}
		{
		typedef std::codecvt_utf8<CharT> C;
		C c;
		utf32_to_utf8_out(c);
		}
}		}

int main(int, char**) {		int main(int, char**) {
#ifndef TEST_HAS_NO_WIDE_CHARACTERS		#ifndef TEST_HAS_NO_WIDE_CHARACTERS
TestHelper<wchar_t>::test();		TestHelper<wchar_t>::test();
#endif		#endif
TestHelper<char32_t>::test();		TestHelper<char32_t>::test();
TestHelper<char16_t>::test();		TestHelper<char16_t>::test();

return 0;		return 0;
}		}

libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_utf16_in.pass.cpp

[20 lines not shown]
// result		// result
// in(stateT& state,		// in(stateT& state,
// const externT* from, const externT* from_end, const externT*& from_next,		// const externT* from, const externT* from_end, const externT*& from_next,
// internT* to, internT* to_end, internT*& to_next) const;		// internT* to, internT* to_end, internT*& to_next) const;

#include <codecvt>		#include <codecvt>
#include <cassert>		#include <cassert>

		#include "../codecvt_unicode.h"
#include "test_macros.h"		#include "test_macros.h"

template <class CharT, size_t = sizeof(CharT)>		template <class CharT, size_t = sizeof(CharT)>
struct TestHelper;		struct TestHelper;
template <class CharT>		template <class CharT>
struct TestHelper<CharT, 2> {		struct TestHelper<CharT, 2> {
static void test();		static void test();
};		};
[75 lines not shown, inside void TestHelper<CharT, 2>::test()]

n[0] = char(0x56);		n[0] = char(0x56);
r = c.in(m, n, n + 1, np, w, w + 2, wp);		r = c.in(m, n, n + 1, np, w, w + 2, wp);
assert(r == std::codecvt_base::ok);		assert(r == std::codecvt_base::ok);
assert(wp == w + 1);		assert(wp == w + 1);
assert(np == n + 1);		assert(np == n + 1);
assert(w[0] == 0x0056);		assert(w[0] == 0x0056);
}		}
		{
		typedef std::codecvt_utf8_utf16<CharT> C;
		C c;
		utf8_to_utf16_in(c);
		}
}		}

template <class CharT>		template <class CharT>
void TestHelper<CharT, 4>::test() {		void TestHelper<CharT, 4>::test() {
{		{
typedef std::codecvt_utf8_utf16<CharT> C;		typedef std::codecvt_utf8_utf16<CharT> C;
C c;		C c;
CharT w[2] = {0};		CharT w[2] = {0};
[103 lines not shown, inside void TestHelper<CharT, 4>::test()]

n[0] = char(0x56);		n[0] = char(0x56);
r = c.in(m, n, n + 1, np, w, w + 2, wp);		r = c.in(m, n, n + 1, np, w, w + 2, wp);
assert(r == std::codecvt_base::ok);		assert(r == std::codecvt_base::ok);
assert(wp == w + 1);		assert(wp == w + 1);
assert(np == n + 1);		assert(np == n + 1);
assert(w[0] == 0x0056);		assert(w[0] == 0x0056);
}		}
		{
		typedef std::codecvt_utf8_utf16<CharT> C;
		C c;
		utf8_to_utf16_in(c);
		}
}		}

int main(int, char**) {		int main(int, char**) {
#if !defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)		#if !defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)
TestHelper<wchar_t>::test();		TestHelper<wchar_t>::test();
#endif		#endif
TestHelper<char32_t>::test();		TestHelper<char32_t>::test();
TestHelper<char16_t>::test();		TestHelper<char16_t>::test();

return 0;		return 0;
}		}

libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_utf16_out.pass.cpp

[20 lines not shown]
// result		// result
// out(stateT& state,		// out(stateT& state,
// const internT* from, const internT* from_end, const internT*& from_next,		// const internT* from, const internT* from_end, const internT*& from_next,
// externT* to, externT* to_end, externT*& to_next) const;		// externT* to, externT* to_end, externT*& to_next) const;

#include <codecvt>		#include <codecvt>
#include <cassert>		#include <cassert>

		#include "../codecvt_unicode.h"
#include "test_macros.h"		#include "test_macros.h"

template <class CharT, size_t = sizeof(CharT)>		template <class CharT, size_t = sizeof(CharT)>
struct TestHelper;		struct TestHelper;
template <class CharT>		template <class CharT>
struct TestHelper<CharT, 2> {		struct TestHelper<CharT, 2> {
static void test();		static void test();
};		};
[127 lines not shown, inside void TestHelper<CharT, 2>::test()]
assert(r == std::codecvt_base::ok);		assert(r == std::codecvt_base::ok);
assert(wp == w + 1);		assert(wp == w + 1);
assert(np == n + 4);		assert(np == n + 4);
assert(n[0] == char(0xEF));		assert(n[0] == char(0xEF));
assert(n[1] == char(0xBB));		assert(n[1] == char(0xBB));
assert(n[2] == char(0xBF));		assert(n[2] == char(0xBF));
assert(n[3] == char(0x56));		assert(n[3] == char(0x56));
}		}
		{
		typedef std::codecvt_utf8_utf16<CharT> C;
		C c;
		utf16_to_utf8_out(c);
		}
}		}

template <class CharT>		template <class CharT>
void TestHelper<CharT, 4>::test() {		void TestHelper<CharT, 4>::test() {
{		{
typedef std::codecvt_utf8_utf16<CharT> C;		typedef std::codecvt_utf8_utf16<CharT> C;
C c;		C c;
CharT w[2] = {0xD8C0, 0xDC03};		CharT w[2] = {0xD8C0, 0xDC03};
[116 lines not shown, inside void TestHelper<CharT, 4>::test()]
assert(r == std::codecvt_base::ok);		assert(r == std::codecvt_base::ok);
assert(wp == w + 1);		assert(wp == w + 1);
assert(np == n + 4);		assert(np == n + 4);
assert(n[0] == char(0xEF));		assert(n[0] == char(0xEF));
assert(n[1] == char(0xBB));		assert(n[1] == char(0xBB));
assert(n[2] == char(0xBF));		assert(n[2] == char(0xBF));
assert(n[3] == char(0x56));		assert(n[3] == char(0x56));
}		}
		{
		typedef std::codecvt_utf8_utf16<CharT> C;
		C c;
		utf16_to_utf8_out(c);
		}
}		}

int main(int, char**) {		int main(int, char**) {
#if !defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)		#if !defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)
TestHelper<wchar_t>::test();		TestHelper<wchar_t>::test();
#endif		#endif
TestHelper<char32_t>::test();		TestHelper<char32_t>::test();
TestHelper<char16_t>::test();		TestHelper<char16_t>::test();

return 0;		return 0;
}		}
