This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] std::codecvt - short wchar support
AbandonedPublic

Authored by iid_iunknown on Apr 7 2015, 2:29 PM.

Details

Summary

std::codecvt conversions internally use uint32_t in codecvt specializations for wchar_t, which breaks conversion logic if the '-fshort-wchar' was used during compilation.
This patch modifies std::codecvt_utf8 to use the same logic for short wchar_t as the one for WIN32 where wchar_t is 2 bytes long.

std::codecvt_utf16 and std::codecvt_utf8_utf16 changed to work with both WIN32 and fshort-wchar.

It also improves code reuse by generalizing utf16_to_utf8 and utf8_to_utf16 functions, and fixes the libcxx tests dependent on std::codecvt, which use 4-byte test data or expect UCS-4 encoding.

UPDATE

The libc++ tests not working with short wchar now have their short wchar counterparts. The short_wchar feature added to the libc++ tests. Lit decides which version of a test to run depending on availability of short_wchar.

Diff Detail

Repository
rL LLVM

Event Timeline

iid_iunknown retitled this revision from to [libcxx] std::codecvt - short wchar support.
iid_iunknown updated this object.
iid_iunknown edited the test plan for this revision. (Show Details)
iid_iunknown set the repository for this revision to rL LLVM.
iid_iunknown added a subscriber: Unknown Object (MLST).
jroelofs edited edge metadata.Apr 8 2015, 11:20 AM

A couple of drive-by comments.

test/std/localization/locale.stdcvt/codecvt_utf16_in.pass.cpp
2

Does win32 not have __SIZEOF_WCHAR_T__?

6

The else should really be something to the tune of: `#error Can't tell how big wchar_t is on this platform". Otherwise this is a potential nightmare for people doing ports.

test/std/localization/locales/locale.convenience/conversions/conversions.string/to_bytes.pass.cpp
6

same here. Maybe this macro should go in one of the test support headers instead?

mclow.lists edited edge metadata.Apr 8 2015, 2:55 PM

I don't like the gutting of the tests.
I don't like the confusion added to the code.

src/locale.cpp
0–1

You're changing a function named utf8_to_utf16 to take parameters of type uint32_t *?

Eww. I object.

0–1

Same comment as below.

test/std/localization/locales/locale.convenience/conversions/conversions.string/to_bytes.pass.cpp
1

This is not "fixing the libcxx tests dependent on std::codecvt".

This is *removing the tests*.

mclow.lists added inline comments.Apr 8 2015, 9:49 PM
src/locale.cpp
0–1

I apologize for this comment. We already have a function that does this.
Still, eww.

I don't like the gutting of the tests.
I don't like the confusion added to the code.

Codecvt_utf8 already handles short wchar properly and does this in exactly same manner. The patch adds short wchar support to other codecvt's, which seem to have been forgotten. The change is similar to how it is done in gcc's libstdc++.

The next patch changes this a bit by introducing the SHORT_WCHAR macro. This saves some typing and, I hope, makes the code more readable. Would you take a look and share your thoughts, please?

Yes, I agree about the tests. The next patch will avoid messing in the tests by splitting such wchar-size-dependent tests into two versions: for short wchar and for "long" wchar. Lit will choose which test to run depending on short_wchar feature availability, which will also be added.

src/locale.cpp
0–1

Objecting to the existing functionality. We already have two utf8_to_utf16 functions and two utf16_to_utf8 to support char16_t and char32_t.

This part of the patch is not a functional change. It just tries to improve code reuse instead.
As this is just an improvement, I can revert it if code repetition looks better to someone or the corrected version seems confusing.

test/std/localization/locale.stdcvt/codecvt_utf16_in.pass.cpp
2

It does not seem to have __SIZEOF_WCHAR_T__, unfortunately.

6

Thank you for your remark. I will introduce such a message in the next patch version.

test/std/localization/locales/locale.convenience/conversions/conversions.string/to_bytes.pass.cpp
6

Ok, I am going to change the way the tests are handled for short wchar, so that they won't contain these #if / #else anymore.

iid_iunknown updated this object.
iid_iunknown edited edge metadata.

Changes locale.cpp to make the code look less messy.
Introduces the short_wchar feature to the libc++ tests, creates separate tests for short wchar.

iid_iunknown abandoned this revision.Dec 28 2016, 5:51 AM

Abandoning the patch as it is too old.