This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Add missing short wchar handling for codecvt_utf8, codecvt_utf16 and codecvt_utf8_utf16
ClosedPublic

Authored by jasonliu on Apr 19 2021, 11:07 AM.

Details

Summary

AIX have 2 byte wchar in 32 bit mode and 4 byte wchar in 64 bit mode.
This patch add more missing short wchar handling under the existing _LIBCPP_SHORT_WCHAR macro.

Diff Detail

Event Timeline

jasonliu requested review of this revision.Apr 19 2021, 11:07 AM
jasonliu created this revision.
SeanP added a comment.Apr 21 2021, 8:59 PM

Why did you have to add a specialization based on size? Could you provide a test case or two to help explain. My first thought is the _LIBCPP_SHORT_WCHAR macro should be good enough since aix & z/OS only support one wchar_t size per 32/64 bit mode.

Why did you have to add a specialization based on size? Could you provide a test case or two to help explain. My first thought is the _LIBCPP_SHORT_WCHAR macro should be good enough since aix & z/OS only support one wchar_t size per 32/64 bit mode.

With -fshort-wchar, you could have a short wchar in 64 bit mode. So this implementation still gives you the correct result if that's used.

zibi added a comment.Apr 23 2021, 8:12 AM

It looks like there are few test cases failing in Windows. Was that investigated to see if it is related to this patch?

jasonliu updated this revision to Diff 340577.Apr 26 2021, 10:21 AM
jasonliu retitled this revision from [libc++] code_cvt change to handle short wchar(2 byte) for some platform to [libc++] Add missing short wchar handling for codecvt_utf8, codecvt_utf16 and codecvt_utf8_utf16 .
jasonliu edited the summary of this revision. (Show Details)

Why did you have to add a specialization based on size? Could you provide a test case or two to help explain. My first thought is the _LIBCPP_SHORT_WCHAR macro should be good enough since aix & z/OS only support one wchar_t size per 32/64 bit mode.

With -fshort-wchar, you could have a short wchar in 64 bit mode. So this implementation still gives you the correct result if that's used.

Relay some offline discussion: t -fshort-wchar is something that most of libc libraries doesn’t support. i.e. wchar functions won’t return a sensible result when -fshort-wchar is enabled. So there is no point to try to make it work with libc++. So choosing to use existing macro _LIBCPP_SHORT_WCHAR to handle the wchar size differences here.

jasonliu planned changes to this revision.Apr 29 2021, 1:35 PM

As expected, this changes the behavior of windows platform for the do_length and do_max_length of __codecvt_utf8<wchar_t>.
But I think the changes are actually desired. Could anyone working on Windows platform confirm that?
If we could agree on that, then I could work on updating the test case.

jasonliu updated this revision to Diff 343108.May 5 2021, 11:08 AM
jasonliu edited the summary of this revision. (Show Details)

Updated test case to account for short wchar_t.

ldionne accepted this revision.Jul 26 2021, 5:40 AM
ldionne added a subscriber: mstorsjo.

Can you rebase onto main to trigger CI? This LGTM but I'd like @mstorsjo to peek at this since that setting is used on Windows too.

This revision is now accepted and ready to land.Jul 26 2021, 5:40 AM
jasonliu updated this revision to Diff 362233.Jul 27 2021, 5:01 PM

Rebase to trigger CI.

As expected, this changes the behavior of windows platform for the do_length and do_max_length of __codecvt_utf8<wchar_t>.
But I think the changes are actually desired. Could anyone working on Windows platform confirm that?

Yes, Windows has got 2 byte wchar_t - so if I read this correctly, libc++ has got an existing define for _LIBCPP_SHORT_WCHAR, but it was incomplete/inconsistent, and this fixes up things to work more correctly for that case? That's certainly good.

In practice, which cases didn't work as they should in that configuration before? I haven't noticed that aspect anywhere in the libc++ tests on Windows (although it could be hidden by some XFAIL: LIBCXX-WINDOWS-FIXME, but that would show up as XPASS in that case too).

If we could agree on that, then I could work on updating the test case.

I don't on a quick glance see what the practical effect of the test updates is, but if it still passes in the Windows CI configurations I guess it should be fine.

jasonliu added a comment.EditedJul 28 2021, 3:25 PM

In practice, which cases didn't work as they should in that configuration before? I haven't noticed that aspect anywhere in the libc++ tests on Windows (although it could be hidden by some XFAIL: LIBCXX-WINDOWS-FIXME, but that would show up as XPASS in that case too).

@mstorsjo I modified the test case so that it would produce the correct result for 2 byte wchar. Without those test case modification, windows CI would fail.
Those test case weren't testing the correct thing for 2 byte wchar and thus previous behavior that produced a result that's intended for 4 byte wchar on Windows verified "clean".

FWIW, no objection from me to proceed with this patch, thanks for the explanation!

@daltenty @xingxue
Since I don't have access to AIX machine any more, would you help to test and land this patch if this still aligns to what you want on AIX?
If not, feel free to commandeering this patch if needed.

Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2021, 1:21 PM