Page MenuHomePhabricator

[5/N] [libcxx] Convert paths to/from the right narrow code page for narrow strings on windows
Needs ReviewPublic

Authored by mstorsjo on Tue, Nov 10, 1:33 AM.


Group Reviewers
Restricted Project

On windows, the narrow, char based paths normally don't use utf8, but can use many different native code pages, and this is what system functions that operate on files, taking such paths/file names, interpret them as.

Diff Detail

Event Timeline

mstorsjo created this revision.Tue, Nov 10, 1:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Nov 10, 1:33 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
mstorsjo requested review of this revision.Tue, Nov 10, 1:33 AM
mstorsjo retitled this revision from [libcxx] Convert paths to/from the right narrow code page for narrow strings on windows to [5/N] [libcxx] Convert paths to/from the right narrow code page for narrow strings on windows.Tue, Nov 10, 1:35 AM
mstorsjo updated this revision to Diff 305076.Fri, Nov 13, 3:33 AM

Updated to define WIN32_LEAN_AND_MEAN and NOMINMAX when including windows.h in operations.cpp.

The Windows details look correct, though I have a couple questions on the code page conversions.


I'm not clear on what AreFileApisANSI does, nor have I found it using code search. In what case would any APIs use the OEM code page when the user's code page is something else?

I'm also wondering whether CP_ACP should be CP_THREAD_ACP, which, by default, will be the user's current code page which could be different than the system code page.

mstorsjo added inline comments.Mon, Nov 16, 2:52 PM

See D91133 for a testcase (which passes with MSVC STL) for this and some more - a process can switch between whether the narrow file apis take OEM CP or ACP with SetFileApisToANSI() and SetFileApisToOEM().

The MSVC STL sources don't seem to use CP_THREAD_ACP at least... however they also check for ___lc_codepage_func() == CP_UTF8, which I guess we also should...

mstorsjo added inline comments.Tue, Nov 17, 12:05 AM

To follow up on the last bit regarding ___lc_codepage_func(), contrary to SetFileApisToOEM() which operates on the kernel32 level, affecting how all narrow file names are interpreted, one can also do setlocale(LC_ALL, ".utf8");, which makes e.g. fopen() work with utf8 file names, but it has no effect on actual kernel32 -A suffixed file APIs.

But in any case, I guess adding the extra check for ___lc_codepage_func(), like ___lc_codepage_func() == CP_UTF8 ? CP_UTF8 : AreFileApisANSI() ? CP_ACP : CP_OEMCP.

After reading up on CP_THREAD_ACP, I don't think that's relevant. The main point is that if I export a narrow form of a filename from a std::filesystem::path object (or create a path object from a narrow string), I expect the string to be in the same codepage that fopen() or CreateFileA() would accept and interpreted in the same way, and as far as I'm reading about CP_THREAD_ACP, it only affects other bits, but not how kernel32 file APIs map narrow chars to wchar.

amccarth accepted this revision.Tue, Nov 17, 1:28 PM


Excellent. Somehow I missed that AreFileApisANSI was a Win32 function. I must have mistyped it in my searches the other day. This looks good.