Details

Reviewers

amccarth
EricWF
curdeius
ldionne

Group Reviewers

Restricted Project

Commits

rGde698ae73444: [libcxx] Convert paths to/from the right narrow code page for narrow strings on…

Summary

On Windows, narrow, char-based paths normally don't use UTF-8; they can use any of many different native code pages, and system functions that take such paths or file names interpret them in that code page.
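
To illustrate why the code page matters, here is a minimal, portable sketch that is not part of the patch. `NarrowEncoding` and `decode_byte` are hypothetical names, and only two byte values are mapped, using the Windows-1252 and code page 437 assignments for those bytes. The same byte names a different character depending on which code page is active, so a narrow path is only meaningful together with its code page.

```cpp
// Hypothetical identifiers for this sketch only; these are not Windows constants.
enum class NarrowEncoding { Cp1252, Cp437 };

// Decode a single narrow byte to a Unicode code point.
// Only the bytes needed for the illustration are mapped; a real table
// has 256 entries per code page. Unmapped bytes yield U+FFFD.
constexpr char32_t decode_byte(unsigned char byte, NarrowEncoding enc) {
  if (byte < 0x80)
    return byte;  // ASCII range is identical in both code pages
  if (enc == NarrowEncoding::Cp1252) {
    if (byte == 0xE5) return 0x00E5;  // LATIN SMALL LETTER A WITH RING ABOVE
    if (byte == 0x86) return 0x2020;  // DAGGER
  } else {
    if (byte == 0xE5) return 0x03C3;  // GREEK SMALL LETTER SIGMA
    if (byte == 0x86) return 0x00E5;  // LATIN SMALL LETTER A WITH RING ABOVE
  }
  return 0xFFFD;  // REPLACEMENT CHARACTER for bytes this sketch does not map
}
```

A file name containing U+00E5 is therefore the byte 0xE5 under Windows-1252 but 0x86 under code page 437, which is why converting with the wrong code page produces a different wide path.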

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mstorsjo created this revision. Nov 10 2020, 1:33 AM

Herald added a project: Restricted Project. Nov 10 2020, 1:33 AM

Herald added 1 blocking reviewer(s): Restricted Project.

mstorsjo requested review of this revision. Nov 10 2020, 1:33 AM

Harbormaster completed remote builds in B78251: Diff 304089. Nov 10 2020, 1:34 AM

mstorsjo retitled this revision from "[libcxx] Convert paths to/from the right narrow code page for narrow strings on windows" to "[5/N] [libcxx] Convert paths to/from the right narrow code page for narrow strings on windows". Nov 10 2020, 1:35 AM

mstorsjo added a parent revision: D91136: [4/N] [libcxx] Reorder the two u8path functions, to make the following diff more readable. NFC..

mstorsjo added a child revision: D91138: [6/N] [libcxx] Handle backslash as path separator on windows.

Updated to define WIN32_LEAN_AND_MEAN and NOMINMAX when including windows.h in operations.cpp.

mstorsjo added a reviewer: amccarth. Nov 13 2020, 3:34 AM

The Windows details look correct, though I have a couple questions on the code page conversions.

libcxx/src/filesystem/operations.cpp
1696	I'm not clear on what AreFileApisANSI does, nor have I found it using code search. In what case would any APIs use the OEM code page when the user's code page is something else? I'm also wondering whether CP_ACP should be CP_THREAD_ACP, which, by default, will be the user's current code page which could be different than the system code page.

mstorsjo added inline comments. Nov 16 2020, 2:52 PM

libcxx/src/filesystem/operations.cpp
1696	See D91133 for a testcase (which passes with MSVC STL) for this and some more - a process can switch between whether the narrow file apis take OEM CP or ACP with SetFileApisToANSI() and SetFileApisToOEM(). The MSVC STL sources don't seem to use CP_THREAD_ACP at least... however they also check for `___lc_codepage_func() == CP_UTF8`, which I guess we also should...

mstorsjo added inline comments. Nov 17 2020, 12:05 AM

libcxx/src/filesystem/operations.cpp
1696	To follow up on the last bit regarding `___lc_codepage_func()`: unlike `SetFileApisToOEM()`, which operates at the kernel32 level and affects how all narrow file names are interpreted, one can also call `setlocale(LC_ALL, ".utf8");`, which makes e.g. `fopen()` work with UTF-8 file names but has no effect on the actual kernel32 -A suffixed file APIs. In any case, I guess we should add the extra check for `___lc_codepage_func()`, like `___lc_codepage_func() == CP_UTF8 ? CP_UTF8 : AreFileApisANSI() ? CP_ACP : CP_OEMCP`. After reading up on CP_THREAD_ACP, I don't think it's relevant. The main point is that if I export a narrow form of a filename from a std::filesystem::path object (or create a path object from a narrow string), I expect the string to be in the same code page that `fopen()` or `CreateFileA()` would accept, and to be interpreted in the same way. As far as I can tell from reading about CP_THREAD_ACP, it only affects other things, not how the kernel32 file APIs map narrow chars to wchar.
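
The selection expression proposed in the comment above can be sketched as a pure function. This is an illustration only, not the committed code: `select_narrow_codepage` and the `kCp*` names are hypothetical, the two parameters stand in for the results of `___lc_codepage_func()` and `AreFileApisANSI()`, and the numeric values restate the Win32 constants `CP_ACP`, `CP_OEMCP` and `CP_UTF8` so that the sketch builds without `windows.h`.

```cpp
// Numeric values of the Win32 code page identifiers, restated here so the
// sketch compiles on any platform.
constexpr unsigned kCpAcp = 0;       // CP_ACP: system ANSI code page
constexpr unsigned kCpOemcp = 1;     // CP_OEMCP: system OEM code page
constexpr unsigned kCpUtf8 = 65001;  // CP_UTF8

// locale_codepage: assumed to be the value ___lc_codepage_func() would return.
// file_apis_ansi:  assumed to be the value AreFileApisANSI() would return.
constexpr unsigned select_narrow_codepage(unsigned locale_codepage,
                                          bool file_apis_ansi) {
  // A UTF-8 CRT locale takes priority; otherwise follow the process-wide
  // ANSI/OEM switch for the narrow file APIs.
  return locale_codepage == kCpUtf8 ? kCpUtf8
         : file_apis_ansi           ? kCpAcp
                                    : kCpOemcp;
}
```

On Windows, the result would be passed as the code page argument to `MultiByteToWideChar` or `WideCharToMultiByte`.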

LGTM.

Excellent. Somehow I missed that AreFileApisANSI was a Win32 function. I must have mistyped it in my searches the other day. This looks good.

mstorsjo added a reviewer: EricWF. Nov 21 2020, 6:17 AM

I'll take your word on what you explained about code pages, as I'm not very familiar with them. Otherwise LGTM with some nits.

libcxx/include/filesystem
1442	Or just use it instead of _Str?
1471
1476	Please add comments to longer #endifs.

Applied the suggested changes.

Set the repository, to allow the CI to run.

Harbormaster completed remote builds in B81713: Diff 310661. Dec 9 2020, 2:40 PM

curdeius added inline comments. Dec 9 2020, 2:44 PM

libcxx/include/filesystem
1444	Is there a patch where you add `__w.reserve(...)`? If not, this one seems like a good candidate. If I'm not mistaken, widening can decrease the number of characters, so that might get tricky (like going through the input twice, once to count the output size, then doing the real conversion), but at least a FIXME note would be great. BTW, as you probably know, `MultiByteToWideChar` returns the required output buffer size without doing the conversion if you pass 0 as `cchWideChar`.

mstorsjo added inline comments. Dec 10 2020, 12:17 AM

libcxx/include/filesystem
1444	I guess I could call `reserve()` here, using the length of the UTF-8 string as the size. Converting from UTF-8 to wchar will in most cases make the string shorter, so the size allocated by `reserve()` should be enough in most conceivable cases. For strings consisting mostly of ASCII chars, the actual difference in length shouldn't be much, so such a rough estimate shouldn't hurt much. Yeah, I know that `MultiByteToWideChar` can count the needed output size; this is used in e.g. `size_t __size = __char_to_wide(__str, nullptr, 0);` further up in the same patch when operating on the native narrow charset. I'd rather not mix `MultiByteToWideChar` with libc++'s `__widen_from_utf8`, especially as the exact length isn't needed beforehand here, since that conversion allocates more as needed.
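
The query-then-convert pattern mentioned in this comment (call once with a null buffer to learn the size, then again to fill it) can be sketched portably as follows. `latin1_to_wide` is a hypothetical stand-in for `__char_to_wide`: it maps ISO-8859-1 bytes one to one onto wide characters so the example runs without Windows APIs. `append_converted` is likewise an illustrative name, not a libc++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for __char_to_wide. With a null or zero-length
// output buffer it only reports how many wide characters are needed;
// otherwise it converts at most outlen characters and returns the count.
std::size_t latin1_to_wide(const std::string& src, wchar_t* out,
                           std::size_t outlen) {
  if (out == nullptr || outlen == 0)
    return src.size();  // size query: one wide char per input byte
  const std::size_t n = src.size() < outlen ? src.size() : outlen;
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<wchar_t>(static_cast<unsigned char>(src[i]));
  return n;
}

// Append the converted form of src to dest using two passes.
void append_converted(std::wstring& dest, const std::string& src) {
  if (src.empty())
    return;
  const std::size_t size = latin1_to_wide(src, nullptr, 0);  // pass 1: count
  const std::size_t pos = dest.size();
  dest.resize(pos + size);  // grow past the existing contents
  const std::size_t written = latin1_to_wide(src, &dest[pos], size);  // pass 2
  assert(written == size);
  (void)written;  // silence unused-variable warnings when NDEBUG is defined
}
```

Resizing to `pos + size` rather than `size` keeps whatever `dest` already held, which matters whenever the destination is not empty on entry.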

curdeius added inline comments. Dec 10 2020, 12:42 AM

libcxx/include/filesystem
1444	I agree that mixing `MultiByteToWideChar` with `__widen_from_utf8` doesn't seem like a good idea. But I'm in favor of what you suggested: just reserve, possibly more than needed.

Added calls to reserve() where suitable.
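
Why the UTF-8 length works as a `reserve()` estimate can be checked with a small portable sketch. `widen_utf8` is a hypothetical function standing in for libc++'s `__widen_from_utf8`; it uses `char16_t` to model the 16-bit `wchar_t` on Windows and assumes well-formed input (it simply stops at a truncated sequence). A UTF-8 sequence of one to three bytes becomes one UTF-16 code unit and a four-byte sequence becomes two, so the output never has more code units than the input has bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal UTF-8 to UTF-16 conversion for illustration; no validation of
// continuation bytes, overlong forms, or surrogate code points.
std::u16string widen_utf8(const std::string& utf8) {
  std::u16string out;
  out.reserve(utf8.size());  // upper bound: never more units than input bytes
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned lead = static_cast<unsigned char>(utf8[i]);
    const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    if (i + len > utf8.size())
      break;  // truncated sequence: stop rather than read out of bounds
    // Payload bits of the lead byte: 7 bits for ASCII, else 5, 4 or 3 bits.
    char32_t cp = len == 1 ? lead : (lead & (0xFFu >> (len + 1)));
    for (std::size_t k = 1; k < len; ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3Fu);
    i += len;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // encode as a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  assert(out.size() <= utf8.size());
  return out;
}
```

For mostly-ASCII paths the estimate is nearly exact; for text dominated by three-byte sequences it over-reserves by up to a factor of three, which is the rough estimate the discussion accepts.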

LGTM except nits!

libcxx/include/filesystem
1464	I don't think `#ifdefing` the `static_assert` message adds anything. It just makes things harder to read IMO. I know this was done the same way elsewhere, but I think we should actually remove the ifdefs in those other places instead.
libcxx/src/filesystem/operations.cpp
21	Please indent includes like:
	#if defined(_LIBCPP_WIN32API)
	# define WIN32_LEAN_AND_MEAN
	# include <stuff>
	#else
	# include <more stuff>
	#endif

This revision now requires changes to proceed. Dec 14 2020, 3:28 PM

ldionne set the repository for this revision to rG LLVM Github Monorepo. Dec 14 2020, 3:28 PM

mstorsjo mentioned this in D91135: [3/N] [libcxx] Make filesystem::path::value_type wchar_t on windows. Dec 14 2020, 11:11 PM

mstorsjo added inline comments. Dec 14 2020, 11:13 PM

libcxx/include/filesystem
1464	Fair enough, I can remove that bit from this patch, and I'll send a separate patch for removing the other ifdefs, and rebase these patches on top of that (where relevant) once that's merged.
libcxx/src/filesystem/operations.cpp
21	Ok, will change.

mstorsjo mentioned this in D93283: [libcxx] Remove ifdefs in the message to static_assert. NFC.. Dec 14 2020, 11:20 PM

Indented preprocessor directives within ifdefs (indented with one space, to match other existing similar cases), removed ifdef in assert message.

Reupload to re-set the repository, to trigger CI

Harbormaster completed remote builds in B82405: Diff 311807. Dec 14 2020, 11:43 PM

Rebased

Harbormaster completed remote builds in B82768: Diff 312438. Dec 17 2020, 6:53 AM

Thanks!

This revision is now accepted and ready to land. Dec 17 2020, 2:04 PM

Closed by commit rGde698ae73444: [libcxx] Convert paths to/from the right narrow code page for narrow strings on… (authored by mstorsjo). Dec 18 2020, 1:25 AM

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rGde698ae73444: [libcxx] Convert paths to/from the right narrow code page for narrow strings on….

Diff 312717

libcxx/include/filesystem

[684 lines not shown]

#if defined(_LIBCPP_WIN32API) #if defined(_LIBCPP_WIN32API)

typedef wstring __path_string; typedef wstring __path_string;

typedef wchar_t __path_value; typedef wchar_t __path_value;

#else #else

typedef string __path_string; typedef string __path_string;

typedef char __path_value; typedef char __path_value;

#endif #endif

#if defined(_LIBCPP_WIN32API)

_LIBCPP_FUNC_VIS

size_t __wide_to_char(const wstring&, char*, size_t);

_LIBCPP_FUNC_VIS

size_t __char_to_wide(const string&, wchar_t*, size_t);

#endif

template <class _ECharT> template <class _ECharT>

struct _PathCVT; struct _PathCVT;

#if !defined(_LIBCPP_HAS_NO_LOCALIZATION) #if !defined(_LIBCPP_HAS_NO_LOCALIZATION)

template <class _ECharT> template <class _ECharT>

struct _PathCVT { struct _PathCVT {

static_assert(__can_convert_char<_ECharT>::value, static_assert(__can_convert_char<_ECharT>::value,

"Char type not convertible"); "Char type not convertible");

[87 lines not shown]

struct _PathCVT<__path_value> {

static void __append_source(__path_string& __dest, _Source const& __s) { static void __append_source(__path_string& __dest, _Source const& __s) {

using _Traits = __is_pathable<_Source>; using _Traits = __is_pathable<_Source>;

__append_range(__dest, _Traits::__range_begin(__s), __append_range(__dest, _Traits::__range_begin(__s),

_Traits::__range_end(__s)); _Traits::__range_end(__s));

} }

}; };

#if defined(_LIBCPP_WIN32API) #if defined(_LIBCPP_WIN32API)

template <>

struct _PathCVT<char> {

static void

__append_string(__path_string& __dest, const basic_string<char> &__str) {

size_t __size = __char_to_wide(__str, nullptr, 0);

size_t __pos = __dest.size();

__dest.resize(__pos + __size);

__char_to_wide(__str, const_cast<__path_value*>(__dest.data()) + __pos, __size);

}

template <class _Iter>

static typename enable_if<__is_exactly_cpp17_input_iterator<_Iter>::value>::type

__append_range(__path_string& __dest, _Iter __b, _Iter __e) {

basic_string<char> __tmp(__b, __e);

__append_string(__dest, __tmp);

}

template <class _Iter>

static typename enable_if<__is_cpp17_forward_iterator<_Iter>::value>::type

__append_range(__path_string& __dest, _Iter __b, _Iter __e) {

basic_string<char> __tmp(__b, __e);

__append_string(__dest, __tmp);

}

template <class _Iter>

static void __append_range(__path_string& __dest, _Iter __b, _NullSentinel) {

const char __sentinel = char{};

basic_string<char> __tmp;

for (; *__b != __sentinel; ++__b)

__tmp.push_back(*__b);

__append_string(__dest, __tmp);

}

template <class _Source>

static void __append_source(__path_string& __dest, _Source const& __s) {

using _Traits = __is_pathable<_Source>;

__append_range(__dest, _Traits::__range_begin(__s),

_Traits::__range_end(__s));

}

};

template <class _ECharT> template <class _ECharT>

struct _PathExport { struct _PathExport {

typedef __narrow_to_utf8<sizeof(wchar_t) * __CHAR_BIT__> _Narrower; typedef __narrow_to_utf8<sizeof(wchar_t) * __CHAR_BIT__> _Narrower;

typedef __widen_from_utf8<sizeof(_ECharT) * __CHAR_BIT__> _Widener; typedef __widen_from_utf8<sizeof(_ECharT) * __CHAR_BIT__> _Widener;

template <class _Str> template <class _Str>

static void __append(_Str& __dest, const __path_string& __src) { static void __append(_Str& __dest, const __path_string& __src) {

string __utf8; string __utf8;

_Narrower()(back_inserter(__utf8), __src.data(), __src.data() + __src.size()); _Narrower()(back_inserter(__utf8), __src.data(), __src.data() + __src.size());

_Widener()(back_inserter(__dest), __utf8.data(), __utf8.data() + __utf8.size()); _Widener()(back_inserter(__dest), __utf8.data(), __utf8.data() + __utf8.size());

} }

}; };

template <> template <>

struct _PathExport<char> {

template <class _Str>

static void __append(_Str& __dest, const __path_string& __src) {

size_t __size = __wide_to_char(__src, nullptr, 0);

size_t __pos = __dest.size();

__dest.resize(__size);

__wide_to_char(__src, const_cast<char*>(__dest.data()) + __pos, __size);

}

};

template <>

struct _PathExport<wchar_t> { struct _PathExport<wchar_t> {

template <class _Str> template <class _Str>

static void __append(_Str& __dest, const __path_string& __src) { static void __append(_Str& __dest, const __path_string& __src) {

__dest.append(__src.begin(), __src.end()); __dest.append(__src.begin(), __src.end());

} }

}; };

template <> template <>

[287 lines not shown]

string(const _Allocator& __a = _Allocator()) const {

_PathExport<_ECharT>::__append(__s, __pn_); _PathExport<_ECharT>::__append(__s, __pn_);

return __s; return __s;

} }

_LIBCPP_INLINE_VISIBILITY _VSTD::string string() const { _LIBCPP_INLINE_VISIBILITY _VSTD::string string() const {

return string<char>(); return string<char>();

} }

_LIBCPP_INLINE_VISIBILITY __u8_string u8string() const {

- return string<__u8_string::value_type>();

+ using _CVT = __narrow_to_utf8<sizeof(wchar_t) * __CHAR_BIT__>;

+ __u8_string __s;

+ __s.reserve(__pn_.size());

+ _CVT()(back_inserter(__s), __pn_.data(), __pn_.data() + __pn_.size());

+ return __s;

}

_LIBCPP_INLINE_VISIBILITY _VSTD::u16string u16string() const { _LIBCPP_INLINE_VISIBILITY _VSTD::u16string u16string() const {

return string<char16_t>(); return string<char16_t>();

} }

_LIBCPP_INLINE_VISIBILITY _VSTD::u32string u32string() const { _LIBCPP_INLINE_VISIBILITY _VSTD::u32string u32string() const {

return string<char32_t>(); return string<char32_t>();

} }

[246 lines not shown]

_LIBCPP_INLINE_VISIBILITY _LIBCPP_DEPRECATED_WITH_CHAR8_T

u8path(_InputIt __f, _InputIt __l) { u8path(_InputIt __f, _InputIt __l) {

static_assert( static_assert(

#ifndef _LIBCPP_NO_HAS_CHAR8_T #ifndef _LIBCPP_NO_HAS_CHAR8_T

is_same<typename __is_pathable<_InputIt>::__char_type, char8_t>::value || is_same<typename __is_pathable<_InputIt>::__char_type, char8_t>::value ||

#endif #endif

is_same<typename __is_pathable<_InputIt>::__char_type, char>::value, is_same<typename __is_pathable<_InputIt>::__char_type, char>::value,

"u8path(Iter, Iter) requires Iter have a value_type of type 'char'" "u8path(Iter, Iter) requires Iter have a value_type of type 'char'"

" or 'char8_t'"); " or 'char8_t'");

#if defined(_LIBCPP_WIN32API)

string __tmp(__f, __l);

using _CVT = __widen_from_utf8<sizeof(wchar_t) * __CHAR_BIT__>;

[Inline comment by curdeius, Not Done]

Suggested edit:

using _CVT = __widen_from_utf8<sizeof(wchar_t) * __CHAR_BIT__>;

- using _Str = basic_string<wchar_t>;

+ using _Str = wstring;

_Str __w;

Or just use it instead of _Str?

_VSTD::wstring __w;

__w.reserve(__tmp.size());

_CVT()(back_inserter(__w), __tmp.data(), __tmp.data() + __tmp.size());

return path(__w);

#else

return path(__f, __l); return path(__f, __l);

#endif /* !_LIBCPP_WIN32API */

}

#if defined(_LIBCPP_WIN32API)

template <class _InputIt>

_LIBCPP_INLINE_VISIBILITY _LIBCPP_DEPRECATED_WITH_CHAR8_T

typename enable_if<__is_pathable<_InputIt>::value, path>::type

u8path(_InputIt __f, _NullSentinel) {

static_assert(

#ifndef _LIBCPP_NO_HAS_CHAR8_T

is_same<typename __is_pathable<_InputIt>::__char_type, char8_t>::value ||

#endif

is_same<typename __is_pathable<_InputIt>::__char_type, char>::value,

"u8path(Iter, Iter) requires Iter have a value_type of type 'char'"

" or 'char8_t'");

string __tmp;

const char __sentinel = char{};

for (; *__f != __sentinel; ++__f)

__tmp.push_back(*__f);

using _CVT = __widen_from_utf8<sizeof(wchar_t) * __CHAR_BIT__>;

_VSTD::wstring __w;

__w.reserve(__tmp.size());

_CVT()(back_inserter(__w), __tmp.data(), __tmp.data() + __tmp.size());

[Inline comment by curdeius, Not Done]

Suggested edit:

using _Str = basic_string<wchar_t>;

- _Str __w;

+ _VSTD::wstring __w;

_CVT()(back_inserter(__w), __tmp.data(), __tmp.data() + __tmp.size());

return path(__w);

} }

#endif /* _LIBCPP_WIN32API */

template <class _Source> template <class _Source>

_LIBCPP_INLINE_VISIBILITY _LIBCPP_DEPRECATED_WITH_CHAR8_T _LIBCPP_INLINE_VISIBILITY _LIBCPP_DEPRECATED_WITH_CHAR8_T

typename enable_if<__is_pathable<_Source>::value, path>::type typename enable_if<__is_pathable<_Source>::value, path>::type

u8path(const _Source& __s) { u8path(const _Source& __s) {

static_assert( static_assert(

#ifndef _LIBCPP_NO_HAS_CHAR8_T #ifndef _LIBCPP_NO_HAS_CHAR8_T

is_same<typename __is_pathable<_Source>::__char_type, char8_t>::value || is_same<typename __is_pathable<_Source>::__char_type, char8_t>::value ||

#endif #endif

is_same<typename __is_pathable<_Source>::__char_type, char>::value, is_same<typename __is_pathable<_Source>::__char_type, char>::value,

"u8path(Source const&) requires Source have a character type of type " "u8path(Source const&) requires Source have a character type of type "

"'char' or 'char8_t'"); "'char' or 'char8_t'");

#if defined(_LIBCPP_WIN32API)

using _Traits = __is_pathable<_Source>;

return u8path(__unwrap_iter(_Traits::__range_begin(__s)), __unwrap_iter(_Traits::__range_end(__s)));

#else

return path(__s); return path(__s);

#endif

} }

class _LIBCPP_TYPE_VIS path::iterator { class _LIBCPP_TYPE_VIS path::iterator {

public: public:

enum _ParserState : unsigned char { enum _ParserState : unsigned char {

_Singular, _Singular,

_BeforeBegin, _BeforeBegin,

_InRootName, _InRootName,

[1,423 lines not shown]

libcxx/src/filesystem/filesystem_common.h

	[120 lines not shown]
	template <class T>			template <class T>
	T error_value();			T error_value();
	template <>			template <>
	_LIBCPP_CONSTEXPR_AFTER_CXX11 void error_value<void>() {}			_LIBCPP_CONSTEXPR_AFTER_CXX11 void error_value<void>() {}
	template <>			template <>
	bool error_value<bool>() {			bool error_value<bool>() {
	return false;			return false;
	}			}
				#if __SIZEOF_SIZE_T__ != __SIZEOF_LONG_LONG__
				template <>
				size_t error_value<size_t>() {
				return size_t(-1);
				}
				#endif
	template <>			template <>
	uintmax_t error_value<uintmax_t>() {			uintmax_t error_value<uintmax_t>() {
	return uintmax_t(-1);			return uintmax_t(-1);
	}			}
	template <>			template <>
	_LIBCPP_CONSTEXPR_AFTER_CXX11 file_time_type error_value<file_time_type>() {			_LIBCPP_CONSTEXPR_AFTER_CXX11 file_time_type error_value<file_time_type>() {
	return file_time_type::min();			return file_time_type::min();
	}			}
	[319 lines not shown]

libcxx/src/filesystem/operations.cpp

[11 lines not shown]
#include "string_view"		#include "string_view"
#include "type_traits"		#include "type_traits"
#include "vector"		#include "vector"
#include "cstdlib"		#include "cstdlib"
#include "climits"		#include "climits"

#include "filesystem_common.h"		#include "filesystem_common.h"

		#if defined(_LIBCPP_WIN32API)
		# define WIN32_LEAN_AND_MEAN
		# define NOMINMAX
		# include <windows.h>
		#else
#include <unistd.h>		# include <unistd.h>
#include <sys/stat.h>		# include <sys/stat.h>
#include <sys/statvfs.h>		# include <sys/statvfs.h>
		#endif
#include <time.h>		#include <time.h>
#include <fcntl.h> /* values for fchmodat */		#include <fcntl.h> /* values for fchmodat */

#if __has_include(<sys/sendfile.h>)		#if __has_include(<sys/sendfile.h>)
# include <sys/sendfile.h>		# include <sys/sendfile.h>
# define _LIBCPP_FILESYSTEM_USE_SENDFILE		# define _LIBCPP_FILESYSTEM_USE_SENDFILE
#elif defined(__APPLE__) \|\| __has_include(<copyfile.h>)		#elif defined(__APPLE__) \|\| __has_include(<copyfile.h>)
# include <copyfile.h>		# include <copyfile.h>
[1,644 lines not shown]
path::iterator& path::iterator::__decrement() {
PathParser PP(__path_ptr_->native(), __entry_, __state_);		PathParser PP(__path_ptr_->native(), __entry_, __state_);
--PP;		--PP;
__state_ = static_cast<_ParserState>(PP.State);		__state_ = static_cast<_ParserState>(PP.State);
__entry_ = PP.RawEntry;		__entry_ = PP.RawEntry;
__stashed_elem_.__assign_view(*PP);		__stashed_elem_.__assign_view(*PP);
return *this;		return *this;
}		}

		#if defined(_LIBCPP_WIN32API)
		////////////////////////////////////////////////////////////////////////////
		// Windows path conversions
		size_t __wide_to_char(const wstring &str, char *out, size_t outlen) {
		if (str.empty())
		return 0;
		ErrorHandler<size_t> err("__wide_to_char", nullptr);
		UINT codepage = AreFileApisANSI() ? CP_ACP : CP_OEMCP;
		BOOL used_default = FALSE;
		int ret = WideCharToMultiByte(codepage, 0, str.data(), str.size(), out,
		outlen, nullptr, &used_default);
		if (ret <= 0 \|\| used_default)
		return err.report(errc::illegal_byte_sequence);
		return ret;
		}

		size_t __char_to_wide(const string &str, wchar_t *out, size_t outlen) {
		if (str.empty())
		return 0;
		ErrorHandler<size_t> err("__char_to_wide", nullptr);
		UINT codepage = AreFileApisANSI() ? CP_ACP : CP_OEMCP;
		int ret = MultiByteToWideChar(codepage, MB_ERR_INVALID_CHARS, str.data(),
		str.size(), out, outlen);
		if (ret <= 0)
		return err.report(errc::illegal_byte_sequence);
		return ret;
		}
		#endif


///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////
// directory entry definitions		// directory entry definitions
///////////////////////////////////////////////////////////////////////////////		///////////////////////////////////////////////////////////////////////////////

#ifndef _LIBCPP_WIN32API		#ifndef _LIBCPP_WIN32API
error_code directory_entry::__do_refresh() noexcept {		error_code directory_entry::__do_refresh() noexcept {
__data_.__reset();		__data_.__reset();
error_code failure_ec;		error_code failure_ec;
[94 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[5/N] [libcxx] Convert paths to/from the right narrow code page for narrow strings on windows
ClosedPublic
