This is an archive of the discontinued LLVM Phabricator instance.


Differential D64818

[libc++] Implement missing filesystem::path constructors with locale
Needs Revision · Public

Authored by ldionne on Jul 16 2019, 1:42 PM.

Details

Reviewers

mclow.lists
EricWF
jguegant
tahonermann

Group Reviewers

Restricted Project

Summary

This patch provides the two missing constructors for std::filesystem::path
using an instance of std::locale to convert the source.

Given that the source must first be converted using the
codecvt<wchar_t, char, mbstate_t> facet of the std::locale,
these two constructors:

- Can only take a char source.
- Need to take a different code path than the two existing constructors that take a source.

After converting to a wchar_t source using the codecvt, we need to convert
a second time to store the result. The standard is unclear on how the
second conversion should happen:

> Otherwise a conversion is performed using the codecvt<wchar_t, char, mbstate_t>
> facet of loc, and then a second conversion to the current ordinary encoding.

I interpreted this to mean that the second conversion can be performed however we want,
so I can reuse the facilities from the two constructors that accept
any kind of source.
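
For readers who have not used these overloads, here is a minimal sketch of what they look like from the caller's side. It assumes a standard library that already implements both constructors, and it sticks to `std::locale::classic()` with ASCII-only input so the result does not depend on the system locale. The helper names `make_path_from_string` and `make_path_from_range` are hypothetical and only exist for this illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <locale>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: exercises path(const Source&, const locale&, format).
fs::path make_path_from_string(const std::string& source, const std::locale& loc) {
  return fs::path(source, loc);
}

// Hypothetical helper: exercises path(InputIt, InputIt, const locale&, format).
fs::path make_path_from_range(const std::string& source, const std::locale& loc) {
  return fs::path(source.begin(), source.end(), loc);
}
```

Both helpers only accept char-based input, which mirrors the restriction stated in the summary: the source has to go through the codecvt<wchar_t, char, mbstate_t> facet first.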

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jguegant created this revision. Jul 16 2019, 1:42 PM

Herald added subscribers: libcxx-commits, dexonsmith, christof. Jul 16 2019, 1:42 PM

jguegant edited the summary of this revision. Jul 16 2019, 1:43 PM

Note: this is my first contribution to libc++. I tried to follow the style used in the rest of the files, but I may have missed some hidden coding style rules: some files use four spaces, some two? Which SFINAE pattern (return type, class parameter...) is to be preferred? Free template functions vs. template classes with static member functions?

Quuxplusone added a subscriber: Quuxplusone. Jul 17 2019, 8:14 AM

Quuxplusone added inline comments.

libcxx/include/filesystem
809 ↗	(On Diff #210158)	`char{}` is a strange way to spell `'\0'`, and since `sentinel` is used only on line 811, I think you don't need `sentinel` at all.

jguegant marked an inline comment as done. Jul 17 2019, 11:19 AM

jguegant added inline comments.

libcxx/include/filesystem
741 ↗	(On Diff #210158)	Not sure what the expected behavior is if the locale does not have this facet.
809 ↗	(On Diff #210158)	Sure, that makes sense now! It used to be a generic CharT.

Removed the unnecessary variable __sentinel.

Please re-upload the diff with full context (e.g. git diff -U999 blah blah).

BTW, I'm giving a mostly-superficial review, and only because we talked on Slack the other day. I have no opinion as to whether this patch is wanted by libc++, no opinion as to whether it's correct from a charset-wonk POV, and no ability to land it. You'll still have to get interest from one of the three listed reviewers in order to make concrete progress here. :)

libcxx/include/__locale
1367	Peanut gallery says: I'm not 100% sure of the rules about `const`. I observe that this line seems to be okay even in `-pedantic` mode, but intuitively I would expect `__buf` to be a VLA. IMHO it would be clearer to make `__sz` a `constexpr int` instead of just a `const int` — or even my generally preferred formulation, _InternT __buf[32]; constexpr ptrdiff_t __sz = sizeof(__buf) / sizeof(__buf[0]);
1374	`__s` is of user-provided type. Depending on level of library-maintainer paranoia, you might want to make this `++__p, void(), ++__s`, or pull `++__s` down into the body of the for-loop (and add curly braces). The current formulation will call a user-defined `operator,(const char*, _OutputIterator)` if one exists. (And will perform ADL to find out if one exists, regardless.)
libcxx/include/filesystem
749 ↗	(On Diff #210375)	`class = _EnableIf<!is_constructible<const char*, _InputIt>::value>` But also, how is that constructibility relevant? Does the Standard say something like "If the input iterator is explicitly convertible to `const char*` then do X, else do Y"? Furthermore, this codepath is going to do heap-allocation with `string __s(__b, __e)`. Can we avoid the heap-allocation somehow?
753 ↗	(On Diff #210375)	I'm worried about the way you use overloading on the name `__widen_from_char_iter_pair`. If some well-intentioned maintainer removed the `const` on line 752, then line 753 would turn into an infinite recursion. I don't see any reason to use overloading here. Maybe change `__widen_from_char_iter_pair` on lines 739, 753, 766, 776, 793, and 811 to `__widen_from_char_pointer_pair`? That would leave only line 889 as a potential source of extra heap-allocations...
792 ↗	(On Diff #210375)	This could probably be more simply expressed as string_view __sv(__b); return __widen_from_char_iter_pair(__sv.data(), __sv.data() + __sv.size(), __loc);
838 ↗	(On Diff #210375)	template <class _SourceOrIter> using _EnableIfWidenableFromCharSource = _EnableIf<__is_widenable_from_char_source<_SourceOrIter>::value>; You don't seem to use the `_Tp` parameter for anything at the moment.
libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp
33	Should `std::locale` be passed as `const std::locale&` instead? Does it matter? (I don't know.)
98	Why is this commented out?
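
The two points raised above on `__locale` lines 1367 and 1374 can be shown in isolation. This is a sketch under stated assumptions rather than the patch's code: `CountingOut`, `comma_calls`, `copy_unguarded`, `copy_guarded` and `buffer_bytes` are hypothetical names, and the overloaded `operator,` stands in for whatever a user-provided output iterator type might drag in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical output iterator; its namespace also declares an overloaded
// comma operator, which is what the guarded loop protects against.
struct CountingOut {
  std::string* dest;
  CountingOut& operator*() { return *this; }
  CountingOut& operator=(char c) { dest->push_back(c); return *this; }
  CountingOut& operator++() { return *this; }
};

int comma_calls = 0;  // how many times the user-defined operator, ran

template <class T>
void operator,(const T&, const CountingOut&) { ++comma_calls; }

// Unguarded: `++p, ++out` resolves to the user-defined operator,.
void copy_unguarded(const char* first, const char* last, CountingOut out) {
  for (const char* p = first; p < last; ++p, ++out)
    *out = *p;
}

// Guarded: the `void()` operand forces the built-in comma operator.
void copy_guarded(const char* first, const char* last, CountingOut out) {
  for (const char* p = first; p < last; ++p, void(), ++out)
    *out = *p;
}

// A const-qualified int with a constant initializer is usable as an array
// bound, so `buf` is an ordinary array and not a VLA.
std::size_t buffer_bytes() {
  const int sz = 32;
  char buf[sz];
  static_assert(sizeof(buf) == 32, "sz is usable in constant expressions");
  return sizeof(buf);
}
```

Both loops copy the same characters; the only difference is whether the user-defined comma operator gets a chance to run.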

Thanks a lot @Quuxplusone! I am aware that you cannot make the final decision on what gets in, but these are really helpful comments.
I will fix some of your remarks and upload a better diff (with proper context to get a better picture) afterwards.

Any way to gently attract the three reviewers here?

libcxx/include/__locale
1367	This function is a dumb generalisation of the functions at 1389 and 1426, which had this pattern. Correct me if I am wrong, but this will work because C arrays have this syntax: D1 [ constant-expression(opt) ] attribute-specifier-seq(opt), where the constant-expression shall be a converted constant expression of type std::size_t ([expr.const]). Here const int should work thanks to a special rule for const-qualified integrals: a variable is usable in constant expressions after its initializing declaration is encountered if it is a constexpr variable, or it is of reference type or of const-qualified integral or enumeration type, and its initializer is a constant initializer.
1374	Good idea! Now that this function accepts more than one type, it seems reasonable to be paranoid.
libcxx/include/filesystem
749 ↗	(On Diff #210375)	The idea here is to give priority to the overload taking `const char*`, which effectively routes both `char*` and `const char*` to the first overload on line 739. Obviously, my intent is not really clear here; I wonder how I could express that better if I continue in that direction. Maybe having 3 overloads: `const char*`, `char*` and generic? A more descriptive trait name?
753 ↗	(On Diff #210375)	I believe that this would not create an infinite recursion: the enable_if sends both char* and const char* to the first overload on line 739. I am not sure if there is a smart way to avoid allocation here. Maybe if I call `codecvt::in` on every character from the iterator. But wouldn't that be slow? What happens if your code point is made of 3-4 chars instead of one? Should I then copy some number of characters into a buffer allocated on the stack and process the iterator chunk by chunk? I have a similar issue on line 808, and if anyone has a clever solution, I am all for it! I had a look at how other constructors for path already in place (like line 681 or 692) do it, and they already heap-allocate on even more code paths: template <class _Iter> static void __append_range(string& __dest, _Iter __b, _Iter __e) { static_assert(!is_same<_Iter, _ECharT*>::value, "Call const overload"); if (__b == __e) return; basic_string<_ECharT> __tmp(__b, __e); _Narrower()(back_inserter(__dest), __tmp.data(), __tmp.data() + __tmp.length()); } If we go in your direction, 889 would now heap-allocate in all scenarios, whereas now the 889 overload supplied with const char* and char* avoids allocating a string.
792 ↗	(On Diff #210375)	Neat! Will do that.
838 ↗	(On Diff #210375)	Indeed, I will remove it!
libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp
33	I was thinking that locale is a pretty cheap type (a sort of reference counted pointer), but I might be wrong. Taking it by reference cannot make it worse ;)
98	This should not be here. Thanks for spotting it!
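
The chunk-by-chunk idea floated in the reply on line 753 could look roughly like the sketch below. It is not what the patch does; `widen_in_chunks` is a hypothetical helper that stages bytes from any input iterator in a fixed-size stack buffer, feeds them to `codecvt::in`, and carries unconsumed bytes (for example a multibyte sequence split across a chunk boundary) over to the next round. The result string still allocates, so this only removes the intermediate narrow `string`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of the patch.
template <class InputIt>
std::wstring widen_in_chunks(InputIt first, InputIt last, const std::locale& loc) {
  typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
  const Cvt& cvt = std::use_facet<Cvt>(loc);
  std::mbstate_t state = std::mbstate_t();

  const std::size_t chunk = 32;
  char narrow[chunk];
  wchar_t wide[chunk];
  std::size_t pending = 0;  // bytes carried over from the previous round
  std::wstring result;

  while (first != last || pending != 0) {
    // Top up the staging buffer from the iterator.
    while (pending < chunk && first != last) {
      narrow[pending++] = *first;
      ++first;
    }

    const char* narrow_next = narrow;
    wchar_t* wide_next = wide;
    std::codecvt_base::result r =
        cvt.in(state, narrow, narrow + pending, narrow_next, wide, wide + chunk, wide_next);

    // Consuming nothing from a full (or final) buffer means the input cannot
    // be converted, e.g. a truncated sequence at the very end.
    if (r == std::codecvt_base::error || narrow_next == narrow)
      throw std::runtime_error("cannot convert character sequence");

    result.append(wide, wide_next);

    // Move the unconsumed tail to the front for the next round.
    pending = static_cast<std::size_t>((narrow + pending) - narrow_next);
    std::memmove(narrow, narrow_next, pending);
  }
  return result;
}
```

The trade-off is one `codecvt::in` call per 32 input bytes instead of one per string, in exchange for not materializing the whole input in a temporary `string` first.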

Fixed a few remarks mentioned by @Quuxplusone.

ldionne added a reviewer: Restricted Project. Nov 2 2020, 2:38 PM

[Github PR transition cleanup]

Commandeering to finish.

Herald added a project: Restricted Project. Sep 19 2023, 11:31 AM

Rebase. Simplify the implementation a bit. Remove some tests for SFINAE-friendliness that are not necessary according to my reading of the spec.

Herald added a project: Restricted Project. Sep 19 2023, 11:32 AM

Harbormaster completed remote builds in B257418: Diff 557061. Sep 19 2023, 1:50 PM

Add missing _LIBCPP_HIDE_FROM_ABI

Harbormaster completed remote builds in B257588: Diff 557333. Sep 25 2023, 3:58 PM

Fix C++03 build.

Harbormaster completed remote builds in B257603: Diff 557355. Sep 26 2023, 6:50 PM

Make sure we initialize mbstate_t to avoid uninitialized read.

Harbormaster completed remote builds in B257768: Diff 557613. Oct 5 2023, 5:25 PM

Try to fix the CI

Harbormaster completed remote builds in B257933: Diff 557881. Oct 25 2023, 1:44 PM

Remove invalid test -- we're not allowed to construct a path from a wchar_t* using the locale constructor, only char* is allowed.

Harbormaster completed remote builds in B257941: Diff 557895. Oct 26 2023, 1:25 PM

I think this is looking good. Before I bless it though, I want to make sure the intent is clear.

The new locale sensitive constructors will use the codecvt<wchar_t, char, ...> facet from the specified locale to convert the provided char-based source (locale dependent encoding) to a wchar_t-based representation (UTF-16 or UTF-32 depending on platform). If on Windows, done. If on POSIX, the wchar_t-based representation is then converted to UTF-8 in char-based storage.

That seems like the right semantics; or at least the best that we can do given that filenames don't have strongly associated encodings. I haven't managed to validate for myself yet; is this consistent with the non-locale based constructors that take a char-based source? I presume those are assumed to be UTF-8 on POSIX regardless of whatever the current locale settings are.

If it isn't too difficult to do, it would be good to use a real locale (e.g., one that uses ISO-8859-1 encoding) and validate the conversion to UTF-8 (POSIX) or UTF-16 (Windows) in the test with some characters that require re-encoding. The existing test validates that all the characters are changed to 'o', but doesn't exercise a change to encoded length or that the characters didn't undergo a further conversion (exercising non-ASCII characters would help with that).
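
One way to get such a test without depending on which named locales are installed is a small custom facet. The sketch below is an assumption-laden illustration, not the test from the patch: `Latin1Codecvt` and `path_from_latin1` are hypothetical names, and the facet simply maps each input byte to the code point of the same value, which is how ISO-8859-1 is defined. It also assumes a standard library that implements the locale-taking constructor.

```cpp
#include <cassert>
#include <cwchar>
#include <filesystem>
#include <locale>
#include <string>

// Hypothetical facet: treats every input byte as an ISO-8859-1 character.
class Latin1Codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
  result do_in(std::mbstate_t&, const char* from, const char* from_end, const char*& from_next,
               wchar_t* to, wchar_t* to_end, wchar_t*& to_next) const override {
    from_next = from;
    to_next = to;
    while (from_next != from_end && to_next != to_end)
      *to_next++ = static_cast<wchar_t>(static_cast<unsigned char>(*from_next++));
    return from_next == from_end ? ok : partial;
  }
  int do_encoding() const noexcept override { return 1; }
  bool do_always_noconv() const noexcept override { return false; }
  int do_max_length() const noexcept override { return 1; }
};

// Hypothetical helper: builds a path from Latin-1 bytes via the locale constructor.
std::filesystem::path path_from_latin1(const std::string& bytes) {
  // The locale takes ownership of the facet and replaces the default
  // codecvt<wchar_t, char, mbstate_t> with it.
  std::locale loc(std::locale::classic(), new Latin1Codecvt);
  return std::filesystem::path(bytes, loc);
}
```

A single byte 0xE9 (é in ISO-8859-1) needs two bytes in UTF-8, so a check on the result exercises the change in encoded length that the all-'o' test does not; comparing `u16string()` keeps the expectation the same on POSIX and Windows.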

No real issue from my side. I'll leave the approval to Tom.

libcxx/include/__filesystem/path.h
552
557

tahonermann requested changes to this revision. Nov 1 2023, 1:17 PM

tahonermann added inline comments.

libcxx/include/__locale
1566–1579	This can end up in an infinite loop when the narrow input string ends in a partial code unit sequence. In that case, `partial` will be returned, but `from_next` (aka `__nn`) will be left pointing to the beginning of the partial sequence such that the next iteration will encounter the same partial result. See https://godbolt.org/z/3q5xaG5dW for an example. The test there reliably exhibits an infinite loop for `std::codecvt<char32_t, char, std::mbstate_t>`. For the `std::codecvt<wchar_t, char, std::mbstate_t>` facet, the infinite loop is avoided because the converter packs partial state into `std::mbstate_t` and then (erroneously) returns `ok` instead of `partial` (erroneously because no character is ever translated, but no error is ever issued either). These appear to be existing bugs, given that this code was just factored out from other locations.
libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp
129	I think additional tests should be added here to exercise various edge cases like the truncated partial code unit sequence in the godbolt link I added in another comment. We should also exercise some MxN sequence conversions (e.g., UTF-8 to UTF-16). That can be done by calling `.u16string()` on the resulting `path` object and comparing what it returns to expectations.
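
The truncated-sequence case described in the comment on `__locale` lines 1566–1579 can be probed with a single `codecvt::in` call. The sketch below is only a probe, not a fix, and `WidenStep` and `widen_once` are hypothetical names. It uses the `std::codecvt<char32_t, char, std::mbstate_t>` facet mentioned in the comment (deprecated since C++20, so a compiler may warn) and reports how far `from_next` advanced.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical record of what one codecvt::in call did.
struct WidenStep {
  std::codecvt_base::result status;
  std::size_t consumed;  // bytes the facet advanced past
  std::size_t produced;  // code points written
};

// Hypothetical probe: runs one UTF-8 to UTF-32 conversion step over `bytes`.
WidenStep widen_once(const std::string& bytes) {
  typedef std::codecvt<char32_t, char, std::mbstate_t> Cvt;
  const Cvt& cvt = std::use_facet<Cvt>(std::locale::classic());
  std::mbstate_t state = std::mbstate_t();

  char32_t out[8];
  const char* from = bytes.data();
  const char* from_next = from;
  char32_t* to_next = out;
  std::codecvt_base::result r =
      cvt.in(state, from, from + bytes.size(), from_next, out, out + 8, to_next);

  WidenStep step = {r, static_cast<std::size_t>(from_next - from),
                    static_cast<std::size_t>(to_next - out)};
  return step;
}
```

A conversion loop that only stops on `error` or on the end of input will spin if a step reports `partial` with `consumed == 0` and no more input is available, so an edge-case test can assert on exactly that combination.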

This revision now requires changes to proceed. Nov 1 2023, 1:17 PM

Revision Contents

Path	Size
libcxx/include/__filesystem/path.h	72 lines
libcxx/include/__locale	64 lines
libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source.pass.cpp	2 lines
libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp	140 lines

Diff 557895

libcxx/include/__filesystem/path.h

[13 lines not shown]

#include <__algorithm/replace_copy.h>

#include <__availability>

#include <__config>

#include <__functional/hash.h>

#include <__functional/unary_function.h>

#include <__fwd/hash.h>

#include <__iterator/back_insert_iterator.h>

#include <__iterator/iterator_traits.h>

#include <__memory/pointer_traits.h>

#include <__type_traits/decay.h>

#include <__type_traits/is_pointer.h>

#include <__type_traits/remove_const.h>

#include <__type_traits/remove_pointer.h>

#include <cstddef>

#include <string>

#include <string_view>

[408 lines not shown]

struct _PathExport<char8_t> {

_LIBCPP_HIDE_FROM_ABI

static void __append(_Str& __dest, const __path_string& __src) {

_Narrower()(back_inserter(__dest), __src.data(), __src.data() + __src.size());

}

};

#endif /* !_LIBCPP_HAS_NO_CHAR8_T */

#endif /* _LIBCPP_WIN32API */

# if !defined(_LIBCPP_HAS_NO_LOCALIZATION) && !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)

template <class _Iterator, __enable_if_t<__libcpp_is_contiguous_iterator<_Iterator>::value, int> = 0>

_LIBCPP_HIDE_FROM_ABI wstring __widen_char_source(_Iterator __first, _Iterator __last, const locale& __loc) {

static_assert(__is_pathable_iter<_Iterator>::value, "this function requires a pathable iterator");

wstring __r;

if (!has_facet<codecvt<wchar_t, char, mbstate_t>>(__loc))

return __r;

__r.reserve(__last - __first);

std::__widen_using_codecvt(

use_facet<codecvt<wchar_t, char, mbstate_t>>(__loc),

back_inserter(__r),

std::__to_address(__first),

std::__to_address(__last));

return __r;

}

template <class _Iterator, __enable_if_t<!__libcpp_is_contiguous_iterator<_Iterator>::value, int> = 0>

_LIBCPP_HIDE_FROM_ABI wstring __widen_char_source(_Iterator __first, _Iterator __last, const locale& __loc) {

static_assert(__is_pathable_iter<_Iterator>::value, "this function requires a pathable iterator");

string __tmp(__first, __last);

return filesystem::__widen_char_source(__tmp.begin(), __tmp.end(), __loc);

}

template <class _Traits>

_LIBCPP_HIDE_FROM_ABI wstring __widen_char_source(basic_string_view<char, _Traits> const& __sv, const locale& __loc) {

return filesystem::__widen_char_source(__sv.begin(), __sv.end(), __loc);

}

template <class _Traits, class _Alloc>

_LIBCPP_HIDE_FROM_ABI wstring __widen_char_source(basic_string<char, _Traits, _Alloc> const& __s, const locale& __loc) {

return filesystem::__widen_char_source(__s.begin(), __s.end(), __loc);

}

template <class _Iterator, __enable_if_t<__libcpp_is_contiguous_iterator<_Iterator>::value, int> = 0>

_LIBCPP_HIDE_FROM_ABI wstring __widen_char_source(_Iterator __it, const locale& __loc) {

static_assert(__is_pathable_iter<_Iterator>::value, "this function requires a pathable iterator");

auto __len = char_traits<char>::length(std::__to_address(__it));

return filesystem::__widen_char_source(std::__to_address(__it), std::__to_address(__it) + __len, __loc);

}

template <class _Iterator, __enable_if_t<!__libcpp_is_contiguous_iterator<_Iterator>::value, int> = 0>

_LIBCPP_HIDE_FROM_ABI wstring __widen_char_source(_Iterator __it, const locale& __loc) {

static_assert(__is_pathable_iter<_Iterator>::value, "this function requires a pathable iterator");

string __s;

for (char __c = *__it; __c != '\0'; ++__it, (void)(__c = *__it))

__s.push_back(__c);

return filesystem::__widen_char_source(__s.begin(), __s.end(), __loc);

}

# endif // !defined(_LIBCPP_HAS_NO_LOCALIZATION) && !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)

class _LIBCPP_EXPORTED_FROM_ABI path {

template <class _SourceOrIter, class _Tp = path&>

using _EnableIfPathable = __enable_if_t<__is_pathable<_SourceOrIter>::value, _Tp>;

template <class _Tp>

using _SourceChar = typename __is_pathable<_Tp>::__char_type;

template <class _Tp>

[34 lines not shown]

#endif

template <class _InputIt>

_LIBCPP_HIDE_FROM_ABI

path(_InputIt __first, _InputIt __last, format = format::auto_format) {

typedef typename iterator_traits<_InputIt>::value_type _ItVal;

_PathCVT<_ItVal>::__append_range(__pn_, __first, __last);

}

- #if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
- // TODO Implement locale conversions.
- template <class _Source, class = _EnableIfPathable<_Source, void> >
- path(const _Source& __src, const locale& __loc, format = format::auto_format);
- template <class _InputIt>
- path(_InputIt __first, _InputIt _last, const locale& __loc,
- format = format::auto_format);
- #endif

+ # if !defined(_LIBCPP_HAS_NO_LOCALIZATION) && !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)
+ template <class _Source, _EnableIfPathable<_Source, int> = 0>
+ _LIBCPP_HIDE_FROM_ABI path(const _Source& __src, const locale& __loc, format = format::auto_format) {
+ _SourceCVT<std::wstring>::__append_source(__pn_, filesystem::__widen_char_source(__src, __loc));
+ }
+ template <class _InputIt>
+ _LIBCPP_HIDE_FROM_ABI path(_InputIt __first, _InputIt __last, const locale& __loc, format = format::auto_format) {
+ _SourceCVT<std::wstring>::__append_source(__pn_, filesystem::__widen_char_source(__first, __last, __loc));
+ }
+ # endif

Mordante (inline suggestion on the Source constructor, Not Done):

- _SourceCVT<std::wstring>::__append_source(__pn_, filesystem::__widen_char_source(__src, __loc));
+ _SourceCVT<wstring>::__append_source(__pn_, filesystem::__widen_char_source(__src, __loc));

Mordante (inline suggestion on the iterator-pair constructor, Not Done):

- _SourceCVT<std::wstring>::__append_source(__pn_, filesystem::__widen_char_source(__first, __last, __loc));
+ _SourceCVT<wstring>::__append_source(__pn_, filesystem::__widen_char_source(__first, __last, __loc));

_LIBCPP_HIDE_FROM_ABI

~path() = default;

// assignments

_LIBCPP_HIDE_FROM_ABI

path& operator=(const path& __p) {

__pn_ = __p.__pn_;

[588 lines not shown]

libcxx/include/__locale

[1,358 lines not shown]	public:
_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
result out(state_type& __st,		result out(state_type& __st,
const intern_type* __frm, const intern_type* __frm_end, const intern_type*& __frm_nxt,		const intern_type* __frm, const intern_type* __frm_end, const intern_type*& __frm_nxt,
extern_type* __to, extern_type* __to_end, extern_type*& __to_nxt) const		extern_type* __to, extern_type* __to_end, extern_type*& __to_nxt) const
{		{
return do_out(__st, __frm, __frm_end, __frm_nxt, __to, __to_end, __to_nxt);		return do_out(__st, __frm, __frm_end, __frm_nxt, __to, __to_end, __to_nxt);
}		}

_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
result unshift(state_type& __st,		result unshift(state_type& __st,
extern_type* __to, extern_type* __to_end, extern_type*& __to_nxt) const		extern_type* __to, extern_type* __to_end, extern_type*& __to_nxt) const
{		{
return do_unshift(__st, __to, __to_end, __to_nxt);		return do_unshift(__st, __to, __to_end, __to_nxt);
}		}

_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
result in(state_type& __st,		result in(state_type& __st,
const extern_type* __frm, const extern_type* __frm_end, const extern_type*& __frm_nxt,		const extern_type* __frm, const extern_type* __frm_end, const extern_type*& __frm_nxt,
intern_type* __to, intern_type* __to_end, intern_type*& __to_nxt) const		intern_type* __to, intern_type* __to_end, intern_type*& __to_nxt) const
{		{
return do_in(__st, __frm, __frm_end, __frm_nxt, __to, __to_end, __to_nxt);		return do_in(__st, __frm, __frm_end, __frm_nxt, __to, __to_end, __to_nxt);
}		}

_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
[114 lines not shown]	_LIBCPP_SUPPRESS_DEPRECATED_POP
~__narrow_to_utf8() override;		~__narrow_to_utf8() override;

template <class _OutputIterator, class _CharT>		template <class _OutputIterator, class _CharT>
_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
_OutputIterator		_OutputIterator
operator()(_OutputIterator __s, const _CharT* __wb, const _CharT* __we) const		operator()(_OutputIterator __s, const _CharT* __wb, const _CharT* __we) const
{		{
result __r = ok;		result __r = ok;
mbstate_t __mb;		mbstate_t __mb = mbstate_t();
while (__wb < __we && __r != error)		while (__wb < __we && __r != error)
{		{
const int __sz = 32;		const int __sz = 32;
char __buf[__sz];		char __buf[__sz];
char* __bn;		char* __bn;
const char16_t* __wn = (const char16_t*)__wb;		const char16_t* __wn = (const char16_t*)__wb;
__r = do_out(__mb, (const char16_t*)__wb, (const char16_t*)__we, __wn,		__r = do_out(__mb, (const char16_t*)__wb, (const char16_t*)__we, __wn,
__buf, __buf+__sz, __bn);		__buf, __buf+__sz, __bn);
[19 lines not shown]	_LIBCPP_SUPPRESS_DEPRECATED_POP
~__narrow_to_utf8() override;		~__narrow_to_utf8() override;

template <class _OutputIterator, class _CharT>		template <class _OutputIterator, class _CharT>
_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
_OutputIterator		_OutputIterator
operator()(_OutputIterator __s, const _CharT* __wb, const _CharT* __we) const		operator()(_OutputIterator __s, const _CharT* __wb, const _CharT* __we) const
{		{
result __r = ok;		result __r = ok;
mbstate_t __mb;		mbstate_t __mb = mbstate_t();
while (__wb < __we && __r != error)		while (__wb < __we && __r != error)
{		{
const int __sz = 32;		const int __sz = 32;
char __buf[__sz];		char __buf[__sz];
char* __bn;		char* __bn;
const char32_t* __wn = (const char32_t*)__wb;		const char32_t* __wn = (const char32_t*)__wb;
__r = do_out(__mb, (const char32_t*)__wb, (const char32_t*)__we, __wn,		__r = do_out(__mb, (const char32_t*)__wb, (const char32_t*)__we, __wn,
__buf, __buf+__sz, __bn);		__buf, __buf+__sz, __bn);
if (__r == codecvt_base::error || __wn == (const char32_t*)__wb)		if (__r == codecvt_base::error || __wn == (const char32_t*)__wb)
__throw_runtime_error("locale not supported");		__throw_runtime_error("locale not supported");
for (const char* __p = __buf; __p < __bn; ++__p, ++__s)		for (const char* __p = __buf; __p < __bn; ++__p, ++__s)
*__s = *__p;		*__s = *__p;
__wb = (const _CharT*)__wn;		__wb = (const _CharT*)__wn;
}		}
return __s;		return __s;
}		}
};		};

		_LIBCPP_SUPPRESS_DEPRECATED_PUSH
		template <class _InternT, class _ExternT, class _StateT, class _OutputIterator>
		_LIBCPP_HIDE_FROM_ABI _OutputIterator __widen_using_codecvt(
		codecvt<_InternT, _ExternT, _StateT> const& __cvt, _OutputIterator __s, const char* __nb, const char* __ne) {
		codecvt_base::result __r = codecvt_base::ok;
		mbstate_t __mb = mbstate_t();
		while (__nb < __ne && __r != codecvt_base::error) {
		static const int __sz = 32;
		_InternT __buf[__sz];
		_InternT* __bn;
		const char* __nn = __nb;
		__r = __cvt.in(__mb, __nb, __ne - __nb > __sz ? __nb + __sz : __ne, __nn, __buf, __buf + __sz, __bn);
		if (__r == codecvt_base::error || __nn == __nb)
		std::__throw_runtime_error("locale not supported");
		for (const _InternT* __p = __buf; __p < __bn; ++__p) {
		*__s = *__p;
		++__s;
		}
		__nb = __nn;
		}
		return __s;
		}
		_LIBCPP_SUPPRESS_DEPRECATED_POP

template <size_t _Np>		template <size_t _Np>
struct __widen_from_utf8		struct __widen_from_utf8
{		{
template <class _OutputIterator>		template <class _OutputIterator>
_OutputIterator		_OutputIterator
operator()(_OutputIterator __s, const char* __nb, const char* __ne) const;		operator()(_OutputIterator __s, const char* __nb, const char* __ne) const;
};		};

[22 lines not shown]	_LIBCPP_SUPPRESS_DEPRECATED_POP

~__widen_from_utf8() override;		~__widen_from_utf8() override;

template <class _OutputIterator>		template <class _OutputIterator>
_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
_OutputIterator		_OutputIterator
operator()(_OutputIterator __s, const char* __nb, const char* __ne) const		operator()(_OutputIterator __s, const char* __nb, const char* __ne) const
{		{
result __r = ok;		return std::__widen_using_codecvt(*this, __s, __nb, __ne);
mbstate_t __mb;
while (__nb < __ne && __r != error)
{
const int __sz = 32;
char16_t __buf[__sz];
char16_t* __bn;
const char* __nn = __nb;
__r = do_in(__mb, __nb, __ne - __nb > __sz ? __nb+__sz : __ne, __nn,
__buf, __buf+__sz, __bn);
if (__r == codecvt_base::error || __nn == __nb)
__throw_runtime_error("locale not supported");
for (const char16_t* __p = __buf; __p < __bn; ++__p, ++__s)
*__s = *__p;
__nb = __nn;
}
return __s;
}		}
};		};

_LIBCPP_SUPPRESS_DEPRECATED_PUSH		_LIBCPP_SUPPRESS_DEPRECATED_PUSH
template <>		template <>
struct _LIBCPP_EXPORTED_FROM_ABI __widen_from_utf8<32>		struct _LIBCPP_EXPORTED_FROM_ABI __widen_from_utf8<32>
: public codecvt<char32_t, char, mbstate_t>		: public codecvt<char32_t, char, mbstate_t>
{		{
_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
__widen_from_utf8() : codecvt<char32_t, char, mbstate_t>(1) {}		__widen_from_utf8() : codecvt<char32_t, char, mbstate_t>(1) {}
_LIBCPP_SUPPRESS_DEPRECATED_POP		_LIBCPP_SUPPRESS_DEPRECATED_POP

~__widen_from_utf8() override;		~__widen_from_utf8() override;

template <class _OutputIterator>		template <class _OutputIterator>
_LIBCPP_INLINE_VISIBILITY		_LIBCPP_INLINE_VISIBILITY
_OutputIterator		_OutputIterator
operator()(_OutputIterator __s, const char* __nb, const char* __ne) const		operator()(_OutputIterator __s, const char* __nb, const char* __ne) const
{		{
result __r = ok;		return std::__widen_using_codecvt(*this, __s, __nb, __ne);
mbstate_t __mb;
while (__nb < __ne && __r != error)
{
const int __sz = 32;
char32_t __buf[__sz];
char32_t* __bn;
const char* __nn = __nb;
__r = do_in(__mb, __nb, __ne - __nb > __sz ? __nb+__sz : __ne, __nn,
__buf, __buf+__sz, __bn);
if (__r == codecvt_base::error || __nn == __nb)
__throw_runtime_error("locale not supported");
for (const char32_t* __p = __buf; __p < __bn; ++__p, ++__s)
*__s = *__p;
__nb = __nn;
}
return __s;
}		}
};		};

// template <class charT> class numpunct		// template <class charT> class numpunct

template <class _CharT> class _LIBCPP_TEMPLATE_VIS numpunct;		template <class _CharT> class _LIBCPP_TEMPLATE_VIS numpunct;

template <>		template <>
[108 lines not shown]

libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source.pass.cpp

[110 lines not shown]	struct Traits {
using pointer = const char*;		using pointer = const char*;
using reference = const char&;		using reference = const char&;
using difference_type = std::ptrdiff_t;		using difference_type = std::ptrdiff_t;
};		};
using It = cpp17_input_iterator<const char*, Traits>;		using It = cpp17_input_iterator<const char*, Traits>;
static_assert(std::is_constructible<path, It>::value, "");		static_assert(std::is_constructible<path, It>::value, "");
}		}
{		{
using It = cpp17_output_iterator<const char*>;		using It = cpp17_output_iterator<char*>;
static_assert(!std::is_constructible<path, It>::value, "");		static_assert(!std::is_constructible<path, It>::value, "");

}		}
{		{
static_assert(!std::is_constructible<path, int*>::value, "");		static_assert(!std::is_constructible<path, int*>::value, "");
}		}
}		}

[16 lines not shown]

libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp

This file was added.

				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				// UNSUPPORTED: c++03
				// UNSUPPORTED: availability-filesystem-missing
				// UNSUPPORTED: no-localization, no-wide-characters

				// <filesystem>

				// class path

				// template <class Source>
				// path(const Source& source, const locale& loc, format = format::auto_format);
				//
				// template <class InputIterator>
				// path(InputIterator first, InputIterator last, const locale& loc, format = format::auto_format);

				#include "filesystem_include.h"
				#include <cassert>
				#include <cstddef>
				#include <locale>
				#include <string>
				#include <type_traits>

				#include "../../path_helper.h"
				#include "test_macros.h"
				#include "test_iterators.h"
				#include "min_allocator.h"
Quuxplusone (inline comment, not done): Should `std::locale` be passed as `const std::locale&` instead? Does it matter? (I don't know.)
jguegant (inline comment, done): I was thinking that locale is a pretty cheap type (a sort of reference-counted pointer), but I might be wrong. Taking it by reference cannot make it worse ;)

				template <class... Args>
				void RunTestCase(
				MultiStringType const& TestPath, MultiStringType const& Expect, const std::locale& Locale, Args... args) {
				std::string expect_char(Expect);
				fs::path::string_type expect_native(Expect);

				// StringTypes
				{
				const std::string S(TestPath);
				fs::path p(S, Locale, args...);
				assert(p.native() == expect_native);
				assert(p.string<char>() == expect_char);
				}
				{
				const std::string_view S(TestPath);
				fs::path p(S, Locale, args...);
				assert(p.native() == expect_native);
				assert(p.string<char>() == expect_char);
				}
				// char* pointers
				{
				char const* charp = TestPath;
				fs::path p(charp, Locale, args...);
				assert(p.native() == expect_native);
				assert(p.string<char>() == expect_char);
				}
				{
				char const* charp = TestPath;
				char const* charp_end = charp + StrLen(charp);
				fs::path p(charp, charp_end, Locale, args...);
				assert(p.native() == expect_native);
				assert(p.string<char>() == expect_char);
				}
				// Iterators
				{
				using It = cpp17_input_iterator<const char*>;
				char const* charp = TestPath;
				fs::path p(It{charp}, Locale, args...);
				assert(p.native() == expect_native);
				assert(p.string<char>() == expect_char);
				}
				{
				using It = cpp17_input_iterator<const char*>;
				char const* charp = TestPath;
				char const* charp_end = charp + StrLen(charp);
				fs::path p(It{charp}, It{charp_end}, Locale, args...);
				assert(p.native() == expect_native);
				assert(p.string<char>() == expect_char);
				}
				}

				void test_sfinae() {
				{
				using It = cpp17_output_iterator<char*>;
				static_assert(!std::is_constructible<fs::path, It, std::locale>::value, "");
				}
				{
				using It = int*;
				static_assert(!std::is_constructible<fs::path, It, std::locale>::value, "");
				}
				}

				struct CustomCodeCvt : std::codecvt<wchar_t, char, std::mbstate_t> {
				protected:
Quuxplusone (inline comment, not done): Why is this commented out?
jguegant (inline comment, done): This should not be here. Thanks for spotting it!
				result do_in(state_type&,
				const extern_type* from,
				const extern_type* from_end,
				const extern_type*& from_next,
				intern_type* to,
				intern_type* to_end,
				intern_type*& to_next) const override {
				for (; from < from_end && to < to_end; ++from, ++to)
				*to = 'o';

				from_next = from;
				to_next = to;

				return result::ok;
				}
				};
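The first of the two conversions described in the summary can be tried in isolation. The following is a minimal sketch, not the patch's implementation: `AllOsCodecvt` and `widen_with_locale` are hypothetical names introduced here, and the helper assumes one output `wchar_t` per input `char` is a sufficient buffer size. It installs a facet equivalent to `CustomCodeCvt` in a locale and widens a `char` source through `std::use_facet`, which is roughly what the locale-taking constructors have to do before the second conversion to the native encoding.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet mirroring CustomCodeCvt from the test: every input char
// becomes L'o', so a caller can tell whether the facet was actually consulted.
struct AllOsCodecvt : std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
  result do_in(state_type&,
               const extern_type* from, const extern_type* from_end,
               const extern_type*& from_next,
               intern_type* to, intern_type* to_end,
               intern_type*& to_next) const override {
    for (; from < from_end && to < to_end; ++from, ++to)
      *to = L'o';
    from_next = from;
    to_next = to;
    return ok;
  }
};

// Hypothetical helper (not part of libc++): widen `src` using the
// codecvt<wchar_t, char, mbstate_t> facet of `loc`.
// Assumption: the facet never produces more wchar_t than input chars.
std::wstring widen_with_locale(const std::string& src, const std::locale& loc) {
  using Cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
  const Cvt& cvt = std::use_facet<Cvt>(loc);

  std::wstring out(src.size(), L'\0');
  std::mbstate_t state = std::mbstate_t();
  const char* from_next = nullptr;
  wchar_t* to_next = nullptr;

  Cvt::result r = cvt.in(state, src.data(), src.data() + src.size(), from_next,
                         &out[0], &out[0] + out.size(), to_next);
  if (r == Cvt::error)
    throw std::runtime_error("conversion failed");

  // Keep only the wide characters that were actually written.
  out.resize(static_cast<std::size_t>(to_next - &out[0]));
  return out;
}
```

With `std::locale custom(std::locale::classic(), new AllOsCodecvt());`, calling `widen_with_locale("aaaa", custom)` goes through the custom `do_in`, which is the property the `MKSTR("aaaa")` / `MKSTR("oooo")` test case relies on.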

				int main(int, char**) {
				std::locale Locale;

				// Ensure std::codecvt<wchar_t, char, std::mbstate_t> is used.
				{
				std::locale CustomLocale(Locale, new CustomCodeCvt());
				auto TestPath = MKSTR("aaaa");
				auto Expect = MKSTR("oooo");
				RunTestCase(TestPath, Expect, CustomLocale);
				RunTestCase(TestPath, Expect, CustomLocale, fs::path::format::auto_format);
				RunTestCase(TestPath, Expect, CustomLocale, fs::path::format::native_format);
				RunTestCase(TestPath, Expect, CustomLocale, fs::path::format::generic_format);
				}

tahonermann (inline comment, not done): I think additional tests should be added here to exercise various edge cases like the truncated partial code unit sequence in the godbolt link I added in another comment. We should also exercise some MxN sequence conversions (e.g., UTF-8 to UTF-16). That can be done by calling `.u16string()` on the resulting `path` object and comparing what it returns to expectations.
				for (auto const& MS : PathList) {
				RunTestCase(MS, MS, Locale);
				RunTestCase(MS, MS, Locale, fs::path::format::auto_format);
				RunTestCase(MS, MS, Locale, fs::path::format::native_format);
				RunTestCase(MS, MS, Locale, fs::path::format::generic_format);
				}

				test_sfinae();

				return 0;
				}
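The `.u16string()` comparison that tahonermann suggests could look roughly like the sketch below. It is an illustration of the checking technique only: `to_utf16` is a hypothetical helper, the plain (non-locale) constructor is used because the result of the locale-taking constructor depends on the `codecvt` facet of the locale in use, and the expected values assume a POSIX platform where the native narrow encoding is treated as UTF-8.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from a narrow source and return its
// UTF-16 representation, so that M input code units can be compared against
// N output code units.
// Assumption: the narrow source is interpreted as UTF-8 (POSIX behaviour).
std::u16string to_utf16(const std::string& narrow) {
  return fs::path(narrow).u16string();
}
```

Under those assumptions, a two-byte UTF-8 sequence such as `"\xC3\xA9"` maps to the single code unit `u"\u00E9"`, while a four-byte sequence such as `"\xF0\x9F\x98\x80"` maps to a surrogate pair, so both the 2-to-1 and 4-to-2 cases can be asserted on.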

Revision Contents

Diff 557895

libcxx/include/__filesystem/path.h

libcxx/include/__locale

libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source.pass.cpp

libcxx/test/std/input.output/filesystems/class.path/path.member/path.construct/source_and_locale.pass.cpp
