This is an archive of the discontinued LLVM Phabricator instance.

[3/N] [libcxx] Make filesystem::path::value_type wchar_t on windows
ClosedPublic

Authored by mstorsjo on Nov 10 2020, 1:29 AM.

Details

Summary

Also set the preferred separator to backslash.

libc++ doesn't compile successfully for windows prior to this change, and this change on its own isn't enough to make it compile successfully either, but is the first stepping stone towards making it work correctly.

Most of operations.cpp will need to be touched, both for calling functions that take wchar paths, but also for using other windows specific functions instead of the posix functions used so far; that is handled in later commits.

Changing parts of operations.cpp to generalize the string type handling in code that doesn't touch system functions.

Diff Detail

Event Timeline

mstorsjo created this revision.Nov 10 2020, 1:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2020, 1:29 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
amccarth added inline comments.Dec 1 2020, 1:55 PM
libcxx/src/filesystem/operations.cpp
176–179

Is "/" the preferred representation of the root directory even on Windows?

mstorsjo added inline comments.Dec 1 2020, 2:01 PM
libcxx/src/filesystem/operations.cpp
176–179

No; this patch just handles the char->wchar transition. This particular case is changed to return backslashes in D91176 as part of making the path parser handle windows specific concepts as "root name".

mstorsjo added inline comments.Dec 1 2020, 2:04 PM
libcxx/src/filesystem/operations.cpp
176–179

Ok, I see that I generally change the preferred separator to backslashes here, so I guess I could move that bit from that other patch into this one as well if you want it strictly that way - it originated in that patch as it came from trying to make the path parser/iterator return the right things.

mstorsjo updated this revision to Diff 308893.Dec 2 2020, 12:20 AM

Rebased; moved bit from later patch for making PathParser::operator* return backslashes where relevant into this patch.

mstorsjo updated this revision to Diff 309783.Dec 6 2020, 11:37 AM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Rebase

curdeius added inline comments.
libcxx/include/filesystem
725

Wouldn't it be wise to reserve at least the minimum expected output size?
It seems that there are some places that a reserve would fit well.
But that's out of scope of this patch.

1129

Shouldn't there be a difference between string and generic_string, the former returning native format (with backslash on Windows?) and the latter always path with forward slashes?

1137

The comment doesn't match the #if.

1297–1298

Hmm?

1307–1308

Ditto.

libcxx/src/filesystem/operations.cpp
176–179

Why does it depend on RawEntry and not on OS only?

curdeius added inline comments.Dec 9 2020, 1:00 PM
libcxx/include/filesystem
579

Do you really need this type? Can't you use __u8_string::value_type instead?

mstorsjo added inline comments.Dec 9 2020, 1:11 PM
libcxx/include/filesystem
1129

Yes, there's a difference; that aspect is corrected later in patch [24/N] in D91181. (I understand that it'd be cleaner to review if all changed code lands in the right form directly the first time it's touched - but that'd make the first patches even more unwieldy - so here I try to primarily focus on getting the wchar_t aspect right.)

And in general, string() and native() returns the string with whatever separator form the string originally contained, it can be any combination of forward and backslashes, while generic_string() indeed returns a version with forward slashes only.

1137

Right, I guess I should make it _LIBCPP_WIN32API instead, and the ! is confusing. I meant it in the form #else /* condition * as it could be #elseif condition i.e. describing the condition for the code following the #else, not the condition for the block above, but I guess that's not common practice. I'll change it to just plain #else /* _LIBCPP_WIN32API */.

1297–1298

Right, this was a generalization from before I added the more windows specific bits, trying to move things to use explicitly value_type and string_type instead of hardcoding the types - but here it's a bit out of place, so I can drop these two changes from the patch.

libcxx/src/filesystem/operations.cpp
176–179

Because when iterating over the path name, we want to return the root dir with the slash in the same form as in the actual path itself.

mstorsjo added inline comments.Dec 9 2020, 1:13 PM
libcxx/include/filesystem
579

I guess I can do that; this particular typedef goes away in one of the later patches anyway.

mstorsjo updated this revision to Diff 310635.Dec 9 2020, 1:28 PM

Removed the unnecessary __u8_value typedef, fixed an #else comment, removed unnecessary changes from char to value_type in a non-windows ifdef section.

mstorsjo added inline comments.Dec 9 2020, 1:31 PM
libcxx/include/filesystem
1297–1298

Oh, sorry, I misread, now I see what you meant. Will fix.

mstorsjo updated this revision to Diff 310641.Dec 9 2020, 1:34 PM

Fixed two missed cases that should use use char_traits<value_type>

curdeius accepted this revision.Dec 9 2020, 1:58 PM

LGTM

libcxx/include/filesystem
1129

Ok. And thank you for the explanation.

1137

Yeah that's not very clear sometimes how it should be read.
Personally I love the MS STL style with "^^^^ if-condition vvvvv negated condition".

libcxx/src/filesystem/operations.cpp
176–179

Ok.

Nit.

libcxx/include/filesystem
852

To be consistent with D91138.

mstorsjo updated this revision to Diff 310663.Dec 9 2020, 2:23 PM

Define preferred_separator as L'\\'.

mstorsjo updated this revision to Diff 310666.Dec 9 2020, 2:23 PM
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

... and set the repository

wmaxey added a subscriber: wmaxey.Dec 10 2020, 12:26 PM
wmaxey removed a subscriber: wmaxey.
ldionne requested changes to this revision.Dec 14 2020, 3:17 PM
ldionne added a subscriber: ldionne.
ldionne added inline comments.
libcxx/include/filesystem
795

This code is valid even outside of Windows, right? If so, I'd rather we keep it in the header unconditionally. I don't mind if it's unused outside of Windows -- I think it's better to try to remove these #ifdefs as much as possible.

1110–1122

Would it be possible to share these definitions between Windows and non-Windows to avoid code duplication?

This revision now requires changes to proceed.Dec 14 2020, 3:17 PM
mstorsjo added inline comments.Dec 14 2020, 11:11 PM
libcxx/include/filesystem
795

The code added here in this patch should be valid code on any platform, yes, but it's code that assumes the source char type is wchar_t (which is kind of not nice to have outside of specific ifdefs), and D91137 adds more code here that calls the windows-specific __wide_to_char and __char_to_wide here, so those would need to be ifdeffed anyway (as long as we keep those declarations under an ifdef).

There's of course the benefit of having more syntax checked on non-windows platforms though, even though it's never actually referenced. With other ifdefs, it could look a bit odd, like this:

// Windows specific code, unused elsewhere
template <class _ECharT> struct _PathExport {
    void first_method() {
    }
#ifdef _LIBCPP_WIN32API
    void other_method() {
        // Code referencing things that don't build on other platforms, referencing helpers like __wide_to_char
    }
#endif
    void third_method() {
    }
};
// End of windows specific block
1110–1122

At least u16string() and u32string() could be shared, the rest is a bit messier. On posix, string() and u8string() just return __pn_ directly and work outside of !_LIBCPP_HAS_NO_LOCALIZATION, while on windows, it's wstring() that works without localization and string() and u8string(). And later, D91181 adds more windows specifics to generic_*string(). So all in all, yes, some small bits could be shared, but I'm not entirely convinced it's worth it.

mstorsjo updated this revision to Diff 312435.Dec 17 2020, 3:50 AM
mstorsjo removed rG LLVM Github Monorepo as the repository for this revision.
mstorsjo set the repository for this revision to rG LLVM Github Monorepo.

Rebased, retriggering CI

ldionne accepted this revision.Dec 17 2020, 2:03 PM
This revision is now accepted and ready to land.Dec 17 2020, 2:03 PM