This is an archive of the discontinued LLVM Phabricator instance.

[Support] Add a new path style for Windows with forward slashes
ClosedPublic

Authored by mstorsjo on Oct 15 2021, 4:48 AM.

Details

Summary

This behaves just like the regular Windows style, with both separator
forms accepted, but with get_separator() returning forward slashes.

Add a new function make_preferred() (like the C++17
std::filesystem::path function with the same name), which converts
windows paths to the preferred separator form (while this one works on
any platform and takes a path::Style argument).

Contrary to native() (just like make_preferred() in std::filesystem),
this doesn't do anything at all on Posix, it doesn't try to reinterpret
backslashes into forward slashes there.

Diff Detail

Event Timeline

mstorsjo created this revision.Oct 15 2021, 4:48 AM
mstorsjo requested review of this revision.Oct 15 2021, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 15 2021, 4:48 AM

I have a few (mostly nitpicky) comments inline.

Also, please separate out the change to start using is_windows_style() into a separate/prep patch that's no functionality change (NFC). That'll make this diff smaller, and clarify where the behaviour is actually changing. (BTW, I updated https://reviews.llvm.org/D112288 to cover many of those before I looked here and saw all the conflicts; happy remove those if you want, but I had thought since my patch was NFC anyway I might as well clean them up.)

llvm/include/llvm/Support/Path.h
28–34

I wonder if the names would be more clear as:

windows_slash,               // or windows_forward?
windows_backslash,           // or windows_backward?
windows = windows_backslash, // leave a "deprecated" comment, and eventually remove?

... and then prefer the explicit windows_* names, eventually removing the plain windows. This avoids guessing games. WDYT?

BTW, since you're shuffling this enum anyway, I wonder if you could reorder native to the front? That way zero-initializers can be used in the common case.

266–271

It took me a few reads to understand how this was different from native, I think because the first sentence is identical. Can you update that sentence to clarify?

Also, the name didn't make it obvious to me what it does. One idea I had for the name was make_preferred_separators (describing the feature of Windows that causes it to do something). Another was native_if_windows (clearly describing the behaviour).

Alternatively, after https://reviews.llvm.org/D112288 or similar, you could skip this entirely and have callers do the logic... is it reasonable for call sites to understand that this is triggered by is_style_windows()?

llvm/lib/Support/Path.cpp
554

Personally I don't find replace_if easier to reason about than a simple for loop:

for (char &ch : Path)
  if (is_separator(ch, style))
    ch = preferred_separator(style);

but if you'd prefer std::replace_if then I'd suggest using a lambda, since I don't see other uses of std::bind in llvm/lib.

566–570

Might be nice to write this inline in the header since it's so short, to document what it does. That'd require is_style_windows() to be available in the header (such as through https://reviews.llvm.org/D112288).

Thanks for the review!

Also, please separate out the change to start using is_windows_style() into a separate/prep patch that's no functionality change (NFC). That'll make this diff smaller, and clarify where the behaviour is actually changing. (BTW, I updated https://reviews.llvm.org/D112288 to cover many of those before I looked here and saw all the conflicts; happy remove those if you want, but I had thought since my patch was NFC anyway I might as well clean them up.)

Thanks! Yeah D112289 covers most of the refactorings needed. I'll rebase this on top of both of your patches, and split out the remaining refactorings too (which can go in either on their own, or squashed into D112289, I don't have a preference).

llvm/include/llvm/Support/Path.h
28–34

Thanks, that sounds both clear and intuitive. I'll reorder native to the front too.

266–271

The name is indeed a bit vague, but I tried to align it with https://en.cppreference.com/w/cpp/filesystem/path/make_preferred which does the same - but if you prefer clarity over alignment with that, I don't mind changing - make_preferred_separators is quite clear.

Currently, this is implemented on top of native, which in addition to changing separators also expands ~ into the home directory path on Windows. I wonder if it would be better to keep make_preferred (or whatever we settle on) doing only the separator adjustment and nothing else. Although I don't think the distinction would matter in practice anyway.

Regarding dropping this and pushing the condition to the caller - actually the vast majority of callers of this are in llvm/lib/Support/Windows (most of them added in D111880), so for them, the condition isn't really needed. But in my full patchset, I do have a couple calls elsewhere scattered around the llvm and clang subdirs, and the intent is that you can have a fairly concise call to make_preferred in such codepaths if needed without taking up mental and visual space with the full condition there.

I rewrote the doxyen comment into this:

/// For Windows path styles, convert path to use the preferred path separators. 
/// For other styles, do nothing.
///
/// @param path A path that is transformed to preferred format.
llvm/lib/Support/Path.cpp
554

I don't have a strong preference for std::replace_if, a plain loop is probably just as good.

566–570

Sure, that sounds good to me.

mstorsjo updated this revision to Diff 383262.Oct 29 2021, 2:04 AM

Rebased on top of D112288 and D112289, plus D112786 with more preparatory refactorings. Applied most of the suggested changes (kept the make_preferred name for now, pending more discussion of the name).

dexonsmith accepted this revision.Oct 29 2021, 12:08 PM

LGTM, with one more nitpick.

llvm/include/llvm/Support/Path.h
266–271

New docs look great. I missed std::make_preferred() -- better just to use the same name here, so let's keep the name as you suggest.

llvm/lib/Support/Path.cpp
39–45

I suggest inverting the new check with an early return to avoid unnecessarily adding nesting.

This revision is now accepted and ready to land.Oct 29 2021, 12:08 PM
dexonsmith added inline comments.Oct 29 2021, 12:38 PM
llvm/unittests/Support/Path.cpp
73

Also, might be good to add windows_slash and windows_backslash to the tests here.

1460

Mostly unrelated to this patch, but I wonder if adding llvm::sys::fs::is_windows() (or is_system_windows or is_filesystem_windows or llvm::sys::is_windows or ...) and _posix() would lead to good cleanups.

dexonsmith added inline comments.Oct 29 2021, 5:15 PM
llvm/unittests/Support/Path.cpp
90–91

FYI, I just caught myself wondering what is_style_native() *should* return for Style::windows_slash and Style::windows_backslash when running on Windows. In practice (at least right now) it will return true for both. But I'm not sure if most people looking at a call to is_style_native() could guess correctly, and I could see a justifications for being more fine-grained.

So, I removed it in 8077a19f66b50573a043ef1c405e2149e0c69cb7.

mstorsjo marked an inline comment as done.Oct 30 2021, 12:42 PM

Thanks! I noticed that the squashed in changes in 99023627010bbfefb71e25a2b4d056de1cbd354e had mixed up with ones of get_separator and separators should use is_style_windows, but I’ll include the fix for that in this patch now. There’s no distinction between the two until this patch anyway.

llvm/lib/Support/Path.cpp
39–45

Oh, yes - and that simplifies the diff even further.

llvm/unittests/Support/Path.cpp
73

Sure, will do. I also noticed that there’s no direct tests of get_separator, I’ll as that too.

90–91

Good call, yes - I found that one a bit hard to see where it’d be used.

1460

Hmm, that might be nice in some cases, yeah, as long as the ifdefs don’t contain calls to windows-only functions.

mstorsjo updated this revision to Diff 383613.Oct 30 2021, 12:44 PM
mstorsjo marked an inline comment as done.

Applied the suggested changes, added tests for get_separator.

I'd still like an ack from @rnk about the direction of the implementation before landing it.

mstorsjo updated this revision to Diff 383638.Oct 30 2021, 10:59 PM

Rerun pre-merge checks after detaching the dropped separate parent revision.

I am a bit busy at the moment.

@aaron.ballman , can I use you as a representative of the MSVC ecosystem developer community? Are you on board with this change?

I am a bit busy at the moment.

@aaron.ballman , can I use you as a representative of the MSVC ecosystem developer community? Are you on board with this change?

Sure can! I need to think about this for a bit though -- I'm not super comfortable initially, but I can become comfortable. Windows silently accepts forward slashes as path separators and has for a long time, but there are circumstances where the forward slashes are NOT supported too. Those cases should hopefully be rare, but it'd be good to know how we treat them. For example, Win32 file namespace path names do not undergo separator normalization (e.g., paths that start with \\?\ so that they work with path components larger than max path, etc, such as \\?\C:\Users\aballman\Desktop\test.c). I think I'd prefer *not* supporting mixed path separators with that style of path because the system does not support it.

Sure can! I need to think about this for a bit though -- I'm not super comfortable initially, but I can become comfortable. Windows silently accepts forward slashes as path separators and has for a long time, but there are circumstances where the forward slashes are NOT supported too. Those cases should hopefully be rare, but it'd be good to know how we treat them. For example, Win32 file namespace path names do not undergo separator normalization (e.g., paths that start with \\?\ so that they work with path components larger than max path, etc, such as \\?\C:\Users\aballman\Desktop\test.c). I think I'd prefer *not* supporting mixed path separators with that style of path because the system does not support it.

FWIW, as llvm::sys::path currently already supports paths with either slash form (possibly mixed) - this keeps that behaviour, but adds another mode of operation, where the preferred separator is a forward slash (inserted whenever constructing paths, or when normalizing paths to that style). This particular patch in itself is entirely platform independent as it only adds the different mode for constucting paths (when one can choose to operate on paths in any mode on any platform).

Admittedly, while one could have paths with mixed or forward slashes before, it probably was rather rare in practice, while this patch makes it possible to have it occur much more frequently.

The follow-up, D111880, shows how the added facilities can be used to get an environment with mostly coherent forward slashes.

Most of the handling of paths (inspecting paths, appending, etc) doesn't care about the individual separator form at all, but when we actually want to operate on files, we mostly pass through the widenPath function in llvm/lib/Support/Windows/Path.inc. That function already takes a UTF8 pathname (with possibly mixed separators) and converts it to UTF16 for use with the windows -W suffixed APIs, and possibly prepends a \\?\ prefix if needed for long names.

So unless we have something trying to use pathnames that bypass the widenPath function (which is needed to map UTF8 to the actual filesystem charset anyway), \\?\ shouldn't be an issue. (However I realized one case in D111880 that I should fix, unrelated to this patch.)

Sure can! I need to think about this for a bit though -- I'm not super comfortable initially, but I can become comfortable. Windows silently accepts forward slashes as path separators and has for a long time, but there are circumstances where the forward slashes are NOT supported too. Those cases should hopefully be rare, but it'd be good to know how we treat them. For example, Win32 file namespace path names do not undergo separator normalization (e.g., paths that start with \\?\ so that they work with path components larger than max path, etc, such as \\?\C:\Users\aballman\Desktop\test.c). I think I'd prefer *not* supporting mixed path separators with that style of path because the system does not support it.

FWIW, as llvm::sys::path currently already supports paths with either slash form (possibly mixed) - this keeps that behaviour, but adds another mode of operation, where the preferred separator is a forward slash (inserted whenever constructing paths, or when normalizing paths to that style). This particular patch in itself is entirely platform independent as it only adds the different mode for constucting paths (when one can choose to operate on paths in any mode on any platform).

Admittedly, while one could have paths with mixed or forward slashes before, it probably was rather rare in practice, while this patch makes it possible to have it occur much more frequently.

The follow-up, D111880, shows how the added facilities can be used to get an environment with mostly coherent forward slashes.

Most of the handling of paths (inspecting paths, appending, etc) doesn't care about the individual separator form at all, but when we actually want to operate on files, we mostly pass through the widenPath function in llvm/lib/Support/Windows/Path.inc. That function already takes a UTF8 pathname (with possibly mixed separators) and converts it to UTF16 for use with the windows -W suffixed APIs, and possibly prepends a \\?\ prefix if needed for long names.

So unless we have something trying to use pathnames that bypass the widenPath function (which is needed to map UTF8 to the actual filesystem charset anyway), \\?\ shouldn't be an issue. (However I realized one case in D111880 that I should fix, unrelated to this patch.)

Fantastic, thank you for the explanation, that's very handy!

llvm/lib/Support/Path.cpp
608

Up on line 48, we use is_style_windows() instead of calling real_style(), but here we've flipped the edits. Is there a reason for the distinction?

mstorsjo added inline comments.Nov 2 2021, 10:27 AM
llvm/lib/Support/Path.cpp
608

Yes - on line 48 in separators, we want to return all accepted forms of separators - in both windows path styles, we tolerate both kinds of separators on input, i.e. we want to check if the style is any windows style. Here in get_separator (and similarly in preferred_separator) we return the one that we want to use when we concatenate/produce new paths (i.e. we want to check strictly if real_style(style) evaluates to Style::windows_backslash (Style::windows is an backwards compat alias for it).

(Changing existing occurrances of Style::windows into Style::windows_backslash for clarity could be done as a separate commit later.)

The calls to is_style_windows() were added recently in a refactoring (changing most cases of real_style() == windows into is_style_windows() in D112289), but those particular functions accidentally got changed the wrong way.

@aaron.ballman Is this one fine with you now too?