This is an archive of the discontinued LLVM Phabricator instance.

[Support] Improve Windows widenPath and add support for long UNC paths
ClosedPublic

Authored by andrewng on Feb 28 2020, 10:43 AM.

Details

Summary

Check the path length limit against the length of the UTF-16 version of
the input rather than the UTF-8 equivalent, as the UTF-16 length may be
shorter. Move widenPath from the llvm::sys::path namespace in Path.h to
the llvm::sys::windows namespace in WindowsSupport.h. Only use the
reduced path length limit for create directory. Canonicalize using
sys::path::remove_dots().

Diff Detail

Event Timeline

andrewng created this revision.Feb 28 2020, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 28 2020, 10:43 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
rnk added inline comments.Feb 28 2020, 1:45 PM
llvm/include/llvm/Support/FileSystem.h
1161

Let's move this somewhere more obscure so clients like llvm-ar are less likely to find it and try to use it. I'd suggest WindowsSupport.h. Ideally we would make it local to Support, but we don't have any existing internal support headers.

llvm/lib/Support/Windows/Path.inc
82

Why not always use the lower limit? What is the harm in using the UNC prefixes slightly more often?

llvm/unittests/Support/Path.cpp
1887

Is this the first introduction of a real UTF-8 source character? Are you confident that this will work portably? Do you need to write u8"pi" to make this work where utf-8 is not the local codepage?

I'm not sure I agree with moving widen from sys::path to sys::fs. Is there consensus on the layering here? This adds dependencies for sys::fs on sys::path, but I would expect the dependencies to be in the other direction--the path style you use would depend on the underlying file system, not the other way around, right? The long path prefix on an absolute local path depends on the underlying file system being NTFS.

And, in my recent experience sys::path::is_absolute(..., sys::path::style::windows) seems buggy with UNC paths, so I'm skeptical if it works right with the long path prefix.

llvm/lib/Support/Windows/Path.inc
82

The long path prefix isn't a UNC prefix. The long path prefix is an escape that tells many system APIs not to try to interpret the string as a file system path, but simply to pass it on to the underlying file system as an opaque string. That's great if you underlying file system is NTFS, that's great. I don't know whether something like a FAT32 flash drive handles that. If not, then there could be some harm to prematurely appending the long path prefix. That's probably pretty low risk.

I think the ForCreateDir parameter makes this API a leaky abstraction and the risk is low. If you insist on having a way to enforce different limits, then I'd make the parameter be the path limit you want, default it to MAX_PATH, and let the special cases (like code that calls CreateDirectoryW) explain why they're setting it lower.

117

Would it be appropriate to check (or at least assert) that RootName.size() <= Path8Str.size()? If somehow the entire input string could be considered the root name, then this reference would be out of bounds.

llvm/unittests/Support/Path.cpp
674

Does this patch, which allows longer paths on Windows, invalidate this test which checks that PATH_MAX is enforced? Presumably, createUniqueFile could use the new widen. Or is the issue here that PATH_MAX also applied to each component of a path?

1880

const?

1903

This seems like a better place to define Expected.

andrewng marked 11 inline comments as done.Mar 2 2020, 3:15 AM
andrewng added inline comments.
llvm/lib/Support/Windows/Path.inc
82

I think anything to help avoid having to mangle the input is beneficial. The need for this type of code is already bad enough.

The real purpose of this function is to best prepare the path for the Win32 file APIs. It's not meant to be any kind of validation. So the Win32 file APIs could still fail for various reasons, such as other file system constraints.

I like the idea to change ForCreateDir to MaxPathLen, thanks.

117

Thanks for spotting this one!

llvm/unittests/Support/Path.cpp
674

As you've said, I believe that there is also a length limit to each component of a path, although I'm not sure what that exact limit is.

1880

My usual style is to liberally use const where appropriate. But I've had objections to that in the past. As you've mentioned it here, I'll go back to my usual style and see if anyone objects.

1887

Good point. It should be OK for all supported Win32 compilers, but to be on the safe side, I'll change it to hex representation to be sure.

andrewng updated this revision to Diff 247586.Mar 2 2020, 3:19 AM
andrewng marked 4 inline comments as done.

Updated to address review comments and suggestions.

andrewng edited the summary of this revision. (Show Details)Mar 2 2020, 3:27 AM
rnk accepted this revision.Mar 18 2020, 9:22 AM

LGTM

I like Adrian's suggestion of having the caller pass in the maximum path length, thanks!

This revision is now accepted and ready to land.Mar 18 2020, 9:22 AM
This revision was automatically updated to reflect the committed changes.