This is an archive of the discontinued LLVM Phabricator instance.

Add llvm::sys::fs::real_path
ClosedPublic

Authored by zturner on Mar 6 2017, 3:18 PM.

Details

Summary

LLVM currently has a mechanism for resolving symlinks and collapsing . and .. items through the openFileForRead method, but it has a couple of limitations:

  1. It's not convenient to call, since it requires calling an intermediate function first and interpreting the results, then cleaning up after the function (by closing the file descriptor). It would be nice to just be able to say real_path(Path, Real);
  2. It requires an open file descriptor, which means that particularly on Windows, there's currently no way to get the real path of a directory ("opening" a directory and opening a file require different code, and we don't expose a method to do the former)
  3. On non-Windows platforms, we can't currently handle tilde expressions.

This patch addresses all of this by providing a simple method with the signature std::error_code real_path(const Twine &, SmallVectorImpl<char> &). It works even with directories on Windows, and on Posix platforms it is smart enough to translate ~ into the appropriate home directory.

As a result, real_path(X) works like you would expect it to from a bash shell on Linux/Mac/etc, and real_path(X) works with directories on Windows, providing maximum portability.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Mar 6 2017, 3:18 PM
amccarth edited edge metadata.Mar 6 2017, 4:12 PM

I can see why it's probably not feasible to test realPathFromHandle and directoryRealPath directly, but is there a way to test real_path to ensure that it covers a lot of the important cases (like MAX_PATH on Windows)?

llvm/lib/Support/Unix/Path.inc
480 ↗(On Diff #90751)

Isn't the style expandTildeExpr?

837 ↗(On Diff #90751)

realPath?

838 ↗(On Diff #90751)

I would move the definition of fd down near the definition of EC so that it's not live before it needs to be.

847 ↗(On Diff #90751)

s/then/them/

llvm/lib/Support/Windows/Path.inc
792 ↗(On Diff #90751)

Maybe this should be larger than MAX_PATH because it will have the \\?\ prefix.

793 ↗(On Diff #90751)

Reading MSDN, it looks like the length of the buffer should be reported as MAX_PATH - 1, otherwise a path that's exactly the maximum length might result in a one character overrun.

794 ↗(On Diff #90751)

GetFinalPathNameByHandle is Vista+, no XP. I assume that's OK.

804 ↗(On Diff #90751)

Looks like this silently fails for paths longer than MAX_PATH. Is that a limitation we're already facing in the rest of path support in LLVM? Should this return an error in that case?

814 ↗(On Diff #90751)

Would PathUTF16.data() be more appropriate than .begin()?

819 ↗(On Diff #90751)

If realPathFromHandle could return an error in the long name case, then you could propagate the error out here.

I can see why it's probably not feasible to test realPathFromHandle and directoryRealPath directly, but is there a way to test real_path to ensure that it covers a lot of the important cases (like MAX_PATH on Windows)?

I actually tried adding some tests for real_path, but it's not that easy. You need a way to generate a symbolic link, and creating a symbolic link requires SeCreateSymbolicLink privilege. Because of this, llvm::sys::fs::create_link actually creates a hard link instead of a symbolic link, and when you make a file B which is a hardlink to A, can call real_path(B) it will just return B, not A (as expected).

I could probably add a test for . and .. syntax, but I don't see a way to write one for symbolic link resolution.

labath edited edge metadata.Mar 7 2017, 4:00 AM

I am not sure whether having things work "exactly like from a shell" is a feature (*), as there are other utilities doing similar things which don't do tilde expansion. Maybe I am being too pedantic, but I think it's a point worth bringing up.

(*) The rules for tilde expansion in posix shells are also quite tricky. E.g. ~/foo expands ~, but "~/foo" does not (but "$HOME/foo" expands $HOME, ...).

llvm/lib/Support/Unix/Path.inc
845 ↗(On Diff #90751)

I am wondering whether the tilde expansion should not be controlled by an argument. As the comment says, tilde expansion is done at a completely different level than path canonicalization. ~ is a perfectly valid path component and it's not clear to me that every user of this function would want the magic tilde expansion behavior as well. I think that can create some confusion, as this is not the behavior of any of the standard utilities and functions elsewhere (realpath unix utility, realpath POSIX function, llvm's openFileForRead, ...). For that reason it may be safer for the caller to explicitly opt-in into this behavior.

zturner updated this revision to Diff 90920.Mar 7 2017, 12:55 PM

Makes tilde expansion opt-in, and adds a test for . and .. collapsing. shell_expand does nothing currently on Windows, but we could make it expand environment variables in theory.

rnk added inline comments.Mar 7 2017, 3:44 PM
llvm/lib/Support/Unix/Path.inc
845 ↗(On Diff #90751)

Does the tilde expansion work on Windows? If not, let's leave it out like Pavel suggests.

zturner added inline comments.Mar 7 2017, 3:49 PM
llvm/lib/Support/Unix/Path.inc
845 ↗(On Diff #90751)

No, but I thought he just suggested to make it opt-in rather than leave it out entirely. The "equivalent" thing on Windows would be to expand environment variables. That said, maybe there's a case to be made for allowing tilde expansion on Windows too. I mean, Windows users do have a home directory, so it's not like it doesn't make sense at all.

labath added a comment.Mar 8 2017, 1:07 AM
llvm/include/llvm/Support/FileSystem.h
347 ↗(On Diff #90920)

I think this description is very vague. After reading this, I would expect we will handle all kinds of fancy shell expansions (${HOME//foo} anyone ?). I suggest name the argument tilde_expand, document it as performing home-directory expansion using ~ and nothing more, until we have a very good idea of what other kinds of expansions we may want to perform.

llvm/lib/Support/Unix/Path.inc
845 ↗(On Diff #90751)

Yes, my suggestion was to make this opt-in. I can see a case for making ~-expansion a separate function, but that may then be clunky to use for cases when you want both, and we will probably end up with a wrapper function that does both sooner or later.

PS: I like the idea of ~ working on windows as well (we just shouldn't be pretending to emulate any specific shell then).

jhenderson added inline comments.
llvm/lib/Support/Windows/Path.inc
794 ↗(On Diff #90751)

I had a similar question earlier, but I noticed that we don't use the XP toolset in the MSVC build, so I'm guessing the LLVM tools are not supported on XP. Haven't investigated the official policy though.

zturner updated this revision to Diff 91092.Mar 8 2017, 3:45 PM

Support tilde expressions on Windows.

zturner updated this revision to Diff 91093.Mar 8 2017, 3:47 PM

Add a better comment to the function's declarations.

mgorny added a subscriber: mgorny.Mar 8 2017, 11:17 PM

I'm sorry but what is the exact use case for this? I'm not exactly against having a function like this but 'realpath' logic is the kind of thing people frequently abuse, so I'd rather make sure we don't end up accidentally replacing 'canonical' paths provided by the user with 'real' paths.

labath accepted this revision.Mar 9 2017, 1:40 AM

looks good to me, thank you for the patience. BTW I accidentally stumbled onto this https://docs.python.org/2/library/os.path.html#os.path.expanduser yesterday, so it looks like supporting tilde expressions on windows is not without precedent.

@mgorny: The immediate motivation is lowering similar code for this in lldb. I am not sure what are all the reasons lldb needs it in the first place, but I can totally believe that it is overusing this (OTOH, lldb only consumes paths, it does not produce them, so it's probably not that critical). Still, maybe this is another reason to have ~ logic separate from real_path stuff -- then one could still support home directory expansion where it makes sense, without forcing full path resolution.

This revision is now accepted and ready to land.Mar 9 2017, 1:40 AM
zturner updated this revision to Diff 91240.Mar 9 2017, 5:05 PM

Fixed a few edge cases and removed a few copies, making things slightly more efficient.

Also, I responded yesterday but for some reason I don't see it showing up on the review. So just to make sure everyone got it: @mgorny LLVM already has realpath-like functionality built in, it's just cumbersome to use. You can do it by calling openFileForRead and passing an optional output parameter. A search of the codebase shows that a few clients are already using this (mostly in clang). Unfortunately though, it doesn't work for directories on Windows. And as mentioned, it's a clunky interface if the *only* thing you want is the path.

In any case, I'll leave this up for another day in case anyone has further comments.

Most of my comments on llvm/lib/Support/Windows/Path.inc still seem unaddressed. Are you not seeing them? The rest looks good.

zturner updated this revision to Diff 91359.EditedMar 10 2017, 8:48 AM

I think this should address all of Adrian's comments. I've left real_path as is instead of realPath since I didn't want to break consistency with all of the other path and file system functions like home_directory etc, but I've made the internal private functions using camel case notations.

amccarth accepted this revision.Mar 10 2017, 9:35 AM

Thanks for the changes. LGTM.

This revision was automatically updated to reflect the committed changes.