Page MenuHomePhabricator

Fix more VFS tests on Windows
ClosedPublic

Authored by amccarth on Nov 25 2019, 4:06 PM.

Details

Summary

Since VFS paths can be in either Posix or Windows style, we have to use a more flexible definition of "absolute" path.

The key here is that FileSystem::makeAbsolute is now virtual, and the RedirectingFileSystem override checks for either concept of absolute before trying to make the path absolute by combining it with the current directory.

Diff Detail

Event Timeline

amccarth created this revision.Nov 25 2019, 4:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 25 2019, 4:06 PM
rnk added inline comments.Dec 2 2019, 1:06 PM
llvm/lib/Support/VirtualFileSystem.cpp
1077–1078

I think Posix-style paths are considered absolute, even on Windows. The opposite isn't true, a path with a drive letter is considered to be relative if the default path style is Posix. But, I don't think that matters. We only end up in this mixed path style situation on Windows.

Does this change end up being necessary? I would expect the existing implementation of makeAbsolute to do the right thing on Windows as is. I think the other change seems to be what matters.

1431

Is there a way to unit test this? I see some existing tests in llvm/unittests/Support/VirtualFileSystemTest.cpp.

I looked at the yaml files in the VFS tests this fixes, and I see entries like this:

{ 'name': '/tests', 'type': 'directory', ... },
{ 'name': '/indirect-vfs-root-files', 'type': 'directory', ... },
{ 'name': 'C:/src/llvm-project/clang/test/VFS/Inputs/Broken.framework', 'type': 'directory', ... },
{ 'name': 'C:/src/llvm-project/build/tools/clang/test/VFS/Output/vfsroot-with-overlay.c.tmp/cache', 'type': 'directory', ... },

So it makes sense to me that we need to handle both kinds of absolute path.

1448–1449

I wonder if there's a simpler fix here. If the working directory is an absolute Windows-style path, we could prepend the drive letter of the working directory onto any absolute Posix style paths read from YAML files. That's somewhat consistent with what native Windows tools do. In cmd, you can run cd \Windows, and that will mean C:\Windows if the active driver letter is C. I think on Windows every drive has an active directory, but that's not part of the file system model.

amccarth marked 3 inline comments as done.Dec 2 2019, 1:56 PM
amccarth added inline comments.
llvm/lib/Support/VirtualFileSystem.cpp
1077–1078

Yes, it's necessary. The Posix-style path \tests is not considered absolute on Windows. Thus the original makeAboslute would merge it with the current working directory, which gives it a drive letter, like D:\tests\. The drive letter component then causes comparisons to fail.

1431

Is there a way to unit test this?

What do you mean by "this"? The /tests and /indirect-vfs-root-files entries in that yaml are the ones causing several tests to fail without this fix, so I take it that this is already being tested. But perhaps you meant testing something more specific?

1448–1449

I'm not seeing how this would be simpler.

I don't understand the analogy of your proposal to what the native Windows tools do. When you say, cd \Windows, the \Windows is _not_ an absolute path. It's relative to the current drive.

I could be wrong, but I don't think prepending the drive onto absolution Posix style paths read from YAML would work. That would give us something like D:/tests (which is what was happening in makeAbsolute) and that won't match paths generated from non-YAML sources, which will still come out as /tests/foo.h.

I think on Windows every drive has an active directory ....

Yep. I think of it as "every drive has a _current_ directory" and each process has a current drive. When you want the current working directory, you get the current directory of the current drive.

ormris added a subscriber: ormris.Dec 2 2019, 2:10 PM
amccarth marked 2 inline comments as done.Dec 3 2019, 12:53 PM

A (hopefully) more cogent response than my last round. I'm still hoping to hear from VFS owners.

llvm/lib/Support/VirtualFileSystem.cpp
1077–1078

To make sure I wasn't misremembering or hallucinating, I double-checked the behavior here: Posix-style paths like \tests are not considered absolute paths in Windows-style.

1448–1449

... I don't think prepending the drive onto absolution Posix style paths read from YAML would work....

I misspoke. The idea of prepending the drive onto absolute Posix-style paths read from the YAML probably could be made to work for the common cases like the ones in these tests.

But I still don't think that approach would simplify anything.

Furthermore, I think it could open a potential Pandora's box of weird corner cases. For example, in a system with multiple drives, the current drive may not always be the "correct" one to use. Slapping a drive letter onto a Posix-style path creates a Frankenstein hybrid that's neither Windows-style nor Posix-style. Doing so because we know the subsequent code would then recognize it as an absolute path seems like a way to create an unnecessary coupling between the VFS YAML parser and the LLVM path support.

In my mind, the model here is that these roots can be in either style. I prefer to let the code explicitly reflect that fact rather than trying to massage some of the paths into a form that happens to cause the right outcome.

rnk added inline comments.Dec 5 2019, 3:14 PM
llvm/lib/Support/VirtualFileSystem.cpp
1077–1078

I see, I agree, the platforms diverge here:

bool rootDir = has_root_directory(p, style);
bool rootName =
    (real_style(style) != Style::windows) || has_root_name(p, style);

return rootDir && rootName;

So, on Windows rootDir is true and rootName is false.

I still wonder if this behavior shouldn't be pushed down into the base class. If I pass the path \test to the real FileSystem::makeAbsolute on Windows, should that prepend the CWD, or should it leave the path alone?

1431

What do you mean by "this"?

I guess what I meant was, can you unit test the whole change in case there are behavior differences here not covered by the clang tests?

1448–1449

The way in which I see it as being "simpler" is that all absolute paths would end up having drive letters on Windows. I was hoping that would avoid some corner cases that arise from having a virtual file system tree in memory where some files are rooted in a drive letter and some appear in non-drive letter top level directories from Posix paths.

In any case, I think the change you have here is definitely correct, and is a smaller change in behavior. The path component iterators below need to use the correct style.

amccarth marked 2 inline comments as done.Dec 9 2019, 4:20 PM
amccarth added inline comments.
llvm/lib/Support/VirtualFileSystem.cpp
1431

This change causes no regressions in those llvm unit tests (llvm/unittests/Support/VirtualFileSystemTest.cpp). They all still pass.

But thanks for pointing those out, because my subsequent changes do seem to make a difference.

rnk accepted this revision.Dec 17 2019, 3:26 PM
rnk added a subscriber: JDevlieghere.

+@JDevlieghere, due to recent VFS test fixup.

I think this looks good, and in the absence of input from other VFS stakeholders, let's land this soon. Thanks!

llvm/lib/Support/VirtualFileSystem.cpp
1077–1078

I think we discussed this verbally, and decided we should prepend the CWD, as is done here.

This revision is now accepted and ready to land.Dec 17 2019, 3:26 PM
This revision was automatically updated to reflect the committed changes.
amccarth marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptDec 18 2019, 11:45 AM

@rnk / @amccarth, I've been looking at the history of makeAbsolute being virtual, since it's a bit awkward that RedirectingFileSystem behaves differently from the others. I'm hoping you can give me a bit more context.

I'm wondering about an alternative implementation where:

  • The path style is detected when reading the YAML file (as now).
  • Paths in the YAML file are canonicalized to native at parse time.
  • The nodes in-memory all use native paths so the non-native style isn't needed after construction is complete.
  • makeAbsolute() doesn't need to be overridden / special.

Was this considered? If so, why was it rejected? (If it doesn't work, why not?)

If we could limit the scope the special version of makeAbsolute() to "construction time" it would simplify the mental model. As it stands it's a bit difficult to reason about makeAbsolute, and whether it's safe/correct to send a makeAbsolute-d path into ExternalFS.

I didn't design VFS. I just fixed a bunch of portability bugs that
prevented us from running most of the VFS tests on Windows. If I were
designing it, I (hope) I wouldn't have done it this way. But the horse is
already out of the barn.

Here's my background with VFS (and, specifically, the redirecting
filesystem):

VFS is used in many tests to map a path in an arbitrary (usually Posix)
style to another path, possibly in a different filesystem. This allows the
test to be platform agnostic. For example, the test can refer to
/foo/bar.h if it uses the redirecting filesystem to map /foo to an
actual directory on the host. If it's a Windows machine, that target might
be something like C:\foo, in which case the actual file would be
C:\foo\bar.h, which LLVM thinks of as C:\foo/bar.h. That's inherently
a path with hybrid style. There are other cases, too, e.g., where the host
is Windows, but the target filesystem is an LLVM in-memory one (which uses
only Posix style).

When I first tried to tackle the portability bugs, I tried various
normalization/canonicalization strategies, but always encountered a
blocker. That's when rnk pointed out to me that clang generally doesn't do
any path normalization; it just treats paths as strings that can be
concatenated. With that in mind, I tried accepting the fact that hybrid
path styles are a fact of life in VFS, and suddenly nearly all of the
portability problems became relatively easy to solve.

Note that lots of LLVM and Clang tests were using VFS, but the VFS tests
themselves couldn't run on Windows. All those tests were built upon
functionality that wasn't being tested.

I think that we probably could do something simpler, but it would force a
breaking change in the design of the redirecting filesystem. The most
obvious victim of that break would be various LLVM and clang tests that
exclusively use Posix-style paths and rely on VFS to make it work on
non-Posix OSes. I'm not sure how significant the break would be for others.

Looking specifically at your proposal:

  • The path style is detected when reading the YAML file (as now).

Which path's style? The virtual one that's going to be redirected or the
actual one it's redirected at?

  • Paths in the YAML file are canonicalized to native at parse time.

If we canonicalize the virtual path, VFS would no longer be valuable for
creating platform-agnostic tests.

I don't remember the details, but canonicalizing the target paths caused
problems. Do we need to be careful about multiple redirections (e.g.,
/foo directs to /zerz which directs to C:\bumble)? I seem to recall
there was a good reason why the redirecting filesystem waits to the last
moment to convert a virtual path to an actual host path.

  • The nodes in-memory all use native paths so the non-native style isn't

needed after construction is complete.

I'm guessing that would affect how paths are displayed (e.g., in diagnostic
messages). At a minimum, we'd have to fix some tests. I don't know all
the contexts this might occur and how that might affect things. For
example, paths may be written into debug info metadata.

  • makeAbsolute() doesn't need to be overridden / special.

Honestly, I'm not sure we have a good definition of what makeAbsolute
should do. Sure, on Posix, it's well understood. But is \foo an
absolute path on Windows? Is D:bar.h an absolute path on Windows? If
not, how should those be made absolute? LLVM assumes that there's a well
defined mapping between Posix filesystem concepts and the host filesystem.
But I haven't seen any documentation on how a Posix->Windows mapping should
work (let alone the inverse), and I certainly don't have an intuitive
understanding of how that mapping should work.

In LLDB, we have the additional wrinkle of remote debugging, where the
debugger may be running on a Windows machine while the program being
debugged is running on a Linux box. You always have to know whether a path
will be used on the debugger host or the debuggee host. And there are
similar concerns for post-mortem debugging from a crash collected on a
different type of host.

I'm not opposed to making this better, but I don't think I understand your
proposal well enough to discuss it in detail. I'm pretty sure anything
that eliminates hybrid paths is going to cause some breaking changes. That
might be as simple as fixing up a bunch of tests, but it might have wider
impact.

Adrian.

Thanks for the quick and detailed response!

Your explanation of hybrid behaviour on Windows was especially helpful, as I didn't fully understand how that worked before.

One thing I see, but wasn't obvious from your description, is that the mixed/hybrid separator behaviour only happens for defined(_WIN_32). E.g., from Path.cpp:

bool is_separator(char value, Style style) {
  if (value == '/')
    return true;
  if (real_style(style) == Style::windows)
    return value == '\\';
  return false;
}
  • The path style is detected when reading the YAML file (as now).

Which path's style? The virtual one that's going to be redirected or the
actual one it's redirected at?

Both, but you've mostly convinced me not to go down this route.

  • Paths in the YAML file are canonicalized to native at parse time.

If we canonicalize the virtual path, VFS would no longer be valuable for
creating platform-agnostic tests.

This is a good point I hadn't considered.

In LLDB, we have the additional wrinkle of remote debugging, where the
debugger may be running on a Windows machine while the program being
debugged is running on a Linux box. You always have to know whether a path
will be used on the debugger host or the debuggee host. And there are
similar concerns for post-mortem debugging from a crash collected on a
different type of host.

Ah, interesting.

Honestly, I'm not sure we have a good definition of what makeAbsolute
should do.

Perhaps the name isn't ideal -- prependRelativePathsWithCurrentWorkingDirectory() would be more precise -- but otherwise I'm not sure I fully agree. Regardless, I acknowledge your point that the two worlds are awkwardly different.

I'm going to think about other options; thanks again for your feedback. I am still leaning toward FileSystem::makeAbsolute() not being virtual / overridden, but I have a better idea of how to approach that. One idea is for the RedirectingFileSystem to keep track of where different styles were used when parsing.... I'm not sure if that'll pan out though.

BTW, I hope I didn't come across as overly negative in my previous
response. I'd love to see the situation improved. I just don't know what
that would look like.

One thing I see, but wasn't obvious from your description, is that the

mixed/hybrid separator behaviour only happens for defined(_WIN_32).

[copy of is_separator elided]

It's a runtime decision based on the specified style, not a compile-time
one based on _WIN_32. If the caller of is_separator passes
Style::windows then it'll accept either / or \\, even if LLVM was
compiled and run on Linux or Mac. I believe there's a VFS test that
redirects a Windows-style path to a Posix-style one (an in-memory file
system), and that test passes on both kinds of hosts.

But I get the gist of the point. My feeling is that, unless we can
eliminate hybrid styles, Paths should support a Style::hybrid. It would be
messy because more ambiguities about how to map things crop up.

Honestly, I'm not sure we have a good definition of what makeAbsolute
should do.

I should have said: I don't have a good understanding of what
makeAbsolute should do. Even saying it should prepend the current working
directory is an incomplete answer. On Windows, a process can have multiple
current directories: one for each drive. And a process has one current
drive. So the current working directory is the current directory for the
current drive. A Windows path like "D:foo.txt" is a relative path.
Literally prepending the current working directory gives us
C:\users\me\D:foo.txt, which is syntactically wrong. But even if we're
smart enough to fix up the syntax, we'd get C:\users\me\foo.txt or
D:\users\me\foo.txt, both of which would (likely) be wrong. The way the
OS would resolve it is to look up the process's current directory for the
D: drive, and insert that into the missing bit.

Anyway, I look forward to any and all proposals to improve this situation.

BTW, I hope I didn't come across as overly negative in my previous
response.

No, not at all!

[...] On Windows, a process can have multiple
current directories: one for each drive. And a process has one current
drive. So the current working directory is the current directory for the
current drive. A Windows path like "D:foo.txt" is a relative path.

Also news to me; thanks for the extra info. Looks like this isn't just a problem with makeAbsolute; even sys::fs::make_absolute is incorrect for Windows.

Seems to me like we could:

  • Fix the one argument version of sys::fs::make_absolute on Windows by calling GetFullPathNameW (documented at https://en.cppreference.com/w/cpp/filesystem/absolute)
  • "Fix" the two argument version of sys::fs::make_absolute by returning an error if the path and CWD both have drive names and they don't match. (Or leave it alone.)

For the VFS, I imagine we could:

  • Keep an enum on FileSystem that indicates its path style (immutable, set on construction).
  • Maybe change APIs to support the windows idea of a different CWD per drive (getCurrentRootDirectoryFor(...)), in order to have a correct implementation of the one-argument makeAbsolute (I assume there's way to get the full initial set to support getPhysicalFileSystem()?).
  • Split RedirectingFileSystem implementation in two, between the "redirecting" and the "overlay" parts (overlaying being optional, depending on fallthrough:).
  • Add a WindowsFileSystemAsPOSIX adaptor/proxy that takes an underlying filesystem in windows mode and presents it as-if posix, given a drive mapping in the constructor. This could be implemented using the same guts as the "redirecting" part of RedirectingFileSystem. (Note also https://reviews.llvm.org/D94844, which allows a directory to be remapped/redirected wholesale; this could be used for mapping drives to POSIX paths.)
  • Add a POSIXFileSystemAsWindows adaptor/proxy that does the inverse.
  • Disallow overlaying FileSystems that have mismatched path styles. E.g., OverlayFileSystem::pushOverlay would error if a mismatched style is passed into it, requiring the caller to first wrap it in an adaptor.
  • Maybe add make{Native,Windows,POSIX}FileSystem adaptors which return the given filesystem directly or add the appropriate adaptor (I'm not sure how drive mapping would be handled... maybe there are sane default mappings we could use...).
  • Maybe make getVFSFromYAML add an adaptor to native by default.

I suppose I'm imagining windows mode would remain "hybrid", but I wonder if this would model the two worlds, even in the face of a VFS using different path styles than native. WDYT?