This is an archive of the discontinued LLVM Phabricator instance.

[VFS] Allow multiple RealFileSystem instances with independent CWDs.
ClosedPublic

Authored by sammccall on Jan 10 2019, 6:11 AM.

Details

Summary

Previously only one RealFileSystem instance was available, and its working
directory is shared with the process. This doesn't work well for multithreaded
programs that want to work with relative paths - the vfs::FileSystem is assumed
to provide the working directory, but a thread cannot control this exclusively.

The new vfs::createPhysicalFileSystem() factory copies the process's working
directory initially, and then allows it to be independently modified.

This implementation records the working directory path, and glues it to relative
paths to provide the correct absolute path to the sys::fs:: functions.
This will give different results in unusual situations (e.g. the CWD is moved).

The main alternative is the use of openat(), fstatat(), etc to ask the OS to
resolve paths relative to a directory handle which can be kept open. This is
more robust. There are two reasons not to do this initially:

  1. these functions are not available on all supported Unixes, and are somewhere between difficult and unavailable on Windows. So we need a path-based fallback anyway.
  2. this would mean also adding support at the llvm::sys::fs level, which is a larger project. My clearest idea is an OS-specific BaseDirectory object that can be optionally passed to functions there. Eventually this could be backed by either paths or a fd where openat() is supported. This is a large project, and demonstrating here that a path-based fallback works is a useful prerequisite.

There is some subtlety to the path-manipulation mechanism:

  • when setting the working directory, both Specified=makeAbsolute(path) and Resolved=realpath(path) are recorded. These may differ in the presence of symlinks.
  • getCurrentWorkingDirectory() and makeAbsolute() use Specified - this is similar to the behavior of $PWD and sys::path::current_path
  • IO operations like openFileForRead use Resolved. This is similar to the behavior of an openat() based implementation, that doesn't see changes in symlinks.

There may still be combinations of operations and FS states that yield unhelpful
behavior. This is hard to avoid with symlinks and FS abstractions :(

The caching behavior of the current working directory is removed in this patch.
getRealFileSystem() is now specified to link to the process CWD, so the caching
is incorrect.
The user who needed this so far is clangd, which will immediately switch to
createPhysicalFileSystem().

Diff Detail

Event Timeline

sammccall created this revision.Jan 10 2019, 6:11 AM

The patch makes sense to me.

I wonder if instead of lambda-wrapping everything we couldn't use a pattern like this:

Twine fixPath(const Twin &Orig, SmallVectorImpl<char> &Storage) {
  if (path_needs_fixing()) {
    fix_path(Orig, Storage);
    return Storage;
  }
  return Orig;
}

T whatever(const Twine &Path) {
  SmallString<N> Storage;
  // This is safe because the returned twine is either a copy of the input twine or a unary twine backed by the provided storage
  Twine Fixed = fixPath(Path, Storage);
  return do_stuff(Fixed);
}

This usage of Twine isn't very nice, but neither is lambda-wrapping everything, so it's kind of a pick-your-poison situation. Anyway, just throwing the idea out there...

lib/Support/VirtualFileSystem.cpp
241

explicit

265

It looks like it should be possible to use the trailing return type syntax to simplify this (auto WithFixedPath(Path, F) -> decltype(F(Path))

sammccall updated this revision to Diff 181062.Jan 10 2019, 7:44 AM

Address comments, avoid weird lambdas.

This usage of Twine isn't very nice, but neither is lambda-wrapping everything, so it's kind of a pick-your-poison situation. Anyway, just throwing the idea out there...

Thanks, I think that's a bit nicer.

I'm almost certain that ripping Twine out of all our filesystem interfaces would improve both performance and readability overall :-/

labath accepted this revision.Jan 10 2019, 7:56 AM

Thanks for jumping on this. Looks good to me, though I would feel better if someone else acked this too.

This revision is now accepted and ready to land.Jan 10 2019, 7:56 AM

Why did you choose to make this a property of the RealFileSystem as opposed to subclassing the RealFileSystem to a PhysicalFileSystem? It seems like the latter would be more in line with the VFS design.

Why did you choose to make this a property of the RealFileSystem as opposed to subclassing the RealFileSystem to a PhysicalFileSystem? It seems like the latter would be more in line with the VFS design.

The reason the logic is interleaved like this: the operations aren't a simple "translation" into equivalent operations with absolute paths. Note that the returned files and statuses must report the relative paths provided, not the absolute ones we translate the actual IO into. So building the FS-with-own-WD as an aggregate or derived class of the FS-with-native-WD is going to be awkward and fragile. (And semantically, inheritance seems dubious here).
This could be a hierarchy of three classes where the abstract base tries not to express an opinion (or CRTP, or a template over a policy object), but I didn't find a pattern that seemed less confusing/indirect than just using a single class with if(). I don't expect the performance aspect of having to branch to matter at all.

JDevlieghere accepted this revision.Jan 10 2019, 9:53 AM

Why did you choose to make this a property of the RealFileSystem as opposed to subclassing the RealFileSystem to a PhysicalFileSystem? It seems like the latter would be more in line with the VFS design.

The reason the logic is interleaved like this: the operations aren't a simple "translation" into equivalent operations with absolute paths. Note that the returned files and statuses must report the relative paths provided, not the absolute ones we translate the actual IO into. So building the FS-with-own-WD as an aggregate or derived class of the FS-with-native-WD is going to be awkward and fragile. (And semantically, inheritance seems dubious here).
This could be a hierarchy of three classes where the abstract base tries not to express an opinion (or CRTP, or a template over a policy object), but I didn't find a pattern that seemed less confusing/indirect than just using a single class with if(). I don't expect the performance aspect of having to branch to matter at all.

Fair enough, thanks for the explanation!

This revision was automatically updated to reflect the committed changes.
jfb added a subscriber: jfb.Aug 8 2019, 5:15 PM

I did a follow-up to this: D65986
It caused some unexpected breakage.

Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2019, 5:15 PM

Thanks for the heads up!
I think the breakage was rather from D62271 (when this was written, there was no plan to switch the default)