This is an archive of the discontinued LLVM Phabricator instance.

[VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS.
ClosedPublic

Authored by nathawes on Jan 15 2021, 4:17 PM.

Details

Summary

Previously file entries in the -ivfsoverlay yaml could map to a file in the external file system (specified via the 'external-contents' property, taking a path), but directories had to list their contents in the form of other file or directory entries (via the 'contents' property, taking an array). Allowing directory entries to map to a directory in the external file system (via the same 'external-contents' property as file entries) makes it possible to present an external directory's contents in a different location and, in combination with the 'fallthrough' option, overlay one directory's contents on top of another.

As an example, with the yaml below:

{ 'roots': [
  {
    'type': 'directory',
    'name': '//root/',
    'contents': [
      {
        'type': 'directory-remap',
        'name': 'mappeddir',
        'external-contents': '//root/foo/bar'
      }
    ]
  }
]}

The path 'root/mappeddir/somefile' would map to 'root/foo/bar/somefile' in the external file system. If that file wasn't found in the external file system and the 'fallthrough' configuration option is true, it would fall back to checking for 'root/mappeddir/somefile' in the external file system. This effectively overlays the contents of root/foo/bar/ on top of //root/mappeddir.

Resolves rdar://problem/72485443

Diff Detail

Event Timeline

nathawes created this revision.Jan 15 2021, 4:17 PM
nathawes requested review of this revision.Jan 15 2021, 4:17 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 15 2021, 4:17 PM

This patch also refactors OverlayFileSystem's directory iterator implementation (OverlayFSDirIterImpl) and VFSFromYamlDirIterImpl into a single implementation, addressing a FIXME about their conceptual similarity.

Can the OverlayFSDirIterImpl part be split into a separate prep commit (possibly even NFC) in order to clarify the functional changes here?

nathawes planned changes to this revision.Jan 15 2021, 4:56 PM

Yeah, certainly. Let me put that up separately and update this diff.

Yeah, certainly. Let me put that up separately and update this diff.

Thanks! (Optionally, you can link them using the "edit related revisions" feature at the top of the web interface, if you find it useful.)

nathawes updated this revision to Diff 317157.Jan 15 2021, 10:27 PM
nathawes edited the summary of this revision. (Show Details)

Moved the change combining OverlayFSDirIterImpl and VFSFromYamlDirIterImpl in a single implementation into a separate NFC patch (https://reviews.llvm.org/D94857)

I don't see a test for a case where the mapped directory already exists in the external FS, something like:

# external fs
/d1/f1
/d2/f2

# redirect fs from yaml (fallthrough: true)
/d1 -> /d2

Unless I just missed it, can you add one?

My intuition is we'd want to overlay the directory contents, not replace them:

# overlayed contents (my intuition for fallthrough: true)
/d1/f1
/d1/f2
/d2/f2

# replaced contents (I think unexpected)
/d1/f2
/d2/f2

If you agree, it might be good to test that the recursive directory iterator does the right thing.

Might also be valuable to test the non-fallthrough case, which I assume should give this view:

# fallthrough: false
/d1/f2

@JDevlieghere, I'm hoping you can help review the details of the code here, as I don't know the iteration code well. Also, are these API changes okay for LLDB (I seem to remember LLDB subclassing this...)?

llvm/include/llvm/Support/VirtualFileSystem.h
617–619

Is it easy to move this declaration after DirectoryEntry? If so, I think that'd reduce the diff a bit, and make it more clear what parts of this were being used verbatim from the old FileEntry. Up to you though.

812–817

It's not entirely clear what the extra parameters do / when they're filled in / etc; can you add a bit more documentation?

Also, I see that the public caller ignores the ExternalRedirect parameter. I wonder if this should be renamed to lookupPathWithName or lookupPathImpl, leaving behind a lookupPath wrapper that creates the ignored buffer for the public interface... no strong opinion though.

llvm/lib/Support/VirtualFileSystem.cpp
1904

Could this return E, and then the callers return SetExtRedirect(...)? Not sure if it'd actually be cleaner, up to you.

2098

Nit: please keep the early return.

2106

Nit: please use an early return here to avoid unnecessary indentation (I think the rest of this diff goes away...).

2129–2132

This seems a bit unfortunate to have to change, but not a big deal...

nathawes planned changes to this revision.Jan 21 2021, 11:55 PM

Thanks for reviewing @dexonsmith!

I don't see a test for a case where the mapped directory already exists in the external FS ... Unless I just missed it, can you add one?

I believe the clang/test/VFS/directory.c regression test covers that case. In that one the real file system is initially set up as follows:

%t/Underlying/
    C.h
    B.h
%t/Middle
    C.h
%t/Overlay
    C.h

And it then sets up vfs overlay yaml files with fallthrough set true that redirect Underlying -> Overlay, or Underlying -> Middle and Middle -> Overlay, as specified on the numbered lines in the test (e.g. // 1) Underlying -> Overlay).

My intuition is we'd want to overlay the directory contents, not replace them

I think it behaves as you expect (as an overlay, rather than replacement) when fallthrough is set to true. You can see that happening in first case in the same test (clang/test/VFS/directory.c). It has the real file system set up as above and creates a vfs yaml file that maps %t/Underlying to %t/Overlay in the external/real file system with fallthrough set true. It invokes clang passing that overlay file, an include path of %t/Underlying, and a source file that includes both C.h and B.h, and checks that it finds the C.h from Overlay (after redirecting Underlying -> Overlay) and B.h from Underlying (after redirecting Underlying -> Overlay, not finding it, and falling back to the original path in the underlying file system).

I really should have a unit test showing the directory iterator output directly, though. That makes the overlay behavior much more obvious.

Overall this looks good. I wonder if abstracting the ExternalRedirect as a small wrapper class around a SmallString would help. There's a few operations that are repeated, like the example below, and it could also house the logic that's currently in the lambda.

if (ExternalRedirect.empty()) {
  assert(RE->getKind() == RedirectingFileSystem::EK_File);
  ExternalRedirect = RE->getExternalContentsPath();
}

One thing I found tricky during the review was differentiating between places where the ExternalRedirect was computed and uses. Looking at the signatures the lookupPath functions are computing it and status is consuming it, although the latter then changes the StringRef (which I guess shouldn't persist?). I don't know if the wrapper could alleviate this as well.

llvm/include/llvm/Support/VirtualFileSystem.h
564

I think we should expand on this a bit more to contrast a directory-remapped directory vs a file-remapped directory (for a lack of better term). Specifically, an empty file-remapped directory could be confusing, as it behaves quite differently but looks pretty similar in YAML.

{
  'type': 'file',
  'name': <string>,
  'external-contents': <path to external file>
}
{
  'type': 'directory',
  'name': <string>,
  'contents': []
}

Looking at the two, I wonder if there would be any benefit to changing the type in the yaml to match the NameKind.

633

nit: s/whether/Whether/

812–817

+1.

nathawes updated this revision to Diff 319505.Jan 27 2021, 2:45 AM
  • Updated the yaml for the new kind of virtual directory (that maps to a directory in the external filesystem) to have a distinct value for its 'type' field ('directory-remap') to better distinguish it from the existing 'directory' type that behaves quite differently.
  • Updated lookupPath to return a structure giving both the matched Entry and the ExternalRedirect path, rather than returning the matched Entry and setting the ExternalRedirect path via an out parameter. This makes it clearer where the ExternalRedirect path is computed vs used.
  • Updated RedirectingFileSystem::dir_begin to respect the 'use-external-names' setting for DirectoryRemapEntry.
  • Added unit tests covering directory iteration (dir_begin) with DirectoryRemapEntry support.
  • Expanded on the documentation comments for RedirectingFileSystem to hopefully make the difference between 'directory' and 'directory-remap' entries clearer.
nathawes marked 6 inline comments as done.Jan 27 2021, 3:16 AM

@JDevlieghere and @dexonsmith would you mind taking another look?

I ended up changing the 'lookup' functions to drop the ExternalRedirect out param and supply that via a new 'LookupResult' return value that also gives the matched Entry. Hopefully that avoids any confusion over where it's computed vs used. It also served as a good home for the duplicated code and the logic in the old SetExternalRedirect closure, as Jonas had mentioned. While adding the extra directory iteration tests Duncan suggested I noticed I wasn't respecting the 'use-external-names' setting in the return items, so I've also added a new directory iterator implementation (RedirectingFSDirRemapIterImpl). That one's used to wrap the iterator the external file system gives for the directory being mapped to, and updates the paths in the items it returns to refer to the virtual directory's path instead, before passing them along.

I haven't looked at the code in detail (deferring to @JDevlieghere), but the new tests look great. Thanks!

JDevlieghere accepted this revision.Jan 27 2021, 4:10 PM

@JDevlieghere and @dexonsmith would you mind taking another look?

I ended up changing the 'lookup' functions to drop the ExternalRedirect out param and supply that via a new 'LookupResult' return value that also gives the matched Entry. Hopefully that avoids any confusion over where it's computed vs used. It also served as a good home for the duplicated code and the logic in the old SetExternalRedirect closure, as Jonas had mentioned. While adding the extra directory iteration tests Duncan suggested I noticed I wasn't respecting the 'use-external-names' setting in the return items, so I've also added a new directory iterator implementation (RedirectingFSDirRemapIterImpl). That one's used to wrap the iterator the external file system gives for the directory being mapped to, and updates the paths in the items it returns to refer to the virtual directory's path instead, before passing them along.

I like the new approach, it adds an indirection but nicely abstracts how we get the ExternalRedirect. I've left an inline comment about a copy I think we can avoid, but other than that this LGTM.

llvm/include/llvm/Support/VirtualFileSystem.h
748

While I prefer Optional over a pointer, this would now incur a copy, so I think it's better to pass a LookupResult* here, or maybe even a RemapEntry*.

This revision is now accepted and ready to land.Jan 27 2021, 4:10 PM
nathawes updated this revision to Diff 320257.Jan 29 2021, 6:38 PM
nathawes marked an inline comment as done.
nathawes edited the summary of this revision. (Show Details)
nathawes set the repository for this revision to rG LLVM Github Monorepo.
  • Updated shouldFallBackToExternalFS() to accept an Entry * rather than an Optional<LookupResult> to avoid an unnecessary copy.
Herald added a project: Restricted Project. · View Herald TranscriptJan 29 2021, 6:38 PM
nathawes updated this revision to Diff 320646.Feb 1 2021, 5:50 PM

Made the following changes as a speculative fix for the windows test failures in the pre-merge checks:

  • When appending the remaining path components to the external redirect path of a directory-remap entry, use the separator type of the redirect path rather than the host system.
  • For a directory-remap entry, when changing the paths returned in the external file system's directory iterator to appear to be in the virtual file system's directory, use the separator type from the virtual directory's path rather than the host system's.
This revision was landed with ongoing or failed builds.Feb 1 2021, 8:57 PM
This revision was automatically updated to reflect the committed changes.