This is an archive of the discontinued LLVM Phabricator instance.

[VFS] OverlayFileSystem should consistently use the last-added FS first
Changes PlannedPublic

Authored by bnbarham on Mar 10 2022, 3:34 PM.

Details

Summary

Rather than checking whether a file exists before running an operation
and using the result of the first FS where it exists, always run
operations on the most recently added FS's first. If there's any error
other than missing file, return it immediately.

Also update CWD handling. The previous implementation attempted to
setCWD on all contained FS and then would use the CWD of the first
FS for getCWD, assuming they are all synced. This had a bunch of odd
behaviors:

  1. setCWD would fail but all FS up to the failure would have their CWD set. getCWD would then return the just set CWD, even though it "failed"
  2. Their CWD could get out of sync, so the CWD used by each FS wouldn't necessarily match the just set CWD

Fix this by keeping a CWD in OverlayFileSystem instead and pass the
already-absoluted path down into the contained FS. This has the downside
that any result needs to have the path overridden back to the original
(possibly relative) path, which is further complicated by certain
implementations needing to return a different path that *shouldn't* be
mapped.

Right now this is handled by only overriding the path if the returned
path was the one that was requested, but it would be nice to come up
with a better solution.

Depends on D121421

Diff Detail

Event Timeline

bnbarham created this revision.Mar 10 2022, 3:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 3:34 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
bnbarham requested review of this revision.Mar 10 2022, 3:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 3:34 PM
bnbarham updated this revision to Diff 414521.Mar 10 2022, 3:50 PM

Moved overlays_range into D121421.

clang-tidy/infrastructure/empty-database.cpp is failing because setCurrentWorkingDirectory("") previously returned an error and no longer does.

It would have run on the RealFileSystem with a *process linked* CWD previously, which really just runs a chdir. chdir errors if the path doesn't exist. It now successfully sets on the InMemoryFileSystem since it checks nothing. This raises a few couple interesting questions:

  1. The implementation of setCurrentWorkingDirectory is different for the non-process linked case. It first makes the path absolute and then errors if that path doesn't exist. So if that was being used, we also wouldn't error here (it would effectively be a no-op).
  2. Should InMemoryFileSystem fail to set the CWD if the directory doesn't exist? I was considering doing this before I noticed this test failing, since it seems inconsistent with the other implementation. Thoughts?
bnbarham updated this revision to Diff 414733.Mar 11 2022, 1:06 PM

Fix failing test

Adds an error to InMemoryFileSystem if attempting to set an empty CWD when no CWD has been set.

This seems like a great thing to do. I have some comments

clang-tidy/infrastructure/empty-database.cpp is failing because setCurrentWorkingDirectory("") previously returned an error and no longer does.

It would have run on the RealFileSystem with a *process linked* CWD previously, which really just runs a chdir. chdir errors if the path doesn't exist. It now successfully sets on the InMemoryFileSystem since it checks nothing.

Is this behaviour on an empty string consistent across platforms?

This raises a few couple interesting questions:

  1. The implementation of setCurrentWorkingDirectory is different for the non-process linked case. It first makes the path absolute and then errors if that path doesn't exist. So if that was being used, we also wouldn't error here (it would effectively be a no-op).

I feel like there are a few reasonable options.

  • Special case empty string in the VFS as meaning the same thing as ..
  • Special case empty string in the VFS as always given an error.
  • Try to match behaviour of the underlying OS (if they're all the same, that's the same as one of the above).

IMO, cd "" is not a very interesting case, but it'd be nice for it to be consistent.

  1. Should InMemoryFileSystem fail to set the CWD if the directory doesn't exist? I was considering doing this before I noticed this test failing, since it seems inconsistent with the other implementation. Thoughts?

My guess is that it doesn't error partly to align with the behaviour in OverlayFileSystem, so updating it as you've done seems fine to me, assuming we get the CWD working correctly in OverlayFS.

However, I'm concerned that the combination of the changes here will make the following do the wrong thing:

overlay:
  in-memory #1: (base)
    /a/x
  in-memory #2:
    /c/a/x

operations:

cd /
cd /c
cat a/x

IIUC the combined effect of your changes, this "should" print /c/a/x from in-memory-v2, but instead it'll print /a/x from in-memory-v1. And I think before this patch, it would have done the right thing.

llvm/lib/Support/VirtualFileSystem.cpp
510

I think this will be easier to reason about if CWD is only updated on the first success inside a given call to OverlayFileSystem::setCurrentWorkingDirectory. (Note that different VFSes could be canonicalizing relative paths differently.)

I also wonder if, instead, we can just keep track of which FS currently has the canonical CWD:

uint32_t IndexForCWD = 0; // Starts as BaseFS.

Calls to getCurrentWorkingDirectory() would forward to overlays_begin()[IndexForCWD].getCurrentWorkingDirectory(). This way, the overlay inherits the CWD canonicalization of the first working FS (usually, BaseFS).

514–517

The out-of-sync-ness here will remain hard to reason about if there is a sequence of cd relative-path calls.

I think if errc::no_such_file_or_directory is NOT returned (if there was success on any of the filesystems) then

  • A bit should be stored in its entry in the OverlayFS, which prevents passing subsequent relative paths to it, until it has a valid CWD again.
  • Even setCurrentWorkingDirectory() would not be forwarded for a relative CWD unless the filesystem has a valid CWD. It'd be possible to get "permanently lost" after successive relative path moves, such as cd a/ (not found in FS2) followed by cd .. (can't move back!), but I think much easier to reason about. (You could also potentially recover from that, by "trying" to apply ..s in the path and asking a working FS whether the two CWDs are equivalent.)

(I'm a bit scared of changing this to return true in new places until something like that is in place.)

This seems like a great thing to do. I have some comments

[Forgot to finish this thought]

I meant to say that I don't think the current approach is going to work quite correctly; comments inline :).

However, I'm concerned that the combination of the changes here will make the following do the wrong thing:

overlay:
  in-memory #1: (base)
    /a/x
  in-memory #2:
    /c/a/x

operations:

cd /
cd /c
cat a/x

IIUC the combined effect of your changes, this "should" print /c/a/x from in-memory-v2, but instead it'll print /a/x from in-memory-v1. And I think before this patch, it would have done the right thing.

I think my explanation was a bit muddled.

  • Before this patch:
    • cd /c would have returned "directory not found". Clients think we're in /.
    • cat a/x will give content aligning with /a/x.
    • This seems correct.
  • After this patch:
    • cd /c has no error. Clients think we're in /c.
    • cat a/x will give content aligning with /a/x/.
    • This seems incorrect. Since the cd /c succeeded, we should get /c/a/x.

I think my explanation was a bit muddled.

  • Before this patch:
    • cd /c would have returned "directory not found". Clients think we're in /.
    • cat a/x will give content aligning with /a/x.
    • This seems correct.
  • After this patch:
    • cd /c has no error. Clients think we're in /c.
    • cat a/x will give content aligning with /a/x/.
    • This seems incorrect. Since the cd /c succeeded, we should get /c/a/x.

To be clear, setCWD never failed on InMemoryFileSystem (except with the current change where it fails on empty).

So before this patch:

  • cd /c succeeds, both FS get that CWD
  • cat a/x looks up /c/a/x in #1 which fails and then gives #2

After this patch:

  • Same thing since setCWD never fails when given an actual path

But this *is* a problem. Consider the same thing but with RedirectingFileSystem instead:

overlay
  #1 /a/x -> /real/foo
  #2 /c/a/x -> /real/bar
  #3 real, assume foo and bar exist but /a and /c do not

Before this patch setCWD would immediately fail. So I guess no weirdness but also... still not great since I clearly do have a /c?

After this patch, setCWD fails on #1 and #3 but succeeds on #2, so the overlay now has /c. We then lookup a/x which gives /a/x instead of /c/a/x, which is likely what is expected.

Neither particularly make any sense here. I think the best option is probably what you suggested - store a bit representing whether or not setCWD failed and if we get any relative paths in, immediately return with no_such_file_or_directory if that bit is set.

To alleviate the "getting stuck" maybe we could find the first setCWD that succeeds, getCWD, and then re-run setCWD on all the rest to that CWD. Thoughts?

llvm/lib/Support/VirtualFileSystem.cpp
510

Updating multiple times was definitely unintentional. I did consider keeping track of the first working FS. I decided against it since theoretically you could change the CWD of one of the underlying FS if you had a reference to it. That is:

OverlayFS->setCurrentWorkingDirectory("/foo");
ContainedFS->setCurrentWorkingDirectory("/bar");
OverlayFS->getCurrentWorkingDirectory(); // Should this be /foo or /bar?

Also note that overlays_begin() is the reverse order, ie. BaseFS is the last FS used (not the first). I'd think it'd actually be fairly uncommon for it to be the first working FS.

514–517

Yes, out-of-sync-ness is a real issue. I'll expand on that in my other comment.

Note that the currently failing tests are related to this somehow. They set CWD to "." and then try to find a relative file in the InMemoryFileSystem. That used to work but is now failing. I'm a little confused by that because the particular sequence seems the same in both:

  • setCWD to "." // Does nothing to InMemoryFileSystem, RealFileSystem will continue to have the original path
  • add "a.cc" to InMemoryFileSystem
  • try to read "a.cc", with my changes it asks for the full path (which fails), since it worked previously I can only assume that originally it passed it in as relative, so I need to find why that is

I've thought of another concern: the "recursively call CWD" behaviour can result in CWD being called multiple times on a filesystem instance.

overlay:
  base:
    /a/a/c
  overlay:
    base:
      /a/a/c
operations:
  cd /
  cd a
  pwd   # /a
  cat c # cat /a/a/c

I think that has changed subtly with this patch:

  • Before, pwd would give /a/a
  • Now, pwd would give /a.

Another case:

overlay:
  base:
    /a/b/c
  overlay:
    base:
      /a/b/c
operations:
  cd /
  cd a
  pwd     # /a
  cat b/c # fail
  • Before, cd a would have failed (but a subsequent pwd would give /a)
  • Now it'll succeed

I think that's better? but only incidentally, I think, not by design.

To alleviate the "getting stuck" maybe we could find the first setCWD that succeeds, getCWD, and then re-run setCWD on all the rest to that CWD. Thoughts?

That might work. Or, getCWD() might return /bad/path/../.. because it doesn't know how to remove the ...


Stepping back, I'm not sure it's sound to be calling cd on the underlying FS. I think this is what led to RedirectingFS maintaining its own CWD and using absolute paths for access.

llvm/lib/Support/VirtualFileSystem.cpp
510

Aha, right, BaseFS is last.

I decided against it since theoretically you could change the CWD of one of the underlying FS if you had a reference to it.

Maybe a side-channel modification of the CWD of the underlying OS is not something that overlay-fs should have to reason about.

I've thought of another concern: the "recursively call CWD" behaviour can result in CWD being called multiple times on a filesystem instance.

Ouch. I didn't consider this case. There's certainly no tests for it and I'd be surprised if anyone actually does that.

But in saying that, I think:

To alleviate the "getting stuck" maybe we could find the first setCWD that succeeds, getCWD, and then re-run setCWD on all the rest to that CWD. Thoughts?

Does actually fix that issue right? Ie. the problem is passing a relative path to each FS. But if we avoid that by using the absolute path of the first FS that succeeds then we're all good.

Stepping back, I'm not sure it's sound to be calling cd on the underlying FS. I think this is what led to RedirectingFS maintaining its own CWD and using absolute paths for access.

I would have to check the order of commits here. But I believe OverlayFS stopped being used for -ivfsoverlay before CWD was added, so RedirectingFS necessarily had to pass the absolute path down because there was no other option. That path leads to insanity though (IMO) and the current implementation isn't even correct - given multiple overlays, if the first fails then we *always* end up returning the original path.

What's the alternative to setting the CWD on the underlying FS here? Just removing OverlayFS? I did attempt to fix the bug I just mentioned directly, but it makes RedirectingFS even *more* of a mess and I was struggling quite a bit to reason about everything. I really don't think that's the direction we want to go.

My vote is on:

  1. Use the CWD of the first successful FS
  2. Set CWD on all FS to the one returned by the first successful FS

Though I believe this will mean the tests are still broken as this really only fixes the relative case, I haven't had a chance to figure out what's going on there yet.

I've thought of another concern: the "recursively call CWD" behaviour can result in CWD being called multiple times on a filesystem instance.

Ouch. I didn't consider this case. There's certainly no tests for it and I'd be surprised if anyone actually does that.

But in saying that, I think:

To alleviate the "getting stuck" maybe we could find the first setCWD that succeeds, getCWD, and then re-run setCWD on all the rest to that CWD. Thoughts?

Does actually fix that issue right? Ie. the problem is passing a relative path to each FS. But if we avoid that by using the absolute path of the first FS that succeeds then we're all good.

I think you're right. If the first success is used to convert to absolute, it should be okay. I think worth adding tests for since I do think this sort of thing can happen... TBH, all of the cases I can think of are about old RedirectingFS behaviour, so maybe not, but at least worth figuring out if it works with the changes here and keeping it working to prevent future confusion.

Stepping back, I'm not sure it's sound to be calling cd on the underlying FS. I think this is what led to RedirectingFS maintaining its own CWD and using absolute paths for access.

I would have to check the order of commits here. But I believe OverlayFS stopped being used for -ivfsoverlay before CWD was added, so RedirectingFS necessarily had to pass the absolute path down because there was no other option. That path leads to insanity though (IMO) and the current implementation isn't even correct - given multiple overlays, if the first fails then we *always* end up returning the original path.

What's the alternative to setting the CWD on the underlying FS here? Just removing OverlayFS? I did attempt to fix the bug I just mentioned directly, but it makes RedirectingFS even *more* of a mess and I was struggling quite a bit to reason about everything. I really don't think that's the direction we want to go.

My vote is on:

  1. Use the CWD of the first successful FS
  2. Set CWD on all FS to the one returned by the first successful FS

Though I believe this will mean the tests are still broken as this really only fixes the relative case, I haven't had a chance to figure out what's going on there yet.

Redirecting FS is definitely hard to reason about right now; still integrating what gets simplified with your change. I had completely missed that, because you're removing the fallthrough behaviour, it no longer needs to query the underlying FS with a relative path... ever. This is awesome!


A few remaining thoughts:

  • The first successful FS will often be a RedirectingFS. But the BaseFS seems most likely to be a filesystem that is well-equipped to canonicalize the CWD. Since we're calling this on every FS, should we run the cd loop bottom-up instead of top-down (reverse(overlays()))?
  • Should we make PhysicalFileSystem better at canonicalizing the CWD? (maybe off-topic for this patch)
    • On Windows, it could just run remove_dots(/*remove_dot_dot=/true). This is trivial.
    • Else, if there are ..s in a new relative path CWD, it can getRealPath() and:
      • if it matches what remove_dot_dot would give, use that
      • else, try getRealPath() on the remove_dot_dot=true path, and if that's the same as the origin getRealPath(), use the remove-dot-dot path
      • else, give up and use the current logic
    • Downside is that getRealPath() can be slow. But will this matter in practice for setCWD() calls?
  • The client's view of the OverlayFS *should* be a single (virtual) filesystem. But without top-level CWD modelling, we can't actually provide that. Consider:
overlay:
  base:
    /file
  memory:
    /dir/

operations:
  cd /dir
  cat ../file

This *should* give the content from /file in base. But even with our proposed changes it's not going to work correctly. I think one way to be correct would be:

  • Find the "first" FS (like CWD?) where getRealPath(dirname(relativepath)) does not fail
  • Then combine that real-path with filename(relativepath), and go through the overlays with that path

... but this would probably be too slow. Not sure if you have ideas on how to fix it; and I don't think it's made worse by this patch (off topic?). Maybe worth adding a FIXME or description somewhere documenting that it doesn't work.

(Regardless, I think your plan should work.)

bnbarham added a comment.EditedMar 14 2022, 3:47 PM
  • The client's view of the OverlayFS *should* be a single (virtual) filesystem. But without top-level CWD modelling, we can't actually provide that. Consider:
overlay:
  base:
    /file
  memory:
    /dir/

operations:
  cd /dir
  cat ../file

This *should* give the content from /file in base. But even with our proposed changes it's not going to work correctly. I think one way to be correct would be:

  • Find the "first" FS (like CWD?) where getRealPath(dirname(relativepath)) does not fail
  • Then combine that real-path with filename(relativepath), and go through the overlays with that path

... but this would probably be too slow. Not sure if you have ideas on how to fix it; and I don't think it's made worse by this patch (off topic?). Maybe worth adding a FIXME or description somewhere documenting that it doesn't work.

(Regardless, I think your plan should work.)

The real problem here is that we can't just resolve the path in OverlayFileSystem and pass that down because that would mean that the returned file/status would have a different path to the one requested. Unfortunately we can't just overwrite it with the original path, since it's possible the underlying FS *has to* return a different path (ie. because of external-names: true).

If we could resolve the path in OverlayFileSystem then we could just do a status + check that it's a directory on each FS and if so, set CWD in OverlayFileSystem and not any underlying FS.

Maybe we should just do that though with one exception - if the returned path is not the one we gave then *don't* replace the path...? External names is a bit of a hack anyway, so hack for a hack 😆?

EDIT: Note that this wouldn't help with symlinks, but the current implementation doesn't handle that anyway. Even doing Duncan's realpath suggestion wouldn't fix that I believe, since later FS may have different mappings.

And FWIW, this is basically how it worked/works in RedirectingFileSystem, minus the equality check. But the lack of that check (or some other way to handle this case) is a bug that results in external-names: true being ignored.

If we could resolve the path in OverlayFileSystem then we could just do a status + check that it's a directory on each FS and if so, set CWD in OverlayFileSystem and not any underlying FS.

Maybe we should just do that though with one exception - if the returned path is not the one we gave then *don't* replace the path...? External names is a bit of a hack anyway, so hack for a hack 😆?

I went ahead and implemented this. The main problem I can see is that the File returned by RealFileSystem::openFileForRead looks like it's supposed to give the *real path* for File::getName (though this seems to be the case for only RealFile). setPath currently replaces that as well, which would result in different behaviour for a OverlayFileSystem containing a RealFileSystem than just using RealFileSystem itself:

OverlayFileSystem
  RealFileSystem
    /a/b
    /f

cd /a/b
cat ../../f # File::getName + Status::getName == ../../f

RealFileSystem
  /a/b
  /f

cd /a/b
cat ../../f # File::getName == /a/b/f, Status::getName == ../../f

And this is even more of a problem when RedirectingFileSystem with use-external-names: false:

OverlayFileSystem
  RedirectingFileSystem(useExternalNames: false)
    /f2 -> f
  RealFileSystem
    /a/b
    /f

cd /a/b
cat ../../f2

A FileWithFixedStatus is returned that would have /a/b/../../f2 for File::getName and Status::getName. Then OverlayFileSystem would setPath("../../f2") and thus both would now return ../../f2. I'm not sure what we actually *want* to return here though. Maybe that's fine?

If we could resolve the path in OverlayFileSystem then we could just do a status + check that it's a directory on each FS and if so, set CWD in OverlayFileSystem and not any underlying FS.

Maybe we should just do that though with one exception - if the returned path is not the one we gave then *don't* replace the path...? External names is a bit of a hack anyway, so hack for a hack 😆?

I went ahead and implemented this. The main problem I can see is that the File returned by RealFileSystem::openFileForRead looks like it's supposed to give the *real path* for File::getName (though this seems to be the case for only RealFile).

Do you really mean real path here (system call to realpath, that sees through symlinks), or just absolute path?

Is it only when RealFileSystem has a CWD? If RealFileSystem has a CWD, then in practice it needs to use absolute paths for calls to sys::fs since the process CWD cannot be trusted. I suspect it's just failing to model the working directory for the paths it returns (which is sometimes hard, in the presence of symlinks). Maybe that's fine for now.

Is this behaviour (overlayfilesystem's behaviour when wrapping a RealFileSystem) a regression/change or is it the same as before?

Do you really mean real path here (system call to realpath, that sees through symlinks), or just absolute path?

Is it only when RealFileSystem has a CWD? If RealFileSystem has a CWD, then in practice it needs to use absolute paths for calls to sys::fs since the process CWD cannot be trusted. I suspect it's just failing to model the working directory for the paths it returns (which is sometimes hard, in the presence of symlinks). Maybe that's fine for now.

Is this behaviour (overlayfilesystem's behaviour when wrapping a RealFileSystem) a regression/change or is it the same as before?

I mean realpath, see https://llvm.org/doxygen/namespacellvm_1_1sys_1_1fs.html#a3e2cd27a3a2e6c44e9c0fc022952ea54, which is what RealFile::RealName gets set to. Honestly this seems a bit odd to me, there was a commit back in 2018 that renamed getName to getRealName but it ended up being reverted because tests failed and never re-added. I certainly wouldn't expect File::getName to return a realpath (and then only sometimes). Personally I'd like to remove the realpath behavior, or actually rename it to getRealName and make sure all filesystems implement... but I think that's a rabbit hole for a different day.

Anyway, it would be a regression, though one that's easy to fix - just don't set RealName in RealFile::setPath. This is what I've done (I'll put up new changes after lunch). I was more pointing out the RedirectingFileSystem behavior in comparison (which would still be the same since it returns a FileWithFixedStatus that just uses status()->getName()).

bnbarham updated this revision to Diff 415614.Mar 15 2022, 3:42 PM
bnbarham edited the summary of this revision. (Show Details)

Pass absolute paths down into contained FS

bnbarham planned changes to this revision.Mar 17 2022, 6:47 PM

We had some discussions around the issue of needing to replace the returned path with the original, but that not being possible because of use-external-names: true. I'm going to put up a separate patch that should make it possible to detect when an external path was returned and then not replace in that case. We can then use that here.