This is an archive of the discontinued LLVM Phabricator instance.

[VFS] Cache the current working directory for the real FS.
ClosedPublic

Authored by ioeric on Sep 4 2018, 9:20 AM.

Diff Detail

Repository
rC Clang

Event Timeline

ioeric created this revision.Sep 4 2018, 9:20 AM
sammccall added inline comments.Sep 4 2018, 9:27 AM
lib/Basic/VirtualFileSystem.cpp
279–280

Doesn't this need to be guarded by a lock? I know the current version is thread-hostile in principle but this seems likely to lead to corruption in multithreaded programs.
(I suspect clangd tests would fail under TSan for example)

ioeric updated this revision to Diff 163858.Sep 4 2018, 10:15 AM
ioeric marked an inline comment as done.
  • Guard CWDCache with mutex.
lib/Basic/VirtualFileSystem.cpp
279–280

Done. Thanks for the catch!

sammccall accepted this revision.Sep 5 2018, 1:27 AM
sammccall added inline comments.
lib/Basic/VirtualFileSystem.cpp
249

nit: it's slightly more than than, you're also making sure that concurrent getCurrentWorkingDirectory calls are coalesced.

I think that's fine and maybe not worth spelling out, but maybe the comment adds as much confusion as it resolves.

250

CWDMutex?
If this was other state added to this class, you would *not* want it using the same mutex (because we're doing actual FS operations with the lock held)

291–297

don't you also want to do this under the lock?

Otherwise you can write this code:

Notification N;
RealFileSystem FS;
async([&] { FS.setCWD("a"); FS.setCWD("b"); N.notify(); });
async([&] { N.wait(); FS.getCWD(); }

and the getCWD call can see "a".

This revision is now accepted and ready to land.Sep 5 2018, 1:27 AM
ioeric updated this revision to Diff 163986.Sep 5 2018, 2:20 AM
ioeric marked an inline comment as done.
  • s/Mutex/CWDMutex/
This revision was automatically updated to reflect the committed changes.

Might I ask what was the motivation for this change? Performance optimalization?

I am asking this because this makes the RealFileSystem return bogus values for the CWD if it changes through means other than the VFS::setCurrentWorkingDirectory (which is something that, as a library, we can never rule out). We ran into problems with this in lldb where our tests use lldb as a library, and they are driven by a python scripts (which does some CWD changing as a part of test discovery).

If I understand the long scary comment in the setCurrentWorkingDirectory function correctly, the RealFileSystem wants to be independent of the OS notion of CWD. However, right now it's not doing either job very well (sharing the CWD with the OS nor being independent of it), because all other RFS functions will use the real OS CWD (as it is at the moment of the call), only getCurrentWorkingDirectory will return whatever was the last CWD set through the appropriate setter.

Might I ask what was the motivation for this change? Performance optimalization?

Yes, this was for performance.
IIRC the problem was something like calling VFS->makeAbsolute(P) for every filename in a large PCH.

This cache indeed breaks two related assumptions that used to hold:

  1. VFS->getCurrentWorkingDirectory() is consistent with how paths are interpreted by VFS->openFileForRead() etc
  2. VFS->getCurrentWorkingDirectory() is consistent with the process's CWD: ::getcwd, llvm::sys::fs::current_path, etc

The first should clearly hold, so this cache certainly introduced a bug. And probably a hard one to track down, sorry :-(
However, I don't think the second should actually be guaranteed here (see below...)

I am asking this because this makes the RealFileSystem return bogus values for the CWD if it changes through means other than the VFS::setCurrentWorkingDirectory (which is something that, as a library, we can never rule out). We ran into problems with this in lldb where our tests use lldb as a library, and they are driven by a python scripts (which does some CWD changing as a part of test discovery).

If I understand the long scary comment in the setCurrentWorkingDirectory function correctly, the RealFileSystem wants to be independent of the OS notion of CWD. However, right now it's not doing either job very well (sharing the CWD with the OS nor being independent of it), because all other RFS functions will use the real OS CWD (as it is at the moment of the call), only getCurrentWorkingDirectory will return whatever was the last CWD set through the appropriate setter.

Right. In order to write correct multithreaded programs that use the FileSystem abstraction, the VFS's working directory needs to be a local attribute of that VFS, not something global.
However the current implementation of RealFileSystem doesn't satisfy this. It is fixable on most OSes (on recent unix, store a FD for the working dir and use openat()) but nobody was burned badly enough to fix it yet I think. With this fix, clearly no cache would be required.

This would imply that RealFileSystem would always return the working directory from when it was created, and all other function would use that working directory too.
Does that seem reasonable? (Does it make it easier to fix the code that got broken by this cache?)

I can try to get an openat()-based implementation ready.

labath added a comment.Jan 9 2019, 8:46 AM

I wholeheartedly support an openat(2) based VFS, as the current one falls short of the expectations you have of it and is pretty broken right now.

As for your assumption #2, I think that depends on what does one want to get out of the VFS. I can certainly see a use for having a thread-local (VFS-local) CWD. In fact I remember seeing some code in LLDB, which used some darwin-only interface to create a real thread-local CWD in one function. However, I can also see a use for a VFS which is so "real" that it shares the notion of the CWD with the OS. What could happen in lldb is that a user sets the CWD in python (using the proper python api), and then passes a relative path to (lib)lldb. Since lldb never advertised to have any notion of a "lldb-local" CWD, the user would (righfully) expect that that path is resolved relative to the CWD he has just set. And that's what happened until we started using VFS. In fact, it still happens now, because the VFS is so bad at having a local CWD. So, the only reason we actually discovered this was because VFS->getCurrentWorkingDirectory returned an empty string if a it's CWD has never been set. This later translated to a null pointer and a crash.

So it almost sounds to me like we should have two implementations of the VFS. A "real" one, which shared the CWD with the OS, and an "almost real" one, which has a local CWD. Another position would be to just say that lldb was wrong to use the VFS in the first place, as it does not promise we want (and it should use a different abstraction instead). That view would be supported by the fact that there are other interface disconnects relating to the use of VFS in lldb (FILE* and friends). However, that's something you and Jonas will have to figure out (I'm just the peanut gallery here :P).

I wholeheartedly support an openat(2) based VFS, as the current one falls short of the expectations you have of it and is pretty broken right now.

Let me bait-and-switch then... D56545 adds a VFS that emulates openat() using path manipulation.

Since openat isn't available everywhere, to use it we'd need:

  1. openat-like implementation of path ops
  2. a fallback path-based implementation
  3. new APIs to expose this from llvm::sys::fs::
  4. a factory for the openat-like VFS

but I think it's less invasive and controversial to start with just 2 and 4.

In fact, it still happens now, because the VFS is so bad at having a local CWD. So, the only reason we actually discovered this was because VFS->getCurrentWorkingDirectory returned an empty string if a it's CWD has never been set. This later translated to a null pointer and a crash.

(Hmm, RealFileSystem::getCurrentWorkingDirectory shouldn't return empty unless fs::current_path()` does. But certainly here be dragons)

So it almost sounds to me like we should have two implementations of the VFS. A "real" one, which shared the CWD with the OS, and an "almost real" one, which has a local CWD. Another position would be to just say that lldb was wrong to use the VFS in the first place, as it does not promise we want (and it should use a different abstraction instead). That view would be supported by the fact that there are other interface disconnects relating to the use of VFS in lldb (FILE* and friends). However, that's something you and Jonas will have to figure out (I'm just the peanut gallery here :P).

Yeah - I will confess to finding LLDB's use of VFS difficult - doing some but not all operations through the VFS seems to constrains its implementation and makes the abstraction leakier.
But practically, I think you're right. And the default should probably stay as-is, at least until we have openat() on most platforms, or multithreading becomes more pervasive in LLVM-land.
D56545 rips out the cache for the default version, and explicitly guarantees synchronization with the OS workdir.

In fact, it still happens now, because the VFS is so bad at having a local CWD. So, the only reason we actually discovered this was because VFS->getCurrentWorkingDirectory returned an empty string if a it's CWD has never been set. This later translated to a null pointer and a crash.

(Hmm, RealFileSystem::getCurrentWorkingDirectory shouldn't return empty unless fs::current_path()` does. But certainly here be dragons)

My bad. After re-reading the original report it seems that happened was that we got a stale CWD, then went to search for a file that we expected to be there (but it wasn't, because we were searching in the wrong place), and *then* we crashed. :)