Details
Diff Detail
- Repository
- rC Clang
Event Timeline
lib/Basic/VirtualFileSystem.cpp | ||
---|---|---|
279–280 | Doesn't this need to be guarded by a lock? I know the current version is thread-hostile in principle but this seems likely to lead to corruption in multithreaded programs. |
- Guard CWDCache with mutex.
lib/Basic/VirtualFileSystem.cpp | ||
---|---|---|
279–280 | Done. Thanks for the catch! |
lib/Basic/VirtualFileSystem.cpp | ||
---|---|---|
249 | nit: it's slightly more than than, you're also making sure that concurrent getCurrentWorkingDirectory calls are coalesced. I think that's fine and maybe not worth spelling out, but maybe the comment adds as much confusion as it resolves. | |
250 | CWDMutex? | |
291–297 | don't you also want to do this under the lock? Otherwise you can write this code: Notification N; RealFileSystem FS; async([&] { FS.setCWD("a"); FS.setCWD("b"); N.notify(); }); async([&] { N.wait(); FS.getCWD(); } and the getCWD call can see "a". |
Might I ask what was the motivation for this change? Performance optimalization?
I am asking this because this makes the RealFileSystem return bogus values for the CWD if it changes through means other than the VFS::setCurrentWorkingDirectory (which is something that, as a library, we can never rule out). We ran into problems with this in lldb where our tests use lldb as a library, and they are driven by a python scripts (which does some CWD changing as a part of test discovery).
If I understand the long scary comment in the setCurrentWorkingDirectory function correctly, the RealFileSystem wants to be independent of the OS notion of CWD. However, right now it's not doing either job very well (sharing the CWD with the OS nor being independent of it), because all other RFS functions will use the real OS CWD (as it is at the moment of the call), only getCurrentWorkingDirectory will return whatever was the last CWD set through the appropriate setter.
Yes, this was for performance.
IIRC the problem was something like calling VFS->makeAbsolute(P) for every filename in a large PCH.
This cache indeed breaks two related assumptions that used to hold:
- VFS->getCurrentWorkingDirectory() is consistent with how paths are interpreted by VFS->openFileForRead() etc
- VFS->getCurrentWorkingDirectory() is consistent with the process's CWD: ::getcwd, llvm::sys::fs::current_path, etc
The first should clearly hold, so this cache certainly introduced a bug. And probably a hard one to track down, sorry :-(
However, I don't think the second should actually be guaranteed here (see below...)
I am asking this because this makes the RealFileSystem return bogus values for the CWD if it changes through means other than the VFS::setCurrentWorkingDirectory (which is something that, as a library, we can never rule out). We ran into problems with this in lldb where our tests use lldb as a library, and they are driven by a python scripts (which does some CWD changing as a part of test discovery).
If I understand the long scary comment in the setCurrentWorkingDirectory function correctly, the RealFileSystem wants to be independent of the OS notion of CWD. However, right now it's not doing either job very well (sharing the CWD with the OS nor being independent of it), because all other RFS functions will use the real OS CWD (as it is at the moment of the call), only getCurrentWorkingDirectory will return whatever was the last CWD set through the appropriate setter.
Right. In order to write correct multithreaded programs that use the FileSystem abstraction, the VFS's working directory needs to be a local attribute of that VFS, not something global.
However the current implementation of RealFileSystem doesn't satisfy this. It is fixable on most OSes (on recent unix, store a FD for the working dir and use openat()) but nobody was burned badly enough to fix it yet I think. With this fix, clearly no cache would be required.
This would imply that RealFileSystem would always return the working directory from when it was created, and all other function would use that working directory too.
Does that seem reasonable? (Does it make it easier to fix the code that got broken by this cache?)
I can try to get an openat()-based implementation ready.
I wholeheartedly support an openat(2) based VFS, as the current one falls short of the expectations you have of it and is pretty broken right now.
As for your assumption #2, I think that depends on what does one want to get out of the VFS. I can certainly see a use for having a thread-local (VFS-local) CWD. In fact I remember seeing some code in LLDB, which used some darwin-only interface to create a real thread-local CWD in one function. However, I can also see a use for a VFS which is so "real" that it shares the notion of the CWD with the OS. What could happen in lldb is that a user sets the CWD in python (using the proper python api), and then passes a relative path to (lib)lldb. Since lldb never advertised to have any notion of a "lldb-local" CWD, the user would (righfully) expect that that path is resolved relative to the CWD he has just set. And that's what happened until we started using VFS. In fact, it still happens now, because the VFS is so bad at having a local CWD. So, the only reason we actually discovered this was because VFS->getCurrentWorkingDirectory returned an empty string if a it's CWD has never been set. This later translated to a null pointer and a crash.
So it almost sounds to me like we should have two implementations of the VFS. A "real" one, which shared the CWD with the OS, and an "almost real" one, which has a local CWD. Another position would be to just say that lldb was wrong to use the VFS in the first place, as it does not promise we want (and it should use a different abstraction instead). That view would be supported by the fact that there are other interface disconnects relating to the use of VFS in lldb (FILE* and friends). However, that's something you and Jonas will have to figure out (I'm just the peanut gallery here :P).
Let me bait-and-switch then... D56545 adds a VFS that emulates openat() using path manipulation.
Since openat isn't available everywhere, to use it we'd need:
- openat-like implementation of path ops
- a fallback path-based implementation
- new APIs to expose this from llvm::sys::fs::
- a factory for the openat-like VFS
but I think it's less invasive and controversial to start with just 2 and 4.
In fact, it still happens now, because the VFS is so bad at having a local CWD. So, the only reason we actually discovered this was because VFS->getCurrentWorkingDirectory returned an empty string if a it's CWD has never been set. This later translated to a null pointer and a crash.
(Hmm, RealFileSystem::getCurrentWorkingDirectory shouldn't return empty unless fs::current_path()` does. But certainly here be dragons)
So it almost sounds to me like we should have two implementations of the VFS. A "real" one, which shared the CWD with the OS, and an "almost real" one, which has a local CWD. Another position would be to just say that lldb was wrong to use the VFS in the first place, as it does not promise we want (and it should use a different abstraction instead). That view would be supported by the fact that there are other interface disconnects relating to the use of VFS in lldb (FILE* and friends). However, that's something you and Jonas will have to figure out (I'm just the peanut gallery here :P).
Yeah - I will confess to finding LLDB's use of VFS difficult - doing some but not all operations through the VFS seems to constrains its implementation and makes the abstraction leakier.
But practically, I think you're right. And the default should probably stay as-is, at least until we have openat() on most platforms, or multithreading becomes more pervasive in LLVM-land.
D56545 rips out the cache for the default version, and explicitly guarantees synchronization with the OS workdir.
My bad. After re-reading the original report it seems that happened was that we got a stale CWD, then went to search for a file that we expected to be there (but it wasn't, because we were searching in the wrong place), and *then* we crashed. :)
nit: it's slightly more than than, you're also making sure that concurrent getCurrentWorkingDirectory calls are coalesced.
I think that's fine and maybe not worth spelling out, but maybe the comment adds as much confusion as it resolves.