This is an archive of the discontinued LLVM Phabricator instance.

Support lib changes to allow VFS implementation handling working dir locally.
AbandonedPublic

Authored by ilya-biryukov on Aug 3 2017, 4:51 AM.

Details

Reviewers
None
Summary

This is in unfinished state as of now and not ready for submission.

  • Has a bunch of TODOs that must be fixed.
  • Laking Windows implementation.

Event Timeline

ilya-biryukov created this revision.Aug 3 2017, 4:51 AM

Hi, @chapuni,

@klimek told me you could probably help with Windows-specific questions.

We're working on making clang's vfs::RealFileSystem more thread-friendly. Current implementation uses process-global working directory, we want to get rid of that. (vfs::FileSystem has setCurrentWorkingDirectory and must do all relative paths lookups according to its value).
To do that, we want to store file descriptor of working directory in vfs::RealFileSystem and do all lookups relatively to that. On Unix we achieve that by calling openat and their equivalents.
This review contains a prototype of the changes we need to Support library to make it work on Unix. The question is how to do it properly on Windows.

After doing some research, it seems that we should use NtCreateFile and pass proper ObjectAttributes.RootDirectory (e.g. see this stackoverflow question).
But AFAIU, this means we'll have to rewrite the Windows version Path.inc to use Nt-functions everywhere.
What are your thoughts on that? Is it ok to rewrite Path.inc to use Nt functions? Are there any alternative implementations/APIs that would work for us?

It's not just Windows, but not even all unixes support openat. Minimum required version of various Unixes are:

DragonFly 2.3.
FreeBSD 8.0.
Linux 2.6.16
NetBSD 7.0.
OpenBSD 5.0.
OS X 10.10.

I don't know how old some of these are, but you should check first to make sure we don't support any versions of these unix variants with versions older than specified above. As for Windows, If we're going to use Nt functions, I would rather they be entirely separate functions. i.e. not done via an optional parameter to an existing function, but a completely new function.

Also, even without using the Nt functions, you could get the working directory via the process's HANDLE, then stitch together a new working directory and build an absolute path out of it. It's not entirely atomic the way openat is because someone could change the working directory of the process after you query its working directory, but I suspect that wouldn't matter?

Also, even without using the Nt functions, you could get the working directory via the process's HANDLE, then stitch together a new working directory and build an absolute path out of it. It's not entirely atomic the way openat is because someone could change the working directory of the process after you query its working directory, but I suspect that wouldn't matter?

Right, that does not matter. I actually made a patch that does exactly that in clang's vfs::FileSystem, it doesn't require any changes in the Support library and all LLVM tests seem to pass if you do that: https://reviews.llvm.org/D36226?vs=on&id=109362&whitespace=ignore-most#toc.
But there's a major flaw in that implementation: if your working dir is symlinked and it changes during the lifetime of your program, you will be looking into different directories on each file lookup.
Unfortunately, that's what would break us at Google, since we to access the code via a FUSE filesystem that has a symlink that will change its destination on each request, we really need to read its file descriptor only once.

It's not just Windows, but not even all unixes support openat. Minimum required version of various Unixes are:

DragonFly 2.3.
FreeBSD 8.0.
Linux 2.6.16
NetBSD 7.0.
OpenBSD 5.0.
OS X 10.10.

I don't know how old some of these are, but you should check first to make sure we don't support any versions of these unix variants with versions older than specified above. As for Windows, If we're going to use Nt functions, I would rather they be entirely separate functions. i.e. not done via an optional parameter to an existing function, but a completely new function.

Thanks, we definitely don't want to make LLVM incompatible with those platforms. I imagine we still need to keep LLVM running on those versions, they aren't too old and probably still widely used.
Now it seems we'll have to put this as a separate implementation under a configurable macro, so that it can be turned off on systems that don't support openat.
Maybe there's a list of minimal versions that LLVM strives to support?

Technically, creating separate functions should not be too hard.
There are two concerns that I have with having both APIs:

  1. We must be very careful to not mix Win32 and Nt APIs (i.e. file opened via NtCreateFile should be read via NtReadFile, not ReadFile).
  2. Having both Win32 and Nt functions is that Nt variants may stay untested if most of LLVM and Clang will be using Win32.

What problems do you foresee if we switch to Nt API everywhere?

I'll start to work on an implementation that is only enable via a configurable macro.

That way, we could have it available for Unix variants that support openat.
Other systems will have to fallback to converting the paths to absolute.

Technically, creating separate functions should not be too hard.
There are two concerns that I have with having both APIs:

  1. We must be very careful to not mix Win32 and Nt APIs (i.e. file opened via NtCreateFile should be read via NtReadFile, not ReadFile).
  2. Having both Win32 and Nt functions is that Nt variants may stay untested if most of LLVM and Clang will be using Win32.

What problems do you foresee if we switch to Nt API everywhere?

The main concern I have is that the Nt functions are not intended for public consumption. They're considered internal, and as such they are subject to all the caveats listed here.

I fear it would also be impossible to safely implement some of the functions with the Nt methods. For example, openFileForRead returns an fd in one of its arguments. The way this is implemented is by calling CreateFile and then taking the resulting HANDLE and calling _open_osfhandle on it. Does this work when it's an NtCreate handle? Maybe? I don't know, but if we can't mix functions then this is just one of the problems. The point is that some of our APIs expose these handles and descriptors to the outside world, and we can't expect external users to use the Nt APIs.

If all this is hidden behind a separate function, then the requirements for that function could be documented independently. Regardless, the main thing is that it's an internal API, and it seems like a very bad idea to have our entire filesystem library for all of LLVM be based entirely on unsupported, internal OS APIs.

It's not just Windows, but not even all unixes support openat. Minimum required version of various Unixes are:

DragonFly 2.3.
FreeBSD 8.0.
Linux 2.6.16
NetBSD 7.0.
OpenBSD 5.0.
OS X 10.10.

I don't know how old some of these are, but you should check first to make sure we don't support any versions of these unix variants with versions older than specified above. As for Windows, If we're going to use Nt functions, I would rather they be entirely separate functions. i.e. not done via an optional parameter to an existing function, but a completely new function.

Almost certainly nobody maintains newer LLVM on older versions of mentioned BSD systems.

The main concern I have is that the Nt functions are not intended for public consumption. They're considered internal, and as such they are subject to all the caveats listed here.

I totally agree that is not ideal. Even though Nt functions for file manipulations have been there for ages. I personally highly doubt they will go away in the foreseeable future, but there is no guarantee.

I fear it would also be impossible to safely implement some of the functions with the Nt methods. For example, openFileForRead returns an fd in one of its arguments. The way this is implemented is by calling CreateFile and then taking the resulting HANDLE and calling _open_osfhandle on it. Does this work when it's an NtCreate handle? Maybe? I don't know, but if we can't mix functions then this is just one of the problems. The point is that some of our APIs expose these handles and descriptors to the outside world, and we can't expect external users to use the Nt APIs.

That is certainly an issue I overlooked. There are no indications if _open_osfhandle will work properly on handles created by Nt functions.

If all this is hidden behind a separate function, then the requirements for that function could be documented independently. Regardless, the main thing is that it's an internal API, and it seems like a very bad idea to have our entire filesystem library for all of LLVM be based entirely on unsupported, internal OS APIs.

Another thing is that for Windows we may get away with just converting the paths to absolute all the time. The symlinks changing their target rapidly (i.e. during a single compile) is something we're having trouble with only on Linux, and I doubt this use-case would ever come up on Windows.
My intention currently is to provide the openat-based APIs only on Unix that have a relevant library support.
The final aim to implement a clang::vfs::FileSystem that does not rely on process-wide working directory. It would either be based on top of openat-based APIs or would convert paths to absolute on each request.

Almost certainly nobody maintains newer LLVM on older versions of mentioned BSD systems.

Good to know, thanks.

ilya-biryukov abandoned this revision.May 23 2019, 1:59 AM