This is an archive of the discontinued LLVM Phabricator instance.

[FileSystem] Open File instances through the FileSystem.
ClosedPublic

Authored by JDevlieghere on Nov 1 2018, 6:45 PM.

Details

Summary

This patch modifies how we open File instances in LLDB. Rather than passing a path or FileSpec to the constructor, we now go through the virtual file system. This is needed in order to make things work with the VFS.

The current implementation doesn't use the VFS yet. The reason is that in LLDB we mostly use FILE pointers or file descriptors to manipulate files. This is currently not supported by the VFS and I haven't decided to address this yet. The two alternative I see are:

  • Having the VFS translate paths. This will only work for virtual file systems where the files actually reside on disk. This would be sufficient for the reproducer use case but lacks generality.
  • Having the VFS provide FILE pointers. Still we wouldn't be able to completely support in-memory vfses. I believe on POSIX there's a way to have a file point to memory, but I don't think you can do that for a file descriptor. Also not sure if that would work on Windows?

Both options are incomplete and I don't have the bandwidth to change how LLDB deals with files. I'm currently leaning towards the first alternative because LLVM doesn't provide FILE* APIs and I don't think we should start adding them now. Also I don't want to extend the VFS for just LLDB as that's going to be a pain to hook up and maintain. Anyway, I'm open to suggestions!

Diff Detail

Repository
rLLDB LLDB

Event Timeline

JDevlieghere created this revision.Nov 1 2018, 6:45 PM

I am not sure what do you mean by "translating paths" in the VFS. If you mean something like "return a FILE * for /whatever/my_reproducer/vfs/bin/ls when I ask for /bin/ls", then I think that's a good idea, as it's most likely to work with all of our existing code (e.g. mmapping). Although, in this case, i am not sure how much benefit will the llvm VFS bring to the game, as most of the operations could then be implemented by plainly prepending some prefix to a given path.

Getting rid of FILE * is not going to be easy, as it is used for some of our external dependencies (libedit). It's possible to create a fake FILE* on posix systems (fopencookie, fmemopen), but I don't think it's possible to do something like that on windows (which is why I stopped when I was looking at this some time ago).

Also, be aware that there are some places where we open raw_ostreams directly (Debugger::EnableLog comes to mind). I guess those will need to go through the FileSystem too...

source/Host/common/FileSystem.cpp
249

File::eOpenOptionRead should be sufficient, no?

source/Host/posix/FileSystem.cpp
74–79 ↗(On Diff #172295)

Why are these two const? It seems that at least with eOpenOptionCanCreate, one can create a new file, which is definitely not a "const" operation.

source/Utility/FileSpec.cpp
321–322 ↗(On Diff #172295)

We already have operator bool for this.

I am not sure what do you mean by "translating paths" in the VFS. If you mean something like "return a FILE * for /whatever/my_reproducer/vfs/bin/ls when I ask for /bin/ls", then I think that's a good idea, as it's most likely to work with all of our existing code (e.g. mmapping). Although, in this case, i am not sure how much benefit will the llvm VFS bring to the game, as most of the operations could then be implemented by plainly prepending some prefix to a given path.

Let me try to explain this better. This is mostly a question about what kind of API the VFS (which lives in LLVM) provides to deal with files in lldb (i.e. through FILE* and file descriptors).

The first possibility is providing a method in the VFS that takes a "virtual" path and returns the "underlying" path. Something like Optional<path> vfs::getUnderlyingPath(path). This doesn't just have to be a prefix thing but you are right that it's mostly going to be just that. The problem is that this method wouldn't work for virtual file systems that don't have files on disk. Hence the lack of generality.

The second possibility is providing a method in the VFS that returns FILE*. When I was writing this I was still hoping that there was something like the fake FILE* you mentioned below. If that were the case it could support virtual file systems that have files that strictly live in memory. But if this doesn't work for Windows it's not really any better than the first alternative.

Getting rid of FILE * is not going to be easy, as it is used for some of our external dependencies (libedit). It's possible to create a fake FILE* on posix systems (fopencookie, fmemopen), but I don't think it's possible to do something like that on windows (which is why I stopped when I was looking at this some time ago).

Yup, this is not something I'm proposing.

Also, be aware that there are some places where we open raw_ostreams directly (Debugger::EnableLog comes to mind). I guess those will need to go through the FileSystem too...

Yup, we need to audit all file access. This particular example actually goes through file so it should be covered by this change.

JDevlieghere added inline comments.Nov 2 2018, 8:53 AM
source/Host/common/FileSystem.cpp
249

Sorry, I don't know what you mean?

source/Host/posix/FileSystem.cpp
74–79 ↗(On Diff #172295)

Depends on what you consider const, the real filesystem or the class. But yeah, the const doesn't add much so I can un-const this.

JDevlieghere marked 3 inline comments as done.

Address Pavel's feedback:

  • Un-const open functions.
  • Remove ::Empty() again as it was redundant with operator bool().
  • Rename OpenFile to just Open. We might need more Open methods in the future.
labath accepted this revision.Nov 2 2018, 11:54 AM

I am not sure what do you mean by "translating paths" in the VFS. If you mean something like "return a FILE * for /whatever/my_reproducer/vfs/bin/ls when I ask for /bin/ls", then I think that's a good idea, as it's most likely to work with all of our existing code (e.g. mmapping). Although, in this case, i am not sure how much benefit will the llvm VFS bring to the game, as most of the operations could then be implemented by plainly prepending some prefix to a given path.

Let me try to explain this better. This is mostly a question about what kind of API the VFS (which lives in LLVM) provides to deal with files in lldb (i.e. through FILE* and file descriptors).

The first possibility is providing a method in the VFS that takes a "virtual" path and returns the "underlying" path. Something like Optional<path> vfs::getUnderlyingPath(path). This doesn't just have to be a prefix thing but you are right that it's mostly going to be just that. The problem is that this method wouldn't work for virtual file systems that don't have files on disk. Hence the lack of generality.

Ok, I think we're on the same page here. What I was wondering is that given this lack of generality (this operation would only be supported on DiskBackedFilesystem, and not all other kinds of file systems), whether it is not better to just do a custom solution instead of going through the VFS layer (thereby removing one layer, since now we have lldb_private::FileSystem sitting on top of llvm::VirtualFileSystem, sitting on top of the real thing). E.g., if we know we can restrict ourselves to the case where the on disk layout matches the real system, then all filesystem operations can be implemented very trivially via something like:

auto virtual_op(const Twine &Path, ...) { return real_op(Prefix+Path, ...); }

I am not sure whether this is a good idea (there's definitely a case to be made for reusing existing infrastructure), but it is something to think about.

Also, be aware that there are some places where we open raw_ostreams directly (Debugger::EnableLog comes to mind). I guess those will need to go through the FileSystem too...

Yup, we need to audit all file access. This particular example actually goes through file so it should be covered by this change.

You're right, I should have looked at this more closely. It should be fine (assuming we can have real fds for the virtual files).

source/Host/common/FileSystem.cpp
249

I meant to remove the superfluous OpenOptions qualification, As this is a non-class enum, it is not necessary, and it doesn't help clarity, since the enum value already contains the type name. None of the existing code uses this kind of qualification.

This revision is now accepted and ready to land.Nov 2 2018, 11:54 AM

Ok, I think we're on the same page here. What I was wondering is that given this lack of generality (this operation would only be supported on DiskBackedFilesystem, and not all other kinds of file systems), whether it is not better to just do a custom solution instead of going through the VFS layer (thereby removing one layer, since now we have lldb_private::FileSystem sitting on top of llvm::VirtualFileSystem, sitting on top of the real thing). E.g., if we know we can restrict ourselves to the case where the on disk layout matches the real system, then all filesystem operations can be implemented very trivially via something like:

auto virtual_op(const Twine &Path, ...) { return real_op(Prefix+Path, ...); }

I am not sure whether this is a good idea (there's definitely a case to be made for reusing existing infrastructure), but it is something to think about.

Thanks Pavel, I really appreciate you thinking this through with me! I understand what you're saying but I still think the VFS is worthwhile:

  • The lack of generality I brought up only applies to in-memory file systems. Every VFS implementation that's backed by disk would still work.
  • The only reason hard-coding the vfs implementation (because that's essentially what the prefix thing is) is the coincidence that no prefix equals the real file system.
  • We could have a prefix VFS implementation that's really simple and self-contained without cluttering the LLDB FileSystem class with details.

In the end I believe the indirection is more of a feature than a bug. I'm worried that, if we do our own thing, we'll end up in this situation where we have something in LLDB that looks really similar to something in LLVM but isn't quite the same. As soon as we want to do more complex than just prepending a prefix we'll need an abstraction anyway, so it might as well be the thing from LLVM.

source/Host/common/FileSystem.cpp
249

Got it, thanks!

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.