This is an archive of the discontinued LLVM Phabricator instance.

Move RedirectingFileSystem interface into header.
ClosedPublic

Authored by JDevlieghere on Nov 8 2018, 2:53 PM.

Details

Summary

This moves the RedirectingFileSystem into the VFS header so we can extend it from LLVM with a custom implementation to get the external path.

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.Nov 8 2018, 2:53 PM

How do you want it to work with symlinks? Would getRealPath be sufficient for your purpose?

How do you want it to work with symlinks? Would getRealPath be sufficient for your purpose?

I'm not sure how those two are related, other than the confusing name. As far as I can tell this only wraps the realpath(3) call, which is definitely much too expensive to do in the "real" file system case. For symlinks I wouldn't really care, it's up to the user to resolve them if they matter.

How do you want it to work with symlinks? Would getRealPath be sufficient for your purpose?

I'm not sure how those two are related, other than the confusing name. As far as I can tell this only wraps the realpath(3) call, which is definitely much too expensive to do in the "real" file system case. For symlinks I wouldn't really care, it's up to the user to resolve them if they matter.

OK. Now I understand that you prefer to avoid realpath as it is too expensive for this purpose.

Do you think Optional is sufficient for handling errors? On one hand it is a simpler interface but on the other hand it loses information why external path might be unavailable.

lib/Support/VirtualFileSystem.cpp
1906–1907 ↗(On Diff #173229)

Need to double check but at the first glance I have reservations about a case with nested RedirectingFileSystem. Should getExternalPath go only through 1 level of indirection or all the way to real file system?

How do you want it to work with symlinks? Would getRealPath be sufficient for your purpose?

I'm not sure how those two are related, other than the confusing name. As far as I can tell this only wraps the realpath(3) call, which is definitely much too expensive to do in the "real" file system case. For symlinks I wouldn't really care, it's up to the user to resolve them if they matter.

OK. Now I understand that you prefer to avoid realpath as it is too expensive for this purpose.

Do you think Optional is sufficient for handling errors? On one hand it is a simpler interface but on the other hand it loses information why external path might be unavailable.

Yeah, I wasn't sure about this. Usually the client won't care what the error is, but there's no harm in returning an ErrorOr<>. I'll update the patch!

JDevlieghere planned changes to this revision.Nov 10 2018, 4:33 PM
JDevlieghere added inline comments.
lib/Support/VirtualFileSystem.cpp
1906–1907 ↗(On Diff #173229)

Good point. My use case would require all the way to the real FS.

I don't understand the concept of an "external path". What is it in the general case, when would you want to pass a path through this function vs using it directly?

From the patch description, it sounds like it may be specific to LLDB and/or RedirectingFileSystem. If so, maybe it shouldn't be on the interface?

I don't understand the concept of an "external path". What is it in the general case, when would you want to pass a path through this function vs using it directly?

From the patch description, it sounds like it may be specific to LLDB and/or RedirectingFileSystem. If so, maybe it shouldn't be on the interface?

It's true that this is only relevant for VFS implementations that are somehow backed by disk. I'm curious what you have in mind. How would it not be on the interface and still work transparently?

I don't understand the concept of an "external path". What is it in the general case, when would you want to pass a path through this function vs using it directly?

From the patch description, it sounds like it may be specific to LLDB and/or RedirectingFileSystem. If so, maybe it shouldn't be on the interface?

It's true that this is only relevant for VFS implementations that are somehow backed by disk. I'm curious what you have in mind. How would it not be on the interface and still work transparently?

I can't really answer these questions without knowing what an external path is or when you'd want to use it.
Could you describe it here or add some documentation to the patch?

I don't understand the concept of an "external path". What is it in the general case, when would you want to pass a path through this function vs using it directly?

From the patch description, it sounds like it may be specific to LLDB and/or RedirectingFileSystem. If so, maybe it shouldn't be on the interface?

It's true that this is only relevant for VFS implementations that are somehow backed by disk. I'm curious what you have in mind. How would it not be on the interface and still work transparently?

I can't really answer these questions without knowing what an external path is or when you'd want to use it.
Could you describe it here or add some documentation to the patch?

I'll try here first and we can always improve the patch later.

Let's start from the beginning: I'm adding reproducer functionality to LLDB. Just like in clang, we need files to be available somewhere else form where they actually live on disk. This should also be transparent to the consumer. I want to use the VFS for this rather than hand-rolling my own implementation in LLDB. I did a whole bunch of refactoring to make most of LLDB's file system operations go through the VFS.

One notable difference between llvm/clang and LLDB is how the latter deals with files. In addition to using memory buffers, LLDB manipulates the majority of files through file pointers (FILE*) and file descriptors. It's also a lot less explicit in whether it's opening a file for reading or writing. By this I mean in llvm we have functions like openMemoryBufferForWriting or openMemoryBufferForReading while in LLDB we would just pass a different open flag. While I would love to change how LLDB manipulates files, I simply don't have the bandwidth to do that right now. So that brings me to my current problem: how do I access files from LLDB through the VFS.

I considered extending the VFS API with functions to provide access to file pointers and descriptors, but decided against it. The reason is that (AFAIK) there's no way to access memory through either of those on both windows and unix. I believe you can have a FILE*to memory but that doesn't work on Windows. File descriptors to memory don't make sense on either.

The alternative, proposed in this patch, is to have an API in the VFS to access the "underlying" file through its "external path". This is the path specified in YAML by external-contents. So when the VFS thinks that the file foo lives at /root/foo but it actually lives at /root/whatever/reproducer/foo, I want to be able to get that latter path and open it with (f)open.

Please let me know if this makes things more clear.

I don't understand the concept of an "external path". What is it in the general case, when would you want to pass a path through this function vs using it directly?

From the patch description, it sounds like it may be specific to LLDB and/or RedirectingFileSystem. If so, maybe it shouldn't be on the interface?

It's true that this is only relevant for VFS implementations that are somehow backed by disk. I'm curious what you have in mind. How would it not be on the interface and still work transparently?

I can't really answer these questions without knowing what an external path is or when you'd want to use it.
Could you describe it here or add some documentation to the patch?

I'll try here first and we can always improve the patch later.

One notable difference between llvm/clang and LLDB is how the latter deals with files. In addition to using memory buffers, LLDB manipulates the majority of files through file pointers (FILE*) and file descriptors. It's also a lot less explicit in whether it's opening a file for reading or writing. By this I mean in llvm we have functions like openMemoryBufferForWriting or openMemoryBufferForReading while in LLDB we would just pass a different open flag. While I would love to change how LLDB manipulates files, I simply don't have the bandwidth to do that right now. So that brings me to my current problem: how do I access files from LLDB through the VFS.

So if I understand correctly, you want a way to devirtualize the VFS: to take a VFS file and get a concrete FILE* out of it. (In this case, via the filename and fopen()).

The fact that VFS doesn't have any way to do this is an important property, it's what makes the file system virtual.
VFS allows code to work in the same way on OS-native, in-memory, or distributed filesystems, and getExternalPath would break that - any code that called it can only work when the files are 1:1 with native OS files. Code that wants to couple to the FS implementation in this way shouldn't work transparently with VFS, because it breaks the design goal that VFSes are substitutable.

For what you're trying to do, I'd suggest one of the following:

  • use a concrete VFS implementation that maps files and exposes the functions you need (e.g. expose RedirectingFileSystem from VFS.h with an appropriate public interface)
  • use the existing getRealPath if you know which concrete VFSes you're using and it happens to do what you want
  • track the correspondence between virtual and real paths externally (where you set up the FS)
  • decouple LLDB's IO from the concrete filesystem (use VFS file instead of FILE*)
JDevlieghere planned changes to this revision.Nov 14 2018, 9:31 AM

Thanks for bearing with me on this, Sam. I really appreciate you taking the time to find the best solution that works for all of us.

So if I understand correctly, you want a way to devirtualize the VFS: to take a VFS file and get a concrete FILE* out of it. (In this case, via the filename and fopen()).

The fact that VFS doesn't have any way to do this is an important property, it's what makes the file system virtual.
VFS allows code to work in the same way on OS-native, in-memory, or distributed filesystems, and getExternalPath would break that - any code that called it can only work when the files are 1:1 with native OS files. Code that wants to couple to the FS implementation in this way shouldn't work transparently with VFS, because it breaks the design goal that VFSes are substitutable.

For what you're trying to do, I'd suggest one of the following:

  • use a concrete VFS implementation that maps files and exposes the functions you need (e.g. expose RedirectingFileSystem from VFS.h with an appropriate public interface)

I guess this could work if we ignore the concern Volodymyr raised. If we assume that the ExternalFS is the RealFileSystem this could work. The problem is that RedirectingFileSystem takes any FileSystem as its external file system so if that doesn't have the right API we can't "dig deeper".

  • use the existing getRealPath if you know which concrete VFSes you're using and it happens to do what you want.

This is not a viable option because it defeats the purpose of the VFS.

  • track the correspondence between virtual and real paths externally (where you set up the FS)

This is not a viable option because it defeats the purpose of the VFS.

  • decouple LLDB's IO from the concrete filesystem (use VFS file instead of FILE*)

This is also not a viable option because, like I said earlier, this would be too big of a change for LLDB. In addition to the risk such a change brings, I don't have the bandwidth for it and even if I did I'm not sure it be feasible. Pavel said he looked into this and found that some of LLDB's dependencies formed a problem.

I'll update the patch to reflect what (1) would look like.

I've added an interface ExternalFileSystem (I'm open to a better name) which has a slightly different meaning than before as will become clear later.

There's currently two implementations of this interface:

  • The real filesystem.
  • The redirecting filesystem.

The implementation for both is trivial. For the real file system it's a no-op. For the redirecting file system it does the translation. It's different from the previous patch in that it follows only one level of indirection. If a RedirectingFileSystem is backed by an in-memory file system it still works, you just get the translated path.

I strongly believe the interface is better than exposing the RedirectingFileSystem because it (1) hides more abstraction and (2) allows me to transparently switch between the two in LLDB.

Hi Jonas,

I'm not sure about this abstraction: because it's non-recursive, it's hard to give meaningful semantics to getExternalPath() - the thing that would be useful to LLDB (if abstraction-breaking) is "native OS path", but for a general VFS, getExternalPath may give a native path, fail, or give a string that's not a native OS path.
If dispatching between the real VFS and redirecting VFS in this way is the best thing for LLDB, I'd suggest exposing RedirectingFS and adding the indirection on the LLDB side.

I think *if* VFS is to support devirtualization, then putting it on the main VFS interface so it can be recursive as in your original patch is the right way to do it, and it needs to be explicitly optional. (i.e. supporting devirtualization is a dynamic property rather than a static one.)

For example:

// If \p Path corresponds directly to a native operating system path, attempts to return that path.
// (e.g. if this is the native file system, or merely remaps paths for an underlying native file system).
// May return `None` if there is no such native file or its path cannot be determined.
virtual Optional<std::string> getNativeOSPath(StringRef Path) { return None; }

(Note the lack of "error" return value: this makes it safe for VFSes that don't implement this to not distinguish between the "no file" and "can't be determined" case)

But I don't personally think this addition is a good idea at the VFS layer. The upside is it allows LLDB to use VFS in some places, which may eventually turn into true VFS support. The risk is that adding an escape hatch will mean over time we can no longer rely on VFS-aware code being compatible with all VFSes.

I think this could benefit from a discussion in the wider LLVM community - I'll start a discussion on llvm-dev.

JDevlieghere retitled this revision from Extend VFS with function to get external path. to Move RedirectingFileSystem interface into header..
JDevlieghere edited the summary of this revision. (Show Details)

Move RedirectingFileSystem interface into header so we can extend it from LLDB.

I'm not sure if this is way out of scope for this patch, but if I understand correctly this could be used for extending it to, for example, handle things like Perforce-style paths which could indeed be useful, although I'm not sure if that is possible (ie. would //depot/asdf behave differently on Windows and Linux/Darwin/any other OS that uses Unix-style paths?). If this is with the intention to allow extending RedirectingFileSystem downstream then I think it may be worth splitting it up from the YAML based implementation altogether, allowing custom path handling where needed (as a separate downstream extension), with the concrete reference implementation of RedirectingFileSystem (the YAML based one) being a separate thing. This would make it more extensible from the POV of downstream forks that may desire to implement path redirection that is too complicated to do using YAML mappings, but still fundamentally rely on some sort of path remapping.

Do you think it would be worth decoupling RedirectingFileSystem from YAML for such use cases and exposing both RedirectingFileSystem and, for your use case, something like YAMLRedirectingFileSystem? Just a stray thought, not really sure if it makes sense or is out of scope here (I don't believe it is, since if the interface is being exposed, it may be worth rethinking how it is layered if it's to be exposed through LLVMSupport headers).

I'm not sure if this is way out of scope for this patch, but if I understand correctly this could be used for extending it to, for example, handle things like Perforce-style paths which could indeed be useful, although I'm not sure if that is possible (ie. would //depot/asdf behave differently on Windows and Linux/Darwin/any other OS that uses Unix-style paths?). If this is with the intention to allow extending RedirectingFileSystem downstream then I think it may be worth splitting it up from the YAML based implementation altogether, allowing custom path handling where needed (as a separate downstream extension), with the concrete reference implementation of RedirectingFileSystem (the YAML based one) being a separate thing. This would make it more extensible from the POV of downstream forks that may desire to implement path redirection that is too complicated to do using YAML mappings, but still fundamentally rely on some sort of path remapping.

Do you think it would be worth decoupling RedirectingFileSystem from YAML for such use cases and exposing both RedirectingFileSystem and, for your use case, something like YAMLRedirectingFileSystem? Just a stray thought, not really sure if it makes sense or is out of scope here (I don't believe it is, since if the interface is being exposed, it may be worth rethinking how it is layered if it's to be exposed through LLVMSupport headers).

It sounds like the scenario you're describing could be handled by just having a custom VFS implementation. What part of the RedirectingFileSystem would you repurpose for this? The reason I want to move it into the header is because I actually need the YAML stuff for LLDB's reproducers.

post-holiday ping

This looks reasonable to me, thanks! Just questions about what's exposed where.
(Apologies, was out for all of december and just got back)

include/llvm/Support/VirtualFileSystem.h
496 ↗(On Diff #176033)

Entry and EntryKind don't seem to apply to anything other than RedirectingVFS, so I think they could be nested within RedirectingVFS or at least given names tying them to that class (rather than to VFS itself)

624 ↗(On Diff #176033)

does this stuff need to be protected rather than private? I can't find any subclasses

750 ↗(On Diff #176033)

Unrelated/unused?

This looks reasonable to me, thanks! Just questions about what's exposed where.
(Apologies, was out for all of december and just got back)

No worries, thanks for having a look!

I moved the classes and enum into the RedirectingFileSystem.

JDevlieghere marked 4 inline comments as done.Jan 8 2019, 10:30 AM
JDevlieghere added inline comments.
include/llvm/Support/VirtualFileSystem.h
750 ↗(On Diff #176033)

Ah, this is needed for the LLDB reproducers. I'll split it off into a separate patch.

sammccall accepted this revision.Jan 15 2019, 4:39 AM

(Really sorry, I thought I'd accepted this on the previous iteration)

This revision is now accepted and ready to land.Jan 15 2019, 4:39 AM
JDevlieghere marked an inline comment as done.Jan 15 2019, 2:40 PM

(Really sorry, I thought I'd accepted this on the previous iteration)

No worries & thank you!

This revision was automatically updated to reflect the committed changes.