This is an archive of the discontinued LLVM Phabricator instance.

[FileSystem] Extend file system and have it use the VFS.
ClosedPublic

Authored by JDevlieghere on Oct 22 2018, 3:12 PM.

Details

Summary

This patch extends the FileSystem class with a bunch of operations that are currently implemented as methods of the FileSpec class.

The implementation uses the virtual file system which means that the class is now stateful. I've turned it into a singleton because:

  • We need it for the shared module cache which lives above the debugger.
  • Passing it around would be inconvenient as we'll need it wherever we want to manipulate a file. I also don't think passing the FS as a parameter convey any useful information from an API standpoint.

This is the first patch in a series that will remove methods manipulating the filesystem from FileSpec. As I make more changes I might have to updated it or extend it with new methods.

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.Oct 22 2018, 3:12 PM

Back when the FileSystem class was introduced, the idea was that *it* would eventually become the gateway to the real filesystem, and FileSpec would just deal with abstract path manipulation.

I still think that is a good idea, particularly in light of this patch. So I would propose to integrate this VFS functionality into that class, and then remove any operations that touch real files from the file spec class.

(Also, given that now about 50% of filesystem operations go through FileSpec and the other half through FileSystem (or other methods), any attempts to use FileSpec to control how filenames are interpreted will not be complete).

JDevlieghere planned changes to this revision.Oct 24 2018, 9:37 AM
JDevlieghere retitled this revision from [FileSpec] Add VFS support to [FileSpec] Add VFS support to FileSpec convenience methods..
JDevlieghere edited the summary of this revision. (Show Details)
JDevlieghere edited reviewers, added: davide; removed: espindola.Oct 24 2018, 2:19 PM

Note that there's also a File class which we'll need to patch up separately together with all other uses that actually open or read a file. For this class we can do the same as we did for FileSpec: deferring relevant operations to the FileSystem class.

It seems like FileSpec was moved out of Utility and into Host. I’m not a
fan of this change, because it took a lot of effort to get it into Utility,
which helped break a lot of bad dependencies. Can we invert this dependency
and move FileSpec back into Utility?

I also think it we should try to keep FileSpec in the Utility module. We should be able to do that by deleting the "utility" functions in the FileSpec class, and updating everything to go through the FileSystem class.

After the VFS introduction, something like file_spec.Exists() will not even be well-defined anyway, as one will have to specify which file system are we checking the existence in. So, it would be more correct to say file_spec.Exists(my_vfs), at which point it's not too much of a stretch to change this into my_vfs.Exists(file_spec).

(This is the same argument I was giving earlier, since even before VFS we kinda were dealing with multiple file systems (host, remote device). In fact, after this it might be a nice cleanup to change the Platform class from vending a bunch of FS-like methods to (FileExists, Unlink, CreateSymlink, ...), to have a single getRemoteFS method returning a VFS which does the right thing "under the hood").

tl;dr: I'd propose to (in two or more patches):

  • replace FileSpec "utility" methods with equivalent FileSystem functionality
  • do the necessary FileSystem changes to support VFS
source/Host/common/FileSystem.cpp
24–29 ↗(On Diff #170979)

replace with static FileSystem *g_fs = new FileSystem();

33–35 ↗(On Diff #170979)

I'm not sure what's your use case here, but would it make more sense instead of modifying the singleton object to have more that one FileSystem instance floating around? It could be local to each Debugger (?) object and everyone would fetch it from there when needed.

While I don't disagree with the proposed approach, it needs to be highlighted that extracting all file system operations would be very intrusive.

  • Almost all logic in FileSpec deals with the filesystem. You can't even resolve a path without it. How do you imagine this would work?
  • This will significantly increase the verbosity of working with FileSpec. Depending on the solution to the previous bullet point, you might have to do something like FileSpec spec(path);FS.Resolve(spec);FS.Exists(spec).

Additionally, if we make the FileSystem a member of the debugger, we'd have to pass it everywhere we pass a FileSpec. I'm not convinced that's something we want. We could put the FileSpec and the FileSystem in a wrapper object but then we're essentially where we are today. Furthermore, for the reproducers, we need the FileSystem to live above the debugger because of the shared module cache. I can't really imagine a scenario where we'd need two debuggers with a different VFS.

I envision FileSpec as being more of a PathSpec. It should be able manipulate paths as strings and nothing more. There is indeed some logic that still remains that resolves paths, but it manages to do this using LLVM's file system APIs and the only reason it's still there is because it was baked in and a bit hard to remove.

One idea for removing it would be to have FileSpec FileSystem::Resolve(const FileSpec &). Then instead of saying FileSpec foo(path, /*resolve = */ true);, they could say FileSpec foo = FileSystem::Resolve(FileSpec(path)); or something.

Thanks for the feedback! I totally agree it's a good solution and it was one of the things I considered. It didn't make it to the top of the list because it is very intrusive and changes the semantics of FileSpec quite a bit (i.e. it becomes a PathSpec as Zachary noted). Anyway, I'm happy to do this and I'll split this up in two patches as Pavel suggested. I'll extend the FileSystem first, so that we can do the VFS stuff straight away instead of implementing these methods twice.

JDevlieghere planned changes to this revision.Oct 25 2018, 1:38 PM

Thanks for the feedback! I totally agree it's a good solution and it was one of the things I considered. It didn't make it to the top of the list because it is very intrusive and changes the semantics of FileSpec quite a bit (i.e. it becomes a PathSpec as Zachary noted). Anyway, I'm happy to do this and I'll split this up in two patches as Pavel suggested. I'll extend the FileSystem first, so that we can do the VFS stuff straight away instead of implementing these methods twice.

It does change them, but that's always been the sort of "end goal" of FileSpec anyway, it just hasn't been reached yet. Hopefully we can get there with your help :)

JDevlieghere retitled this revision from [FileSpec] Add VFS support to FileSpec convenience methods. to [FileSystem] Extend file system and have it use the VFS..
JDevlieghere edited the summary of this revision. (Show Details)

I'll recycle this differential for the first patch that changes the FileSystem class.

Now that I know your use case (I forgot that you're working on the replay now, and the mention of VFS threw me off in a completely different direction), I agree that having a single global filesystem is fine. As for the increased verbosity, I wonder if we should make the fs accessor a free function instead of a member (so that you write something like hostFS().Symlink(..) instead of FileSystem::instance().Symlink(...).

(Theoretically, we might even make the VFS instance a global variable, and then we could keep the current pattern of accessing stuff via static methods, but I like the fact that this VFS thingy could then be reused for working with files on remote devices. Also, you may need a working FS instance in the guts of the replay code, to e.g., read the replay data.)

include/lldb/Host/FileSystem.h
35 ↗(On Diff #171376)

I am wondering if we could make this work without exposing the ability to change the FS implementation mid-flight to everyone. How are you planning to initialize the record/replay? Would it be possible to pass in the VFS instance via some kind of an Initialize function (and have it be immutable from that point on)?

include/lldb/Utility/FileSpec.h
423 ↗(On Diff #171376)

accidental change?

Address Pavel's feedback:

  • Change to Initialize method which can only be called once.
  • Fix accidental change in comment.
JDevlieghere marked 2 inline comments as done.Oct 29 2018, 10:01 AM

Address Pavel's feedback:

  • Change to Initialize method which can only be called once.

I suppose this is slightly better, but what I meant was to have a static (like all other functions with the same name) Initialize function, which has to be called before the first call to Instance(). During normal startup, this could be called from SystemInitializer, but I am not sure how would that work for record/replay (that's why I was asking how you planned to initialize those). It would require to setup the record/replay machinery very early on, which may be tricky. But on the other hand, you probably want to initialize these as early as possible anyway.

Do you think something like that would be possible?

Address Pavel's feedback:

  • Change to Initialize method which can only be called once.

I suppose this is slightly better, but what I meant was to have a static (like all other functions with the same name) Initialize function, which has to be called before the first call to Instance(). During normal startup, this could be called from SystemInitializer, but I am not sure how would that work for record/replay (that's why I was asking how you planned to initialize those). It would require to setup the record/replay machinery very early on, which may be tricky. But on the other hand, you probably want to initialize these as early as possible anyway.

Do you think something like that would be possible?

Alright, I see what you mean. You know more about the architecture, but it looks like we initialize the system before even reading the command line arguments in the driver. I have to read the reproducer before I can set the appropriate file system, so I'm not sure if this is possible.

Address Pavel's feedback:

  • Change to Initialize method which can only be called once.

I suppose this is slightly better, but what I meant was to have a static (like all other functions with the same name) Initialize function, which has to be called before the first call to Instance(). During normal startup, this could be called from SystemInitializer, but I am not sure how would that work for record/replay (that's why I was asking how you planned to initialize those). It would require to setup the record/replay machinery very early on, which may be tricky. But on the other hand, you probably want to initialize these as early as possible anyway.

Do you think something like that would be possible?

Alright, I see what you mean. You know more about the architecture, but it looks like we initialize the system before even reading the command line arguments in the driver. I have to read the reproducer before I can set the appropriate file system, so I'm not sure if this is possible.

Yes, right now it certainly seems to be the case that we parse cmdline late in the game. However, it's not clear to me whether that has to be the case.

I can't say I have thought this through to the end, but it seems to me that setting up the repro engine should be one of the first (if not THE first) SB calls from the driver. Right now the parsing happens too late (e.g. we already have an SBDebugger instance created at that point). I am not sure you could safely initialize record/replay at that point, even if you got FileSystem switching working.

Architecturally, the cleanest solution to me seems to be doing the command line parsing (even if it's just the repro-related args) before SBDebugger::Initialize, and then have special versions of the Initialize function like Initialize(args_for_record) and Initialize(args_for_replay). The interesting repro-related stuff would then happen in these functions, which can then call the Initialize functions of relevant components (like FileSystem) and pass appropriate arguments.

However, I can't say how would that fit into your intended design or any code you already have implemented.

Yes, right now it certainly seems to be the case that we parse cmdline late in the game. However, it's not clear to me whether that has to be the case.

I can't say I have thought this through to the end, but it seems to me that setting up the repro engine should be one of the first (if not THE first) SB calls from the driver. Right now the parsing happens too late (e.g. we already have an SBDebugger instance created at that point). I am not sure you could safely initialize record/replay at that point, even if you got FileSystem switching working.

Architecturally, the cleanest solution to me seems to be doing the command line parsing (even if it's just the repro-related args) before SBDebugger::Initialize, and then have special versions of the Initialize function like Initialize(args_for_record) and Initialize(args_for_replay). The interesting repro-related stuff would then happen in these functions, which can then call the Initialize functions of relevant components (like FileSystem) and pass appropriate arguments.

However, I can't say how would that fit into your intended design or any code you already have implemented.

Sounds reasonable. Let's go with the static Initialize approach now and figure out the details about initializing the reproducer later.

labath accepted this revision.Oct 31 2018, 2:16 AM

Thanks for bearing with me. I am very happy with how this turned out in the end. I have more suggestion on how to clean up the initialization, which you can incorporate if you like it, but I don't want to hold up the patch over that.

include/lldb/Host/FileSystem.h
21 ↗(On Diff #171845)

Not needed anymore?

include/lldb/Utility/FileSpec.h
423 ↗(On Diff #171376)

It seems this is still here... :(

source/Host/common/FileSystem.cpp
32 ↗(On Diff #171845)

Instead of manual m_initialized management, what would you say to this:

  • have a private InstanceImpl (or whatever) function, which returns an Optional<FileSystem> &
  • Initialize does InstanceImpl().emplace(VFS); (it can check that the value is None beforehand if you want, but I am not worried about people accidentally calling Initialize twice).
  • Terminate does InstanceImpl().reset();
  • Instance does return *InstanceImpl();

I think this would be a bit cleaner as it would allow is to specify the VFS directly during construction. We would avoid having to construct a FileSystem object with a dummy VFS, only to replace it with another one later. And also you wouldn't need to subclass FileSystem in the unit tests just to change the implementation.

unittests/Host/FileSystemTest.cpp
302–310 ↗(On Diff #171845)

You can write this more concisely (and with better error messages) as EXPECT_THAT(visited, testing::UnorderedElementsAre("/foo", "/bar", "/baz", "/qux")); (you need to include gmock.h for that to work).

This revision is now accepted and ready to land.Oct 31 2018, 2:16 AM
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.