This is an archive of the discontinued LLVM Phabricator instance.

[VFS] Simplify RedirectingFileSystem to perform mapping only
Changes PlannedPublic

Authored by bnbarham on Mar 10 2022, 3:36 PM.

Details

Summary

RedirectingFileSystem has grown significantly over time. When it was
added it simply performed a path remapping and then used that path in
the filesystem it was passed (ExternalFS). The functionality provided
by -ivfsoverlay was handled by OverlayFileSystem, which would run
operations on the contained filesystems in reverse order.

Since then, "fallthrough" and "redirecting-with" options were added to
the overlay YAML format. As part of these changes, OverlayFileSystem
stopped being used in the handling of -ivfsoverlay and this was
instead handled by nesting RedirectingFileSystem together. The
right-most overlay would use the next-to-the-left
RedirectingFileSystem as its external filesystem, etc.

This allowed:

  1. Overlay paths to be dependent on previous -ivfsoverlay, ie. a second -ivfsoverlay could specify a virtual path for its overlay if that path is in the first overlay.
  2. Paths from one overlay to affect those in an earlier, ie. if the right-most overlay specified a mapping from A -> B and the left-most a mapping from B -> C, A could now map to C.
  3. The real filesystem to be skipped, ie. all specified paths have to be contained in an overlay (which would still map to the real filesystem).

(1) and (3) are now implemented by getVFSFromYAMLs using the
OverlayFileSystem (D121425), which allows removing the
redirecting-with handling from RedirectingFileSystem entirely. With
that removal RedirectingFileSystem is now back to only performing a
path remapping and using that path in the ExternalFS. (2) will no
longer be possible, but this also makes overlays *much* simpler to
reason about.

Depends on D121425

Diff Detail

Event Timeline

bnbarham created this revision.Mar 10 2022, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 3:36 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
bnbarham requested review of this revision.Mar 10 2022, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 3:36 PM
bnbarham updated this revision to Diff 414701.Mar 11 2022, 10:56 AM

Noticed there's a ::assign on SmallVectorImpl, use that instead of clear + append.

Ah, I should only be depending on the one rather than multiple. I'll fix that up soon (and for the following as well). That does make more sense, not sure why I thought it would need all of them!

bnbarham updated this revision to Diff 414749.Mar 11 2022, 2:19 PM
bnbarham edited the summary of this revision. (Show Details)

Update to single review dependency

This allowed:

  • Overlay paths to be dependent on previous -ivfsoverlay, ie. a second -ivfsoverlay could specify a virtual path for its overlay if that path is in the first overlay.
  • Paths from one overlay to affect those in an earlier, ie. if the right-most overlay specified a mapping from A -> B and the left-most a mapping from B -> C, A could now map to C.
  • The real filesystem to be skipped, ie. all specified paths have to be contained in an overlay (which would still map to the real filesystem).

FileSystem also had a get/set added for the current working directory
to allow for filesystem-specific CWD rather than a process-wide CWD,
further complicating the RedirectingFileSystem implementation.

It's not clear from the description what the plan is for the RedirectingFileSystem's CWD. It sounds like you're suggesting dropping it. If so, what happens if the redirecting filesystem remaps things to a new directory /new/dir that doesn't exist in the underlying filesystem, and the client calls cd /new/dir?

This change reverts RedirectingFileSystem to a single responsibility -
mapping paths and passing that path to the underlying filesystem.

I'm not clear about two things:

  • What is "the underlying filesystem"?
  • What does RedirectingFileSystem return for something not explicitly listed in the .yaml?

A
later change implements (1) and (3) through the OverlayFileSystem. (2)
is no longer possible, which fixes llvm-project#53306 as a side-effect.

Sorry if I missed a discussions somewhere, but I'm not sure I fully understand the plan for landing these changes. It sounds as if this is going to land, regressing (1) and (3), and then we'll add back (1) and (3) later; is that the plan? That doesn't quite seem right; is there another way to sequence the changes to avoid temporary regressions in tree? (If not, then they probably need to land together.)

This allowed:

  • Overlay paths to be dependent on previous -ivfsoverlay, ie. a second -ivfsoverlay could specify a virtual path for its overlay if that path is in the first overlay.
  • Paths from one overlay to affect those in an earlier, ie. if the right-most overlay specified a mapping from A -> B and the left-most a mapping from B -> C, A could now map to C.
  • The real filesystem to be skipped, ie. all specified paths have to be contained in an overlay (which would still map to the real filesystem).

FileSystem also had a get/set added for the current working directory
to allow for filesystem-specific CWD rather than a process-wide CWD,
further complicating the RedirectingFileSystem implementation.

It's not clear from the description what the plan is for the RedirectingFileSystem's CWD. It sounds like you're suggesting dropping it. If so, what happens if the redirecting filesystem remaps things to a new directory /new/dir that doesn't exist in the underlying filesystem, and the client calls cd /new/dir?

Sorry, I was only trying to give some history here. Adding CWD in caused various complications in RedirectingFileSystem due to the added nesting, ie. what path do we pass to the next FS? Hence all the mess with trying to map back to the original path depending on whether use external names is set, but still passing down the canonical name, etc. I'm certainly not suggesting we remove it, it's much easier to reason about with the simplified RedirectinFileSystem.

This change reverts RedirectingFileSystem to a single responsibility -
mapping paths and passing that path to the underlying filesystem.

I'm not clear about two things:

  • What is "the underlying filesystem"?
  • What does RedirectingFileSystem return for something not explicitly listed in the .yaml?
  1. I've been using "external" and "underlying" interchangeably. But I mean ExternalFS, ie. the FS that's passed into RedirectingFileSystem.
  2. It returns no_such_file_or_directory, as it once did. If we want an overlay, we should use OverlayFileSystem. Most of this commit message is trying to explain how we got to where we are today in order to provide some context.

A
later change implements (1) and (3) through the OverlayFileSystem. (2)
is no longer possible, which fixes llvm-project#53306 as a side-effect.

Sorry if I missed a discussions somewhere, but I'm not sure I fully understand the plan for landing these changes. It sounds as if this is going to land, regressing (1) and (3), and then we'll add back (1) and (3) later; is that the plan? That doesn't quite seem right; is there another way to sequence the changes to avoid temporary regressions in tree? (If not, then they probably need to land together.)

The plan is to land them together. I just split them to make it easier to review (ie. smaller chunks). Perhaps this has only just added to the confusion.

To be clear:

  • D121421 and D121423 and both be landed separately
  • D121424, D121425, and D121426 all need to be landed together. I don't see a way to split them up to avoid that, but if you do please let me know.

Sorry, I was only trying to give some history here. Adding CWD in caused various complications in RedirectingFileSystem due to the added nesting, ie. what path do we pass to the next FS? Hence all the mess with trying to map back to the original path depending on whether use external names is set, but still passing down the canonical name, etc. I'm certainly not suggesting we remove it, it's much easier to reason about with the simplified RedirectinFileSystem.

Great; I feel like we need the CWD there; the history was useful, I just took the wrong conclusion overall :).

This change reverts RedirectingFileSystem to a single responsibility -
mapping paths and passing that path to the underlying filesystem.

I'm not clear about two things:

  • What is "the underlying filesystem"?
  • What does RedirectingFileSystem return for something not explicitly listed in the .yaml?
  1. I've been using "external" and "underlying" interchangeably. But I mean ExternalFS, ie. the FS that's passed into RedirectingFileSystem.

Either one would have felt ambiguous to me, because I was wondering if you were referring to the sys::fs filesystem and they both kind of point there! Thanks for clarifying.

  1. It returns no_such_file_or_directory, as it once did. If we want an overlay, we should use OverlayFileSystem. Most of this commit message is trying to explain how we got to where we are today in order to provide some context.

Okay, makes sense.

The plan is to land them together. I just split them to make it easier to review (ie. smaller chunks). Perhaps this has only just added to the confusion.

Got it!

To be clear:

  • D121421 and D121423 and both be landed separately
  • D121424, D121425, and D121426 all need to be landed together. I don't see a way to split them up to avoid that, but if you do please let me know.

Got it; I suggest editing each description to make that super clear in those situations ("these will be landed at the same time, split up only for review convenience").

In this case:

  • I think there might be a way to re-sequence D121425 ahead of this commit.
    • The new API can be added and unit tested without deleting features from RedirectingFS.
    • There may be some tests that fail until this commit lands; you can either check for the temporary wrong behaviour and fix them in this commit, or shift those tests entirely to this commit if it's too confusing. But it'll help to see what tests suddenly start passing related to the new API.
  • Maybe this commit could be made runtime-configurable (OverlayFS constructor, off-by-default, only used in the new API to start, then the old behaviour removed later), to land it ahead of D121426? Maybe that's too complicated; just pointing out the flexibility.
    • Else, I'd suggest merging D121426 with this one. The changes don't touch the same files and it's not a massive patch so I don't think it'll confuse the review much.

In this case:

  • I think there might be a way to re-sequence D121425 ahead of this commit.
    • The new API can be added and unit tested without deleting features from RedirectingFS.
    • There may be some tests that fail until this commit lands; you can either check for the temporary wrong behaviour and fix them in this commit, or shift those tests entirely to this commit if it's too confusing. But it'll help to see what tests suddenly start passing related to the new API.
  • Maybe this commit could be made runtime-configurable (OverlayFS constructor, off-by-default, only used in the new API to start, then the old behaviour removed later), to land it ahead of D121426? Maybe that's too complicated; just pointing out the flexibility.
    • Else, I'd suggest merging D121426 with this one. The changes don't touch the same files and it's not a massive patch so I don't think it'll confuse the review much.

Landing D121425 before this one is definitely *possible*, but I'd really rather this change be atomic - neither really make sense on their own (IMO). The last option you listed said "merging D121426 with this one", but that one is basically just API updates and a few test fixes for the new behavior. Did you mean with D121425 or just all of them together?

In this case:

  • I think there might be a way to re-sequence D121425 ahead of this commit.
    • The new API can be added and unit tested without deleting features from RedirectingFS.
    • There may be some tests that fail until this commit lands; you can either check for the temporary wrong behaviour and fix them in this commit, or shift those tests entirely to this commit if it's too confusing. But it'll help to see what tests suddenly start passing related to the new API.
  • Maybe this commit could be made runtime-configurable (OverlayFS constructor, off-by-default, only used in the new API to start, then the old behaviour removed later), to land it ahead of D121426? Maybe that's too complicated; just pointing out the flexibility.
    • Else, I'd suggest merging D121426 with this one. The changes don't touch the same files and it's not a massive patch so I don't think it'll confuse the review much.

Landing D121425 before this one is definitely *possible*, but I'd really rather this change be atomic - neither really make sense on their own (IMO).

Can you explain more why you don't think it's a good idea in this case?

For complicated changes, it's not uncommon to add the new API (with tests for it), and then do adoption later.

Actually, I wonder if you could even just land these as three separate commits by doing D121424 last:

  • D121425: add new API (causes any created RedirectingFS to be created with these options turned off)
  • D121426: change clang to use it
  • D121424: remove now-unneeded RedirectingFS code and add tests proving that A->B->C case no longer works

That'd be the usual incremental flow, I think. It also gives out-of-tree clients a good commit window to adopt the new API:

  • Temporarily revert D121424 on out-of-tree branch to allow merging main.
  • Update out-of-tree code to use the new API.
  • Reapply D121424 to match main again.

This also has the major benefit of making post-commit review (e.g., reading git log -u) easier to manage.

Maybe it's not worth it here -- maybe it's a lot of work to make these separately-landable and maybe there aren't (known) external uses of RedirectingFS -- but naïvely, based on seeing three Phab reviews and having a better understanding now of what's in each, it seems like it might not be hard to keep them split?

(If there's some temporary regression to worry about, it's better to unify them, but otherwise separate commits would be preferred I think.)

The last option you listed said "merging D121426 with this one", but that one is basically just API updates and a few test fixes for the new behavior. Did you mean with D121425 or just all of them together?

I meant D121425, then D121424 + D121426, because I assumed those last two were intrinsically linked, but maybe they aren't?

Can you explain more why you don't think it's a good idea in this case?

For complicated changes, it's not uncommon to add the new API (with tests for it), and then do adoption later.

Actually, I wonder if you could even just land these as three separate commits by doing D121424 last:

  • D121425: add new API (causes any created RedirectingFS to be created with these options turned off)
  • D121426: change clang to use it
  • D121424: remove now-unneeded RedirectingFS code and add tests proving that A->B->C case no longer works

My thinking was that the state after D121425 + D121426 would be such that the redirecting-with options are implemented in both OverlayFS and RedirectingFS, which I was worried would lead to some confusion if we end up needing to revert/looking at history later/etc. I'd also need to split the parsing changes out of this patch, move into D121425, and then add a bit extra to it to still set redirecting-with on RedirectingFS.

Perhaps that isn't as bad as I was imagining though. I'm a little worried it's just Friday night and I'm not thinking this through clearly, so I'll give it the weekend and try it on Monday if I don't think of anything.

add tests proving that A->B->C case no longer works

Are you asking for a test that manually nests RedirectingFileSystem and checks that this case doesn't work? Or just the existing TEST(VFSFromYAMLsTest, NotTransitive) test in D121425? For RedirectingFileSystem all we really want to check is that ExternalFS isn't used as a fallback, which I believe is already tested in the various VFSFromYAMLTest cases. That actually brings me to something I was meaning to bring up...

One thing that I haven't gotten to in these patches is to fix up the VFS tests in general - I only started laying the groundwork for them. We currently have a whole bunch of VFSFromYAMLTest cases that should probably be split up into testing OverlayFS, RedirectingFS, and then the actual getVFSFromYAML(s) API itself. My guess is that ended up getting mixed in together because RedirectingFS became the implementation of -ivfsoverlay and thus getVFSFromYAML *was* RedirectingFileSystem.

I almost did that in this patch, but thought that it would make it more difficult to review (since other than the A -> B -> C chaining, there shouldn't be any functional changes from the -ivfsoverlay perspective). I would like to do that as a follow up though if that makes sense to you.

My thinking was that the state after D121425 + D121426 would be such that the redirecting-with options are implemented in both OverlayFS and RedirectingFS, which I was worried would lead to some confusion if we end up needing to revert/looking at history later/etc. I'd also need to split the parsing changes out of this patch, move into D121425, and then add a bit extra to it to still set redirecting-with on RedirectingFS.

Perhaps that isn't as bad as I was imagining though. I'm a little worried it's just Friday night and I'm not thinking this through clearly, so I'll give it the weekend and try it on Monday if I don't think of anything.

Doesn't sound too bad to me, but let me know what you think. In the first commit adding the new API you can add FIXME/TODOs to the old APIs indicating that the functionality is going away to make it clear what the plan is.

One major benefit is that if there is another client (besides clang), and you need to temporarily revert the "delete the old API patch", it's much less revert-churn that reverting the whole shebang (also, it doesn't block forward progress in clang).

add tests proving that A->B->C case no longer works

Are you asking for a test that manually nests RedirectingFileSystem and checks that this case doesn't work? Or just the existing TEST(VFSFromYAMLsTest, NotTransitive) test in D121425? For RedirectingFileSystem all we really want to check is that ExternalFS isn't used as a fallback, which I believe is already tested in the various VFSFromYAMLTest cases. That actually brings me to something I was meaning to bring up...

One thing that I haven't gotten to in these patches is to fix up the VFS tests in general - I only started laying the groundwork for them. We currently have a whole bunch of VFSFromYAMLTest cases that should probably be split up into testing OverlayFS, RedirectingFS, and then the actual getVFSFromYAML(s) API itself. My guess is that ended up getting mixed in together because RedirectingFS became the implementation of -ivfsoverlay and thus getVFSFromYAML *was* RedirectingFileSystem.

I almost did that in this patch, but thought that it would make it more difficult to review (since other than the A -> B -> C chaining, there shouldn't be any functional changes from the -ivfsoverlay perspective). I would like to do that as a follow up though if that makes sense to you.

One historical reason is that RedirectingFileSystem wasn't previously exposed in a header. Would be great to get this cleaned up!

bnbarham updated this revision to Diff 415618.Mar 15 2022, 3:47 PM
bnbarham retitled this revision from [VFS] Revert RedirectingFileSystem to having a single responsibility to [VFS] Simplify RedirectingFileSystem to perform mapping only.
bnbarham edited the summary of this revision. (Show Details)

Re-order to be after D121425

bnbarham planned changes to this revision.Mar 17 2022, 7:04 PM

Blocked on the dependents

ormris removed a subscriber: ormris.May 16 2022, 11:19 AM