This is an archive of the discontinued LLVM Phabricator instance.

[vfs] Allow root paths relative to the directory of the vfsoverlay YAML file
ClosedPublic

Authored by haowei on Nov 4 2022, 5:12 PM.

Details

Summary

VFS overlay file allow using relative paths in root->name paths (root directory of the virtual file) and external-contents (the actual files). While the 'external-contents' could be configured to relative to directory of the YAML file (change https://reviews.llvm.org/D17457), the root paths could only be relative to the current working directory.

This patch add the "root-relative" option to the YAML file format. When this option is set to "yaml-dir" the root->name will be prepend by the YAML file directory. This option is helpful when compiling on case sensitive file systems when cross compiling to Windows as we can create a vfsoverlay YAML file for the Windows libraries without using absolute paths. Related change: https://reviews.llvm.org/D125800

Diff Detail

Event Timeline

haowei created this revision.Nov 4 2022, 5:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 4 2022, 5:12 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
haowei requested review of this revision.Nov 4 2022, 5:12 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 4 2022, 5:12 PM
haowei edited the summary of this revision. (Show Details)Nov 7 2022, 10:29 AM
haowei added reviewers: bnbarham, keith.

Please take a look.

phosek added inline comments.Nov 8 2022, 12:39 AM
llvm/include/llvm/Support/VirtualFileSystem.h
660

Could we make this just a boolean akin to overlay-relative since there are only two options (default to false)?

This seems reasonable to me in general. @dexonsmith in case you have any thoughts.

clang/test/VFS/Inputs/root-relative-overlay.yaml
4

I'd prefer a test without overlay-relative set to make it clear they don't depend on each other.

llvm/include/llvm/Support/VirtualFileSystem.h
660

I personally prefer being explicit here, overlay-relative is fairly confusing as it is.

overlay-relative isn't about allowing relative paths, but instead means that *all* external paths should be prefixed with the directory of the overlay. To put another way, external paths can be relative whether this is true/false, overlay-relative just *always* prepends the overlay path.

Could you add a comment to make it clear that this has no interaction with overlay-relative? If you want to add a comment to overlay-relative with something like the above that would also be appreciated :)

752

to the current working directory when the overlay is created. is maybe a little clearer to me

755

Any thoughts on something like OverlayDir instead?

926

*and the path is relative*

llvm/lib/Support/VirtualFileSystem.cpp
1903

IMO both this and CWD should be using the base FS instead. VFS didn't have a CWD previously, but now that it does it doesn't really make sense to use the process wide CWD. Especially since -working-directory doesn't change it.

This seems reasonable to me in general. @dexonsmith in case you have any thoughts.

SGTM! (I haven't reviewed in detail but I figure @bnbarham is on it...)

bnbarham added inline comments.Nov 8 2022, 12:58 PM
clang/test/VFS/Inputs/root-relative-overlay.yaml
4

There's also unit tests in llvm/unittests/Support/VirtualFileSystemTest.cpp that you could add to.

haowei updated this revision to Diff 474115.Nov 8 2022, 4:56 PM
haowei marked 2 inline comments as done.
haowei added inline comments.Nov 8 2022, 5:00 PM
clang/test/VFS/Inputs/root-relative-overlay.yaml
4

I need some time to patch this unit test for testing the new option.
Existing functions in this unit test file does not support defining the YAMLFilePath.

I will update this patch when I finish.

4

I updated the yaml file and add unit tests with overlay-relative set to true and false.

llvm/include/llvm/Support/VirtualFileSystem.h
660

I also think overlay-relative is a bit misleading and it should be an enum instead of a boolean option. And it should only work if the external paths are relative instead of blindly prepend the overlay dir to every external paths. But changing this will be a breaking change so I prefer to avoid it in this patch.

I updated the descriptions.

752

Added into the comments.

755

I think OverlayDir is better.

llvm/lib/Support/VirtualFileSystem.cpp
1903

Could you clarify a bit more about the "base FS" please? I am still quite new to the LLVM VFS system. Which API should I use to get the appropriate working directory instead of the the process wide CWD?

I avoided changing the default behavior (relative to the process's current working directory) as I am a bit concerned breaking other people's use cases.

I looked up a bit, the value "--working-directory" will be writen into "VFS->setCurrentWorkingDirectory()" Do you think it is a better idea to use this value instead of the process current working directory? Though it would still be a behavior change for users rely on process current working directories.

bnbarham added inline comments.Nov 8 2022, 5:21 PM
llvm/include/llvm/Support/VirtualFileSystem.h
660

FWIW it was added explicitly to always prepend, it's used for reproducers where the overlay may already have absolute paths. I was just trying to explain what overlay-relative does.

llvm/lib/Support/VirtualFileSystem.cpp
1903

Could you clarify a bit more about the "base FS" please?

By "base FS" I just meant the filesystem that's passed down when creating the RedirectingFileSystem, ie. ExternalFS in getVFSFromYAML.

Which API should I use to get the appropriate working directory instead of the the process wide CWD?

ExternalFS->getCurrentWorkingDirectory() and ExternalFS->makeAbsolute is what I'd expect to be used here.

Do you think it is a better idea to use this value instead of the process current working directory? Though it would still be a behavior change for users rely on process current working directories.

Yes, I think it's surprising that it currently *isn't* using it. It's just a hold over from when the VFS didn't have the concept of CWD. I'd be happy with accepting this patch as is, with a separate patch after that changes to using the VFS CWD.

haowei updated this revision to Diff 474391.Nov 9 2022, 4:42 PM

Add additional unit test.

haowei added inline comments.Nov 9 2022, 4:45 PM
clang/test/VFS/Inputs/root-relative-overlay.yaml
4

I updated llvm/unittests/Support/VirtualFileSystemTest.cpp to include tests on root-relative option.

llvm/lib/Support/VirtualFileSystem.cpp
1903

Yes, I think it's surprising that it currently *isn't* using it. It's just a hold over from when the VFS didn't have the concept of CWD. I'd be happy with accepting this patch as is, with a separate patch after that changes to using the VFS CWD.

I see, I can upload a follow up change after this one is landed. to use VFS CWD instead of process CWD here. It is also easier to revert if anything goes wrong.

haowei updated this revision to Diff 475347.Nov 14 2022, 10:53 PM

Unit test failures under Windows should be fixed now.

Root->name will also use base FS's current directory instead of process current directory now. Please take a look.

bnbarham added inline comments.Nov 15 2022, 10:29 AM
llvm/include/llvm/Support/VirtualFileSystem.h
1000

Should be private IMO

llvm/lib/Support/VirtualFileSystem.cpp
1363

Did you find this was needed? It's already checked by the normal makeAbsolute (which checks this already) and the only other caller is when we're using the overlay directory path, which should always be absolute.

1901

This is unused now. Maybe we should just merge the else if and else branches and just grab either getOverlayFileDir or getCurrentWorkingDirectory depending on RootRelative. They should otherwise be identical.

llvm/unittests/Support/VirtualFileSystemTest.cpp
1875 ↗(On Diff #475347)

Just override the previous FS/S IMO - that way we avoid the accidental FS->status a few lines down 😅

haowei updated this revision to Diff 475574.Nov 15 2022, 2:17 PM

I have to revert back to use process cwd instead of base fs cwd. There are a few tests relied the behavior of using process cwd that I need to take a closer look.

haowei updated this revision to Diff 475591.Nov 15 2022, 2:50 PM

Address review comments

haowei marked 2 inline comments as done.Nov 15 2022, 2:51 PM
haowei added inline comments.
llvm/include/llvm/Support/VirtualFileSystem.h
1000

Moved it to private section and added comments

llvm/lib/Support/VirtualFileSystem.cpp
1363

No, it is not needed for now. As the OverlayDir is always abs when it was set. The normal makeAbsolute function checks "Path" param but not checking the returned WorkDir, which is OK because the return value from getCurrentWorkingDirectory will be abs or empty.

It is more to be future proof if someone decide to use this function for other purpose and putting a non-abs path in WorkingDir field will causes unexpected behaviors.

I also need to exclude empty WorkingDir though because in unit tests, the WorkingDir is always empty.

1901

I merged these 2 else block.
I have to use sys::fs::make_absolute function for now as using FS's CWD will break a few tests. It is better to address these test in a separate patch.

llvm/unittests/Support/VirtualFileSystemTest.cpp
1875 ↗(On Diff #475347)

Thanks for pointing that out 😅

haowei updated this revision to Diff 475884.Nov 16 2022, 11:45 AM
haowei marked 2 inline comments as done.

Fixing Windows test failures.
overlay-relative flag will always using native path separator, therefore, it needs a separate base FS and OverlayYAML file setup on Windows. This diff adds this.

bnbarham added inline comments.Nov 18 2022, 12:56 PM
llvm/lib/Support/VirtualFileSystem.cpp
1919–1921

windows_slash is going to be windows_backslash here. I assume this was fine since sys::fs::make_absolute(Name) would always return the native style, but that isn't the case now. It's also unfortunate we calculate this again when makeAbsolute only just did it.

1920

Why the canonicalization?

llvm/unittests/Support/VirtualFileSystemTest.cpp
1878 ↗(On Diff #475884)

separator -> separator

As per my comment above, this is only true because we're either using posix or windows_backslash. I'm not sure what we really want here, should we always convert to native? Any thoughts @dexonsmith?

FWIW the non-windows case is identical to the above block. It'd be interesting to test / and \ on both filesystems.

dexonsmith added a subscriber: rnk.Nov 18 2022, 1:21 PM
dexonsmith added inline comments.
llvm/unittests/Support/VirtualFileSystemTest.cpp
1878 ↗(On Diff #475884)

I think we might need a Windows person to help answer; @rnk, can you suggest someone to help answer @bnbarham's question?

rnk added inline comments.
llvm/unittests/Support/VirtualFileSystemTest.cpp
1878 ↗(On Diff #475884)

I can try suggesting @hans, @compnerd, or @Bigcheese

haowei added inline comments.Nov 18 2022, 2:46 PM
llvm/lib/Support/VirtualFileSystem.cpp
1919–1921

Thanks for pointing that out. I need to consider the windows forward slash case.

There is a bit in consistency in the VFS implementation. for Windows. we can use forward slashes('/') on Windows. Most Windows API/Syscalls accept it. But some APIs may not work if you mix '/' with '\' in the path on Windows. What RedirectFileSystem::makeAbsolute is trying to do is, it first determine the slashes used in the WorkDir(either from process working directory or from command line flags, or OverlayDir), in which it can be a forward slash on Windows. Then it uses the same separator when appending the relative directory. Using sys::fs::make_absolute will break that.

The reason that I use makeAbsolute here is that the OverlayDir will likely use forward slash on Windows if people uses CMake options LINK_FLAG="/vfsoverlay:C:/path/to/overlay.xml". In that case, sys::fs::make_absolute will generate some paths like C:/path/to/foo\\bar, which may cause issues.

I will rewrite this to consider the forward slashes.

1920

If the Name here somehow has something like //foo/./../bar, it will cause match failures. Canonicalize function removes these dots.
The setOverlayFileDir does not canonicalize the dir so this is quite common to see.

llvm/unittests/Support/VirtualFileSystemTest.cpp
1878 ↗(On Diff #475884)

The non-windows case enable the overlay-relative option. The above block didn't. And that is where I discover the issue with overlay-relative always uses native separator instead of trying to match OverlayDir.

The best approach for root-relative and overlay-relative option would probably be matching the style of OverlayDir/WorkDir and unify the path separator if they are mixed in a path.
But this change will likely affects a few tests.

phosek added inline comments.Nov 18 2022, 6:47 PM
llvm/lib/Support/VirtualFileSystem.cpp
1919–1921

This was also discussed in D87732 which may provide some additional context.

haowei updated this revision to Diff 482644.Dec 13 2022, 3:28 PM

I changed some comments and add more details.
If there is no objection, can I get an approval on this change so I can land it? It will unblock our development on Windows cross compilation support.

llvm/lib/Support/VirtualFileSystem.cpp
1919–1921

I looked into the implementation of llvm path library and how windows_backslash and windows_slash are implemented and used.
TL;DR,
The difference between windows_backslash and windows_slash is that when llvm::get_separator is called, it returns a different slash. The rest of the API behaviors are the same. So in this VirtualFileSystem.cpp, it doesn't matter if the path_style is assigned to
sys::path::Style::windows_backslash or sys::path::Style::windows_slash here as the output of this function parseEntry will be the same.

llvm/unittests/Support/VirtualFileSystemTest.cpp
1878 ↗(On Diff #475884)

I corrected the typo.

I did some Windows API tests in the past week and I can confirm on Windows 10 (but should be apply to all NT6 family Windows), the forward slashes "/" and backward slashes "\" are both legit path separators and they be mixed in a path in any combinations. They will behave properly and no errors will be reported. However, the cmd.exe and some other Windows program will have some issue if forward slashes is used or mixed with backward slashes because it looks like flag switches, probably because these programs parse the user Input and doing their own path processing before invoking Windows APIs.

In LLVM's case, since File IO will rely on Windows APIs, I think how my current code handling the path separators under relative path scenario is sufficient. It shouldn't bring new issues.

On Linux and other POSIX OS, the backward slash is a legit path character.
e.g.
/foo/bar/\\a/b means

foo
├- bar
      ├ `\a`
            ├ b

Therefore, if backward slashes are used in the overlay file under Linux, they should not be treated as path separators. This is how it is currently implemented in LLVM path library and VFS library didn't perform any path separator conversion as well. My patch didn't change this behavior.

Do you think we still need to add additional tests?

haowei updated this revision to Diff 482658.Dec 13 2022, 4:34 PM

Rebased the change to solve the merge conflicts on the bots.

haowei updated this revision to Diff 482660.Dec 13 2022, 4:36 PM
bnbarham accepted this revision.Dec 13 2022, 4:49 PM

LGTM. If you could put up a PR after to fix the use of sys::fs::make_absolute that would be appreciated 🙇.

This revision is now accepted and ready to land.Dec 13 2022, 4:49 PM
This revision was landed with ongoing or failed builds.Dec 16 2022, 11:45 AM
This revision was automatically updated to reflect the committed changes.

LGTM. If you could put up a PR after to fix the use of sys::fs::make_absolute that would be appreciated 🙇.

Thanks, I will start working on the fix.