This is an archive of the discontinued LLVM Phabricator instance.

[llvm][vfs] For virtual directories, use the virtual path as the real path
ClosedPublic

Authored by jansvoboda11 on Oct 12 2022, 9:17 PM.

Details

Summary

A follow-up to D135841. This patch returns the virtual path for directories from RedirectingFileSystem. This ensures the contents of Path are the same as the contents of FS->getRealPath(Path). This also means we can drop the workaround in Clang's module map canonicalization, where we couldn't use the real path for a directory if it resolved to a different DirectoryEntry. In addition to that, we can also avoid introducing new workaround for a bug triggered by the newly introduced test case.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Oct 12 2022, 9:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2022, 9:17 PM
jansvoboda11 requested review of this revision.Oct 12 2022, 9:17 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 12 2022, 9:17 PM

Return canonical virtual path from RedirectingFileSystem

jansvoboda11 retitled this revision from [clang][deps][lex] Avoid canonicalization of remapped framework directories (take 2) to [llvm] Return canonical virtual path from `RedirectingFileSystem`.Jun 30 2023, 6:58 AM
jansvoboda11 added a reviewer: benlangmuir.

Module map canonicalization in clang-scan-deps causes issues even after D135841 landed.

Here's the problematic scenario this patch aims to solve:

  • there is a framework FW whose headers (but not module maps) are redirected in a VFS overlay file
  • there is a TU with #import <fw/Header.h> (note the incorrect case of the framework name)
  • what follows is how Clang treats the involved file/directory paths:
    • find the incorrectly-cased header through the case-insensitive VFS overlay
    • chop off Headers/Header.h
    • call FileManager::getCanonicalName() on the fw.framework directory
      • RedirectingFileSystem calls the underlying FS to get the real path
        • note: this happens since the node is not RedirectingFileSystem::{FileEntry,DirectoryRemapEntry}, see the last branch of RedirectingFileSystem::getRealPath()
        • the underlying case-sensitive FS doesn't recognize the incorrectly-cased path
      • FileManager falls back to just using Dir.getName() (fw.framework) for that virtual DirectoryEntry as the canonical path
    • the fw.framework/Modules/module.modulemap file is not in the VFS overlay, and due to the wrong case, the underlying case-sensitive FW doesn't recognize it either
    • no matching module map means #include <fw/Header.h> is resolved to textual include
    • that header then includes other headers from the same FW using correct case: #import <FW/AnotherHeader.h>
    • the correctly-cased module map is found in the underlying FS and the framework gets compiled into a PCM file
    • at the end of dependency scan, scanner takes the correctly-cased module map path from the PCM file
    • scanner tries to canonicalize the module map path
      • Modules/module.modulemap is chopped off and FileManager::getCanonicalName() is called on the framework directory
      • FileManager::getDirectory("FW.framework") resolves to the same DirectoryEntry that fw.framework resolved to (VFS overlay is case-insensitive)
      • FileManager looks in the cache of canonical names and finds fw.framework for that virtual DirectoryEntry
      • FW.framework/Modules/module.modulemap is canonicalized to fw.framework/Modules/module.modulemap
    • scanner asserts that file exists, but it's not in the VFS nor in the underlying case-sensitive FS

This patch fixes this issue by returning the canonical virtual path for fw.framework at the very beginning. This is achieved by using the as-written spelling from the overlay file.

I can think of two other alternatives to this patch:

  • In ModuleMap::canonicalizeModuleMapPath(), avoid canonicalization if the new module map path resolves to a different (or no) FileEntry.
  • In FileManager, cache real paths based on the directory name, not the DirectoryEntry.

Both of these alternatives are workarounds, whereas the VFS fix seems fairly straightforward.

Note that I plan to add unit tests for this and add // XFAIL: case-insensitive-filesystem to the clang-scan-deps test if we get consensus on this approach.

Just realized this most likely won't work if the case-insensitive VFS is specified with wrong spelling too, e.g. Fw.framework. Maybe we'll need to combine this patch with one of the alternatives after all.

Just realized this most likely won't work if the case-insensitive VFS is specified with wrong spelling too, e.g. Fw.framework.

Is this about the spelling in the VFS definition itself, or about the path being looked up? If it's the VFS definition maybe we can say that's a bug for whoever wrote the VFS? Ideally we could diagnose but I'm not sure we want to pay for a lot of calls to realpath.

llvm/include/llvm/Support/VirtualFileSystem.h
876

SmallVector?

Just realized this most likely won't work if the case-insensitive VFS is specified with wrong spelling too, e.g. Fw.framework.

Is this about the spelling in the VFS definition itself, or about the path being looked up?

Yes, spelling in the VFS definition itself.

If it's the VFS definition maybe we can say that's a bug for whoever wrote the VFS? Ideally we could diagnose but I'm not sure we want to pay for a lot of calls to realpath.

You mean diagnosing whenever the spelling in the VFS definition differs from its realpath? How could we make that work with symlinks in place?

You mean diagnosing whenever the spelling in the VFS definition differs from its realpath?

Right, this would be ideal, but may be too expensive in practice.

How could we make that work with symlinks in place?

Ah right, we are intentionally allowing symlinks in the VFS path (e.g. Foo.framework/Headers) because we don't correctly handle symlink resolution in the VFS itself.

You can get the canonical spelling of a symlink in a couple of ways: you can readdir the parent directory and find all case-insensitive matches. If there is only 1, that's the spelling, if there are more than 1 then it's a case-sensitive filesystem and the original path must be correct. Another way that is OS-specific is on Darwin you can open(..., O_SYMLINK) the path component to open the symlink itself and then fcntl(..., F_GETPATH, ...) to get its realpath. Both of these approaches require handling each component of the path individually, though each step is cacheable. Not sure if there are any better ways.

As I said, not sure we can afford to diagnose this.

Okay, I'll try to figure out how expensive this would be. I'd like Clang to be stricter in situations like these, but if that's not feasible, I'll probably implement the first workaround.

For the second workaround, I don't think that moves us in better direction - we should be able to assign one canonical real path to each DirectoryEntry.

The unfortunate aspect of the ModuleMap::canonicalizeModuleMapPath() implementation is that it uses FileManager.getDirectoryRef(llvm::sys::path::parent_path(ModuleMapFilePath)). If the request for getting the module map file fell through the VFS into the underlying FS, parent DirectoryEntry of that file is different than the virtual directory made for the VFS. Chopping off the file name and going through the VFS again doesn't guarantee falling through the VFS again, which that code doesn't account for. It'd be really nice to have DirectoryEntry::getDir() API, so that we can walk up the directory hierarchy while preserving the virtual/real distinction between directories in the overlay/on disk, never accidentally "bubbling up" into the overlay again. What's your take on that?

It'd be really nice to have DirectoryEntry::getDir() API, so that we can walk up the directory hierarchy while preserving the virtual/real distinction between directories in the overlay/on disk, never accidentally "bubbling up" into the overlay again. What's your take on that?

Can you say more about how you would do this and preserve the real/virtual distinction? Currently FileManager is just filling in the directory based on the parent path with a lookup to the VFS, so isn't it the same issue? Or did you mean we would keep more info?

It'd be really nice to have DirectoryEntry::getDir() API, so that we can walk up the directory hierarchy while preserving the virtual/real distinction between directories in the overlay/on disk, never accidentally "bubbling up" into the overlay again. What's your take on that?

Can you say more about how you would do this and preserve the real/virtual distinction? Currently FileManager is just filling in the directory based on the parent path with a lookup to the VFS, so isn't it the same issue? Or did you mean we would keep more info?

Yes, the mechanism you mention has the same issue. My idea was that we could keep more information when looking up the path in the VFS, associate each {File,Directory}Entry with list of FileSystem * that "owns" each of its parent directories. We could then send the llvm::sys::path::parent_path(...) argument directly to the owning FS instead to the top one. This is not realistic from performance standpoint, though.

Add unit tests, use SmallVector, make clang-scan-deps test portable

jansvoboda11 retitled this revision from [llvm] Return canonical virtual path from `RedirectingFileSystem` to [llvm][vfs] For virtual directories, use the virtual path as real path.Jul 6 2023, 6:17 AM
jansvoboda11 edited the summary of this revision. (Show Details)

I think this patch should now stand on its own. Dealing with a VFS definition that uses the wrong case for virtual paths is not necessary for correctness. If Clang asks the FS for a file using the correct case, the query either fails and we never get far enough to need module map path canonicalization, or the query succeeds thanks to case-insensitivity of the VFS and canonicalizing to the VFS spelling means all future queries using it will succeed as well.

jansvoboda11 retitled this revision from [llvm][vfs] For virtual directories, use the virtual path as real path to [llvm][vfs] For virtual directories, use the virtual path as the real path.Jul 6 2023, 6:27 AM
benlangmuir accepted this revision.Jul 6 2023, 11:13 AM
This revision is now accepted and ready to land.Jul 6 2023, 11:13 AM
This revision was landed with ongoing or failed builds.Jul 10 2023, 10:41 AM
This revision was automatically updated to reflect the committed changes.