[VirtualFileSystem] InMemoryFileSystem::status: Return a Status with the requested name
ClosedPublic

Authored by simark on Tue, Jul 3, 3:41 PM.

Details

Summary

InMemoryFileSystem::status behaves differently than
RealFileSystem::status. The Name contained in the Status returned by
RealFileSystem::status will be the path as requested by the caller,
whereas InMemoryFileSystem::status returns the normalized path.

For example, when requested the status for "../src/first.h",
RealFileSystem returns a Status with "../src/first.h" as the Name.
InMemoryFileSystem returns "/absolute/path/to/src/first.h".

The reason for this change is that I want to make a unit test in the
clangd testsuite (where we use an InMemoryFileSystem) to reproduce a
bug I get with the clangd program (where a RealFileSystem is used).
This difference in behavior "hides" the bug in the unit test version.

Diff Detail

Repository
rC Clang
simark created this revision.Tue, Jul 3, 3:41 PM
simark updated this revision to Diff 154010.Tue, Jul 3, 3:44 PM

Update commit message

I'm not familiar with this part of code, but the change looks fine to me. I think @bkramer is the right person to review it.

Please make sure the style align with LLVM code style.

lib/Basic/VirtualFileSystem.cpp
525–527

I think we should have a comment saying the InMemoryFile has the same behavior as the real file system.

Mimicing RealFS seems like the right thing to do here, so I would vouch for checking this change in.
I'm a little worried that there are tests/clients relying on the old behavior, have you run all the tests?

Also, could you please run git-clang-format to fix the formatting issues?

lib/Basic/VirtualFileSystem.cpp
536

NIT: The formatting is broken here.

539

Maybe add a RequestedName parameter to the InMemoryNode instead to make sure it's not misused?
It looks like all the clients calling it have to change the name and some are not doing it now, e.g. the directory iterator will use statuses with full paths.

540

Maybe add a comment that this matches the real file system behavior?

740

NIT: The indent is incorrect here.

742

NIT: we don't need str() here

Sorry, have missed the @hokein 's comments, so one of mine seems like a duplicate.

simark added a comment.Wed, Jul 4, 7:02 AM

I'm not familiar with this part of code, but the change looks fine to me. I think @bkramer is the right person to review it.

Please make sure the style align with LLVM code style.

Woops indeed I forgot to run git clang-format.

simark added a comment.Wed, Jul 4, 7:08 AM

Mimicing RealFS seems like the right thing to do here, so I would vouch for checking this change in.
I'm a little worried that there are tests/clients relying on the old behavior, have you run all the tests?

I didn't have time yesterday, I'm doing this now. Is ninja/make clang-test sufficient?

simark updated this revision to Diff 154118.Wed, Jul 4, 8:42 AM
  • Fixed formatting (ran git-clang-format)
  • Fixed expectation in TEST_F(InMemoryFileSystemTest, WorkingDirectory)
  • Added test TEST_F(InMemoryFileSystemTest, StatusName)
simark added a comment.Wed, Jul 4, 8:43 AM

I ran ninja clang-test:

Testing Time: 1720.20s
  Expected Passes    : 12472
  Expected Failures  : 18
  Unsupported Tests  : 263

I usually run ninja check-clang check-clang-tools for clang changes. Have never used clang-test, not sure what it does.
I ran it with this change, found a few failures from clang-move:

Failing Tests (5):
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.AddDependentNewHeader
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.AddDependentOldHeader
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.DontMoveAll
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.MoveHeaderAndCC
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.MoveHeaderOnly

I usually run ninja check-clang check-clang-tools for clang changes. Have never used clang-test, not sure what it does.

I think clang-test is an alias for check-clang.

I ran it with this change, found a few failures from clang-move:

Failing Tests (5):
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.AddDependentNewHeader
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.AddDependentOldHeader
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.DontMoveAll
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.MoveHeaderAndCC
    Extra Tools Unit Tests :: clang-move/./ClangMoveTests/ClangMove.MoveHeaderOnly

Doh, I'll run check-clang-tools too.

simark added a comment.Wed, Jul 4, 1:52 PM

I opened D48951 to fix the failures in clang-move.

hokein added a comment.Thu, Jul 5, 7:47 AM

Seems to me you have a few comments unaddressed (and make sure you marked them done when updating the patch).

simark added a comment.Thu, Jul 5, 7:53 AM

Seems to me you have a few comments unaddressed (and make sure you marked them done when updating the patch).

Ah damn I missed them, I'm not too used to how Phabricator displays things. I'll do that.

simark marked 6 inline comments as done.Thu, Jul 5, 2:29 PM
simark added inline comments.
lib/Basic/VirtualFileSystem.cpp
536

Hmm this is what git-clang-format does (unless this comment refers to a previous version where I had not run clang-format).

539

Ok, I tried to do something like that.

540

I added a comment to InMemoryNode::getSatus, since that's where the copyWithNewName is done now.

742

Otherwise I'm getting this:

/home/emaisin/src/llvm/tools/clang/lib/Basic/VirtualFileSystem.cpp:1673:9: error: no matching function for call to 'copyWithNewName'
    S = Status::copyWithNewName(S, Path);
        ^~~~~~~~~~~~~~~~~~~~~~~
/home/emaisin/src/llvm/tools/clang/lib/Basic/VirtualFileSystem.cpp:76:16: note: candidate function not viable: no known conversion from 'const llvm::Twine' to 'llvm::StringRef' for 2nd argument
Status Status::copyWithNewName(const Status &In, StringRef NewName) {
               ^
/home/emaisin/src/llvm/tools/clang/lib/Basic/VirtualFileSystem.cpp:82:16: note: candidate function not viable: no known conversion from 'clang::vfs::Status' to 'const llvm::sys::fs::file_status' for 1st argument
Status Status::copyWithNewName(const file_status &In, StringRef NewName) {
               ^
simark updated this revision to Diff 154321.Thu, Jul 5, 2:44 PM
simark marked 4 inline comments as done.
  • Add RequestedName to InMemoryNode::getStatus.
  • Also fix the directory_iterator code path.
hokein added a comment.Fri, Jul 6, 5:41 AM

The code looks good. I'll let Ben take a final look.

lib/Basic/VirtualFileSystem.cpp
493

Can we use llvm::StringRef here?

unittests/Basic/VirtualFileSystemTest.cpp
950

This is not used.

simark updated this revision to Diff 154422.Fri, Jul 6, 9:13 AM
simark marked an inline comment as done.
  • Use StringRef in InMemoryNode::getStatus
  • Remove unused variable in TEST_F(InMemoryFileSystemTest, StatusName)
simark added a comment.Fri, Jul 6, 3:03 PM

I found something fishy. There is this code in FileManager.cpp:

if (UFE.File)
  if (auto RealPathName = UFE.File->getName())
    UFE.RealPathName = *RealPathName;

The real path is obtained from UFE.File->getName(). In the RealFile implementation, it returns the RealName field, which is fine. For other implementations, it uses File::getName, which is:

/// Get the name of the file
virtual llvm::ErrorOr<std::string> getName() {
  if (auto Status = status())
    return Status->getName().str();
  else
    return Status.getError();
}

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

Refactorings in that area would probably break stuff and won't be trivial and I don't think this change should be blocked by those. So I'd be happy if this landed right away with a FIXME in FileManager mentioning that InMemoryFileSystem might give surprising results there.
@ioeric added FileSystem::getRealPath, he may more ideas on how we should proceed.

lib/Basic/VirtualFileSystem.cpp
536

This LG now, it was unindented in the original version.

742

Sorry, I thought Path is StringRef, but it's actually Twine, so we do need the str() call.

ilya-biryukov added inline comments.Mon, Jul 9, 1:13 AM
lib/Basic/VirtualFileSystem.cpp
496

Given that this method is inconsistent with getStatus() and seems to be only used in toString methods, maybe we could make it protected? Otherwise it's really easy to write code that gets the wrong path.

simark marked 6 inline comments as done.Mon, Jul 9, 7:59 AM
simark added inline comments.
lib/Basic/VirtualFileSystem.cpp
496

I now use it in InMemoryDirIterator::setCurrentEntry. I will write a comment to the getName method to clarify this.

simark marked an inline comment as done.Mon, Jul 9, 9:58 AM

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

I made FileManager::getFile use FileSystem::getRealPath and see no regression in clang and clang-tools-extra.

simark updated this revision to Diff 154631.Mon, Jul 9, 9:59 AM
  • Use FileSystem::getRealPath in FileManager::getFile
simark updated this revision to Diff 154662.Mon, Jul 9, 11:52 AM
  • Change InMemoryNode::getName to InMemoryNode::getFileName, to reduce the risk

of mis-using it. Make the Stat field protected, make the subclasses' toString
access it directly.

ilya-biryukov added inline comments.Tue, Jul 10, 1:03 AM
lib/Basic/FileManager.cpp
320

NIT: replace replace equality with negative test, i.e. if (!FS->getRealPath(…))

I'm not a big fan of bash-like error codes, but that seems to be the idiomatic way to use them.

lib/Basic/VirtualFileSystem.cpp
486

The inheritors should not be able to modify this field.
Can we get away with a private field and a protected getter that returns a const reference instead?

496

getFileName as a public method and its usage in setCurrentEntry LG , thanks!

simark updated this revision to Diff 154782.Tue, Jul 10, 6:17 AM
simark marked 5 inline comments as done.
  • Make InMemoryNode::Stat private again, add protected accessor.
  • Change condition formatting.
ilya-biryukov accepted this revision.Tue, Jul 10, 7:23 AM

LGTM if that does not introduce any regressions in clang and clang-tools.

lib/Basic/VirtualFileSystem.cpp
770

NIT: maybe increase the size to 256? This could save an extra allocation in some cases, and hopefully won't be expensive, stack allocs are cheap in most cases.

This revision is now accepted and ready to land.Tue, Jul 10, 7:23 AM
simark marked an inline comment as done.Tue, Jul 10, 10:11 AM

LGTM if that does not introduce any regressions in clang and clang-tools.

Thanks. I have seen no failures in check-clang and check-clang-tools, so I will push it.

simark updated this revision to Diff 154835.Tue, Jul 10, 10:21 AM

Bump SmallString size from 32 to 256

Thanks. I have seen no failures in check-clang and check-clang-tools, so I will push it.

LG! We can always revert the change is anything breaks...

This revision was automatically updated to reflect the committed changes.

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

Refactorings in that area would probably break stuff and won't be trivial and I don't think this change should be blocked by those. So I'd be happy if this landed right away with a FIXME in FileManager mentioning that InMemoryFileSystem might give surprising results there.
@ioeric added FileSystem::getRealPath, he may more ideas on how we should proceed.

Sorry for being late in the discussion. I'm not sure if getRealPath is the right thing to do in FileManager as it also supports virtual files added via FileManager::getVirtualFile, and they don't necessary have a real path. This would also break users of ClangTool::mapVirtualFile when the mapped files are relative. As a matter of fact, I'm already seeing related breakages in our downstream branch :(

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

Refactorings in that area would probably break stuff and won't be trivial and I don't think this change should be blocked by those. So I'd be happy if this landed right away with a FIXME in FileManager mentioning that InMemoryFileSystem might give surprising results there.
@ioeric added FileSystem::getRealPath, he may more ideas on how we should proceed.

Sorry for being late in the discussion. I'm not sure if getRealPath is the right thing to do in FileManager as it also supports virtual files added via FileManager::getVirtualFile, and they don't necessary have a real path. This would also break users of ClangTool::mapVirtualFile when the mapped files are relative. As a matter of fact, I'm already seeing related breakages in our downstream branch :(

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

Sorry, having taken a closer look, it seems that only ClangTool::mapVirtualFile would be affected. I'll try to see if there is an easy fix.

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

Of course, feel free to revert if needed (I'm not sure how to do that). Are you able to come up with a test case that covers the use case you mention?

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

Of course, feel free to revert if needed (I'm not sure how to do that). Are you able to come up with a test case that covers the use case you mention?

Thanks! I'll try to come up with a smaller test case asap.

The virtual file support in FileManager is totally confusing and full of traps (when used with VFS). I really feel we should replace it with InMemoryFileSystem at some point :(

After looking at a few more use cases of the in-memory file system, I think a problem we need to address is the consistency of file paths you get from clang when using in-mem vfs. The clang-move tests that you have mitigated in D48951 could be an example.

For example, suppose you have header search directories -I/path/to/include and -I. in your compile command. When preprocessor searches for an #include "x.h", it will try to stat "/path/to/include/x.h" and "./x.h" and take the first one that exists. This can introduce indeterminism for the path (./x.h or /abs/x.h) you later get for the header file, e.g. when you try to look up file name by FileID through SourceManager. The path you get for a FileEntry or FileID would depend on how clang looks up a file and how a file is first opened into SourceManager/FileManager. It seems that the current behavior of clangd + in-memory file system would give you paths that are relative to the working directory for both cases. I'm not sure if that's the best behavior, but I think the consistency has its value. For example, in unit tests where in-memory file systems are heavily used, it's important to have a way to compare the reported file path (e.g. file path corresponding to a source location) with the expected paths. We could choose to always return real path, which could be potentially expensive, or we could require users to always compare real paths (it's unclear how well this would work though e.g. ClangTool doesn't expose its vfs), but they need to be enforced by the API. Otherwise, I worry it would cause more confusions for folks who use clang with in-memory file system in the future.

For example, suppose you have header search directories -I/path/to/include and -I. in your compile command. When preprocessor searches for an #include "x.h", it will try to stat "/path/to/include/x.h" and "./x.h" and take the first one that exists. This can introduce indeterminism for the path (./x.h or /abs/x.h) you later get for the header file, e.g. when you try to look up file name by FileID through SourceManager. The path you get for a FileEntry or FileID would depend on how clang looks up a file and how a file is first opened into SourceManager/FileManager. It seems that the current behavior of clangd + in-memory file system would give you paths that are relative to the working directory for both cases. I'm not sure if that's the best behavior, but I think the consistency has its value. For example, in unit tests where in-memory file systems are heavily used, it's important to have a way to compare the reported file path (e.g. file path corresponding to a source location) with the expected paths. We could choose to always return real path, which could be potentially expensive, or we could require users to always compare real paths (it's unclear how well this would work though e.g. ClangTool doesn't expose its vfs), but they need to be enforced by the API. Otherwise, I worry it would cause more confusions for folks who use clang with in-memory file system in the future.

It's hard to tell without seeing an actual failing use case. I understand the argument, but I think the necessity to work as closely as RealFileSystem as possible is important. Otherwise, it becomes impossible to reproduce bugs happening in the real world in unittests.

For example, suppose you have header search directories -I/path/to/include and -I. in your compile command. When preprocessor searches for an #include "x.h", it will try to stat "/path/to/include/x.h" and "./x.h" and take the first one that exists. This can introduce indeterminism for the path (./x.h or /abs/x.h) you later get for the header file, e.g. when you try to look up file name by FileID through SourceManager. The path you get for a FileEntry or FileID would depend on how clang looks up a file and how a file is first opened into SourceManager/FileManager. It seems that the current behavior of clangd + in-memory file system would give you paths that are relative to the working directory for both cases. I'm not sure if that's the best behavior, but I think the consistency has its value. For example, in unit tests where in-memory file systems are heavily used, it's important to have a way to compare the reported file path (e.g. file path corresponding to a source location) with the expected paths. We could choose to always return real path, which could be potentially expensive, or we could require users to always compare real paths (it's unclear how well this would work though e.g. ClangTool doesn't expose its vfs), but they need to be enforced by the API. Otherwise, I worry it would cause more confusions for folks who use clang with in-memory file system in the future.

It's hard to tell without seeing an actual failing use case.

I think the clang-move test you mitigated in D48951 is an example. When setting up compiler instance, it uses -I. in the compile command (https://github.com/llvm-mirror/clang-tools-extra/blob/master/unittests/clang-move/ClangMoveTests.cpp#L236), which results in the header paths that start with ./. If you change replace . with the absolute path of the working directory e.g. -I/usr/include, the paths you get will start with /usr/include/. This is caused by the way preprocessor looks up include headers. We could require users (e.g. the clang-move test writer) to clean up dots, but this has to be enforced somehow by the framework.

I understand the argument, but I think the necessity to work as closely as RealFileSystem as possible is important. Otherwise, it becomes impossible to reproduce bugs happening in the real world in unittests.

FWIW, I don't think we could reproduce all real world bugs with unit tests ;) But I agree with you that we should try to converge the behavior if possible, and I support the motivation of this change ;) This change would be perfectly fine as is if in-mem fs is always used directly by users, but the problem is that it's also used inside clang and clang tooling, where users don't control over how files are opened. My point is we should do this in a way that doesn't introduce inconsistency/confusion for users of clang and tooling framework.