Page MenuHomePhabricator

[VirtualFileSystem] InMemoryFileSystem::status: Return a Status with the requested name
ClosedPublic

Authored by simark on Jul 3 2018, 3:41 PM.

Details

Summary

InMemoryFileSystem::status behaves differently than
RealFileSystem::status. The Name contained in the Status returned by
RealFileSystem::status will be the path as requested by the caller,
whereas InMemoryFileSystem::status returns the normalized path.

For example, when requested the status for "../src/first.h",
RealFileSystem returns a Status with "../src/first.h" as the Name.
InMemoryFileSystem returns "/absolute/path/to/src/first.h".

The reason for this change is that I want to make a unit test in the
clangd testsuite (where we use an InMemoryFileSystem) to reproduce a
bug I get with the clangd program (where a RealFileSystem is used).
This difference in behavior "hides" the bug in the unit test version.

Diff Detail

Repository
rC Clang

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
simark updated this revision to Diff 154422.Jul 6 2018, 9:13 AM
simark marked an inline comment as done.
  • Use StringRef in InMemoryNode::getStatus
  • Remove unused variable in TEST_F(InMemoryFileSystemTest, StatusName)
simark added a comment.Jul 6 2018, 3:03 PM

I found something fishy. There is this code in FileManager.cpp:

if (UFE.File)
  if (auto RealPathName = UFE.File->getName())
    UFE.RealPathName = *RealPathName;

The real path is obtained from UFE.File->getName(). In the RealFile implementation, it returns the RealName field, which is fine. For other implementations, it uses File::getName, which is:

/// Get the name of the file
virtual llvm::ErrorOr<std::string> getName() {
  if (auto Status = status())
    return Status->getName().str();
  else
    return Status.getError();
}

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

Refactorings in that area would probably break stuff and won't be trivial and I don't think this change should be blocked by those. So I'd be happy if this landed right away with a FIXME in FileManager mentioning that InMemoryFileSystem might give surprising results there.
@ioeric added FileSystem::getRealPath, he may more ideas on how we should proceed.

lib/Basic/VirtualFileSystem.cpp
533

This LG now, it was unindented in the original version.

739

Sorry, I thought Path is StringRef, but it's actually Twine, so we do need the str() call.

ilya-biryukov added inline comments.Jul 9 2018, 1:13 AM
lib/Basic/VirtualFileSystem.cpp
494

Given that this method is inconsistent with getStatus() and seems to be only used in toString methods, maybe we could make it protected? Otherwise it's really easy to write code that gets the wrong path.

simark marked 6 inline comments as done.Jul 9 2018, 7:59 AM
simark added inline comments.
lib/Basic/VirtualFileSystem.cpp
494

I now use it in InMemoryDirIterator::setCurrentEntry. I will write a comment to the getName method to clarify this.

simark marked an inline comment as done.Jul 9 2018, 9:58 AM

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

I made FileManager::getFile use FileSystem::getRealPath and see no regression in clang and clang-tools-extra.

simark updated this revision to Diff 154631.Jul 9 2018, 9:59 AM
  • Use FileSystem::getRealPath in FileManager::getFile
simark updated this revision to Diff 154662.Jul 9 2018, 11:52 AM
  • Change InMemoryNode::getName to InMemoryNode::getFileName, to reduce the risk

of mis-using it. Make the Stat field protected, make the subclasses' toString
access it directly.

ilya-biryukov added inline comments.Jul 10 2018, 1:03 AM
lib/Basic/FileManager.cpp
320

NIT: replace replace equality with negative test, i.e. if (!FS->getRealPath(…))

I'm not a big fan of bash-like error codes, but that seems to be the idiomatic way to use them.

lib/Basic/VirtualFileSystem.cpp
478

The inheritors should not be able to modify this field.
Can we get away with a private field and a protected getter that returns a const reference instead?

494

getFileName as a public method and its usage in setCurrentEntry LG , thanks!

simark updated this revision to Diff 154782.Jul 10 2018, 6:17 AM
simark marked 5 inline comments as done.
  • Make InMemoryNode::Stat private again, add protected accessor.
  • Change condition formatting.
ilya-biryukov accepted this revision.Jul 10 2018, 7:23 AM

LGTM if that does not introduce any regressions in clang and clang-tools.

lib/Basic/VirtualFileSystem.cpp
767

NIT: maybe increase the size to 256? This could save an extra allocation in some cases, and hopefully won't be expensive, stack allocs are cheap in most cases.

This revision is now accepted and ready to land.Jul 10 2018, 7:23 AM
simark marked an inline comment as done.Jul 10 2018, 10:11 AM

LGTM if that does not introduce any regressions in clang and clang-tools.

Thanks. I have seen no failures in check-clang and check-clang-tools, so I will push it.

simark updated this revision to Diff 154835.Jul 10 2018, 10:21 AM

Bump SmallString size from 32 to 256

Thanks. I have seen no failures in check-clang and check-clang-tools, so I will push it.

LG! We can always revert the change is anything breaks...

This revision was automatically updated to reflect the committed changes.

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

Refactorings in that area would probably break stuff and won't be trivial and I don't think this change should be blocked by those. So I'd be happy if this landed right away with a FIXME in FileManager mentioning that InMemoryFileSystem might give surprising results there.
@ioeric added FileSystem::getRealPath, he may more ideas on how we should proceed.

Sorry for being late in the discussion. I'm not sure if getRealPath is the right thing to do in FileManager as it also supports virtual files added via FileManager::getVirtualFile, and they don't necessary have a real path. This would also break users of ClangTool::mapVirtualFile when the mapped files are relative. As a matter of fact, I'm already seeing related breakages in our downstream branch :(

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

With the InMemoryFileSystem, this now returns a non-real path. The result is that we fill RealPathName with that non-real path. I see two options here:

  1. Either the FileManager is wrong to assume that File::getName returns a real path, and should call FS->getRealPath to do the job.
  2. If the contract is that the File::getName interface should return a real path, then we should fix the File::getName implementation to do that somehow.

    I would opt for 1. This way, we could compute the RealPathName field even if we don't open the file (unlike currently).

I'd also say FileManager should use FileSystem::getRealPath. The code that does it differently was there before FileSystem::getRealPath was added.
And RealFile should probably not do any magic in getName, we could add a separate method for (getRealName?) if that's absolutely needed.

Refactorings in that area would probably break stuff and won't be trivial and I don't think this change should be blocked by those. So I'd be happy if this landed right away with a FIXME in FileManager mentioning that InMemoryFileSystem might give surprising results there.
@ioeric added FileSystem::getRealPath, he may more ideas on how we should proceed.

Sorry for being late in the discussion. I'm not sure if getRealPath is the right thing to do in FileManager as it also supports virtual files added via FileManager::getVirtualFile, and they don't necessary have a real path. This would also break users of ClangTool::mapVirtualFile when the mapped files are relative. As a matter of fact, I'm already seeing related breakages in our downstream branch :(

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

Sorry, having taken a closer look, it seems that only ClangTool::mapVirtualFile would be affected. I'll try to see if there is an easy fix.

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

Of course, feel free to revert if needed (I'm not sure how to do that). Are you able to come up with a test case that covers the use case you mention?

Would you mind reverting this patch for now so that we can come up with a solution to address those use cases?

Sorry again about missing the discussion earlier!

Of course, feel free to revert if needed (I'm not sure how to do that). Are you able to come up with a test case that covers the use case you mention?

Thanks! I'll try to come up with a smaller test case asap.

The virtual file support in FileManager is totally confusing and full of traps (when used with VFS). I really feel we should replace it with InMemoryFileSystem at some point :(

After looking at a few more use cases of the in-memory file system, I think a problem we need to address is the consistency of file paths you get from clang when using in-mem vfs. The clang-move tests that you have mitigated in D48951 could be an example.

For example, suppose you have header search directories -I/path/to/include and -I. in your compile command. When preprocessor searches for an #include "x.h", it will try to stat "/path/to/include/x.h" and "./x.h" and take the first one that exists. This can introduce indeterminism for the path (./x.h or /abs/x.h) you later get for the header file, e.g. when you try to look up file name by FileID through SourceManager. The path you get for a FileEntry or FileID would depend on how clang looks up a file and how a file is first opened into SourceManager/FileManager. It seems that the current behavior of clangd + in-memory file system would give you paths that are relative to the working directory for both cases. I'm not sure if that's the best behavior, but I think the consistency has its value. For example, in unit tests where in-memory file systems are heavily used, it's important to have a way to compare the reported file path (e.g. file path corresponding to a source location) with the expected paths. We could choose to always return real path, which could be potentially expensive, or we could require users to always compare real paths (it's unclear how well this would work though e.g. ClangTool doesn't expose its vfs), but they need to be enforced by the API. Otherwise, I worry it would cause more confusions for folks who use clang with in-memory file system in the future.

For example, suppose you have header search directories -I/path/to/include and -I. in your compile command. When preprocessor searches for an #include "x.h", it will try to stat "/path/to/include/x.h" and "./x.h" and take the first one that exists. This can introduce indeterminism for the path (./x.h or /abs/x.h) you later get for the header file, e.g. when you try to look up file name by FileID through SourceManager. The path you get for a FileEntry or FileID would depend on how clang looks up a file and how a file is first opened into SourceManager/FileManager. It seems that the current behavior of clangd + in-memory file system would give you paths that are relative to the working directory for both cases. I'm not sure if that's the best behavior, but I think the consistency has its value. For example, in unit tests where in-memory file systems are heavily used, it's important to have a way to compare the reported file path (e.g. file path corresponding to a source location) with the expected paths. We could choose to always return real path, which could be potentially expensive, or we could require users to always compare real paths (it's unclear how well this would work though e.g. ClangTool doesn't expose its vfs), but they need to be enforced by the API. Otherwise, I worry it would cause more confusions for folks who use clang with in-memory file system in the future.

It's hard to tell without seeing an actual failing use case. I understand the argument, but I think the necessity to work as closely as RealFileSystem as possible is important. Otherwise, it becomes impossible to reproduce bugs happening in the real world in unittests.

For example, suppose you have header search directories -I/path/to/include and -I. in your compile command. When preprocessor searches for an #include "x.h", it will try to stat "/path/to/include/x.h" and "./x.h" and take the first one that exists. This can introduce indeterminism for the path (./x.h or /abs/x.h) you later get for the header file, e.g. when you try to look up file name by FileID through SourceManager. The path you get for a FileEntry or FileID would depend on how clang looks up a file and how a file is first opened into SourceManager/FileManager. It seems that the current behavior of clangd + in-memory file system would give you paths that are relative to the working directory for both cases. I'm not sure if that's the best behavior, but I think the consistency has its value. For example, in unit tests where in-memory file systems are heavily used, it's important to have a way to compare the reported file path (e.g. file path corresponding to a source location) with the expected paths. We could choose to always return real path, which could be potentially expensive, or we could require users to always compare real paths (it's unclear how well this would work though e.g. ClangTool doesn't expose its vfs), but they need to be enforced by the API. Otherwise, I worry it would cause more confusions for folks who use clang with in-memory file system in the future.

It's hard to tell without seeing an actual failing use case.

I think the clang-move test you mitigated in D48951 is an example. When setting up compiler instance, it uses -I. in the compile command (https://github.com/llvm-mirror/clang-tools-extra/blob/master/unittests/clang-move/ClangMoveTests.cpp#L236), which results in the header paths that start with ./. If you change replace . with the absolute path of the working directory e.g. -I/usr/include, the paths you get will start with /usr/include/. This is caused by the way preprocessor looks up include headers. We could require users (e.g. the clang-move test writer) to clean up dots, but this has to be enforced somehow by the framework.

I understand the argument, but I think the necessity to work as closely as RealFileSystem as possible is important. Otherwise, it becomes impossible to reproduce bugs happening in the real world in unittests.

FWIW, I don't think we could reproduce all real world bugs with unit tests ;) But I agree with you that we should try to converge the behavior if possible, and I support the motivation of this change ;) This change would be perfectly fine as is if in-mem fs is always used directly by users, but the problem is that it's also used inside clang and clang tooling, where users don't control over how files are opened. My point is we should do this in a way that doesn't introduce inconsistency/confusion for users of clang and tooling framework.

I uploaded a new version of this patch here:
https://reviews.llvm.org/D49804

It would make it easier for your reviewers to look at the new changes if
you just reopen this patch and update the diff :)

simark reopened this revision.Jul 25 2018, 9:06 AM

It would make it easier for your reviewers to look at the new changes if
you just reopen this patch and update the diff :)

I tried but didn't know how. But I just saw the "Add Action" drop down with "Reopen" in it.

This revision is now accepted and ready to land.Jul 25 2018, 9:06 AM
simark updated this revision to Diff 157287.Jul 25 2018, 9:07 AM

Fix tests on Windows

Fix InMemoryFileSystem tests on Windows. The paths returned by the
InMemoryFileSystem directory iterators in the tests mix posix and windows
directory separators. THis is because we do queries with posix-style
separators ("/a/b") but filnames are appended using native-style separators
(backslash on Windows). So we end up with "/a/b\c".

I fixed the test by re-using the ReplaceBackslashes function defined in another
test. I'm not sure this is the best fix, but the only alternative I see would
be to completely rewrite tests to use posix-style paths on non-Windows and
Windows-style paths on Windows. That would lead to quite a bit of
duplication...

simark requested review of this revision.Jul 25 2018, 9:08 AM

Many thanks! Great cleanup. Just a few nits and we're good to go

lib/Basic/VirtualFileSystem.cpp
476

NIT: maybe keep the order of members the same to keep the patch more focused? Unless there's a good reason to swap them.

530

NIT: remove this blank line to follow the code style of the file more closely?

769

NIT: Path.str() can be replaced with Path (SmallString is convertible to StringRef)

unittests/Basic/VirtualFileSystemTest.cpp
156

Maybe replace lambda with a funciton?

156

Could the name mention the expected result? E.g. getUnixPath() or something similar

157

Maybe use llvm::sys::path::native(S, style::posix)?

793–795

Maybe add a comment about windows and the path we get there?

simark marked 7 inline comments as done.Jul 26 2018, 5:36 AM
simark added inline comments.
lib/Basic/VirtualFileSystem.cpp
476

Oops, that was not intended.

simark updated this revision to Diff 157467.Jul 26 2018, 5:36 AM
simark marked an inline comment as done.

I think this addresses all of Ilya's comments.

This revision is now accepted and ready to land.Jul 26 2018, 9:02 AM
This revision was automatically updated to reflect the committed changes.

@eric_niebler, question for you:

This patch causes clang to emit a -Wnonportable-include-path warning where it did not before. It affects the following test on Windows:
https://github.com/llvm-mirror/clang/blob/master/test/PCH/case-insensitive-include.c

The warning is currently not emitted for the last clang invocation (the one that includes PCH), because the real path value is not available, and therefore this condition is false:
https://github.com/llvm-mirror/clang/blob/fe1098c84823b8eac46b0bfffc5f5788b6c26d1a/lib/Lex/PPDirectives.cpp#L2015

With this patch, the real path value is available, so we emit the warning. Is it on purpose that this warning is not emitted in this case? If not should I simply add -Wno-nonportable-include-path to the last clang invocation, as was done earlier with the first invocation?

It seems to me like the warning is valid, even though we use precompiled headers.

It seems to me like the warning is valid, even though we use precompiled headers.

Agreed.

simark reopened this revision.Aug 1 2018, 1:07 PM
This revision is now accepted and ready to land.Aug 1 2018, 1:07 PM
simark updated this revision to Diff 158625.Aug 1 2018, 1:49 PM

Add -Wno-nonportable-include-path to test/PCH/case-insensitive-include.c

This revision got 'reopened' and is now in the list of accepted revision. Should we close it again?

simark requested review of this revision.Aug 3 2018, 9:24 AM

This revision got 'reopened' and is now in the list of accepted revision. Should we close it again?

It got reverted a second time because it was breaking a test on Windows. The new bit is the change to test/PCH/case-insensitive-include.c, so it would need review again. If somebody else could run the tests on Windows, it would make me a bit more confident too.

If somebody else could run the tests on Windows, it would make me a bit more confident too.

Which tests/targets exactly? If you know

If somebody else could run the tests on Windows, it would make me a bit more confident too.

Which tests/targets exactly? If you know

NVM, I saw the "check-clang and check-clang-tools" above.

Both check-clang and check-clang-tools pass successfully for me on Windows with the patch.

simark added a comment.Aug 5 2018, 6:39 PM

Both check-clang and check-clang-tools pass successfully for me on Windows with the patch.

Awesome, thanks!

ilya-biryukov accepted this revision.Aug 6 2018, 1:01 AM

This revision got 'reopened' and is now in the list of accepted revision. Should we close it again?

It got reverted a second time because it was breaking a test on Windows. The new bit is the change to test/PCH/case-insensitive-include.c, so it would need review again. If somebody else could run the tests on Windows, it would make me a bit more confident too.

I see, thanks for the explanation.

LGTM for the update revision, given we have confirmation the tests pass on Windows.

This revision is now accepted and ready to land.Aug 6 2018, 1:01 AM
simark added a comment.Aug 6 2018, 2:47 PM

I see, thanks for the explanation.

LGTM for the update revision, given we have confirmation the tests pass on Windows.

Thanks, I'll push it, let's hope this time is the right time!

This revision was automatically updated to reflect the committed changes.