This is an archive of the discontinued LLVM Phabricator instance.

[LibTooling] use ToolFileManager to store file managers for each CWD
Needs ReviewPublic

Authored by Kale on May 2 2022, 7:31 PM.

Details

Summary

ClangTool using single FileManager might confuse two header files with the same relative path in different CWD(current working directory).

As the cache of previously opened FileEntry in FileManager is indexed by the file name, which is fine to be used in single compiler invocation, ClangTool, however, needs to parse more than one compile command, and the CWD is not assured to be consistent, thus the previous cached FileEntry which uses relative path as index would be invalid.

This patch tries to solve it by using a container named ToolFileManager called, which actually stores different FileManager for different CWD.

All file managers stored in ToolFileManager will be invalidated from where using FileManager::setVirtualFileSystem(), because of possibly unsound file status caches.

The unit test is written by Hao Zhang <zhanghao19@ios.ac.cn> in his original but unfinished commit.

Refer-to: https://github.com/llvm/llvm-project/issues/54410
Refer-to: https://reviews.llvm.org/D92160

Diff Detail

Event Timeline

Kale created this revision.May 2 2022, 7:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 7:31 PM
Herald added a subscriber: dexonsmith. · View Herald Transcript
Kale requested review of this revision.May 2 2022, 7:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2022, 7:31 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Kale changed the visibility from "Public (No Login Required)" to "Kale (Shi Chen)".May 3 2022, 4:21 AM
Kale updated this revision to Diff 426709.May 3 2022, 7:56 AM

Use different FileManager for each CWD instead. Also add a unit test.

Kale retitled this revision from [Tooling] add ToolingFileManager to force using abs path to [Tooling] use different FileManager for each CWD.May 3 2022, 7:57 AM
Kale edited the summary of this revision. (Show Details)
Kale changed the visibility from "Kale (Shi Chen)" to "Public (No Login Required)".May 3 2022, 8:01 AM
Kale updated this revision to Diff 427313.May 5 2022, 7:18 AM

Fix path in compile_commands.json for windows.

Kale updated this revision to Diff 427317.May 5 2022, 7:32 AM

Format all overriding functions

Two high-level comments:

  • Making these functions virtual makes it harder to reason about possible implementations, but I don't think that's necessary for what you're doing here. Why not expose ToolFileManager as a container of filemanagers which ToolInvocation takes by parameter, then calls getOrCreateFileManager() on at time of need passing relevant arguments (e.g., CWD)?
    • This could also create/manage the relevant VFS instance, to which the FileManager is fundamentally tied. See also below.
  • I don't think this goes far enough. The FileManager keeps state for any accessed file, which can be wrong any time the VFS changes at all, not just if the CWD is different. E.g., different -ivfsoverlay options are passed, or someone is reading from an InMemoryFileSystem vs. from disk.
    • AFAICT, every call to FileManager::setVirtualFileSystem() is fundamentally unsound unless the new FS is equivalent to the old one or the filemanager hasn't been used yet.
  • AFAICT, every call to FileManager::setVirtualFileSystem() is fundamentally unsound unless the new FS is equivalent to the old one or the filemanager hasn't been used yet.

More precisely, I think calling FileManager::setVirtualFileSystem() is fundamentally unsound unless the new VFS behaves identically for every path the FileManager has accessed so far.

Kale added a comment.May 6 2022, 12:01 AM

Two high-level comments:

  • Making these functions virtual makes it harder to reason about possible implementations, but I don't think that's necessary for what you're doing here. Why not expose ToolFileManager as a container of filemanagers which ToolInvocation takes by parameter, then calls getOrCreateFileManager() on at time of need passing relevant arguments (e.g., CWD)?
    • This could also create/manage the relevant VFS instance, to which the FileManager is fundamentally tied. See also below.
  • I don't think this goes far enough. The FileManager keeps state for any accessed file, which can be wrong any time the VFS changes at all, not just if the CWD is different. E.g., different -ivfsoverlay options are passed, or someone is reading from an InMemoryFileSystem vs. from disk.
    • AFAICT, every call to FileManager::setVirtualFileSystem() is fundamentally unsound unless the new FS is equivalent to the old one or the filemanager hasn't been used yet.

Thanks for your careful and useful suggestions. With your explanation, I realized that the change of VFS of course should be considered as well. I'll revise this patch and tries to cope with it.

Kale updated this revision to Diff 427820.May 6 2022, 11:40 PM

Reimplement ToolFileManager as a container of filemanagers which ToolInvocation takes by parameter. Also invalidate all file managers when using FileManager::setVirtualFileSystem().

Kale retitled this revision from [Tooling] use different FileManager for each CWD to [LibTooling] use ToolFileManager to store file managers for each CWD.May 6 2022, 11:51 PM
Kale edited the summary of this revision. (Show Details)
Kale updated this revision to Diff 427822.May 7 2022, 12:25 AM

Fix tests & runInvocation in clang-tools-extra.

This looks mostly correct to me, but perhaps pessimistic enough it'll regress performance unnecessarily. A few comments inline.

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
169–170 ↗(On Diff #427822)

Should WorkingDirectory be an argument to getOrCreateFileManager() here?

171–172 ↗(On Diff #427822)

Do these need to be moved lower, after the FileMgr/VFS might change?

202–203 ↗(On Diff #427822)

I don't see how calling this on ToolFileMgr threads through to updating the FileMgr in use, which has already been sent to ScanInstance.

Note that createVFSFromCompilerInvocation() *always* returns a new VFS, even if it's equivalent to the one used before (same DepFS options and same -ivfsoverlays from ScanInstance.getInvocation()). Threading it through would mean that FileMgr is never reused.

IMO, never-reuse-the-filemanager is the right/correct behaviour for clang-scan-deps, since DepFS provides caching for things that can be cached soundly, but probably better to explicitly stop reusing the FileMgr (in a separate patch ahead of time) rather than leaving code that looks like it might get reused.

clang/lib/Tooling/Tooling.cpp
445–446

This will *never* reuse the file manager, since it's *always* a new VFS.

  • Maybe that's correct, since we want to inject different things into InMemoryFS?
  • Or, maybe that's overly pessimistic, and we should pass BaseFS into the filemanager if appendArgumentsAdjuster() never added anything to InMemoryFS?
Kale added inline comments.May 14 2022, 1:51 AM
clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
171–172 ↗(On Diff #427822)

Seems visitPrebuiltModule in line 179 uses the FileManager set by ScanInstance.setFileManager(FileMgr);, perhaps it's better to keep it here, and redo this after createVFSFromCompilerInvocation.

202–203 ↗(On Diff #427822)

That makes sense. Will prepend a patch to explicitly create a new FileManager if DepFS is used.

clang/lib/Tooling/Tooling.cpp
445–446

Use BaseFS as the condition to judge whether to invalidate previous file managers? Sounds worth a try. Will amend later.

Kale updated this revision to Diff 430268.May 18 2022, 1:18 AM
  1. Use BaseFS as the condition to judge whether to invalidate previous file managers
  2. Use WorkingDir as a parameter for getOrCreateFileManager