This is an archive of the discontinued LLVM Phabricator instance.

Fix PCM read from ModuleCache for ext4 filesystem
Needs RevisionPublic

Authored by ilyakuteev on Mar 3 2021, 5:34 AM.

Details

Summary

Problem:
FileManager::getFileRef uses UniqueRealFiles std::map to cache file entries by inode. On ext4 filesystem this solution only works for directories without changes during compilation process, which false for ModuleCache directories. This results in misleading cache hits. inode gets reused after PCM or .idx deletion (Happens very often), but this reuse is not tracked in FileManager.UniqueRealFiles.

Solution:
Use getBypassFile for PCM files as it does not have caching by inode.

Diff Detail

Event Timeline

ilyakuteev requested review of this revision.Mar 3 2021, 5:34 AM
ilyakuteev created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2021, 5:34 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ilyakuteev edited the summary of this revision. (Show Details)Mar 3 2021, 5:43 AM

If I understand this correctly, then the inode caching logic in FileManager isn't just breaking PCMs but all file operations:

TEST_F(FileManagerTest, InodeReuse) {
  {
    std::ofstream myfile;
    myfile.open("a.cpp");
    myfile << "content\n";
  }
  llvm::ErrorOr<const FileEntry*> fe1 = manager.getFile("a.cpp");
  EXPECT_TRUE(fe1);
  const FileEntry *f1 = *fe1;
  remove("a.cpp");
  {
    std::ofstream myfile;
    myfile.open("b.cpp");
    myfile << "different content\n";
  }
  llvm::ErrorOr<const FileEntry*> fe2 = manager.getFile("b.cpp");
  EXPECT_TRUE(fe2);
  const FileEntry *f2 = *fe2;
  EXPECT_NE(f2->getSize(), f1->getSize());
  EXPECT_NE(f2->getUniqueID().getFile(), f1->getUniqueID().getFile());
}

This fails consistently for me when running in an empty ext4 directory with:

Expected: (f2->getSize()) != (f1->getSize()), actual: 8 vs 8
Expected: (f2->getUniqueID().getFile()) != (f1->getUniqueID().getFile()), actual: 57855544 vs 57855544

I guess this wasn't considered a valid use case for the normal #include logic within Clang (which I believe is the primary beneficiary of this cache and doesn't really care about a changing file system). But with PCMs and Clang REPLs this is probably causing some strange bugs.

Anyway, I don't think I know the FileManager code well enough to come up with a real fix (beside just removing the inode cache).

@teemperor 's test shows the problem correctly.

In my case I am working on a dist-compilation system (similar to distcc) for objective-c with -fmodules. Our previous generation used tmpfs for module cache and was ephemeral (Unique temp module cache per compilation). Now we want to move module cache to ext4 filesystem to make it actually cache modules across compilations. This bug is blocking us from doing this change, so I came up with this draft fix.

I think that removing inode cache for PCMs may be some kind of final solution. (For example we can make another FileManager.getNoncacedFileRef method).

It seems we already hit this issue before and decided to add a workaround: https://reviews.llvm.org/D86823

I believe removing inode numbers is the correct fix, yes. The workaround I applied, and the one here, are both insufficient in the general case.

dexonsmith requested changes to this revision.Mar 15 2021, 2:14 PM

Agreed that the right fix is to remove/replace the inode-based cache, but it's not safe to just drop it. Consider the following filesystem layout:

/dir/file.h
/dir/symlink.h -> file.h

the inode-based cache is what tells us that the following all point at the same file (given a working directory of /dir):

  • /dir/file.h
  • /dir/./file.h
  • ../dir/file.h
  • ./file.h
  • file.h
  • symlink.h
  • ./symlink.h
  • ../dir/symlink.h
  • /dir/symlink.h
  • /dir/./symlink.h

We can't drop the inode cache until there's another solution in place for this.

(One roundabout solution would be to keep this unchanged, and rely on the VFS to provide stable inodes. As it happens, I'm prototyping an on-disk caching FileSystem implementation that assigns stable/virtual inodes for each real path (hard links get distinct inodes); doesn't seem to have any overhead over the "real" filesystem (at least not at scale in clang-scan-deps). We could change the main path in Clang to use something like that, and change the PCM cache to use different logic (it's the only filemanager client that doesn't want/expect a stable view of the FS). I don't think it'd be too hard.)

Marking this as "request changes" since getBypassFile() is unsound and I'd prefer its use not be proliferated; I have some WIP to remove its current use.

Maybe there's a simpler solution though. If it's safe to use getBypassFile() every time a PCM file is opened, then we're not getting any value out of the FileManager. Maybe the module manager could skip the FileManager entirely...

This revision now requires changes to proceed.Mar 15 2021, 2:14 PM

If a fix will be in ModuleManager and only for ModuleCache the problem with symlinks and path will not affect it as ModuleCache is managed by clang and we can rely on it.
I agree that using FileMgr.getBypassFile is not the best way to solve this problem, we need to replace FileMgr.getFileRef with some other method but I did not found such method in FileManager. Maybe we need to add one or not use FileManager as was mentioned. Not sure which way is better and safer.

If a fix will be in ModuleManager and only for ModuleCache the problem with symlinks and path will not affect it as ModuleCache is managed by clang and we can rely on it.
I agree that using FileMgr.getBypassFile is not the best way to solve this problem, we need to replace FileMgr.getFileRef with some other method but I did not found such method in FileManager. Maybe we need to add one or not use FileManager as was mentioned. Not sure which way is better and safer.

FileManager plays two roles (unless I'm missing a third?):

  • Establish an identity for multiple paths that should be treated as "the same" (address of FileEntry).
  • Cache stat information (the content of FileEntry).

This patch is predicated on it being safe to skip the former (makes sense to me, although I think clang is a bit inconsistent about using relative paths for the module cache, so there might be some work to do). We already skip / avoid the latter (ModuleManager is the only caller of FileManager::getNoncachedStatValue). Seems like skipping the FileManager will simplify both the ModuleManager and the FileManager.

Might be worth an RFC on cfe-dev?