This is an archive of the discontinued LLVM Phabricator instance.

Reuse preamble even if an unsaved file does not exist
ClosedPublic

Authored by nik on Dec 8 2017, 6:17 AM.

Details

Summary

When a preamble is created an unsaved file not existing on disk is
already part of PrecompiledPreamble::FilesInPreamble. However, when
checking whether the preamble can be re-used, a failed stat of such an
unsaved file invalidated the preamble, which led to pointless and time
consuming preamble regenerations on subsequent reparses.

Do not require anymore that unsaved files should exist on disk.

This avoids costly preamble invalidations depending on timing issues for
the cases where the file on disk might be removed just to be regenerated
a bit later.

It also allows an IDE to provide in-memory files that might not exist on
disk, e.g. because the build system hasn't generated those yet.

Diff Detail

Event Timeline

nik created this revision.Dec 8 2017, 6:17 AM
ilya-biryukov requested changes to this revision.Dec 8 2017, 7:27 AM

This looks useful. Could we avoid creating the OverlayFileSystem, though?

include/clang/Frontend/PrecompiledPreamble.h
109 ↗(On Diff #126134)

Having this time stamp in the interface of PrecompiledPreamble doesn't really makes sense.

I propose we remove this method and test in a different way instead.

For example, we could add a counter to ASTUnit that increases when preamble is built and check this counter.
Or we could add a unit-test that uses PrecompiledPreamble directly.

lib/Frontend/PrecompiledPreamble.cpp
455

Can we do the same thing without creating an additional OverlayFileSystem?

If we add a separate map to check for those:

llvm::StringMap<PreambleFileHash> OverriddenFilesWithoutFS.
// Populate the map with paths not existing in VFS.

for (const auto &F : FilesInPreamble) {
    vfs::Status Status;
    if (!moveOnNoError(OFS.status(F.first()), Status)) {
      // If we can't stat the file, check whether it was overriden
      auto It = OverriddenFilesWithoutFS[F.first()];
      if (It == OverriddenFilesWithoutFS.end())
        return false;  
      //.....
    }
//......

}
456–457

assert macro is a no-op when NDEBUG is defined.
One must never put an operation with side-effects inside assert.

This revision now requires changes to proceed.Dec 8 2017, 7:27 AM
cameron314 edited edge metadata.Dec 8 2017, 7:40 AM

It's been a while since I was in this code, but I seem to recall the file needed to exist on disk in order to uniquely identify it (via inode). Does this break the up-to-date check?

It's been a while since I was in this code, but I seem to recall the file needed to exist on disk in order to uniquely identify it (via inode). Does this break the up-to-date check?

When the file is missing from the disk, but was remapped, preamble can not be reused. (Because we're always looking at uniqueids).
If the remapped file did not exist on disk originally when building the preamble, we should allow the preamble to be reused if it's still remapped, but not created on disk.

nik marked 2 inline comments as done.Dec 11 2017, 6:29 AM
nik added inline comments.
include/clang/Frontend/PrecompiledPreamble.h
109 ↗(On Diff #126134)

For example, we could add a counter to ASTUnit that increases when preamble is built and check this counter.

OK, I've followed this proposal because there is already test infrastructure in PCHPreambleTest that I can easily reuse.

lib/Frontend/PrecompiledPreamble.cpp
455

Can we do the same thing without creating an additional OverlayFileSystem?

Hmm, I thought I had a good reason for this one. I don't remember it and the test still passes, so I did it without it now.

Is there any real advantage in first trying the stat, then checking OverriddenFilesWithoutFS? Since I don't see any, I've changed order because the stat can then be avoided; also, it removes some nesting.

456–457

Huch, forgot to remove it on cleanup. Anyway, it's obsolete now.

nik updated this revision to Diff 126353.Dec 11 2017, 6:30 AM
nik edited edge metadata.
nik marked 2 inline comments as done.

Addressed Ilya's comments.

nik added inline comments.Dec 11 2017, 6:33 AM
include/clang/Frontend/ASTUnit.h
214

Any better name for this one? Otherwise I would suggest renaming PreambleRebuildCounter to PreambleRebuildCountdownCounter to make them more distinct.

ilya-biryukov added inline comments.Dec 13 2017, 5:14 AM
include/clang/Frontend/ASTUnit.h
214

+1, names PreambleRebuildCounter and PreambleCounter are too similar.

lib/Frontend/PrecompiledPreamble.cpp
455

I don't see any advantage.
I used to think it had something to do with overriden symlinks, but after thinking more about it I'm not even sure what should the semantics of those be. And I don't think the current code handles that (we have neither tests nor comments suggesting that this use-case was considered in the first place).

This possibly handles case-insensitive filesystems, like NTFS on Windows. But I'm not sure if that was the original intention of this code.

484

Will anything fail if we remove the map from UniqueID to hashes of overriden files and the corresponding checks?

We should either document why having UniqueID-based checks is important or remove the code doing those checks.

nik added inline comments.Dec 14 2017, 7:03 AM
include/clang/Frontend/ASTUnit.h
214

OK, will address in the next patch set / diff.

lib/Frontend/PrecompiledPreamble.cpp
484

Will anything fail if we remove the map from UniqueID to hashes of overriden files and the corresponding checks?

Hmm, I would need to remove/replace it and run the tests.

I see these reasons for UniqueID:

  • It's already there :)
  • Normalization of file paths is not necessary.
  • It potentially can deal with hard links, though I'm not sure whether this is real world use case at all.

We should either document why having UniqueID-based checks is important or remove the code doing those checks.

Agree.

nik added inline comments.Jan 2 2018, 6:47 AM
lib/Frontend/PrecompiledPreamble.cpp
484

Will anything fail if we remove the map from UniqueID to hashes of overriden files and the corresponding checks?

OK, I've applied the following patch (on this patch set/revision here) applied and run check-clang - no failures here.

From 8f755128d1e167193c8f6de1456aec01d14307f6 Mon Sep 17 00:00:00 2001
From: Nikolai Kosjar <nikolai.kosjar@qt.io>
Date: Tue, 2 Jan 2018 15:14:07 +0100
Subject: [PATCH] Use file path instead of uniqueID

---
 lib/Frontend/PrecompiledPreamble.cpp | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/Frontend/PrecompiledPreamble.cpp b/lib/Frontend/PrecompiledPreamble.cpp
index a43205a1c3..09e2b20a48 100644
--- a/lib/Frontend/PrecompiledPreamble.cpp
+++ b/lib/Frontend/PrecompiledPreamble.cpp
@@ -433,7 +433,7 @@ bool PrecompiledPreamble::CanReuse(const CompilerInvocation &Invocation,
   // Check that none of the files used by the preamble have changed.
   // First, make a record of those files that have been overridden via
   // remapping or unsaved_files.
-  std::map<llvm::sys::fs::UniqueID, PreambleFileHash> OverriddenFiles;
+  std::map<std::string, PreambleFileHash> OverriddenFiles;
   for (const auto &R : PreprocessorOpts.RemappedFiles) {
     vfs::Status Status;
     if (!moveOnNoError(VFS->status(R.second), Status)) {
@@ -442,7 +442,7 @@ bool PrecompiledPreamble::CanReuse(const CompilerInvocation &Invocation,
       return false;
     }
 
-    OverriddenFiles[Status.getUniqueID()] = PreambleFileHash::createForFile(
+    OverriddenFiles[R.first] = PreambleFileHash::createForFile(
         Status.getSize(), llvm::sys::toTimeT(Status.getLastModificationTime()));
   }
 
@@ -470,8 +470,8 @@ bool PrecompiledPreamble::CanReuse(const CompilerInvocation &Invocation,
         return false;
     }
 
-    std::map<llvm::sys::fs::UniqueID, PreambleFileHash>::iterator Overridden =
-        OverriddenFiles.find(Status.getUniqueID());
+    std::map<std::string, PreambleFileHash>::iterator Overridden =
+        OverriddenFiles.find(F.first());
     if (Overridden != OverriddenFiles.end()) {
       // This file was remapped; check whether the newly-mapped file
       // matches up with the previous mapping.
-- 
2.15.1
nik updated this revision to Diff 128416.Jan 2 2018, 6:52 AM

Rebased and renamed the counter variable only.

I do not feel comfortable changing the "std::map<llvm::sys::fs::UniqueID,
PreambleFileHash> OverriddenFiles". I can do this in a follow-up change if you
want.

@Ivan: Coul you please run the tests with this change on Windows?! If it goes
well (no failures), then please also give it also a try with the additional
change (reply to "Will anything fail if we remove the map from UniqueID to
hashes of overriden files and the corresponding checks?").

yvvan added a comment.Jan 3 2018, 12:45 AM

No regression in tests on Windows with and without extra patch ([PATCH] Use file path instead of uniqueID)

ilya-biryukov added inline comments.Jan 3 2018, 2:26 AM
include/clang/Frontend/ASTUnit.h
208–211

NIT: Maybe shorten to PreambleRebuildCountdown?
It's not a great name, but a bit shorter than now and seems to convey the same meaning.

unittests/Frontend/PCHPreambleTest.cpp
129

NIT: Maybe use raw string literals? Escpated string are hard to read.
E.g.,

R"cpp(
#include "//./header1.h"
int main() { return ZERO; }
)cpp"
132

Are we testing the right thing here?

We're testing that preamble does not get rebuild when some header that was previously unresolved when seen inside #include directive is now added to the filesystem. That is actually a bug, we should rebuild the preamble in that case.

We should probably call RemapFile before calling ParseAST instead and make sure it's properly resolved.

AddFile(MainName, ...);
// We don't call AddFile for the header at this point, so that it does not exist on the filesystem.
RemapFile(Header1, "#define ZERO 0\n");

std::unique_ptr<ASTUnit> AST(ParseAST(MainName));
// Also assert that there were no compiler errors? (I.e. file was resolved properly)
// .... 

// Now the file is on the filesystem, call reparse and check that we reused the preamble.
AddFile(Header1, "#define ZERO 0\n");
ASSERT_TRUE(ReparseAST(AST));
ASSERT_EQ(AST->getPreambleCounter(), 1U);

@yvvan: The clang frontend tests (PCHPreambleTest and friends) are disabled on Windows in the makefile (I think because the VFS tests depend on linux-like paths). So running the tests on Windows without failures is encouraging but not the whole story.

yvvan added a comment.Feb 8 2018, 6:49 AM

@yvvan: The clang frontend tests (PCHPreambleTest and friends) are disabled on Windows in the makefile (I think because the VFS tests depend on linux-like paths). So running the tests on Windows without failures is encouraging but not the whole story.

Nice to know. But the bad thing is that it is not obvious at all from the code. And another bad thing is that tests are platform dependent.

nik added a comment.Apr 19 2018, 7:03 AM

Sorry for the delay, I think I'll come back to this one soon.

nik added a comment.Nov 6 2018, 7:19 AM

Coming back to this one, I see a failing test: PCHPreambleTest.ReparseWithOverriddenFileDoesNotInvalidatePreamble

Note that PCHPreambleTest.ReparseWithOverriddenFileDoesNotInvalidatePreamble references the header paths in different ways ("./header1.h" vs "./foo/../header1.h"). Before this change, this was not a problem because OverriddenFiles were keyed on Status.getUniqueID(). Starting with this change, the key is the file path.

Is there a nice way to map different file paths of the same file to the same id without touching the real file system? Would it be sufficient to normalize the file paths? If yes, is there a utility function for this (can't find it right now).

include/clang/Frontend/ASTUnit.h
208–211

Will do.

unittests/Frontend/PCHPreambleTest.cpp
129

Will do.

132

Are we testing the right thing here?

Huch, indeed, the test is wrong.

I'll incorporate your suggested test (real file is added after providing unsaved file) and add also another one: real file is removed after providing an unsaved file.

That is actually a bug, we should rebuild the preamble in that case.

Agree, the preamble should be rebuild in such a case, but that's something for a separate change.

Before this change, this was not a problem because OverriddenFiles were keyed on Status.getUniqueID(). Starting with this change, the key is the file path.

I suggest keeping two maps for overridden files: one for existing files (key is UniqueID), another one for the ones that don't exist (key is the file path).

Is there a nice way to map different file paths of the same file to the same id without touching the real file system? Would it be sufficient to normalize the file paths? If yes, is there a utility function for this (can't find it right now).

I don't think there is one, the unique ids are closely tied to the filesystem. IIUC the unique ids are the same whenever two different paths are symlinks (or hardlinks?) pointing to the same physical file and there's no way to tell if they're the same without resolving the symlink.

nik added a comment.Nov 14 2018, 5:34 AM

Before this change, this was not a problem because OverriddenFiles were keyed on Status.getUniqueID(). Starting with this change, the key is the file path.

I suggest keeping two maps for overridden files: one for existing files (key is UniqueID), another one for the ones that don't exist (key is the file path).

Done.

Is there a nice way to map different file paths of the same file to the same id without touching the real file system? Would it be sufficient to normalize the file paths? If yes, is there a utility function for this (can't find it right now).

I don't think there is one, the unique ids are closely tied to the filesystem. IIUC the unique ids are the same whenever two different paths are symlinks (or hardlinks?) pointing to the same physical file and there's no way to tell if they're the same without resolving the symlink.

OK, so if the unsaved file is not being backed up by a real file on disk and symlinks are used, we can't do much about it.
I've normalized the file paths to handle at least that case where they might differ.

nik updated this revision to Diff 174022.Nov 14 2018, 5:38 AM

Addressed comments.

ilya-biryukov added inline comments.Dec 3 2018, 7:33 AM
include/clang/Frontend/ASTUnit.h
580

NIT: getPreambleCounterForTests()? This is clearly an internal detail, would try giving it a name that discourages using it outside the testing code.

lib/Frontend/ASTUnit.cpp
1369

Why not increase this counter unconditionally for every error? (And a non-error case too, I guess)

lib/Frontend/PrecompiledPreamble.cpp
463

Could we avoid adding path normalization in this patch? Would it break anything?
This is clearly an important detail that preamble might get wrong, but I'd rather add it in a separate patch to avoid putting multiple things in the same patch.

nik marked 4 inline comments as done.Dec 6 2018, 6:15 AM
nik added inline comments.
include/clang/Frontend/ASTUnit.h
580

Done.

Side note: I hardly see something like this used in clang, but I agree that it's good.

nik updated this revision to Diff 176962.Dec 6 2018, 6:15 AM
nik marked an inline comment as done.

Addressed comments.

ilya-biryukov added inline comments.Dec 13 2018, 1:38 AM
lib/Frontend/PrecompiledPreamble.cpp
461

NIT: remove braces around single-statement branches, they are usually omitted in the LLVM code

nik updated this revision to Diff 186429.Feb 12 2019, 2:37 AM
nik marked an inline comment as done.

Addressed comment.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2019, 2:37 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
nik added a comment.Feb 12 2019, 2:50 AM

Meh, something changed in the meanwhile. ReparseReusesPreambleAfterUnsavedFileWasRemovedFromDisk fails now. Looking into it.

nik updated this revision to Diff 186436.Feb 12 2019, 3:59 AM

Meh, something changed in the meanwhile. ReparseReusesPreambleAfterUnsavedFileWasRemovedFromDisk fails now. Looking into it.

No, it's just me ;) I've referenced the header file wrong.

nik updated this revision to Diff 198656.May 8 2019, 7:19 AM

Minor diff update fixing indentation and removing not needed include.

Ping.

Again, sorry for the delay. This looks good, just a few NITs from me before I stamp it

lib/Frontend/PrecompiledPreamble.cpp
455

Could you add a comment that this contains only the files there were not found on disk (i.e. the vfs call failed and we couldn't get a UniqueID)

470

NIT: change to: The file's buffer was remapped and the file was not found in VFS

nik marked 2 inline comments as done.May 9 2019, 6:35 AM
nik updated this revision to Diff 198809.May 9 2019, 6:35 AM

Addressed inline comments.

ilya-biryukov accepted this revision.May 17 2019, 2:18 AM

Thanks! LGTM

This revision is now accepted and ready to land.May 17 2019, 2:18 AM
This revision was automatically updated to reflect the committed changes.